Category Archives: Code

Poor Man’s DIP

Sometimes a lower-layer component needs to invoke a service on a higher-layer component. Consider, for example, a timer component (T) that periodically calls a handler function in a user-interface component (U). Component T is probably part of the OS kernel and thus clearly “lower” than component U.

In this setting, there is an upward dependency from T to U; such upward dependencies are undesirable, at least if they are bound at compile-time. Implemented naively, there is a hard-coded call to the UI component like this:


// file: OS/Timer.c

#include "UI/UIManager.h"  // Bad!
...
static void OSTimerNotify(void) {
    ...
    UIManagerNotify10msTimeout(); // Bad!
    ...
}

// file: OS/Timer.c

#include "UI/UIManager.h" // Bad!

...

static void OSTimerNotify(void) {

...

UIManagerNotify10msTimeout(); // Bad!

...

}

Dependency lines that point up in a component diagram are not just ugly: they denote that the lower-layer component cannot be independently reused and tested.

The classic dependency inversion principle (DIP) is usually applied to solve this problem: instead of having a hard-coded function call in the timer to the handling component, the timer calls back on a function pointer that is set to the timer-handling routine in the initialization code of the higher-layer component:


// file: UI/UIManager.c

#include "OS/Timer.h"

static void UIManager10msTimout(void) {
    // Do something ever 10 ms.
}
void UIManagerInit(void) {
    ...
    OSTimerSet10msCallback(&UIMananager10msTimeout);
    ...
}

// file: OS/Timer.h

typedef void (*TIMER_10MS_CALLBACK)(void);

// file: OS/Timer.c

#include "OS/Timer.h"

static TIMER_10MS_CALLBACK s_10msCallback;

void OSTimerSet10msCallback(TIMER_10MS_CALLBACK callback) {
    s_10msCallback = callback;
}
static void OSTimerNotify(void) {
    ...
    if (s_10msCallback != NULL)
        (*s_10msCallback)();
    ...
}

// file: UI/UIManager.c

#include "OS/Timer.h"

static void UIManager10msTimout(void) {

// Do something ever 10 ms.

}

void UIManagerInit(void) {

...

OSTimerSet10msCallback(&UIMananager10msTimeout);

...

}

// file: OS/Timer.h

typedef void (*TIMER_10MS_CALLBACK)(void);

// file: OS/Timer.c

#include "OS/Timer.h"

static TIMER_10MS_CALLBACK s_10msCallback;

void OSTimerSet10msCallback(TIMER_10MS_CALLBACK callback) {

s_10msCallback = callback;

}

static void OSTimerNotify(void) {

...

if (s_10msCallback != NULL)

(*s_10msCallback)();

...

}

Note that there is still a T to U dependency, but now this dependency is only present at run-time, which is OK, as this doesn’t hinder reuse and testability. The U to T compile-time dependency is quite natural and doesn’t violate any design principles. So, the undesirable compile-time dependency has been successfully inverted. The classic DIP recipe looks like this:

1. In T export a callback interface
2. In U implement the callback interface
3. In U (or some init/startup code) register the implementation with T
4. In T call back on the interface

When you are working in a constrained environment like embedded systems, you often cannot afford the memory and performance overhead that accompanies such late (run-time) binding, so you might try what I call the “Poor Man’s DIP”: simply export a “callback interface” as a function prototype and “implement” it by defining the function in the upper-layer component:


// file: UI/UIManager.c

#include "OS/Timer.h"

// Implementation of the call-back interface.
void OSTimerNotify10msTimeout(void) {
    // Do something ever 10 ms.
}

// file: OS/Timer.h

// Call-back interface.
extern void OSTimerNotify10msTimeout(void);

// file: OS/Timer.c

#include "OS/Timer.h"

static void OSTimerNotify(void) {
    ...
    OSTimerNotify10msTimeout();
    ...
}

// file: UI/UIManager.c

#include "OS/Timer.h"

// Implementation of the call-back interface.

void OSTimerNotify10msTimeout(void) {

// Do something ever 10 ms.

}

// file: OS/Timer.h

// Call-back interface.

extern void OSTimerNotify10msTimeout(void);

// file: OS/Timer.c

#include "OS/Timer.h"

static void OSTimerNotify(void) {

...

OSTimerNotify10msTimeout();

...

}

This pattern gives you most of the advantages of the classical (run-time bound) DIP but doesn’t incur any overhead. It can (and should) be applied whenever there is a dependency from a lower-layer component to an upper-layer component that doesn’t need to change at run-time but stays fixed throughout the lifetime of the application.

Regular Expressions — Sweetest Poison

It’s amazing how much time you can save by using regular expressions; it’s even more amazing how much time you can spend getting them to work correctly.

Because they are so powerful and easy to use, regexps can easily be misused, for instance by applying them to problems that are not “regular”, that is, where balancing is important:


    if (a > b) {
        ...
        if (x > y) {
            ...
        }
    }

if (a > b) {

...

if (x > y) {

...

}

Parsing problems like this are not suited for a regular expression matcher, as you need to retain state information and regexps simply cannot keep track of which blocks or braces are open or closed. In cases like this, what you really need is a parser. Period.

Alas, often people can’t be bothered writing a true parser, even if lex/yacc-like tools greatly simplify the work. And I’m guilty of this myself. Years ago I wrote a profiling tool for embedded systems. Since the embedded C code that I wanted to profile had to be instrumented (each function required enter/exit logging calls to get out the execution timing data) I needed to write a tool to do the job. I was not particularly interested in this job — hacking the actual performance analysis code was much more fun — so I decided, well, to go for a heuristic “parser” based on regexps.

In less than one hour I had cobbled together a little script that seemed to work fine. Over the next couple of months I had to spend endless hours fixing all the nasty corner cases; even today it doesn’t work in all circumstances! But I’ve learned my lessons: don’t use regexps when you need a true parser. Again, period.

But even if the problem is regular, people often define regexps sloppily. Look at the following example that checks if a .cfg file appears anywhere in a given string:


    while (<>) {
        print "Found: $&\\n"
            if /\\w:\\\\(\\w+\\\\)*\\w+\\.cfg/;
    }

while (<>) {

print "Found: $&\\n"

if /\\w:\\\$\\w+\\\$*\\w+\\.cfg/;

}

So let’s see what we’ve got here. We are obviously looking for a Windows-style absolute path: a single drive letter, followed by a colon and a backslash, followed by n optional directories (each of which followed by a backslash), followed by a mandatory filename that has a .cfg extension. Looks really neat…

These are the regexps people love to write and I don’t know how many times I’ve had to fix one because of this pathological “simplicity”. It might work today, but it is far from future-proof. Sooner or later the surrounding context will change and this regexp will match much more (or much less) than was intended.

Here a some of the major shortcomings:

– Using word characters \w is way too restrictive. According to the Windows long filename specification, a filename may contain any UTF-16 character, but for all practical purposes \w is really only a shortcut for [a-zA-Z0-9_]. If a filename contains a blank or umlaut, the expression won’t match anymore.

– Actually a corollary of the previous item: you cannot have partial relativity within an absolute path, e. g. C:\files\services\base\..\items\main.cfg would not match because the \w character class does not allow for dots.

– The regexp is not aligned on a word boundary, which means that if your editor happens to create backup files like C:\config\user.cfg~ they’ll match, too.

Often — but not always — using regexps means striking a careful balance between accuracy and convenience. It makes little sense to implement the complete Windows filename spec in a regexp. But investing a little energy to tighten them up usually pays off in spades. How about this?


    while (<>) {
        print "Found: $&\\n"
            if /\\b[a-zA-Z]:\\\\([^\\\\]+\\\\)*[^\\\\]+\\.cfg\\b/;
    }

while (<>) {

print "Found: $&\\n"

if /\\b[a-zA-Z]:\\\$[^\\\\]+\\\$*[^\\\\]+\\.cfg\\b/;

}

At the cost of being only slightly more difficult to read, this solution is much more resilient to change due to the use of some good practices. First of all, it is easy do define a set of valid drive letters, so I used [a-zA-Z] instead of \w; second, the whole regexp is aligned on word boundaries, which means no more regexp over-/underruns and third, by stating that everything between separators (backslashes, in this case) is a series of non-separators we won’t run into “strange character” problems.

Next time you write a regexp think this: “I know that by using regexps I’m saving hours of development time, so I can afford to spend another 10 minutes to make them more robust”.

The Safety Net That Wasn’t

The other day, I wasted time debugging some Java code. When I say “wasted” I do not complain about debugging per se — debugging is part of my life as a developer. Time was wasted because debugging should not have been necessary in this case. Let me explain…

It just so happened that I called a method but violated a constraint on a parameter. Within the called method, the constraint was properly enforced via the use of an assertion, just like in this example:


    public int invertValues(int rows, int[] values) {
        assert rows <= MAX_ROWS;
        ...
    }

public int invertValues(int rows, int[] values) {

assert rows <= MAX_ROWS;

...

}

Normally, my violation of the method’s contract would have been immediately reported and I wouldn’t have had to debug this bug. Normally, yes, but not in this case, as I forgot to run my program with assertions enabled. So instead of


    java -ea MyProgram

java -ea MyProgram

I wrote what I had written thousands of times before:


    java MyProgram

java MyProgram

Silly me, silly me, silly me! That’s what I thought initially. But then I was reminded of the words of Donald A. Norman. In his best-selling book “The Design of Everyday Things” he observes that users frequently — and falsely — blame themselves when they make a mistake, when in fact it is the failure of the designer to prevent such mistakes in the first place. Is it possible that Java’s assertion facility is ill-designed? After having thought about it for some time, I’m convinced it is.

Assertions first appeared in the C programming language and they came with two promises: first, assertions are enabled by default (that is, until you explicitly define NDEBUG) and second, they don’t incur any inefficiencies once turned off. These two properties are essential and Java’s implementation misses both of them.

The violation of the first principle means that you cannot trust your assertion safety net: It is just too easy for you, your teammates or your users to forget the ‘-ea’ command-line switch. If you don’t trust a feature, you don’t want to use it. What use is an anti-lock break system that you have to enable manually every time you start your car?

Efficiency has always been a major concern to developers. If you execute your Java code with assertions disabled (which is, as we know, unfortunately the default) you will most likely not notice any speed penalty. What you will notice, however, is the additional footprint for your assertions that will always travel with your Java program. There is no way to compile assertions out. Take a look at this C example:


    int binary_search(const int* values, size_t values_len, int search_value)
    {
    #ifndef NDEBUG
        // Ensure that given values are sorted.
        int i;
        for (i = 1; i < values_len; ++i) {
            assert(values[i] >= values[i - 1];
        }
    #endif
        ...
        // Actual implementation of binary search.
        ...
    }

int binary_search(const int* values, size_t values_len, int search_value)

{

#ifndef NDEBUG

// Ensure that given values are sorted.

int i;

for (i = 1; i < values_len; ++i) {

assert(values[i] >= values[i - 1];

}

#endif

...

// Actual implementation of binary search.

...

}

A prerequisite of any binary search implementation is that the input values are sorted, so why not assert it? Since we need to iterate over all elements, a simple assert expression is not sufficient. Contrary to Java, this is not a problem in C and C++: the code for the assert as well as the for-loop will be removed from the release build, thanks to the pre-processor.

While assertions — especially non-trivial assertions that require supporting debug code — already waste memory, you can do worse if you use the kind of assertion that allows you to specify a string to be displayed when an assertion fails:


    ...
    assert factory != null : "Factory must exist at this point!";
    ...

...

assert factory != null : "Factory must exist at this point!";

...

This string is of little use. If a programmer ever sees it, (s)he will have to look at the surrounding code anyway (as provided by the filename, line number pairs in the stack trace), since it is unlikely that such an assertion message can provide enough context. But, hey, I wouldn’t really mind the string if it came at no cost, but in my view, wasting dozens of bytes in addition for the string is not justified. I prefer the traditional approach, that is, an explanation in the form of a comment:


    ...
    // Ensure that a factory exists.
    // If this assertion fails, it is highly likely that the
    // initialization order in Startup.init() is messed-up.
    // Double-check that Factory.init() is called right
    // at the beginning.
    assert factory != null;
    ...

...

// Ensure that a factory exists.

// If this assertion fails, it is highly likely that the

// initialization order in Startup.init() is messed-up.

// Double-check that Factory.init() is called right

// at the beginning.

assert factory != null;

...

Assertions are like built-in self-tests and one of the cheapest and most effective bug-prevention tools available; this fact has been confirmed once again in a recently published study by Microsoft Research. If developers cannot rely on them (because someone did forget to pass ‘-ea’ or inadvertently swallowed the assertion by catching ‘Throwable’ or ‘Error’ in surrounding code) or always have to worry about assertion code-bloat, they won’t use them. This is the true waste of Java assertions.

Get into ‘Insert’ Mode

Here I am, trying to write something. I’m sitting at my desk, staring at my screen an it looks like this:

It is empty. I just have no clue how to even start.

Are you familiar with such situations? Among writers, this is a well-known phenomenon and it’s called “writer’s block”. But similar things happen in all creative fields: sooner or later, people hit a massive roadblock and don’t know where to start. A painter sits in front of a blank canvas, an engineer in front of a blank piece of paper and a programmer in front of an empty editor buffer.

Is there any help? Sure. You can use a technique called “free writing“, which means you just write down whatever comes to your mind, regardless of how silly it looks. It’s important that you don’t judge what you write, you don’t pay attention to spelling or layout, your only job is to produce a constant stream of words — any words. This exercise will warm-up your brains and hopefully remove the block. Applied to programming, you set up a project, you write a “main” routine (even if it only prints out “Hello, World, I don’t know how to implement this freaking application”) and a test driver that invokes it.

The next thing that you do is write a “shitty first draft“, as suggested by Anne Lamott. You probably know the old saying: the better is the enemy of the good. By looking for the perfect solution, we often end up achieving nothing because we cannot accept temporary uncertainty and ugliness. That’s really, really sad. Instead, write a first draft, even if it is a lousy one. Then, put it aside and let it mature, but make sure you revisit it regularly. You will be amazed at how new ideas and insights emerge. Experienced programmers are familiar with this idea, but they call it prototyping. They jot down code, they smear and sketch without paying attention to things like style and error-handling, often in a dynamic language like Perl or Python.

So if you have an idea that you think is worthwhile implementing, start it. Start somewhere — anywhere — even if the overall task seems huge. Get into ‘insert’ mode (if you are using the ‘vi’ editor, press the ‘I’ key). Remember the Chinese proverb: “The hardest part of a journey of a thousand miles is leaving your house”.

Intended Use vs. Real Use

Often, things are invented to solve a particular problem, but then the invention is used for something completely different.

Take Post-it® Notes, for instance. In 1970, Spencer Silver at 3M research laboratories was looking for a very strong adhesive, but what he found was much weaker than what was already available at his company: It stuck to objects, but could easily be lifted off. Years later, a colleague of his, Arthur Fry, digged up Spencer’s weak adhesive — the rest is history.

Another example is the discovery of this blue little pill called Viagra®. Pfizer was looking for medications to treat heart diseases, but the desired effects of the drug were minimal. Instead, male subjects reported completely different effects — again, the rest is history.

In 1991, a team of developers at Sun were working on a new programming language called “Oak” — the goal was to create a language and execution platform for all kinds of embedded electronic devices. They changed the name to “Java” and it has become a big success: You can find it almost everywhere, except — big surprise — in embedded systems.

I would never have guessed how minute Java’s impact on embedded systems was until I read Michael Barr’s recent article, provokingly called “Real men program in C” where he presents survey result showing the usage statistics of various programming languages on embedded systems projects.

The 60-80% dominance of C didn’t surprise me — C is the lingua franca of systems programming: high-level enough to support most system-level programming abstractions, yet low-level enough to give you efficient access to hardware. If it is fine for the Linux kernel (which is around 10 million lines of uncommented source code, SLOC) it should be fine for your MP3 player as well.

Naturally, at least to me, C++ must be way behind C — Barr reports a 25% share. C++ is a powerful but difficult language. It is more or less built on top of C, so it is “backwards-efficient”. Alas, to master it, you need to read at least 10 books by Bjarne Stroustrup, Scott Myers, Herb Sutter et. al. and practice for five years — day and night. But the biggest problem with C++ is that it somehow encourages C++ experts to endlessly tinker with their code, using more and more advanced and difficult language features until nobody else understands the code anymore. (Even days after everything is already working they keep polishing — if people complain that they don’t understand their template meta-programming gibberisch, they turn away in disgust.)

But how come Java is only at 2%? Barr, who mentions Java only in his footnotes (maybe to stress the insignificance of Java even more) has this to say: “The use of Java has never been more than a blip in embedded software development, and peaked during the telecom bubble — in the same year as C++.”

Compared to C++, Java has even more weaknesses when it comes to embedded systems programming. First of all, there is no efficient access to hardware, so Java code is usually confined to upper layers of the system. Second, Java, being an interpreted language, cannot be as fast as compiled native code and JIT (just-in-time) compilation is only feasible on larger systems with enough memory and computational horsepower. As for footprint, it is often claimed that Java code is leaner than native code. Obviously, this is true, as the instruction set of the JVM is more “high-level” than the native instruction set of the target CPU. However, for small systems, the size of the VM and the Java runtime libraries have to be taken into account and this “overhead” will only amortize in larger systems. But two more properties of Java frequently annoy systems programmers: the fact that all memory allocation goes via the heap (i. e. you cannot efficiently pass objects via the stack) and the fact that the ‘byte’ data type is signed, which can be quite a nuisance if you want to work with unsigned 8-bit data (something that happens rather frequently in embedded systems). Finally, if C++ seduces programmers to over-engineer their code by using every obscure feature the language has to offer, Java seduces programmers to over-objectify their code — something that can lead to a lot of inefficiency by itself.

I don’t think that the embedded world is that black and white. I’m convinced that for small systems (up to 20 KSLOC) C is usually the best choice — maybe sprinkled with some assembly language in the device drivers and other performance-critical areas. Medium-sized systems can and large systems definitely will benefit from languages like C++ and Java, but only in upper layers like application/user interface frameworks and internal applications. Java clearly wins if external code (e. g. applets, plug-ins) will be installed after the system has been deployed. In such cases, Java has proven as a reliable, secure and portable framework for dynamically handling applications. For the rest, that is, the “core” or the “kernel” of a larger system, C is usually the best and most efficient choice.

Holistic Bug Fixing

The other day, while doing my morning jog, I was thinking about a particularly nasty bug I had been chasing for quite some time. I was wearing a portable radio and the guy on the radio was talking about Paul Simon’s all-time classic “Fifty ways to leave your lover”. He wasn’t so much talking about the song — rather about the amazing drum beat created by drummer Steve Gadd. Steve’s performance is incredible — about 100 beats per minute, it just sounds like “prrrrrrrrrr…”. Anyway, the broadcast led my thoughts astray and I suddenly thought: “How many ways are there to fix a bug?” Which brings us back to the topic of this post…

After having chased down a nasty bug, all you have to do is fix it. But a fix is not a fix, right?

Imagine this setting. There is a test execution framework that you and your team mates are using to run automated regression testing. Since your software is configurable (i. e. certain features can be turned on/off either at build time or at run-time), your tests need to be configurable, too. The framework sports a preprocessor (implemented in Perl) that gives you just that — very much like what the C preprocessor gives to C developers. Here is what a typical test script looks like:


    |File: sample.test|
    ...
    #include "config.h"

    Execute test step 1
    Execute test step 2
    #if CONFIGURED_FEATURE_A
    Execute test step 3
    #endif
    Execute test step 4
    #ifndef CONFIGURED_FEATURE_Z
    Execute test step 5
    #endif
    ...

|File: sample.test|

...

#include "config.h"

Execute test step 1

Execute test step 2

#if CONFIGURED_FEATURE_A

Execute test step 3

#endif

Execute test step 4

#ifndef CONFIGURED_FEATURE_Z

Execute test step 5

#endif

...

During execution of this test script, a log file is updated:


    |File: sample.test.log|

    # Log file for test script 'sample.test'
    PASS test step1
    PASS test step2
    PASS test step4
    ...
    FAIL test step25
    ...
    PASS test step 42

|File: sample.test.log|

# Log file for test script 'sample.test'

PASS test step1

PASS test step2

PASS test step4

...

FAIL test step25

...

PASS test step 42

This is obviously just an artificial (or at least, simplified) example, but I guess it is sufficient to explain the idea.

Now, your test execution framework probably has code like this:


    |File: testexec.sh|

    ...
    for $test in *.test ; do
        # Preprocess.
        preprocess_test.pl -I dev/project/build/config $test

        # Execute.
        execute_test.pl $test

        # Now check if there are any FAILs in the log.
        grep ^FAIL $test.log
        if (($?)) ; then
            echo FAIL $test
        else
            echo PASS $test
        fi
    done
    ...

|File: testexec.sh|

...

for $test in *.test ; do

# Preprocess.

preprocess_test.pl -I dev/project/build/config $test

# Execute.

execute_test.pl $test

# Now check if there are any FAILs in the log.

grep ^FAIL $test.log

if (($?)) ; then

echo FAIL $test

else

echo PASS $test

done

...

A loop iterates over all test scripts in the current directory. In a first step, the test script is fed through a preprocessor Perl script where the configuration business is done. Next, the preprocessed test script is executed and finally, it is checked whether the execution of the test script resulted in any failing test steps.

So far so good. But on a dark and rainy day, you find out by coincidence that some of your test scripts have not been executed at all; worse yet, testexec.sh reported success on them! The logs clearly show that nothing has been executed due to a fatal error:


    |File: mytest.test.log|

    preprocess_test.pl: cannot find include file 'config_ex.h'.

|File: mytest.test.log|

preprocess_test.pl: cannot find include file 'config_ex.h'.

You immediately know what the problem is. A couple of weeks back you added another include directive to some of your test scripts since they depend on additional configuration switches:


    |File: mytest.test|

    #include "config.h"
    #include "config_ex.h"
    ...

|File: mytest.test|

#include "config.h"

#include "config_ex.h"

...

Unfortunately, config_ex.h is located in a different directory than config.h, so the preprocessor — who is given only a single include base path (dev/project/build/config) — cannot find it.

What possibilities do we have to get rid of this problem?

Level 1: Fix the symptom.

A very simple fix would be to change the failing test script by changing the #include statement to include config_ex.h based on an explicit path:


    |File: mytest.test|

    #include "config.h"
    #include "dev/project/build/other/config_ex.h"
    ...

|File: mytest.test|

#include "config.h"

#include "dev/project/build/other/config_ex.h"

...

This would do the job. Yet, this approach is ugly: other developers (including yourself) can easily step into the same pitfall again (most likely they have already).

You would never only fix the symptom, would you? EVERYONE knows that it is bad to only fix the symptom!

But, hey, such a hack is not bad per se. Sometimes, you need a quick fix to be able to carry on. Maybe you cannot change the test execution script yourself because it is located on a remote testbot. Or your company doesn’t like the idea of collective code ownership and only Sam is allowed to change the test execution script and — unfortunately — Sam has already left for the weekend. Anyway, fixing the symptom is sometimes appropriate, but at the very least you should ensure that it will be cleaned up later, by (for instance) using TODOs that can be tracked automatically:


    |File: mytest.test|

    #include "config.h"
    #include "dev/project/build/other/config_ex.h" // TODO:2009-07-29:ralf:Quick fix to get my test running again
                                                   // (missing include base path in test execution script)
    ...

|File: mytest.test|

#include "config.h"

#include "dev/project/build/other/config_ex.h" // TODO:2009-07-29:ralf:Quick fix to get my test running again

// (missing include base path in test execution script)

...

Level 2: Fix immediate problem.

If you can, you’d better fix the bug in the test execution script directly by adding the missing include base path:


    |File: testexec.sh|

    ...
    for $test in *.test ; do
        # Preprocess.
        preprocess_test.pl -I dev/project/build/config -I dev/project/build/other $test

        # Execute.
        execute_test.pl $test
        ...

|File: testexec.sh|

...

for $test in *.test ; do

# Preprocess.

preprocess_test.pl -I dev/project/build/config -I dev/project/build/other $test

# Execute.

execute_test.pl $test

...

Now this looks good and you can claim that you “fixed the problem, not the symptom”, right? While this level 2 fix is certainly more pleasing than the level 1 fix above, I think we can do much better.

Level 3: Prevent bug from happening again.

Our level 2 fix still has shortcomings. The same mistake (i. e. forgetting to add or update an include path in the execution script) can and will lead to the same misery. So we need to safeguard against future errors.

Looking at the invocation of the preprocessor, we can clearly see that there is no error handling at all: either preprocess_test.pl doesn’t produce an exit code in case of fatal errors or our test execution script doesn’t evaluate it. So here is a potential level 3 fix:


    ...
    for $test in *.test ; do
        # Preprocess.
        preprocess_test.pl -I dev/project/build/config -I dev/project/build/other $test
        if (($?)) ; then
            echo "Fatal: Preprocessing of file $test failed."
            exit 2;
        fi

        # Execute.
        execute_test.pl $test
        ...

...

for $test in *.test ; do

# Preprocess.

preprocess_test.pl -I dev/project/build/config -I dev/project/build/other $test

if (($?)) ; then

echo "Fatal: Preprocessing of file $test failed."

exit 2;

# Execute.

execute_test.pl $test

...

Now this is slick, isn’t it? Never will a wrong or missing include path trouble you again. But the best is yet to come, please bear with me.

Level 4: Prevent a whole class of bugs.

Our previous fix makes sure that a particular kind of error will not occur again. In a highly automated environment (where only machines look at output) this is not enough. Consider what happens if somebody makes a modification to execute_test.pl that leads to a crash. In this case, no output would be produced, and hence no FAIL messages would be generated and as a result, grep wouldn’t find any FAILs.

Of course, this can only happen because of the “textual” interface of execute_test.pl. A better design would use exit codes instead of grep — 0 for no errors, 1 for “normal” test errors and anything else for fatal errors:


    for $test in *.test ; do
        # Preprocess.
        preprocess_test.pl -I dev/project/build/config -I dev/project/build/other $test
        if (($?)) ; then
            echo "Fatal: Preprocessing of file $test failed."
            exit 2;
        fi

        # Execute.
        execute_test.pl $test
        # Save exit code.
        my_error=$?
        if (($error == 0)) ; then
            echo PASS $test
        elif (($error == 1)) ; then
            echo FAIL $test
        else
            echo "Fatal: Test execution of file $test failed."
            exit 2;
        fi

        ...

for $test in *.test ; do

# Preprocess.

preprocess_test.pl -I dev/project/build/config -I dev/project/build/other $test

if (($?)) ; then

echo "Fatal: Preprocessing of file $test failed."

exit 2;

# Execute.

execute_test.pl $test

# Save exit code.

my_error=$?

if (($error == 0)) ; then

echo PASS $test

elif (($error == 1)) ; then

echo FAIL $test

else

echo "Fatal: Test execution of file $test failed."

exit 2;

...

You probably think that this only solves yet another kind of bug, but not really a whole class of bugs. What if somebody someday adds another step and forgets to check an exit code again? Wouldn’t we get the same problem again? We would, of course, at least until we pull out our level 4 laser gun: post-condition checking.

What our test execution script actually promises to do is this: “If you give me a set of N test scripts I will give you back a set of P passed test scripts and a set of F failed test scripts. Either P or F maybe zero but P + F is always N”. This is the post-condition and it holds as long as the pre-conditions (e. g. well-formed test scripts) are respected. So here we have our (hopefully) bullet-proof level 4 version:


    test_glob=*.test

    # Get total number of test scripts.
    tests_total=`ls -1 $test_glob | wc -l`

    for $test in $test_glob ; do
        # Preprocess.
        preprocess_test.pl -I dev/project/build/config -I dev/project/build/other $test
        if (($?)) ; then
            echo "Fatal: Preprocessing of file $test failed."
            exit 2;
        fi

        # Execute.
        execute_test.pl $test
        my_error=$?
        if (($error == 0)) ; then
            $(($tests_ok++))
            echo PASS $test
        elif (($error == 1)) ; then
            $(($tests_bad++))
            echo FAIL $test
        else
            echo "Fatal: Test execution of file $test failed."
            exit 2;
        fi

        # Check post-condition.
        if (( $((tests_ok + tests_bad)) != tests_total )) ; then
            echo "Fatal: Not all tests were executed."
            exit 2;
        fi

        ...

test_glob=*.test

# Get total number of test scripts.

tests_total=`ls -1 $test_glob | wc -l`

for $test in $test_glob ; do

# Preprocess.

preprocess_test.pl -I dev/project/build/config -I dev/project/build/other $test

if (($?)) ; then

echo "Fatal: Preprocessing of file $test failed."

exit 2;

# Execute.

execute_test.pl $test

my_error=$?

if (($error == 0)) ; then

$(($tests_ok++))

echo PASS $test

elif (($error == 1)) ; then

$(($tests_bad++))

echo FAIL $test

else

echo "Fatal: Test execution of file $test failed."

exit 2;

# Check post-condition.

if (( $((tests_ok + tests_bad)) != tests_total )) ; then

echo "Fatal: Not all tests were executed."

exit 2;

...

I used a testing example to show you the many facets of bug-fixing but these principles equally apply to “real” source code. Here is a summary of what I wanted to show:

– it is fine to do a quick and dirty “symptom-level” bug-fix every now and then — as long as you are explicit about it.
– Repeatedly zoom out in the search of the root cause, zoom out as much out as possible, but stay within your circle of influence (“We wouldn’t have all of these bugs if these spec coders were fired” is clearly beyond your circle of influence).
– Fortify your fix by making sure that the same or similar bugs will not creep in again.

Approxion

Code – People – Everything

Category Archives: Code

Poor Man’s DIP

Regular Expressions — Sweetest Poison

The Safety Net That Wasn’t

Get into ‘Insert’ Mode

Intended Use vs. Real Use

Holistic Bug Fixing