Category Archives: C/C++/Embedded

Single-step your code!

I was so proud: after I had gotten rid of some minor compile-time issues (i.e. typos), my unit tests ran over my newly written code without any errors. Granted, the changes I made comprised fewer than 500 lines, but still, it meant something to me. Feeling happy and content, I hummed R. Kelly’s “The World’s Greatest” while carrying on with other work.

A few days later, I wrote some black-box tests and — to my big surprise — I got a couple of “fails”. After some debugging, I found more than five bugs in the code that had passed my unit tests so nicely. I was completely puzzled. What went wrong? Why didn’t my unit tests catch these trivial bugs?

As it turned out, I forgot to register my test code with the CppUnit test framework, so my tests were not executed at all! Once I had added the missing line

to my test suite class, all five bugs surfaced in an instant. I was so angry! My first reaction was to curse CppUnit: with JUnit this would not have happened. I would have used the @Test annotation and my test would have been auto-registered — unless I had forgotten to tag it with the @Test annotation…

Later that day, I realized that the actual mistake was a violation of Steve Maguire’s powerful principle: “Step through every line of code that you added or changed with your debugger”. Had I set a breakpoint in my code, I would have seen that it was never executed.

Years ago, I used to be a passionate follower of this principle, but somehow unlearned it, largely — I presume — due to the rising unit testing hype. Don’t get me wrong: I think that unit testing is great (and black-box testing is great, too), but it is no replacement for single-stepping through your code.

Reviewing your own code is good, but actually stepping through your code is much cooler. The cursor showing the next statement to be executed focuses your attention and you really experience the program flow instead of having to make guesses about it. Further, you have all the data available and you can even modify it. You can invoke functions from your debugger (e.g. ‘call myfunc()’ in gdb), play with different combinations of parameters, member variables and the like, and re-execute code that has just run, without restarting the debugger, by setting the “next statement to execute” a couple of lines up. Probably the biggest benefit is that you get a deeper understanding of your code: maybe you step over a library call that works as expected but takes two seconds to execute; or you observe that you unnecessarily visit the remaining elements of a collection after you have found what you were looking for. No unit test would give you this kind of insight.

Often, it is difficult to unit test for certain failure causes, like malloc() returning NULL on out-of-memory conditions:
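A minimal sketch of such code might look like this (the function name create_buffer is made up for illustration):

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical example: allocate a zero-filled buffer of 'count' ints.
 * The error branch is only taken when malloc() fails, which is hard to
 * provoke from a unit test. */
int *create_buffer(size_t count)
{
    int *buffer = malloc(count * sizeof *buffer);
    if (buffer == NULL) {
        /* Error handling that rarely, if ever, runs during testing. */
        fprintf(stderr, "out of memory: %lu elements requested\n",
                (unsigned long)count);
        return NULL;
    }
    for (size_t i = 0; i < count; ++i) {
        buffer[i] = 0;
    }
    return buffer;
}
```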

How would you unit test that? Such error handling code is usually left untested and is the reason why so much software crashes under heavy load. While you’re in a debugger, testing is easy: just set the “next statement to execute” to the error-handling code (right before stepping over the call to malloc), step through it and convince yourself that it works as expected. Again, how would you unit test that? Answer: factor out the error-handling code:
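One way to factor it out, as a sketch: the status codes and the names handle_out_of_memory, create_buffer and get_last_error are assumptions made for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical status codes for this sketch. */
enum { STATUS_OK = 0, STATUS_OUT_OF_MEMORY = 1 };

static int last_error = STATUS_OK;

/* The error handling lives in its own function, so a unit test can call
 * it directly without having to make malloc() fail. */
int handle_out_of_memory(size_t requested_bytes)
{
    fprintf(stderr, "out of memory: %lu bytes requested\n",
            (unsigned long)requested_bytes);
    last_error = STATUS_OUT_OF_MEMORY;
    return STATUS_OUT_OF_MEMORY;
}

/* Allocates 'size' bytes and stores the pointer in '*out'. */
int create_buffer(size_t size, char **out)
{
    *out = malloc(size);
    if (*out == NULL) {
        return handle_out_of_memory(size);
    }
    return STATUS_OK;
}

int get_last_error(void)
{
    return last_error;
}
```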

Now, you can call your error handling code from your unit tests. Still, testing the code by using the debugger is easier, doesn’t require any context set-up and gives more insight.

It helps, of course, if you write your code such that debugging is as painless as possible. A line like this is fine, of course:

but writing it like this is (probably) more readable and you can inspect (and alter) intermediate values in your debugger:

If you think this is too much typing, get better at typing and/or get yourself a better editor. If you think this wastes code, bear in mind that we don’t live in the 1970s anymore. If you think that you can always step inside convertSensorReading() and inspect/change the parameters there, you are right, at least as long as you have access to the source code of the function you want to step into.
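To make the two styles concrete, here is a sketch. Only the name convertSensorReading() comes from the text; its signature, the two helper functions and all values are invented for illustration.

```c
/* Hypothetical helpers with made-up return values. */
static int readRawSensor(void)        { return 512; }
static int getCalibrationOffset(void) { return -12; }

/* Assumed signature; the real function may look different. */
static int convertSensorReading(int raw, int offset)
{
    return (raw + offset) / 2;
}

/* Compact version: correct, but there is nothing to inspect between
 * the nested calls. */
int temperatureCompact(void)
{
    return convertSensorReading(readRawSensor(), getCalibrationOffset());
}

/* Debugger-friendly version: every intermediate value has a name that
 * can be inspected (and altered) before the next statement runs. */
int temperatureStepwise(void)
{
    int raw = readRawSensor();
    int offset = getCalibrationOffset();
    int temperature = convertSensorReading(raw, offset);
    return temperature;
}
```

Both functions compute the same result; the second one merely gives the debugger something to show at each step.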

Macros are bad since you cannot step into them. Use them only if you have no other choice; instead prefer (inline) functions and template functions: they come with the same efficiency advantages and you get type-safety and debuggability as a bonus.
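A small sketch of the difference, with names chosen for this example only:

```c
/* Function-like macro: expanded textually, so there is no function to
 * step into, and the argument expression is evaluated twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Inline function: type-checked, evaluates its argument once, and can
 * be stepped into like any other function. */
static inline int square(int x)
{
    return x * x;
}

/* Helper with a side effect, used to make the double evaluation visible. */
int next_value(int *counter)
{
    return ++*counter;
}
```

With a counter starting at 0, SQUARE_MACRO(next_value(&counter)) calls next_value() twice and leaves the counter at 2, whereas square(next_value(&counter)) calls it once.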

And, speaking of the preprocessor, stop using #define’d symbolic constants: all preprocessor symbols are inlined during the preprocessor phase and I don’t know of any debugger that can resolve their values. Instead, use enums or, even better, const variables:

Mouse over MIN_COUNT in your debugger and you will see nothing; mouse over MAX_COUNT and you will get “the answer” ;-)
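A sketch of the two kinds of constants: MIN_COUNT and MAX_COUNT are the names used above, the value 10 and everything else in the block are assumptions.

```c
/* Preprocessor constant: by the time the compiler proper runs, only the
 * literal 10 remains (the value is made up for this sketch). */
#define MIN_COUNT 10

/* Typed constant: a named object with a type and a value. */
static const int MAX_COUNT = 42;

/* In C, an enum is the alternative when a true compile-time constant is
 * needed, for example as an array dimension at file scope. */
enum { BUFFER_SIZE = 16 };

static int buffer[BUFFER_SIZE];

int is_valid_count(int count)
{
    return count >= MIN_COUNT && count <= MAX_COUNT;
}

int buffer_capacity(void)
{
    return (int)(sizeof buffer / sizeof buffer[0]);
}
```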

Automated unit tests are great, but stepping through your code gives quick feedback and a lot of insight into what is happening at run-time. Sometimes, hard-to-write unit tests can be avoided by consistently following the “step through all of your code” paradigm. As a simple guideline, write unit tests (if you like) before starting with the implementation. Then single-step your code by executing your unit tests in a debugger and watch your step.

Poor Man’s DIP

Sometimes a lower-layer component needs to invoke a service on a higher-layer component. Consider, for example, a timer component (T) that periodically calls a handler function in a user-interface component (U). Component T is probably part of the OS kernel and thus clearly “lower” than component U.

In this setting, there is an upward dependency from T to U; such upward dependencies are undesirable, at least if they are bound at compile-time. Implemented naively, there is a hard-coded call to the UI component like this:
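A sketch of the naive variant, collapsed into one listing; the file boundaries are marked by comments and all identifiers are hypothetical:

```c
/* --- ui.h: interface of the upper-layer component U --- */
void ui_handle_timer_tick(void);
int ui_get_tick_count(void);

/* --- timer.c: lower-layer component T --- */
/* T would have to #include "ui.h" here: a compile-time dependency that
 * points upward, from the timer to the user interface. */
void timer_on_tick(void)
{
    ui_handle_timer_tick();   /* hard-coded call into the UI component */
}

/* --- ui.c: upper-layer component U --- */
static int ui_tick_count = 0;

void ui_handle_timer_tick(void)
{
    ++ui_tick_count;
}

int ui_get_tick_count(void)
{
    return ui_tick_count;
}
```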

Dependency lines that point up in a component diagram are not just ugly: they denote that the lower-layer component cannot be independently reused and tested.

The classic dependency inversion principle (DIP) is usually applied to solve this problem: instead of having a hard-coded function call in the timer to the handling component, the timer calls back on a function pointer that is set to the timer-handling routine in the initialization code of the higher-layer component:
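A sketch of the function-pointer variant, again in a single listing with hypothetical identifiers:

```c
#include <stddef.h>

/* --- timer.h / timer.c: lower-layer component T --- */
typedef void (*timer_callback_t)(void);

static timer_callback_t registered_callback = NULL;

/* Called by the component that wants to be notified of timer ticks. */
void timer_register_callback(timer_callback_t callback)
{
    registered_callback = callback;
}

/* T only knows the function pointer, not who is behind it. */
void timer_on_tick(void)
{
    if (registered_callback != NULL) {
        registered_callback();
    }
}

/* --- ui.c: upper-layer component U --- */
static int ui_tick_count = 0;

static void ui_handle_timer_tick(void)
{
    ++ui_tick_count;
}

/* U depends on T at compile-time, which is the natural direction. */
void ui_init(void)
{
    timer_register_callback(ui_handle_timer_tick);
}

int ui_get_tick_count(void)
{
    return ui_tick_count;
}
```

The price is a pointer in RAM plus an indirect call and a NULL check on every tick.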

Note that there is still a T to U dependency, but now this dependency is only present at run-time, which is OK, as this doesn’t hinder reuse and testability. The U to T compile-time dependency is quite natural and doesn’t violate any design principles. So, the undesirable compile-time dependency has been successfully inverted. The classic DIP recipe looks like this:

1. In T export a callback interface
2. In U implement the callback interface
3. In U (or some init/startup code) register the implementation with T
4. In T call back on the interface

When you are working in a constrained environment like embedded systems, you often cannot afford the memory and performance overhead that accompanies such late (run-time) binding, so you might try what I call the “Poor Man’s DIP”: simply export a “callback interface” as a function prototype and “implement” it by defining the function in the upper-layer component:
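As a sketch (identifiers are hypothetical, file boundaries are marked by comments):

```c
/* --- timer.h: T exports the "callback interface" as a plain prototype --- */
void timer_handle_tick(void);   /* to be defined by the user of the timer */

/* --- timer.c: lower-layer component T --- */
void timer_on_tick(void)
{
    timer_handle_tick();        /* resolved by the linker, not at run-time */
}

/* --- ui.c: upper-layer component U --- */
static int ui_tick_count = 0;

/* U "implements" the interface simply by defining the function. */
void timer_handle_tick(void)
{
    ++ui_tick_count;
}

int ui_get_tick_count(void)
{
    return ui_tick_count;
}
```

The timer source refers only to its own header. If no component defines timer_handle_tick(), the build fails at link time instead of misbehaving at run-time.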

This pattern gives you most of the advantages of the classical (run-time bound) DIP but doesn’t incur any overhead. It can (and should) be applied whenever there is a dependency from a lower-layer component to an upper-layer component that doesn’t need to change at run-time but stays fixed throughout the lifetime of the application.

The Pizza Box Problem

Consider this real-world problem: In a UMTS network, short messages (SMS) consist of a protocol header and the actual text message. The text message can be encoded in many different formats, but for the sake of this example I want to focus only on two encodings: 8-bit characters and 7-bit characters.

Since the standard SMS alphabet only uses 7-bit characters, it often makes sense to use a 7-bit encoding for the text message, as you can squeeze more characters in the available 140 octets (an octet is a byte comprising 8 bits; remember that it is not specified how many bits there are in a byte).

In the header, there is a so-called ‘user data length’ element that tells how many characters the text message comprises. The stress is on characters – you don’t know over how many of the following octets the message is distributed. But of course, you can find out. A so-called ‘data coding scheme’ octet tells you whether the text message uses 7-bit encoding or 8-bit encoding. Thus, calculating the total number of used octets should be straightforward:
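A sketch of such a calculation, assuming a simplified two-valued encoding tag in place of the real ‘data coding scheme’ octet (the enum and the function name are made up):

```c
/* Hypothetical encoding tags; the real 'data coding scheme' octet has a
 * more involved layout. */
enum sms_encoding { SMS_ENCODING_7BIT, SMS_ENCODING_8BIT };

/* Straightforward calculation of the number of octets occupied by a
 * text message of 'char_count' characters. */
unsigned naive_octet_count(unsigned char_count, enum sms_encoding encoding)
{
    unsigned octets = char_count;          /* 8-bit: one octet per character */
    if (encoding == SMS_ENCODING_7BIT) {
        octets = (char_count * 7) / 8;     /* 7 bits per character */
    }
    return octets;
}
```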

This code is short, simple and – unfortunately – wrong. I’ve seen this mistake in several guises and the reason for this bug is that programmers obviously don’t know about what I call the ‘Pizza Box Problem’. It goes like this.

You are a pizza delivery guy and you have to deliver pizzas (stored in pizza boxes) to your customers. To keep your pizzas hot, you stuff them into thermal bags, each of which can hold 8 pizza boxes.

How many bags do you need to deliver, say, 21 pizza boxes?

Every pizza delivery guy immediately knows the answer: 3. It is not 21 / 8, since integer division causes the result to be equal to 2!

What you need is this: if your division yields a fractional part, you want to increase the result of the integer division by one. You could resort to floating point arithmetic (and use the ceil() function, for instance) but that would be inefficient.

The trick is that you add the divisor minus one to the dividend before performing the integer division:
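In C, a sketch of this rounded-up division could look as follows (the function name is an assumption):

```c
/* Number of bags needed for 'boxes' pizza boxes when each bag holds
 * 'bag_size' boxes. 'bag_size' must not be zero, and the sum
 * boxes + bag_size - 1 must not exceed the range of unsigned. */
unsigned bags_needed(unsigned boxes, unsigned bag_size)
{
    return (boxes + bag_size - 1) / bag_size;
}
```

For 21 boxes and bags of size 8 this yields (21 + 7) / 8 = 3.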

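It works like this: if ‘boxes’ is already evenly divisible by ‘bag_size’, adding one less than the ‘bag_size’ doesn’t change the overall result; otherwise, the dividend will be increased such that the next ‘bag_size’ multiple is crossed. With a ‘bag_size’ of 8, 16 boxes give (16 + 7) / 8 = 23 / 8 = 2 bags, whereas 21 boxes give (21 + 7) / 8 = 28 / 8 = 3 bags.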

Applying what we have just learned to our SMS problem, we conclude that the code in the if block should look like this:

We have ‘char_count’ * 7 bits (pizza boxes) that we want to store in octets (thermal bags) of size 8.
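A sketch of the corrected calculation; as before, the encoding tags and the function name are assumptions made for this example:

```c
/* Hypothetical encoding tags; the real 'data coding scheme' octet has a
 * more involved layout. */
enum sms_encoding { SMS_ENCODING_7BIT, SMS_ENCODING_8BIT };

/* Number of octets occupied by a text message of 'char_count' characters. */
unsigned octet_count(unsigned char_count, enum sms_encoding encoding)
{
    unsigned octets = char_count;            /* 8-bit: one octet per character */
    if (encoding == SMS_ENCODING_7BIT) {
        /* char_count * 7 bits, rounded up to whole octets of 8 bits. */
        octets = (char_count * 7 + 7) / 8;
    }
    return octets;
}
```

A single 7-bit character now occupies one octet, and 160 characters fill exactly 140 octets.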

Enjoy your pizza!

The Return of the Pizza Delivery Guy

[update 2009-03-29: The equation

can obviously be simplified to

Here is the proof:

since (char_count * 8) / 8 = char_count and 7 / 8 = 0 we get

— end update]
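The formulas of this update are not shown here. Assuming the equation in question is the rounded-up octet count (char_count * 7 + 7) / 8 and the simplified form is char_count - char_count / 8 (both are assumptions), a brute-force check like the following sketch can confirm that the two agree under C’s truncating integer division:

```c
/* Rounded-up form from the pizza box trick. */
unsigned octets_rounded_up(unsigned char_count)
{
    return (char_count * 7 + 7) / 8;
}

/* Candidate simplification (an assumption, see the text above). */
unsigned octets_simplified(unsigned char_count)
{
    return char_count - char_count / 8;
}

/* Returns 1 if both forms agree for every char_count in [0, limit]. */
int forms_agree_up_to(unsigned limit)
{
    for (unsigned c = 0; c <= limit; ++c) {
        if (octets_rounded_up(c) != octets_simplified(c)) {
            return 0;
        }
    }
    return 1;
}
```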

const static or static const or what?

This issue crops up time and again: somebody looks at code like this:

and complains loudly that this was not legal C code: “static”, he screams, “must come before const”:
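For reference, a sketch of the two orderings under discussion (variable names and values are made up):

```c
/* First version: qualifier first, storage class second. Legal C, but
 * some compilers and lint tools warn about it. */
const static int first_version = 42;

/* Second version: storage class first, as the critic demands. */
static const int second_version = 42;

int get_first_version(void)
{
    return first_version;
}

int get_second_version(void)
{
    return second_version;
}
```

Both declarations define the same kind of object: a constant int with internal linkage.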

I’ve never had a compiler that generated wrong code from the first version. I guess this myth is nourished by the fact that GCC and PC-Lint issue odd warnings when confronted with const-first declarations:

(PC-Lint’s warning message is particularly weird, isn’t it?)
Both tools process the second version without any complaint.

I really don’t know where this rumor comes from. Maybe it was true in K&R C, but C++98 and C99 certainly don’t care about the order of specifiers and qualifiers; they don’t even care about the position of the type. A declaration that puts the type keyword between “const” and “static” is perfectly legal, but GCC still complains like before (at least when using the -Wall -W combination; -Wall alone doesn’t produce this warning).

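The C99 grammar (see 6.7 Declarations) clearly says:

    declaration-specifiers:
        storage-class-specifier declaration-specifiers(opt)
        type-specifier declaration-specifiers(opt)
        type-qualifier declaration-specifiers(opt)
        function-specifier declaration-specifiers(opt)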

so the order obviously doesn’t matter.

Personally, I prefer the first version, because the fact that a variable is a constant is more important to me than its scope. Still, it is probably wiser to use the second version: not because the first one is ill-formed, but because of misbehaving compilers and static analysis tools.

[update 2009-05-20: Christian Hujer found the missing link: chapter 6.11. (“Future Language Directions”) of the ISO C99 standard clearly says:

“The placement of a storage-class specifier other than at the beginning of the declaration specifiers in a declaration is an obsolescent feature.”

There you have it. Forget my crazy preferences, write future-proof code by following the standard’s advice. – end update]

TODO or not TODO

Are TODOs in source code good or evil?

In my view, TODOs are a tool and, like any tool, they can be misused. I do agree with most of the criticism, but if they are used with care and actively managed, they can improve productivity.

Sure, TODOs tend to accumulate and, yes, there are selfish developers who use them to acquit themselves from checking in sloppy code (which they will never clean up). Yet, there are situations where they are appropriate.

Sometimes it just makes sense to check in code that is not 100% done, for instance, if not all of your requirements are settled yet or you are uncertain about some boundary cases. Here is my litmus test: if your unfinished code already provides value to somebody else and it doesn’t break existing tests, it is OK to check it in. That somebody may be a tester who needs a running system to develop test cases against, or a coder who needs your class to continue her work. Maybe it’s a salesperson who needs a quick tentative solution for his trade show. Unfinished code is especially appreciated in the ‘bazaar’ software development model.

If you accept the notion that unfinished code is useful if it already provides value, you probably also agree that it is not a good idea to keep the list of open issues in your head; even scrap paper is only slightly better. To me, it is much better to record ‘tentativeness’ very close to the code.

However, it is essential to manage those TODOs — if you don’t manage them, they will eat you alive.

In order to manage you need to track, and tracking is much easier if your TODOs follow a standardized format. Once you have a standardized format you can easily extract metrics. If everybody used their own style of TODOs it would be very hard to control them. A mix of TODO, FIXME, BUGBUG, ???, ### is TODO hell.

After years of experimenting, I’ve found this style of TODO comments to be the best:
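A sketch of a comment in this style, assuming the layout TODO:date:owner:summary that the following description implies. The colon between ‘TODO’ and the date is an assumption, and the date, owner and summary are invented:

```c
/* TODO:2009-04-02:jdoe:Clamp negative values as well.
 * Open question: can negative readings occur at all? Until this is
 * settled, only the upper bound is enforced.
 */
int clamp_percent(int value)
{
    return value > 100 ? 100 : value;
}
```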

Let me briefly describe why I consider this format to be superior.

First of all, it starts with ‘TODO’ in all capital letters, which is good, as it is recognized by many editors/IDEs and highlighted in a special manner. This makes it easy to recognize TODOs for human beings when browsing source code.

Next comes the date in ISO 8601 ‘extended’ format. Not only does this tell you when the TODO was added: it allows you to easily sort your TODOs:

The sorted output will show you which TODOs have been around for a long time, maybe for too long a time.

After the ISO date comes the name of the TODO owner. I suggest you use login names, but it doesn’t really matter as long as it is unique and can be easily mapped to a real person’s name. Due to the fact that the name is surrounded by colons you can build a list showing who owns the most TODOs:

Finally, there is a brief summary that describes the reason for the TODO; if necessary, additional text follows on the next lines. Describing a TODO is essential: it tells developers why it is there; leaving it out is akin to putting up a ‘Watch out’ warning sign without giving any additional clues.

In order to extract information automatically, the standardized format must be enforced. This grep command will catch most ill-formed TODOs:

In addition to pinpointing typical TODO comment keywords it ensures that the date is well-formed and that the owner and description fields are not omitted.
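The same checks can be expressed in C. The following is a sketch, not the grep command itself: it assumes the layout TODO:YYYY-MM-DD:owner:summary, and the function names are made up. The date check is purely syntactic (a month of 13 would pass).

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if 'text' starts with a date of the form YYYY-MM-DD. */
static int starts_with_iso_date(const char *text)
{
    static const char pattern[] = "dddd-dd-dd";
    for (size_t i = 0; pattern[i] != '\0'; ++i) {
        if (pattern[i] == 'd') {
            if (!isdigit((unsigned char)text[i])) return 0;
        } else if (text[i] != '-') {
            return 0;
        }
    }
    return 1;
}

/* Returns 1 if 'line' contains a TODO of the assumed form
 * TODO:YYYY-MM-DD:owner:summary, 0 otherwise. */
int is_well_formed_todo(const char *line)
{
    const char *p = strstr(line, "TODO:");
    if (p == NULL) return 0;
    p += strlen("TODO:");
    if (!starts_with_iso_date(p)) return 0;
    p += strlen("YYYY-MM-DD");
    if (*p != ':') return 0;
    ++p;
    /* Owner: at least one character before the next colon. */
    const char *owner_end = strchr(p, ':');
    if (owner_end == NULL || owner_end == p) return 0;
    /* Summary: something other than whitespace must follow. */
    const char *summary = owner_end + 1;
    while (*summary != '\0' && isspace((unsigned char)*summary)) ++summary;
    return *summary != '\0';
}

/* Returns 1 if 'line' carries a typical marker keyword but is not a
 * well-formed TODO, i.e. it should be reported. */
int is_ill_formed_todo(const char *line)
{
    int has_marker = strstr(line, "TODO") != NULL
                  || strstr(line, "FIXME") != NULL
                  || strstr(line, "BUGBUG") != NULL;
    return has_marker && !is_well_formed_todo(line);
}
```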

Ideally, you run such scripts automatically every time a developer checks in changes. If you don’t have testbots you should at least run them as part of your daily build procedure. In case you don’t even have daily builds (pity you!) it is the job of the project lead to execute them regularly.

TODOs are useful for modern, release-early-and-often software development processes; by standardizing their format, all of the common disadvantages can be overcome. If you don’t use them yet, I suggest you put them near the top of your TODO list.

How to Statically Initialize Arrays with Arbitrary Values

[Warning: Low-level C stuff ahead!]

Imagine a situation where you want to statically initialize an array with values different to 0:
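For an array of eight elements, a sketch could look like this (the array name, the value 42 and the accessor are chosen for illustration):

```c
#include <stddef.h>

/* Every element is spelled out explicitly in the initializer list. */
static int array[8] = { 42, 42, 42, 42, 42, 42, 42, 42 };

int array_element(size_t index)
{
    return array[index];
}
```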

This approach works, at least until someday you want to increase the array size to, say, 200. In this case, you have to add 192 times “42, ” to the initializer list. What a dread!

Everything would be easy, if you wanted to zero-initialize the array:

With zero-initialization, all you have to do is specify the value of the first element – all of the remaining elements will automatically be set to zero.
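A sketch that shows the rule, and why it does not help for other values (array and function names are made up):

```c
#include <stddef.h>

/* Only the first element is given; the remaining 199 are set to zero. */
static int zeroed[200] = { 0 };

/* The same rule applies here: element 0 is 42, the rest are zero, so
 * this does not yield 200 times 42. */
static int only_first_is_42[200] = { 42 };

int zeroed_element(size_t index)
{
    return zeroed[index];
}

int first_is_42_element(size_t index)
{
    return only_first_is_42[index];
}
```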

But sometimes you need a value different to 0 and you don’t want an additional call to “memset()” at run-time. Or you cannot use “memset()” because your array is stored in a read-only ROM segment and you cannot change the array’s values dynamically.

Basically, what you want is this:

Then, you would only have to make a single change to alter the size of the array (or the initialization value).

Alas, it turns out that it is impossible to define a macro that does the job we expect from “STATIC_INIT”. Think about it for a while. How would you solve this problem?

Sometimes, it is possible to replace a call to an impossible macro with the inclusion of a header file; I call this technique the “Replace macro call with file inclusion” trick:

The two defines represent the macro parameters and the inclusion of the header file represents the actual macro call.

You probably wonder what the contents of “static_init.h” are, but it’s instructive to spend some time on this problem yourself. Afterwards, you can have a look at my solution.

Note that this approach is not limited to single values – you can also use it for more complex initializations. For instance, if you need an alternating sequence of ’42’ and ’13’ you would do this:

I’ve also used the “Replace macro call with file inclusion” trick to encapsulate #pragmas and other compiler-specific features. Consider the case where you are working on a multi-platform project that uses different compilers. Consider further that you have a piece of code that generates compiler warnings and you want to locally turn compiler warnings off:

Now the problem with this approach is that #pragmas are compiler-dependent, which means that you will end up with something like this:

Not only does this litter the code – it is also a maintenance nightmare.
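A sketch of what such littered code tends to look like for two compiler families. The pragmas are the ones documented for MSVC and GCC; the function and the choice of warning are made up for this example:

```c
/* Locally silence "unused parameter" warnings, once per compiler. */
#if defined(_MSC_VER)
#  pragma warning(push)
#  pragma warning(disable: 4100)   /* unreferenced formal parameter */
#elif defined(__GNUC__)
#  pragma GCC diagnostic push
#  pragma GCC diagnostic ignored "-Wunused-parameter"
#endif

/* Hypothetical callback whose 'context' parameter is deliberately unused. */
int legacy_callback(int event, void *context)
{
    return event * 2;
}

/* Restore the previous warning state, again once per compiler. */
#if defined(_MSC_VER)
#  pragma warning(pop)
#elif defined(__GNUC__)
#  pragma GCC diagnostic pop
#endif
```

Every additional compiler adds another branch in both places, and every function that needs the treatment repeats the whole pattern.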

Usually, the solution is to encapsulate compiler-specific features in #defines; alas, this obvious strategy doesn’t work for #pragmas:

So it is time to roll out our trick once again:

Where, for instance, “warnings_off” looks like this:

You probably won’t need this trick very often, but when you do, it is good to know that it’s there.