Category Archives: Code

Dangerously Confusing Interfaces II

confused.jpgLast week was a sad week for me. A bug in my code made it into the final version that was shipped to an important customer.

When something like this happens, it is almost always due to fact that there is a higher-level “bug” in the software development process, but I don’t want to go there. Instead, I want to focus on the technicalities. Once more, I got bitten by another instance of a dangerously confusing interface. Let me explain.

I’ve always had a liking for interfaces that are self-evident; that is, one knows immediately what’s going on by just looking at how the interface is used — without having to consult the documentation. Let me give you a counter example, an interface that is far from what I desire:

The interface to ‘sort_temperatures’ is not at all self-explanatory. What is clear is that it sorts ‘values_count’ values, which are obviously temperature values, but what the heck does the ‘true’ argument stand for? In order to find out, you have to look at the declaration of ‘sort_temperatures’ and/or its API documentation:

Now it is clear what the boolean parameter is for, but at the cost of having to take a detour. Maybe you got so distracted by this detour that you forgot what you originally were about to do.

Programmers often make this mistake when designing interfaces. They use boolean parameters when they have two mutual exclusive modes of operation. This is easy for the implementer of the routine, but confusing to the ones who have to use it. Contrast this with this alternative:

By replacing the boolean parameter with symbolic constants you not only make the code more readable, you also open it up for future extension: adding more modes becomes straightforward.

Now have a look at this code and try to guess what it does:

That’s fairly simple: ‘calc_hash’ calculates a SHA-256 checksum over ‘mydata’ (which is ‘mydata_len’ bytes in size) and stores it into the provided buffer ‘hash’. The length of the hash is stored to ‘hash_len’ via call-by-reference.

But is this code correct? The answer is — you guessed it — no. If you can’t spot the bug, you are in good company. You can’t see it with just the information given. ‘calc_hash’ interface is not self-evident.

I wrote this code more or less one year ago. It contains a bug that remained dormant until the product was in the hands of our customer. And it is there because of a silly interface.

The last (pointer) parameter ‘hash_len’ actually serves as both, an input AND output parameter. When you call ‘calc_hash’ it is expected that ‘*hash_len’ contains the size of the provided ‘hash’ buffer; on return ‘*hash_len’ will contain the actual number of bytes used by the hash algorithm; that is,the size (or length) of the SHA-256 checksum stored in ‘hash’. The whole idea behind this is that ‘calc_hash’ (or rather its author) wants to offer protection against buffer overruns — for cases where the provided ‘hash’buffer is not large enough to accommodate the checksum.

So the problem here is that ‘hash_len’ (being a stack variable) is not properly initialized to ‘HASH_SHA256_LEN’; it’s value is more or less arbitrary. If it is by chance greater or equal to 32 (the value of the ‘HASH_SHA256_LEN’ symbolic constant) everything is fine and the checksum is correctly calculated. If it is not, ‘calc_hash’ returns ‘false’ and an error is reported.

For as long as a year — by sheer coincidence — ‘*hashLen’ was never below 32 (which is not that unlikely, given that ‘size_t’ can accommodate values ranging from 0 to 4,294,967,296); but in the hands of the customer — and very much in line with Murphy’s Law — it happened.

OK, accuse me of not having initialized ‘*hashLen’ properly, accuse me of not having read the API documentation carefully. Maybe I did read the API documentation and then I was interrupted. I don’t know. But what I know for sure is that this bug would have never happened if the interface had been clearer.

The first problem with ‘calc_hash’ is that ‘hash_len’ is an IN/OUT parameter, which is unusual. I’m not aware of any function in the C/C++ standard library (or the POSIX libraries) which makes use of IN/OUT parameters. Since the input value (not just the output value) is passed by reference, neither the compiler nor static analysis tools like PC-Lint are able to detect its uninitialized state. One obvious improvement is to pass the length of the buffer by value:

Granted, there is now one more argument to pass (‘hash_buf_len’), but if the unlucky programmer ever forgets to initialize it, the compiler will issue a warning.

But let’s not stop here. I’d like to pose the following question: what good is the hash buffer length check, anyway?

In my view, it is not at all necessary. The length of a particular checksum is constant and known a priori — that’s why the hash buffer is allocated statically like this:

What additional benefit does a developer get by providing the same information again to the ‘calc_hash’? Isn’t this redundancy that just begs for consistency errors?

And what use is the output value that tells how long the checksum is? Again, this should, no it MUST be known a priori, there should never be a mismatch between what the caller of ‘calc_hash’ expects and what is returned. Of course, there can be error conditions but if ‘calc_hash’ fails, it should return ‘false’ and not a length different to what the caller expects.

Note that ‘calc_hash’ is not at all comparable to functions in the C API that add an additional output buffer length parameter like ‘strncpy’ or ‘snprintf’. These functions carry the length parameter for completely legitimate safety reasons because the total length of the output is usually not known a priori (for instance, as some input may stem from a human user and one has little control over how many characters (s)he will enter).

Based on these arguments, I dismiss the ‘hash_len’ parameter altogether and propose the following simplified interface:

and would use it like this:

Easy to read, hard to get wrong. It is impossible to forget to initialize a variable that just isn’t there.

Circular Adventures II: A Ring within a Ring

“Is all that we see or seem
But a dream within a dream?”
–Edgar Allan Poe

In part one of this series I’ve discussed the mod operator and demonstrated that it is a flexible tool for solving circular problems. I’ve also explained that the mod operator, as implemented by most (system) programming languages, is not able to handle problems involving negative dividends. As a remedy, I’ve introduced an efficient alternative of the mod operator for divisors that are base-2 numbers.

Today, I want to have a look at some first circular use-cases. Again, we meet our good old friend, the ring buffer.

Sometimes, you need to store data and keep previous versions of that data, but the memory available is limited. In such situations, an obvious approach is to use a ring buffer.

In this example, we have a ring buffer (again, pretend that the last and first element are joined) comprising eight (N = 8) fixed-size slots for storing information. Each slot contains a data set. We start by writing to slot 0, then slot 1 and so on; when the ring buffer is full (that is, slot 7 has been written to), we wrap around to slot 0 and overwrite what has been stored there previously, thus kicking out the oldest data set. By using this scheme, we not only have access to the most recent data set, but also to up to seven predecessors.

In addition to the actual ring buffer, we need a variable to keep track of the most-recently updated (“MRU”) slot number; that is, an index that points to the “newest” data set. Based on the current value of the MRU we can easily calculate which slot is to be updated upon the next write access:

Updating the ring buffer is just one part of the story, reading another. For instance, to get the predecessor of the current data set we need to calculate the slot index based on the previous value of MRU. Again, by using the mod operator, this can be achieved easily:

Dumping the whole ring buffer in reverse chronological order is achieved with this simple loop:

Note that for this to work, the mod operator has to be a mathematical modulo operator since the dividends may become negative. If you don’t have access to a true mathematical mod operator, you should make N a base-2 number and use the modulo replacement operation given in the previous installment (mod_base2).

Now let’s make our ring buffer example a bit more interesting. Let’s pretend our ring buffer must be stored in persistent, non-volatile memory (NVM), like flash. One problem with flash memory is that it wears out over time: the more often you update flash memory, the shorter the data retention time gets. (It’s like using a bucket with a crack to carry water from one place to another and the crack gets bigger with every lap you go.)

Therefore, storing the most-recently updated value in a single flash variable is not a good idea, since this memory location would wear-out quickly: every update to a ring buffer would require an update to the MRU variable, which means that the update load on the MRU variable is N times the update load on the individual slots.

A frequently used countermeasure is to have a “distributed MRU variable”, which is achieved by prefixing or suffixing every slot with a so-called “age counter” field.

The age counter starts at zero and is incremented with every write to a slot. So after three updates to an initially empty ring buffer, we would have this:

After ten updates, the ring buffer would look like this:

The slot containing the age counter with the highest value must be the MRU slot, slot 1 in our example. For obvious reasons, the value range of the age counter must be larger than the slot range of the ring buffer, but sooner or later, even the age counter will wrap around. After many writes to the ring buffer we might get this picture:

So which slot is the MRU slot now? The age counter is obviously an unsigned 8-bit type, since it wrapped around at value 255. Starting with 253 (the age counter value of slot 0) and taken modulo 256, all counter values increase by one up to 3. But the next value, 252, is not a successor of 3; it is too far ahead (or rather behind). This means that the slot containing age counter value 3 must be the most recently updated slot (which is 6).

The MRU is the basis for all ring buffer operations and it makes sense to maintain its value in a RAM variable. But how can one determine the MRU programmatically, for instance after a reset when all RAM content is lost? Here is a method to search for it:

It is a good exercise to convince yourself that this algorithm actually works. Think about the initial “empty” ring buffer case where all age counters are zero as well as the case where the “highest” age counter is stored in the last slot.

Luckily, the dividends in this algorithm cannot become negative, so if we wanted to code this in C, we could use the ‘%’ operator. But since we are doing a modulo 256 operation, we can apply the optimization given in the first installement, which is replacing the modulo 256 operation with a (uint8_t) cast:

Once we are in possession of the MRU, we can work with this ring buffer just like we did with the previous ring buffer. The only thing we have to do in addition is to memorize the highest (or current) age counter value for future updates to the ring buffer:

Finding the MRU in our second ring buffer was not particularly difficult — we just needed to compare the age counters until we found a “discontinuity”. We knew that the slots are updated, one after another, cyclically, with increasing age counter values. Things get more complicated when there is no such strict sequential ordering. Let me explain.

Imaginge you want to keep track of when certain events occur. For each kind of event, you would store a timestamp from a 32-bit timer source that indicates when this particular event occurred:

Further, assume that the 32-bit timer can wrap around (otherwise we wouldn’t have a circular problem), just like our 8-bit age counter wrapped around in our ring buffer example. How would you know which event occurred first or last? How would you calculate the relative time between two events? This and more is the topic of the next part of this series. Stay tuned…

More circular adventures…

Circular Adventures I: The Modulo Operation

“All things from eternity are of like forms and come round in a circle”
–Marcus Aurelius, AD 121-180

In computer science, just like in real life, many things go round in circles. There are ring buffers, circular lists, wrap-around counters and timers, just to name a few. In this multi-part article, I want to explore the theory behind circular behavior. Equipped with this knowledge, I attempt to provide solutions to “recurring” problems.

Let’s start with the very basics; that is, the movement of an index within a ring buffer.

Assume you have a ring buffer containing N = 10 elements and an index i:

(For simplicity, I depict circular structures as a flat arrays — just imagine the first and last element are joined to form a ring.)

Question: How do you advance the index by a signed offset n, taking wrap-around into account?
Answer: i = (i + n) mod N

Many circular problems like ring buffer operations can be elegantly expressed by using the ‘mod’ operator. Alas, only theoretically, as in practice, there are at least two problems surrounding the ‘mod’ operator: portability and efficiency.

The ‘mod’ operator, as it is used in the equation above, is the mathematical ‘mod’ operator — the one that is used in number theory. Programming languages, however, sport many different flavors of the ‘mod’ operator, which vary in the way they treat negative operands.

All of them obey this equation:

Yet the results for negative operands depend on whether the ‘div’ operator rounds towards zero or negative infinity. If the ‘div’ operator rounds towards zero, the expression -5 div 2 yields -2; if it rounds towards negative infinity the result is -3.

The following example illustrates how this influences the calculation of -3 mod 8:

This means that -3 mod 8 might be -3 or +5 depending on the style the implementers of your programming language have chosen. Some languages (like Ada) have even two modulo operators (rem and mod), while others allow to control the behavior at run-time (Perl: ‘use integer’). Still others (like C90 and C++98) leave it as ‘implementation-defined’. Have a look at this for a nice overview on how different programming languages implement the modulo operator.

(And just in case you haven’t guessed it already: C/C++/Java’s approximation of ‘mod’ is ‘%’ and ‘div’ is ‘/’.)

Now, if you only travel through a circle in positive direction (by adding positive offsets), either rounding style will do; however, if you intend to travel backwards (or calculate differences between indices that yield negative values), only the ‘negative-infinity’ modulus operator will do the job; that is, wrap (in the 10 element ring buffer example) from -1, -2, -3 … to 9, 8, 7 …, respectively.

How about efficiency? As can be seen in the table above, the calculation of the remainder is based on a division operation, which is, — alas — a rather expensive operation on many platforms. On most processors, the ‘div’ operation is many times slower than other primitive operations. As an example, some popular ARM processors don’t even have a ‘div’ instruction, so division is done through a software library which consumes up to 40 cycles. Compare this to the 5 cycle multiplication instruction and the other instructions that typically execute in a single cycle.

Due to these portability and efficiency issues, many developers shun the modulo operator in performance-critical code. Instead of

they use the computationally equivalent

or, if it guaranteed that our index is not more than N – 1 off the ends

But this approach is still not very efficient, since it uses branches and on modern multi-stage pipelined processors a branch might invalidate already prefetched and preprocessed instructions. Isn’t there a true mathematical ‘mod’ operator out there that is also efficient at the same time? You bet there is, but only if N happens to be a base-2 number.

If you are lucky and N is a base-2 number (like 64, 1024, or 4096) i mod N is computationally equivalent to

where ‘and’ is the binary and operator (‘&’ in C, C++, and Java). This works even if i is negative, but requires that your environment stores negative numbers in two’s complement fashion, which is the case for pretty much all systems that you will ever program for.

As an example, consider -2 mod 16. -2 is 0xFFFFFFFE in 32-bit two’s complement notation and 16 – 1 corresponds to 0x0000000F:

which is 14, exactly, what a mathematical ‘mod’ operator would yield.

The ‘and’ operation is easy to compute for any processor. In C/C++ we might define an optimized ‘mod’ function like this:

It is a good idea to use this optimization whenever possible. Sometimes, it makes even sense to round-up the size of circular structures to a base-2 boundary, just to be able to use this kind of optimization.

A variant of this theme is casting a (potentially negative) value to an unsigned type in C/C++. Casting x to an uint8_t is equivalent to calculating x mod 256. While most optimizing compilers will generate the same code for x & 0xFF and (uint8_t)x, there is a certain likelihood that the latter might be a bit faster. The obvious disadvantage of casting to unsigned is, that this approach practically limits N to the value range of uint8_t, uint16_t, and uint32_t, which is 256, 65536, and 4294967296, respectively. Because of the fact that the performance gain in only hypothetical, it is usually much wiser to go for the ‘mod_base2’ optimization, though.

This concludes my first installment, which is mainly about some of the many facets of the ‘mod’ operator. Just like the constant PI appears in all circular problems in mathematics, some variant of the ‘mod’ operator appears in all circular problems in computer science. Next time, I will explore what it means to have two indices into a circular structure, which turns out to be the foundation of many interesting circular use cases.

More circular adventures…

Using PC-Lint in a Linux Environment

PC-Lint is my favorite non-FLOSS tool. Not only does it find bugs and portability issues early in the development cycle: by using it regularly and listening to its words developers can significantly improve their C/C++ programming skills.

This post details how to run PC-Lint (which is normally intended for DOS/Windows environments) in Linux, saving developers from having to buy FlexeLint, the much more expensive Unix/Linux version.
Continue reading

Dear Compiler, What’s My Size?

Sometimes, when you work with classes or structs, your code is not only dependent on particular fields (attributes, member variables) but on all of them at the same time.

The copy constructor, assignment operator, and the equality operator are prominent examples: if you add a field to your class, these functions need to be maintained in parallel; that is, all of them potentially need to be updated to support this new field. And we all know that it is so easy to forget to update one of them…

I got bitten by this when I used a class that lacked an operator to stream out all of its members. Since I didn’t own the source code of the class (it was part of a library), I couldn’t modify it directly. (I actually could have modified it, but this would have been outright stupid). Instead, I decided to add the missing functionality in my own code. Here is a simplified example:

Here’s what I added in my code:

It wasn’t long until I got a new version of the library in which “Fruit” was extended by another attribute, “weight”. Since my code didn’t support this new attribute yet, it was broken. Even worse: by (unconsciously) ignoring the new attribute my code failed silently at run-time! Just in case you didn’t know: If there is one thing I absolutely can’t stand it’s if CODE FAILS SILENTLY AT RUN-TIME.

Something had to be done to prevent this from happening again. After a while I came up with the idea of checking the size of class Fruit at compile-time against a hard-coded value. Even though this wasn’t perfect (you can easily think of cases where you would get false positives) it gave me a good chance of catching similar modifications next time I rebuilt my own code:

(Note: static assertions are like plain assertions but checked by the compiler at compile-time, not at run-time. I’m using a C++ 2011 feature here, but you can implement static assertions yourself, if your compiler lacks support for them. Read more here.)

Now, if the compiler flagged a failed assertion I would review the public interface of class Fruit and look for any new attributes, update my code along with the reference value of the size and everything would be fine. But how would I get the reference size value (i. e. 24) in the first place?

Calculating the size of a class or struct by hand is not trivial. There can be hundreds of members, including union members, padding bytes, hidden pointers to vtables and the like. A viable solution is to add a temporary print statement that writes sizeof(Fruit) to stdout, but that would require linking and running the program. And what if you are working on an embedded system where there is no stdout or where it is impossible to inspect the value in a debugger?

No, these approaches are too slow and cumbersome. If the compiler knows the size of Fruit it should be able to tell me. But how?

This question haunted me for quite some time. I devised a cunning plan: why not add a statement containing a syntax error that somehow forced the compiler to spit out the size of my class. Among other things, I tried this:

This hack exploits the fact that in C/C++ it is illegal to declare an array of negative size. I had hoped to get a message like this:

Alas, all compilers that I checked-out just gave a worthless error message among these lines:

There was no clue as to what the actual size was. Too bad. The compiler knows the answer but it doesn’t want to tell me!

I was just about to give up when I suddenly had another idea: what the compiler would always tell me is the line on which an error occurred. Why not attempt to create many arrays, where the size is decremented, starting from sizeof(Fruit):

I would just include this header file and the compiler would compile line after line until it hits a syntax error due to a negative array size. When it does, it would report the corresponding line number which in turn corresponds to the size of Fruit. G-r-e-a-t!

Generating this include file (getsize.h) was straightforward: I cobbled together a simple Bash for-loop:

I used it like this:

Compiling this code yields the following error messages (the exact messages may vary, depending on the compiler that you use):

The first line that is reported as erroneous is the size of class Fruit, 24 in our example. Once you know the size you just need to update the reference value and comment out the two lines to have them ready when you need them again.

You might find all of this a bit hacky, but I happen to like it very much. It serves a good purpose and nicely solved my problem. On top of that, it gave me a really good time. What more can one expect?

[Update 2015-12-06: Bernd Hanebaum sent me a nice implementation of this idea that is based on template metaprogramming — many thanks, Bernd!]

Habits for Safer Coding

bear.jpgIf you have been coding in C and C++ for some time, you know that it is easy to introduce bugs: type in a single wrong character and you wreak havoc. It is therefore a good idea to avoid using certain (dangerous) language features or at least using them in a way that is safer.

My favorite example is always putting braces around blocks, even if the block only consists of a single statement:

I’ve witnessed cases where really experienced programmers omitted these two “redundant” characters for brevity only to be later faced with this classic bug:

It is so easy to make this mistake — especially during merging — that MISRA (Guidelines for the Use of the C++ Language in Critical Systems) made curly braces around if/else/do/while blocks a “required rule”. Note that by following this rule, you are also protected from “dangling else” mistakes:

In general, I follow such “best practices” if they meet these criteria:

– They must solve a real problem
– They must not require any thinking

The “always-put-braces-around-blocks” rule does solve a real problem: I’m not aware of any compiler (not even Sun’s Java compiler) that warns you about the mistake shown above. (Of course, static code analysis tools (like PC-Lint) do check for these errors, and many, many more, but unfortunately only few developers use them.)

Does it require any thinking? I don’t think so. The only thing I have to remember is this: whenever I use ‘if’, ‘else’, ‘for’, ‘while’, ‘do’ I put curly braces around my code. That’s all!

Maybe you have come across this convention:

Advocates of this style reason as follows: in C/C++ you can easily — accidentally — use an assignment operator when you actually wanted to use an equality operator. Since you cannot assign to a constant, the compiler will refuse to compile your buggy code:

I must admit that I’ve never liked this convention. One reason is that it looks unnatural. Another is that it lacks symmetry; that is, it is not consistent with other comparison operators. What if you need to change the comparison some day to also include values greater than MAX_RETRIES?

Or what if MAX_RETRIES is changed from a compile-time constant to a “normal” variable whose value is obtained from a config file? Do you swap then? In fact, this convention doesn’t provide any help for mistakes in variable/variable comparisons, which occur also very frequently in day-to-day programming:

For my taste, the utility of this convention is rather limited and it requires too much thinking: “If I compare a compile-time constant (or a literal) against a variable using the equality operator, I put the constant first”. Isn’t this a clear hindsight rule? It’s a bit like attaching a tag saying “always lock the door” to your key.

Why not, instead, turn this into an habit: “If I do an equality comparison, I double check that there are two equal signs”. That’s not any more difficult to adopt, doesn’t impair readability and has the additional benefit of working in all cases — not just when comparing against constants.

I’m prepared to follow even conventions that look funny if they solve a real problem but I don’t think that this convention does: today, most compilers produce warnings when you make this kind of mistake (for GCC it’s -Wparenthesis, which is part of -Wall).

In C, there are numerous ways to make silly mistakes (in C++ there are many, many more). Instead of using questionable practices that address one tiny area, it is much better to enable all compiler warnings or — even better — use static code checkers like PC-Lint. In my opinion, that’s the most worthwhile habit for safer coding of all.