Last week was a sad week for me. A bug in my code made it into the final version that was shipped to an important customer.
When something like this happens, it is almost always due to fact that there is a higher-level “bug” in the software development process, but I don’t want to go there. Instead, I want to focus on the technicalities. Once more, I got bitten by another instance of a dangerously confusing interface. Let me explain.
I’ve always had a liking for interfaces that are self-evident; that is, one knows immediately what’s going on by just looking at how the interface is used — without having to consult the documentation. Let me give you a counter example, an interface that is far from what I desire:
1 2 3 |
sort_temperatures(values, values_count, true); |
The interface to ‘sort_temperatures’ is not at all self-explanatory. What is clear is that it sorts ‘values_count’ values, which are obviously temperature values, but what the heck does the ‘true’ argument stand for? In order to find out, you have to look at the declaration of ‘sort_temperatures’ and/or its API documentation:
1 2 3 4 5 6 7 8 9 10 |
/** Sorts an array of temperature values. * @param values Begin of temperature values array. * @param values_count Total number of temperature values. * @param ascending If true, sort values in ascending order; * if false, sort in descending order. */ void sort_temperatures(double* values, size_t values_count, bool ascending); |
Now it is clear what the boolean parameter is for, but at the cost of having to take a detour. Maybe you got so distracted by this detour that you forgot what you originally were about to do.
Programmers often make this mistake when designing interfaces. They use boolean parameters when they have two mutual exclusive modes of operation. This is easy for the implementer of the routine, but confusing to the ones who have to use it. Contrast this with this alternative:
1 2 3 4 |
sort_temperatures(values, values_count, SORT_MODE_ASCENDING | SORT_MODE_REMOVE_DUPLICATES); |
By replacing the boolean parameter with symbolic constants you not only make the code more readable, you also open it up for future extension: adding more modes becomes straightforward.
Now have a look at this code and try to guess what it does:
1 2 3 4 5 6 7 8 |
uint8_t hash[HASH_SHA256_LEN]; size_t hash_len; if (!calc_hash(HASH_SHA256,mydata, mydata_len, hash, &hash_len) || hash_len != HASH_SHA256_LEN) { LOG << "Hash computation failed" << endl; return false; } |
That’s fairly simple: ‘calc_hash’ calculates a SHA-256 checksum over ‘mydata’ (which is ‘mydata_len’ bytes in size) and stores it into the provided buffer ‘hash’. The length of the hash is stored to ‘hash_len’ via call-by-reference.
But is this code correct? The answer is — you guessed it — no. If you can’t spot the bug, you are in good company. You can’t see it with just the information given. ‘calc_hash’ interface is not self-evident.
I wrote this code more or less one year ago. It contains a bug that remained dormant until the product was in the hands of our customer. And it is there because of a silly interface.
The last (pointer) parameter ‘hash_len’ actually serves as both, an input AND output parameter. When you call ‘calc_hash’ it is expected that ‘*hash_len’ contains the size of the provided ‘hash’ buffer; on return ‘*hash_len’ will contain the actual number of bytes used by the hash algorithm; that is,the size (or length) of the SHA-256 checksum stored in ‘hash’. The whole idea behind this is that ‘calc_hash’ (or rather its author) wants to offer protection against buffer overruns — for cases where the provided ‘hash’buffer is not large enough to accommodate the checksum.
So the problem here is that ‘hash_len’ (being a stack variable) is not properly initialized to ‘HASH_SHA256_LEN’; it’s value is more or less arbitrary. If it is by chance greater or equal to 32 (the value of the ‘HASH_SHA256_LEN’ symbolic constant) everything is fine and the checksum is correctly calculated. If it is not, ‘calc_hash’ returns ‘false’ and an error is reported.
For as long as a year — by sheer coincidence — ‘*hashLen’ was never below 32 (which is not that unlikely, given that ‘size_t’ can accommodate values ranging from 0 to 4,294,967,296); but in the hands of the customer — and very much in line with Murphy’s Law — it happened.
OK, accuse me of not having initialized ‘*hashLen’ properly, accuse me of not having read the API documentation carefully. Maybe I did read the API documentation and then I was interrupted. I don’t know. But what I know for sure is that this bug would have never happened if the interface had been clearer.
The first problem with ‘calc_hash’ is that ‘hash_len’ is an IN/OUT parameter, which is unusual. I’m not aware of any function in the C/C++ standard library (or the POSIX libraries) which makes use of IN/OUT parameters. Since the input value (not just the output value) is passed by reference, neither the compiler nor static analysis tools like PC-Lint are able to detect its uninitialized state. One obvious improvement is to pass the length of the buffer by value:
1 2 3 4 |
bool calc_hash(HASH_TYPE type, const void* src, size_t src_len, void* hash, size_t hash_buf_len, size_t* hash_len); |
Granted, there is now one more argument to pass (‘hash_buf_len’), but if the unlucky programmer ever forgets to initialize it, the compiler will issue a warning.
But let’s not stop here. I’d like to pose the following question: what good is the hash buffer length check, anyway?
In my view, it is not at all necessary. The length of a particular checksum is constant and known a priori — that’s why the hash buffer is allocated statically like this:
1 2 3 |
uint8_t hash[HASH_SHA256_LEN]; |
What additional benefit does a developer get by providing the same information again to the ‘calc_hash’? Isn’t this redundancy that just begs for consistency errors?
And what use is the output value that tells how long the checksum is? Again, this should, no it MUST be known a priori, there should never be a mismatch between what the caller of ‘calc_hash’ expects and what is returned. Of course, there can be error conditions but if ‘calc_hash’ fails, it should return ‘false’ and not a length different to what the caller expects.
Note that ‘calc_hash’ is not at all comparable to functions in the C API that add an additional output buffer length parameter like ‘strncpy’ or ‘snprintf’. These functions carry the length parameter for completely legitimate safety reasons because the total length of the output is usually not known a priori (for instance, as some input may stem from a human user and one has little control over how many characters (s)he will enter).
Based on these arguments, I dismiss the ‘hash_len’ parameter altogether and propose the following simplified interface:
1 2 3 |
bool calc_hash(HASH_TYPE type, const void* src, size_t src_len, void* hash); |
and would use it like this:
1 2 3 4 5 6 7 |
uint8_t hash[HASH_SHA256_LEN]; if (!calc_hash(HASH_SHA256, mydata, mydata_len, hash)) { LOG << "Hash computation failed" << endl; return false; } |
Easy to read, hard to get wrong. It is impossible to forget to initialize a variable that just isn’t there.