Category Archives: Code

Dangerously Confusing Interfaces III

Just like the other “Dangerously Confusing Interfaces” posts, this one was also inspired by a real-world blunder that I made.

Here’s the background: usually, routines that accept data via a pointer from the caller either execute synchronously or copy the data into their own internal data structures for later processing. Take the venerable ‘fwrite’ from the C standard library as an example:


size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);

size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);

‘fwrite’ blocks until the data has been written, either to disk or to an internal buffer. In either case, once ‘fwrite’ returns, it doesn’t care about the original data anymore. That’s why it’s safe (and common practice) to pass a pointer to a local buffer on the stack:


void LogHello() {
    const char message[] = "Hello";
    fwrite(message, sizeof(message), 1, g_log_file);
}

void LogHello() {

const char message[] = "Hello";

fwrite(message, sizeof(message), 1, g_log_file);

}

All standard library and POSIX APIs behave like ‘fwrite’, which is both, safe and convenient. However, with embedded systems, the story is different: in some cases, memory is so tight that additional buffers/internal storage can’t be afforded. Such functions don’t copy the provided data but only store a pointer to your data and expect the memory pointed-to by this pointer to be still valid long after the function call has returned. Here is an example from the AUTOSAR standard, which is used by almost all embedded automotive products:


Std_ReturnType NvM_WriteBlock(NvM_blockIdType blockId, const uint8* data);

Std_ReturnType NvM_WriteBlock(NvM_blockIdType blockId, const uint8* data);

‘NvM_WriteBlock’ is used to store data to a given non-volatile memory block. However, what this function does is only enqueue a request for the given block ID together with the data pointer (not a copy of your data). This is done for the sake of efficiency, because there can be multiple write requests in parallel. The queue is later processed in another task, long after any local buffer would have been removed from the stack.

Passing a pointer to a buffer with automatic storage is an easy mistake to make, especially since such “non-copy” interfaces are so rarely encountered. How can “write-like” interfaces that don’t make a copy of the provided data be made safer, such that misuse is less likely? Obviously, just adding documentation is not enough — nobody reads documentation, especially in the heat of the moment.

In my view, the root of the problem is that such functions accept just about any pointer. What if the caller was forced to explicitly cast the pointer to another type? A type with a cunningly chosen typename, one that reminded the caller of the potential pitfall? Here is my approach:


typedef struct {
    char dummy;
} uncopied_memory;

void SomeWritelikeFunction(const uncopied_memory* data, size_t len);

typedef struct {

char dummy;

} uncopied_memory;

void SomeWritelikeFunction(const uncopied_memory* data, size_t len);

Whenever a pointer is passed to this function, developers have to write something like this to make the compiler happy:


SomeWritelikeFunction((uncopied_memory*) data, sizeof(data));

SomeWritelikeFunction((uncopied_memory*) data, sizeof(data));

Typing ‘uncopied_memory’ should shake up even the most focused developers and remind them to double-check what they are passing into this function.

Of course, within ‘SomeWritelikeFunction’, the provided pointer needs to be cast back into something more useful, like a ‘const uint8_t*’. Further, note that the ‘dummy’ member within ‘uncopied_memory’ must not be used; it only exists to make sure that the cast to ‘uncopied_memory*’ in the calling function is safe: a pointer to a struct is aligned such that it is aligned with the struct’s first member, which is ‘char’ and ‘char’ has the weakest alignment requirements.

Cutting-up The Perfect Sausage

In a previous post I wrote about the benefits of frequent local commits and publishing them as a perfect sausage; that is, a single, perfect commit. Today, I want to slightly adjust my advice.

It seems obvious that perfect sausages make it easier to track and review changes to a code base, but that’s not necessarily always the case. Let me explain.

Imagine a typical scenario. Developer Joe needs to implement a small feature. While coding away, he suddenly realizes that the existing code is not really prepared for the new functionality: some rework on the class hierarchy is necessary. Being this clean coder that he is, he also renames quite a few really bad identifier names, factors out some common functionality and improves upon some missing comments here and there. Eventually, he implements the feature, codes a little, tests a little, refactors even more and finally releases everything (squashed together) as a perfect sausage.

Imagine further that busy developer Sue has been given the task to peer-review what Joe implemented. Naturally, she’s got a lot of other work already on her plate and when she takes a look at Joe’s changeset, she is shocked: to implement this little feature, Joe changed 800+ lines of code, much more than the 50 lines she’d expected. Sue has to learn the new class hierarchy and skim many editorial changes. When she finally reaches the meat of Joe’s change, her attention will have decreased to a level where she might overlook that Joe introduced that nasty off-by-one error (both in the implementation and the unit tests, of course).

I believe it would have been much easier for Sue if Joe had released his initial refactorings (possibly distributed over many small local commits) as an atomic “refactoring sausage” first, followed by a smaller atomic “functionality sausage”. Then, she could have focused on either change at a time, or even skip the refactoring altogether and only review the functional modifications. Other developers could learn about the new class hierarchy by just looking at the refactoring commit — without the burden of all the changes that were necessary for the new feature.

Another benefit of doing larger refactorings in a discrete step is that the overall risk is reduced because a) any regressions will likely be found by the continuous integration machinery and b) developers working in the same area are less likely to get merge conflicts.

Differentiating between refactoring and functional commits requires discipline, though. If you find out during feature development, that some refactoring is necessary first, you just stash your current changes, do the refactoring, deploy the refactored code, pop your original changes back and continue working on the feature. But in my view, the overall benefit outweighs these little inconveniences.

Some practical advice: First, don’t get religious about it! If only trivial refactorings are required (like changing a name/comment here and there), it’s probably not worth the effort. Instead, do it as-you-go, or, for extra credit, as a “refactoring sausage” after you have released the “functional sausage”. Second, it might be useful to establish a convention that makes it clear from the commit message what category a particular commit belongs to. Here is an example that discriminates at a fine level of granularity:

Category	Tag	Meaning
Editorial	EDI	Editorial changes, executable stays binary identical
Refactoring	REF	Structural changes, but no change in behavior
Functional	FUN	Functional changes
Test	TST	Changes in unit tests
Tuning	TUN	Improved efficiency, no change in behavior

With these tags, a commit log could look like this:


2016-09-07 Joe Coder  [REF] Extended class hierarchy as prep step for "flux capacitor" feature.
2016-09-07 Joe Coder  [REF] Various cleanup 
...
2016-09-08 Joe Coder  [FUN] Implemented "flux capacitor" feature.
...
2016-09-12 Joe Coder  [TST] 100% code coverage for "flux capacitor".

2016-09-07 Joe Coder [REF] Extended class hierarchy as prep step for "flux capacitor" feature.

2016-09-07 Joe Coder [REF] Various cleanup

...

2016-09-08 Joe Coder [FUN] Implemented "flux capacitor" feature.

...

2016-09-12 Joe Coder [TST] 100% code coverage for "flux capacitor".

The take-away is this: big commits that mix major refactorings and functionality are often detrimental to traceability and make reviews harder. As a general rule, don’t mix significant functional and non-functional changes — a sausage is easier to digest if it’s consumed in smaller pieces.

Using the C Preprocessor to Perform Compile-time Computations

Sometimes, it is desirable to perform computations already at compile-time — either for efficiency or to avoid redundancy. Alas, what a compiler can compute at compile-time is rather limited — mostly just a combination of unary and binary operators. What if you need to do something more complex?

For the sake of illustration, I chose computing the (floored) log2 of an integer as an example, but the techniques presented below can be easily adapted to other use cases:


const int MY_CONST = STATIC_LOG2(1234); // Compute log2(1234)
                                        // at compile-time

const int MY_CONST = STATIC_LOG2(1234); // Compute log2(1234)

// at compile-time

In C++ — provided you’re brave enough — you can always resort to template metaprogramming:


template<int value, int shift>
struct static_log2_impl {
    enum { result = value & (1 << shift) ? shift : 
        static_log2_impl<value, shift - 1>::result 
    };
};

template<int value>
struct static_log2_impl<value, -1> {
    enum { result = -1 };
};

#define STATIC_LOG2(value)\
    (static_log2_impl<value, sizeof(int) * CHAR_BIT - 1>::result)

template<int value, int shift>

struct static_log2_impl {

enum { result = value & (1 << shift) ? shift :

static_log2_impl<value, shift - 1>::result

};

template<int value>

struct static_log2_impl<value, -1> {

enum { result = -1 };

};

#define STATIC_LOG2(value)\

(static_log2_impl<value, sizeof(int) * CHAR_BIT - 1>::result)

But since template metaprogramming was not deliberately built into C++ but rather discovered by accident, template metaprogramming code is far from pleasant to look at. If you are lucky and your compiler supports C++11 (or rather C++11’s constexpr feature), you have a better option:


constexpr int static_log2_constexpr_impl(int value, int shift) {
    return shift == -1 ? -1 : 
        value & (1<< shift) ? shift :
            static_log2_constexpr_impl(value, shift - 1);
}

#define STATIC_LOG2(value)\
    (static_log2_constexpr_impl(value, sizeof(int) * CHAR_BIT - 1))

constexpr int static_log2_constexpr_impl(int value, int shift) {

return shift == -1 ? -1 :

value & (1<< shift) ? shift :

static_log2_constexpr_impl(value, shift - 1);

}

#define STATIC_LOG2(value)\

(static_log2_constexpr_impl(value, sizeof(int) * CHAR_BIT - 1))

It’s still recursive, but at least this solution is using real functions and not structs to achive its goal — much easier on the eyes!

But what if you code at the other end of the spectrum? What if you are limited to plain C?

Many years ago, I discovered the following technique that has proven useful in quite a few situations; it is used like this:


#define STATIC_LOG2_ARG 89
#include static_log2.h
char myarray[STATIC_LOG2_VALUE]; // sizeof(myarray) == 6

#define STATIC_LOG2_ARG 89

#include static_log2.h

char myarray[STATIC_LOG2_VALUE]; // sizeof(myarray) == 6

The “argument” is passed via a symbolic constant (STATIC_LOG2_ARG), the computation is done by “calling the function” (by including static_log2.h) and the “return value” is stored in another symbolic constant (STATIC_LOG2_VALUE).

Here’s an excerpt of what’s contained in the static_log2.h header file:


#if STATIC_LOG2_ARG & (1 << 31)
#define STATIC_LOG2_VALUE 31

#elif STATIC_LOG2_ARG & (1 << 30)
#define STATIC_LOG2_VALUE 30

#elif STATIC_LOG2_ARG & (1 << 29)
#define STATIC_LOG2_VALUE 29 

#elif STATIC_LOG2_ARG & (1 << 28)
#define STATIC_LOG2_VALUE 28 

...

#elif STATIC_LOG2_ARG & (1 << 2)
#define STATIC_LOG2_VALUE 1

#elif STATIC_LOG2_ARG & (1 << 1)
#define STATIC_LOG2_VALUE 0

#else
#define STATIC_LOG2_VALUE -1 
#endif

#if STATIC_LOG2_ARG & (1 << 31)

#define STATIC_LOG2_VALUE 31

#elif STATIC_LOG2_ARG & (1 << 30)

#define STATIC_LOG2_VALUE 30

#elif STATIC_LOG2_ARG & (1 << 29)

#define STATIC_LOG2_VALUE 29

#elif STATIC_LOG2_ARG & (1 << 28)

#define STATIC_LOG2_VALUE 28

...

#elif STATIC_LOG2_ARG & (1 << 2)

#define STATIC_LOG2_VALUE 1

#elif STATIC_LOG2_ARG & (1 << 1)

#define STATIC_LOG2_VALUE 0

#else

#define STATIC_LOG2_VALUE -1

#endif

In the C++ examples, iteration is done using recursion, but here everything is unrolled/inlined.

For another case where this approach is employed, checkout this post. You probably don’t need this technique very often, but it’s good to have it in your bag of macro tricks.

Bug Hunting Adventures #11: Bad Weather

“It is only in sorrow bad weather masters us;
in joy we face the storm and defy it”
— Amelia Barr

Imagine a weather monitoring system where environmental data is collected by various sensors and distributed via messages to other components for further processing.

In the code below, produce_env_measurement() represents a task that constantly produces messages containing various environmental measurements while another task (represented by process_env_measurement()) consumes them. To ensure data integrity, a Fletcher-16 checksum is appended to every message, but the application nevertheless doesn’t work reliably. Where’s the bug?

Code
Solution

Distrusting Experts

“You don’t know what you don’t know until you know it.”
— anon

I recently wrote a post about Scott Meyers, one of my programming heroes. The one I’m writing about today ranks even higher: Michael Abrash. For decades, I have admired his ability to craft awesome, non-obvious, super-tight code. But he’s not just an excellent performance programmer; he is also an execellent writer and storyteller.

It’s no coincidence that he’s also the author of one of my favorite programming books: “ZEN of Code Optimization” [1]. Released in 1994, it predates the Internet age (lo and behold, it comes with a 3.5″ floppy disk), but most of the content is timeless advice for every developer who cares about performance. (We all should. Michael’s rule #1: “From the user’s perspective, performance is fundamental”.)

In chapter 10, “Patient Coding, Faster Code”, he explains that just doing micro optimizations (or recoding in assembly language) usually doesn’t cut it; all you end up with is “fast slow code”. For best results, you need key insights; that is, view the problem from a different angle and use an offbeat approach to solve it.

He presents the venerable “greatest common divisor of two given numbers” problem as a case in point. After toying with a brute-force approach (incrementing a variable and checking if it divides both numbers) he employs Euclid’s key insight, which improves the run-time by orders of magnitude:

gcd(int1, int2) ≡ gcd(int2, int1 % int2)

Or in plain English: the GCD of a larger positive integer ‘int1’ and a smaller positive integer ‘int2’ is the same as the GCD of ‘int2’ and the remainder of ‘int1’ divided by ‘int2’.

This is his original code:


unsigned int gcd(unsigned int int1, unsigned int int2) {
   unsigned int temp;

   /* Swap if necessary to make sure that int1 >= int2 */
   if (int1 < int2) {
      temp = int1;
      int1 = int2;
      int2 = temp;
   }
   /* Now loop, dividing int1 by int2 and checking the remainder,
      until the remainder is 0. At each step, if the remainder isn't
      0, assign int2 to int1, and the remainder to int2, then
      repeat */
   for (;;) {
      /* If the remainder of int1 divided by int2 is 0, then int2 is
         the gcd */
      if ((temp = int1 % int2) == 0) {
         return(int2);
      }
      /* Make int2 the larger integer and the remainder the
         smaller integer, and repeat the process */
      int1 = int2;
      int2 = temp;
   }
}

unsigned int gcd(unsigned int int1, unsigned int int2) {

unsigned int temp;

/* Swap if necessary to make sure that int1 >= int2 */

if (int1 < int2) {

temp = int1;

int1 = int2;

int2 = temp;

}

/* Now loop, dividing int1 by int2 and checking the remainder,

until the remainder is 0. At each step, if the remainder isn't

0, assign int2 to int1, and the remainder to int2, then

repeat */

for (;;) {

/* If the remainder of int1 divided by int2 is 0, then int2 is

the gcd */

if ((temp = int1 % int2) == 0) {

return(int2);

}

/* Make int2 the larger integer and the remainder the

smaller integer, and repeat the process */

int1 = int2;

int2 = temp;

}

Needless to say, being that assembly wizard that he is, he goes on and reimplements this routine in assembly language to make it even faster. But as for C, this is probably as fast as it can get. Or is it?

I’ve used this code blindly and almost literally for many, many years. After all, it’s from Michael Abrash, and to me, his code has always been beyond any doubt. Definitely not the right attitude, but let me explain.

I’m a big fan of code katas, little exercises that you do (and redo time and again, sometimes in different programming languages) to flex your programming muscle. I don’t know why, but when I did the GCD code kata this time, I stopped and scratched my head. What the heck is this int swapping business about? Why is this code necessary, as the comment says?

Just for the fun of it, I took it out and reran the unit tests — they all passed.

Actually, it’s not that hard to see why it’s not necessary. Let’s assume ‘int1’ is 366 and ‘int2’ is 60. Since ‘int1’ is greater than ‘int2’, no swapping is necessary. Within the ‘for’ loop, the variables take on the following values:

iteration	temp	int1	int2	return
1	366 % 60	60	6
2	60 % 6	60	6	6

Now if ‘int1’ is 60 and ‘int2’ is 366, the code as it is would swap their values, but let’s pretend there was no swapping code:

iteration	temp	int1	int2	return
1	60 % 366	366	60
2	366 % 60	60	6
3	60 % 6	60	6	6

Aha! There is one more iteration and this additional iteration (actually the very first iteration) swaps ‘int1’ and ‘int2’. The ‘for’ loop by itself takes care of rearranging the arguments, there is no need for the extra swap test at the beginning of the ‘gcd’ routine. I was puzzled. Is it really possible that I have discovered redundant code in Michael Abrash’s sacred code? I felt quite uneasy because I wanted my hero to be infallible.

A couple of days ago, I sent him an email about my findings. His reply was short and to the point:

“Believe me, I never claimed to be immune to cycle-wasting! And anyway, There Ain’t No Such Thing As The Fastest Code :)”

There you have it. It’s that simple. I found unnecessary code in a 20+ years old routine. That’s what I call delayed gratification!

The basic problem was that I blindly trusted what I saw, probably because it came from an expert. However, nobody is perfect, every once in a while even the best make blunders. Have a flexible mind, assume nothing and question everything — this is the right attitude for every software developer and proclaimed in many of Michael’s writings [2].

—

[1] “ZEN of Code Optimization” is available online as part I of “Michael Abrash’s Graphics Programming Black Book, Special Edition”.

[2] Some years ago, I gave a presentation titled “The Art of Writing Efficient Software”. It includes a lot of Michael Abrash’s advice plus more. Check it out here. I truly believe that efficiency is still (and forever will be) of utmost importance for end users.

A Micro Logger for Constrained Systems

Drei Notizzettel vor Holzhintergrund Logging is one of the oldest and most widely used software debugging tools: for decades, developers have put printf-like statements in their code to find out what their code is really doing, in situ. Even today, logging is still crucial for finding bugs in software, especially sporadic problems in concurrent and event-driven systems.

Fortunately, most programming environments come with powerful logging facilities (like Android’s Log) — but what if you are working on a highly-constrained system that simply can’t afford them?

Let’s have a look at an example that uses printf-like logging:


#include "log.h"

uint32_t readInput() {
    log_info("Enter readInput"); 
    uint32_t sampledValue = IoGetRawInput();
    ...
    if (sampledValue > SAMPLED_VALUE_MAX) {
        log_warn("Sampled value exceeded threshold: %d", sampledValue);
        sampledValue = 0;
    }
    ...
    log_info("Exit readInput");
    return sampledValue;
}

#include "log.h"

uint32_t readInput() {

log_info("Enter readInput");

uint32_t sampledValue = IoGetRawInput();

...

if (sampledValue > SAMPLED_VALUE_MAX) {

log_warn("Sampled value exceeded threshold: %d", sampledValue);

sampledValue = 0;

}

...

log_info("Exit readInput");

return sampledValue;

}

Traditional (printf-like) logging is expensive because:

Log strings consume memory (ROM), which can easily increase the overall size of the executable by 10% to 30%.
printf-based logging is extremely flexible, but the parsing of format strings can incur a significant run-time overhead.
The amount of log data produced at run-time is quite large because all the log strings need to be transmitted (or stored locally, in case you keep logging data in a file), usually accompanied with timestamps and other bookkeeping information.

I recently had a crazy idea how to compress a log string of (almost) arbitrary size to a single byte: Instead of using real (read: expensive) C strings, why not define a constant whose symbol name serves as the “log string” and only log it’s address?


... 
static const uint8_t A_log_string_that_consumes_only_a_single_byte = 0;
mlog_impl(&A_log_string_that_consumes_only_a_single_byte);
...

...

static const uint8_t A_log_string_that_consumes_only_a_single_byte = 0;

mlog_impl(&A_log_string_that_consumes_only_a_single_byte);

...

On the receiving side, you can reconstruct the original log strings from the map file produced by the linker.

Given an appropriate set of magic macros, the example above looks like this:


#include "mlog.h"

uint32_t readInput() {
    MLOG_INFO(Enter_readInput); 
    uint32_t sampledValue = IoGetRawInput();
    ...
    if (sampledValue > MAX_SAMPLE_VALUE) {
        MLOG_WARN_VALUE(Sampled_value_exceeded_threshold, sampledValue);
        sampledValue = 0;
    }
    ...
    MLOG_INFO(Exit_readInput); 
    return sampledValue;
}

#include "mlog.h"

uint32_t readInput() {

MLOG_INFO(Enter_readInput);

uint32_t sampledValue = IoGetRawInput();

...

if (sampledValue > MAX_SAMPLE_VALUE) {

MLOG_WARN_VALUE(Sampled_value_exceeded_threshold, sampledValue);

sampledValue = 0;

}

...

MLOG_INFO(Exit_readInput);

return sampledValue;

}

It may be hard to believe, but these three “log strings” really only consume three bytes of memory. To send out their addresses, four bytes are needed (assuming 32-bit addresses are used) and ‘sampledValue’ requires the transmission of another four bytes. Let’s compare these two approaches assuming that ‘sampledValue’ is 42 (two digits) and taking into account that the size of a C string is the number of characters plus one for a trailing ‘\0’:

	footprint (bytes)	transmission (bytes)
traditional logging	16 + 37 + 15	16 + 37 + 15
MLOG logging	1 + 1 + 1	4 + 4 + 4 + 4
savings %	96	76

That’s what I call an improvement!

I’ve implemented this idea for the GCC toolchain. The log server emits MLOG traces via UDP, which are received by a little tool that converts them back into human-readable log messages by exchanging MLOG symbol addresses with (beautified) MLOG symbol names from the linker map file. Extracting the MLOG symbols from the map file is the most GCC-dependent part. I believe it is a straightforward task to port MLOG to other platforms and toolchains.

Check it out at GitHub.

EPILOGUE:
This is my 100th blog post (actually 101st, damn)! Thanks, faithful reader, for bearing with me through all these years.
— Yours Truly

Approxion

Code – People – Everything