Category Archives: C/C++/Embedded

Circular Adventures II: A Ring within a Ring

“Is all that we see or seem
But a dream within a dream?”
–Edgar Allan Poe

In part one of this series I’ve discussed the mod operator and demonstrated that it is a flexible tool for solving circular problems. I’ve also explained that the mod operator, as implemented by most (system) programming languages, cannot handle problems involving negative dividends. As a remedy, I’ve introduced an efficient alternative to the mod operator for divisors that are base-2 numbers.

Today, I want to have a look at some first circular use-cases. Again, we meet our good old friend, the ring buffer.

Sometimes, you need to store data and keep previous versions of that data, but the memory available is limited. In such situations, an obvious approach is to use a ring buffer.

In this example, we have a ring buffer (again, pretend that the last and first element are joined) comprising eight (N = 8) fixed-size slots for storing information. Each slot contains a data set. We start by writing to slot 0, then slot 1 and so on; when the ring buffer is full (that is, slot 7 has been written to), we wrap around to slot 0 and overwrite what has been stored there previously, thus kicking out the oldest data set. By using this scheme, we not only have access to the most recent data set, but also to up to seven predecessors.

In addition to the actual ring buffer, we need a variable to keep track of the most-recently updated (“MRU”) slot number; that is, an index that points to the “newest” data set. Based on the current value of the MRU we can easily calculate which slot is to be updated upon the next write access:
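With a hypothetical slot count of N = 8 and an MRU index variable (names assumed), the calculation might be sketched like this:

```c
#define N 8  /* number of slots in the ring buffer */

/* Slot to be written on the next update, derived from the
   most-recently updated slot. The dividend is never negative,
   so C's '%' operator works fine here. */
unsigned next_slot(unsigned mru)
{
    return (mru + 1) % N;
}
```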

Updating the ring buffer is just one part of the story, reading another. For instance, to get the predecessor of the current data set we need to calculate the slot index based on the previous value of MRU. Again, by using the mod operator, this can be achieved easily:
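A sketch of the predecessor calculation (names assumed); the `+ N` fix-up emulates a mathematical modulo, since C’s `%` may return a negative remainder:

```c
#define N 8

/* Slot index of the i-th predecessor of the MRU slot. The dividend
   (mru - i) may become negative, so plain '%' alone is not enough. */
int prev_slot(int mru, int i)
{
    return ((mru - i) % N + N) % N;  /* mathematical modulo of (mru - i) */
}
```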

Dumping the whole ring buffer in reverse chronological order is achieved with this simple loop:
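The loop might look like this (data layout and names assumed):

```c
#include <stdio.h>

#define N 8

/* Slot visited in step i when walking backwards from the MRU slot,
   using a mathematical modulo so negative dividends wrap correctly. */
int reverse_slot(int mru, int i)
{
    return ((mru - i) % N + N) % N;
}

/* Print the data sets from newest to oldest. */
void dump_reverse(const int data[N], int mru)
{
    for (int i = 0; i < N; ++i)
        printf("slot %d: %d\n", reverse_slot(mru, i),
               data[reverse_slot(mru, i)]);
}
```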

Note that for this to work, the mod operator has to be a mathematical modulo operator since the dividends may become negative. If you don’t have access to a true mathematical mod operator, you should make N a base-2 number and use the modulo replacement operation given in the previous installment (mod_base2).

Now let’s make our ring buffer example a bit more interesting. Let’s pretend our ring buffer must be stored in persistent, non-volatile memory (NVM), like flash. One problem with flash memory is that it wears out over time: the more often you update flash memory, the shorter the data retention time gets. (It’s like using a bucket with a crack to carry water from one place to another and the crack gets bigger with every lap you go.)

Therefore, storing the most-recently updated value in a single flash variable is not a good idea, since this memory location would wear-out quickly: every update to a ring buffer would require an update to the MRU variable, which means that the update load on the MRU variable is N times the update load on the individual slots.

A frequently used countermeasure is to have a “distributed MRU variable”, which is achieved by prefixing or suffixing every slot with a so-called “age counter” field.

The age counter starts at zero and is incremented with every write to a slot. So after three updates to an initially empty ring buffer, we would have this:
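Assuming the counter value 1 marks the very first write and 0 marks a never-written slot (both assumptions), the state after three updates might be:

```c
#include <stdint.h>

/* N = 8 slots; writes went to slots 0, 1, 2 with age counters 1, 2, 3.
   The remaining slots are still empty (age 0). */
uint8_t age_after_3[8] = { 1, 2, 3, 0, 0, 0, 0, 0 };
```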

After ten updates, the ring buffer would look like this:
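Continuing the same assumed numbering, after ten updates the buffer has wrapped around once; writes nine and ten overwrote slots 0 and 1:

```c
#include <stdint.h>

/* Slot 1 now carries the highest age counter (10) and is the MRU slot. */
uint8_t age_after_10[8] = { 9, 10, 3, 4, 5, 6, 7, 8 };
```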

The slot containing the age counter with the highest value must be the MRU slot, slot 1 in our example. For obvious reasons, the value range of the age counter must be larger than the slot range of the ring buffer, but sooner or later, even the age counter will wrap around. After many writes to the ring buffer we might get this picture:
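A snapshot consistent with the discussion below (8-bit age counters that have wrapped around at 255; the concrete values are illustrative):

```c
#include <stdint.h>

/* slot:                     0    1    2   3  4  5  6   7  */
uint8_t age_wrapped[8] = { 253, 254, 255, 0, 1, 2, 3, 252 };
/* Counting modulo 256, each value is the successor of its left
   neighbor -- except 252 in slot 7. Slot 6 is therefore the MRU. */
```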

So which slot is the MRU slot now? The age counter is obviously an unsigned 8-bit type, since it wrapped around at value 255. Starting with 253 (the age counter value of slot 0) and taken modulo 256, all counter values increase by one up to 3. But the next value, 252, is not a successor of 3; it is too far ahead (or rather behind). This means that the slot containing age counter value 3 must be the most recently updated slot (which is 6).

The MRU is the basis for all ring buffer operations and it makes sense to maintain its value in a RAM variable. But how can one determine the MRU programmatically, for instance after a reset when all RAM content is lost? Here is a method to search for it:
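One way to code the search (array name assumed): walk the slots until the next slot’s age counter is no longer the modulo-256 successor of the current one; that “discontinuity” marks the MRU slot.

```c
#include <stdint.h>

#define N 8

unsigned find_mru(const uint8_t age[N])
{
    unsigned i;
    for (i = 0; i < N - 1; ++i) {
        if ((age[i] + 1) % 256 != age[i + 1])
            break;  /* successor doesn't follow -> slot i is the MRU */
    }
    /* No discontinuity found: the last slot holds the highest counter. */
    return i;
}
```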

It is a good exercise to convince yourself that this algorithm actually works. Think about the initial “empty” ring buffer case where all age counters are zero as well as the case where the “highest” age counter is stored in the last slot.

Luckily, the dividends in this algorithm cannot become negative, so if we wanted to code this in C, we could use the ‘%’ operator. But since we are doing a modulo 256 operation, we can apply the optimization given in the first installment, which is replacing the modulo 256 operation with a (uint8_t) cast:
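The optimized version of the search might then read (a sketch, same assumed names):

```c
#include <stdint.h>

#define N 8

/* Same search as before, but with '% 256' replaced by a cast to
   uint8_t, which a compiler turns into a no-op or a simple mask. */
unsigned find_mru_fast(const uint8_t age[N])
{
    unsigned i;
    for (i = 0; i < N - 1; ++i) {
        if ((uint8_t)(age[i] + 1) != age[i + 1])
            break;
    }
    return i;
}
```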

Once we are in possession of the MRU, we can work with this ring buffer just like we did with the previous ring buffer. The only thing we have to do in addition is to memorize the highest (or current) age counter value for future updates to the ring buffer:
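A hypothetical write routine that maintains both RAM copies (all names are made up; in a real system the slot data and age counters would live in NVM):

```c
#include <stdint.h>

#define N 8

struct ring {
    int      data[N];      /* slot contents                        */
    uint8_t  age[N];       /* per-slot age counters                */
    unsigned mru;          /* RAM copy: most-recently updated slot */
    uint8_t  current_age;  /* RAM copy: age counter of MRU slot    */
};

/* Initialize mru to N - 1 so that the first write lands in slot 0. */
void ring_write(struct ring* r, int value)
{
    r->mru = (r->mru + 1) % N;                      /* advance slot */
    r->current_age = (uint8_t)(r->current_age + 1); /* wraps at 255 */
    r->data[r->mru] = value;
    r->age[r->mru] = r->current_age;
}
```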

Finding the MRU in our second ring buffer was not particularly difficult — we just needed to compare the age counters until we found a “discontinuity”. We knew that the slots are updated, one after another, cyclically, with increasing age counter values. Things get more complicated when there is no such strict sequential ordering. Let me explain.

Imagine you want to keep track of when certain events occur. For each kind of event, you would store a timestamp from a 32-bit timer source that indicates when this particular event occurred:
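Such a table of per-event timestamps might look like this (the event names are invented for illustration):

```c
#include <stdint.h>

enum event_kind { EV_POWER_ON, EV_ERROR, EV_SHUTDOWN, EV_COUNT };

/* One timestamp per kind of event, taken from a free-running 32-bit
   timer that is allowed to wrap around. */
uint32_t event_time[EV_COUNT];

void record_event(enum event_kind e, uint32_t timer_now)
{
    event_time[e] = timer_now;
}
```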

Further, assume that the 32-bit timer can wrap around (otherwise we wouldn’t have a circular problem), just like our 8-bit age counter wrapped around in our ring buffer example. How would you know which event occurred first or last? How would you calculate the relative time between two events? This and more is the topic of the next part of this series. Stay tuned…

More circular adventures…

Circular Adventures I: The Modulo Operation

“All things from eternity are of like forms and come round in a circle”
–Marcus Aurelius, AD 121-180

In computer science, just like in real life, many things go round in circles. There are ring buffers, circular lists, wrap-around counters and timers, just to name a few. In this multi-part article, I want to explore the theory behind circular behavior. Equipped with this knowledge, I attempt to provide solutions to “recurring” problems.

Let’s start with the very basics; that is, the movement of an index within a ring buffer.

Assume you have a ring buffer containing N = 10 elements and an index i:

(For simplicity, I depict circular structures as flat arrays — just imagine the first and last element are joined to form a ring.)

Question: How do you advance the index by a signed offset n, taking wrap-around into account?
Answer: i = (i + n) mod N

Many circular problems like ring buffer operations can be elegantly expressed by using the ‘mod’ operator. Alas, only theoretically, as in practice, there are at least two problems surrounding the ‘mod’ operator: portability and efficiency.

The ‘mod’ operator, as it is used in the equation above, is the mathematical ‘mod’ operator — the one that is used in number theory. Programming languages, however, sport many different flavors of the ‘mod’ operator, which vary in the way they treat negative operands.

All of them obey this equation:

a mod b = a - (a div b) * b

Yet the results for negative operands depend on whether the ‘div’ operator rounds towards zero or negative infinity. If the ‘div’ operator rounds towards zero, the expression -5 div 2 yields -2; if it rounds towards negative infinity the result is -3.

The following example illustrates how this influences the calculation of -3 mod 8:

-3 div 8 = 0 (rounding towards zero), hence -3 mod 8 = -3 - (0 * 8) = -3
-3 div 8 = -1 (rounding towards negative infinity), hence -3 mod 8 = -3 - (-1 * 8) = +5

This means that -3 mod 8 might be -3 or +5 depending on the style the implementers of your programming language have chosen. Some languages (like Ada) even have two modulo operators (rem and mod), while others allow you to control the behavior at run-time (Perl: ‘use integer’). Still others (like C90 and C++98) leave it as ‘implementation-defined’. Have a look at this for a nice overview of how different programming languages implement the modulo operator.

(And just in case you haven’t guessed it already: C/C++/Java’s approximation of ‘mod’ is ‘%’ and ‘div’ is ‘/’.)

Now, if you only travel through a circle in positive direction (by adding positive offsets), either rounding style will do; however, if you intend to travel backwards (or calculate differences between indices that yield negative values), only the ‘negative-infinity’ modulus operator will do the job; that is, wrap (in the 10 element ring buffer example) from -1, -2, -3 … to 9, 8, 7 …, respectively.

How about efficiency? As can be seen in the table above, the calculation of the remainder is based on a division operation, which is, alas, a rather expensive operation on many platforms. On most processors, the ‘div’ operation is many times slower than other primitive operations. As an example, some popular ARM processors don’t even have a ‘div’ instruction, so division is done through a software library which consumes up to 40 cycles. Compare this to the 5-cycle multiplication instruction and the other instructions that typically execute in a single cycle.

Due to these portability and efficiency issues, many developers shun the modulo operator in performance-critical code. Instead of

they use the computationally equivalent

or, if it is guaranteed that our index is not more than N – 1 off the ends:
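The snippets in question might be sketched like this (N chosen as in the example above):

```c
#define N 10

/* Instead of i = (i + n) % N, the computationally equivalent: */
int advance_loop(int i, int n)
{
    i += n;
    while (i >= N) i -= N;
    while (i < 0)  i += N;
    return i;
}

/* Or, when i + n is guaranteed to be at most N - 1 off either end: */
int advance_if(int i, int n)
{
    i += n;
    if (i >= N)
        i -= N;
    else if (i < 0)
        i += N;
    return i;
}
```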

But this approach is still not very efficient, since it uses branches and on modern multi-stage pipelined processors a branch might invalidate already prefetched and preprocessed instructions. Isn’t there a true mathematical ‘mod’ operator out there that is also efficient at the same time? You bet there is, but only if N happens to be a base-2 number.

If you are lucky and N is a base-2 number (like 64, 1024, or 4096), i mod N is computationally equivalent to

i and (N - 1)

where ‘and’ is the binary and operator (‘&’ in C, C++, and Java). This works even if i is negative, but requires that your environment stores negative numbers in two’s complement fashion, which is the case for pretty much all systems that you will ever program for.

As an example, consider -2 mod 16. -2 is 0xFFFFFFFE in 32-bit two’s complement notation and 16 – 1 corresponds to 0x0000000F:

0xFFFFFFFE and 0x0000000F = 0x0000000E

which is 14, exactly what a mathematical ‘mod’ operator would yield.

The ‘and’ operation is easy to compute for any processor. In C/C++ we might define an optimized ‘mod’ function like this:
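The function might be defined like this (the name mod_base2 is the one referred to later in the series):

```c
/* Mathematical 'mod' for base-2 divisors. Works for negative i too,
   provided negative numbers are stored in two's complement
   (virtually all machines). 'n' must be a power of two. */
unsigned mod_base2(int i, unsigned n)
{
    return (unsigned)i & (n - 1);
}
```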

It is a good idea to use this optimization whenever possible. Sometimes, it even makes sense to round up the size of circular structures to a base-2 boundary, just to be able to use this kind of optimization.

A variant of this theme is casting a (potentially negative) value to an unsigned type in C/C++. Casting x to a uint8_t is equivalent to calculating x mod 256. While most optimizing compilers will generate the same code for x & 0xFF and (uint8_t)x, there is a certain likelihood that the latter might be a bit faster. The obvious disadvantage of casting to unsigned is that this approach practically limits N to the value range of uint8_t, uint16_t, and uint32_t, which is 256, 65536, and 4294967296, respectively. Because the performance gain is only hypothetical, it is usually much wiser to go for the ‘mod_base2’ optimization, though.

This concludes my first installment, which is mainly about some of the many facets of the ‘mod’ operator. Just like the constant PI appears in all circular problems in mathematics, some variant of the ‘mod’ operator appears in all circular problems in computer science. Next time, I will explore what it means to have two indices into a circular structure, which turns out to be the foundation of many interesting circular use cases.

More circular adventures…

Using PC-Lint in a Linux Environment

PC-Lint is my favorite non-FLOSS tool. Not only does it find bugs and portability issues early in the development cycle: by using it regularly and listening to its words developers can significantly improve their C/C++ programming skills.

This post details how to run PC-Lint (which is normally intended for DOS/Windows environments) in Linux, saving developers from having to buy FlexeLint, the much more expensive Unix/Linux version.

Habits for Safer Coding

If you have been coding in C and C++ for some time, you know that it is easy to introduce bugs: type in a single wrong character and you wreak havoc. It is therefore a good idea to avoid using certain (dangerous) language features, or at least to use them in a way that is safer.

My favorite example is always putting braces around blocks, even if the block only consists of a single statement:
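For example (function names invented):

```c
int errors_logged;

void log_error(int error) { ++errors_logged; }

void handle(int error)
{
    if (error != 0) {   /* braces, even for a single statement */
        log_error(error);
    }
}
```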

I’ve witnessed cases where really experienced programmers omitted these two “redundant” characters for brevity only to be later faced with this classic bug:
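The classic shape of that bug, sketched with invented names: a second statement sneaks in under the `if`, but without braces it is not governed by it.

```c
int errors_logged, resets;

void log_error(int error) { ++errors_logged; }
void reset_device(void)   { ++resets; }

void handle(int error)
{
    if (error != 0)
        log_error(error);
        reset_device();   /* indentation lies: executed unconditionally! */
}
```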

It is so easy to make this mistake — especially during merging — that MISRA (Guidelines for the Use of the C++ Language in Critical Systems) made curly braces around if/else/do/while blocks a “required rule”. Note that by following this rule, you are also protected from “dangling else” mistakes:
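The “dangling else” trap looks like this (a sketch): the `else` binds to the nearest `if` — the inner one — no matter what the indentation suggests.

```c
int classify(int a, int b)
{
    int result = 0;
    if (a)
        if (b)
            result = 1;
    else              /* belongs to 'if (b)', not to 'if (a)'! */
        result = 2;
    return result;
}
```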

In general, I follow such “best practices” if they meet these criteria:

– They must solve a real problem
– They must not require any thinking

The “always-put-braces-around-blocks” rule does solve a real problem: I’m not aware of any compiler (not even Sun’s Java compiler) that warns you about the mistake shown above. (Of course, static code analysis tools (like PC-Lint) do check for these errors, and many, many more, but unfortunately only few developers use them.)

Does it require any thinking? I don’t think so. The only thing I have to remember is this: whenever I use ‘if’, ‘else’, ‘for’, ‘while’, ‘do’ I put curly braces around my code. That’s all!

Maybe you have come across this convention:
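The convention in question puts the constant on the left-hand side of an equality comparison (a so-called “Yoda condition”; MAX_RETRIES is a stand-in name):

```c
#define MAX_RETRIES 3

int give_up(int retries)
{
    return MAX_RETRIES == retries;   /* constant first */
}
```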

Advocates of this style reason as follows: in C/C++ you can easily — accidentally — use an assignment operator when you actually wanted to use an equality operator. Since you cannot assign to a constant, the compiler will refuse to compile your buggy code:
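A sketch of the reasoning: with the constant on the left, the typo is a compile error; with it on the right, the typo compiles and silently assigns.

```c
#define MAX_RETRIES 3

int check(int retries)
{
    /* if (MAX_RETRIES = retries)   -- rejected by the compiler:
                                       "lvalue required"          */
    if (retries = MAX_RETRIES)      /* meant '=='; compiles fine  */
        return 1;
    return 0;
}
```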

I must admit that I’ve never liked this convention. One reason is that it looks unnatural. Another is that it lacks symmetry; that is, it is not consistent with other comparison operators. What if you need to change the comparison some day to also include values greater than MAX_RETRIES?

Or what if MAX_RETRIES is changed from a compile-time constant to a “normal” variable whose value is obtained from a config file? Do you swap then? In fact, this convention doesn’t provide any help for mistakes in variable/variable comparisons, which also occur very frequently in day-to-day programming:
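For instance (variable names invented) — putting constants first cannot save you here, because both operands are variables and the accidental assignment compiles without complaint:

```c
int same_size(int actual, int expected)
{
    if (actual = expected)   /* meant '==' */
        return 1;
    return 0;
}
```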

For my taste, the utility of this convention is rather limited and it requires too much thinking: “If I compare a compile-time constant (or a literal) against a variable using the equality operator, I put the constant first”. Isn’t this a clear hindsight rule? It’s a bit like attaching a tag saying “always lock the door” to your key.

Why not, instead, turn this into a habit: “If I do an equality comparison, I double check that there are two equal signs”. That’s not any more difficult to adopt, doesn’t impair readability and has the additional benefit of working in all cases — not just when comparing against constants.

I’m prepared to follow even conventions that look funny if they solve a real problem, but I don’t think that this convention does: today, most compilers produce warnings when you make this kind of mistake (for GCC it’s -Wparentheses, which is part of -Wall).

In C, there are numerous ways to make silly mistakes (in C++ there are many, many more). Instead of using questionable practices that address one tiny area, it is much better to enable all compiler warnings or — even better — use static code checkers like PC-Lint. In my opinion, that’s the most worthwhile habit for safer coding of all.

When bytes bite

Not long ago I had a discussion with a friend who had just bought a new 2 TB hard disk. He complained that those hard disk manufacturers always cheat: they sell you 2 TB but in the end the drives only have 1.8 TB of storage available. For them, a kilobyte comprises only 1000 bytes and not 1024, as for the rest of mankind.

We use the term “byte” every day, but it is surprising how many developers don’t know exactly what a byte is. “Byte” — and especially byte quantifiers like kilo, mega, and giga — seem to be surrounded by many misuses and misconceptions.

Traditionally, a byte is a collection of bits used to encode a single character for a system. It could be 4, 7, 8, 9, or any other number that the designer of a system happens to choose. This is the main reason for the CHAR_BIT symbolic constant in ISO C’s limits.h: it specifies precisely how many bits there are in a character (char).

Today, of course, we can safely assume that a byte comprises exactly 8 bits, but it is important to note that this is no universal, standardized definition. That’s why the term “octet” is used in RFCs and ASN.1: an octet is defined to be always 8 bits.

But what the heck is a kilobyte? Is it 1024 bytes? Or 1000? We use byte quantifiers so frequently, but not always correctly.

Some folks use the well-known SI prefixes to mean powers of 1024:

1 kB = 1024 bytes
1 MB = 1024^2 bytes
1 GB = 1024^3 bytes
1 TB = 1024^4 bytes

While hard disk manufacturers usually have a different definition, powers of 1000:

1 kB = 1000 bytes
1 MB = 1000^2 bytes
1 GB = 1000^3 bytes
1 TB = 1000^4 bytes

Which makes quite a difference, especially if sizes grow larger: for a 1 TB hard disk you might get about 10% less than you thought you’d paid for…

But don’t blame it on the hard disk manufacturers. They’re right, of course. Using SI prefixes to mean ‘powers of 1024’ is strongly discouraged by the SI institute.

Still, it is sometimes useful to use powers of 1024 but how should this be done without annoying the SI institute, or — more importantly — without confusing people?

Fortunately, there is an ISO standard (with roots back to work done by IEC in 1998) that addresses this problem: ISO/IEC 80000-13:2008.

According to this standard, you use binary prefixes like “kibi” and its friends, where kibi is short for “kilo binary”; and instead of using SI’s “k” prefix you use “Ki”. Have a look at this table:

1 KiB = 1024 bytes (kibibyte)
1 MiB = 1024^2 bytes (mebibyte)
1 GiB = 1024^3 bytes (gibibyte)
1 TiB = 1024^4 bytes (tebibyte)

Binary prefixes can (and should) be applied to any other unit when the value is based on 1024: 1 Mib/s is 1024^2 bits per second, while 1 Mb/s is 1000^2 bits per second.

The international system of units is a wonderful achievement of the 20th century. We, especially as developers, should honor it and finally stop misusing its well-defined prefixes and instead use ISO binary prefixes.

Dangerously Confusing Interfaces

Designing intuitive interfaces that are easy to use and easy to learn is hard, often very hard; and for economic reasons it might not always be possible to strive for perfection. Nevertheless, in my view, at the very least, interfaces should be designed such that obvious, day-to-day usage doesn’t lead to damage.

In his classic book “Writing Solid Code”, Steve Maguire calls confusing interfaces that lead to unexpected bugs “Candy Machine Interfaces”. He tells a story of a vending machine at Microsoft that used to cause him grief: the machine displayed “45 cents” for “number 21”, but after he had finally inserted the last coin he would sometimes enter “45” instead of “21” (and would get a jalapeño-flavored bubble gum instead of the peanut butter cookie that he wanted so much — Ha Ha Ha!). He suggests an easy fix: replace the numeric keypad with a letter keypad and no confusion between money and items would be possible anymore.

The other day I did something like this:
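The command presumably looked something along these lines (the paths are made up):

```shell
# Intended: copy the whole 'gamma' directory into the home directory.
# The trailing slash (helpfully added by Bash's TAB completion!)
# makes rsync copy only the *contents* of 'gamma' instead:
rsync -a /media/usb/gamma/ ~
```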

My goal was to recursively copy the ‘gamma’ folder to my home folder. What I expected was a ‘gamma’ folder within my home directory, but instead I ended up with hundreds of files from the ‘gamma’ directory right at the top-level of my home directory — the ‘gamma’ directory simply wasn’t created!

I have to confess that similar things sometimes happen to me with other recursive-copy-like tools, too — this seems to be my candy machine problem. Now you know it.

As for ‘rsync’, there is a feature that allows you to copy just the contents of a directory, without creating the directory, flat into a target directory. Granted, this is sometimes useful, but do you know how to activate this mode? By appending a trailing slash to the source directory! That’s what happened in my case. But I didn’t even add the slash myself: if you use Bash’s TAB completion (like I did) a trailing slash is automatically appended for directories…

But good old ‘cp’ puzzles me even more. If you use it like this
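(with ‘from3’, ‘to1’, and ‘to2’ as the directory names used below):

```shell
cp -r from3 to1/to2
```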

it will copy ‘from3’ to a folder named ‘to2’ under ‘to1’ such that both directories (‘from3’ and ‘to2’) will have the same contents, which is more or less a copy-and-rename-at-the-same-time operation. Unless ‘to2’ already exists, in which case ‘from3’ will be copied in ‘to2’ resulting in ‘to1/to2/from3’. Unless, as an exception within an exception, there is already a ‘from3’ directory under ‘to2’; in this case ‘cp’ will copy ‘from3’ flat into the existing ‘to2/from3’ which might overwrite existing files in that folder.

Both ‘cp’ and ‘rsync’ suffer from fancy interfaces that try to add smart features — which is normally good — but they do it in an implicit, hard-to-guess, hard-to-remember way — which is always bad. Flat copies are sometimes useful but they might be dangerous as they could inadvertently overwrite existing files or at least deluge a target directory. A potential cure could be an explicit ‘–flat’ command-line option.

To me, a wonderfully simple approach is the one taken by Subversion: checkouts are always flat and I’ve never had any problems with it:
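A typical checkout (the repository URL and target directory are invented):

```shell
svn checkout http://svn.example.com/repos/myproject/trunk myproject
```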

This copies (actually checks out) the contents of the ‘trunk’ flat into the specified destination directory — always, without any exceptions. That’s the only thing you have to learn and remember. There are no trailing slashes or any other implicit rules. It will also create the target parent directories up to any level, if needed.

Naturally, dangerously confusing interfaces exist in programming interfaces, too. Sometimes the behavior of a method depends on some global state, sometimes it is easy to confuse parameters. The ‘memset’ function from the C standard library is a classic example:
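A call like this (buffer and values chosen for illustration):

```c
#include <string.h>

char buffer[64];

void fill(void)
{
    memset(buffer, 32, 40);   /* which argument is the fill value? */
}
```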

Does this put 40 times the value of 32 in ‘buffer’ or is it the other way around?

I have no idea how many programmers don’t know the answer to this question or how many bugs can be attributed to this bad interface, but I suspect that in both cases the answer must be “way too many”. I don’t want to guess or look up the specification in a manual — I want the compiler to tell me if I’m wrong. Here is an alternative implementation:
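One way to realize such an interface is to wrap the fill character in its own type, so that the two parameters become non-interchangeable (all names here are assumptions, a sketch rather than the original listing):

```c
#include <stddef.h>
#include <string.h>

typedef struct { unsigned char value; } FillChar;

FillChar fill_char(unsigned char c)
{
    FillChar f = { c };
    return f;
}

/* Passing a plain integer where the FillChar belongs is now a
   compile error, so the count and the fill value cannot be swapped. */
void* memset_safe(void* buffer, FillChar fill, size_t count)
{
    return memset(buffer, fill.value, count);
}
```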

Now you write the call with the fill character wrapped in its own distinct type.

If you confuse the fill character with the length parameter the compiler will bark at you — a parameter mix-up is impossible. Even though this is more to type than the original (dangerous) interface, it is usually worthwhile when there are two parameters of the same (or convertible) type next to each other.

Like I said in the beginning: designing intuitive interfaces is hard but spending extra effort to avoid errors for the most typical cases is usually a worthwhile investment: don’t make people think, make it difficult for them to do wrong things — even if it sometimes means a little bit more typing.