Category Archives: Code

Bug Hunting Adventures #10: For Whom The Bell Tolls

“Then later that night when the ship’s bell rang
Could it be the north wind they’d been feelin’?”

“The Wreck Of The Edmund Fitzgerald”
— Gordon Lightfoot

At my home, I’m using a Raspberry Pi as a watchdog (aptly named “Brutus”) for all kinds of tasks: burglar detection, network intrusion detection, and server monitoring, just to name a few. Still, most of the time, my watchdog hangs around, idling away. That’s not the way I like it, so I’m constantly on the lookout for new jobs that I can assign to Brutus, small or big.

My current plan is to create a little ship’s bell app that emits pleasing bell sounds every 30 minutes, as has been done traditionally on ships since the 16th century: a double-strike for every full hour of the current four-hour watch, plus a single-strike for a passed half hour. But unlike civil clocks, ship’s bells don’t have dedicated indications for every one of the 12 (or rather 24) hours in a day; instead, bell patterns repeat every four hours:

Bell pattern    Time (a.m. and p.m.)
1               12:30   4:30   8:30
2                1:00   5:00   9:00
2 1              1:30   5:30   9:30
2 2              2:00   6:00  10:00
2 2 1            2:30   6:30  10:30
2 2 2            3:00   7:00  11:00
2 2 2 1          3:30   7:30  11:30
2 2 2 2          4:00   8:00  12:00

In this table, a “2” denotes a double-strike whereas a “1” signifies a single-strike of the bell.
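
To make the pattern concrete, here is one way the strike count could be derived from the clock time; this is just a sketch of the rule, not the app’s actual code:

    def strikes_at(hour, minute):
        """Total number of bell strikes at a half-hour boundary.

        The pattern repeats every 4 hours, i.e. every 8 half hours.
        """
        half_hours = (hour * 60 + minute) // 30
        n = half_hours % 8
        return 8 if n == 0 else n

    # 2:30 -> 5 strikes: divmod(5, 2) == (2, 1), i.e. pattern "2 2 1"
    doubles, singles = divmod(strikes_at(2, 30), 2)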

The code (linked at the end of this post) is a first draft of my ship’s bell app. It runs as a thread, sleeping most of the time (so you can still call Brutus a lazy dog). When it wakes up, it checks the current local time and determines how many strikes are due (‘compute_strikes’). Afterwards, the thread puts itself to rest again. However, I didn’t want to wake it up every second to check the wall time — that would be too inefficient. Instead, I base the sleep time on the temporal distance between now and the next half hour (‘compute_sleep_time’) and sleep for half of this time before checking again.
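
The sleep logic might be sketched like this (the helper name is taken from above; the one-second lower bound is my addition to avoid a busy loop, not necessarily part of the original):

    import datetime

    def compute_sleep_time():
        """Seconds to sleep: half the distance to the next half-hour boundary."""
        now = datetime.datetime.now()
        elapsed = (now.minute % 30) * 60 + now.second
        to_boundary = 30 * 60 - elapsed
        return max(1.0, to_boundary / 2)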

Alas, my initial implementation comes with a bug and the bell doesn’t work as it is supposed to. Can you spot it? (The bug is in the algorithm — it has nothing to do with any Python language quirks, of course.)

Ship’s Bell app code at GitHub.

Code Kata 4: Struct Member Alignment

“The whole is greater than the sum of its parts”
— Aristotle

How time flies! More than two years have passed since I posted my last kata, so it’s high time for a new one. Today’s kata is based on a real problem that I had to solve last week when I worked on a code generator that produced C/C++ code. I hope you enjoy this task as much as I did!

When you define a struct in C, the overall size of the struct is not necessarily the sum of its members. Much like in real life, the whole is greater than the sum of the parts. Consider this example:
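
(A sketch with assumed member types; any 1-byte member followed by a 4-byte member will do.)

    #include <stdint.h>

    typedef struct {
        uint8_t  a;   /* 1 byte  */
        uint32_t b;   /* 4 bytes */
    } Foo;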

Only novice programmers would assume that sizeof(Foo) equals 5; experienced programmers know that 8 is a much safer bet. How come?

Most computer architectures can access data only if it is properly aligned in memory; e.g., a 32-bit (4-byte) integer can only be accessed if it is stored at an address that is evenly divisible by 4. The compiler usually achieves this by inserting invisible padding bytes between the struct members. Thus, internally, struct Foo is likely to look like this:
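
(Conceptually; the padding bytes are spelled out explicitly here.)

    typedef struct {
        uint8_t  a;       /* offset 0                  */
        uint8_t  pad[3];  /* compiler-inserted padding */
        uint32_t b;       /* offset 4, 4-byte aligned  */
    } Foo;                /* sizeof(Foo) == 8          */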

As a first step, familiarize yourself with struct padding. Check out this must-read by the legendary Eric S. Raymond, especially if you are a systems programmer.

Now that you have read it, you should understand why sometimes there is also trailing padding (but never leading padding) and hence why the size of the following struct is most likely 12:
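
(Again, the concrete member types are assumed for illustration.)

    typedef struct {
        uint8_t  a;   /* offset 0, followed by 3 padding bytes     */
        uint32_t b;   /* offset 4                                  */
        uint16_t c;   /* offset 8, followed by 2 trailing bytes    */
    } Bar;            /* sizeof(Bar) == 12                         */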

Equipped with this knowledge we are ready to tackle our first programming task: assume that every primitive struct member of base-2 size is to be aligned on its base-2 boundary (a 2-byte integer on an address that is evenly divisible by 2, a 4-byte integer on an address that is evenly divisible by 4 and so on). Implement an algorithm that computes the overall size of a struct given an ordered list of its members. Instead of real types, provide a list of integer values where the values represent the sizes of the members. Here are examples for Foo and Bar (in Python):
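
(The function name struct_size is a placeholder for whatever you come up with.)

    # Member sizes in declaration order:
    assert struct_size([1, 4]) == 8        # Foo
    assert struct_size([1, 4, 2]) == 12    # Bar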

One weakness of this approach is that you cannot differentiate between a uint32_t (size is 4, alignment is 4) and an array of four uint8_ts (size is 4, alignment is 1):
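
(Using the hypothetical struct_size from above.)

    struct_size([1, 4])   # 8 if the 4-byte member is a uint32_t,
                          # but 5 if it is a uint8_t[4] (alignment 1)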

Extend your algorithm to accept a list of pairs, where the first pair member specifies the size and the second pair member specifies the alignment:
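
(Again with a placeholder function name.)

    # (size, alignment) pairs in declaration order:
    assert struct_size_pairs([(1, 1), (4, 4)]) == 8   # uint8_t + uint32_t
    assert struct_size_pairs([(1, 1), (4, 1)]) == 5   # uint8_t + uint8_t[4]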

But there is one more feature we need to add before we’re done: support for arbitrarily nested structs:
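
One conceivable representation, just to give you the idea (the kata asks you to devise your own): a nested list stands for an inner struct.

    inner = [(4, 4), (1, 1)]         # e.g. { uint32_t; uint8_t; }
    outer = [(2, 2), inner, (1, 1)]  # inner struct as second member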

How does a member that itself is a struct impact the alignment of the outer struct? Devise a suitable data structure for passing in nested struct specifications.

[Update 2016-03-12: I’ve uploaded a sample solution to GitHub]

So long Scott Meyers, So long C++?

Last month, Scott Meyers wrote a blog post announcing that he will withdraw from active involvement in C++.

Scott is famous for at least two things: first, his excellent way of explaining dry technical stuff in an entertaining way (I read all his books on C++, except for “Effective Modern C++”, which is still on my to-be-read pile) and second, his He-Man-like hairdo.

After 25 years of dedicating his life exclusively to C++ he has become tired — tired of a language that gets more and more complex at a seemingly ever-increasing rate, all in the name of backwards compatibility and efficiency. (He didn’t say that, but this is my view.)

Ah, speaking of efficiency, the “e” word.

Just because C++ gives you control over efficiency doesn’t mean that you will get it in the end. Many C++ developers have a wrong (or at least insufficient) notion of efficiency, simply because they don’t know how compilers, CPUs, and memory work.

One example is not understanding the effects of caching. Many C++ developers blindly trust std::map’s or std::unordered_map’s O(log n) and O(1) promises, but there are situations where an O(n) std::vector (or plain C-style array) can be orders of magnitude faster because it accesses memory in a cache-friendly way. There is a nice talk by Scott on YouTube in which he gives a good overview of caching and its consequences.
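
A toy illustration of the two access patterns (not a benchmark; actual numbers depend heavily on container size and hardware):

    #include <map>
    #include <numeric>
    #include <vector>

    // Walking a vector touches memory sequentially, which caches and
    // prefetches well; walking a map chases pointers across the heap.
    long sum_vector(const std::vector<int>& v)
    {
        return std::accumulate(v.begin(), v.end(), 0L);
    }

    long sum_map(const std::map<int, int>& m)
    {
        long sum = 0;
        for (const auto& kv : m)
            sum += kv.second;   // node-based container: poor locality
        return sum;
    }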

Another common efficiency fallacy is illustrated by this little for loop:
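
(Reconstructed sketch; the variable names are assumptions.)

    uint8_t i;
    for (i = 0; i < arrayLength; ++i) {
        sum += array[i];
    }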

Many developers I’ve met believe that using a ‘uint8_t’ for the loop counter is more efficient than using a plain ‘int’. But what most likely will happen is that by using ‘uint8_t’ the code becomes both bigger and slower, especially on modern RISC-style processor architectures like ARM and PowerPC. Why? If the value of ‘arrayLength’ is not known at compile-time, the compiler has to create additional code that ensures that ‘i’ wraps around for values greater than or equal to 256. Internally, the compiler assigns a 32-bit register to ‘i’ (provided you are targeting a 32-bit platform), and adding 1 to 255 in a 32-bit register is different from adding 1 to 255 in an 8-bit register. Behind the scenes, your compiler rewrites your loop to look like this:
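
(A sketch of the equivalent source code; real compilers do this at the instruction level.)

    uint32_t i;   /* 'i' lives in a 32-bit register */
    for (i = 0; i < arrayLength; i = (i + 1) & 0xFF) {   /* forced wrap-around */
        sum += array[i];
    }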

Granted, in most situations this additional code will not amount to much, but it might in a low-level driver or a communications stack, which is exactly the kind of situation systems languages like C++ were made for. But this example shows a problem that many would-be efficiency experts share: for the sake of (false) efficiency, they increase complexity and risk correctness and security. What happens if some day ‘arrayLength’ can be larger than 255? The for loop will loop forever, of course.

So while C++ is a language that has the potential to yield extremely efficient systems, you get efficiency neither automatically nor for free. C++ has a steep learning curve and there are many pitfalls. I truly believe that much of C++’s efficiency is wasted on too many developers. If you don’t need utmost efficiency or don’t know how to put the corresponding language features to best use, better keep away from C++ and use a managed language. You will be much more productive and create programs that are also (probably) more secure by default.

Getting back to Scott Meyers, I must admit that I’m somewhat happy about his decision. Not because he left C++ per se but because he now has time to focus on other important topics — topics that he will explain with the same quality he is renowned for. Like some programmers say: when one curly brace closes, another one opens.

Continuous “Commit”ment

“Strive for continuous improvement, instead of perfection.”
— Kim Collins

Checking in early and often is a well-accepted practice these days. Instead of keeping changes local for an extended period of time, software is continuously integrated and thus the overall integration risk is significantly reduced because every time you check into the central repository, your changes become immediately visible and available to others. Problems show up early, which is for sure a good thing.

On the other hand, your code must have a certain level of quality before you can foist it upon others. At the very least, it must compile without errors. Most likely, it is also required to be in line with other project or coding standards; for instance, code must be free of compiler (or MISRA) warnings. Some projects even demand that all checked-in code has been tested to 100% code coverage.

Having such “pre-commit quality gates” is a blessing but it stands in the way of the “commit early, commit often” paradigm: Depending on the size of the task it may take hours (if not days) to meet all check-in criteria. Deferring commits for such a long time would certainly be foolish because check-ins are important for another reason: With every commit you drive in a piton that not only saves you from data loss but also allows you to go back and forth in time.

Fortunately, with distributed version control systems like Git, you can check in locally to your heart’s content without affecting others. When your code is nice and shiny (and in line with your project standards), you integrate it by “pushing” to the central repository. I love to work like this!

Often, I check in every couple of minutes, sometimes even though my code doesn’t compile yet, and maybe after just a few minutes of tedious editing, only to ensure that my changes are not lost. Just like Hansel and Gretel, I like to leave bread crumbs behind; but unlike theirs, mine won’t be eaten by the birds.

The hard part is deciding on a commit message, though. I don’t want to break my flow by thinking about something suitable. Many times, the changes are not even coherent, so the commit messages cannot be meaningful. What’s my solution? I simply run something like
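
    git add -A && git commit -m "checkpoint"   # any throwaway message will do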

over and over again.

As you can imagine, there will be dozens of “checkpoints” before I’m finished with my high-level task. Consequently, the commit history (what git log shows) is an utter mess. Even though Git supports various commands to alter the commit history (git rebase -i, for example), doing it manually is both tedious and error-prone. I therefore decided to automate this process through a little tool named git-autocommit.

git-autocommit is a short Bash script that you invoke in your working directory. It runs indefinitely (at least until you hit Ctrl-C) and periodically executes git add -A && git commit -m "<git-autocommit>". When you’re done with your changes, you hit Ctrl-C and run git-autocommit again. Then, the script checks if there is a series of <git-autocommit> commit messages already at the top of your commit history and if so, performs a soft reset to the predecessor of the first autocommit; otherwise, it just waits for new changes in your working directory and autocommits them as before.
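
Stripped of all error handling, the core idea looks roughly like this (a sketch, not the actual script; the polling interval is made up):

    #!/usr/bin/env bash
    MSG="<git-autocommit>"

    # Squash any autocommits at the top of the history back into the
    # index: soft-reset to the predecessor of the first autocommit.
    while [ "$(git log -1 --pretty=%s)" = "$MSG" ]; do
        git reset --soft HEAD~1
    done

    # Then periodically checkpoint whatever has changed.
    while true; do
        git add -A
        git diff --cached --quiet || git commit --quiet -m "$MSG"
        sleep 60   # polling interval is a guess
    done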

The upshot of this is that all the changes that you’ve done (those changes that have been autocommitted) are now staged and ready to be committed again, but this time en bloc and with a descriptive check-in comment. Once checked-in, all the intermediate autocommits are gone from the log and it looks as if you’ve made a perfect sausage.

Circular Adventures VI: When the Winner is Known Beforehand

“Girls have an unfair advantage over men: if they can’t get what they want by being smart, they can get it by being dumb.”
— Yul Brynner

In parts III and IV I discussed solutions for a circular problem where two indices performed a race within a circular buffer; that is, either index could be ahead of the other.

Sometimes, however, life is simpler and it is always known which index is the leader, and thus the winner — from the outset:
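
Consider this situation (buffer size and index values are made up for illustration; N = 10):

    index:  0   1   2   3   4   5   6   7   8   9
                    b                       a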

In this example, provided we know that b must always be ahead of a, we can deduce that b has wrapped around and the distance between a and b is 4.

Either implementation (the one given in part III and the one given in part IV) of circular_distance will return the same result:
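
(Assuming the signature circular_distance(a, b, N) from the earlier parts.)

    circular_distance(8, 2, 10);   /* yields 4 */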

However, both will fail for this case:
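
(Same assumed signature.)

    /* a = 2, b = 8, N = 10: b leads a by 6 */
    circular_distance(2, 8, 10);   /* yields -4, not +6 */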

Why? Under the premise that b is always ahead of a, the distance is +6, not -4. circular_distance computes the wrong result because it assumes that the leading index is less than half the circular buffer size ahead of the other index. This assumption was made (I called it an ‘invariant’ at the time) to be able to compute the circular distance even when it is not known which index is ahead of which.

Based on the prerequisite that b is always ahead of a we can give a simplified version of the circular_distance function from part III:
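
(Where mod denotes the mathematical, always non-negative modulo.)

    circular_lead(a, b) = (b - a) mod N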

I call this function circular_lead instead of circular_distance to emphasize that it returns how much b is ahead of a, which is always a positive number. As usual, all the restrictions (and optimizations) regarding the mod operator apply. In C, which lacks a true mod operator, a generic, portable implementation looks like this:
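
(Reconstructed; this is the usual portable-mod idiom.)

    /* The "+ n" compensates for C's % operator, which may yield
       negative results when its operands are negative. */
    int circular_lead(int a, int b, int n)
    {
        return ((b - a) % n + n) % n;
    }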

In situations where one index is known to be ahead of the other, circular_lead has an edge over circular_distance: it supports distances in the range 0 to N-1, whereas circular_distance only supports distances from 0 to (N-1)/2. Such situations arise in all “monotonically increasing” scenarios, like run-time measurement, at least until the flux capacitor is invented. Hence, the last example of part IV can be rewritten like this:
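
A sketch with assumed helper names (timer_now as a free-running 32-bit tick counter):

    uint32_t circular_lead_32bit(uint32_t a, uint32_t b)
    {
        /* true mod 2^32, computed in 64 bits */
        return (uint32_t)(((uint64_t)b - (uint64_t)a) % 0x100000000ULL);
    }

    uint32_t start = timer_now();
    do_work();
    uint32_t elapsed_ticks = circular_lead_32bit(start, timer_now());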

If we replace the mod operator with a cast to uint32_t, circular_lead_32bit boils down to this:
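
    uint32_t circular_lead_32bit(uint32_t a, uint32_t b)
    {
        return (uint32_t)(b - a);   /* unsigned arithmetic wraps mod 2^32 */
    }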

[For the mathematically inclined: What we are really talking about here is residue class rings, a concept used by all contemporary computers. I might explore the number theory behind this and other circular topics in a future post.]

More circular adventures…

Bug Hunting Adventures #9: A Random Piece of PI

According to an old saying, there’s more than one way to skin a cat. There are at least as many ways to compute the value of π. One of them uses the Monte Carlo method to approximate π’s value, and it is the subject of today’s Bug Hunting episode.

We start with a so-called unit circle, a circle with radius 1 whose center is positioned at the origin in the Cartesian coordinate system. Next, we put a square around the unit circle whose sides have length 2 (the diameter of the unit circle).

There are two areas (literally!) of interest in this picture: the circle area Ac and the square area As:

Ac = πr² = π
As = (2r)² = 4

The ratio Ac/As is π/4

Why is this ratio important? Because we can use it to calculate the value of π:

π = 4 Ac/As

Now let’s do some random sampling. We take N random points whose x and y values are both in range [-1; +1] and tally the number of points that fall within the square (Ns) and the number of points that fall within the circle (Nc). Given enough points, the ratio Nc/Ns is a very good approximation for Ac/As and we hence can compute:

π ≈ 4 Nc/Ns
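
A straightforward rendition of this scheme might look as follows; this is only a reference sketch, not the buggy code this episode is about:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const long N = 10000000L;   /* number of random points */
        long n_circle = 0;

        srand(42);
        for (long i = 0; i < N; ++i) {
            double x = 2.0 * rand() / RAND_MAX - 1.0;   /* [-1; +1] */
            double y = 2.0 * rand() / RAND_MAX - 1.0;   /* [-1; +1] */
            if (x * x + y * y <= 1.0) {
                ++n_circle;
            }
        }
        /* every sample lies within the square, so Ns == N */
        printf("pi is approximately %f\n", 4.0 * n_circle / N);
        return 0;
    }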

The C code accompanying this episode attempts to calculate π in this manner, but sports a blunder. What is the bug? Bonus question for the mathematically inclined: without executing the code, what value does it really compute (instead of π)?
