Continuous “Commit”ment

“Strive for continuous improvement, instead of perfection.”
— Kim Collins
Checking in early and often is a well-accepted practice these days. Instead of keeping changes local for an extended period of time, software is continuously integrated and thus the overall integration risk is significantly reduced because every time you check into the central repository, your changes become immediately visible and available to others. Problems show up early, which is for sure a good thing.

On the other hand, your code must have a certain level of quality before you can foist it upon others. At the very least, it must compile without errors. Most likely, it is also required to be in line with other project or coding standards, for instance, code must be free of compiler (or MISRA) warnings. Some projects even demand that every code that is checked-in has been 100% code-coverage tested.

Having such “pre-commit quality gates” is a blessing but it stands in the way of the “commit early, commit often” paradigm: Depending on the size of the task it may take hours (if not days) to meet all check-in criteria. Deferring commits for such a long time would certainly be foolish because check-ins are important for another reason: With every commit you drive in a piton that not only saves you from data loss but also allows you to go back and forth in time.

Fortunately, with distributed version control systems like Git, you can check in locally at your heart’s content without affecting others. When your code is nice and shiny (and is in line with your project standards), you integrate it by “pushing” to the central repository. I love to work like this!

Often, I check in every couple of minutes, sometimes even though my code doesn’t compile yet. Maybe after some minutes of tedious editing, only to ensure that my changes are not lost. Just like Hansel and Gretel, I like to leave bread crumbs behind, but unlike theirs, mine won’t be eaten by the birds.

The hard part is deciding on a commit message, though. I don’t want to break my flow by thinking about something suitable. Many times, the changes are not even coherent, so the commit messages cannot be meaningful. What’s my solution? I simply run


git add -A && git commit -m "Checkpoint"

git add -A && git commit -m "Checkpoint"

over and over again.

As you can imagine, there will be dozens of “checkpoints” before I’m finished with my high-level task. Consequently, the commit history (what git log shows) is an utter mess. Even though Git supports various commands to alter the commit history (git rebase -i, for example), doing it manually is both, tedious and error-prone. I therefore decided to automate this process through a little tool named git-autocommit.

git-autocommit is a short Bash script that you invoke in your working directory. It runs indefinitely (at least until you hit Ctrl-C) and periodically executes git add -A && git commit -m "<git-autocommit>". When you’re done with your changes, you hit Ctrl-C and run git-autocommit again. Then, the script checks if there is a series of <git-autocommit> commit messages already at the top of your commit history and if so, performs a soft reset to the predecessor of the first autocommit; otherwise, it just waits for new changes in your working directory and autocommits them as before.

The upshot of this is that all the changes that you’ve done (those changes that have been autocommitted) are now staged and ready to be committed again, but this time en bloc and with a descriptive check-in comment. Once checked-in, all the intermediate autocommits are gone from the log and it looks as if you’ve made a perfect sausage.

Circular Adventures VI: When the Winner is Known Beforehand

“Girls have an unfair advantage over men: if they can’t get what they want by being smart, they can get it by being dumb.”
— Yul Brynner

In part III and IV I discussed solutions for a circular problem where two indices performed a race within a circular buffer; that is, either index could be ahead of the other.

Sometimes, however, life is simpler and it is always known which index is the leader, and thus the winner — from the outset:


0 1 2 3 4 5 6 7 8 9
    ^           ^ 
    b           a

0 1 2 3 4 5 6 7 8 9

^ ^

b a

In this example, provided we know that b must always be ahead of a, we can deduce that b has wrapped around and the distance between a and b is 4.

Either implementation (the one given in part III and the one given in part IV) of circular_distance will return the same result:


circular_distance(8, 2, 10) == 4

circular_distance(8, 2, 10) == 4

However, both will fail for this case:


0 1 2 3 4 5 6 7 8 9
        ^       ^ 
        b       a

circular_distance(8, 4, 10) == -4

0 1 2 3 4 5 6 7 8 9

^ ^

b a

circular_distance(8, 4, 10) == -4

Why? Under the premise that b is always ahead of a, the distance is +6 not -4. circular_distance computes the wrong result because it assumes that the leading index is less than half the circular buffer size ahead of the other index. This assumption was made (I called it ‘invariant’ at the time) to be able to compute the circular distance even if it is not known which index is ahead of which.

Based on the prerequisite that b is always ahead of a we can give a simplified version of the circular_distance function from part III:


circular_lead(a, b, N):
    return (b - a) mod N

circular_lead(a, b, N):

return (b - a) mod N

I call this function circular_lead instead of circular_distance to emphasize that it returns how much b is ahead of a, which is always a positive number. As usual, all the restrictions (and optimizations) regarding the mod operator apply. In C, which lacks a true mod operator, a generic, portable implementation looks like this:


size_t circular_lead(size_t a, size_t b, size_t N) {
    if (b >= a) return b - a;
    return N - (a - b);
}

size_t circular_lead(size_t a, size_t b, size_t N) {

if (b >= a) return b - a;

return N - (a - b);

}

In situations where one index is known to be ahead of the other, circular_lead has an edge over circular_distance because it supports distances in the range 0 to N-1, whereas circular_distance only supports ranges from 0 to (N-1)/2. This is always the case in “monotonically increasing” scenarios, like run-time measurement, at least until the flux capacitor is invented. Hence, the last example of part IV can be rewritten like this:


volatile uint32_t g_systemTime; // Wraps around.
...
uint32_t start = g_systemTime;

// Do some lengthy operation. After that, we know
// that 'g_systemTime' will be ahead of 'start'.

unt32_t delta = circular_lead_32bit(start, g_systemTime);

volatile uint32_t g_systemTime; // Wraps around.

...

uint32_t start = g_systemTime;

// Do some lengthy operation. After that, we know

// that 'g_systemTime' will be ahead of 'start'.

unt32_t delta = circular_lead_32bit(start, g_systemTime);

If we replace the mod operator with a cast to uint32_t, circular_lead_32bit boils down to this:


inline uint32_t circular_lead_32bit(uint32_t a, uint32_t b) {
    return (uint32_t) (b - a);
}

inline uint32_t circular_lead_32bit(uint32_t a, uint32_t b) {

return (uint32_t) (b - a);

}

[For the mathematically inclined: What we are really talking about here is residue class rings, a concept used by all contemporary computers. I might explore the number theory behind this and other circular topics in a future post.]

More circular adventures…

Bug Hunting Adventures #9: A Random Piece of PI

According to an old saying, there’s more than one way to skin a cat. There are at least as many ways to compute the value of π. One of them uses the Monte Carlo method to approximate π’s value and it is the subject of today’s Bug Hunting epsisode.

We start with a so-called unit circle, a circle with radius 1 whose center is positioned at the origin in the Cartesian coordinate system. Next, we put a square around the unit circle whose sides have length 2 (the diameter of the unit circle):

drawing There are two areas (literally!) of interest in this picture: the circle area Ac and the square area As:

Ac = πr² = π
As = (2r)² = 4

The ratio Ac/As is π/4

Why is this ratio important? Because we can use it to calculate the value of π:

π = 4 Ac/As

Now let’s do some random sampling. We take N random points whose x and y values are both in range [-1; +1] and tally the number of points that fall within the square (Ns) and the number of points that fall within the circle (Nc). Given enough points, the ratio Nc/Ns is a very good approximation for Ac/As and we hence can compute:

π ≈ 4 Nc/Ns

The C code below attempts to calculate π in this manner, but sports a blunder. What is the bug? Bonus question for the mathematically inclined: without executing the code, what value does it really compute (instead of π)?

Code
Solution

The Need for an Hippocratic Oath for Software Engineers

Two months ago, I wrote about an incident where a fault in an airbag system was responsible for the death of a child. (Actually, as a court ruled later, it was not the airbag system that was responsible, but rather an engineer who kept quiet about a bug that he had discovered earlier.)

Now, it seems like we have another case of a software-based disaster: Volkswagen recently admitted that they used software in their cars that detects official test situations and then reconfigures the engine to reduce pollution — just to get the figures right. When not in this “cheat mode”, that is, under real driving conditions, Volkswagen’s NOx emission rates are up to 40 times higher.

I think this cunning piece of German engineering has the potential to smash Volkswagen. Not extinguish entirely, of course, but Volkswagen’s stock value might decrease so much that it can be taken over easily by the competition. In any case, it will cost this company, whose bosses probably thought it was too big to fail, big: billions of dollars, thousands of jobs, an immeasurable amount of reputation and credibility.

What makes this story so exceptionally shocking is that we are not talking about a bug (like in the failing airbag case) but a feature. This software was put in deliberately, so it is rather a sin of commission than a sin of omission.

While it would be interesting to know what made people act in such a criminal way, what really matters is this: the whole mess could have probably been avoided if someone along the chain of command would have stood up and said “No!”.

I will use this sad story as an opportunity to introduce the “Software Engineering Code of Ethics and Professional Practice” to you, a joint effort by IEEE and ACM. As for the Volkswagen scandal, it looks as if at least the following principles have been grossly violated:

Software Engineers shall:

1.03. Approve software only if they have a well-founded belief that it is safe, meets specifications, passes appropriate tests, and does not diminish quality of life, diminish privacy or harm the environment. The ultimate effect of the work should be to the public good.

1.04. Disclose to appropriate persons or authorities any actual or potential danger to the user, the public, or the environment, that they reasonably believe to be associated with software or related documents.

1.06. Be fair and avoid deception in all statements, particularly public ones, concerning software or related documents, methods and tools.

2.06. Identify, document, collect evidence and report to the client or the employer promptly if, in their opinion, a project is likely to fail, to prove too expensive, to violate intellectual property law, or otherwise to be problematic.

3.03. Identify, define and address ethical, economic, cultural, legal and environmental issues related to work projects.

5.11. Not ask a software engineer to do anything inconsistent with this Code.

6.13. Report significant violations of this Code to appropriate authorities when it is clear that consultation with people involved in these significant violations is impossible, counter-productive or dangerous.

In a world in which an ever-increasing part is driven by software, “The Code” should be considered the “Hippocratic Oath” of software engineers, something we are obliged to swear before we can call ourselves software professionals, before we are unleashed to an unsuspecting world. I wonder how many more software catastrophes it will take until we finally get there.

Bug Hunting Adventures #8: Just Like a Rubber Ball…

Two incidents gave rise to this Bug Hunting episode: first, I’m currently beefing up my burglar alarm system with a Raspberry Pi and second, I’ve just come across the old song from the ’60s “Rubber Ball” by Bobby Vee while randomly surfing YouTube (“Bouncy, bouncy, bouncy, bouncy”).

The alarm system extension I have in mind is this: I want to be able to configure temporary inhibition of motion detection via a simple push-button: if you press it once, you suppress motion detection by one hour, if you press it n times, you suppress it by n hours, respectively. If you press the button after you haven’t pressed it for five seconds, any suppression is undone. There are status LEDs that give visual feedback such that you know what you’ve configured.

Thanks to the fine RPi.GPIO library, accessing GPIO ports is usually a piece of Pi. However, there is a fundamental problem when reading digital signals from switches and buttons called “switch bounce”, a phenomenon that has driven many brave engineers nuts.

If you flip a switch, there never is a nice transition from zero to one (or one to zero, depending on how you flipped the switch). Instead, the signal bounces between zero and one, in a rather indeterminate fashion for many milliseconds. The bouncing characteristics depend on various factors and can even vary for the same type of switch.

The takeaway is this: with switches (and buttons, of course), expect the unexpected! You must use some sort of debouncing mechanism, implemented in either hardware or software. For some background information check out this excellent article by embedded systems luminary and die-hard veteran Jack Ganssle.

But back to my alarm system. I skimmed the RPi.GPIO documentation and was happy to read that since version 0.5.7, RPi.GPIO has support for debouncing of digital input signals, so there was no need to debounce the push-button myself. Exuberantly, I sat down and wrote this test code:


#!/usr/bin/env python

import RPi.GPIO as GPIO
import time

# My push-button is connected to GPIO port 17.
GPIO_PORT = 17

# Use official Raspi GPIO numbering scheme (port 17 is pin 11).
GPIO.setmode(GPIO.BCM)

# Configure GPIO port for input, use a pull-down resistor.
GPIO.setup(GPIO_PORT, GPIO.IN, pull_up_down=GPIO.PUD_DOWN)

# Handler that handles rising edge events.
def button_pressed():
    print "Button pressed!"
        
# Register rising edge handler, ignore further rising edges for 200 ms.
GPIO.add_event_detect(GPIO_PORT, GPIO.RISING, 
    callback=button_pressed, bouncetime=200)

# Keep looping forever.
while True:
    time.sleep(1.0)

#!/usr/bin/env python

import RPi.GPIO as GPIO

import time

# My push-button is connected to GPIO port 17.

GPIO_PORT = 17

# Use official Raspi GPIO numbering scheme (port 17 is pin 11).

GPIO.setmode(GPIO.BCM)

# Configure GPIO port for input, use a pull-down resistor.

GPIO.setup(GPIO_PORT, GPIO.IN, pull_up_down=GPIO.PUD_DOWN)

# Handler that handles rising edge events.

def button_pressed():

print "Button pressed!"

# Register rising edge handler, ignore further rising edges for 200 ms.

GPIO.add_event_detect(GPIO_PORT, GPIO.RISING,

callback=button_pressed, bouncetime=200)

# Keep looping forever.

while True:

time.sleep(1.0)

To avoid arbitrary, floating input voltages, I setup the port such that it is pulled down (to ground) by default. As a consequence, reading that port when the button is not pressed yields a clean zero.

I didn’t want to poll the button port, as this would burn valuable CPU cycles, so I registered an event handler with the RPi.GPIO library that would be called back once a rising edge had been detected. I added a ‘bouncetime’ parameter of 200 milliseconds, to avoid multiple calls to my ‘button_pressed’ handler, in full accordance with the RPi.GPIO documentation:

To debounce using software, add the ‘bouncetime=’ parameter to a function where you specify a callback function. Bouncetime should be specified in milliseconds. For example:

# add rising edge detection on a channel, ignoring further # edges for 200ms for switch bounce handling GPIO.add_event_detect(channel, GPIO.RISING, callback=my_callback, bouncetime=200)

1
2
3
4
5
6

# add rising edge detection on a channel, ignoring further
# edges for 200ms for switch bounce handling
GPIO.add_event_detect(channel, GPIO.RISING, callback=my_callback,
bouncetime=200)

In other words, after a rising edge has been detected due to a button press, I won’t be bothered with spurious rising edges for 200 ms. (200 ms is plenty of time — no switch I’ve ever seen bounced for more than 100 ms. Yet, 200 ms is short enough to handle cases where the user presses the button in rapid succession.)

I tested my button for a minute and it worked like a charm; then, I varied my press/release scheme and suddenly some spurious “Button pressed!” messages appeared. After some head scratching, I discovered the problem — can you spot it, too? (Hint: it’s a logic error, not something related to misuse of Python or the RPi.GPIO library.)

Epilogue: Once the problem was understood, the bug was fixed easily; getting rid of this darn, “bouncy, bouncy” earworm was much harder, though.

Solution

Two German Maxims That Will Save Your Neck (and Others’ Necks as Well)

I quite remember the uneasy sensation that I had when a former coworker told me a story — a story about a senior engineer who went to jail because of a bug, a fatal one, as it turned out.

The bug lurked in an electronic control unit (ECU) which was, among other things, controlling the manual deactivation of the front passenger seat’s airbag. Under normal circumstances, you wouldn’t want to disable an airbag, a feature that saves lives every day around the world. However, if you intend to put a rearward-facing baby seat in the front, you have to do it, or you risk severe injury of your child in case the airbag deploys during an accident.

Now, this unfortunate engineer discovered that under extremely rare conditions there was a tiny window of opportunity for the airbag deactivation mechanism to fail silently; that is, it would appear to be deactivated when in fact it wasn’t. I don’t remember the necessary prerequisites, but what I do remember is that the combination of inputs and actions sounded so silly, so unusual, so improbable that he — like probably most of us would have — expected that the fault would never ever show up in practice. But what a terrible mistake this was, as this is exactly what happened and a child lost its life.

How unlikely or likely is the higly improbable? The chances of winning a 6-number lottery game are typically 1 against many tens of millions; yet, the likelihood that some player (not a particular player, of course) wins is quite high. Why? Because there are millions of players who take part in such lotteries. The same is true for ECUs which frequently find their way into millions of cars.

The developer was punished not for creating the bug but for not telling his managers about his discovery, for keeping it secret. But why didn’t he report the problem to his superiors? I can only guess. Maybe there was a lot of schedule pressure, perhaps he didn’t want to upset his boss. Or, the product was already released and a recall would have cost a lot of money, let alone reputation. If you ask me, it was a deadly cocktail of fear and pride.

When I did my military training at the German Armed Forces, one of the first rules I learned was “Melden macht frei”, which more or less translates to “reporting is liberating”. It is your duty to report an incident and it has a liberating effect on you, both emotionally and legally. After reporting, it is your superior’s problem. He has to decide what to do next. That’s not dodging responsibility — it’s passing on an issue that is outside your area of responsibility to the right person.

In the same spirit, as professionals we also have to report any issue that is harmful to customers or the company, regardless of how unlikely it appears to us. Even if management makes a (hopefully prudent) decision to ignore the problem (like it was the case in the Space Shuttle Challenger disaster, where engineers clearly raised their concerns that the O-rings on the rocket boosters would not seal at low temperatures), at least you have behaved professionally and are saved from prosecution and guilt feeling.

There is, however, a strange phenomenon: People sometimes forget that you informed them, especially when they have to testify in court. That’s why I want to share another important German wisdom with you: “Wer schreibt, der bleibt”, which can be translated as “you write, you stay”. It means that (only) if you write something down, you will be remembered. In other words: always keep a paper trail; email usually suffices.

Approxion

Code – People – Everything