Category Archives: Code

Bad Hiring Strategy, Great Interview Question

You have to bear in mind that this story happened in 1999, more than a year before the dot-com bubble burst; senseless hiring of people was considered perfectly normal at that time.

I was working for a well-known consumer electronics company, developing mobile phones. Our CEO had a problem. The problem, which is not so uncommon among CEOs, was that the investors weren’t happy because the company didn’t sell enough handsets. So the boss went to sales and they claimed they couldn’t sell more because the development guys didn’t give them enough cool products (products the customers reeeeeelly wanted), and even if they did, they’d finish too late; that is, they’d come out with a phone in March, missing the Christmas sales season by threeeeee months.

So our boss went to the development manager to find out what the problem was; our development manager told him what almost every development manager says in such a situation: “We could develop soooooo many cool products in sooooooo little time, if we only had moooooore developers!”

This procedure repeated a couple of times until our CEO freaked out. He went to our human resources manager and commanded him to “get more software developers, no matter what!”

Our HR manager didn’t know much about hiring software developers, but he surely had a plan: he wanted to hire 100 software developers in 100 days. He had t-shirts printed, carrying these words:

[company]
100
in
100
Do YOU have good
soft(ware) skills?

Our job was to wear these (poor quality) t-shirts and attract potential software developers for our team.

This strategy was unbelievably stupid for many reasons. First, it emphasized quantity, not quality. Second, it read as if soft skills were more important than software skills. Last, it was very offensive, as it was based on the idea that good software developers stupidly fall prey to such bad HR campaigns.

Anyway, we got lots of candidates — many more than we could interview; our HR manager celebrated a victory. We, however, had to separate the wheat from the chaff and thus we developed programming tests that every candidate had to take.

One of the most successful questions was this one:

Write a routine that sorts an array of ‘n’ integers; write best-quality code.

Innocuous as it looks, this question has several good characteristics: it requires the candidate to actually write real code, and since it has to be ‘best-quality code’, you can find out about his/her skills and quality standards. For instance, does the candidate

– pay attention to style issues (indentation, layout and consistency in general)?
– choose meaningful identifier names?
– write (good) comments?
– use assertions and/or checks for boundary cases?

This question can reveal even more: since no programming language is given, you can find out about a candidate’s favorite programming language. But most importantly, you can find out if a developer is smart and has a questioning attitude.

Every smart developer knows that there is no perfect sorting algorithm. The choice depends on many constraints — constraints that are not given in the problem statement and hence must be investigated. I remember one applicant commented along these lines:

[…] It all depends on how big ‘n’ is and whether we have write access to the array (that is, we can sort in-place). Are there any code/RAM restrictions? Depending on factors like these I would choose the best sorting algorithm from a text book. Unless further information is given, I would use the sorting algorithm that comes with the standard library (e.g. qsort() in C, java.util.Arrays.sort() in Java). Since I know that you want me to write some code, I’ll implement an insertion sort algorithm, which is easy to code and whose O(n^2) behavior is acceptable for ‘n’ < 1000 [...]

Naturally, we made an offer to him, which he turned down a couple of days later. I guess he was scared of having to work with too many soft skills experts.
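
For the record, an answer along those lines might look something like the following sketch in C. It is only an illustration, not the applicant’s actual code: the function name insertion_sort and the decision to treat a NULL array with a non-zero element count as a caller bug are my own assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Sorts the first 'count' elements of 'values' in ascending order, in place.
 * Needs O(n^2) comparisons in the worst case, which is fine for small arrays.
 * Hypothetical helper for illustration only. */
void insertion_sort(int *values, size_t count)
{
    /* An empty array is fine; a NULL pointer with elements is a caller bug. */
    assert(values != NULL || count == 0);

    for (size_t i = 1; i < count; ++i) {
        int current = values[i];
        size_t j = i;

        /* Shift all larger elements one slot to the right. */
        while (j > 0 && values[j - 1] > current) {
            values[j] = values[j - 1];
            --j;
        }
        values[j] = current;
    }
}
```

Calling insertion_sort(data, 5) on the array {5, 2, 9, 1, 5} leaves it as {1, 2, 5, 5, 9}; for anything bigger than a toy array, qsort() from the standard library is probably still the better default.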

How to become a better programmer

Good programmers often wonder how to become even better programmers. They constantly seek new tools and techniques that help them get their job done better and faster.

If you want to know what helps the most, here is some advice:

“You must do two things above all others: read a lot and write a lot. There’s no way around these two things that I’m aware of, no shortcut.”

These words are from the best-selling author Stephen King; he should really know — he makes 45 million bucks per year from his books.

I believe that programming and writing novels have a lot in common and that King’s words of wisdom are applicable to software development as well.

Let’s first focus on reading. It’s a well known fact that programmers read too little. In their book “Peopleware”, Tom DeMarco and Timothy Lister assert that the average developer doesn’t own a single book on the subject of his or her work. If this is true, it might be an explanation as to why our industry is performing so badly: if developers don’t know about the fundamentals of software engineering (not just coding issues — also topics like software quality, configuration management and peopleware in general) how can they explain them to non-technical folks like sales and upper management once they’ve become technical leaders?

What about code reading? Fortunately, we live in very privileged times. Twenty years ago, almost all code was closed source; nowadays, there are billions of lines of open source code out there from which we can learn. Alas, there is the fundamental law of programming: “It’s harder to read code than it is to write it.”

If browsing through huge open source code bases gives you headaches, check out “Code Reading” or “Code Quality” by Diomidis Spinellis. These two fine books quote (and criticize) countless examples from open source projects — in my view, a lightweight and often entertaining way to improve your programming Kung Fu.

But what about code writing? Isn’t a professional software developer already writing enough code? Not so! Typically, software developers only spend a fraction of their time writing code. In fact, most of their time is devoted to meetings, email, reading specs, writing documentation and so on. With this little time given for writing code, it is vitally important that developers keep their programming skills active.

A good way to practice is by contributing to an open source project. Another possibility is doing Code Katas — little practice sessions, based on a concept borrowed from karate and other martial arts, where the practitioner fights against an imaginary opponent. But by far the best way is to work for your employer in your leisure time — for free!

Have you recovered?

I presume that to most people, this idea sounds shocking, almost insane — but I really mean it. Often, good ideas arise during the day that your boss doesn’t understand and hence doesn’t approve. If you think your idea is challenging and useful for the company — do it at home! Not only does this improve your coding skills, it helps your company; as a bonus, your reputation within the company increases. So we have at least a win-win, if not a double-win-win situation. But only choose interesting topics, things that improve your skills; leave the drudgework for the office.

Constant reading and practicing is the key to success. It doesn’t take much time, but it needs to be done habitually. Don’t expect that your company or your boss or anyone but you is responsible for improving your skills. Even if those days existed in the past, they certainly don’t exist anymore.

The Pizza Box Problem

Consider this real-world problem: In a UMTS network, short messages (SMS) consist of a protocol header and the actual text message. The text message can be encoded in many different formats, but for the sake of this example I want to focus only on two encodings: 8-bit characters and 7-bit characters.

Since the standard SMS alphabet only uses 7-bit characters, it often makes sense to use a 7-bit encoding for the text message, as you can squeeze more characters into the available 140 octets (an octet is a unit of exactly 8 bits; remember that a byte is not guaranteed to be 8 bits wide on every platform).

In the header, there is a so-called ‘user data length’ element that tells how many characters the text message comprises. The stress is on characters – you don’t know over how many of the following octets the message is distributed. But of course, you can find out. A so-called ‘data coding scheme’ octet tells you whether the text message uses 7-bit encoding or 8-bit encoding. Thus, calculating the total number of used octets should be straightforward:


    int char_count = get_char_count(buffer);
    int octet_count;

    // If 7-bit encoding.
    if (get_dcs(buffer) & DCS_7BIT_ENCODING) {
        octet_count = (char_count * 7) / 8;
    // If 8-bit encoding.
    } else {
        octet_count = char_count;
    }

This code is short, simple and – unfortunately – wrong. I’ve seen this mistake in several guises and the reason for this bug is that programmers obviously don’t know about what I call the ‘Pizza Box Problem’. It goes like this.

You are a pizza delivery guy and you have to deliver pizza (stored in pizza boxes) to your customers. To keep your pizzas hot, you stuff them into thermal bags, each of which is capable of holding 8 pizza boxes.

How many bags do you need to deliver, say, 21 pizza boxes?

Every pizza delivery guy immediately knows the answer: 3. It is not 21 / 8, since integer division causes the result to be equal to 2!

What you need is this: if your division yields a fractional part, you want to increase the result of the integer division by one. You could resort to floating point arithmetic (and use the ceil() function, for instance) but that would be inefficient.

The trick is that you add the divisor minus one to the dividend before performing the integer division:


    bags = (boxes + bag_size - 1) / bag_size;

It works like this: if ‘boxes’ is already evenly divisible by ‘bag_size’, adding one less than the ‘bag_size’ doesn’t change the overall result; otherwise, the dividend will be increased such that the next ‘bag_size’ multiple is crossed:


    bags = (21 + 7) / 8 == 3

Applying what we have just learned to our SMS problem, we conclude that the code in the if block should look like this:


    ...
        octet_count = (char_count * 7 + 7) / 8;
    ...

We have ‘char_count’ * 7 bits (pizza boxes) that we want to store in octets (thermal bags) of size 8.
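
If you want to keep the trick in one place, it can be wrapped in a small helper. The following is just a sketch in C; the names ceil_div, octets_for_septets, BITS_PER_OCTET and BITS_PER_SEPTET are made up for this example, and the helper assumes a non-negative dividend and a positive divisor.

```c
#include <assert.h>

/* Number of bits in an octet (a thermal bag holds 8 boxes). */
#define BITS_PER_OCTET 8
/* Number of bits per character in the 7-bit SMS encoding. */
#define BITS_PER_SEPTET 7

/* Integer division that rounds up instead of down.
 * Assumes dividend >= 0 and divisor > 0 (hypothetical helper). */
int ceil_div(int dividend, int divisor)
{
    assert(dividend >= 0 && divisor > 0);
    return (dividend + divisor - 1) / divisor;
}

/* Octets needed to store 'char_count' characters of 7 bits each. */
int octets_for_septets(int char_count)
{
    return ceil_div(char_count * BITS_PER_SEPTET, BITS_PER_OCTET);
}
```

With these definitions, ceil_div(21, 8) gives 3 bags and octets_for_septets(160) gives 140 octets. Bear in mind that the addition can overflow if the dividend is close to INT_MAX; for SMS-sized numbers that is not a concern.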

Enjoy your pizza!

The Return of the Pizza Delivery Guy

[update 2009-03-29: The equation

    octet_count = (char_count * 7 + 7) / 8

can be simplified to

    octet_count = char_count - char_count / 8

Here is the proof. Adding 7 before the integer division by 8 rounds the result up, so in real-number arithmetic the first expression computes

    octet_count = ceil(char_count * 7 / 8)
    octet_count = ceil(char_count - char_count / 8)

Since char_count is an integer, it can be pulled out of ceil(), and rounding up the negated fraction is the same as subtracting the fraction rounded down:

    octet_count = char_count - floor(char_count / 8)

For non-negative values, floor(char_count / 8) is exactly what the integer division char_count / 8 yields, so we get

    octet_count = char_count - char_count / 8  (q.e.d.)

– end update]
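
If you don’t trust the algebra, a brute-force comparison is easy to write. Here is a minimal sketch in C; the function name formulas_agree_up_to is invented for this example, and it only checks non-negative character counts up to a limit of your choosing.

```c
#include <assert.h>

/* Returns 1 if both formulas yield the same octet count for every
 * char_count in the range [0, limit], and 0 otherwise. */
int formulas_agree_up_to(int limit)
{
    for (int char_count = 0; char_count <= limit; ++char_count) {
        int rounded_up = (char_count * 7 + 7) / 8;
        int simplified = char_count - char_count / 8;

        if (rounded_up != simplified) {
            return 0;
        }
    }
    return 1;
}
```

A call like formulas_agree_up_to(1000) covers far more characters than fit into a single short message.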

const static or static const or what?

This issue crops up time and again: somebody looks at code like this:


    const static char TEXT[] = "Hi there!";

and complains loudly that this is not legal C code: “static”, he screams, “must come before const”:


    static const char TEXT[] = "Hi there!";

I’ve never had a compiler that generated wrong code from the first version. I guess this myth is nourished by the fact that GCC and PC-Lint issue odd warnings when confronted with const-first declarations:


    $gcc -c -Wall -W test.c
    warning: `static' is not at beginning of declaration

    $lint-nt -u test.c
    Warning 618: Storage class specified after a type

(PC-Lint’s warning message is particularly weird, isn’t it?)
Both tools process the second version without any complaint.

I really don’t know where this rumor comes from. Maybe it was true in K&R C, but C++98 and C99 certainly don’t care about the order of declaration specifiers; they don’t even care about the position of the type!


    char const static TEXT[] = "Hi there!";

is perfectly legal, but


    $gcc -c -Wall -W -std=c99 test.c

still complains like before (at least when using the -Wall -W combination; -Wall alone doesn’t produce this warning).

The C99 grammar (see 6.7 Declarations) clearly says:


    declaration-specifiers:
        storage-class-specifier declaration-specifiers_opt
        type-specifier declaration-specifiers_opt
        type-qualifier declaration-specifiers_opt
        function-specifier declaration-specifiers_opt

    storage-class-specifier:
        typedef
        extern
        static
        auto
        register

    type-specifier:
        void
        char
        short
        int
        ...

    type-qualifier:
        const
        restrict
        volatile

so the order obviously doesn’t matter.
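
To see this for yourself, here is a small sketch in C that puts all three orderings into one translation unit. The identifiers TEXT_A, TEXT_B, TEXT_C and all_texts_equal are made up for this example. I would expect a C99 compiler to accept the file, although, depending on the warning options, it may still emit the warning shown above for the first and third declaration.

```c
#include <string.h>

/* Three orderings of the same declaration specifiers. */
const static char TEXT_A[] = "Hi there!";
static const char TEXT_B[] = "Hi there!";
char const static TEXT_C[] = "Hi there!";

/* Returns 1 if all three arrays hold the same string, 0 otherwise. */
int all_texts_equal(void)
{
    return strcmp(TEXT_A, TEXT_B) == 0 && strcmp(TEXT_B, TEXT_C) == 0;
}
```

All three arrays end up with the same type, the same contents and the same internal linkage; only the spelling of the declaration differs.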

Personally, I prefer the first version, because the fact that a variable is a constant is more important to me than its linkage. Still, it is probably wiser to use the second version: not because the first one is ill-formed, but because of misbehaving compilers and static analysis tools.

[update 2009-05-20: Christian Hujer found the missing link: chapter 6.11. (“Future Language Directions”) of the ISO C99 standard clearly says:

“The placement of a storage-class specifier other than at the beginning of the declaration specifiers in a declaration is an obsolescent feature.”

There you have it. Forget my crazy preferences, write future-proof code by following the standard’s advice. – end update]

TODO or not TODO

Are TODOs in source code good or evil?

In my view, TODOs are a tool and, like any tool, they can be misused. I do agree with most of the criticism, but if they are used with care and actively managed, they can improve productivity.

Sure, TODOs tend to accumulate and, yes, there are selfish developers who use them as an excuse for checking in sloppy code (which they will never clean up). Yet, there are situations where they are appropriate.

Sometimes it just makes sense to check in code that is not 100% done, for instance, if not all of your requirements are settled yet or you are uncertain about some boundary cases. Here is my litmus test: if your unfinished code already provides value to somebody else and it doesn’t break existing tests, it is OK to check it in. That somebody may be a tester who needs a running system to develop test cases against or a coder who needs your class to continue her work. Maybe it’s a sales person who needs a quick tentative solution for his trade show. Unfinished code is especially appreciated in the ‘bazaar’ software development model.

If you accept the notion that unfinished code is useful if it already provides value, you probably also agree that it is not a good idea to keep the list of open issues in your head; even scrap paper is only slightly better. To me, it is much better to record ‘tentativeness’ very close to the code.

However, it is essential to manage those TODOs — if you don’t manage them, they will eat you alive.

In order to manage you need to track, and tracking is much easier if your TODOs follow a standardized format. Once you have a standardized format you can easily extract metrics. If everybody used their own style of TODOs it would be very hard to control them. A mix of TODO, FIXME, BUGBUG, ???, ### is TODO hell.

After years of experimenting, I’ve found this style of TODO comments to be the best:


    // TODO:2008-12-06:johnc:Add support for negative offsets.
    // While it is unlikely that we get a negative offset, it can
    // occur if the garbage collector runs out of space.

Let me briefly describe why I consider this format to be superior.

First of all, it starts with ‘TODO’ in all capital letters, which is good, as it is recognized by many editors/IDEs and highlighted in a special manner. This makes it easy to recognize TODOs for human beings when browsing source code.

Next comes the date in ISO 8601 ‘extended’ format. Not only does this tell you when the TODO was added: it allows you to easily sort your TODOs:


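    grep -Rn "TODO:" . | sort -t: -k4,4

Because grep prefixes every match with the file name and (with -n) the line number, the date is the fourth colon-separated field, and that is the field to sort on. The sorted output will show you which TODOs have been around for a long time, maybe for too long a time.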

After the ISO date comes the name of the TODO owner. I suggest you use login names, but it doesn’t really matter as long as it is unique and can be easily mapped to a real person’s name. Due to the fact that the name is surrounded by colons you can build a list showing who owns the most TODOs:


    grep -oRE "TODO:[^:]+:([^:]+)" * | sed -e "s/^.*://" | sort | uniq -c

Finally, there is a brief summary that describes the reason for the TODO; if necessary, additional text follows on the next lines. Describing a TODO is essential: it tells developers why it is there; leaving it out is akin to putting up a ‘Watch out’ warning sign without giving any additional clues.

In order to extract information automatically, the standardized format must be enforced. This grep command will catch most ill-formed TODOs:


    grep -iRE "todo|fixme|bugbug|\?\?\?" * | grep -vP "TODO:\d{4}-\d{2}-\d{2}:.{3,}:.{10,}"

In addition to pinpointing typical TODO comment keywords it ensures that the date is well-formed and that the owner and description fields are not omitted.
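
If your check-in hooks are written in C rather than shell, the same format check can be sketched as a small function. This is only an illustration under my own assumptions: the names has_digits and is_well_formed_todo are made up, the check looks only at the first "TODO:" on a line, and it treats the first colon after the date as the end of the owner field, which is slightly stricter than the regular expression above.

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if the first 'count' characters of 'text' are decimal digits. */
static int has_digits(const char *text, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (!isdigit((unsigned char)text[i])) {
            return 0;
        }
    }
    return 1;
}

/* Returns 1 if 'line' contains a TODO of the form
 * TODO:YYYY-MM-DD:owner:summary with an owner of at least 3 characters
 * and a summary of at least 10 characters; returns 0 otherwise. */
int is_well_formed_todo(const char *line)
{
    const char *todo = strstr(line, "TODO:");
    if (todo == NULL) {
        return 0;
    }

    /* Skip "TODO:" and check the ISO 8601 date plus its trailing colon. */
    const char *date = todo + 5;
    if (!has_digits(date, 4) || date[4] != '-' ||
        !has_digits(date + 5, 2) || date[7] != '-' ||
        !has_digits(date + 8, 2) || date[10] != ':') {
        return 0;
    }

    /* The owner runs up to the next colon. */
    const char *owner = date + 11;
    const char *owner_end = strchr(owner, ':');
    if (owner_end == NULL || owner_end - owner < 3) {
        return 0;
    }

    /* Everything after that colon is the summary. */
    return strlen(owner_end + 1) >= 10;
}
```

The example TODO shown earlier passes this check, whereas a bare "// TODO: fix this later" does not.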

Ideally, you run such scripts automatically every time a developer checks in changes. If you don’t have testbots you should at least run them as part of your daily build procedure. In case you don’t even have daily builds (pity you!) it is the job of the project lead to execute them regularly.

TODOs are useful for modern, release-early-and-often software development processes; by standardizing their format, all of the common disadvantages can be overcome. If you don’t use them yet, I suggest you put them near the top of your TODO list.

Advanced Programming

Today, Hartmut, a colleague, showed me some code written by one of his teammates. He found the code to be very confusing — in fact, he was quite upset about it and the author — and asked me whether I had ever seen tricky code like this.

Oh well, oh well, I had. In fact, I was the one who introduced this particular technique to the author and recommended using it.

Years ago, I had a similar problem on a C++ project. My teammates were not really experienced C++ developers and didn’t know much about templates. As a matter of fact, I like templates and if used with care, they can make the code more readable and maintainable.

But this was just my view — the view of the rest of the team was, well, quite different.

I was shocked and disappointed. I thought I had written great code, but my ignorant colleagues didn’t like it — they didn’t even bother to learn about advanced C++ topics!

Hartmut argued that I’m allowed to use any advanced feature or technique I want as long as the rest of the team understands it. Not satisfied with his advice I countered by explaining that I didn’t want to drive a Ferrari by only using first gear. Hartmut’s response was a complete revelation: “Then”, he said “you have to teach them how to drive a Ferrari using all gears”.

Let me summarize this as ‘Hartmut’s Law of Advanced Programming’:

“You may only use an advanced programming language feature or technique if the rest of the team understands it; if this is not the case and you still want to use it, you have to educate them about it.”

Approxion

Code – People – Everything