Consider this real-world problem: In a UMTS network, short messages (SMS) are comprised of a protocol header and the actual text message. The text message can be encoded in many different formats, but for the sake of this example I want to focus only on two encodings: 8-bit characters and 7-bit characters.
Since the standard SMS alphabet only uses 7-bit characters, it often makes sense to use a 7-bit encoding for the text message, as you can squeeze more characters in the available 140 octets (an octet is a byte comprising 8 bits; remember that it is not specified how many bits there are in a byte).
In the header, there is a so-called ‘user data length’ element that tells how many characters the text message comprises. The stress is on characters – you don’t know over how many of the following octets the message is distributed. But of course, you can find out. A so-called ‘data coding scheme’ octet tells you whether the text message uses 7-bit encoding or 8-bit encoding. Thus, calculating the total number of used octets should be straightforward:
1 2 3 4 5 6 7 8 9 10 11 12 |
int char_count = get_char_count(buffer); int octet_count; // If 7-bit encoding. if (get_dcs(buffer) & DCS_7BIT_ENCODING) { octet_count = (char_count * 7) / 8; // If 8-bit encoding. } else { octet_count = char_count; } |
This code is short, simple and – unfortunately – wrong. I’ve seen this mistake in several guises and the reason for this bug is that programmers obviously don’t know about what I call the ‘Pizza Box Problem’. It goes like this.
You are a pizza delivery guy and you have to deliver pizza (stored in pizza cases) to your customers. To keep your pizzas hot, you stuff them into thermal bags, each of which is capable of holding 8 pizza boxes.
How many bags do you need to deliver, say, 21 pizza boxes?
Every pizza delivery guy immediately knows the answer: 3. It is not 21 / 8, since integer division causes the result to be equal to 2!
What you need is this: if your division yields a fractional part, you want to increase the result of the integer division by one. You could resort to floating point arithmetic (and use the ceil() function, for instance) but that would be inefficient.
The trick is that you add the divisor minus one to the dividend before performing the integer division:
1 2 3 |
bags = (boxes + bag_size - 1) / bag_size; |
It works like this: if ‘boxes’ is already evenly divisible by ‘bag_size’, adding one less than the ‘bag_size’ doesn’t change the overall result; otherwise, the dividend will be increased such that the next ‘bag_size’ multiple is crossed:
1 2 3 |
bags = (21 + 7) / 8 == 3 |
Applying what we have just learned to our SMS problem, we conclude that the code in the if block should look like this:
1 2 3 4 5 |
... octet_count = (char_count * 7 + 7) / 8; ... |
We have ‘char_count’ * 7 bits (pizza boxes) that we want to store in octets (thermal bags) of size 8.
Enjoy your pizza!
The Return of the Pizza Delivery Guy
[update 2009-03-29: The equation
1 2 3 |
octet_count = (char_count * 7 + 7) / 8 |
can obviously be simplified to
1 2 3 |
octet_count = char_count - char_count / 8 |
Here is the proof:
1 2 3 4 |
octet_count = (char_count * 8 - char_count + 7) / 8 octet_count = ( (char_count * 8) / 8 - (char_count / 8) + 7 / 8 ) |
since (char_count * 8) / 8 = char_count and 7 / 8 = 0 we get
1 2 3 |
octet_count = char_count - char_count / 8 (q.e.d.) |
— end update]