β Dalai Lama
If you do a Google search for why programmers typically start counting from zero, you’ll likely find two reasons. Today, I’m going to add another one. But let’s start with the usual explanations.
DIJKSTRA’S HALF-OPEN RANGES
On August 11, 1982, Edsger Dijkstra wrote a short paper on why numbering should start at zero. He first demonstrates that half-open ranges with an excluded upper bound are superior to other alternatives:
1 2 3 |
lower <= i < upper |
Here’s a quick summary of his reasoning, in case you don’t want to read it yourself. The main advantages are that a) you can easily represent empty ranges (i. e. lower equals upper) and b) compute the number of elements in a range by subtracting the lower bound from the upper bound.
Based on ranges where the upper bound is excluded, Dijkstra goes on to show that for sequences of N elements, there are only two ways of indexing:
1 2 3 4 |
1) 1 <= i < N + 1 2) 0 <= i < N |
Obviously, the latter is much more elegant and hence we see code like this everywhere:
1 2 3 4 5 |
for (int i = 0; i < N; ++i) { // upper bound N excluded, N-0 == N iterations. ... } |
BASE-RELATIVE ADDRESSING
Sequences of heterogeneous data in contiguous memory are almost universally laid-out like this:
1 2 3 4 5 6 7 |
+------+------+------+------+-----+------+ | elem | elem | elem | elem | ... | elem | +------+------+------+------+-----+------+ ^ |base address ---------> higher addresses| |
A sequence starts with the first element at some base address, the second follows sizeof(elem) bytes after the first element and so on. Computing the start address of the n-th element can be done using this formula:
1 2 3 |
elemAddr = baseAddr + (n * sizeof(elem)) |
However, this formula only applies if you index your elements from 0 to N – 1. If instead you chose to number indices from 1 to N, the formula would need to be adapted:
1 2 3 |
elemAddr = baseAddr + ((n - 1) * sizeof(elem)) |
This alternative is not only less pleasant to look at, but because of the additional subtraction also harder for the CPU to compute. Consequently, the fathers of C employed the zero-based array access syntax that we are all so familiar with:
1 2 3 |
int x = elem[i]; |
which is really just a shorthand notation for
1 2 3 |
int x = *(elem + i); |
If i is 0, you get the first element, if i is positive, the i-th successor of elem, and if i is negative, the i-th predecessor of elem. The latter fact often surprises developers because they either assume that negative offsets are illegal in the first place or yield elements from the end of the array, like in Python.
Incidentally, there’s another secret to C array indexing: since the addition operation is commutative, you can equally well write
1 2 3 |
int x = i[elem]; // same as int x = *(i + elem) |
As obvious as this is in hindsight, it’s not well known amongst C programmers and a good opportunity to show off at parties. However, I strongly advise against writing code like this for production use.
MODULAR ARITHMETIC
As you know, applying the mod operator like this
1 2 3 |
n mod N |
yields values ranging from 0 to N – 1. This dovetails nicely with zero-based indices into sequences.
Take hash maps for example. To determine the position of an element in a hash map containing N slots, just apply a hash function and take the result modulo N to get the index. That’s it!
Another use case is the ring buffer, one of my favorite containers: to advance an index into a ring buffer, just add the desired offset and apply the mod operator to get wrap-around β no need for extra if/else logic. Again, starting indices at 1 instead of 0 would entail extra additions and subtractions.
There you have it β one more reason to start numbering from zero. (As if you still needed to be convinced…)