— Dalai Lama XIV
One of the lesser-known secrets of the C programming language is the so-called “strict aliasing rule”. This is a shame, because failing to adhere to it takes you (along with your code) straight into the realm of undefined behavior. As no one in their right mind wants to go there, let’s shed some light on it!
POINTER ALIASING DEFINED
First of all, we have to clarify what “aliasing” really means, or rather aliasing of pointers. Take a look at this example:
1 2 3 4 5 6 |
int value; int* p1 = &value; // p1 points to 'value'. int* p2 = &value; // p2 as well... |
Here, ‘p1’ and ‘p2’ are aliased to the same object ‘value’; that is, they point to the same object. If you update ‘value’ through ‘p1’:
1 2 3 |
*p1 = 42; |
a read through ‘p2’ will reflect this change:
1 2 3 |
assert((*p1 == *p2) && (value == *p2)); // So true... |
Because of the possibility of aliasing, a C compiler is prevented from applying certain optimizations. Consider:
1 2 3 4 5 6 7 |
int silly(int* x, int* y) { *x = 0; *y = 1; return *x; } |
You might think that any decent compiler would generate simplified code equivalent to this:
1 2 3 4 5 6 7 |
int silly(int* x, int* y) { *x = 0; *y = 1; return 0; // *x was previously set to 0, so don't load from memory again. } |
It’s not a matter of decency — the compiler just can’t do this optimization! Here’s the assembly output that clearly shows that the return value is loaded from memory:
1 2 3 |
$ gcc -O2 -masm=intel silly.c -S && cat silly.s |
1 2 3 4 5 6 7 |
silly: mov DWORD PTR [rdi], 0 mov DWORD PTR [rsi], 1 mov eax, DWORD PTR [rdi] ; '*x' fetched from memory. ret |
The optimization is not possible because the caller could call ‘silly’ like so:
1 2 3 4 |
int value; silly(&value, &value); |
In this case, ‘x’ and ‘y’ are aliased to the same ‘value’, which means ‘silly’ must return 1 not 0. Consequently, ‘*x’ must be read from memory, every time. Period.
ROOM FOR IMPROVEMENT
If you think about it, even though it may happen, pointer aliasing won’t happen very often in practice. Why waste so much potential for optimization for the uncommon case? Most likely, the folks from the C standards committee had the same line of thinking. They introduced rules that state when pointer aliasing must not happen. Enter the strict aliasing rule.
To facilitate compiler optimization, the strict aliasing rule demands that (in simple words) pointers to incompatible types never alias. Pointers to compatible types (like the two ‘int’ pointers ‘x’ and ‘y’ in ‘silly’) are assumed to (potentially) alias. Let’s make the pointer types incompatible (‘short*’ vs. ‘int*’):
1 2 3 4 5 6 7 |
int silly2(short* x, int* y) { *x = 0; *y = 1; return *x; } |
1 2 3 |
$ gcc -O2 -masm=intel silly2.c -S && cat silly2.s |
1 2 3 4 5 6 7 |
silly2: mov WORD PTR [rdi], ax mov DWORD PTR [rsi], 1 xor eax, eax ; equivalent to mov eax, 0 ret |
As you can see, this time no load from memory is performed — 0 is returned instead. The optimization is possible because the compiler assumes that aliasing is not allowed in this case.
VIOLATIONS
But what happens if pointers to incompatible types nevertheless alias? After all, this can happen quite easily. Maybe not in the ‘silly’ example, but in real-world production code:
1 2 3 4 5 6 7 8 9 10 11 12 |
struct measurements_t { uint8_t level; uint16_t temperature; uint32_t force; }; void convert(const uint8_t* data, struct measurements_t* measurements) { /* Fill measurements object with raw data. */ *measurements = *((struct measurements_t*) &data[0]); } |
In an attempt to convert data stored in a buffer (maybe read over a network connection) into a high-level structure, a pointer to ‘struct measurements_t’ is aliased with a pointer to a ‘uint8_t’. Since both types are incompatible (pointer to struct vs. pointer to ‘uint8_t’) this code is a violation of the strict aliasing rule. Experienced C developers most likely recognized immediately that this code yields undefined behavior, but they would have probably attributed it to struct padding and alignment issues. The real reason, as we know by now, is a violation of the strict aliasing rule.
THE FINE PRINT
So what exactly is the strict aliasing rule and what does “type compatibility” mean? Here’s an excerpt from the ISO C99, standard, chapter 6.5:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type.
Such Standardeese is often hard to digest, so let me try to clarify it a bit. Aliased pointer access is fine if:
1. The pointed-at types are identical. Note that typedefs are just type aliases and don’t introduce new types:
1 2 3 4 5 |
typedef int INT; INT* p = ... int x = *((int*) p); // Fine and cast not really necessary! |
2. The pointed-at types are identical apart from the “signed-ness” (e. g. ‘int’ vs. ‘unsigned int’).
3. The pointed-at types are identical apart from qualification (e. g. ‘const int’ vs. ‘int’).
4. The rule “an aggregate or union type that includes one of the aforementioned types among its members” is highly confusing and probably doesn’t mean much. Check this out for details.
5. The pointed-at types are different, but the pointed-at type through which the access is made is a pointer to character:
1 2 3 4 5 6 7 8 |
float f = 3.1415; unsigned char* p = (unsigned char*) &f; unsigned char a1 = p[0]; // First byte of 'f'. unsigned char a2 = p[1]; // : unsigned char a3 = p[2]; // : unsigned char a4 = p[3]; // Last byte of 'f'. |
Conversely, aliased pointer access is not defined if the pointed-at types are fundamentally different. Note that this includes pointers to structs that are identically defined but have different tag names:
1 2 3 4 5 6 7 |
struct S1 { int x; }; // tag 'S1'. struct S2 { int x; }; // tag 'S2'. S1* s1; S2 = *((S2*) s1); // Undefined behavior! |
CONCLUSION
The strict aliasing rule was introduced to give the compiler vendors some leeway regarding optimizations. By default, the compiler assumes that pointers to (loosely speaking) incompatible types never alias. As a consequence, you, the programmer, have to make sure that this rule is obeyed.
Here’s some disquieting news: a lot of existing code isn’t conforming to the strict aliasing rule, but the code works (or appears to work) fine anyway. As an example, the ‘convert’ function above, which aliases a struct to an array of bytes might work fine on an Intel x86-based platform, which supports unaligned memory access. However, if you use ‘convert’ on an (older) ARM-based platform, you might get a “bus error” exception that could crash your system. In other cases, nonconforming code just works by coincident, with a particular compiler, or a particular compiler version at a particular optimization level.
To me, knowing about the strict aliasing rule is as important for every systems developer as knowing about the other systems programming “secrets” like alignment, struct padding, and endianness.
FURTHER READING
I’m indebted to the authors of the following two blog post. From the first post, I shamelessly stole the idea for the ‘silly’ function, which the author originally called ‘foo’, which I found silly :-) The second post gives a much more detailed coverage of the strict aliasing rule and also discusses the interesting technique of “casting through a union”.