Pointers in C, Part V: The ‘restrict’ Qualifier

“Le vrai est trop simple, il faut y arriver toujours par le compliqué.”
(“The truth is too simple: one must always get there by a complicated route.”)
― George Sand, Letter to Armand Barbès, 12 May 1867

Exactly one year ago, I started this series on pointers, but what I really wanted to blog about originally was a rather arcane and rarely used keyword that first appeared in the C99 language standard: the ‘restrict’ qualifier. But after trying to digest the formal definition in section 6.7.3.1, I decided that taking a little detour would make both my life and my readers’ lives much easier.

Let me set the stage for ‘restrict’ by summarizing what I wrote in episode 3 about the “strict aliasing rule”:

1. The compiler might optimize code involving multiple pointers, provided the pointers are not aliased; that is, they don’t point to the same object or memory.

2. The compiler assumes that pointers to incompatible types never alias.

3. The compiler assumes that pointers to compatible types (same types, apart from CV-qualification and signedness) potentially alias.

Therefore, a function with this signature is eligible for compiler optimization:


void transform(const int* input, double* output, size_t nvals);

whereas this one is not:


void transform(const double* input, double* output, size_t nvals);

This is unfortunate, because most likely, the arrays passed to the second version of ‘transform’ are in completely different, non-overlapping memory regions. But the compiler has no way of knowing this, so under the strict aliasing rule it must assume that ‘input’ and ‘output’ may alias.

The ‘restrict’ qualifier, which (unlike the ‘const’ and ‘volatile’ qualifiers) can only be applied to pointers, is a promise given by the programmer to the compiler that pointers don’t alias even though they point to objects of the same type. Therefore, this version of ‘transform’ can be optimized by the compiler:


void transform(
    const double* restrict input,
    double* restrict output,
    size_t nvals);
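To make the promise more tangible, here is a minimal sketch of what a definition of such a function might look like. The loop body (doubling every value) is purely an assumption for illustration; the post itself only shows the signature.

```c
#include <stddef.h>

/* Hypothetical body: scales every input value by 2.
   'restrict' promises that input[0..nvals-1] and output[0..nvals-1]
   do not overlap, so the compiler is free to reorder or vectorize
   the loads and stores. */
void transform(const double* restrict input,
               double* restrict output,
               size_t nvals) {
    for (size_t i = 0; i < nvals; ++i) {
        output[i] = 2.0 * input[i];
    }
}
```

Calling it with two distinct arrays keeps the promise. Passing the same array as both ‘input’ and ‘output’ would break it.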

Let’s put this to the test with the ‘silly’ example from episode 3:


int silly(int* x, int* y) {
    *x = 0;
    *y = 1;
    return *x;
}

Before knowing about the strict aliasing rule, we were surprised to see that the memory access to ‘x’ in the return statement was not replaced with a simple ‘return 0’. After having learned about the strict aliasing rule, it’s clear: since ‘x’ and ‘y’ point to the same type, the compiler must assume that they may point to the same memory location and hence it loads the value pointed to by ‘x’ from memory afresh:


$ gcc -O2 -masm=intel silly.c -S && cat silly.s


silly:
        mov     DWORD PTR [rdi], 0
        mov     DWORD PTR [rsi], 1
        mov     eax, DWORD PTR [rdi] ; '*x' fetched from memory.
        ret

Now, if we tell the compiler that ‘x’ and ‘y’ never point to the same memory location, optimization is possible:


int silly3(int* restrict x, int* restrict y) {
    *x = 0;
    *y = 1;
    return *x;
}


$ gcc -O2 -std=c99 -masm=intel silly3.c -S && cat silly3.s


silly3:
        mov     DWORD PTR [rdi], 0
        mov     DWORD PTR [rsi], 1
        xor     eax, eax            ; equivalent to mov eax, 0
        ret

Nice, isn’t it?

If you use the ‘restrict’ qualifier on a pointer, you promise that, at least for the lifetime of the restricted pointer, the object pointed to is only accessed through this pointer. Break that promise and you get undefined behavior. (In the ‘silly3’ example, the lifetime of the pointers ‘x’ and ‘y’ ends once the call to ‘silly3’ returns.)
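The difference between keeping and breaking the promise can be sketched as follows. The caller ‘call_with_distinct_objects’ is a made-up name for illustration; the offending call is shown only inside a comment because executing it would be undefined behavior.

```c
/* Same function as 'silly3' in the post. */
int silly3(int* restrict x, int* restrict y) {
    *x = 0;
    *y = 1;
    return *x;
}

/* Hypothetical caller that keeps the promise: two distinct objects. */
int call_with_distinct_objects(void) {
    int a = 5;
    int b = 7;
    return silly3(&a, &b);   /* well-defined, returns 0 */
}

/* Breaking the promise would look like this (do NOT do this):
 *
 *     int value;
 *     silly3(&value, &value);   // undefined behavior
 */
```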

In the C99 language standard, many functions from the standard library have been revised and now make use of the ‘restrict’ keyword. Take ‘memcpy’, for instance:


void *memcpy(void* restrict dst, const void* restrict src, size_t n);

As everybody knows, ‘memcpy’ can only copy non-overlapping blocks of memory and this fact is nicely highlighted by the use of the ‘restrict’ keyword: during the call to ‘memcpy’ the memory regions src[0] to src[n-1] as well as dst[0] to dst[n-1] are exclusively owned and may not be accessed by other pointers. Since ‘memmove’ can copy overlapping blocks of memory (with a little speed penalty, of course), ‘memmove’ consequently doesn’t declare restricted pointers:


void *memmove(void* dst, const void* src, size_t n);
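A typical situation where the regions do overlap is shifting data within a single buffer. The following is a small sketch; the helper name ‘drop_prefix’ is an assumption made up for this example.

```c
#include <string.h>

/* Hypothetical helper: removes the first k bytes of a len-byte buffer
   by sliding the remaining bytes to the front. Source and destination
   overlap whenever k < len - k, so memmove (not memcpy) is required. */
void drop_prefix(char* buf, size_t len, size_t k) {
    if (k < len) {
        memmove(buf, buf + k, len - k);
    }
}
```

Replacing ‘memmove’ with ‘memcpy’ here would violate the ‘restrict’ contract of ‘memcpy’ whenever the two regions overlap.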

Please be aware that ‘restrict’ is not supported by the C++ language standard and it’s unclear whether it ever will be. If you mix C99 and C++ code, you might have to strip the ‘restrict’ keyword from C99 headers to avoid compilation errors:


// MyClass.cpp

extern "C" {
#define restrict
#include "MyC99Library.h"
}
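If you control the C header yourself, an alternative to defining ‘restrict’ away is to hide the keyword behind a macro. The sketch below is only one possible approach: the macro name ‘MY_RESTRICT’ and the function ‘add_arrays’ are invented for illustration, and it relies on the ‘__restrict’ spelling that GCC, Clang, and MSVC accept as an extension in C++ mode.

```c
#include <stddef.h>

#ifdef __cplusplus
#  define MY_RESTRICT __restrict   /* compiler extension, not standard C++ */
extern "C" {
#else
#  define MY_RESTRICT restrict     /* standard C99 keyword */
#endif

/* Hypothetical library function declared with the portable macro. */
void add_arrays(const int* MY_RESTRICT a,
                const int* MY_RESTRICT b,
                int* MY_RESTRICT sum,
                size_t n);

#ifdef __cplusplus
}
#endif

/* Definition (would normally live in the library's .c file). */
void add_arrays(const int* MY_RESTRICT a,
                const int* MY_RESTRICT b,
                int* MY_RESTRICT sum,
                size_t n) {
    for (size_t i = 0; i < n; ++i) {
        sum[i] = a[i] + b[i];
    }
}
```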

In general, I’m not a big fan of optimization features that the compiler is free to ignore. If utmost performance is important, you want dependable performance. Most likely, your routine is not on the performance critical path, anyway. If you think it is, carefully profile your code and after you proved that it is, you’d better code that part in assembly language. Without such evidence, sprinkling your code with ‘restrict’ is little short of premature optimization. (I complained here about the unnecessarily overused ‘inline’ keyword for the same reason.)

What I do like about the ‘restrict’ keyword, though, is that by unraveling it, we’ve made a beautiful journey through important everyday programming topics like “pointers vs. arrays”, “type qualifiers”, “pointer conversion rules”, and the “strict aliasing rule”. The journey was the destination.

Boolean Arguments are BOOL-Shit

“All generalizations are false, including this one.”
— Mark Twain

In programming, little is as obscure as code that passes boolean literals to functions.

As an example, consider the following invocation of a method that sets logging options on a logger object:


logger.setOutputControlMode(true, true);

What does this really mean? Even looking at the declaration of ‘setOutputControlMode’ doesn’t give a satisfying answer:


void Logger::setOutputControlMode(bool toStdErr, bool newline);

You need to delve into the implementation of the ‘Logger’ class to really see the intent:


void Logger::setOutputControlMode(bool toStdErr, bool newline) {
    toStdErr_ = toStdErr;
    newline_ = newline;
}

void Logger::log(const char* text) {
    ...
    if (loggingEnabled_) {
        logstream << text;
        if (newline_) {
            logstream << std::endl;
        }
        if (toStdErr_) {
            std::cerr << text;
            if (newline_) {
                std::cerr << std::endl;
            }
        }
    }
}

There you go! After calling ‘setOutputControlMode’ with ‘true’ as the first argument, subsequent logging will additionally go to ‘stderr’; passing ‘true’ as second argument will automatically append a newline character after every log statement.

While you could argue that the wishy-washy name of method ‘setOutputControlMode’ was badly chosen, it was probably done so to accommodate even more options in the future. I’m convinced that sooner or later another boolean parameter will pop up — ‘buffered’, to specify whether logging should be buffered or not.

How should you deal with such obviously inferior interfaces? The strategy depends on whether you own the offending code or not. Let’s start with the first case, where you’re using some 3rd party library that you can’t directly modify.

HIDING THE DIRT

First of all, make it a habit to never feed these boolean monsters with boolean literals. Don’t even do this if the method is part of a standard library, like Java’s ‘Component.setVisible(boolean b)’ method. Stand your ground and shame the original author by working around it. But how?

One easy solution is to define a couple of boolean constants in your code:


const bool OUTPUT_CONTROL_COPY_TO_STDERR = true;
const bool OUTPUT_CONTROL_DONT_COPY_TO_STDERR = false;
const bool OUTPUT_CONTROL_EMIT_NEWLINE = true;
const bool OUTPUT_CONTROL_DONT_EMIT_NEWLINE = false;

At the expense of a little bit more typing, code that calls ‘setOutputControlMode’ becomes much more expressive:


logger.setOutputControlMode(
    OUTPUT_CONTROL_COPY_TO_STDERR, 
    OUTPUT_CONTROL_DONT_EMIT_NEWLINE);

The only risk that’s not mitigated is that one can still confuse the order of arguments. After all, the compiler will happily accept this slip:


logger.setOutputControlMode(
    OUTPUT_CONTROL_DONT_EMIT_NEWLINE,
    OUTPUT_CONTROL_COPY_TO_STDERR);

Which is, of course, not what the developer had in mind.

A safer approach is to use wrapper functions. This is particularly useful if you don’t need all flag combinations in practice. Let’s assume you always want to emit newlines at the end of every log entry and only sometimes want to enable/disable echo to ‘stderr’. All you need to do is define two straightforward helper functions in your own code base:


namespace logger_utils {

void LoggerEnableCopyToStdErr(Logger* lg) {
    lg->setOutputControlMode(
        OUTPUT_CONTROL_COPY_TO_STDERR,
        OUTPUT_CONTROL_EMIT_NEWLINE);
}

void LoggerDisableCopyToStdErr(Logger* lg) {
    lg->setOutputControlMode(
        OUTPUT_CONTROL_DONT_COPY_TO_STDERR,
        OUTPUT_CONTROL_EMIT_NEWLINE);
}

}

Using these wrapper functions is both highly readable and safe:


logger_utils::LoggerEnableCopyToStdErr(&logger);
...
logger_utils::LoggerDisableCopyToStdErr(&logger);

AVOIDING THE DIRT

If you can modify the source code, you have even better options. Let’s assume that a colleague of yours developed a GUI framework that has the same weakness as Java’s java.awt.Component class; that is, it provides a method which takes a boolean argument:


void Component::setVisible(bool b) {
    ...
}

Now, there’s probably tons of code that uses this method, as ugly as it may be. In order to maintain backwards compatibility, I would designate ‘setVisible’ as deprecated and add two wrapper functions to your colleague’s class which clearly communicate their intent:


void Component::show() {
    setVisible(true);
}

void Component::hide() {
    setVisible(false);
}

If backwards compatibility is not an issue or if you’re designing a completely new method, you should do it the right way, and the right way is to use enums instead of booleans:


enum VISIBILITY {
    VISIBILITY_SHOW,
    VISIBILITY_HIDE
};

void Component::setVisibility(VISIBILITY visibility) {
    switch (visibility) {
    case VISIBILITY_SHOW:
        ...
        break;
    case VISIBILITY_HIDE:
        ...
        break;
    default:
        assert(false); // Invalid visibility.
    }
}

In exchange for roughly one additional minute of work for the developer, everyone gets a readable interface:


mainWindow.setVisibility(VISIBILITY_SHOW);

It’s a fact of programming that when there are two states or modes, it won’t be long until a third one pops up. No problem, if you used enum parameters from the outset:


enum VISIBILITY {
    VISIBILITY_SHOW,
    VISIBILITY_HIDE,
    VISIBILITY_TRANSPARENT,
};
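To illustrate how little has to change when the third mode arrives, here is a plain C sketch of a function that handles every enumerator. The helper ‘visibility_to_string’ is hypothetical and not part of the interface discussed in this post.

```c
enum VISIBILITY {
    VISIBILITY_SHOW,
    VISIBILITY_HIDE,
    VISIBILITY_TRANSPARENT
};

/* Hypothetical helper: maps each mode to a human-readable name.
   Supporting the new mode only required one additional 'case'. */
const char* visibility_to_string(enum VISIBILITY visibility) {
    switch (visibility) {
    case VISIBILITY_SHOW:        return "show";
    case VISIBILITY_HIDE:        return "hide";
    case VISIBILITY_TRANSPARENT: return "transparent";
    }
    return "invalid";   /* Reached only for out-of-range values. */
}
```

A side benefit of leaving out the ‘default’ label: GCC and Clang can then warn (with ‘-Wall’) when a newly added enumerator is not handled by the switch.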

In the visibility example, we have mutually exclusive modes. In the ‘setOutputControlMode’ example, however, options can be combined. Still, enums are a pragmatic solution:


enum OUTPUT_CONTROL_FLAGS {
    OCF_COPY_TO_STDERR = (1 << 0),
    OCF_EMIT_NEWLINE = (1 << 1),
    OCF_BUFFERED = (1 << 2),
};

void Logger::setOutputControlMode(OUTPUT_CONTROL_FLAGS flags);

Notice that I added the suffix ‘FLAGS’ as a hint to the reader that these options can be freely mixed with the bitwise OR operator:


setOutputControlMode(OCF_COPY_TO_STDERR | OCF_EMIT_NEWLINE);

This, of course, only works in plain C. A C++ compiler will complain that the integer result that operator ‘|’ yields cannot be converted back to an enum type, so in order to please your C++ compiler you have to apply a ‘static_cast’:


setOutputControlMode(static_cast<OUTPUT_CONTROL_FLAGS>(
    OCF_COPY_TO_STDERR | OCF_EMIT_NEWLINE));

Such casts are never pretty, but when they get the job done, they’re acceptable.
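On the receiving side, the combined flags have to be picked apart again with the bitwise AND operator. The following plain C sketch shows one way to do that; the helper ‘ocf_is_set’ is an assumption for illustration and does not appear in the ‘Logger’ class.

```c
#include <stdbool.h>

enum OUTPUT_CONTROL_FLAGS {
    OCF_COPY_TO_STDERR = (1 << 0),
    OCF_EMIT_NEWLINE   = (1 << 1),
    OCF_BUFFERED       = (1 << 2)
};

/* Hypothetical helper: returns true if 'flag' is set in 'flags'.
   The combined value is carried in an unsigned integer. */
bool ocf_is_set(unsigned flags, enum OUTPUT_CONTROL_FLAGS flag) {
    return (flags & (unsigned) flag) != 0u;
}
```

Carrying the combination in an unsigned integer is a design choice of this sketch: it sidesteps the cast, at the price of a less self-documenting parameter type.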

Let me bring this post to a close by driving home the following guidelines:

1. Never pass boolean literals (‘true’/‘false’) to functions. Define meaningful symbolic constants instead.

2. Alternatively, consider writing wrappers around functions taking boolean arguments. Remember that the best (most readable) functions take zero arguments:

“The ideal number of arguments for a function is zero (niladic). Next comes one (monadic), followed closely by two (dyadic). Three arguments (triadic) should be avoided where possible. More than three (polyadic) requires very special justification ‐ and then shouldn’t be used anyway.”
— Uncle Bob (“Clean Code”)

3. If you are the author of a method that takes options, spend the effort and code dedicated, niladic setters. If that doesn’t feel right, use enum parameters. But on no account should you ever resort to boolean arguments.

BBC micro:bit —The End of All Excuses for Not Learning Programming

“He that is good for making excuses is seldom good for anything else”
— Benjamin Franklin

The BBC micro:bit, or Micro Bit, is a small embedded computer designed by the BBC for use in computer education in the UK. Every British schoolchild of grade 6 – 8 gets a Micro Bit for free. But before you become too envious: You can buy one for about 18 EUR—just do an Internet search to find a local dealer.

In my view, the Micro Bit is an excellent platform for learning programming since it sports many attractive features, such as:

– 5 x 5 LED matrix display
– Temperature sensor
– Ambient light sensor
– Accelerometer
– Magnetometer/compass
– Bluetooth connectivity
– Input buttons
– Various GPIO ports to attach sensors/actuators
– Audio output jack

But the coolest thing about it is that it comes without friction. The development environment is available online, which means that you can do all your programming in your favorite web browser—no need to install any software or drivers. The Micro Bit presents itself as a USB mass storage device onto which you just drag & drop your executable files.

COMMON EXCUSES

Let me show you more of the Micro Bit’s features by dispelling some common excuses for not starting to write code for it today:

Excuse: I don’t possess a Micro Bit!
Response: You don’t need one. There’s a powerful simulator built right into the Block Editor coding environment. I highly recommend starting out with the Block Editor, anyway. Try it out now, before reading on! (Let’s put it this way: If I recommend something provided by Microsoft, it must really mean something, right?)

Excuse: I have no idea what to code.
Response: There are dozens of step-by-step tutorials and pre-built examples on the Block Editor page. Just click on “Projects” and then again “Projects” or “Examples.”

Excuse: Graphical programming sucks and is only for wimps.
Response: While I’m a big fan of text-based programming languages, I think it’s a good idea to toy around with the Block Editor for a while, just to familiarize yourself with Micro Bit’s features. Once you’ve learned what it’s capable of, you can do the rest of your programming in JavaScript or Python.

Excuse: I am (or want to become) a hardcore embedded coder and I’m not interested in high-level, wimpy JavaScript and Python programming.
Response: The Micro Bit is based on ARM’s Mbed IoT platform. There’s an online IDE on Mbed’s website that allows you to program your Micro Bit in C/C++, if you want.

Excuse: I live in a cabin in the woods and have poor Internet connectivity. Online programming is not an option.
Response: Once you’ve visited the Block Editor, you can access it from your browser even if there’s no Internet connection. But since you’re living in a cabin in the woods, you are probably a hardcore C/C++ coder, aren’t you? Well, thanks to the Yotta build system from Mbed, you can do command-line based builds and write your code in Emacs or vi, just as you please.

Excuse: I don’t have time.
Response: This is a lame excuse and I truly detest it. We’re all busy yet have plenty of time for things that are important to us. When we say, “I don’t have time,” we are really saying, “I’m a lazy bum and such a coward that I don’t dare confessing it to you.” Don’t get me wrong: It’s completely OK if you don’t like programming and have other priorities in life, but if you read my blog and you’ve made it this far, you should at least give it a quick try so that you can make an informed decision.

PLAYFUL TESTING

The Micro Bit does look like the perfect learning platform, doesn’t it?

In fact, there’s only one thing I wasn’t so happy about, at least initially: As far as I know, you can only use the simulator if you code in the Block Editor or JavaScript. My preferred Micro Bit language, however, is Python.

I scratched my head a bit and came to the conclusion that this was not really a problem, after all. Why? A simulator basically means manual testing, and manual testing isn’t really something we should ever aim for—it’s no fun at all. Instead, I want to use automated unit tests to ensure that my code behaves correctly even after the most minute change.

But there’s a catch: If you want to run Python unit tests on a host PC (not on the real Micro Bit device) you have a problem. Naturally, the ‘microbit’ package, which provides access to the Micro Bit’s runtime and its peripherals, isn’t part of the standard Python distribution.

As a remedy, I set out to create a little Micro Bit playground, one that comes with a ‘microbit’ package containing mock objects for all the API objects that the original ‘microbit’ package offers. Here’s the essence of the ‘happy_sad’ example contained in the playground GitHub repository:


from microbit import *

def step():
    if (button_a.is_pressed()):
        display.show(Image.HAPPY)
    elif (button_b.is_pressed()):
        display.show(Image.SAD)

def main():
    while True:
        step()
        sleep(10)

if __name__ == '__main__':
    main()

And here are the corresponding test cases using the mocked microbit.py module:


def test_no_image_shown_without_button_presses(self):
    script.step()
    script.step()
    script.step()
    self.assertFalse(mbit.display.show.called)

def test_happy_face_shown_on_button_a_press(self):
    mbit.button_a.is_pressed.return_value = True
    script.step()
    mbit.display.show.assert_called_with(mbit.Image.HAPPY)

def test_sad_face_shown_on_button_b_press(self):
    mbit.button_b.is_pressed.return_value = True
    script.step()
    mbit.display.show.assert_called_with(mbit.Image.SAD)

Just run ‘make’ to execute the test cases and ‘make install’ to transfer the code to your plugged-in Micro Bit. It can’t get any more convenient. No more excuses left!

Pointers in C, Part IV: Pointer Conversion Rules

“Failure is the key to success; each mistake teaches us something.”
— Morihei Ueshiba

Sometimes, someone walks up to you and claims that there is a bug in your well-crafted code. Then, after having successfully proved that individual wrong, it occurs to you that there is indeed a bug—albeit a different one! Those are quite humbling experiences, but experiences that we should be most grateful for.

SETTING THE STAGE

This episode was triggered by feedback that I received from a reader regarding a “Dangerously Confusing Interfaces” post. In said post, I advise that instead of accepting a pointer to “uncopied” memory like this:


void WriteAsync(const void* data, size_t len);

‘WriteAsync’ should rather take a pointer to an opaque data structure named ‘uncopied_memory’:


typedef struct {
    void* dummy;
} uncopied_memory;

void WriteAsync(const uncopied_memory* data, size_t len);

“uncopied” memory means that for the sake of efficiency, the called function doesn’t copy the provided data but instead expects you to keep it alive and unchanged while the called function is executed asynchronously. Since the suggested interface change requires an explicit cast to an ‘uncopied_memory’ pointer, it’s a lot less likely that a temporary buffer allocated from the stack is passed accidentally. The idea of the proposed approach is that every call to ‘WriteAsync’ requires an explicit cast that acts as a reminder to the programmer that the buffer’s contents must be preserved.

For instance, if you wanted to pass a structure that I used in the previous installment of this series to ‘WriteAsync’, you would do it like this:


typedef struct {
    uint8_t level;
    uint16_t temperature;
    uint32_t force;
} measurements_t;

extern measurements_t my_measurements;
...
WriteAsync((uncopied_memory*) &my_measurements, sizeof(my_measurements));

But back to the question. What the reader was worried about is that since ‘measurements_t’ and ‘uncopied_memory’ are by no means compatible, wouldn’t a cast to an ‘uncopied_memory’ pointer constitute a violation of the “strict aliasing rule“?

Actually, when it comes to the “strict aliasing rule,” the fact that these structs have incompatible members doesn’t really matter—even if you accessed the stored value through a pointer to a struct with an identical set of members you would be in trouble; if the tag names of the structs are different, it already counts as a violation of the “strict aliasing rule.”

The key word here is access. If you just create a pointer to incompatible types, everything is fine. Within ‘WriteAsync’ you just cast the received ‘uncopied_memory’ pointer into a ‘uint8_t’ pointer and access the provided data byte-wise, which is always safe, as you know (if you didn’t know, go back and read my previous post).
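The byte-wise access described above might look roughly like the sketch below. The function ‘sum_bytes’ is a made-up stand-in for whatever ‘WriteAsync’ really does with the data, and the type of the ‘dummy’ member is irrelevant here because the struct itself is never dereferenced.

```c
#include <stddef.h>

typedef struct {
    char dummy;   /* Member type does not matter for this sketch. */
} uncopied_memory;

/* Hypothetical stand-in for the internals of 'WriteAsync': it never
   accesses '*data' as an 'uncopied_memory'; it only reads the
   underlying bytes through a character pointer, which is always
   permitted by the aliasing rules. */
unsigned sum_bytes(const uncopied_memory* data, size_t len) {
    const unsigned char* bytes = (const unsigned char*) data;
    unsigned sum = 0;
    for (size_t i = 0; i < len; ++i) {
        sum += bytes[i];
    }
    return sum;
}
```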

So far, so good. We don’t access stored memory through incompatible pointers; we only do pointer conversion, which is always safe, isn’t it? I replied to my reader that everything was fine, there was no violation of the “strict aliasing rule.”

Nevertheless, I couldn’t rid myself of this nagging feeling about whether the conversion/cast is really always safe.

POINTER CONVERSION RULES

The venerable book “The C Programming Language” by Brian Kernighan and Dennis Ritchie has this to say on pointer conversions:

A pointer to one type may be converted to a pointer to another type. The resulting pointer may cause addressing exceptions if the subject pointer does not refer to an object suitably aligned in storage. It is guaranteed that a pointer to an object may be converted to a pointer to an object whose type requires less or equally strict storage alignment and back again without change; the notion of “alignment” is implementation-dependent, but objects of the char types have least strict alignment requirements. As described in Par.A.6.8, a pointer may also be converted to type void * and back again without change.

Let me paraphrase: pointer conversion is safe provided the alignment requirements of the target type are no stricter than those of the source type. In that case, the converted pointer can be converted back to the original pointer without problems.

However, the statement “The resulting pointer may cause addressing exceptions” is not clear to me. What does it mean? If the target type has stricter alignment requirements, do you get “addressing exceptions” when you create the pointer or when you access memory through it? Let’s assume that we are on a typical platform where objects of type ‘double’ are aligned on an 8-byte boundary and ‘chars’ have no alignment requirements (‘chars’ are aligned on a 1-byte boundary, so to speak):


double PI = 3.1415927;

char* pc = (char*) &PI;          // (1)
char byte0 = *pc;                // (2)

double* pd = (double*) &byte0;   // (3)
double d = *pd;                  // (4)

The conversion (1) is 100% safe and so is the corresponding read-access (2): the alignment requirements of type ‘char’ are less than the alignment requirements of type ‘double’. (4) is 100% unsafe, but what about (3)? Aren’t we just creating a pointer? To find out, I had to dig deep into my copy of the C99 language standard. Eventually, I found what I call the “pointer conversion rule”:

6.3.2.3/7 A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object.

There you have it, and stated much more precisely than in the paragraph from “The C Programming Language.” Believe it or not: if ‘byte0’ doesn’t happen to be suitably aligned for a ‘double’, statement (3), the mere pointer conversion, already gets you into the realm of undefined behavior. Who knew?
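One way to stay out of trouble is to never create the misaligned ‘double’ pointer in the first place. The sketch below shows two hypothetical helpers (both names are invented): one that tests an address for suitable alignment and one that copies the bytes into a properly aligned object instead of converting the pointer. It assumes a C11 compiler, since ‘_Alignof’ is not available in C99.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: checks whether 'p' is suitably aligned for
   'double'. Converting a pointer to uintptr_t is implementation-defined,
   but on mainstream platforms it yields the numeric address. */
bool is_aligned_for_double(const void* p) {
    return ((uintptr_t) p % _Alignof(double)) == 0;
}

/* Hypothetical helper: reads a double from arbitrary (possibly
   misaligned) bytes without ever creating a 'double*' to them. */
double read_double(const unsigned char* bytes) {
    double value;
    memcpy(&value, bytes, sizeof value);
    return value;
}
```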

So what does this mean regarding the conversion/cast from a ‘measurements_t’ pointer to an ‘uncopied_memory’ pointer? As we know from the standard, it would be safe if the alignment requirements of ‘uncopied_memory’ were no stricter than the alignment requirements of ‘measurements_t’.

In the previous example, we had to deal with primitive types (‘char’, ‘double’) whose alignment requirements can easily be determined. In order to find out about the alignment requirements for structs, we need to dive once more into the C99 standard document:

6.7.2.1/13 A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.

Meditate on this for a while. Paraphrased, this means that a struct must be aligned at least as strictly as its first member, because a pointer to the struct is also a valid pointer to that member. So the question boils down to this: Are the alignment requirements of a ‘void’ pointer (‘uncopied_memory’s first member) less than or equal to the alignment requirements of a ‘char’ (‘measurements_t’s first member)?

Of course, they’re not! A pointer type (like void*) is more or less just an integer type in disguise that is capable of holding all the addresses of your system and as such, pointer types have the same alignment requirements as regular integer types. On a 32-bit platform, pointers typically comprise 4 bytes. Thus, on typical 32-bit platforms, they will need to be aligned on 4-byte boundaries.

By contrast, a character (like the first element of measurements_t) comprises exactly one byte and thus has no alignment requirement—it can be stored at any address in memory.

Since the alignment requirements of the first element of ‘uncopied_memory’ are stronger than the alignment requirements of ‘measurements_t’, we can conclude that my advice to cast to ‘uncopied_memory’ may yield undefined behavior. Not because of the “strict aliasing rule,” but because of a violation of the “pointer conversion rules.”

To solve the problem, the type of the ‘dummy’ member of ‘uncopied_memory’ needs to be changed to ‘char’, a type that has the weakest alignment requirements. I have updated the “Dangerously Confusing Interfaces” post accordingly.
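If you want to see the effect of the ‘dummy’ member’s type on your own platform, C11’s ‘_Alignof’ operator can report the alignment of both variants. This is only a sketch: the two typedef names and the accessor functions are invented here so that both variants can coexist in one file, and the actual numbers are implementation-defined.

```c
#include <stddef.h>

/* Variant with the original member type. */
typedef struct { void* dummy; } uncopied_memory_ptr;

/* Variant with the corrected member type. */
typedef struct { char dummy; } uncopied_memory_char;

/* Hypothetical accessors that report the alignment of each variant. */
size_t alignment_with_pointer_member(void) {
    return _Alignof(uncopied_memory_ptr);
}

size_t alignment_with_char_member(void) {
    return _Alignof(uncopied_memory_char);
}
```

On mainstream platforms, one would expect the ‘char’ variant to report the weaker (smaller) alignment of the two.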

Pointers in C, Part III: The Strict Aliasing Rule

“Know the rules well, so you can break them effectively.”
— Dalai Lama XIV

One of the lesser-known secrets of the C programming language is the so-called “strict aliasing rule”. This is a shame, because failing to adhere to it takes you (along with your code) straight into the realm of undefined behavior. As no one in their right mind wants to go there, let’s shed some light on it!

POINTER ALIASING DEFINED

First of all, we have to clarify what “aliasing” really means, or rather aliasing of pointers. Take a look at this example:


int value;

int* p1 = &value;   // p1 points to 'value'.
int* p2 = &value;   // p2 as well...

Here, ‘p1’ and ‘p2’ are aliased to the same object ‘value’; that is, they point to the same object. If you update ‘value’ through ‘p1’:


*p1 = 42;

a read through ‘p2’ will reflect this change:


assert((*p1 == *p2) && (value == *p2)); // So true...

Because of the possibility of aliasing, a C compiler is prevented from applying certain optimizations. Consider:


int silly(int* x, int* y) {
    *x = 0;
    *y = 1;
    return *x;
}

You might think that any decent compiler would generate simplified code equivalent to this:


int silly(int* x, int* y) {
    *x = 0;
    *y = 1;
    return 0;   // *x was previously set to 0, so don't load from memory again.
}

It’s not a matter of decency — the compiler just can’t do this optimization! Here’s the assembly output that clearly shows that the return value is loaded from memory:


$ gcc -O2 -masm=intel silly.c -S && cat silly.s


silly:
        mov     DWORD PTR [rdi], 0
        mov     DWORD PTR [rsi], 1
        mov     eax, DWORD PTR [rdi] ; '*x' fetched from memory.
        ret

The optimization is not possible because the caller could call ‘silly’ like so:


int value;
silly(&value, &value);

In this case, ‘x’ and ‘y’ are aliased to the same ‘value’, which means ‘silly’ must return 1, not 0. Consequently, ‘*x’ must be read from memory, every time. Period.
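
To see both cases side by side, here is a small sketch that calls ‘silly’ once with two distinct objects and once with aliased pointers. The helper ‘demo_silly’ is not part of the original example; it is only there to make the two outcomes explicit:

```c
#include <assert.h>

int silly(int* x, int* y) {
    *x = 0;
    *y = 1;
    return *x;
}

/* Hypothetical helper that exercises both the non-aliased and the aliased case. */
static void demo_silly(void) {
    int a;
    int b;
    int value;

    /* Distinct objects: the write through 'y' does not touch '*x'. */
    assert(silly(&a, &b) == 0);

    /* Aliased pointers: the write through 'y' overwrites '*x'. */
    assert(silly(&value, &value) == 1);
}
```

Both calls are perfectly legal C, which is why the compiler has to generate code that gets the second one right.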

ROOM FOR IMPROVEMENT

If you think about it, even though it may happen, pointer aliasing won’t happen very often in practice. Why waste so much potential for optimization for the uncommon case? Most likely, the folks from the C standards committee had the same line of thinking. They introduced rules that state when pointer aliasing must not happen. Enter the strict aliasing rule.

To facilitate compiler optimization, the strict aliasing rule demands that (in simple words) pointers to incompatible types never alias. Pointers to compatible types (like the two ‘int’ pointers ‘x’ and ‘y’ in ‘silly’) are assumed to (potentially) alias. Let’s make the pointer types incompatible (‘short*’ vs. ‘int*’):


int silly2(short* x, int* y) {
    *x = 0;
    *y = 1;
    return *x;
}


$ gcc -O2 -masm=intel silly2.c -S && cat silly2.s


silly2:
        xor     eax, eax            ; eax = 0 (equivalent to mov eax, 0)
        mov     WORD PTR [rdi], ax  ; '*x = 0' stores the low 16 bits of eax
        mov     DWORD PTR [rsi], 1
        ret                         ; returns 0 without reloading '*x'

As you can see, this time no load from memory is performed — 0 is returned instead. The optimization is possible because the compiler assumes that aliasing is not allowed in this case.

VIOLATIONS

But what happens if pointers to incompatible types nevertheless alias? After all, this can happen quite easily. Maybe not in the ‘silly’ example, but in real-world production code:


struct measurements_t {
    uint8_t level;
    uint16_t temperature;
    uint32_t force;
};

void convert(const uint8_t* data, struct measurements_t* measurements) {
    /* Fill measurements object with raw data. */
    *measurements = *((struct measurements_t*) &data[0]);
}

In an attempt to convert data stored in a buffer (maybe read over a network connection) into a high-level structure, a pointer to ‘struct measurements_t’ is aliased with a pointer to a ‘uint8_t’. Since both types are incompatible (pointer to struct vs. pointer to ‘uint8_t’) this code is a violation of the strict aliasing rule. Experienced C developers most likely recognized immediately that this code yields undefined behavior, but they would have probably attributed it to struct padding and alignment issues. The real reason, as we know by now, is a violation of the strict aliasing rule.
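
One commonly used way to avoid the problem is to decode the buffer field by field instead of casting the pointer. The following is only a sketch under assumptions that are not in the original code: the function name ‘convert_safe’ is made up, and the wire layout (one byte ‘level’, two bytes ‘temperature’, four bytes ‘force’, little-endian, no padding) is hypothetical:

```c
#include <stdint.h>

struct measurements_t {
    uint8_t level;
    uint16_t temperature;
    uint32_t force;
};

/* Hypothetical replacement for 'convert'. Assumes a 7-byte, little-endian,
 * unpadded wire format; adapt the offsets and byte order to the real protocol. */
void convert_safe(const uint8_t* data, struct measurements_t* measurements) {
    measurements->level = data[0];
    measurements->temperature =
        (uint16_t) (data[1] | ((uint16_t) data[2] << 8));
    measurements->force = (uint32_t) data[3]
                        | ((uint32_t) data[4] << 8)
                        | ((uint32_t) data[5] << 16)
                        | ((uint32_t) data[6] << 24);
}
```

Since the buffer is only ever read through ‘uint8_t’ lvalues, no ‘struct measurements_t’ access is made to memory that doesn’t hold such an object, and the result no longer depends on struct padding, alignment, or the endianness of the host.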

THE FINE PRINT

So what exactly is the strict aliasing rule and what does “type compatibility” mean? Here’s an excerpt from the ISO C99 standard, section 6.5:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

a type compatible with the effective type of the object,

a qualified version of a type compatible with the effective type of the object,

a type that is the signed or unsigned type corresponding to the effective type of the object,

a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,

an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or

a character type.

Such Standardeese is often hard to digest, so let me try to clarify it a bit. Aliased pointer access is fine if:

1. The pointed-at types are identical. Note that typedefs are just type aliases and don’t introduce new types:


typedef int INT;
INT* p = ...;
int x = *((int*) p);    // Fine, and the cast is not really necessary!

2. The pointed-at types are identical apart from the “signed-ness” (e. g. ‘int’ vs. ‘unsigned int’).
3. The pointed-at types are identical apart from qualification (e. g. ‘const int’ vs. ‘int’).
4. The rule “an aggregate or union type that includes one of the aforementioned types among its members” is highly confusing and probably doesn’t mean much. Check this out for details.
5. The pointed-at types are different, but the access is made through a pointer to a character type:


float f = 3.1415;
unsigned char* p = (unsigned char*) &f;
unsigned char a1 = p[0];   // First byte of 'f'.
unsigned char a2 = p[1];   // :
unsigned char a3 = p[2];   // :
unsigned char a4 = p[3];   // Last byte of 'f'.
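
The character-type exception is also what makes bytewise copying legal. As a sketch (the names ‘copy_bytes’ and ‘float_bits’ are made up for this illustration, and a 32-bit ‘float’ is assumed), the representation of a ‘float’ can be moved into an integer without ever reading the ‘float’ through an incompatible lvalue:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies 'size' bytes using character-type accesses only. */
static void copy_bytes(void* dst, const void* src, size_t size) {
    unsigned char* d = dst;
    const unsigned char* s = src;
    for (size_t i = 0; i < size; ++i) {
        d[i] = s[i];
    }
}

/* Hypothetical helper: returns the object representation of 'f' as an integer. */
static uint32_t float_bits(float f) {
    uint32_t bits = 0;
    _Static_assert(sizeof(uint32_t) == sizeof(float),
                   "sketch assumes a 32-bit float");
    copy_bytes(&bits, &f, sizeof bits);
    return bits;
}
```

By contrast, writing ‘*(uint32_t*) &f’ would read a ‘float’ object through an ‘uint32_t’ lvalue, which is not covered by any of the cases listed above.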

Conversely, aliased pointer access is not defined if the pointed-at types are fundamentally different. Note that this includes pointers to structs that are identically defined but have different tag names:


struct S1 { int x; }; // tag 'S1'.
struct S2 { int x; }; // tag 'S2'.

struct S1 a = { 42 };
struct S1* s1 = &a;
struct S2 b = *((struct S2*) s1);     // Undefined behavior!

CONCLUSION

The strict aliasing rule was introduced to give the compiler vendors some leeway regarding optimizations. By default, the compiler assumes that pointers to (loosely speaking) incompatible types never alias. As a consequence, you, the programmer, have to make sure that this rule is obeyed.

Here’s some disquieting news: a lot of existing code doesn’t conform to the strict aliasing rule, but works (or appears to work) fine anyway. As an example, the ‘convert’ function above, which aliases a struct to an array of bytes, might work fine on an Intel x86-based platform, which supports unaligned memory access. However, if you use ‘convert’ on an (older) ARM-based platform, you might get a “bus error” exception that could crash your system. In other cases, nonconforming code just works by coincidence, with a particular compiler or a particular compiler version at a particular optimization level.

To me, knowing about the strict aliasing rule is as important for every systems developer as knowing about the other systems programming “secrets” like alignment, struct padding, and endianness.

A GCC Compiler Mistake

“Most of the evil in this world is done by people with good intentions.”
— T.S. Eliot

Errors, defects, bugs, blunders: when we talk about software-related errors, we often use these terms loosely and synonymously, but there are differences. For instance, in his book “The Design of Everyday Things”, Donald A. Norman makes a clear distinction between “mistakes” and “slips”:

“Errors come in several forms. Two fundamental categories are slips and mistakes. Slips result from automatic behavior, when subconscious actions that are intended to satisfy our goals get waylaid en route. Mistakes result from conscious deliberations.”

In short: mistakes are the result of faulty ideas whereas slips are errors made when implementing an idea. Usually, slips are not just easy to make, but also easy to fix. Fixing mistakes is typically much harder.

One of the easiest slips to make in C/C++ is to inadvertently do a boolean test on an assignment expression:


if (a = b) {    // Oops! Should have been a == b.
    ...
}

which is equivalent to:


if ((a = b) != 0) {
    ...
}

While in some rare cases this is exactly what the developer had in mind, in 99% of all cases it’s not. Hence, boolean-testing assignments is explicitly banned by many C/C++ coding standards and frowned upon by most developers.

But what’s all the fuss about, you might ask. If an unlucky developer forgets to type the second ‘=’, any decent 21st century compiler surely generates a warning, doesn’t it? Well, the answer is, as we shall see, both yes and no.

If you compile the example above with GCC (I’ve tried version 5.4.0) using options ‘-W -Wall’, you do get a warning:

warning: suggest parentheses around assignment used as truth value

GCC’s reasoning is this: if developers really wanted to truth test the assignment (there are still people out there who do, as strange as this may sound), they need to put an extra pair of parentheses around the assignment, to show their intent:


if ((a = b)) {    // Warning is gone.
    ...
}

Requiring an extra set of parentheses seems to be a neat idea, but it’s the devil in disguise. For one thing, it reminds me of Sledge Hammer saying “Trust me, I know what I’m doing” (which was usually followed by disaster); for another, it doesn’t work reliably. To explain, I first need to put the same slip in a slightly more complicated expression:


if (a == b && c = d) {  // Should be c == d.
    ...
}

In this case, you don’t just get a warning: the compiler refuses to compile the code. Why? According to C’s precedence rules, the assignment operator has lower priority than the ‘&&’ operator, which means that the code is equivalent to


if (((a == b) && c) = d) {  // Should be c == d.
    ...
}

The C language standard says that the result of an ‘&&’ expression is a so-called “rvalue” and an rvalue is more or less read-only. Thus, assigning ‘d’ to it is just not possible and GCC is right when it barks:

error: lvalue required as left operand of assignment

A slip that doesn’t compile is a kind slip, you might think, but read on. We only got lucky by accident, so to speak.

Many coding standards, like MISRA, for instance, require that you put parentheses around subexpressions to clearly show what precedence you have in mind, instead of relying on obscure operator precedence rules. Hence, instead of


if (a == b && c == d)   // violates MISRA-C 2012, Rule 12.1: 
                        // "The precedence of operators within
                        //  expressions should be made explicit"

you have to write


if ((a == b) && (c == d))   // MISRA compliant

MISRA exists to make coding errors unlikely, but if a MISRA-abiding developer forgets the second ‘=’, he’s out of luck, at least if he’s using GCC:


if ((a == b) && (c = d))   // MISRA compliant, but a slip anyway...

Now the devil reveals himself: since the parentheses are properly placed, there is no attempt to assign to an rvalue, so there won’t be a compile-time error and because of GCC’s “parentheses feature” mentioned above, GCC doesn’t issue a warning, either.
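
To make the consequences of the slip tangible, here is a small sketch. The function ‘both_equal_slip’ is made up for this illustration; it was meant to test ‘a == b && c == d’, but contains the parenthesized assignment instead:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical function containing the slip: '(*c = d)' should be '(*c == d)'. */
static bool both_equal_slip(int a, int b, int* c, int d) {
    if ((a == b) && (*c = d)) {   /* Assigns 'd' to '*c', then tests it against 0. */
        return true;
    }
    return false;
}

/* Hypothetical helper that shows how the slip behaves at run time. */
static void demo_slip(void) {
    int c = 5;

    /* 5 != 7, yet the "comparison" succeeds and 'c' is silently overwritten. */
    assert(both_equal_slip(1, 1, &c, 7) == true);
    assert(c == 7);

    /* With d == 0 the condition is false, and 'c' is clobbered all the same. */
    assert(both_equal_slip(1, 1, &c, 0) == false);
    assert(c == 0);
}
```

The result depends only on whether ‘d’ is nonzero, not on whether ‘c’ and ‘d’ are equal, and the caller’s variable is modified as a side effect.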

Early in my career as a software developer, I read the aforementioned book “The Design of Everyday Things” and I believe it left a mark on me. One of the book’s unforgettable key lessons is this:

“When you have trouble with things — whether it’s figuring out whether to push or pull a door or the arbitrary vagaries of the modern computer and electronics industry — it’s not your fault. Don’t blame yourself: blame the designer.”

GCC’s “extra parentheses” feature is far from neat design — it’s rather bad design that doesn’t work in all contexts and gives developers a false sense of security. It was deliberately put in and correctly implemented but the idea was wrong from the outset. Thus, it’s not a slip, but obviously a mistake.

Approxion

Code – People – Everything
