–Mark Twain
Buffer overflows are among the most frequent causes of security flaws in software. They typically arise in situations such as when a programmer is 100% certain that the buffer to hold a user’s name is big enough — until a guy from India logs in. Thus, well-behaved developers always use the bounded-length versions of string functions. Alas, they come with differing, dangerously confusing interfaces.
THE GOOD
Let’s start with ‘fgets’:

    char buffer[30];
    /* 30 bytes ought to be enough for everyone! */
    fgets(buffer, sizeof(buffer), stdin);
No matter what users type into their terminals, ‘fgets’ will ensure that ‘buffer’ is a well-formed, zero-terminated string of at most 29 characters (one character is needed for the ‘\0’ terminator). The same goes for the ‘snprintf’ function. After executing the following code

    char buffer[4];
    snprintf(buffer, sizeof(buffer), "The quick brown fox");

‘buffer’ will contain the string “The”, again, properly zero-terminated.
Both functions follow the same, easy-to-grasp pattern: you pass a pointer to a target buffer as well as the buffer’s total size and get back a terminated string that doesn’t overflow the buffer. Awesome!
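One detail worth adding to this pattern: ‘snprintf’ also tells you whether it had to cut the output short, because it returns the length the complete output would have had. The following is a minimal sketch of how that return value can be checked; the helper name ‘format_greeting’ and the greeting text are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "Hello, <name>!" into dst.
   Returns 1 if the full text fit, 0 if it was truncated or an error occurred. */
static int format_greeting(char *dst, size_t dst_size, const char *name)
{
    assert(dst != NULL && name != NULL && dst_size > 0);

    /* snprintf returns the number of characters the complete output
       would need (not counting the '\0'), or a negative value on error. */
    int needed = snprintf(dst, dst_size, "Hello, %s!", name);

    return needed >= 0 && (size_t)needed < dst_size;
}
```

Whether or not the text fit, ‘dst’ holds a terminated string afterwards; the return value only tells the caller whether something was lost.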
THE BAD
In order to copy strings safely, developers often reach for ‘strncpy’ to guard themselves against dreaded buffer overruns:

    char buffer[30];
    /* 30 bytes ought to be enough for everyone! */
    strncpy(buffer, user_name, sizeof(buffer)); /* safer than good ol' strcpy? */
Unfortunately, this is not how ‘strncpy’ works! We assumed that it followed the pattern established by ‘fgets’ and ‘snprintf’, but that’s not the case. Although ‘strncpy’ never overflows the target buffer, it doesn’t necessarily zero-terminate it. It copies up to ‘sizeof(buffer)’ bytes from ‘user_name’ to ‘buffer’, but if none of the copied bytes is ‘\0’ (i.e. ‘user_name’ comprises at least ‘sizeof(buffer)’ characters), ‘strncpy’ leaves you with an unterminated string! The traditional fix for this shortcoming is to enforce zero-termination by storing a ‘\0’ in the last element of the target buffer after the call to ‘strncpy’:

    strncpy(buffer, user_name, sizeof(buffer));
    buffer[sizeof(buffer) - 1] = '\0';

Using ‘strncpy’ without such explicit string termination is almost always an error, and a rather insidious one: your code will work most of the time but fail when the source string fills the buffer completely (e.g. when your Indian colleague “Villupuram Chinnaih Pillai Ganesan” logs on).
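To make the failure mode visible, here is a small sketch that wraps the copy-then-terminate idiom and reports whether the source had to be cut. Both helper names (‘is_terminated’ and ‘copy_terminated’) are hypothetical and not part of the standard library.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: is there a '\0' within the first buf_size bytes? */
static int is_terminated(const char *buf, size_t buf_size)
{
    return memchr(buf, '\0', buf_size) != NULL;
}

/* Hypothetical wrapper around strncpy that always terminates dst.
   Returns 1 if src fit completely, 0 if it was truncated. */
static int copy_terminated(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    strncpy(dst, src, dst_size);

    /* If strncpy copied no '\0', src had at least dst_size characters. */
    int fit = is_terminated(dst, dst_size);

    dst[dst_size - 1] = '\0';   /* enforce termination in every case */
    return fit;
}
```

A caller can then decide whether a truncated user name is acceptable or should be rejected, instead of silently working with a clipped string.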
Boy, oh boy is this inconsistent! ‘fgets’ and ‘snprintf’ give you guaranteed zero-termination but ‘strncpy’ doesn’t. A clear violation of the principle of least surprise. Apparently, ‘strncpy’ fixes one safety problem and at the same time lays the foundation for another one.
THE UGLY
Can it get worse? You bet! How do you think ‘strncat’, the bounded-length string concatenation function, behaves? Ponder this code:

    const char* string1 = "123";
    const char* string2 = "4567890";
    char buffer[7];

    /* First, safely fill buffer with string1. */
    strncpy(buffer, string1, sizeof(buffer));
    buffer[sizeof(buffer) - 1] = '\0';

    /* Next, concatenate strings. */
    strncat(buffer, string2, sizeof(buffer));

But this is wrong, of course: the third argument to ‘strncat’ (let’s call it ‘n’) is not the size of the target buffer. It is the maximum number of characters to copy from the source string (‘string2’) to the destination buffer (‘buffer’). If the length of the source string is greater than or equal to ‘n’, ‘strncat’ copies ‘n’ characters plus a ‘\0’ to terminate the target string. In the code above, ‘buffer’ already holds the three characters of ‘string1’, so appending all seven characters of ‘string2’ plus the terminator needs 11 bytes, four more than the 7-byte buffer provides. Confused? Don’t worry, here’s how you would use it to avoid concatenation buffer overruns:

    strncat(buffer, string2,
            sizeof(buffer) - strlen(buffer) - 1); /* -1 to account for '\0'. */

Yuck! What’s the likelihood that people remember this correctly?
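One way to avoid having to remember the arithmetic is to write it down once. The following sketch hides the calculation behind a hypothetical helper called ‘append_bounded’; it assumes that ‘dst’ already contains a terminated string and that ‘dst_size’ is the total size of the buffer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: appends src to the string in dst without writing
   more than dst_size bytes in total.
   Returns 1 if all of src fit, 0 if it was truncated. */
static int append_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    size_t used = strlen(dst);
    assert(used < dst_size);             /* dst must already be terminated */

    size_t room = dst_size - used - 1;   /* characters that fit before '\0' */

    strncat(dst, src, room);             /* copies at most room chars, adds '\0' */
    return strlen(src) <= room;
}
```

With the 7-byte ‘buffer’ from the example above holding “123”, appending “4567890” through this helper leaves “123456” in the buffer and reports the truncation.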
THE REMEDY
Even if the different interfaces and behaviors of the bounded-length string functions in the C API make sense for certain use cases (or made sense at some point in time), the upshot is that they confuse programmers and potentially lead to new security holes when in fact they were supposed to plug them. What’s a poor C coder supposed to do?
As always, you can roll your own versions of bounded/safe string functions or use my safe version of ‘strcpy’. If you would rather use something from the standard library, I’d suggest ‘snprintf’ as a replacement for both ‘strncpy’ and ‘strncat’:

    /* Safe replacement for 'strncpy' */
    snprintf(buffer, sizeof(buffer), "%s", string1);

    /* Safe replacement for 'strncat' */
    snprintf(buffer, sizeof(buffer), "%s%s", string1, string2);

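A caveat on the concatenation variant: it builds the result from two separate source strings. If the first part already lives in ‘buffer’, passing ‘buffer’ as one of the ‘%s’ arguments would make source and destination overlap, which the C standard leaves undefined. A possible way around this, sketched here with a hypothetical helper named ‘append_with_snprintf’, is to let ‘snprintf’ write at the current end of the string instead (‘src’ must still not point into ‘dst’).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: appends src to the string already stored in dst
   by formatting into the unused tail of the buffer.
   Returns 1 if src fit completely, 0 on truncation or error. */
static int append_with_snprintf(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    size_t used = strlen(dst);
    assert(used < dst_size);              /* dst must already be terminated */

    size_t remaining = dst_size - used;   /* includes room for the '\0' */
    int needed = snprintf(dst + used, remaining, "%s", src);

    return needed >= 0 && (size_t)needed < remaining;
}
```

As with the other sketches, the destination is terminated in every case and the return value tells the caller whether anything was dropped.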
Looks like ‘snprintf’ is the Swiss Army knife of safe string processing, doesn’t it? The moral is this: use whatever you’re comfortable with, but refrain from using ‘strncpy’ or ‘strncat’ directly.