Gotchas: Difference between revisions

Because <code>sizeof</code> applied to a pointer returns the size of the pointer itself (typically 4 bytes on a 32-bit target and 8 bytes on a 64-bit one), you'll get the same return value from <code>gotcha</code> regardless of how many elements the array has. When C programmers talk about "arrays decaying into pointers", this is what they're referring to.
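
A common workaround, sketched here under the assumption that the caller still has the real array in scope, is to compute the element count before the array decays and pass it along with the pointer. The names <code>sum_elements</code> and <code>ARRAY_LEN</code> are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the element count travels alongside the pointer,
   because sizeof(values) in here would only give the size of an int*. */
long sum_elements(const int *values, size_t count)
{
    assert(values != NULL || count == 0);
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}

/* Element count of a true array; only valid where the array has not decayed. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))
```

A caller that declares <code>int data[] = {1, 2, 3, 4, 5};</code> would call <code>sum_elements(data, ARRAY_LEN(data))</code>. Using <code>ARRAY_LEN</code> on a pointer parameter would silently give the wrong answer, for the reason described above.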
 
===gets()===
Computer science teachers (and most compilers, via warnings) will tell you never to use <code>gets()</code>. The function takes a pointer to a <code>char</code> array, reads a line from standard input, and stores it at that address. What makes it infamous and unsafe relates to the section above: <code>gets()</code> receives only a pointer and has no information about the size of the array it is writing to. It keeps copying input until it reaches a newline (i.e. until the user presses Enter) or end-of-file. Since the user can type without limit and the function doesn't know the array's size, any input beyond the array's capacity overwrites whatever is in memory after the array.
 
Local arrays are normally allocated on the hardware stack, so the overflow can overwrite the function's return address. This is the classic stack buffer overflow exploit: an attacker who knows a function's address and the endianness of the CPU architecture can use <code>gets()</code> to redirect execution to any function in the program. For this reason <code>gets()</code> was removed from the language in the C11 standard.
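
The usual replacement is <code>fgets()</code>, which takes the buffer size as an argument. The following is a minimal sketch rather than a definitive recipe; <code>read_line</code> is a hypothetical wrapper name, and it assumes the buffer size fits in an <code>int</code>.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads at most size-1 characters of one line from `in`
   into `buf` and strips the trailing newline if it was read.
   Returns 1 on success, 0 on end-of-file or read error. */
int read_line(char *buf, size_t size, FILE *in)
{
    assert(buf != NULL && size > 0 && in != NULL);
    if (fgets(buf, (int)size, in) == NULL) {
        buf[0] = '\0';
        return 0;
    }
    buf[strcspn(buf, "\n")] = '\0';   /* drop the newline, if it fit */
    return 1;
}
```

With <code>char name[8];</code>, a call such as <code>read_line(name, sizeof name, stdin)</code> stores at most seven characters plus the terminator, however much the user types. The rest of an overlong line stays in the input stream for the next read.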
 
===printf()===
The first function every C programmer learns (besides <code>main</code>), <code>printf</code> can be exploited in a similar fashion to <code>gets()</code>, but only if the programmer is careless. <code>printf</code> can in principle take any number of arguments; however, no CPU supports variadic functions in hardware, in the sense that the callee knows how many arguments were passed without help from the caller (e.g. a variable holding the argument count, as in <code>int argc, char** argv</code>).
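
To illustrate the "help from the caller" idea, here is a small sketch of a variadic function that relies on an explicit count. The function <code>sum_ints</code> is hypothetical; it simply trusts that the count matches the arguments actually passed.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical example: the callee cannot discover how many arguments
   were passed, so the caller states the count explicitly. */
int sum_ints(int count, ...)
{
    assert(count >= 0);
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int);   /* trusts that `count` is accurate */
    }
    va_end(args);
    return total;
}
```

Calling <code>sum_ints(3, 1, 2, 3)</code> adds the three values. Passing a count larger than the number of actual arguments makes <code>va_arg</code> read values that were never supplied, which is undefined behavior.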
 
<code>printf</code> gets around this with a trick: the format string. Every time an unescaped <code>%</code> is encountered in the format string, <code>printf()</code> performs the substitution using the next function argument, which, depending on the calling convention, comes first from registers and then from the stack. This isn't a problem as long as you never let the user supply the format string. If the format string has more unescaped <code>%</code> specifiers than there are arguments, the behavior is undefined; in practice <code>printf()</code> reads whatever is in the next argument registers or stack slots and treats it as the "missing" arguments. This lets a malicious user inspect the stack, including saved frame pointers and return addresses, which can be useful in figuring out other ways of exploiting the program.
 
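<syntaxhighlight lang="C">#include <stdio.h>

int main(void)
{
    int x = 3;
    int y = 5;
    int z = 7;
    /* Deliberately wrong: five conversions but only three arguments (undefined behavior).
       On 32-bit x86 the first %x typically reveals the saved %ebp and the second the return address. */
    printf("%d %d %d %x %x", x, y, z);
    return 0;
}</syntaxhighlight>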
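
The standard defense is to keep the format string constant and pass untrusted text as an argument. The sketch below uses <code>snprintf</code> so the result can be inspected; <code>format_user_text</code> is a hypothetical name chosen for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies untrusted text into `out` as data.
   The format string is a constant, so any '%' in user_text is printed literally. */
int format_user_text(char *out, size_t size, const char *user_text)
{
    assert(out != NULL && user_text != NULL);
    return snprintf(out, size, "%s", user_text);
    /* Unsafe counterpart: snprintf(out, size, user_text); */
}
```

The same rule applies to <code>printf</code> itself: <code>printf("%s", input)</code> treats <code>input</code> as data, whereas <code>printf(input)</code> lets the user choose the conversions.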
 
=={{header|J}}==