Last time I wrote about an ABI pitfall involving callee vs. caller saved registers. This time, I’ll talk about a particularly nasty bug involving the difference between call by value and call by reference.
Review
I expect most readers will already have a handle on the difference between call by value and call by reference, and will be more interested in the bug that is the subject of this post. If you fall into this category, you may want to skip this section; it’s just a review of that material.
Call by value vs. reference concerns the way parameters are passed to subroutines. With call by value, the subroutine gets its own private copy of the data. In contrast, with call by reference, the subroutine is provided with the location (reference) of the original data. The most important consequence is that with call by value, modifications to the data are not seen by the caller, whereas with call by reference they are. Consider the following C program:
#include <stdio.h>

int subroutine(int x) {
    x++;
    return x * 2;
}

int main(void) {
    int y = 4;
    printf("Before call : %d\n", y);
    printf("Return value : %d\n", subroutine(y));
    printf("After call : %d\n", y);
    return 0;
}
First, we print the value of y, which is 4. Then, we pass y to our subroutine, which first increments its argument, then returns twice the new value. We then print the result, which is 10 ((y+1) * 2). We then print y again, which yields its original value of 4. The complete output is:
Before call : 4
Return value : 10
After call : 4
C uses call by value for parameter passing, but it also has pointers, which we can use to achieve call by reference. Let’s modify our program as follows:
#include <stdio.h>

int subroutine(int *x) {
    (*x)++;
    return (*x) * 2;
}

int main(void) {
    int y = 4;
    printf("Before call : %d\n", y);
    printf("Return value : %d\n", subroutine(&y));
    printf("After call : %d\n", y);
    return 0;
}
It does almost the same thing, but now we give the subroutine a pointer to y, instead of y itself. The output is as follows:
Before call : 4
Return value : 10
After call : 5
This time, the increment inside the call is reflected in the caller, and so after the call y is equal to 5.
The Bug
The bug in question is something I hit in one of my early attempts at x86 interrupt handling code. The interrupt handlers themselves were short snippets of assembly which pushed the interrupt number onto the stack (along with a dummy error code, for the interrupts that didn’t supply one), and then jumped to a stub which looked something like this:
isr_stub:
    /* save the general purpose registers */
    pusha
    /* save the old data segment */
    movw %ds, %ax
    pushl %eax
    /* load the kernel data segment: */
    movw $0x10, %ax
    movw %ax, %ds
    movw %ax, %es
    movw %ax, %fs
    movw %ax, %gs
    /* clear the direction flag, as compiled C code expects */
    cld
    call int_handler_common
    /* clean up the stack and restore the old data segment. */
    popl %eax
    movw %ax, %ds
    movw %ax, %es
    movw %ax, %fs
    movw %ax, %gs
    popa
    /* discard the interrupt number and error code */
    addl $8, %esp
    iret
It saves the registers and the current data segment, calls the interrupt handler written in C, and then restores them.
When the x86 hits a second fault while it is trying to invoke the handler for an earlier one, it invokes the “double fault handler.” If a fault happens again while invoking that, the result is called a triple fault, and the machine simply resets. I was getting triple faults.
After some time examining things in the Bochs emulator, I suspected I’d made some kind of mistake setting up the GDT, which was why restoring the data segment was giving me trouble. I believed this because the value that ended up in the segment registers at the end of the stub was the kernel code segment, rather than the data segment. I spent the better part of a day working under that assumption, which turned out to be a complete red herring.
The problem was that the C handlers were actually set up to accept, by value, a struct containing the information I’d saved on the stack (registers, data segment, and interrupt info). Because C uses call by value, the C routine expects that it has its own private copy of the arguments passed to it, and it is free to overwrite them. It was therefore invalid for my assembly code to expect that data to be unchanged upon return.
Note that this is true even though my C routine, as written, didn’t modify its argument at all. The C compiler can still take advantage of the calling convention to do optimization. Sure enough, everything worked fine if I compiled with no optimizations. It even worked fine with just -O1 (the most basic optimizations). With -O2 and above, however: triple faults.
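To make the ownership point concrete, here is a minimal sketch in C. The struct layout, field names, and function names are assumptions for illustration, not the original kernel code. It shows that a by-value parameter is storage the callee may overwrite, and that a C caller is only safe because the compiler makes the copy for it, which a hand-written assembly stub does not do.

```c
#include <stdint.h>

/* Hypothetical layout of what the stub leaves on the stack, lowest address
 * first. The field names and the order of int_no/err_code are assumptions. */
struct saved_state {
    uint32_t ds;                                      /* pushed by the stub */
    uint32_t edi, esi, ebp, esp, ebx, edx, ecx, eax;  /* pushed by pusha */
    uint32_t int_no, err_code;                        /* pushed by the handler snippet */
};

/* By-value parameter: this storage belongs to the callee, so the callee
 * (or the compiler, on its behalf) is free to overwrite it. */
uint32_t handle_by_value(struct saved_state s) {
    s.ds = 0x08;        /* legal: only the callee's copy changes */
    return s.int_no;
}

/* A C caller is unaffected, because the compiler copies *original into
 * the argument area before the call. */
uint32_t call_from_c(struct saved_state *original) {
    return handle_by_value(*original);
}
```

When the "caller" is an assembly stub that builds the struct directly in the argument area, there is no second copy, so anything the callee does to its parameter lands on the only saved state there is.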
When I hit this, I narrowed the problem down to one particular optimization flag, which turned on tail call optimization. Tail call optimization is a technique used to save stack space in the event that a function calls another function immediately before returning. For example, consider:
int foo(int);

int bar(int x) {
    return foo(x * 2);
}

int baz(int x) {
    return 1 + foo(x);
}
In this case, both bar and baz make calls to foo, but only bar makes a tail call. Once the result of foo(x * 2) is obtained, bar doesn’t need to do anything else. baz, however, must take the result of foo(x) and add 1 before returning.
Normally, when you make a function call, you allocate a new stack frame - whatever space you need for arguments and local variables is reserved on the stack. Tail call optimization looks for tail calls, and where it finds them, just reuses the existing stack frame for the subroutine call. This is fine since the local variables and arguments to (in our example) bar won’t be needed after foo returns.
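As a rough model of what that reuse means for the caller's data, the sketch below writes out by hand the effect the optimization can have on bar. It is an illustration only, not actual compiler output. The body of foo and the name bar_frame_reused are made up for the example.

```c
/* Stand-in body so the example links; the original only declares foo. */
int foo(int x) {
    return x + 1;
}

/* bar as written in the example above. */
int bar(int x) {
    return foo(x * 2);
}

/* Hand-written model of bar after tail call optimization: the slot that
 * held bar's argument is overwritten with foo's argument, and control
 * passes to foo without a fresh frame being set up. */
int bar_frame_reused(int x) {
    x = x * 2;          /* the original argument value is gone from here on */
    return foo(x);
}
```

Both versions return the same result, which is why the compiler is allowed to do this. The only observable difference is that the memory which held the original argument no longer holds it afterwards.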
My interrupt handler written in C was doing a tail call, and so with that optimization turned on, the C compiler overwrote the stack frame, trashing the saved data segment that I’d incorrectly assumed would survive the call.
This took me the better part of two days to figure out way back when I hit it. In a more recent descendant of this code, I passed in a pointer to the saved data, and modified the C handler accordingly. The compiler can still clobber the handler’s stack frame, but now the data I’m interested in is safely outside of it.
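A minimal sketch of what the pointer-based handler might look like follows. The struct layout, the void return type, and the log_interrupt helper are assumptions for illustration; only the name int_handler_common comes from the stub above.

```c
#include <stdint.h>

/* Hypothetical layout of the saved state; names and order are assumptions. */
struct saved_state {
    uint32_t ds;
    uint32_t edi, esi, ebp, esp, ebx, edx, ecx, eax;
    uint32_t int_no, err_code;
};

static uint32_t last_int;   /* illustration only: records the last interrupt seen */

/* Hypothetical helper that the handler calls in tail position. */
void log_interrupt(uint32_t n) {
    last_int = n;
}

/* The only argument is now a pointer. If the compiler reuses the argument
 * slot for the tail call, it overwrites the pointer, not the saved state
 * that the pointer refers to.
 * Assumption: the stub pushes %esp just before the call and pops that
 * extra word again afterwards. */
void int_handler_common(struct saved_state *s) {
    log_interrupt(s->int_no);
}
```

The saved registers and data segment sit above the pointer on the stack, in memory the callee has no claim to unless it deliberately writes through the pointer.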