Know thy ABI - Part 1

01 Oct 2013

Mixing C and assembly can result in some truly nasty bugs. Some of the more obvious things that can go wrong arise when you violate the calling convention used by the C compiler. Any C compiler will have a set of rules governing things such as how arguments are passed in to a function, where the return value is stored, how to return from a function, and so on.

In my next few posts, I plan to write about some of the ways in which I’ve screwed up calling conventions in the past.

Caller vs. Callee Saved Registers - A Brief Primer.

One thing that every calling convention will specify is which registers are caller-saved and which registers are callee-saved. If you have some C code like:

int funcA(void) {
    int var1 = ...;
    return var1 + funcB();  /* var1 is still needed after the call */
}

The value of var1 must be stored somewhere - usually either on the stack, or in a register. If var1 is stored in a caller-saved register, this means that, when funcA (the “caller”) calls funcB (the “callee”), funcB has no obligation to make sure the value in this register is the same when it returns. If the caller needs the value later, then the caller is responsible for saving the value somewhere else - typically on the stack.

If var1 is stored in a callee-saved register, then the callee (funcB) is responsible for preserving the value of that register: if funcB wants to use the register, it must first save it, and then restore the original value before returning. funcA may assume that the value is unchanged after the call.
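To make the two contracts concrete, here is a toy model in plain C. It is only a sketch: the "registers" are ordinary struct fields, and the names toy_regs, callee_ok, callee_broken, and caller are made up for illustration. Nothing here touches real registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: plain struct fields standing in for registers. */
struct toy_regs {
    uint32_t eax, ebx, ecx, edx;
};

/* A callee that follows the convention: it uses ebx as scratch space,
 * but saves the old value first and restores it before returning. */
static void callee_ok(struct toy_regs *r) {
    uint32_t saved_ebx = r->ebx;  /* like "push ebx" */
    r->eax = 1;                   /* caller-saved: free to overwrite */
    r->ecx = 2;
    r->edx = 3;
    r->ebx = 4;                   /* callee-saved: fine only because we restore it */
    r->ebx = saved_ebx;           /* like "pop ebx" */
}

/* A callee that breaks the convention: it overwrites ebx and leaves it. */
static void callee_broken(struct toy_regs *r) {
    r->eax = 1;
    r->ebx = 4;
}

/* The caller keeps var1 in the callee-saved "register" across the call
 * and trusts it to be unchanged afterwards. */
static uint32_t caller(void (*callee)(struct toy_regs *), uint32_t var1) {
    struct toy_regs r = {0, 0, 0, 0};
    assert(callee != NULL);
    r.ebx = var1;
    callee(&r);
    return r.ebx;  /* only equals var1 if the callee kept its promise */
}
```

With callee_ok, caller(callee_ok, 42) gives back 42. With callee_broken, the caller silently gets 4 instead, and nothing in the caller's own code looks wrong.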

How I screwed it up

I’m the kind of person whose idea of fun includes things like writing my own operating system, and doing so necessarily involves writing some assembly - especially when interfacing directly with some piece of hardware. A good example of this is the x86 architecture’s cpuid instruction, which queries the processor for various information such as whether or not it has an APIC, whether it supports extensions like hyperthreading or various SIMD instructions, etc. Because it is a specialized machine instruction, it can’t be used directly in C; instead one can either use inline assembly, or just do what I did: write a function in assembly that acts as a wrapper for the instruction.

The cpuid instruction stores its results in four of the general purpose registers: eax, ebx, ecx, and edx. My assembly function was a very thin wrapper around this; it took a pointer to a struct containing four fields of the appropriate size, and simply filled them in from the corresponding registers.
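For a sense of what the C side of such a wrapper might look like, here is a minimal sketch. The struct name cpuid_regs, the macro names, and the helper cpuid_has are all hypothetical, not the ones from my kernel. The bit positions are the ones documented for the edx result of cpuid leaf 1.

```c
#include <stdint.h>

/* Hypothetical layout: one 32-bit field per register that cpuid writes. */
struct cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

/* A few feature flags reported in edx when cpuid is run with eax = 1. */
#define CPUID1_EDX_APIC (UINT32_C(1) << 9)   /* on-chip APIC present */
#define CPUID1_EDX_SSE  (UINT32_C(1) << 25)  /* SSE instructions */
#define CPUID1_EDX_HTT  (UINT32_C(1) << 28)  /* hyperthreading flag */

/* Returns nonzero if the given edx flag is set in a leaf-1 result. */
static int cpuid_has(const struct cpuid_regs *leaf1, uint32_t edx_flag) {
    return (leaf1->edx & edx_flag) != 0;
}
```

Given a struct filled in by the wrapper for leaf 1, a check like cpuid_has(&regs, CPUID1_EDX_APIC) answers the "does it have an APIC" question.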

Under gcc’s calling convention, three of those registers (eax, ecx, and edx) are caller-saved. ebx is callee-saved, but my original implementation didn’t save it.

You’ll notice in the commit message I say “memcpy seems to have issues with -O2”. Wait, how did memcpy get involved? Weren’t we talking about cpuid? Yes, we were, and in fact memcpy isn’t at fault here, though at the time that was my guess.

There’s a lesson here that’s not the immediate subject of this post: write small, self-contained pieces, and test them individually before using them. In this case, I’d implemented the cpuid wrapper, and was looking for a way to test it. The obvious solution was to just print out some of the values obtained, for which I decided I needed an implementation of memcpy. What I should have done at this point was set aside the cpuid code, implement and test memcpy, and only when I was satisfied with that, go on to use it to test other parts of the code. Alas, in this instance laziness prevailed, and I ended up with a bug in “one of these things I just wrote.” Kernel bugs are bad enough when you’re being diligent.

When I tested the kernel, the following output showed up on the serial line:

A wild breakpoint appeared!
Hello, World!
Mboot info at : 0x9500, high-memory: 0x7f00000
kernel end at : 0x10882c

And then it just stopped. There was no further output. My working theory was that for some reason, the data that cpuid had given me wasn’t being copied correctly into the string I’d allocated, and that was why I wasn’t getting output.

My working theory was, of course, completely wrong. The real problem was that the cpuid routine was trashing the ebx register, which presumably was being used for something important.

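The fact that the problem only occurred with optimizations on should have been a red flag that I’d messed up the ABI somehow: an optimizing compiler is far more likely to keep live values in registers like ebx across a call, rather than reloading them from the stack. The fix, once I realized what was wrong, was fairly simple: just save ebx and restore it, like the calling convention requires.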
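For comparison, here is a sketch of the inline assembly route, which sidesteps the problem differently. This is not the fix I made in my assembly wrapper; the names cpuid_regs and cpuid_raw are hypothetical, and it assumes a reasonably recent gcc or clang. Instead of saving and restoring ebx by hand, it lists ebx as an output of the asm statement, so the compiler knows the register is overwritten and takes care of preserving anything it had there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: one 32-bit field per register that cpuid writes. */
struct cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

/* Runs cpuid for the given leaf (subleaf 0) and stores the results.
 * Returns 1 if the instruction was executed, 0 on non-x86 targets. */
static int cpuid_raw(uint32_t leaf, struct cpuid_regs *out) {
    assert(out != NULL);
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile("cpuid"
                     /* All four registers are declared as outputs, so the
                      * compiler will not keep anything live in them,
                      * ebx included, across this statement. */
                     : "=a"(out->eax), "=b"(out->ebx),
                       "=c"(out->ecx), "=d"(out->edx)
                     : "a"(leaf), "c"(0u));
    return 1;
#else
    (void)leaf;
    return 0;  /* cpuid does not exist on this architecture */
#endif
}
```

The trade-off is that the knowledge about which registers get trashed now lives in the constraint list rather than in hand-written save and restore code. Get the list wrong and you have the same bug again.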

I plan to write about a few more issues like this. Next time I’ll talk about an ABI bug that took me almost two days to figure out.