Mixing C and assembly can result in some truly nasty bugs. Some of the more obvious things that can go wrong arise when you violate the calling convention used by the C compiler. Any C compiler will have a set of rules governing things such as how arguments are passed in to a function, where the return value is stored, how to return from a function, and so on.
In my next few posts, I plan to write about some of the ways in which I’ve screwed up calling conventions in the past.
Caller vs. Callee Saved Registers - A Brief Primer.
One thing that every calling convention will specify is which registers are caller-saved and which registers are callee-saved. If you have some C code like:
int funcA(void) {
    int var1 = ...;
    funcB();
    ...
}
The value of var1 must be stored somewhere - usually either on the
stack, or in a register. If var1 is stored in a caller-saved register,
this means that, when funcA (the “caller”) calls funcB (the “callee”),
funcB has no obligation to make sure the value in this register is the
same when it returns. If the caller needs the value later, then the
caller is responsible for saving the value somewhere else - typically
on the stack.
If var1 is stored in a callee-saved register, then the callee (funcB)
is responsible for preserving the value of that register: if funcB
wants to use the register, it must first save it, and then restore the
original value before returning. funcA may assume that the value is
unchanged after the call.
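The two contracts can be sketched as a toy model in plain C. This is only an illustration: C has no portable way to name machine registers, so two global variables stand in for them, and the names reg_caller_saved, reg_callee_saved, funcA_caller_saved and funcA_callee_saved are made up for this sketch. A real compiler does the equivalent work with pushes, pops and stack slots.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: two globals stand in for machine registers (hypothetical names). */
static uint32_t reg_caller_saved; /* the callee may overwrite this freely */
static uint32_t reg_callee_saved; /* the callee must restore this before returning */

/* The callee uses both "registers" as scratch space. */
void funcB(void) {
    uint32_t saved = reg_callee_saved; /* "push" the callee-saved register */
    reg_caller_saved = 0xdeadbeef;     /* no obligation to preserve this one */
    reg_callee_saved = 0xdeadbeef;     /* allowed, as long as it is restored */
    reg_callee_saved = saved;          /* "pop" it before returning */
    assert(reg_callee_saved == saved); /* the callee's side of the contract */
}

/* var1 lives in the caller-saved register, so the caller spills it itself. */
uint32_t funcA_caller_saved(uint32_t var1) {
    reg_caller_saved = var1;
    uint32_t spill = reg_caller_saved; /* caller saves the value on its "stack" */
    funcB();
    reg_caller_saved = spill;          /* caller reloads it after the call */
    return reg_caller_saved;
}

/* var1 lives in the callee-saved register, so the caller relies on funcB. */
uint32_t funcA_callee_saved(uint32_t var1) {
    reg_callee_saved = var1;
    funcB();
    return reg_callee_saved;
}
```

Both variants hand back the original value, but for different reasons: in the first the caller did the saving, in the second the callee did. If funcB skipped its restore step, only the second variant would break.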
How I screwed it up
I’m the kind of person whose idea of fun includes things like writing
my own operating system, and doing so necessarily involves writing some
assembly - especially when interfacing directly with some piece of
hardware. A good example of this is the x86 architecture’s cpuid
instruction, which queries the processor for various information such
as whether or not it has an APIC, whether it supports extensions like
hyperthreading or various SIMD instructions, etc. By virtue of being a
specialized machine instruction, it can’t be used directly in C;
instead one can either use inline assembly, or just do what I did:
write a function in assembly that acts as a wrapper for the
instruction.
The cpuid instruction stores its results in four of the general
purpose registers: eax, ebx, ecx, and edx. My assembly function was a
very thin wrapper around this; it took a pointer to a struct containing
four fields of the appropriate size, and simply filled them in from the
corresponding registers.
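For comparison, here is a minimal sketch of a wrapper with the same shape written in C rather than assembly. It is not the original code: it assumes an x86 target and a GCC or Clang toolchain, whose cpuid.h header provides the __get_cpuid helper. The struct cpuid_regs and the function cpuid_query are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <cpuid.h>  /* GCC/Clang helper header, x86 targets only */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result struct: one field per register that cpuid writes. */
struct cpuid_regs {
    uint32_t eax;
    uint32_t ebx;
    uint32_t ecx;
    uint32_t edx;
};

/*
 * Query the given cpuid leaf and fill *out from the four result registers.
 * Returns 1 on success, or 0 if the processor does not support that leaf.
 */
int cpuid_query(uint32_t leaf, struct cpuid_regs *out) {
    assert(out != NULL);
    return __get_cpuid(leaf, &out->eax, &out->ebx, &out->ecx, &out->edx);
}
```

Calling cpuid_query(0, &regs) fills regs.eax with the highest supported basic leaf, and the other three fields with the vendor string bytes.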
Under gcc’s calling convention, three of those registers, eax, ecx,
and edx, are caller-saved. ebx is not: it is callee-saved, but my
original implementation didn’t save it.
You’ll notice in the commit message I say “memcpy seems to have issues
with -O2”. Wait, how did memcpy get involved? Weren’t we talking about
cpuid? Yes, we were, and in fact memcpy isn’t at fault here, though at
the time that was my guess.
There’s a lesson here that’s not the immediate subject of this post:
write small, self-contained pieces, and test them individually before
using them. In this case, I’d implemented the cpuid wrapper, and was
looking for a way to test it. The obvious solution was to just print out
some of the values obtained, for which I decided I needed an
implementation of memcpy. What I should have done at this point was set
aside the cpuid code, implement and test memcpy, and only when I was
satisfied with that, go on to use it to test other parts of the code.
Alas, in this instance laziness prevailed, and I ended up with a bug in
“one of these things I just wrote.” Kernel bugs are bad enough when
you’re being diligent.
When I tested the kernel, the following output showed up on the serial line:
A wild breakpoint appeared!
Hello, World!
Mboot info at : 0x9500, high-memory: 0x7f00000
kernel end at : 0x10882c
Ok!
Cpu:
And then it just stopped. There was no further output. My working theory
was that for some reason, the data that cpuid had given me wasn’t being
copied correctly into the string I’d allocated, and that was why I
wasn’t getting output.
My working theory was, of course, completely wrong. The real problem was
that the cpuid routine was trashing the ebx register, which presumably
was being used for something important.
The fact that the problem only occurred with optimizations on should
have been a red flag that I’d messed up the ABI somehow. The fix, once I
realized what was wrong, was fairly simple: just save ebx and restore
it, like the calling convention requires.
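The original fix lived in a hand-written assembly routine. As a rough C analogue, assuming GCC or Clang extended inline assembly on an x86 target, the same obligation can be handed to the compiler by declaring ebx as an output of the instruction. The names cpuid_result and cpuid_raw below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result struct, one field per output register. */
struct cpuid_result {
    uint32_t eax;
    uint32_t ebx;
    uint32_t ecx;
    uint32_t edx;
};

/*
 * Run cpuid for `leaf` with subleaf 0.
 *
 * Listing ebx as an output ("=b") tells the compiler that the instruction
 * overwrites it. Because ebx is callee-saved, the compiler then saves and
 * restores it in this function's prologue and epilogue whenever needed,
 * which is the step the buggy wrapper left out.
 */
void cpuid_raw(uint32_t leaf, struct cpuid_result *out) {
    uint32_t a, b, c, d;

    assert(out != NULL);
    __asm__ volatile("cpuid"
                     : "=a"(a), "=b"(b), "=c"(c), "=d"(d)
                     : "0"(leaf), "2"(0u)); /* inputs: eax = leaf, ecx = 0 */

    out->eax = a;
    out->ebx = b;
    out->ecx = c;
    out->edx = d;
}
```

In a standalone assembly routine there is no compiler to do this bookkeeping, so the routine has to push ebx on entry and pop it before returning.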
I plan to write about a few more issues like this. Next time I’ll talk about an ABI bug that took me almost two days to figure out.