
Cross Compiling: Why Is This Hard? (Part 1)

21 Nov 2015

When I started writing on this topic, I expected to be able to say what I wanted to say in one post. That didn’t happen, so it’s going to be published in a series of posts. The TL;DR for the whole series is: Cross compilation should be trivial, but instead it’s a giant cluster-fuck. It’s one of the most extreme examples of Murphy’s law I’ve seen.

Part 1 defines what cross compilation is, and presents a normative description of how it should work. Subsequent posts in the series will be more upsetting.

Terminology

Cross compilation is the act of compiling software using a computer with a different CPU architecture or operating system from the computer on which the software will run. In this series of articles, I call the computer running the software the “host,” and the computer compiling it the “build machine.” For example, if I compile an arm binary on my x86-64 laptop, the laptop is the build machine and the arm device is the host. When the architectures and operating systems of the host and build machine are the same, I call it “native compilation.”

The Hope

Cross compilation should be trivial. On Plan 9 From Bell Labs, native compilation is a special case of cross compilation. When compiling a program, there are a few things that are (almost) always necessary:

- a compiler
- an assembler
- a linker
- headers for the libraries the program uses
- the libraries themselves, compiled for the host

On Plan 9, these are all neatly namespaced per-host. If you’re compiling for 32-bit x86, the commands are 8c (the compiler), 8l (the linker), and 8a (the assembler).

Compiled libraries for the host are located in /386/lib, and headers are in /386/include (if they’re architecture-dependent, which few are) or /sys/include (if they’re not).

For arm, it’s not very different: the commands are 5c, 5l, and 5a, the compiled libraries are in /arm/lib, and the platform-dependent headers are in /arm/include.
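To make the symmetry concrete, compiling a single-file program by hand looks something like this (a sketch, assuming a hello.c in the current directory):

8c hello.c          # compile for 32-bit x86, producing hello.8
8l -o hello hello.8 # link against the libraries in /386/lib

5c hello.c          # the same source, compiled for arm, producing hello.5
5l -o hello hello.5 # link against the libraries in /arm/lib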

The compilers and linkers know about the header and library paths, so we just need to make sure we call the right commands. This is easy enough: the build systems look at the environment variable $objtype and derive the command names from it. To compile and install for x86:

objtype=386 mk all install

To compile for arm:

objtype=arm mk all install
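There isn’t much machinery behind this. A typical mkfile for a single command is only a few lines; what follows is a sketch (the program name hello is made up), but it shows the usual convention of pulling the per-architecture definitions ($O, $CC, $LD, and so on) out of /$objtype/mkfile:

# per-architecture definitions (O, CC, LD, AS, ...) come from here
</$objtype/mkfile

TARG=hello
OFILES=hello.$O
BIN=/$objtype/bin

# generic rules for compiling, linking, and installing a single command
</sys/src/cmd/mkone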

Everything just works. The process for building the tools themselves is pretty much the same, so there’s not much to speak of there. Plan 9’s toolchain doesn’t target any other operating systems, but the Go programming language’s toolchain, which is based on the Plan 9 toolchain, does. It’s a pretty straightforward extension: instead of $objtype you have $GOOS and $GOARCH, and there are directories for linux_amd64, linux_arm, darwin_386, and so on. It’s a breath of fresh air: if I’m on my x86-64 Linux laptop and I want to build a 32-bit Mac OS X binary, I can just do:

GOARCH=386 GOOS=darwin go build

…and I’m done. What’s more, my program will actually run on the host system.
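If you want a quick sanity check before copying the binary over to a Mac, file works well enough (a sketch; the output name hello is arbitrary, and file’s exact wording varies by version):

GOARCH=386 GOOS=darwin go build -o hello
file hello    # should report a Mach-O i386 executable rather than an ELF x86-64 one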

Next Time

The next post in this series will focus on the process of getting a C/C++ cross compiler up and running in a more mainstream Unix-like environment. We will see that this is itself non-trivial, and explore what went wrong.