Zenhack.net

Fun With CI

05 Jan 2017

The other week I spent a fair chunk of time working on CI systems for a handful of different projects. Here’s the list:

This post is a reflection on my experiences. For the lazy, here are the takeaways:

When I started, all three of these projects were hosted primarily on GitHub and used Travis for their CI. Let’s look at them one by one.

Simp_le

simp_le is a minimalist client for Let’s Encrypt. I started using Let’s Encrypt as a CA towards the end of their beta period, and while the notion of automating the whole setup & maintenance process is nice, I was disinclined to use the reference client for a few reasons:

Harlan pointed me to simp_le, which seemed much closer to what I wanted. I found and fixed a small bug (not a blocker), and used the tool to get myself a cert.

Fast forward a bit: roughly six months ago the maintainer became inactive on GitHub, and the last time I tried to renew my cert, bitrot had set in and the renewal failed. Good thing my calendar-fu had left me a full month to sort out the mess.

There were already open pull requests fixing the bugs, so I merged them into my fork, updated my cert, and posted a comment on one of the issues saying that I was planning to keep the project afloat. Somebody changed the simp_le link on the list of clients, and as of the time of writing 20 people or so have starred my fork on GitHub.

The fun thing about no activity for six months is that the CI also doesn’t run. So, when I got Travis running against my fork, it didn’t surprise me much to find it wasn’t working anymore.

Getting it working again took some trial and error, and along the way I realized (obviously, in retrospect) that it makes sense to do this kind of work on a throwaway branch. You’re bound to generate some messy history if the only way you have to test a thing is to commit and push.

The integration tests involve setting up Boulder, which had changed out from under simp_le. I ended up scrapping the logic for setting it up, and used the same Docker container that the Boulder CI uses.

HIL

The bug I was fighting with in HIL turned out not to be specific to Travis. The tests were insufficiently isolated from one another. Locally it was easy to run only the tests I was having trouble with, and on Travis it was less easy, so I mistakenly believed I couldn’t reproduce the problem locally. As with simp_le, I worked on a dedicated debugging branch, and along the way committed this atrocity to get an interactive debugging session going in Travis. I felt proud and ashamed at the same time.

Layout DSL

The prototype implementation of my layout DSL is written in Haskell, and I set up a Travis config to test against a couple of different versions of GHC. Or so I thought.

As it turns out, Travis doesn’t have anything more recent than GHC 7.8, which was the oldest version on my list. To make matters worse, rather than signaling an error when you specify a version of the compiler that it can’t provide, it runs your tests with some other version. Wat. So I was running two or three identical jobs, since neither 7.10 nor 8.0 was available.
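One way to guard against this kind of silent substitution is to have the job check the compiler version itself and fail loudly on a mismatch. A minimal Python sketch, assuming `ghc` is on the PATH; `version_matches`, `installed_ghc_version`, and `check` are names made up for illustration, not part of Travis or any other CI tool:

```python
import subprocess


def version_matches(expected, actual):
    """True if `actual` equals `expected` or extends it with more components.

    Compares dot-separated components rather than raw string prefixes,
    so "7.1" does not accidentally match "7.10.3".
    """
    want = expected.strip().split(".")
    got = actual.strip().split(".")
    return got[:len(want)] == want


def installed_ghc_version():
    """Ask the compiler on PATH for its version string."""
    out = subprocess.run(
        ["ghc", "--numeric-version"],
        check=True,
        capture_output=True,
        text=True,
    )
    return out.stdout.strip()


def check(expected):
    """Abort the job if the compiler we got is not the one we asked for."""
    actual = installed_ghc_version()
    if not version_matches(expected, actual):
        raise SystemExit(f"wanted GHC {expected}, but the CI gave us {actual}")
```

Calling something like `check("7.10")` as the first step of each job would have turned my two or three identical jobs into visible failures instead of misleading green checkmarks.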

I spent a bit of time trying to figure out how to get later versions of GHC running on Travis. Most of the scripts I came across were more complex than I was really comfortable with, so I just said “Screw it! I’m switching to GitLab!” GitLab has its own integrated CI system, and so far I much prefer it to Travis. One shiny thing: unlike Travis, where all of your builds are going to run in some terribly outdated version of Ubuntu, you can specify an arbitrary Docker image from Docker Hub. On a per-job basis. Docker images with various versions of the Haskell toolchain are easy to find, so that solved the problem.

Once I got the GitLab CI working, I deleted the old Travis config, tweaked the README to say that the GitHub repo was a mirror of the one on GitLab, imported the issues/wiki (GitLab can just do this, yay), and turned off the issue tracker/wiki on GitHub.

Then I got an email from Travis saying my build was failing. Huh? It shouldn’t be running; I deleted the config.

Turns out if you don’t have a .travis.yml, Travis assumes your project is Ruby.