The other week I spent a fair chunk of time working on CI systems for a handful of different projects. Here’s the list:
- simp_le, which I adopted not long ago and which had six months of bitrot to contend with.
- HIL, one of the projects I work on to pay the bills.
- layout-dsl, one of my side projects.
This post is a reflection on my experiences. For the lazy, here are the takeaways:
- Debugging stuff that only shows up in CI has some interesting implications.
- Ugh, Boulder.
- Python lets me do clever things that I am both proud and ashamed of at the same time.
- Gitlab CI is so much better than Travis.
When I started, all three of these projects were primarily hosted on Github and used Travis for their CI. Let’s look at each one in turn.
simp_le is a minimalist client for Let’s Encrypt. I started using Let’s Encrypt as a CA towards the end of their beta period, and while the notion of automating the whole setup and maintenance process is nice, I was disinclined to use the reference client for a few reasons:
- I was already automating most of my server’s setup with ansible, and didn’t want to have to fuss with more than one robot doing sysadmin stuff for me.
- I was (and still am) using Nginx for a web server, and at the time only Apache was supported by the auto-configuration magic.
- I knew how to configure a web server for https already, so I didn’t need a tool to set up my server for me (and certainly not two).
- I am very proficient with a calendar, so I’m not worried about forgetting to renew a cert.
- More software means more stuff that can fail, and in light of the previous points, a tool that complex didn’t strike me as a good complexity-to-benefit trade-off.
Fast-forward a bit: roughly six months ago the maintainer became inactive on Github, and the last time I tried to renew my cert, the bitrot had set in and renewal failed. Good thing my calendar-fu had left me with a full month to sort out the mess.
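Staying on top of renewal is mostly date arithmetic. Here is a minimal Python sketch of the “calendar” check, which is not taken from simp_le. It assumes you already have the certificate’s `notAfter` string in the format Python’s `ssl` module returns from `getpeercert()`. The 30-day window is an assumed personal policy that matches the month of slack described above.

```python
import ssl
import time

# Let's Encrypt certificates are valid for 90 days; this 30-day renewal
# window is an assumed personal policy, not something simp_le enforces.
RENEWAL_WINDOW_DAYS = 30
SECONDS_PER_DAY = 24 * 60 * 60


def days_until_expiry(not_after, now=None):
    """Whole days left before a cert expires (negative once it has expired).

    ``not_after`` uses the OpenSSL text format found in
    ``getpeercert()["notAfter"]``, e.g. "Jan  5 09:34:43 2018 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch, UTC
    if now is None:
        now = time.time()
    return int((expires_at - now) // SECONDS_PER_DAY)


def should_renew(not_after, now=None):
    """True once the cert is inside the renewal window."""
    return days_until_expiry(not_after, now) <= RENEWAL_WINDOW_DAYS
```

To check a live server, you can take `notAfter` from `getpeercert()` on a socket wrapped with `ssl.create_default_context().wrap_socket(...)`.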
There were already pull requests out fixing the bugs, so I merged them into my fork, renewed my cert, and posted a comment on one of the issues saying that I was planning on keeping the project afloat. Somebody changed the simp_le link on the list of clients to point at my fork, and as of this writing 20 or so people have starred it on Github.
The fun thing about no activity for six months is that the CI also doesn’t run. So, when I got Travis running against my fork, it didn’t surprise me much to find it wasn’t working anymore.
Getting it working again took some trial and error, and along the way I realized (obvious in retrospect) that it made sense to work on a throwaway branch. You’re bound to generate some messy history if the only way to test something is to commit and push.
The integration tests involve setting up Boulder, the CA software behind Let’s Encrypt, which had changed out from under simp_le. I ended up scrapping the logic for setting it up and used the same Docker container that the Boulder CI uses.
The bug I was fighting with in HIL turned out not to be specific to Travis. Basically, the tests were insufficiently isolated from one another. Because it was easy to run only the problem tests locally but harder to do so on Travis, I mistakenly believed I couldn’t reproduce the failure locally. As with simp_le, I worked on a dedicated debugging branch, and along the way committed an atrocity to get an interactive debugging session going in Travis. I felt proud and ashamed at the same time.
The prototype implementation of my layout DSL is written in Haskell, and I set up a Travis config to test against a couple of different versions of GHC. Or so I thought.
As it turns out, Travis doesn’t have anything more recent than GHC 7.8, which was the oldest version on my list. To make matters worse, rather than signaling an error when you specify a compiler version it can’t provide, it runs your tests with some other version. Wat. So I was running two or three identical jobs, since neither 7.10 nor 8.0 was available.
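Because the wrong compiler gets substituted silently, one cheap defense is to have each job check which GHC it actually got before running the tests. Here is a minimal Python sketch. `EXPECTED_GHC`, `version_matches`, and `check_ghc` are hypothetical names. The only thing it assumes about GHC itself is the `--numeric-version` flag.

```python
import os
import subprocess


def version_matches(requested, actual):
    """True if ``actual`` (e.g. "7.10.3") belongs to the ``requested`` series (e.g. "7.10").

    Compares dot-separated components, so "7.1" does not match "7.10.3".
    """
    wanted = requested.split(".")
    return actual.split(".")[: len(wanted)] == wanted


def check_ghc(requested, ghc="ghc"):
    """Fail the job loudly if the compiler on PATH isn't the one we asked for."""
    # `ghc --numeric-version` prints only the version number, e.g. "8.0.2".
    actual = subprocess.run(
        [ghc, "--numeric-version"], capture_output=True, text=True, check=True
    ).stdout.strip()
    if not version_matches(requested, actual):
        raise SystemExit(f"asked for GHC {requested}, but got {actual}")
    return actual


if __name__ == "__main__":
    # EXPECTED_GHC is a hypothetical per-job environment variable; no-op if unset.
    requested = os.environ.get("EXPECTED_GHC")
    if requested:
        print("using GHC", check_ghc(requested))
```

Comparing version components instead of using a plain `startswith` keeps a request for "7.1" from accidentally accepting 7.10.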
I spent a bit of time trying to figure out how to get later versions of GHC running on Travis, but most of the scripts I came across were more complex than I was comfortable with, so I just said “Screw it! I’m switching to Gitlab!” Gitlab has its own integrated CI system, and so far I much prefer it to Travis. One shiny thing: unlike Travis, where all of your builds run in some terribly outdated version of Ubuntu, Gitlab lets you specify an arbitrary Docker image from Docker Hub. On a per-job basis. Docker images with various versions of the Haskell toolchain are easy to find, so that solved the problem.
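In Gitlab CI, a per-job compiler matrix comes down to giving each job its own `image` key. The Python sketch below generates such a config. The `haskell:<version>` tags are assumed to exist on Docker Hub, and the `cabal` steps are placeholders for whatever the project actually runs. JSON is, in practice, valid YAML, so the printed output should work as a `.gitlab-ci.yml`.

```python
import json

# GHC series to test against. Each is assumed to exist as a tag of the
# "haskell" image on Docker Hub; check the tag list before relying on it.
GHC_VERSIONS = ["7.8", "7.10", "8.0"]

# Placeholder build steps; substitute whatever your project actually runs.
SCRIPT = [
    "cabal update",
    "cabal install --only-dependencies --enable-tests",
    "cabal configure --enable-tests",
    "cabal build",
    "cabal test",
]


def gitlab_ci_config(versions):
    """One job per GHC version, each running in its own Docker image."""
    return {
        f"test-ghc-{version}": {"image": f"haskell:{version}", "script": list(SCRIPT)}
        for version in versions
    }


if __name__ == "__main__":
    # Save this output as .gitlab-ci.yml.
    print(json.dumps(gitlab_ci_config(GHC_VERSIONS), indent=2))
```

Generating the file is only a convenience. Writing the equivalent YAML by hand is just as short for three jobs.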
Once I got the Gitlab CI working, I deleted the old Travis config, tweaked the README to specify that the Github repo was a mirror of the one on Gitlab, imported the issues/wiki (Gitlab can just do this, yay), and turned off the issue tracker/wiki on Github.
Then I got an email from Travis saying my build was failing. Huh? It shouldn’t be running, I deleted the config.
Turns out if you don’t have a .travis.yml, Travis assumes your project