Making Iron Blogger Robust

15 May 2016

This week’s post is another status update on Iron Blogger development. Since the last update, I’ve closed two issues, #13 and #59.

With these in place, I’ll be much more comfortable making bigger changes going forward. We’ve had numerous bugs related to date and time handling. Many of them had obvious symptoms, like negative debts or multiple deadlines per week. A couple of them, though, went a few weeks between manifesting and becoming obvious, and that was frankly scary. Code that handles money is not allowed to have bugs.

The fix for #59 means that most of these kinds of bugs, if they are introduced at all, will most likely manifest as loud error messages when loading pages, instead of subtle discrepancies in the output. I’m much more okay with that kind of bug.
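The post doesn’t spell out what the fix for #59 changed, so here is only a minimal sketch of the general “fail loudly” idea. It assumes, hypothetically, that every weekly deadline must fall on a Monday at 02:00; the helpers `check_deadline` and `weeks_between` are illustrative names, not Iron Blogger’s code. Instead of quietly computing a negative or fractional number of weeks, they raise:

```python
from datetime import datetime, timedelta

# Hypothetical invariant: every weekly deadline falls on Monday at 02:00.
DEADLINE_WEEKDAY = 0  # Monday, using datetime.weekday() numbering
DEADLINE_HOUR = 2


def check_deadline(dt: datetime) -> datetime:
    """Return dt unchanged, or fail loudly if it breaks the invariant."""
    on_time = (dt.hour, dt.minute, dt.second, dt.microsecond) == (DEADLINE_HOUR, 0, 0, 0)
    if dt.weekday() != DEADLINE_WEEKDAY or not on_time:
        raise AssertionError(f"{dt.isoformat()} is not a valid weekly deadline")
    return dt


def weeks_between(start: datetime, end: datetime) -> int:
    """Count whole weeks between two deadlines, refusing bad inputs."""
    check_deadline(start)
    check_deadline(end)
    delta = end - start
    # A negative or partial-week span means the data upstream is broken:
    # raise instead of producing something like a negative debt.
    if delta < timedelta(0) or delta % timedelta(weeks=1) != timedelta(0):
        raise AssertionError(f"bad deadline span: {start} -> {end}")
    return delta // timedelta(weeks=1)
```

The trade-off is exactly the one described above: a bad value now breaks page loads with a traceback, which is noisy but impossible to miss.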

If you look at the comments on #13, you’ll note I set 90% test coverage as the criterion for closing the bug. At the time, I knew it was somewhat arbitrary, but I don’t like tasks that don’t have concrete deliverables. When I first opened the bug, it was a case of “wow, I’m embarrassed; our test coverage should at least not be totally pathetic.” Getting flattering numbers from code coverage tools is a low bar, but until recently we weren’t even clearing that.

When I decided to tackle #13, I was genuinely concerned about catching bugs. I’ve been winnowing down the issues list to the point where few of the remaining items are trivial and non-intrusive. I wasn’t going to be satisfied just having a non-embarrassing number to point at; I wanted some confidence that the tests were actually fairly thorough.

I’ve written in the past about my attitude towards testing, and in particular how important it is to keep the tests maintainable. At the time, I gave some very high-level advice on how to do that. #13 provides an opportunity to be a bit more concrete. Here’s what I did:

  1. I wrote a routine that crawls the entire site, looking for pages that don’t load. This catches dead links, but more generally, any status code other than a success or a redirect gets flagged as a failure.
  2. For randomized testing, I integrated pytest-quickcheck and wrote some routines for generating random database contents.
  3. I added tests that utilize both (1) and (2).
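A minimal sketch of a crawler like the one in step 1. It is written against an assumed `fetch(path)` hook that returns a status code and an HTML body; in a Flask app that hook could wrap the test client, but that wiring is hypothetical here. It follows only same-site links, treats 2xx and 3xx responses as success without following redirects, and reports everything else:

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Tuple
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href targets from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(fetch: Callable[[str], Tuple[int, str]], start: str = "/") -> Dict[str, int]:
    """Breadth-first crawl of same-site pages reachable from `start`.

    Returns {path: status} for every page whose status is not 2xx or 3xx.
    """
    seen = {start}
    queue = deque([start])
    failures: Dict[str, int] = {}
    while queue:
        path = queue.popleft()
        status, body = fetch(path)
        if not (200 <= status < 400):
            failures[path] = status
            continue
        parser = LinkCollector()
        parser.feed(body)
        for href in parser.links:
            # Resolve relative links and drop #fragments.
            target, _ = urldefrag(urljoin(path, href))
            parsed = urlparse(target)
            if parsed.scheme or parsed.netloc:
                continue  # external or mailto: link; not ours to check
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return failures
```

A test then only needs to assert that `crawl(...)` returns an empty dict. Keeping the HTTP layer behind `fetch` means the crawler itself can be checked against a small in-memory fake site.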

That covers a lot of ground: it exercises most of the site with a wide variety of inputs. The fun part of randomized testing is that you can tune the number of inputs. The default is relatively small, so running the test suite locally is quick, but our (new) Travis CI config cranks that number up, so when you push or submit a PR, the code gets hammered on a bit more. Some shops use this trick to do things like run days’ worth of tests when it comes time to do a release.
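pytest-quickcheck has its own generators; the sketch below swaps in Python’s standard `random` module instead, so it stays self-contained. It shows the two ideas from the paragraph above: generating random data and checking an invariant over it, with the iteration count read from an environment variable (`RANDOM_RUNS`, a hypothetical name) so CI can raise it. The `Blogger` model and the flat per-missed-week `debt` rule are toy stand-ins, not Iron Blogger’s real schema or rules:

```python
import os
import random
from dataclasses import dataclass
from typing import List

# Hypothetical knob: small default for quick local runs; CI can export a
# larger value (e.g. RANDOM_RUNS=1000) to hammer on the code harder.
RANDOM_RUNS = int(os.environ.get("RANDOM_RUNS", "20"))
FINE = 5  # toy fine per missed week


@dataclass
class Blogger:
    name: str
    start_week: int   # week index when the blogger joined
    posts: List[int]  # week indices in which they posted


def random_blogger(rng: random.Random, num_weeks: int) -> Blogger:
    """Generate one random (toy) database row."""
    start = rng.randrange(num_weeks)
    weeks = range(start, num_weeks)
    posts = sorted(rng.sample(weeks, rng.randint(0, len(weeks))))
    return Blogger(f"blogger{rng.randrange(10**6)}", start, posts)


def debt(b: Blogger, num_weeks: int) -> int:
    """Toy rule: owe FINE for each week since joining without a post."""
    missed = sum(1 for w in range(b.start_week, num_weeks) if w not in b.posts)
    return missed * FINE


def test_debt_is_bounded(seed: int = 0) -> None:
    rng = random.Random(seed)  # seeded so a failing case can be replayed
    for _ in range(RANDOM_RUNS):
        num_weeks = rng.randint(1, 52)
        b = random_blogger(rng, num_weeks)
        d = debt(b, num_weeks)
        # Invariant: never negative, never more than one fine per week.
        assert 0 <= d <= FINE * (num_weeks - b.start_week), (b, d)
```

Seeding the generator is a deliberate choice: the same seed reproduces the same inputs, so a failure found in CI can be replayed locally.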

The funny thing is, it didn’t do that much for our “code coverage”; I spent a fair amount of time on this, and coverage went up maybe 3%. If I were using the coverage tool as my primary metric, this wouldn’t have been worth the effort. However, it did catch a couple of bugs, which have been fixed.

I added similarly thorough tests for areas that just weren’t covered at all, which pushed the coverage numbers over the somewhat arbitrary 90% mark, so the bug has been closed.

The fix for #59 actually introduced (or rather exposed) a new issue, #60: Admin interface should disallow entering illegal values for party duedates. Before, there were no known symptoms of this; now, with the wrong database contents, we’ll get an assertion error when loading the ledger. I spent a bit of time fussing with Flask-Admin, trying to get the admin interface to filter the inputs correctly, and decided not to deal with that right now. Instead, I’ll be entering this coming week’s party from a Python shell.
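When entering data by hand from a shell, one cheap safeguard is to run the value through a check before saving it, so a bad due date is rejected up front instead of breaking the ledger later. A minimal sketch, assuming (hypothetically) that a “legal” party due date is one that coincides with a weekly deadline; `weekly_deadlines` and `validate_party_duedate` are illustrative helpers, not part of Iron Blogger or Flask-Admin:

```python
from datetime import datetime, timedelta
from typing import List


def weekly_deadlines(first: datetime, count: int) -> List[datetime]:
    """List `count` consecutive weekly deadlines starting at `first`."""
    return [first + timedelta(weeks=i) for i in range(count)]


def validate_party_duedate(duedate: datetime, deadlines: List[datetime]) -> datetime:
    """Return duedate if it is a known deadline; otherwise refuse it.

    Assumption: 'legal' means 'exactly matches one of the deadlines'.
    """
    if duedate not in set(deadlines):
        raise ValueError(f"{duedate.isoformat()} is not a weekly deadline; not saving")
    return duedate
```

In the shell, the checked value would then be assigned to the party record and saved through whatever session the app already uses; the same function could later back a form validator once the admin interface is revamped.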

I’d been wanting to revamp the admin interface for a while, and this makes it something of a priority. Issue #61 is tracking this.