Update Re: Backups

01 Nov 2015

Last week I wrote about my backup system. I had an initial import into Camlistore running at the time. Camlistore choked on being asked to import 500 GiB of data all at once. I also have a hard time imagining it will be able to perform as well as I’d like it to: it’s going to check the hash of every file on every pass, which sets a lower bound on how fast it can be. My existing backup tool can just trust modification times, which provides a huge speedup.
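
To illustrate why trusting modification times is so much cheaper, here is a minimal sketch in Haskell. It is not code from the actual tool, and the names `needsBackup` and `changedSince` are hypothetical. The point is that the check only reads file metadata and never opens the file, whereas a hash check has to read the whole contents.

```haskell
import Data.Time.Clock (UTCTime)
import System.Directory (getModificationTime)

-- | Pure decision: a file needs to be stored again only if its
-- modification time differs from the one recorded at the last backup.
-- (Hypothetical helper, not taken from the real tool.)
needsBackup :: UTCTime -> UTCTime -> Bool
needsBackup recorded current = recorded /= current

-- | Compare the recorded time against the file's current modification
-- time. This is a metadata lookup only; the contents are never read.
changedSince :: UTCTime -> FilePath -> IO Bool
changedSince recorded path =
  needsBackup recorded <$> getModificationTime path
```

The trade-off is that a file whose contents change while its modification time stays the same would be missed; a hash check would catch that case.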

A friend pointed out that empty files might be a special case, and that other files might not be likely to have the same problem. This had crossed my mind, but since I hadn’t really dug into solving the problem, I hadn’t thoroughly considered it. I did some measurements, and while there were a couple of other files that had large numbers of links, the count dropped off very quickly with the size of the file. As such, I’ve decided to just patch the existing tool to make whole copies of files below a certain size, which should solve the problem.
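
A minimal sketch of what such a patch could look like, assuming the tool hard-links each unchanged file to its copy in the previous backup. The threshold value and all the names here (`copyThreshold`, `chooseAction`, `storeUnchanged`) are hypothetical and not taken from the actual tool.

```haskell
import System.Directory (copyFile, getFileSize)
import System.Posix.Files (createLink)

-- | Hypothetical cutoff in bytes; the real value would come from
-- measuring how link counts fall off with file size.
copyThreshold :: Integer
copyThreshold = 4096

-- | What to do with a file that is unchanged since the last backup.
data Action = CopyWhole | HardLink
  deriving (Eq, Show)

-- | Small files (including empty ones) get a whole copy, so they
-- never add to the link count of an existing file.
chooseAction :: Integer -> Integer -> Action
chooseAction threshold size
  | size < threshold = CopyWhole
  | otherwise        = HardLink

-- | Store an unchanged file at @dest@, given its copy @prev@ in the
-- previous backup.
storeUnchanged :: FilePath -> FilePath -> IO ()
storeUnchanged prev dest = do
  size <- getFileSize prev
  case chooseAction copyThreshold size of
    CopyWhole -> copyFile prev dest
    HardLink  -> createLink prev dest
```

Keeping the decision in a pure function like `chooseAction` means it can be tested without touching the filesystem. The cost is a little duplicated disk space, which stays small because only files under the threshold are copied.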

It had been a while since I’d worked with the code for said tool. It was the first Haskell program I’d written that really saw significant use, and it shows. I did some refactoring, and started working on a test suite. With something as important as my backup system, I didn’t want to make significant changes without tests, but it had been too long since I’d actually done a backup, so instead of doing a quick fix and running it the one time, I just made a backup with rsync. I have space for a few of those, so it should do fine for an interim solution.

I’m digging into QuickCheck, and finding it pretty neat. I have some things related to testing that I want to write about, so I’ll hold off on going too deep into that for now. I’ll keep posting about this as there’s more to say. I hope to make the tool something other people can use at some point, but that’s down the line a bit; first I need to get it back to a state where I can use it.