Zenhack.net

Kerckhoffs's Principle and Machine Learning

19 Feb 2017

I recently stumbled across an article entitled Attacking machine learning with adversarial examples. It’s a good article, and well worth the read (Go read it; I’ll wait).

Towards the end of the article the author writes:

Every strategy we have tested so far fails because it is not adaptive: it may block one kind of attack, but it leaves another vulnerability open to an attacker who knows about the defense being used. Designing a defense that can protect against a powerful, adaptive attacker is an important research area.

(Emphasis mine). In other words, none of these defenses heed Kerckhoffs’s principle, which was first articulated in the 1800s:

A cryptosystem should be secure even if everything about the system, except the key, is public knowledge.

Kerckhoffs was talking specifically about cryptography, but still.

To someone with at least a passing familiarity with modern computer security, the proposed defenses seemed laughable from the outset.

So, how do we do better? That’s a really hard question. And, unfortunately, most of my own modest acquired wisdom about how to do security gives me the gut instinct “Don’t use machine learning techniques in adversarial contexts.” One might suggest that this amounts to just giving up, and that’s probably correct, but I bring it up because I think there’s what at first blush seems like a really fundamental conflict:

The standard approach to dealing with weird, potentially malicious inputs in other contexts is to just shortcut the problem: don’t use heuristics, just do something that always works.

Let’s look at an example:

[xkcd comic: “Exploits of a Mom”]

The classic XKCD comic depicts a SQL injection vulnerability. These are depressingly common, especially given how preventable they are. All we need is a function that takes the student name and maps it to something that SQL will always treat as a string, never as more commands. This amounts to putting backslashes in front of a handful of special characters.
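To make that concrete, here’s a minimal sketch in Python. The function name and the exact set of characters are illustrative only; real databases each have their own quoting rules, so in practice you’d lean on whatever escaping your database driver already provides.

```python
def escape_sql_string(value: str) -> str:
    """Backslash-escape the characters that could break out of a string literal.

    Illustrative only: the precise rules vary by database, so use your
    driver's escaping rather than rolling your own.
    """
    for ch in ("\\", "'", '"'):
        value = value.replace(ch, "\\" + ch)
    return value


student = "Robert'); DROP TABLE Students;--"

# Naive string gluing: the quote inside the name terminates the literal early,
# and the rest of the input is interpreted as more SQL.
unsafe = "INSERT INTO Students (name) VALUES ('%s');" % student

# After escaping, the whole name stays inside a single string literal.
safe = "INSERT INTO Students (name) VALUES ('%s');" % escape_sql_string(student)

print(unsafe)
print(safe)
```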

If you’re working with a modern web framework, you don’t really even have to think about all this, since someone’s already packaged up that function, and built on top of it an interface to the database that doesn’t require you to glue strings together in the first place. If you’re using proper tools, SQL injection vulnerabilities are not just preventable, they’re impossible.
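For instance, with Python’s standard-library sqlite3 module (just one example of such an interface), the query text and the value travel separately, so there is no string for the attacker to break out of:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (name TEXT)")

student = "Robert'); DROP TABLE Students;--"

# The placeholder keeps the value out of the query text entirely: the driver
# always treats it as data, never as more commands.
conn.execute("INSERT INTO Students (name) VALUES (?)", (student,))

# The table still exists, and the troublesome name is stored verbatim.
print(conn.execute("SELECT name FROM Students").fetchall())
```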

The vast majority of security vulnerabilities are things like this. They’re usually just stupid mistakes. We should be better about building (and using) tools that just rule these out entirely, but the problems there are mainly social, not technical.

Can we just take a similar approach to solving the machine learning problem? Probably not; if we had such a clear, concise idea of what our inputs were supposed to mean, we probably wouldn’t be bothering with machine learning in the first place.

So yeah, this is a hard problem, and I don’t have great answers. I do have some thoughts, however.

Firstly, don’t define the problem in terms of how the attacks work. Define it in terms of the guarantees the system must provide. Well-stated security goals (cryptographic definitions are the canonical example) rarely mention the details of the techniques attackers might use. They’ll say things like “the attacker can intercept, view, and modify messages in transit” and “the attacker must not be able to gain non-negligible information about the plaintext.” The how generally shouldn’t go in the problem statement, and the article seems to suggest trying to figure out a way to put it there. This seems like a mistake.
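To give a flavor of what that second statement looks like when made precise, cryptographers usually phrase it as an indistinguishability game. The sketch below is a standard (simplified) textbook formulation, not anything from the article; the full versions also grant the adversary extra powers, such as encryption oracles. The point is that it bounds what any efficient attacker can achieve without saying a word about how the attack works.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% The adversary A chooses two messages m_0 and m_1, is handed an encryption of
% m_b for a uniformly random hidden bit b, and must guess b.
\[
  \mathrm{Adv}^{\mathrm{ind}}_{\mathcal{A}}(\lambda)
    \;=\; \Bigl|\,\Pr\bigl[\mathcal{A}\bigl(\mathrm{Enc}_k(m_b)\bigr) = b\bigr] - \tfrac{1}{2}\,\Bigr|
    \;\le\; \mathrm{negl}(\lambda)
\]
% required for every efficient adversary A, where k is a freshly generated key
% and negl is a negligible function of the security parameter lambda. Nothing
% in the definition mentions the attacker's techniques.
\end{document}
```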

As an example, here’s a stab at an informal definition of a security property you might want a system managing self-driving cars to have:

It’s wordy as hell, and it still doesn’t make the problem easy. But at least we have a sense of what we’re trying to achieve, which is a step beyond what the article gives us.

The second thought is to consider the whole system, not just a particular subsystem. And when I say the whole system I really mean the whole thing:

Bugs happen. Even if you do have the luxury of a mathematical proof that your approach solves the problem perfectly, you still have to weigh the possibility that someone is going to screw up the actual building of the thing. The most tractable approach I’ve seen for dealing with this sort of thing is “defense in depth”, or “don’t put all your eggs in one basket.” Sure, if the rest of the program is designed such that a particular function should never be called with an invalid argument, it should be okay to just assume the input is good. Check it anyway, if you can.
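As a tiny, contrived illustration of that last point (the function and its invariants here are hypothetical, not anything from the article): even when every caller is supposed to guarantee the arguments are valid, re-checking at the boundary turns a silent corruption into a loud failure.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Return the discounted price.

    Callers are expected to have validated `percent` already; we check it
    again anyway, so that a bug elsewhere fails loudly here instead of
    quietly producing a nonsensical (e.g. negative) price.
    """
    if not 0 <= percent <= 100:
        raise ValueError(f"discount percent out of range: {percent}")
    if price_cents < 0:
        raise ValueError(f"negative price: {price_cents}")
    return price_cents * (100 - percent) // 100


print(apply_discount(1000, 25))  # 750
```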

Sandstorm has a list of security non-events — places where some piece of software has a vulnerability, and it turns out to not be very useful to an attacker, because other parts of the system limit its impact. We should be doing this everywhere.

The article brings up a really hard, really important problem. Machine learning, and artificial intelligence more broadly, are not my area of expertise. But I do see a lot of hard-earned wisdom in other disciplines that could be drawn upon here, which is why the “defenses” the article discusses are somewhat saddening.