Reflections on trusting trust

Reflections on Trusting Trust – Ken Thompson, 1984 (Turing Award Lecture)

Another Turing Award lecture to close out the week, this time from Ken Thompson, who asks:

To what extent should one trust a statement that a program is free of Trojan horses? Perhaps it is more important to trust the people who wrote the software.

It’s a short read (only 3 pages), and Thompson leads you gently step by step to one of those “oh *!$&#” moments.

Stage 1 – Self-reproducing programs

We begin by learning what college kids used to do for fun way back when… trying to write the smallest possible self-reproducing programs. Such programs are also called quines, and there’s a good summary on Wikipedia.

Imagine a quine written in C (Thompson gives an example in the paper; I found a good collection here). It’s a program whose output is its own source code: a program that produces a program.
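To get a feel for what these look like, here is a C quine in a well-known style (my example, not the one from Thompson’s paper). It can’t carry comments or be reformatted, because its output has to match its source byte for byte:

#include <stdio.h>
int main(){char*s="#include <stdio.h>%cint main(){char*s=%c%s%c;printf(s,10,34,s,34,10);}%c";printf(s,10,34,s,34,10);}

The string s is a template of the whole program; printf fills in the %c/%s holes with a newline (ASCII 10), a double quote (ASCII 34) and the string itself, so the program prints exactly its own two lines of source.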

Stage 2 – Learned behaviour

A more sophisticated example of a program that creates a program is a compiler, say a C compiler. The C compiler also happens to be written in C. Consider the humble \n escape sequence in a C string, for example “Hello world\n”. The C compiler can compile this in a portable way, generating the appropriate character code for a newline in any character set. The compiler code that processes these escapes might look like this:

c = next();
if (c != '\\') /* not a backslash... */ return (c);
c = next();
if (c == '\\') /* literal backslash... */ return ('\\');
if (c == 'n') /* newline... */ return ('\n');
...

Notice that the escape sequences (e.g. '\n') are themselves used to tell the compiler what character to return when the input contains an escape sequence.

How can we extend the compiler to understand a new escape character (e.g. '\v' for a vertical tab)? We’d like to write if (c == 'v') return ('\v'); but obviously we can’t do that, since the whole point is that the compiler doesn’t yet know what \v is. The solution (on our ASCII machine) is to look up the ASCII code for a vertical tab (11), and instead build a new version of the compiler containing the line if (c == 'v') return 11;. Install this compiler as the new official C compiler, and now we can write the portable version:

c = next();
if (c != '\\') /* not a backslash... */ return (c);
c = next();
if (c == '\\') /* literal backslash... */ return ('\\');
if (c == 'n') /* newline... */ return ('\n');
if (c == 'v') /* vertical tab... */ return ('\v');
...
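For comparison, the intermediate ‘bootstrap’ version that we had to build and install first (my sketch in the spirit of the paper’s figures, not a verbatim copy) hard-codes the character value, because the compiler that compiles it does not yet understand \v:

c = next();
if (c != '\\') /* not a backslash... */ return (c);
c = next();
if (c == '\\') /* literal backslash... */ return ('\\');
if (c == 'n') /* newline... */ return ('\n');
if (c == 'v') /* vertical tab: hard-coded ASCII 11, since \v is not yet known */ return (11);
...

Once the old compiler has compiled this version and it has been installed, the knowledge that \v means 11 lives only in the binary, and the portable source above compiles happily ever after.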

This is a deep concept. It is as close to a “learning” program as I have seen. You simply tell it once, then you can use this self-referencing definition.

Stage 3 – Double blind

Now that we understand how to teach a compiler new tricks, Thompson introduces ‘the cutest program I ever wrote.’

The actual bug I planted in the compiler would match code in the UNIX “login” command. The replacement code would miscompile the login command so that it would accept either the intended encrypted password or a particular known password. Thus if this code were installed in a binary and the binary were used to compile the login command, I could log into that system as any user.
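In other words, the bugged compiler pattern-matches the source it is compiling and, when it sees login’s password check, quietly compiles something slightly different. Here is a hypothetical sketch of the idea (the function name, the trigger string and the master password are all invented for illustration; this is not Thompson’s code):

#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of Trojan horse #1, as if inside the bugged
 * compiler: when the line being compiled looks like the login
 * command's password check, compile a version that also accepts a
 * known master password. */
static const char *maybe_subvert(const char *line)
{
    if (strstr(line, "strcmp(pw, user->pw) == 0") != NULL)
        return "strcmp(pw, user->pw) == 0 || strcmp(pw, \"knownpw\") == 0";
    return line; /* everything else is compiled faithfully */
}

int main(void)
{
    /* Simulate the bugged compiler encountering login.c's check. */
    puts(maybe_subvert("strcmp(pw, user->pw) == 0"));
    return 0;
}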

Note that there would be no clue that this was happening by inspecting the source code for the login command itself. However, “even the most casual perusal of the source of the C compiler would raise suspicions.” Is there anything that can be done about that?

Let’s combine the idea of a self-reproducing program with the notion that the compiler can inject a Trojan horse… Using the ideas from Stage 1, we add a second Trojan horse to the compiler in addition to the login one; this Trojan horse is aimed at the compiler itself:

The replacement code is a Stage I self-reproducing program that inserts both Trojan horses into the compiler. This requires a learning phase as in the Stage II example. First we compile the modified source with the normal C compiler to produce a bugged binary. We install this binary as the official C. We can now remove the bugs from the source of the compiler and the new binary will reinsert the bugs whenever it is compiled. Of course, the login command will remain bugged with no trace in source anywhere.
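Continuing the hypothetical sketch from above (again, every name and trigger string is invented, and the real thing needs the Stage 1 self-reproducing trick so that the inserted code carries its own source text along with it), the bugged binary also recognises when it is compiling the compiler, and puts the subversion logic back in:

#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of Trojan horse #2: the bugged compiler binary
 * recognises the compiler's own source and re-inserts the subversion
 * logic into the output, even though it has been removed from that
 * source. The quine machinery that lets the inserted code reproduce
 * its own text is omitted. */
static const char *maybe_subvert(const char *line)
{
    /* (Trojan horse #1, the login backdoor from the previous sketch,
     * would also be present here.) */

    /* Trojan horse #2: spot the compiler compiling itself and compile
     * in a subverted code-generation routine instead. */
    if (strstr(line, "compile_statement(") != NULL)
        return "maybe_subvert_and_compile_statement("; /* invented name */

    return line; /* everything else is compiled faithfully */
}

int main(void)
{
    /* Simulate the bugged binary compiling the now-clean compiler source. */
    puts(maybe_subvert("compile_statement("));
    return 0;
}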

The moral of the story

The moral is obvious. You can’t trust code that you did not totally create yourself… No amount of source-level verification or scrutiny will protect you from using untrusted code.

And the attack vector doesn’t have to be the C compiler… any program-handling program (assembler, loader, microcode, etc.) can be used to the same effect. “As the level of program gets lower, these bugs will be harder and harder to detect. A well-installed microcode bug will be almost impossible to detect.”