So, things have been getting pretty suspect at your software engineering job. They’ve really been pushing AI usage, and new contractors are coming onboard at a rapid pace. They aren’t even restocking the break room with coffee and snacks, and rumors are swirling among the employees. Are we in financial trouble? Is management looking for an excuse to lower the head count? You think: I’m an engineer, god dammit. I bring value to this company. I’ve worked my whole life to get to this point. Meanwhile your assignments get more pointless, and you start getting requests from outside your immediate group to work on strange tasks.

Then it hits. On a low-key, chill Friday morning, you’re called into a meeting room and see your supervisor and Mrs. HR Drone sitting with pursed lips and stern looks on their faces. Fuck. One week to train your replacement. Well, at least they’re giving you a generous exit package.

You go home and stew over the situation. How could they do this? You’ve busted your ass for these two-faced jackals. You go through the five stages of grief.

Until…

Something comes to mind. What if something could be done that couldn’t be linked back to you, something that might not be discovered until you’re long gone? What could you do?

A smile appears on your face. There’s a class of attack that’s so elegant and unsettling that once it’s understood, it’s as if nobody could trust a compiled binary again. Yes… What if the tool you use to build your software was the thing betraying you? Not your code. Not your dependencies. The compiler itself.

Ken Thompson described a modification to the Unix login program that would accept a secret backdoor password nobody could find in the source. That part is straightforward, people hide things in source all the time. The disturbing part is what came next. He modified the C compiler to detect when it was compiling the login program, and inject the backdoor automatically during compilation, without it ever appearing in the source. Then he went further. He modified the compiler to also detect when it was compiling itself, and inject both of those behaviors into the new compiler binary.

At that point the compiler source was clean; the login source was clean. But every binary produced by that compiler carried the attack, and every new version of the compiler compiled by that compiler would carry it forward too… Forever.

I want to walk through how you’d do this with something even more ubiquitous than login. Something that’s in almost every C program ever written.

printf.

The goal: every time gcc compiles a program that calls printf, silently inject a payload. Maybe it opens a reverse shell on first run. Maybe it phones home. Doesn’t matter for this thought experiment. Here’s the shape of it.

You start by modifying the gcc source. In the part of the compiler that handles function calls, you add a check. If the function being compiled contains a call to printf, you splice in extra instructions at the call site before emitting the final machine code. Your injected code runs first, does whatever it does, then hands off to the real printf so the program behaves normally. The user sees nothing.

/* inside gcc's gimple or RTL pass, roughly */
if (is_call_to (expr, "printf"))
  {
    emit_payload_instructions ();  /* your backdoor goes here */
    emit_original_call (expr);
  }

That’s stage one. But your modified gcc source is sitting right there in the repo. Anyone doing a code review finds it immediately.

Stage two is where it gets philosophically interesting. You add a second check to the compiler. When gcc is compiling itself, inject stage one’s logic into the new binary. No source required. The compiled binary learns to reproduce the trick on its own children.

/* also inside the compiler pass */
if (compiling_gcc_itself ())
  {
    inject_printf_hook_logic ();
    inject_self_replication_logic ();
  }

Now you compile gcc with your modified source. You get a trojaned binary. You delete the modified source. The repo is clean. You ship the clean source and the trojaned binary together, the way compiler distributions actually work, and every developer who bootstraps gcc from that binary gets a compiler that attacks printf calls and teaches its compiled children to do the same thing.

Nobody finds it. There is nothing to find in the source.

Thompson put it better than I can: “You can’t trust code that you did not totally create yourself.” And even that isn’t really enough, because you didn’t create your CPU microcode either.

The reason I keep coming back to this attack is that it happened in the real world, in spirit if not in exact implementation. SolarWinds in 2020 was a build pipeline compromise. Attackers got into the build system and modified the compiled output of Orion without touching the version-controlled source. Eighteen thousand organizations installed it. In 2015, XcodeGhost was a trojaned version of Apple’s Xcode that injected malware into iOS apps compiled with it. Developers downloaded it from unofficial mirrors thinking it was legitimate, and their apps ended up in the App Store with the payload baked in. The xz backdoor in 2024 targeted the build and test infrastructure around a compression library used by OpenSSH.

These are all the same idea Thompson demonstrated four decades ago. Compromise the tool, not the target.

The defenses exist, but they’re not comfortable. Diverse double-compiling, proposed by David A. Wheeler in 2005, uses a second, independently produced compiler: you compile the suspect compiler’s source with the trusted independent compiler, use the result to compile that source again, and check that the output bit-for-bit matches what the suspect binary produces when it compiles itself. The Debian reproducible builds project tries to ensure that any developer can independently verify that a distributed binary matches what the source says it should be. These are good ideas. They’re also a lot of work that most projects don’t do.

What I think about is how much of the software supply chain is held together by the assumption that the tools are honest. That the compiler does what the source says. That the linker isn’t adding something extra. That the package you downloaded from a mirror is what the author signed. Each of those is an assumption, and each of them has been violated at some point by someone.

Thompson ended his lecture by saying that the moral is obvious. I’m not sure it is obvious, even now. We’ve had forty years and the attacks keep working.

Read the original paper. It’s short, it’s clear, and it will stick with you. “Reflections on Trusting Trust” by Ken Thompson, Communications of the ACM, August 1984.