Grand Pwning Unit: Accelerating microarchitectural attacks with the GPU

Grand Pwning Unit: Accelerating microarchitectural attacks with the GPU, Frigo et al., IEEE Security & Privacy

General awareness of microarchitectural attacks has greatly increased since Meltdown and Spectre earlier this year. A lot of time and energy has been spent defending against such attacks, with a threat model that assumes attacks originate from the CPU. Frigo et al. open up an entirely new can of worms: modern SoCs contain a variety of special-purpose accelerator units, chief among which is the GPU. GPUs are everywhere these days.

Unfortunately, the inclusion of these special-purpose units in the processor today appears to be guided by a basic security model that mainly governs access control, while entirely ignoring the threat of more advanced microarchitectural attacks.

I’m sure you know where this is heading…

It turns out the accelerators can also be used to “accelerate” microarchitectural attacks. Once more we find ourselves in a situation with widespread vulnerabilities. The demonstration target in the paper is a mobile phone running on the ARM platform, with all known defences, including any applicable advanced research defences, employed. Using WebGL from JavaScript, Frigo et al. show how to go from e.g. an advert on a web page to a fully compromised browser in under two minutes.

Our end-to-end attack, named GLitch, uses all these GPU primitives in orchestration to reliably compromise the browser on a mobile device using only microarchitectural attacks in under two minutes. In comparison, even on PCs, all previous Rowhammer attacks from JavaScript require non-default configurations (such as reduced DRAM refresh rates or huge pages) and often take such a long time that some researchers have questioned their practicality.

If only I could flip a bit…

In Firefox, values stored in JavaScript ArrayObjects are 64 bits. The first 32 bits are used as a tag identifying the type of the object. When the tag value is below 0xffffff80, the whole 64-bit word is considered an IEEE-754 double; otherwise the last 32 bits are considered a pointer to an object. (This strategy is known as NaN-boxing: encoding object pointers in IEEE-754 doubles as NaN values.)

So… if only we could flip bits within the first 25 bits of the tag, we could turn pointers into doubles, and vice-versa.
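To make the type confusion concrete, here is a minimal sketch of the tag check and of what a single bit flip does to it. It assumes the tag sits in the upper 32 bits of the word; the object tag value and the pointer are made-up illustrative numbers, not values taken from Firefox.

```python
import math
import struct

TAG_THRESHOLD = 0xFFFFFF80            # from the text: tags below this mark a double
HYPOTHETICAL_OBJECT_TAG = 0xFFFFFF8C  # illustrative tag, not a real Firefox value


def tag_of(word: int) -> int:
    """Return the upper 32 bits of a 64-bit value (assumed to hold the tag)."""
    return (word >> 32) & 0xFFFFFFFF


def is_double(word: int) -> bool:
    """Apply the rule from the text: a tag below the threshold means a double."""
    return tag_of(word) < TAG_THRESHOLD


def flip_bit(word: int, bit: int) -> int:
    """Flip one bit of a 64-bit word, as a Rowhammer fault would."""
    return word ^ (1 << bit)


def bits_to_double(word: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", word))[0]


def double_to_bits(value: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", value))[0]


fake_pointer = 0x12345678  # hypothetical 32-bit object address
boxed_pointer = (HYPOTHETICAL_OBJECT_TAG << 32) | fake_pointer

# A 1-to-0 flip of bit 52 (inside the top 25 bits) turns the pointer into a
# finite double whose low 32 bits still carry the address.
leaked = flip_bit(boxed_pointer, 52)
leaked_value = bits_to_double(leaked)
recovered_pointer = double_to_bits(leaked_value) & 0xFFFFFFFF

# A 0-to-1 flip of the same bit turns a carefully chosen double into a pointer.
forged = flip_bit(double_to_bits(leaked_value), 52)
```

Bit 52 is chosen here because it belongs to the IEEE-754 exponent, so the flipped value is an ordinary finite number rather than a NaN.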

The goal of the [GLitch] exploit is to obtain an arbitrary read/write primitive which can eventually lead to remote code execution. ArrayBuffers are the best fit to gain such a primitive since they provide the attacker with full control over their content. As a consequence, we want to create a reference to a fake ArrayBuffer whose data pointer we control.

  1. The GLitch exploit tool starts by storing a pointer to an inlined ArrayBuffer (header adjacent to data) in a 1-to-0 bit-flip vulnerable location. Triggering a bit flip then turns this into a double that can be read, breaking ASLR (address space layout randomisation).
  2. Store a double in a 0-to-1 vulnerable cell in the ArrayBuffer, constructed in such a way that when a bit is flipped it becomes a pointer to a JSString, in turn pointing at the header (address obtained in step 1) for its immutable data. Read the value of the string to extract the content of the ArrayBuffer’s header.
  3. Create a header for a fake ArrayBuffer (using the header information obtained in step 2) within the leaked ArrayBuffer and craft a reference to it using the same double-to-pointer bit flip technique that we used for the JSString. Now we have the desired arbitrary read/write primitive.

Here’s how long it takes for GLitch to break ASLR and compromise the browser on a Nexus 5:

On average, GLitch can break ASLR in only 27 seconds and fully compromise the browser remotely in 116s, making it the fastest known remote Rowhammer attack.

So how does the GPU help us to flip bits (or just steal data in general using side-channel attacks)?

Four attack primitives

We need to be able to either leak data (side-channel attacks) or corrupt data (e.g. Rowhammer attacks).

A primary mechanism for leaking data using microarchitectural attacks is to time operations over resources shared with a victim process.

The first attack primitive, then, is access to a high-resolution timer. There has been a bit of an arms race in CPU land, with clever new ways of creating timers being devised and then blocked as best as possible. But the defences don’t take GPUs into account. There are two explicit timer sources within the OpenGL / WebGL world that will do the job, available when the EXT_DISJOINT_TIMER_QUERY extension is present. Both GPU and CPU operations can be timed directly using the primitives it provides. It’s also possible to craft your own timers using only standard (i.e., always available) WebGL2 functions: clientWaitSync and getSyncParameter. WebGL2 itself is not yet as widely supported as WebGL1 though.

…in order to comply with the WebGL2 specification, none of these functions can be disabled. Also, due to the synchronous nature of these timers, we can use them to measure both CPU and GPU operations.

Measurements taken with the EXT_DISJOINT_TIMER_QUERY extension, for example, show clear timing differences between cached and uncached data.

A second attack primitive is access to resources shared with other processes. By reverse engineering the caching structure of their GPU (Adreno 330), the authors worked out the sequence of operations needed to effectively bypass the GPU caches and measure accesses to memory pages shared with the rest of the system.

Internally the GPU has two levels of caching, and two ways of accessing memory (by inputting vertices to vertex shaders, or by fetching textures within shaders). Texture fetching turned out to be the easiest to control, and section IV.B of the paper describes in detail how the authors deduced an efficient strategy to evict cache sets from the GPU.
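The idea behind evicting a cache set can be illustrated with a toy model. The sketch below assumes a set-associative cache with LRU replacement and a made-up geometry; neither the policy nor the sizes are the Adreno 330's, for which section IV.B of the paper is the reference.

```python
from collections import OrderedDict


class ToyCache:
    """Set-associative cache with LRU replacement (an assumed policy)."""

    def __init__(self, num_sets: int, ways: int, line_size: int) -> None:
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def set_index(self, address: int) -> int:
        return (address // self.line_size) % self.num_sets

    def access(self, address: int) -> bool:
        """Touch an address; return True on a hit, False on a miss."""
        line = address // self.line_size
        lines = self.sets[self.set_index(address)]
        if line in lines:
            lines.move_to_end(line)    # mark as most recently used
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[line] = True
        return False


def eviction_set(cache: ToyCache, target: int, count: int) -> list:
    """Addresses other than target that map to the target's cache set."""
    stride = cache.num_sets * cache.line_size
    return [target + i * stride for i in range(1, count + 1)]


cache = ToyCache(num_sets=16, ways=4, line_size=16)  # hypothetical geometry
target = 0x1000

cache.access(target)               # first touch: miss, line is filled
hit_before = cache.access(target)  # second touch: served from the cache

for address in eviction_set(cache, target, cache.ways):
    cache.access(address)          # fill the set with other lines

hit_after = cache.access(target)   # target was evicted, so this misses
```

Once the target line is reliably evicted, every subsequent access to it has to go to memory, which is what both the side channel and the hammering rely on.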

The third attack primitive is knowledge of the physical location of allocated memory addresses, a requirement for working out which rows to hammer in a Rowhammer attack. When a row of memory is accessed, we can tell whether it was already in the row buffer by measuring the time the operation takes (buffer hits are faster). To carry out a reliable Rowhammer attack, three adjacent rows within a DRAM bank are required. Distinguishing between row buffer hits and misses enables us to determine whether allocations are contiguous or non-contiguous. (Details are in section VII.D, and the appendix covers the relationship between adjacency and contiguity.)

The fourth and final attack primitive is fast memory access, needed to trigger bit flips with Rowhammer attacks. Using the knowledge of the GPU cache hierarchy gained via probing, the authors derive efficient access patterns to perform double-sided Rowhammer attacks…

Rowhammering

DRAM rows are composed of cells which store the value of a bit in a capacitor.

The charge of a capacitor is transient, and therefore DRAM needs to be recharged within a precise interval (usually 64ms). Rowhammer is a software-based fault injection attack that can be considered a fallout of this DRAM property. By frequently activating specific rows an attacker can influence the charge in the capacitors of adjacent rows, making it possible to induce bit flips in a victim row without having access to its data.

In a double-sided Rowhammer attack, quick accesses to rows n-1 and n+1 impose high pressure on the capacitors in victim row n, triggering bit flips. “… our novel GPU-based side-channel attack provides us with information about contiguous physical memory regions in JavaScript, allowing us to perform double-sided Rowhammer on ARM devices in the browser.”
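The refresh interval explains why fast memory access matters: every activation of an aggressor row has to land before the victim row is next refreshed. The back-of-the-envelope sketch below uses the 64 ms interval mentioned above; the 100 ns access time is a made-up figure for illustration, not a measurement from the paper.

```python
REFRESH_INTERVAL_NS = 64_000_000  # 64 ms, the usual refresh interval


def activations_per_refresh_window(access_time_ns: int, aggressor_rows: int = 2) -> int:
    """Activations each aggressor row receives before the victim is refreshed.

    Assumes accesses are issued back to back and alternate evenly between
    the aggressor rows (two rows for a double-sided attack).
    """
    total_accesses = REFRESH_INTERVAL_NS // access_time_ns
    return total_accesses // aggressor_rows


# Hypothetical 100 ns per uncached access, double-sided hammering.
per_row = activations_per_refresh_window(access_time_ns=100)
print(per_row)  # 320000
```

Halving the access time doubles the number of activations per window, which is why an efficient cache-eviction pattern translates directly into a better chance of flipping bits.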

Mitigations

You could combine these primitives in a number of imaginative ways to construct different attacks; GLitch is but one end-to-end example. Eliminating known timers (e.g., disabling the EXT_DISJOINT_TIMER_QUERY extension) is still the best line of defence, though more timing strategies will likely be discovered. The WebGL2 getSyncParameter function could be disabled, and the clientWaitSync function could be replaced by a callback design. (This requires changes to the WebGL2 spec.) Stricter policies for memory reuse may also make it harder for an attacker to hammer valuable data.

We showed that it is possible to perform advanced microarchitectural attacks directly from integrated GPUs found in almost all mobile devices… more alarming, these attacks can be launched from the browser. For example, we showed for the first time that with microarchitectural attacks from the GPU, an attacker can fully compromise a browser running on a mobile phone in less than 2 minutes… we hope our efforts make processor vendors more careful when embedding the next specialized unit into our commodity processors.