SGXBounds: memory safety for shielded execution

SGXBounds: memory safety for shielded execution Kuvaiskii et al., EuroSys’17

We’ve previously looked at a number of Intel SGX-related papers in The Morning Paper, including SCONE, which today’s paper builds on. SGX comes with a memory encryption engine and seeks to protect trusted applications from an untrusted operating system, providing confidentiality and integrity guarantees. SGX, and the ARM equivalent which is called TrustZone, are both really interesting developments.

I’m grateful to the authors of today’s paper, SGXBounds, for pointing out an (obvious with hindsight!) flaw in my thinking: it seems on the surface that with all that hardware-based memory protection the contents of your memory should be safe. And yet:

Shielded execution… does not protect the program against memory safety attacks. These attacks are widespread, especially on legacy applications written in unsafe languages such as C/C++. In particular, a remote attacker can violate memory safety by exploiting the existing program bugs to invoke out-of-bounds memory accesses (aka buffer overflows). Thereafter the attacker can hijack program control flow or leak confidential data.

Why obvious with hindsight? If you think about it, the one entity that SGX does not protect the memory from is the trusted application itself. Memory safety attacks essentially fool the trusted application into doing what the attacker wants…

Still don’t believe these attacks work under SGX? The authors demonstrated the feasibility by reproducing publicly available memory safety exploits (including Heartbleed) against Apache, Memcached, and Nginx, as well as 16 test cases from the RIPE security benchmark.

If you’re concerned enough about security to go to the lengths of running in an SGX enclave, you should certainly be concerned about these memory safety attacks too. What can be done to prevent them?

The foundation of all memory attacks is getting access to a prohibited region of memory. Hence, memory safety can be achieved by enforcing a single invariant: memory accesses must always stay within the bounds of originally intended (referent) objects. For legacy applications written in C/C++, this invariant is enforced by changing (hardening) the application to perform additional bounds checks.

The authors evaluated two existing approaches to memory safety bounds checking: one software based approach called AddressSanitizer, and one hardware based approach in the form of Intel MPX. Neither of them gave good performance when combined with SGX. Here’s SQLite running in an SGX enclave as is, and with AddressSanitization (ASan), MPX, and the SGXBounds approach developed in this paper:

… Intel MPX performs so poorly that it crashes due to insufficient memory already after a tiny working set of 100 (corresponding to memory consumption of 60MB for the native SGX execution). AddressSanitizer is more stable, but performs up to 3.1x slower than SGX on larger inputs… additionally AddressSanitizer consumes 3.1x more virtual memory which can quickly exhaust available memory inside the enclave.

Realising that existing memory-safety approaches did not mesh well with the high encryption overheads and limited enclave memory of SGX, the authors set out to develop a mechanism in sympathy with SGX, which they call SGXBounds.

Our design takes into account architectural features of SGX and reduces performance and memory overheads to the levels acceptable in production use. For instance, in the case of SQLite, SGXBounds outperforms both AddressSanitizer and Intel MPX, with performance overheads of no more than 35% and almost zero memory overheads with respect to the native SGX execution.

SGXBounds design considerations

SGX enclave pages are located in an Enclave Page Cache (EPC) – a dedicated memory region. This is a limited resource (128MB) shared among all enclaves. Of those 128MB, only 94MB are available to the application, and the rest is reserved for metadata. A paging mechanism supports the creation of enclaves that need larger memory sizes. Pages are encrypted when they are paged out of the EPC, and decrypted when they come back in. Paging overhead can be 2x for sequential memory accesses, and up to 2000x for random ones.

… shielded application memory (more specifically, its working set) must be kept minimal due to the very limited EPC size in current SGX implementations. This is in sharp contrast to the usual assumption of almost endless reserves of RAM for many other memory-safety approaches.

Observing that many applications spend a considerable amount of time iterating through the elements of an array, SGXBounds also selects a metadata layout optimised for this.

Finally, SGXBounds relies on the SCONE infrastructure and its monolithic build process to statically link all application code with no external dependencies, which avoids the problem of interoperability with uninstrumented code.

Small address space, big pointers

That small address space, which causes a lot of problems for the traditional memory-safety approaches, also holds the key to the efficiency of SGXBounds.

All modern SGX CPUs operate in a 64-bit mode, meaning that all pointers are 64 bits in size. In SGX enclaves, however, only 36 bits of virtual address space are currently addressable, and even this amount of space is not likely to be used due to performance penalties. Thus, SGXBOUNDS relies on the idea of tagged pointers: a 64-bit pointer contains the pointer itself in its lower 32 bits and the referent object’s upper bound in the upper 32 bits.

This simple scheme gives a whole host of benefits:

It minimises the amount of memory for metadata
It requires no additional memory accesses while iterating over arrays with a positive increment
It alleviates the problems with traditional ‘fat pointer’ designs concerning memory layout changes and multithreading (an update of a pointer and its associated metadata must be implemented as one atomic operation, which requires some synchronisation mechanism).
There is nothing to do on pointer assignment as all the metadata is already contained within the pointer
It is robust to type casts
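
As a rough illustration of the tagged-pointer layout described above, here is a minimal C sketch. The names (tagged_ptr, make_tagged, extract_addr, extract_upper, advance) are hypothetical and not taken from the SGXBounds source; the sketch only assumes what the paper states, namely that the address sits in the lower 32 bits and the upper bound in the upper 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: upper bound in the high 32 bits, address in the low 32. */
typedef uint64_t tagged_ptr;

/* Combine a 32-bit address and a 32-bit upper bound into one 64-bit value. */
static tagged_ptr make_tagged(uint32_t addr, uint32_t upper_bound) {
    assert(addr <= upper_bound);
    return ((uint64_t)upper_bound << 32) | addr;
}

/* Both parts are recovered with a cast or a shift; no memory is touched. */
static uint32_t extract_addr(tagged_ptr p)  { return (uint32_t)p; }
static uint32_t extract_upper(tagged_ptr p) { return (uint32_t)(p >> 32); }

/* Pointer arithmetic changes only the low 32 bits, so the tag survives. */
static tagged_ptr advance(tagged_ptr p, int32_t delta) {
    uint32_t addr = extract_addr(p) + (uint32_t)delta;
    return (p & 0xFFFFFFFF00000000ULL) | addr;
}
```

Because the tag travels inside the 64-bit value, copying or casting the pointer copies the bound with it, which is why assignment needs no extra work and there is no separate metadata update to synchronise.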

When an object is created, SGXBounds associates a pointer with the bounds of the object. For global and stack-allocated variables, the memory layout is changed to pad them with 4 bytes and initialise them at runtime. For dynamically allocated variables, SGXBounds wraps malloc and friends to append 4 bytes to each newly created object, initialise these bytes with the lower-bound value, and make the pointer tagged with the upper bound. Run-time bounds checks are then inserted before each memory access. The original pointer, upper- and lower-bounds can all be easily extracted with efficient operations.
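
The allocation wrapper and the bounds check might look roughly like the following C sketch. Everything here is an assumption made for illustration: a small static array stands in for the 32-bit enclave address space, a bump allocator stands in for malloc, and the sb_* names are hypothetical rather than the real SGXBounds runtime API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint8_t  enclave[4096];   /* stand-in for the enclave address space */
static uint32_t next_free = 16;  /* bump-allocator cursor; address 0 acts as NULL */

typedef uint64_t tagged_ptr;

/* Hypothetical malloc wrapper: reserve size + 4 bytes, store the lower bound
   in the 4 trailing bytes, and tag the pointer with the upper bound. */
static tagged_ptr sb_malloc(uint32_t size) {
    assert((uint64_t)next_free + size + 4 <= sizeof enclave);
    uint32_t lower = next_free;
    uint32_t upper = lower + size;
    memcpy(&enclave[upper], &lower, sizeof lower);  /* metadata sits at the upper bound */
    next_free = upper + 4;
    return ((uint64_t)upper << 32) | lower;
}

static uint32_t sb_addr(tagged_ptr p)  { return (uint32_t)p; }
static uint32_t sb_upper(tagged_ptr p) { return (uint32_t)(p >> 32); }

/* The lower bound is read from the 4 bytes stored at the upper bound. */
static uint32_t sb_lower(tagged_ptr p) {
    uint32_t lower;
    assert((uint64_t)sb_upper(p) + 4 <= sizeof enclave);
    memcpy(&lower, &enclave[sb_upper(p)], sizeof lower);
    return lower;
}

/* Returns 1 if an access of len bytes at p stays inside the referent object. */
static int sb_in_bounds(tagged_ptr p, uint32_t len) {
    uint32_t addr = sb_addr(p);
    return addr >= sb_lower(p) && (uint64_t)addr + len <= sb_upper(p);
}
```

In this sketch the upper-bound check needs only the pointer itself, while the lower-bound check costs one extra load from the bytes appended to the object.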

Boundless memory

The default fail-fast behaviour is to crash with a diagnostic error whenever SGXBounds detects an out-of-bounds access. It’s also possible to attempt to recover from these and carry on:

To allow applications to survive most bugs and attacks and continue correct execution, SGXBOUNDS reverts to failure-oblivious computing by using the concept of boundless memory blocks. In this case, whenever an out-of-bounds memory access is detected, SGXBOUNDS redirects this access to a separate “overlay” memory area to prevent corruption of the adjacent objects, creating the illusion of “boundless” memory allocated for the object.
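
A much-simplified C sketch of the overlay idea follows. The fixed-size table, the byte-granular store_byte and load_byte helpers, and the choice to return zero for out-of-bounds bytes that were never written are all assumptions made for this illustration, not details taken from the SGXBounds implementation.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t enclave[256];  /* stand-in for enclave memory */

/* Tiny overlay: (address, value) pairs holding out-of-bounds bytes. */
#define OVERLAY_SLOTS 64
static struct { uint32_t addr; uint8_t val; int used; } overlay[OVERLAY_SLOTS];

static int in_bounds(uint32_t addr, uint32_t lower, uint32_t upper) {
    assert(lower <= upper && upper <= sizeof enclave);
    return addr >= lower && addr < upper;
}

/* In-bounds stores hit real memory; out-of-bounds stores go to the overlay,
   so adjacent objects are never corrupted. */
static void store_byte(uint32_t addr, uint32_t lower, uint32_t upper, uint8_t v) {
    if (in_bounds(addr, lower, upper)) { enclave[addr] = v; return; }
    for (int i = 0; i < OVERLAY_SLOTS; i++) {
        if (!overlay[i].used || overlay[i].addr == addr) {
            overlay[i].addr = addr;
            overlay[i].val = v;
            overlay[i].used = 1;
            return;
        }
    }
    /* Overlay full: this sketch silently drops the write. */
}

/* Out-of-bounds loads see earlier out-of-bounds stores, else zero. */
static uint8_t load_byte(uint32_t addr, uint32_t lower, uint32_t upper) {
    if (in_bounds(addr, lower, upper)) return enclave[addr];
    for (int i = 0; i < OVERLAY_SLOTS; i++)
        if (overlay[i].used && overlay[i].addr == addr) return overlay[i].val;
    return 0;
}
```

The point of the design is that a buggy write past the end of one object lands in the overlay and can be read back by the same buggy code, while a neighbouring object at that address keeps its own contents.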

Optimisations

The compiler analysis phase detects pointer arithmetic operations and memory accesses that can be statically determined to be always safe, and avoids instrumenting them. “This is a standard optimization for many approaches, and yields significant performance gains for some applications, up to 20%.”

Secondly, when iterating over an array in a simple loop there is no need to perform the lower-bound check on each iteration. The check can be hoisted outside of the loop, saving two memory accesses per iteration. This optimisation leads to performance gains of up to 22% in some cases.
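
To make the hoisting optimisation concrete, here is a hedged C sketch that compares a loop checking both bounds on every iteration with one that checks the lower bound once up front. The simulated enclave array, the helper names, and the -1 error convention are assumptions for illustration; in SGXBounds itself the instrumentation is inserted by the compiler pass, not written by hand.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint8_t enclave[1024];  /* stand-in for enclave memory */
typedef uint64_t tagged_ptr;

static uint32_t addr_of(tagged_ptr p)  { return (uint32_t)p; }
static uint32_t upper_of(tagged_ptr p) { return (uint32_t)(p >> 32); }

/* Reading the lower bound requires a load from the bytes stored at the upper bound. */
static uint32_t lower_of(tagged_ptr p) {
    uint32_t lb;
    assert((uint64_t)upper_of(p) + 4 <= sizeof enclave);
    memcpy(&lb, &enclave[upper_of(p)], sizeof lb);
    return lb;
}

/* Naive instrumentation: both bounds are checked on every iteration. */
static long sum_naive(tagged_ptr base, uint32_t n) {
    long sum = 0;
    for (uint32_t i = 0; i < n; i++) {
        uint32_t a = addr_of(base) + i;
        if (a < lower_of(base) || a >= upper_of(base)) return -1;
        sum += enclave[a];
    }
    return sum;
}

/* Hoisted: with a positive increment the address only grows, so the lower
   bound is checked once; the loop keeps only the upper-bound check, which
   reads nothing but the pointer tag. */
static long sum_hoisted(tagged_ptr base, uint32_t n) {
    if (addr_of(base) < lower_of(base)) return -1;
    long sum = 0;
    for (uint32_t i = 0; i < n; i++) {
        uint32_t a = addr_of(base) + i;
        if (a >= upper_of(base)) return -1;
        sum += enclave[a];
    }
    return sum;
}
```

Both functions reject the same out-of-bounds accesses; the hoisted version simply avoids re-reading the lower bound from memory inside the loop.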

Evaluation

We are interested in the performance overhead of SGXBounds, and also of course in how well it protects against memory safety attacks.

Performance

Here we can see the performance and memory overheads of Intel MPX, AddressSanitizer, and SGXBounds normalised against an uninstrumented SGX version, for applications from the Phoenix and PARSEC benchmarks.

Performance overheads of Intel MPX vary significantly, but can reach 5-6x. For pointer intensive cases the memory overhead can also reach up to 4x. AddressSanitizer has a more reasonable performance overhead of around 51%, but can lead to memory blow-ups of 50-100x.

… SGXBounds performs the best, with an average performance overhead of 17% and average memory overhead of 0.1%. In comparison to Intel MPX, SGXBounds does not choke on pointer-intensive programs. In comparison to AddressSanitizer, SGXBounds has much better memory consumption. It also does not exhibit corner-case performance drops like AddressSanitizer in swaptions and does not eat up all memory like Intel MPX in dedup.

Protection

The security guarantees of SGXBounds were tested using the RIPE security benchmark.

RIPE claims to perform 850 working buffer-overflow attacks. However, under our native configuration, only 46 attacks were successful: through the shellcode that creates a dummy file and through return-into-libc. When building RIPE under SCONE infrastructure, this number decreased to 16 attacks: the shellcode attacks failed because SGX disallows the int instruction used in shellcode.

MPX only detects 2 out of 16 attacks; AddressSanitizer and SGXBounds get 8 out of 16 each.

What’s up with the other 8? These are all in-struct buffer overflows, where the same object contains both a vulnerable buffer and a target-of-attack function pointer. Because both AddressSanitizer and SGXBounds operate at whole-object granularity, they could not detect this.
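
A small C sketch shows why whole-object granularity misses this class of bug. The struct victim layout and the object_check helper are hypothetical, chosen only to mirror the description above: a buffer and a function pointer living inside the same object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical victim: a vulnerable buffer next to the attack target. */
struct victim {
    char buf[16];
    void (*handler)(void);  /* function pointer the attacker wants to overwrite */
};

/* Bounds check at a given granularity: is [off, off + len) inside a region
   of region_size bytes? */
static int object_check(size_t region_size, size_t off, size_t len) {
    assert(region_size > 0);
    return off <= region_size && len <= region_size - off;
}
```

A write that starts at buf and runs through handler covers offsetof(struct victim, handler) + sizeof(handler) bytes. Checked against sizeof(struct victim), as a whole-object scheme would do, it passes; checked against sizeof(buf) alone it would be rejected, but that requires tracking bounds per field rather than per object.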

Case studies

The paper ends with a look at Memcached, Apache, and Nginx running inside SGX with bounds protection. Here’s a summary of the obtained throughput and memory usage (and you can also compare against native performance – i.e., not using SGX at all):

With memcached, the authors reproduced the CVE-2011-4971 vulnerability in the SGX environment. Intel MPX, AddressSanitizer, and SGXBounds all caught it.

For Apache, the authors tested Heartbleed. All three techniques detect it, with the boundless memory technique of SGXBounds also allowing Apache to continue its execution (in the other two cases it crashes).

For Nginx, a stack buffer overflow CVE-2013-2028 was tested and once more all three approaches were able to detect it.

End notes

Considering that Intel MPX is a hardware extension, its low performance was surprising to us. Intel MPX performs well if the protected application works only with a small portion of pointers, but in the opposite case the overheads may get very high. To understand the underlying reasons of poor MPX performance, we conducted a more extensive and rigorous evaluation.

Source code for SGXBounds is available at https://github.com/tudinfse/sgxbounds.

the morning paper

a random walk through Computer Science research, by Adrian Colyer