System programming in Rust: beyond safety

Balasubramanian et al., HotOS’17

Balasubramanian et al. want us to switch all of our systems programming over to Rust. This paper sets out the case.

Despite many advances in programming languages, clean-slate operating systems, hypervisors, key-value stores, web servers, network and storage frameworks are still developed in C, a programming language that is in many ways closer to assembly than to a modern high-level language. Today, the price of running unsafe code is high… Why are we still using C?

About 2/3 of the 2017 CVEs relating to the Linux kernel can be attributed to the use of an unsafe language, and the pervasive use of pointer aliasing, pointer arithmetic and unsafe type casts defeats the use of software verification tools.

So why are we still using C? Because safe languages have overheads that are too high for many use cases, argue the authors. (And, I would add, because of familiarity and large existing C codebases.)

Is it reasonable to sacrifice safety for performance, or should we prioritize safety and accept its overhead? Recent developments in programming languages suggest that this might be a false dilemma, as it is possible to achieve both performance and safety without compromising on either.

Enter Rust!

Rust achieves both safety and performance by embracing linear types. In the Rust ownership model, a variable that is bound to an object acquires ownership of that object. When the variable goes out of scope, the object is deallocated. Ownership can be transferred to another variable, but doing so destroys the original binding. You can also borrow an object without breaking the binding, but only within the syntactic scope of the borrowing declaration, and only if that scope does not exceed the scope of the primary binding.
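To make these rules concrete, here is a minimal sketch (not from the paper) showing a borrow, a move that destroys the original binding, and deallocation at the end of the new owner's scope. The `Buffer` type and the `checksum`, `consume`, and `demo` functions are hypothetical names chosen for illustration.

```rust
/// Hypothetical buffer type used only for illustration.
pub struct Buffer {
    pub bytes: Vec<u8>,
}

/// Borrowing: reads through `&Buffer` without taking ownership.
pub fn checksum(buf: &Buffer) -> u32 {
    buf.bytes.iter().map(|&b| b as u32).sum()
}

/// Ownership transfer: `consume` takes the buffer by value.
pub fn consume(buf: Buffer) -> usize {
    buf.bytes.len()
} // `buf` goes out of scope here, so the buffer is deallocated

pub fn demo() -> (u32, usize) {
    let owner = Buffer { bytes: vec![1, 2, 3] };
    let sum = checksum(&owner); // borrow: `owner` is still usable afterwards
    let new_owner = owner; // move: the original binding is destroyed
    // checksum(&owner);   // would not compile: `owner` has been moved
    let len = consume(new_owner); // move again; freed at the end of `consume`
    (sum, len)
}
```

Uncommenting the `checksum(&owner)` line after the move is exactly the kind of use the compiler rejects, which is what rules out dangling and aliased owners at compile time.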

This ownership model eliminates pointer aliasing. There is an unsafe subset of the language that is not subject to the single-ownership restrictions (its use is best left to the standard library, e.g., to implement doubly-linked lists). There is also a mechanism for safe read-only aliasing, which involves wrapping the object in a reference-counted type, Rc or Arc. And if you must have write aliasing (for example, to a shared resource), you can enforce exclusive access dynamically by wrapping the object in the Mutex type.

In contrast to conventional languages, this form of aliasing is explicit in the object’s type signature…
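A minimal sketch of both forms of explicit aliasing, using only the standard library: `Rc<String>` for shared read-only access on one thread, and `Arc<Mutex<u64>>` for a counter that several threads update. The function names are hypothetical; the point is that sharing (Rc/Arc) and run-time exclusive access (Mutex) both show up in the declared types.

```rust
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Read-only aliasing: two handles to one immutable string, visible in the type `Rc<String>`.
pub fn shared_read() -> (usize, usize) {
    let config: Rc<String> = Rc::new(String::from("eth0"));
    let alias = Rc::clone(&config); // second handle, same allocation
    (alias.len(), Rc::strong_count(&config))
}

/// Write aliasing: `Arc<Mutex<u64>>` makes both sharing and locking explicit in the type.
pub fn shared_counter(threads: usize, increments: u64) -> u64 {
    let counter: Arc<Mutex<u64>> = Arc::new(Mutex::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    // The lock enforces exclusive access at run time.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // Bind the result first so the MutexGuard is dropped before `counter`.
    let total = *counter.lock().unwrap();
    total
}
```

Removing the `Mutex` and writing `*counter += 1` through the `Arc` would be rejected at compile time, because `Arc` only hands out shared references.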

Several projects have demonstrated Rust’s suitability for building low-level, high-performance systems, but the authors of this paper want us to go further and consider additional benefits for systems programming that Rust’s type system enables: software fault isolation (SFI), program analysis (especially static information flow control, IFC), and safe traversal of pointer-linked data structures, which enables the automation of tasks such as checkpointing.

(Recall that we looked at timely dataflow, as used by Mosaic, a couple of weeks ago. Timely dataflow is implemented in Rust.)

Software Fault Isolation (SFI)

SFI is the idea of enforcing process-like boundaries around program modules in software, without relying on hardware protection. Modern SFI implementations make isolation cheap enough for some applications (e.g., browser plugins and some device drivers), but “their overhead becomes unacceptable in applications that require high-throughput communication across protection boundaries.” Consider, for example, a network processing framework that forwards packets through a pipeline of filters which should ideally be isolated from each other.

Sending data across protection boundaries requires copying it, which is unacceptable in a line-rate system.

You can avoid copying with a tagged shared heap, but this introduces other runtime overheads (up to 100%) for tag validation on each pointer dereference.

Rust’s single ownership model allows us to implement zero-copy SFI. The Rust compiler ensures that, once a pointer has been passed across isolation boundaries, it can no longer be accessed by the sender.
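The following sketch (an illustration, not the authors' SFI library) uses a thread and a standard `mpsc` channel as a stand-in for a protection boundary. Threads are not isolation domains, so this shows only the ownership half of the argument: sending a `Batch` moves it, the sender can no longer touch it, and the receiver sees the same heap buffer, so no packet bytes were copied. `Batch` and `send_zero_copy` are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical packet batch; only the Vec's (pointer, length, capacity) header moves.
pub struct Batch {
    pub payload: Vec<u8>,
}

/// Sends a batch to a "filter" thread and returns the payload address
/// as seen before sending and as seen by the receiver.
pub fn send_zero_copy() -> (usize, usize) {
    let (tx, rx) = mpsc::channel::<Batch>();
    let filter = thread::spawn(move || {
        let batch = rx.recv().unwrap();
        batch.payload.as_ptr() as usize
    });

    let batch = Batch { payload: vec![0u8; 1500] };
    let before = batch.payload.as_ptr() as usize;
    tx.send(batch).unwrap(); // ownership moves to the receiver
    // batch.payload[0] = 1; // would not compile: `batch` has been moved

    let after = filter.join().unwrap();
    (before, after)
}
```

Equal addresses mean the 1500-byte payload was handed over, not copied; the compiler, rather than a runtime tag check, guarantees the sender has given up access.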

The authors demonstrate how to build an SFI library in Rust, which supports secure communication across isolation boundaries with negligible overhead. The library exports two data types, protection domains (PDs) and remote references (rrefs).

Arguments and return values of remote invocations follow the usual Rust semantics: borrowed references are accessible to the target PD for the duration of the call; all other arguments change their ownership permanently. The sole exception is remote references: the object pointed to by an rref stays in its original domain and can only be accessed from the domain holding the reference via remote invocation.

Remote references (rrefs) are weak references to a reference table. All remote invocations are proxied through this table.

If a panic occurs inside a domain, the domain’s reference table is cleared, so outstanding rrefs into it can no longer be used, and a user-provided function is invoked to re-initialize the domain from a clean slate.
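The PD/rref design described above can be approximated in a few dozen lines. This is a single-threaded sketch under our own assumptions, not the paper's API: `Domain`, `RRef`, `invoke`, and `reset` are hypothetical names, the reference table is a `Vec` of `Rc<RefCell<T>>`, an rref is a `Weak` pointer into it, and a panic is caught with `std::panic::catch_unwind`, after which the table is rebuilt by the user-provided `init` function.

```rust
use std::cell::RefCell;
use std::panic::{self, AssertUnwindSafe};
use std::rc::{Rc, Weak};

/// Hypothetical protection domain: objects in its reference table never leave it.
pub struct Domain<T> {
    table: RefCell<Vec<Rc<RefCell<T>>>>,
    init: Box<dyn Fn() -> Vec<T>>,
}

/// Hypothetical remote reference: a weak pointer into a domain's reference table.
pub struct RRef<T> {
    target: Weak<RefCell<T>>,
}

impl<T> Domain<T> {
    /// `init` is the user-provided function that (re)builds the domain's state.
    pub fn new(init: impl Fn() -> Vec<T> + 'static) -> Self {
        let domain = Domain { table: RefCell::new(Vec::new()), init: Box::new(init) };
        domain.reset();
        domain
    }

    /// Clears the reference table and re-initializes it from a clean slate.
    fn reset(&self) {
        let fresh: Vec<Rc<RefCell<T>>> =
            (self.init)().into_iter().map(|v| Rc::new(RefCell::new(v))).collect();
        *self.table.borrow_mut() = fresh; // old entries are dropped here
    }

    /// Hands out a remote reference to table entry `idx`.
    pub fn rref(&self, idx: usize) -> Option<RRef<T>> {
        self.table.borrow().get(idx).map(|obj| RRef { target: Rc::downgrade(obj) })
    }

    /// Remote invocation, proxied through the table; a panic resets the domain.
    pub fn invoke<R>(&self, r: &RRef<T>, f: impl FnOnce(&mut T) -> R) -> Result<R, &'static str> {
        let obj = r.target.upgrade().ok_or("stale rref")?;
        let outcome = panic::catch_unwind(AssertUnwindSafe(|| {
            let mut guard = obj.borrow_mut();
            f(&mut *guard)
        }));
        match outcome {
            Ok(value) => Ok(value),
            Err(_) => {
                drop(obj); // release our strong handle to the failed object
                self.reset(); // outstanding rrefs can no longer be upgraded
                Err("domain panicked")
            }
        }
    }
}
```

After a panic, the old rref can no longer be upgraded, so further calls through it fail cleanly instead of touching possibly corrupted state; a fresh rref has to be requested from the re-initialized domain. (The default panic hook still prints the panic message to stderr.)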

The cost of this isolation is assessed in the context of a network processing framework using null filters, which forward batches of packets without doing any work on them.

The overhead grows from 90 CPU cycles for 1-packet batches to 122 cycles for 256-packet batches, which is roughly the cost of 2 or 3 L3 cache accesses.
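As a quick sanity check on those numbers, assuming the reported overhead is a per-batch (per remote invocation) cost, it amortizes across the packets in a batch:

```latex
\text{overhead per packet} = \frac{\text{overhead per batch}}{\text{packets per batch}}:
\qquad \frac{90}{1} = 90 \text{ cycles}, \qquad \frac{122}{256} \approx 0.48 \text{ cycles}
```

Under that assumption, at 256-packet batches the isolation cost is well under one cycle per packet.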

Information flow control

Static information flow control (IFC) enforces confidentiality by tracking the flow of sensitive data through a program. It doesn’t require a great leap of imagination to see that aliasing makes this problem much harder: you have to keep track of all aliases to a variable and everywhere they are used.

Modern alias analysis achieves efficiency by sacrificing precision, posing a major barrier to accurate IFC. By restricting aliasing, Rust sidesteps the problem.

The authors implement a proof-of-concept IFC for Rust. Rust macros transform a program into an abstract representation in which the value of each variable is simply its security label. Input variables are initialised with user-provided labels, arithmetic expressions over labeled values evaluate to the upper bound of their arguments’ labels, and an auxiliary program counter tracks the flow of information via branching on labeled variables.
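To illustrate what such an abstract program looks like, here is a hand-written Rust sketch (the paper generates its abstraction with macros and checks it with SMACK; this code does neither). It uses a hypothetical two-point lattice `Low` ⊑ `High`, where the upper bound is simply `max`, and abstracts a fragment in which a public variable is written inside a branch on a secret, an implicit flow that only the program-counter label catches. `Label`, `join`, `public_output`, and `abstract_fragment` are illustrative names.

```rust
/// Hypothetical two-point security lattice; declaration order gives Low < High.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Label {
    Low,
    High,
}

/// Upper bound of two labels; for a two-point lattice this is just `max`.
pub fn join(a: Label, b: Label) -> Label {
    a.max(b)
}

/// Sink check: only Low data may reach a public output.
pub fn public_output(label: Label) -> Result<(), String> {
    if label == Label::Low {
        Ok(())
    } else {
        Err(format!("leak: {:?} data reaches a public output", label))
    }
}

/// Abstract version of the concrete fragment:
///     let mut y = 0;
///     if secret > 0 { y = 1; }
///     print(y);
/// Every variable is replaced by its label; `pc` is the program-counter label.
pub fn abstract_fragment(secret: Label) -> Result<(), String> {
    let pc = Label::Low; // top level: not inside any branch
    let y_before = Label::Low; // `y = 0`: constants are public
    let pc_then = join(pc, secret); // branching on `secret` raises pc
    let y_then = join(Label::Low, pc_then); // `y = 1` inside the branch inherits pc
    let y = join(y_before, y_then); // after the `if`, y may hold either value
    public_output(join(y, pc)) // `print(y)`
}
```

With a public input the fragment passes; with a secret input the write to `y` inside the branch picks up the `High` program-counter label, and the check at the public sink fails even though `y` is only ever assigned constants.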

The resulting abstract program is checked by a verifier implemented as an extension of SMACK, which is able to detect problems such as secret data leaking to a public output.

Safe traversals

… checkpointing, transactions, replication, multiversion concurrency, etc., involve snapshotting parts of program state. This, in turn, requires traversing pointer-linked data structures in memory. Ideally one would like to generate this functionality automatically and for arbitrary user-defined data types. However, doing so in a robust way can be complicated in the presence of aliasing.

In Rust, the problem is much simpler. By default, every object in Rust has a unique owner, so a pointer-linked structure can be safely traversed without extra checks. If aliasing is present, it is made explicit by the Rc and Arc wrappers, which makes these wrappers a convenient place to deal with aliasing, with minimal changes to user code and without expensive lookups.

The authors built an automatic checkpointing library for Rust on this basis, using a Checkpointable trait (interface) and a custom implementation of it for Rc (Arc could be handled similarly). A compiler plugin automatically generates an implementation of the Checkpointable trait for types composed of scalar values and references to other checkpointable types.

Our library adds the checkpointing capability to arbitrary user-defined data types; in particular, it checkpoints objects with internal aliases correctly and efficiently.
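Here is a minimal sketch of the idea, with an assumed trait signature rather than the authors' actual `Checkpointable` definition. Uniquely owned data (`Box`, `Vec`, scalars) is copied by plain recursion with no alias checks; only the `Rc` implementation consults a memo table keyed by the address of the original allocation, so two `Rc`s that alias the same object in the original still alias a single copy in the snapshot. The `Seen` type, the `Pair` example, and its hand-written impl (the kind of code the paper's compiler plugin would generate) are hypothetical.

```rust
use std::any::Any;
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical memo table: address of an original Rc allocation -> its copy.
pub type Seen = HashMap<*const (), Rc<dyn Any>>;

/// Assumed trait shape: produce a deep copy of `self` for a snapshot.
pub trait Checkpointable {
    fn checkpoint(&self, seen: &mut Seen) -> Self;
}

// Scalars: plain copies.
impl Checkpointable for i64 {
    fn checkpoint(&self, _seen: &mut Seen) -> Self {
        *self
    }
}

// Uniquely owned data needs no alias check: just recurse.
impl<T: Checkpointable> Checkpointable for Box<T> {
    fn checkpoint(&self, seen: &mut Seen) -> Self {
        Box::new((**self).checkpoint(seen))
    }
}

impl<T: Checkpointable> Checkpointable for Vec<T> {
    fn checkpoint(&self, seen: &mut Seen) -> Self {
        self.iter().map(|x| x.checkpoint(seen)).collect()
    }
}

// Rc is the only place aliasing can occur, so only here do we consult the memo table.
impl<T: Checkpointable + 'static> Checkpointable for Rc<T> {
    fn checkpoint(&self, seen: &mut Seen) -> Self {
        let key = Rc::as_ptr(self) as *const ();
        if let Some(copy) = seen.get(&key) {
            return copy.clone().downcast::<T>().expect("memo entry has the wrong type");
        }
        let copy = Rc::new((**self).checkpoint(seen));
        seen.insert(key, copy.clone() as Rc<dyn Any>);
        copy
    }
}

/// Hypothetical user type whose two fields may alias the same Rc'd vector.
pub struct Pair {
    pub left: Rc<Vec<i64>>,
    pub right: Rc<Vec<i64>>,
}

// Hand-written here; a derive-style plugin would generate this field-by-field impl.
impl Checkpointable for Pair {
    fn checkpoint(&self, seen: &mut Seen) -> Self {
        Pair { left: self.left.checkpoint(seen), right: self.right.checkpoint(seen) }
    }
}
```

This sketch handles shared substructure but not reference cycles, which in Rust require interior mutability (e.g., `Rc<RefCell<T>>`) and would need the memo entry to be created before recursing.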

The last word

We show that Rust enables system programmers to implement powerful security and reliability mechanisms like SFI, IFC, and automatic checkpointing more efficiently than any conventional language. This is just the tip of the iceberg: we believe that further exploration of linear types in the context of real systems will yield more game-changing discoveries.