Unikernels: Library Operating Systems for the Cloud – Madhavapeddy et al. 2013
See also: Unikernels: Rise of the Virtual Library Operating System from ACM Queue.
As we discussed in a previous edition of The Morning Paper, there is an increasing mismatch between the traditional OS design point and the way we use operating systems in modern systems. The OSv paper is well worth a read on this topic – I wish I could link you to a write-up, but it was the very last paper for which I gave a Twitter-only review before switching to the blog format. (And it was the fact that I had so many things I wanted to say about it that made the need for a blog plain!) Today’s choice, from the Mirage project, tackles the same problem space as OSv: how to optimise an OS for running server-side applications on top of a hypervisor. From that point on, the solutions diverge pretty rapidly…
Despite this shift from applications running on multi-user operating systems to provisioning many instances of single-purpose VMs, there is little actual specialisation that occurs in the image that is deployed to the cloud. We take an extreme position on specialisation, treating the final VM image as a single-purpose appliance rather than a general-purpose system by stripping away functionality at compile-time.
A unikernel is a library OS. Because it runs on top of a hypervisor, the usual pain point of hardware compatibility can be delegated to the hypervisor itself. The Mirage team have chosen to ‘eschew backwards compatibility’ (you won’t find any POSIX here), which liberates them to consider new points in the design space.
The unikernel approach builds on past work in library OSs. The entire software stack of system libraries, language runtime, and applications is compiled into a single bootable VM image that runs directly on a standard hypervisor…. Our key insight is that the hypervisor provides a virtual hardware abstraction that can be scaled dynamically – both vertically by adding memory and vCPUs, and horizontally by spawning more VMs. This provides an excellent target for library operating systems (libOSs), an old idea recently revisited to break up monolithic OSs.
Everything in the Mirage world is built in OCaml – including your application, should you choose to write one to link with the OS. Yes, that does mean that vast swathes of existing applications and libraries won’t run as-is on Mirage. So if you do decide to roll up your sleeves and build an OCaml app explicitly for this OS, what do you get in return?
We find sacrificing source-level backward compatibility allows us to increase performance while significantly improving the security of external-facing cloud services.
There’s also a very real size advantage. The tiny size and very fast boot times make new system designs possible – for example, booting an entire VM in response to an incoming network request.
For example, the Mirage DNS server outperforms both BIND 9 (by 45%) and the high-performance NSD server (§4.2), while using very much smaller VM images: our unikernel appliance image was just 200 kB while the BIND appliance was over 400 MB.
200K! There are plenty of web pages that download at least that much, and here we have a full virtual machine and DNS implementation! That truly is a microservice.
How come the images are so tiny?
A libOS is structured very differently from a conventional OS: all services, from the scheduler to the device drivers to the network stack, are implemented as libraries linked directly with the application…. Unikernels link libraries that would normally be provided by the host OS, allowing the Unikernel tools to produce highly compact binaries via the normal linking mechanism. Features that are not used in a particular compilation are not included and whole-system optimization techniques can be used. In the most specialised mode, all configuration files are statically evaluated, enabling extensive dead-code elimination at the cost of having to recompile to reconfigure the service. The small binary size (on the order of kilobytes in many cases) makes deployment to remote datacenters across the Internet much smoother.
The elimination of so much code also greatly reduces the security attack surface, but Mirage has a couple of other tricks up its sleeve in this area: sealing and address-space randomisation. Sealing prevents any new code from being introduced and executed at runtime (e.g. treating data as code); address-space randomisation makes it much harder to craft attacks that exploit code already in the VM.
Sealing takes advantage of the single-image, single address-space nature of unikernels:
Implementing this policy is very simple: as part of its start-of-day initialisation, the unikernel establishes a set of page tables in which no page is both writable and executable and then issues a special seal hypercall which prevents further page table modifications. The memory access policy in effect when the VM is sealed will be preserved until it terminates.
Address space randomisation turns what might be considered a weakness of the approach into an advantage:
The unikernel model means that reconfiguring an appliance means recompiling it, potentially for every deployment. We can thus perform address space randomisation at compile time using a freshly generated linker script, without impeding any compiler optimisations and without adding any runtime complexity.
(AFAICT, there’s nothing stopping you from building an app that takes advantage of dynamic configuration, if that’s what you choose to do.)
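The paper doesn’t show the generator itself, but the underlying idea is easy to sketch: since the layout is fixed freshly at every build, the toolchain can simply shuffle where each compilation unit’s code ends up. The snippet below is a hypothetical toy illustration of that idea in OCaml, not Mirage’s actual linker-script tooling, and the object file names are made up.

```ocaml
(* Toy illustration of compile-time layout randomisation: emit the text
   sections of a build's compilation units in a freshly shuffled order.
   Hypothetical example only, not Mirage's real linker-script generator. *)
let shuffle xs =
  xs
  |> List.map (fun x -> (Random.bits (), x))
  |> List.sort compare
  |> List.map snd

let () =
  Random.self_init ();
  [ "runtime.o"; "tcpip.o"; "dns.o"; "app.o" ]
  |> shuffle
  |> List.iter (fun u -> Printf.printf "  %s(.text)\n" u)
```

Each build thus gets a different layout for free, without any of the runtime machinery (and runtime cost) that conventional ASLR needs.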
Mirage is built on top of Xen. OCaml was chosen for four reasons:
- It’s a full-fledged systems programming language
- It has a simple and high-performance runtime
- Its static types are eliminated at compile time, while retaining full run-time type safety
- The open source Xen Cloud Platform and other key components are written in OCaml, making integration easier
As I said previously, that decision extends all the way out to the language you need to use to build your app, though. So whereas with my OSv experiments I was able to get a Spring Boot-based microservice VM up and running very simply, that just wouldn’t be possible with Mirage.
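To give a flavour of what that means in practice, here’s a minimal sketch of a Mirage application, loosely modelled on the mirage-skeleton ‘hello’ example rather than on code from the paper. The combinator names and module signatures here (main, register, Mirage_time.S and friends) are my assumptions and have changed across MirageOS releases. The application is just an OCaml functor over the device signatures it needs; a separate config.ml, evaluated at build time, decides how to instantiate and link it.

```ocaml
(* config.ml -- evaluated at build time to decide what gets linked.
   Combinator names follow recent mirage-skeleton examples; treat them
   as assumptions, since they vary between MirageOS releases. *)
open Mirage

let hello = main "Unikernel.Hello" (time @-> job)
let () = register "hello" [ hello $ default_time ]
```

```ocaml
(* unikernel.ml -- the application: a functor over the device
   signatures it needs, compiled and linked directly with the libOS. *)
open Lwt.Infix

module Hello (Time : Mirage_time.S) = struct
  let start _time =
    let rec loop n =
      if n = 0 then Lwt.return_unit
      else begin
        Logs.info (fun f -> f "hello from a unikernel");
        Time.sleep_ns (Duration.of_sec 1) >>= fun () ->
        loop (n - 1)
      end
    in
    loop 4
end
```

The mirage command-line tool then turns this into a bootable image containing nothing but this code and the libraries it pulls in.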
The language runtime is specialised for Mirage in two key areas: memory management and concurrency. Mirage “guarantees a virtual contiguous address space, simplifying runtime memory management,” and it integrates the Lwt cooperative threading library.
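The Lwt model is worth a tiny illustration. The sketch below runs outside Mirage (it uses the ordinary lwt.unix package rather than a Mirage timer device), but it shows the cooperative style: threads are cheap promises that only yield at a bind (>>=), so a single core interleaves them without preemption or locks.

```ocaml
(* A minimal Lwt sketch of cooperative threading, runnable with the
   lwt.unix package outside Mirage. Threads are lightweight promises;
   they yield only at binds (>>=), so no preemption or locking is needed. *)
open Lwt.Infix

let task name delay =
  Lwt_unix.sleep delay >>= fun () ->
  Lwt_io.printlf "%s finished after %.1fs" name delay

let () =
  (* Both tasks are in flight at once; "b" completes first. *)
  Lwt_main.run (Lwt.join [ task "a" 0.2; task "b" 0.1 ])
```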
The Mirage network stack also supports zero-copy I/O, which helps make it fast.
[The Mirage network stack] provides two communication methods: a fast on-host inter-VM vchan transport, and an Ethernet transport for external communication. vchan is a fast shared memory interconnect through which data is tracked via producer/consumer pointers.
Using vchan, VMs can exchange data directly via shared memory without any further intervention from the hypervisor.
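The producer/consumer pointer idea is simple enough to sketch. The toy ring buffer below is an in-process OCaml illustration only: the real vchan shares the data pages and the two indices between domains and signals over Xen event channels, none of which is modelled here.

```ocaml
(* A toy single-producer/single-consumer ring buffer, illustrating the
   producer/consumer pointer scheme described for vchan. In-process
   sketch only; the real transport works over shared memory between two
   Xen domains. *)
type ring = {
  buf : Bytes.t;            (* stands in for the shared data pages *)
  mutable prod : int;       (* total bytes ever written (producer pointer) *)
  mutable cons : int;       (* total bytes ever read   (consumer pointer) *)
}

let create size = { buf = Bytes.create size; prod = 0; cons = 0 }

let size r = Bytes.length r.buf
let used r = r.prod - r.cons          (* bytes available to the consumer *)
let free r = size r - used r          (* space available to the producer *)

(* Write as much of [src] as currently fits; return the bytes written. *)
let write r src =
  let n = min (Bytes.length src) (free r) in
  for i = 0 to n - 1 do
    Bytes.set r.buf ((r.prod + i) mod size r) (Bytes.get src i)
  done;
  r.prod <- r.prod + n;               (* publish by advancing the pointer *)
  n

(* Read up to [n] bytes; return whatever was actually available. *)
let read r n =
  let n = min n (used r) in
  let dst = Bytes.create n in
  for i = 0 to n - 1 do
    Bytes.set dst i (Bytes.get r.buf ((r.cons + i) mod size r))
  done;
  r.cons <- r.cons + n;               (* release space by advancing the pointer *)
  dst

let () =
  let r = create 8 in
  ignore (write r (Bytes.of_string "hello"));
  print_endline (Bytes.to_string (read r 5))   (* prints "hello" *)
```

Because each side only ever advances its own pointer, the two ends need no locks; the hypervisor is only involved when one side needs to be woken up.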
Many other Mirage implementation details are documented in the paper (link at the top of the post), to which I refer you for more information.
We already know that Mirage can produce very compact, secure binaries. What else did the team find during their evaluation?
Unikernels are compact enough to boot and respond to network traffic in real-time…. Mirage boots in under 50 milliseconds. Such fast reboot times mitigate the concern that re-deployment by reconfiguration is too heavyweight, as well as opening up the possibility of regular micro-reboots.
Once booted, Mirage performance is comparable to traditional Linux alternatives:
Unikernel low-level networking performs competitively to conventional OSs, despite being fully type-safe. Library-based block management performs on par with a conventional layered storage stack.
There are some factors to consider before widely deploying large numbers of small unikernels, though:
Existing cloud orchestration layers such as OpenStack and Eucalyptus exhibit high latency when manipulating small VMs, when compared to processes running with a single OS. Unikernels depend on running multiple VMs for parallelization, and so improvements will be needed in this space. Finally, the use of cooperative threading does require some form of broader management system, as a single bug can completely deadlock an entire appliance.
There’s plenty more in the paper that I didn’t have space to cover here, so I encourage you to dig in.