The case for distributed operating systems in the data center

New wine in old skins: the case for distributed operating systems in the data center – Schwarzkopf et al. 2013.

I attended the New Directions in Operating Systems one-day event in London last week, and came away with the impression that the beginning of the end of the traditional operating system is in sight. Today’s OSs are designed around a paradigm of sharing a single machine amongst multiple users. Increasingly, that’s not what we need our OSs to do. For running isolated application processes on a distributed computing substrate, library OSs, rump kernels, or unikernels that are stripped down and fit-for-purpose make a lot more sense. On the client-side, instead of a multi-user device, we have multi-device users. The primary unit of isolation becomes an installed application, rather than a user. And on the warehouse-scale computers that increasingly underpin all of our server-side processing? … Schwarzkopf et al. make the case for the warehouse-scale OS.

Distributed operating systems were all the rage in the early 1980s, but fell out of fashion. In the era of the ‘warehouse-scale computer’ (WSC), could they make sense once more? Let us examine the evidence.

The historical evidence clearly shows the failure of the distributed OS concept. Instead, orchestration of distributed operation is now firmly the business of applications. So, can we declare the case closed? Not quite, as some key conditions have recently changed, or are in the process of changing…

  • CPU clock rates have plateaued, but network communication bandwidth keeps increasing. Long-standing OS abstractions designed on the premise that network I/O is far slower than local operations – such as the BSD sockets interface – do not scale to ever faster networks.

Due to this reversal of relative speed, the performance advantage of spatial locality is lost at least for some applications. Indeed, it has been observed that accessing remote memory and disks has no substantial overhead any more [3, 31].

  • The growth in number of cores per processor brings single-machine abstractions closer to distributed systems.

  • Moving computation and state is increasingly viable as migration overheads diminish. Examples given include moving to a faster processor, or to a GPGPU/FPGA accelerator, or away from contention.

  • Warehouse-scale computing applications are sufficiently demanding that they cannot fit on a single computer. They require distribution rather than merely benefiting from its flexibility.

  • Deep cache hierarchies and complex interconnects look like networked distributed systems:

Modern multi-socket many-core machines have complex internal interconnects as well as deep cache hierarchies. Hence, even the within-chassis environment is increasingly reminiscent of a networked distributed system, and operating systems must adapt to this new reality.

  • Applications that provide transparent distribution and tolerate coarse-grained parallelism by design (such as MapReduce) are a better fit for a distributed OS than earlier desktop applications.

The authors believe the time is now right to reconsider distributed OS research.

The complexity of WSC-scale operations and applications is such that key operating decisions are best made by integrated software stacks. At the same time, simplicity in programming and configuration abstractions is of key importance.

Why can’t this all be supported in middleware, as it is today?

While we are aware that there is a delicate balance between feature expansion and attack surface expansion [20], we believe pushing distributed systems functionality into shared OS components can be beneficial. As with all operating systems, common operations can be implemented once and shared between all applications in the WSC, and applications can assume them to be present and trustworthy. Furthermore, additional efficiency and security enhancements are only possible when the per-node kernel is fully aware of its role as a component in a massively distributed system.

A distributed OS would create a substrate for tracking information flow amongst the components of a distributed application:

Thus, a distributed WSC OS can provide assurances about data management policy conformance (and detect violation attempts). This is notoriously difficult in the user-space of a traditional OS, where it is possible to bypass such mechanisms trivially.

A new OS would also give us the opportunity to introduce new and better tuned abstractions for the distributed reality.

While traditional distributed OSes tend to stick closely to the familiar model of files and processes, a future WSC OS may benefit from being built around data-flow abstractions. This is particularly pertinent given that data-flow maps well onto capability-based security mechanisms, and has attractive properties (e.g. explicit dependencies) for information flow tracking.
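
To make the "explicit dependencies" point a little more concrete, here is a minimal sketch in Python of what a data-flow style abstraction might look like. This is my own illustration, not DIOS's interface or anything from the paper: `Task`, `run`, and the `labels` field are all hypothetical names. Each task declares its inputs up front, so a runtime can both derive the execution order and propagate sensitivity labels from inputs to outputs.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable


@dataclass
class Task:
    """Hypothetical data-flow node: a function plus its declared inputs."""
    name: str
    fn: Callable[..., object]
    deps: tuple[str, ...] = ()              # explicit dependencies on other tasks
    labels: frozenset[str] = frozenset()    # sensitivity labels introduced here


def run(tasks):
    """Run tasks in dependency order; return (values, flow labels) per task."""
    by_name = {t.name: t for t in tasks}
    graph = {t.name: set(t.deps) for t in tasks}
    values, flow = {}, {}
    for name in TopologicalSorter(graph).static_order():
        task = by_name[name]
        values[name] = task.fn(*(values[d] for d in task.deps))
        # An output carries its own labels plus those of everything it read.
        flow[name] = task.labels.union(*(flow[d] for d in task.deps))
    return values, flow


# Illustrative (made-up) pipeline: only "users" is marked as sensitive.
tasks = [
    Task("users", lambda: ["ann", "bob"], labels=frozenset({"pii"})),
    Task("clicks", lambda: [3, 5]),
    Task("total", lambda c: sum(c), deps=("clicks",)),
    Task("report", lambda u, t: f"{len(u)} users, {t} clicks",
         deps=("users", "total")),
]
values, flow = run(tasks)
```

Because `report` reads from `users`, it inherits the `pii` label, while `total` stays unlabelled. In user space nothing stops a task from reading data it never declared; the argument in the paper is that an OS-level mechanism could actually enforce the declared flows.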

Bandwidth may be getting better rapidly, but latency has not reduced by nearly as much. What can we do to ameliorate latency issues?

Good engineering can help, but the answer may lie in old concepts such as ordered broadcast [40]: increased bandwidth makes time-division multiplexing of a fraction of the link capacity a viable optimization for low latency.
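
A back-of-the-envelope sketch (my own arithmetic, not the scheme from [40]) suggests why more bandwidth helps here. Assume a fixed-size slot is reserved for latency-sensitive traffic and recurs often enough to use a given fraction of the link. The function name and all the numbers below are illustrative assumptions.

```python
def tdm_worst_case_wait(link_bps, reserved_fraction, slot_bits):
    """Worst-case time (seconds) a message waits for the next reserved slot.

    Assumes one reserved slot of `slot_bits` per cycle, with the cycle sized
    so that the slot consumes `reserved_fraction` of the link capacity.
    """
    if not 0 < reserved_fraction <= 1:
        raise ValueError("reserved_fraction must be in (0, 1]")
    slot_time = slot_bits / link_bps            # time to transmit one slot
    cycle_time = slot_time / reserved_fraction  # slot recurs once per cycle
    return cycle_time - slot_time               # just missed the slot


# Illustrative numbers: a 1500-byte slot reserving 5% of the link.
wait_1g = tdm_worst_case_wait(1e9, 0.05, 1500 * 8)      # roughly 228 microseconds
wait_100g = tdm_worst_case_wait(100e9, 0.05, 1500 * 8)  # roughly 2.28 microseconds
```

Under these assumptions, giving up the same 5% of capacity buys a worst-case wait that shrinks in proportion to link speed, which is why the trade looks better on faster networks.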

The team at the University of Cambridge are convinced by these arguments…

As such, we have put our money where our mouth is, and are developing DIOS, a new distributed operating system design for WSCs. A prototype implementation is in the works.

I’m certainly intrigued, if not yet convinced. But I look forward to watching their progress with interest.