FastRoute: A scalable load-aware anycast routing architecture for modern CDNs - Flavel et al. 2015 This is the story of how a team at Microsoft redesigned their CDN that supports 'numerous popular online services.' It's also a great example of mature systems thinking: the team deliberately eschew designs that would give marginally better performance at …
Month: May 2015
Wormhole: Reliable pub-sub to support Geo-Replicated Internet Services
Wormhole: Reliable pub-sub to support Geo-Replicated Internet Services - Sharma et al. 2015 At Facebook, lots of applications are interested in data being written to Facebook's data stores. Having each of these applications poll the data stores of interest would be untenable, so Facebook built a pub-sub system to identify updates and transmit notifications to …
The Design and Implementation of Open vSwitch
The Design and Implementation of Open vSwitch - Pfaff et al. 2015 Another selection from this month's NSDI 2015 programme, this time from the operational systems track. What inspired the creation of Open vSwitch? What has most influenced its design? And what's next? As virtualized (or containerized) workloads grew, physically provisioning networks to support them …
Queues don’t matter when you can JUMP them
Queues don't matter when you can JUMP them - Grosvenor et al. 2015 The Cambridge Systems at Scale team are on a roll. Hot on the heels of the excellent Musketeer paper from EuroSys 2015 comes this paper on QJUMP, which last week won a best paper award at NSDI'15. Distributed systems design involves trade-offs. …
Jitsu: Just-in time summoning of unikernels
Jitsu: Just-in time summoning of unikernels - Madhavapeddy et al. 2015 Last week saw the 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI '15), so the papers are now open access. I've been looking forward to bringing you today's choice for some time. Take the MirageOS work on unikernels, and the Xen port …
Extensible Distributed Coordination
Extensible Distributed Coordination - Distler et al. 2015 Coordination services such as ZooKeeper offer a deliberately limited API. As a consequence, more complex coordination tasks have to be implemented as multiple RPCs. In Extensible Distributed Coordination, Distler et al. describe a sandboxed extension mechanism for coordination services that allows execution of client logic in the …
Large-scale cluster management at Google with Borg
Large-scale cluster management at Google with Borg - Verma et al. 2015 Borg has been running all of Google's workloads for the last ten years, and the learnings from Borg are being packaged into Kubernetes so that the rest of the world can benefit from them. An important paper then, as the rest of us …
Blade: A data center garbage collector
Blade: A data center garbage collector - Terei & Levy 2015 Thanks to Justin Mason (@jmason) for bringing today's choice to my attention. GC times are a major cause of latency in the tail - Blade aims to fix this. By taking a distributed systems perspective rather than just a single node view, Blade collaborates …
Taming uncertainty in distributed systems with help from the network
Taming uncertainty in distributed systems with help from the network - Leners et al. 2015 Albatross is a membership service with a very interesting new twist: it exploits SDN functionality to actively enforce partitions! Perhaps it is not immediately obvious why that might be a good thing :). It turns out there are several benefits: …
Putting Consistency Back into Eventual Consistency
Putting Consistency Back into Eventual Consistency - Balegas et al. 2015 Today's choice is another pick from the recent crop of EuroSys 2015 papers. Balegas et al. show us that we don't have to put up with weak forms of eventual consistency, even in geo-replicated settings. In 'Building on Quicksand' Helland argued that we need …