Wormhole: Reliable pub-sub to support Geo-Replicated Internet Services - Sharma et al. 2015. At Facebook, lots of applications are interested in data being written to Facebook's data stores. Having each of these applications poll the data stores of interest would be untenable, so Facebook built a pub-sub system to identify updates and transmit notifications to …
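To make the push-versus-poll contrast concrete, here is a toy in-process sketch. The `Broker` class and its `subscribe`/`publish` methods are hypothetical names chosen for illustration. The sketch only shows the shape of the idea: register interest once, then receive pushed updates. It says nothing about how Wormhole itself detects updates, guarantees delivery, or spans regions.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Hypothetical callback type: (store name, update payload) -> None
Callback = Callable[[str, dict], None]


class Broker:
    """Toy pub-sub broker: subscribers register once, updates are pushed."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, store: str, callback: Callback) -> None:
        """Register interest in updates written to `store`."""
        self._subs[store].append(callback)

    def publish(self, store: str, update: dict) -> int:
        """Push `update` to every subscriber of `store`; return the fan-out."""
        for callback in self._subs[store]:
            callback(store, update)
        return len(self._subs[store])


# Usage: two consumers interested in the same (hypothetical) store.
received = []
broker = Broker()
broker.subscribe("users_db", lambda store, u: received.append((store, u)))
broker.subscribe("users_db", lambda store, u: received.append(("cache", u)))
n = broker.publish("users_db", {"id": 42, "op": "update"})
```

Neither consumer ever queries the store. The broker fans each update out once, which is what makes the approach scale where per-application polling would not.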
Category: Distributed Systems
Core distributed systems topics, for example consistency, availability and so on.
Jitsu: Just-in-time summoning of unikernels
Jitsu: Just-in-time summoning of unikernels - Madhavapeddy et al. 2015. Last week saw the 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI '15), so the papers are now open access. I've been looking forward to bringing you today's choice for some time. Take the MirageOS work on unikernels, and the Xen port …
Extensible Distributed Coordination
Extensible Distributed Coordination - Distler et al. 2015. Coordination services such as ZooKeeper offer a deliberately limited API. As a consequence, more complex coordination tasks have to be implemented as multiple RPCs. In Extensible Distributed Coordination, Distler et al. describe a sandboxed extension mechanism for coordination services that allows execution of client logic in the …
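Here is a rough sketch of why running client logic inside the service saves round trips, assuming a toy in-memory service. `CoordinationService`, `register_extension`, and `invoke` are hypothetical names, not ZooKeeper's API or the paper's interface. Unlike the design described in the paper, the extension here runs without any sandboxing.

```python
import threading
from typing import Callable, Dict, Tuple

Store = Dict[str, Tuple[int, int]]  # key -> (value, version)
Extension = Callable[[Store, str], int]


class CoordinationService:
    """Toy coordination service (hypothetical API, not ZooKeeper's)."""

    def __init__(self) -> None:
        self._data: Store = {}
        self._lock = threading.Lock()
        self._extensions: Dict[str, Extension] = {}
        self.rpc_count = 0  # client-to-service round trips

    # Deliberately limited base API: each call is one RPC.
    def get(self, key: str) -> Tuple[int, int]:
        with self._lock:
            self.rpc_count += 1
            return self._data.get(key, (0, 0))

    def compare_and_set(self, key: str, value: int, expected_version: int) -> bool:
        with self._lock:
            self.rpc_count += 1
            _, version = self._data.get(key, (0, 0))
            if version != expected_version:
                return False  # someone else updated the key; caller must retry
            self._data[key] = (value, version + 1)
            return True

    # Extension mechanism: client logic runs inside the service, atomically.
    def register_extension(self, name: str, fn: Extension) -> None:
        self._extensions[name] = fn  # one-time setup, not counted as an RPC here

    def invoke(self, name: str, key: str) -> int:
        with self._lock:
            self.rpc_count += 1
            return self._extensions[name](self._data, key)


def increment_with_base_api(svc: CoordinationService, key: str) -> int:
    # Read-modify-write over the limited API: two RPCs, more under contention.
    while True:
        value, version = svc.get(key)
        if svc.compare_and_set(key, value + 1, version):
            return value + 1


def increment_extension(data: Store, key: str) -> int:
    # Same logic, but executed server-side under the service's lock.
    value, version = data.get(key, (0, 0))
    data[key] = (value + 1, version + 1)
    return value + 1


svc = CoordinationService()
first = increment_with_base_api(svc, "counter")  # 2 RPCs (uncontended)
svc.register_extension("incr", increment_extension)
second = svc.invoke("incr", "counter")           # 1 RPC
```

The base-API version also needs a retry loop, because another client may update the key between the `get` and the `compare_and_set`. The extension avoids both the extra round trip and the retry.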
Large-scale cluster management at Google with Borg
Large-scale cluster management at Google with Borg - Verma et al. 2015. Borg has been running all of Google's workloads for the last ten years, and the learnings from Borg are being packaged into Kubernetes so that the rest of the world can benefit from them. An important paper then as the rest of us …
Blade: A data center garbage collector
Blade: A data center garbage collector - Terei & Levy 2015. Thanks to Justin Mason (@jmason) for bringing today's choice to my attention. GC times are a major cause of latency in the tail - Blade aims to fix this. By taking a distributed systems perspective rather than just a single node view, Blade collaborates …
Taming uncertainty in distributed systems with help from the network
Taming uncertainty in distributed systems with help from the network - Leners et al. 2015. Albatross is a membership service with a very interesting new twist: it exploits SDN functionality to actively enforce partitions! Perhaps it is not immediately obvious why that might be a good thing :). It turns out there are several benefits: …
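A minimal sketch of the "enforce, then announce" idea, under loudly stated assumptions. `NetworkController` is a hypothetical stand-in for an SDN controller that can install drop rules, and `MembershipService` is a toy, not Albatross's actual design. The point it illustrates: once the network itself blocks an excluded node, the membership decision becomes true even if that node was merely slow rather than crashed.

```python
from typing import Iterable, Set


class NetworkController:
    """Hypothetical stand-in for an SDN controller that can install drop rules."""

    def __init__(self) -> None:
        self.blocked: Set[str] = set()

    def block(self, node: str) -> None:
        self.blocked.add(node)  # models a rule dropping all traffic to/from node

    def forwards(self, src: str, dst: str) -> bool:
        """Would a packet from src to dst be forwarded?"""
        return src not in self.blocked and dst not in self.blocked


class MembershipService:
    """Toy membership service that enforces its own failure decisions."""

    def __init__(self, nodes: Iterable[str], net: NetworkController) -> None:
        self.view: Set[str] = set(nodes)
        self.net = net

    def declare_failed(self, node: str) -> None:
        # Enforce first, then update the view: by the time anyone sees the new
        # view, the excluded node can no longer reach the remaining members.
        self.net.block(node)
        self.view.discard(node)


net = NetworkController()
members = MembershipService(["a", "b", "c"], net)
members.declare_failed("c")  # "c" might only be slow, but it is now partitioned
```

Without the enforcement step, a slow-but-alive "c" could keep sending messages to nodes that already consider it dead. The drop rule removes that ambiguity.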
Putting Consistency Back into Eventual Consistency
Putting Consistency Back into Eventual Consistency - Balegas et al. 2015. Today's choice is another pick from the recent crop of Eurosys 2015 papers. Balegas et al. show us that we don't have to put up with weak forms of eventual consistency, even in geo-replicated settings. In Building on Quicksand, Helland argued that we need …
Musketeer – Part II: all for one, and one for all in data processing systems
Musketeer: all for one, one for all in data processing systems - Gog et al. 2015. Musketeer gives you portability of data processing workflows across data processing systems. It can even analyse your workflow and recommend the best system to run it on, as well as combining systems for different parts of the workflow. …
Musketeer – Part I: What’s the best data processing system?
Musketeer: all for one, one for all in data processing systems - Gog et al. 2015. For between 40-80% of the jobs submitted to MapReduce systems, you'd be better off just running them on a single machine... It was Eurosys 2015 last week, and a great new crop of papers were presented. Gog et al. …
Distributed Snapshots: Determining Global States of Distributed Systems
Distributed Snapshots: Determining Global States of Distributed Systems - Chandy & Lamport 1985. What state is your distributed system in? In the absence of a universal clock, is that even a well-formed question? And if you could take a distributed snapshot of system state, would that be useful? Through an algorithm that has simply become …
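For readers who haven't met it, here is a minimal single-threaded simulation of the marker-passing snapshot algorithm from the paper. It assumes FIFO channels, which the algorithm requires, and a toy system of processes transferring money to each other. The class and method names are hypothetical, chosen for illustration. A consistent snapshot conserves the total: recorded balances plus recorded in-flight transfers equal the money in the system.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple, Union

MARKER = "MARKER"
Msg = Union[int, str]  # an int is money in transit; MARKER is the snapshot marker


class Process:
    def __init__(self, pid: str, balance: int) -> None:
        self.pid = pid
        self.balance = balance
        self.recorded_state: Optional[int] = None
        self.recording: Dict[str, bool] = {}           # sender -> still recording?
        self.channel_state: Dict[str, List[int]] = {}  # sender -> recorded in-flight msgs


class System:
    def __init__(self, balances: Dict[str, int]) -> None:
        self.procs = {pid: Process(pid, b) for pid, b in balances.items()}
        # One FIFO channel per ordered pair of distinct processes.
        self.chan: Dict[Tuple[str, str], Deque[Msg]] = {
            (s, d): deque() for s in balances for d in balances if s != d
        }

    def transfer(self, src: str, dst: str, amount: int) -> None:
        self.procs[src].balance -= amount
        self.chan[(src, dst)].append(amount)

    def _record(self, p: Process) -> None:
        # Record local state, start recording every incoming channel,
        # then send a marker on every outgoing channel.
        p.recorded_state = p.balance
        for (s, d) in self.chan:
            if d == p.pid:
                p.recording[s] = True
                p.channel_state[s] = []
        for (s, d) in self.chan:
            if s == p.pid:
                self.chan[(s, d)].append(MARKER)

    def start_snapshot(self, pid: str) -> None:
        self._record(self.procs[pid])

    def receive(self, src: str, dst: str) -> None:
        """Deliver the next message on channel src -> dst."""
        msg = self.chan[(src, dst)].popleft()
        p = self.procs[dst]
        if isinstance(msg, int):
            p.balance += msg
            if p.recording.get(src):
                p.channel_state[src].append(msg)  # in flight when the cut was taken
        else:  # marker
            if p.recorded_state is None:
                self._record(p)  # first marker: this channel's state stays empty
            p.recording[src] = False  # channel src -> dst is now fully recorded

    def snapshot_total(self) -> int:
        states = sum(p.recorded_state or 0 for p in self.procs.values())
        in_flight = sum(sum(m) for p in self.procs.values() for m in p.channel_state.values())
        return states + in_flight


system = System({"a": 100, "b": 50})
system.transfer("a", "b", 10)   # a: 90, 10 in flight a -> b
system.start_snapshot("a")      # a records 90; marker queued behind the 10
system.transfer("b", "a", 5)    # b: 45, 5 in flight b -> a
system.receive("a", "b")        # b gets 10 -> 55 (not yet recording)
system.receive("a", "b")        # marker: b records 55, sends marker to a
system.receive("b", "a")        # a gets 5 -> 95; recorded as in flight on b -> a
system.receive("b", "a")        # marker: a stops recording b -> a
print(system.snapshot_total())  # 90 + 55 + 5 = 150, the original total
```

Note that the recorded global state (a = 90, b = 55, 5 in flight) never existed at any single instant of the run, yet it is consistent. That is exactly the sense in which the algorithm answers the "what state is your system in?" question without a universal clock.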