Cloud programming simplified: a Berkeley view on serverless computing. Jonas et al., arXiv 2019. With thanks to Eoin Brazil who first pointed this paper out to me via Twitter…. Ten years ago Berkeley released the ‘Berkeley view of cloud computing’ paper, predicting that cloud use would accelerate. Today’s paper choice is billed as its logical …
Tag: Distributed Systems
Core distributed systems topics, for example consistency, availability and so on.
Efficient synchronisation of state-based CRDTs
Enes et al., arXiv'18. CRDTs are a great example of consistency as logical monotonicity. They come in two main variations: operation-based CRDTs send operations to remote replicas using a reliable dissemination layer with exactly-once causal delivery (if operations are idempotent then at-least-once is OK too); state-based CRDTs exchange information about …
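As a rough illustration of the state-based style mentioned in the excerpt above, here is a minimal sketch of a grow-only counter (G-Counter). This is a standard textbook CRDT, not something taken from the Enes et al. paper, and the class name `GCounter` and its methods are hypothetical names chosen for illustration. Each replica tracks a per-replica count, and merging takes the element-wise maximum. That merge is commutative, associative, and idempotent, so full states can be exchanged repeatedly and in any order without double counting.

```python
class GCounter:
    """Grow-only counter: a textbook state-based CRDT (illustrative sketch)."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        # Maps replica id -> number of increments observed from that replica.
        self.counts = {}

    def increment(self, amount=1):
        """Record local increments; only this replica's own entry changes."""
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        """The counter's value is the sum over all replicas' entries."""
        return sum(self.counts.values())

    def merge(self, other):
        """Join two states by taking the element-wise maximum."""
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


a = GCounter("a")
b = GCounter("b")
a.increment(2)
b.increment(3)

a.merge(b)
a.merge(b)  # a duplicate delivery of the same state changes nothing
b.merge(a)

total_a = a.value()
total_b = b.value()
```

After the exchange both replicas hold the same state, even though `a` received `b`'s state twice.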
A generalised solution to distributed consensus
Howard & Mortier, arXiv'19. This is a draft paper that Heidi Howard recently shared with the world via Twitter, and here’s the accompanying blog post. It caught my eye for promising a generalised solution to the consensus problem, and also for using reasoning over immutable state to get there. …
Keeping CALM: when distributed consistency is easy
Hellerstein & Alvaro, arXiv 2019. The CALM conjecture (and later theorem) was first introduced to the world in a 2010 keynote talk at PODS. Behind its simple formulation there’s a deep lesson to be learned with the power to create ripples through our industry akin to the influence …
Fixed it for you: protocol repair using lineage graphs
Oldenburg et al., CIDR'19. This is a cool paper on a number of levels. Firstly, the main result that catches my eye is that it’s possible to build a distributed systems ‘debugger’ that can suggest protocol-level fixes. For example, say you have a system that sometimes sends …
BEAT: asynchronous BFT made practical
Duan et al., CCS'18. Reaching agreement (consensus) is hard enough; doing it in the presence of active adversaries who can tamper with or destroy your communications is much harder still. That’s the world of Byzantine fault tolerance (BFT). We’ve looked at Practical BFT (PBFT) and HoneyBadger in previous editions of …
ScootR: scaling R dataframes on dataflow systems
Kunft et al., SoCC'18. The language of big data is Java (/ Scala). The languages of data science are Python and R. So what do you do when you want to run your data science analysis over large amounts of data? ...programming languages with rich support for …
Overload control for scaling WeChat microservices
Zhou et al., SoCC'18. There are two reasons to love this paper. First, we get some insights into the backend that powers WeChat; second, the authors share the design of the battle-hardened overload control system DAGOR, which has been in production at WeChat for five years. …
Debugging distributed systems with why-across-time provenance
Whittaker et al., SoCC'18. This value is 17 here, and it shouldn’t be. Why did the get request return 17? Sometimes the simplest questions can be the hardest to answer. As the opening sentence of this paper states: ‘Debugging distributed systems is hard.’ The kind of why questions we’re …
The FuzzyLog: a partially ordered shared log
Lockerman et al., OSDI'18. If you want to build a distributed system then having a distributed shared log as an abstraction to build upon, one that gives you an agreed-upon total order for all events, is such a big help that it’s practically cheating! (See …