The Tail at Scale

The Tail at Scale - Dean and Barroso 2013. We've all become familiar with the importance of fault-tolerance and the techniques used to achieve it. Less well known is the idea of tail-tolerance. A system that doesn't respond quickly enough feels clunky to its users and can have serious negative consequences for site/service ...

Blazes: Coordination analysis for distributed programs

Blazes: Coordination analysis for distributed programs - Alvaro et al. 2014. For many practitioners, distributed consistency is the most critical issue for system performance and manageability at scale. In Blazes, Alvaro et al. take a fresh look at 'an urgent issue for distributed systems developers,' namely the correctness and efficiency of distributed consistency mechanisms for ...

Derflow: Distributed Deterministic Dataflow programming for Erlang

Derflow: Distributed Deterministic Dataflow programming for Erlang - Bravo et al. 2014. Today's choice is part of the work of the SyncFree European research project on large-scale computation without synchronisation. Non-determinism makes it very difficult to reason about distributed applications, so Bravo et al. figured life might be easier if we could just make them ...

The Network is Reliable

The Network is Reliable - Bailis and Kingsbury 2014. This must be the easiest paper summary of the series to write so far. The network is reliable? Oh no, it isn't... OK, here's a little more detail :) Network reliability matters because unreliable networks prevent us from having reliable communication, and that in turn makes building ...