The Tail at Scale

January 15, 2015 (updated July 26, 2017) ~ adriancolyer ~ 16 Comments

The Tail at Scale - Dean and Barroso 2013. We've all become familiar with the importance of fault-tolerance and the techniques that can be used to achieve it. Less well-known is the idea of tail-tolerance. A system that doesn't respond quickly enough feels clunky to its users and can have serious negative consequences for site/service …

Blazes: Coordination analysis for distributed programs

January 5, 2015 (updated July 26, 2017) ~ adriancolyer ~ 5 Comments

Blazes: Coordination analysis for distributed programs - Alvaro et al. 2014. For many practitioners, distributed consistency is the most critical issue for system performance and manageability at scale. In Blazes, Alvaro et al. take a fresh look at 'an urgent issue for distributed systems developers,' namely the correctness and efficiency of distributed consistency mechanisms for …

Derflow: Distributed Deterministic Dataflow programming for Erlang

December 19, 2014 (updated July 26, 2017) ~ adriancolyer ~ 2 Comments

Derflow: Distributed Deterministic Dataflow programming for Erlang - Bravo et al. 2014. Today's choice is part of the work of the SyncFree European research project on large-scale computation without synchronisation. Non-determinism makes it very difficult to reason about distributed applications. So Bravo et al. figured life might be easier if we could just make them …

The Network is Reliable

December 18, 2014 (updated July 26, 2017) ~ adriancolyer ~ 4 Comments

The Network is Reliable - Bailis and Kingsbury 2014. This must be the easiest paper summary to write of the series so far. The network is reliable? Oh no it isn't... OK, here's a little more detail :) Network reliability matters because without it we cannot have reliable communication, and that in turn makes building …

Tachyon: Reliable, Memory Speed Storage for Cluster Computing

December 4, 2014 (updated July 26, 2017) ~ adriancolyer ~ 3 Comments

Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks - Li et al. 2014. Data processing can often be naturally expressed as a sequence of steps in a pipeline. For example, the Unix command line below pipes a file through a series of transforms to ultimately generate some output: cat Fin.csv | a | …

The case for distributed operating systems in the data center

December 2, 2014 (updated July 26, 2017) ~ adriancolyer ~ 4 Comments

New wine in old skins: the case for distributed operating systems in the data center - Schwarzkopf et al. 2013. I attended the New Directions in Operating Systems one-day event in London last week, and came away with the impression that the beginning of the end of the traditional operating system is in sight. Today's …

SwiftCloud: Fault-tolerant geo-replication integrated all the way to the client

November 27, 2014 (updated July 26, 2017) ~ adriancolyer ~ 5 Comments

SwiftCloud: Fault-tolerant geo-replication integrated all the way to the client machine - Zawirski et al. 2013. Data is stored in the cloud, presentation is on mobile devices, and application processing is increasingly split between the two. As mobile devices get more and more capable, we would like to exploit more and more of that capability. …

Use of Formal Methods at Amazon Web Services

November 24, 2014 (updated July 26, 2017) ~ adriancolyer ~ 18 Comments

Use of Formal Methods at Amazon Web Services - Newcombe et al. 2014. Leslie Lamport recently gave a talk at the React conference on the specification language TLA+. I wasn't there to hear the talk, but I was intrigued enough to dig in and find out a little more. Especially since I have some experience …

Life Beyond Distributed Transactions

November 20, 2014 (updated July 26, 2017) ~ adriancolyer ~ 5 Comments

Life Beyond Distributed Transactions: An Apostate's Opinion - Pat Helland, 2007. It takes real skill to strip something back to its essence and explain it clearly in such a way that the ramifications become apparent. In my view, Pat Helland pulls this off admirably in this paper and helps the reader think more deeply about …

End-to-End Arguments in System Design

November 14, 2014 (updated July 26, 2017) ~ adriancolyer ~ 5 Comments

End-to-end arguments in system design - Saltzer, Reed, & Clark 1984. A true classic from 30 years ago. From the abstract: This paper presents a design principle that helps guide placement of functions among the modules of a distributed computer system. The principle, called the end-to-end argument, suggests that functions placed at low levels of …