The Tail at Scale - Dean and Barroso 2013 We've all become familiar with the importance of fault-tolerance and the techniques that can be used to achieve it. Less well-known is the idea of tail-tolerance. A system that doesn't respond quickly enough feels clunky to its users and can have serious negative consequences for site/service … Continue reading The Tail at Scale
Tag: Distributed Systems
Core distributed systems topics, for example consistency, availability and so on.
Blazes: Coordination analysis for distributed programs
Blazes: Coordination analysis for distributed programs - Alvaro et al. 2014 For many practitioners distributed consistency is the most critical issue for system performance and manageability at scale. In Blazes, Alvaro et al. take a fresh look at 'an urgent issue for distributed systems developers,' namely the correctness and efficiency of distributed consistency mechanisms for … Continue reading Blazes: Coordination analysis for distributed programs
Derflow: Distributed Deterministic Dataflow programming for Erlang
Derflow: Distributed Deterministic Dataflow programming for Erlang - Bravo et al. 2014 Today's choice is part of the work of the SyncFree European research project on large-scale computation without synchronisation. Non-determinism makes it very difficult to reason about distributed applications. So Bravo et al. figured life might be easier if we could just make them … Continue reading Derflow: Distributed Deterministic Dataflow programming for Erlang
The Network is Reliable
The Network is Reliable - Bailis and Kingsbury 2014 This must be the easiest paper summary to write of the series so far. The network is reliable? Oh no it isn't... OK, here's a little more detail :) Network reliability matters because its absence prevents us from having reliable communication, and that in turn makes building … Continue reading The Network is Reliable
Tachyon: Reliable, Memory Speed Storage for Cluster Computing
Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks - Li et al. 2014 Data processing can often be naturally expressed as a sequence of steps in a pipeline. For example, the Unix command line below pipes a file through a series of transforms to ultimately generate some output. cat Fin.csv | a | … Continue reading Tachyon: Reliable, Memory Speed Storage for Cluster Computing
The case for distributed operating systems in the data center
New wine in old skins: the case for distributed operating systems in the data center - Schwarzkopf et al. 2013. I attended the New Directions in Operating Systems one-day event in London last week, and came away with the impression that the beginning of the end of the traditional operating system is in sight. Today's … Continue reading The case for distributed operating systems in the data center
SwiftCloud: Fault-tolerant geo-replication integrated all the way to the client
SwiftCloud: Fault-tolerant geo-replication integrated all the way to the client machine - Zawirski et al. 2013 Data is stored in the cloud, presentation is on mobile devices, and application processing is increasingly split between the two. As mobile devices get more and more capable, we would like to exploit more and more of that capability. … Continue reading SwiftCloud: Fault-tolerant geo-replication integrated all the way to the client
Use of Formal Methods at Amazon Web Services
Use of Formal Methods at Amazon Web Services - Newcombe et al. 2014 Leslie Lamport recently gave a talk at the React conference on the specification language TLA+. I wasn't there to hear the talk, but I was intrigued enough to dig in and find out a little more. Especially since I have some experience … Continue reading Use of Formal Methods at Amazon Web Services
Life Beyond Distributed Transactions
Life Beyond Distributed Transactions: An Apostate's Opinion - Pat Helland, 2007 It takes real skill to strip something back to its essence and explain it clearly in such a way that the ramifications become apparent. In my view Pat Helland pulls this off admirably in this paper and helps the reader think more deeply about … Continue reading Life Beyond Distributed Transactions
End-to-End Arguments in System Design
End-to-end arguments in system design - Saltzer, Reed, & Clark 1984. A true classic from 30 years ago. From the abstract: This paper presents a design principle that helps guide placement of functions among the modules of a distributed computer system. The principle, called the end-to-end argument, suggests that functions placed at low levels of … Continue reading End-to-End Arguments in System Design