ZooKeeper: wait-free coordination for internet scale systems

January 27, 2015 (updated July 26, 2017) ~ adriancolyer ~ 4 Comments

ZooKeeper: wait-free coordination for internet scale systems - Hunt et al. (Yahoo!) 2010 Distributed systems would be much simpler if the distributed parts didn't have to coordinate in some fashion. But it's this notion of 'working together' to achieve some aim that differentiates a distributed system from an unrelated bag of parts. Examples of the …

Dremel: interactive analysis of web-scale datasets

January 26, 2015 (updated July 26, 2017) ~ adriancolyer ~ 2 Comments

Dremel: interactive analysis of web-scale datasets - Melnik et al. (Google), 2010. Dremel is Google's interactive ad-hoc query system that can run aggregate queries over trillions of rows in seconds. It scales to thousands of CPUs, and petabytes of data. It was also the inspiration for Apache Drill. Dremel borrows the idea of serving trees …

The MADlib Analytics Library

January 23, 2015 (updated July 26, 2017) ~ adriancolyer ~ 1 Comment

The MADlib Analytics Library - MAD Skills, the SQL - Hellerstein et al. 2012 The way that we use large databases has evolved from being primarily in support of accounting and financial record-keeping, to primarily in support of predictive analytics over a wide range of potentially noisy data. Analytics at scale requires the marriage of …

The Design Philosophy of the DARPA Internet Protocols

January 22, 2015 (updated July 26, 2017) ~ adriancolyer ~ 7 Comments

The Design Philosophy of the DARPA Internet Protocols - Clark 1988 While there have been papers and specifications that describe how the (internet) protocols work, it is sometimes difficult to deduce from these why the protocol is as it is. For example, the Internet protocol is based on a connectionless or datagram mode of service. …

A new presumed commit optimisation for two-phase commit

January 21, 2015 (updated July 26, 2017) ~ adriancolyer ~ 2 Comments

A new presumed commit optimisation for two phase commit - Lampson and Lomet 1993. Two phase commit (2PC) is the protocol used to coordinate distributed transactions. In this paper Lampson and Lomet first recap the basic 2PC protocol in its 'presumed nothing' form, then go on to describe the traditional 'presumed abort' and 'presumed commit' …

Architecture of a Database System

January 20, 2015 (updated July 26, 2017) ~ adriancolyer ~ 9 Comments

Architecture of a Database System - Hellerstein, Stonebraker & Hamilton, 2007. This is a longer read (and hence a slightly longer write-up too) coming in at 119 pages, but it's written in a very easy style so the pages fly by. It oozes wisdom and experience from every paragraph as Joe Hellerstein and Michael Stonebraker …

Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing

January 19, 2015 (updated July 26, 2017) ~ adriancolyer ~ 5 Comments

Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing - Google 2014 Mesa is another in the tapestry of systems that support Google's advertising business. Previous editions of The Morning Paper have covered Photon, Spanner, F1, and F1's online schema update mechanism. Mesa is a highly scalable analytic data warehousing system that stores critical measurement data related …

Epigrams on programming

January 16, 2015 (updated July 26, 2017) ~ adriancolyer

Epigrams on programming - Perlis, 1982 See also the original formatted version in PDF at the ACM Digital Library if you have a subscription. A bit of Friday fun today. Not strictly a paper, but certainly a classic! Twitter didn't exist in 1982, though if it had, Alan Perlis would have been a master tweeter. This …

The Tail at Scale

January 15, 2015 (updated July 26, 2017) ~ adriancolyer ~ 16 Comments

The Tail at Scale - Dean and Barroso 2013 We've all become familiar with the importance of fault-tolerance and the techniques that can be used to achieve it. Less well-known is the idea of tail-tolerance. A system that doesn't respond quickly enough feels clunky to its users and can have serious negative consequences for site/service …

Mergeable persistent data structures

January 14, 2015 (updated July 26, 2017) ~ adriancolyer ~ 3 Comments

Mergeable persistent data structures - Farinier et al. 2014 Irmin is part of the MirageOS project that was the subject of yesterday's paper, where it is also the basis for a Git-like persistent file system used for the OS. What if you could version-control a (mutable) persistent data structure, inspect its history, clone a remote …