Recursive Programming - Dijkstra 1960 * Updated link to one that is not behind a paywall - thanks to Graham Markall for the catch * This paper deals with something we take so much for granted that it's hard to imagine a time when it had yet to be introduced to the world. That time …
Month: January 2015
On Distributed Communications Networks
On Distributed Communications Networks - Baran 1962 Way before it became fashionable to build large-scale distributed systems out of relatively unreliable commodity hardware, Baran was investigating the properties a network built in such a way might have. This paper is very much of its time, and all the more enjoyable for it: want to know …
Eraser: A dynamic data race detector for multi-threaded programs
Eraser: A dynamic data race detector for multi-threaded programs - Savage et al. 1997 Debugging a multithreaded program can be difficult. Simple errors in synchronization can produce timing-dependent data races that can take weeks or months to track down. Eraser dynamically detects data races in multi-threaded programs. There are two basic approaches to doing this, …
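The approach Eraser takes is lockset refinement: for each shared variable v it keeps a candidate set C(v) of locks, intersects C(v) with the locks held by the accessing thread on every access, and reports a potential race once C(v) becomes empty. The sketch below is a minimal, single-process illustration of that basic rule only, assuming simulated threads and locks identified by strings; the class and method names are hypothetical, and the paper's refinements for initialization and read-shared data are not modelled.

```python
from collections import defaultdict


class LocksetDetector:
    """Hypothetical sketch of the basic Eraser lockset rule (no state machine)."""

    def __init__(self):
        self.held = defaultdict(set)  # thread id -> locks currently held
        self.candidates = {}          # variable -> candidate lockset C(v)
        self.races = []               # variables whose C(v) became empty

    def acquire(self, thread, lock):
        self.held[thread].add(lock)

    def release(self, thread, lock):
        self.held[thread].discard(lock)

    def access(self, thread, var):
        held = self.held[thread]
        if var not in self.candidates:
            # C(v) conceptually starts as "all locks", so the first
            # intersection simply leaves the locks held right now.
            self.candidates[var] = set(held)
        else:
            self.candidates[var] &= held
        # An empty candidate set means no single lock protects every access.
        if not self.candidates[var] and var not in self.races:
            self.races.append(var)


# x is always accessed under "mu"; y is accessed once without it.
d = LocksetDetector()
d.acquire("t1", "mu"); d.access("t1", "x"); d.release("t1", "mu")
d.acquire("t2", "mu"); d.access("t2", "x"); d.release("t2", "mu")
d.acquire("t1", "mu"); d.access("t1", "y"); d.release("t1", "mu")
d.access("t2", "y")  # no lock held, so C(y) becomes empty
print(d.races)       # ['y']
```

Because the check depends only on which locks are held, not on the timing of the interleaving, a race on y is reported even if the two accesses never actually collided in this run.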
ZooKeeper: wait-free coordination for internet scale systems
ZooKeeper: wait-free coordination for internet scale systems - Hunt et al. (Yahoo!) 2010 Distributed systems would be much simpler if the distributed parts didn't have to coordinate in some fashion. But it's this notion of 'working together' to achieve some aim that differentiates a distributed system from an unrelated bag of parts. Examples of the …
Dremel: interactive analysis of web-scale datasets
Dremel: interactive analysis of web-scale datasets - Melnik et al. (Google), 2010. Dremel is Google's interactive ad-hoc query system that can run aggregate queries over trillions of rows in seconds. It scales to thousands of CPUs, and petabytes of data. It was also the inspiration for Apache Drill. Dremel borrows the idea of serving trees …
The MADlib Analytics Library
The MADlib Analytics Library - MAD Skills, the SQL - Hellerstein et al. 2012 The way that we use large databases has evolved from being primarily in support of accounting and financial record-keeping, to primarily in support of predictive analytics over a wide range of potentially noisy data. Analytics at scale requires the marriage of …
The Design Philosophy of the DARPA Internet Protocols
The Design Philosophy of the DARPA Internet Protocols - Clark 1988 While there have been papers and specifications that describe how the (internet) protocols work, it is sometimes difficult to deduce from these why the protocol is as it is. For example, the Internet protocol is based on a connectionless or datagram mode of service. …
A new presumed commit optimisation for two-phase commit
A new presumed commit optimisation for two-phase commit - Lampson and Lomet 1993. Two-phase commit (2PC) is the protocol used to coordinate distributed transactions. In this paper Lampson and Lomet first recap the basic 2PC protocol in its 'presumed nothing' form, then go on to describe the traditional 'presumed abort' and 'presumed commit' …
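To make the 'presumed abort' idea concrete, here is a minimal in-memory sketch of a 2PC coordinator under the traditional presumed-abort convention (not the new optimisation the paper proposes). It assumes no failures, timeouts or acknowledgements, and the class names, `will_commit` flag and dictionary "log" standing in for stable storage are all hypothetical. The key point is that only commit decisions are logged, so a missing log record is read as abort.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical participant that votes according to `will_commit`."""

    def __init__(self, name, will_commit=True):
        self.name = name
        self.will_commit = will_commit
        self.outcome = None

    def prepare(self, txid):
        return Vote.YES if self.will_commit else Vote.NO

    def finish(self, txid, decision):
        self.outcome = decision


class Coordinator:
    """Presumed-abort 2PC coordinator sketch (happy path only)."""

    def __init__(self):
        self.log = {}  # stand-in for stable storage; only commits are recorded

    def run(self, txid, participants):
        # Phase 1: collect votes; a single NO vote aborts the transaction.
        votes = [p.prepare(txid) for p in participants]
        decision = "commit" if all(v is Vote.YES for v in votes) else "abort"
        if decision == "commit":
            # The commit decision must be made durable before anyone is told.
            self.log[txid] = "commit"
        # Phase 2: broadcast the decision to every participant.
        for p in participants:
            p.finish(txid, decision)
        return decision

    def inquire(self, txid):
        # Presumed abort: no log record means the transaction aborted.
        return self.log.get(txid, "abort")


coord = Coordinator()
ok = [Participant("a"), Participant("b")]
bad = [Participant("a"), Participant("b", will_commit=False)]
print(coord.run("t1", ok))   # commit
print(coord.run("t2", bad))  # abort
print(coord.inquire("t2"))   # abort, presumed because nothing was logged
```

Under 'presumed nothing' the coordinator would also have to log the abort for t2; skipping that write is what the presumption buys.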
Architecture of a Database System
Architecture of a Database System - Hellerstein, Stonebraker & Hamilton, 2007. This is a longer read (and hence a slightly longer write-up too) coming in at 119 pages, but it's written in a very easy style so the pages fly by. It oozes wisdom and experience from every paragraph as Joe Hellerstein and Michael Stonebraker …
Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing
Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing - Google 2014 Mesa is another in the tapestry of systems that support Google's advertising business. Previous editions of The Morning Paper have covered Photon, Spanner, F1, and F1's online schema update mechanism. Mesa is a highly scalable analytic data warehousing system that stores critical measurement data related …