WANalytics: Analytics for a geo-distributed, data intensive world

WANalytics: analytics for a geo-distributed data intensive world - Vulimiri et al. 2015 ...data is born distributed; we only control data replication and distributed execution strategies. This is true for so many sources of data. Combine this with Dave McCrory's observation that 'Data has Gravity' (i.e. it attracts applications and other data processing workloads to …

ZooKeeper: wait-free coordination for internet scale systems

ZooKeeper: wait-free coordination for internet scale systems - Hunt et al. (Yahoo!) 2010 Distributed systems would be much simpler if the distributed parts didn't have to coordinate in some fashion. But it's this notion of 'working together' to achieve some aim that differentiates a distributed system from an unrelated bag of parts. Examples of the …

Dremel: interactive analysis of web-scale datasets

Dremel: interactive analysis of web-scale datasets - Melnik et al. (Google), 2010. Dremel is Google's interactive ad-hoc query system that can run aggregate queries over trillions of rows in seconds. It scales to thousands of CPUs, and petabytes of data. It was also the inspiration for Apache Drill. Dremel borrows the idea of serving trees …

A new presumed commit optimisation for two-phase commit

A new presumed commit optimisation for two phase commit - Lampson and Lomet 1993. Two phase commit (2PC) is the protocol used to coordinate distributed transactions. In this paper Lampson and Lomet first recap the basic 2PC protocol in its 'presumed nothing' form, then go on to describe the traditional 'presumed abort' and 'presumed commit' …

Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing

Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing - Google 2014 Mesa is another in the tapestry of systems that support Google's advertising business. Previous editions of The Morning Paper have covered Photon, Spanner, F1, and F1's online schema update mechanism. Mesa is a highly scalable analytic data warehousing system that stores critical measurement data related …