WANalytics: analytics for a geo-distributed data intensive world - Vulimiri et al. 2015. ...data is born distributed; we only control data replication and distributed execution strategies. This is true for so many sources of data. Combine this with Dave McCrory's observation that 'Data has Gravity' (i.e. it attracts applications and other data processing workloads to …
Category: Distributed Systems
Core distributed systems topics, for example consistency, availability and so on.
The Missing Piece in Complex Analytics
The Missing Piece in Complex Analytics: Low latency scalable model management and serving with Velox - Crankshaw et al. 2015. Analytics at scale can be used to create statistical models for making predictions about the world, but once the data scientists and analysts have done their initial work and a model has been built and …
ZooKeeper: wait-free coordination for internet scale systems
ZooKeeper: wait-free coordination for internet scale systems - Hunt et al. (Yahoo!) 2010. Distributed systems would be much simpler if the distributed parts didn't have to coordinate in some fashion. But it's this notion of 'working together' to achieve some aim that differentiates a distributed system from an unrelated bag of parts. Examples of the …
Dremel: interactive analysis of web-scale datasets
Dremel: interactive analysis of web-scale datasets - Melnik et al. (Google), 2010. Dremel is Google's interactive ad-hoc query system that can run aggregate queries over trillions of rows in seconds. It scales to thousands of CPUs, and petabytes of data. It was also the inspiration for Apache Drill. Dremel borrows the idea of serving trees …
A new presumed commit optimisation for two-phase commit
A new presumed commit optimisation for two-phase commit - Lampson and Lomet 1993. Two-phase commit (2PC) is the protocol used to coordinate distributed transactions. In this paper Lampson and Lomet first recap the basic 2PC protocol in its 'presumed nothing' form, then go on to describe the traditional 'presumed abort' and 'presumed commit' …
Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing
Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing - Google 2014. Mesa is another in the tapestry of systems that support Google's advertising business. Previous editions of The Morning Paper have covered Photon, Spanner, F1, and F1's online schema update mechanism. Mesa is a highly scalable analytic data warehousing system that stores critical measurement data related …