Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing

Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing - Google 2014 Mesa is another in the tapestry of systems that support Google's advertising business. Previous editions of The Morning Paper have covered Photon, Spanner, F1, and F1's online schema update mechanism. Mesa is a highly scalable analytic data warehousing system that stores critical measurement data related ...

The Tail at Scale

The Tail at Scale - Dean and Barroso 2013 We've all become familiar with the importance of fault-tolerance and the techniques that can be used to achieve it. Less well-known is the idea of tail-tolerance. A system that doesn't respond quickly enough feels clunky to its users and can have serious negative consequences for site/service ...

Photon: Fault-tolerant and scalable joining of continuous data streams

Photon: Fault-tolerant and scalable joining of continuous data streams - Google 2013 To the best of our knowledge, this is the first paper to formulate and solve the problem of joining multiple streams continuously under these system constraints: exactly-once semantics, fault-tolerance at datacenter-level, high scalability, low latency, unordered streams, and delayed primary stream. It's interesting ...

The Google File System

The Google File System - Ghemawat, Gobioff & Leung, 2003 Here's a paper with a lot to answer for! Back in 2003 Ghemawat et al. reported: "We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault-tolerance while running on inexpensive commodity hardware, ..."