TAO: Facebook’s Distributed Data Store for the Social Graph

TAO: Facebook's Distributed Data Store for the Social Graph - Bronson et al. (Facebook) 2013

A single Facebook page may aggregate and filter hundreds of items from the social graph. We present each user with content tailored to them, and we filter every item with privacy checks that take into account the current viewer. This extreme …

Wormhole: Reliable pub-sub to support Geo-Replicated Internet Services

Wormhole: Reliable pub-sub to support Geo-Replicated Internet Services - Sharma et al. 2015

At Facebook, many applications are interested in the data being written to Facebook's data stores. Having each of these applications poll the data stores of interest would be untenable, so Facebook built a pub-sub system to identify updates and transmit notifications to …
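The contrast in the Wormhole excerpt, pushing notifications of updates instead of having every application poll the store, can be illustrated with a minimal in-process sketch. It assumes a single-process toy key-value store: the `DataStore` class, its `subscribe`/`write` methods, and the key-prefix matching are hypothetical illustrations, not Wormhole's actual API or design, and the sketch deliberately ignores the reliability and geo-replication concerns the paper's title points to.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A subscriber callback receives the key and new value of each matching write.
Callback = Callable[[str, str], None]


class DataStore:
    """Toy key-value store that pushes writes to subscribers (hypothetical, not Wormhole's API)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        # Map from key prefix to the callbacks interested in keys with that prefix.
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, prefix: str, callback: Callback) -> None:
        """Register interest in every key that starts with `prefix`."""
        self._subscribers[prefix].append(callback)

    def write(self, key: str, value: str) -> None:
        """Apply the write, then notify each matching subscriber; no one has to poll."""
        self._data[key] = value
        for prefix, callbacks in self._subscribers.items():
            if key.startswith(prefix):
                for cb in callbacks:
                    cb(key, value)


# Usage: one application subscribes to user records only.
received = []
store = DataStore()
store.subscribe("user:", lambda k, v: received.append((k, v)))
store.write("user:42", "name=Ada")
store.write("post:7", "hello")  # no subscriber matches, so nothing is delivered
print(received)  # [('user:42', 'name=Ada')]
```

The store does the matching once per write, so the cost scales with the number of updates rather than with how often each interested application would otherwise poll.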

Musketeer – Part I: What’s the best data processing system?

Musketeer: all for one, one for all in data processing systems - Gog et al. 2015

For between 40–80% of the jobs submitted to MapReduce systems, you'd be better off just running them on a single machine... It was EuroSys 2015 last week, and a great new crop of papers was presented. Gog et al. …

SAMC: Semantic-aware model checking for fast discovery of deep bugs in cloud systems

SAMC: Semantic-aware model checking for fast discovery of deep bugs in cloud systems - Leesatapornwongsa et al. 2014

This is the second of three papers we'll be looking at this week on the theme of verifying the correctness of, and catching bugs in, distributed systems. Yesterday we saw the Statecall Policy Language and its associated toolchain …

RIPQ: Advanced photo caching on flash for Facebook

RIPQ: Advanced Photo Caching on Flash for Facebook - Tang et al. 2015

It's three for the price of one with this paper: we get to deepen our understanding of the characteristics of flash, examine a number of priority queue and caching algorithms, and get a glimpse into what's behind an important part of Facebook's …

Liquid: Unifying nearline and offline big data integration

Liquid: Unifying Nearline and Offline Big Data Integration - Fernandez et al. 2015

This is post 3 of 5 in a series looking at the latest research from the CIDR '15 conference. Also in the series so far this week: 'The missing piece in complex analytics' and 'WANalytics: analytics for a geo-distributed, data intensive world'. …

WANalytics: Analytics for a geo-distributed, data intensive world

WANalytics: analytics for a geo-distributed, data-intensive world - Vulimiri et al. 2015

...data is born distributed; we only control data replication and distributed execution strategies. This is true for so many sources of data. Combine this with Dave McCrory's observation that 'Data has Gravity' (i.e. it attracts applications and other data processing workloads to …