High-Performance ACID via Modular Concurrency Control

High-Performance ACID via Modular Concurrency Control - Xie et al. 2015 In yesterday's paper on Existential Consistency at Facebook, the authors postulated that a future direction might be to use different consistency mechanisms for different parts of a system. 'High-Performance ACID via Modular Concurrency Control' applies a similar idea within the confines of an …

Existential Consistency: Measuring and Understanding Consistency at Facebook

Existential Consistency: Measuring and Understanding Consistency at Facebook - Lu et al. 2015 At the core of this paper is an analysis of the number of anomalies seen in Facebook's production system for clients of TAO, which is impressively low under normal operation (0.0004%) - to interpret that number of course, we'll have to dig …

IronFleet: Proving Practical Distributed Systems Correct

IronFleet: Proving Practical Distributed Systems Correct - Hawblitzel et al. (Microsoft Research) 2015 Every so often a paper comes along that makes you re-evaluate your world view. I would happily have told you that full formal verification of non-trivial systems (especially distributed systems) in a practical manner (i.e. something you could consider using for real …

App-Bisect: Autonomous healing for microservices-based apps

App-Bisect: Autonomous healing for microservices-based apps - Rajagopalan & Jamjoom 2015 We've become comfortable with the idea of continuous deployment across multiple microservices, but what happens when that deployment introduces a problem? The standard answer comes in two parts: (a) use a canary when rolling out a new version to detect a potential problem before …

lprof: A Non-intrusive Request-Flow Profiler for Distributed Systems

lprof: A Non-intrusive Request-Flow Profiler for Distributed Systems - Zhao et al. 2014 The Mystery Machine needs a request id in log records that can be used to correlate entries in a trace. What if you don't have that? lprof makes the absolute most of whatever logging your system already has. lprof is novel in …

The Mystery Machine: End-to-end performance analysis of large-scale internet services

The Mystery Machine: End-to-end performance analysis of large-scale internet services - Chow et al. 2014 Google's Dapper paper is very well known, but Facebook's Mystery Machine seems to be much less well known - and that's a shame because I have a hunch the approach could be very relevant to many people. Current debugging and …

Dapper, A Large Scale Distributed Systems Tracing Infrastructure

Dapper, A Large Scale Distributed Systems Tracing Infrastructure - Sigelman et al. (Google) 2010 I'm going to dedicate the rest of this week to a series of papers addressing the important question of "how the hell do I know what is going on in my distributed system / cloud platform / microservices deployment?" As we'll …

Out of the Fire Swamp – Part III, Go with the flow.

Go with the flow At the conclusion of Part II we introduced the notion of a (micro)service owning exclusive access to a set of data in order to manage application invariants. Once we start to break things down this way, we need to start thinking about the flow of data between microservices. A better paradigm? …

Out of the Fire Swamp – Part II, Peering into the mist

Peering into the mist In Part I we examined the data crisis, accepted that anomalies are inevitable, and realized the central importance of the application. But what should we do about it? Here I'm peering into the mist and speculating about a way forward, navigating via the signposts that the database research community has put …

Out of the Fire Swamp* – Part I, ‘The Data Crisis’

(*) with apologies to Moseley, Marks, and Westley. Something a little different to the regular paper reviews for the next three days. Inspired by yesterday's 'Consistency without Borders,' and somewhat dismayed by what we learned in 'Feral Concurrency Control', I'm going to attempt to pull together a bigger picture, to the extent that I can …