Implementing Linearizability at Large Scale and Low Latency

Implementing Linearizability at Large Scale and Low Latency - Lee et al. 2015 Yesterday we saw how to layer a strictly serializable transaction model on top of an inconsistent replication protocol. Previously we've also looked at how to bolt a causal consistency model on top of eventual consistency. Today's paper demonstrates how to bolt on (layer) …

Building Consistent Transactions with Inconsistent Replication

Building Consistent Transactions with Inconsistent Replication - Zhang et al. 2015 Is there life beyond 'beyond distributed transactions?' In this paper, Zhang et al. introduce a layered approach to supporting distributed transactions, showing that a Transactional Application Protocol can be built on top of an Inconsistent Replication protocol (TAPIR). This direction is similar in spirit …

IronFleet: Proving Practical Distributed Systems Correct

IronFleet: Proving Practical Distributed Systems Correct - Hawblitzel et al. (Microsoft Research) 2015 Every so often a paper comes along that makes you re-evaluate your world view. I happily would have told you that full formal verification of non-trivial systems (especially distributed systems) in a practical manner (i.e. something you could consider using for real …

Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems

Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems - Mace et al. 2015 Problems in distributed systems are complex, varied, and unpredictable. By default, the information required to diagnose an issue may not be reported by the system or contained in system logs. Current approaches tie logging and statistics mechanisms into the development path of …

App-Bisect: Autonomous healing for microservices-based apps

App-Bisect: Autonomous healing for microservices-based apps - Rajagopalan & Jamjoom 2015 We've become comfortable with the idea of continuous deployment across multiple microservices, but what happens when that deployment introduces a problem? The standard answer comes in two parts: (a) use a canary when rolling out a new version to detect a potential problem before …

lprof: A Non-intrusive Request-Flow Profiler for Distributed Systems

lprof: A Non-intrusive Request-Flow Profiler for Distributed Systems - Zhao et al. 2014 The Mystery Machine needs a request id in log records that can be used to correlate entries in a trace. What if you don't have that? lprof makes the absolute most of whatever logging your system already has. lprof is novel in …

The Mystery Machine: End-to-end performance analysis of large-scale internet services

The Mystery Machine: End-to-end performance analysis of large-scale internet services - Chow et al. 2014 Google's Dapper paper is very well known, but Facebook's Mystery Machine seems to be much less so, and that's a shame because I have a hunch the approach could be very relevant to many people. Current debugging and …

Dapper, A Large Scale Distributed Systems Tracing Infrastructure

Dapper, A Large Scale Distributed Systems Tracing Infrastructure - Sigelman et al. (Google) 2010 I'm going to dedicate the rest of this week to a series of papers addressing the important question of "how the hell do I know what is going on in my distributed system / cloud platform / microservices deployment?" As we'll …

Distributed Information Processing in Biological and Computational Systems

Distributed Information Processing in Biological and Computational Systems - Navlakha & Bar-Joseph 2015 With thanks to Mark Allen for pointing me at today's paper choice via Twitter. This is the last of the posts in the 'nature-inspired' series for a while, and we're moving on from optimisation problems to look at the way we build …