With the Desert Island Papers weeks, I invite a guest to select 5 of their favourite papers and give a short introduction to them (see last month’s edition with Peter Alvaro for a longer explanation of the concept). This week, I’m delighted to have Jonas Bonér introduce his selections. I’ve known Jonas for a long time, going back to the early 2000’s when we were both active in the aspect-oriented programming community. Of course nowadays many of you probably know him best for his work with Akka and the reactive manifesto. If you’re in San Francisco this week, you can catch him speaking at the Scala Days conference this Tuesday.
Jonas, would you like to briefly introduce yourself and your research interests?
I work as CTO at Typesafe. I’ve been involved in Open Source for most of my career and started the Akka project (a runtime for concurrency and distribution) about 6 years ago. I’ve always been interested in distributed systems but the last years it has grown into a deep passion of mine — which naturally goes hand in hand with the my love for reading academic papers and CS books.
And now, can you please tell us a little bit about your selections and what it is that makes them special to you?
First when you asked me I wanted to come up with my list of 5 desert island papers, the papers that came to mind were some of the classics (Backus’ Turing Award Lecture, Lamport’s Time, Clocks and Ordering of Events paper, A Note on Distributed Computing and others). These are all papers that I love and re-read, but after thinking some more about it I had another idea. First, I am very passionate about distributed systems and second, there so much amazing research in distributed systems happening right now (the last 5-10 years). Why not instead dedicate my first 4 papers to contemporary research gems, that points towards the future, and that will most likely have a huge impact in how we will build systems in the (a not that distant) future. I liked this idea better and came up with a new list which will hopefully inspire you as much as it has me. The final one I’ll reserve for an old time favorite.
(Note, all three of the paper’s that Jonas cites above have previously featured in The Morning Paper as they are also among some of my favourites. This was in the pre-blog era of #themorningpaper when I gave short highlights on twitter only. Thus there is no permanent record on this blog for the first 44 papers I covered. Some of these I would love to revisit in future editions to fix that. One of them we get to cover this week!).
I’ll be featuring each of Jonas selections below throughout the week. As always, look out for the notifications on twitter or subscribe to the email list to make sure you don’t miss any!
Consistency Analysis in Bloom: a CALM and Collected Approach
Consistency analysis in Bloom – Alvaro et al. 2011
I’m a big believer in Logic Programming and ever since I took the first Prolog class in school I’ve had a strong belief that declarative programming is the right way to develop systems, and that it would take over the world as soon as the hardware got good enough. I still do. It just has to.
This is one of the reasons I got so excited when Joe Hellerstein published the CALM Conjecture (later to be proved into the CALM Theorem). His (and his team’s) work on marrying logic with distributed systems was genius move and the result is both as thrilling and mind-blowing as it is dead obvious. It feels intuitive when you see it, which is a trademark for most brilliant ideas.
The essence of the idea is that a program that can be expressed in monotonic logic does not need to coordinate it’s changes–it is deterministic (confluent) and therefore eventually consistent. This has huge implications since coordination is the most expensive thing you can do in a distributed system. Avoiding it means that we can eliminate both contention (wait time) and the coherency costs of maintaining the shared view on data.
Now the question is: when can a program be expressed in monotonic logic and how can it be made practical and approachable for the general audience of programmers? Ideally you would have your language tell you exactly when and where you need to coordinate, and guarantee you that the rest of the program can freely exploit all concurrency and parallelism that you make available to it, without you having to worry. Peter Alvaro et. al’s ground-breaking work on Bloom (and Dedalus) gives you the theoretical foundation for such a language and I believe it gives us a glimpse of the future.
Consistency, Availability, and Convergence
Consistency, Availability, and Convergence – Mahajan et al. 2011
In a world of impossibility theorems (FLP, CAP etc.) it is refreshing to read one that shows what is actually possible (even thought it does so through an impossibility result).
We all know that Eventual Consistency is often too weak, and can be unintuitive and hard to work with. On the other hand most problems do not need Linearizability (which the CAP theorem argument is really all about). There is a grey zone between these two extremes, but it is IMO in this grey zone that most of the interesting use-cases and applications reside.
What is so exciting with the CAC (Consistency, Availability, and Convergence) paper is that it wanders out in this grey zone and manages to come back with a roof for the strongest level of consistency achievable in an an always available system; Real-Time Causal Consistency. The reason I’m so excited about this result is that causal consistency is usually what users expect, it is intuitive and provides a level of consistency that is often sufficient in real-world applications.
A related paper (with concurrent research) is the COPS paper (Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS) which defines Causal+ consistency and is also worth reading.
Note: I’ve previously covered ‘Consistency, Availability, and Convergence’ on this blog, but only with a highlight on the main results. On Tuesday this week we’ll take a deeper look, and put it up in a double-header with the COPS paper.
A comprehensive study of Convergent and Commutative Replicated Data Types
A comprehensive study of Convergent and Commutative Replicated Data Types – Shapiro et al. 2011
The Dynamo paper was highly influential on the industry. It introduced and popularized an interesting combination of features–Eventually Consistency, epidemic gossiping, node ring, consistent hashing, read repair, hinted handoff etc.—which kicked off the NOSQL movement. In its footsteps we saw lots of Key-Value stores emerge, databases that chose availability over (strong) consistency (sometimes it felt like we got a new K/V store every day).
This was all good. But after the hype had settled in grew on us that there are a lot of problems that don’t naturally fit the key-value model. We as developers are used to model our data in richer data structures such as maps, sets and graphs, and we are used to be able to compose these to model our world. The question is is there a way to get the best of both worlds? To be able to model our data in rich composable data structures, while retaining the benefits of the eventually consistent model?
Mark Shapiro et. al’s work on CRDTs showed us how, and their work has already been highly influential in the industry with production grade implementations in both Riak and Akka.
Coordination Avoidance in Database Systems
Coordination Avoidance in Database Systems– Bailis et al. 2014
As we have discussed above; coordination of data is one of the most expensive thing you can do in a distributed system (or a local multi-core system for that matter). It introduces contention and coherency costs, and — as Neil Gunther’s Universal Scalability Law shows us — introduces scalability bottlenecks and makes it hard to write resilient answer the question “When does consistency require coordination?”.
Peter Bailis et. al set out to answer this question in the general sense — a bold task if you ask me — and in his most recent paper he came back with a very interesting answer. First they define a property called Invariant Confluence (I-confluence) which is defined as “all local commit decisions must be globally invariant-preserving”. Then they show that if I-confluence holds for a certain commit, then coordination of this commit is not necessary — at all. Quite an astonishing result in its simplicity and power, and feels quite intuitive when you start thinking about it.
However, defining these invariants in the application and making sure that they are maintained over its lifetime, is more easily said than done. Here tools or language support would be helpful. I would be interested in seeing the dynamics between CALM and I-confluence explored in more depth.
Out of the Tar Pit
Out of the tar pit – Moseley & Marks 2006
This paper is one of my all time favorites and one that I try reread every year. It covers a lot of ground but essentially tries to answer the question: why is programming still so hard, and what can we do about it? It starts out by trying to understand what software complexity is all about, its different flavors — essential and accidental complexity — and its roots.
The authors identify two main sources of complexity — mutable state and control — and proposes a better way of managing it. First, to move the mutable state into relations which are then managed through declarative logic. Second, to design systems in three different layers consisting of; a completely stand-alone core of Essential State, managed through a layer of Essential Logic (the business logic), shielded from the rest of the system – the so called Accidental State and Control layer.
I think this paper is becoming more and more relevant every year and a lot of it is applicable today; for example through the increased use of Functional and Dataflow Programming for managing state and state dependencies, to using Error Kernel pattern when programming with Actors as a way of shielding the Essential State and Logic from the more risky Accidental State and Control layers below. We are getting there, step by step.
Runner Ups
Here are some other recent distributed systems favorites that made it to the final round but that I had to leave at home (5 is too few :-)):
Highly Available Transactions: Virtues and Limitations – Bailis et al. 2014.
(See the write-up on The Morning Paper).
Distributed transactions strikes back! This paper got me to reconsider my pretty strong opinions on the usefulness of distributed transactions. A great reality check. Luckily it has been covered on this blog before.
Consistency without Borders – Alvaro et al. 2013
I really like this paper. It tells a very nice story about what consistency is really all about, what makes it hard, what is wrong with the way we approach it in today’s systems, and what we can do about it–at different levels in our programming model. It leaves more questions than answers, which I like. Food for thought.
Derflow: Distributed Deterministic Dataflow Programming for Erlang – Bravo et al. 2014
(See the write-up on The Morning Paper).
This research builds upon a long time love of mine: Oz’s deterministic dataflow concurrency. It takes it distributed and then mixes in some CRDTs. Points towards the future.
Life Beyond Distributed Transactions – Helland 2007
(See the write-up on The Morning Paper).
This is one of my top favorites and HUGE influence to me and to the work we are doing in Akka.