The Potential Dangers of Causal Consistency and an Explicit Solution – Bailis et al. 2012
Yesterday we saw how we could get both better performance and stronger consistency by upgrading from eventual consistency to causal consistency. Are there any downsides?
With useful semantics, low latency, partition tolerance, and, recently, a demonstrably efficient architecture, causal consistency appears an ideal model for the future of wide-area distributed data stores. However, implementations of causal consistency face serious scalability challenges. In this paper, we identify a critical trade-off between write throughput and visibility latency, or the amount of time that each write is hidden from readers due to missing dependencies.
Visibility latency is the delay between the time a system receives a write and the time it can apply it; that delay is a function of how long the write’s dependencies take to arrive. In this paper the focus is on global causal+ consistency – global in the sense of needing to coordinate across datacenters, and the ‘+’ indicating that we also care about convergence.
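To make that concrete, here is a minimal sketch of the apply path at a receiving datacenter, under the rule that a replicated write stays hidden until everything in its dependency set is locally visible. All of the names are mine, purely for illustration – they don’t come from any particular system in the paper.

```python
class ReplicaApplier:
    """Buffers writes replicated from other datacenters until all of their
    dependencies are locally visible, then applies them. In this sketch,
    visibility latency is the time a write spends sitting in `pending`."""

    def __init__(self):
        self.visible = set()   # version ids already applied (readable) locally
        self.pending = []      # writes whose dependencies have not all arrived

    def receive(self, version_id, key, value, deps):
        """Called when a write replicated from a remote datacenter arrives."""
        self.pending.append((version_id, key, value, frozenset(deps)))
        self._drain()

    def _drain(self):
        # Sweep until nothing more can be applied: applying one write may
        # unblock others that listed it as a dependency.
        progress = True
        while progress:
            progress = False
            still_pending = []
            for version_id, key, value, deps in self.pending:
                if deps <= self.visible:          # every dependency visible?
                    self.visible.add(version_id)  # apply: make it readable
                    progress = True
                else:
                    still_pending.append((version_id, key, value, deps))
            self.pending = still_pending
```

In this sketch the `deps <= self.visible` test is the per-write dependency checking whose cost bounds a datacenter’s local apply-capacity, which is where the trade-off below comes from.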
Intuitively, visibility latency and throughput are competing goals. If the aggregate (global) throughput limit across datacenters is exceeded and new versions are generated faster than they can be applied, visibility latency increases indefinitely due to the formation of unstable queues. The effects of this trade-off are magnified by the fact that convergent causal consistency effectively requires that all datacenters locally apply all writes: the global write throughput is limited to the minimum apply-capacity across datacenters.
Weaker consistency models are often amenable to partial replication; this is much harder with causal+ consistency.
Convergent causal consistency requires all-to-all replication that limits global write throughput. Assume we have two datacenters, each of which has an apply-capacity of A. To prevent unstable queuing, whereby writes are generated at a rate that is faster than they are applied remotely, the aggregate new write throughput must be limited to A; if we allocate new throughput equally between datacenters, each can locally generate new writes at a rate of A/2. Adding a third, equally powerful datacenter does not improve the situation: the aggregate global write throughput is still A, and each datacenter can now generate writes at a rate of A/3. With N datacenters, each can write at a rate A/N. With heterogeneously powerful datacenters, the sustainable aggregate write rate is limited to the local apply-capacity of the weakest datacenter.
That doesn’t sound so good! Especially when you consider that datacenters are often added to deal with limited capacity of existing datacenters – yet with convergent causality, the addition of another datacenter requires upgrading all existing datacenters’ capacity. Network partitions (and datacenter failures) bring the minimum local apply capacity to zero.
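Restating the arithmetic in symbols (the notation is mine, not the paper’s):

```latex
% All-to-all replication: every datacenter must locally apply every write,
% so the slowest datacenter caps the sustainable aggregate write rate.
W_{\mathrm{agg}} \;\le\; \min_{i \in \{1,\dots,N\}} A_i

% With N homogeneous datacenters of apply-capacity A, split evenly,
% each datacenter may generate new writes at only
W_{\mathrm{local}} \;=\; \frac{A}{N}
```

Adding a datacenter therefore shrinks every datacenter’s write budget rather than growing the system’s capacity.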
We get into this mess primarily because the dominant causal consistency systems that have been studied use potential causality: we don’t know anything about the application logic, so we just have to assume that anything you’ve seen is a potential cause.
The causality graph fanout (degree) determines the number of dependency checks required, while its depth and connectivity determine the degree of concurrency in performing checks. In practical settings, potential causality graphs are on the order of hundreds of millions of writes. Classic causal consistency tracks each data item’s potential dependencies. On a social network, if a user posts a status update after viewing 10 other updates, the new update potentially depends on all of them. This leads to a graph vertex degree (or fanout) of 10. At the level of a storage system, given that many modern applications are read-dominated, it is likely that this number is much higher—in the hundreds or thousands of dependencies. For many modern web services, it is rare—or impossible—to post an update without viewing tens or hundreds of data items upon which the update will potentially depend. Each update’s dependencies’ dependencies are included in the graph, along with those updates’ dependencies’ dependencies’ dependencies—ad infinitum—until a (set of) terminal source events is reached.
These large graphs limit local apply-capacities and hence maximum global throughput.
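To see where these graphs come from, here is a minimal sketch of client-side potential-causality tracking against a toy single-node store – every version the client has read (or written) becomes a dependency of its next write. All of the names are mine, purely for illustration:

```python
import itertools


class InMemoryStore:
    """Toy single-node store, just enough to exercise the client sketch."""

    def __init__(self):
        self._data = {}                  # key -> (version_id, value)
        self._ids = itertools.count(1)

    def read(self, key):
        return self._data[key]

    def write(self, key, value, deps):
        version_id = next(self._ids)     # a real store would record `deps` too
        self._data[key] = (version_id, value)
        return version_id


class PotentialCausalityClient:
    """Potential causality: everything this client has observed is treated
    as a potential cause of its next write."""

    def __init__(self, store):
        self.store = store
        self.context = set()             # ids of every version seen so far

    def get(self, key):
        version_id, value = self.store.read(key)
        self.context.add(version_id)     # anything read is a potential cause
        return value

    def put(self, key, value):
        # The new write carries the client's entire read/write history as its
        # dependency set; transitively, those versions' own dependencies are
        # part of the causality graph too.
        version_id = self.store.write(key, value, deps=frozenset(self.context))
        self.context.add(version_id)
        return version_id


if __name__ == "__main__":
    store = InMemoryStore()
    for i in range(10):                  # seed ten status updates
        store.write("status:%d" % i, "update %d" % i, deps=frozenset())

    alice = PotentialCausalityClient(store)
    for i in range(10):
        alice.get("status:%d" % i)       # Alice views ten updates...
    alice.put("status:alice", "reply")   # ...so her post has fanout 10
```

Here the fanout is the 10 of the paper’s social-network example; at the storage layer, where reads dominate, the set would typically be far larger.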
As we saw yesterday, a promising solution to this is explicit causality:
Instead of tracking all potential dependencies, why not track only those that matter?… To address these concerns, we consider explicit causality, or application-specified causal dependencies. Instead of including the entire set of possible influences in the happens-before relation, we defer to the application to tell the data store which dependencies matter. Each new write is accompanied by a set of dependencies that determine its happens-before relation. The causally consistent data store still enforces transitivity of the provided happens-before relation but does not enforce program order or reads-from relationships unless explicitly instructed to do so by the application. This explicit causality does not solve the problem of all-to-all replication, but it has important implications for the trade-off between throughput and visibility latency.
A study of explicit causality in Twitter and Facebook showed that transitive explicit causality relationships across operations are small: chain lengths are often in the tens of events, and at most a few thousand.
For major services like Twitter, the potential causality chains for even a year of operation are approximately nine orders of magnitude larger than a pessimistic explicit causality chain (e.g. 340M*365 vs. 100).
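Spelling that comparison out (using the example figures above, 340M writes per day over a year versus an explicit chain of roughly 100 events):

```latex
\underbrace{340\,\mathrm{M} \times 365 \;\approx\; 1.2 \times 10^{11}}_{\text{potential causality, one year of writes}}
\quad \text{vs.} \quad
\underbrace{\approx 10^{2}}_{\text{pessimistic explicit chain}}
\;\;\Longrightarrow\;\; \text{a ratio of roughly } 10^{9}
```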
- These much smaller chains mean much faster dependency checking, increasing capacity and lowering visibility latency. (They also increase the probability that the datacenter already has all of the dependencies – a point not made directly in the paper).
- The decreased fanout of each write shrinks metadata overheads – far below the requirements of potential causality. “While potential causality overhead may be reduced during normal operation due to garbage collection, we no longer need to rely on this mechanism to keep metadata small.”
- The small causality graphs give us much more independence between sets of writes. “Instead of having a potential causality graph with a high degree of connectivity across all updates (likely completely connected), explicit causality results in several smaller, disjoint graphs.” Writes to each independent graph can be applied in parallel (see the sketch after this list).
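As a sketch of how a datacenter might exploit that independence, here is one way to split buffered writes into their disjoint dependency graphs (connected components, via a union-find) and hand each graph to its own worker. All names are mine; this illustrates the idea only, not anything the paper specifies.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def split_into_disjoint_graphs(writes):
    """Group (version_id, deps) pairs into disjoint causality graphs.

    Two writes end up in the same group iff they are connected, directly or
    transitively, by dependency edges."""
    parent = {}
    for version_id, deps in writes:
        for node in {version_id, *deps}:
            parent.setdefault(node, node)
    for version_id, deps in writes:
        for dep in deps:
            parent[find(parent, version_id)] = find(parent, dep)   # union
    groups = defaultdict(list)
    for version_id, deps in writes:
        groups[find(parent, version_id)].append((version_id, deps))
    return list(groups.values())


def apply_graph(graph):
    """Placeholder: apply one independent graph in dependency order."""
    return len(graph)


if __name__ == "__main__":
    # Three writes forming two disjoint graphs: {1 <- 2} and {3}.
    writes = [(1, frozenset()), (2, frozenset({1})), (3, frozenset())]
    graphs = split_into_disjoint_graphs(writes)
    with ThreadPoolExecutor() as pool:              # one worker per graph
        print(list(pool.map(apply_graph, graphs)))  # -> [2, 1]
```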
This is why bolt-on causal consistency uses explicit causality.
Under explicit causality, each application defines its own happens-before relationships. This means that each write to the data store must be accompanied by a (potentially empty) set of dependencies supplied by the programmer… For example, recent work frequently refers to an example of updating one’s privacy settings and subsequently posting a sensitive update on a social network. If the privacy setting replicates slower than the new update, the new update might be shown to an unintended audience. Under explicit causality, the application would ensure that each user’s most recent privacy policy happens-before each of their writes.
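Concretely, the client side of that example might look something like the following, reusing the hypothetical `write(key, value, deps)` interface from the earlier sketches (again, names of my own invention):

```python
def update_privacy_then_post(store, user, new_policy, status_text):
    """The canonical privacy example under explicit causality: the status
    update names the latest privacy policy as its one dependency, so no
    datacenter will make the status visible before the policy is."""
    policy_version = store.write(
        key="privacy:%s" % user,
        value=new_policy,
        deps=frozenset(),                  # nothing the app cares about precedes it
    )
    store.write(
        key="status:%s" % user,
        value=status_text,
        deps=frozenset({policy_version}),  # policy happens-before the post
    )
```

Compared with the potential-causality client above, the dependency set here has exactly one element rather than the client’s entire read history.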
Convergent explicit causality still faces the peak throughput problems of all-to-all replication, though the minimum datacenter throughput is likely increased due to the smaller chains.
The voluntary API is best utilized at system layers in which causality is easy to understand and inexpensive to capture. These layers are likely at a higher level than a storage or communication library; in our discussion, we have focused on application-level causality.
The authors conclude:
If convergent causal consistency is to be the preferred model for future weakly consistent distributed data stores, these dangers must be studied in greater detail. Explicit causality lowers the price of causal consistency, but whether even this decreased cost is offset by the benefits that causality provides over weaker forms of convergent consistency is unknown. A serious positive recommendation requires further consideration of application semantics, realistic workloads, and both expected and worst-case operating conditions.