Cloud Types for Eventual Consistency – Burckhardt et al. 2012
Providing good programming abstractions for cloud storage, synchronization, and disconnected operation appears crucial to accelerate the production of useful and novel applications on today’s and tomorrow’s mobile devices.
This paper proposes a model based on cloud types (which may be integrated with a programming language). Programmers focus on declaring the data structures and the client code that accesses them:
Programmers declare the data they wish to share using special cloud types. This data is automatically shared between all devices, and is automatically persisted both on local storage as well as in cloud storage. Our cloud types include both simple types (cloud integers, cloud strings) and structured types (cloud arrays, cloud entities). Because cloud types are carefully chosen to behave predictably under concurrent modification, conflict resolution is automatic and the developer need not write special code to handle merging.
The system then uses revision diagrams to guarantee eventual consistency for cloud types:
Conceptually, the cloud stores the main revision, while devices maintain local revisions that are periodically synchronized. Revision diagrams are reminiscent of source control systems and provide an excellent intuition for reasoning about multiple versions and eventual consistency.
Cloud types are similar to CRDTs in that they separate the use of eventually consistent data types from their implementation. In comparing Cloud Types to CRDTs, the authors have the following to say:
CRDTs can serve as cloud types (as exemplified by the observed-remove set proposed in [13]). However, we are not aware of prior work on how to compose individual CRDTs into a larger schema. Furthermore, CRDTs only support commutative operations, whereas our approach supports non-commutative operations while still achieving eventual consistency. Furthermore, we support stronger synchronization primitives like flush when necessary, in the same framework.
Cloud types are implemented as fork-join automation (FJ-QUA) as we studied yesterday. For example, this is how the cloud type FJ-QUA for the CInt type behaves:
For cloud integers, we support operations get and set to read and write the current value, as well as add. In the state, we store three values: a boolean indicating whether the current revision performed any set operations, a base value, and an offset. On fork, the boolean is reset, the base value is set to the current value, and the offset is set to zero. Add operations change only the offset, while set operations set the boolean to true, set the base value, and reset the offset. On join, we assume the value of the joined revision (if it performed a set) or add its offset (otherwise).
And for CString:
For cloud strings, we support operations get and set to read and write the current value, and a conditonal operation setIfEmpty. In the state, we record the current value and whether it has not been written (⊥), has been written (wr), or has been conditionally written (cond). A conditional write succeeds only if the current value is empty, and this test is repeated on merge.
The cloud type language also supports arrays, sets and entities. It turns out that the full state of a cloud type program can itself be modelled as a FJ QUA :
It is remarkable that the complete state FJA operations are commutative by themselves. The only non commutative operations are in the FJAs implementing cloud types. This property makes using arrays and entities very natural and does not introduce unexpected conflict resolutions. Furthermore, our design was careful to enable a completely modular implementation of the complete state FJA with respect to the cloud type implementations. In part, this structure makes a single parameterized, reusable implementation of cloud storage and synchronization possible. Any schema and any extensions of cloud types can be supported without further changes.
The client programming language includes yield to demarcate transactions (as per the Eventually Consistent Transactions paper), and an additional flush operation for when stronger sychronization is required.
Eventual consistency is not always sufficient. Some problems, such as reserving a seat on an airplane, or withdrawing money from a bank account, involve a limited resource and require true arbitration. In such cases, we must establish a server connection and wait for a response… Upon flush, execution blocks until (1) all local updates have been applied to the main revision, and (2) the result has become visible to the local revision.
The full cloud type language syntax and semantics are given in the paper. There is also a definition of an operational whole-system model with multiple clients and an elastic server pool. This follows the general blueprint we saw yesterday. A complication concerns ensuring that the join rule / condition required for eventually consistent transactions with revision diagrams is obeyed (‘joiner must be downstream from the fork that forked the joinee). This is solved by introducing round numbers.
To ensure this condition, we assign round numbers to servers and clients, and use round maps (a form of vector clocks) to determine eligibility (by determining which forks are in the visible history)… All
clients and servers start with round 0, except the main revision that starts (and forever remains) in round 1.
After each fork, the round number of the client or server is incremented.
The authors conclude:
Providing good programming abstractions for cloud storage, synchronization, and disconnected operation appears crucial to accelerate the production of useful and novel applications on today’s and tomorrow’s mobile devices. In this paper, we provided a sound foundation upon which to build such programming abstractions through the use of automatically synchronized cloud data types that can be composed into a larger data schema using indexed arrays and entities. The design we presented allows implementing all the difficult parts of such a system (the cloud service, the local persistence, the caching, the conflict resolution, and the synchronization) once and for all, while guaranteeing eventual consistency.