Taming uncertainty in distributed systems with help from the network

Taming uncertainty in distributed systems with help from the network – Leners et al. 2015

Albatross is a membership service with a very interesting new twist: it exploits SDN functionality to actively enforce partitions! Perhaps it is not immediately obvious why that might be a good thing :). It turns out there are several benefits:

  • Albatross can detect failures faster than traditional timeout-based failure detectors.
  • When Albatross reports a process as disconnected, it truly is – since Albatross sets up network rules to prevent any traffic from a disconnected process ever reaching connected working processes.
  • Albatross can therefore enable simpler distributed algorithms: it turns uncertainty about whether a process is still alive (and may later resume interacting with other processes) into certainty that it will not.

Whereas the purpose of SDNs was originally simplifying network management, this paper identifies a different use of SDNs: enhancing classical distributed systems. This connection had not been observed before, and we think that it may be more widely applicable.

For applications using the Albatross membership service, Albatross makes three guarantees:

First, rather than promise perfect information, Albatross provides definitive reports, which guarantee the failure status of a remote process. To provide this guarantee, Albatross sometimes interferes with processes (as noted in the next requirement), which amounts to applying an old technique (STONITH) to a new context (SDN-enabled networks). Second, Albatross provides asymmetric guarantees: it categorizes processes as excluded or non-excluded and promises definitive answers only to non-excluded processes. Third, Albatross allows reports to be delayed in favor of being definitive, but it strives to be quick (sub-second detection time). To our knowledge, this combination is new, and we find that it is strong enough to be useful to applications.

Details of how Albatross interacts with SDN services to achieve these goals are given in the paper. Here I’d like to focus on the benefits to distributed systems of using Albatross. Consider a primary-backup system…

…the application needs a way to make progress if the primary or backup fails. Here is where the membership service enters. A standard choice would be ZooKeeper, which uses leases, and is used by production data center applications.

With Albatross, network failures can be detected an order of magnitude more quickly than with ZooKeeper (if ZooKeeper were to lower timeouts to achieve the same speed, the network would be overwhelmed). Moreover, once a failure (for example, of the primary) is detected, Albatross can install rules to prevent the primary from using the network and then report the problem to the backup. “The backup can then take over immediately — without having to wait for a lease to expire — because it knows that Albatross is preventing the primary from using the network.”
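To make the "fence first, then report" ordering concrete, here is a minimal toy sketch in Python. It is not Albatross's API: `ToyNetwork`, `ToyMembership`, `Backup`, and every method name are hypothetical, and the "network" is just an in-memory set of blocked senders standing in for SDN drop rules.

```python
from typing import Callable, List, Set


class ToyNetwork:
    """Hypothetical stand-in for an SDN-controlled network."""

    def __init__(self) -> None:
        self.blocked: Set[str] = set()

    def install_drop_rule(self, pid: str) -> None:
        # Assumption: one rule per fenced process is enough to cut it off.
        self.blocked.add(pid)

    def send(self, src: str, dst: str, msg: str) -> bool:
        # Returns True only if the message would be delivered.
        return src not in self.blocked


class ToyMembership:
    """Hypothetical membership service that fences before it reports."""

    def __init__(self, network: ToyNetwork) -> None:
        self.network = network
        self.listeners: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self.listeners.append(callback)

    def declare_disconnected(self, pid: str) -> None:
        # Step 1: block the process so it can no longer reach anyone.
        self.network.install_drop_rule(pid)
        # Step 2: only now tell the remaining processes about it.
        for callback in self.listeners:
            callback(pid)


class Backup:
    """Backup that takes over as soon as the primary is reported gone."""

    def __init__(self, primary_id: str) -> None:
        self.primary_id = primary_id
        self.is_primary = False

    def on_disconnected(self, pid: str) -> None:
        # No lease wait: the report implies the old primary is already fenced.
        if pid == self.primary_id:
            self.is_primary = True


network = ToyNetwork()
membership = ToyMembership(network)
backup = Backup(primary_id="p1")
membership.subscribe(backup.on_disconnected)
membership.declare_disconnected("p1")
```

The ordering inside `declare_disconnected` is the point of the sketch: by the time the backup hears about the failure, writes from the old primary can no longer get through, so the backup does not need to wait out a lease.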

Albatross enables distributed algorithms to assume a simpler fault model:

On the one hand, the fact that membership services simplify the design of the distributed applications that use them has long been established: the fail-stop model (which assumes that all processes can detect all crashes correctly) is known to enable “easier” algorithms than the crash model. As just one example, Chain Replication (a form of primary-backup) is simpler than Viewstamped Replication, Paxos-based replication, and Raft. On the other hand, Albatross’s contract (§4), with its asymmetric guarantees, is not precisely the fail-stop model…

To explore this further, the authors compared Aab, an Albatross-based atomic broadcast, with Zab, which uses majority-based agreement.

Under Albatross it is possible to pick a unique leader by choosing the smallest process id amongst the processes that Albatross considers to be connected. This means that atomic broadcast can be implemented using a sequencer-based algorithm.
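As a rough illustration of the idea (not the actual Aab protocol, which the paper specifies and which also has to handle leader changes), here is a small Python sketch. The names `pick_leader`, `Sequencer`, and `Replica` are my own, and the sketch assumes every process sees the same set of connected process ids.

```python
from typing import Dict, Iterable, List, Tuple


def pick_leader(connected: Iterable[int]) -> int:
    """Leader rule from the text: smallest id among connected processes."""
    return min(connected)


class Sequencer:
    """Run by the leader: stamps each message with the next sequence number."""

    def __init__(self) -> None:
        self.next_seq = 0

    def order(self, msg: str) -> Tuple[int, str]:
        stamped = (self.next_seq, msg)
        self.next_seq += 1
        return stamped


class Replica:
    """Delivers messages strictly in sequence-number order."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self.pending: Dict[int, str] = {}
        self.next_expected = 0

    def receive(self, seq: int, msg: str) -> None:
        # Buffer the message, then deliver every consecutive one we hold.
        self.pending[seq] = msg
        while self.next_expected in self.pending:
            self.log.append(self.pending.pop(self.next_expected))
            self.next_expected += 1


leader = pick_leader({7, 3, 5})
sequencer = Sequencer()
first = sequencer.order("a")
second = sequencer.order("b")

replica = Replica()
replica.receive(*second)  # arrives early, so it is buffered
replica.receive(*first)   # fills the gap, so both are delivered in order
```

Because the membership view is definitive, every non-excluded process computes the same `min`, so no voting round is needed to agree on who the sequencer is. What happens to in-flight messages when the leader itself is excluded is left out of this sketch.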

[Compared to Zab], Aab has a smaller description, fewer phases, fewer round-trips, fewer message types, and fewer counters for ordering messages. Moreover, it tolerates the failure of all but one process; Zab, by contrast, tolerates the failure of fewer than half of the processes. (Equivalently, to tolerate f failures, the Albatross-based Aab requires f + 1 processes, whereas Zab requires 2f + 1 processes.) The fundamental source of these differences is that Zab is built on majority-based agreement, which brings complexity, as noted earlier.
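The arithmetic in that quote is easy to check with a few lines of Python. The function names below are mine, chosen only for illustration; the formulas are the f + 1 and 2f + 1 counts from the quote, plus their inverses.

```python
def processes_needed_with_fencing(f: int) -> int:
    """Processes needed to tolerate f failures when all but one may fail."""
    return f + 1


def processes_needed_with_majority(f: int) -> int:
    """Processes needed to tolerate f failures when a majority must survive."""
    return 2 * f + 1


def failures_tolerated_with_fencing(n: int) -> int:
    """With n processes, all but one may fail."""
    return n - 1


def failures_tolerated_with_majority(n: int) -> int:
    """With n processes, strictly fewer than half may fail."""
    return (n - 1) // 2


for f in (1, 2, 3):
    print(f, processes_needed_with_fencing(f), processes_needed_with_majority(f))
```

For example, tolerating two failures takes three processes under the f + 1 rule and five under the 2f + 1 rule. Read the other way, a five-process deployment survives four failures in the first case and two in the second.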

Making Albatross work well across data centers and providing integrated support for virtual machine migration are both future directions.