Extensible Distributed Coordination

Extensible Distributed Coordination – Distler et al. 2015

Coordination services such as ZooKeeper offer a deliberately limited API. As a consequence, more complex coordination tasks have to be implemented as multiple RPCs. In Extensible Distributed Coordination, Distler et al. describe a sandboxed extension mechanism for coordination services that allows execution of client logic in the coordination service itself. I’m reminded of stored procedures and triggers from the database world. The concept is implemented on top of ZooKeeper and DepSpace, where it is shown to improve throughput and reduce latencies for four common recipes taken from Apache Curator.

I hope we can all agree that it’s better to reuse an existing battle-hardened coordination service implementation than write your own from scratch. Such coordination services offer a relatively low-level and constrained API (for good reasons).

One important limitation of these systems is that their coordination kernels suit some coordination tasks better than others. For example, implementing mutual exclusion with Chubby is trivial since it already provides lock objects. In contrast, the best practice for acquiring a lock in ZooKeeper is to create a sequential node and to set a watch to the adjacent node with a smaller sequence number. This corresponds to three RPCs to the service while Chubby requires just one. On the other hand, …
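To make the client-side work in the ZooKeeper lock recipe concrete, here is a minimal sketch of the step between creating the sequential node and setting the watch: deciding which node to watch. It does not talk to ZooKeeper at all; the class name, method names, and the `lock-` prefix are illustrative assumptions. The only ZooKeeper behaviour relied on is that sequential nodes get a zero-padded numeric suffix.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of ZooKeeper or Curator.
final class LockRecipeSketch {

    /** Parses the numeric suffix of a sequential node name, e.g. "lock-0000000007" -> 7. */
    static long sequenceOf(String nodeName) {
        return Long.parseLong(nodeName.substring(nodeName.lastIndexOf('-') + 1));
    }

    /**
     * Returns the node with the next-smaller sequence number, which is the one
     * to watch, or null if myNode has the smallest number and so holds the lock.
     */
    static String nodeToWatch(List<String> children, String myNode) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingLong(LockRecipeSketch::sequenceOf));
        int index = sorted.indexOf(myNode);
        if (index < 0) {
            throw new IllegalArgumentException("own node not among children: " + myNode);
        }
        return index == 0 ? null : sorted.get(index - 1);
    }
}
```

Watching only the immediate predecessor, rather than the lock's parent node, means a release wakes up one waiting client instead of all of them.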

Bloating coordination service APIs is not the answer: “A good coordination kernel is small, elegant, expressive, and stable, providing underpinnings for normal programmers to create coordination libraries and custom coordination tasks.”

Notice that some systems provide extensive libraries of coordination recipes, which are implemented based on their respective coordination kernel (e.g., Apache Curator for ZooKeeper). Despite the ease of use of such libraries, the performance of the coordination tasks is still constrained by the underlying coordination kernel.

So how can you get the performance of a good coordination service kernel, while at the same time providing efficient support for higher-level coordination patterns (recipes)?

We advocate the use of extensions at the server side for making (fixed-kernel) coordination services as efficient as possible for any coordination task.

An operation extension is like a stored procedure – it can be invoked remotely by the client, executes some number of operations locally, and then returns. An event extension is like a trigger: you provide a handler to be invoked when certain events occur in the underlying coordination service. Extensions themselves are managed by an Extension Manager using the facilities of the coordination service itself so that they have minimal impact. Extensions are also registered on a per-client basis:

…to register an extension, a client issues a standard create operation for the extension manager’s data object, passing the extension code as well as additional relevant information (e.g., the extension name) as data. When the extension manager intercepts the call, it verifies, compiles, and instantiates the extension and retrieves the extension’s operation and event subscriptions. Furthermore, the extension manager adds a new data object (e.g., /em/ext for an extension named ext) to the coordination service state, which from then on acts as a surrogate of the extension.

The extension code is restricted to a white-listed set of operations for security and performance reasons. Any coordination service calls performed by the extensions are routed via a proxy so that the extension manager can implement client-based access control.
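The registration flow can be pictured with a toy, in-memory model. This is a rough sketch under loud assumptions, not the paper's implementation: every name here (`ExtensionManagerSketch`, `register`, `WHITELIST`, and so on) is hypothetical, "verification" is reduced to checking a declared set of operation names against a whitelist, and the extension receives the raw state map rather than going through an access-controlling proxy.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Toy model only; all names and the verification rule are assumptions.
final class ExtensionManagerSketch {
    static final Set<String> WHITELIST = Set.of("read", "update", "cas");

    /** Stand-in for extension code plus the metadata passed along with it. */
    static final class Extension {
        final Set<String> operationsUsed;   // what the toy verification inspects
        final String subscribedPath;        // the operation subscription
        final Function<Map<String, Integer>, Object> onRead;

        Extension(Set<String> operationsUsed, String subscribedPath,
                  Function<Map<String, Integer>, Object> onRead) {
            this.operationsUsed = operationsUsed;
            this.subscribedPath = subscribedPath;
            this.onRead = onRead;
        }
    }

    private final Map<String, Integer> state = new HashMap<>();
    private final Map<String, Extension> subscriptions = new HashMap<>();

    /** Models the intercepted create call on the extension manager's data object. */
    boolean register(String name, Extension ext) {
        if (!WHITELIST.containsAll(ext.operationsUsed)) {
            return false;                         // verification failed, nothing is installed
        }
        state.put("/em/" + name, 0);              // surrogate data object for the extension
        subscriptions.put(ext.subscribedPath, ext);
        return true;
    }

    boolean exists(String path) {
        return state.containsKey(path);
    }

    /** Reads are routed to a subscribed extension if there is one. */
    Object read(String path) {
        Extension ext = subscriptions.get(path);
        return ext != null ? ext.onRead.apply(state) : state.get(path);
    }

    /** A counter extension: increments "/ctr" and returns the new value. */
    static Extension counterExtension() {
        return new Extension(Set.of("read", "update"), "/ctr-increment",
                local -> local.merge("/ctr", 1, Integer::sum));
    }
}
```

After `register("ctr", ExtensionManagerSketch.counterExtension())` succeeds, `exists("/em/ctr")` is true and successive reads of `/ctr-increment` return increasing values, while an extension declaring a non-whitelisted operation is rejected and leaves no surrogate behind.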

An example helps to make everything clearer. Consider the shared counter Apache Curator recipe. The traditional client implementation looks like this:

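int increment() {
  while (true) {
    int c = remote.read("/ctr");                     // RPC 1: fetch the current value
    boolean success = remote.cas("/ctr", c, c + 1);  // RPC 2: update only if unchanged
    if (success) return c + 1;                       // otherwise another client won; retry
  }
}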

With a shared counter extension, the client logic can become simply:

int increment() {
  return remote.read("/ctr-increment");
}

and the server-side extension logic is:

Object read(ObjectId oid) {
  int c = local.read("/ctr");     // local call, no network round trip
  local.update("/ctr", c + 1);    // executes atomically with the read above
  return c + 1;
}

Note – extensions allow multiple operations to execute atomically, eliminating the need for the cas retry loop.
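The effect of contention on the two approaches can be illustrated with a small in-process simulation. This is only a sketch under stated assumptions: an `AtomicInteger` stands in for the `/ctr` object, threads stand in for clients, a `synchronized` method stands in for atomic execution at the server, and every method call on the service is counted as one "remote call". None of the names come from the paper, ZooKeeper, or Curator.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// In-process stand-in for a coordination service; all names are hypothetical.
final class CounterContentionSketch {
    private final AtomicInteger counter = new AtomicInteger();  // plays the role of "/ctr"
    private final AtomicLong remoteCalls = new AtomicLong();    // simulated RPC count

    private int read() {
        remoteCalls.incrementAndGet();
        return counter.get();
    }

    private boolean cas(int expected, int updated) {
        remoteCalls.incrementAndGet();
        return counter.compareAndSet(expected, updated);
    }

    /** Client-side recipe: read, then compare-and-swap, retrying on conflict. */
    int incrementWithCas() {
        while (true) {
            int c = read();
            if (cas(c, c + 1)) return c + 1;
        }
    }

    /** Extension-style increment: one call, executed atomically at the "server". */
    synchronized int incrementWithExtension() {
        remoteCalls.incrementAndGet();
        int c = counter.get();
        counter.set(c + 1);
        return c + 1;
    }

    /** Runs the chosen variant and returns { final counter value, simulated remote calls }. */
    static long[] run(boolean useExtension, int clients, int incrementsPerClient) {
        CounterContentionSketch service = new CounterContentionSketch();
        Thread[] workers = new Thread[clients];
        for (int i = 0; i < clients; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < incrementsPerClient; n++) {
                    if (useExtension) service.incrementWithExtension();
                    else service.incrementWithCas();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new long[] { service.counter.get(), service.remoteCalls.get() };
    }
}
```

With the extension variant the call count always equals the number of increments. With the CAS variant it is at least twice that, and every failed `cas` under contention adds two more calls. The simulation says nothing about real network latency; it only shows where the extra calls come from.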

We evaluate the usefulness and efficiency of extensions based on four recipes included in the Apache Curator library that provide support for a shared counter, a distributed queue, a distributed barrier, and leader election, respectively… For all four coordination recipes evaluated, EZK and EDS achieve better performance than their respective counterparts, ZooKeeper and DepSpace. Analyzing the reasons for these results, we are able to divide the recipes into two different groups. On the one hand, the shared counter and the distributed queue mainly benefit from the fact that an extension allows clients to execute multiple operations atomically, thereby eliminating the need to retry operations in the face of contention. On the other hand, the recipes for the distributed barrier and leader election, which are both used by clients to wait for a specific event, take advantage of not requiring additional remote calls after the event has actually occurred.

The results generally show an order-of-magnitude improvement in throughput, together with reduced latency. Performance tests also show that the extension mechanism adds little overhead, but they used only 30 clients, which strikes me as a very modest number.

I admit to not being fully won over to the idea of executing per-client extensions in the heart of the coordination service, but if this direction can encourage greater reuse and less reinventing of the wheel, then that is certainly a good thing…

Current applications mostly use coordination services outside their critical processing path to avoid the costs of accessing them. The performance gains obtained by using extensions may enable use cases that so far have not been considered practical.