We Have a DREAM: Distributed Reactive Programming with Consistency Guarantees

We Have a DREAM: Distributed Reactive Programming with Consistency Guarantees – Magara & Salvaneschi 2014

Earlier this week we saw in “A Survey on Reactive Programming” that (at least as of 2012) distributing reactive programs remained an active research challenge. Today’s paper choice takes on that challenge and examines some consistency models for distributed reactive applications. Since we’ve been looking at the reactive paradigm all week, I’ll skip some of the intro and background and get straight to the point…

Many reactive applications are intrinsically distributed, and the benefits of reactive programming in distributed settings has been recognized in the literature, yet most existing solutions for reactive programming do not support distribution. Furthermore, the problem of defining suitable properties and consistency guarantees for the propagation of changes (e.g., to avoid glitches) has received little or no attention in distributed implementations.

With DREAM, Magara & Salvaneschi define three propagation (consistency) semantics: causal semantics, glitch-free semantics, and atomic semantics. The discussion is motivated by a sample reactive financial application. In the source code that follows, = represents regular assignment, and := represents the binding of a variable to a reactive expression.

    // binding local vars to external signals
    var marketIndex = InputModule.getMarketIndex()
    var stockOpts = InputModule.getStockOpts()
    var news  = InputModule.getNews()

    // forecasts according to different models
    var f1 := Model1.compute(marketIndex, stockOpts)
    var f2 := Model2. compute(marketIndex,stockOpts)
    var f3 := Model3.compute(marketIndex,news)

    // using the forecasts
    var gui := Display.show(f1,f2,f3)
    
    var financialAlert := ((f1+f2+f3)/3) < MAX
    if (financialAlert) decrease(stockOpts)
    
    var financialAlert_n := computeAlert_n(f1,f2,f3)
    if (financialAlert_n) adjust_n(stockOpts)

The ‘gui’, ‘financialAlert’ and ‘financialAlert_n’ reactive expressions have differing tolerances to inconsistencies. Suppose there is a new event on marketIndex – f1,f2, and f3 all need to be recomputed. If f1 completes first, and the gui is recomputed before f2 and f3 have been recomputed then the user may experience a temporary glitch in the values that are display – but the application is able to tolerate this and as soon as f2 and f3 are recomputed gui will be recomputed once more to reflect the current value of marketIndex across the board.

Now consider the same scenario but with financialAlert being recomputed immediately after f1 and before f2 and f3 have finished updating – now it is possible for financialAlert to hold a wrong value. Consider a state in which the current values of f1, f2, and f3 are 110, 95, and 99 respectively. Let MAX = 100. A new marketIndex value is produced which will lead to f1, f2, and f3 being recomputed as 90, 110, and 103 respectively. If ‘financialAlert’ is recomputed after f1 has updated, but before f2 and f3 have updated we will trigger the alert. Since this results in placing a sell order, this is something the application cannot tolerate. To avoid such glitches, it would be necessary to wait for all three dependencies to update – if different variables are stored at different nodes this introduces a coordination overhead in order to avoid glitches.

In financialAlert_n the assumption is that the decision making process is split among n components, each making autonomous decisions on how to adjust the stockOpts variable…

The actions of the different components are complementary: however, the system provides the expected behavior only under the assumption that all the components see the changes to f1, f2, and f3 in the same order. For example, if different components see different interleaving of marketIndex and news updates, they may disagree on the value of f3 over time and take contradictory measures to adjust the stockOpts variable. This variant shows that in some cases glitch freedomis not enough to guarantee the correct behavior of the application and exemplifies the need for a system-wide agreement on the order of updates.

The DREAM system model comprises n components that can exchange messages using communication channels. Each component is responsible for a set of variables, which it can directly read and write. Variables are divided into regular (imperative) variables, reactive variables, and observable variables. A reactive variable is defined via some expression e, and is updated whenever one of the variables that appear in e are updated. Observable variables are sources that can be used to build reactive expressions (for example, marketIndex in our sample code above).

We can arrange reactive and observable variables in a dependency graph where each variable is a node in the graph, and a directed edge between two variables v1 and v2 indicates that v2 appears in the reactive expression that defines v1.

The occurence and propagation of changes is model using read events, write events, and update events (for reactive variables).

The update of reactive variables takes place orthogonally
with respect to the execution flow of the program in each component. Accordingly, it is critical for the programmer to understand how the update process takes place, and in particular which properties and guarantees it provides, e.g., with respect to the order in which updates are propagated. We collectively refer to them as consistency guarantees.

Some interesting guarantees to consider for a reactive system include:

  • Exactly once delivery of updates
  • FIFO ordering of updates on a component basis: changes to a value of a variable in component c are propagated in the order in which they occur in c.
  • Causal ordering – events that are potentially causally related are seen by every component of the system in the same order (defined using the normal happens-before relationship).

We say that a system provides causal consistency guarantee if and only if, for all events e1, e2 such that e1→e2, all the components of the DRP system see e1 before e2. Notice hat the happened before is a partial order relation: events that are not causally related can be seen in any order by different components.

  • Glitch freedom a partial update of a reactive variable v occurs when it is triggered by a change in its dependency set without all of the changes in that set also being propagated to v. A system provides glitch freedom if it never introduces partial updates. Using our previous example, financialAlert should never be updated simply with a new f1 value, it should always see f1,f2, and f3 together.

  • Atomic consistency combines FIFO ordering with a guarantee that each write event on a variable v is atomically propagated to all the variables that depend on it (i.e. all the update events are executed in a single atomic operation).

Intuitively, the atomic consistency guarantee ensures to-
tal order of events (it is more strict than the causal order guarantee). At the same time, it also ensures atomicity of propagation: no operation can occur while a propagation is taking place. Finally, since propagations occur atomically, it also ensures glitch freedom.

Atomic consistency implies glitch freedom, and glitch freedom implies causal consistency.

The DREAM implementation uses publish-subscribe over a distributed broker network to propagate changes. A communication manager in each component manages local communication and acts as a proxy for global communication involving multiple components. (And the implementation uses AspectJ, amazing how many places that keeps popping up 😉 ).

In DREAM, each component is responsible for concretely
computing the expression expr for each reactive object obj defined in its scope. In particular, a recomputation is trig- gered upon receiving a new notification of change from the CommunicationManager. In the case of causal consistency, each change event can be processed immediately. In contrast, glitch freedom may require that the evaluation of a change is postponed until other notifications are received. As we describe soon, the broker network is responsible for detecting dependencies among events and notify them to the component. The CommunicationManager employs this information to temporarily queue notifications until they can be processed without introducing glitches.

Brokers are connected in an acyclic topology: advertisements of new observables are propagated to all brokers in the network; subscriptions are propagated only to components that provide matching advertisements; and notification are propogated only to components that provided a matching subscription.

…when a new reactive object is created, its defining
expression is propagated to all the brokers in the communi- cation infrastructure. This way, each broker gains complete knowledge of the graph of dependencies among objects and can identify possible sources of glitches… To prevent glitches, DREAM implements a queuing mechanism, such that partial notifications are temporarily stored at the components until all the consequences of a change be- come available.

For atomic consistency DREAM ensures a total order using a centralized Ticket Granted Server that grants rights to propagate a change to all reactive objects that depend on it.

I find the implementation details less interesting than the simple notion that we can define different consistency models for reactive programs and reason about them. Of special note is that causal consistency does not guarantee freedom from glitches, since:

No consistency stronger than real-time causal consistency (RTC) can be provided in an always available, one-way convergent system, and RTC can be provided in an always available one-way convergent system. – Consistency, Availability, and Convergence, Mahajan et al. 2014.

The related work section discusses the reactive progamming and event-based systems literature, but curiously (from my perspective) has nothing to say about distributed dataflow.

Reasoning about the consistency properties of applications composed from a variety of services requires reasoning both about the semantic properties of components and how these properties are preserved across compositions with other components. Hence it requires a model that captures both component semantics and the dependencies between interacting components. One approach is to view the distributed system as an asynchronous dataflow, in which streams of inputs pass through a graph of components that filter, transform, and combine them into streams of outputs. – Alvaro et al. 2013

Could we then draw inspiration the work of Alvaro et al. on Blazes?