Reverb: speculative debugging for web applications, Netravali & Mickens, SoCC’19
This week we’ll be looking at a selection of papers from the 2019 edition of the ACM Symposium on Cloud Computing (SoCC). First up is Reverb, which won a best paper award for its record and replay debugging framework that accommodates speculative edits (i.e., candidate bug-fixes) during replay. In the context of the papers we’ve been looking at recently, and for a constrained environment, Reverb is helping its users to form an accurate mental model of the system state, and to form and evaluate hypotheses in situ.
Logging and replay of exactly what happened during an execution requires some kind of interception framework but is conceptually straightforward. But what should you ‘replay’ after a speculative edit has been made? Some prior events may no longer be appropriate, and some new events may need to be fabricated.
The speculative edit-and-debugging experience supported by Reverb has five phases:
- Logging events in a baseline execution run
- Replaying the execution up to a specified point
- Changing the program’s state in some way
- Resuming execution, with nondeterminism from the original run "influencing" the post-edit execution; and
- Comparing the behaviour of the original and altered runs to understand the effects of the speculative fix.
Reverb’s high-level approach
Set-Cookie header on the response. For Redis, Reverb does a similar thing via a proxy.
Given the recorded logs, it is possible to replay them exactly. The fun starts though, when a programmer pauses execution, makes a change, and then resumes. What values should the new program see now? Consider client-side calls to non-deterministic functions such as Date(). The new code may make fewer, the same, or more calls to such functions depending on the branches explored.
- If the post-edit code makes fewer calls, then we use return values from the log and, once the call-chain finishes, skip forward to the first return value seen by the next invocation of the event handler in the original execution.
- If the post-edit code contains more calls, then function-specific extrapolation is used to generate additional values, e.g. new random numbers for Math.random() and monotonically increasing time values for Date() that are smaller than the next logged value.
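To make the two cases concrete, here is a minimal sketch in Python (not the paper's implementation, which works on JavaScript). The class name `DateReplayer`, its parameters, and the "move halfway towards the next logged value" rule are all assumptions for illustration; the paper only requires that extrapolated timestamps increase monotonically and stay below the next logged value.

```python
class DateReplayer:
    """Serves Date()-style timestamps for one event-handler invocation during replay.

    Hypothetical sketch: names and the extrapolation rule are assumptions.
    """

    def __init__(self, logged, next_logged):
        self._logged = list(logged)      # values recorded for this invocation
        self._next_logged = next_logged  # first value seen by the next invocation
        self._cursor = 0
        self._last = None

    def now(self):
        if self._cursor < len(self._logged):
            # Same or fewer calls than the original run: serve values from the log.
            value = self._logged[self._cursor]
            self._cursor += 1
        else:
            # More calls than were logged: extrapolate. Stepping halfway towards
            # the next logged value keeps results increasing and below that bound.
            base = self._last if self._last is not None else self._next_logged - 1.0
            value = (base + self._next_logged) / 2
        self._last = value
        return value


# The original handler called Date() twice; the edited handler calls it four times.
replayer = DateReplayer(logged=[100.0, 104.0], next_logged=120.0)
values = [replayer.now() for _ in range(4)]
print(values)
```

If the edited handler makes fewer calls, the unused logged values are simply never served, and the next handler invocation starts from its own logged values.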
Furthermore, if the edit results in the deletion of a timer or DOM handler, all subsequent events for the timer/DOM handler are marked as ‘do not replay.’ If WebSockets are closed then any future events involving those connections are closed.
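The ‘do not replay’ marking can be pictured as a single pass over the log. The following is a sketch under assumptions: the `LoggedEvent` record, its `source_id` field, and the `suppress_source` helper are hypothetical names, not Reverb's actual log format.

```python
from dataclasses import dataclass


@dataclass
class LoggedEvent:
    seq: int             # position in the original execution
    source_id: str       # hypothetical identifier for a timer or DOM handler
    replay: bool = True  # False means 'do not replay'


def suppress_source(log, source_id, after_seq):
    """Mark every event from `source_id` later than `after_seq` as do-not-replay."""
    for event in log:
        if event.source_id == source_id and event.seq > after_seq:
            event.replay = False


log = [
    LoggedEvent(1, "timer:7"),
    LoggedEvent(2, "click:#save"),
    LoggedEvent(3, "timer:7"),
    LoggedEvent(4, "timer:7"),
]

# The speculative edit, applied after event 1, deletes timer 7.
suppress_source(log, "timer:7", after_seq=1)
replayable = [event.seq for event in log if event.replay]
print(replayable)
```

Events that fired before the edit point are left alone, since they have already been replayed by the time the edit is made.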
If the edit creates a new, unlogged network request, then the replay framework must inject new network events into the log. If the server-side responder is also being replayed, then Reverb inserts a new request into the server-side log… When the response is generated, Reverb buffers it and uses a model of network latency to determine where to inject the response into the client-side log.
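As a rough illustration of that last step, the sketch below places a buffered response into a time-ordered client-side log. The paper does not spell out the latency model here, so the constant one-way latency, the `ClientEvent` record, and the `inject_response` helper are all assumptions made for the example.

```python
import bisect
from dataclasses import dataclass, field


@dataclass(order=True)
class ClientEvent:
    time_ms: float                  # events are ordered by time only
    kind: str = field(compare=False)


def inject_response(client_log, request_time_ms, server_time_ms, one_way_latency_ms):
    """Insert a synthetic response event where the latency model says it arrives.

    Assumed model: arrival = send time + round trip + server processing time.
    """
    arrival = request_time_ms + 2 * one_way_latency_ms + server_time_ms
    event = ClientEvent(arrival, "network-response (synthetic)")
    bisect.insort(client_log, event)  # keeps the log sorted by time_ms
    return event


client_log = [
    ClientEvent(0.0, "load"),
    ClientEvent(50.0, "click"),
    ClientEvent(200.0, "timer"),
]

# The edited click handler issues a new request at t=60ms.
injected = inject_response(client_log, request_time_ms=60.0,
                           server_time_ms=15.0, one_way_latency_ms=40.0)
print([event.kind for event in client_log])
```

With these numbers the response lands at 155ms, between the logged click and the logged timer event, so the replayed timer handler sees the response as already delivered.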
Reverb also allows a developer to modify server-side responses, using similar techniques to those just discussed to handle divergent scenarios.
Is that really feasible?
Across the top 300 Alexa sites the gzipped logs have a median size of 45.4 KB (95%-ile size 113.2 KB), and it takes 7.8 seconds on average to generate a full data-flow graph. The client-side instrumentation only slows down the median page load by 5.5%.
The paper contains a case study of the authors debugging EtherCalc using Reverb. The following figure shows an annotated wide-area debugging session using the tool:
The authors were able to find the source of the bug, supply a speculative bug fix, and verify that it worked.
Reverb was also successfully used to recreate five historic jQuery bugs from the public bug database, and verify the known-good fix using speculative replay.
Of course you’d hope the authors were able to use their own tool effectively! The evaluation also includes a small study with six front-end web developers asked to debug a problem in a web app. Three users used Reverb, and three used traditional debugging. The Reverb users were faster at debugging the problem (all within 10 minutes, two within 5) than the non-Reverb users (two between 5 and 10 minutes, one failing to find the bug within 10 minutes).
It’s a small sample, but they liked it:
When asked, "Would Reverb-style data flow operations be a useful complement to standard debugging primitives?", all six participants said yes. Furthermore, all three Reverb users declared, without prompting, that speculative edit-and-continue would be a powerful debugging technique.