JaVerT: JavaScript Verification Toolchain

JaVerT: JavaScript Verification Toolchain Santos et al., POPL 18

The JavaScript Verification Toolchain (JaVerT) is designed to support reasoning over JavaScript programs. It can be used to verify functional correctness properties of programs annotated with pre- and post- conditions, loop invariants, and instructions for unfolding and folding user-defined predicates. To do this, it must be able to cope with JavaScript’s dynamic behaviours including extensible objects, dynamic property access, and dynamic function calls.

The big picture looks like this:

Specifications are written using the JS Logic assertion logic language, which includes for example predicates for reasoning over prototype chains, scopes and function closures.

…the challenge is to design assertions that fully capture the common heap structures of JavaScript, such as property descriptors, prototype chains for modelling inheritance, the variable store emulated in the heap, and function closures. Importantly, these assertions should abstract as much as possible from the details of the heap structures they describe, to provide a specification that makes sense to the JavaScript developer who has limited knowledge of the JavaScript internals.

A JavaScript program to be verified is first converted into an intermediate language called JSIL using a logic-preserving compiler JS-2-JSIL. Reasoning about compiled JSIL code is significantly easier than reasoning directly over JavaScript statements. JS Logic assertions are also mapping into their corresponding JSIL Logic equivalents.

JSIL Verify, a semi-automatic verification tool for JSIL can then verify that the assertions hold. JSIL Verify is in turn based on JSIL Logic, a sound separation logic for JSIL.

Since the behaviour of JavaScript statements depends on JavaScript internal functions, “whose definitions in the ECMAScript standard are operational, intricate, and intertwined,” the authors also write axiomatic specifications for the internal functions in JSIL Logic, with reference implementations in JSIL.

As JaVerT is a semi-automatic verification tool, we believe its target should be critical JavaScript code, such as Node.js libraries describing frequently used data structures.

Let’s now take a closer look at each of these key pieces.

JS Logic assertions

JS Logic comprises a simple model of a JavaScript heap (a partial function mapping pairs of object locations and property names to JS heap values) and a language for making logical assertions about heaps and types.

JS Logic assertions are constructed from; basic boolean constants, comparisons and connectives; the separating conjunction; existential quantification; and assertions for describing heaps and declaring typing information.

For example, the cell assertion $(E_1, E_2) \mapsto E_3$ says that an object at the location denoted by $E_1$ , with a property denoted by $E_2$ has the value denoted by $E_3$ . A JaVerT specification has the form $\{P\}\ fid(\bar{x})\ \{Q\}$ , where $P$ and $Q$ are the pre- and post-conditions of the function $fid$ , and $\bar{x}$ is its list of formal parameters.

There are built-in predicates for describing the internal properties of JavaScript objects (the special @proto, @class, and @extensible properties), and property descriptors which for example allow specification of the value, writable, enumerable, and configurable attributes of data descriptors. JSObject(o,p) states that object o has prototype p, and DataProp(o,p,v) states that the property p of object o holds a data descriptor with value v (and all other attributes set to true). There is also a FunctionObject predicate for describing function objects and there scope chains, and an SCell predicate for reasoning about properties of string objects. There are also predicates describing the heap:

We provide predicates that describe the built-in library objects, as well as the entire heap. These predicates come in two flavours: frozen, where changes to the target object(s) are not allowed; and open, where changes are allowed.

InitialHeap() describes the open initial heap for example.
For prototype chains, the Pi predicate Pi(o, p, d, lo, lc) states that property p has value d in the prototype chain of o. The parameters lo and lc are lists capturing the locations and classes in the prototype chain.

The Scope predicate Scope(x :v, sch, fid) states that variable x has value v in the scope chain denoted by sch of the function literal with identifier fid. Combining Scope with the maximally overlapping scope chains predicate OChains permits the specification of function closures.

To get a feel for the language in action, consider the following Map implementation in JavaScript:

Here’s how a map object predicate might be defined in JS Logic (where -u- is set union):

Compiling to JSIL

With some JS Logic predicates in hand and a JavaScript program they refer to, it’s time to map everything down to a language we can reason over more easily: JSIL.

JSIL is a simple goto language with top-level procedures and commands operating on object heaps. It natively supports the dynamic features of JavaScript, namely extensible objects, dynamic property access, and dynamic procedure calls.

JS-2-JSIL targets ES5 Strict, with a very close mapping to the ES5 standard. Here’s an example for compiling an assignment statement:

In the 14 lines of complied JSIL code above, 8 have a direct counterpart in the standard, and the remaining 6 deal with scoping.

The JSIL Logic translator translates JS Logic annotations into equivalent annotations in JSIL Logic and then integrates them into the compiled JSIL code.

Verifying JSIL

Given a JSIL program annotated with the specifications of its procedures, JSIL Verify checks whether the program procedures satisfy their specifications. JSIL Verify has two parts: a symbolic execution engine based on JSIL Logic, and an entailment engine for resolving frame inference and entailment questions.

The Hoare triples for the JSIL basic commands are of the from {P}bc{Q}, and are interpreted as “if bc is executed in a state satisfying P then, if it terminates, it will do so in a state satisfying Q”. We assume that JSIL programs are in SSA form, taking away the need for standard substitutions in many of the axioms.

Since JSIL has gotos, the standard sequential composition rule of Hoare logic isn’t enough. The solution is to use proof candidates which map each command in a procedure to a set of possible preconditions, associating each such precondition with the index of the command that led to it.

The entailment engine (figuring out what entails, or ‘follows from’ the premises) delegates to the Z3 SMT solver. JSIL Logic values are encoded as Z3 algebraic data types.

Modelling JavaScript internal functions

JavaScript internal functions describe the building blocks of the language, including prototype chain traversal, object management, and type conversions. They are called extensively by all JavaScript commands. Therefore, in order to reason about JavaScript code we have to first be able to reason efficiently about the internal functions.

These functions often have complex and intertwined definitions as for example we see here with GetValue and PutValue, the two main internal functions operating on references:

JSIL Logic axiomatic specifications are created for all cases of internal functions that do not use higher-order reasoning . This covers about 90% of all possible cases. The JSIL reference implementations are then verified against these using JSIL Verify. In all there are 186 specifications targeting 1K lines of JSIL code.

Coverage

We implement the entire kernel of ES5 Strict, except indirect eval, which exits strict mode. We implement the entire Object, Function, Array, Boolean, Math, and Error built-in libraries. Additionally, we implement: the core of the Global library, associated with the global object; the constructors and basic functionalities for the String, Number, and Data libraries, together with the functions from those libraries used for testing features of the kernel. We do not implement the orthogonal RegExp and JSON libraries.

JS-2-JSIL is tested using the ECMA Script Test262 test suite and passing all 8797 applicable tests in that suite.

Scalability

Compiled JSIL code has 10x-25x more lines of code than its JavaScript counterpart. It takes about 0.5 seconds to verify one hundred lines of compiled JSIL code.

JaVerT should be able to handle projects with thousands of lines of JS code, but perhaps not tens of thousands.

What next?

We believe JaVerT constitutes an important step towards the verification of real-world JavaScript programs. It is built on top of a trusted, systematically validated infrastructure and it successfully tackles a number of challenges that are critical for tractable reasoning about JavaScript.

There are a couple of key limitations at the moment though, which are being addressed as “immediate next steps” : support for the for-in statement, and for higher-order functions. The authors also expect to move JS-2-JSIL to ES6 Strict at some point.

We will develop an automated tool… for verifying large JavaScript codebases, but believe that the semi-automatic JaVerT will always have a role to play in the development of functional correctness specifications of critical libraries… We are also looking for ways to reuse the infrastructure behind JaVerT for other styles of JavaScript analysis.

the morning paper

a random walk through Computer Science research, by Adrian Colyer
Made delightfully fast by strattic