Type test scripts for TypeScript testing

Type test scripts for TypeScript testing, Kristensen et al., OOPSLA’17

Today’s edition of The Morning Paper comes with a free tongue-twister; ‘type test scripts for TypeScript testing!’

One of the things that people really like about TypeScript is the DefinitelyTyped repository of type declarations for common (otherwise untyped) JavaScript libraries. There are over 3000 such declaration files now available. This is a great productivity boost, but it’s not perfect:

These declaration files are… written and maintained manually, which leads to many errors. The TypeScript type checker blindly trusts the declaration files, without any static or dynamic checking of the library code.

So it would be great if there were a way of running automated tests to verify that the TypeScript declaration files actually match the implementations in the JavaScript libraries they describe. The authors show that their runtime type checking approach can find more errors with fewer false positives than prior static analysis tools. But it requires a test suite. Where can we get a test suite from?

Our method is based on the idea of feedback-directed random testing as pioneered by the Randoop tool by Pacheco et al. 2007. With Randoop, a (Java) library is tested automatically by using the methods of the library itself to produce values, which are then fed back as parameters to other methods in the library.
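To make the feedback idea concrete, here is a minimal sketch in JavaScript (not Randoop itself, and not TSTest). The toy library `lib`, the helper `makeRng`, and the function `feedbackDirectedTest` are all made up for illustration. Values returned by earlier calls go into a pool, and later calls draw their arguments from that pool.

```javascript
// Hypothetical toy library: makePoint produces values that translate and norm consume.
const lib = {
  makePoint: (x, y) => ({ x, y }),
  translate: (p, dx) => ({ x: p.x + dx, y: p.y }),
  norm: (p) => Math.hypot(p.x, p.y),
};

// Small deterministic pseudo-random generator so that runs are repeatable.
function makeRng(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 2 ** 32;
  };
}

function feedbackDirectedTest(iterations, seed) {
  const rng = makeRng(seed);
  const pool = []; // values returned by earlier library calls (the feedback)
  const failures = [];
  const pick = (xs) => xs[Math.floor(rng() * xs.length)];

  for (let i = 0; i < iterations; i++) {
    // With an empty pool (or occasionally anyway) build a fresh value.
    if (pool.length === 0 || rng() < 0.3) {
      pool.push(lib.makePoint(Math.floor(rng() * 10), Math.floor(rng() * 10)));
      continue;
    }
    const p = pick(pool); // reuse a value the library produced earlier
    if (rng() < 0.5) {
      pool.push(lib.translate(p, Math.floor(rng() * 5))); // result is fed back
    } else {
      const n = lib.norm(p);
      if (typeof n !== "number" || Number.isNaN(n)) failures.push({ p, n });
    }
  }
  return { poolSize: pool.length, failures };
}

const result = feedbackDirectedTest(200, 42);
console.log(result.poolSize, result.failures.length);
```

The toy library is correct, so no failures are expected here; the point is only the shape of the loop, where outputs become inputs.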

The TSTest tool (http://www.brics.dk/tstools) takes as input a JavaScript library and a corresponding TypeScript declaration file. It then dynamically creates a JavaScript program to exercise the library, called a type test script. These scripts exercise the library code by mimicking the behaviour of potential applications, and perform runtime type checking to ensure that values match those specified by the type declarations.

TSTest is set loose on 54 real-world libraries, and finds type mismatches in 49 of them.

An example type mismatch

Here’s a simple example of a type mismatch between the TypeScript declaration for PathJS, and the actual implementation.

First, the TypeScript declaration:

[code language=”javascript”]
declare var Path: {
  root(path: string): void;
  routes: {
    root: IPathRoute,
  }
};

interface IPathRoute {
  run(): void;
}
[/code]

And an excerpt from the implementation:

[code language=”javascript”]
var Path = {
  root: function(path) {
    Path.routes.root = path;
  },
  routes: {
    root: null
  }
};
[/code]

Looking at the type declaration, we see that the parameter path of the root method is declared to be a string. But in the implementation the value of this parameter is assigned to Path.routes.root, which is declared to be of type IPathRoute. A manual examination of the implementation shows that Path.routes.root really should hold a string, so it is the declared type IPathRoute that is wrong.

This error is not found by the existing TypeScript declaration file checker TScheck, since it is not able to relate the side effects of a method with a variable. Type systems such as TypeScript or Flow also cannot find the error, because the types only appear in the declaration file, not as annotations in the library implementation.

When the type test script generated by TSTest is run, it will generate output like this, highlighting the error:

*** Type error
  property access: Path.routes.root
  expected: object
  observed: string
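A report like this can be reproduced by hand with a few lines of JavaScript. This is only a sketch of the idea: `checkProperty` is a hypothetical helper written for this example, not part of TSTest, and the argument "/home" is an arbitrary well-typed string.

```javascript
// Implementation excerpt from the post.
const Path = {
  root: function (path) {
    Path.routes.root = path;
  },
  routes: {
    root: null,
  },
};

// Hypothetical helper (not TSTest's API): compare the declared kind of a
// property with what is actually observed at run time.
function checkProperty(accessPath, value, expected) {
  const observed = value === null ? "null" : typeof value;
  return observed === expected ? null : { accessPath, expected, observed };
}

// Call the method with a well-typed argument, as the declaration allows...
Path.root("/home");

// ...then read the property that is declared as IPathRoute (an object).
const report = checkProperty("Path.routes.root", Path.routes.root, "object");
if (report !== null) {
  console.log("*** Type error");
  console.log(`  property access: ${report.accessPath}`);
  console.log(`  expected: ${report.expected}`);
  console.log(`  observed: ${report.observed}`);
}
```

The mismatch only shows up after the side effect of calling root, which is why a checker that cannot relate a method's side effects to a variable misses it.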

See §2 in the paper for an additional and much more involved example using generic types and higher-order functions.

Generating a type test script

A type test script has a main loop which repeatedly runs random tests against a library until a timeout is reached. Each test calls a library function, and the value returned is checked to see that its type matches the type declaration. Arguments to library functions are either randomly generated, or taken from those produced by the library in previous actions where those values match the function parameter type declarations. Test script applications can also interact with library object properties, which are treated like special kinds of function calls.
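The loop just described might look roughly like the following sketch. Everything here is an assumption made for illustration: the two-function library `lib` (with a deliberate bug in `count`), the `declared` table standing in for a declaration file, and the `runTypeTests` function. Real generated scripts are considerably more involved.

```javascript
// Hypothetical library under test; `count` deliberately violates its declaration.
const lib = {
  pad: (s, n) => s.padStart(n, "*"),
  count: (s) => String(s.length), // bug: declared to return a number
};

// Hand-written stand-in for the declaration file.
const declared = {
  pad: { params: ["string", "number"], returns: "string" },
  count: { params: ["string"], returns: "number" },
};

// Fresh random values for each primitive type.
const generate = {
  string: () => Math.random().toString(36).slice(2, 6),
  number: () => Math.floor(Math.random() * 8),
};

function runTypeTests(timeoutMs, maxIterations) {
  const pool = { string: [], number: [] }; // library-produced values, by type
  const mismatches = new Set();
  const deadline = Date.now() + timeoutMs;
  const names = Object.keys(declared);

  // Run until the timeout (also capped by an iteration count in this sketch).
  for (let i = 0; i < maxIterations && Date.now() < deadline; i++) {
    const name = names[Math.floor(Math.random() * names.length)];
    const { params, returns } = declared[name];

    // Each argument is either reused from earlier library output or generated.
    const args = params.map((t) =>
      pool[t].length > 0 && Math.random() < 0.5
        ? pool[t][Math.floor(Math.random() * pool[t].length)]
        : generate[t]()
    );

    const result = lib[name](...args);
    if (typeof result === returns) {
      pool[returns].push(result); // type-correct results become feedback
    } else {
      mismatches.add(`${name}: expected ${returns}, observed ${typeof result}`);
    }
  }
  return [...mismatches];
}

const found = runTypeTests(1000, 500);
console.log(found);
```

Collecting mismatches in a Set keeps the same violation from being reported once per iteration.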

The strategy for choosing which tests to perform and which values to generate greatly affects the quality of the testing. For example, aggressively injecting random (but type correct) values may break internal library invariants and thereby cause false positives, while having too little variety in the random value construction may lead to poor testing coverage and false negatives.

Here’s a snippet from the declaration file for the Async library:

[code language=”javascript”]
declare module async {
  function memoize(
    fn: Function,
    hasher?: Function
  ): Function;
  function unmemoize(fn: Function): Function;
}
[/code]

And the type test script generated for it:

[Figure: type test script generated for the Async declaration above]

Tricky things

…adapting the ideas of feedback-directed testing to TypeScript involves many interesting design choices and requires novel solutions…

These challenges include generating values that match a given structural type, and type checking return values with structural types. Type mismatch errors are reported based on deep checking (checking all reachable objects). Normally a type failure stops the value from being used in any further testing, but a value only needs to pass a shallow type check in order to be stored for feedback testing.
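The difference between the two kinds of check can be sketched as follows. The type descriptor format (a primitive name, or an object with a `fields` map) and the functions `shallowCheck` and `deepCheck` are assumptions made for this example, not TSTest's internal representation.

```javascript
// Hypothetical structural type descriptors: a primitive name or { fields: {...} }.
const IPathRoute = { fields: { run: "function" } };
const Routes = { fields: { root: IPathRoute } };

const kindOf = (value) => (value === null ? "null" : typeof value);

// Shallow check: look only at the top-level kind of the value.
function shallowCheck(value, type) {
  if (typeof type === "string") return typeof value === type;
  return value !== null && typeof value === "object";
}

// Deep check: follow every declared property of every reachable object.
// `seen` guards against cyclic object graphs (a simplification: it is keyed
// on the object only, not on the object/type pair).
function deepCheck(value, type, path = "value", seen = new Set(), errors = []) {
  if (!shallowCheck(value, type)) {
    const expected = typeof type === "string" ? type : "object";
    errors.push(`${path}: expected ${expected}, observed ${kindOf(value)}`);
    return errors;
  }
  if (typeof type === "string" || seen.has(value)) return errors;
  seen.add(value);
  for (const [name, fieldType] of Object.entries(type.fields)) {
    deepCheck(value[name], fieldType, `${path}.${name}`, seen, errors);
  }
  return errors;
}

const routes = { root: "/home" };
const passesShallow = shallowCheck(routes, Routes);
const deepErrors = deepCheck(routes, Routes);
console.log(passesShallow, deepErrors);
```

Here `routes` is an object, so it passes the shallow check and could be stored for feedback, while the deep check still reports the ill-typed `root` property.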

When functions are returned, their types cannot be checked immediately, but only when the functions are invoked.

The blame calculus (Wadler and Findler 2009) provides a powerful foundation for tracking function contracts (e.g. types) dynamically and deciding whether to blame the library or the application code if violations are detected.

Since the application code in this case is known to be well-formed (we generated it), any problems can be blamed on the library.
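One way to picture the deferred check is a wrapper around each function the library returns. This is a much simplified sketch in the spirit of the blame idea, not the blame calculus itself; `wrapReturnedFunction`, the `signature` format, and the buggy `makeFormatter` library function are all invented for this example.

```javascript
const reports = [];

// Wrap a function value returned by the library so that its declared
// signature is checked lazily, each time the test script invokes it.
function wrapReturnedFunction(fn, signature, path) {
  return function (...args) {
    // Arguments come from the generated test script and are assumed well-typed,
    // so a badly typed result is blamed on the library.
    const result = fn.apply(this, args);
    const observed = result === null ? "null" : typeof result;
    if (observed !== signature.returns) {
      reports.push({
        blame: "library",
        path: `${path}()`,
        expected: signature.returns,
        observed,
      });
    }
    return result;
  };
}

// Hypothetical library function, declared as: makeFormatter(): (n: number) => string
const lib = {
  makeFormatter: () => (n) => (n === 0 ? null : n.toFixed(1)), // bug: null for 0
};

const formatter = wrapReturnedFunction(
  lib.makeFormatter(),
  { params: ["number"], returns: "string" },
  "lib.makeFormatter()"
);

const fine = formatter(1.5); // well-typed result, nothing reported
const bad = formatter(0);    // null result, reported against the library
console.log(fine, bad, reports);
```

Nothing can be said about the returned function at the moment `makeFormatter` returns; the violation is only observable on the second call.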

Recursion with generic types is broken by treating any type parameters involved in recursion as any.

Soundness and completeness

TSTest is not sound: an error reported by TSTest does not necessarily mean there is a mismatch in practice. An example that can cause this is the breaking of recursion in generic types using any. “This is mostly a theoretical issue, we have never encountered false positives in practice.”

TSTest is also not conditionally complete: there are some mismatches that it can never detect.

Neither of these two things means that it isn’t useful though, which is what we’ll look at next.

How well does it work?

Evaluation of TSTest was conducted using the following libraries:

[Figure: list of the benchmark libraries]

With time budgets of 10 seconds, 1 minute, and 5 minutes, and multiple runs, here’s how many mismatches TSTest reports across all 54 libraries:

[Table 1: number of mismatches reported per time budget and number of runs]

Mismatches were found in 49 of the 54 benchmarks, independently of the timeout and the number of repeated runs… The numbers in table 1 are quite large, and there are likely not around 6000 unique errors among the 54 libraries tested. A lot of the detected mismatches are different manifestations of the same root cause. However, our manual study shows that some declaration files do contain dozens of actual errors.

124 of the reported mismatches were randomly sampled for further evaluation (spanning 41 libraries).

63 of the 124 are mismatches that a programmer would almost certainly want to fix.
A further 47 were mismatches that a programmer would want to fix, but are only valid when TypeScript’s non-nullable types feature is enabled.
That leaves 14 of the 124 that turned out to be benign. Some of these are due to limitations in the TypeScript type system, some are due to TSTest constructing objects with private behaviour that are really only intended to be constructed by the library itself, and the largest group (7/14) are intentional mismatches by the developer:

For various reasons, declaration file authors sometimes intentionally write incorrect declarations even when correct alternatives are easily expressible, as also observed in previous work. A typical reason for such intentional mismatches is to document internal classes.

The authors believe it would be possible to adapt TSTest to perform testing of Flow’s library definitions as well.

the morning paper

a random walk through Computer Science research, by Adrian Colyer