TL;DR: both Flow and TypeScript are pretty good, and conservatively either of them can prevent about 15% of the bugs that end up in committed code.
“That’s shocking. If you could make a change to the way we do development that would reduce the number of bugs being checked in by 10% or more overnight, that’s a no-brainer. Unless it doubles development time or something, we’d do it.” – engineering manager at Microsoft.
(It’s amusing to me that this quote comes from a manager at the company where they actually develop TypeScript! You’d think if anyone would know about the benefits…. Big companies eh).
Let’s dig in.
The flame wars
Static vs dynamic typing is always one of those topics that attracts passionately held positions. In the static typing camp we have the benefits of catching bugs earlier in the development cycle, eliminating altogether some classes of bugs in deployed systems, improved things like code-assist and other tool support, and enabling compiler optimisations. In the dynamic typing camp we have cleaner looking code, and greater flexibility and code malleability.
Shedding some light through empirical data
In other words, if the developers had been taking advantage of TypeScript or Flow at the time, would the bug have made it past the type checker? If not, it’s reasonable to assume it would never have been committed into the repository in the first place.
Here’s an example of a trivial bug that type annotations can detect:
Through this process, we end up with an under-estimation of the total benefits that might be available through static type checking.
In the Venn diagram below, we see on the left the universe of bugs that can potentially be detected by a statically checked type system. Type checking may help catch some of these faster during development. On the right we see bugs that have made it into public repositories. Only a subset of these have clearly linked fixes / patches. This study looks at the intersection of type-system detectable bugs and those that have public fixes.
We consider public bugs because they are observable in software repository histories. Public bugs are more likely to be errors understanding the specification because they are usually tested and reviewed, and, in the case of field bugs, deployed. Thus, this experiment under-approximates static type systems’ positive impact on software quality, especially when one considers all their other potential benefits on documentation, program performance, code completion, and code navigation.
Finding bugs to study
At the end of the sampling process, the bug pool contained bugs from 398 different projects (two projects happened to have 2 bugs each in the corpus). Most of these bug fixing commits ended up being quite small: about 48% of them touched only 5 or fewer lines of code, with a median of 6.
Bug assessment process
To figure out how many of these bugs could have been detected by TypeScript and Flow, we need some rules for how far we’re prepared to go in adding type annotations, and long long we’re prepared to spend on it. A preliminary study on a smaller sample of 78 bugs showed that for almost 90% a conclusion could be reached within 10 minutes, so the maximum time an author was allowed to spend annotating a bug was set at 10 minutes.
Each bug is assessed both with TypeScript (2.0) and with Flow (0.30). To reduce learning effects (knowledge gained from annotating with one system speeding annotation with the other), the type system to try first is chosen at random. The process is then to read the bug report and the fix and spend up to the allotted ten minutes adding type annotations. Sometimes the tools can detect the bug without needing any annotations to be added at all. Other times the report will make it clear that the bug is not type related – for example a misunderstanding of the intended application functionality. In this case the bug is marked as type-system undetectable.
We are not experts in type systems nor any project in our corpus. To combat this, we have striven to be conservative: we annotate variables whose types are difficult to infer with any. Then we type check the resulting program. We ignore type errors that we consider unrelated to this goal. We repeat this process until we confirm that b is ts-detectable because ts throws an error within the patched region and the added annotations are consistent (Section II), or we deem b is not ts-detectable, or we exceed the time budget M.
Details of the approach used to gradually add type annotations are provided in section III.D.
Does typing help to detect public bugs?
Of the 400 public bugs we assessed, Flow successfully detects 59 of them, and TypeScript 58. We, however, could not always decide whether a bug is ts-detectable within 10 minutes, leaving 18 unknown. The main obstacles we encountered during the assessment include complicated module dependencies, the lack of annotated interfaces for some modules, tangled fixes that prevented us from isolating the region of interest, and the general difficulty of program comprehension.
The 18 unknown bugs are then investigated to conclusion, at which point the score is 60 each for TypeScript and Flow.
Running the binomial test on the results shows that, at the confidence level of 95%, the true percentage of detectable bugs for Flow and TypeScript falls into [11.5%, 18.5%] with mean 15%.
Flow and TypeScript largely detect the same bugs:
Which is better: Flow or TypeScript?
The bugs that the two systems can detect largely overlap, with just 3 bugs that are only detected by TypeScript, and 3 that are only detectable by Flow.
All three Flow-detectable bugs that TypeScript fails to catch are related to concatenating possibly undefined or null values to a value of type string. Two of the three TypeScript-detectable bugs that Flow fails to detect are due to Flow’s incomplete support for using a string literal as in index. The remaining bug slips through the net due to Flow’s permissive handling of the
Flow has builtin support for popular modules, like Node.js, so when a project used only those modules, Flow worked smoothly. Many projects, however, use unsupported modules. In these cases, we learned to greatly appreciate TypeScript community’s DefinitelyTyped project. Flow would benefit from being able to use DefinitelyTyped; TypeScript would benefit from automatically importing popular DefinitelyTyped definitions.
Yes, but what’s the cost of adding all those annotations?
Another consideration, both when comparing TypeScript and Flow, and when deciding whether to use either, is the cost of adding the annotations in terms of time and ‘token pollution’ in the source code.
… on average Flow requires 1.7 tokens to detect a bug, and TypeScript 2.4. Two factors contribute to this discrepancy; first, Flow implements stronger type inference, mitigating its reliance on type annotations; second, Flow’s syntax for nullable types is more compact.
The authors also measured the time taken to add the required annotations, with Flow coming out on top:
Thanks to Flow’s type inference, in many cases, we do not need to read the bug report and the fix in order to devise and add a consistent type annotation, which leads to the noticeable difference in annotation time.
Over to you
Is a 15% reduction in bugs making it all the way through your development pipeline worth it to you? And if so, will you go with the Flow or settle for TypeScript? Flow’s lighter-weight approach may suit dynamic-typing advocates better, whereas TypeScript’s DefinitelyTyped library and approach may perhaps be more appealing to strong-typing advocates coming from other languages.
Whatever you decide, at least you now have some data points to help make a more informed decision.