The Good, the Bad, and the Ugly: An Empirical Study of Implicit Type Conversions in JavaScript – Pradel and Sen, 2015
Updated to fix conditional coercion example: “0” == “false” is false, but “0” == false is true.
“JavaScript is notorious for its heavy use of implicit type coercions” – and many of those are confusing. For example, "false" == false evaluates to false, but "0" == false is true! Style guidelines encourage you to stay away from some of the more troublesome conversions. Does type coercion help in the writing of clear and concise code, or is it error-prone and confusing? Pradel and Sen set out to find out what really goes on in practice by analyzing a large body of JavaScript code taken from the home pages of the top 100 websites, and from the SunSpider and Octane benchmarks. They use a dynamic (runtime) analysis so they can see what coercions actually get executed.
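To see why the two comparisons in the opening example disagree, it helps to spell out the conversions that == performs when a string meets a boolean: the boolean is converted to a number first, and then the string is converted to a number as well. The sketch below traces those steps with explicit conversions. The helper name looseEqualsBoolean is made up for illustration and only covers the string-versus-boolean case.

```javascript
// Hypothetical helper (not from the paper): mimics what `==` does when a
// string is compared with a boolean. The boolean becomes a number, then the
// string is converted to a number, and the two numbers are compared.
function looseEqualsBoolean(str, bool) {
  const right = Number(bool); // false -> 0, true -> 1
  const left = Number(str);   // "0" -> 0, "false" -> NaN
  return left === right;      // NaN never equals anything
}

const results = {
  falseString: "false" == false,                  // false: NaN vs 0
  zeroString: "0" == false,                       // true: 0 vs 0
  traceFalse: looseEqualsBoolean("false", false), // false
  traceZero: looseEqualsBoolean("0", false),      // true
};

console.log(results);
```

The string "false" is never interpreted as a boolean at all, which is what makes the result surprising.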
We dynamically analyze hundreds of programs, including real-world web applications and popular benchmark programs. We find that coercions are widely used (in 80.42% of all function executions) and that most coercions are likely to be harmless (98.85%). Furthermore, we identify a set of rarely occurring and potentially harmful coercions that safer subsets of JavaScript or future language designs may want to disallow. Our results suggest that type coercions are significantly less evil than commonly assumed and that analyses targeted at real-world JavaScript programs must consider coercions.
The paper addresses the following three questions:
- RQ1: How prevalent are type coercions in JavaScript, and what are they used for?
- RQ2: Are type coercions in JavaScript error-prone?
- RQ3: Do type coercions in JavaScript harm code understandability?
Answering the second and third questions requires a classification of type coercions into harmful and non-harmful conversions: “we propose a classification of all type coercions that may occur in JavaScript into likely harmless and potentially harmful coercions.” But how can we decide if a coercion is potentially harmful?
Since developers may purposefully exploit the behavior of any type coercion, there is no clear-cut definition of when a coercion constitutes an error. The proposed classification is based on our own experience with JavaScript, on reports of the experience of others, e.g., in web forums, and on a comparison with other programming languages. We classify a coercion as potentially harmful if its semantics deviates from what is common in other, more strongly typed languages, such as C, Java, Python, or Ruby, if the operation that triggers the coercion has no intuitive meaning, or if the rules that determine which coercion to apply are very complex.
There follows in section 3 of the paper a very clear exposition of all the places in JavaScript where coercion can occur, the semantics of those coercions, and which ones should be considered harmful according to the rules above. If you just want a summary of coercions to avoid in your own code, check out table 1 on page 6 of the paper.
Key results are as follows:
- Type coercions are widely used: 80.42% of all function executions perform at least one coercion and 17.74% of all operations that may apply a coercion do apply a coercion, on average over all programs. (RQ 1)
The results [of the dynamic analysis] reveal two interesting properties. First, type coercions occur in a non-negligible fraction of all operations that may cause coercions. For web sites, 36.25% of all code locations that may coerce values indeed do it in the analyzed executions. Second, type coercions are significantly more prevalent in web sites than in the SunSpider and Octane benchmarks. These benchmark suites have been criticized to be unrepresentative for real-world JavaScript programs, and our study confirms this observation for type coercions.
The most common point of coercion is in conditionals:
…conditionals and logical negations, which are typically used in conditionals, are the most prevalent kinds of coercion. Overall, coercions that result from conditionals or from operations that are typically used in conditionals (!, &&, and ||) account for 93.01% of all coercions. This result suggests that analyses of JavaScript, such as type inference and checking approaches, should at least consider these kinds of coercions because they occur frequently in practice.
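As a reminder of what these conditional coercions look like, here is a small sketch of JavaScript's truthiness rules. The variable names are illustrative and the values are not taken from the paper's data set.

```javascript
// Every value in this list coerces to false in a conditional.
const falsy = [false, 0, -0, 0n, "", null, undefined, NaN];

// Everything else coerces to true, including a few values that look "empty".
const truthySurprises = ["0", "false", [], {}];

const falsyAllFalse = falsy.every((v) => !v);                      // true
const surprisesAllTrue = truthySurprises.every((v) => Boolean(v)); // true

// && and || coerce the left operand to decide which branch to take,
// but they return one of the original operands, not a boolean.
const fallback = undefined || "default"; // "default"
const guarded = null && "unreached";     // null

console.log(falsyAllFalse, surprisesAllTrue, fallback, guarded);
```

Patterns such as `value || "default"` and `if (obj)` are everyday idioms, which fits the finding that conditionals dominate the coercion counts.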
- In contrast to coercions, explicit type conversions are significantly less prevalent. For each explicit type conversion that occurs at runtime, there are 269 coercions. (RQ 1)
- 98.85% of all coercions are harmless and likely to not introduce any misbehavior. (RQ 1)
- A small but non-negligible percentage (1.15%) of all type coercions are potentially harmful. Future language designs or restricted versions of JavaScript may want to forbid them.
The most prevalent potentially harmful coercion (by number of static occurrences) are non-strict (in)equality checks that compare two objects of different types, which is a common source of confusion. Several prevalent kinds of potentially harmful coercions involve undefined, such as concatenating undefined with a string, which yields a string that contains “undefined”, and relative operators applied to undefined and a number, which always yields false. We speculate that most of these coercions are caused by an undefined value that accidentally propagates through the program.
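The undefined cases mentioned above are easy to reproduce. In this sketch the user object and its missing properties are hypothetical, standing in for an undefined value that propagates by accident.

```javascript
// Hypothetical record with missing properties: reading them yields undefined.
const user = { first: "Ada" };

// Concatenation turns undefined into the text "undefined".
const greeting = "Hello, " + user.last; // "Hello, undefined"

// Relational operators convert undefined to NaN, so every comparison
// with a number is false, in both directions.
const below = user.age < 18;  // false
const above = user.age >= 18; // false

// A non-strict equality check across types converts the string to a number.
const mixedTypes = 1 == "1";  // true

console.log(greeting, below, above, mixedTypes);
```

Note that `below` and `above` are both false, so code that branches on either one silently takes the "else" path.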
The authors looked more closely at the reputedly error-prone binary + operator:
We conclude from these results that the + operator is less dangerous than commonly expected. Programmers are disciplined enough to apply + (mostly) in situations where the operation does not cause any type coercion or where it applies a harmless coercion that has obvious semantics. That said, reconsidering the semantics of + in future language designs to reduce its complexity seems to be a good idea. To deal with today’s JavaScript, checking for the rarely occurring potentially harmful usages of + is a promising endeavor for static or dynamic analyses.
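For a feel of the range of behaviours packed into +, the sketch below contrasts everyday uses with some less obvious ones. The grouping is my own illustration and should not be read as the paper's exact classification.

```javascript
// Everyday uses: no coercion, or a coercion with obvious meaning.
const sum = 1 + 2;           // 3
const concat = "a" + "b";    // "ab"
const label = "count: " + 3; // "count: 3"

// Less obvious uses: the operands are first converted to primitives,
// then either concatenated as strings or added as numbers.
const arrPlusObj = [] + {};    // "[object Object]"
const arrPlusNum = [1, 2] + 3; // "1,23"
const boolPlusNum = true + 1;  // 2
const nullPlusNum = null + 1;  // 1
const undefPlusNum = undefined + 1; // NaN

console.log(sum, concat, label);
console.log(arrPlusObj, arrPlusNum, boolPlusNum, nullPlusNum, undefPlusNum);
```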
- Out of 30 manually inspected potentially harmful code locations, 22 are, to the best of our knowledge, correct, and only one is a clear bug. These results suggest that the overall percentage of erroneous coercions is very small. (RQ 2)
- Most code locations with coercions are monomorphic (86.13%), i.e., they always convert the same type into the same other type, suggesting that these locations could be refactored into explicit type conversions for improved code understandability. (RQ 3)
- Most polymorphic code locations (93.79%), i.e., locations that convert multiple different types, are conditionals where either some defined value or the undefined value is coerced to a boolean. (RQ 3)
These results suggest that most coercions do not significantly harm code understandability, at least not because of polymorphic code locations. A more detailed study on how coercions influence human understanding of code, e.g., through a human study in the style of [7] remains for future work.
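To make the monomorphic/polymorphic distinction concrete, here is a minimal sketch of how one might record the type pairs seen at a code location and classify it. This is not the authors' instrumentation: recordCoercion, classify, and the location labels "A" and "B" are all hypothetical.

```javascript
// Hypothetical recorder: maps a code-location label to the set of
// "fromType->toType" pairs observed there at runtime.
const observed = new Map();

function recordCoercion(location, value, toType) {
  const fromType = value === null ? "null" : typeof value;
  if (fromType === toType) return; // same type: nothing was coerced
  if (!observed.has(location)) observed.set(location, new Set());
  observed.get(location).add(`${fromType}->${toType}`);
}

function classify(location) {
  const pairs = observed.get(location);
  if (!pairs || pairs.size === 0) return "no coercion";
  return pairs.size === 1 ? "monomorphic" : "polymorphic";
}

// Location A always sees a string where a number is needed.
for (const width of ["10", "20", "30"]) recordCoercion("A", width, "number");

// Location B is a conditional that sees either an object or undefined.
for (const cfg of [{ debug: true }, undefined]) recordCoercion("B", cfg, "boolean");

const kindA = classify("A"); // "monomorphic"
const kindB = classify("B"); // "polymorphic"

// A monomorphic location can state its conversion explicitly.
const implicitArea = "10" * 2;         // 20, via coercion
const explicitArea = Number("10") * 2; // 20, conversion is visible

console.log(kindA, kindB, implicitArea, explicitArea);
```

Location B is the typical polymorphic case from the results above: a conditional testing whether some value is defined.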
- JavaScript’s strict and non-strict equality checks are mostly used interchangeably, suggesting that refactoring non-strict equality checks into strict equality checks can significantly improve code understandability. (RQ 3)
A particularly intricate situation where JavaScript applies type coercions are (in)equality checks with == and !=. Guidelines suggest to avoid == and != altogether and to instead use their “non-evil” twins === and !==. Yet, we find that both kinds of equality checks are used in practice: In total, we observe 2,026,782 strict equality checks and 3,143,592 non-strict equality checks during the execution of all programs… developers seem to use non-strict and strict equality interchangeably… The results suggest that many code locations use == and != without any need, and that these locations could use === and !== instead.
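The reason such a refactoring is often safe is that == and === give the same answer whenever both operands already have the same type. They can only disagree when the types differ. The sketch below checks this on a handful of sample pairs. The helper strictSwapIsSafe is a hypothetical name, and the pairs are illustrative rather than drawn from the study.

```javascript
// Hypothetical check: would replacing == with === change the result
// for this particular pair of operands?
function strictSwapIsSafe(a, b) {
  return (a == b) === (a === b);
}

// Same-type operands: both operators always agree.
const sameTypePairs = [[1, 1], ["a", "b"], [NaN, NaN], [null, null], [true, false]];
const allSameTypeSafe = sameTypePairs.every(([a, b]) => strictSwapIsSafe(a, b)); // true

// Mixed-type operands: == coerces, so the results can differ.
const mixedTypePairs = [[1, "1"], [null, undefined], [0, ""], ["0", false]];
const unsafeCount = mixedTypePairs.filter(([a, b]) => !strictSwapIsSafe(a, b)).length; // 4

console.log(allSameTypeSafe, unsafeCount);
```

So a location that only ever compares values of the same type can switch to === without changing behaviour for those executions.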