ConflictJS: finding and understanding conflicts between JavaScript libraries Patra et al., ICSE’18
The JavaScript ecosystem is fertile ground for dependency hell. With so many libraries available and the potential for global namespace clashes, it’s easy for libraries to break each other. Sometimes the breakage is obvious (that’s a good day!), and sometimes it is subtle and much harder to detect.
ConflictJS is a tool for finding conflicting JavaScript libraries. It’s available as open source and nicely documented, so you can try it for yourself from https://github.com/sola-da/ConflictJS.
We use ConflictJS to analyze and study conflicts among 951 real-world libraries. The results show that one out of four libraries is potentially conflicting and that 166 libraries are involved in at least one certain conflict.
Why do conflicts happen?
At a language level, until ES6 modules at least, there was no built-in namespacing mechanism (though we do have a number of conventions and module libraries). In principle developers can follow a ‘single API object’ pattern where the entire API of a library is encapsulated behind a single object. In practice, many of them don’t (71% of the 951 libraries studied for this paper did not). There are also third-party module systems such as AMD and CommonJS, but they’re not universally used and not fully compatible with each other.
…since widely used libraries cannot rely on recently added language features, they typically ensure backward compatibility by relying on other ways to export their APIs. In summary, the lack of namespace and modules in currently deployed versions of JavaScript creates a non-trivial problem for library developers.
Different types of conflicts
As an example of a conflict, consider a client that uses both Strophe.js (an XMPP library) and JSEncrypt.js (encryption). Both libraries write to the global variable Base64. Say you build an application using JSEncrypt, test it out, and everything is working fine. Later on you add Strophe to the project, loaded after JSEncrypt. Calls to get encrypted data from JSEncrypt will suddenly start returning the value false.
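A minimal sketch of how this kind of clash can arise. The two loader functions below are hypothetical stand-ins, not the real Strophe.js or JSEncrypt code, and the mechanism by which the real libraries end up returning false is not described here; the sketch only mimics the symptom. It assumes Node.js (for Buffer) and uses a plain object in place of the browser’s window.

```javascript
// Hypothetical stand-ins for two libraries that both write the global `Base64`.
const globalScope = {}; // plays the role of `window`

// "Library A" defines a Base64 helper and relies on it internally.
function loadLibraryA(g) {
  g.Base64 = {
    encode: (text) => Buffer.from(text, "utf8").toString("base64"),
  };
  g.libraryA = {
    // Base64 is looked up at call time, so a later overwrite changes the result.
    encrypt: (text) => {
      if (typeof g.Base64.encode !== "function") return false;
      return g.Base64.encode(text);
    },
  };
}

// "Library B" writes its own, differently shaped Base64 object to the same path.
function loadLibraryB(g) {
  g.Base64 = {
    toBase64: (text) => Buffer.from(text, "utf8").toString("base64"),
  };
}

loadLibraryA(globalScope);
const before = globalScope.libraryA.encrypt("hello"); // "aGVsbG8="

loadLibraryB(globalScope); // silently replaces library A's Base64
const after = globalScope.libraryA.encrypt("hello"); // false

console.log(before, after);
```

Nothing throws when the second library loads, which is what makes this class of problem hard to spot.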
Conflicts occur when two different libraries both write to the same path in the global namespace. The authors define four different types of conflict that may occur:
- Inclusion conflicts: when the mere act of including multiple libraries is enough to cause an exception to be thrown.
- Type conflicts: when multiple libraries write type-incompatible values to the same globally reachable location.
- Value conflicts: when multiple libraries write different values (but with compatible non-function types) to the same globally reachable location.
- Behaviour conflicts: when multiple libraries write different functions to the same globally reachable location.
Examples of the four cases are given in the table below:
How ConflictJS works
ConflictJS works in two stages. First it dynamically analyses the loading of each library to determine the global access paths it writes to. Then it takes pairs of potentially conflicting libraries (i.e., they both write to at least one shared global access path) and generates tests to explore whether or not there really is a problem in practice. ConflictJS is precise, since any problem it reports is genuinely a problem, but it is not sound – that is, it may miss some conflicts.
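The first stage can be approximated with a before/after snapshot of a global object. This is a swapped-in simplification, not ConflictJS’s actual dynamic analysis: the names snapshot, writtenPaths and sharedPaths are hypothetical, the depth limit is arbitrary, and a diff like this misses writes that happen to store the value that was already there.

```javascript
// Record every access path (e.g. "Array.prototype.remove") reachable from root.
function snapshot(root, maxDepth = 3) {
  const paths = new Map(); // access path -> value stored there
  function visit(obj, prefix, depth, ancestors) {
    for (const key of Object.keys(obj)) {
      const value = obj[key];
      const path = prefix ? `${prefix}.${key}` : key;
      paths.set(path, value);
      const isContainer =
        value !== null && (typeof value === "object" || typeof value === "function");
      // The ancestors check stops us looping forever on cyclic structures.
      if (isContainer && depth < maxDepth && !ancestors.includes(value)) {
        visit(value, path, depth + 1, [...ancestors, value]);
      }
    }
  }
  visit(root, "", 1, [root]);
  return paths;
}

// Load one library into a fresh sandbox and report the paths it created or changed.
function writtenPaths(loadLibrary, makeSandbox) {
  const sandbox = makeSandbox();
  const before = snapshot(sandbox);
  loadLibrary(sandbox);
  const after = snapshot(sandbox);
  const written = new Set();
  for (const [path, value] of after) {
    if (!before.has(path) || before.get(path) !== value) written.add(path);
  }
  return written;
}

// Two libraries are potentially conflicting if they share at least one written path.
function sharedPaths(a, b) {
  return [...a].filter((path) => b.has(path));
}

// Hypothetical libraries, modelled as functions that mutate the sandbox.
const makeSandbox = () => ({ Array: { prototype: {} } });
const libOne = (g) => {
  g.Base64 = { encode() {} };
  g.Array.prototype.remove = function () {};
};
const libTwo = (g) => {
  g.Base64 = { decode() {} };
  g.helper = 1;
};

const shared = sharedPaths(
  writtenPaths(libOne, makeSandbox),
  writtenPaths(libTwo, makeSandbox)
);
console.log(shared); // [ 'Base64' ]
```

Only pairs with a non-empty shared list need to go on to the second, test-generating stage.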
Distinguishing between potential and actual conflicts is important to avoid false positives. For example, both JSLite.js and ext-core.js write to Array.prototype.remove. The functions are syntactically different, but semantically equivalent.
To check for inclusion conflicts, it is sufficient to generate a client that loads both libraries. To check for type conflicts, ConflictJS generates a client that reads the value at the conflicting access path and checks its type. If different configurations (l1 alone, l2 alone, l1 then l2, l2 then l1) cause the client to see different types, then a type conflict is reported. The check for value conflicts is similar, but instead of comparing types, ConflictJS does a deep comparison of the objects written to the global access path.
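The checks described above can be sketched roughly as follows. This is an illustration under assumptions rather than ConflictJS’s implementation: libraries are modelled as functions that mutate a sandbox object, typeOf is a coarse notion of type, deepEqual is a simplified structural comparison, and all names are hypothetical. It also assumes each library loads cleanly on its own.

```javascript
function typeOf(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value;
}

// Simplified structural equality for plain data (no cycles, no NaN handling).
function deepEqual(a, b) {
  if (a === b) return true;
  if (typeOf(a) !== typeOf(b)) return false;
  if (typeof a !== "object" || a === null) return false; // differing primitives/functions
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  return (
    keysA.length === keysB.length &&
    keysA.every(
      (k) => Object.prototype.hasOwnProperty.call(b, k) && deepEqual(a[k], b[k])
    )
  );
}

function readPath(root, path) {
  return path
    .split(".")
    .reduce((obj, key) => (obj == null ? undefined : obj[key]), root);
}

// Read the value at `path` under the four configurations and compare the results.
function classify(path, lib1, lib2) {
  const configs = [[lib1], [lib2], [lib1, lib2], [lib2, lib1]];
  let values;
  try {
    values = configs.map((libs) => {
      const sandbox = {};
      libs.forEach((load) => load(sandbox));
      return readPath(sandbox, path);
    });
  } catch {
    return "inclusion conflict"; // merely loading the libraries threw
  }
  if (new Set(values.map(typeOf)).size > 1) return "type conflict";
  if (typeOf(values[0]) === "function") return "needs behaviour test";
  return values.every((v) => deepEqual(v, values[0])) ? "no conflict" : "value conflict";
}

// Hypothetical libraries.
const libA = (g) => { g.version = "1.0"; };
const libB = (g) => { g.version = 2; };
const libC = (g) => { g.config = { debug: true }; };
const libD = (g) => { g.config = { debug: false }; };
const libE = (g) => {
  if ("config" in g) throw new Error("config already defined");
  g.config = { debug: true };
};

console.log(classify("version", libA, libB)); // type conflict
console.log(classify("config", libC, libD)); // value conflict
console.log(classify("config", libC, libC)); // no conflict
console.log(classify("config", libC, libE)); // inclusion conflict
```

Functions fall through to a separate behaviour test, because two function objects are never structurally equal even when they do the same thing.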
That just leaves behaviour conflicts. ConflictJS generates a function call to test functions written to the same global access path. First the number of parameters, n, is estimated from the length property of the function object, then the generator picks a random number of arguments between 0 and n to pass. Each argument’s type is chosen at random from boolean, string, number, array, object, undefined, and null. To create objects, the generator creates up to 10 properties and assigns randomly generated values to them.
Once the arguments are generated, the function is called using the generated arguments. If and only if the call succeeds, without raising an exception, for at least one library configuration, the generator synthesizes a client that contains this call.
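A rough sketch of this generate-and-call loop, under several assumptions that are mine rather than the paper’s: a small seeded PRNG (mulberry32) stands in for the random source so runs are repeatable, nested values are restricted to primitives, outcomes are compared crudely via JSON.stringify, and judging a conflict by comparing the two functions’ results is a guess at how the synthesized client would be used. All function names are hypothetical.

```javascript
// Seeded PRNG (mulberry32) so the sketch is repeatable.
function makeRng(seed) {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const KINDS = ["boolean", "string", "number", "array", "object", "undefined", "null"];

function randomValue(rng, depth = 0) {
  const pick = (n) => Math.floor(rng() * n);
  // Assumption: values nested inside arrays/objects are primitives only.
  const kinds = depth > 0 ? KINDS.filter((k) => k !== "array" && k !== "object") : KINDS;
  switch (kinds[pick(kinds.length)]) {
    case "boolean": return rng() < 0.5;
    case "string": return pick(1e6).toString(36);
    case "number": return pick(201) - 100;
    case "array": return Array.from({ length: pick(4) }, () => randomValue(rng, depth + 1));
    case "object": {
      const obj = {};
      const count = pick(11); // up to 10 properties
      for (let i = 0; i < count; i++) obj[`p${i}`] = randomValue(rng, depth + 1);
      return obj;
    }
    case "undefined": return undefined;
    default: return null;
  }
}

// Estimate the parameter count from fn.length and pass between 0 and n arguments.
function randomArgs(fn, rng) {
  const count = Math.floor(rng() * (fn.length + 1));
  return Array.from({ length: count }, () => randomValue(rng));
}

function callSafely(fn, args) {
  try {
    return { ok: true, value: fn(...args) };
  } catch {
    return { ok: false };
  }
}

// Return arguments on which the two functions visibly differ, or null if none found.
function findBehaviourDifference(fnA, fnB, rng, attempts = 100) {
  for (let i = 0; i < attempts; i++) {
    const args = randomArgs(fnA, rng);
    const a = callSafely(fnA, args);
    const b = callSafely(fnB, args);
    if (!a.ok && !b.ok) continue; // the call fails everywhere: not a useful client
    if (a.ok !== b.ok || JSON.stringify(a.value) !== JSON.stringify(b.value)) return args;
  }
  return null;
}

// Syntactically different but semantically equivalent: no difference is found.
const pairA = (a, b) => [a, b];
const pairB = function (a, b) { const out = []; out.push(a); out.push(b); return out; };
console.log(findBehaviourDifference(pairA, pairB, makeRng(1))); // null

// Genuinely different behaviour: some argument list exposes it.
const concat = (a, b) => `${a}${b}`;
console.log(findBehaviourDifference(concat, pairA, makeRng(1)) !== null); // true
```

Because a null result only means no difference was found within the attempted calls, an approach like this can miss real conflicts, which matches the caveat that ConflictJS may not report every conflict.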
What the analysis reveals…
ConflictJS is tested with 951 selected JavaScript libraries: the subset of libraries from the CDNjs content delivery network that can be used in isolation in a standard desktop browser (they have no further dependencies).
Between them, these 951 libraries write to a total of 130,714 different access paths. 4,121 of these, across 268 libraries, cause a potential conflict. Of the potentially conflicting libraries, ConflictJS finds an actual conflict in 62% of them (that’s 17% of the 951 total libraries). All four conflict types are represented.
…all four kinds of conflict are prevalent in practice, which confirms our decisions to consider all four kinds in ConflictJS… the majority of conflicts are non-inclusion conflicts, i.e., they do not cause an exception just after loading the conflicting libraries. Finding such conflicts and reasoning about them is challenging for both library developers and users alike.
The authors then went on to study a random sample of 25 of the conflicting libraries. They uncovered seven patterns of root cause for conflicts, as shown in the table below.
Our work not only provides a practical tool for library developers to detect conflicts and for library users to avoid conflicting libraries, but also highlights the importance of language features for encapsulating independently developed code.