Optimization Coaching for JavaScript – St-Amour & Guo, 2015
Because modern programming languages heavily rely on compiler optimizations for performance, failure to apply certain key optimizations is often the source of performance issues. To diagnose these performance issues, programmers need insight about what happens during the optimization process.
Consider the following program snippet from the Octane benchmark suite:
if (queue == NULL) {
this.state = STATE_SUSPENDED;
} else {
this.state = STATE_SUSPENDED_RUNNABLE;
}
Rewriting this as:
this.state = queue == null ? STATE_SUSPENDED : STATE_SUSPENDED_RUNNABLE;
results in a 3% (1.03x) speedup in a benchmark that the JavaScript engine is already tuned for.
An optimization coach looks at optimization near-misses and guides the programmer to changes that would enable the optimization, as St-Amour & Guo’s coach did in this instance.
[Optimization failures] are hard to diagnose and solve for two main reasons. First, optimizers fail silently; programmers are never informed that an optimization failed. Second, getting to the root causes of these failures requires skills and knowledge that are out of reach for most programmers. Those skills include auditing the compiler’s output, reverse engineering the optimizer’s decisions, etc. Optimization coaches help programmers get more out of their optimizers without requiring such knowledge and with a minimum of effort. They achieve this feat by reporting optimization near misses. Near misses are optimizations that the compiler did not apply to their program— either due to a lack of information, or because doing so may be unsound in some cases—but could apply safely if the source program were changed in a certain way.
Near miss reports are combined with recommendations for how to change the program such that an optimization can be applied.
These recommendations are not required to preserve programs’ exact semantics. In other words, coaches may recommend changes that would be beyond the reach of optimizing compilers, which are limited to semantics-preserving transformations. Programmers remain in control and are free to veto any recommendation that would lead to semantic, or structural, changes that they deem unreasonable.
The optimization coach works by logging optimization decisions during compilation, and analysing those logs offline afterwards to produce near-miss reports. From these recommended program changes are generated. To be useful to the programmer, the optimization tool must careful curate and summarize the changes it proposes – overwhelming with a mass of low-level detail would not be helpful. Three techniques are used to assist with this: pruning, ranking, and merging.
- Pruning removes near-misses that do not come with obvious source code solutions, or that are likely to be due to intentional design choices.
- Ranking computes a badness score for each near miss indicating its likely impact on performance
- Merging consolidates sets of related reports into a single summary (for example, near-misses that all have the same source solution).
Looking at the top-five recommendations for each of a set of programs in the Octane benchmark suite yielded performance improvements of between 1.02x and 1.20x.
When ranking, it is necessary to take into account that there may be multiple different compiled versions of the same source code over time…
…state-of-the-art JIT compilers may compile the same code multiple times— producing different compiled versions of that code—potentially with different near misses each time. A coach needs to know which of these compiled versions execute for a long time and which are short-lived. Near misses from compiled versions that execute only for a short time cannot have a significant impact on performance across the whole execution, regardless of the number or severity of near misses, or how hot the affected method is overall.
A variety of tactics are used for merging including by-solution merging, by-constructor merging, and temporal merging. These techniques rely on grouping proposals that relate to the same logical property.
We define two concrete properties p1 and p2, which appear on hidden classes t1 and t2 respectively, to belong to the same logical property if they have the same name p, and co-occur in at least one operation, i.e., there exists an operation o.p or o.p = v that receives objects of both class t1 and class t2.
By-solution merging compares the causes of failures and their context, as well as ensuring that the affected properties belong to the same logical property.
By-constructor merging is applied when multiple near-misses can be solved at the same time by changing a single constructor…
For example, inconsistent property layout for objects from one constructor can cause optimization failures for multiple properties, yet all of those can be resolved by editing the constructor.Therefore, merging constructor near misses that share a constructor can result in improved coaching reports.
Temporal merging puts together reports that have a common cause across multiple (re-)compilations:
Identical near misses that originate from different invocations of the compiler necessarily have the same solution; they are symptoms of the same underlying issue. To reduce redundancy in the coach’s reports, we extend the notion of locality merging—which merges reports that affect the same operation to operate across compiled version boundaries. The resulting technique, temporal merging, combines near misses that affect the same operation or constructor, originate from the same kind of failure and have the same causes across multiple compiled versions.
Optimizations studied
Conventional wisdom among JavaScript compiler engineers points to property and element accesses as the most important operations to be optimized. For this reason, our prototype focuses on these two classes of operations.
Section 4 of the paper contains a nice overview of some of the most important optimizations that JITs apply for property and element access.
Properties
Conceptually, JavaScript objects are open-ended maps from strings to values. In the most general case, access to an object property is at best a hash table lookup, which, despite being amortized constant time, is too slow in practice. Ion therefore applies optimization tactics when compiling these operations so that it can optimize cases that do not require the full generality of maps.
In order from best to worst, the strategies for compiling a property access are definite slot, polymorphic inline cache, and vm call (a slow path call to a general purpose function to access the property).
- Definite Slot
Consider a property access o.x. In the best case, the engine observes o to be monomorphic and with a fixed layout. Ion then emits a simple memory load or store for the slot where x is stored. This optimization’s prerequisites are quite restrictive. Not only must all objects that flow into o come from the same constructor, they must also share the same fixed layout. An object’s layout is easily perturbed, however, for example by adding properties in different orders.
- Polymorphic Inline Cache
Failing [a definite slot], if multiple types of plain JavaScript objects are observed to flow to o, Ion can emit a polymorphic inline cache (PIC). The PIC is a self-patching structure in JIT code that dispatches on the type and layout of o. Initially, the PIC is empty. Each time a new type and layout of o flows into the PIC during execution, an optimized stub is generated that inlines the logic needed to access the property x for that particular layout of o.
Elements
JavaScript’s element access and assignment operations are polymorphic and operate on various types of indexable data, such as arrays, strings and TypedArrays. This polymorphism restricts the applicability of optimizations; most of them can apply only when the type of the indexed data is known in advance.
JavaScript also does not require arrays to be stored in contiguous chunks of memory addressable by offset…
Element accesses into such arrays, then, are semantically (and perhaps surprisingly) equivalent to property lookups and are subject to the same set of rules, such as prototype lookups.
The most important optimisation for array elements is dense array access:
Consider an element access o[i]. In the best case, if o is determined to be used as a dense array and i an integer, Ion can emit a memory load or a store for offset i plus bounds checking. For this choice to be valid, all types that flow into o must be plain JavaScript objects that have dense indexed properties… SpiderMonkey further distinguishes dense arrays—those with allocated dense storage—from packed arrays—dense arrays with no holes between indexed properties. Ion is able to elide checking whether an element is a hole, or a missing property, for packed arrays.
Polymorphic inline caching may also be used as for properties.