
When coding style survives compilation: de-anonymizing programmers from executable binaries

March 16, 2018

When coding style survives compilation: de-anonymizing programmers from executable binaries Caliskan et al., NDSS’18

As a programmer you have a unique style, and stylometry techniques can be used to fingerprint your style and determine with high probability whether or not a piece of code was written by you. That makes a degree of intuitive sense when considering source code. But suppose we don’t have source code? Suppose all we have is an executable binary? Caliskan et al. show us that it’s possible to de-anonymise programmers even under these conditions. Amazingly, their technique still works even when debugging symbols are removed, aggressive compiler optimisations are enabled, and traditional binary obfuscation techniques are applied! Anonymous authorship of binaries is consequently hard to achieve.

One of the findings along the way that I found particularly interesting is that more skilled/experienced programmers are more fingerprintable. It makes sense that over time programmers acquire their own unique way of doing things, yet at the same time these results seem to suggest that experienced programmers do not converge on a strong set of stylistic conventions. That suggests to me a strong creative element in program authorship, just as experienced authors of written works develop their own unique writing styles.

If we encounter an executable binary sample in the wild, what can we learn from it? In this work, we show that the programmer’s stylistic fingerprint, or coding style, is preserved in the compilation process and can be extracted from the executable binary. This means that it may be possible to infer the programmer’s identity if we have a set of known potential candidate programmers, along with executable binary samples (or source code) known to be authored by these candidates.

Out of a pool of 100 candidate programmers, Caliskan et al. are able to attribute authorship with accuracy of up to 96%, and with a pool of 600 candidate programmers, they reach accuracy of 83%. These results assume that the compiler and optimisation level used for compilation of the binary are known. Fortunately, previous work has shown that toolchain provenance, including the compiler family, version, optimisation level, and source language, can be identified using a linear Conditional Random Field (CRF) with accuracy of up to 99% for language, compiler family, and optimisation level, and 92% for compiler version.

One of the potential uses for the technology is identifying authors of malware.

Finding fingerprint features in executable binaries

So how is this seemingly impossible feat pulled off? The process for training the classifier given a corpus of works by authors in a candidate pool has four main steps, as illustrated below:

  1. Disassembly: first the program is disassembled to obtain features based on machine code instructions, referenced strings, symbol information, and control flow graphs.
  2. Decompilation: the program is translated into C-like pseudo-code via decompilation, and this pseudo-code is passed to a fuzzy C parser to generate an AST. Syntactical features and n-grams are extracted from the AST.
  3. Dimensionality reduction: standard feature selection techniques are used to select the candidate features from amongst those produced in steps 1 and 2.
  4. Classification: a random forest classifier is trained on the corresponding feature vectors to yield a program that can be used for automatic executable binary authorship attribution.

You can download the code at


The disassembly step runs the binary through two different disassemblers: the Netwide Disassembler (ndisasm), which does simple instruction decoding, and radare2, a state-of-the-art open source disassembler which also understands the executable binary format. Using radare2 it is possible to extract symbols, strings, functions, and control flow graphs.

Information provided by the two disassemblers is combined to obtain our disassembly feature set as follows: we tokenize the instruction traces of both disassemblers and extract token uni-grams, bi-grams, and tri-grams within a single line of assembly, and 6-grams, which span two consecutive lines of assembly… In addition, we extract single basic blocks of radare2’s control flow graphs, as well as pairs of basic blocks connected by control flow.
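To make the n-gram extraction concrete, here’s a little sketch (mine, in TypeScript, not the authors’ code; the tokenisation rule and feature naming are assumptions):

```typescript
// Extract the disassembly n-gram features described above: token
// uni/bi/tri-grams within a single line of assembly, plus 6-grams
// spanning two consecutive lines.
function ngramsOfLine(tokens: string[], n: number): string[] {
  const grams: string[] = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    grams.push(tokens.slice(i, i + n).join(" "));
  }
  return grams;
}

function disassemblyFeatures(instructionLines: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  const bump = (g: string) => counts.set(g, (counts.get(g) ?? 0) + 1);
  // Assumed tokenisation: split on whitespace, commas, and brackets.
  const tokenized = instructionLines.map(
    l => l.split(/[\s,\[\]]+/).filter(t => t.length > 0));
  for (let i = 0; i < tokenized.length; i++) {
    for (const n of [1, 2, 3]) ngramsOfLine(tokenized[i], n).forEach(bump);
    if (i + 1 < tokenized.length) {
      // 6-grams that span a pair of consecutive instructions
      ngramsOfLine([...tokenized[i], ...tokenized[i + 1]], 6).forEach(bump);
    }
  }
  return counts;
}
```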


Decompilation is done using the Hex-Rays commercial state-of-the-art decompiler, which produces human readable C-like pseudo-code. This code may be much longer than the original source code (e.g. decompiling a program that was originally 70 lines long may produce on average 900 lines of decompiled code).

From the decompiled result, both lexical and syntactical features are extracted. Lexical features are word unigrams capturing integer types, library function names, and internal function names (when symbol table information is available). Syntactical features are obtained by passing the code to the joern fuzzy parser and deriving features from the resulting AST.

Dimensionality reduction

Following steps one and two, a large number of features can be generated (e.g., 705,000 features from 900 executable binary samples taken across 100 different programmers). A first level of dimensionality reduction is applied using WEKA’s information gain attribute selection criteria, and then a second level of reduction is applied using correlation based feature selection. The end result for the 900 binary samples is a set of 53 predictive features.
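To see what the first reduction step is scoring, here’s a sketch in the spirit of WEKA’s information gain criterion, binarising each feature to present/absent for simplicity (the real pipeline discretises numeric feature values):

```typescript
// Shannon entropy of a label distribution given as raw counts.
function entropy(counts: number[]): number {
  const total = counts.reduce((a, b) => a + b, 0);
  return counts.filter(c => c > 0)
    .reduce((h, c) => h - (c / total) * Math.log2(c / total), 0);
}

// Information gain of one feature: H(author) - H(author | feature present?).
// labels[i] is the author of sample i; present[i] says whether the
// feature occurs in sample i.
function informationGain(labels: string[], present: boolean[]): number {
  const authorCounts = (idx: number[]) => {
    const m = new Map<string, number>();
    idx.forEach(i => m.set(labels[i], (m.get(labels[i]) ?? 0) + 1));
    return [...m.values()];
  };
  const all = labels.map((_, i) => i);
  const withF = all.filter(i => present[i]);
  const withoutF = all.filter(i => !present[i]);
  const hPrior = entropy(authorCounts(all));
  const hCond =
    (withF.length / all.length) * entropy(authorCounts(withF)) +
    (withoutF.length / all.length) * entropy(authorCounts(withoutF));
  return hPrior - hCond;
}
```

Features with near-zero gain tell you nothing about who the author is and can be discarded before the (more expensive) correlation-based selection runs.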


Classification is done using random forests with 500 trees. Data is stratified by author and analysed using k-fold cross-validation, where k is equal to the number of available code samples per author.

Evaluation results

The main evaluation is performed using submissions to the annual Google Code Jam competition, in which thousands of programmers take part each year. “We focus our analysis on compiled C++ code, the most popular programming language used in the competition. We collect the solutions from the years 2008 to 2014 along with author names and problem identifiers.”

Datasets are created using both gcc and g++, with each of the O1, O2, and O3 optimisation flags (so six datasets in all). The resulting datasets contain 900 executable binary samples from 100 different authors. As we saw before, the authors are able to reduce the feature set down to 53 predictive features.

To examine the potential for overfitting, we consider the ability of this feature set to generalize to a different set of programmers, and show that it does so, further supporting our belief that these features effectively capture programming style. Features that are highly predictive of authorial fingerprints include file and stream operations along with the formats and initializations of variables from the domain of ASTs, whereas arithmetic, logic, and stack operations are the most distinguishing ones among the assembly instructions.

Without optimisation enabled, the random forest is able to correctly classify 900 test instances with 95% accuracy. Furthermore, given just a single sample of code (for training) from a given author, the author can be identified out of a pool of 100 candidates with 65% accuracy.

The classifier also reaches a point of dramatically diminishing returns with as few as three training samples, and obtains a stable accuracy by training on 6 samples. Given the complexity of the task, this combination of high accuracy with extremely low requirement on training data is remarkable, and suggests the robustness of our features and method.

The technique continues to work well as the candidate pool size grows:

Turning up the difficulty level

Programming style is preserved to a great extent even under the most aggressive level 3 optimisations:

…programmers of optimized executable binaries can be de-anonymized, and optimization is not a highly effective code anonymization method.

Fully stripping symbol information reduces classification accuracy by 24%, so even removing symbols is not an effective form of anonymisation.

For the pièce de résistance the authors use Obfuscator-LLVM and apply all three of its obfuscation techniques (instruction substitution, introducing bogus control flow, flattening control flow graphs). And the result? “Using the same features as before, we obtain an accuracy of 88% in correctly classifying authors.”

… while we show that our method is capable of dealing with simple binary obfuscation techniques, we do not consider binaries that are heavily obfuscated to hinder reverse engineering.

So you want to stay anonymous?

If you really do want to remain anonymous, you’d better plan for that from the very beginning of your programming career, and even then it doesn’t look easy! Here are the conditions recommended by the authors:

  • Do not have any public repositories
  • Don’t release multiple programs using the same online identity
  • Try to have a different coding style (!) in each piece of software you write, and try to code in different programming languages.
  • Use different optimisations and obfuscations to avoid deterministic patterns

Another suggestion that comes to mind is to use an obfuscator deliberately designed to prevent reverse engineering. Since these weren’t tested though, we don’t actually know how effective that would be.

A programmer who accomplishes randomness across all potential identifying factors would be very difficult to deanonymize. Nevertheless, even the most privacy savvy developer might be willing to contribute to open source software or build a reputation for her identity based on her set of products, which would be a challenge for maintaining anonymity.

Exposing congestion attack on emerging connected vehicle based traffic signal control

March 15, 2018

Exposing congestion attack on emerging connected vehicle based traffic signal control Chen et al., NDSS’18

I selected this paper as a great case study on the need to consider adversarial scenarios when deploying IoT and smart city systems. It was also an eye opener to me just how quickly the U.S. Department of Transportation (USDOT) is planning to roll out connected vehicle technology. In 2016 the USDOT awarded $45M to start small-scale deployment of connected vehicle (CV) systems as part of the CV Pilot Deployment Program. There are three live sites in New York, Anthem, and Palo Alto (of course!) as part of the Intelligent Traffic Signal System (I-SIG), which uses information from connected vehicles to influence the timing of traffic signals. I-SIG has been shown to reduce traffic delays by 26.6%. The USDOT has proposed to mandate that all new light-duty vehicles be equipped with CV technology. Even so, it is estimated it will take 25-30 years to reach at least 95% penetration of all vehicles on the roads.

Privacy issues aside, the improvement in traffic flow is significant. But this assumes everyone is playing by the rules. What if an adversary — perhaps just a single vehicle — tries to game the system? Maybe to try and speed their own passage through traffic light junctions, or perhaps to deliberately cause congestion and bring traffic to a halt. The authors look at data spoofing attacks on the I-SIG system, targeting design or implementation issues in the signal control algorithm (i.e., not relying on implementation bugs). As you could probably guess, they find exploitable weaknesses!

  • Using just a single attack vehicle, the total traffic delay at a junction can be increased by up to 68.1%, completely reversing the expected benefit of the I-SIG system.
  • Attacks can also completely jam entire approaches — in the example below vehicles queuing in left-turn lanes spill over and block through lanes, causing massive traffic jams. 22% of vehicles take over 7 minutes to get through the junction, when it should take 30 seconds.

As shown in our study, even though the I-SIG system has shown high effectiveness in benign settings, the current algorithm design and configuration choices are highly vulnerable to data spoofing.

Connected vehicle and I-SIG background

Data transmission takes place via the Dedicated Short Range Communications (DSRC) protocol, with DSRC devices embedded in connected vehicles via on-board units (OBUs), and in roadside infrastructure via roadside units (RSUs). Communication can be vehicle-to-vehicle or vehicle-to-infrastructure. Here we’re concerned with the latter case. Vehicles periodically broadcast Basic Safety Messages (BSMs) which include information on their real-time trajectory (location and speed). BSM messages are signed using a PKI system.

I-SIG uses BSM messages to perform more effective signal control at a junction.

  • BSM messages are received by a trajectory awareness component which maintains the latest trajectory for each vehicle indexed by vehicle ID, and also assigns vehicles to requested traffic light phases based on the intersection map.
  • At the beginning of each stage in the light sequence, the signal planning component pulls real-time trajectory information for the vehicles in the intersection, performs planning, and sends signal control commands to the controller.

There are two algorithms used for signal planning: COP and EVLS. COP stands for controlled optimization of phases. It takes as input each vehicle’s estimated arrival time at the junction, and uses dynamic programming to calculate an optimal signal plan with the least estimated total delay. Total delays are estimated using a queuing model. If there are no vehicles requesting a certain phase, COP will skip the phase altogether. In theory, COP can plan over an unlimited number of stages, but in practice it is configured to look ahead only a set number of stages (two). This is due to a combination of the relatively low-powered hardware used in the roadside units (to keep costs down) and the real-time constraints of needing planning to finish in sufficient time (typically 5-7 seconds).

COP ideally wants all vehicles to be equipped with broadcasting equipment, and its effectiveness is greatly reduced when the proportion of equipped vehicles falls below about 95%. EVLS (Estimation of location and speed) is used to bridge the gap before penetration reaches this level. EVLS uses the trajectory data that is available from equipped vehicles to estimate the trajectories of unequipped vehicles.

Threat model

It is assumed that an attacker can compromise the on-board system in their own vehicle to send malicious BSM messages to the RSUs. “We do not assume that the attackers can spoof the sender identities in the BSM messages.”

To maximize the realism of our threat model, in this paper we assume that only one attack vehicle presents at an intersection. Since the COP algorithm targets optimized total delay for all vehicles in an intersection, of which there are normally over 100, it should be very challenging for the data from one single vehicle to significantly influence the signal planning.

(Not as challenging as expected, it turns out!)

The attack vehicle doesn’t necessarily have to be in the traffic flow, it could just park nearby, listen to BSM messages from other vehicles, and seek chances to launch attacks. At which point, it seems to me it doesn’t necessarily have to be a vehicle either, just so long as the attacker has a device that can simulate an OBU and broadcast over DSRC.

Data spoofing attacks and the unusual influence of ‘the last vehicle’

The data flow in an I-SIG system under attack looks like this:


All incoming data is first subject to a geofence check, so the attacker needs to perform reconnaissance to know the bounding box and only generate location data that will pass the check. After this hurdle, the attacker’s goal is to change the values in the arrival table so as to influence planning in the COP algorithm.

Strategy S1, which works both in a full deployment in which over 95% of vehicles are assumed to be connected, and in the transition period before that penetration rate (PR) is reached, is for an attacker to change the speed and location in its BSM message so as to target a given arrival table slot and increase its count by one.
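The arithmetic behind S1 is simple enough to sketch (my illustration; the real arrival table granularity, message fields, and units are more involved). The attacker fixes a plausible speed and solves distance = time × speed so that its projected arrival lands in the slot it wants to inflate, clamping the location to the geofenced approach:

```typescript
// Choose a spoofed (distance-to-stopbar, speed) pair whose projected
// arrival time falls into the target arrival-table slot.
// Assumed simplification: arrivalTime ≈ distance / speed.
function spoofForSlot(targetArrivalSec: number, approachLengthM: number) {
  const speedMps = 10; // ~36 km/h: an assumed, plausible-looking speed
  const distanceM = Math.min(targetArrivalSec * speedMps, approachLengthM);
  return { distanceToStopbarM: distanceM, speedMps };
}
```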

Strategy S2 relies on confusing EVLS as it feeds into COP. Since this transition period is expected to last for the next 25-30 years, that seems to give plenty of opportunity! Manipulating the estimation results can influence the signal plan more significantly than simply changing one vehicle’s data as S1 does.

After some experimentation the authors determine that the most effective strategy for S2 is to target the queue estimation process. EVLS uses the available data from equipped vehicles to divide traffic in a lane into three regions: (i) the queuing region, including vehicles waiting in the queue with zero speed, (ii) the slow-down region which includes vehicles slowing down because of what is in front of them, and (iii) the free-flow region where vehicles are sufficiently far from the queue that they behave independently.

Among the three regions, we find that manipulating the estimation of the queuing region is most effective. The attacker can just set the speed to zero and set its location to the farthest possible point of the most empty lane within the geofence so that the lane can be fully filled with queuing vehicles after the estimation.

The most successful attacks add a spoofed vehicle with a very late arrival time, causing the green light end time for its requested phase to be pushed out. This ‘last vehicle’ has an outsized effect on the signal planning.

In an unlimited COP implementation this wouldn’t have such a great impact, but when the planning horizon is limited to two stages there are limited opportunities in the plan to serve all vehicles, causing the planning to be significantly affected by the late arriver.

Moving from theoretical analysis to the construction of real-time attacks, the authors assume an attacker with a 4-core laptop running a parallel I-SIG algorithm to try various data spoofing options and find the most effective one. It has a budget of about 4 seconds to determine the best BSM message to send in each window.

Testing using real traffic data from a junction and commercial-grade traffic simulation software (PTV VISSIM) the authors find that their attacks are very successful in creating traffic delays, including totally blocking approaches as described at the top of this post.

Possible defences

The COP algorithm, as is, is only optimal once the deployment penetration rate gets above 95%. One option is to design a better algorithm!

Considering that the transition period is unavoidable and may last as long as 30 years, we believe this calls for a joint research effort among the transportation and security communities to design effective and robust signal control algorithms specifically for the transition period.

It will also be important to improve the performance of the hardware in roadside units to enable planning over more than two (ideally five) stages.

Finally, we can add redundant sources of information to the system:

…to ensure high effectiveness, data spoofing detection on the infrastructure side needs to rely on data sources that attackers cannot easily control, e.g., infrastructure-controlled sensors, to cross validate the data in BSM messages… for example, the vehicle detectors buried underneath the stop bar of each lane as used to measure aggregated traffic information in today’s traffic control.

Game of missuggestions: semantic analysis of search autocomplete manipulation

March 14, 2018

Game of missuggestions: semantic analysis of search autocomplete manipulations Wang et al., NDSS’18

Maybe I’ve been pretty naive here, but I really had no idea about the extent of manipulation (blackhat SEO) of autocomplete suggestions for search until I read this paper. But when you think about it, it makes sense that people would be doing this: it’s a lot easier to generate a bunch of fake queries to try and manipulate autocomplete suggestions than it is to create a collection of highly ranked content. Here’s an example of what we’re talking about:

The paper contains two related investigations: one looking at the growing ecosystem of ‘autocomplete manipulation as-a-service’ providers, and one using a tool called Sacabuche to look at the extent to which manipulated suggestions have been able to pollute the autocomplete suggestion lists from the major search engines.

Looking into the manipulated autocomplete results reported by Sacabuche, we are surprised to find that this new threat is indeed pervasive, having a large impact on today’s Internet. More specifically, over 383K manipulated suggestions (across 275K triggers) were found from mainstream search engines, including Google, Bing, and Yahoo!. Particularly, we found at least 0.48% of the Google autocomplete results are polluted.

The manipulations are often used to promote dodgy sites, for example sites with low quality content, malware, or phishing. And by the way, there’s absolutely no reason you couldn’t use the same techniques to manipulate autocomplete suggestions for politicians…

According to publicly available information, autocomplete suggestions at major search engines rely heavily on the popularity of queries observed from search logs together with their freshness (how trending they are as popular topics). This makes them vulnerable to adversaries capable of forging a large number of queries.

Determining how widespread manipulation is

The authors built a tool called Sacabuche (Search AutoComplete Abuse Checking) to try and figure out just how widespread manipulation is in practice.

Our key insight is that a trigger and its suggestion are often less related when the autocomplete has been manipulated, presumably due to the fact that a missuggestion tends to promote a specific and rarely known product, which is less relevant to the corresponding trigger.

Consider the search query “bingo sites play free”: a genuine autocomplete suggestion might be “free bingo sites for us players,” and a manipulated one “play free bingo online now an”. The latter is more specific, and therefore less similar to the trigger. This insight leads to a two-stage process for uncovering manipulated suggestions: first evaluating autocomplete suggestions, and then verifying search results for suggestions that seem suspicious to get further confirmation.

Hitting the autocomplete endpoint is cheap and easy, so starting with a set of seed words and growing from there, it’s possible to submit a query and get back a set of autocomplete suggestions. It’s then possible to compare the semantic distance between the trigger (search term) and the suggestions. The algorithm uses word2vec at its core (see ‘The Amazing Power of Word Vectors’).
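Here’s roughly what that distance computation might look like (a sketch: I’m assuming sentence vectors are averaged word2vec word vectors compared by cosine similarity, which is the simplest reading of the approach):

```typescript
type WordVectors = Map<string, number[]>;

// Average the word2vec vectors of a sentence's words (words missing
// from the vocabulary are skipped).
function sentenceVector(words: string[], wv: WordVectors, dim: number): number[] {
  const v = new Array(dim).fill(0);
  let n = 0;
  for (const w of words) {
    const e = wv.get(w);
    if (!e) continue;
    for (let i = 0; i < dim; i++) v[i] += e[i];
    n++;
  }
  return n > 0 ? v.map(x => x / n) : v;
}

// Cosine similarity between trigger vector and suggestion vector.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}
```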

Such a semantic inconsistency feature was found to be discriminative in our research… missuggestions tend to have lower sentence similarity than benign ones: the average sentence similarity is 0.56 for the missuggestions, and 0.67 for the legitimate ones.

Furthermore, if you actually use a suggestion to perform a search, it turns out that for genuine suggestions the search results are in line with those you would get for the initial trigger, but for missuggestions, they often are not. Consider the trigger “online backup free download.” If you search on that term, you might see results like this:

Which are not too dissimilar from the results you might see if you instead search on one of the genuine autocomplete suggestions, “norton online backup free download”:

However, if we look at the manipulated suggestion “strongvault online backup free download” the search results look very different to those produced by the initial trigger:

The search result analysis uses the Rank-Biased Overlap algorithm to evaluate the similarity of two ranked lists. Not only are the suggestions different, but it also turns out that there tend to be fewer results overall when a manipulated suggestion is queried, and the domains involved often have low Alexa ranking.
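Rank-Biased Overlap itself is easy to state in code. This is the truncated finite-list form, assuming no duplicate entries within a list (true for result URLs); the parameter p weights agreement at the top of the lists most heavily:

```typescript
// RBO(S, T, p) = (1 - p) * sum over depths d of p^(d-1) * A_d,
// where A_d is the fractional overlap of the two length-d prefixes.
function rbo(s: string[], t: string[], p = 0.9): number {
  const depth = Math.min(s.length, t.length);
  const seenS = new Set<string>(), seenT = new Set<string>();
  let overlap = 0, sum = 0;
  for (let d = 1; d <= depth; d++) {
    const x = s[d - 1], y = t[d - 1];
    if (x === y) overlap++;          // same item enters both prefixes
    else {
      if (seenT.has(x)) overlap++;   // x was already in T's prefix
      if (seenS.has(y)) overlap++;   // y was already in S's prefix
    }
    seenS.add(x); seenT.add(y);
    sum += Math.pow(p, d - 1) * (overlap / d);
  }
  return (1 - p) * sum;
}
```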

Overall the set of features used to detect manipulated suggestions is as follows:

Sacabuche is used to automatically analyse 114 million suggestion and trigger pairs, and 1.6 million search result pages. The seed triggers are obtained from a set of ‘hot’ phrases representing popular search interests, coupled with 386 gambling keywords and 514 pharmaceutical keywords. These latter two are added because they tend to be popular for manipulation, but are under-represented in the population at large. From the seed keywords new suggestion triggers are iteratively discovered from the autocomplete suggestions.

Altogether, our Search Term Analyzer found 1 million pairs to be suspicious. These pairs were further inspected by the Search Result Analyzer, which reported 460K to be illegitimate. Among all those detected, 5.6% of manipulated suggestions include links to compromised or malicious websites in their top 20 search results. In this step, we manually inspected 1K suggestion trigger pairs, and concluded that Sacabuche achieved a precision of 95.4% on the unknown set.

The following table shows the most popular categories for autocomplete manipulation:

Manipulated suggestions tend to follow common patterns, the most popular being the “trigger relevant content + target” pattern. For example, “free web hosting by”. Here are the top five patterns that indicate you might be looking at a manipulated suggestion:

The autocomplete manipulation industry

Promoting autocomplete suggestions is a booming business! It works like this:

An example of an autocomplete manipulation system is iXiala, which provides a manipulation service on 19 different platforms (web and mobile) including search engines, C2C, and B2C platforms.

To understand their promotion strategies and ecosystem, we studied such services through communicating with some of them to understand their services and purchasing the service to promote suggestion terms.

The services use a combination of search log pollution and search result pollution. Search logs are often polluted through crowd-sourcing of search requests. For example, the authors purchased a service from Affordable Reputation Management, who hired crowd-sourcing operators around the U.S. to perform the search. The workers performed the following three-step process:

  1. Search for the target suggestion phrase on Google
  2. Click the client’s website (which seems to presume it was already somewhere in the search results at least?)
  3. Click any other link from this website

Surprisingly, the operation took effect in just one month with our suggestion phrase ranking first in the autocomplete list. The service provider claimed that this approach was difficult to detect since the searches were real and the operations performed by the crowd-sourcing operators were unrecognizable from the normal activities.

In combination with search log pollution using the above process, search result pollution is also attempted using a variety of blackhat SEO techniques such as keyword stuffing, link farms, and traffic spam (adding unrelated keywords to manipulate relevance).

Manipulation campaigns often monetise by attracting victims to download malware or visit phishing sites, or by selling the victim’s traffic through an affiliate program. The top 3 affiliate networks involved in these schemes are shown in the table below.

Promoting a phrase costs between $300 and $2,500 per month depending on the target search engines and the popularity of the phrase. It takes 1-3 months to get the suggestion visible in the autocomplete results.

Hence the average revenue for the service providers to successfully manipulate one suggestion is $2,544… Through the communication with the provider iXiala, we found that 10K sites requested suggestion manipulation on it, which related to a revenue of $569K per week with 465K manipulated suggestions. At the same time, iXiala had a group of operators, who earned a commission of $54K per week. Hence, iXiala earned a profit of around $515K per week.

The last word

Our findings reveal the significant impact of the threat, with hundreds of thousands of manipulated terms promoted through major search engines (Google, Bing, Yahoo!), spreading low-quality content and even malware and phishing. Also discovered in this study are the sophisticated evasion and promotion techniques employed in the attack and exceedingly long lifetimes of the abused terms, which call for further studies on the illicit activities and serious efforts to mitigate and ultimately eliminate this threat.

JavaScript Zero: real JavaScript, and zero side-channel attacks

March 13, 2018

JavaScript Zero: Real JavaScript and zero side-channel attacks Schwarz et al., NDSS’18

We’re moving from the server-side back to the client-side today, with a very topical paper looking at defences against micro-architectural and side-channel attacks in browsers. Since submission of the paper to NDSS’18, this subject has of course grown in prominence with the announcement of the Meltdown and Spectre attacks.

Microarchitectural attacks can also be implemented in JavaScript, exploiting properties inherent to the design of the microarchitecture, such as timing differences in memory accesses. Although JavaScript code runs in a sandbox, Oren et al. demonstrated that it is possible to mount cache attacks in JavaScript. Since their work, a series of microarchitectural attacks have been mounted from websites, such as page deduplication attacks, Rowhammer attacks, ASLR bypasses, and DRAM addressing attacks.

Chrome Zero is a proof of concept implementation that defends against these attacks. It installs as a Chrome extension and protects functions, properties, and objects that can be exploited to construct attacks. The basic idea is very simple: functions are wrapped with replacement versions that allow injection of a policy. This idea of wrapping functions (and properties with accessor properties, and certain objects with proxy objects) goes by the fancy name of virtual machine layering (there’s a small sketch of it after the policy list below).

Closures are used when wrapping functions to ensure that references to the original function are inaccessible to any code outside of the closure.

Policies determine what the wrappers actually do. There are four possible alternatives:

  1. Allow (passthrough)
  2. Block – the function is replaced by a stub that returns a given default value
  3. Modify – the function is replaced with a policy-defined function, which may still call the original function if required. An example would be reducing the resolution of timers.
  4. User permission – JavaScript execution is paused and the user is asked for permission to continue executing the script.
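A minimal sketch of the wrapping idea (my TypeScript illustration, not Chrome Zero’s actual code): the only reference to the original function lives inside a closure, and the injected policy decides what callers actually get.

```typescript
type AnyFn = (...args: any[]) => any;
type Policy = (original: AnyFn, args: any[]) => any;

// Replace obj[name] with a policy-aware wrapper. After this runs, page
// scripts can no longer reach the unwrapped function: the only remaining
// reference is the one captured in the closure.
function wrap(obj: any, name: string, policy: Policy): void {
  const original: AnyFn = obj[name].bind(obj);
  obj[name] = (...args: any[]) => policy(original, args);
}

// A "modify" policy: round high-resolution timestamps down to 100 ms.
wrap(performance, "now", (orig) => Math.floor(orig() / 100) * 100);

// A "block" policy would instead ignore the original and return a stub
// default value, e.g. wrap(someSensor, "read", () => 0) -- someSensor
// being a hypothetical object standing in for a real API here.
```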


As the code is continuously optimized by the JIT, our injected functions are compiled to highly efficient native code, with a negligible performance difference compared to the original native functions. The results of our benchmarks show that Chrome Zero does not have a visible impact on the user experience for everyday usage.

With that basic description of the mechanism out of the way, let’s get down to the interesting part: the features in JavaScript that are used in microarchitectural and side-channel attacks, and how Chrome Zero protects them.

The features used to build attacks

We identified several requirements that are the basis for microarchitectural attacks, i.e., every attack relies on at least one of the primitives. Moreover, sensors found on many mobile devices, as well as modern browsers introduce side-channels which can also be exploited from JavaScript.

The following table provides a nice summary of attacks and the features that they require:


Memory addresses

JavaScript never discloses virtual addresses, but ArrayBuffers can be exploited to reconstruct them. Once an attacker has knowledge of virtual addresses, they have effectively defeated address space layout randomization (ASLR). Microarchitectural attacks typically need physical addresses. Since browser engines allocate ArrayBuffers page aligned, the first byte is at the beginning of a new physical page. Iterating over a large array also results in page faults at the beginning of each new page. Resolving a page fault takes longer than a regular memory access, and this timing difference can be detected.

In Chrome Zero, the challenge is to ensure that array buffers are not page-aligned, and that attackers cannot discover the offset of array buffers within the page. Chrome Zero uses four defences here:

  1. Buffer ASLR – to prevent arrays from being page aligned, the length argument of buffer constructors is modified to allocate an additional 4KB. The start of the array is then moved to a random offset within this page, and the offset is added to every array access.
  2. Preloading – iterating through the array after constructing it triggers a page fault for every page, after which an attacker cannot learn anything from iterating since the memory is already mapped.
  3. Non-determinism – as an alternative to pre-loading, array setters can be modified to add a memory access to a random array index for every access of the array. This offers stronger protection than preloading as it prevents an attacker just waiting for pages to be swapped out. With this random access mechanism, an attacker can learn the number of pages, but not where the page borders are (see Fig 5 below).
  4. Array index randomization – the above three policies cannot thwart page-deduplication attacks. To prevent these attacks, we have to ensure that an attacker cannot deterministically choose the content of an entire page. This can be done by introducing a random linear function mapping array indices to the underlying memory. (Mechanical sympathisers weep at this point 😉 ). An access to array index x is replaced with an access to index (ax + b) mod n, where a and b are randomly chosen with a co-prime to n, and n is the size of the buffer (see the sketch below).
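A sketch of that last policy (assumptions: a byte buffer, and sizes small enough that a·x stays within JavaScript’s safe integer range; a real implementation would need to be more careful):

```typescript
function gcd(a: number, b: number): number { return b === 0 ? a : gcd(b, a % b); }

// Wrap a buffer with accessors that remap index x to (a*x + b) mod n.
// Choosing a co-prime to n makes the mapping a bijection on [0, n).
function makePermutedAccessors(buf: Uint8Array) {
  const n = buf.length;
  let a: number;
  do { a = 1 + Math.floor(Math.random() * (n - 1)); } while (gcd(a, n) !== 1);
  const b = Math.floor(Math.random() * n);
  const map = (x: number) => (a * x + b) % n;
  return {
    get: (x: number) => buf[map(x)],
    set: (x: number, v: number) => { buf[map(x)] = v; },
  };
}
```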

Timing information

Accurate timing is one of the most important primitives, inherent to nearly all microarchitectural and side-channel attacks.

JavaScript provides the Date object with resolution of 1 ms, and the Performance object which provides timestamps accurate to a few microseconds. Microarchitectural attacks often require resolution on the order of nanoseconds though. Custom timing primitives, often based on some form of monotonically incrementing counter, are used as clock replacements.

Chrome Zero implements two timing defences: low-resolution timestamps and fuzzy time.

  • For low-resolution timestamps, the result of a high-resolution timestamp is simply rounded to a multiple of 100ms.
  • In addition to rounding the timestamp, fuzzy time adds random noise while still guaranteeing that timestamps are monotonically increasing.
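Combining the two policies above in a sketch (the 100ms rounding comes from the paper; the noise magnitude here is my assumption, and the paper’s noise generation is more careful):

```typescript
let lastTs = 0;

// Low-resolution + fuzzy time: round down to 100 ms, add bounded random
// noise, and clamp so that successive timestamps never go backwards.
function fuzzyNow(realNowMs: number): number {
  const rounded = Math.floor(realNowMs / 100) * 100;
  const noisy = rounded + Math.random() * 100;
  lastTs = Math.max(lastTs, noisy);
  return lastTs;
}
```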

The following figure shows these policies at work against an attacker trying to distinguish between fast and slow versions of a function.

With the low-resolution timestamp and edge-thresholding, the functions are correctly distinguished in 97% of cases… when fuzzy time is enabled, the functions are correctly distinguished in only 65% of the cases, and worse, in 27% of the cases the functions are wrongly classified.

This is more than enough to defeat the JavaScript keystroke detection attack of Lipp et al.:


The support for parallelism afforded by web workers provides new side-channel attack possibilities, by measuring the dispatch time of the event queue. For example, an endless loop running within a web worker can detect CPU interrupts, which can then be used to deduce keystroke information.

A drastic but effective Chrome Zero policy is to prevent real parallelism by replacing web workers with a polyfill intended for unsupported browsers that simulates web workers on the main thread. A less drastic policy is to delay the postMessage function with random delays similar to fuzzy timing. This is sufficient to defeat keystroke detection attacks:

Shared data

JavaScript’s SharedArrayBuffer behaves like a normal ArrayBuffer, but can be simultaneously accessed by multiple workers. This shared data can be exploited to build timing primitives with a nanosecond resolution.

One simple Chrome Zero policy is to disallow use of SharedArrayBuffers (which is deactivated by default in modern browsers anyway at the moment). An alternative policy is to add random delays to accesses of shared buffers. This is enough to prevent the high-resolution timing needed by attacks:

Sensor API

Some sensors are already covered by browser’s existing permission systems, but several sensors are not:

Mehrnezhad et al., showed that access to the motion and orientation sensor can compromise security. By recording the data from these sensors, they were able to infer PINs and touch gestures of the user. Although not implemented in JavaScript, Spreitzer showed that access to the ambient light sensor can also be exploited to infer user PINs. Similarly, Olejnik utilized the Ambient Light Sensor API to recover information on the user’s browsing history, to violate the same-origin policy, and to steal cross-origin data.

The battery status API can also be abused as a tracking identifier. In Chrome Zero the battery interface can be set to return randomized or fixed values, or be disabled entirely. Likewise, Chrome Zero can return either a fixed value for the ambient light sensor API, or disable it entirely. For motion and orientation sensors, data can be spoofed, or access prohibited entirely.


We’ve already seen a number of examples of Chrome Zero preventing attacks. The following table summarises the policies and their effect on attacks:


Back-testing on all 12 CVEs discovered since 2016 for Chrome 49 or later reveals that half (6) of them are prevented. Creating policies to specifically target CVEs was not a goal of the current research.

The performance overhead of Chrome Zero was evaluated on the Alexa Top 10 websites, and correct functioning of sites was evaluated for the Alexa Top 25. Page load times (we’re not told what metric is actually measured) increase from 10.64ms on average to 89.08ms when all policies are enabled — the overhead is proportional to the number of policies in force. On the JetStream browser benchmark, Chrome Zero shows a performance overhead of 1.54%.

Depending on what an application actually does with arrays, web workers etc., I would expect the impact to be significantly greater in some circumstances. However, in a double-blind user study with 24 participants, the participants were unable to tell whether they were using a browser with Chrome Zero enabled or not – apart from on the site.

Our work shows that transparent low-overhead defenses against JavaScript-based state-of-the-art microarchitectural attacks and side-channel attacks are practical.

Synode: understanding and automatically preventing injection attacks on Node.js

March 12, 2018

Synode: understanding and automatically preventing injection attacks on Node.js Staicu et al., NDSS’18

If you’re using JavaScript on the server side (node.js), then you’ll want to understand the class of vulnerabilities described in this paper. JavaScript on the server side doesn’t enjoy some of the same protections as JavaScript running in a browser. In particular, Node.js modules can interact freely with the operating system without the benefit of a security sandbox. The bottom line is this:

We show that injection vulnerabilities are prevalent in practice, both due to eval, which was previously studied for browser code, and due to the powerful exec API introduced in Node.js. Our study suggests that thousands of modules may be vulnerable to command injection attacks and that fixing them takes a long time, even for popular projects.

The Synode tool developed by the authors combines static analysis with runtime protection to defend against such attacks. You can get it at

Eval and exec injection vulnerabilities

There are two families of APIs that may allow an attacker to inject unexpected code:

  • exec and its variants take a string argument and interpret it as a shell command (what could possibly go wrong??!)
  • eval and its variants take a string argument and interpret it as JavaScript code, allowing the execution of arbitrary code in the context of the current application.

Of course, you can combine the two to eval a string containing an exec command.

Node.js code has direct access to the file system, network resources, and any other operating system-level resources provided to processes. As a result, injections are among the most serious security threats on Node.js…

Here’s an example program illustrating a vulnerability:
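In the spirit of the paper’s example (a sketch; the exact listing in the paper differs, and the names and command template here are mine), a ‘backup’ utility that concatenates an attacker-controlled file name straight into a shell command:

```typescript
import { exec } from "child_process";

// Copies the named file into a backup directory -- or so the author
// intended. The argument flows unchecked into a shell command line.
function backupFile(name: string): void {
  exec("cp " + name + " backup/");
}
```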

Consider calling this function as follows: backupFile('-help && rm -rf * && echo ", "'). As the authors delightfully put it: “Unfortunately this command does not backup any files but instead it creates space for future backups by deleting all files in the current directory.”

How widespread is the problem?

The authors studied 235,850 npm modules, and found that 3% (7,686 modules) and 4% (9,111 modules) use exec and eval respectively. Once you start looking at dependencies though (i.e., modules that depend on an exec- or eval-using module), then about 20% of all modules turn out to directly or indirectly depend on at least one injection API.

Fixing the most popular 5% of injection modules would protect almost 90% of the directly dependent modules. Unfortunately, that still requires changing over 780 modules.

Perhaps these vulnerabilities are in seldom-used modules though? That turns out not to be the case:

The results invalidate the hypothesis that vulnerable modules are unpopular. On the contrary, we observe that various vulnerable modules and injection modules are highly popular, exposing millions of users to the risk of injections.

The authors then looked at call-sites to determine the extent to which data is checked before being passed into injection APIs. Can the site be reached by potentially attacker-controlled data, and are there mitigation checks in place?

A staggering 90% of the call sites do not use any mitigation technique at all.

Another 9% attempt to sanitise input using regular expressions. Unfortunately, most of those were not correctly implemented. No module used a third-party sanitization module to prevent injections, even though several such modules exist.

Reporting a representative set of 20 vulnerabilities to module developers did not result in quick fixes. “Most of the developers acknowledge the problem. However, in the course of several months only 3 of the 20 vulnerabilities have been completely fixed, confirming earlier observations about the difficulty of effectively notifying developers.”

Introducing Synode

…the risk of injection vulnerabilities is widespread, and a practical technique to mitigate them must support module maintainers who are not particularly responsive. Motivated by these findings, this section presents Synode…

Synode combines static analysis to detect places where injection attacks can potentially take place, with runtime enforcement (guided by the results of that analysis) to ensure that injection attacks are detected and thwarted. The recommended deployment of Synode is via an npm post-installation script. This script runs on each explicitly declared third-party dependency and performs the code rewriting to add dynamic enforcement where needed.

The static analysis phase identifies call sites for injection APIs, and summarises what is known statically about all of the values that may be passed to the function in a template tree. For example:

The template trees are then reduced to a set of templates, where a template is a sequence of strings and inserts:

If all the templates for a particular call site are constant strings, i.e., there are no unknown parts in the template, then the analysis concludes that the call site is statically safe. For such statically safe call sites, no runtime checking is required. In contrast, the analysis cannot statically ensure the absence of injections if the templates for the call site contain unknown values. In this case, checking is deferred to runtime…

The goal of runtime checking is to prevent values that expand the template computed for the call site in a way that is likely to be unforeseen by the developer, and of course to do so as efficiently as possible. To achieve these combined aims the statically extracted set of templates is first expanded into a set of partial abstract syntax trees (PASTs) that represent the expected structure of benign values. Then at runtime the value passed to the injection API is parsed into an AST, and this is compared against the pre-computed PASTs. This process ensures that (i) the runtime AST is derivable from at least one of the PASTs by expanding the unknown subtrees, and (ii) the expansions remain within an allowed subset of all possible AST nodes.

For shell commands passed to exec, only AST nodes that represent literals are considered safe. For eval, all AST node types that occur in JSON code are considered safe.
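A greatly simplified sketch of the runtime check for exec (assumptions: templates reduced to their constant string fragments, and a crude ‘literal token’ test standing in for real AST comparison):

```typescript
// Check a runtime command string against a template's constant parts.
// Anything that fills the holes between constants must be a plain
// literal token -- no shell operators allowed.
function checkAgainstTemplate(cmd: string, constantParts: string[]): boolean {
  let pos = 0;
  const holes: string[] = [];
  for (const part of constantParts) {
    const at = cmd.indexOf(part, pos);
    if (at < 0) return false;        // a constant fragment is missing: reject
    holes.push(cmd.slice(pos, at));  // whatever filled the hole before it
    pos = at + part.length;
  }
  holes.push(cmd.slice(pos));        // trailing hole, if any
  return holes.every(h => /^[\w.\/-]*$/.test(h));
}

// checkAgainstTemplate("cp notes.txt backup/", ["cp ", " backup/"]) === true
// checkAgainstTemplate("cp x && rm -rf * backup/", ["cp ", " backup/"]) === false
```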


The mitigation technique is applied to all (at the time of the study) 15,604 node.js modules with at least one injection API call site.

  • 18,924 of all 51,627 call sites are found to be statically safe (36.66%)
  • The templates for the vast majority of call sites have at most one hole, and very few templates contain more than five.
  • Static analysis completes for 96.27% of the 15,604 modules in less than one minute, with an average analysis time for these modules of 4.38 seconds.

To evaluate the runtime mechanism 24 vulnerable modules are exercised with benign and malicious inputs. The modules and injection vectors used are shown in the following table:

This results in 5 false positives (out of 56 benign inputs), which are caused by limitations of the static analysis (3/5) or node types outside of the safe set (2/5). There are no false negatives (undetected malicious inputs). The average runtime overhead for a call is 0.74ms.

The last word

In a broader scope, this work shows the urgent need for security tools targeted at Node.js. The technique presented in this paper is an important first step toward securing the increasingly important class of Node.js applications, and we hope it will inspire future work in this space.

Settling payments fast and private: efficient decentralized routing for path-based transactions

March 9, 2018

Settling payments fast and private: efficient decentralized routing for path-based transactions Roos et al., NDSS’18

Peer-to-peer path-based transaction (PBT) networks such as the Lightning Network address scalability, efficiency, and interoperability concerns with blockchains through off-chain transactions. They work by establishing decentralised chains of participants through which payments are routed.

A PBT network builds on top of three key algorithms: routing, payment, and accountability. The routing algorithm is in charge of finding paths with enough funds from sender to receiver. The payment algorithm settles the funds between sender and receiver along the paths connecting them. Finally, the accountability algorithm allows the resolution of disputes in the presence of misbehaving users.

A good routing algorithm should offer efficiency, effectiveness (i.e., it is able to successfully route transactions), scalability, and privacy. Transaction privacy comprises three elements:

  1. Value privacy: an adversary should not be able to determine the value of the payment made between two non-compromised users. In this paper, this guarantee comes with the additional condition that none of the intermediate nodes are compromised either. Payments are split into multiple pieces, each following a separate route between the sender and receiver. A weaker form of value privacy can be offered even when some intermediate nodes are compromised, so long as at least one of the employed paths remains non-compromised.
  2. Sender privacy: it should not be possible for an adversary to determine the sender in a path-based transaction between non-compromised users.
  3. Receiver privacy: it should not be possible for an adversary to determine the receiver in a path-based transaction between non-compromised users.

The few routing algorithms proposed so far for PBT networks fail to achieve either privacy, efficiency, or scalability… In this work, we present SpeedyMurmurs, a routing algorithm for PBT networks that provides formal privacy guarantees in a fully distributed setting and outperforms the state-of-the-art routing algorithms in terms of effectiveness and efficiency.

And that makes SpeedyMurmurs very interesting indeed!

SpeedyMurmurs builds on existing work in SilentWhispers, Prefix Embedding and VOUTE.

Building blocks and limitations

In a PBT network users maintain weighted links between one another, where the weights correspond to application-dependent funds. To make a payment, a path of intermediate users is formed between the sender and recipient, and link weights are pairwise adjusted to effectively settle funds between the sender and the receiver. This paper focuses on the routing algorithms used to find paths.

Private PBTs with SilentWhispers

SilentWhispers is a decentralised PBT network without a public ledger. It offers good privacy, but lacks efficiency. SilentWhispers uses landmark-centered routing to discover multiple paths, and then performs multi-party computation to determine the amount of funds to send along each path. Landmark-centered routing uses well-known nodes of high connectivity, known as landmarks, to form the backbone of path creation. Payment is in two phases: first a probe operation computes the credit available in paths and determines amounts c_i to send down each of L paths, such that c_1 + … + c_L = c, where c is the total amount to be transferred between sender and receiver. A payment operation then executes the suggested transfers in a secure manner.

SilentWhispers has four main disadvantages though (which SpeedyMurmurs seeks to overcome):

  1. Periodic tree creation from landmarks is slow to react to changes in the network.
  2. All paths include the landmarks, even when the sender and receiver are in the same branch or there is an otherwise shorter path available. This makes paths unnecessarily long and increases the chances of failing to find at least one link without enough funds.
  3. The probe operation requires that nodes in a transaction path send shares to all landmarks. Transaction overhead scales quadratically in the number of landmarks.
  4. There is no suitable solution for concurrent transfers.

Embedding based routing and Prefix Embedding

Peer-to-peer PBT networks differ from common peer-to-peer networks as the connections between peers are predefined and cannot be changed to improve the quality of the routing. Due to their fixed structure, P2P PBT networks are route-restricted and hence are closely related to Friend-to-Friend (F2F) networks, which restrict connections to peers sharing a mutual trust relationship.

The state of the art in F2F networks is embedding-based routing. Embeddings assign coordinates to nodes in a network, and have nodes forward packets based on the distances between coordinates known to that node and a destination coordinate. Greedy embeddings assign coordinates based on a node’s position in a spanning tree.

Prefix Embedding is a greedy embedding that enables routing of messages in F2F overlays. As illustrated in Fig 1, Prefix Embedding assigns coordinates in the form of vectors, starting with an empty vector at the landmark/root. Each internal node of the spanning tree enumerates its children and appends the enumeration index of a child to its coordinate to obtain the child coordinate. The distance between two such coordinates corresponds to the length of the shortest path in the spanning tree between them.
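The distance computation is tiny — a sketch:

```typescript
// Tree distance between two Prefix Embedding coordinates: drop the
// common prefix, then count the remaining hops up to the common
// ancestor and back down.
function treeDistance(u: number[], v: number[]): number {
  let common = 0;
  while (common < u.length && common < v.length && u[common] === v[common]) {
    common++;
  }
  return (u.length - common) + (v.length - common);
}

// treeDistance([1, 2, 3], [1, 2, 5, 1]) === 3
```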


VOUTE is a routing algorithm that provides anonymous and efficient message delivery based on Prefix Embedding, for dynamic route-restricted networks. Whereas Prefix Embedding reveals the unique coordinate of the receiver, VOUTE allows nodes to provide anonymous return addresses. VOUTE also replaces the node enumeration index with random b-bit numbers. Instead of periodically reconstructing the spanning tree, VOUTE repairs the tree on-demand.

When a tree is being built, nodes send invitations to all neighbours offering to become a parent. Nodes accept only one such invitation, but keep the others in their back pocket. If a link in the spanning tree ceases to exist, the child node and all its descendants choose a new parent based on their remaining invitations.

VOUTE uses undirected and unweighted links, whereas in PBT we need directed and weighted links. Likewise, VOUTE does not take into account changes in link weights when repairing the tree.


SpeedyMurmurs builds on ideas from SilentWhispers and VOUTE, adapting them to a scalable, efficient, effective and private PBT algorithm. There are three key operations in SpeedyMurmurs routing: setRoutes, setCred, and routePay.

  • In setRoutes multiple embeddings are constructed, one for each landmark. Tree construction is divided into two phases. In the first phase only bidirectional links are added during construction of a spanning tree. In the second phase nodes that have not yet been included join by adding unidirectional links. The algorithm is described below using queues for ease of explanation. In the actual distributed scenario there are no central queues; instead, nodes send messages to their neighbours when they join a spanning tree. Because we also can’t take a central decision to move from the first phase to the second, setRoutes uses a simple time limit to determine when to shift.

  • setCred reacts to a pair of nodes (u,v) that want to change the value of their shared link to c. Such a change may lead to coordinate changes. If one of the nodes changes its parent, all descendants also remove their coordinates, and then choose a new parent and corresponding coordinate. There are three situations that indicate the need for a coordinate change: (i) a new non-zero unidirectional link is added and one of the nodes is not yet part of the tree, and hence should choose the other as its parent; (ii) a new non-zero bidirectional link is added and u only has a unidirectional link to its current parent, in which case u changes its parent to v; (iii) a link is removed, in which case the child node should select a new parent.

  • routePay corresponds to the probe operation in SilentWhispers. It divides the process into three parts: generating receiver addresses, splitting the total transaction value c randomly over L paths, and finding paths for all embeddings that can transmit the required value. The path selection only considers links with guaranteed available credit, where the guaranteed credit is the link credit minus any funds a probe operation indicates is pending transmission along the link.
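One simple way to do that random split (a sketch; the paper doesn’t prescribe this particular sampling scheme):

```typescript
// Split a total value c into L non-negative random shares that sum to c,
// by cutting the interval [0, c] at L-1 uniformly random points.
function splitValue(c: number, L: number): number[] {
  const cuts = Array.from({ length: L - 1 }, () => Math.random() * c)
    .sort((a, b) => a - b);
  const shares: number[] = [];
  let prev = 0;
  for (const cut of [...cuts, c]) {
    shares.push(cut - prev);
    prev = cut;
  }
  return shares; // length L; sums to c up to floating-point error
}
```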

Apart from using embedding-based routing, SpeedyMurmurs distinguishes itself from SilentWhispers by splitting the credit between paths before the path discovery.


The performance evaluation is based on a simulation using data obtained from crawling the Ripple PBT network. In the interests of space, I’m going to cut straight to the chase:

  • SpeedyMurmurs outperforms SilentWhispers on all performance metrics in a static scenario (routing when links don’t change)
  • On-demand stabilization and embedding-based routing had a positive effect on all five performance metrics.
  • The Ford-Fulkerson method achieves a higher success ratio than both SpeedyMurmurs and SilentWhispers when it comes to finding a path. However, its overhead is 2-3 orders of magnitude greater.
  • The evolution of the PBT network affects the performance of SpeedyMurmurs considerably. Stabilisation overhead and success ratio vary considerably depending on the frequency of transactions and link changes. SilentWhispers is more efficient during periods of frequent change, but results in higher overhead otherwise. One possibility for future work is to dynamically switch between the two depending on network conditions.


The last word

Our extensive simulation study and analysis indicate that SpeedyMurmurs is highly efficient and achieves a high probability of success while still providing value privacy as well as sender/receiver privacy against a strong network adversary. As these privacy notions are essential for PBT applications, SpeedyMurmurs is an ideal routing algorithm for decentralized credit networks and payment channel networks, as well as for emerging inter-blockchain algorithms.

Zeus: Analyzing safety of smart contracts

March 8, 2018

Zeus: Analyzing safety of smart contracts Kalra et al., NDSS’18

I’m sure many readers of The Morning Paper are also relatively experienced programmers. So how does this challenge sound? I want you to write a program that has to run in a concurrent environment under Byzantine circumstances, where any adversary can invoke your program with any arguments of their choosing. The environment in which your program executes (and hence any direct or indirect environmental dependencies) is also under adversary control. If you make a single exploitable mistake or oversight in the implementation, or even in the logical design of the program, then either you personally or perhaps the users of your program could lose a substantial amount of money. Where your program will run, there is no legal recourse if things go wrong. Oh, and once you release the first version of your program, you can never change it. It has to be right first time.

I don’t think there are many experienced programmers that would fancy taking on this challenge. But call it ‘writing a smart contract’ and programmers are lining up around the block to have a go! Most of them, it seems, get it wrong.

Zeus is a framework for verifying the correctness and fairness of smart contracts:

We have built a prototype of Zeus for Ethereum and Fabric blockchain platforms, and evaluated it with over 22.4K smart contracts. Our evaluation indicates that about 94.6% of contracts (containing cryptocurrency worth more than $0.5B) are vulnerable…. (however, we do not investigate the practical exploitability of these bugs).

What could possibly go wrong?

We’ve studied some of the issues involved in writing smart contracts before. The authors of Zeus also provide a short summary of some of the ways that smart contracts can be incorrect or unfair. Implementation challenges leading to incorrect behaviour include:

  • Reentrancy: multiple parallel external invocations are possible using the call family of constructs. If global state is not correctly managed, a contract can be vulnerable to reentrancy attacks. The related cross-function race condition can occur when two different functions operate on the same global state.
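The canonical shape of the bug, modelled here in Python rather than Solidity (the contract, the attacker, and the numbers are all made up): the external call hands control to the caller before the balance is updated, so a malicious fallback can re-enter while the balance check still passes.

    # Toy model of a reentrant withdraw: state is updated *after* the
    # external call, so the attacker's callback re-enters while the
    # balance check still sees the old balance.
    balances = {"attacker": 100}
    reentries = [0]

    def withdraw(caller, amount, external_call):
        if balances[caller] >= amount:
            external_call(amount)        # control handed to attacker code
            balances[caller] -= amount   # too late

    def malicious_fallback(amount):
        if reentries[0] < 3:             # attacker bounds the recursion
            reentries[0] += 1
            withdraw("attacker", amount, malicious_fallback)

    withdraw("attacker", 100, malicious_fallback)
    print(balances["attacker"])          # -300: drained far beyond the deposit

The real DAO attack had exactly this shape, with call.value() playing the role of external_call.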

  • Unchecked sends: upon a send call, a computation-heavy fallback function at the receiving contract can exhaust the available gas and cause the invoking send to fail. If this is not correctly handled it can lead to loss of Ether.
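A compressed Python model of the hazard, where send stands for Solidity’s send (which returns false rather than throwing when the receiving fallback runs out of gas):

    def payout_unchecked(ledger, send, addr, amount):
        ledger[addr] -= amount
        send(addr, amount)          # boolean result silently dropped:
                                    # the debit stands even if no Ether moved

    def payout_checked(ledger, send, addr, amount):
        if not send(addr, amount):
            ledger[addr] += amount  # roll the debit back on failure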

  • Failed sends: best practices suggest executing a throw upon a failed send in order to revert the transaction. But this practice also has its risks: in an extract from the DAO shown in the paper, the throw itself causes a loss of money to the DAO.
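The DAO extract itself isn’t reproduced here, but the general shape of the risk is easy to model in Python: an unconditional throw ties the whole transaction to the success of every individual send, so one recipient whose fallback always fails can hold every other payout hostage (and, since throw consumes all remaining gas, each attempt costs the caller too).

    class Revert(Exception):
        pass   # stands in for Solidity's throw

    def distribute(recipients, send):
        for addr, amount in recipients:
            if not send(addr, amount):
                raise Revert()   # one failed send undoes *every* payout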

  • Integer overflow or underflow: there are over 20 different scenarios that require careful handling of integer operations to avoid overflow or underflow.
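For example, Solidity’s 256-bit integers wrap around silently. A small Python model of those semantics (the empty-balance withdrawal is illustrative, not taken from the paper):

    UINT256_MAX = 2**256 - 1

    def add_u256(a, b):
        return (a + b) & UINT256_MAX   # wraps on overflow

    def sub_u256(a, b):
        return (a - b) & UINT256_MAX   # wraps on underflow

    # An unchecked withdrawal from an empty balance underflows into a
    # near-infinite one:
    balance = sub_u256(0, 1)
    assert balance == UINT256_MAX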

  • Transaction state dependence: contract writers can utilise transaction state variables such as tx.origin for managing control flow within a contract. A crafty attacker can manipulate tx.origin (for example, using social engineering or phishing techniques) to their advantage.
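The pitfall in miniature, with Python stand-ins for the Solidity globals: tx.origin names the account that originated the transaction, not the immediate caller, so a contract the owner has been phished into calling still acts with the owner’s authority.

    def authorized(tx_origin, msg_sender, owner):
        # BAD: also true when the immediate caller is an attacker's
        # contract that the owner was tricked into invoking.
        return tx_origin == owner

    def authorized_safe(tx_origin, msg_sender, owner):
        # Checks the direct caller instead.
        return msg_sender == owner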

Contracts may also end up being unfair due to logical design errors:

  • Absence of required logic: for example, not guarding a call to selfdestruct with a check that only the owner of a contract is allowed to kill it.
  • Incorrect logic: “there are many syntactically legal ways to achieve semantically unfair behaviour.” For example, the HackersGold contract had a bug where the transferFrom function used =+ in place of += and thus failed to increment the balance transferred to the recipient (reproduced in the sketch after this list). 15 unique contracts in the data set had copies of the function with the same bug, and held over $35,000 worth of Ether between them.
  • Logically correct but unfair designs. For example, an auction house contract that does not declare whether it is ‘with reserve,’ meaning that sellers can also bid or withdraw an item before it is sold. Unsuspecting bidders (with no expertise in examining the source code) could lose money due to artificially increased bids or forfeit their participation fee. (My personal view: when the code is the contract, and there is no other recourse, you’d really better be able to read the code if you want to participate). “This contract… indicates the subtleties involved in multi-party interactions, where fairness is subjective.”
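As promised above, the HackersGold bug reproduces directly in Python, where =+ likewise parses as ‘assign the unary plus of’ rather than ‘increment’:

    balances = {"alice": 50, "bob": 10}

    def transfer_from(sender, to, value):
        balances[sender] -= value
        balances[to] =+ value     # bug: parses as balances[to] = +value

    transfer_from("alice", "bob", 5)
    print(balances["bob"])        # 5, not 15: bob's old balance is gone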

Finally, miners also have some control over the environment in which contracts execute. Smart contract designers need to be aware of:

  • Block state dependencies: many block state variables are determined from the block header, and thus vulnerable to tampering from the block miner.
  • Transaction order dependencies: miners can reorder transactions and hence potentially influence their outcome.

A high-level description of Zeus

Of Zeus itself, we get only a relatively high-level description.

…Zeus takes as input a smart contract and a policy (written in a specification language) against which the smart contract must be verified. It performs static analysis atop the smart contract code and inserts policy predicates as assert statements at correct program points. Zeus then leverages its source code translator to faithfully convert the smart contract embedded with policy assertions to LLVM bitcode. Finally, Zeus invokes its verifier to determine assertion violations, which are indicative of policy violations.

Smart contracts are modelled using an abstract language that looks like this:

Programs are sequences of contract declarations, and each contract is viewed as a sequence of one or more method definitions in addition to the declaration and initialisation of persistent storage private to the contract. Contracts are identified by an Id, and the invocation of a publicly visible method on a contract is viewed as a transaction. Execution semantics are succinctly given in the following table:


A source code translator translates Solidity code into the abstract contract language.

The policy language enables a user to specify fairness rules for a contract. These rules are specified as XACML-styled five tuples with Subject, Object, Operation, Condition, and Result.

Our abstract language includes assertions for defining state reachability properties on the smart contract. Zeus leverages the policy tuple to extract: (a) predicate (i.e., Condition) to be asserted, and (b) the correct control location for inserting the assert statements in the program source.
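The post doesn’t reproduce Zeus’s actual specification syntax, but to fix ideas, the general-purpose selfdestruct rule used later in the evaluation might be rendered as a five tuple along these lines (my rendering, not the real policy language):

    # Hypothetical rendering of an XACML-style five tuple:
    # "only the contract owner may kill the contract".
    policy = {
        "Subject":   "msg.sender",            # who is acting
        "Object":    "contract",              # what is acted upon
        "Operation": "selfdestruct",          # the guarded operation
        "Condition": "msg.sender == owner",   # predicate Zeus asserts
        "Result":    "allow",                 # outcome when it holds
    }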

Given the policy-enhanced program, Zeus translates it into LLVM bitcode. Here’s a simple end-to-end example showing the various components:

Finally, Zeus feeds the LLVM bitcode representation to an existing verification engine (Seahorn) that leverages constrained Horn clauses (CHCs) to quickly ascertain the safety of the smart contract. Zeus is not tied to Seahorn though: in theory it could also be used with other verifiers operating on LLVM bitcode, such as SMACK or DIVINE.

Evaluating Solidity-based contracts

We periodically scraped Etherscan, Etherchain, and Ethercamp explorers over a period of three months and obtained source code for 22,493 contracts at distinct addresses. We discounted 45 contracts that had an assembly block in their source, and obtained 1524 unique contracts (as per their sha256 checksum). In the remainder of this section, we present results only for these unique contracts, unless otherwise specified.

The collected contracts divide up as follows:

Zeus’s checks for all of the correctness and miner-influence issues described above are run across all of these contracts. Results are manually validated to determine the sets of false positives and false negatives. Zeus is also compared against the results from Oyente (‘Making smart contracts smarter’).

Here’s what Zeus finds:


  • 21,281 out of 22,493 contracts (94.6%), containing more than $0.5 billion worth of Ether, are vulnerable to one or more bugs. Across the unique contracts, 1194 out of 1524 were found to be vulnerable to one or more bugs.
  • There were zero false negatives across all seven of the bug classes.
  • Verification is fast, with only 44 out of 1524 contracts (2.89%) timing out in at least one bug class. The timeout threshold was set at one minute.

To evaluate fairness the authors had to write some policies. As an example, for the CrowdFundDao the authors implemented policies specifying that (a) blacklisted developers cannot participate in the scheme, and (b) an investment must be more than a threshold limit. Zeus is able to determine that neither of these checks is encoded in the contract. As a general-purpose policy, the team wrote a rule that selfdestruct should only be called by the contract owner. 284 of the 1524 contracts included a selfdestruct, with 5.6% of them violating the policy.
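In effect, that policy asks Zeus to prove that an assertion like the following can never fail on any path reaching selfdestruct (a Python sketch of the idea; the real predicate is inserted into the translated contract, not into Python):

    def kill(msg_sender, owner, selfdestruct):
        # Policy predicate inserted as an assertion immediately before
        # the guarded operation; a counterexample from the verifier
        # constitutes a policy violation.
        assert msg_sender == owner
        selfdestruct()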