Designing secure Ethereum smart contracts: a finite state machine based approach Mavridou & Laszka, FC’18

You could be forgiven for thinking I’m down on smart contracts, but I actually think they’re a very exciting development that opens up a whole new world of possibilities. That’s why I’m so keen to see better ways of developing and verifying them. I’m watching the work to enable web assembly (WASM) enabled smart contracts with interest. At a higher level, Trent McConaghy’s series on Token Engineering (part I, part II, part III) also appeals greatly. Today’s paper choice, ‘Designing secure Ethereum smart contracts’ is about embedding a collection of lower-level smart contract design patterns into a contract generation tool, helping developers avoid some of the more fundamental errors. If you’re still coming up to speed on why such a tool might be a good idea, the Zeus paper provides some good background, which I won’t retread here.

Prior work focused on addressing these issues (vulnerabilities) in existing contracts by providing tools for verifying correctness and for identifying common vulnerabilities. In this paper, we explore a different avenue by proposing and implementing FSolidM, a novel framework for creating secure smart contracts.

FSolidM is based on a formal finite-state machine model of smart contracts, with Ethereum Solidity as the current generation target. Users develop contracts using a combination of a graphical editor and a code editor. The base tool captures the contract states, transitions, and guards. Plugins can then be used to embellish the contracts with additional desirable properties (to my eye, they look awfully like aspects. YMMV).

If you want to play with FSolidM for yourself, there’s a hosted version available at https://cps-vo.org/group/SmartContracts and the source is on GitHub at https://github.com/anmavrid/smart-contracts.

### Smart contracts as finite state machines

The running example in the paper is a blind auction smart contract. In this auction bidders send only hashed versions of their bids (i.e., they do not reveal the actual amount of the bid), and at the same time are required to make a deposit greater than or equal to the amount of the bid. The deposit ensures that the winner of the auction actually pays up.

The contract has four main states. Initially the contract is AcceptingBlindedBids (ABB). Once the auction is closed the contract moves to the RevealingBids (RB) state in which bidders reveal their bids. At the finish of the auction the contract moves to the Finished (F) state: the highest bid wins, and all bidders may withdraw their deposits — apart from the winner who may only withdraw the difference between their deposit and bid amount. Before it is finished, an auction may also be cancelled, in which case the contract moves to the Cancelled (C) state.

Here’s the corresponding FSM:

Every state has an associated set of transitions corresponding to actions that a user can perform during a blind auction. Actions may be guarded (checked pre-conditions) — guard clauses are denoted using square brackets. Guards and actions interact with variables which can be of the following types:

• contract data, stored within the contract
• input data received as a transition input
• output data returned as transition output
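To make that concrete, here’s a minimal Python sketch of the blind auction FSM with guarded transitions. The state names (ABB, RB, F, C) follow the paper; the transition names, the example guard, and everything else are illustrative, not FSolidM’s generated Solidity:

```python
class BlindAuctionFSM:
    """Sketch of the paper's blind auction FSM; guards are checked pre-conditions."""

    def __init__(self):
        self.state = "ABB"   # AcceptingBlindedBids
        self.bids = {}       # contract data: blinded bid per bidder (illustrative)

    def _transitions(self):
        # (name, source state, target state, guard)
        return [
            ("close",  "ABB", "RB", lambda: len(self.bids) > 0),  # [bids exist]
            ("cancel", "ABB", "C",  lambda: True),
            ("cancel", "RB",  "C",  lambda: True),
            ("finish", "RB",  "F",  lambda: True),
        ]

    def fire(self, name):
        for t_name, src, dst, guard in self._transitions():
            if t_name == name and src == self.state:
                if not guard():
                    raise ValueError(f"guard failed for '{name}'")
                self.state = dst
                return
        raise ValueError(f"'{name}' is not valid in state {self.state}")
```

The point of the model is exactly this shape: a transition is rejected unless it is defined for the current state and its guard holds.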

To specify a smart contract, the developer provides the following information:

The full generated contract for the blind auction example can be found in appendix C of this arXiv version of the paper.

(This example code has the locking and transition counter security extension plugins enabled; we’ll look at those next.) It would be interesting to run this code through Zeus!

### Patterns and plugins

So far then, we have a generator which can build the basic skeleton of a contract. By enabling additional plugins, extra functionality can be added to the contract. The plugins fall into two main categories: protection against common vulnerabilities, and support for low-level contract design patterns.

#### Locking

The locking plugin is designed to prevent reentrancy vulnerabilities. It introduces a private boolean variable, locked, and then uses the equivalent of around advice to wrap every transition using this simple pattern:

    modifier locking {
        require(!locked);
        locked = true;
        _; // the wrapped function body executes here
        locked = false;
    }


The locking plugin is always the outermost instrumentation on any transition.

#### Transition counter

If the behaviour of a transaction depends on the state of a contract, then it’s possible that state may change between a transaction being submitted and the transaction actually executing (transaction-ordering dependence). This can lead to security issues if care is not taken.

We provide a plugin that can prevent unpredictable-state vulnerabilities by enforcing a strict ordering on function executions. The plugin expects a transition number in every function as a parameter (i.e., a transition input variable) and ensures that the number is incremented by one for each function execution.

(c.f. optimistic transactions).

    modifier transitionCounting(uint nextTransitionNumber) {
        require(nextTransitionNumber == transitionCounter);
        transitionCounter += 1;
        _; // the wrapped function body executes here
    }
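The same compare-then-increment check sketched in Python: a caller supplies the counter value it observed when submitting the call, and the call is rejected if any other transition executed in between (all names are illustrative):

```python
class CountedContract:
    """Transition-counter sketch: stale transition numbers are rejected."""

    def __init__(self):
        self.transition_counter = 0

    def _check_transition(self, next_transition_number):
        # require(nextTransitionNumber == transitionCounter);
        if next_transition_number != self.transition_counter:
            raise ValueError("state changed since the call was submitted")
        self.transition_counter += 1  # transitionCounter += 1;

    def bid(self, next_transition_number, blinded_bid):
        self._check_transition(next_transition_number)
        return f"transition {next_transition_number} executed"
```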


#### Automatic timed transitions

The automatic timed transitions plugin supports time-constraint based patterns — for example, an auction timing out after a certain period. The plugin is implemented as a modifier applied to every function that checks whether any timed transitions must be executed before the invoked transition is executed. The main benefit is avoiding accidentally missing a transition when attempting to implement the same behaviour manually.

The timer support is fairly limited though. You can specify multiple timed transitions, but each timer is specified as a number of seconds since the creation of the contract. If you wanted any other pattern, you’re out of luck.
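A sketch of how such a modifier might behave, in Python with an injectable clock. The deadlines-as-seconds-since-creation limitation is the paper’s; the structure and names are mine:

```python
import time

class TimedFSM:
    """Before any invoked transition, fire overdue timed transitions first."""

    def __init__(self, now=time.time):
        self._now = now
        self._created = now()
        self.state = "ABB"
        # (seconds after creation, source state, target state), checked in order
        self.timed_transitions = [(3600, "ABB", "RB"), (7200, "RB", "F")]

    def _apply_timed(self):
        elapsed = self._now() - self._created
        for deadline, src, dst in self.timed_transitions:
            if elapsed >= deadline and self.state == src:
                self.state = dst  # overdue timed transition fires automatically

    def reveal(self):
        self._apply_timed()  # the modifier: timed transitions run before the call
        if self.state != "RB":
            raise ValueError(f"reveal not allowed in {self.state}")
        return "revealed"
```

Every public function gets the same `_apply_timed` prelude, which is what saves the developer from forgetting a timeout check on one of them.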

#### Access control

The access control plugin manages a list of administrators at runtime (identified by their addresses) and enables developers to forbid non-administrators from accessing certain functions.

The code sketch looks like this:
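The original code figure hasn’t survived the trip to plain text, so as a stand-in, here’s a hedged Python sketch of the idea: a runtime-managed admin set plus an only-admin guard wrapped around protected functions (all names are mine):

```python
def only_admin(fn):
    """Guard: reject calls whose caller is not in the admin set."""
    def wrapper(self, caller, *args, **kwargs):
        if caller not in self.admins:
            raise PermissionError(f"{caller} is not an administrator")
        return fn(self, caller, *args, **kwargs)
    return wrapper

class ManagedContract:
    def __init__(self, creator):
        self.admins = {creator}  # administrator list, managed at runtime

    @only_admin
    def add_admin(self, caller, new_admin):
        self.admins.add(new_admin)

    @only_admin
    def cancel(self, caller):
        return "auction cancelled"
```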

#### Events

The events plugin can be used to notify users of transition executions. When the plugin is enabled, transitions tagged with event emit a Solidity event after they are executed. Ethereum clients can listen to such events.

### A promising start

It’s all actually fairly simple stuff, but it’s a promising start, and the outlined directions for future work mean this might be an interesting tool to keep an eye on:

1. Additional security plugins addressing more of the known vulnerability types for smart contracts, and plugins implementing the most popular design patterns surveyed in ‘An empirical analysis of smart contracts: platforms, applications, and design patterns.’
2. Integration of verification tools and correctness-by-design techniques into the framework.
3. Support for modelling and verifying multiple interacting contracts as a set of interacting finite state machines.

A quantitative analysis of the impact of arbitrary blockchain content on Bitcoin Matzutt et al., FC’18

We’re leaving NDSS behind us now, and starting this week with a selection of papers from FC’18. First up is a really interesting analysis of what’s in the Bitcoin blockchain. But this isn’t your typical analysis of transactions, addresses, and identities; instead, Matzutt et al. take a look at file content (text, images, PDFs, and so on) being stored on the blockchain. It’s not something I’d especially thought about before, but once the question has been asked, sadly the answer is all too predictable. You’ve met the human race I presume? So what do you think happens when you create a widely distributed, public, immutable data structure, and allow anyone to insert data into it? If there’s a saving grace here, it’s that the mechanism hasn’t been abused as much as you might expect. But sadly, it has been abused.

Our analysis shows that certain content, e.g., illegal pornography, can render the mere possession of a blockchain illegal… our analysis reveals more than 1,600 files on the blockchain, over 99% of which are texts or images. Among these files there is clearly objectionable content such as links to child pornography, which is distributed to all Bitcoin participants.

Let’s take a look at the kinds of content that might cause problems for blockchain participants, methods for inserting data on the Bitcoin blockchain, and what the authors find when they analyse the Bitcoin blockchain as of August 2017.

### Problematic content

Despite the potential benefits of data in the blockchain, insertion of objectionable content can put all participants in the Bitcoin network at risk…

The authors identify five categories of content that may cause problems for anyone storing the blockchain: copyright violations; malware; privacy violations; politically sensitive content; and illegal and condemned content.

#### Malware

Malware could in theory be spread via blockchains. Even if it doesn’t become activated through that route, it can still be a nuisance. For example, a non-functional virus signature from 1987 was detected on the blockchain by Microsoft’s anti-virus software, which then denied access to the blockchain files on disk. This issue had to be fixed manually.

#### Privacy violations

Sensitive personal data of individuals may be posted on the blockchain, without their consent (i.e., doxing).

This threat peaks when individuals deliberately violate the privacy of others, e.g., by blackmailing victims under the threat of disclosing sensitive data about them on the blockchain. Real-world manifestations of these threats are well-known… Jurisdictions such as the whole European Union begin to actively prosecute the unauthorized disclosure and forwarding of private information in social networks to counter this novel threat.

#### Politically sensitive content

Politically sensitive content on the blockchain can cause problems for individuals in certain jurisdictions. For example, in China the mere possession of state secrets can result in long prison sentences. “Furthermore, China’s definition of state secrets is vague and covers e.g., ‘activities for safeguarding state security.’” Such vague allegations with respect to state secrets have been applied to critical news in the past.

#### Illegal and condemned content

Some categories of content are virtually universally condemned and prosecuted. Most notably, possession of child pornography is illegal at least in the 112 countries that ratified an optional protocol to the Convention on the Rights of the Child. Religious content such as certain symbols, prayers, or sacred texts can be objectionable in extremely religious countries that forbid other religions and under oppressive regimes that forbid religion in general.

#### Implications

If content in the categories above were to make it onto the blockchain, it would be downloaded by network participants and they could become liable for it.

Consequently, it would be illegal to participate in a blockchain-based system as soon as it contains illegal content.

That sounds a little bit dramatic when you first read it, and there are no court rulings yet, but the authors do point to related legal precedents that should give pause for thought:

Our belief stems from the fact that w.r.t. child pornography as an extreme case of illegal content, legal texts from countries such as the USA, England, and Ireland deem all data illegal that can be converted into a visual representation of illegal content… we expect that the term can be interpreted to include blockchain data in the future. For instance, this is already covered implicitly by German law, as a person is culpable for possession of illegal content if she knowingly possesses an accessible document holding said content. It is critical here that German law perceives the hard disk holding the blockchain as a document… furthermore, users can be assumed to knowingly maintain control over such illegal content with respect to German law if sufficient media coverage causes the content’s existence to become public knowledge among Bitcoin users…

### Data insertion methods

How might such objectionable content find its way onto the Bitcoin blockchain in the first place? There are actually a variety of different methods for inserting arbitrary data in the chain, ranging from a few bytes to a few kilobytes. The following table summarises the options and their relative effectiveness:

Most effective of all are standard financial transactions used to insert data using mutable values of scripts. For example, the public keys in pay-to-script-hash (P2SH) transactions can be replaced with arbitrary data as Bitcoin peers can’t verify their correctness before they are referenced by a subsequent input script. There are even blockchain content insertion services that will handle the process of injecting data into the blockchain for you. For example:

• CryptoGraffiti reads and writes messages and files to and from the Bitcoin blockchain, storing content using P2PKH (pay to public key hash) output scripts within a single transaction, storing up to 60KiB of content.
• Satoshi Uploader inserts content using a single transaction with multiple P2X outputs.
• P2SH Injectors (multiple services available) insert chunks of file content via slightly varying P2SH input scripts.
• Apertus allows fragmenting content over multiple transactions using an arbitrary number of P2PKH output scripts.
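As a rough illustration of how these services squeeze files in: a P2PKH output script normally carries a 20-byte public-key hash, and since peers can’t check that the hash corresponds to a real key, each output can smuggle 20 arbitrary bytes. A Python sketch of the chunking step (the encoding details of the real services will differ):

```python
CHUNK = 20  # bytes available in a P2PKH pubkey-hash slot

def to_p2pkh_chunks(payload: bytes):
    """Split a payload into 20-byte chunks, zero-padding the last one."""
    chunks = [payload[i:i + CHUNK] for i in range(0, len(payload), CHUNK)]
    if chunks and len(chunks[-1]) < CHUNK:
        chunks[-1] = chunks[-1].ljust(CHUNK, b"\x00")
    return chunks
```

A 50-byte file would become three outputs, which a service like Apertus could spread over one or more transactions.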

### Data on the Bitcoin blockchain

With an understanding of the various methods that can be used to inject data, you can imagine it’s possible to scan the blockchain looking for it. So I’m going to skip the description of how the authors built a tool to do that, and focus on what they found instead.

Measurements are based on Bitcoin’s complete blockchain as of August 31st, 2017, containing 482,870 blocks and 250,845,217 transactions with a total disk size of 122.64GiB. Out of this, the detectors found 3,535,855 transactions with data, comprising a total payload of 118.35MiB. The most popular mechanism is OP_RETURN, and the data containing transactions using this are predominantly used to manage off-blockchain assets or originate from notary services. P2X transactions constitute only 1.6% of all detector hits, but make up 9.08% of non-financial data (which is only a very modest 10.76MiB).

Out of the 22.63MiB of blockchain data not originating from coinbase or OP_RETURN transactions, we can extract and analyze 1557 files with meaningful content. In addition to these, we could extract 59 files using our suspicious transaction detector (92.25% text). Table 2 below summarizes the different file types of the analyzed files.

The key result is that content from all of the five objectionable content categories already exists on the Bitcoin blockchain.

#### Copyright violations

The authors found seven files publishing intellectual property including the text of a book, one RSA private key, and a firmware secret key. The blockchain also contains an ‘illegal prime’ – encoding software to break the copy protection of DVDs.

#### Malware

The authors found no actual malware in Bitcoin’s blockchain. But they did find an individual non-standard transaction that contains a non-malicious cross-site scripting detector. When this is interpreted by an online blockchain parser, it notifies the author (a security researcher) about the vulnerability.

#### Privacy violations

609 transactions contain online public chat logs, emails, and forum posts, including topics such as money laundering. There are also at least two instances of doxing including phone numbers, addresses, bank accounts, passwords, and multiple online identities.

#### Politically sensitive content

The blockchain contains backups of the WikiLeaks Cablegate data as well as an online news article concerning pro-democracy demonstrations in Hong Kong in 2014.

(Here’s a dark thought: if a government wanted to clamp down on a given blockchain, all it has to do is anonymously post a transaction containing illegal or objectionable data, wait for it to propagate to all the miners in the country, and then go after them for possession).

#### Illegal and condemned content

There are at least eight files with sexual content, three of which would be considered objectionable in almost all jurisdictions.

Two of them are backups of link lists to child pornography, containing 247 links to websites, 142 of which refer to Tor hidden services. The remaining instance is an image depicting mild nudity of a young woman. In an online forum this image is claimed to show child pornography, albeit this claim cannot be verified (due to ethical concerns, we refrain from providing a citation).

### The last word

As we have shown in this paper, a plethora of fundamentally different methods to store non-financial — potentially objectionable — content on the blockchain exists in Bitcoin. As of now, this can affect at least 112 countries in which possessing content such as child pornography is illegal. This especially endangers the multi-billion dollar markets powering cryptocurrencies such as Bitcoin.


When coding style survives compilation: de-anonymizing programmers from executable binaries Caliskan et al., NDSS’18

As a programmer you have a unique style, and stylometry techniques can be used to fingerprint your style and determine with high probability whether or not a piece of code was written by you. That makes a degree of intuitive sense when considering source code. But suppose we don’t have source code? Suppose all we have is an executable binary? Caliskan et al. show us that it’s possible to de-anonymise programmers even under these conditions. Amazingly, their technique still works even when debugging symbols are removed, aggressive compiler optimisations are enabled, and traditional binary obfuscation techniques are applied! Anonymous authorship of binaries is consequently hard to achieve.

One of the findings along the way that I found particularly interesting is that more skilled/experienced programmers are more fingerprintable. It makes sense that over time programmers acquire their own unique way of doing things, yet at the same time these results seem to suggest that experienced programmers do not converge on a strong set of stylistic conventions. That suggests to me a strong creative element in program authorship, just as experienced authors of written works develop their own unique writing styles.

If we encounter an executable binary sample in the wild, what can we learn from it? In this work, we show that the programmer’s stylistic fingerprint, or coding style, is preserved in the compilation process and can be extracted from the executable binary. This means that it may be possible to infer the programmer’s identity if we have a set of known potential candidate programmers, along with executable binary samples (or source code) known to be authored by these candidates.

Out of a pool of 100 candidate programmers, Caliskan et al. are able to attribute authorship with an accuracy of up to 96%, and with a pool of 600 candidate programmers they reach an accuracy of 83%. These results assume that the compiler and optimisation level used for compilation of the binary are known. Fortunately, previous work has shown that toolchain provenance, including the compiler family, version, optimisation level, and source language, can be identified using a linear Conditional Random Field (CRF) with accuracy of up to 99% for language, compiler family, and optimisation level, and 92% for compiler version.

One of the potential uses for the technology is identifying authors of malware.

### Finding fingerprint features in executable binaries

So how is this seemingly impossible feat pulled off? The process for training the classifier given a corpus of works by authors in a candidate pool has four main steps, as illustrated below:

1. Disassembly: first the program is disassembled to obtain features based on machine code instructions, referenced strings, symbol information, and control flow graphs.
2. Decompilation: the program is translated into C-like pseudo-code via decompilation, and this pseudo-code is passed to a fuzzy C parser to generate an AST. Syntactical features and n-grams are extracted from the AST.
3. Dimensionality reduction: standard feature selection techniques are used to select the candidate features from amongst those produced in steps 1 and 2.
4. Classification: a random forest classifier is trained on the corresponding feature vectors to yield a program that can be used for automatic executable binary authorship attribution.

#### Disassembly

The disassembly step runs the binary through two different disassemblers: the netwide disassembler (ndisasm), which does simple instruction decoding, and radare2, a state-of-the-art open source disassembler which also understands the executable binary format. Using radare2 it is possible to extract symbols, strings, functions, and control flow graphs.

Information provided by the two disassemblers is combined to obtain our disassembly feature set as follows: we tokenize the instruction traces of both disassemblers and extract token uni-grams, bi-grams, and tri-grams within a single line of assembly, and 6-grams, which span two consecutive lines of assembly… In addition, we extract single basic blocks of radare2’s control flow graphs, as well as pairs of basic blocks connected by control flow.
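The n-gram extraction part is easy to sketch. Here’s a simplified stdlib Python version: tokenisation is plain whitespace splitting, and for brevity the 6-gram pass takes all 6-grams over each consecutive line pair rather than only the boundary-spanning ones, so treat it as the shape of the idea rather than the paper’s exact feature set:

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def disasm_features(asm_lines):
    """Token uni/bi/tri-grams within a line, plus 6-grams over line pairs."""
    feats = Counter()
    tokenized = [line.split() for line in asm_lines]
    for toks in tokenized:
        for n in (1, 2, 3):
            feats.update(ngrams(toks, n))
    for a, b in zip(tokenized, tokenized[1:]):
        feats.update(ngrams(a + b, 6))  # simplified two-line 6-grams
    return feats
```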

#### Decompilation

Decompilation is done using the Hex-Rays commercial state-of-the-art decompiler, which produces human readable C-like pseudo-code. This code may be much longer than the original source code (e.g. decompiling a program that was originally 70 lines long may produce on average 900 lines of decompiled code).

From the decompiled result, both lexical and syntactical features are extracted. Lexical features are word unigrams capturing integer types, library function names, and internal function names (when symbol table information is available). Syntactical features are obtained by passing the code to the joern fuzzy parser and deriving features from the resulting AST.

#### Dimensionality reduction

Following steps one and two, a large number of features can be generated (e.g., 705,000 features from 900 executable binary samples taken across 100 different programmers). A first level of dimensionality reduction is applied using WEKA’s information gain attribute selection criteria, and then a second level of reduction is applied using correlation based feature selection. The end result for the 900 binary samples is a set of 53 predictive features.
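WEKA’s information gain criterion is just entropy bookkeeping, and it’s worth seeing how little code it takes. A stdlib Python sketch (the correlation-based second selection stage is omitted):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """IG(feature) = H(labels) - sum over values v of p(v) * H(labels | v)."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == v]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional
```

Scoring every candidate feature this way and keeping the top scorers is the first of the two reduction stages described above.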

#### Classification

Classification is done using random forests with 500 trees. Data is stratified by author and analysed using k-fold cross-validation, where k is equal to the number of available code samples per author.

### Evaluation results

The main evaluation is performed using submissions to the annual Google Code Jam competition, in which thousands of programmers take part each year. “We focus our analysis on compiled C++ code, the most popular programming language used in the competition. We collect the solutions from the years 2008 to 2014 along with author names and problem identifiers.”

Datasets are created using gcc and g++, using each of O1, O2, and O3 optimisation flags (so six datasets in all). The resulting datasets contain 900 executable binary samples from 100 different authors. As we saw before, the authors are able to reduce the feature set down to 53 predictive features.

To examine the potential for overfitting, we consider the ability of this feature set to generalize to a different set of programmers, and show that it does so, further supporting our belief that these features effectively capture programming style. Features that are highly predictive of authorial fingerprints include file and stream operations along with the formats and initializations of variables from the domain of ASTs, whereas arithmetic, logic, and stack operations are the most distinguishing ones among the assembly instructions.

Without optimisation enabled, the random forest is able to correctly classify 900 test instances with 95% accuracy. Furthermore, given just a single sample of code (for training) from a given author, the author can be identified out of a pool of 100 candidates with 65% accuracy.

The classifier also reaches a point of dramatically diminishing returns with as few as three training samples, and obtains a stable accuracy by training on 6 samples. Given the complexity of the task, this combination of high accuracy with extremely low requirement on training data is remarkable, and suggests the robustness of our features and method.

The technique continues to work well as the candidate pool size grows:

### Turning up the difficulty level

Programming style is preserved to a great extent even under the most aggressive level 3 optimisations:

…programmers of optimized executable binaries can be de-anonymized, and optimization is not a highly effective code anonymization method.

Fully stripping symbol information reduces classification accuracy by 24%, so even removing symbols is not an effective form of anonymisation.

For the pièce de résistance the authors use Obfuscator-LLVM and apply all three of its obfuscation techniques (instruction substitution, introducing bogus control flow, flattening control flow graphs). And the result? “Using the same features as before, we obtain an accuracy of 88% in correctly classifying authors.”

… while we show that our method is capable of dealing with simple binary obfuscation techniques, we do not consider binaries that are heavily obfuscated to hinder reverse engineering.

### So you want to stay anonymous?

If you really do want to remain anonymous, you’d better plan for that from the very beginning of your programming career, and even then it doesn’t look easy! Here are the conditions recommended by the authors:

• Do not have any public repositories
• Don’t release multiple programs using the same online identity
• Try to have a different coding style (!) in each piece of software you write, and try to code in different programming languages.
• Use different optimisations and obfuscations to avoid deterministic patterns

Another suggestion that comes to mind is to use an obfuscator deliberately designed to prevent reverse engineering, although since these weren’t tested we don’t actually know how effective that would be.

A programmer who accomplishes randomness across all potential identifying factors would be very difficult to deanonymize. Nevertheless, even the most privacy savvy developer might be willing to contribute to open source software or build a reputation for her identity based on her set of products, which would be a challenge for maintaining anonymity.



JavaScript Zero: Real JavaScript and zero side-channel attacks Schwarz et al., NDSS’18

We’re moving from the server-side back to the client-side today, with a very topical paper looking at defences against micro-architectural and side-channel attacks in browsers. Since submission of the paper to NDSS’18, this subject grew in prominence of course with the announcement of the Meltdown and Spectre attacks.

Microarchitectural attacks can also be implemented in JavaScript, exploiting properties inherent to the design of the microarchitecture, such as timing differences in memory accesses. Although JavaScript code runs in a sandbox, Oren et al. demonstrated that it is possible to mount cache attacks in JavaScript. Since their work, a series of microarchitectural attacks have been mounted from websites, such as page deduplication attacks, Rowhammer attacks, ASLR bypasses, and DRAM addressing attacks.

Chrome Zero is a proof of concept implementation that defends against these attacks. It installs as a Chrome extension and protects functions, properties, and objects that can be exploited to construct attacks. The basic idea is very simple: functions are wrapped with replacement versions that allow injection of a policy. This idea of wrapping functions (and properties with accessor properties, and certain objects with proxy objects) goes by the fancy name of virtual machine layering.

Closures are used when wrapping functions to ensure that references to the original function are inaccessible to any code outside of the closure.

Policies determine what the wrappers actually do. There are four possible alternatives:

1. Allow (passthrough)
2. Block – the function is replaced by a stub that returns a given default value
3. Modify – the function is replaced with a policy-defined function, which may still call the original function if required. An example would be reducing the resolution of timers.
4. User permission – JavaScript execution is paused and the user is asked for permission to continue executing the script.
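Chrome Zero’s wrappers are JavaScript closures, but the layering idea translates directly. Here’s a Python sketch of a pluggable-policy wrapper (all names are mine; the user-permission policy needs UI and is omitted):

```python
def apply_policy(fn, policy, modified=None, default=None):
    """Wrap fn in a closure so the original stays unreachable from outside."""
    original = fn  # captured in the closure; callers only ever see `wrapper`
    def wrapper(*args, **kwargs):
        if policy == "allow":
            return original(*args, **kwargs)
        if policy == "block":
            return default                       # stub returning a default value
        if policy == "modify":
            return modified(original, *args, **kwargs)
        raise ValueError(f"unknown policy {policy!r}")
    return wrapper

import time
# Example modify-policy: reduce timer resolution (here: round to 100 ms)
low_res_now = apply_policy(time.time, "modify",
                           modified=lambda orig: round(orig(), 1))
```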


As the code is continuously optimized by the JIT, our injected functions are compiled to highly efficient native code, with a negligible performance difference compared to the original native functions. The results of our benchmarks show that Chrome Zero does not have a visible impact on the user experience for everyday usage.

With that basic description of the mechanism out of the way, let’s get down to the interesting part: the features in JavaScript that are used in microarchitectural and side-channel attacks, and how Chrome Zero protects them.

### The features used to build attacks

We identified several requirements that are the basis for microarchitectural attacks, i.e., every attack relies on at least one of the primitives. Moreover, sensors found on many mobile devices, as well as modern browsers introduce side-channels which can also be exploited from JavaScript.

The following table provides a nice summary of attacks and the features that they require:


JavaScript never discloses virtual addresses, but ArrayBuffers can be exploited to reconstruct them. Once an attacker has knowledge of virtual addresses, they have effectively defeated address space layout randomization (ASLR). Microarchitectural attacks typically need physical addresses. Since browser engines allocate ArrayBuffers page-aligned, the first byte of a buffer sits at the start of a new physical page. Iterating over a large array also triggers a page fault at the beginning of each new page, and because resolving a page fault takes longer than a regular memory access, this timing difference can be detected.

In Chrome Zero, the challenge is to ensure that array buffers are not page-aligned, and that attackers cannot discover the offset of array buffers within the page. Chrome Zero uses four defences here:

1. Buffer ASLR – to prevent arrays from being page aligned, the buffer constructors are modified to allocate an additional 4KB. The start of the array is then moved to a random offset within this page, and the offset is added to every array access.
2. Preloading – iterating through the array after constructing it triggers a page fault for every page, after which an attacker cannot learn anything from iterating since the memory is already mapped.
3. Non-determinism – as an alternative to pre-loading, array setters can be modified to add a memory access to a random array index for every access of the array. This offers stronger protection than preloading as it prevents an attacker just waiting for pages to be swapped out. With this random access mechanism, an attacker can learn the number of pages, but not where the page borders are (see Fig 5 below).
4. Array index randomization – the above three policies cannot thwart page-deduplication attacks. To prevent these attacks, we have to ensure that an attacker cannot deterministically choose the content of an entire page. This can be done by introducing a random linear function mapping array indices to the underlying memory. (Mechanical sympathisers weep at this point 😉 ). An access to array index x is replaced with $ax + b\ \mathrm{mod}\ n$, where a and b are randomly chosen and co-prime, and n is the size of the buffer.
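
Defence 4 can be sketched in plain JavaScript with a Proxy over a typed array. The function names and parameter choices below are illustrative assumptions, not Chrome Zero's actual implementation, which lives inside its policy layer:

```javascript
// Pick a co-prime to n so that x -> (a*x + b) mod n permutes the indices.
function gcd(a, b) { return b === 0 ? a : gcd(b, a % b); }

function randomizedArray(buffer) {
  const n = buffer.length;
  let a;
  do { a = 1 + Math.floor(Math.random() * (n - 1)); } while (gcd(a, n) !== 1);
  const b = Math.floor(Math.random() * n);
  const map = (x) => (a * x + b) % n;

  return new Proxy(buffer, {
    get(target, prop) {
      const x = typeof prop === 'string' ? Number(prop) : NaN;
      // Redirect in-range integer indices through the random linear map.
      if (Number.isInteger(x) && x >= 0 && x < n) return target[map(x)];
      return Reflect.get(target, prop);
    },
    set(target, prop, value) {
      const x = typeof prop === 'string' ? Number(prop) : NaN;
      if (Number.isInteger(x) && x >= 0 && x < n) {
        target[map(x)] = value;
        return true;
      }
      return Reflect.set(target, prop, value);
    }
  });
}
```

Reads and writes stay consistent for the script, but the underlying page contents are a permutation the attacker cannot choose.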

### Timing information

Accurate timing is one of the most important primitives, inherent to nearly all microarchitectural and side-channel attacks.

JavaScript provides the Date object with resolution of 1 ms, and the Performance object which provides timestamps accurate to a few microseconds. Microarchitectural attacks often require resolution on the order of nanoseconds though. Custom timing primitives, often based on some form of monotonically incrementing counter, are used as clock replacements.

Chrome Zero implements two timing defences: low-resolution timestamps and fuzzy time.

• For low-resolution timestamps, the result of a high-resolution timestamp is simply rounded to a multiple of 100ms.
• In addition to rounding the timestamp, fuzzy time adds random noise while still guaranteeing that timestamps are monotonically increasing.
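
As a sketch of the two policies (taking the raw timestamp as an argument so they are easy to test in isolation; the 100ms granularity follows the text, everything else is illustrative):

```javascript
const GRANULARITY = 100; // ms, per the policy described above

// Policy 1: low-resolution timestamps -- round down to the granularity.
function lowResNow(now) {
  return Math.floor(now / GRANULARITY) * GRANULARITY;
}

// Policy 2: fuzzy time -- rounding plus random noise, clamped so that
// successive timestamps are still monotonically increasing.
let lastFuzzy = 0;
function fuzzyNow(now) {
  const noisy = lowResNow(now) + Math.random() * GRANULARITY;
  lastFuzzy = Math.max(lastFuzzy, noisy);
  return lastFuzzy;
}
```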

The following figure shows these policies at work against an attacker trying to distinguish between fast and slow versions of a function.

With the low-resolution timestamp and edge-thresholding, the functions are correctly distinguished in 97% of cases… when fuzzy time is enabled, the functions are correctly distinguished in only 65% of the cases, and worse, in 27% of the cases the functions are wrongly classified.

This is more than enough to defeat the JavaScript keystroke detection attack of Lipp et al.:

The support for parallelism afforded by web workers provides new side-channel attack possibilities, by measuring the dispatch time of the event queue. For example, an endless loop running within a web worker can detect CPU interrupts, which can then be used to deduce keystroke information.

A drastic but effective Chrome Zero policy is to prevent real parallelism by replacing web workers with a polyfill intended for unsupported browsers, which simulates web workers on the main thread. A less drastic policy is to delay the postMessage function with random delays, similar to fuzzy timing. This is sufficient to defeat keystroke detection attacks:
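
The delayed-postMessage policy might look roughly like this (the wrapper shape and delay bound are illustrative assumptions; Chrome Zero installs such interceptions through its policy framework):

```javascript
// Wrap a worker's postMessage so every message is delivered after a small
// random delay, destroying the precision of event-queue timing measurements.
function delayPostMessage(worker, maxDelayMs = 5) {
  const original = worker.postMessage.bind(worker);
  worker.postMessage = function (...args) {
    setTimeout(() => original(...args), Math.random() * maxDelayMs);
  };
  return worker;
}
```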

### Shared data

JavaScript’s SharedArrayBuffer behaves like a normal ArrayBuffer, but can be simultaneously accessed by multiple workers. This shared data can be exploited to build timing primitives with a nanosecond resolution.

One simple Chrome Zero policy is to disallow use of SharedArrayBuffers (which is deactivated by default in modern browsers anyway at the moment). An alternative policy is to add random delays to accesses of shared buffers. This is enough to prevent the high-resolution timing needed by attacks:
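
The counting-thread construction behind those timing primitives is simple enough to sketch (the worker loop is shown as a comment; in a real attack it runs in a separate web worker):

```javascript
// Why SharedArrayBuffer is dangerous: a worker incrementing a shared counter
// in a tight loop gives other threads a clock far finer than any official
// timer the browser exposes.
const sab = new SharedArrayBuffer(4);
const counter = new Uint32Array(sab);

// In the attacker's web worker (not shown here):
//   while (true) Atomics.add(counter, 0, 1);

// On the main thread, "reading the clock" is just an atomic load:
function timestamp() {
  return Atomics.load(counter, 0);
}
```

Adding random delays to shared-buffer accesses destroys exactly this resolution.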

### Sensor API

Some sensors are already covered by browser’s existing permission systems, but several sensors are not:

Mehrnezhad et al., showed that access to the motion and orientation sensor can compromise security. By recording the data from these sensors, they were able to infer PINs and touch gestures of the user. Although not implemented in JavaScript, Spreitzer showed that access to the ambient light sensor can also be exploited to infer user PINs. Similarly, Olejnik utilized the Ambient Light Sensor API to recover information on the user’s browsing history, to violate the same-origin policy, and to steal cross-origin data.

The battery status API can also be used to enable a tracking identifier. In Chrome Zero the battery interface can be set to return randomized or fixed values, or be disabled entirely. Likewise, Chrome Zero can return either a fixed value for the ambient light sensor API or disable it. For motion and orientation sensors, data can be spoofed, or access prohibited entirely.

### Evaluation

We’ve already seen a number of examples of Chrome Zero preventing attacks. The following table summarises the policies and their effect on attacks:

Back-testing on all 12 CVEs discovered since 2016 for Chrome 49 or later reveals that half (6) of them are prevented. Creating policies to specifically target CVEs was not a goal of the current research.

The performance overhead of Chrome Zero was evaluated on the Alexa Top 10 websites, and correct functioning of sites was evaluated for the Alexa Top 25. Page load times (we’re not told what metric is actually measured) increase from 10.64ms on average to 89.08ms when all policies are enabled; the overhead is proportional to the number of policies in force. On the JetStream browser benchmark, Chrome Zero shows a performance overhead of 1.54%.

Depending on what an application actually does with arrays, web workers etc., I would expect the impact to be significantly greater in some circumstances. However, in a double-blind user study with 24 participants, the participants were unable to tell whether they were using a browser with Chrome Zero enabled or not – apart from on the yahoo.com site.

Our work shows that transparent low-overhead defenses against JavaScript-based state-of-the-art microarchitectural attacks and side-channel attacks are practical.

Synode: understanding and automatically preventing injection attacks on Node.js Staicu et al., NDSS’18

If you’re using JavaScript on the server side (node.js), then you’ll want to understand the class of vulnerabilities described in this paper. JavaScript on the server side doesn’t enjoy some of the same protections as JavaScript running in a browser. In particular, Node.js modules can interact freely with the operating system without the benefit of a security sandbox. The bottom line is this:

We show that injection vulnerabilities are prevalent in practice, both due to eval, which was previously studied for browser code, and due to the powerful exec API introduced in Node.js. Our study suggests that thousands of modules may be vulnerable to command injection attacks and that fixing them takes a long time, even for popular projects.

The Synode tool developed by the authors combines static analysis with runtime protection to defend against such attacks. You can get it at https://github.com/sola-da/Synode.

### Eval and exec injection vulnerabilities

There are two families of APIs that may allow an attacker to inject unexpected code:

• exec and its variants take a string argument and interpret it as a shell command (what could possibly go wrong??!)
• eval and its variants take a string argument and interpret it as JavaScript code, allowing the execution of arbitrary code in the context of the current application.

Of course, you can combine the two to eval a string containing an exec command.

Node.js code has direct access to the file system, network resources, and any other operating system-level resources provided to processes. As a result, injections are among the most serious security threats on Node.js…

Here’s an example program illustrating a vulnerability:

Consider calling this function as follows: backupFile('-help && rm -rf * && echo ", "'). As the authors delightfully put it: “Unfortunately this command does not backup any files but instead it creates space for future backups by deleting all files in the current directory.”

### How widespread is the problem?

The authors studied 235,850 npm modules, and found that 3% (7,686 modules) use exec and 4% (9,111 modules) use eval. Once you start looking at dependencies though (i.e., modules that depend on an exec- or eval-using module), then about 20% of all modules turn out to directly or indirectly depend on at least one injection API.

Fixing the most popular 5% of injection modules would protect almost 90% of the directly dependent modules. Unfortunately, that still requires changing over 780 modules.

Perhaps these vulnerabilities are in seldom-used modules though? That turns out not to be the case:

The results invalidate the hypothesis that vulnerable modules are unpopular. On the contrary, we observe that various vulnerable modules and injection modules are highly popular, exposing millions of users to the risk of injections.

The authors then looked at call-sites to determine the extent to which data is checked before being passed into injection APIs. Can the site be reached by potentially attacker-controlled data, and are there mitigation checks in place?

A staggering 90% of the call sites do not use any mitigation technique at all.

Another 9% attempt to sanitise input using regular expressions. Unfortunately, most of those were not correctly implemented. No module used a third-party sanitization module to prevent injections, even though several such modules exist.

Reporting a representative set of 20 vulnerabilities to module developers did not result in quick fixes. “Most of the developers acknowledge the problem. However, in the course of several months only 3 of the 20 vulnerabilities have been completely fixed, confirming earlier observations about the difficulty of effectively notifying developers.”

### Introducing Synode

…the risk of injection vulnerabilities is widespread, and a practical technique to mitigate them must support module maintainers who are not particularly responsive. Motivated by these findings, this section presents Synode…

Synode combines static analysis to detect places where injection attacks can potentially take place, with runtime enforcement (guided by the results of that analysis) to ensure that injection attacks are detected and thwarted. The recommended deployment of Synode is via an npm post-installation script. This script runs for each explicitly declared third-party dependency and performs the code rewriting needed to add dynamic enforcement.

The static analysis phase identifies call sites for injection APIs, and summarises what is known statically about all of the values that may be passed to the function in a template tree. For example:

The template trees are then reduced to a set of templates, where a template is a sequence of strings and inserts:

If all the templates for a particular call site are constant strings, i.e., there are no unknown parts in the template, then the analysis concludes that the call site is statically safe. For such statically safe call sites, no runtime checking is required. In contrast, the analysis cannot statically ensure the absence of injections if the templates for the call site contain unknown values. In this case, checking is deferred to runtime…

The goal of runtime checking is to prevent values that expand the template computed for the call site in a way that is likely to be unforeseen by the developer, and of course to do so as efficiently as possible. To achieve these combined aims the statically extracted set of templates are first expanded into a set of partial abstract syntax trees (PAST) that represent the expected structure of benign values. Then at runtime the value passed to the injection API is parsed into an AST, and this is compared against the pre-computed PASTs. This process ensures that (i) the runtime AST is derivable from at least one of the PASTs by expanding the unknown subtrees, and (ii) the expansions remain within an allowed subset of all possible AST nodes.
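
To get a feel for the mechanism, here is a drastically simplified, token-level stand-in for that check. The real system compares parsed ASTs against partial ASTs; everything below, including the template encoding and the "literal" rule, is an illustrative assumption:

```javascript
const HOLE = Symbol('hole');

// For exec, only plain literal words are safe hole fillers: no shell
// metacharacters such as &&, ;, | or *.
const SAFE_LITERAL = /^[A-Za-z0-9._\/-]*$/;

// A template is a sequence of known strings and HOLE markers,
// e.g. ['cp ', HOLE, ' backup/'].
function checkAgainstTemplate(template, value) {
  let rest = value;
  for (const part of template) {
    if (part === HOLE) continue; // handled as the filler before the next string
    const idx = rest.indexOf(part);
    if (idx === -1) return false;                             // known part missing
    if (!SAFE_LITERAL.test(rest.slice(0, idx))) return false; // unsafe hole filler
    rest = rest.slice(idx + part.length);
  }
  return SAFE_LITERAL.test(rest); // a trailing hole must also be a literal
}
```

A benign file name passes; the injection string from the earlier example is rejected because its hole filler contains shell metacharacters.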

For shell commands passed to exec, only AST nodes that represent literals are considered safe. For eval, all AST node types that occur in JSON code are considered safe.

### Evaluation

The mitigation technique is applied to all (at the time of the study) 15,604 node.js modules with at least one injection API call site.

• 18,924 of all 51,627 call sites are found to be statically safe (36.66%)
• The templates for the vast majority of call sites have at most one hole, and very few templates contain more than five.
• Static analysis completes for 96.27% of the 15,604 modules in less than one minute, with an average analysis time for these modules of 4.38 seconds.

To evaluate the runtime mechanism 24 vulnerable modules are exercised with benign and malicious inputs. The modules and injection vectors used are shown in the following table:

This results in 5 false positives (out of 56 benign inputs), which are caused by limitations of the static analysis (3/5) or node types outside of the safe set (2/5). There are no false negatives (undetected malicious inputs). The average runtime overhead for a call is 0.74ms.

### The last word

In a broader scope, this work shows the urgent need for security tools targeted at Node.js. The technique presented in this paper is an important first step toward securing the increasingly important class of Node.js applications, and we hope it will inspire future work in this space.