A quantitative analysis of the impact of arbitrary blockchain content on Bitcoin

A quantitative analysis of the impact of arbitrary blockchain content on Bitcoin Matzutt et al., FC’18

We’re leaving NDSS behind us now, and starting this week with a selection of papers from FC’18. First up is a really interesting analysis of what’s in the Bitcoin blockchain. But this isn’t your typical analysis of transactions, addresses, and identities. Instead, Matzutt et al. take a look at file content (text, images, PDFs and so on) being stored on the blockchain. It’s not something I’d especially thought about before, but once the question has been asked, sadly the answer is all too predictable. You’ve met the human race, I presume? So what do you think happens when you create a widely distributed, public, immutable data structure, and allow anyone to insert data into it? If there’s a saving grace here, it’s that the mechanism hasn’t been abused as much as you might expect. But sadly, it has been abused.

Our analysis shows that certain content, e.g., illegal pornography, can render the mere possession of a blockchain illegal… our analysis reveals more than 1,600 files on the blockchain, over 99% of which are texts or images. Among these files there is clearly objectionable content such as links to child pornography, which is distributed to all Bitcoin participants.

Let’s take a look at the kinds of content that might cause problems for blockchain participants, methods for inserting data on the Bitcoin blockchain, and what the authors find when they analyse the Bitcoin blockchain as of August 2017.

Problematic content

Despite the potential benefits of data in the blockchain, insertion of objectionable content can put all participants in the Bitcoin network at risk…

The authors identify five categories of content that may cause problems for anyone storing the blockchain: copyright violations; malware; privacy violations; politically sensitive content; and illegal and condemned content.

Copyright violations

Copyright holders predominantly target users who actively distribute pirated data, although prosecutors have also convicted downloaders. If copyrighted material appears in transactions or on the blockchain, then network participants may be unwittingly distributing and/or downloading copyrighted content.

Malware

Malware could in theory be spread via blockchains. Even if it doesn’t become activated through that route, it can still be a nuisance. For example, a non-functional virus signature from 1987 was detected on the blockchain by Microsoft’s anti-virus software, denying access to the blockchain files on disk. This issue had to be fixed manually.

Privacy violations

Sensitive personal data of individuals may be posted on the blockchain, without their consent (i.e., doxing).

This threat peaks when individuals deliberately violate the privacy of others, e.g., by blackmailing victims under the threat of disclosing sensitive data about them on the blockchain. Real-world manifestations of these threats are well-known… Jurisdictions such as the whole European Union begin to actively prosecute the unauthorized disclosure and forwarding of private information in social networks to counter this novel threat.

Politically sensitive content

Politically sensitive content on the blockchain can cause problems for individuals in certain jurisdictions. For example, in China the mere possession of state secrets can result in long prison sentences. “Furthermore, China’s definition of state secrets is vague and covers e.g., ‘activities for safeguarding state security.’ Such vague allegations with respect to state secrets have been applied to critical news in the past.”

Illegal and condemned content

Some categories of content are virtually universally condemned and prosecuted. Most notably, possession of child pornography is illegal at least in the 112 countries that ratified an optional protocol to the Convention on the Rights of the Child. Religious content such as certain symbols, prayers, or sacred texts can be objectionable in extremely religious countries that forbid other religions and under oppressive regimes that forbid religion in general.

Implications

If content in the categories above were to make it onto the blockchain, it would be downloaded by network participants and they could become liable for it.

Consequently, it would be illegal to participate in a blockchain-based system as soon as it contains illegal content.

That sounds a little bit dramatic when you first read it, and there are no court rulings yet, but the authors do point to related legal precedents that should give pause for thought:

Our belief stems from the fact that w.r.t. child pornography as an extreme case of illegal content, legal texts from countries such as the USA, England, and Ireland deem all data illegal that can be converted into a visual representation of illegal content… we expect that the term can be interpreted to include blockchain data in the future. For instance, this is already covered implicitly by German law, as a person is culpable for possession of illegal content if she knowingly possesses an accessible document holding said content. It is critical here that German law perceives the hard disk holding the blockchain as a document… furthermore, users can be assumed to knowingly maintain control over such illegal content with respect to German law if sufficient media coverage causes the content’s existence to become public knowledge among Bitcoin users…

Data insertion methods

How might such objectionable content find its way onto the Bitcoin blockchain in the first place? There are actually a variety of different methods for inserting arbitrary data in the chain, with capacities ranging from a few bytes to a few kilobytes. The following table summarises the options and their relative effectiveness:

Most effective of all are standard financial transactions used to insert data using mutable values of scripts. For example, the public keys in pay-to-script-hash (P2SH) transactions can be replaced with arbitrary data as Bitcoin peers can’t verify their correctness before they are referenced by a subsequent input script. There are even blockchain content insertion services that will handle the process of injecting data into the blockchain for you. For example:

  • CryptoGraffiti reads and writes messages and files to and from the Bitcoin blockchain. It stores up to 60KiB of content in the P2PKH (pay-to-public-key-hash) output scripts of a single transaction.
  • Satoshi Uploader inserts content using a single transaction with multiple P2X outputs.
  • P2SH Injectors (multiple services available) insert chunks of file content via slightly varying P2SH input scripts.
  • Apertus allows fragmenting content over multiple transactions using an arbitrary number of P2PKH output scripts.
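To make the P2PKH trick concrete, here is a minimal sketch of how arbitrary bytes can be packed into the 20-byte "public key hash" slot of standard P2PKH output scripts, and read back out again. The 25-byte script layout (OP_DUP OP_HASH160, a 20-byte push, OP_EQUALVERIFY OP_CHECKSIG) is the standard one. Everything else is an assumption for illustration: the function names are hypothetical, and the zero-padding of the last chunk is a simplification rather than the actual format used by CryptoGraffiti, Apertus, or any other service.

```python
from typing import List

# Standard P2PKH output script: OP_DUP OP_HASH160 <20 bytes> OP_EQUALVERIFY OP_CHECKSIG
P2PKH_PREFIX = bytes([0x76, 0xA9, 0x14])  # OP_DUP, OP_HASH160, push 20 bytes
P2PKH_SUFFIX = bytes([0x88, 0xAC])        # OP_EQUALVERIFY, OP_CHECKSIG
SLOT_SIZE = 20                            # bytes available per output script


def encode_as_p2pkh_scripts(payload: bytes) -> List[bytes]:
    """Split payload into 20-byte chunks and wrap each in a P2PKH script.

    Assumption: the last chunk is padded with zero bytes. Real insertion
    services use their own framing.
    """
    scripts = []
    for offset in range(0, len(payload), SLOT_SIZE):
        chunk = payload[offset:offset + SLOT_SIZE].ljust(SLOT_SIZE, b"\x00")
        scripts.append(P2PKH_PREFIX + chunk + P2PKH_SUFFIX)
    return scripts


def decode_p2pkh_scripts(scripts: List[bytes]) -> bytes:
    """Concatenate the 20-byte slots of all scripts that match the P2PKH shape."""
    data = bytearray()
    for script in scripts:
        if (len(script) == 25
                and script[:3] == P2PKH_PREFIX
                and script[-2:] == P2PKH_SUFFIX):
            data += script[3:23]
    return bytes(data)


message = b"Peers cannot tell this text apart from a real key hash."
outputs = encode_as_p2pkh_scripts(message)
recovered = decode_p2pkh_scripts(outputs).rstrip(b"\x00")
print(len(outputs), "output scripts;", recovered.decode("ascii"))
```

Because a peer only sees 20 opaque bytes in each script, it has no way to tell a genuine hash from smuggled content. The catch for the sender is that coins sent to such outputs are unspendable, since no key hashes to the chosen bytes.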

Data on the Bitcoin blockchain

With an understanding of the various methods that can be used to inject data, you can imagine it’s possible to scan the blockchain looking for it. So I’m going to skip the description of how the authors built a tool to do that, and focus on what they found instead.
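For a rough sense of what such scanning involves, the sketch below pulls the payload out of an OP_RETURN output script and applies a simple "does this look like text?" heuristic. This is not the authors' tool. The function names and the 90% printable-character threshold are assumptions chosen for illustration, and only the two most common push encodings (direct pushes of 1 to 75 bytes, and OP_PUSHDATA1) are handled.

```python
import string
from typing import Optional

OP_RETURN = 0x6A
OP_PUSHDATA1 = 0x4C
PRINTABLE = set(string.printable.encode("ascii"))


def op_return_payload(script: bytes) -> Optional[bytes]:
    """Return the data pushed after OP_RETURN, or None if the script doesn't fit."""
    if len(script) < 2 or script[0] != OP_RETURN:
        return None
    opcode = script[1]
    if 1 <= opcode <= 0x4B:
        # Opcodes 0x01..0x4b push that many bytes directly.
        start, length = 2, opcode
    elif opcode == OP_PUSHDATA1 and len(script) >= 3:
        # OP_PUSHDATA1 is followed by a one-byte length.
        start, length = 3, script[2]
    else:
        return None
    data = script[start:start + length]
    return data if len(data) == length else None


def printable_ratio(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII (0.0 for empty input)."""
    if not data:
        return 0.0
    return sum(byte in PRINTABLE for byte in data) / len(data)


def looks_like_text(data: bytes, threshold: float = 0.9) -> bool:
    """Hypothetical heuristic: flag data that is mostly printable characters."""
    return printable_ratio(data) >= threshold


script = bytes([OP_RETURN, 11]) + b"hello world"
payload = op_return_payload(script)
print(payload, looks_like_text(payload))
```

A real detector would also have to reassemble content spread across many outputs and transactions, and recognise file formats other than plain text.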

Measurements are based on Bitcoin’s complete blockchain as of August 31st, 2017, containing 482,870 blocks and 250,845,217 transactions with a total disk size of 122.64GiB. Out of this, the detectors found 3,535,855 transactions with data, comprising a total payload of 118.35MiB. The most popular mechanism is OP_RETURN, and the data-containing transactions that use it are predominantly used to manage off-blockchain assets or originate from notary services. P2X transactions constitute only 1.6% of all detector hits, but make up 9.08% of non-financial data (which is only a very modest 10.76MiB).
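To put those figures in proportion, a quick back-of-the-envelope calculation using only the numbers quoted above shows that roughly 1.4% of all transactions carry data, and that the payload accounts for under 0.1% of the blockchain's size on disk. The variable names are mine.

```python
# Figures quoted from the paper's measurements (blockchain as of August 31st, 2017).
TOTAL_TRANSACTIONS = 250_845_217
DATA_TRANSACTIONS = 3_535_855
BLOCKCHAIN_SIZE_GIB = 122.64
PAYLOAD_SIZE_MIB = 118.35

# Share of transactions that the detectors flagged as carrying data.
transaction_share = DATA_TRANSACTIONS / TOTAL_TRANSACTIONS

# Share of on-disk size taken by the payload (1 GiB = 1024 MiB).
payload_share = PAYLOAD_SIZE_MIB / (BLOCKCHAIN_SIZE_GIB * 1024)

print(f"Transactions with data: {transaction_share:.2%}")
print(f"Payload share of disk size: {payload_share:.3%}")
```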

Out of the 22.63MiB of blockchain data not originating from coinbase or OP_RETURN transactions, we can extract and analyze 1557 files with meaningful content. In addition to these, we could extract 59 files using our suspicious transaction detector (92.25% text). Table 2 below summarizes the different file types of the analyzed files.

The key result is that content from all of the five objectionable content categories already exists on the Bitcoin blockchain.

Copyright violations

The authors found seven files publishing intellectual property including the text of a book, one RSA private key, and a firmware secret key. The blockchain also contains an ‘illegal prime’ – encoding software to break the copy protection of DVDs.

Malware

The authors found no actual malware in Bitcoin’s blockchain. But they did find an individual non-standard transaction that contains a non-malicious cross-site scripting detector. When this is interpreted by an online blockchain parser, it notifies the author (a security researcher) about the vulnerability.

Privacy violations

609 transactions contain online public chat logs, emails, and forum posts, including topics such as money laundering. There are also at least two instances of doxing including phone numbers, addresses, bank accounts, passwords, and multiple online identities.

Politically sensitive content

The blockchain contains backups of the WikiLeaks Cablegate data as well as an online news article concerning pro-democracy demonstrations in Hong Kong in 2014.

(Here’s a dark thought: if a government wanted to clamp down on a given blockchain, all it would have to do is anonymously post a transaction containing illegal or objectionable data, wait for it to propagate to all the miners in the country, and then go after them for possession.)

Illegal and condemned content

There are at least eight files with sexual content, three of which would be considered objectionable in almost all jurisdictions.

Two of them are backups of link lists to child pornography, containing 247 links to websites, 142 of which refer to Tor hidden services. The remaining instance is an image depicting mild nudity of a young woman. In an online forum this image is claimed to show child pornography, albeit this claim cannot be verified (due to ethical concerns, we refrain from providing a citation).

The last word

As we have shown in this paper, a plethora of fundamentally different methods to store non-financial — potentially objectionable — content on the blockchain exists in Bitcoin. As of now, this can affect at least 112 countries in which possessing content such as child pornography is illegal. This especially endangers the multi-billion dollar markets powering cryptocurrencies such as Bitcoin.