A fistful of bitcoins: characterizing payments among men with no names Meiklejohn et al., USENIX ;login: 2013
This week we’re going to be looking at the five papers from the ACM Queue Research for Practice selections on ‘Cryptocurrencies, Blockchains, and Smart Contracts.’ These papers are chosen by Arvind Narayanan and Andrew Miller, co-authors of the ‘Bitcoin and cryptocurrency technologies‘ book which I also highly recommend.
Their first choice looks at the question of anonymity in the Bitcoin economy. By exploiting information leaked when spending bitcoins and receiving change, Meiklejohn et al. were able to cluster bitcoin addresses and from there map significant flows of bitcoins through the network. I worked from the USENIX ;login: article that Narayanan and Miller link in their ACM Queue piece. There’s also a full paper version with some nice additional details from IMC ’13.
2013 is a long time ago in the world of cryptocurrencies. The point forcefully made in ‘A Fistful of Bitcoins’ remains true today – Bitcoin offers little in the way of anonymity. In fact, we get better and better at techniques for de-anonymising block chain transactions over time. See for example ‘Data-driven de-anonymization in Bitcoin.’ The Mt Gox and Silk Road communities discussed in today’s paper both spectacularly disappeared though. Mt Gox ceased trading in February 2014, after losing about $350M ($473M?) worth of Bitcoin (‘Mt Gox: The History of a Failed Bitcoin Exchange‘). The aftermath of Mt Gox and one of the ‘greatest digital heists of all time’ is still ongoing (‘Hedge funds gamble on Mt Gox bitcoin payout‘, FT Feb 13 2017). The Silk Road marketplace was shut down by the FBI in October 2013.
Let’s take a look at the de-anonymisation process…
Obtaining a seed set of bitcoin addresses
The first step on the journey was to obtain a small set of de-anonymised addresses (832). To do this, the authors simply engaged in transactions with a wide variety of bitcoin services (26 exchanges, 10 wallet services, and 25 different vendors – 9 of which used the BitPay payment gateway).
For example, by depositing bitcoins into an account at the biggest Bitcoin exchange, Mt. Gox, we are able to tag one address as definitely belonging to that service, and by later withdrawing those bitcoins we are able to identify another.
The full list of services used is shown below:
We additionally scraped various publicly claimed addresses that we found, such as user’s signatures in Bitcoin forums, although we were careful to use only tags for which we could perform some manual due diligence.
Knowing one or two Mt. Gox addresses though (for example) is just a drop in the ocean compared to the total number of addresses a service such as Mt. Gox will use (ultimately the authors identified over 500,000 addresses as controlled by Mt. Gox). Starting from this seed set of addresses therefore, the next step is to exploit properties of transaction inputs and outputs to group addresses into clusters owned by the same entities. If we know the identity of any one address in the cluster, then we can know them all, and by propagating this through the transaction graph it is possible to ultimately learn the identity behind large numbers of addresses.
Linking addresses via transaction inputs
In theory, the use of pseudonyms within Bitcoin provides a property called unlinkability, which says that users’ transactions using one set of pseudonyms should not be linked to their transactions using a different set of pseudonyms. In practice, however, certain properties of Bitcoin usage erodes this anonymity.
Suppose you want to spend (send) some bitcoins – for example to make a purchase costing 10 BTC with a merchant. You have bitcoins spread across multiple addresses – one containing 6 BTC and another containing 4 BTC…
One potential way to pay the merchant would be to create a new address, send the 4 BTC and 6 BTC to this new address, and then send the 10 BTC now contained in this new address to the merchant. (In fact, this is the method that preserves the most anonymity).
There’s a simpler and more efficient way though – transactions are allowed to have arbitrarily many inputs, so it’s possible to just use the 4 BTC and 6 BTC addresses as input to the same transaction.
(mining fees ignored).
This provides the first means by which we can cluster (link) addresses – if two addresses have been used as input to the same transaction, they are controlled by the same user. Since the sender must know the private keys corresponding to all input addresses, this assumption is strong.
Linking addresses via transaction outputs
When sending bitcoins the user creates a message containing the intended receiver of the bitcoins (identified by public key), and the transaction in which his pseudonym received the bitcoins. When the transaction is broadcast through the peer-to-peer network, peers verify that no other transaction has already referenced the same previous transaction. This mechanism prevents double-spending. A consequence is that all transactions are zero balance – the same number of bitcoins flow out as flow in.
So what do you do if you have an address containing 4 BTC, and you want to send a merchant only 3 BTC? The answer is to create a transaction with multiple outputs.
In this case, we create a transaction with two outputs: 3 BTC going to the merchant address, and 1 BTC in ‘change’ going to a change address. Change addresses therefore can be identified as belonging to the sender. How do we know that a given address is a change address though?
As a first step, we observe that in the standard Bitcoin client, a change address is created internally and is not even known to the user (although he can always learn it by examining the block chain manually). Furthermore, these change addresses are used only twice: once to receive the change in a transaction, and once to spend their contents fully as the input in another transaction (in which the client will create a fresh address to receive any change). By examining transactions and identifying the outputs that meet this pattern of one-time usage, we identify the change addresses. If more than one output meets this pattern, then we err on the side of safety and do not tag anything as a change address.
Using this technique, the authors identified 3.5 million change addresses, with an estimated false positive rate of 0.17%. As of April 2013 when this study was carried out, the block chain contained more than 16 million transactions involving 12 million distinct public keys. Using the change address heuristic, these 12 million keys were collapsed into 3.3 million clusters.
Results
By layering our clustering analysis on top of our ground-truth data (and thus transitively tagging entire clusters that contain previously tagged addresses), we were able to identify 1.9 million public keys with some real-world service or identity, although in many cases the identity was not a real name, but rather (for example) a username on a forum. Although this is a somewhat small fraction (about 16%) of all public keys, it nevertheless allows us to de-anonymize significant flows of bitcoins throughout the network.
The authors uncovered 500,000 addresses as controlled by Mt. Gox, and 250,000 addresses as controlled by Silk Road, and thus were able to observe interactions with these services. This does not de-anonymise the users participating in the transactions, but it does de-anonymise flows of bitcoins into and out of such services.
We followed the flow of bitcoins out of Silk Road (in particular, from one notorious address)) and from a number of highly publicized thefts to see whether we could track the bitcoins to known services. Although some of the thieves attempted to use sophisticated mixing techniques to obscure the flow of bitcoins, for the most part tracking the bitcoins was quite straightforward and we ultimately saw large quantities of bitcoins flow to a variety of exchanges directly from the point of theft (or the withdrawal from Silk Road).
Because you really need to go through an exchange to cash out of the Bitcoin economy, the authors concluded that using Bitcoin for money laundering does not seem to be particularly attractive!
My how you’ve grown
It’s interesting to see how Bitcoin and the block chain have grown since this 2013 article was first written. Firstly, looking at XBT (BTC) : USB exchange rate over time, you can see a big spike in the value of XBT in early 2014, but taking that out we’re left with what looks like a pretty good hockey stick…
Bitcoinblockhalf.com has good statistics on the Bitcoin block chain itself. Here’s a snapshot from Sunday 19th February, 2017: