An empirical analysis of anonymity in Zcash

An empirical analysis of anonymity in Zcash Kappos et al., USENIX Security’18

As we’ve seen before, in practice Bitcoin offers little in the way of anonymity. Zcash, on the other hand, was carefully designed with privacy in mind and offers strong theoretical privacy guarantees. So in theory users of Zcash can remain anonymous. In practice, though, it depends on the way those users interact with Zcash. Today’s paper choice, ‘An empirical analysis of anonymity in Zcash’, studies how identifiable transaction participants are in practice, based on the 2,242,847 transactions in the blockchain at the time of the study.

We conclude that while it is possible to use Zcash in a private way, it is also possible to shrink its anonymity set considerably by developing simple heuristics based on identifiable patterns of usage.

The analysis also provides some interesting insights into who is using Zcash, and for what. Founders and miners combined account for around 66% of the value drawn from the shielded pool.

The code for the analysis is available online at https://github.com/manganese/zcash-empirical-analysis

Zcash guarantees and the shielded pool

Zcash is based on highly regarded research including a cryptographic proof of the main privacy feature of Zcash, the shielded pool. Not all transactions are required to go through the shielded pool though: Zcash also supports transparent transactions with similar properties to transactions in Bitcoin. Transparent transactions reveal the pseudonymous addresses of senders and recipients as well as the amount being sent.

All newly generated coins are required to pass through the shielded pool before being spent further. Based on this, the Zcash developers concluded that the anonymity set for users spending shielded coins is all generated coins. This paper shows that in practice the anonymity set is much smaller.

To support transparent and shielded transactions Zcash has two types of addresses: transparent addresses (t-address) and shielded addresses (z-address). Addresses are supplied for the inputs and outputs of transactions, yielding four possible combinations:

  • Transparent transactions move funds from t-addresses to t-addresses.
  • Shielded transactions move funds from t-addresses to z-addresses.
  • Deshielded transactions move funds from z-addresses to t-addresses.
  • Private transactions move funds between z-addresses.
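A minimal sketch of this four-way classification in Python. It assumes each transaction has already been summarised as the set of address kinds ("t" or "z") on its input side and on its output side; that representation and the `classify` name are hypothetical, not Zcash's actual transaction format. Anything that mixes kinds on one side is reported as "mixed".

```python
def classify(input_kinds: set, output_kinds: set) -> str:
    """Label a transaction by the address kinds on each side.

    input_kinds / output_kinds: hypothetical summaries, each a set drawn
    from {"t", "z"} describing the transaction's inputs and outputs.
    """
    if input_kinds == {"t"} and output_kinds == {"t"}:
        return "transparent"
    if input_kinds == {"t"} and output_kinds == {"z"}:
        return "shielded"
    if input_kinds == {"z"} and output_kinds == {"t"}:
        return "deshielded"
    if input_kinds == {"z"} and output_kinds == {"z"}:
        return "private"
    # Both kinds on one side (e.g. a deposit into the pool that also pays
    # transparent change) fall outside the four clean categories.
    return "mixed"
```

For example, `classify({"t"}, {"t", "z"})`, a deposit that also returns transparent change, comes back as "mixed".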

There are four main types of actor in the Zcash ecosystem. Founders are onto a nice little number and receive 20% of all newly generated coins. Founder addresses are specified in the Zcash parameters. Miners maintain the ledger and receive block rewards and transaction fees. Services are entities that accept ZEC as a form of payment, for example exchanges and trading platforms. There are also individual participants who hold and transact in ZEC at a personal level. (Charities and other organisations accepting Zcash are included in this last category.)

As of January 2018, 258,472 blocks had been mined and 3,106,643 ZEC generated (621,182 ZEC of which went to the founders). Across all blocks there were 2,242,847 transactions, broken down as shown in the following table.

As the following chart shows, transparent transaction usage is growing disproportionately over time, making up a larger and larger percentage of the overall transaction volume.

As mentioned previously, these transactions offer essentially the same privacy as Bitcoin (i.e., not great), and can be de-anonymised using the same techniques as used for Bitcoin.

1,740,378 distinct t-addresses had been used, of which 8,727 had acted as inputs in at least one t-to-z transaction, and 330,780 had acted as outputs in at least one z-to-t transaction.

The overall value held in the shielded pool is increasing over time, with noticeable shielding and deshielding spikes that, as the analysis will show, are due to the actions of miners and founders.

Only 25% of all t-addresses hold a non-zero balance, and the top 1% hold 78% of all ZEC. The richest address has a higher balance than the entire shielded pool.
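Statistics like these come from ranking per-address balances. A minimal sketch, assuming a plain list of balances; the function names are hypothetical, and whether the "top 1%" is taken over all t-addresses or only funded ones is decided by what the caller passes in.

```python
def top_share(balances, top_percent=1):
    """Fraction of the total balance held by the richest `top_percent`% of addresses."""
    ranked = sorted(balances, reverse=True)
    total = sum(ranked)
    if not ranked or total == 0:
        return 0.0
    # Integer ceiling division, so small inputs still keep at least one address.
    k = max(1, (len(ranked) * top_percent + 99) // 100)
    return sum(ranked[:k]) / total


def nonzero_fraction(balances):
    """Fraction of addresses holding a non-zero balance."""
    if not balances:
        return 0.0
    return sum(1 for b in balances if b > 0) / len(balances)
```

With 100 addresses where one holds 90 ZEC, another 10 ZEC and the rest nothing, `top_share` reports 0.9 and `nonzero_fraction` reports 0.02.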

Heuristics

We can use similar heuristics to those used in de-anonymising Bitcoin transactions, but adapted to account for the differences between t- and z-addresses.

  1. If two or more t-addresses are inputs in the same transaction (whether that transaction is transparent, shielded, or mixed), then they are controlled by the same entity.
  2. If one (or more) address is an input t-address in a vJoinSplit transaction and a second address is an output t-address in the same vJoinSplit transaction, then if the size of zOut is 1 (i.e., this is the only transparent output address), the second address belongs to the same user who controls the input addresses.

(Heuristic 2 is the ‘change address’ heuristic).
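Both heuristics boil down to merging addresses into clusters, which a union-find (disjoint-set) structure handles efficiently. The sketch below assumes a hypothetical transaction record with `t_in` (input t-addresses), `t_out` (output t-addresses) and a `joinsplit` flag, and it reads heuristic 2's condition as "exactly one transparent output", following the parenthetical above. It is an illustration, not the authors' code.

```python
class UnionFind:
    """Disjoint-set forest over address strings."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)  # unseen addresses start as singletons
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_addresses(transactions):
    """Group t-addresses into clusters using heuristics 1 and 2."""
    uf = UnionFind()
    for tx in transactions:
        t_in, t_out = tx["t_in"], tx["t_out"]
        for addr in t_in + t_out:
            uf.find(addr)  # register every address, even if never merged
        # Heuristic 1: all transparent inputs of one transaction share an owner.
        for addr in t_in[1:]:
            uf.union(t_in[0], addr)
        # Heuristic 2 (change address): a JoinSplit transaction with transparent
        # inputs and exactly one transparent output pays that output to the
        # owner of the inputs.
        if tx["joinsplit"] and t_in and len(t_out) == 1:
            uf.union(t_in[0], t_out[0])
    clusters = {}
    for addr in list(uf.parent):
        clusters.setdefault(uf.find(addr), set()).add(addr)
    return list(clusters.values())
```

For instance, a transparent transaction spending from A and B, followed by a JoinSplit transaction spending B with a single transparent output E, yields the cluster {A, B, E}.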

Using just the first heuristic it is possible to discover clusters of addresses, and by finding a known entity associated with any one address in a cluster, assign all of the cluster addresses to that entity. Using this method, here are the top ten identified Zcash exchanges according to volume traded (the deposits and withdrawals columns indicate the number of transactions initiated by the authors to discover seed addresses):

Identifying exchanges is important, as it makes it possible to discover where individual users may have purchased their ZEC. Given existing and emerging regulations, they are also the one type of participant in the Zcash ecosystem that might know the real-world identity of users.

The publicised addresses of founders, and of mining pools, also act as seeds to discover larger clusters of addresses controlled by these entities. In this manner 123 founder addresses were uncovered, and 110,918 mining pool addresses.
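Whether the seeds come from the authors' own deposits and withdrawals at exchanges or from publicised founder and pool addresses, spreading them across clusters is a single pass. A minimal sketch, assuming clusters are given as sets of addresses and seeds as a dict from known address to entity name; the function name is hypothetical, and skipping clusters whose seeds disagree is an assumed policy rather than something the paper specifies.

```python
def propagate_labels(clusters, seeds):
    """Assign each seed's entity to every address in its cluster.

    clusters: iterable of sets of addresses (e.g. from heuristic 1).
    seeds: dict mapping a known address to an entity name.
    Returns a dict address -> entity for every labelled address.
    """
    labels = {}
    for cluster in clusters:
        entities = {seeds[addr] for addr in cluster if addr in seeds}
        if len(entities) != 1:
            continue  # no seed, or conflicting seeds: leave unlabelled
        (entity,) = entities
        for addr in cluster:
            labels[addr] = entity
    return labels
```

With one seed for a hypothetical "ExchangeOne" inside the cluster {A, B, E}, all three addresses receive that label, while an unseeded cluster stays unlabelled.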

Who uses the shielded pool?

The previous section looked at t-addresses, where users should in any case expect less privacy. What’s really interesting is what we can learn about usage of the shielded pool.

Deposits and withdrawals into the shielded pool closely mirror each other, with most users not only withdrawing the exact number of ZEC they deposit into the pool but doing so very quickly after making a deposit.

The main participants putting money into the pool are miners. The consensus rules for Zcash dictate that miners and founders must put their block rewards into the shielded pool before spending them further.

The intent of the shielded pool is to provide an anonymity set so that when users withdraw coins it is not clear whose coins they are. However, if a t-to-z transaction can be linked to a z-to-t transaction, then those coins can be ruled out of the anonymity set.

Founders, it turns out, have predictable behaviour that enables many of their transactions to be linked. Known founder addresses identify deposits into the pool, and furthermore the deposits follow a predictable pattern of depositing 249.9999 ZEC – the reward for 100 blocks. That suggests that withdrawals might follow a predictable pattern too, and lo and behold, there are 1,953 withdrawals of exactly 250.0001 ZEC. Both deposits and withdrawals happen with a period of 6-10 blocks, following a step-like pattern.

This leads to heuristic three:

  • Any z-to-t transaction carrying 250.0001 ZEC in value is done by the founders

This heuristic leads to the identification of a further 48 founder addresses.
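A sketch of heuristic three in Python. The deposit amount is reconstructed under assumptions not stated in the analysis: a 12.5 ZEC block subsidy, the 20% founders' share, and reading the missing 0.0001 ZEC as a transaction fee. Those assumptions, the `kind`/`value`/`t_out` record fields and the function name are illustrative only. Amounts use `Decimal` so that the exact-match test is not thrown off by binary floating point.

```python
from decimal import Decimal

# Assumptions for illustration (not stated in the analysis): a 12.5 ZEC block
# subsidy, a 20% founders' share, and a 0.0001 ZEC fee on each deposit.
BLOCK_SUBSIDY = Decimal("12.5")
FOUNDERS_SHARE = Decimal("0.2")
FEE = Decimal("0.0001")

# 100 blocks of founders' reward, minus the assumed fee: 249.9999 ZEC.
FOUNDER_DEPOSIT = 100 * BLOCK_SUBSIDY * FOUNDERS_SHARE - FEE
# Withdrawal amount as observed in the analysis.
FOUNDER_WITHDRAWAL = Decimal("250.0001")


def founder_addresses(transactions):
    """Heuristic 3: outputs of any z-to-t transaction carrying exactly
    250.0001 ZEC are attributed to the founders."""
    found = set()
    for tx in transactions:
        if tx["kind"] == "z-to-t" and tx["value"] == FOUNDER_WITHDRAWAL:
            found.update(tx["t_out"])
    return found
```

Only the z-to-t filter and the exact 250.0001 ZEC value come from the heuristic itself; everything used to derive `FOUNDER_DEPOSIT` is a working assumption.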

Miner deposits into the pool are also predictable since they immediately follow coin generation. Flypool and F2Pool are the biggest:

Miners don’t exclusively withdraw to the t-addresses associated with their deposits, but they use enough of them that output addresses can be linked using a further heuristic:

  • If a z-to-t transaction has over 100 output t-addresses, one of which belongs to a known mining pool, then we label the transaction as a mining withdrawal (associated with that pool), and label all non-pool output t-addresses as belonging to miners.
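The mining heuristic can be sketched as a filter over z-to-t transactions. This assumes hypothetical `id`/`kind`/`t_out` transaction fields and a dict of known pool addresses; picking the first matching pool when several appear is an assumption, since the heuristic only says "one of which".

```python
def tag_mining_withdrawals(transactions, pool_addresses):
    """Label z-to-t transactions with over 100 transparent outputs, at least
    one of them a known pool address, as mining withdrawals.

    pool_addresses: dict mapping known pool t-address -> pool name.
    Returns (withdrawals, miners): tx id -> pool name, and the set of
    non-pool output addresses labelled as miners.
    """
    withdrawals = {}
    miners = set()
    for tx in transactions:
        outs = tx["t_out"]
        if tx["kind"] != "z-to-t" or len(outs) <= 100:
            continue
        pools = [pool_addresses[a] for a in outs if a in pool_addresses]
        if not pools:
            continue
        withdrawals[tx["id"]] = pools[0]  # assumption: first pool output wins
        miners.update(a for a in outs if a not in pool_addresses)
    return withdrawals, miners
```

A withdrawal paying a Flypool address plus 120 other addresses is tagged as a Flypool mining withdrawal, and the 120 other addresses are labelled as miners.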

Using this heuristic, 110,918 addresses were tagged as belonging to miners, allowing a significant portion of z-to-t transactions to be linked:

Outside of founders and miners, any time there is exactly one t-to-z transaction carrying value v, followed by exactly one z-to-t transaction for the exact same amount within a small number of blocks, then those transactions are linked too.
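A minimal sketch of this round-trip linking, assuming deposits and withdrawals are given as hypothetical `(block_height, value, tx_id)` tuples. The 10-block window is an assumed default, since the text only says "a small number of blocks", and uniqueness of the value is checked across the whole input.

```python
from collections import defaultdict


def link_round_trips(deposits, withdrawals, window=10):
    """Link a t-to-z deposit to a z-to-t withdrawal of the same value when
    that value occurs in exactly one deposit and exactly one withdrawal, and
    the withdrawal lands within `window` blocks at or after the deposit.

    deposits / withdrawals: lists of (block_height, value, tx_id).
    Returns a list of (deposit_tx_id, withdrawal_tx_id) pairs.
    """
    dep_by_value = defaultdict(list)
    wd_by_value = defaultdict(list)
    for height, value, txid in deposits:
        dep_by_value[value].append((height, txid))
    for height, value, txid in withdrawals:
        wd_by_value[value].append((height, txid))

    links = []
    for value, deps in dep_by_value.items():
        wds = wd_by_value.get(value, [])
        if len(deps) == 1 and len(wds) == 1:
            (d_height, d_tx), (w_height, w_tx) = deps[0], wds[0]
            if d_height <= w_height <= d_height + window:
                links.append((d_tx, w_tx))
    return links
```

A value seen in two deposits is never linked, even if one of them is followed closely by a matching withdrawal; this is what keeps the heuristic conservative.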

There were 6,934 private (z-to-z) transactions, with timing that suggests a small number of users make many transactions each.

The Shadow Brokers

Using their heuristics, and looking at deposits matching the price of NSA tool dumps made by the hacker collective ‘The Shadow Brokers’ (TSB), the authors were able to identify 24 clusters of addresses potentially associated with TSB purchases.

In conclusion

… our study has shown that most users are not taking advantage of the main privacy features of Zcash at all. Furthermore, the participants who do engage with the shielded pool do so in a way that is identifiable, which has the effect of significantly eroding the anonymity of other users by shrinking the overall anonymity set.