
Mastering chess and shogi by self-play with a general reinforcement learning algorithm Silver et al., arXiv 2017

We looked at AlphaGo Zero last year (and the first generation of AlphaGo before that), but this December 2017 update is still fascinating in its own right. Recall that AlphaGo Zero learned to play Go with only knowledge of the rules and self-play. The obvious question is: can you generalise that approach to other games or situations? In this paper, the DeepMind team make another small tweak to their self-play reinforcement learning engine to give us AlphaZero, and train it to play chess, shogi, and also Go once more. It’s the chess story that grabs me here – I vividly remember Deep Blue playing Garry Kasparov back in 1997, and my understanding of chess play is much better than my understanding of Go strategies and tactics. So I can have a better appreciation for AlphaZero’s play in this domain.

The game of chess represented the pinnacle of AI research over several decades. State-of-the-art programs are based on powerful engines that search many millions of positions, leveraging handcrafted domain expertise and sophisticated domain adaptations. AlphaZero is a generic reinforcement learning algorithm — originally devised for the game of Go — that achieved superior results within a few hours, searching a thousand times fewer positions, given no domain knowledge except the rules of chess.

### But isn’t chess old news?

Computers achieved super-human performance in chess 20 years ago, so what’s the big deal about AlphaZero learning to play chess well? As with AlphaGo Zero, AlphaZero taught itself through self-play with only knowledge of the rules – no opening book, no endgame database, no chess-specific tactical knowledge, etc. And while the game of Go has a natural match between the structure of convolutional networks and the translation-invariant nature of Go, the fit with chess and shogi is not so clear cut:

Chess and shogi are, arguably, less innately suited to AlphaGo’s neural network architectures. The rules are position-dependent (e.g., pawns may move two steps forward from the second rank and promote on the eighth rank) and asymmetric (e.g., pawns only move forward, and castling is different on kingside and queenside). The rules include long-range interactions (e.g., the queen may traverse the board in one move, or checkmate the king from the far side of the board).

### AlphaZero playing chess

Trained AlphaZero engines played chess against Stockfish, shogi against Elmo, and Go against the previous version of AlphaGo Zero. In each case 100-game matches were played with one minute per move time limits. AlphaZero used four TPUs, as did AlphaGo Zero; Stockfish and Elmo played at their strongest levels, using 64 threads and a 1GB hash size.

AlphaZero convincingly defeated all opponents, losing zero games to Stockfish and eight games to Elmo, as well as defeating the previous version of AlphaGo Zero.

It took just 4 hours for AlphaZero to outperform Stockfish, and 8 hours to beat AlphaGo Lee. That’s elapsed hours of course. 5000 TPUs were used to generate self-play games, and 64 TPUs were used to train the networks. So perhaps a more accurate statement would be “it took approximately 20,000 TPU-hours” for AlphaZero to outperform Stockfish.

So the headlines are impressive. But it’s the way that AlphaZero plays chess that really captures my attention. AlphaZero seems to play in a much more ‘human’ style. It has élan, it has flair. To see what I mean, it’s most instructive to look at some moments from the games.

Here’s IM Daniel Rensch’s commentary on some of the match highlights:

I also really enjoyed agadmator’s analysis of this game where AlphaZero trades a pawn to leave black’s bishop in a weak position, crippling black for much of the game:

Bearing in mind AlphaZero has no openings database, it’s also interesting to see that it rediscovers popular openings all by itself – seeming to end up with a preference for the English Opening and the Queen’s Gambit.

Each of these openings is independently discovered and played frequently by AlphaZero during self-play training. When starting from each human opening, AlphaZero convincingly defeated Stockfish, suggesting it has indeed mastered a wide spectrum of chess play.

There’s some evidence to back my assertion that AlphaZero’s play feels more like human play too: AlphaZero uses a Monte-Carlo tree search (MCTS) algorithm which searches 80 thousand positions per second in chess. Granted, that’s many more positions than any human! But compared to the 70 million positions per second Stockfish crunches through, it’s tiny – almost three orders of magnitude fewer! So AlphaZero must genuinely have (if you’ll excuse the pun) a deeper understanding of the chess positions, in order to focus its much more limited search effort in the places where it matters most:

AlphaZero compensates for the lower number of evaluations by using its deep neural network to focus more selectively on the most promising variations — arguably a more ‘human-like’ approach to search…

### Under the hood: from AlphaGo Zero to AlphaZero

So how does AlphaZero do it? The heart of AlphaZero is very similar to the AlphaGo Zero we looked at last year, but in an even more general form. There is the same single network structure which outputs both a vector of move probabilities and a scalar estimating the expected outcome from the current position. And the network is similarly trained from self-play.

Games are played by selecting moves for both players using Monte-Carlo tree search… At the end of the game, the terminal position is scored according to the rules of the game to compute the game outcome… The neural network parameters $\theta$ are updated so as to minimise the error between the predicted outcome and the game outcome, and to maximise the similarity of the policy vector to the search probabilities.
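Concretely, the network $(\mathbf{p}, v) = f_\theta(s)$ is adjusted by gradient descent on a loss combining exactly these two terms, plus L2 regularisation – this is the loss as given in the AlphaGo Zero paper, with $z$ the game outcome and $\boldsymbol{\pi}$ the search probabilities:

$$l = (z - v)^2 - \boldsymbol{\pi}^\top \log \mathbf{p} + c\|\theta\|^2$$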

The main differences to AlphaGo Zero are as follows:

• AlphaGo Zero assumes binary win/loss outcomes; AlphaZero also takes draws into account.
• The AlphaGo family exploit Go’s invariance to rotation and reflection, generating 8 symmetries for each position and randomly selecting rotations or reflections before training. AlphaZero cannot do this as chess (and shogi) do not have the same invariance properties.
• AlphaZero continually updates a single neural network during self-play. In contrast AlphaGo Zero moved forward using a batch process whereby the performance of a new player after a training iteration was measured against the previous best and only took over as the new generation source for self-play games if it won by a margin of 55%.
• AlphaGo Zero tuned its hyperparameters using Bayesian optimisation. AlphaZero simply reuses the same hyper-parameters for all games without game-specific tuning.

And of course, AlphaZero needs a board input representation suitable for the games it is playing. For chess, the board is represented by 16 8×8 planes. There are six 8×8 planes representing the position of the white pieces (one for the pawn positions, one for knights, one for bishops, one for rooks, one for the king, and one for the queen), and a similar set to represent the positions of the black pieces. In addition there are a number of constant-value inputs for the player’s colour, total move count, legality of castling, move repetition count (3 repetitions is a draw in chess), and the number of moves without progress (50 moves without progress is a draw).

The policy encodes a probability distribution over 4,672 possible moves in an 8×8×73 stack of planes.

Each of the 8×8 positions identifies the square from which to ‘pick up’ a piece. The first 56 planes encode possible ‘queen moves’ for any piece: the number of squares [1..7] the piece will be moved, along one of eight relative compass directions {N, NE, E, SE, S, SW, W, NW}. The next 8 planes encode possible knight moves for that piece. The final 9 planes encode possible underpromotions for pawn moves or captures, to knight, bishop, or rook respectively. Other pawn moves or captures from the seventh rank are promoted to a queen.
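To make the encoding concrete, here’s a minimal sketch of how a (square, plane) pair might be flattened into one of the 4,672 policy outputs. These are hypothetical helper functions for illustration only, not DeepMind’s code:

```python
# Sketch of the 8x8x73 move encoding described above (illustrative only).

QUEEN_DIRS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
KNIGHT_JUMPS = [(1, 2), (2, 1), (2, -1), (1, -2),
                (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

def queen_move_plane(direction, distance):
    """Planes 0-55: slide `distance` squares (1..7) in one of 8 directions."""
    assert 1 <= distance <= 7
    return QUEEN_DIRS.index(direction) * 7 + (distance - 1)

def knight_move_plane(jump):
    """Planes 56-63: one of the eight knight jumps."""
    return 56 + KNIGHT_JUMPS.index(jump)

def underpromotion_plane(move_dir, piece):
    """Planes 64-72: 9 underpromotion planes (3 move directions x 3 pieces)."""
    return 64 + move_dir * 3 + ["knight", "bishop", "rook"].index(piece)

def policy_index(from_file, from_rank, plane):
    """Flatten (from-square, plane) into an index over 8*8*73 = 4672 moves."""
    return (from_rank * 8 + from_file) * 73 + plane
```

Note how the square identifies where the piece is picked up, and the plane encodes what is done with it – the policy head never has to enumerate legal moves explicitly.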

Tomorrow we’ll take a look at another impressive application of self-play in reinforcement learning…

The case for learned index structures Kraska et al., arXiv Dec. 2017

Yesterday we looked at the big idea of using learned models in place of hand-coded algorithms for select components of systems software, focusing on indexing within analytical databases. Today we’ll be taking a closer look at range, point, and existence indexes built using this approach. Even with a CPU-based implementation the results are impressive. For example, with integer datasets:

As can be seen, the learned index dominates the B-Tree index in almost all configurations by being up to 3x faster and being up to an order-of-magnitude smaller.

As we left things yesterday, the naive learned index was 2-3x slower! Let’s take a look at what changed to get these kinds of results.

### Learned range indexes

Recall that a single learned CDF has difficulty accurately modelling the fine structure of a data distribution. Replacing just the first two layers of a B-Tree though, is much easier to achieve even with simple models. And once we’re dealing with a narrower portion of the data set, learning the fine structure of that becomes much more tractable. So the overall solution is to layer models in a recursive regression model:

…we build a hierarchy of models, where at each stage the model takes the key as an input and based on it picks another model, until the final stage predicts the position.

The recursive models are not trees though – different models at the same stage may map to the same models at the next stage. Nor do the models necessarily uniformly divide the key space. Think of model selection more like ‘picking an expert with better knowledge about certain keys.’

Note that we also have the freedom to use different models in the different stages:

For example, whereas on the top-layer a small ReLU neural net might be the best choice as they are usually able to learn a wide-range of complex data distributions, the models at the bottom of the hierarchy might be thousands of simple linear regression models as they are inexpensive in space and execution time.

The entire index (all stages) can be represented as a sparse matrix multiplication for a TPU/GPU, and can be trained end-to-end (see algorithm 1 in §3.3 of the paper).
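The staged lookup can be sketched in a few lines. This is a minimal illustration assuming simple linear models at both stages (the paper allows arbitrary model types per stage), with the root model predicting a CDF value in [0, 1]:

```python
# Minimal two-stage Recursive Model Index (RMI) lookup sketch.

class LinearModel:
    def __init__(self, slope, intercept):
        self.slope, self.intercept = slope, intercept

    def predict(self, key):
        return self.slope * key + self.intercept

class TwoStageRMI:
    def __init__(self, root, leaves, n_records):
        self.root, self.leaves, self.n = root, leaves, n_records

    def lookup(self, key):
        # Stage 1: the root model (a CDF estimate) picks an 'expert'
        # leaf model for this key's region.
        i = int(self.root.predict(key) * len(self.leaves))
        i = min(max(i, 0), len(self.leaves) - 1)
        # Stage 2: the chosen leaf model predicts the record position.
        pos = int(self.leaves[i].predict(key))
        return min(max(pos, 0), self.n - 1)
```

Because stage 1 only routes to a model rather than to a subtree, two different leaves can cover overlapping key ranges – exactly the ‘pick an expert’ behaviour described above.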

The output of this Recursive Model Index (RMI) is a page. In the absence of any knowledge of the data distribution, binary search or scanning (for small page sizes) have traditionally been shown to be the fastest strategies. But hold on…

Yet again, learned indexes might have an advantage here as well: the models actually predict the position of the key, which is likely to be much closer to the actual position of the record, while the min- and max-error is presumably larger.

Based on this insight, the authors develop a new biased quaternary search strategy. In a quaternary search, instead of picking one new middle point to test as in binary search, three new test points are chosen (thus dividing the dataset into four quarters, as opposed to two halves). The three initial middle points for the quaternary search are set to be $pos - \sigma$, $pos$, and $pos + \sigma$, where $pos$ is the predicted position. “That is we make a guess that most of our predictions are accurate and focus our attention first around the position estimate and then we continue with traditional quaternary search.”
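Here’s a sketch of that idea over a sorted array, with the first round of probes centred on the model’s prediction and later rounds falling back to a plain quaternary split (illustrative code, not the authors’ implementation):

```python
# Biased quaternary search sketch: `data` is a sorted list, `pos` the model's
# predicted index, `sigma` its error estimate. Returns the key's index or -1.

def biased_quaternary_search(data, key, pos, sigma):
    lo, hi = 0, len(data)  # candidate range is data[lo:hi]
    # First round: probe pos - sigma, pos, pos + sigma (clamped into range).
    probes = sorted({min(max(p, 0), len(data) - 1)
                     for p in (pos - sigma, pos, pos + sigma)})
    while lo < hi:
        for p in probes:
            if not lo <= p < hi:
                continue
            if data[p] == key:
                return p
            if data[p] < key:
                lo = p + 1      # key, if present, lies to the right
            else:
                hi = p          # key, if present, lies to the left
                break
        if hi <= lo:
            break
        # Later rounds: plain quaternary split of the remaining range.
        q = max(1, (hi - lo) // 4)
        probes = [min(lo + q, hi - 1), min(lo + 2 * q, hi - 1),
                  min(lo + 3 * q, hi - 1)]
    return -1
```

When the prediction is good, the very first round of probes finds the key; when it’s bad, the search degrades gracefully to ordinary quaternary search over the remaining range.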

For evaluation purposes 4 secondary indexes are created over 3 real-world datasets (with up to 200M rows) and one synthetic dataset designed to exercise heavy-tailed distributions. For all datasets a comparison is done between a very competitive B-Tree implementation, using a variety of page sizes, and 2-stage RMI models with varied second-stage sizes (10K, 50K, 100K, and 200K). Training a full RMI model takes just a few seconds.

Let’s look at the map, weblog, and synthetic log-normal dataset results first, since these all use integer keys.

For the maps dataset, an index is created for the longitude of approximately 200M map features (roads, museums, coffee shops, …). Here’s how the B-Tree and RMI models fared, with total lookup time broken down into model execution time (to find the page) and search time within the page. Results are normalised against a B-Tree with a page size of 128.

The learned index dominates the B-Tree index: it is up to three times faster and uses significantly less space.

The weblog dataset contains every request to a major university website over several years, and is indexed by timestamp. This is a challenging dataset for the learned model, and yet the learned index still dominates the B-Tree.

The story is similar with the synthetic log-normal dataset.

The web-documents dataset contains 10M non-continuous document-ids of a large web index. The keys here are strings, and the approach is to turn each string into a fixed-length vector with one element per character (up to a maximum input length). Other, more sophisticated encodings could be used in the future, exploiting the ability of e.g. RNNs to learn interactions between characters. A hybrid model is included in this set of results, which replaces 2nd-stage models whose errors are above a threshold with a local B-Tree.

Here the learned index is less obviously better (though it always gives good space savings). Model execution with the vector input is more expensive – a problem that using a GPU/TPU rather than a CPU would solve.

The current design, even without any significant modifications, is already useful to replace index structures as used in data warehouses, which might only be updated once a day, or BigTable where B-Trees are created in bulk as part of the SSTable merge process.

It’s possible that learned models have an advantage for certain types of update workloads too – especially if the updates follow the existing data distribution. If the distribution does change, one option is to build a delta-index of inserts and periodically merge it with the main index, including a potential retraining.

### Learned point indexes

The key challenge for hash maps is preventing too many distinct keys from mapping to the same position inside the map (a conflict). With e.g., 100M records and 100M slots, there’s roughly a 33% chance of conflict. At this point, we have to use a secondary strategy (e.g., scanning a linked list) to search within the slot.

Therefore, most solutions often allocate significantly more memory (i.e., slots) than records to store… For example, it is reported that Google’s Dense-hashmap has a typical overhead of about 78% of memory.

But if we could learn a model which uniquely maps every key into a unique position inside the array we could avoid conflicts. If you have plenty of memory available, it’s going to be hard to beat the performance of traditional hashmaps. But at higher occupancy rates (e.g., less than 20% overhead) performance starts to slow down significantly and learned models may be able to do better.

Surprisingly, learning the CDF of the key distribution is one potential way to learn a better hash function… we can scale the CDF by the targeted size $M$ of the hash-map and use $h(K) = F(K) \times M$, with key $K$, as our hash function. If the model $F$ perfectly learned the CDF, no conflicts would exist.
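A minimal sketch of this idea, standing in the empirical CDF of a key sample for the learned model $F$ (an illustrative assumption – a real implementation would use a compact learned model instead):

```python
# CDF-based hash function sketch: h(K) = F(K) * M, with F approximated by
# the empirical CDF of a sample of keys.

import bisect

def make_cdf_hash(sample_keys, m):
    sample = sorted(sample_keys)

    def h(key):
        # F(key): fraction of sampled keys <= key, i.e. the empirical CDF.
        f = bisect.bisect_right(sample, key) / len(sample)
        return min(int(f * m), m - 1)  # scale by table size, clamp into range

    return h
```

With uniformly distributed keys this spreads entries evenly across the $M$ slots; the better the model captures the true key distribution, the fewer conflicts.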

This strategy can be used to replace the hash function in any existing hash-map implementation. Here’s an evaluation comparing two-stage RMI models with 100K models in the second stage, against a fast hash function using 3 bitshifts, 3 XORs, and 2 multiplications. For each dataset, the two approaches are evaluated with the number of available slots in the map able to fit from 75% to 125% of the data.

The index with the model hash function overall has similar performance while utilizing the memory better.

### Learned existence indexes

Bloom-filters are used to test whether an element is a member of a set, and are commonly used to determine if for example a key exists on cold storage. Bloom filters guarantee that there are no false negatives (if a Bloom filter says the key is not there, then it isn’t there), but do have potential for false positives. Bloom filters are highly space efficient, but can still occupy quite a bit of memory. For 100M records with a target false positive rate of 0.1%, we need about 14x more bits than records. And once more…

… if there is some structure to determine what is inside versus outside the set, which can be learned, it might be possible to construct more efficient representations.

From an ML perspective, we can think of a Bloom-filter as a classifier, which outputs the probability that a given key is in the database. We can set the probability threshold $\tau$ for considering a key to be in the set such that we can achieve any given target false positive rate. Unlike Bloom filters though, we can’t guarantee the absence of false negatives. The solution to this problem is neat: take the set of false negatives produced by the learned function $f$ , and create a Bloom filter just for this subset of keys. Now on a lookup if $f$ says the key is present with probability greater than $\tau$ we can respond in the affirmative. If the probability is less than this threshold, a second check to the false negatives Bloom filter is added before returning the final decision.
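Here’s a sketch of that construction, with a toy Bloom filter standing in for a production implementation and the classifier passed in as a plain key-to-probability function (both are illustrative assumptions):

```python
# Learned existence index sketch: classifier + overflow Bloom filter built
# over the classifier's false negatives, so there are no false negatives.

import hashlib

class BloomFilter:
    def __init__(self, n_bits=1024, n_hashes=3):
        self.bits = bytearray(n_bits)  # one byte per bit, for simplicity
        self.n_bits, self.n_hashes = n_bits, n_hashes

    def _positions(self, key):
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def __contains__(self, key):
        return all(self.bits[p] for p in self._positions(key))

class LearnedBloom:
    def __init__(self, model, tau, keys):
        self.model, self.tau = model, tau
        self.overflow = BloomFilter()
        for k in keys:                 # index only the model's false negatives
            if model(k) <= tau:
                self.overflow.add(k)

    def __contains__(self, key):
        # Model positive, or present in the overflow filter => 'maybe in set'.
        return self.model(key) > self.tau or key in self.overflow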

The approach was tested using a dataset of 1.7M unique blacklisted phishing URLs together with a negative set of valid URLs. A character-level RNN (GRU) is trained as the classifier to predict which set a URL belongs to. Compared to a normal Bloom filter, the model + spillover Bloom filter gives a 47% reduction in size for the same false positive rate (1%). At 0.01%, the space reduction is 28%.

The more accurate the model, the better the savings in Bloom filter size. There’s nothing stopping the model using additional features to improve accuracy, such as WHOIS data or IP information.

### The last word

…we have demonstrated that machine learned models have the potential to provide significant benefits over state-of-the-art database indexes, and we believe this is a fruitful direction for future research.

I look forward to seeing more work along this line!

The case for learned index structures Kraska et al., arXiv Dec. 2017

Welcome to another year of papers on The Morning Paper. With the rate of progress in our field at the moment, I can’t wait to see what 2018 has in store for us!

Two years ago, I started 2016 with a series of papers from the ‘Techniques everyone should know’ chapter of the newly revised ‘Readings in Database Systems.’ So much can happen in two years! I hope it doesn’t take another ten years for us to reach the sixth edition of the ‘Red Book,’ but if it does, in today’s paper choice Kraska et al., are making a strong case for the inclusion of applied machine learning in a future list of essential techniques for database systems. I can’t think of a better way to start the year than wondering about this blend of old and new — of established database systems research and machine learning models — and what it all might mean for systems software of the future. I’m going to break my summary up into two posts (today and tomorrow) so that we can focus on the big picture today and dive into the details tomorrow.

The thing I like best about this paper is that it opens the door to a different way of thinking about components of systems software. By cleverly framing the problem of indexing, the authors show the potential to use machine learning deep inside a data management system. There are some constraints on the applicability of the current implementation (which we’ll get to), but even then, results which show up to 70% speed improvements while simultaneously saving an order-of-magnitude in memory are worth paying attention to. Remember, these improvements are obtained against algorithms which have been tuned over decades. Rather than getting too hung up on the particulars of the indexing use case straight away though, I want to dwell on the bigger picture for a few moments:

…we believe that the idea of replacing core components of a data management system through learned models has far reaching implications for future systems designs and that this work just provides a glimpse of what might be possible.

When does it make sense (in systems software) to replace a hand-engineered data structure and/or algorithm with a machine-learned model? The best analogy I can come up with at present to address this question is personalisation. Early web sites and apps showed the same content to everyone. But today, most sites offer you a highly personalised experience, having found that this works better both for them and (hopefully!) for you. The generic experience might be good, but the personalised experience is better. In this analogy, the equivalent of a user in the world of systems software is, I think, a workload — where a workload is a combination of data & access patterns. And one day we might say something like this: “Early systems software used the same generic configuration, data structures, and algorithms for all workloads. But today, most systems learn from and optimise for the specific workloads they are running.”

An easy-to-understand example that we looked at a few times last year is configuration: most systems ship with a default configuration that’s ok across a broad set of workloads, but tuning the configuration to suit your specific workload can yield big performance improvements. That’s the pattern I think we’re looking for: workload personalisation makes sense to explore when the performance of the system is sensitive to the characteristics of the particular workload it’s running, and if we had known those characteristics a priori we could have optimised the system for them. Frankly, pretty much all systems software is sensitive to the characteristics of the workload running on it! This puts me very much in mind of the ‘no free lunch theorem’ too: “…if an algorithm performs well on a certain class of problems then it necessarily pays for that with degraded performance on the set of all remaining problems.”

A couple of other considerations come to mind: the workload personalisation needs to occur within some bounded component of the system (one that can be abstracted as a function), and we need a way of preserving guarantees / honouring constraints. Consider the use of probabilistic data structures today, where we’re already comfortable with the fact that they’re approximations (as a learned function will be). Perhaps the best known is the Bloom filter, which tells us whether or not an element is likely to be in a set. Bloom filters come with an important guarantee – there are no false negatives. Learning a function that has ‘only a small probability’ of a false negative is not the same at all in many use cases. It turns out the authors have a really neat solution to this problem that we’ll look at tomorrow!

Unlocking the full potential of workload personalisation needs an online learning/optimisation approach, since:

the engineering effort to build specialized solutions for every use case is usually too high… machine learning opens up the opportunity to learn a model that reflects the patterns and correlations in the data.

In this paper, the approach is applied to the automatic synthesis of specialized index structures, which the authors term learned indexes.

### On CPUs, GPUs, and TPUs

The implementations in this paper were done using CPUs, but there’s another reason why neural net style models will become increasingly attractive in the future: they can run very efficiently on GPUs and TPUs. While CPUs are plateauing in performance, NVIDIA predict a 1000x increase in GPU speed by 2025. With closer integration of CPU/GPU/TPU units to reduce the handover costs, the GPU/TPU performance curve looks to be the one to ride to achieve maximum performance into the next decade.

### Introducing learned (range) indexes

This paper is primarily concerned with read-only in-memory analytical workloads. In a typical B-Tree index found in an analytics database today, the B-Tree provides a mapping from a lookup key to a position inside a sorted array of records. The index contains entries for every nth key, for example, the first key in a page. The guarantee from such a B-Tree is that the key will be found within ‘page-size’ of the returned location, if it exists. It’s a function from key to location, with error bound guarantees. We can learn a function, but what about the guarantees?

At first sight it may be hard to provide the same error guarantees with other types of ML models, but it is actually surprisingly simple. The B-tree only provides this guarantee over the stored data, not for all possible data. For new data, B-Trees need to be re-balanced, or in machine learning terminology re-trained, to still be able to provide the same error guarantees. This significantly simplifies the problem: the min- and max-error is the maximum error of the model over the training (i.e. the stored) data.

If we execute the model for every key, and remember the worst over- and under-predictions, then we know that a lookup for any key that exists must fall within these bounds. Using a machine learned model becomes possible therefore, and transforms an O(log n) B-Tree look-up into a constant operation.
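A minimal sketch of such an error-bounded lookup, assuming an integer-valued position predictor (illustrative code only, not the paper’s framework):

```python
# Error-bounded learned lookup sketch: compute the model's worst over- and
# under-prediction over the stored keys, then search only within that window.

import bisect

class LearnedIndex:
    def __init__(self, keys, predict):
        self.keys, self.predict = sorted(keys), predict
        # Worst-case prediction errors over the stored (training) data.
        errs = [int(predict(k)) - i for i, k in enumerate(self.keys)]
        self.min_err, self.max_err = min(errs), max(errs)

    def lookup(self, key):
        pos = int(self.predict(key))
        # Any stored key must lie within [pos - max_err, pos - min_err].
        lo = max(0, pos - self.max_err)
        hi = min(len(self.keys), pos - self.min_err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < hi and self.keys[i] == key else -1
```

The guarantee holds precisely because the error bounds are computed over the stored data – exactly the B-Tree’s own contract, as the quote above points out.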

Traversing a single B-Tree page of size 100 takes roughly 50 cycles (including any cache misses), and we would need $\log_{100} N$ of those traversals for a lookup in a table with $N$ keys. In comparison, you can do 1 million neural net operations in 30 cycles on NVIDIA’s latest Tesla V100 GPU. On modern CPUs we have a budget of about $\log_{100} N \times 400$ arithmetic operations to beat the B-Tree.

A model that predicts the position of a key within a sorted array is effectively approximating the cumulative distribution function (CDF):

…estimating the distribution for a data set is a well-known problem and learned indexes can benefit from decades of research.

A naive first implementation using a two-layer fully-connected neural network with 32 neurons per layer achieved approximately 1250 predictions per second. The B-Tree was still 2-3x faster though! Why?

1. The Tensorflow invocation overhead (especially with a Python front-end) is too great.
2. B-Trees ‘overfit’ the data very effectively. In contrast, a learned CDF works well to approximate the general shape, but doesn’t do so well at the individual data instance level (as shown in the callout in the figure above). “Many datasets have exactly this behaviour: from the top the data distribution appears very smooth, whereas as we zoom in more, the harder it is to approximate the CDF because of the randomness on the individual level.” This makes the ‘last mile’ tougher for single neural nets.
3. Instead of minimising the average error (typical for ML optimisation), we really want to minimise the min- and max-error bounds.
4. B-Trees are extremely cache efficient, whereas standard neural nets are less so.

If we stopped here of course, it wouldn’t be much of a story. But in tomorrow’s post we’ll look at the learning index framework (LIF) which the authors develop to overcome these problems. In addition to range indexes, we’ll also take a look at point indexes (essentially, learned alternatives to hash maps), and existence indexes (learned alternatives to Bloom filters).

We’ve reached the end of term again and it’s time for me to take a few weeks off to recharge my brain, reorganise & refill my paper backlog, and get ready for 2018!

I’ve been reading and summarising a computer science research paper every weekday for over three years now. The knowledge I’ve gained has enabled me to have some wonderful conversations and meet some wonderful people, which looking back I never would have been able to do in the same way if I had not been so active in my explorations. But predicting in advance which of the papers were the ones that would end up most useful to me beyond the simple joy of learning would have been I think very difficult indeed. So for me at least, there’s value in continuing to explore a broad cross-section of computer science and just following my interests wherever they take me.

My copy of Michel de Montaigne’s ‘Essais’ came off the shelf only once this year, and in a wonderful stroke of good fortune I stumbled upon the following passage which perfectly sums up how I feel about the subjects I cover:

I speak my mind freely on all things, even on those which perhaps exceed my capacity and which I by no means hold to be within my jurisdiction. And so the opinion I give of them is to declare the measure of my sight, not the measure of things. — Michel de Montaigne, 1533-1592.

One of my favourite things is when I learn that by chance I have been able to land a relevant piece of computer science research in someone’s inbox, which they might not have found otherwise, and it helps them with a project they’re actively working on.

If you’ve been following along with The Morning Paper for all or part of this year, thank you. I hope that it has been beneficial for you too, perhaps in ways that you also could not have predicted ahead of time! The interactions around the papers bring me great joy, and knowing that you’re waiting for the next edition helps me to stick with the habit even on days when e.g., I’m tired and traveling and might not otherwise find the strength.

As has become tradition, I’ll leave you with a few selections from the term to tide you over until The Morning Paper begins again on Monday 8th January.

See you in January,
Thanks, Adrian.

Bolt: anonymous payment channels for decentralized currencies Green and Miers, CCS’17

Yesterday we spent some time looking at what payment channels are, their role in helping Bitcoin to scale by taking some of the load off of the chain, and some payment channel constructions such as direct channels, indirect channels via an intermediary, and even payment routing over a network of payment channels.

Today we’ll be digging into Blind Off-chain Lightweight Transactions (Bolt) themselves: a set of techniques for constructing privacy-preserving payment channels for a decentralized currency.

Upon receiving a payment from some customer, the merchant learns no information beyond the fact that a valid payment (of some known positive or negative value) has occurred on a channel that is open with them. The network learns only that a channel of some balance has been opened or closed.

An anonymous payment channel scheme (an APC scheme) is constructed out of four main building blocks:

1. Key generation (KeyGen) and initialisation (Init) algorithms which are used to open a channel and escrow the initial balance. Merchants generate long-lived key pairs for use across all customers, customers generate a key pair per channel. The Init algorithm is run by both customer and merchant, resulting in a set of channel tokens for each. The tokens are transmitted to the payment network, together with a transaction to escrow the backing funds.
2. The Establish and Pay protocols, which activate a channel (one time) and transfer payments. These are run between the customer C and the merchant M off of the chain. If the parties disagree about initial channel balances during an Establish exchange then the channel will not be activated, and either party can subsequently close the channel.
3. Means of closing a channel that is no longer needed: Refund, an algorithm run by a customer to initiate channel closure, and Refute, the corresponding algorithm run by a merchant.
4. The Resolve protocol run by the network, which takes as input the customer and/or merchant close tokens, and disburses the funds. If any of the closure messages fail to verify, the balance of the channel is granted to the opposing party.


There are three variations of the protocols, in order of increasing complexity these enable:

1. Unidirectional payment channels — a customer can pay a merchant, but there is no facility for payments in the other direction (e.g., refunds).
2. Bidirectional payment channels — permit funds to flow in both directions along the channel.
3. Third-party payments — indirect channels in which parties A and B can transact via an intermediary I.

### Unidirectional payment channels

Unidirectional payment channels are based on a compact e-cash scheme. Such schemes enable the escrowed amount in the payment channel to be divided up into B separate coins. Initially the customer owns all the coins, and ‘pays’ the merchant through coin transfer. When the channel is closed, the escrowed funds are disbursed in proportion to the number of coins held by the customer and the merchant at the time.

To initiate a channel, a customer transmits a wallet commitment to the payment network along with the escrow funds and an ephemeral public signature verification key. The customer can now engage in the establishment protocol with the merchant.

• The customer generates B coin spend transactions, together with non-interactive zero-knowledge proofs that each coin is tied to the wallet commitment.
• The customer encrypts each of these transactions using a symmetric encryption scheme. Every transaction is encrypted with a unique key, and the transactions are ordered such that the key for transaction $i+1$ is embedded in the ciphertext for transaction $i$. The resulting ciphertexts are signed using the customer’s secret key, and transmitted to the merchant.

The chain of encryption keys linked in the spend transactions is exploited to give an efficient way of closing the channel. Suppose a customer wants to close a channel with remaining balance N. Let $j = (B-N) + 1$ i.e., the ‘index’ of the first unused spend transaction in the chain. The customer now posts a signed close message to the network with the channel ID, $j$, and $k_j$ — the key for the $j^{th}$ ciphertext. The merchant can use this key to decrypt that ciphertext, thus obtaining the key for the next ciphertext in the chain, and so on until the end of the chain is reached.
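The key chain is easy to see in code. Here is a toy model in Python: the XOR keystream “cipher” stands in for the real symmetric encryption scheme, and all function names are my own.

```python
import hashlib
import secrets

# A toy model of the closure chain described above. Each of the B spend
# transactions is encrypted under its own key, with the key for spend i+1
# embedded in ciphertext i, so revealing k_j unlocks spends j..B in one go.
# The XOR keystream "cipher" is for illustration only, not a real scheme.

KEY_LEN = 16

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Symmetric toy cipher: XOR with a SHA-256 derived keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def build_chain(spends):
    """Encrypt each spend along with the key for the next ciphertext."""
    keys = [secrets.token_bytes(KEY_LEN) for _ in range(len(spends) + 1)]
    ciphertexts = [
        xor_cipher(keys[i], keys[i + 1] + spend.encode())
        for i, spend in enumerate(spends)
    ]
    return keys, ciphertexts

def close_channel(k_j, ciphertexts, j):
    """The merchant, given only k_j, recovers spends j..B (1-indexed)."""
    key, recovered = k_j, []
    for ct in ciphertexts[j - 1:]:
        plaintext = xor_cipher(key, ct)
        key, spend = plaintext[:KEY_LEN], plaintext[KEY_LEN:]
        recovered.append(spend.decode())
    return recovered
```

With B = 5 coins and remaining balance N = 3, the customer posts $j = (5-3)+1 = 3$ and $k_3$, and the merchant decrypts ciphertexts 3, 4, and 5.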

If the customer cheats by revealing an invalid decryption key, or if any ciphertext decrypts to an invalid coin, or if the resulting transactions indicate that she has double-spent any coin, the merchant can post indisputable evidence of this cheating to the network — which, to punish the customer, will grant the full channel balance to the merchant.

### Bidirectional payment channels

Unidirectional channels support payments in only one direction, and only with fixed-value coins. The bidirectional channel construction allows payments in both directions, and the transfer of arbitrary values.

The key difference from the first protocol is that, instead of conducting the payment $\epsilon$ using a series of individual coins, each payment has the customer (1) prove that it has a valid signed wallet with balance $B^{cust} \geq \epsilon$ of currency in it, and (2) request a blind signature on a new wallet for the amount $B^{cust} - \epsilon$ (and embedding a fresh wallet public key $wpk_{new}$).

The value $\epsilon$ can be positive or negative. At the conclusion of the transaction, the customer reveals the public key of the old wallet to assure the merchant the wallet can’t be spent a second time, and the old wallet is also invalidated by the customer signing a “revoked” message. Closing a channel is done by posting a refund token signed by the merchant. End to end it looks like this:

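The wallet-replacement mechanics of the Pay protocol can be sketched as pure bookkeeping. In this Python sketch the blind signature, the zero-knowledge range proof, and the merchant-side balance check are all elided, and every name is my own shorthand:

```python
import secrets

class Merchant:
    """Tracks revoked wallet keys so an old wallet can't be spent twice."""

    def __init__(self):
        self.revoked = set()

    def pay(self, wallet, epsilon):
        # wallet is a (wpk, balance) pair; epsilon may be negative (a refund).
        wpk, balance = wallet
        if wpk in self.revoked:
            raise ValueError("double spend: wallet key already revoked")
        if balance - epsilon < 0:
            raise ValueError("proof would fail: wallet balance below epsilon")
        self.revoked.add(wpk)           # customer reveals and revokes old wpk
        new_wpk = secrets.token_hex(8)  # fresh key embedded in the new wallet
        return (new_wpk, balance - epsilon)
```

Each payment atomically retires one wallet and issues a fresh one for $B^{cust} - \epsilon$, which is why arbitrary values and both payment directions come for free.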

### Enabling third-party payments

One of the main applications of the bidirectional construction above is to enable third party payments. In these payments, a first party A makes a payment of some positive value to a second party B via some intermediary I with whom both have open channels. A and B each act as the customer for channel establishment, and I plays the role of the merchant.

The intermediary learns neither the identity of the participants, nor the amount being transferred. Nor does the intermediary need to be trusted to safeguard the participants’ funds. (Which is a pretty interesting set of properties when you think about it!).

The payer A doesn’t want to pay the intermediary until the intermediary has paid B, but likewise the intermediary doesn’t want to pay B until funds from A have been received. “There is no purely cryptographic solution to this problem… however there are known techniques for using blockchains to mediate aborts.”

The first phase of the Pay protocol generates refund tokens that can be used to reclaim escrowed funds. For a chained transaction from A-I-B to be secure, it is enough that this first phase of both legs completes or fails atomically. To prevent B from claiming funds from I without a corresponding payment from A, we add two conditions to B’s refund token:

1. B must produce a revocation message on A’s previous wallet. This prevents a payment from A to I being reversed once B forces a payment.
2. A must not have posted a revocation token containing the wallet key to the ledger. This prevents B forcing a payment if the payment from A to I has been reversed.

To hide the payment amount $\epsilon$, instead of revealing it to the intermediary, the customer A now commits to $\epsilon$ and uses this value as a secret input in computing the payment.

End to end the payments protocol looks like this:


It’s possible to extend this protocol to allow payments that route through multiple intermediaries, but it is not possible to hide channel balances in such a setting.

The general approach is as follows: we use “hash locks” to enforce that either all refund and revocation tokens are valid or none are. Specifically, we attach to both the refund and revocation tokens a condition that they can only be used if one party reveals $x$ such that $y = \mathcal{H}(x)$, where $x$ is picked by A. Because the release of $x$ by any party allows all parties to close their channels, this forces the entire sequence of payments to either go through or not. As with Lightning, the timeouts for each channel must be carefully chosen…
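The hash lock itself is tiny. This Python sketch shows the whole mechanism; how the condition gets attached to real refund and revocation tokens is elided:

```python
import hashlib
import secrets

def make_lock():
    """A picks a secret x and publishes y = H(x)."""
    x = secrets.token_bytes(32)
    return x, hashlib.sha256(x).digest()

def token_usable(y, claimed_x):
    """A token guarded by y is only usable with a valid preimage of y."""
    return hashlib.sha256(claimed_x).digest() == y
```

Once any party in the chain reveals $x$ to use its own token, every other party’s token becomes usable too, which is what forces the whole payment sequence to settle atomically.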

Bolt: anonymous payment channels for decentralized currencies Green & Miers, CCS’17

In which I tried not to rant. But did end up ranting just a little bit…

The world of blockchains and cryptocurrencies seems to be growing at quite a pace. Yesterday we looked at Solidus, which provides confidentiality for transactions taking place on a ledger in a bank intermediated world. Today we’re looking at Bolt, which provides confidentiality for transactions taking place off ledger in what are called payment channels, potentially routed through untrusted intermediaries.

Isn’t it kind of fundamental to blockchain that transactions happen on the ledger, you may ask? The problem is that in the case of Bitcoin, transaction throughput is currently capped at less than 10 transactions per second. Which, let’s face it, is laughably inadequate on a global scale. Can you imagine if only 10 cash transactions per second were possible? Here’s a back-of-the-envelope calculation that really brought it home to me. There are 7.4 billion people in the world at the moment. At a throughput of 10 transactions per second, each person in the world (let’s ignore population growth) gets to initiate a transaction about once every twenty-three and a half years. So you probably get to do three transactions in your lifetime. For those 10 glorious transactions per second, by the way, we’re consuming the earth’s resources at the rate of 32 TWh per year, and growing fast. (Ok, strictly some of that energy might come from renewable sources, but plenty won’t). Another thing this back-of-the-envelope tells you is that in such a world miners stand to make a fortune in ~~bribes~~ mining fees for prioritising the transactions of people who don’t want to wait another 23 years…
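The back-of-the-envelope above, spelled out in a few lines of Python:

```python
# At 10 transactions/second, how often does each of the world's
# 7.4 billion people get to initiate a transaction?

population = 7.4e9
tx_per_year = 10 * 60 * 60 * 24 * 365   # about 315 million transactions/year
years_per_tx = population / tx_per_year
print(round(years_per_tx, 1))  # -> 23.5
```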

Payment channels are one of the mechanisms attracting attention to resolve this problem. Payment channels are anchored in the blockchain, but once established all trade within the channel happens off the chain. The blockchain gets involved again if there is a dispute, or when the channel is closed.

Let’s start with the simple case of a payment channel directly between two parties. We’ll call the first party the ‘customer’ and the second party the ‘merchant’, but I’m using the quotes here to make it clear that this is just for illustration – it could be any two trading parties. If a customer wants to make a one-off trade with the merchant, then we might just as well go ahead and use a regular transaction, because it takes a transaction just to establish a payment channel. But if the customer thinks they’ll be trading with the merchant multiple times (maybe it’s a subscription) then they can establish a payment channel, and place a certain amount of funds in escrow within the channel at time of creation. The customer and the merchant can then swap IOUs backed by the escrow funds. Later on the channel can be closed, and the escrow funds within it are distributed to the trading parties according to the IOU balances.

I confess I don’t really get payment channels in this basic form. It doesn’t seem great as the customer, since I have to lock up in escrow the maximum amount of funds I ever imagine spending with the merchant into the channel. It doesn’t seem great as the merchant because my cash flow is terrible. I’m providing goods / services all the way along but I don’t get any cash flow, and the only way to unlock the funds is to liquidate the means by which my customer can trade with me! You can see a secondary market growing with trades on payment channel promises to help solve this problem. It doesn’t really seem a great solution to the Bitcoin blockchain scalability problem either. If we take the ‘3 transactions in your lifetime’ allocation, and it takes one transaction to set up a channel, and one to close it, then you get to buy from 1.5 different merchants in your lifetime.

None of these are problems that Bolt addresses. And that’s not Bolt’s fault. Bolt starts from the ‘payment channels are a thing the blockchain industry is investigating’ position, and tries to provide enhanced privacy protection when using payment channels.

While payment channels offer a solution to the scaling problem, they inherit many of the well-known privacy weaknesses of Bitcoin. Although payments are conducted off-chain, any party may learn the pseudonymous identities and initial (resp. final) channel balances of the participants. More critically, payment channels provide few privacy protections against transaction counterparties.

Before we look at the privacy protections, there’s one more twist in the payment channel story we need to consider. If every trading pair needs to establish a payment channel for each trading session between them, then we’re looking at $O(n^2)$ payment channels, all created, closed, and arbitrated on the blockchain. Did I mention the 10 transactions per second thingy? So what do we do in computer science when we’re facing an n-to-m integration problem? Introduce an intermediary and replace $n \times m$ with $n + m$!
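The channel-count arithmetic, made concrete:

```python
# With direct channels, every (customer, merchant) pair that wants to
# trade needs its own channel; a single intermediary needs only one
# channel per participant.

def direct_channels(n_customers: int, n_merchants: int) -> int:
    return n_customers * n_merchants

def intermediated_channels(n_customers: int, n_merchants: int) -> int:
    return n_customers + n_merchants

print(direct_channels(1000, 1000))         # 1000000
print(intermediated_channels(1000, 1000))  # 2000
```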

We can do the same thing with payment channels. As a customer, I can establish a payment channel with an intermediary. And the intermediary can establish payment channels with multiple merchants (and other end users). Now I can transact with multiple merchants and end users all through the one payment channel, via the intermediary. Ok, that can work — but it doesn’t seem in the spirit of decentralised anonymous cryptocurrencies to me. It’s going to be a hassle for me to keep funds in escrow in lots of different payment channels, so as a consumer I’m going to favour intermediaries that have accounts with lots of different merchants. Think of the marketplace dynamics of e.g. Visa and Mastercard. (And debit cards, not credit cards, because you need the backing funds held in escrow). And what’s the betting that such a useful intermediary might introduce some kind of transaction fee in return for their services? That’s a whole lot of hashing and crypto-invention to end up pretty much back where we started. One advantage though is that it turns out we don’t have to trust the intermediary. In this single intermediary (indirect channel) case, Bolt can provide pretty strong privacy:

• The merchant does not learn the identity of the customer
• The channel can be configured so that the identity of the merchant is not revealed to the customer
• The intermediary does not learn the identities of the transacting parties (out of the set of parties it has open channels with)
• The payment amount is hidden from the intermediary

Good luck with KYC and AML though! (And do the payment channels still have to be closed before any funds actually flow? I presume so…)

You can also make indirect channels with multiple intermediaries in the payment path. However,

… the advantages of intermediaries do not fully generalize to chains containing more than a single intermediary… channels which involve more than one intermediary cannot hide the value of a payment from all intermediaries.

While we’re filling in the background context, it’s also worth mentioning Lightning Network here. Lightning Network uses smart contract-powered bi-directional payment channels to keep payments off the chain. The interesting innovation in Lightning Network (which I haven’t studied in depth, so this is only a superficial understanding on my part) is a kind of payment routing protocol that finds a path between two parties involved in a trade by hopping over lots of individual two-party ledgers. Think of the parties as nodes in a graph, and the payment channels between them as edges. Then we know that under the conditions of an Erdős–Rényi random graph it’s quite likely that a giant connected component will emerge and we’ll be able to route payments between any two parties. How the escrow amounts are managed to cover all the payments on their way through such networks I don’t know without further study.
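To make the giant-component intuition concrete, here is a quick simulation of my own (not from either paper): nodes are parties, random edges are payment channels, and union-find tracks the connected components. The classic Erdős–Rényi threshold is an average degree of 1.

```python
import random

def largest_component(n: int, avg_degree: float, seed: int = 42) -> int:
    rng = random.Random(seed)
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Each undirected edge contributes 2 to the total degree.
    for _ in range(int(n * avg_degree / 2)):
        a, b = find(rng.randrange(n)), find(rng.randrange(n))
        if a != b:
            parent[a] = b

    sizes = {}
    for v in range(n):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values())

print(largest_component(10_000, 0.5))  # sub-critical: tiny components
print(largest_component(10_000, 3.0))  # super-critical: a giant component
```

Below an average degree of 1 the network fragments; above it, most parties can reach most others, so a routing protocol has something to work with.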

Bolt stands for Blind Off-chain Lightweight Transactions, and provides a set of techniques for the construction of Anonymous Payment Channels (APCs). APC schemes build on a number of cryptographic primitives including commitment schemes (allowing a party to commit to a chosen value while keeping it hidden), symmetric encryption schemes, pseudorandom functions that support efficient proofs of knowledge and have strong pre-image resistance, signature schemes, and non-interactive zero-knowledge proofs.

Now that we’ve set the scene, we’re also getting pretty close to my target length for a post. So I’ll save the details of how Bolt’s APCs work until tomorrow. See you then!

Solidus: confidential distributed ledger transactions via PVORM Cecchetti et al., CCS’17

Tokens on blockchains can be used to represent assets, and the ledger provides trade settlement on-chain. In a straightforward public blockchain, pseudonyms and transaction values are all publicly visible. Uncovering the true identities behind the pseudonyms becomes a real possibility (‘A fistful of Bitcoins’). When the trading is directly between institutions, and the market participants are relatively fixed, one solution is to use private blockchains. Another option is to store only transaction digests on-chain, and keep key details elsewhere. This provides certification, but doesn’t permit on-chain settlement.

Solidus is designed for bank-intermediated systems in which financial institutions manage on-chain assets on behalf of their customers. Such a model replicates asset flows within modern financial institutions, and enables regulatory duties such as know-your-customer to be discharged (something not supported by anonymous trading). As presented in this paper though, I’m not sure Solidus would support anti-money laundering obligations. Still, I find it an exciting development and there’s a lot to like here.

Solidus supports strong confidentiality and high transaction rates for bank-intermediated ledgers. Solidus not only conceals transaction values, but also provides the much more technically challenging property of transaction-graph confidentiality. This means that a transaction’s sender and receiver cannot be publicly identified, even by pseudonyms. They can be identified by their respective banks, but other entities learn only the identities of the banks.

### Fast transaction settlement

The big prize here is speeding up transaction settlement (the exchange of assets after clearing). Clearing and settling is typically done via clearing houses, and full settlement may take three days for securities.

This delay introduces systemic risk into the financial sector. Government agencies such as the Securities and Exchange Commission (SEC) are trying to reduce this delay and are looking to distributed ledgers as a long-term option. If asset titles — the authoritative record of ownership — are represented on a ledger, then trades could execute, clear, and settle nearly instantaneously.

### Solidus high-level design and protocol

Solidus is designed to match the structure of the financial industry. Banks have a set of customers (users) who hold shares in some asset in their accounts. Asset notaries inject new assets into the system. Transactions involve a sending user at a sending bank, and a receiving user at a receiving bank. To initiate a transaction a sending user signs a transaction and gives it to their bank. The bank verifies the validity of the transaction then updates its on-ledger state to reflect the results of the transaction. The receiving bank performs a corresponding update on the receiving user’s account.

To keep account balances on the ledger, but maintain strong confidentiality, Solidus uses publicly-verifiable oblivious RAM (which we’ll look at shortly).

The confidentiality properties of PVORM ensure that another entity can learn only the identities of the sending and receiving banks, not the amount transferred or the identities of the transacting users. Indeed, even the sending bank cannot identify the receiving user, nor the receiving bank the sending user. The public verifiability of PVORM ensures that any entity with access to the ledger can verify that each transaction is correctly processed by both banks.

So while the sending bank doesn’t know where it is sending money (that’s the regulatory concern I alluded to earlier), the receiving bank does know where the money is going. And if the receiving bank is a regulated institution using KYC, then in the system overall the details of the transacting parties are known. Banks can prove correct decryption of on-chain data, providing transaction logs on-demand to auditors, who can verify their correctness and completeness. If a bank shares their private decryption key with an auditor, then the auditor is able to proactively monitor activity within the bank and its accounts, simply by observing the on-chain state.

We treat the ledger as a public append-only memory which verifies transactions. All banks have asynchronous authenticated read and write access and the ledger accepts only well-formed transactions not already present.

Transaction verification is application-defined, and could happen e.g. via a smart contract. For example, account balances must remain non-negative.

Let’s look at the lifecycle of a transaction between a sending user $\mathcal{U}_s$ at bank $\mathcal{B}_s$ and a receiving user $\mathcal{U}_r$ at bank $\mathcal{B}_r$:

The sending user constructs a request consisting of:

• A unique transaction id
• The value to be transferred, encrypted with the public key of the sender
• The id of the intended recipient, encrypted using the receiving bank’s public key
• A hidden-public-key signature signed with the private key of the sender. The hidden-public-key (HPK) scheme allows a signer to sign with respect to a signing public key that is in turn encrypted under a bank’s public key. Details are in appendix A.3 in the paper. The receiver learns that a valid signature was generated with respect to some key, but learns nothing about the key itself.
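Structurally, the request looks something like the following sketch. The `encrypt_for` and signature strings are labelled placeholders rather than real cryptography, and all names here are mine, not the paper’s:

```python
import secrets
from dataclasses import dataclass

def encrypt_for(pubkey: str, payload: str) -> str:
    return f"Enc[{pubkey}]({payload})"  # stand-in, not real encryption

@dataclass
class TxRequest:
    tx_id: str           # unique transaction id
    enc_value: str       # value, encrypted with the sender's public key
    enc_recipient: str   # recipient id, under the receiving bank's key
    hpk_signature: str   # hidden-public-key signature by the sender

def build_request(value, recipient_id, sender_sk, sender_pk, bank_r_pk):
    tx_id = secrets.token_hex(8)
    return TxRequest(
        tx_id=tx_id,
        enc_value=encrypt_for(sender_pk, str(value)),
        enc_recipient=encrypt_for(bank_r_pk, recipient_id),
        hpk_signature=f"HPKSig[{sender_sk}]({tx_id})",  # placeholder
    )
```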

The sending bank verifies the received request and initiates the settlement process. First the bank generates a proof that the request is valid; then it allocates a transaction id and sends the txId, together with the details of the value to be transferred and the recipient (encrypted using the receiving bank’s public key), to the receiving bank.

Both the sending bank and the receiving bank update their respective PVORMs and post all associated proofs and signatures onto the ledger.

Once the full transaction has been accepted by the ledger, the assets have been transferred and the transaction has settled.

The main Solidus protocol is captured in more detail in the following figure:


### Publicly-verifiable Oblivious RAM machine (PVORM)

Each bank maintains a private data structure, M, containing each user’s balance and public key. A corresponding public data structure C is placed on the ledger with contents encrypted under the bank’s encryption key. M and C together constitute the memory in a PVORM.

The notion of PVORM is heavily inspired by Oblivious RAM (ORAM) – a cryptographic protocol that permits a client to safely store data on an untrusted server. Oblivious RAM conceals memory access patterns, providing the foundation for transaction graph confidentiality. As it stands though, ORAM is missing the capability to validate updates, and to provide public verifiability. PVORMs have four key differences from standard ORAM:

1. Updates are constrained by a public function $f$ that updates M according to some update $u$. In the case of Solidus, this function guarantees that a single balance is updated to a non-negative value.
2. Updates are publicly verifiable — whenever the client modifies C, it must prove (in zero-knowledge) that the change reflects a valid application of $f$.
3. The client maintains all memory (both M and C). M remains hidden, but C is publicly visible on a ledger.
4. There is no private stash: any data in M not represented in C would prevent the client from proving correctness of writes. Therefore PVORM works with a fixed-size public encrypted stash.
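Point 1 is the easiest to make concrete. Here is a minimal version of the public update function $f$ for the Solidus case, with the zero-knowledge proof that C was updated consistently elided entirely:

```python
def f(memory: dict, update: tuple) -> dict:
    """Apply update u = (account, delta) to M, enforcing the constraint
    that the single touched balance remains non-negative."""
    account, delta = update
    new_balance = memory[account] + delta
    if new_balance < 0:
        raise ValueError("constraint violated: balance would go negative")
    updated = dict(memory)  # M is replaced, not mutated in place
    updated[account] = new_balance
    return updated
```

In the real system the bank proves, in zero-knowledge, that its modification of the encrypted memory C corresponds to exactly one such valid application of $f$.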

Solidus’ PVORM is instantiated by combining Circuit ORAM with several Generalized Schnorr Proofs (GSPs). The only thing I can tell you about GSPs is that they allow for practical zero-knowledge arguments (proofs) of knowledge to be constructed.

In appendix B we concretize this construction. We prove it correct, oblivious, and publicly verifiable in the extended paper.

Solidus actually doesn’t care what kind of ledger transactions are recorded on, so long as it is a public append-only memory. It could be a decentralised blockchain, or it could use a traditional consensus protocol, or it could even be a single trustworthy entity.

### Optimisations

The cost of re-randomising ciphertexts while updating a PVORM can be reduced by pre-computing randomisation pairs during periods of lighter load. We can also reduce the requirement for every participant to cryptographically verify every update under certain circumstances:

• If a threshold number of trusted entities (large banks and regulators for example) have verified a transaction, other nodes may decide they only need to verify the signatures of the trusted entities (far faster than full verification).
• If the cost to reputation of being caught cheating is sufficient to deter bad behaviour, then online verification can be sidestepped altogether. Post-hoc identification of faulty transactions and the offending bank is trivial, meaning that banks will only submit valid transactions and proofs, and the ledger can accept transactions immediately and verify later offline.

Finally, a bank can pipeline processing of transactions (which only need the state from one transaction to the next) while working on the associated proofs in parallel. Pipelining doesn’t reduce transaction latency, but it does drastically increase throughput.
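The pipelining idea can be sketched with a thread pool: state updates are applied sequentially (each transaction needs the previous state), while the expensive proof for each snapshot is generated in parallel. In this sketch `sha256` is just a stand-in for real, much slower, proof generation:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def make_proof(state_snapshot: bytes) -> str:
    # Stand-in for expensive zero-knowledge proof generation.
    return hashlib.sha256(state_snapshot).hexdigest()

def process(transactions):
    state = 0
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = []
        for tx in transactions:
            state += tx  # fast sequential state update
            snapshot = state.to_bytes(8, "big", signed=True)
            futures.append(pool.submit(make_proof, snapshot))  # slow, parallel
        proofs = [fut.result() for fut in futures]
    return state, proofs
```

Latency per transaction is unchanged (each proof still takes as long to produce), but throughput scales with the number of proof workers.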

### Implementation

Solidus is 4300 lines of Java code, 2000 of which are the PVORM. PVORM performance scales with the number of worker threads, and an update with proofs is about 114 KB compressed. When generating updates memory consumption peaks at 880MB.

We see pretty low throughput in the test (less than 10 transactions per second), however:

We emphasize that our performance results employ an unoptimized implementation and only one server per bank, highly limiting our parallelism. Solidus is designed to be highly parallelized, allowing it to scale up using multiple servers per bank to achieve vastly superior performance in practice.

Unless I’m missing something here though, the overall system throughput will still be limited by the transaction throughput of the chosen ledger.

### The last word

We believe that Solidus is the first viable approach to building strongly verifiable and fully auditable bank-intermediated ledger transactions systems.

I’m sure there are companies to be built around this line of technology!