The Part-Time Parliament – Lamport ’90/’98
This is part 2 of a 10-part series on consensus.
There’s quite the back story to this paper. First submitted in 1990, researchers at the time didn’t seem to take it seriously due to its presentation as an allegory, and failed to appreciate the fundamental contribution that we know today as the Paxos algorithm. It took a further 8 years before the paper was finally published. See Lamport’s description of the story under paper 122 on his publications page. Personally, I find it a very effective format that helps the subject matter seem much less intimidating. Paxos is, after all, notorious for being hard to understand and implement – every little helps. Paxos lies at the core of, or was the inspiration for, numerous important distributed systems today (most of Google’s large scale systems for example). This is the kind of paper where the best I can hope to do in this form is give you a flavour – a careful reading of the original is highly recommended.
The scene is the Aegean island of Paxos, where busy legislators had to juggle trade and political obligations and hence could only dedicate themselves part-time to the duties of parliament.
The problem of governing with a part-time parliament bears a remarkable correspondence to the problem faced by today’s fault-tolerant distributed systems, where legislators correspond to processes and leaving the Chamber corresponds to failing. The Paxons’ solution may therefore be of some interest to computer scientists. I present here a short history of the Paxos Parliament’s protocol, followed by an even shorter discussion of its relevance for distributed systems.
First Lamport describes a foundation problem where a Synod have to agree on a single decree. This is then extended to a continuous parliament passing multiple decrees. Finally a series of optimisations and enhancements are discussed. The context for all these agreement protocols is as follows:
The Paxon parliament required consistency of ledgers (in which decrees were written), meaning that no two ledgers could contain contradictory information. To avoid meeting this requirement by simply not passing any decrees, they also required that progress be made:
If a majority of the legislators were in the Chamber and no one entered or left the Chamber for a sufficiently long period of time then any decree proposed by a legislator in the Chamber would be passed, and every decree that had been passed would appear in the ledger of every legislator in the Chamber.
Legislators were given hourglass timers to measure the passage of time. Unfortunately, the acoustics of their parliamentary chamber were very poor, so communication could only be made by messenger.
A messenger could be counted on not to garble messages, but he might forget that he had already delivered a message, and deliver it again. Like the legislators they served, messengers devoted only part of their time to parliamentary duties. A messenger might leave the Chamber to conduct some business—perhaps taking a six-month voyage—before delivering a message. He might even leave forever, in which case the message would never be delivered…. While they remained in the Chamber, messengers delivered messages in a timely fashion and legislators reacted promptly to any messages they received.
The single-decree synod
The single decree is chosen through a series of ballots. Every ballot is associated with a set of priests called the quorum, and each priest can choose either to vote for the ballot or to abstain. A ballot succeeds if and only if every priest in the quorum votes for it.
Three basic conditions on a set of ballots were enough to show that consistency was agreed, and that progress was possible.
- Every ballot has a unique ballot number
- The quorums of any two ballots in the set have at least one priest in common
- If one or more of the priests in a ballot’s quorum have voted in lower numbered ballots, then the decree under consideration in the current ballot is constrained in the following way: let E be the set of lower-numbered ballots that the priests have voted in, and let R be the highest numbered ballot in E. The decree of the current ballot must be the same of the decree of R.
From these rules it can be deduced that if any ballot B in a set is successful, then any later ballot for is for the same decree as B, and therefore any two successful ballots must be for the same decree. This is proved by a simple contradiction, with a footnote which will make a reader of How to write a 21st century proof smile:
Paxon mathematicians always provided careful, structured proofs of important theorems. They were not as sophisticated as modern mathematicians, who can omit many details and write paragraph-style proofs without ever making a mistake.
Given enough priests in the chamber, it is possible to conduct a successful ballot while preserving these rules. While this does not guarantee progress, it does show that a balloting protocol based on the rules will not deadlock.
The preliminary protocol:
- A priest p chooses a new ballot number b and sends a NextBallot(b) message to some set of priests
- A priest q responds to this message by sending a LastVote(b,v) message where v is the vote with the largest ballot number less than b that q has cast (or the null vote if there is no such vote).
- After receiving a LastVote(b,v) message from every priest in some majority set Q, priest _p initiates a new ballot with number b, quorum Q, and decree d. The decree d is chosen so as to satisfy the third rule. He records the ballot in the back of his ledger and sends a BeginBallot(b,d) message to every priest in Q.
- Upon receiving a BeginBallot(b,d) message a priest q decides whether to cast a vote. Voting is not mandatory, but a priest may not vote if doing so would violate a LastVote(b’,d’) message he has sent for some other ballot. If voting, he sends a Voted(b,q) message to p and records the vote in his ledger.
- If p has received a Voted(b,q) message from every priest in Q then he writes d in his ledger and sends a Success(d) message to every priest in Q.
- Upon receiving a Success(d) message a priest enters decree d in his ledger.
A refinement of this protocol, called the basic protocol, is then introduced which reduces the amount of information each priest is required to keep. In particular, a priest only need keep track of:
- The number of the last ballot that he tried to initiated
- The vote cast in the highest-numbered ballot in which he voted
- The largest ballot number b for which he has sent a LastVote(b,v) message
The basic protocol is a restricted version of the preliminary protocol, meaning that every action allowed by the basic protocol is also allowed by the preliminary protocol. Since the preliminary protocol satisﬁes the consistency condition, the basic protocol also satisﬁes that condition. Like the preliminary protocol, the basic protocol does not require that any action ever be taken, so it does not addresses the question of progress.
To ensure progress, it was necessary to choose a single priest, called the president to initiate ballots. The criteria for president selection was that :
If no one entered or left the Chamber, then after T minutes exactly one priest in the Chamber would consider himself to be the president.
One way of achieving this is to select the priest whose name comes last alphabetically, with each priest broadcasting their name to all other priests in the chamber at least once every T – (message round trip time) minutes. A priest considers himself president if he receives no message from a higher named priest for T minutes.
The complete Synod protocol was obtained from the basic protocol by requiring priests to perform steps 2–6 promptly, adding a method for choosing a president who initiated ballots, and requiring the president to initiate ballots at the appropriate times. Many details of the protocol are not known. I have described simple methods for selecting a president and for deciding when the president should initiate a new ballot, but they are undoubtedly not the ones used in Paxos.
The multi-decree parliament
Instead of just passing one decree, the parliament needed to pass a succession of decrees.
Logically, the parliamentary protocol used a separate instance of the complete Synod protocol for each decree number. However, a single president was selected for all these instances, and he performed the ﬁrst two steps of the protocol just once.
When a new president was elected, if his ledger was behind the other priests send him details of the decrees he is missing. Any remaining gaps in the ledger that could not be filled by priests then in chamber (because all the priests involved in passing those numbered decrees had left the chamber) were filled in by a special ‘null’ decree (“The ides of February is national olive day“).
The consistency and progress properties of the parliamentary protocol follow immediately from the corresponding properties of the Synod protocol from which it was derived. To our knowledge, the Paxons never bothered writing a precise description of the parliamentary protocol because it was so easily derived from the Synod protocol.
We do know that when parliament was in session and no priests were entering or leaving the chamber that the following steps were taken upon receiving a request to pass a decree:
- The president sent a BeginBallot message to each member in the quorum
- Each legislator in the quorum sent a voted message to the president
- The president sent a success message to every legislator.
This is a total of three message delays and about 3N messages, assuming a parliament of N legislators and a quorum of about N/2. Moreover, if Parliament was busy, the president would combine the BeginBallot message for one decree with the Success message for a previous one, for a total of only 2N messages per decree.
Observations, optimisations, and enhancements
(Lots of detail omitted here, see the full paper for more).
- On electing a new president, it’s best not to pick one who has been out of the chamber for a long time and hence has a lot of catching up to do in his ledger
- Ledgers can be compressed to avoid them getting too long
- Legislators can delegate rulings to bureaucrats, who have decision making powers on a lease basis
- Ordinary citizens needed a way to learn about the law, “If one inquiry precedes a second inquiry, then the second inquiry cannot reveal an earlier state of the law than the ﬁrst.” A simple way of ensuring this is to pass a decree for every inquiry.
Passing a decree for every inquiry soon proved too cumbersome. The Paxons realized that a simpler method was possible if they weakened the monotonicity condition by changing the interpretation of precedes. They decided that for one event to precede another, the ﬁrst event not only had to happen at an earlier time, but it had to be able to causally aﬀect the second event… The weaker monotonicity condition was met by using decree numbers in all business transactions and inquiries.
Relation to computer science
The Paxon Parliament’s protocol provides another way to implement an arbitrary state machine. The legislators’ law book corresponds to the machine state, and passing a decree corresponds to executing a state-machine command. The resulting algorithm is less robust and less expensive than the earlier algorithms. It does not tolerate arbitrary, malicious failures, nor does it guarantee bounded-time response. However, consistency is maintained despite the (benign) failure of any number of processes and communication paths. The Paxon algorithm is suitable for systems with modest reliability requirements that do not justify the expense of an extremely fault-tolerant, real-time implementation.