How Paxos and Two-Phase Commit Differ

January 26, 2021 systems intro

Distributed systems courses frequently introduce the Paxos and two-phase commit (2PC) protocols in quick succession. On one hand, this is a reasonable educational choice, as Paxos and 2PC are both examples of consensus algorithms used in the real world. On the other, it leads many students toward the dangerously incorrect conclusion that Paxos and 2PC solve the same problem. The best way to resolve this confusion is via an intuitive example involving both algorithms where swapping their places clearly does the wrong thing. Here we'll dig into exactly such an example.

A small town at the edge of a blue sea, surrounded by thickly forested rolling hills going as far as the eye can see.

Imagine you are booking a trip. (I've been doing a lot of such imagining since March 2020.) On your favorite travel website, you pick a hotel, some airline tickets, and a rental car, then you enter your credit card information and click "Book!" What happens next?

Needless to say, this operation in the real world is significantly more complex than we could hope to cover here. We'll routinely gloss over many unimportant but nonetheless fascinating details, such as how paying for things over the Internet actually works.

Two-phase commit

You only want to pay the travel website if your choice of hotel, flight, and rental car are all available. Similarly, the companies involved don't want to hold any reservations in your name unless they can be sure that your bank will accept the credit card charges. To realize your trip, multiple independent computer systems need to examine their local information (bank account state, flight schedules, hotel room availability, etc.) and agree that the transaction is safe to perform.

The travel website acts as the transaction coordinator of the two-phase commit. First, it sends a "prepare to commit" message to the hotel, airline, rental agency, and your bank. These messages ask the protocol participants to confirm that they have no objections to the transaction based on their individual information. For example, your airline will confirm that your selected seats are still available — if someone else has booked them in the meantime, the airline will respond to the message with an "abort" command, and the transaction will fail without charging your credit card.

Responses to these "prepare to commit" messages are binding: for example, once your bank says "the transfer is allowed and sufficient funds are available," it can't change its mind at a later time (in practice, at least not until a very long timeout, usually 7–10 days). This is commonly called "authorizing the charge" on your card — you have not been charged yet, but your funds are held in reserve to ensure they remain available.

If your hotel, airline, rental agency, and bank all respond positively to the "prepare to commit" messages, then you know your trip is viable, and the various other parties know that you are able to pay for it. The next step is to send a message to all participants confirming the transaction: your card gets charged, your airline sends you a ticket confirmation, your hotel sends you a room reservation, etc.

Why can't this be Paxos instead?

There are many reasons, so let's tackle just the most obvious one: Paxos can't represent the fact that in this algorithm, the different parties care about different things. The hotel doesn't know or care about airline tickets, the rental agency can't find you a place to stay, and neither of them knows whether you have enough credit available from your bank.

To demonstrate the concrete problem here, let's say you tried to set up a Paxos instance where the travel website, your bank, the hotel, the airline, and the rental agency each have 1 node in the Paxos group — a total of 5 nodes.

Per the rules of Paxos, if 3 of the 5 nodes (a majority) agree on a proposal, that proposal is accepted — irrevocably so. But this doesn't make sense! That would mean that the travel agency's node, the airline's node, and the hotel's node could decide to agree on the proposal and promise you that you can take your trip, even though your bank won't cover the payment and the rental agency has no cars available to rent!

You said the example had both Paxos and 2PC — where's the Paxos?

Remember how we said 2PC steps are binding? For example, your bank must not forget that it promised to reserve some of your credit to cover a transaction that is in progress, regardless of what failures might happen in its system. In general, 2PC has a hard time recovering from failures of either the transaction coordinator (the travel website in our example) or the transaction participants (hotel / airline / bank / rental agency).

Turning the 2PC transaction coordinator and each transaction participant into a Paxos group is the key to "not forgetting" in the presence of failures. For example, a Paxos-replicated bank system can continue to process your transaction even if a minority of its servers crash, and will not forget to pay your travel agency for your trip once the transaction commits.

Could you have used 2PC instead of Paxos for the replication here?

Replication isn't the objective here — fault tolerance is! While you could successfully set up 2PC to replicate state across multiple identical machines, recall that 2PC does not tolerate failures: the first time a participant machine crashes in a 2PC setup, the protocol grinds to a halt.

Recap

Paxos and 2PC aim to solve different problems, and are not interchangeable for each other.
We use 2PC when multiple non-equivalent servers need to agree that a given operation is safe to perform. The "non-equivalent" part means Paxos doesn't work for this use case.
In practice, each of the "servers" in the 2PC computation is actually a separate Paxos group. This is how we ensure the "2PC assumes all nodes never fail" assumption holds up in the real world — the Paxos groups provide fault tolerance.
Paxos provides fault tolerance through replication, and you could configure 2PC to replicate data across multiple equivalent servers. However, such 2PC replication cannot provide fault tolerance on its own because 2PC requires that all nodes continue to work — even a single failure will stop the protocol's progress.