Explainer · 7 June 2026

How two-phase commit and distributed transactions work

A single database can wrap several row updates in one ACID transaction: either every change commits together or none of them do. The moment your business logic touches two independent systems — a Postgres ledger and a Redis cache, an inventory service and a payment gateway, two shards on different machines — you no longer have one transaction log. Two-phase commit (2PC) is the classical protocol for making those separate writes behave atomically: all participants agree to commit, or every prepared participant rolls back. It is elegant on paper, fragile under real network partitions, and still embedded in enterprise databases as XA/JTA. Understanding 2PC explains why distributed systems textbooks spend so much time on consensus, sagas, and eventual consistency instead.

Roles: coordinator and participants

Every 2PC run has one coordinator (also called the transaction manager) and one or more participants (resource managers). The coordinator owns the global transaction ID and drives the protocol; each participant manages a local resource — a database connection, a message broker session, a file store — and maintains a private undo log for rows it might need to roll back.

The application starts a global transaction at the coordinator, then performs work through participants: debit an account on database A, reserve stock on database B, enqueue a confirmation email on a queue. Until the protocol finishes, participants hold locks or versioned write intents so other transactions cannot observe half-done state. When the application calls commit, the coordinator runs 2PC; on rollback, it aborts without entering the commit phase.

This model assumes participants are reliable enough to durably record their prepare vote. If a participant crashes after voting "yes" but before learning the final outcome, it must be able to reconstruct in-doubt transactions after restart — which is where transaction logs, timeouts, and recovery scanners enter operations playbooks.

Phase 1: Prepare (the voting round)

Phase 1 is a blocking question to every participant: "Can you commit the work you've done for transaction T?" The coordinator sends PREPARE with the global transaction ID. Each participant evaluates locally:

Are all local changes still valid (constraints, foreign keys, business rules)?
Can it durably record an prepare record in its transaction log?
Will it be able to commit later if ordered to, without further validation?

If yes, the participant flushes its prepare record to disk — this is the critical durability point — votes YES, and keeps locks held. If anything fails, it votes NO, rolls back locally, and releases locks. The coordinator collects votes. Any single NO means the global transaction aborts: the coordinator sends ABORT to all voters, participants undo, and the application sees a failure.

Prepare is expensive because participants promise future commit without knowing the global outcome yet. Locks acquired during the business logic may be held across network round-trips and coordinator processing — a common source of latency spikes and deadlock chains in XA-heavy workloads.

Phase 2: Commit or abort (the decision)

If every participant voted YES, the coordinator first writes its own commit decision to a durable log, then broadcasts COMMIT to all participants. Each participant applies the commit, releases locks, and acknowledges. The global transaction is now visible everywhere — atomically, from the application's perspective.

If the coordinator decides to abort (because of a NO vote or an application rollback), it logs ABORT and tells participants to undo. Participants that never received prepare simply ignore the transaction.

The ordering matters: the coordinator must persist the decision before sending commit, otherwise a crash could lose the outcome while participants are prepared to commit — leaving the system in an unrecoverable ambiguous state without an external transaction log.

Failure modes: blocking, partitions, and heuristic damage

2PC is a blocking protocol. If the coordinator crashes after participants vote YES but before broadcasting the final decision, participants cannot safely commit (the coordinator might have aborted) and cannot safely abort (the coordinator might have committed). They remain in doubt, holding locks until the coordinator recovers or an operator intervenes. Production outages from stuck XA transactions are often this scenario.

Network partitions make the problem worse: a isolated participant may timeout waiting for phase 2 while another partition commits. The CAP theorem framing applies — you cannot have both immediate global consistency and partition tolerance without sacrificing availability somewhere. Three-phase commit (3PC) adds a pre-commit round to reduce indefinite blocking, but cannot fully solve partitions without extra assumptions about clock sync and failure detectors.

When operators force a decision on a stuck participant — committing or aborting without coordinator agreement — they create a heuristic decision. The cluster may end up with one shard committed and another aborted: the exact inconsistency 2PC was designed to prevent. Runbooks treat heuristic commits as data-corruption incidents requiring manual reconciliation.

XA, JTA, and where 2PC still ships

The X/Open XA standard defines how databases and message brokers expose 2PC interfaces to Java JTA, .NET TransactionScope, and similar application-server coordinators. Classic enterprise patterns — updating Oracle and posting to IBM MQ in one JTA transaction — are 2PC under the hood. Modern cloud-native teams use this sparingly because cross-region 2PC amplifies latency and ties unrelated service availability together: one slow participant blocks the commit of everyone.

Sharded databases that want single-log semantics internally often use Raft or Paxos on a replicated log instead of exposing 2PC to applications — the consensus layer is the coordinator, with well-understood leader election and recovery. Spanner's TrueTime and Percolator-style two-phase locking at scale are research-grade evolutions, not something most product teams should replicate for a checkout flow.

Alternatives: sagas, outbox, and idempotent consumers

Microservice architectures usually reject global 2PC in favor of sagas: a sequence of local transactions, each with a compensating action if a later step fails. Debit payment locally, then reserve inventory locally; if inventory fails, run a compensating refund. Sagas trade atomic isolation for availability and independent service evolution — but expose temporary inconsistency windows readers must tolerate or filter.

The transactional outbox pattern pairs a local DB commit with an outbox table row; a separate relay publishes events to message queues with at-least-once delivery. Downstream services consume idempotently. This achieves "eventually all sides agree" without a cross-service prepare phase — closer in spirit to CRDTs and mergeable state than to strict 2PC.

For read-mostly coordination, an application-level try-confirm-cancel flow (reserve funds, capture or release) mirrors 2PC phases without holding database locks across services — payment processors have operated this way for decades.

Production examples and operational notes

PostgreSQL prepared transactions — PREPARE TRANSACTION 'xid' exposes 2PC to an external coordinator; pg_prepared_xacts lists in-doubt transactions operators must resolve after coordinator crashes.
MySQL XA — XA START / XA END / XA PREPARE / XA COMMIT for cross-shard or cross-engine writes; rarely used across microservices, common in legacy monolith splits.
Spring @Transactional on multiple DataSources — only atomic if a JTA coordinator wires XA drivers; otherwise you have independent commits and silent partial-failure risk.
Kubernetes etcd — uses Raft consensus for a single replicated log, not 2PC exposed to clients; compare when teaching "coordination" vs "cross-database atomicity."

Monitoring should alert on prepared-transaction age, lock wait time, and coordinator recovery lag. Timeouts should abort prepare votes rather than leaving participants blocked indefinitely — accepting occasional false aborts beats wedging inventory tables during a network blip.

Practical checklist

Use 2PC only when participants are few, co-located, and operations accepts blocking during coordinator failure.
Prefer single-service transactions plus outbox/events for cross-service workflows unless regulation mandates atomic cross-store commits.
Never hold locks across remote HTTP calls during prepare — keep prepare phases milliseconds, not seconds.
Document heuristic-recovery runbooks before enabling XA in production.
When global atomicity is non-negotiable, consider one replicated log (Raft) instead of many independent 2PC participants.