Guide
CAP theorem and consistency models explained
Every distributed database, cache cluster, and microservice mesh faces the same uncomfortable truth: networks fail, messages arrive late, and replicas disagree. The CAP theorem states that during a network partition, a system cannot simultaneously guarantee both Consistency (every read sees the latest write) and Availability (every request gets a response). Partition tolerance is not optional in real deployments — partitions happen — so architects choose which guarantee to relax. That choice is expressed through consistency models: from strict linearizability down to eventual convergence. This guide explains CAP and its practical successor PACELC, maps common databases to the spectrum, covers quorum reads, session guarantees, conflict resolution, and how to pick the right trade-off for ledgers, shopping carts, and social feeds.
CAP in plain language
Eric Brewer's CAP theorem (formalized by Gilbert and Lynch) describes three properties of a replicated data store:
- Consistency (C) — all nodes see the same data at the same time; a read after a successful write returns that write. In formal terms, operations appear atomic and ordered as if on a single copy.
- Availability (A) — every request to a non-failing node receives a non-error response, without guarantee that it reflects the latest write.
- Partition tolerance (P) — the system continues operating when network links between nodes are broken or delayed.
During a partition, you must choose: reject some requests to preserve consistency (CP), or accept requests on isolated nodes and risk stale or divergent data (AP). You cannot have both C and A while P holds. When the network is healthy, a well-designed system can deliver consistency and availability — CAP only binds behavior under partition.
Common misreadings: CAP is not "pick two of three forever." Partition tolerance is mandatory for multi-datacenter systems. The real question is what happens during the partition and how quickly you heal afterward.
PACELC: CAP plus normal-case latency
Daniel Abadi's PACELC extension captures what CAP leaves out: even without partitions, systems trade consistency for latency.
If Partition (P): choose Availability or Consistency.
Else (E): choose Latency or Consistency.
A globally replicated database that synchronously writes to three regions before acknowledging a deposit is PC/EC — consistent but slow. A Dynamo-style key-value store that writes to one replica and asynchronously propagates is PA/EL — fast and available, eventually consistent. Most product decisions happen in the EL vs EC lane: "Can this read be stale for 200 ms?"
The consistency spectrum
Consistency is not binary. Models form a spectrum from strongest to weakest:
- Linearizability (strong consistency) — operations appear to execute atomically in real-time order. A read never returns an old value after a newer write has completed. Required for bank balances, inventory decrements, and distributed locks.
- Sequential consistency — all nodes agree on operation order, but that order may not match wall-clock time. Weaker than linearizable; rare as an explicit product guarantee.
- Causal consistency — if operation A causally precedes B (A happened-before B), every node sees A before B. Good for comment threads and collaborative editing where related events must stay ordered.
- Read-your-writes / session consistency — a user always sees their own updates, even if other users see stale data. Achieved with sticky sessions or version tokens passed from write to read.
- Eventual consistency — if writes stop, all replicas converge to the same state. Reads may return stale values for an unbounded (but usually short) window. Acceptable for likes, view counts, and CDN edge caches.
Stronger models cost latency and fault tolerance. Weaker models need explicit idempotency and conflict handling in application code.
How databases map to CAP
No database is purely one letter — behavior depends on configuration:
- CP tendencies — single-leader SQL (PostgreSQL with synchronous replication), etcd, ZooKeeper, Spanner. On partition, minority partitions stop accepting writes or reads to avoid split-brain.
- AP tendencies — Cassandra, DynamoDB (default eventually consistent reads), Riak. Nodes stay up during partition; replicas reconcile later.
- Configurable middle — MongoDB read concern / write concern, Redis with WAIT, CockroachDB with follower reads. Tunable per query.
Replication topology (single-leader, multi-leader, leaderless) largely determines where a store sits on the spectrum. Single-leader async replication is "strong on the leader, eventual on secondaries" — a hybrid most teams live with daily.
Quorum reads and writes
Leaderless systems like Cassandra use quorum consensus to bound staleness without a single master. With replication factor N, choose write quorum W and read quorum R such that R + W > N. Then a read and a write overlap on at least one replica, so the read sees the latest write (assuming no concurrent writers).
Example: N=3, W=2, R=2. A write must reach two of three nodes; a read must query two. Overlap guarantees freshness. Lower R for faster stale reads; raise W for durability. Sloppy quorums and hinted handoff extend availability during node failure at the cost of stricter guarantees.
Quorum math is the bridge between CAP theory and tunable consistency — many "AP" stores offer strong-ish reads when you pay the latency of R=N.
Partitions in production: what users actually see
When a partition splits a cluster, bad things happen unless you planned for them:
- Split-brain — two partitions both accept writes to the "same" data, creating irreconcilable forks. Prevent with majority quorums, fencing tokens, or distributed locks backed by consensus (etcd, ZooKeeper).
- Stale reads — a user refreshes and sees yesterday's cart total. Mitigate with read-your-writes via session stickiness, version vectors, or routing reads to the leader for critical paths.
- Unavailable writes — CP systems reject writes on the minority side. Clients need retries, clear error messages, and exponential backoff.
- Duplicate processing — at-least-once delivery during recovery requires idempotent consumers in event-driven pipelines.
Observability matters: track replication lag, conflict rate, and "stale read" counters. A sudden lag spike often precedes a partition or overloaded replica.
Conflict resolution when replicas diverge
AP systems must merge concurrent writes. Strategies:
- Last-write-wins (LWW) — timestamp or version decides. Simple but loses data silently; clock skew makes it dangerous.
- Application merge — custom logic (union of tags, max of counters). Works when the domain has a natural commutative merge.
- CRDTs — data structures (G-Counter, OR-Set) mathematically guaranteed to converge without coordination. Ideal for collaborative counters and presence, not for account balances.
- Escalate to human / strong path — detect conflict, block one branch, route to support or a strongly consistent store.
Financial invariants (balance never negative) belong on CP stores or single-leader SQL with transactional locking, not on eventually consistent replicas without careful design.
Decision table: which consistency for which data?
| Data type | Recommended model | Why |
|---|---|---|
| Payments, ledger balances | Linearizable / strong | Double-spend and lost updates are unacceptable; accept higher latency. |
| Inventory / seat booking | Strong or serializable transaction | Overselling destroys trust; use row locks or compare-and-swap. |
| User profile after self-edit | Read-your-writes | User must see their save; others can lag briefly. |
| Social likes, view counts | Eventual | Approximate counts are fine; optimize for write throughput. |
| Product catalog / CMS | Eventual + CDN | Stale listing for seconds is tolerable; invalidate on publish. |
| Chat message ordering | Causal or per-channel strong | Messages in a thread must not appear out of order. |
| Feature flags / config | Strong on control plane | Split-brain flag state causes unpredictable behavior everywhere. |
Common mistakes
- Treating CAP as a one-time architecture slide — consistency is per-operation, tunable in many stores.
- Strong consistency everywhere — global synchronous replication kills latency and availability for no user benefit on analytics data.
- Eventual consistency on money — "it usually converges" is not a ledger strategy.
- Ignoring session stickiness — load-balancing reads round-robin across lagging replicas breaks read-your-writes.
- Last-write-wins with wall clocks — NTP skew causes wrong winners; use logical clocks or version vectors.
- No partition drills — teams discover split-brain behavior for the first time in production.
Production checklist
- Classify each entity by required consistency model before choosing a store.
- Document behavior under partition for every critical service (CP vs AP path).
- Monitor replication lag and alert before it exceeds product SLO.
- Use majority quorums or consensus for any resource that must not fork.
- Pass version tokens or use sticky sessions for read-your-writes UX.
- Make all side-effect handlers idempotent for at-least-once delivery.
- Run chaos tests that isolate nodes and verify failover matches design docs.
- Expose consistency knobs (read concern, consistency level) in runbooks, not only in code comments.
- Plan conflict resolution before shipping multi-region writes to AP stores.
- Revisit choices when traffic crosses regions — PACELC latency dominates CAP at scale.
Key takeaways
- CAP forces a choice during partitions: consistency or availability.
- PACELC adds the everyday trade-off: consistency vs latency when the network is fine.
- Consistency models range from linearizable to eventual — match them to business invariants.
- Quorums and session guarantees tune AP systems without abandoning freshness where it matters.
- Conflicts are application problems — CRDTs, merges, or strong stores; not something eventual consistency hides forever.
Related reading
- Database replication explained — single-leader, multi-leader, and leaderless topologies
- Database transactions and isolation levels explained — ACID guarantees on a single node
- Distributed locking explained — fencing tokens and consensus-backed locks
- Event-driven architecture explained — at-least-once delivery and ordering across services