CRDTs explained: conflict-free replicated data types
Harbor Docs wanted shared meeting notes that worked on flaky conference Wi-Fi. Two engineers edited the same bullet list offline: one added an action item, the other reordered tasks and deleted a stale line. When both laptops reconnected, a naive “last write wins” merge would have dropped the new action item. The team switched to a CRDT (conflict-free replicated data type): each edit updated a local replica, peers exchanged compact state deltas, and every client converged to the same document without a central coordinator or merge dialog.
CRDTs are data structures designed so that concurrent updates commute: apply them in any order and replicas reach identical state. They power offline-first mobile apps, collaborative editors, and geo-distributed counters where strong consistency would be too slow or unavailable.
This guide explains the math intuition, state-based vs operation-based designs, common CRDT families, a worked example (Harbor Docs notes sync), an approach decision table, pitfalls, and a production checklist.
What CRDTs are
A conflict-free replicated data type is a replicated object with two well-defined operations:
- Update — modify the local replica without contacting a leader.
- Merge — combine two replicas into one that incorporates both histories.
CRDTs guarantee that merge is associative, commutative, and idempotent: merging A with B equals merging B with A, the grouping does not matter when three or more replicas merge, and merging a replica with itself changes nothing. Those properties are what let disconnected clients sync safely when connectivity returns.
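The three merge laws can be checked mechanically. The following is a minimal sketch, not a proof: the helper name `satisfies_merge_laws` and the sample states are assumptions made for illustration, and a check on samples can only find counterexamples.

```python
from itertools import product
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")

def satisfies_merge_laws(merge: Callable[[S, S], S], samples: Iterable[S]) -> bool:
    """Check commutativity, associativity, and idempotence on sample states."""
    states = list(samples)
    for a, b, c in product(states, repeat=3):
        if merge(a, b) != merge(b, a):
            return False  # not commutative
        if merge(merge(a, b), c) != merge(a, merge(b, c)):
            return False  # not associative
        if merge(a, a) != a:
            return False  # not idempotent
    return True

sets = [frozenset(), frozenset({"a"}), frozenset({"b", "c"})]
union_ok = satisfies_merge_laws(lambda a, b: a | b, sets)        # set union
max_ok = satisfies_merge_laws(max, [0, 3, 7])                    # integer max
overwrite_ok = satisfies_merge_laws(lambda a, b: b, [0, 3, 7])   # "take the other side"

print(union_ok, max_ok, overwrite_ok)  # True True False
```

Set union and max pass, while "take the other side" fails on commutativity, which is why a naive overwrite cannot serve as a CRDT merge.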
CRDTs sit in the eventual consistency family described in distributed consistency models. They do not provide linearizability on every read; they provide a provable guarantee that all replicas eventually agree. For many product surfaces — presence counters, shopping-cart quantities, shared checklists — that trade-off is acceptable and often desirable for latency and availability.
State-based vs operation-based CRDTs
State-based (CvRDT)
Replicas broadcast their full state (or a compressed state delta). Merge is a pure function on states: merge(S₁, S₂) → S₃. G-Counters and grow-only sets are classic examples. State-based CRDTs tolerate message duplication because merge is idempotent, and they tolerate reordering because merge is commutative and associative.
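To make the duplication and reordering claim concrete, here is a small sketch using a grow-only set. The `deliver` helper, the seeds, and the action-item strings are hypothetical; the only CRDT logic is `merge`, which is plain set union.

```python
import random
from functools import reduce

def merge(s1: frozenset, s2: frozenset) -> frozenset:
    """State-based merge for a grow-only set: a pure function, set union."""
    return s1 | s2

# States broadcast by three replicas (hypothetical action items).
messages = [
    frozenset({"book room"}),
    frozenset({"book room", "send agenda"}),
    frozenset({"order lunch"}),
]

def deliver(incoming, seed: int) -> frozenset:
    """Simulate an unreliable network: duplicate every message, then shuffle."""
    rng = random.Random(seed)
    noisy = list(incoming) * 2
    rng.shuffle(noisy)
    return reduce(merge, noisy, frozenset())

# Every delivery order and duplication pattern yields the same final state.
results = {deliver(messages, seed) for seed in range(20)}
print(len(results))  # 1
```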
Operation-based (CmRDT)
Replicas broadcast operations (insert character at index, increment counter) instead of state. There is no separate merge step: each receiving replica applies the operation to its local state, so concurrent operations must commute. Operations must be delivered exactly once and often require causal delivery (for example, tracked with vector clocks in your transport layer). Op-based CRDTs can be more bandwidth-efficient for large documents because only the operations travel, but the messaging layer carries more responsibility.
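One common way to approximate exactly-once delivery is to give every operation a unique ID and have replicas ignore IDs they have already applied. The sketch below assumes that approach; `Increment`, `OpCounter`, and the ID scheme (origin replica plus sequence number) are illustrative names, not part of any library. Increments commute, so this example needs no causal ordering; sequence types generally do.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

@dataclass(frozen=True)
class Increment:
    op_id: Tuple[str, int]   # (origin replica, per-replica sequence number)
    amount: int

@dataclass
class OpCounter:
    value: int = 0
    seen: Set[Tuple[str, int]] = field(default_factory=set)

    def apply(self, op: Increment) -> bool:
        """Apply an op once; return False if it was a duplicate delivery."""
        if op.op_id in self.seen:
            return False
        self.seen.add(op.op_id)
        self.value += op.amount
        return True

ops = [Increment(("a", 1), 2), Increment(("b", 1), 5), Increment(("a", 2), 1)]

left, right = OpCounter(), OpCounter()
for op in ops + ops[:1]:      # left receives one op twice
    left.apply(op)
for op in reversed(ops):      # right receives the ops in a different order
    right.apply(op)

print(left.value, right.value)  # 8 8
```

Without the `seen` check, the redelivered op would push `left` to 10 while `right` stays at 8.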
Libraries like Automerge and Yjs blend both views: users edit locally with op-based feel, while sync rounds exchange encoded states. Pick the transport semantics your stack already provides — Kafka with idempotent consumers favors state merges; WebRTC data channels may prefer small op streams.
Common CRDT families
Counters
A G-Counter (grow-only counter) tracks per-replica increments; its value is the sum of the per-replica counts, and merge takes the element-wise max. A PN-Counter pairs two G-Counters (increments and decrements) so you can count down without a central authority. That is useful for inventory holds and concurrent ticket sales, though business rules may still need to reject a negative total at the application layer.
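A minimal sketch of both counters follows. The class and replica names are assumptions for illustration. The scenario at the end shows why the application layer still matters: two partitioned replicas each sell the last ticket, and the merged total goes negative.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class GCounter:
    counts: Dict[str, int] = field(default_factory=dict)  # replica id -> increments

    def increment(self, replica_id: str, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[replica_id] = self.counts.get(replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> "GCounter":
        keys = self.counts.keys() | other.counts.keys()
        return GCounter({k: max(self.counts.get(k, 0), other.counts.get(k, 0)) for k in keys})

@dataclass
class PNCounter:
    p: GCounter = field(default_factory=GCounter)  # increments
    n: GCounter = field(default_factory=GCounter)  # decrements

    def increment(self, replica_id: str, amount: int = 1) -> None:
        self.p.increment(replica_id, amount)

    def decrement(self, replica_id: str, amount: int = 1) -> None:
        self.n.increment(replica_id, amount)

    def value(self) -> int:
        return self.p.value() - self.n.value()

    def merge(self, other: "PNCounter") -> "PNCounter":
        return PNCounter(self.p.merge(other.p), self.n.merge(other.n))

# One ticket left; two replicas sell it concurrently while partitioned.
base = PNCounter()
base.increment("server", 1)
east = base.merge(PNCounter())   # independent copy of the base state
west = base.merge(PNCounter())
east.decrement("east")
west.decrement("west")
merged = east.merge(west)

print(merged.value())  # -1
```

The merge converges correctly on both sides; it is the business rule (no overselling) that the counter cannot enforce on its own.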
Sets
A G-Set only adds elements; merge is set union. An OR-Set (observed-remove set) attaches a unique tag to every add, and a remove deletes only the tags that replica has observed. A concurrent add of the same element carries a tag the remove never saw, so the add wins and the outcome is predictable. OR-Sets model watchlists, tag collections, and “who has seen this notification” without lost updates.
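The tagging idea can be sketched as a state-based OR-Set that keeps two grow-only sets: tagged adds and tombstoned tags. This is a simplified illustration (production designs usually avoid keeping every tombstone), and the class and element names are assumptions.

```python
import uuid
from dataclasses import dataclass, field
from typing import Set, Tuple

Tagged = Tuple[str, str]  # (element, unique tag)

@dataclass
class ORSet:
    adds: Set[Tagged] = field(default_factory=set)
    removed: Set[Tagged] = field(default_factory=set)  # tombstoned (element, tag) pairs

    def add(self, element: str) -> None:
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element: str) -> None:
        # Tombstone only the tags this replica has observed so far.
        self.removed |= {pair for pair in self.adds if pair[0] == element}

    def elements(self) -> Set[str]:
        return {element for element, _tag in self.adds - self.removed}

    def merge(self, other: "ORSet") -> "ORSet":
        return ORSet(self.adds | other.adds, self.removed | other.removed)

a = ORSet()
a.add("milk")
b = a.merge(ORSet())   # b starts as an independent copy of a

a.remove("milk")       # a removes the tag it observed
b.add("milk")          # b concurrently re-adds with a fresh tag

merged = a.merge(b)
print(a.elements(), merged.elements())  # set() {'milk'}
```

The concurrent add survives because its tag was never observed by the remove.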
Registers and maps
An LWW-Register (last-writer-wins) stores a value plus a timestamp and replica ID; merge keeps the value with the highest timestamp and uses the replica ID to break ties. It is simple but vulnerable to clock skew. An LWW-Map applies LWW per key. Prefer logical clocks or hybrid logical clocks when wall-clock trust is weak.
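A minimal LWW-Register sketch, assuming an integer timestamp (logical or hybrid) and a string replica ID compared lexicographically as the tie-breaker. The field names and sample values are illustrative.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class LWWRegister:
    value: Any
    timestamp: int     # logical or hybrid clock reading
    replica_id: str    # breaks timestamp ties deterministically

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        mine = (self.timestamp, self.replica_id)
        theirs = (other.timestamp, other.replica_id)
        return self if mine >= theirs else other

a = LWWRegister("Call vendor", timestamp=7, replica_id="phone")
b = LWWRegister("Email vendor", timestamp=7, replica_id="laptop")

# Same timestamp: the higher replica ID wins on both sides of the merge.
winner = a.merge(b)
print(winner.value)  # Call vendor

# Clock skew: a replica with a fast clock beats an edit that actually came later.
skewed = LWWRegister("stale text", timestamp=99, replica_id="tablet")
print(winner.merge(skewed).value)  # stale text
```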
Sequences (text and lists)
Text CRDTs (RGA, LSEQ, YATA) assign each character or list item a unique, ordered identifier so insertions at the same position commute. This is how collaborative editors avoid index-shift bugs that plague naive operational-transform implementations. Sequence CRDTs are the heaviest family — expect tombstone accumulation and compaction strategies in production.
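The identifier idea can be illustrated with a deliberately simplified toy. This is not RGA, LSEQ, or YATA: it places each item at a fractional position between its neighbors and breaks ties with the replica ID. Among other limitations, it cannot reliably insert between two concurrent items that share a position, and it never compacts tombstones. All names are hypothetical.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import Dict, List, Tuple

ItemId = Tuple[Fraction, str, int]  # (position, replica id, per-replica counter)

@dataclass
class ToySequence:
    replica_id: str
    counter: int = 0
    items: Dict[ItemId, Tuple[str, bool]] = field(default_factory=dict)  # id -> (char, deleted)

    def _visible_ids(self) -> List[ItemId]:
        return sorted(i for i, (_, deleted) in self.items.items() if not deleted)

    def insert(self, index: int, char: str) -> None:
        ids = self._visible_ids()
        left = ids[index - 1][0] if index > 0 else Fraction(0)
        right = ids[index][0] if index < len(ids) else Fraction(1)
        self.counter += 1
        self.items[((left + right) / 2, self.replica_id, self.counter)] = (char, False)

    def delete(self, index: int) -> None:
        item_id = self._visible_ids()[index]
        self.items[item_id] = (self.items[item_id][0], True)  # keep a tombstone

    def merge(self, other: "ToySequence") -> None:
        for item_id, (char, deleted) in other.items.items():
            was_deleted = self.items.get(item_id, (char, False))[1]
            self.items[item_id] = (char, deleted or was_deleted)

    def text(self) -> str:
        return "".join(self.items[i][0] for i in self._visible_ids())

alice = ToySequence("alice")
alice.insert(0, "a")
alice.insert(1, "c")
bob = ToySequence("bob")
bob.merge(alice)

alice.insert(1, "b")   # concurrent insert at the same index...
bob.insert(1, "X")     # ...on another replica
bob.delete(0)          # bob also deletes "a"

alice.merge(bob)
bob.merge(alice)
print(alice.text(), bob.text(), len(alice.items))  # bXc bXc 4
```

Both replicas converge on the same text, and the deleted character still occupies an entry in `items`, which is the tombstone accumulation that production systems must compact.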
Harbor Docs: offline-first shared notes
Harbor Docs shipped a lightweight notes app for field teams. Requirements: edit on mobile offline, sync when back online, no “conflict resolution” modal, and support for checklist reordering plus concurrent text edits in the same paragraph.
- Model split. Title and body used a text CRDT (Yjs document). Checklist items were an OR-Set of item IDs paired with LWW-Registers for text and a G-Counter for completion toggles per device session.
- Local-first writes. Every keystroke updated the in-memory CRDT; IndexedDB persisted the encoded state every two seconds and on blur.
- Sync protocol. On reconnect, peers exchanged state vectors (version clocks per client) and sent only missing update batches over a WebSocket relay — no single writer lock.
- Compaction. A nightly job compacted tombstones older than 90 days and snapshotted the document to cold storage, keeping hot replicas under 2 MB.
- Semantic guardrails. CRDT merge cannot detect that two different phrases mean the same task; product rules flagged duplicate checklist text for human review rather than pretending the data layer understands semantics.
Result: zero merge dialogs in six weeks of beta, median sync under 400 ms on reconnect, and acceptable storage growth with scheduled compaction.
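The state-vector exchange described in the sync protocol can be sketched as follows. This is an illustration of the idea only, not the Yjs wire format or API: updates are modeled as plain tuples, each client's sequence numbers are assumed to be contiguous, and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

Update = Tuple[str, int, str]    # (client id, sequence number, payload)
StateVector = Dict[str, int]     # client id -> highest sequence number seen

def state_vector(log: List[Update]) -> StateVector:
    sv: StateVector = {}
    for client, seq, _payload in log:
        sv[client] = max(sv.get(client, 0), seq)
    return sv

def missing_for(peer_sv: StateVector, log: List[Update]) -> List[Update]:
    """Updates in our log that the peer's state vector says it has not seen."""
    return [u for u in log if u[1] > peer_sv.get(u[0], 0)]

def apply_updates(log: List[Update], incoming: List[Update]) -> List[Update]:
    """Idempotent apply: skip anything already present in the log."""
    known = {(client, seq) for client, seq, _payload in log}
    return log + [u for u in incoming if (u[0], u[1]) not in known]

phone = [("phone", 1, "add item"), ("phone", 2, "rename item")]
laptop = [("phone", 1, "add item"), ("laptop", 1, "reorder"), ("laptop", 2, "delete stale line")]

# On reconnect, each side sends only what the other is missing.
to_laptop = missing_for(state_vector(laptop), phone)
to_phone = missing_for(state_vector(phone), laptop)
phone = apply_updates(phone, to_phone)
laptop = apply_updates(laptop, to_laptop)

print(len(to_laptop), len(to_phone))  # 1 2
print(state_vector(phone) == state_vector(laptop))  # True
```

Only three updates cross the wire instead of both full logs, and applying the same batch twice leaves the log unchanged.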
Approach decision table
| Scenario | Recommended approach | Why not the alternatives |
|---|---|---|
| Offline mobile edits, must converge automatically | CRDT (library-backed text + OR-Set/LWW primitives) | Pessimistic locking fails offline; naive LWW drops edits |
| Centralized document with always-online clients | Operational transform or single-leader ordering | CRDT tombstones and metadata overhead may not pay off |
| Financial ledger, inventory that must never double-spend | Strong consistency + transactions (Postgres, consensus log) | CRDTs do not enforce global invariants across unrelated keys |
| Audit trail and rebuildable projections | Event sourcing with deterministic reducers | CRDT state merges are opaque unless you also log ops |
| Geo-distributed counter (likes, presence) | PN-Counter or dedicated CRDT counter service | Read-modify-write with row locks creates hot shards |
| Warehouse cache fed from OLTP | CDC + idempotent consumers | CRDTs solve peer edits, not primary-replica replication |
Common pitfalls
- Using LWW everywhere — clock skew silently drops legitimate edits; use logical clocks or CRDT types with deterministic tie-breakers.
- Unbounded tombstones — deleted text in sequence CRDTs still occupies metadata; plan compaction and snapshots.
- Expecting semantic merge — CRDTs guarantee syntactic convergence, not that merged business meaning is correct.
- Ignoring global invariants: two replicas of a counter can each decrement locally while only one unit of inventory remains, so the merged total goes negative; enforce caps at commit or via a strongly consistent ledger for the authoritative total.
- Shipping a custom text CRDT — edge cases in list/text CRDTs took years of research; use Yjs, Automerge, or Diamond Types unless you have a research budget.
- Duplicated op delivery on CmRDTs — without idempotent consumers, operation-based replicas double-apply edits.
- Testing only happy-path sync — fuzz reorder, delay, and partition merges; property tests on merge associativity catch regressions early.
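The fuzzing advice in the last pitfall can be approximated with the standard library alone. The sketch below assumes a G-Counter state merge as the function under test; `random_state`, `fuzz_merge`, and the trial counts are illustrative, and a property-testing library would give better shrinking and coverage.

```python
import random
from functools import reduce

def merge(a: dict, b: dict) -> dict:
    """G-Counter state merge: element-wise max per replica."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def random_state(rng: random.Random) -> dict:
    replicas = rng.sample(["r1", "r2", "r3", "r4"], k=rng.randint(0, 4))
    return {r: rng.randint(1, 50) for r in replicas}

def fuzz_merge(trials: int = 200, seed: int = 0) -> int:
    """Return the number of passing trials; raise AssertionError on divergence."""
    rng = random.Random(seed)
    for _ in range(trials):
        states = [random_state(rng) for _ in range(5)]
        expected = reduce(merge, states, {})
        noisy = states + rng.sample(states, k=3)  # duplicate some messages
        rng.shuffle(noisy)                        # reorder delivery
        assert reduce(merge, noisy, {}) == expected
        a, b, c = states[:3]
        assert merge(merge(a, b), c) == merge(a, merge(b, c))  # associativity
        assert merge(a, b) == merge(b, a)                      # commutativity
    return trials

passed = fuzz_merge()
print(passed)  # 200
```

Fixing the seed keeps failures reproducible, so a diverging trial can be replayed while debugging.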
Production checklist
- Classify each field: can it be eventual, or does it need linearizable transactions?
- Pick CRDT types per field (counter, set, register, text) instead of one global LWW document.
- Use a maintained library for text/list CRDTs; do not hand-roll RGA.
- Persist encoded state locally (IndexedDB, SQLite) before acknowledging user saves.
- Design sync as idempotent state or op exchange with version vectors.
- Plan tombstone compaction, snapshots, and max document size guards.
- Instrument merge latency, replica size, and sync batch bytes per session.
- Document which conflicts the CRDT resolves vs which need application-level review.
- Load-test partition healing (flapping network, duplicate messages, slow peers).
- Pair CRDT replicas with backup and disaster recovery for authoritative snapshots.
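Two of the checklist items, persisting before acknowledging and guarding document size, can be sketched with Python's built-in `sqlite3` module. The table layout, the `MAX_DOC_BYTES` value (chosen to echo the 2 MB hot-replica target above), and the function names are assumptions; the encoded state is treated as opaque bytes from whatever CRDT library is in use.

```python
import sqlite3
from typing import Optional

MAX_DOC_BYTES = 2 * 1024 * 1024  # assumed guard, echoing the 2 MB hot-replica target

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS docs (id TEXT PRIMARY KEY, state BLOB NOT NULL)")
    return conn

def save_state(conn: sqlite3.Connection, doc_id: str, encoded: bytes) -> bool:
    """Persist the encoded replica, then return True so the caller can acknowledge."""
    if len(encoded) > MAX_DOC_BYTES:
        raise ValueError("document exceeds size guard; compact or snapshot first")
    with conn:  # commits on success, rolls back if the write raises
        conn.execute(
            "INSERT OR REPLACE INTO docs (id, state) VALUES (?, ?)",
            (doc_id, encoded),
        )
    return True

def load_state(conn: sqlite3.Connection, doc_id: str) -> Optional[bytes]:
    row = conn.execute("SELECT state FROM docs WHERE id = ?", (doc_id,)).fetchone()
    return row[0] if row else None

store = open_store()
acknowledged = save_state(store, "notes-1", b"encoded-crdt-state")
print(acknowledged, load_state(store, "notes-1"))  # True b'encoded-crdt-state'
```

The UI should show "saved" only after `save_state` returns, so a crash between the edit and the write never leaves the user believing an unpersisted change is safe.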
Key takeaways
- CRDTs merge concurrent edits with mathematical convergence guarantees — no central lock required.
- State-based CRDTs idempotently merge full states; operation-based ones need reliable, often causal, delivery.
- Pick the type per use case — counters, OR-Sets, LWW-Registers, and text sequences behave differently.
- CRDTs solve replication mechanics, not business semantics or global invariants.
- Plan for tombstones and compaction from day one on text and list CRDTs.
Related reading
- CAP theorem and consistency models explained — partition tolerance and where eventual consistency fits
- Distributed systems consistency explained — strong vs eventual guarantees and quorum patterns
- Event sourcing explained — append-only logs as an alternative when audit and replay matter more than peer edits
- Change data capture (CDC) explained — streaming primary-database changes to downstream replicas