Guide

Database replication explained: read replicas, WAL shipping, lag and failover

Database replication keeps one or more copies of your data in sync with a primary database. Teams adopt it for two reasons that often overlap: scaling read-heavy workloads onto read replicas, and surviving hardware failure through standby promotion. Unlike sharding, which splits rows across machines to scale writes, replication duplicates the full dataset so every replica can answer the same queries — at the cost of propagation delay, consistency trade-offs, and operational complexity during failover. This guide walks through synchronous vs asynchronous modes, how PostgreSQL and MySQL ship changes, replication lag and read-your-writes pitfalls, routing patterns, and what to verify before you trust a replica with production traffic.

Primary, replica, and standby: the vocabulary

In the most common topology, one node is the primary (also called leader or master). It accepts all writes. One or more replicas (followers, secondaries, slaves in older docs) receive a stream of changes and apply them locally. A standby is a replica kept warm for failover — often identical in configuration but not serving user reads until promotion.

Reads can be directed to replicas to offload the primary. Writes always go to the primary (in single-primary setups). The replica is eventually consistent: it reflects the primary’s state as of some point in the past, minus replication lag. How far in the past depends on network speed, replica CPU, and whether replication is synchronous or asynchronous.

Managed cloud databases (RDS, Cloud SQL, Aurora, AlloyDB) hide much of this wiring but expose the same concepts: a writer endpoint, reader endpoints, automatic failover, and lag metrics in the console.

How changes propagate: WAL, binlog, and logical replication

Relational databases record every durable change in an append-only log before acknowledging the transaction. PostgreSQL uses the Write-Ahead Log (WAL); MySQL uses the binary log (binlog). Replication ships these log records to replicas, which replay them in order.

Physical (streaming) replication

Physical replication copies low-level page or block changes. PostgreSQL streaming replication sends WAL segments over the network; the standby applies them byte-for-byte. The replica is a near-exact clone — same schema, same extensions, same major version. You cannot subscribe to a subset of tables with physical replication alone.

Logical replication

Logical replication decodes WAL/binlog into row-level change events (INSERT, UPDATE, DELETE) and publishes them to subscribers. PostgreSQL logical replication and MySQL row-based binlog replication support table-level filtering, cross-version upgrades, and feeding data warehouses. Throughput is lower than physical streaming because decoding has CPU overhead, but flexibility is higher.

Snapshot and base backup

A new replica cannot start from an empty disk. You seed it with a base backup (pg_basebackup, mysqldump for small DBs, or storage snapshots) taken at a consistent point, then catch up by replaying subsequent log segments. Cloud providers automate this; self-hosted setups need runbooks for re-seeding a replica that falls too far behind or corrupts.

Synchronous vs asynchronous replication

The durability vs latency trade-off is the central design choice in replication architecture. It maps directly to consistency models in distributed systems.

Mode Commit behavior Data loss risk Write latency
Asynchronous Primary acks when local WAL is fsynced; replica catches up later Uncommitted replica data lost if primary dies before shipping Lowest — no cross-network round trip on commit
Synchronous Primary waits until at least one replica confirms receipt (or apply) Near-zero RPO if sync quorum survives Higher — bounded by replica RTT and apply speed
Quorum / semi-sync Wait for N of M replicas (e.g. Raft in CockroachDB, Patroni) Configurable via quorum size Moderate; tunable

Most OLTP apps default to async replication for read scaling and disaster recovery with acceptable risk. Financial ledgers, inventory reservation, and anything where losing the last second of writes is unacceptable push toward sync or quorum replication — often at the cost of write throughput and availability during network partitions (the CAP trade-off in practice).

PostgreSQL offers synchronous_commit levels from off through remote_apply. MySQL semi-sync waits for binlog receipt, not necessarily apply. Read the fine print: “synchronous” in marketing often means “less async,” not linearizable across all readers.

Replication lag: causes, measurement, and user-visible bugs

Replication lag is the time (or byte offset) between a commit on the primary and the same change being visible on a replica. Healthy async setups run sub-second lag; heavy write bursts, long-running queries on replicas, network congestion, or under-provisioned replica hardware can push lag to seconds or minutes.

Monitor lag continuously:

  • PostgreSQL: pg_stat_replication, replay_lag, write_lag
  • MySQL: Seconds_Behind_Master (approximate; know its caveats)
  • Managed DB: cloud metrics dashboards and alerts on lag thresholds

Lag creates subtle bugs when apps route reads to replicas:

  • Read-your-writes violation: user saves profile, page reloads from replica, sees stale data.
  • Duplicate detection races: uniqueness check on replica misses row still in flight.
  • Session stickiness gaps: load balancer sends next request to a different, laggier replica.

Mitigations include routing a user’s reads to the primary for a short window after writes (sticky session), using lag-aware load balancers (PgBouncer, ProxySQL, RDS Proxy), or accepting eventual consistency in the product copy. Critical read-after-write paths should not use replicas without an explicit consistency strategy.

Read/write splitting and connection routing

The standard scaling pattern: all writes to primary, analytical and list queries to replicas. Implementation layers differ:

  • Application-level: ORM or repository chooses connection by query type. Explicit and testable; every developer must remember the rules.
  • Proxy layer: Pgpool-II, HAProxy, ProxySQL, or cloud RDS Proxy parse or tag queries and route automatically.
  • Driver middleware: some drivers accept read vs write connection hints.

Pair read replicas with connection pooling. Replicas multiply connection counts; without pooling, you exhaust max_connections fast. Pool per replica, not one giant pool fanning out randomly.

Replicas excel at report queries, search facets, and dashboards that tolerate seconds of staleness. They do not scale write throughput — every write still hits the primary and is replayed to every replica. When write volume exceeds one machine, you need sharding or vertical partitioning, not more replicas.

Failover, promotion, and split-brain

When the primary fails, a replica must be promoted to accept writes. Manual promotion is simple but slow (human on call, DNS update, app config change). Automated failover tools — Patroni, repmgr, Orchestrator, cloud managed failover — elect a new primary and reconfigure peers.

Recovery Point Objective (RPO) and Recovery Time Objective (RTO)

RPO is how much data you can lose (async lag at failure time). RTO is how long until service resumes. Sync replication minimizes RPO; rehearsed runbooks and health checks minimize RTO. Test failover quarterly; untested failover is fiction.

Split-brain

Split-brain occurs when two nodes both believe they are primary — often after a network partition. Both accept writes; data diverges; reconciliation is painful. Prevention requires an external coordination service (etcd, ZooKeeper, Consul) or cloud control plane that grants exactly one write lease. Never promote a replica while the old primary might still be alive without fencing (STONITH: shoot the other node in the head).

Rewinding and rejoining

After failover, the old primary (if it recovers) must not rejoin as primary. PostgreSQL pg_rewind can resync a demoted node to the new timeline; otherwise rebuild from backup. Document this before you need it at 3 a.m.

Multi-primary (active-active) replication

Multi-primary or active-active setups allow writes on multiple nodes. MySQL Group Replication, Galera Cluster, and CockroachDB’s distributed SQL offer this. Benefits: lower write latency for geo-distributed users, no single write choke point. Costs: conflict resolution (last-write-wins loses data), sequence/ID generation complexity, and harder transaction isolation semantics across nodes.

Most teams should stay on single-primary replication until geographic write latency is a measured bottleneck, not a hypothetical one. Active-active is an advanced mode with sharp edges.

Replication vs caching vs sharding: decision order

Scale in this order before reaching for harder tools:

  1. Indexes and query tuning on the primary.
  2. Read replicas for read-heavy, lag-tolerant workloads.
  3. Application caching (Redis) for hot keys and session data.
  4. Vertical scaling (bigger machine, faster disk).
  5. Sharding when write throughput or dataset size exceeds one node.

Replication pairs naturally with Redis caching: cache absorbs flash read traffic; replicas handle heavier analytical queries; primary stays focused on writes and strongly consistent reads.

Production checklist

  1. Define RPO/RTO targets; choose sync vs async accordingly.
  2. Monitor replication lag with alerts (not just dashboards).
  3. Document which queries may hit replicas vs must use primary.
  4. Implement read-your-writes for post-mutation user flows.
  5. Size replicas for peak replay throughput, not idle CPU.
  6. Use connection pooling per replica; cap total connections.
  7. Rehearse failover: promote, repoint apps, verify writes, rejoin old primary.
  8. Fence or isolate failed primaries to prevent split-brain.
  9. Encrypt replication traffic (TLS) on untrusted networks.
  10. Keep replica major version compatible; test logical replication for upgrades.
  11. Back up replicas independently only if you understand consistency points.
  12. Revisit sharding when write rate, not read rate, is the bottleneck.

Key takeaways

  • Replication copies data for read scaling and disaster recovery; it does not scale writes.
  • Async is default for latency; sync/quorum trades speed for durability.
  • Lag is inevitable in async mode — design routing and UX around it.
  • Failover needs fencing and rehearsal; split-brain corrupts data silently.
  • Replicas before shards — simpler, reversible, and often enough for years.

Related reading