Explainer · 7 June 2026

How ACID database transactions work

Move $100 from Alice to Bob and two things must never happen: the money disappears, or it exists in both accounts at once. A relational database solves that with transactions — grouped operations that obey ACID: Atomicity, Consistency, Isolation, and Durability. Those four properties are why banks, ledgers, and inventory systems trust Postgres and MySQL more than a flat file — and why getting isolation wrong still produces ghost reads and double spends under load.

The four properties, in plain language

Atomicity means all-or-nothing. Either every statement in BEGIN … COMMIT succeeds, or the database rolls back to the state before the transaction started. If the debit from Alice succeeds but the credit to Bob fails on a disk error, atomicity undoes the debit — no partial transfer.

Consistency means invariants hold. The database enforces rules you declare: foreign keys, CHECK constraints, unique indexes, and triggers. A transaction cannot leave the schema in an illegal state — you cannot insert an order line for a product ID that does not exist. Note that "consistency" in ACID is narrower than in distributed systems; for partition trade-offs see our CAP theorem explainer.

Isolation means concurrent transactions do not step on each other in ways your application cannot reason about. Without isolation, one user's report might read half-updated rows while another user's transfer is mid-flight. Isolation is implemented with locks, multi-version concurrency control (MVCC), or both — and it is the property engineers tune most often.

Durability means committed data survives crashes. After COMMIT returns, a power failure must not erase the transfer. Engines achieve durability by writing to a write-ahead log (WAL) on disk before acknowledging commit, then flushing dirty pages asynchronously.

What happens inside BEGIN … COMMIT

When your application opens a transaction, the database assigns it a transaction ID and a snapshot of visible row versions (in MVCC systems like Postgres). Each UPDATE does not overwrite data in place immediately — it creates a new row version and marks the old one dead for that snapshot. Readers see consistent snapshots; writers coordinate through row-level locks or version checks.

On COMMIT, the engine appends a commit record to the WAL and waits until that log record is durable (fsync or equivalent on the storage device). Only then does it tell the client success. The actual data pages may sit in memory for seconds; recovery after crash replays the WAL to reconstruct committed work. That separation — log first, pages later — is why SSD write latency dominates commit time on small transactions.

On ROLLBACK or failure, the transaction's writes are discarded. MVCC systems leave dead tuples for a vacuum process to reclaim; lock-based systems release held locks. Long-running transactions are expensive because they pin old row versions and block vacuum — a common production footgun when someone leaves a reporting query open in the same session as OLTP traffic.

Isolation levels and the anomalies they prevent

SQL defines isolation levels by which phenomena they forbid. From weakest to strongest:

READ UNCOMMITTED — allows dirty reads (seeing uncommitted data). Rare in practice; most engines treat it as READ COMMITTED.
READ COMMITTED — default in Postgres and Oracle. Each statement sees only committed rows at statement start. Prevents dirty reads but allows non-repeatable reads (same query twice returns different rows) and phantom reads (new rows appear in a range scan).
REPEATABLE READ — default in MySQL InnoDB. The snapshot is fixed for the whole transaction. Repeatable reads of existing rows, but phantoms may still appear in some engines unless next-key locking is used.
SERIALIZABLE — strongest guarantee: concurrent transactions behave as if they ran one after another. Implemented with strict locking or serializable snapshot isolation (SSI) that detects read/write conflicts and aborts one transaction.

Higher isolation reduces anomalies but increases lock contention, abort rate, and tail latency. Payment ledgers often use READ COMMITTED with explicit SELECT … FOR UPDATE on balance rows — pessimistic locking on the hot path. Analytics dashboards use READ COMMITTED or a read replica with snapshot isolation so they never block writers. Choosing a level is a product decision: would you rather retry a failed checkout or show a user a balance that flickers?

Deadlocks, lock waits, and optimistic concurrency

When transaction A locks row 1 then waits for row 2, while B locks row 2 then waits for row 1, the database detects a deadlock and aborts one victim (usually the younger transaction). Your application must retry — idempotent handlers and clear error codes matter here, the same discipline you apply when building webhook consumers that may receive duplicates.

Optimistic concurrency avoids long-held locks: read a row with a version column, compute the new state in application memory, then UPDATE … WHERE id = ? AND version = ?. If zero rows update, someone else won — retry with fresh data. This scales well when conflicts are rare (profile edits, cart updates) but fails under hot inventory counters unless you shard or use database-native atomic increments.

Indexes change lock granularity. A missing index on a foreign key can escalate a single-row update into a table scan that locks thousands of rows — the kind of surprise that shows up only at Black Friday traffic. That is one reason database indexing and transaction design belong in the same design review.

Durability mechanics: WAL, checkpoints, and replicas

The write-ahead log is an append-only sequence of physical or logical redo records. Before a page change is considered committed, its log entry must reach stable storage. Checkpoints periodically flush dirty buffers and record a recovery starting point so crash recovery does not replay from genesis.

Synchronous replication extends durability across machines: commit waits until a standby has persisted the WAL. You survive primary disk failure without losing acknowledged writes — at the cost of cross-region latency. Asynchronous replication acknowledges faster but can lose the last few seconds of commits if the primary dies — acceptable for some caches, unacceptable for money.

Object storage and managed databases expose durability as SLA tiers ("single AZ" vs "multi-AZ"). Read the fine print: durability is about survival after failure, not backup — you still need point-in-time recovery for operator error and ransomware.

Beyond one database: distributed transactions

ACID across two independent databases (order service on Postgres, inventory on MySQL) is hard. Classic two-phase commit (2PC) prepares both sides, then commits both — but if the coordinator crashes between phases, participants hold locks until timeout. That is why microservices favor sagas: a sequence of local transactions with compensating actions (cancel shipment if payment later fails) instead of one global lock.

Blockchains are a different durability model: consensus replaces the WAL, and "commit" means sufficient confirmations on a replicated ledger. Application code that verifies on-chain payments before crediting a SQL balance is implementing cross-system consistency without 2PC — eventual alignment with explicit reconciliation and idempotent credit handlers.

When you relax ACID on purpose — Cassandra tunable consistency, Dynamo-style last-write-wins — you gain write throughput and partition tolerance at the cost of application complexity. Document which invariants your code re-implements and which you truly give up.

Practical checklist for production transactions

Keep transactions short. No network calls inside BEGIN … COMMIT.
Access rows in a consistent order across code paths to reduce deadlocks.
Use the weakest isolation level that preserves your invariants; escalate deliberately with FOR UPDATE or SERIALIZABLE only where needed.
Make retries safe: unique constraints, idempotency keys, and clear mapping from serialization failure to HTTP 409 — patterns familiar from rate-limited APIs that also shed load under contention.
Monitor lock wait time, deadlock rate, long-running transactions, and replication lag — not just query latency.
Test crash recovery: kill -9 the database mid-commit in staging and verify no torn transfers.