Explainer · 7 June 2026

How mutexes, locks, and concurrency primitives work

Modern servers run dozens of threads on the same memory. Without coordination, two threads can read-modify-write the same variable in overlapping steps and silently lose updates. A mutex (mutual exclusion lock) is the simplest fix: only one thread may hold it at a time, so critical sections run as if they were atomic — even though the hardware offers no such guarantee by default.

Why shared memory needs explicit rules

Consider a bank account balance stored as an integer. Thread A reads 100, adds 50, writes 150. Thread B reads 100 (before A writes), subtracts 30, writes 70. The true balance should be 120; you get 70 or 150 depending on who wins the last write. This is a data race — undefined behavior in C/C++ and a source of Heisenbugs everywhere else.

The OS scheduler can preempt any thread between any two instructions. That is the point of preemptive multitasking explained in our CPU process scheduling guide — throughput improves, but shared-state invariants need protection. Locks, atomics, and message passing are the three main strategies; this article focuses on the first two because they appear in nearly every production codebase.

Databases face the same problem at a higher level: ACID transactions are essentially locks (or MVCC snapshots) around rows and pages. Understanding mutex semantics in application code makes database isolation levels less mysterious.

Mutexes: the baseline primitive

A mutex has two operations: lock() and unlock(). The first thread to lock succeeds immediately. A second thread blocks — the kernel parks it until the owner unlocks. The protected code between lock and unlock is the critical section; keep it short. Holding a mutex while waiting on network I/O serializes your entire thread pool and tanks throughput.

POSIX pthread_mutex_t, C++ std::mutex, Rust Mutex<T>, Java synchronized, and Go's sync.Mutex all implement the same contract with different ergonomics. Rust goes further: the type system refuses to access guarded data unless you hold the lock guard — a compile-time reminder that races are logic errors, not runtime surprises.

Reentrant (recursive) mutexes allow the same thread to lock twice without deadlocking itself. They are convenient for callback-heavy code but easy to misuse: they hide the fact that your "small" critical section secretly calls back into code that needs the same lock, ballooning contention. Prefer non-recursive mutexes and refactor call graphs when possible.

Spinlocks, RW locks, and semaphores

A spinlock busy-waits in a tight loop instead of yielding to the kernel. On multicore machines with very short critical sections (updating a few counters inside the kernel itself), spinning avoids the microseconds of scheduler overhead. On single-core or long sections, spinning wastes CPU and makes things worse. Rule of thumb: spinlocks belong in kernel and embedded code, rarely in application servers.

A read-write lock (RW lock) allows many concurrent readers OR one writer. It wins when reads dominate — caches, configuration snapshots, routing tables. Writers still exclude everyone. Upgrade from read lock to write lock is a classic deadlock footgun; many APIs forbid it or require releasing the read lock first.

A semaphore generalizes the idea to a counter of permits. Initialize with n; each acquire() decrements (blocking at zero), each release() increments. A mutex is a semaphore with n = 1. Semaphores model bounded queues ("at most 100 in-flight requests") and thread pools better than a bare mutex. Counting semaphores coordinate producer-consumer pipelines in event-driven architectures where backpressure matters.

Condition variables pair with mutexes: a thread waits on cond_wait until another thread signals that a predicate became true (queue non-empty, buffer has space). Always wait in a loop re-checking the predicate — spurious wakeups are allowed by the spec.

Atomics and lock-free structures

Hardware provides atomic read-modify-write instructions: compare-and-swap (CAS), fetch-and-add, and memory fences that order visibility across cores. C11/C++11 std::atomic, Java volatile plus java.util.concurrent.atomic, and Rust atomics let you increment a counter or publish a pointer without a mutex when the operation is a single machine word.

Lock-free algorithms guarantee system-wide progress: at least one thread completes its operation in finite steps even if others are stalled. Wait-free is stronger — every thread finishes in bounded steps. Lock-free queues and stacks built from CAS loops appear in high-frequency trading, game engines, and language runtimes. They are harder to prove correct than mutex-backed code; use them when profiling shows lock contention on a hot path, not by default.

Memory ordering matters: relaxed, acquire, release, and seq_cst control which writes other cores may see and when. Getting this wrong produces bugs that disappear under gdb. Start with seq_cst (the default in many APIs) until you understand the happens-before graph you need.

Deadlocks, livelocks, and ordering discipline

A deadlock needs four ingredients (Coffman conditions): mutual exclusion, hold-and-wait, no preemption, and circular wait. Break any one and deadlocks cannot form. Production code usually breaks circular wait with a global lock ordering: if you ever need mutex A and B, always acquire A before B. Document the order; enforce it in code review.

Try-lock patterns acquire with timeout or non-blocking mode and back off — useful when lock order cannot be total. Deadlock detection (wait-for graphs) appears in databases; application servers rarely implement it and instead prefer timeouts plus crash recovery.

Livelock is softer: threads keep yielding to each other without making progress (two people squeezing past in a hallway). Backoff with random jitter fixes many livelock cases. Starvation is when one thread never acquires the lock; fair mutexes (FIFO queue of waiters) cost throughput but bound worst-case wait time.

Alternatives: message passing and CRDTs

Locks are not the only answer. Actor models and channels (Erlang, Go goroutines with unbuffered channels) forbid shared mutable state by design — each worker owns its data; messages copy or move ownership. Fewer races, but more allocation and harder global invariants.

For replicated state across nodes, locks on one machine do not help. CRDTs merge concurrent updates without a central lock; databases use two-phase commit or optimistic concurrency instead. Pick the coordination layer that matches your failure model — local mutex, distributed transaction, or commutative merge.

Practical checklist for production code

Minimize critical section size — compute outside the lock, copy results in.
Never call foreign code (callbacks, plugins) while holding locks you do not control.
Prefer one lock per data structure over one giant global lock.
Use RW locks only when profiling shows read-heavy contention; they are slower than mutexes under write load.
Reach for atomics for counters and flags; reach for lock-free only after mutex profiling fails.
Establish and document lock ordering across modules before the deadlock appears in production at 3 a.m.

Solana programs on-chain have a different concurrency model — single-threaded per account with explicit account locks enforced by the runtime — but off-chain indexers, RPC proxies, and settlement services are ordinary multithreaded servers. The same mutex discipline applies there.