How Garbage Collection Algorithms Work

Why automatic reclamation exists

In languages like C and Rust (without a GC), the programmer calls free or relies on ownership rules to release memory. That is precise and predictable, but easy to get wrong: use-after-free, double-free, and memory leaks from forgotten pointers are decades-old bug classes.

Managed runtimes trade manual control for safety. You allocate freely; the collector finds unreachable objects and returns their space to a free list. The cost shows up elsewhere — CPU cycles during collection, extra memory reserved as a buffer, and occasional stop-the-world pauses when the collector must scan stacks and registers without the mutator threads moving pointers underneath it.

Understanding GC is not academic. A web API that allocates a new string on every request may trigger minor collections thousands of times per second. A real-time WebSocket feed can miss delivery deadlines if a major collection pauses all threads at the wrong moment. Backend engineers tune heap sizes, choose collector algorithms, and design object lifetimes with the same care they apply to database indexing or rate limiting.

Reachability: what "garbage" actually means

GC does not ask "was this variable ever used again?" It asks a graph question: starting from roots — thread stacks, CPU registers, static globals, JNI handles, interned strings — can we follow pointers and reach this object? If not, the object is garbage regardless of whether your code still holds a stale reference in a forgotten map entry.

This reachability model is why circular linked structures can be collected even when every node points to another node in the ring, as long as nothing outside the ring points in. It is also why memory leaks in GC languages still happen: if a long-lived root (a static cache, a closure, an event listener) keeps a subgraph reachable, the collector correctly preserves it forever.

The heap is therefore a directed graph of objects and references. Collection algorithms differ in how they walk that graph, when they walk it, and whether they move objects to compact memory or leave them in place.

Reference counting: simple, but incomplete alone

The oldest automatic scheme increments a counter on every new reference and decrements on every drop. When the count hits zero, memory is reclaimed immediately — no global pause, predictable latency.

The fatal flaw is cycles. Two objects referencing each other never reach zero even when the rest of the program forgets them. Python combines reference counting (for prompt cleanup of acyclic objects) with a periodic cycle-detecting collector. Swift and Objective-C use autorelease pools and weak references to mitigate retention graphs. Pure reference counting without a cycle pass leaks cyclic structures silently.

Reference counting also pays atomic increment/decrement cost on every pointer assignment in multithreaded code. That is why most high-throughput server runtimes prefer tracing collectors for the main heap, sometimes using reference counting only for specialized subsystems (COM in Windows, some native interop layers).

Mark-sweep: the tracing baseline

Mark-sweep is the canonical tracing algorithm, dating to the 1960s. It runs in two phases:

Mark — Starting from roots, depth-first or breadth-first traverse the object graph. Set a mark bit on every reachable object.
Sweep — Walk the entire heap. Unmarked objects are dead; return their memory to a free list. Clear mark bits on survivors for the next cycle.

Mark-sweep is simple and does not move objects, so pointer addresses stay stable — important for C extensions and memory-mapped I/O. The downsides are well known: the sweep pass must touch every heap slot even if almost nothing died (cost scales with heap size, not garbage volume), and free memory fragments into holes that may be too small for the next large allocation without a separate compaction pass.

Modern collectors rarely use naive mark-sweep alone on the full heap. They layer generational partitioning, concurrent marking, and compaction on top. But every advanced collector still contains a mark phase at its core.

Copying collectors and the generational hypothesis

Copying GC divides memory into two semispaces. Allocation happens in one; when it fills, live objects are copied to the other semispace and pointers are updated. The old semispace is discarded wholesale — collection and compaction happen in one pass. The cost is halving usable memory and rewriting every outgoing pointer during copy.

The generational hypothesis, observed empirically across decades of programs, states that most objects die young. A newborn temporary string or HTTP response buffer is unlikely to survive the next few collections. Runtimes exploit this by splitting the heap:

Young generation (nursery) — small, collected frequently with a fast copying collector. Minor GC pauses are short because the nursery is tiny.
Old generation (tenured) — objects that survive enough minor collections are promoted here and collected rarely with a heavier mark-sweep-compact or concurrent algorithm.

Java's G1 and ZGC, .NET's generational GC, and JavaScript's V8 all use this split. Promotion has a cost — copying live data out of the nursery — but it concentrates expensive full-heap work on the small fraction of objects that actually live long. Tuning -XX:NewSize or Node's --max-old-space-size is really tuning how much short-lived garbage you absorb cheaply before paying for a major collection.

Concurrent and incremental GC: chasing low pause times

Stop-the-world mark-sweep on a 32 GB heap can pause every application thread for hundreds of milliseconds — unacceptable for interactive UIs and tight SLAs. Concurrent collectors run marking (and sometimes sweeping) alongside mutator threads, using only brief safepoints for root scanning.

The hard problem is the mutator can store a pointer to a new object into an already-scanned old object while the marker is mid-traversal, hiding live data from the collector (a lost object bug). The standard fix is tri-color marking with a write barrier:

White — not yet visited (candidate garbage).
Gray — visited, but its outgoing references not yet scanned.
Black — visited and fully scanned (no white children).

The invariant: no black object may point directly to a white object. When the mutator writes a white reference into a black object, the write barrier shades the referenced object gray (or logs the edge for the marker to revisit). Incremental collectors slice marking into small time quanta between mutator work, trading slightly more total CPU for smoother latency.

Go 1.5+ uses a concurrent mark-sweep collector with write barriers. Java ZGC and Shenandoah target sub-millisecond pauses on large heaps via colored pointers and concurrent relocation. JavaScript V8 combines generational scavenging in the young generation with incremental marking in the old generation, which is why long-running Node processes occasionally show small but frequent GC activity in profilers rather than one giant pause.

Trade-offs: throughput, latency, and memory

No collector optimizes everything. The design space has three competing axes:

Throughput — total work completed per CPU second. Batch jobs and ETL pipelines prefer collectors that pause rarely and scan fast, accepting occasional long stops.
Latency (pause time) — worst-case or p99 stop duration. Trading apps, games, and voice/video stacks need concurrent or incremental collectors even if they use 10–30% more CPU overall.
Memory overhead — copying and generational schemes need headroom (semispaces, remembered sets, card tables for cross-generational pointers). Running with heap nearly full triggers constant collection — thrashing that looks like a memory leak in metrics but is really insufficient reservation.

Compaction fights fragmentation by sliding live objects together, but moving objects requires updating every pointer that references them — fine inside a managed runtime, painful at JNI boundaries. Mark-sweep without compaction is simpler but fragments over months of uptime.

Choosing a collector is an engineering decision, not a moral one. Rust and C++ avoid GC pauses by pushing lifetime discipline to the compiler. Go picks a concurrent collector because goroutine-heavy servers cannot tolerate long stops. Python and Ruby accept global interpreter lock interactions with their collectors. The right answer depends on your allocation rate, object lifetime distribution, and whether a 50 ms pause breaks user trust.

Practical signals that GC is your bottleneck

Before rewriting algorithms, confirm GC is the culprit. Useful signals:

CPU profiles showing high time in GC, scavenge, or runtime allocation functions.
Latency spikes aligned with collection logs (GC pause lines in Go, gc events in Chrome DevTools, JVM GC logs).
Heap size climbing to the cap then flatlining — the collector is keeping up but barely; any traffic spike triggers thrash.
Excessive short-lived allocations — parsing JSON into new objects per request, string concatenation in loops, boxing primitives in hot paths.

Mitigations follow directly from the algorithms above: reuse buffers and object pools for hot paths, prefer value types or stack allocation where the language allows, increase heap headroom if you are memory-rich and CPU-poor, switch to a low-latency collector (G1 → ZGC, Node flags for incremental marking), or reduce allocation rate so minor collections stay cheap. Sometimes the win is structural — streaming parsers instead of loading entire payloads into heap strings — the same instinct that drives compact structures like Bloom filters instead of giant hash tables.

How garbage collection algorithms work