Explainer · 7 June 2026

How cache replacement policies work

Caches exist because fast storage is scarce. Your CPU keeps recently used instructions in L1; the operating system pins hot pages in RAM; Redis holds session tokens in memory; a CDN edge stores popular assets closer to users. Every layer faces the same decision when a new item arrives and no slot is free: which resident entry should be discarded? That decision rule is the replacement policy (also called an eviction policy). A good policy maximizes cache hits — lookups that find data already present — and minimizes expensive misses to slower tiers. A bad policy evicts exactly what you need next. This explainer walks through the classic algorithms, how hardware and software approximate them, and how to match a policy to your access pattern.

Hits, misses, and the working set

A cache hit returns data without touching a slower backing store (DRAM, disk, origin server, database). A cache miss fetches from that slower tier and usually inserts the result into the cache, possibly evicting something else. Performance is dominated by the hit ratio and the cost gap between tiers — an L1 miss that hits L2 is cheap; a page fault that reads from NVMe is expensive.

The set of addresses a program touches repeatedly over a window is its working set. If the working set fits entirely in the cache, hit ratio approaches 100%. Real workloads exceed cache capacity, so replacement policy determines which part of the working set stays resident. Policies assume temporal locality (recently used data will be used again soon) and spatial locality (nearby addresses tend to be accessed together). Most algorithms exploit temporal locality; line size and prefetch exploit spatial locality.

FIFO and random: simple baselines

First-in, first-out (FIFO) evicts the oldest inserted entry regardless of use. Implementation is a queue — O(1) insert and evict. FIFO is easy but often wrong: an item inserted early may still be hot, while a recently inserted cold item survives because it has not reached the queue head yet. FIFO can suffer from Belady's anomaly: adding cache capacity sometimes increases miss rate under certain access patterns, because insertion order does not track usefulness.

Random replacement picks a victim uniformly. Surprisingly competitive in some hardware caches because it avoids pathological orderings and needs no metadata per line. It ignores locality entirely — fine when access is already uniform, poor for skewed hot sets.

LRU: recency as a proxy for future use

Least recently used (LRU) evicts the entry whose last access is oldest. The intuition: if you used it recently, you will probably use it again. Exact LRU on N entries requires tracking access order — traditionally a doubly linked list plus hash map for O(1) touch and evict. Every read or write moves the entry to the "most recent" side.

Exact LRU is common in software caches (in-process memoization, some ORM second-level caches) but expensive at scale. CPU L1/L2 caches and many database buffer pools use LRU approximations instead:

Clock (second-chance) — entries sit in a circular buffer with a reference bit. The hand sweeps; cleared bits mean "recently used, give another chance"; set bits get cleared and the entry survives one more lap. Linux's active/inactive page lists are a multi-generational variant.
NRU (not recently used) — OS sets access bits on page table entries; periodic scans approximate LRU without moving pages.
Segmented LRU (SLRU) — probationary and protected segments; new entries start in probation and promote on second hit.

LRU excels on recency-heavy workloads (session data, repeated API reads) but fails on scanning patterns — a one-pass sequential read fills the cache with cold lines that evict genuinely hot data. One-time full-table scans are a classic LRU killer in database buffer pools.

LFU: frequency over recency

Least frequently used (LFU) evicts the entry accessed the fewest times (ties broken by age). LFU protects long-lived hot keys that LRU might drop during a brief burst of unrelated traffic — a configuration blob read thousands of times per hour but not in the last millisecond.

Naive LFU keeps counters forever, so entries touched once at startup never leave. Production systems add aging or decay: counters halve every window, or a window LFU (W-TinyLFU in Caffeine) tracks frequency in a compact sketch and admits new entries only if they are hotter than the current minimum. Redis 4.0+ offers volatile-lfu and allkeys-lfu alongside LRU variants when maxmemory is set.

LFU shines when a small set of keys dominates traffic (popular product IDs, leaderboards). It struggles when "frequency" and "recency" diverge — a key accessed heavily yesterday but never today still ranks high until counters decay.

ARC and adaptive policies

Adaptive Replacement Cache (ARC), developed at IBM Research, maintains two LRU lists — one for recently seen items (T1, cache hits) and one for items seen only once (T2, ghost history from B1/B2 metadata lists that store keys without values). A target parameter p shifts capacity between T1 and T2 based on whether recent evictions from B1 or B2 would have been hits. ARC self-tunes between recency- and frequency-favored behavior without manual tuning.

Variants like CAR (clock with ARC-style adaptation) reduce metadata overhead. Managed caches (Memcached does not ship ARC natively; Caffeine and some storage engines do) benefit when workloads mix scans, hot sets, and one-hit wonders. The cost is bookkeeping — ARC is more complex than pure LRU and harder to reason about in incident postmortems.

Where replacement runs in the stack

The same problem appears at every level, with different constraints:

CPU caches — fixed associativity (4-way, 8-way sets); random or pseudo-LRU within each set; evictions are invisible to software except through performance counters.
OS page cache — reclaims clean pages cheaply; dirty pages must be written back. Pressure triggers the kernel's kswapd reclaim path; policy interacts with swappiness and cgroup memory limits.
Application caches — Redis, Memcached, in-process Guava/Caffeine maps; explicit TTLs often combine with eviction (expire stale keys and evict when full).
CDN edge caches — LRU or LFU at the PoP plus HTTP Cache-Control TTLs; origin shield layers add another tier.
Database buffer pool — InnoDB and PostgreSQL use LRU variants with mid-point insertion (new pages start in the middle, not the head) to protect from sequential scans.

TTL expiration is related but distinct: it removes entries by age regardless of popularity. Many production caches use TTL + size-bound eviction — TTL handles freshness, LRU/LFU handles capacity when inserts outpace expirations.

Thrashing and when policy is not enough

If the working set exceeds cache size sustainedly, no policy saves you — miss rate stays high and the system spends more time moving data than using it. In virtual memory this is thrashing: constant page faults and disk I/O with little forward progress. The fix is reducing memory footprint, adding RAM, or changing the algorithm (smaller pages, better data structures), not switching from LRU to LFU.

Watch for one-hit wonders polluting large caches (CDN caching every unique query string) and for lock contention on global LRU lists in multi-threaded servers — sharded caches (one LRU per CPU or hash bucket) trade perfect global optimality for scalability.

Choosing a policy in practice

Skewed hot set, stable popularity — LFU or W-TinyLFU (Caffeine, Redis LFU).
Recency-driven sessions and API responses — LRU or segmented LRU.
Mixed unknown workload — ARC or adaptive clock; measure hit ratio under production traffic.
Hardware-like simplicity — random or FIFO only when metadata cost exceeds benefit.
Scan-heavy batch jobs — bypass cache for the scan (Postgres seq_scan hints, dedicated read replicas) rather than tuning eviction alone.

Always measure with representative traffic. Micro-benchmarks that fit in L3 cache lie. Track hit ratio, evicted-bytes, and tail latency when maxmemory or cgroup limits approach saturation.

Common pitfalls

Assuming LRU is always optimal — optimal replacement (Belady's MIN) needs future knowledge; LRU is a heuristic.
Caching unbounded key cardinality — unique URLs or user IDs evict the entire hot set on first access each.
Ignoring stampede on eviction — mass expiry or flush triggers thundering herd to origin; use singleflight or stale-while-revalidate.
Confusing TTL with LRU — a key can be LRU-eligible but not yet expired, or expired but still occupying memory until accessed.
Global LRU under contention — one lock on a million-entry list becomes the bottleneck before memory does.

Practical checklist

Characterize access pattern: recency-heavy, frequency-heavy, or scan-dominated.
Set explicit cache size limits (maxmemory, JVM heap, CDN quota).
Pick eviction policy to match pattern; prefer adaptive libraries when unsure.
Combine TTL for freshness with size eviction for capacity.
Monitor hit ratio and eviction rate; alert before sustained miss storms.
Isolate scan workloads from interactive caches where possible.