Explainer · 7 June 2026

How skip lists and probabilistic search structures work

A sorted linked list is simple to implement but painfully slow to search — every lookup walks node by node. A skip list adds extra forward pointers at random heights so you can leap across large gaps, turning average search into O(log n) without the rotation logic of red-black or AVL trees. William Pugh introduced the structure in 1990; today it powers Redis sorted sets, parts of LevelDB memtables, and many concurrent in-memory indexes.

The express-lane intuition

Picture a sorted singly linked list from 1 to 16. To find 14 you compare at every step — up to 16 hops. Now add a second level: every other node also points forward two steps. Search starts at the top level, moves right while the next key is still less than the target, then drops down a level and repeats. You skim long runs on the express lane and finish the last few steps on the local lane.

A real skip list generalizes this with many levels. Each inserted node flips a coin to decide its tower height: level 0 always exists (the base list); each subsequent level has probability p (commonly 0.5 or 0.25). Tall towers are rare; most nodes sit on one or two levels. Expected pointer count per node is 1/(1-p), so memory overhead stays modest.

Unlike B-tree indexes that pack keys into fixed-size pages, skip lists are purely pointer-based. That makes them attractive when you want ordered iteration and simple insert/delete without page splits — at the cost of worse cache locality than a well-packed tree stored in a single array.

Search, insert, and delete

Search maintains a cursor at each level from top to bottom. At level L, walk forward while next.key < target. When the next key would overshoot, drop to level L-1 and continue. When you reach level 0, either you are on the node or the key is absent. Expected comparisons: O(log_{1/p} n).

Insert reuses the search path: the array of predecessors at each level tells you exactly where to splice the new node. Generate a random height h, then link the new node between pred[i] and its former successor for all levels i <= h. If h exceeds the current list height, grow the head tower. No rebalancing rotations — the random heights statistically spread towers so express lanes stay useful.

Delete is symmetric: find the node, unlink it from every level it occupies, and shrink the global height if the top level empties. All three operations are expected O(log n); worst-case O(n) is theoretically possible if randomness is adversarial, which is why security-sensitive code uses a cryptographic RNG for heights or switches to deterministic balanced trees.

Why probability instead of balance rules

Red-black trees enforce invariants with recoloring and rotations after every insert. Skip lists trade deterministic balance for expected balance: with probability 0.5, about half of nodes appear at level 1, a quarter at level 2, and so on — mimicking a perfectly balanced binary tree's shape on average.

The analysis is cleaner than tree amortized proofs. Tower height is geometrically distributed; search path length is the sum of independent hops. That simplicity is why skip lists appear in textbooks right after hash tables: they teach how randomization buys structure without complex invariants.

The trade-off is memory fragmentation and pointer chasing. Each node stores a variable-length array of forward pointers. A B-tree node packs many keys per cache line; a skip list node may occupy several cache lines for a tall tower. For disk-backed storage, B-trees and LSM trees win on sequential I/O; skip lists shine in RAM-heavy workloads where code simplicity and ordered scans matter more than raw bandwidth.

Where skip lists show up in production

Redis sorted sets (ZSET) combine a hash table for member lookup with a skip list keyed by score for range queries (ZRANGEBYSCORE, rank operations). The dual structure gives O(1) member fetch and O(log n) ordered access — something a plain hash map cannot offer.

LevelDB and RocksDB memtables keep freshly written keys in a sorted in-memory structure before flushing to immutable SSTables. Early LevelDB memtables used skip lists for their insert-heavy, iterator-friendly profile. Once data lands on disk, the LSM compaction machinery takes over with B-tree-like SSTable blocks.

Concurrent skip lists (Java ConcurrentSkipListMap, some lock-free C++ libraries) use marker nodes or careful CAS loops on individual forward pointers so readers rarely block writers. They are a middle ground between a coarse global lock on a tree and a fully lock-free hash table that cannot maintain order.

Priority schedulers and leaderboards need "top K by score" and range eviction. Skip lists expose O(log n + k) range iteration — the same API shape as an ordered map but with less implementation surface than a tree with parent pointers.

Choosing skip lists vs alternatives

Use a hash map when you only need key lookup with no ordering. Use a B-tree or B+ tree when data lives on disk, you need predictable worst-case latency, or the dataset is huge and cache-friendly packing matters. Use a skip list when you want sorted iteration, range queries, and relatively simple concurrent inserts in memory — and you accept probabilistic bounds.

Tuning parameter p affects the constant factors: higher p means more pointers per node but shorter expected search paths. p = 0.25 is common in Redis-like implementations; p = 0.5 matches textbook diagrams. Measure for your access pattern; micro-benchmarks lie less than Big-O alone.

For duplicate scores or composite keys, store a tie-breaker (member id, insertion sequence) in the comparison function so the ordering is total. Partial orders break skip-list assumptions the same way they break any sorted container.

Implementation pitfalls

  1. Use a seeded RNG with good distribution; rand() % 2 in a tight loop can bias towers on some platforms.
  2. Cap maximum height at log_{1/p}(n) + constant to bound pointer arrays — Redis caps at 32 levels.
  3. On delete, free tower arrays carefully; dangling pointers at higher levels cause subtle use-after-free bugs.
  4. Iterators must tolerate concurrent inserts (weak consistency) or hold locks — document which you provide.
  5. For on-disk persistence, serialize nodes in key order so rebuild does not require re-randomizing heights (optional determinism).

Blockchains and Solana programs rarely embed skip lists on-chain — account size and compute budgets favor flat vectors or program-specific maps — but off-chain indexers ranking NFTs by price, parsing order-book deltas, or maintaining leaderboards for arcade games use exactly this pattern in application memory.

Related on Solana Garden: Hash tables and hash maps explained, Database indexing explained, LSM trees explained, Bloom filters explained, Explainers hub.