Guide

Consistent hashing explained

You run a distributed cache with four Redis nodes. Traffic is routed by hash(key) % 4. A fifth node joins the cluster for Black Friday capacity — and suddenly every key maps to a different node. Cache hit rates collapse, databases absorb a thundering herd, and your on-call engineer spends the holiday watching latency graphs climb. That is the cost of naive modulo hashing: it works until the denominator changes. Consistent hashing solves the remapping problem by placing both keys and servers on a circular hash space. When nodes are added or removed, only keys adjacent to the change move — typically about 1/n of the total for n nodes. This guide explains the hash ring, virtual nodes, where consistent hashing appears in production (load balancers, distributed caches, database shards), alternative algorithms, hotspot mitigation, and a checklist before you ship it.

The modulo hashing problem

The simplest way to partition keys across N servers is remainder division: compute a stable hash of the key — SHA-256 truncated to 64 bits, MurmurHash3, xxHash — and take hash % N. Server index 2 owns every key whose hash ends in ...02, ...06, and so on. The mapping is deterministic, evenly distributed (for a good hash function), and trivial to implement.

The failure mode is cluster membership change. When N goes from 4 to 5, roughly 80% of keys land on a different server. A cache cluster loses most of its warm data in one event. A sharded database must migrate terabytes before the new layout is stable. Even shrinking the cluster — removing a failed node and recomputing with N - 1 — triggers the same wholesale reshuffle.

What you want instead is minimal remapping: when one node joins or leaves, only keys that must move to preserve balance should move. Consistent hashing achieves this by treating the hash space as a ring rather than a fixed set of buckets.

The hash ring

Imagine the output of your hash function as integers from 0 to 232 - 1 arranged in a circle. Each physical server is assigned one or more positions on the ring by hashing its identifier — hostname, IP, or a synthetic vnode label. To route a key, hash it to a position on the same ring, then walk clockwise until you hit the first server position. That server owns the key.

Adding a server inserts new positions on the ring. Only keys that previously mapped to the next server clockwise — but now fall between the new server and that successor — change owners. Removing a server deletes its positions; keys that pointed there walk forward to the next surviving position. In expectation, adding one node to a cluster of n remaps about 1/(n+1) of keys — not (n-1)/n as modulo would.

Implementation detail: store server positions in a sorted structure — a balanced tree or sorted array — and use binary search to find the successor of a key's hash. Lookup is O(log V) where V is the total number of positions (physical nodes times virtual nodes). For in-process routing this is negligible; for millions of lookups per second, prefer jump consistent hash (discussed below) or cache the successor table with incremental updates.

Virtual nodes (vnodes)

Assigning one ring position per physical server sounds elegant but creates load imbalance. Hash positions are random; one server might accidentally own 40% of the arc while another owns 12%. Uneven arcs mean uneven request volume, memory use, and — in databases — hotspot shards that bottleneck the whole cluster.

Virtual nodes fix this by giving each physical server many positions on the ring — commonly 100 to 200 vnodes per machine. Keys still map to vnodes, but vnodes map back to physical hosts. More positions smooth the arc distribution: the law of large numbers pulls per-server ownership toward 1/n of the ring.

Trade-offs to know:

  • More vnodes = better balance but larger routing tables and slower membership updates.
  • Fewer vnodes = simpler but higher variance — acceptable for homogenous small clusters, risky at scale.
  • Heterogeneous hardware can assign more vnodes to larger machines so they own proportionally more of the ring without custom weight fields.

Amazon Dynamo, Apache Cassandra, and many managed cache layers use vnode counts as a tunable knob. Document your chosen count and the rebalance behavior when it changes — altering vnode multiplicity still moves keys.

Where consistent hashing is used

Distributed caches and CDNs

Memcached clients, Redis Cluster slot migration, and CDN edge selection all need stable key-to-node affinity. When a cache node fails, consistent hashing redistributes only its slice of keys to survivors — limiting cold-cache stampedes compared to modulo. Pair with TTL and cache-aside patterns so remapped keys repopulate gracefully rather than hammering origin.

Load balancers and service meshes

Layer 7 proxies sometimes pin sessions or cache partitions to backends via consistent hashing on a cookie, user ID, or URL path. Unlike naive IP hash, adding a backend during a scale-out does not invalidate every existing mapping — important when backends hold local state or connection pools. See load balancing algorithms for when least-connections beats hash-based routing.

Database and storage sharding

Horizontally partitioned databases map primary-key ranges or hash buckets to shard nodes. Consistent hashing (or range splitting on the same ring) limits data movement during resharding. Cross-shard queries remain expensive — hashing solves which node owns this row, not join optimization. Plan migrations with dual-write or backfill pipelines; the algorithm reduces data in flight, it does not eliminate it.

Object storage and distributed filesystems

Systems like Swift and Ceph place objects on storage nodes via consistent hashing variants, often with replication walking clockwise to place replica 2 and 3 on distinct successors. Replication factor and failure domain rules layer on top of the basic ring walk.

Alternatives and variants

Rendezvous hashing (HRW)

Highest Random Weight hashing assigns each key-node pair a combined hash score; the node with the highest score wins the key. No explicit ring structure — membership changes still remap only keys tied to the departing node, with excellent balance. Lookup cost is O(n) per key over all nodes, which is fine for tens of backends but expensive for thousands. Some systems use HRW for small replica sets and consistent hashing for large pools.

Jump consistent hash

Google’s jump consistent hash (2014) maps keys to buckets with minimal remapping and O(ln n) time with no memory for a ring table — only the bucket count. It requires bucket indices 0..n-1 (not arbitrary node names), so you maintain an indirection table from index to host. Popular in data pipelines and storage systems that frequently resize bucket counts.

Maglev hashing

Google's Maglev load balancer builds a lookup table from consistent hashing principles with near-perfect balance and fast lookup — used when connection tables must be computed once and shared across hardware. Heavier to implement than a simple ring but valuable at datacenter scale.

Hotspots, replication, and failure

Consistent hashing spreads keys uniformly in hash space — not uniformly in business importance. A viral product ID, a celebrity user, or a shared configuration key can still concentrate traffic on one node. Mitigations:

  • Salting hot keys — store key#0, key#1, key#2 on different nodes and aggregate reads.
  • Local LRU on top — edge caches absorb repeated reads of the same key.
  • Separate metadata service for truly global hot keys rather than routing them through the hash layer.

When a node fails, its keys fail over to successors. If replicas were placed on the next k clockwise nodes, a single failure may overload those successors — handoff throttling and gradual rebalance spread recovery load. In distributed systems with quorum reads, ensure failover does not violate replica placement rules (same rack, same availability zone).

Production checklist

  1. Confirm modulo hashing is the actual pain — if the cluster size is static, simpler partitioning may suffice.
  2. Choose a fast, stable hash function (xxHash, MurmurHash3); avoid cryptographic hashes unless you need collision resistance against adversaries.
  3. Set vnode count per physical node; load-test balance variance at expected cluster sizes.
  4. Implement atomic membership views — clients must agree which ring version is current during rolling changes.
  5. Plan key migration on scale events: background copy, dual-read, cutover, and delete from old node.
  6. Monitor per-node key count, memory, and QPS — imbalance alerts fire before users notice.
  7. Document behavior when a node is down vs removed — temporary failure should not always trigger full rebalance.
  8. Load-test add/remove node events and measure cache miss ratio and origin load during transition.
  9. For client-side routing libraries, version-pin the ring protocol — mixed client versions during deploys cause split routing.
  10. Evaluate jump or rendezvous hashing if your backend count is small or bucket indices are numeric.

Key takeaways

  • Modulo hashing remaps almost every key when server count changes; consistent hashing remaps only a slice.
  • The hash ring walks clockwise from key hash to the first server position — simple to reason about and widely implemented.
  • Virtual nodes are essential for load balance at scale; one position per server is rarely enough.
  • Use cases span caches, balancers, shards, and storage — any system that routes keys to a changing set of backends.
  • Hot keys and failover storms still need application-level mitigation; the algorithm is necessary but not sufficient.

Related reading