Explainer · 7 June 2026
How sorting algorithms work
Sorting looks trivial until it is not. A leaderboard with a million rows,
a database ORDER BY over an unindexed column, a merge step in
external sort, or deduplicating a stream of events all depend on the same
question: how do you rearrange n items into order with minimal
comparisons and memory traffic? Computer science has spent decades refining
the answer. This explainer covers comparison sorts versus integer-key sorts,
what stability means and when you must preserve it, how
quicksort, mergesort, and heapsort differ in practice, and why
O(n log n) is the realistic lower bound for general comparison
sorting — plus what production languages actually ship under the hood.
The sorting problem in one paragraph
Given a sequence of n records, each with a key (the value you sort on) and optional payload (everything else), produce a permutation where keys are non-decreasing (or non-increasing). You may only rearrange elements or copy them through auxiliary storage; the goal is to minimize comparisons, memory writes, and cache misses. For small n, almost anything works. At scale, the constant factors hidden inside Big-O notation — branch prediction, memory locality, and whether your sort is in-place — dominate wall-clock time. That is why understanding algorithm families matters more than memorizing one "best" sort.
Comparison sorts: the dominant family
A comparison sort decides order by comparing pairs of keys with a total-order relation (less-than, equal, greater-than). Bubble sort, insertion sort, selection sort, quicksort, mergesort, and heapsort all belong here. They make no assumptions about key structure beyond comparability.
Quicksort picks a pivot, partitions the array into
elements less than, equal to, and greater than the pivot (Lomuto or Hoare
schemes), then recurses on the sides. Average case is
O(n log n) with good cache behavior because it operates on
contiguous memory. Worst case is O(n²) when pivots are always
the min or max — sorted input with a naive "first element" pivot triggers
this. Production implementations use median-of-three, random pivots, or
switch to heapsort after recursion depth exceeds
2 log n (introsort).
Mergesort divides the array in half, sorts each half
recursively, then merges two sorted runs in linear time. It guarantees
O(n log n) regardless of input order and is naturally
stable if the merge prefers the left run on ties. The cost
is O(n) auxiliary memory (or careful in-place variants with
worse constants). External sorting on disk — when data exceeds RAM — is
essentially mergesort over sorted runs.
Heapsort builds a max-heap in O(n), then
repeatedly extracts the maximum and sifts down. It is in-place,
O(n log n) worst case, but with poor cache locality compared to
quicksort. See our
heaps and priority queues explainer
for the heap invariant that makes extract-max cheap.
Simple insertion sort is O(n²) worst case but
fast for tiny arrays and nearly sorted input. Every serious library uses it
as a base case when subarrays shrink below ~16–32 elements.
Stability: when equal keys must keep their relative order
A sort is stable if two records with equal keys appear in the same relative order in the output as in the input. Stability sounds academic until you sort twice on different columns — for example, sort employees by department, then by salary within department. An unstable sort on the second key scrambles the department grouping from the first pass.
Mergesort and insertion sort are stable. Quicksort and heapsort are not
(without extra bookkeeping). Python's sorted() and Java's
Arrays.sort(Object[]) use stable timsort; Java's primitive
Arrays.sort(int[]) uses a dual-pivot quicksort that is unstable
but faster on numbers. Database query planners often rely on stable sort
semantics when merging pre-sorted index runs.
Why O(n log n) is the comparison-sort floor
The decision-tree argument: each comparison has two outcomes, so a tree of
depth d can distinguish at most 2^d orderings. There
are n! possible permutations of n distinct keys, so you
need at least log₂(n!) = Ω(n log n) comparisons in the worst
case. No comparison-only algorithm beats that asymptotic bound.
That bound does not apply when keys are not arbitrary comparable
objects but bounded integers, fixed-width strings, or floating-point values
with known exponent ranges. Those cases open the door to
non-comparison sorts that beat O(n log n).
Non-comparison sorts: counting, radix, and bucket
Counting sort assumes keys are integers in range
[0, k]. It counts occurrences of each key, computes prefix sums
to find output positions, then scatters records in a stable second pass.
Time is O(n + k). When k is small relative to
n — sorting ages, byte values, or histogram buckets — counting
sort wins decisively.
Radix sort processes keys digit by digit (least-significant
or most-significant digit first), using a stable sub-sort (usually counting
sort) on each digit. Sorting 32-bit integers takes four byte passes:
O(4 · (n + 256)) ≈ O(n). LSD radix sort is stable; MSD radix
sort can be adaptive on sparse key spaces. GPU and columnar analytics engines
use radix variants heavily.
Bucket sort distributes keys into buckets assuming uniform
distribution, sorts buckets individually, then concatenates. It is
O(n) average when assumptions hold and degrades when they do
not.
These algorithms require key structure. You cannot radix-sort arbitrary objects without a fixed-width encoding. Comparison sorts remain the general fallback.
What production runtimes actually use
C++ std::sort — introsort (quicksort + heapsort
fallback + insertion sort base case). Unstable for user types.
Python list.sort / sorted —
timsort, a stable hybrid of mergesort and insertion sort that exploits
existing runs of already-sorted subarrays in real data. Adaptive
behavior makes it fast on partially ordered logs and event streams.
Rust slice::sort — pattern-defeating quicksort
(pdqsort) for unstable sorts; stable sort uses mergesort. Both switch to
insertion sort for small slices.
Go sort.Slice — pattern-defeating quicksort
(similar family to Rust). Go 1.19+ improved worst-case guarantees.
Database engines — in-memory sorts often use quicksort or timsort variants; spill-to-disk paths use external mergesort. Sorting is frequently paired with indexes so the engine reads pre-ordered runs instead of sorting from scratch.
Cache locality and why quicksort often wins in RAM
Asymptotic complexity ignores memory hierarchy. Quicksort and insertion sort scan contiguous arrays with good spatial locality. Heapsort jumps through a binary tree layout that scatters accesses across cache lines. Mergesort copies through auxiliary buffers — extra bandwidth, but predictable sequential writes. On modern CPUs, a theoretically optimal algorithm with poor locality can lose to a "worse" Big-O on realistic n. Our CPU caches explainer covers why layout beats raw operation counts.
Parallel sorts (parallel mergesort, samplesort) partition work across cores but add synchronization overhead. They pay off only when n is large enough that divide-and-conquer subproblems saturate the machine.
Choosing a sort in practice
- General comparable objects, speed priority — library default (introsort / pdqsort). Do not hand-roll quicksort unless you need custom in-place behavior.
- Stability required — timsort, mergesort, or stable radix/counting when keys are numeric with bounded range.
- Small n (< ~20) — insertion sort or sorting network; branch prediction favors simple loops.
- Fixed-width integers or bytes — radix or counting sort; often 2–5× faster than comparison sorts at large n.
- Data on disk — external mergesort; minimize passes and I/O block size.
- Top-K only — partial heap or quickselect
(
O(n)average to find the k-th element) beats full sort.
Common pitfalls
- Using
O(n²)sorts at scale — insertion sort on 100k rows will hurt; know your library's crossover thresholds. - Ignoring stability on multi-key sorts — chaining unstable sorts destroys prior ordering.
- Comparing floats with
<without a tie-break policy — NaN and -0 vs +0 violate total order unless you use a defined comparator. - Assuming sorted input is free — verifying sortedness is
O(n); merging two sorted runs isO(n)but sorting from scratch is not. - Radix sort on unbounded strings — variable-length keys need careful MSD handling or fall back to comparison sort.
Practical checklist
- Identify whether keys are general comparables or bounded integers.
- Decide if stability matters for multi-pass or secondary-key ordering.
- Prefer library sorts (timsort, introsort, pdqsort) over custom implementations.
- For large numeric arrays, benchmark radix sort against your library default.
- When sorting is hot path, add an index or pre-sort at ingest instead of sorting at query time.
- Profile with realistic data — partially sorted inputs favor adaptive algorithms.
Related on Solana Garden: Heaps and priority queues, Hash tables, CPU caches and memory hierarchy, Database indexing guide, Explainers hub.