Explainer · 7 June 2026

How sorting algorithms work

Sorting looks trivial until it is not. A leaderboard with a million rows, a database ORDER BY over an unindexed column, a merge step in external sort, or deduplicating a stream of events all depend on the same question: how do you rearrange n items into order with minimal comparisons and memory traffic? Computer science has spent decades refining the answer. This explainer covers comparison sorts versus integer-key sorts, what stability means and when you must preserve it, how quicksort, mergesort, and heapsort differ in practice, and why O(n log n) is the realistic lower bound for general comparison sorting — plus what production languages actually ship under the hood.

The sorting problem in one paragraph

Given a sequence of n records, each with a key (the value you sort on) and optional payload (everything else), produce a permutation where keys are non-decreasing (or non-increasing). You may only rearrange elements or copy them through auxiliary storage; the goal is to minimize comparisons, memory writes, and cache misses. For small n, almost anything works. At scale, the constant factors hidden inside Big-O notation — branch prediction, memory locality, and whether your sort is in-place — dominate wall-clock time. That is why understanding algorithm families matters more than memorizing one "best" sort.

Comparison sorts: the dominant family

A comparison sort decides order by comparing pairs of keys with a total-order relation (less-than, equal, greater-than). Bubble sort, insertion sort, selection sort, quicksort, mergesort, and heapsort all belong here. They make no assumptions about key structure beyond comparability.

Quicksort picks a pivot, partitions the array into elements less than, equal to, and greater than the pivot (Lomuto or Hoare schemes), then recurses on the sides. Average case is O(n log n) with good cache behavior because it operates on contiguous memory. Worst case is O(n²) when pivots are always the min or max — sorted input with a naive "first element" pivot triggers this. Production implementations use median-of-three, random pivots, or switch to heapsort after recursion depth exceeds 2 log n (introsort).

Mergesort divides the array in half, sorts each half recursively, then merges two sorted runs in linear time. It guarantees O(n log n) regardless of input order and is naturally stable if the merge prefers the left run on ties. The cost is O(n) auxiliary memory (or careful in-place variants with worse constants). External sorting on disk — when data exceeds RAM — is essentially mergesort over sorted runs.

Heapsort builds a max-heap in O(n), then repeatedly extracts the maximum and sifts down. It is in-place, O(n log n) worst case, but with poor cache locality compared to quicksort. See our heaps and priority queues explainer for the heap invariant that makes extract-max cheap.

Simple insertion sort is O(n²) worst case but fast for tiny arrays and nearly sorted input. Every serious library uses it as a base case when subarrays shrink below ~16–32 elements.

Stability: when equal keys must keep their relative order

A sort is stable if two records with equal keys appear in the same relative order in the output as in the input. Stability sounds academic until you sort twice on different columns — for example, sort employees by department, then by salary within department. An unstable sort on the second key scrambles the department grouping from the first pass.

Mergesort and insertion sort are stable. Quicksort and heapsort are not (without extra bookkeeping). Python's sorted() and Java's Arrays.sort(Object[]) use stable timsort; Java's primitive Arrays.sort(int[]) uses a dual-pivot quicksort that is unstable but faster on numbers. Database query planners often rely on stable sort semantics when merging pre-sorted index runs.

Why O(n log n) is the comparison-sort floor

The decision-tree argument: each comparison has two outcomes, so a tree of depth d can distinguish at most 2^d orderings. There are n! possible permutations of n distinct keys, so you need at least log₂(n!) = Ω(n log n) comparisons in the worst case. No comparison-only algorithm beats that asymptotic bound.

That bound does not apply when keys are not arbitrary comparable objects but bounded integers, fixed-width strings, or floating-point values with known exponent ranges. Those cases open the door to non-comparison sorts that beat O(n log n).

Non-comparison sorts: counting, radix, and bucket

Counting sort assumes keys are integers in range [0, k]. It counts occurrences of each key, computes prefix sums to find output positions, then scatters records in a stable second pass. Time is O(n + k). When k is small relative to n — sorting ages, byte values, or histogram buckets — counting sort wins decisively.

Radix sort processes keys digit by digit (least-significant or most-significant digit first), using a stable sub-sort (usually counting sort) on each digit. Sorting 32-bit integers takes four byte passes: O(4 · (n + 256)) ≈ O(n). LSD radix sort is stable; MSD radix sort can be adaptive on sparse key spaces. GPU and columnar analytics engines use radix variants heavily.

Bucket sort distributes keys into buckets assuming uniform distribution, sorts buckets individually, then concatenates. It is O(n) average when assumptions hold and degrades when they do not.

These algorithms require key structure. You cannot radix-sort arbitrary objects without a fixed-width encoding. Comparison sorts remain the general fallback.

What production runtimes actually use

C++ std::sort — introsort (quicksort + heapsort fallback + insertion sort base case). Unstable for user types.

Python list.sort / sorted — timsort, a stable hybrid of mergesort and insertion sort that exploits existing runs of already-sorted subarrays in real data. Adaptive behavior makes it fast on partially ordered logs and event streams.

Rust slice::sort — pattern-defeating quicksort (pdqsort) for unstable sorts; stable sort uses mergesort. Both switch to insertion sort for small slices.

Go sort.Slice — pattern-defeating quicksort (similar family to Rust). Go 1.19+ improved worst-case guarantees.

Database engines — in-memory sorts often use quicksort or timsort variants; spill-to-disk paths use external mergesort. Sorting is frequently paired with indexes so the engine reads pre-ordered runs instead of sorting from scratch.

Cache locality and why quicksort often wins in RAM

Asymptotic complexity ignores memory hierarchy. Quicksort and insertion sort scan contiguous arrays with good spatial locality. Heapsort jumps through a binary tree layout that scatters accesses across cache lines. Mergesort copies through auxiliary buffers — extra bandwidth, but predictable sequential writes. On modern CPUs, a theoretically optimal algorithm with poor locality can lose to a "worse" Big-O on realistic n. Our CPU caches explainer covers why layout beats raw operation counts.

Parallel sorts (parallel mergesort, samplesort) partition work across cores but add synchronization overhead. They pay off only when n is large enough that divide-and-conquer subproblems saturate the machine.

Choosing a sort in practice

General comparable objects, speed priority — library default (introsort / pdqsort). Do not hand-roll quicksort unless you need custom in-place behavior.
Stability required — timsort, mergesort, or stable radix/counting when keys are numeric with bounded range.
Small n (< ~20) — insertion sort or sorting network; branch prediction favors simple loops.
Fixed-width integers or bytes — radix or counting sort; often 2–5× faster than comparison sorts at large n.
Data on disk — external mergesort; minimize passes and I/O block size.
Top-K only — partial heap or quickselect (O(n) average to find the k-th element) beats full sort.

Common pitfalls

Using O(n²) sorts at scale — insertion sort on 100k rows will hurt; know your library's crossover thresholds.
Ignoring stability on multi-key sorts — chaining unstable sorts destroys prior ordering.
Comparing floats with < without a tie-break policy — NaN and -0 vs +0 violate total order unless you use a defined comparator.
Assuming sorted input is free — verifying sortedness is O(n); merging two sorted runs is O(n) but sorting from scratch is not.
Radix sort on unbounded strings — variable-length keys need careful MSD handling or fall back to comparison sort.

Practical checklist

Identify whether keys are general comparables or bounded integers.
Decide if stability matters for multi-pass or secondary-key ordering.
Prefer library sorts (timsort, introsort, pdqsort) over custom implementations.
For large numeric arrays, benchmark radix sort against your library default.
When sorting is hot path, add an index or pre-sort at ingest instead of sorting at query time.
Profile with realistic data — partially sorted inputs favor adaptive algorithms.