Explainer · 7 June 2026

How dynamic programming and memoization work

A naive recursive solution to Fibonacci recomputes fib(2) thousands of times before it finishes fib(40). A memo table turns that exponential tree into linear work. That idea — remember answers to subproblems you will ask again — is the heart of dynamic programming (DP). DP is not a single algorithm but a design technique for optimization problems where (1) an optimal solution decomposes into optimal solutions of smaller pieces, and (2) those smaller pieces repeat. This explainer walks through the two properties that make DP applicable, the difference between top-down memoization and bottom-up tabulation, how to define state for classic problems, and how to tell when greedy or plain recursion is the better tool.

Two properties: optimal substructure and overlapping subproblems

Every DP problem satisfies both conditions. Miss either one and caching sub-answers will not help — or will be wrong.

Optimal substructure means the best answer to the full problem can be built from best answers to strictly smaller subproblems. In the 0/1 knapsack (each item taken at most once), if an optimal packing of capacity W includes item i, then the remaining items must optimally fill the leftover capacity W - weight[i]. If a sub-choice were suboptimal, swapping it for a better sub-packing would improve the whole — contradiction. Not every recursive problem has this property: longest simple path in a general graph does not, because optimal subpaths can share vertices and invalidate each other.

Overlapping subproblems means the recursion tree revisits the same arguments many times. Fibonacci is the textbook case: fib(n) calls fib(n-1) and fib(n-2), and both eventually need fib(n-2) again. Merge sort, by contrast, divides into disjoint halves — no overlap — so memoization buys nothing; divide-and-conquer without a table is correct. DP shines when the same state (i, w) or (i, j) appears in exponentially many branches.

Top-down memoization vs bottom-up tabulation

Both approaches fill a table of subproblem answers; they differ in evaluation order and call structure.

Top-down (memoization) keeps the natural recursive formulation and wraps it with a cache — typically a hash map or a 2D array indexed by state variables. You compute only states the recursion actually reaches (useful when the state space is sparse). The control flow mirrors the problem statement, which helps while learning. Downsides: recursion depth limits in some languages, function-call overhead, and harder cache locality than tight loops.

Bottom-up (tabulation) chooses an iteration order that guarantees dependencies are ready before they are read — often increasing indices so smaller subproblems are solved first. You may avoid recursion entirely and sometimes shrink space by keeping only the previous row or two (rolling arrays). Tabulation makes total work and memory explicit up front, which helps in embedded or real-time code paths.

Asymptotic time is usually identical when both visit the same states. Pick based on clarity, stack depth, and whether you need all states or only a fraction.

Defining state: the real design step

The hardest part of DP is not the loop — it is choosing what a table cell means. A state is a minimal tuple of variables that fully describes a subproblem instance. Common patterns:

Single index — dp[i] = best answer using the first i elements (or ending at position i). House robber, climbing stairs.
Two indices — dp[i][j] on prefixes of two strings (longest common subsequence), or dp[i][w] for knapsack with first i items and capacity w.
Interval / range — dp[l][r] = optimal cost to merge or parse the subarray from l to r (matrix chain multiplication, burst balloons).
Bitmask — dp[mask] where bits encode which subset of items or cities has been used (traveling salesman on small n).

Once state is fixed, the transition writes how a cell relates to neighbors: take or skip an item, extend a match, split an interval. Wrong state definitions produce correct-looking code that returns nonsense — always sanity-check on tiny hand examples before scaling.

Classic examples (and what each teaches)

Fibonacci — Overlap without interesting choices; memo turns O(2^n) into O(n). Closed form exists, but the pattern generalizes.

Coin change (minimum coins) — State dp[amount] = fewest coins to make amount. Transition tries each coin denomination and takes 1 + dp[amount - coin]. Unbounded knapsack variant (coins reusable). If denominations are canonical (e.g. US coins), greedy works — but arbitrary denominations require DP.

0/1 knapsack — dp[i][w] = max value using items 1..i with capacity w. Transition: skip item i or take it if it fits. Space can roll to one dimension by iterating w descending so each item is used at most once per row.

Longest common subsequence (LCS) — dp[i][j] on prefixes of strings A and B. Match extends diagonally; mismatch takes max of dropping a character from either side. Basis for diff tools, bioinformatics alignments, and edit-distance variants.

Shortest paths — Bellman-Ford and Floyd-Warshall are DP on graphs: relax edges in an order that guarantees shortest paths to intermediates are known. Dijkstra is greedy with a priority queue and needs non-negative edges; it is not universal DP.

Space optimization and reconstruction

A full n × W knapsack table may be gigabytes when W is large. Sometimes only the final optimum is needed — then keep one row and overwrite carefully. When you must reconstruct the chosen items, store parent pointers or a bitset alongside the value table.

Pseudopolynomial time — Knapsack is O(nW) in the numeric value of capacity, not input bit length. Huge W makes DP impractical; approximation algorithms or integer programming may be required. Recognizing pseudopolynomial vs truly polynomial (O(n log n) sorting) prevents shipping a solution that passes small tests and times out in production.

When DP is the wrong tool

No overlapping subproblems — merge sort, quicksort partitions, tree traversals on unique nodes. Recursion alone suffices.
Greedy is provably optimal — activity selection with compatible intervals, Huffman coding, Dijkstra with non-negative weights. Greedy is simpler and often faster; prove or cite the exchange argument before reaching for a table.
State explosion — TSP needs 2^n bitmask states; beyond ~20 cities, DP is theoretical only. Use heuristics or ILP solvers.
No optimal substructure — longest path in a graph with cycles, problems where combining locally optimal pieces creates global conflicts.

DP in production systems

Interview-style DP also appears in real pipelines, often disguised:

Resource allocation — ad pacing, budget caps, and slot assignment with discrete choices map to knapsack-like DP or integer programs.
Sequence alignment — diffing, plagiarism detection, and genomic matching use edit-distance DP at scale (with banding heuristics when strings are long).
Parser optimization — CYK and some chart parsers build parse tables over spans — interval DP on grammar rules.
Cache / prefetch policies — not always labeled DP, but optimal replacement on small caches can be modeled as Markov decision processes solved by tabulation.

Production code usually adds constraints: streaming inputs, bounded memory, parallel row updates. The state definition discipline from textbook DP transfers directly to designing those bounded tables.

Common pitfalls

Undefined base cases — dp[0] and empty-prefix rows must be explicit; off-by-one in indices corrupts the entire table.
Wrong iteration order — 0/1 knapsack space optimization requires descending w; ascending reuses the same item multiple times.
Confusing memoization with caching arbitrary functions — if subproblems do not overlap, you only add memory overhead.
Integer overflow — accumulating paths or combinatorial counts without modular arithmetic or big integers.
Skipping proof of optimal substructure — especially when transitions involve min/max over several predecessors.

Practical checklist

Can you write a naive recursive solution? Draw the recursion tree — do states repeat?
State the optimal substructure claim in one sentence; try to falsify it with a counterexample.
Define dp[...] meaning before writing transitions.
List base cases and iteration order (or memo lookup order).
Estimate time as (number of states) × (work per transition); check pseudopolynomial traps.
Compare against a greedy heuristic on sample data — if greedy always wins, prove why.