Guide
Data structures and algorithms fundamentals explained
Every production bug that “only happens at scale” eventually traces back to a
structure or algorithm choice someone made when the dataset was small. A linear scan
over ten rows is fine; a linear scan over ten million rows is an outage. Data
structures are the containers — arrays, hash maps, trees — that organize data
so common operations stay fast. Algorithms are the step-by-step procedures
— sort, search, shortest path — that transform or traverse that data efficiently.
This guide covers the fundamentals that still matter after your framework ships its
own Map and sort(): how to read Big O complexity,
which structure to pick for which access pattern, and where these abstractions show up
in real systems from
database indexes to
game pathfinding.
Big O notation — the language of scaling
Big O describes how runtime or memory grows as input size
n increases, ignoring constants and lower-order terms. It answers:
“If I double the data, does work double, quadruple, or stay flat?”
- O(1) — constant time. Hash map lookup (average case), array index access. Ideal for hot paths.
- O(log n) — logarithmic. Binary search on sorted data, balanced tree operations. Scales to billions of rows.
- O(n) — linear. Scan every element once — filter a list, sum values, find max in unsorted data.
- O(n log n) — efficient comparison sorts (merge sort, quick sort average case). The practical floor for general-purpose sorting.
- O(n²) — quadratic. Nested loops comparing every pair — fine for
n < 500, painful at scale. Bubble sort territory. - O(2ⁿ) — exponential. Brute-force subsets, naive recursive Fibonacci. Usually a signal to add memoization or rethink the approach.
Big O is about asymptotic behavior, not microseconds on your laptop. A
O(n log n) algorithm with tiny constants can beat a O(n) algorithm with huge overhead
for small n. Use Big O to compare approaches at the scale you expect in
production — then profile. Also distinguish worst case from
average case: hash maps are O(1) average but O(n) worst case under
adversarial collisions; quick sort is O(n log n) average but O(n²) worst case without
good pivot selection.
Linear structures: arrays, linked lists, stacks, queues
Arrays and dynamic arrays
An array stores elements in contiguous memory. Index access is O(1);
insertion or deletion in the middle is O(n) because elements shift. Dynamic arrays
(JavaScript arrays, Python lists, Java ArrayList) amortize append to
O(1) by doubling capacity when full. Arrays excel when you read by index, iterate
sequentially, or need cache-friendly memory layout. They struggle when you insert
or delete frequently in the middle.
Linked lists
A linked list chains nodes with pointers. Insert/delete at a known node is O(1); finding that node is O(n). Random access is O(n). Linked lists appear less in application code today — dynamic arrays and deque abstractions cover most cases — but the concept underpins LRU caches, memory allocators, and functional persistent data structures. If you never need random access and insert at head/tail constantly, a doubly linked list or deque can still win.
Stacks and queues
A stack is last-in-first-out (LIFO): push and pop from one end. Call stacks, undo history, depth-first search, and parsing nested expressions all use stacks. A queue is first-in-first-out (FIFO): enqueue at back, dequeue from front. Job schedulers, breadth-first search, and message brokers are queue-shaped. Both are typically O(1) for their core operations when backed by arrays or linked structures with head/tail pointers.
Hash tables — the workhorse of modern software
A hash table (hash map, dictionary, Map,
Object in JS) maps keys to values via a hash function that spreads keys
across buckets. Average-case get, set, and delete are O(1). That is why virtually
every language ships one in the standard library.
Under the hood: the hash function converts a key to an integer bucket index; collisions (two keys landing in the same bucket) are resolved by chaining (linked lists in each bucket) or open addressing (probe for the next empty slot). Load factor triggers rehashing — resize and redistribute — which is O(n) but amortized over many inserts.
Production gotchas: keys must be hashable and stable. Mutable objects as keys are dangerous — if a key's hash changes after insertion, lookups fail silently. In distributed systems, consistent hashing maps keys to nodes so adding/removing servers minimizes data movement. For ordered iteration by key, use a sorted map (tree-based) instead — O(log n) operations but keys stay sorted.
Trees — hierarchical data and ordered lookup
Binary search trees (BST)
A binary search tree stores nodes where left children are smaller and right children are larger. Search, insert, and delete are O(log n) in a balanced tree, O(n) if the tree degenerates into a linked list (sorted insert order into a plain BST). Self-balancing variants — AVL trees, red-black trees — maintain O(log n) guarantees by rotating nodes after inserts and deletes.
Heaps
A binary heap is a complete binary tree satisfying the heap property: in a min-heap, every parent is smaller than its children. Insert and extract-min are O(log n); peek-min is O(1). Heaps power priority queues, Dijkstra's algorithm, “top K” streaming problems, and task schedulers where the highest-priority item must surface quickly without full sorting.
B-trees and database indexes
B-trees (and B+ trees) generalize balanced trees for disk: wide nodes with many keys minimize disk seeks. That is exactly what database B-tree indexes use — O(log n) point lookups and range scans on indexed columns. Understanding trees explains why compound indexes matter and why random UUID primary keys can fragment index pages compared to sequential IDs.
Graphs — relationships, routing, and dependencies
A graph is nodes (vertices) connected by edges. Social networks, package routes, dependency graphs, and blockchain transaction DAGs are all graph problems. Graphs can be directed or undirected, weighted or unweighted, cyclic or acyclic.
Core traversals:
- Breadth-first search (BFS) — explore layer by layer using a queue. Finds shortest path in unweighted graphs. O(V + E).
- Depth-first search (DFS) — dive deep using a stack (or recursion). Detects cycles, topological sort for build order. O(V + E).
Weighted shortest paths use Dijkstra's algorithm (non-negative weights, priority queue) or Bellman-Ford (handles negative edges). Game pathfinding often uses A* — Dijkstra plus a heuristic that biases search toward the goal — covered in our A* pathfinding guide. Represent graphs as adjacency lists (sparse, memory-efficient) or adjacency matrices (dense, O(1) edge lookup).
Essential algorithms: search and sort
Binary search
Binary search finds a target in a sorted array by halving the search space each step — O(log n). Preconditions: data must be sorted (or you maintain a sorted view). Variants find insertion points, first/last occurrence, and answer “smallest value ≥ target” on sorted arrays. Binary search bugs are legendary (off-by-one on mid calculation); use a consistent invariant: “answer is in range [lo, hi).”
Sorting
General comparison sorts cannot beat O(n log n). Know these families:
- Merge sort — stable, guaranteed O(n log n), O(n) extra space. Good for external sorting and parallel merge.
- Quick sort — in-place average O(n log n), cache-friendly, used by many standard libraries. Worst case O(n²) without randomization or median-of-three pivots.
- Counting / radix sort — O(n) when key range is bounded (integers in a small range, fixed-length strings). Not general-purpose but blazing when applicable.
In practice, call your language's built-in sort — it is tuned for real data — and focus on whether you need stability (preserve equal-element order) or can sort indices instead of moving heavy objects.
Choosing the right structure — a decision framework
| Access pattern | Structure | Typical complexity |
|---|---|---|
| Lookup by key, no ordering | Hash map | O(1) average |
| Lookup by key, sorted iteration | Balanced BST / sorted map | O(log n) |
| Random access by index | Array | O(1) |
| Frequent insert/delete at ends | Deque / linked list | O(1) |
| Next highest-priority item | Heap / priority queue | O(log n) insert/extract |
| Range queries on sorted data | B-tree index / sorted array + binary search | O(log n + k) for k results |
| Relationships and paths | Graph + BFS/DFS/Dijkstra | O(V + E) to O(E log V) |
Start with the simplest structure that meets your access pattern. A hash map beats a hand-rolled tree 90% of the time. Reach for exotic structures — tries, skip lists, bloom filters — when measurement proves you need them: prefix search (tries), probabilistic membership with false positives (bloom filters), or concurrent lock-free queues under extreme contention.
Production checklist
- Estimate n at peak — the size that matters is production peak, not dev fixture size.
- Profile before optimizing — Big O guides choice; flame graphs confirm bottlenecks.
- Watch hidden O(n²) — nested loops, repeated
.includes()inside loops, ORM N+1 queries (see our N+1 guide). - Prefer library implementations — standard sort, hash map, and heap are battle-tested; only hand-roll when profiling demands it.
- Consider memory, not just time — adjacency matrix vs list, materialized view vs on-the-fly join. Space-time tradeoffs show up in caching layers and Redis design.
- Test edge cases — empty input, single element, duplicates, already-sorted input (quick sort worst case), hash collisions under load.
Key takeaways
- Data structures encode access patterns; picking the wrong one turns O(log n) into O(n) at scale.
- Big O compares growth rates — use it to reason about peak load, then verify with profiling.
- Hash maps are the default for key lookup; trees and B-trees when you need ordering or range scans.
- Graphs model relationships; BFS/DFS/Dijkstra/A* solve path and dependency problems.
- Binary search and efficient sorts are foundational — know when preconditions (sorted data) hold.
- Interview trivia and production engineering converge here: the same structures power indexes, caches, schedulers, and game AI.
Related reading
- Database indexing explained — B-trees in production: compound keys, EXPLAIN plans, and write trade-offs
- Software design patterns explained — structural patterns that compose objects and subsystems
- A* pathfinding explained — graph search with heuristics in games and robotics
- N+1 query problem explained — when algorithmic complexity hides inside ORM lazy loading