Guide

Data structures and algorithms fundamentals explained

Every production bug that “only happens at scale” eventually traces back to a structure or algorithm choice someone made when the dataset was small. A linear scan over ten rows is fine; a linear scan over ten million rows is an outage. Data structures are the containers — arrays, hash maps, trees — that organize data so common operations stay fast. Algorithms are the step-by-step procedures — sort, search, shortest path — that transform or traverse that data efficiently. This guide covers the fundamentals that still matter after your framework ships its own Map and sort(): how to read Big O complexity, which structure to pick for which access pattern, and where these abstractions show up in real systems from database indexes to game pathfinding.

Big O notation — the language of scaling

Big O describes how runtime or memory grows as input size n increases, ignoring constants and lower-order terms. It answers: “If I double the data, does work double, quadruple, or stay flat?”

O(1) — constant time. Hash map lookup (average case), array index access. Ideal for hot paths.
O(log n) — logarithmic. Binary search on sorted data, balanced tree operations. Scales to billions of rows.
O(n) — linear. Scan every element once — filter a list, sum values, find max in unsorted data.
O(n log n) — efficient comparison sorts (merge sort, quick sort average case). The practical floor for general-purpose sorting.
O(n²) — quadratic. Nested loops comparing every pair — fine for n < 500, painful at scale. Bubble sort territory.
O(2ⁿ) — exponential. Brute-force subsets, naive recursive Fibonacci. Usually a signal to add memoization or rethink the approach.

Big O is about asymptotic behavior, not microseconds on your laptop. A O(n log n) algorithm with tiny constants can beat a O(n) algorithm with huge overhead for small n. Use Big O to compare approaches at the scale you expect in production — then profile. Also distinguish worst case from average case: hash maps are O(1) average but O(n) worst case under adversarial collisions; quick sort is O(n log n) average but O(n²) worst case without good pivot selection.

Linear structures: arrays, linked lists, stacks, queues

Arrays and dynamic arrays

An array stores elements in contiguous memory. Index access is O(1); insertion or deletion in the middle is O(n) because elements shift. Dynamic arrays (JavaScript arrays, Python lists, Java ArrayList) amortize append to O(1) by doubling capacity when full. Arrays excel when you read by index, iterate sequentially, or need cache-friendly memory layout. They struggle when you insert or delete frequently in the middle.

Linked lists

A linked list chains nodes with pointers. Insert/delete at a known node is O(1); finding that node is O(n). Random access is O(n). Linked lists appear less in application code today — dynamic arrays and deque abstractions cover most cases — but the concept underpins LRU caches, memory allocators, and functional persistent data structures. If you never need random access and insert at head/tail constantly, a doubly linked list or deque can still win.

Stacks and queues

A stack is last-in-first-out (LIFO): push and pop from one end. Call stacks, undo history, depth-first search, and parsing nested expressions all use stacks. A queue is first-in-first-out (FIFO): enqueue at back, dequeue from front. Job schedulers, breadth-first search, and message brokers are queue-shaped. Both are typically O(1) for their core operations when backed by arrays or linked structures with head/tail pointers.

Hash tables — the workhorse of modern software

A hash table (hash map, dictionary, Map, Object in JS) maps keys to values via a hash function that spreads keys across buckets. Average-case get, set, and delete are O(1). That is why virtually every language ships one in the standard library.

Under the hood: the hash function converts a key to an integer bucket index; collisions (two keys landing in the same bucket) are resolved by chaining (linked lists in each bucket) or open addressing (probe for the next empty slot). Load factor triggers rehashing — resize and redistribute — which is O(n) but amortized over many inserts.

Production gotchas: keys must be hashable and stable. Mutable objects as keys are dangerous — if a key's hash changes after insertion, lookups fail silently. In distributed systems, consistent hashing maps keys to nodes so adding/removing servers minimizes data movement. For ordered iteration by key, use a sorted map (tree-based) instead — O(log n) operations but keys stay sorted.

Trees — hierarchical data and ordered lookup

Binary search trees (BST)

A binary search tree stores nodes where left children are smaller and right children are larger. Search, insert, and delete are O(log n) in a balanced tree, O(n) if the tree degenerates into a linked list (sorted insert order into a plain BST). Self-balancing variants — AVL trees, red-black trees — maintain O(log n) guarantees by rotating nodes after inserts and deletes.

Heaps

A binary heap is a complete binary tree satisfying the heap property: in a min-heap, every parent is smaller than its children. Insert and extract-min are O(log n); peek-min is O(1). Heaps power priority queues, Dijkstra's algorithm, “top K” streaming problems, and task schedulers where the highest-priority item must surface quickly without full sorting.

B-trees and database indexes

B-trees (and B+ trees) generalize balanced trees for disk: wide nodes with many keys minimize disk seeks. That is exactly what database B-tree indexes use — O(log n) point lookups and range scans on indexed columns. Understanding trees explains why compound indexes matter and why random UUID primary keys can fragment index pages compared to sequential IDs.

Graphs — relationships, routing, and dependencies

A graph is nodes (vertices) connected by edges. Social networks, package routes, dependency graphs, and blockchain transaction DAGs are all graph problems. Graphs can be directed or undirected, weighted or unweighted, cyclic or acyclic.

Core traversals:

Breadth-first search (BFS) — explore layer by layer using a queue. Finds shortest path in unweighted graphs. O(V + E).
Depth-first search (DFS) — dive deep using a stack (or recursion). Detects cycles, topological sort for build order. O(V + E).

Weighted shortest paths use Dijkstra's algorithm (non-negative weights, priority queue) or Bellman-Ford (handles negative edges). Game pathfinding often uses A* — Dijkstra plus a heuristic that biases search toward the goal — covered in our A* pathfinding guide. Represent graphs as adjacency lists (sparse, memory-efficient) or adjacency matrices (dense, O(1) edge lookup).

Essential algorithms: search and sort

Binary search

Binary search finds a target in a sorted array by halving the search space each step — O(log n). Preconditions: data must be sorted (or you maintain a sorted view). Variants find insertion points, first/last occurrence, and answer “smallest value ≥ target” on sorted arrays. Binary search bugs are legendary (off-by-one on mid calculation); use a consistent invariant: “answer is in range [lo, hi).”

Sorting

General comparison sorts cannot beat O(n log n). Know these families:

Merge sort — stable, guaranteed O(n log n), O(n) extra space. Good for external sorting and parallel merge.
Quick sort — in-place average O(n log n), cache-friendly, used by many standard libraries. Worst case O(n²) without randomization or median-of-three pivots.
Counting / radix sort — O(n) when key range is bounded (integers in a small range, fixed-length strings). Not general-purpose but blazing when applicable.

In practice, call your language's built-in sort — it is tuned for real data — and focus on whether you need stability (preserve equal-element order) or can sort indices instead of moving heavy objects.

Choosing the right structure — a decision framework

Access pattern	Structure	Typical complexity
Lookup by key, no ordering	Hash map	O(1) average
Lookup by key, sorted iteration	Balanced BST / sorted map	O(log n)
Random access by index	Array	O(1)
Frequent insert/delete at ends	Deque / linked list	O(1)
Next highest-priority item	Heap / priority queue	O(log n) insert/extract
Range queries on sorted data	B-tree index / sorted array + binary search	O(log n + k) for k results
Relationships and paths	Graph + BFS/DFS/Dijkstra	O(V + E) to O(E log V)

Start with the simplest structure that meets your access pattern. A hash map beats a hand-rolled tree 90% of the time. Reach for exotic structures — tries, skip lists, bloom filters — when measurement proves you need them: prefix search (tries), probabilistic membership with false positives (bloom filters), or concurrent lock-free queues under extreme contention.

Production checklist

Estimate n at peak — the size that matters is production peak, not dev fixture size.
Profile before optimizing — Big O guides choice; flame graphs confirm bottlenecks.
Watch hidden O(n²) — nested loops, repeated .includes() inside loops, ORM N+1 queries (see our N+1 guide).
Prefer library implementations — standard sort, hash map, and heap are battle-tested; only hand-roll when profiling demands it.
Consider memory, not just time — adjacency matrix vs list, materialized view vs on-the-fly join. Space-time tradeoffs show up in caching layers and Redis design.
Test edge cases — empty input, single element, duplicates, already-sorted input (quick sort worst case), hash collisions under load.

Key takeaways

Data structures encode access patterns; picking the wrong one turns O(log n) into O(n) at scale.
Big O compares growth rates — use it to reason about peak load, then verify with profiling.
Hash maps are the default for key lookup; trees and B-trees when you need ordering or range scans.
Graphs model relationships; BFS/DFS/Dijkstra/A* solve path and dependency problems.
Binary search and efficient sorts are foundational — know when preconditions (sorted data) hold.
Interview trivia and production engineering converge here: the same structures power indexes, caches, schedulers, and game AI.