Guide
SQL query optimization explained
Your analytics dashboard runs a single SQL query against 2.3 million order rows.
Locally it finishes in 200 ms; in production it hangs for 12 seconds and times out
under concurrent load. Someone adds an index on created_at and latency
drops to 800 ms — until the next feature ships a WHERE LOWER(email) LIKE '%@corp.com'
filter that forces a sequential scan again. Query optimization is not a one-time index
sprinkle; it is the discipline of writing SQL the optimizer can execute efficiently,
reading EXPLAIN output to verify assumptions, and fixing structural
patterns — joins, pagination, aggregations — before you reach for sharding or a
read replica. This guide covers plan reading, SARGable predicates, join strategies,
pagination traps, aggregation tuning, ORM pitfalls, and how optimization connects to
indexing,
the N+1 problem,
and SQL fundamentals.
What query optimization is — and what it is not
Query optimization is the process of reducing the work a database must do to answer a question: fewer rows scanned, fewer random disk reads, less memory for sorts, fewer round trips from the application. The database optimizer chooses among equivalent execution plans; your job is to give it good options and verify the choice with EXPLAIN.
It is not:
- Blind index creation — indexes help specific access patterns; wrong indexes slow writes and confuse the planner. See the dedicated indexing guide.
- Caching alone — Redis hides slow queries until cache misses or invalidation storms; fix the query first.
- Throwing hardware at bad SQL — bigger instances delay pain; quadratic joins and OFFSET pagination still collapse at scale.
Optimization sits between schema design and application architecture. A normalized schema with proper foreign keys, selective indexes, and well-shaped queries beats denormalization until you have measured proof otherwise.
Reading EXPLAIN plans
Every serious tuning session starts with EXPLAIN (ANALYZE, BUFFERS)
in PostgreSQL or EXPLAIN FORMAT=JSON in MySQL. You are looking for
four signals:
- Access method — Seq Scan (full table read) vs Index Scan / Index Only Scan (targeted lookup). Seq scans on large hot tables are the first red flag.
- Row estimates vs actual — PostgreSQL shows
rows=… loops=…estimated vs actual. Large mismatches mean stale statistics; runANALYZEbefore concluding the plan is wrong. - Cost and timing — relative cost units guide the planner;
ANALYZEadds wall-clock time per node. Find the node consuming 80% of total time. - Buffers — shared hits vs reads reveal cache effectiveness. Heavy read I/O on a repeated query suggests missing indexes or working-set larger than RAM.
Common plan nodes and what they mean:
- Nested Loop — for each row in the outer table, probe the inner. Fast when the inner side is index-backed and outer cardinality is small; disastrous when both sides are large and unindexed.
- Hash Join — builds a hash table on the smaller input, probes with the larger. Good for equi-joins without useful indexes; memory-heavy.
- Merge Join — both inputs sorted on join keys. Efficient when indexes already deliver ordered rows.
- Sort / HashAggregate — GROUP BY and DISTINCT materialize intermediate results. Expensive on wide rows or high cardinality groups.
Capture plans in staging with production-like data volume. A query fast on 10k rows may hide a nested loop that explodes at 10M.
SARGable predicates and filter pushdown
A predicate is SARGable (Search ARGument-able) when the optimizer can use an index to narrow rows. The classic anti-pattern:
-- Not SARGable: function on indexed column
WHERE YEAR(created_at) = 2026
-- SARGable: range on bare column
WHERE created_at >= '2026-01-01' AND created_at < '2027-01-01'
Other SARGability killers:
WHERE LOWER(email) = 'user@example.com'— wrap the column; use a functional index or store normalized email separately.WHERE status IN (SELECT …)on uncorrelated subqueries — sometimes rewritten to JOINs; sometimes not. Test both shapes.- Leading wildcard LIKE —
LIKE '%foo'cannot use a B-tree; consider trigram indexes (PostgreSQLpg_trgm) or full-text search for prefix-agnostic patterns. - OR across different columns — often becomes sequential scan;
rewrite as
UNION ALLof two indexed lookups when selective enough.
Push filters as early as possible. In JOIN queries, place the most selective conditions on indexed columns before joining wide tables. Subqueries in SELECT lists execute per row — move them to JOINs or lateral joins when the optimizer does not fold them.
Join optimization
Join performance depends on cardinality, index coverage, and join order. Practical rules:
- Index foreign keys — every FK column used in JOIN or WHERE should have an index unless write volume explicitly forbids it.
- Join smaller first — filter driving tables before joining fact tables with millions of rows. A 50-row user lookup joined to orders beats scanning all orders then filtering user_id.
- Avoid SELECT * — wide rows inflate hash tables, sort buffers, and network transfer. Project only needed columns; covering indexes become viable.
- Semi-joins vs duplicates — INNER JOIN to a non-unique dimension multiplies rows; use EXISTS or DISTINCT when you only need presence checks.
The N+1 pattern is a join problem disguised as many round trips: one query for parents, N queries for children. Fix with eager JOINs, batched IN queries, or DataLoader-style batching — each trades memory for latency differently.
Aggregations, sorting, and window functions
GROUP BY on high-cardinality keys (user_id on an events table) forces
large hash aggregates. Mitigations:
- Pre-aggregate in materialized views or nightly rollups for dashboards.
- Filter before grouping —
WHERE created_at > now() - interval '7 days'shrinks working set dramatically. - Index columns in GROUP BY and ORDER BY when the query pattern is stable.
ORDER BY without a supporting index requires a sort step. If you only
need the top K rows, prefer ORDER BY … LIMIT K with an index matching
the sort key over sorting the entire result set in application memory.
Window functions (ROW_NUMBER(), LAG(), running totals)
scan partitions defined by PARTITION BY. Ensure partition keys are
indexed when windows run over large tables; consider whether a self-join on ranked
subqueries is cheaper for your engine and statistics.
Pagination: OFFSET vs keyset
LIMIT 20 OFFSET 100000 is deceptively simple and catastrophically slow.
The database must scan and discard 100,000 rows before returning 20. Page depth
grows linearly with offset — users on page 5,000 hammer your database.
Keyset pagination (cursor-based) uses the last seen sort key:
-- Keyset: O(1) regardless of page depth (with index on created_at, id)
SELECT id, created_at, title
FROM posts
WHERE (created_at, id) < ('2026-06-01', 12345)
ORDER BY created_at DESC, id DESC
LIMIT 20;
Tradeoffs: no random page jumps without storing cursors; composite sort keys need tie-breaker columns (typically primary key). For admin UIs requiring arbitrary pages, consider search indexes or precomputed rank columns instead of deep OFFSET.
ORM and application-layer pitfalls
ORMs generate SQL you do not see until production. Recurring slow-query sources:
- Lazy loading — triggers N+1; enable query logging in staging and assert query count per request.
- Implicit cartesian products — missing JOIN conditions in complex relation graphs.
- Over-fetching — loading full entity graphs when DTO projections suffice.
- Parameter sniffing — cached plan optimized for one parameter
value performs badly for another. Mitigate with plan guides, statement-level
preparation, or occasional
REINDEX/ statistics refresh. - Connection churn — opening a connection per query adds latency; pool correctly via connection pooling.
Enable slow-query logging (PostgreSQL log_min_duration_statement,
MySQL slow query log) with thresholds tuned to your SLO. Aggregate by normalized
query shape, not raw text with different literals.
Common anti-patterns
- Optimizing without measurement — guessing indexes from column names instead of reading EXPLAIN on real data.
- Correlated subqueries in SELECT — execute once per output row; rewrite to JOINs when the planner does not decorrelate.
- COUNT(*) on huge tables — exact counts require full scans; use approximate counts, cached totals, or rolling counters for UI badges.
- SELECT DISTINCT as band-aid — fixes duplicate rows from bad joins but pays sort/hash cost; fix the join instead.
- Hint obsession — optimizer hints lock fragile plans; prefer schema and query fixes; hints as last resort with documented rationale.
- Ignoring vacuum and bloat — dead tuple bloat slows scans; monitor autovacuum and table bloat on PostgreSQL.
Production checklist
- Reproduce slowness in staging with production-scale data and concurrent load.
- Run EXPLAIN ANALYZE; identify the highest-cost node; verify row estimates.
- Check for SARGable rewrites before adding indexes.
- Confirm FK and filter columns have appropriate indexes; avoid redundant duplicates.
- Replace OFFSET pagination with keyset cursors on high-traffic list endpoints.
- Audit ORM query count per request; eliminate N+1 patterns.
- Enable slow-query logging with alerting on p95 latency by query fingerprint.
- Refresh statistics after large bulk loads; re-check plans.
- Document optimized queries and the EXPLAIN baseline in code review or runbooks.
- Re-test after major version upgrades — planner behavior changes.
Key takeaways
- EXPLAIN ANALYZE is the source of truth — tune what the plan shows, not what you assume the query does.
- SARGable predicates on bare indexed columns unlock index scans; functions and leading wildcards block them.
- Join shape and filter order matter as much as indexes — reduce cardinality before joining wide fact tables.
- Keyset pagination replaces OFFSET for deep lists at scale.
- Optimization pairs with indexing, pooling, and fixing N+1 — not replacing them.
Related reading
- Database indexing explained — B-trees, compound keys, covering indexes, and when indexes hurt writes
- N+1 query problem explained — lazy loading, eager JOINs, batched queries, and DataLoader patterns
- SQL fundamentals explained — joins, aggregations, transactions, and parameterized queries
- Database connection pooling explained — pool sizing, idle timeouts, and avoiding connection storms