Guide
Materialized views explained
Your analytics dashboard runs the same expensive JOIN and GROUP BY on every page load — scanning millions of rows while product engineers wait. A regular SQL view is just a saved query definition; the database still executes it from scratch each time. A materialized view stores the query result on disk like a table, so reads become fast index lookups at the cost of staleness until you refresh. Used well, materialized views sit between raw tables and external caches: closer to the data, transactional where your engine supports it, and ideal for read-heavy aggregates that tolerate minutes (not milliseconds) of lag. This guide covers view vs materialized view mechanics, refresh strategies, indexing the snapshot, trade-offs against Redis caching and read replicas, engine differences, a worked e-commerce rollup example, a decision table, and a production checklist.
Views vs materialized views: definition vs snapshot
A standard view is syntactic sugar — a named SELECT stored in the catalog.
When you query SELECT * FROM daily_revenue_view, the planner
inlines the underlying SQL and runs it against base tables every time.
Complexity, I/O, and lock contention scale with traffic.
A materialized view persists the result set physically. PostgreSQL, Oracle, SQL Server, and some warehouses treat it as a first-class object; MySQL has no native materialized view (teams emulate with summary tables plus ETL). After creation, reads hit the stored snapshot:
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT
date_trunc('day', created_at) AS day,
region,
COUNT(*) AS order_count,
SUM(total_cents) AS revenue_cents
FROM orders
WHERE status = 'paid'
GROUP BY 1, 2;
CREATE UNIQUE INDEX ON daily_revenue (day, region);
The dashboard now queries daily_revenue — thousands of rows
instead of millions — and benefits from the unique index on
(day, region). The base orders table can grow
without slowing this read path, as long as refresh keeps the snapshot
acceptably current.
When materialized views earn their keep
Materialized views shine when all of these hold:
- Expensive, repeatable reads — aggregations, window functions, multi-table JOINs run constantly.
- Tolerable staleness — dashboards, internal reports, search facets, recommendation features updated every N minutes.
- Stable query shape — the SELECT definition changes rarely; ad-hoc analytics belong in a warehouse.
- Database-native freshness — you want one source of truth inside the RDBMS rather than duplicating logic in an app cache.
Poor fits: real-time balances, inventory counts that must match checkout, or queries that change shape weekly. For those, optimize the live query with indexes and SARGable predicates, or stream incremental updates through an outbox or change-data-capture pipeline.
Refresh strategies: manual, scheduled, concurrent, incremental
Full refresh (manual or scheduled)
The simplest model: re-run the entire SELECT and replace the snapshot. PostgreSQL:
REFRESH MATERIALIZED VIEW daily_revenue;
Without CONCURRENTLY, PostgreSQL takes an exclusive lock — readers
block until refresh completes. Schedule off-peak via cron or pg_cron. For
large snapshots, full refresh CPU and I/O can spike; monitor replication lag
if the view lives on a primary that feeds replicas.
Concurrent refresh (PostgreSQL)
Requires at least one unique index on the materialized view:
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue;
Readers keep using the old snapshot while a new one builds, then swap. Slower than blocking refresh but essential for dashboards queried 24/7. Still a full recompute — not incremental.
Incremental and trigger-based maintenance
When only a fraction of base rows change between refreshes, full recompute wastes work. Patterns:
- Summary tables + triggers — on INSERT/UPDATE to
orders, upsert the affected(day, region)bucket. Classic MySQL approach. - Incremental materialized views — some warehouses (BigQuery, Snowflake) support automatic incremental refresh when the defining query qualifies.
- CDC-driven rollups — Debezium or logical replication feeds a consumer that patches aggregates; more moving parts but near-real-time.
Incremental paths demand rigorous idempotency and late-arriving data handling (orders backdated to yesterday must adjust yesterday's bucket).
Refresh scheduling and SLAs
Document an explicit freshness SLA: "Revenue dashboard may lag up to 15 minutes." Align cron frequency with that SLA plus refresh duration. If refresh takes 12 minutes, a 15-minute schedule leaves almost no margin — widen the window or optimize the defining query before adding hardware.
Indexing, storage, and write amplification
Materialized views consume disk like tables — the snapshot plus any indexes you create. Index columns that appear in WHERE and ORDER BY on dashboard queries, mirroring normal table tuning. A unique index enables concurrent refresh in PostgreSQL and prevents duplicate bucket rows from a sloppy GROUP BY.
Writes to base tables do not automatically update the snapshot. Until refresh, the materialized view is stale. Heavy refresh schedules increase write amplification: each refresh rewrites the entire snapshot. Balance refresh frequency against:
- Disk churn and autovacuum pressure on PostgreSQL
- Buffer pool eviction — cold snapshots can evict hot OLTP pages
- Replication bandwidth if snapshots live on the primary
For very large rollups, consider building materialized views on a reporting replica so OLTP primaries stay focused on transactional traffic.
Worked example: regional daily revenue rollup
An e-commerce platform serves a "Sales by region" chart. Raw query cost on
orders (80M rows): ~4.2s p95 with proper indexes on
created_at. Target: under 50ms p95 for the dashboard API.
Step 1 — create the materialized view and supporting index (shown above). Step 2 — API reads:
SELECT day, region, revenue_cents
FROM daily_revenue
WHERE day >= CURRENT_DATE - INTERVAL '30 days'
ORDER BY day DESC, revenue_cents DESC;
Measured p95 after index: ~8ms. Step 3 — schedule
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue every
10 minutes via pg_cron. Step 4 — expose
last_refreshed_at in the API response so users understand data age.
If intraday ops needs near-real-time numbers, add a supplementary query for
"today only" against live orders (smaller scan with partial index
on created_at >= CURRENT_DATE) and UNION with the materialized
history — a common hybrid pattern.
Decision table: materialized view vs alternatives
| Approach | Best for | Freshness | Complexity |
|---|---|---|---|
| Materialized view | Stable SQL aggregates inside the DB | Minutes (refresh interval) | Low–medium |
| App cache (Redis) | Arbitrary computed objects, cross-service joins | Seconds–minutes (TTL) | Medium (invalidation logic) |
| Read replica + live query | Ad-hoc analytics, changing query shapes | Replication lag (ms–s) | Medium (routing, lag monitoring) |
| OLAP warehouse / lakehouse | Petabyte scans, BI tools, data science | Minutes–hours (ETL) | High |
| Trigger-maintained summary table | MySQL, near-row-level incremental updates | Seconds | High (trigger correctness) |
Many production systems combine layers: materialized views for canonical daily rollups, Redis for per-user dashboard fragments, and a warehouse for analyst SQL. Avoid storing the same aggregate in three places without an owner for each freshness tier.
Engine notes: PostgreSQL, MySQL, and beyond
PostgreSQL
Full support since 9.3. Use CONCURRENTLY for online refresh.
Materialized views cannot reference other materialized views in all versions
cleanly — flatten or refresh in dependency order. pg_matviews
catalog lists definitions; include refresh jobs in migration reviews.
MySQL / MariaDB
No native materialized views. Emulate with real tables populated by scheduled jobs or triggers. Treat the summary table schema as a migration-managed artifact.
SQL Server indexed views
Indexed views auto-maintain when constraints match strict requirements (SCHEMABINDING, deterministic functions). Less explicit refresh control, stronger consistency guarantees on supported DML paths.
Cloud warehouses
Snowflake, BigQuery, and Redshift materialized views target analytics workloads with automatic refresh and clustering — different cost model than OLTP Postgres, but the staleness trade-off concept is identical.
Common pitfalls
- Unbounded snapshot growth — rollups without retention prune forever; archive old buckets to cold storage.
- Refresh during peak — full refresh competes with OLTP; move to replicas or quiet hours.
- Missing unique index — blocks concurrent refresh and allows duplicate groups.
- Hidden staleness bugs — finance reconciling to materialized numbers that lag an hour behind live ledger.
- ORM ignorance — ORMs rarely model materialized views; document raw SQL or read-only models explicitly.
- Schema drift — changing base columns without updating the view definition breaks refresh silently until runtime.
Production checklist
- Confirm the query is read-heavy, shape-stable, and tolerates documented staleness.
- Create the materialized view on a replica when OLTP lock or I/O impact matters.
- Add indexes matching dashboard filter and sort columns; unique index for concurrent refresh.
- Measure full refresh duration before committing to SLA frequency.
- Schedule refresh with monitoring — alert on job failure or duration regression.
- Expose
last_refreshed_atto downstream consumers and support teams. - Version-control the CREATE statement in migrations; review on base-table schema changes.
- Load-test read path after creation; compare EXPLAIN plans to pre-materialization baseline.
- Document owner and retirement criteria — remove unused snapshots that still refresh nightly.
- Re-evaluate when row churn makes incremental or cache-based approaches cheaper.
Key takeaways
- Materialized views store query results on disk — reads skip repeated heavy computation.
- Staleness is the product decision — define acceptable lag before choosing refresh cadence.
- Concurrent refresh keeps dashboards online but requires unique indexes and costs more CPU.
- Incremental maintenance trades implementation complexity for fresher, cheaper updates.
- Combine with indexes, replicas, and caches — each layer solves a different slice of the read-performance problem.
Related reading
- Database indexing explained — B-trees, compound keys, and EXPLAIN plans
- SQL query optimization explained — tune the defining query before materializing
- Application caching explained — when Redis beats a database snapshot
- Database replication explained — hosting rollups on read replicas