Guide

Materialized views explained

Your analytics dashboard runs the same expensive JOIN and GROUP BY on every page load — scanning millions of rows while product engineers wait. A regular SQL view is just a saved query definition; the database still executes it from scratch each time. A materialized view stores the query result on disk like a table, so reads become fast index lookups at the cost of staleness until you refresh. Used well, materialized views sit between raw tables and external caches: closer to the data, transactional where your engine supports it, and ideal for read-heavy aggregates that tolerate minutes (not milliseconds) of lag. This guide covers view vs materialized view mechanics, refresh strategies, indexing the snapshot, trade-offs against Redis caching and read replicas, engine differences, a worked e-commerce rollup example, a decision table, and a production checklist.

Views vs materialized views: definition vs snapshot

A standard view is syntactic sugar — a named SELECT stored in the catalog. When you query SELECT * FROM daily_revenue_view, the planner inlines the underlying SQL and runs it against base tables every time. Complexity, I/O, and lock contention scale with traffic.

A materialized view persists the result set physically. PostgreSQL, Oracle, SQL Server, and some warehouses treat it as a first-class object; MySQL has no native materialized view (teams emulate with summary tables plus ETL). After creation, reads hit the stored snapshot:

CREATE MATERIALIZED VIEW daily_revenue AS
SELECT
  date_trunc('day', created_at) AS day,
  region,
  COUNT(*) AS order_count,
  SUM(total_cents) AS revenue_cents
FROM orders
WHERE status = 'paid'
GROUP BY 1, 2;

CREATE UNIQUE INDEX ON daily_revenue (day, region);

The dashboard now queries daily_revenue — thousands of rows instead of millions — and benefits from the unique index on (day, region). The base orders table can grow without slowing this read path, as long as refresh keeps the snapshot acceptably current.

When materialized views earn their keep

Materialized views shine when all of these hold:

  • Expensive, repeatable reads — aggregations, window functions, multi-table JOINs run constantly.
  • Tolerable staleness — dashboards, internal reports, search facets, recommendation features updated every N minutes.
  • Stable query shape — the SELECT definition changes rarely; ad-hoc analytics belong in a warehouse.
  • Database-native freshness — you want one source of truth inside the RDBMS rather than duplicating logic in an app cache.

Poor fits: real-time balances, inventory counts that must match checkout, or queries that change shape weekly. For those, optimize the live query with indexes and SARGable predicates, or stream incremental updates through an outbox or change-data-capture pipeline.

Refresh strategies: manual, scheduled, concurrent, incremental

Full refresh (manual or scheduled)

The simplest model: re-run the entire SELECT and replace the snapshot. PostgreSQL:

REFRESH MATERIALIZED VIEW daily_revenue;

Without CONCURRENTLY, PostgreSQL takes an exclusive lock — readers block until refresh completes. Schedule off-peak via cron or pg_cron. For large snapshots, full refresh CPU and I/O can spike; monitor replication lag if the view lives on a primary that feeds replicas.

Concurrent refresh (PostgreSQL)

Requires at least one unique index on the materialized view:

REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue;

Readers keep using the old snapshot while a new one builds, then swap. Slower than blocking refresh but essential for dashboards queried 24/7. Still a full recompute — not incremental.

Incremental and trigger-based maintenance

When only a fraction of base rows change between refreshes, full recompute wastes work. Patterns:

  • Summary tables + triggers — on INSERT/UPDATE to orders, upsert the affected (day, region) bucket. Classic MySQL approach.
  • Incremental materialized views — some warehouses (BigQuery, Snowflake) support automatic incremental refresh when the defining query qualifies.
  • CDC-driven rollups — Debezium or logical replication feeds a consumer that patches aggregates; more moving parts but near-real-time.

Incremental paths demand rigorous idempotency and late-arriving data handling (orders backdated to yesterday must adjust yesterday's bucket).

Refresh scheduling and SLAs

Document an explicit freshness SLA: "Revenue dashboard may lag up to 15 minutes." Align cron frequency with that SLA plus refresh duration. If refresh takes 12 minutes, a 15-minute schedule leaves almost no margin — widen the window or optimize the defining query before adding hardware.

Indexing, storage, and write amplification

Materialized views consume disk like tables — the snapshot plus any indexes you create. Index columns that appear in WHERE and ORDER BY on dashboard queries, mirroring normal table tuning. A unique index enables concurrent refresh in PostgreSQL and prevents duplicate bucket rows from a sloppy GROUP BY.

Writes to base tables do not automatically update the snapshot. Until refresh, the materialized view is stale. Heavy refresh schedules increase write amplification: each refresh rewrites the entire snapshot. Balance refresh frequency against:

  • Disk churn and autovacuum pressure on PostgreSQL
  • Buffer pool eviction — cold snapshots can evict hot OLTP pages
  • Replication bandwidth if snapshots live on the primary

For very large rollups, consider building materialized views on a reporting replica so OLTP primaries stay focused on transactional traffic.

Worked example: regional daily revenue rollup

An e-commerce platform serves a "Sales by region" chart. Raw query cost on orders (80M rows): ~4.2s p95 with proper indexes on created_at. Target: under 50ms p95 for the dashboard API.

Step 1 — create the materialized view and supporting index (shown above). Step 2 — API reads:

SELECT day, region, revenue_cents
FROM daily_revenue
WHERE day >= CURRENT_DATE - INTERVAL '30 days'
ORDER BY day DESC, revenue_cents DESC;

Measured p95 after index: ~8ms. Step 3 — schedule REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue every 10 minutes via pg_cron. Step 4 — expose last_refreshed_at in the API response so users understand data age.

If intraday ops needs near-real-time numbers, add a supplementary query for "today only" against live orders (smaller scan with partial index on created_at >= CURRENT_DATE) and UNION with the materialized history — a common hybrid pattern.

Decision table: materialized view vs alternatives

Approach Best for Freshness Complexity
Materialized view Stable SQL aggregates inside the DB Minutes (refresh interval) Low–medium
App cache (Redis) Arbitrary computed objects, cross-service joins Seconds–minutes (TTL) Medium (invalidation logic)
Read replica + live query Ad-hoc analytics, changing query shapes Replication lag (ms–s) Medium (routing, lag monitoring)
OLAP warehouse / lakehouse Petabyte scans, BI tools, data science Minutes–hours (ETL) High
Trigger-maintained summary table MySQL, near-row-level incremental updates Seconds High (trigger correctness)

Many production systems combine layers: materialized views for canonical daily rollups, Redis for per-user dashboard fragments, and a warehouse for analyst SQL. Avoid storing the same aggregate in three places without an owner for each freshness tier.

Engine notes: PostgreSQL, MySQL, and beyond

PostgreSQL

Full support since 9.3. Use CONCURRENTLY for online refresh. Materialized views cannot reference other materialized views in all versions cleanly — flatten or refresh in dependency order. pg_matviews catalog lists definitions; include refresh jobs in migration reviews.

MySQL / MariaDB

No native materialized views. Emulate with real tables populated by scheduled jobs or triggers. Treat the summary table schema as a migration-managed artifact.

SQL Server indexed views

Indexed views auto-maintain when constraints match strict requirements (SCHEMABINDING, deterministic functions). Less explicit refresh control, stronger consistency guarantees on supported DML paths.

Cloud warehouses

Snowflake, BigQuery, and Redshift materialized views target analytics workloads with automatic refresh and clustering — different cost model than OLTP Postgres, but the staleness trade-off concept is identical.

Common pitfalls

  • Unbounded snapshot growth — rollups without retention prune forever; archive old buckets to cold storage.
  • Refresh during peak — full refresh competes with OLTP; move to replicas or quiet hours.
  • Missing unique index — blocks concurrent refresh and allows duplicate groups.
  • Hidden staleness bugs — finance reconciling to materialized numbers that lag an hour behind live ledger.
  • ORM ignorance — ORMs rarely model materialized views; document raw SQL or read-only models explicitly.
  • Schema drift — changing base columns without updating the view definition breaks refresh silently until runtime.

Production checklist

  • Confirm the query is read-heavy, shape-stable, and tolerates documented staleness.
  • Create the materialized view on a replica when OLTP lock or I/O impact matters.
  • Add indexes matching dashboard filter and sort columns; unique index for concurrent refresh.
  • Measure full refresh duration before committing to SLA frequency.
  • Schedule refresh with monitoring — alert on job failure or duration regression.
  • Expose last_refreshed_at to downstream consumers and support teams.
  • Version-control the CREATE statement in migrations; review on base-table schema changes.
  • Load-test read path after creation; compare EXPLAIN plans to pre-materialization baseline.
  • Document owner and retirement criteria — remove unused snapshots that still refresh nightly.
  • Re-evaluate when row churn makes incremental or cache-based approaches cheaper.

Key takeaways

  • Materialized views store query results on disk — reads skip repeated heavy computation.
  • Staleness is the product decision — define acceptable lag before choosing refresh cadence.
  • Concurrent refresh keeps dashboards online but requires unique indexes and costs more CPU.
  • Incremental maintenance trades implementation complexity for fresher, cheaper updates.
  • Combine with indexes, replicas, and caches — each layer solves a different slice of the read-performance problem.

Related reading