Guide

PostgreSQL fundamentals explained

Your startup picks a database on day one and lives with that choice for years. PostgreSQL (often called Postgres) is the relational engine behind Stripe, Instagram, Spotify, and thousands of SaaS products — not because it is trendy, but because it combines strict ACID guarantees with pragmatic features: multi-version concurrency control (MVCC), rich indexing, native JSONB, full-text search, extensions like PostGIS and pgvector, and mature streaming replication. This guide explains PostgreSQL's process architecture, how MVCC lets readers and writers coexist, the index types that matter in production, the write-ahead log (WAL) and crash recovery, why VACUUM is not optional, connection pooling with PgBouncer, read-replica patterns, table partitioning, a small e-commerce schema worked example, when Postgres beats MySQL or SQLite, common pitfalls, and a practitioner checklist. For transaction semantics in depth, pair this with our database transactions guide.

What PostgreSQL is

PostgreSQL is an object-relational database management system (ORDBMS) — relational at its core (tables, rows, SQL, foreign keys) but extensible with custom types, operators, and functions. It is open source (PostgreSQL License), runs on every major OS, and is the default choice when you need a general-purpose OLTP database that can also handle analytics-light workloads, geospatial data, or vector search without bolting on a separate store.

Unlike SQLite (embedded, single-writer) or managed key-value stores, Postgres assumes a client-server model: applications connect over TCP (port 5432 by default), send SQL, and receive result sets. A single Postgres cluster (one data directory) can host many logical databases; each database contains schemas, tables, indexes, and views.

Process architecture

A running Postgres instance is a postmaster supervisor plus worker processes:

  • Backend processes — one per client connection; parses SQL, plans queries, reads/writes buffers.
  • Background workers — checkpointer, WAL writer, autovacuum launcher, stats collector.
  • Shared memoryshared_buffers (page cache), WAL buffers, lock tables.

Because each connection spawns a backend process (~5–10 MB RAM), high-concurrency apps almost always front Postgres with a connection pool like PgBouncer rather than opening thousands of direct connections.

MVCC: how Postgres handles concurrency

Traditional databases lock rows on update, blocking readers. PostgreSQL uses multi-version concurrency control (MVCC): an UPDATE does not overwrite the old row in place — it marks the old version dead and inserts a new version. Each row carries system columns xmin (inserting transaction ID) and xmax (deleting/updating transaction ID, if any).

When a transaction runs, Postgres builds a snapshot of visible transaction IDs. A row version is visible if its xmin committed before the snapshot and its xmax is either unset or committed after the snapshot. Readers never block writers; writers never block readers. Conflicts appear only when two transactions try to update the same row — the second blocks until the first commits, then may fail with a serialization error depending on isolation level.

MVCC is why DELETE and UPDATE leave dead tuples on disk until VACUUM reclaims space, and why long-running transactions (or orphaned idle-in-transaction sessions) can cause table bloat and autovacuum stalls. See isolation levels for how Read Committed, Repeatable Read, and Serializable map onto snapshots.

Storage, WAL, and crash recovery

Data files are organized in 8 KB pages (blocks). Modified pages sit in shared buffers; the write-ahead log (WAL) records changes before they hit data files. On crash, Postgres replays WAL from the last checkpoint — this is how durability works even if the OS buffers were not flushed.

synchronous_commit = on (default) waits for WAL to reach disk before acknowledging a commit — safe, slightly slower. Setting it off for bulk loads trades durability for speed (you may lose the last second of commits on crash). Streaming replication ships WAL to standby servers for read replicas and failover.

Checkpoints throttle WAL growth and bound recovery time. Monitor pg_stat_bgwriter and disk I/O — checkpoint spikes on slow disks are a common production pain point.

Indexes that matter in production

Postgres defaults to B-tree indexes — excellent for equality and range queries on scalars (=, <, BETWEEN, ORDER BY). Most foreign-key and lookup columns need a B-tree. For deeper index theory, see database indexing explained.

Specialized index types

  • GIN (Generalized Inverted Index) — JSONB containment (@>), full-text search (tsvector), arrays. Larger and slower to update than B-tree, but indispensable for document queries.
  • GiST — geospatial (PostGIS), range types, nearest-neighbor searches.
  • BRIN (Block Range Index) — tiny indexes for naturally ordered data (timestamps in event logs) where each block stores min/max summaries.
  • Hash — equality-only; rarely used because B-tree handles = well enough.

Index patterns

  • Compound indexes(tenant_id, created_at DESC) supports WHERE tenant_id = ? ORDER BY created_at; column order matters.
  • Partial indexesWHERE deleted_at IS NULL indexes only active rows, saving space on soft-delete tables.
  • Covering indexes (INCLUDE) — store extra columns in the leaf for index-only scans without visiting the heap.
  • Expression indexesLOWER(email) for case-insensitive uniqueness.

Use EXPLAIN (ANALYZE, BUFFERS) to verify the planner picks your index. Our SQL query optimization guide walks through reading plans and fixing sequential scans.

JSONB, arrays, and beyond relational

JSONB stores binary JSON with fast containment queries and GIN indexing. Use it for semi-structured attributes (product metadata, user preferences) that would be painful as dozens of nullable columns — but keep hot query fields in proper typed columns for B-tree speed.

CREATE INDEX idx_products_attrs ON products USING GIN (attributes);
SELECT id FROM products WHERE attributes @> '{"color": "red"}';

Native arrays, UUID, ENUM, range types, and composite types reduce application glue code. Extensions add domains: pgvector for embedding search, PostGIS for geospatial, citext for case-insensitive text. Enable with CREATE EXTENSION — but document dependencies; extensions complicate managed-cloud upgrades.

VACUUM, bloat, and autovacuum

Dead tuples from MVCC accumulate until VACUUM marks their space reusable. Autovacuum runs automatically based on insert/update/delete volume. If autovacuum falls behind — common on busy tables with default settings — tables swell, indexes bloat, and sequential scans slow down.

  • Watch n_dead_tup in pg_stat_user_tables.
  • Tune per-table autovacuum_vacuum_scale_factor for large tables (default 20% is too lazy for multi-million-row tables).
  • VACUUM FULL rewrites the table to reclaim disk to the OS — locks exclusively; use pg_repack online instead in production.
  • Never disable autovacuum globally. Long idle in transaction sessions hold back xmin and prevent cleanup.

Replication, pooling, and scaling reads

Streaming replication ships WAL records to one or more standbys. Synchronous replicas trade latency for zero data loss on primary failure; asynchronous replicas are standard for read scaling with seconds of lag. Route read-only analytics queries to replicas; keep writes and read-your-writes paths on the primary.

PgBouncer in transaction pooling mode multiplexes thousands of app connections onto dozens of Postgres backends. Caveat: session-level features (prepared statements, SET, temporary tables, advisory locks) break unless you use session pooling or disable prepared statements in the ORM.

Vertical scaling (bigger CPU/RAM, faster NVMe) still outperforms sharding complexity for most workloads under a few terabytes. When you outgrow a single primary, consider sharding or read replicas before jumping to distributed SQL.

Table partitioning

Declarative partitioning splits one logical table into physical child tables by RANGE (dates), LIST (regions), or HASH (even spread). The planner prunes partitions that cannot match the query — a WHERE created_at >= '2026-06-01' scans only June's partition.

Partition when tables exceed tens of millions of rows and queries naturally filter on the partition key (time-series events, multi-tenant data keyed by tenant). Each partition needs its own indexes; maintenance (VACUUM, REINDEX) runs per partition. Detach old partitions for cheap archival: ALTER TABLE ... DETACH PARTITION.

Worked example: e-commerce orders schema

A mid-size shop needs customers, orders, and line items with fast lookup by email and recent orders by customer.

CREATE TABLE customers (
  id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email      CITEXT NOT NULL UNIQUE,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE orders (
  id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  customer_id  UUID NOT NULL REFERENCES customers(id),
  status       TEXT NOT NULL CHECK (status IN ('pending','paid','shipped')),
  total_cents  BIGINT NOT NULL,
  created_at   TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_orders_customer_created
  ON orders (customer_id, created_at DESC);

Design choices: UUID primary keys avoid coordination across services (vs serial BIGSERIAL). CITEXT gives case-insensitive email uniqueness without application normalization. The compound index supports SELECT * FROM orders WHERE customer_id = $1 ORDER BY created_at DESC LIMIT 20 via an index-only scan if you add INCLUDE (status, total_cents). Order line items would live in a child table with order_id indexed — avoid storing arrays of line items in JSONB if you filter by product SKU.

At scale, partition orders by created_at monthly and attach a read replica for the admin dashboard. Wrap checkout in a transaction with isolation at least Read Committed; use SELECT ... FOR UPDATE on inventory rows to prevent overselling.

PostgreSQL vs alternatives

Scenario Prefer PostgreSQL when… Consider alternative when…
Default web app backend You need ACID, joins, migrations, JSONB flexibility Truly ephemeral data or sub-millisecond KV-only access patterns
MySQL / MariaDB Complex queries, CTEs, partial indexes, extensions matter Legacy LAMP stack or team expertise is entirely MySQL-specific
SQLite Multi-user writes, replication, concurrent connections Embedded mobile/desktop app, single writer, zero ops
Document store (MongoDB) Relationships, transactions across documents, ad-hoc SQL reporting Schema-free rapid prototyping with no cross-document consistency needs
Managed cloud (RDS, Cloud SQL, Neon, Supabase) You want backups, failover, and patching handled On-prem compliance or cost at very large committed-use scale
Vector / search specialty pgvector + GIN full-text covers moderate scale in one DB Billion-vector ANN or sub-10 ms search at huge QPS needs a dedicated engine

Common pitfalls

  • Connection storms — serverless functions or ORMs opening a new connection per request exhaust max_connections. Pool at the app or PgBouncer layer.
  • Missing indexes on foreign keys — Postgres does not auto-index FK columns; joins and cascades scan full tables without them.
  • Over-indexing — every index slows writes and consumes disk; index the queries you actually run, verified with pg_stat_user_indexes.
  • Ignoring bloat — monitoring query time but not dead-tuple ratio leads to mysterious slowdowns months later.
  • SELECT * on wide tables — forces heap fetches; project only needed columns, especially with TOAST-compressed JSONB/text.
  • Serial ID exhaustionSERIAL is INTEGER (max ~2.1B); use BIGSERIAL or UUID for high-volume tables.
  • Replication lag blindness — reading stale inventory counts from async replicas causes oversell; route consistency-sensitive reads to primary.

Production checklist

  • Set shared_buffers ~25% of RAM (cap near 8–16 GB on large boxes); tune effective_cache_size to guide the planner.
  • Enable log_min_duration_statement or pg_stat_statements to catch slow queries early.
  • Index foreign keys and every column in frequent WHERE / JOIN clauses; verify with EXPLAIN ANALYZE.
  • Configure autovacuum aggressively on high-churn tables; alert on n_dead_tup thresholds.
  • Front with PgBouncer (transaction mode) when connection count exceeds ~100.
  • Automate base backups + WAL archiving; test restore quarterly.
  • Use migrations tools (Flyway, Liquibase, Rails, Prisma) — never hand-edit prod.
  • Set statement timeouts and idle_in_transaction_session_timeout to kill runaway sessions.
  • Monitor replication lag, disk usage, checkpoint frequency, and lock waits.
  • Document extensions and major version upgrade path before you depend on betas.

Key takeaways

  • PostgreSQL is a full-featured OLTP database with MVCC concurrency, rich SQL, and extensions — the pragmatic default for new backends.
  • MVCC enables concurrent reads and writes but requires autovacuum discipline to control dead-tuple bloat.
  • B-tree indexes cover most queries; add GIN for JSONB and full-text; use BRIN for append-only time series.
  • WAL provides durability and powers streaming replication to read replicas.
  • Connection pooling, sane indexing, and EXPLAIN-driven tuning beat premature sharding for most teams.

Related reading