Guide
PostgreSQL fundamentals explained
Your startup picks a database on day one and lives with that choice for years. PostgreSQL (often called Postgres) is the relational engine behind Stripe, Instagram, Spotify, and thousands of SaaS products — not because it is trendy, but because it combines strict ACID guarantees with pragmatic features: multi-version concurrency control (MVCC), rich indexing, native JSONB, full-text search, extensions like PostGIS and pgvector, and mature streaming replication. This guide explains PostgreSQL's process architecture, how MVCC lets readers and writers coexist, the index types that matter in production, the write-ahead log (WAL) and crash recovery, why VACUUM is not optional, connection pooling with PgBouncer, read-replica patterns, table partitioning, a small e-commerce schema worked example, when Postgres beats MySQL or SQLite, common pitfalls, and a practitioner checklist. For transaction semantics in depth, pair this with our database transactions guide.
What PostgreSQL is
PostgreSQL is an object-relational database management system (ORDBMS) — relational at its core (tables, rows, SQL, foreign keys) but extensible with custom types, operators, and functions. It is open source (PostgreSQL License), runs on every major OS, and is the default choice when you need a general-purpose OLTP database that can also handle analytics-light workloads, geospatial data, or vector search without bolting on a separate store.
Unlike SQLite (embedded, single-writer) or managed key-value stores, Postgres assumes a client-server model: applications connect over TCP (port 5432 by default), send SQL, and receive result sets. A single Postgres cluster (one data directory) can host many logical databases; each database contains schemas, tables, indexes, and views.
Process architecture
A running Postgres instance is a postmaster supervisor plus worker processes:
- Backend processes — one per client connection; parses SQL, plans queries, reads/writes buffers.
- Background workers — checkpointer, WAL writer, autovacuum launcher, stats collector.
- Shared memory —
shared_buffers(page cache), WAL buffers, lock tables.
Because each connection spawns a backend process (~5–10 MB RAM), high-concurrency apps almost always front Postgres with a connection pool like PgBouncer rather than opening thousands of direct connections.
MVCC: how Postgres handles concurrency
Traditional databases lock rows on update, blocking readers. PostgreSQL uses
multi-version concurrency control (MVCC): an UPDATE does
not overwrite the old row in place — it marks the old version dead and inserts a new
version. Each row carries system columns xmin (inserting transaction ID)
and xmax (deleting/updating transaction ID, if any).
When a transaction runs, Postgres builds a snapshot of visible
transaction IDs. A row version is visible if its xmin committed before
the snapshot and its xmax is either unset or committed after the
snapshot. Readers never block writers; writers never block readers. Conflicts appear
only when two transactions try to update the same row — the second blocks until the
first commits, then may fail with a serialization error depending on isolation level.
MVCC is why DELETE and UPDATE leave dead tuples
on disk until VACUUM reclaims space, and why long-running transactions (or orphaned
idle-in-transaction sessions) can cause table bloat and autovacuum stalls. See
isolation levels
for how Read Committed, Repeatable Read, and Serializable map onto snapshots.
Storage, WAL, and crash recovery
Data files are organized in 8 KB pages (blocks). Modified pages sit in shared buffers; the write-ahead log (WAL) records changes before they hit data files. On crash, Postgres replays WAL from the last checkpoint — this is how durability works even if the OS buffers were not flushed.
synchronous_commit = on (default) waits for WAL to reach disk before
acknowledging a commit — safe, slightly slower. Setting it off for
bulk loads trades durability for speed (you may lose the last second of commits on
crash). Streaming replication ships WAL to standby servers for
read replicas
and failover.
Checkpoints throttle WAL growth and bound recovery time. Monitor
pg_stat_bgwriter and disk I/O — checkpoint spikes on slow disks are a
common production pain point.
Indexes that matter in production
Postgres defaults to B-tree indexes — excellent for equality and
range queries on scalars (=, <, BETWEEN,
ORDER BY). Most foreign-key and lookup columns need a B-tree. For deeper
index theory, see
database indexing explained.
Specialized index types
- GIN (Generalized Inverted Index) — JSONB containment
(
@>), full-text search (tsvector), arrays. Larger and slower to update than B-tree, but indispensable for document queries. - GiST — geospatial (PostGIS), range types, nearest-neighbor searches.
- BRIN (Block Range Index) — tiny indexes for naturally ordered data (timestamps in event logs) where each block stores min/max summaries.
- Hash — equality-only; rarely used because B-tree handles
=well enough.
Index patterns
- Compound indexes —
(tenant_id, created_at DESC)supportsWHERE tenant_id = ? ORDER BY created_at; column order matters. - Partial indexes —
WHERE deleted_at IS NULLindexes only active rows, saving space on soft-delete tables. - Covering indexes (INCLUDE) — store extra columns in the leaf for index-only scans without visiting the heap.
- Expression indexes —
LOWER(email)for case-insensitive uniqueness.
Use EXPLAIN (ANALYZE, BUFFERS) to verify the planner picks your index.
Our
SQL query optimization guide
walks through reading plans and fixing sequential scans.
JSONB, arrays, and beyond relational
JSONB stores binary JSON with fast containment queries and GIN indexing. Use it for semi-structured attributes (product metadata, user preferences) that would be painful as dozens of nullable columns — but keep hot query fields in proper typed columns for B-tree speed.
CREATE INDEX idx_products_attrs ON products USING GIN (attributes);
SELECT id FROM products WHERE attributes @> '{"color": "red"}';
Native arrays, UUID, ENUM,
range types, and composite types reduce application
glue code. Extensions add domains: pgvector for embedding search,
PostGIS for geospatial, citext for case-insensitive text.
Enable with CREATE EXTENSION — but document dependencies; extensions
complicate managed-cloud upgrades.
VACUUM, bloat, and autovacuum
Dead tuples from MVCC accumulate until VACUUM marks their space reusable. Autovacuum runs automatically based on insert/update/delete volume. If autovacuum falls behind — common on busy tables with default settings — tables swell, indexes bloat, and sequential scans slow down.
- Watch
n_dead_tupinpg_stat_user_tables. - Tune per-table
autovacuum_vacuum_scale_factorfor large tables (default 20% is too lazy for multi-million-row tables). - VACUUM FULL rewrites the table to reclaim disk to the OS — locks
exclusively; use
pg_repackonline instead in production. - Never disable autovacuum globally. Long
idle in transactionsessions hold backxminand prevent cleanup.
Replication, pooling, and scaling reads
Streaming replication ships WAL records to one or more standbys. Synchronous replicas trade latency for zero data loss on primary failure; asynchronous replicas are standard for read scaling with seconds of lag. Route read-only analytics queries to replicas; keep writes and read-your-writes paths on the primary.
PgBouncer in transaction pooling mode multiplexes thousands of app
connections onto dozens of Postgres backends. Caveat: session-level features (prepared
statements, SET, temporary tables, advisory locks) break unless you use
session pooling or disable prepared statements in the ORM.
Vertical scaling (bigger CPU/RAM, faster NVMe) still outperforms sharding complexity for most workloads under a few terabytes. When you outgrow a single primary, consider sharding or read replicas before jumping to distributed SQL.
Table partitioning
Declarative partitioning splits one logical table into physical
child tables by RANGE (dates), LIST (regions), or HASH (even spread). The planner
prunes partitions that cannot match the query — a
WHERE created_at >= '2026-06-01' scans only June's partition.
Partition when tables exceed tens of millions of rows and queries naturally filter
on the partition key (time-series events, multi-tenant data keyed by tenant). Each
partition needs its own indexes; maintenance (VACUUM, REINDEX) runs per partition.
Detach old partitions for cheap archival: ALTER TABLE ... DETACH PARTITION.
Worked example: e-commerce orders schema
A mid-size shop needs customers, orders, and line items with fast lookup by email and recent orders by customer.
CREATE TABLE customers (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email CITEXT NOT NULL UNIQUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE orders (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
customer_id UUID NOT NULL REFERENCES customers(id),
status TEXT NOT NULL CHECK (status IN ('pending','paid','shipped')),
total_cents BIGINT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_orders_customer_created
ON orders (customer_id, created_at DESC);
Design choices: UUID primary keys avoid coordination across services
(vs serial BIGSERIAL). CITEXT gives case-insensitive email
uniqueness without application normalization. The compound index supports
SELECT * FROM orders WHERE customer_id = $1 ORDER BY created_at DESC LIMIT 20
via an index-only scan if you add INCLUDE (status, total_cents). Order
line items would live in a child table with order_id indexed — avoid
storing arrays of line items in JSONB if you filter by product SKU.
At scale, partition orders by created_at monthly and attach
a read replica for the admin dashboard. Wrap checkout in a transaction with isolation
at least Read Committed; use SELECT ... FOR UPDATE on inventory rows to
prevent overselling.
PostgreSQL vs alternatives
| Scenario | Prefer PostgreSQL when… | Consider alternative when… |
|---|---|---|
| Default web app backend | You need ACID, joins, migrations, JSONB flexibility | Truly ephemeral data or sub-millisecond KV-only access patterns |
| MySQL / MariaDB | Complex queries, CTEs, partial indexes, extensions matter | Legacy LAMP stack or team expertise is entirely MySQL-specific |
| SQLite | Multi-user writes, replication, concurrent connections | Embedded mobile/desktop app, single writer, zero ops |
| Document store (MongoDB) | Relationships, transactions across documents, ad-hoc SQL reporting | Schema-free rapid prototyping with no cross-document consistency needs |
| Managed cloud (RDS, Cloud SQL, Neon, Supabase) | You want backups, failover, and patching handled | On-prem compliance or cost at very large committed-use scale |
| Vector / search specialty | pgvector + GIN full-text covers moderate scale in one DB | Billion-vector ANN or sub-10 ms search at huge QPS needs a dedicated engine |
Common pitfalls
- Connection storms — serverless functions or ORMs opening a new
connection per request exhaust
max_connections. Pool at the app or PgBouncer layer. - Missing indexes on foreign keys — Postgres does not auto-index FK columns; joins and cascades scan full tables without them.
- Over-indexing — every index slows writes and consumes disk;
index the queries you actually run, verified with
pg_stat_user_indexes. - Ignoring bloat — monitoring query time but not dead-tuple ratio leads to mysterious slowdowns months later.
SELECT *on wide tables — forces heap fetches; project only needed columns, especially with TOAST-compressed JSONB/text.- Serial ID exhaustion —
SERIALisINTEGER(max ~2.1B); useBIGSERIALor UUID for high-volume tables. - Replication lag blindness — reading stale inventory counts from async replicas causes oversell; route consistency-sensitive reads to primary.
Production checklist
- Set
shared_buffers~25% of RAM (cap near 8–16 GB on large boxes); tuneeffective_cache_sizeto guide the planner. - Enable
log_min_duration_statementorpg_stat_statementsto catch slow queries early. - Index foreign keys and every column in frequent
WHERE/JOINclauses; verify with EXPLAIN ANALYZE. - Configure autovacuum aggressively on high-churn tables; alert on
n_dead_tupthresholds. - Front with PgBouncer (transaction mode) when connection count exceeds ~100.
- Automate base backups + WAL archiving; test restore quarterly.
- Use migrations tools (Flyway, Liquibase, Rails, Prisma) — never hand-edit prod.
- Set statement timeouts and
idle_in_transaction_session_timeoutto kill runaway sessions. - Monitor replication lag, disk usage, checkpoint frequency, and lock waits.
- Document extensions and major version upgrade path before you depend on betas.
Key takeaways
- PostgreSQL is a full-featured OLTP database with MVCC concurrency, rich SQL, and extensions — the pragmatic default for new backends.
- MVCC enables concurrent reads and writes but requires autovacuum discipline to control dead-tuple bloat.
- B-tree indexes cover most queries; add GIN for JSONB and full-text; use BRIN for append-only time series.
- WAL provides durability and powers streaming replication to read replicas.
- Connection pooling, sane indexing, and EXPLAIN-driven tuning beat premature sharding for most teams.
Related reading
- Database transactions and isolation levels explained — ACID, MVCC snapshots, deadlocks
- Database indexing explained — B-trees, compound keys, covering indexes
- SQL query optimization explained — EXPLAIN plans and SARGable predicates
- Database connection pooling explained — PgBouncer modes and pool sizing