Guide
SQL fundamentals explained: queries, joins, aggregations, and transactions
SQL (Structured Query Language) is the lingua franca of relational
databases — PostgreSQL, MySQL, SQLite, and most cloud warehouses all speak dialects
of it. Whether you are building a
REST API backed by
Postgres, analyzing payment logs, or debugging a slow dashboard, the same core
statements appear again and again: SELECT to read,
JOIN to combine tables, GROUP BY to summarize, and
BEGIN … COMMIT to keep multi-step updates consistent. This guide
covers the vocabulary and patterns you need before diving into
index tuning or
execution plans.
The relational model: tables, rows, columns, keys
A relational database stores data in tables — named collections
of rows (records) and columns (fields). Each row is one entity; each column has a
fixed type (INTEGER, TEXT, TIMESTAMP,
BOOLEAN, and so on).
- Primary key — uniquely identifies a row (
idis the usual choice). No two rows share the same primary key value. - Foreign key — a column that references another table's primary key, encoding a relationship (e.g.
orders.user_idpoints tousers.id). - Schema — the set of tables, columns, types, and constraints that define your data model.
- Normalization — splitting data across related tables to avoid duplication (store each user's email once in
users, not on every order row).
Example mental model: a users table holds accounts; an
orders table holds purchases with a user_id foreign key.
SQL lets you query across both in a single statement instead of fetching and
merging in application code.
Reading data: SELECT, WHERE, ORDER BY, LIMIT
The workhorse statement is SELECT. It specifies which columns to
return, which table to read, optional filters, sorting, and a row cap.
SELECT id, email, created_at
FROM users
WHERE status = 'active'
AND created_at >= '2026-01-01'
ORDER BY created_at DESC
LIMIT 50;
SELECT *returns every column — fine for exploration, avoid in production APIs where you should list columns explicitly.WHEREfilters rows before they are returned. Combine conditions withAND,OR, and parentheses.ORDER BYsorts results. AddDESCfor descending.LIMITcaps row count — pair withOFFSETfor simple pagination, though keyset pagination scales better on large tables.
Comparison operators (=, <>, <,
IN, BETWEEN, LIKE) and null checks
(IS NULL, IS NOT NULL) cover most filters. Remember:
NULL is not equal to anything, including itself — use
IS NULL instead of = NULL.
Combining tables: INNER JOIN and LEFT JOIN
Real schemas spread data across tables. JOINs combine rows based on matching key columns.
INNER JOIN
Returns only rows that have a match in both tables. If a user has no
orders, they disappear from an inner join between users and
orders.
SELECT u.email, o.total_cents, o.placed_at
FROM users u
INNER JOIN orders o ON o.user_id = u.id
WHERE o.placed_at >= CURRENT_DATE - INTERVAL '30 days';
LEFT JOIN
Keeps every row from the left table even when there is no match on the right —
unmatched columns come back as NULL. Use this for "all users and
their recent orders, including users with zero orders."
SELECT u.email, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.email;
Table aliases (u, o) keep queries readable. Always put
the join condition in ON, not mixed into WHERE, unless
you deliberately want to filter after the join.
Summarizing with GROUP BY and HAVING
Aggregate functions collapse many rows into one value:
COUNT(*)— number of rows in a groupSUM(column)— total of numeric valuesAVG(column)— arithmetic meanMIN/MAX— smallest or largest value
GROUP BY defines the groups. Every non-aggregated column in
SELECT must appear in GROUP BY (or be functionally
dependent on it in databases that allow that).
SELECT product_id,
COUNT(*) AS units_sold,
SUM(amount_cents) AS revenue_cents
FROM order_items
WHERE placed_at >= '2026-06-01'
GROUP BY product_id
HAVING SUM(amount_cents) > 100000
ORDER BY revenue_cents DESC;
WHERE filters rows before grouping;
HAVING filters groups after aggregation. A common mistake
is putting aggregate conditions in WHERE — that is a syntax error in
standard SQL.
Writing data: INSERT, UPDATE, DELETE
INSERT
INSERT INTO users (email, status)
VALUES ('alice@example.com', 'active')
RETURNING id;
RETURNING (PostgreSQL) or similar clauses let you read generated
keys without a second round trip — useful when creating a parent row and
immediately inserting child rows in one transaction.
UPDATE
UPDATE orders
SET status = 'shipped', shipped_at = NOW()
WHERE id = 48291 AND status = 'paid';
Always include a selective WHERE clause. An unqualified
UPDATE touches every row in the table.
DELETE
DELETE FROM sessions
WHERE expires_at < NOW();
Many teams prefer soft deletes — an deleted_at
timestamp column — so data can be recovered and historical reports stay intact.
Hard deletes are irreversible without backups.
Transactions: keeping multi-step work atomic
A transaction groups statements into one unit: either all succeed or none do. Transferring value between accounts is the classic example — debit one row, credit another, commit only if both updates succeed.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
If anything fails, ROLLBACK undoes the work in that transaction.
Databases guarantee ACID properties — atomicity, consistency,
isolation, durability. For a deeper dive on what each letter means in practice,
see our
ACID transactions explainer.
Application code should open a transaction around related writes. ORMs often
expose transaction { … } blocks; raw drivers use
BEGIN/COMMIT. Keep transactions short — long-held
locks block other queries.
Indexes, query plans, and performance basics
Without an index, finding rows by email may scan the entire table —
fine at thousands of rows, painful at millions. B-tree indexes (the default in
Postgres and MySQL) speed up equality and range lookups on indexed columns.
The optimizer chooses an access path based on statistics. A filter on an
unindexed column forces a sequential scan; a selective indexed filter becomes an
index seek. Our
database indexing guide
covers compound keys, covering indexes, and when indexes hurt write throughput.
When a query is mysteriously slow, read the
execution plan explainer
and run EXPLAIN ANALYZE before guessing.
Parameterized queries and SQL injection
Never concatenate user input into SQL strings. An attacker who controls a search
box can inject '; DROP TABLE users; -- if your code builds queries
like "SELECT * FROM users WHERE email = '" + input + "'".
Use parameterized queries (prepared statements) instead — the database treats user values as data, not executable SQL:
-- Pseudocode: driver-specific syntax varies
db.query(
'SELECT id FROM users WHERE email = $1',
[userEmail]
);
ORMs parameterize by default when you use their query builders. Raw SQL is fine when you keep parameters bound. Escaping is a weaker fallback; binding is the standard defense.
SQL vs NoSQL: when each fits
Relational SQL databases excel when:
- Your data has clear entities and relationships (users, orders, line items).
- You need multi-row transactions with strong consistency.
- Ad hoc reporting and joins across tables are common.
Document stores (MongoDB), wide-column systems (Cassandra), and key-value caches (Redis) trade some relational rigor for flexible schemas, horizontal shard patterns, or extreme write throughput. Many production stacks use both — Postgres as the source of truth plus Redis for caching hot reads.
Start with SQL unless you have a specific reason not to. Joins, constraints, and mature tooling beat premature NoSQL complexity for most new products.
Key takeaways
- Tables store rows; primary keys identify rows and foreign keys link related tables.
SELECT … WHERE … ORDER BY … LIMITis the basic read pattern; list columns explicitly in APIs.- INNER JOIN keeps only matching rows; LEFT JOIN preserves all rows from the left table.
GROUP BY+ aggregates summarize data; filter groups withHAVING, notWHERE.- Wrap related writes in transactions; understand ACID before relying on consistency guarantees.
- Add indexes on columns you filter and join; verify with execution plans, not intuition.
- Always use parameterized queries — never string-concatenate untrusted input into SQL.
Related reading
- Database indexing explained — B-trees, compound keys, and covering indexes
- GraphQL API design explained — how resolvers map to SQL queries underneath
- ACID database transactions explained — isolation levels and failure modes
- REST API design explained — pagination, idempotency, and backing HTTP with a relational store