Guide

SQL fundamentals explained: queries, joins, aggregations, and transactions

SQL (Structured Query Language) is the lingua franca of relational databases — PostgreSQL, MySQL, SQLite, and most cloud warehouses all speak dialects of it. Whether you are building a REST API backed by Postgres, analyzing payment logs, or debugging a slow dashboard, the same core statements appear again and again: SELECT to read, JOIN to combine tables, GROUP BY to summarize, and BEGIN … COMMIT to keep multi-step updates consistent. This guide covers the vocabulary and patterns you need before diving into index tuning or execution plans.

The relational model: tables, rows, columns, keys

A relational database stores data in tables — named collections of rows (records) and columns (fields). Each row is one entity; each column has a fixed type (INTEGER, TEXT, TIMESTAMP, BOOLEAN, and so on).

Primary key — uniquely identifies a row (id is the usual choice). No two rows share the same primary key value.
Foreign key — a column that references another table's primary key, encoding a relationship (e.g. orders.user_id points to users.id).
Schema — the set of tables, columns, types, and constraints that define your data model.
Normalization — splitting data across related tables to avoid duplication (store each user's email once in users, not on every order row).

Example mental model: a users table holds accounts; an orders table holds purchases with a user_id foreign key. SQL lets you query across both in a single statement instead of fetching and merging in application code.

Reading data: SELECT, WHERE, ORDER BY, LIMIT

The workhorse statement is SELECT. It specifies which columns to return, which table to read, optional filters, sorting, and a row cap.

SELECT id, email, created_at
FROM users
WHERE status = 'active'
  AND created_at >= '2026-01-01'
ORDER BY created_at DESC
LIMIT 50;

SELECT * returns every column — fine for exploration, avoid in production APIs where you should list columns explicitly.
WHERE filters rows before they are returned. Combine conditions with AND, OR, and parentheses.
ORDER BY sorts results. Add DESC for descending.
LIMIT caps row count — pair with OFFSET for simple pagination, though keyset pagination scales better on large tables.

Comparison operators (=, <>, <, IN, BETWEEN, LIKE) and null checks (IS NULL, IS NOT NULL) cover most filters. Remember: NULL is not equal to anything, including itself — use IS NULL instead of = NULL.

Combining tables: INNER JOIN and LEFT JOIN

Real schemas spread data across tables. JOINs combine rows based on matching key columns.

INNER JOIN

Returns only rows that have a match in both tables. If a user has no orders, they disappear from an inner join between users and orders.

SELECT u.email, o.total_cents, o.placed_at
FROM users u
INNER JOIN orders o ON o.user_id = u.id
WHERE o.placed_at >= CURRENT_DATE - INTERVAL '30 days';

LEFT JOIN

Keeps every row from the left table even when there is no match on the right — unmatched columns come back as NULL. Use this for "all users and their recent orders, including users with zero orders."

SELECT u.email, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.email;

Table aliases (u, o) keep queries readable. Always put the join condition in ON, not mixed into WHERE, unless you deliberately want to filter after the join.

Summarizing with GROUP BY and HAVING

Aggregate functions collapse many rows into one value:

COUNT(*) — number of rows in a group
SUM(column) — total of numeric values
AVG(column) — arithmetic mean
MIN / MAX — smallest or largest value

GROUP BY defines the groups. Every non-aggregated column in SELECT must appear in GROUP BY (or be functionally dependent on it in databases that allow that).

SELECT product_id,
       COUNT(*) AS units_sold,
       SUM(amount_cents) AS revenue_cents
FROM order_items
WHERE placed_at >= '2026-06-01'
GROUP BY product_id
HAVING SUM(amount_cents) > 100000
ORDER BY revenue_cents DESC;

WHERE filters rows before grouping; HAVING filters groups after aggregation. A common mistake is putting aggregate conditions in WHERE — that is a syntax error in standard SQL.

Writing data: INSERT, UPDATE, DELETE

INSERT

INSERT INTO users (email, status)
VALUES ('alice@example.com', 'active')
RETURNING id;

RETURNING (PostgreSQL) or similar clauses let you read generated keys without a second round trip — useful when creating a parent row and immediately inserting child rows in one transaction.

UPDATE

UPDATE orders
SET status = 'shipped', shipped_at = NOW()
WHERE id = 48291 AND status = 'paid';

Always include a selective WHERE clause. An unqualified UPDATE touches every row in the table.

DELETE

DELETE FROM sessions
WHERE expires_at < NOW();

Many teams prefer soft deletes — an deleted_at timestamp column — so data can be recovered and historical reports stay intact. Hard deletes are irreversible without backups.

Transactions: keeping multi-step work atomic

A transaction groups statements into one unit: either all succeed or none do. Transferring value between accounts is the classic example — debit one row, credit another, commit only if both updates succeed.

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

If anything fails, ROLLBACK undoes the work in that transaction. Databases guarantee ACID properties — atomicity, consistency, isolation, durability. For a deeper dive on what each letter means in practice, see our ACID transactions explainer.

Application code should open a transaction around related writes. ORMs often expose transaction { … } blocks; raw drivers use BEGIN/COMMIT. Keep transactions short — long-held locks block other queries.

Indexes, query plans, and performance basics

Without an index, finding rows by email may scan the entire table — fine at thousands of rows, painful at millions. B-tree indexes (the default in Postgres and MySQL) speed up equality and range lookups on indexed columns.

The optimizer chooses an access path based on statistics. A filter on an unindexed column forces a sequential scan; a selective indexed filter becomes an index seek. Our database indexing guide covers compound keys, covering indexes, and when indexes hurt write throughput. When a query is mysteriously slow, read the execution plan explainer and run EXPLAIN ANALYZE before guessing.

Parameterized queries and SQL injection

Never concatenate user input into SQL strings. An attacker who controls a search box can inject '; DROP TABLE users; -- if your code builds queries like "SELECT * FROM users WHERE email = '" + input + "'".

Use parameterized queries (prepared statements) instead — the database treats user values as data, not executable SQL:

-- Pseudocode: driver-specific syntax varies
db.query(
  'SELECT id FROM users WHERE email = $1',
  [userEmail]
);

ORMs parameterize by default when you use their query builders. Raw SQL is fine when you keep parameters bound. Escaping is a weaker fallback; binding is the standard defense.

SQL vs NoSQL: when each fits

Relational SQL databases excel when:

Your data has clear entities and relationships (users, orders, line items).
You need multi-row transactions with strong consistency.
Ad hoc reporting and joins across tables are common.

Document stores (MongoDB), wide-column systems (Cassandra), and key-value caches (Redis) trade some relational rigor for flexible schemas, horizontal shard patterns, or extreme write throughput. Many production stacks use both — Postgres as the source of truth plus Redis for caching hot reads.

Start with SQL unless you have a specific reason not to. Joins, constraints, and mature tooling beat premature NoSQL complexity for most new products.

Key takeaways

Tables store rows; primary keys identify rows and foreign keys link related tables.
SELECT … WHERE … ORDER BY … LIMIT is the basic read pattern; list columns explicitly in APIs.
INNER JOIN keeps only matching rows; LEFT JOIN preserves all rows from the left table.
GROUP BY + aggregates summarize data; filter groups with HAVING, not WHERE.
Wrap related writes in transactions; understand ACID before relying on consistency guarantees.
Add indexes on columns you filter and join; verify with execution plans, not intuition.
Always use parameterized queries — never string-concatenate untrusted input into SQL.