Guide
MySQL fundamentals explained
WordPress, Magento, and thousands of SaaS products still run on MySQL — not because teams ignore Postgres, but because MySQL ships everywhere (managed RDS, Aurora, PlanetScale, shared hosting), boots fast, and handles read-heavy OLTP workloads with mature replication tooling. The catch: defaults tuned for a 2005 laptop will melt under real traffic unless you understand InnoDB — clustered primary-key indexes, the buffer pool, redo and undo logs, row-level locking with MVCC, and the binary log that powers replicas. This guide covers MySQL's architecture, indexing rules that differ from PostgreSQL, replication with GTIDs, JSON and partitioning in MySQL 8, a Harbor Retail inventory schema worked example, when MySQL beats or loses to Postgres, common pitfalls, and a production checklist — alongside our PostgreSQL fundamentals guide, SQL fundamentals guide, and database indexing guide.
What MySQL is
MySQL is an open-source relational database management system originally from Sweden, now maintained by Oracle with a large community fork ecosystem (MariaDB, Percona Server). Applications connect over TCP (port 3306) or a local socket, send SQL, and receive result sets — the same client-server contract as PostgreSQL, but with different storage-engine plug-ins and optimizer behavior.
Since MySQL 5.5, InnoDB is the default and only sensible engine
for production. Legacy MyISAM tables lack transactions and crash
recovery — migrate them before anything serious depends on them. InnoDB provides
ACID transactions, row-level locking, foreign keys, and crash-safe durability
through a redo log similar in purpose to PostgreSQL's WAL.
Thread and connection model
MySQL uses a one-thread-per-connection model (with thread pool plugin available on some builds). Each client connection consumes memory for session buffers and sort areas. High-concurrency apps front MySQL with connection pooling (ProxySQL, RDS Proxy, or app-side pools) rather than opening thousands of direct sessions. Unlike Postgres backends, MySQL threads are lighter but still benefit from pool sizing discipline.
InnoDB storage: clustered indexes and the buffer pool
InnoDB's defining design choice is the clustered index: table rows are stored in primary-key order inside the clustered B-tree. There is no separate heap — the primary key is the row layout. Secondary indexes store primary-key values as pointers, so wide or random UUID primary keys bloat every secondary index.
The buffer pool (innodb_buffer_pool_size) caches
data and index pages in RAM. This is the single most important tuning knob: on
dedicated database servers, allocate 60–75% of RAM to the buffer pool. Cold
caches after restart cause temporary read latency spikes until working sets warm.
Durability flows through two logs:
- Redo log — circular write-ahead log recording page changes before they hit data files; enables crash recovery.
- Undo log — stores old row versions for MVCC rollbacks and consistent reads; also feeds purge when transactions commit.
innodb_flush_log_at_trx_commit trades durability for speed: value
1 fsyncs every commit (safest); 2 writes to OS cache
without fsync (faster, small window of loss on power failure). Production OLTP
almost always uses 1 unless you explicitly accept the risk.
MVCC, locking, and isolation
InnoDB implements multi-version concurrency control via undo
chains: readers see a snapshot of row history as of their transaction start (in
REPEATABLE READ, MySQL's default) without blocking writers. Writers
take row locks on update; gap and next-key locks prevent phantom inserts in
repeatable-read ranges.
MySQL's default REPEATABLE READ behaves differently from PostgreSQL's
— InnoDB uses next-key locking to prevent phantoms, which can surprise developers
expecting pure snapshot isolation. READ COMMITTED relaxes gap locks
and is common in ORMs that expect Postgres-like semantics. See our
isolation levels guide
for cross-database comparison.
Deadlocks and long transactions
InnoDB detects deadlocks automatically and rolls back the smaller transaction.
Applications must retry idempotent updates. Long-running transactions hold undo
history, blocking purge and inflating storage — the same class of problem as
Postgres dead tuples, but surfaced through history list length
metrics rather than autovacuum alerts.
Indexing strategy in MySQL
Because of clustered indexes, primary key design is a first-class schema decision:
- Auto-increment integers — sequential inserts, compact secondary indexes, classic choice for high-insert tables.
- UUIDs — globally unique but random inserts fragment the
clustered index; consider
UUID_TO_BIN(uuid, swap_flag)(MySQL 8) or time-ordered IDs (ULID, Snowflake) for better locality. - Composite primary keys — natural for junction tables; secondary indexes inherit the full key width as suffix.
Secondary B-tree indexes cover most queries. MySQL 8 adds invisible indexes (test drop impact without removing) and improved descending index support. Full-text indexes (InnoDB FTS) handle search boxes; for heavy analytics consider offloading to Elasticsearch or a columnar store rather than forcing OLTP indexes to do OLAP.
Use EXPLAIN ANALYZE (MySQL 8.0.18+) to read actual row counts and
timing — the older EXPLAIN estimates alone mislead on large tables.
Covering indexes (all selected columns in the index) avoid clustered lookups and
are often the biggest win on read-heavy dashboards.
Replication and high availability
MySQL replication is log-based: the primary writes committed transactions to the binary log (binlog) in row, statement, or mixed format. Row format is safest for non-deterministic statements; statement format is compact but fragile with triggers and UUID functions.
Replicas pull binlog events via IO threads and apply them through SQL threads. GTID (global transaction identifiers) since MySQL 5.6 simplify failover: each transaction has a unique ID, so promotion does not depend on matching binlog file positions. Managed services (RDS, Aurora, PlanetScale) automate failover; self-hosted teams use Orchestrator, MHA, or ProxySQL for health checks and read/write splitting.
Replication lag and read scaling
Async replication means replicas can lag seconds (or minutes) behind the primary. Route consistency-sensitive reads — inventory counts, account balances, just-written user sessions — to the primary. Eventual-consistency reads (product catalogs, analytics dashboards) can hit replicas behind a lag-aware proxy. Semi-sync and group replication add durability guarantees at latency cost; most web apps tolerate async with careful read routing.
MySQL 8 features that matter in production
- JSON column type — binary storage with functional indexes on
expressions (
ALTER TABLE t ADD INDEX ((CAST(j->>'$.sku' AS CHAR(32))));); convenient for semi-structured attributes without a document store. - Window functions —
ROW_NUMBER,RANK, running totals in SQL without self-joins. - Common table expressions (CTEs) — readable recursive and non-recursive queries; watch recursion depth limits.
- Table partitioning — RANGE, LIST, HASH, and KEY partitioning prune scans on time-series and tenant-id tables; partition pruning must appear in EXPLAIN or you gain nothing.
- Invisible columns and roles — schema evolution and RBAC without application deploys for every grant change.
MySQL 8 removed the query cache (it did not scale on multi-core); plan caching
happens at the prepared-statement layer instead. Character set defaults moved to
utf8mb4 — always use full Unicode; legacy utf8 is a
three-byte subset that breaks emoji.
Worked example: Harbor Retail inventory schema
Harbor Retail runs a multi-store inventory system on MySQL 8 behind a Rails API. Peak traffic hits during promotional weekends; the schema must support fast SKU lookups, atomic stock decrements, and nightly reconciliation reports.
CREATE TABLE products (
id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
sku VARCHAR(32) NOT NULL,
name VARCHAR(255) NOT NULL,
attrs JSON,
created_at DATETIME(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
UNIQUE KEY uk_sku (sku)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE store_inventory (
store_id SMALLINT UNSIGNED NOT NULL,
product_id BIGINT UNSIGNED NOT NULL,
qty_on_hand INT NOT NULL DEFAULT 0,
qty_reserved INT NOT NULL DEFAULT 0,
updated_at DATETIME(6) NOT NULL,
PRIMARY KEY (store_id, product_id),
KEY idx_product (product_id),
CONSTRAINT fk_inv_product FOREIGN KEY (product_id) REFERENCES products(id)
) ENGINE=InnoDB;
-- Atomic decrement with guard against oversell
UPDATE store_inventory
SET qty_on_hand = qty_on_hand - 1,
updated_at = NOW(6)
WHERE store_id = 12 AND product_id = 88421 AND qty_on_hand > qty_reserved;
Design choices explained:
- Auto-increment PK on products — compact clustered index; SKU uniqueness enforced separately for business lookups.
- Composite PK on inventory — one row per store-product pair; updates lock a single row, not a hot global counter.
- JSON attrs — color/size variants without wide nullable columns; indexed only if filtered in WHERE clauses.
- Conditional UPDATE — single-statement atomicity avoids
read-modify-write races;
ROW_COUNT()tells the API if stock was insufficient.
Nightly reports aggregate sales from a read replica; the API writes go to the primary. Lag monitoring alerts if replica falls more than five seconds behind during flash sales — at that point Harbor routes inventory reads to primary until lag recovers.
Database decision table
| Choose MySQL when… | Prefer PostgreSQL when… | Consider something else when… |
|---|---|---|
| Hosting ecosystem is LAMP/WordPress/PHP-heavy | Rich SQL, CTEs, window functions, and extensions matter day one | Embedded single-file DB — SQLite |
| Managed MySQL (RDS, Aurora) is already provisioned | JSONB + GIN, full-text, PostGIS, pgvector in one engine | Schema-free document bursts — MongoDB |
| Read scaling via mature async replication is enough | Serializable isolation and complex constraints are critical | Sub-millisecond KV at huge QPS — Redis/DynamoDB |
| Team knows MySQL EXPLAIN and Percona tooling | Logical replication and partial indexes are required | Petabyte analytics — warehouse (BigQuery, Snowflake) |
| PlanetScale-style Vitess sharding path is planned | Advisory locks, NOTIFY, and custom types are core features | Edge SQLite sync — Turso, CRDT local-first |
Common pitfalls
- Default buffer pool too small — out-of-box 128 MB on large servers guarantees disk-bound queries; size to RAM.
- Random UUID primary keys — fragment clustered indexes and slow inserts; use sequential IDs or ordered UUID variants.
- Implicit type coercion — comparing string columns to integers prevents index use; match types in predicates.
SELECT *on wide rows — forces full clustered reads; project columns explicitly, especially with JSON blobs.- Replication lag blindness — reading stale stock from replicas causes oversell; route critical reads to primary.
- utf8 vs utf8mb4 — legacy three-byte charset truncates emoji and some names; migrate tables and connection charset together.
- ORM N+1 queries — missing eager loads multiply round trips;
batch with JOINs or
WHERE id IN (...). - Large transactions in binlog — bulk deletes lock replication
apply; chunk deletes with
LIMITand sleep between batches.
Production checklist
- Set
innodb_buffer_pool_sizeto 60–75% of dedicated server RAM. - Enable
slow_query_logandperformance_schemaor Percona PMM for query visibility. - Use
utf8mb4everywhere — schema, connection, and client. - Index foreign keys and every column in frequent
WHERE/JOINclauses; verify with EXPLAIN ANALYZE. - Configure binlog
ROWformat and GTIDs for replication failover. - Monitor replication lag, buffer pool hit rate, and InnoDB history length.
- Front with connection pooling when sessions exceed ~200 concurrent.
- Automate backups (mysqldump + binlog PITR or snapshot); test restore quarterly.
- Set
max_execution_time(ms) on analytics connections to kill runaway queries. - Run schema migrations through Flyway, Liquibase, or framework tools — never hand-edit prod.
Key takeaways
- MySQL + InnoDB is a mature OLTP engine with clustered primary-key storage and row-level MVCC.
- Primary key choice shapes insert performance and secondary index size — prefer sequential IDs for hot tables.
- Buffer pool sizing and redo-log durability settings dominate baseline latency more than micro-optimizations.
- Binlog replication scales reads but requires explicit read-your-writes routing for consistency-sensitive data.
- MySQL 8 closes many historical gaps (JSON, windows, CTEs) but Postgres still leads on extensions and advanced SQL.
Related reading
- PostgreSQL fundamentals explained — MVCC, WAL, and when Postgres is the better default
- SQL fundamentals explained — SELECT, JOIN, aggregation, and subquery patterns that transfer across engines
- Database indexing explained — B-trees, compound keys, and covering indexes
- Django fundamentals explained — ORM patterns that work with MySQL backends via django.db.backends.mysql