Guide

MySQL fundamentals explained

WordPress, Magento, and thousands of SaaS products still run on MySQL — not because teams ignore Postgres, but because MySQL ships everywhere (managed RDS, Aurora, PlanetScale, shared hosting), boots fast, and handles read-heavy OLTP workloads with mature replication tooling. The catch: defaults tuned for a 2005 laptop will melt under real traffic unless you understand InnoDB — clustered primary-key indexes, the buffer pool, redo and undo logs, row-level locking with MVCC, and the binary log that powers replicas. This guide covers MySQL's architecture, indexing rules that differ from PostgreSQL, replication with GTIDs, JSON and partitioning in MySQL 8, a Harbor Retail inventory schema worked example, when MySQL beats or loses to Postgres, common pitfalls, and a production checklist — alongside our PostgreSQL fundamentals guide, SQL fundamentals guide, and database indexing guide.

What MySQL is

MySQL is an open-source relational database management system originally from Sweden, now maintained by Oracle with a large community fork ecosystem (MariaDB, Percona Server). Applications connect over TCP (port 3306) or a local socket, send SQL, and receive result sets — the same client-server contract as PostgreSQL, but with different storage-engine plug-ins and optimizer behavior.

Since MySQL 5.5, InnoDB is the default and only sensible engine for production. Legacy MyISAM tables lack transactions and crash recovery — migrate them before anything serious depends on them. InnoDB provides ACID transactions, row-level locking, foreign keys, and crash-safe durability through a redo log similar in purpose to PostgreSQL's WAL.

Thread and connection model

MySQL uses a one-thread-per-connection model (with thread pool plugin available on some builds). Each client connection consumes memory for session buffers and sort areas. High-concurrency apps front MySQL with connection pooling (ProxySQL, RDS Proxy, or app-side pools) rather than opening thousands of direct sessions. Unlike Postgres backends, MySQL threads are lighter but still benefit from pool sizing discipline.

InnoDB storage: clustered indexes and the buffer pool

InnoDB's defining design choice is the clustered index: table rows are stored in primary-key order inside the clustered B-tree. There is no separate heap — the primary key is the row layout. Secondary indexes store primary-key values as pointers, so wide or random UUID primary keys bloat every secondary index.

The buffer pool (innodb_buffer_pool_size) caches data and index pages in RAM. This is the single most important tuning knob: on dedicated database servers, allocate 60–75% of RAM to the buffer pool. Cold caches after restart cause temporary read latency spikes until working sets warm.

Durability flows through two logs:

  • Redo log — circular write-ahead log recording page changes before they hit data files; enables crash recovery.
  • Undo log — stores old row versions for MVCC rollbacks and consistent reads; also feeds purge when transactions commit.

innodb_flush_log_at_trx_commit trades durability for speed: value 1 fsyncs every commit (safest); 2 writes to OS cache without fsync (faster, small window of loss on power failure). Production OLTP almost always uses 1 unless you explicitly accept the risk.

MVCC, locking, and isolation

InnoDB implements multi-version concurrency control via undo chains: readers see a snapshot of row history as of their transaction start (in REPEATABLE READ, MySQL's default) without blocking writers. Writers take row locks on update; gap and next-key locks prevent phantom inserts in repeatable-read ranges.

MySQL's default REPEATABLE READ behaves differently from PostgreSQL's — InnoDB uses next-key locking to prevent phantoms, which can surprise developers expecting pure snapshot isolation. READ COMMITTED relaxes gap locks and is common in ORMs that expect Postgres-like semantics. See our isolation levels guide for cross-database comparison.

Deadlocks and long transactions

InnoDB detects deadlocks automatically and rolls back the smaller transaction. Applications must retry idempotent updates. Long-running transactions hold undo history, blocking purge and inflating storage — the same class of problem as Postgres dead tuples, but surfaced through history list length metrics rather than autovacuum alerts.

Indexing strategy in MySQL

Because of clustered indexes, primary key design is a first-class schema decision:

  • Auto-increment integers — sequential inserts, compact secondary indexes, classic choice for high-insert tables.
  • UUIDs — globally unique but random inserts fragment the clustered index; consider UUID_TO_BIN(uuid, swap_flag) (MySQL 8) or time-ordered IDs (ULID, Snowflake) for better locality.
  • Composite primary keys — natural for junction tables; secondary indexes inherit the full key width as suffix.

Secondary B-tree indexes cover most queries. MySQL 8 adds invisible indexes (test drop impact without removing) and improved descending index support. Full-text indexes (InnoDB FTS) handle search boxes; for heavy analytics consider offloading to Elasticsearch or a columnar store rather than forcing OLTP indexes to do OLAP.

Use EXPLAIN ANALYZE (MySQL 8.0.18+) to read actual row counts and timing — the older EXPLAIN estimates alone mislead on large tables. Covering indexes (all selected columns in the index) avoid clustered lookups and are often the biggest win on read-heavy dashboards.

Replication and high availability

MySQL replication is log-based: the primary writes committed transactions to the binary log (binlog) in row, statement, or mixed format. Row format is safest for non-deterministic statements; statement format is compact but fragile with triggers and UUID functions.

Replicas pull binlog events via IO threads and apply them through SQL threads. GTID (global transaction identifiers) since MySQL 5.6 simplify failover: each transaction has a unique ID, so promotion does not depend on matching binlog file positions. Managed services (RDS, Aurora, PlanetScale) automate failover; self-hosted teams use Orchestrator, MHA, or ProxySQL for health checks and read/write splitting.

Replication lag and read scaling

Async replication means replicas can lag seconds (or minutes) behind the primary. Route consistency-sensitive reads — inventory counts, account balances, just-written user sessions — to the primary. Eventual-consistency reads (product catalogs, analytics dashboards) can hit replicas behind a lag-aware proxy. Semi-sync and group replication add durability guarantees at latency cost; most web apps tolerate async with careful read routing.

MySQL 8 features that matter in production

  • JSON column type — binary storage with functional indexes on expressions (ALTER TABLE t ADD INDEX ((CAST(j->>'$.sku' AS CHAR(32))));); convenient for semi-structured attributes without a document store.
  • Window functionsROW_NUMBER, RANK, running totals in SQL without self-joins.
  • Common table expressions (CTEs) — readable recursive and non-recursive queries; watch recursion depth limits.
  • Table partitioning — RANGE, LIST, HASH, and KEY partitioning prune scans on time-series and tenant-id tables; partition pruning must appear in EXPLAIN or you gain nothing.
  • Invisible columns and roles — schema evolution and RBAC without application deploys for every grant change.

MySQL 8 removed the query cache (it did not scale on multi-core); plan caching happens at the prepared-statement layer instead. Character set defaults moved to utf8mb4 — always use full Unicode; legacy utf8 is a three-byte subset that breaks emoji.

Worked example: Harbor Retail inventory schema

Harbor Retail runs a multi-store inventory system on MySQL 8 behind a Rails API. Peak traffic hits during promotional weekends; the schema must support fast SKU lookups, atomic stock decrements, and nightly reconciliation reports.

CREATE TABLE products (
  id            BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  sku           VARCHAR(32) NOT NULL,
  name          VARCHAR(255) NOT NULL,
  attrs         JSON,
  created_at    DATETIME(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
  UNIQUE KEY uk_sku (sku)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

CREATE TABLE store_inventory (
  store_id      SMALLINT UNSIGNED NOT NULL,
  product_id    BIGINT UNSIGNED NOT NULL,
  qty_on_hand   INT NOT NULL DEFAULT 0,
  qty_reserved  INT NOT NULL DEFAULT 0,
  updated_at    DATETIME(6) NOT NULL,
  PRIMARY KEY (store_id, product_id),
  KEY idx_product (product_id),
  CONSTRAINT fk_inv_product FOREIGN KEY (product_id) REFERENCES products(id)
) ENGINE=InnoDB;

-- Atomic decrement with guard against oversell
UPDATE store_inventory
SET qty_on_hand = qty_on_hand - 1,
    updated_at = NOW(6)
WHERE store_id = 12 AND product_id = 88421 AND qty_on_hand > qty_reserved;

Design choices explained:

  • Auto-increment PK on products — compact clustered index; SKU uniqueness enforced separately for business lookups.
  • Composite PK on inventory — one row per store-product pair; updates lock a single row, not a hot global counter.
  • JSON attrs — color/size variants without wide nullable columns; indexed only if filtered in WHERE clauses.
  • Conditional UPDATE — single-statement atomicity avoids read-modify-write races; ROW_COUNT() tells the API if stock was insufficient.

Nightly reports aggregate sales from a read replica; the API writes go to the primary. Lag monitoring alerts if replica falls more than five seconds behind during flash sales — at that point Harbor routes inventory reads to primary until lag recovers.

Database decision table

Choose MySQL when… Prefer PostgreSQL when… Consider something else when…
Hosting ecosystem is LAMP/WordPress/PHP-heavy Rich SQL, CTEs, window functions, and extensions matter day one Embedded single-file DB — SQLite
Managed MySQL (RDS, Aurora) is already provisioned JSONB + GIN, full-text, PostGIS, pgvector in one engine Schema-free document bursts — MongoDB
Read scaling via mature async replication is enough Serializable isolation and complex constraints are critical Sub-millisecond KV at huge QPS — Redis/DynamoDB
Team knows MySQL EXPLAIN and Percona tooling Logical replication and partial indexes are required Petabyte analytics — warehouse (BigQuery, Snowflake)
PlanetScale-style Vitess sharding path is planned Advisory locks, NOTIFY, and custom types are core features Edge SQLite sync — Turso, CRDT local-first

Common pitfalls

  • Default buffer pool too small — out-of-box 128 MB on large servers guarantees disk-bound queries; size to RAM.
  • Random UUID primary keys — fragment clustered indexes and slow inserts; use sequential IDs or ordered UUID variants.
  • Implicit type coercion — comparing string columns to integers prevents index use; match types in predicates.
  • SELECT * on wide rows — forces full clustered reads; project columns explicitly, especially with JSON blobs.
  • Replication lag blindness — reading stale stock from replicas causes oversell; route critical reads to primary.
  • utf8 vs utf8mb4 — legacy three-byte charset truncates emoji and some names; migrate tables and connection charset together.
  • ORM N+1 queries — missing eager loads multiply round trips; batch with JOINs or WHERE id IN (...).
  • Large transactions in binlog — bulk deletes lock replication apply; chunk deletes with LIMIT and sleep between batches.

Production checklist

  • Set innodb_buffer_pool_size to 60–75% of dedicated server RAM.
  • Enable slow_query_log and performance_schema or Percona PMM for query visibility.
  • Use utf8mb4 everywhere — schema, connection, and client.
  • Index foreign keys and every column in frequent WHERE / JOIN clauses; verify with EXPLAIN ANALYZE.
  • Configure binlog ROW format and GTIDs for replication failover.
  • Monitor replication lag, buffer pool hit rate, and InnoDB history length.
  • Front with connection pooling when sessions exceed ~200 concurrent.
  • Automate backups (mysqldump + binlog PITR or snapshot); test restore quarterly.
  • Set max_execution_time (ms) on analytics connections to kill runaway queries.
  • Run schema migrations through Flyway, Liquibase, or framework tools — never hand-edit prod.

Key takeaways

  • MySQL + InnoDB is a mature OLTP engine with clustered primary-key storage and row-level MVCC.
  • Primary key choice shapes insert performance and secondary index size — prefer sequential IDs for hot tables.
  • Buffer pool sizing and redo-log durability settings dominate baseline latency more than micro-optimizations.
  • Binlog replication scales reads but requires explicit read-your-writes routing for consistency-sensitive data.
  • MySQL 8 closes many historical gaps (JSON, windows, CTEs) but Postgres still leads on extensions and advanced SQL.

Related reading