
Software testing fundamentals explained: unit, integration, and end-to-end tests

Software testing is how you gain confidence that code behaves as intended before users, or their money, find the gaps. A good test suite catches regressions early, documents expected behavior, and makes refactors safe. A bad one is slow, flaky, and ignored: teams merge with red CI or disable suites entirely. This guide covers the test pyramid (many fast unit tests, fewer integration tests, a thin slice of browser E2E), how to write tests that fail for the right reasons, when to mock versus hit real dependencies, and how testing fits into CI/CD pipelines and production observability.

Why test at all

Manually clicking through every flow before each deploy does not scale. Automated tests run in seconds or minutes on every pull request, giving reviewers confidence that a change did not break checkout, authentication, or payout logic. Tests are also living documentation: a well-named test case states the rule (“refunds cannot exceed original charge”) more clearly than a comment that drifts out of date.

The goal is not 100% line coverage — it is risk coverage. Prioritize paths that lose money, leak data, or are hard to debug in production: payment verification, auth boundaries, idempotent webhooks, and anything involving concurrency. Cosmetic CSS tweaks rarely need E2E tests; fee calculation does.

The test pyramid

Mike Cohn’s test pyramid remains the default mental model:

Unit tests (base, largest layer) — test one function, class, or module in isolation. Milliseconds each; thousands can run on every commit.
Integration tests (middle) — exercise real boundaries: API handler + database, message consumer + queue, service + HTTP client against a test double server.
End-to-end (E2E) tests (top, smallest) — drive the product like a user through a browser or mobile shell; slow and brittle if overused.

Invert the pyramid (hundreds of Selenium tests, zero unit tests) and CI can take 45 minutes or more, flakes block releases, and a red build no longer tells you which layer failed. Healthy teams run unit and integration tests on every PR and reserve E2E for critical user journeys (sign up, pay, export data).

Contract and snapshot tests

Contract tests verify that a consumer and provider agree on an API shape without running both services together — useful in microservice setups. Snapshot tests compare rendered output to a saved baseline; handy for React components but dangerous when developers blindly “update all snapshots” without reading diffs. Treat snapshots as regression guards, not substitutes for assertions on behavior.
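
The contract idea can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a dedicated contract-testing tool: the consumer writes down the fields and types it relies on, and the provider's test suite checks a sample response against that list. The names ORDER_CONTRACT and contract_violations, and the order fields, are hypothetical.

```python
# Hypothetical contract: the fields (and types) a consumer relies on.
ORDER_CONTRACT = {"id": str, "amount_cents": int, "currency": str}


def contract_violations(payload: dict, contract: dict) -> list[str]:
    """Return human-readable mismatches between a payload and a contract."""
    problems = []
    for field, expected_type in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(payload[field]).__name__}"
            )
    return problems


def test_provider_response_satisfies_consumer_contract():
    # Stand-in for a response captured in the provider's own test suite.
    # Extra fields are allowed; only the fields the consumer needs are checked.
    response = {"id": "ord_1", "amount_cents": 1250, "currency": "EUR", "note": "extra"}
    assert contract_violations(response, ORDER_CONTRACT) == []


def test_renamed_field_is_reported():
    # A provider that renames amount_cents breaks the consumer; the check says so.
    response = {"id": "ord_1", "amount": 1250, "currency": "EUR"}
    assert contract_violations(response, ORDER_CONTRACT) == ["missing field: amount_cents"]
```

Because the check ignores fields the consumer does not use, the provider stays free to add fields without breaking the contract.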

Writing good unit tests

The common pattern is Arrange – Act – Assert:

Arrange — set up inputs, mocks, and initial state.
Act — call the function under test once.
Assert — check outputs, side effects, or thrown errors.

One logical behavior per test. Name tests after the scenario: rejectsWithdrawalWhenBalanceInsufficient beats testWithdrawal2. Avoid testing private methods directly — test through the public API; private helpers are implementation details.
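
As an illustration of Arrange, Act, Assert and scenario-based naming, here is a minimal Python sketch. The Account class and InsufficientFunds error are hypothetical and exist only to give the tests something to exercise. The tests are plain functions of the kind a runner such as pytest collects.

```python
class InsufficientFunds(Exception):
    """Raised when a withdrawal exceeds the available balance."""


class Account:
    def __init__(self, balance_cents: int) -> None:
        self.balance_cents = balance_cents

    def withdraw(self, amount_cents: int) -> None:
        if amount_cents > self.balance_cents:
            raise InsufficientFunds(f"balance {self.balance_cents} < {amount_cents}")
        self.balance_cents -= amount_cents


def test_rejects_withdrawal_when_balance_insufficient():
    # Arrange
    account = Account(balance_cents=500)

    # Act
    try:
        account.withdraw(600)
        raised = False
    except InsufficientFunds:
        raised = True

    # Assert
    assert raised
    assert account.balance_cents == 500  # a rejected withdrawal leaves the balance untouched


def test_withdrawal_reduces_balance():
    account = Account(balance_cents=500)  # Arrange
    account.withdraw(200)                 # Act
    assert account.balance_cents == 300   # Assert
```

Both tests go through the public withdraw method, so the internals of Account can change without rewriting them.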

Determinism and isolation

Flaky unit tests usually come from hidden dependencies on time, randomness, network, or global state. Inject a Clock or pass Date.now as a parameter. Seed random generators in tests. Reset databases and singletons between cases. If order matters, your tests are coupled — fix setup, not execution order.
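
A minimal Python sketch of the two most common fixes, injecting the clock and seeding the random generator. The functions is_expired and pick_winner are hypothetical examples, not part of any library.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Callable


def is_expired(
    issued_at: datetime,
    ttl: timedelta,
    now: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
) -> bool:
    """Expiry check that receives its clock as a parameter instead of reading it globally."""
    return now() >= issued_at + ttl


def pick_winner(entrants: list[str], rng: random.Random) -> str:
    """Draw one entrant using the caller's random generator."""
    return rng.choice(entrants)


def test_token_expires_exactly_at_ttl():
    issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    ttl = timedelta(minutes=15)
    # The test controls time, so the boundary can be checked to the second.
    assert not is_expired(issued, ttl, now=lambda: issued + timedelta(minutes=14, seconds=59))
    assert is_expired(issued, ttl, now=lambda: issued + ttl)


def test_draw_is_repeatable_with_a_seeded_generator():
    entrants = ["ana", "bo", "cy"]
    first = pick_winner(entrants, random.Random(42))
    second = pick_winner(entrants, random.Random(42))
    assert first == second
```

Production code calls is_expired without the now argument and gets the real clock. Only the tests substitute a fixed one.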

Test doubles: mocks, stubs, fakes, and spies

Test doubles replace slow or non-deterministic collaborators:

Stub — returns canned responses (e.g. fake exchange rate).
Mock — records calls and asserts they happened (e.g. “email service was invoked once with this address”).
Fake — working lightweight implementation (in-memory database, embedded SQLite).
Spy — wraps a real object to observe calls without replacing it.

Mock only boundaries you do not own — payment gateways, third-party APIs, the system clock. Mocking your own database layer in every unit test means you never catch SQL bugs; use a real test database in integration tests instead. Over-mocking produces tests that pass while production fails because the mock diverged from reality.
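
The four kinds of double can be shown side by side with Python's standard unittest.mock module. This is a sketch under assumptions: place_order, InMemoryOrderStore, and the rates and mailer collaborators are hypothetical names invented for the example.

```python
from unittest.mock import Mock


class InMemoryOrderStore:
    """Fake: a working but lightweight stand-in for a real order repository."""

    def __init__(self) -> None:
        self._orders = {}

    def save(self, order_id: str, total_eur: float) -> None:
        self._orders[order_id] = total_eur

    def get(self, order_id: str) -> float:
        return self._orders[order_id]


def place_order(order_id, amount_usd, rates, store, mailer, customer_email):
    """Convert the amount, persist the order, and send one confirmation email."""
    total_eur = round(amount_usd * rates.usd_to_eur(), 2)
    store.save(order_id, total_eur)
    mailer.send(customer_email, f"Order {order_id} confirmed")
    return total_eur


def test_place_order_converts_saves_and_notifies():
    rates = Mock()
    rates.usd_to_eur.return_value = 0.5  # Stub: canned exchange rate
    mailer = Mock()                      # Mock: its calls are asserted below
    store = InMemoryOrderStore()         # Fake: real behavior, no database

    total = place_order("o-1", 20.0, rates, store, mailer, "a@example.com")

    assert total == 10.0
    assert store.get("o-1") == 10.0
    mailer.send.assert_called_once_with("a@example.com", "Order o-1 confirmed")


def test_spy_observes_calls_on_a_real_object():
    real_store = InMemoryOrderStore()
    spy = Mock(wraps=real_store)         # Spy: records calls and delegates to the real object

    spy.save("o-2", 5.0)

    spy.save.assert_called_once_with("o-2", 5.0)
    assert real_store.get("o-2") == 5.0  # the real method actually ran
```

The exchange-rate service and the mailer are the external boundaries here, so they are the parts replaced. The order logic itself runs for real.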

Integration testing in practice

Integration tests answer: “Do these pieces wire together correctly?” Typical setups:

API tests — spin up the HTTP server (or import the framework test client), send requests, assert status codes and JSON bodies. Validate error shapes match your REST API conventions.
Database tests — migrate a throwaway schema, run transactions, roll back or truncate between tests. Use the same SQL dialect as production; SQLite in memory is fast but hides Postgres-specific bugs.
Message queue tests — publish to a test topic, consume in-process, assert side effects. See message queues explained for delivery semantics you must test (at-least-once, idempotent handlers).
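
The database item above can be sketched as follows. SQLite from the Python standard library is used here only so the example is self-contained; as noted in the list, a real suite should run against the same dialect as production. The accounts schema and the transfer function are hypothetical.

```python
import sqlite3


def migrate(conn: sqlite3.Connection) -> None:
    """Hypothetical migration: create the throwaway schema under test."""
    conn.execute(
        "CREATE TABLE accounts ("
        " id INTEGER PRIMARY KEY,"
        " balance_cents INTEGER NOT NULL CHECK (balance_cents >= 0))"
    )
    conn.commit()


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount_cents: int) -> None:
    """Move money between accounts in one transaction; roll back on any failure."""
    try:
        conn.execute(
            "UPDATE accounts SET balance_cents = balance_cents + ? WHERE id = ?",
            (amount_cents, dst),
        )
        conn.execute(
            "UPDATE accounts SET balance_cents = balance_cents - ? WHERE id = ?",
            (amount_cents, src),
        )
        conn.commit()
    except sqlite3.IntegrityError:
        conn.rollback()  # undo the credit if the debit violates the CHECK constraint
        raise


def fresh_db() -> sqlite3.Connection:
    """Give each test its own empty, migrated database with two seeded accounts."""
    conn = sqlite3.connect(":memory:")
    migrate(conn)
    conn.executemany(
        "INSERT INTO accounts (id, balance_cents) VALUES (?, ?)", [(1, 500), (2, 0)]
    )
    conn.commit()
    return conn


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance_cents FROM accounts ORDER BY id"))


def test_transfer_moves_money():
    conn = fresh_db()
    transfer(conn, src=1, dst=2, amount_cents=200)
    assert balances(conn) == {1: 300, 2: 200}


def test_overdraft_rolls_back_both_updates():
    conn = fresh_db()
    try:
        transfer(conn, src=1, dst=2, amount_cents=600)
        raised = False
    except sqlite3.IntegrityError:
        raised = True
    assert raised
    assert balances(conn) == {1: 500, 2: 0}
```

The second test is the kind a mocked database layer would never catch: the constraint and the rollback only exist in the real engine.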

Run integration tests in CI with Docker Compose services (Postgres, Redis, Kafka) or cloud emulators. Tag slow suites (@integration) so developers can run fast unit tests locally while CI runs the full matrix.
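
For the message queue case, the property worth testing is that a redelivered message is applied only once. The sketch below assumes an in-process queue (a deque standing in for a test topic) and a hypothetical PaymentConsumer. A real suite would publish to the broker running in CI.

```python
from collections import deque


class PaymentConsumer:
    """Hypothetical handler that tolerates redelivery by remembering processed message IDs."""

    def __init__(self) -> None:
        self.credited_cents = 0
        self._seen_ids = set()

    def handle(self, message: dict) -> None:
        if message["id"] in self._seen_ids:
            return  # duplicate delivery: already applied, so do nothing
        self._seen_ids.add(message["id"])
        self.credited_cents += message["amount_cents"]


def drain(queue: deque, consumer: PaymentConsumer) -> None:
    """Consume every queued message in-process, in order."""
    while queue:
        consumer.handle(queue.popleft())


def test_redelivered_message_is_applied_once():
    payment = {"id": "evt-1", "amount_cents": 700}
    # At-least-once delivery: evt-1 arrives twice, evt-2 once.
    queue = deque([payment, payment, {"id": "evt-2", "amount_cents": 300}])
    consumer = PaymentConsumer()

    drain(queue, consumer)

    assert consumer.credited_cents == 1000
```

In production the set of seen IDs would live in durable storage. Keeping it in memory is a simplification for the example.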

End-to-end and browser testing

E2E tools — Playwright, Cypress, Selenium — launch a real browser, navigate URLs, click buttons, and assert visible text. They catch bugs unit tests miss: broken routing, CORS misconfiguration, hydration mismatches in React, and wallet connect flows that only fail with a real extension popup.

Keep E2E suites small and stable:

Test journeys, not every button color — one happy path per critical feature.
Use data-testid attributes or roles instead of brittle CSS selectors.
Stub external networks where possible; do not hit production payment APIs from CI.
Parallelize carefully — shared staging accounts cause race conditions.

E2E complements but does not replace lower layers. A failing Playwright test should be narrowed down quickly: reproduce it locally, check the network tab, then drop down to an integration test once you know which API misbehaved.

TDD and when to write tests

Test-driven development (TDD) cycles red → green → refactor: write a failing test, implement the minimum code to pass, clean up. TDD shines for pure logic (pricing rules, parsers, validation) where examples are easy to enumerate. It is less natural for exploratory UI work — there, test-after or test-alongside is fine.

Practical rule: never merge a bug fix without a test that would have caught it. That single habit prevents repeat incidents more than any coverage percentage target.
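
A sketch of what such a regression test can look like, using the refund rule mentioned earlier in this guide. The incident, the apply_refund function, and the amounts are all hypothetical.

```python
class RefundError(ValueError):
    """Raised when a refund request breaks a business rule."""


def apply_refund(charge_cents: int, already_refunded_cents: int, refund_cents: int) -> int:
    """Return the new refunded total, or raise RefundError."""
    if refund_cents <= 0:
        raise RefundError("refund must be positive")
    # Fixed check: compare against what is still refundable. The hypothetical
    # buggy version compared refund_cents against charge_cents only.
    if refund_cents > charge_cents - already_refunded_cents:
        raise RefundError("refund exceeds the remaining charge")
    return already_refunded_cents + refund_cents


def test_partial_refunds_cannot_add_up_to_more_than_the_charge():
    # Reproduces the hypothetical incident: 1000 charged, 700 already refunded,
    # then a 400 refund was accepted because 400 <= 1000.
    try:
        apply_refund(charge_cents=1000, already_refunded_cents=700, refund_cents=400)
        raised = False
    except RefundError:
        raised = True
    assert raised


def test_refund_up_to_the_remaining_amount_is_allowed():
    assert apply_refund(charge_cents=1000, already_refunded_cents=700, refund_cents=300) == 1000
```

Written before the fix, the first test fails for exactly the reason the incident happened. That failure is the evidence that the test would have caught the bug.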

Flaky tests and CI hygiene

A flaky test sometimes passes and sometimes fails on the same commit, with no code change in between. Flakes erode trust: teams rerun CI until it goes green and end up shipping real bugs. Common fixes:

Replace fixed waits such as sleep(5000) with waiting for a condition (in Playwright, await expect(locator).toBeVisible()).
Avoid shared mutable fixtures; create fresh users per test.
Do not depend on wall-clock timing for async JavaScript; see the "JavaScript event loop" guide for microtask ordering pitfalls in tests.
Quarantine chronic flakes: mark them @flaky, file a ticket, and fix or delete them within a sprint. Never leave a blanket setting such as retry: 5 in place permanently.
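
Outside a browser framework, waiting for a condition instead of sleeping for a fixed time can be done with a small polling helper. The wait_until function below is a hypothetical utility written for this example, not a library API, and its default timeout and interval are arbitrary.

```python
import threading
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 5.0, interval: float = 0.05) -> None:
    """Poll condition until it returns True, or raise TimeoutError after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)


def test_returns_as_soon_as_background_work_finishes():
    done = threading.Event()
    threading.Timer(0.1, done.set).start()  # simulated background job
    wait_until(done.is_set, timeout=5.0)    # returns shortly after the job, not after 5 s
    assert done.is_set()


def test_raises_when_condition_never_holds():
    try:
        wait_until(lambda: False, timeout=0.1, interval=0.01)
        raised = False
    except TimeoutError:
        raised = True
    assert raised
```

The timeout is an upper bound rather than a fixed delay, so a fast machine finishes quickly and a slow CI runner still gets enough time.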

Wire tests into CI/CD as hard gates on main: lint, unit, integration, then a nightly or pre-release E2E job if the full browser suite is slow. Pair deploys with feature flags so a test gap does not require an emergency rollback — toggle off the new path while you add coverage.
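
A feature flag only helps if both sides of it are tested. The sketch below assumes a hypothetical flag named new_fee_rule and invented fee rules. The point is that the old path stays the default and each path has its own test.

```python
def checkout_total(subtotal_cents: int, flags: dict) -> int:
    """Use the new fee rule only when its flag is on; otherwise keep the proven path."""
    if flags.get("new_fee_rule", False):
        # New path (hypothetical): 2% fee with a 30-cent minimum.
        return subtotal_cents + max(30, subtotal_cents * 2 // 100)
    # Old path (hypothetical): flat 30-cent fee.
    return subtotal_cents + 30


def test_old_path_is_the_default_when_flag_is_missing():
    assert checkout_total(5000, flags={}) == 5030


def test_flag_off_keeps_old_behavior():
    assert checkout_total(5000, flags={"new_fee_rule": False}) == 5030


def test_flag_on_uses_new_fee_rule():
    assert checkout_total(5000, flags={"new_fee_rule": True}) == 5100
    assert checkout_total(1000, flags={"new_fee_rule": True}) == 1030  # minimum fee applies
```

Turning the flag off in production then returns users to a path that the suite still covers.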

Testing vs monitoring

Tests prove behavior in controlled environments; monitoring proves it in production. They overlap but do not substitute:

Tests catch regressions before deploy; alerts catch unknown unknowns after.
Synthetic checks (Pingdom-style) are E2E tests running 24/7 against prod; keep them read-only and idempotent.
Canary deploys with error-rate comparison act like integration tests at scale; see the observability guides for SLO-based rollback triggers.
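
The error-rate comparison behind a canary decision can be sketched as a pure function, which also makes it easy to unit test. The function name should_roll_back and every threshold below are assumptions chosen for illustration, not recommended values.

```python
def should_roll_back(
    baseline_errors: int,
    baseline_requests: int,
    canary_errors: int,
    canary_requests: int,
    max_ratio: float = 2.0,
    min_requests: int = 100,
    min_error_rate: float = 0.01,
) -> bool:
    """Decide whether the canary looks worse than the baseline.

    All thresholds are illustrative defaults, not recommendations.
    """
    if canary_requests < min_requests or baseline_requests == 0:
        return False  # not enough traffic to compare
    baseline_rate = baseline_errors / baseline_requests
    canary_rate = canary_errors / canary_requests
    # The canary must exceed both a relative and an absolute threshold,
    # so a near-zero baseline does not trigger a rollback on a single error.
    return canary_rate > max(baseline_rate * max_ratio, min_error_rate)


def test_canary_with_much_higher_error_rate_triggers_rollback():
    assert should_roll_back(10, 10_000, 30, 1_000)


def test_canary_matching_baseline_is_kept():
    assert not should_roll_back(10, 10_000, 1, 1_000)


def test_low_traffic_canary_is_not_judged():
    assert not should_roll_back(10, 10_000, 5, 50)
```

Keeping the decision free of I/O means the rule that guards production can be tested like any other pricing or validation logic.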

When an incident happens, add a test case that reproduces the failure mode, then fix the code. That closes the loop between production signals and preventive automation.

Starter checklist

Fast unit suite runs in under two minutes locally and on every PR.
Integration tests cover auth, payments, and database migrations you actually ship.
E2E covers no more than five critical user paths; failures include screenshots and traces.
No test depends on execution order or shared global state.
Flaky tests are quarantined or deleted — not infinitely retried.
Every production bug fix includes a regression test.
CI blocks merge on red; staging deploy runs the same commands as production.