Playwright E2E testing explained

Your unit tests pass. Your staging deploy is green. Then a user reports that checkout fails on mobile Safari because a cookie banner covers the Pay button. That class of bug lives at the integration boundary between your code, the browser, the network, and real DOM timing, which is exactly where end-to-end (E2E) tests operate. Playwright is Microsoft's open-source browser automation library: it drives Chromium, Firefox, and WebKit from Node.js, Python, Java, or .NET with a single API, auto-waits for elements to be actionable, and ships a trace viewer that replays failures step by step. This guide covers where E2E fits in the test pyramid, Playwright project setup, locator strategies, fixtures and page objects, network interception, parallel CI runs, debugging flaky tests, a Harbor Supply checkout flow worked example, a tooling decision table, common pitfalls, and a production checklist. It sits alongside our software testing fundamentals guide, CI/CD pipelines explainer, and TypeScript fundamentals overview.

Where E2E testing fits

The test pyramid recommends many fast unit tests at the base, fewer integration tests in the middle, and a thin cap of slow browser tests at the top. E2E tests are expensive: they launch real browsers, hit real (or staging) servers, and run far slower than unit tests even when spread across parallel workers. Use them for critical user journeys that cheaper tests cannot validate: login, checkout, payment confirmation, permission-gated admin actions, and cross-page flows that depend on cookies, redirects, or third-party embeds.

Playwright does not replace unit or API tests. A well-designed suite runs 10–30 focused scenarios — not hundreds of UI permutations better covered by component tests. Each E2E test should answer: “If this breaks, would we lose revenue or trust tonight?”

Playwright vs alternatives

  • Cypress: excellent DX for single-tab SPAs; historically weaker on multi-tab flows, iframes, and WebKit (Safari engine) coverage. Playwright targets all major engines out of the box.
  • Selenium WebDriver: industry standard for over a decade; verbose setup, manual waits, and flaky synchronization unless heavily wrapped. Playwright's auto-waiting removes most boilerplate.
  • Puppeteer: historically Chromium-focused; Playwright was started by engineers who had worked on Puppeteer, and it added Firefox/WebKit plus a richer built-in test runner.

Project setup and first test

Initialize with npm init playwright@latest. The scaffold creates playwright.config.ts, a tests/ folder, and installs browser binaries. Key config knobs:

  • baseURL — prefix for page.goto('/checkout') so tests stay environment-portable.
  • projects — matrix across browsers and viewports (desktop Chrome, mobile Safari).
  • retries — retry failed tests in CI only (not locally) to absorb transient infra noise.
  • trace: 'on-first-retry' — capture traces when a retry fires; invaluable for debugging.
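
The four knobs above might come together as in the following sketch of playwright.config.ts. The BASE_URL environment variable, the localhost fallback, the retry count, and the project names are assumptions for illustration, not values the scaffold prescribes.

```typescript
// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './tests',
  // Retry only in CI, where transient infrastructure noise is more likely.
  retries: process.env.CI ? 2 : 0,
  use: {
    // Assumed variable name; page.goto('/checkout') resolves against this URL.
    baseURL: process.env.BASE_URL ?? 'http://localhost:3000',
    // Record a trace the first time a failed test is retried.
    trace: 'on-first-retry',
  },
  projects: [
    // Hypothetical project names; the device descriptors ship with Playwright.
    { name: 'desktop-chrome', use: { ...devices['Desktop Chrome'] } },
    { name: 'mobile-safari', use: { ...devices['iPhone 13'] } },
  ],
});
```

Because every test runs once per project, the matrix multiplies runtime; starting with two projects keeps that cost visible.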

A minimal test uses the built-in test runner (@playwright/test):

import { test, expect } from '@playwright/test';

test('homepage shows primary CTA', async ({ page }) => {
  await page.goto('/');
  await expect(page.getByRole('link', { name: 'Get started' })).toBeVisible();
});

The page fixture gives each test a fresh tab inside its own isolated browser context, so cookies and local storage do not leak between parallel tests.

Locators: finding elements reliably

Fragile selectors (div > div:nth-child(3) > button) break when designers reorder markup. Playwright's locators re-query the DOM on every action, surviving re-renders in React and Vue. Prefer this priority order:

  1. getByRole('button', { name: 'Submit order' }) — mirrors accessibility tree; resilient and user-centric.
  2. getByLabel('Email address') — ties to form labels.
  3. getByText('Order confirmed') — visible copy users see.
  4. getByTestId('checkout-submit') — explicit test hooks when roles are ambiguous.
  5. CSS/XPath — last resort for canvas or legacy widgets.

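Auto-waiting is Playwright's core reliability feature. Before acting, Playwright runs actionability checks: click() waits until the element is visible, stable (not animating), enabled, and not obscured by another element, while fill() waits until it is visible, enabled, and editable. Assertions such as expect(locator).toBeVisible() retry until they pass or time out. You should almost never call page.waitForTimeout(); replace sleeps with expect(locator).toBeVisible() or page.waitForURL().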
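
As a rough illustration of swapping a fixed sleep for auto-waiting, the spec below waits on the navigation and the visible outcome instead of a timer. The /checkout path and the confirmation URL pattern are assumptions; the button and message text reuse the locator examples above.

```typescript
import { test, expect } from '@playwright/test';

test('order confirmation appears without fixed sleeps', async ({ page }) => {
  await page.goto('/checkout'); // assumed route

  // click() runs its actionability checks before it fires.
  await page.getByRole('button', { name: 'Submit order' }).click();

  // Instead of: await page.waitForTimeout(3000);
  // wait for the navigation and the visible result themselves.
  await page.waitForURL('**/confirmation'); // assumed URL pattern
  await expect(page.getByText('Order confirmed')).toBeVisible();
});
```

The test finishes as soon as the confirmation renders, and fails with a clear timeout message if it never does.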

Chaining and strict mode

Locators chain: page.getByRole('row').filter({ hasText: 'SKU-4421' }).getByRole('button', { name: 'Remove' }). By default Playwright enforces strict mode: if a locator matches multiple elements, the test fails loudly instead of clicking the wrong one.
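
A short sketch of the chained locator in a spec, with the strict-mode failure it avoids shown as a comment. The /cart route is an assumption.

```typescript
import { test, expect } from '@playwright/test';

test('remove one line item from the cart', async ({ page }) => {
  await page.goto('/cart'); // assumed route

  // Narrow to the single row containing the SKU, then to its Remove button.
  const row = page.getByRole('row').filter({ hasText: 'SKU-4421' });
  await row.getByRole('button', { name: 'Remove' }).click();

  // With several rows in the cart, the unfiltered locator below would throw
  // a strict mode violation, because click() needs exactly one match:
  //   await page.getByRole('button', { name: 'Remove' }).click();

  // The filtered row should disappear once the item is removed.
  await expect(row).toHaveCount(0);
});
```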

Fixtures, page objects, and test data

As suites grow, duplicate login steps clutter every file. Playwright fixtures extend the test harness:

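import { test as base, type Page } from '@playwright/test';

// Declare the fixture's type so tests receive a typed authenticatedPage argument.
export const test = base.extend<{ authenticatedPage: Page }>({
  authenticatedPage: async ({ page }, use) => {
    // Log in through the UI before handing the page to the test.
    await page.goto('/login');
    await page.getByLabel('Email').fill('qa@harbor.supply');
    // QA_PASSWORD must be set in the environment; it is never hard-coded.
    await page.getByLabel('Password').fill(process.env.QA_PASSWORD!);
    await page.getByRole('button', { name: 'Sign in' }).click();
    await page.waitForURL('/dashboard');
    // Everything after use() would run as teardown once the test finishes.
    await use(page);
  },
});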

The page object model wraps locators and actions in a class (CheckoutPage.fillShipping(...)) so tests read as user stories. Keep page objects thin — business assertions stay in the test file; page objects expose verbs, not expectations.
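
A thin page object along those lines could look like the sketch below. To keep it runnable without a browser, it types the page with a small structural interface (PageLike, a hypothetical name) instead of importing Playwright's Page; in a real suite you would import Page from @playwright/test. The field labels and the ShippingAddress shape are also assumptions.

```typescript
// Structural stand-ins for the parts of Playwright's Page and Locator used
// here. They are illustrative; a real suite would import the library types.
interface LocatorLike {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
}
interface PageLike {
  getByLabel(text: string): LocatorLike;
  getByRole(role: 'button', options?: { name?: string }): LocatorLike;
}

interface ShippingAddress {
  name: string;
  street: string;
  city: string;
}

// The page object exposes verbs only; assertions stay in the test file.
class CheckoutPage {
  private readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
  }

  async fillShipping(address: ShippingAddress): Promise<void> {
    // Hypothetical form labels.
    await this.page.getByLabel('Full name').fill(address.name);
    await this.page.getByLabel('Street address').fill(address.street);
    await this.page.getByLabel('City').fill(address.city);
  }

  async continueToPayment(): Promise<void> {
    await this.page.getByRole('button', { name: 'Continue' }).click();
  }
}
```

A spec would construct it with the page fixture, call fillShipping(...) and continueToPayment(), and then make its own expect(...) assertions on the result.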

Store credentials in environment variables or a secrets manager, never in git. Use storageState to save cookies and local storage after one login and reuse them across tests, so each test starts signed in instead of repeating the UI login.
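
One way to reuse a login is a setup test that signs in once and writes the session to disk. This is a sketch: the file name auth.setup.ts and the output path are assumptions, and the login steps mirror the fixture shown earlier.

```typescript
// tests/auth.setup.ts (hypothetical file name)
import { test as setup } from '@playwright/test';

// Assumed location; keep this file out of version control.
const authFile = 'playwright/.auth/buyer.json';

setup('log in once and save the session', async ({ page }) => {
  await page.goto('/login');
  await page.getByLabel('Email').fill('qa@harbor.supply');
  await page.getByLabel('Password').fill(process.env.QA_PASSWORD ?? '');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await page.waitForURL('/dashboard');

  // Persist cookies and local storage so other tests can start signed in.
  await page.context().storageState({ path: authFile });
});
```

Projects that need the session would then set storageState to the same path in their use block and list the setup project under dependencies.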

Network control and isolation

E2E tests against production are dangerous and slow. Point baseURL at a staging stack. When you need determinism:

  • page.route('**/api/inventory/**', route => route.fulfill({ json: mockStock })): stub API responses.
  • page.route('**/*', ...) with an allowlist: call route.continue() for hosts the app needs and route.abort() for the rest, blocking analytics and ads that slow tests.
  • request fixture: hit REST APIs directly for setup (seed cart) without clicking through the UI.

Mock third-party payment iframes carefully: stub the network contract your app expects after redirect, or use the provider's sandbox with test card numbers. Never run real charges in CI.
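
The allowlist idea from the list above can be sketched as a small handler. The host names are placeholders, and RouteLike is a hypothetical structural type covering only the Route methods used here, so the logic runs without Playwright installed.

```typescript
// Hosts the app under test genuinely needs. Both entries are placeholders.
const ALLOWED_HOSTS = ['staging.harbor.supply', 'localhost'];

// Returns true when the request URL's host is on the allowlist.
function isAllowed(url: string, allowedHosts: string[] = ALLOWED_HOSTS): boolean {
  try {
    return allowedHosts.includes(new URL(url).hostname);
  } catch {
    return false; // Unparseable URLs are treated as blocked.
  }
}

// Structural stand-in for the parts of Playwright's Route used below.
interface RouteLike {
  request(): { url(): string };
  abort(): Promise<void>;
  continue(): Promise<void>;
}

// Route handler: let allowlisted hosts through, abort everything else.
async function allowlistHandler(route: RouteLike): Promise<void> {
  if (isAllowed(route.request().url())) {
    await route.continue();
  } else {
    await route.abort();
  }
}
```

In a test this would be registered with await page.route('**/*', allowlistHandler) before the first page.goto(), so analytics and ad requests never leave the browser.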

CI integration and parallelism

Playwright shards across machines: npx playwright test --shard=1/4 splits the suite for four CI runners. Install browsers in CI with npx playwright install --with-deps chromium (full matrix only when you truly need cross-browser gates).
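
To judge how many shards a suite needs, a back-of-the-envelope estimate helps. The function below is an idealized sketch: it assumes tests spread evenly over every worker and ignores browser startup and server boot time, so real runs will be slower. The suite numbers are illustrative.

```typescript
interface SuiteShape {
  testCount: number;
  avgSecondsPerTest: number;
  shards: number;          // CI machines, as in --shard=i/N
  workersPerShard: number; // parallel worker processes on each machine
}

// Best-case wall-clock minutes under perfectly even distribution.
function estimateMinutes(s: SuiteShape): number {
  const serialSeconds = s.testCount * s.avgSecondsPerTest;
  const parallelSlots = s.shards * s.workersPerShard;
  return serialSeconds / parallelSlots / 60;
}

// 200 tests at 18 s each is one hour when run serially.
const serial = estimateMinutes({ testCount: 200, avgSecondsPerTest: 18, shards: 1, workersPerShard: 1 });
// Four shards with two workers each give eight parallel slots.
const sharded = estimateMinutes({ testCount: 200, avgSecondsPerTest: 18, shards: 4, workersPerShard: 2 });

console.log(serial);  // 60
console.log(sharded); // 7.5
```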

Typical GitHub Actions pattern: build the app, start the server in the background, wait for its health endpoint, run tests with CI=true (the scaffolded config reads process.env.CI to enable retries), and upload the playwright-report and trace artifacts on failure. Wire this into your CI/CD pipeline as a required check on main, but keep the suite under ~15 minutes or developers will bypass it.
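
Playwright's webServer option can handle the "start the server and wait for it" steps from inside the config, which keeps the CI workflow shorter. A sketch, assuming an npm start script, port 3000, and a /health endpoint that are not part of the original setup:

```typescript
// playwright.config.ts (excerpt)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // CI sets CI=true, which switches retries on.
  retries: process.env.CI ? 2 : 0,
  // Write the HTML report without opening a browser on the CI machine.
  reporter: [['html', { open: 'never' }]],
  webServer: {
    // Assumed start script and health endpoint; substitute your own.
    command: 'npm run start',
    url: 'http://localhost:3000/health',
    // Locally, reuse a dev server that is already running.
    reuseExistingServer: !process.env.CI,
    // Give the server up to two minutes to respond.
    timeout: 120_000,
  },
  use: { baseURL: 'http://localhost:3000' },
});
```

The CI job then only needs to install dependencies and browsers, run npx playwright test, and upload the report directory as an artifact.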

Debugging failures

  • npx playwright test --ui — interactive mode with time travel.
  • --debug — step through with inspector.
  • Trace viewer — open trace.zip to see DOM snapshots, network log, and console per step.
  • Screenshots and video on failure — enable in config for CI artifacts.
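
The artifact settings mentioned in the last bullet live under use in the config. A minimal excerpt, with option values chosen here as a reasonable default rather than a requirement:

```typescript
// playwright.config.ts (excerpt)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // Trace only when a failed test is retried, to limit overhead.
    trace: 'on-first-retry',
    // Keep a screenshot for failing tests only.
    screenshot: 'only-on-failure',
    // Record video for every test but discard it when the test passes.
    video: 'retain-on-failure',
  },
});
```

A downloaded trace can be opened locally with npx playwright show-trace followed by the path to the zip file.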

Worked example: Harbor Supply checkout flow

Harbor Supply runs a B2B parts catalog. The critical journey: search SKU, add to cart, enter PO number, confirm shipping, reach payment stub. A focused Playwright spec:

  1. Arrange: fixture logs in as buyer@harbor.supply; API route stubs inventory so SKU-7740 always shows 50 units.
  2. Act: page.getByPlaceholder('Search parts').fill('SKU-7740'); click first result; getByRole('button', { name: 'Add to cart' }).click(); navigate to cart; fill PO field; click Continue.
  3. Assert: expect(page.getByRole('heading', { name: 'Payment' })).toBeVisible(); expect(page.getByText('Order total: $')).toContainText('284.00'); verify cart badge cleared after completion.

Run the same spec on projects: [{ name: 'mobile', use: { ...devices['iPhone 13'] } }] to catch the cookie-banner overlap bug. One test, two engines of confidence.
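
Put together, the Arrange, Act, and Assert steps might read as the spec below. It is a sketch that assumes the session is already authenticated (for example through a login fixture or saved storageState). The inventory response shape, the result link, the Cart link, the 'PO number' label, and the PO value are all hypothetical.

```typescript
import { test, expect } from '@playwright/test';

test('buyer reaches payment for SKU-7740', async ({ page }) => {
  // Arrange: stub inventory so SKU-7740 always reports 50 units.
  // The response shape is an assumption about the app's API.
  await page.route('**/api/inventory/**', (route) =>
    route.fulfill({ json: { sku: 'SKU-7740', unitsAvailable: 50 } }),
  );
  await page.goto('/');

  // Act: search, add to cart, enter the PO number, continue.
  await page.getByPlaceholder('Search parts').fill('SKU-7740');
  await page.getByRole('link', { name: /SKU-7740/ }).first().click();
  await page.getByRole('button', { name: 'Add to cart' }).click();
  await page.getByRole('link', { name: 'Cart' }).click();
  await page.getByLabel('PO number').fill('PO-1001');
  await page.getByRole('button', { name: 'Continue' }).click();

  // Assert: the payment step is shown with the expected total.
  await expect(page.getByRole('heading', { name: 'Payment' })).toBeVisible();
  await expect(page.getByText('Order total: $')).toContainText('284.00');
});
```

Because click() checks that the target is not obscured, a cookie banner covering the Continue button on the mobile project would make this test time out rather than pass silently.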

Tooling decision table

Scenario | Recommended approach | Avoid
Pure function logic | Unit test (Jest/Vitest) | Playwright for math helpers
REST handler contracts | API integration test with supertest | Clicking through UI to hit one endpoint
React component variants | Component test (Testing Library) | Full browser per prop combination
Login + checkout + email | Playwright E2E on staging | Mocking entire browser in unit tests
Visual pixel diffs | Playwright toHaveScreenshot() with threshold | Screenshot every page on every commit
Load at 10k RPS | Dedicated load tool (k6, Locust) | Parallel Playwright as load generator

Common pitfalls

  • Testing implementation details — asserting on internal CSS classes instead of roles and visible outcomes.
  • Hard-coded waitForTimeout — masks races; fails randomly when CI is slow.
  • Oversized suite — 200 E2E tests that take an hour; team stops running them.
  • Shared mutable state — tests depending on order; use isolated accounts or DB transactions per test.
  • Production targets — flakiness from CDN, rate limits, and real payments; always use staging.
  • Ignoring trace artifacts — re-running locally without the CI trace wastes hours.
  • Disabling strict mode globally — hides duplicate-element bugs that users hit in edge layouts.

Production checklist

  • Define 10–20 critical journeys; map each to a single Playwright spec.
  • Use role- and label-based locators; add data-testid only where needed.
  • Configure baseURL, retries on CI, and trace-on-first-retry.
  • Reuse auth via storageState; never commit passwords.
  • Stub external payment, email, and analytics in CI.
  • Shard across CI runners and run parallel workers on each; keep total runtime under 15 minutes.
  • Upload HTML report and traces as CI artifacts on failure.
  • Run mobile viewport on at least one project for layout regressions.
  • Gate merges on E2E green alongside unit tests per your CI/CD pipeline.

Key takeaways

  • Playwright drives real Chromium, Firefox, and WebKit with auto-waiting that eliminates most manual synchronization.
  • Prefer getByRole and getByLabel locators over brittle CSS paths.
  • Keep the E2E suite small, fast, and focused on revenue-critical paths.
  • Fixtures, page objects, and storageState keep suites maintainable as flows grow.
  • Traces and the UI mode turn flaky-test debugging from guesswork into replay.
