
WebGPU fundamentals explained

Harbor Commerce's merchant analytics dashboard plotted 1.2 million click coordinates into a heatmap every time a store owner changed the date range. A JavaScript loop on the main thread took 380 ms and froze scrolling. Moving binning and color normalization into a WebGPU compute shader cut that to 12 ms on a mid-range laptop GPU, without shipping a native app. WebGPU is the W3C standard that exposes modern graphics and general-purpose GPU compute to browsers, which implement it on top of Vulkan, Metal, or Direct3D 12. It succeeds the aging WebGL model with explicit resource management, first-class compute, and WGSL (WebGPU Shading Language). This guide covers adapter and device initialization, buffers and bind groups, render vs compute pipelines, the Harbor Commerce heatmap refactor, a decision table comparing WebGPU with WebGL and WebAssembly, pitfalls, and a production checklist. It pairs with our Web Workers guide (off-main-thread CPU work) and our game rendering optimization guide (frame budgets and draw-call discipline).

What WebGPU is (and how it differs from WebGL)

WebGL (1.0 and 2.0) wrapped OpenGL ES in a browser sandbox. It worked, but the API is stateful, error-prone, and has no standard compute path — GPGPU hacks through fragment shaders are fragile. WebGPU mirrors how native engines (Vulkan, Metal, Direct3D 12) talk to GPUs: you create explicit buffers, describe pipeline layouts up front, record commands into encoders, and submit command buffers to a queue.

Key differences that matter in production:

Compute shaders — parallel general-purpose kernels (histograms, physics, ML inference) without pretending to draw triangles.
Explicit resources — no hidden global GL state; bind groups declare which buffers and textures a shader reads.
WGSL — a single shading language validated at pipeline creation, instead of GLSL dialects per driver.
Async initialization — navigator.gpu.requestAdapter() and adapter.requestDevice() are promises; handle missing GPU gracefully.
Better multi-threading story — WebGPU is exposed in workers, so a dedicated worker can own a device and encode commands for an OffscreenCanvas off the main thread; sharing one device across threads is still evolving.

Browser support (2026): Chrome, Edge, Firefox, and Safari ship WebGPU on desktop. Mobile coverage is improving, so always feature-detect, and keep a Canvas2D or WebGL fallback for unsupported environments.

Core object model

A typical WebGPU app bootstraps in this order:

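GPU — entry via navigator.gpu; the property is undefined in browsers without WebGPU support.
GPUAdapter — represents a physical or integrated GPU; pass powerPreference: 'high-performance' or 'low-power' as a hint. requestAdapter() resolves to null when no suitable adapter exists.
GPUDevice — logical connection; creates buffers, textures, pipelines; emits uncapturederror events you must handle.
GPUQueue — exposed as device.queue; submits command buffers and uploads data directly with writeBuffer() and writeTexture(). Buffer-to-buffer copies are recorded on a GPUCommandEncoder with copyBufferToBuffer().
GPUCanvasContext — obtained with canvas.getContext('webgpu') and configured with a GPUTextureFormat to present frames to a <canvas>; navigator.gpu.getPreferredCanvasFormat() returns the platform's preferred format (bgra8unorm or rgba8unorm).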
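A minimal sketch of that bootstrap order with a graceful fallback follows. It assumes hypothetical structural interfaces (GpuLike, AdapterLike, DeviceLike) standing in for the @webgpu/types declarations, so it can run and be tested without a browser; the reason strings and the 'high-performance' hint are illustrative choices.

```typescript
// Hypothetical structural stand-ins for the subset of @webgpu/types used here, so the
// sketch type-checks and runs without a browser. Method names match the real API.
interface DeviceLostInfoLike {
  reason: string; // "destroyed" after device.destroy(), otherwise "unknown"
  message: string;
}
interface DeviceLike {
  lost: Promise<DeviceLostInfoLike>;
  addEventListener(
    type: "uncapturederror",
    listener: (event: { error: { message: string } }) => void,
  ): void;
}
interface AdapterLike {
  requestDevice(): Promise<DeviceLike>;
}
interface GpuLike {
  requestAdapter(options?: {
    powerPreference?: "high-performance" | "low-power";
  }): Promise<AdapterLike | null>;
}

type InitResult =
  | { kind: "webgpu"; device: DeviceLike }
  | { kind: "fallback"; reason: string };

async function initWebGPU(
  gpu: GpuLike | undefined,
  onLost: (info: DeviceLostInfoLike) => void,
): Promise<InitResult> {
  // navigator.gpu is undefined where WebGPU is not supported.
  if (!gpu) return { kind: "fallback", reason: "navigator.gpu missing" };

  // requestAdapter() resolves to null when no suitable GPU is available.
  const adapter = await gpu.requestAdapter({ powerPreference: "high-performance" });
  if (!adapter) return { kind: "fallback", reason: "no adapter" };

  // requestDevice() can reject (e.g. unsupported required limits); treat that as a fallback.
  const device = await adapter.requestDevice().catch(() => null);
  if (!device) return { kind: "fallback", reason: "requestDevice failed" };

  // Validation and out-of-memory errors not caught by an error scope arrive here.
  device.addEventListener("uncapturederror", (event) => {
    console.error("WebGPU error:", event.error.message);
  });
  // lost resolves once; the caller decides whether to re-initialize.
  void device.lost.then(onLost);

  return { kind: "webgpu", device };
}
```

In a browser you would call initWebGPU(navigator.gpu, (info) => ...), possibly with a cast depending on your type setup, and switch to the Canvas2D or WebGL path whenever kind is "fallback". Returning a tagged result instead of throwing keeps the fallback decision in one place.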

Buffers and memory usage

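GPUBuffer objects hold vertex data, uniform parameters, or compute I/O. Set usage flags at creation: GPUBufferUsage.STORAGE for read/write access in compute, VERTEX for geometry, UNIFORM for small parameter blocks, and COPY_DST when uploading from JavaScript via queue.writeBuffer(). For large uploads, map staging buffers (or create buffers with mappedAtCreation: true) instead of issuing many tiny writes. Alignment rules matter: uniform buffer binding offsets must be multiples of the device's minUniformBufferOffsetAlignment limit (256 bytes by default), and queue.writeBuffer() offsets and sizes must be multiples of 4 bytes.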
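The alignment arithmetic is easy to get wrong by hand. This sketch computes the stride and dynamic offsets for packing several uniform blocks into one buffer; the 256-byte default is an assumption, and real code should read device.limits.minUniformBufferOffsetAlignment instead.

```typescript
// Round `size` up to the next multiple of `alignment`.
function alignTo(size: number, alignment: number): number {
  return Math.ceil(size / alignment) * alignment;
}

// Byte offsets for `count` uniform blocks of `blockBytes` each, packed into one buffer and
// selected per draw/dispatch with dynamic offsets. 256 is the default
// minUniformBufferOffsetAlignment; pass the device's actual limit in real code.
function uniformBlockOffsets(
  blockBytes: number,
  count: number,
  minAlign = 256,
): { stride: number; offsets: number[]; totalBytes: number } {
  const stride = alignTo(blockBytes, minAlign);
  const offsets = Array.from({ length: count }, (_, i) => i * stride);
  return { stride, offsets, totalBytes: stride * count };
}
```

For three 48-byte parameter blocks, the buffer would be created with size totalBytes (768) and usage GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST; alignTo(n, 4) similarly rounds a writeBuffer() payload size.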

Bind groups and layouts

Shaders never see raw pointers. A GPUBindGroupLayout declares slots (binding 0 = uniform, binding 1 = storage buffer, binding 2 = texture). GPUBindGroup instances attach concrete resources. Pipelines are immutable once created; swapping bind groups between draw/dispatch calls is cheap. This is how you pass the heatmap grid dimensions, input click array, and output color buffer to a compute kernel in one dispatch.
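As a sketch of how layout slots line up with WGSL declarations, here is a hypothetical bind group layout descriptor for a heatmap binning kernel. The binding assignments and label are assumptions for illustration; the descriptor shape (entries with binding, visibility, and buffer.type) follows GPUBindGroupLayoutDescriptor, and the shader-stage bit is redeclared so the code runs outside a browser.

```typescript
// GPUShaderStage.COMPUTE bit value from the WebGPU spec, redeclared so this runs outside a
// browser. In a browser, use the global GPUShaderStage.COMPUTE.
const SHADER_STAGE_COMPUTE = 0x4;

type BufferBindingType = "uniform" | "storage" | "read-only-storage";

interface BufferLayoutEntry {
  binding: number;
  visibility: number;
  buffer: { type: BufferBindingType };
}

// Hypothetical layout for a heatmap binning kernel. The matching WGSL would declare:
//   @group(0) @binding(0) var<uniform> params: Params;                    // grid size, filters
//   @group(0) @binding(1) var<storage, read> clicks: array<vec2<f32>>;    // input points
//   @group(0) @binding(2) var<storage, read_write> grid: array<atomic<u32>>; // output counts
function heatmapBindGroupLayoutDesc(): { label: string; entries: BufferLayoutEntry[] } {
  return {
    label: "heatmap-bin",
    entries: [
      { binding: 0, visibility: SHADER_STAGE_COMPUTE, buffer: { type: "uniform" } },
      { binding: 1, visibility: SHADER_STAGE_COMPUTE, buffer: { type: "read-only-storage" } },
      { binding: 2, visibility: SHADER_STAGE_COMPUTE, buffer: { type: "storage" } },
    ],
  };
}
```

In a browser, device.createBindGroupLayout(heatmapBindGroupLayoutDesc()) would produce the layout, and device.createBindGroup({ layout, entries: [{ binding: 0, resource: { buffer: paramsBuffer } }, ...] }) would attach concrete buffers (paramsBuffer is a hypothetical name).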

WGSL: writing shaders

WGSL (WebGPU Shading Language) looks like Rust-meets-HLSL. Vertex and fragment entry points drive rendering; @compute @workgroup_size(64) entry points drive parallel kernels. A minimal compute shader that increments every cell in a buffer:

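// One storage buffer of u32 values, readable and writable by the kernel.
@group(0) @binding(0) var<storage, read_write> data: array<u32>;

// 64 invocations per workgroup; the dispatch supplies enough workgroups to cover the array.
@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
  let i = id.x;
  // The last workgroup may run past the end of the array; skip those invocations.
  if (i >= arrayLength(&data)) { return; }
  data[i] = data[i] + 1u;
}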

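Create a GPUShaderModule from the WGSL source with device.createShaderModule(), then a GPUComputePipeline that pairs the module's entry point with a pipeline layout built from your bind group layouts (or layout: 'auto'). Inside a compute pass, dispatch with pass.dispatchWorkgroups(workgroupCountX, ...), where workgroupCountX = ceil(elementCount / 64) for the example above. Each dimension is capped by the maxComputeWorkgroupsPerDimension limit (65535 by default).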
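The workgroup arithmetic can be wrapped in a small helper. This sketch assumes a 1D kernel with @workgroup_size(64) and the default per-dimension limit; when a dispatch would exceed that limit, the count is spread over x and y.

```typescript
// Workgroup counts for a 1D kernel. If the count exceeds the per-dimension limit
// (device.limits.maxComputeWorkgroupsPerDimension, 65535 by default), spread it over
// x and y; the shader must then rebuild a linear index from id.x and id.y.
function dispatchSize(
  elementCount: number,
  workgroupSize = 64,
  maxPerDimension = 65535,
): [number, number] {
  const groups = Math.ceil(elementCount / workgroupSize);
  if (groups <= maxPerDimension) return [groups, 1];
  const y = Math.ceil(groups / maxPerDimension);
  const x = Math.ceil(groups / y); // never exceeds maxPerDimension
  return [x, y];
}
```

Usage would be pass.dispatchWorkgroups(...dispatchSize(n)). For Harbor's 1.2M points this gives 18,750 workgroups on x, well under the limit. If y is ever greater than 1, the shader has to linearize, for example i = id.y * nwg.x * 64u + id.x with @builtin(num_workgroups) nwg: vec3<u32>, and keep the arrayLength() bounds check.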

For rendering, a GPURenderPipeline bundles vertex + fragment shaders, vertex buffer layouts, primitive topology, depth/stencil state, and color target formats. Pipeline creation is expensive — cache pipelines per material, not per mesh.
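One way to enforce "per material, not per mesh" is a small memoizing cache keyed by material. This is a generic sketch: create stands in for a call such as device.createRenderPipeline(descriptor), and the string keys are hypothetical.

```typescript
// Memoizes expensive pipeline creation by material key.
class PipelineCache<P> {
  private readonly cache = new Map<string, P>();
  private readonly create: (materialKey: string) => P;

  constructor(create: (materialKey: string) => P) {
    this.create = create;
  }

  // Returns the cached pipeline, creating it only on the first request for this key.
  get(materialKey: string): P {
    let pipeline = this.cache.get(materialKey);
    if (pipeline === undefined) {
      pipeline = this.create(materialKey);
      this.cache.set(materialKey, pipeline);
    }
    return pipeline;
  }

  get size(): number {
    return this.cache.size;
  }
}
```

Because createRenderPipelineAsync() returns a promise, the same cache can hold promises instead, so concurrent requests for one material share a single compilation without blocking the main thread.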

Command encoding and the frame loop

GPUs execute recorded commands, not immediate API calls. Each frame:

Acquire the canvas texture via context.getCurrentTexture().
Create a GPUCommandEncoder.
Begin a compute pass (bin heatmap cells) and/or render pass (draw textured quad).
End passes, call encoder.finish(), queue.submit([commandBuffer]).
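The per-frame steps can be sketched as a single function. The structural interfaces are hypothetical stand-ins for GPUDevice, GPUCommandEncoder, and the pass encoders; the single compute dispatch simplifies Harbor's two compute passes, and the recompute flag reflects the idea that pan/zoom only needs the render pass.

```typescript
// Hypothetical structural subset of the WebGPU objects touched per frame; real code would
// use GPUDevice, GPUCanvasContext, etc. from @webgpu/types. Method names match the API.
interface PassLike {
  setPipeline(pipeline: unknown): void;
  setBindGroup(index: number, bindGroup: unknown): void;
  end(): void;
}
interface ComputePassLike extends PassLike {
  dispatchWorkgroups(x: number, y?: number, z?: number): void;
}
interface RenderPassLike extends PassLike {
  draw(vertexCount: number): void;
}
interface ColorAttachmentLike {
  view: unknown;
  loadOp: "clear" | "load";
  storeOp: "store" | "discard";
  clearValue?: { r: number; g: number; b: number; a: number };
}
interface EncoderLike {
  beginComputePass(): ComputePassLike;
  beginRenderPass(desc: { colorAttachments: ColorAttachmentLike[] }): RenderPassLike;
  finish(): unknown;
}
interface FrameDeps {
  device: {
    createCommandEncoder(): EncoderLike;
    queue: {
      writeBuffer(buffer: unknown, offset: number, data: Float32Array): void;
      submit(commandBuffers: unknown[]): void;
    };
  };
  context: { getCurrentTexture(): { createView(): unknown } };
  computePipeline: unknown;
  computeBindGroup: unknown;
  renderPipeline: unknown;
  renderBindGroup: unknown;
  viewUniformBuffer: unknown;
  workgroups: number;
}

function encodeFrame(d: FrameDeps, viewMatrix: Float32Array, recompute: boolean): void {
  // Upload per-frame data (the pan/zoom view matrix) before encoding.
  d.device.queue.writeBuffer(d.viewUniformBuffer, 0, viewMatrix);

  // 1. Acquire this frame's canvas texture.
  const view = d.context.getCurrentTexture().createView();
  // 2. Create a command encoder.
  const encoder = d.device.createCommandEncoder();

  // 3a. Compute pass, only when the filter changed (pan/zoom alone needs no re-binning).
  if (recompute) {
    const cp = encoder.beginComputePass();
    cp.setPipeline(d.computePipeline);
    cp.setBindGroup(0, d.computeBindGroup);
    cp.dispatchWorkgroups(d.workgroups);
    cp.end();
  }

  // 3b. Render pass: one full-screen triangle (vertices generated in the vertex shader).
  const rp = encoder.beginRenderPass({
    colorAttachments: [
      { view, loadOp: "clear", storeOp: "store", clearValue: { r: 0, g: 0, b: 0, a: 1 } },
    ],
  });
  rp.setPipeline(d.renderPipeline);
  rp.setBindGroup(0, d.renderBindGroup);
  rp.draw(3);
  rp.end();

  // 4. Finish recording and submit.
  d.device.queue.submit([encoder.finish()]);
}
```

In a browser this would run inside requestAnimationFrame, with recompute set only when the date filter changed.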

Upload dynamic data (camera matrices, date-filter params) with queue.writeBuffer() before encoding. For readback, when JavaScript needs the result, copy the GPU buffer (created with COPY_SRC) into a staging buffer created with MAP_READ | COPY_DST, await mapAsync(GPUMapMode.READ), copy the data out of getMappedRange(), then call unmap(). Readback synchronizes the CPU and GPU; avoid it every frame if the GPU can render directly to the canvas texture instead.
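Readback can be sketched as follows, assuming the source buffer was created with COPY_SRC and its size is a multiple of 4 bytes. The ReadbackDevice and StagingLike interfaces are hypothetical stand-ins for the real types, and the usage bits are redeclared from the spec so the sketch runs outside a browser.

```typescript
// Bit values from the WebGPU spec, redeclared so the sketch runs outside a browser. In a
// browser, use GPUBufferUsage.MAP_READ, GPUBufferUsage.COPY_DST and GPUMapMode.READ.
const BUFFER_USAGE_MAP_READ = 0x0001;
const BUFFER_USAGE_COPY_DST = 0x0008;
const MAP_MODE_READ = 0x0001;

interface StagingLike {
  mapAsync(mode: number): Promise<void>;
  getMappedRange(): ArrayBuffer;
  unmap(): void;
}
interface ReadbackDevice {
  createBuffer(desc: { size: number; usage: number }): StagingLike;
  createCommandEncoder(): {
    copyBufferToBuffer(src: unknown, srcOffset: number, dst: StagingLike, dstOffset: number, size: number): void;
    finish(): unknown;
  };
  queue: { submit(commandBuffers: unknown[]): void };
}

async function readBackU32(device: ReadbackDevice, src: unknown, byteSize: number): Promise<Uint32Array> {
  // MAP_READ buffers may only be combined with COPY_DST, so a separate staging buffer is needed.
  const staging = device.createBuffer({ size: byteSize, usage: BUFFER_USAGE_MAP_READ | BUFFER_USAGE_COPY_DST });
  const encoder = device.createCommandEncoder();
  encoder.copyBufferToBuffer(src, 0, staging, 0, byteSize);
  device.queue.submit([encoder.finish()]);

  // Resolves once the GPU has finished the copy; this is the CPU/GPU sync point.
  await staging.mapAsync(MAP_MODE_READ);
  // Copy out before unmap(): the mapped ArrayBuffer is detached when the buffer is unmapped.
  const result = new Uint32Array(staging.getMappedRange().slice(0));
  staging.unmap();
  return result;
}
```

Creating a staging buffer per call keeps the sketch short; in production you would pool staging buffers or call destroy() on them after use.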

Harbor Commerce heatmap refactor

The analytics team had three requirements: responsive date-range filters, smooth pan/zoom on the heatmap, and no main-thread jank on a 2020-era corporate laptop fleet.

Before

1.2M {x, y} points stored in a Float32Array on the main thread.
JavaScript nested loops bin into a 512×512 grid, normalize counts, map to a color ramp.
380 ms blocking time per filter change; scroll jank during recompute.
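For reference, the binning step of the "Before" path amounts to a loop like the one below. It is a sketch under assumptions the case study does not state: points are interleaved [x0, y0, x1, y1, ...] and already normalized to [0, 1). A loop like this also makes a handy CPU oracle for checking GPU output on small inputs.

```typescript
// CPU baseline of the binning step: count interleaved [x0, y0, x1, y1, ...] points per cell
// of a size×size grid. Assumes coordinates are normalized to [0, 1); others are dropped.
function binClicksCpu(points: Float32Array, size = 512): Uint32Array {
  const grid = new Uint32Array(size * size);
  for (let p = 0; p + 1 < points.length; p += 2) {
    const x = points[p];
    const y = points[p + 1];
    // Skip out-of-range points; the negated form also rejects NaN.
    if (!(x >= 0 && x < 1 && y >= 0 && y < 1)) continue;
    const col = Math.floor(x * size);
    const row = Math.floor(y * size);
    grid[row * size + col] += 1;
  }
  return grid;
}
```

Run over 1.2M points on every filter change, this single-threaded loop (plus normalization and color mapping) is the work the refactor moved to the GPU.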

After (WebGPU compute + render)

Upload once: click coordinates in a STORAGE | COPY_DST buffer; date filters pass a small uniform (min/max timestamp).
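Compute pass 1: parallel atomicAdd() into a 512×512 grid of 32-bit unsigned counters, filtering by date in-shader. The counters live in a storage buffer declared as array<atomic<u32>>, because core WGSL atomics operate on storage or workgroup memory, not on r32uint storage textures.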
Compute pass 2: normalize grid to rgba8unorm texture using a log-scale color ramp encoded as a 256-entry LUT buffer.
Render pass: full-screen triangle samples the texture; pan/zoom via uniform view matrix only — no rebinding points.
Fallback: Web Worker + OffscreenCanvas with a simplified 128×128 grid when navigator.gpu is undefined or no adapter is available.
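Pass 2's log-scale mapping can be expressed on the CPU as well, which helps when validating the shader or building the Worker fallback. This is a sketch, not Harbor's code: the log1p scaling formula, the assumption that maxCount is already known (for example from a reduction over the grid), and the RGBA8-in-u32 packing (red in the low byte, matching what WGSL's unpack4x8unorm() expects) are all assumptions.

```typescript
// Map a cell count to an index into a 256-entry color LUT on a log scale, so a few very
// hot cells do not wash out the rest of the map.
function lutIndex(count: number, maxCount: number): number {
  if (count <= 0 || maxCount <= 0) return 0;
  const t = Math.log1p(count) / Math.log1p(maxCount); // 0..1 for count <= maxCount
  return Math.min(255, Math.floor(t * 255));
}

// Build a hypothetical 256-entry linear ramp between two RGB colors, packed as RGBA8 in a
// u32 with red in the low byte and alpha fixed at 255.
function buildLut(from: [number, number, number], to: [number, number, number]): Uint32Array {
  const lut = new Uint32Array(256);
  for (let i = 0; i < 256; i++) {
    const t = i / 255;
    const [r, g, b] = [0, 1, 2].map((k) => Math.round(from[k] + (to[k] - from[k]) * t));
    lut[i] = (r | (g << 8) | (b << 16) | (255 << 24)) >>> 0;
  }
  return lut;
}
```

Uploaded to a storage buffer, such a table lets the shader do color = unpack4x8unorm(lut[index]) before writing to the rgba8unorm texture.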

Result: 12 ms GPU time, main thread free for UI; filter changes feel instant. Development cost was higher than a Worker-only fix, but the same pipeline later accelerated a funnel-sankey layout prototype without rewriting the numeric core.

Technique decision table

Approach	Best for	GPU required	Dev complexity	Portability
CPU JavaScript	<100k points, simple charts	No	Low	Universal
Web Worker + WASM	Heavy numeric loops, no drawing API	No	Medium	High
WebGL 2	3D scenes, wide legacy support	Yes	Medium–high	Very high
WebGPU compute + render	Large parallel workloads, modern pipelines	Yes	High	Good (detect fallback)
WebGPU + WASM	Complex sim logic + GPU buffers	Yes	Very high	Good

Reach for WebGPU when data parallelism dominates and you need either compute or cleaner resource management than WebGL allows. Stay on Canvas2D or SVG for simple static charts. Pair WebGPU with Web Workers when preprocessing or networking should never block the render thread.

Common pitfalls

No feature detection — calling navigator.gpu.requestAdapter() without first checking navigator.gpu throws a TypeError in browsers without WebGPU, such as older Safari versions, and breaks initialization.
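Pipeline churn — creating a new render pipeline per UI state change triggers shader compilation and causes frame hitches; use uniform buffers to parameterize one pipeline.
Alignment violations — uniform buffer offsets not aligned to minUniformBufferOffsetAlignment (256 bytes by default) cause validation errors that surface only as console warnings or uncapturederror events, so they are easy to miss.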
Excessive readback — mapping GPU buffers every frame for CPU inspection destroys parallelism; keep data on GPU when possible.
Ignoring device loss — a GPU reset (driver update, sleep/wake) resolves the device.lost promise; request a new device, then rebuild pipelines and re-upload buffers.
Main-thread encoding at scale — recording huge command buffers synchronously can still jank; batch work and profile with timestamp queries (the optional 'timestamp-query' feature).
Assuming mobile parity — storage buffer size limits and float filterability differ; test on target hardware early.
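To catch limit mismatches before they become validation errors, a sketch like this one plans how a point buffer splits into bindable chunks. It assumes 8 bytes per point (an x, y pair of f32) and takes the limits as parameters; in practice they would come from device.limits.maxStorageBufferBindingSize (128 MiB by default) and device.limits.minStorageBufferOffsetAlignment (256 bytes by default).

```typescript
function gcd(a: number, b: number): number {
  while (b !== 0) [a, b] = [b, a % b];
  return a;
}

// Plan how many storage-buffer bindings a point array needs. Each chunk's byte size is
// kept a multiple of offsetAlign so every chunk can start at a valid binding offset.
function planStorageChunks(
  pointCount: number,
  bytesPerPoint: number,
  maxBindingBytes: number,
  offsetAlign = 256,
): { pointsPerChunk: number; chunks: number } {
  // Smallest number of points whose byte size is a multiple of offsetAlign.
  const step = offsetAlign / gcd(offsetAlign, bytesPerPoint);
  const pointsPerChunk = Math.floor(Math.floor(maxBindingBytes / bytesPerPoint) / step) * step;
  if (pointsPerChunk === 0) throw new Error("binding limit smaller than one aligned chunk");
  return { pointsPerChunk, chunks: Math.ceil(pointCount / pointsPerChunk) };
}
```

The buffer itself is also bounded by maxBufferSize, and an adapter may expose higher limits only if you ask for them via requiredLimits in requestDevice().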

Production checklist

Feature-detect navigator.gpu; ship Canvas2D, WebGL, or Worker fallback.
Request adapter with appropriate powerPreference; log adapter info in debug builds.
Handle device.lost and uncapturederror with telemetry and graceful recovery.
Define bind group layouts before pipelines; reuse bind groups across frames.
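Validate shaders early: type-check host code with @webgpu/types, lint WGSL with a tool such as wgsl-analyzer, and log GPUShaderModule.getCompilationInfo() messages in debug builds.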
Pool buffers and textures; avoid per-frame allocation.
Profile with browser GPU inspectors; measure compute dispatch and render pass times separately.
Document minimum GPU/driver assumptions for support teams.
Test thermal throttling on laptops and low-power mode on phones.

Key takeaways

WebGPU is the modern browser GPU API — explicit buffers, bind groups, and first-class compute shaders replace WebGL's implicit global state.
WGSL shaders power both rendering and parallel compute; pipeline creation is expensive, so parameterize with uniforms instead of rebuilding pipelines.
Command encoders record work submitted to a queue — keep heavy parallelism on the GPU and minimize CPU readback.
Harbor-style heatmaps and simulations benefit when millions of elements need the same operation per frame.
Always feature-detect, handle device loss, and keep a CPU or WebGL fallback for unsupported clients.