WebGPU fundamentals explained
Harbor Commerce's merchant analytics dashboard plotted 1.2 million click coordinates into a heatmap every time a store owner changed the date range. A JavaScript loop on the main thread took 380 ms and froze scrolling. Moving binning and color normalization into a WebGPU compute shader cut that to 12 ms on a mid-range laptop GPU, without shipping a native app.
WebGPU is the W3C API that exposes modern graphics and general-purpose GPU compute to browsers; implementations translate it to Vulkan, Metal, or Direct3D 12. It replaces the aging WebGL model with explicit resource management, first-class compute, and WGSL (WebGPU Shading Language).
This guide covers adapter and device initialization, buffers and bind groups, render versus compute pipelines, the Harbor Commerce heatmap refactor, a technique decision table against WebGL and WebAssembly, common pitfalls, and a production checklist. It complements our Web Workers guide (off-main-thread CPU work) and our game rendering optimization guide (frame budgets and draw-call discipline).
What WebGPU is (and how it differs from WebGL)
WebGL (1.0 and 2.0) wrapped OpenGL ES in a browser sandbox. It worked, but the API is stateful, error-prone, and has no standard compute path — GPGPU hacks through fragment shaders are fragile. WebGPU mirrors how native engines (Vulkan, Metal, Direct3D 12) talk to GPUs: you create explicit buffers, describe pipeline layouts up front, record commands into encoders, and submit command buffers to a queue.
Key differences that matter in production:
- Compute shaders: parallel general-purpose kernels (histograms, physics, ML inference) without pretending to draw triangles.
- Explicit resources: no hidden global GL state; bind groups declare which buffers and textures a shader reads.
- WGSL: a single shading language validated when shader modules and pipelines are created, instead of GLSL dialects that vary per driver.
- Async initialization: `navigator.gpu.requestAdapter()` and `adapter.requestDevice()` return promises; handle a missing GPU gracefully.
- Better multi-threading story: command encoding can be prepared off the main thread in supporting engines (still evolving).
Browser support (2026): Chrome, Edge, Firefox, and Safari ship WebGPU on desktop; mobile coverage is improving but always feature-detect. Keep a Canvas2D or WebGL fallback for unsupported environments.
Core object model
A typical WebGPU app bootstraps in this order:
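- GPU: the entry point, `navigator.gpu`; it is `undefined` where WebGPU is unavailable.
- GPUAdapter: represents a discrete or integrated GPU; request one with `powerPreference: 'high-performance'` or `'low-power'`. `requestAdapter()` resolves to `null` when no suitable adapter exists.
- GPUDevice: the logical connection; creates buffers, textures, and pipelines, and emits `uncapturederror` events you must handle.
- GPUQueue: submits command buffers and uploads data from JavaScript via `writeBuffer()` and `writeTexture()`. Buffer-to-buffer copies are recorded on a command encoder instead.
- GPUCanvasContext: configured with a `GPUTextureFormat` (commonly `bgra8unorm`; `navigator.gpu.getPreferredCanvasFormat()` reports the best choice for the platform) to present frames to a `<canvas>`.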
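The bootstrap order above can be sketched as one function that returns `null` whenever any step fails, so the caller can switch to a fallback renderer. This is a minimal sketch, not a definitive implementation: the `*Like` interfaces and the `initWebGPU` name are hypothetical stand-ins for the real WebGPU types (a real project would likely use `@webgpu/types`), declared locally so the example type-checks outside a browser.

```typescript
// Hypothetical structural stand-ins for the real WebGPU types (assumption).
interface DeviceLike { queue: unknown }
interface AdapterLike { requestDevice(): Promise<DeviceLike> }
interface GpuLike {
  requestAdapter(opts?: {
    powerPreference?: "low-power" | "high-performance";
  }): Promise<AdapterLike | null>;
  getPreferredCanvasFormat(): string;
}
interface ContextLike {
  configure(cfg: { device: DeviceLike; format: string; alphaMode?: string }): void;
}
interface CanvasLike { getContext(kind: "webgpu"): ContextLike | null }
interface GpuSetup { device: DeviceLike; context: ContextLike; format: string }

// Walks GPU -> adapter -> device -> canvas context. Returns null as soon as
// one step is unavailable so the caller can fall back to Canvas2D or WebGL.
async function initWebGPU(
  nav: { gpu?: GpuLike },
  canvas: CanvasLike,
): Promise<GpuSetup | null> {
  const gpu = nav.gpu;
  if (!gpu) return null; // WebGPU is not exposed in this browser

  const adapter = await gpu.requestAdapter({ powerPreference: "high-performance" });
  if (!adapter) return null; // no suitable GPU was found

  const device = await adapter.requestDevice();

  const context = canvas.getContext("webgpu");
  if (!context) return null;

  const format = gpu.getPreferredCanvasFormat();
  context.configure({ device, format, alphaMode: "opaque" });
  return { device, context, format };
}
```

In a browser the arguments would be `navigator` and the target canvas element; a `null` result is the signal to start the fallback path.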
Buffers and memory usage
`GPUBuffer` objects hold vertex data, uniform parameters, or compute I/O. Set usage flags at creation: `GPUBufferUsage.STORAGE` for read/write in compute, `VERTEX` for geometry, `UNIFORM` for uniform parameters, and `COPY_DST` when uploading from JavaScript via `queue.writeBuffer()`. Map staging buffers for large uploads instead of issuing many tiny writes. Alignment rules matter: uniform buffer binding offsets must be multiples of the device's `minUniformBufferOffsetAlignment` limit, which is 256 bytes by default.
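To make the alignment rule concrete, the sketch below packs several small uniform blocks into one buffer so that each block starts on a legal offset. The helper names `alignUp` and `uniformSlotLayout` are hypothetical, and the flag values are copied from the spec's `GPUBufferUsage` namespace only so the example runs outside a browser.

```typescript
// Values of GPUBufferUsage flags per the WebGPU spec, inlined for portability.
const BufferUsage = {
  COPY_DST: 0x0008,
  VERTEX: 0x0020,
  UNIFORM: 0x0040,
  STORAGE: 0x0080,
} as const;

// Round `value` up to the next multiple of `alignment`.
function alignUp(value: number, alignment: number): number {
  return Math.ceil(value / alignment) * alignment;
}

// Lay out `count` uniform blocks of `blockBytes` each so that every block
// begins on a multiple of `offsetAlignment` (256 is the default limit).
function uniformSlotLayout(count: number, blockBytes: number, offsetAlignment = 256) {
  const stride = alignUp(blockBytes, offsetAlignment);
  const offsets = Array.from({ length: count }, (_, i) => i * stride);
  return { stride, offsets, totalBytes: stride * count };
}

// Descriptor for a buffer holding three 80-byte uniform blocks that are
// refreshed from JavaScript with queue.writeBuffer().
const layout = uniformSlotLayout(3, 80);
const uniformBufferDescriptor = {
  size: layout.totalBytes,
  usage: BufferUsage.UNIFORM | BufferUsage.COPY_DST,
};
```

Three 80-byte blocks therefore occupy 768 bytes rather than 240, with blocks starting at offsets 0, 256, and 512.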
Bind groups and layouts
Shaders never see raw pointers. A GPUBindGroupLayout declares slots (binding 0 = uniform, binding 1 = storage buffer, binding 2 = texture). GPUBindGroup instances attach concrete resources. Pipelines are immutable once created; swapping bind groups between draw/dispatch calls is cheap. This is how you pass the heatmap grid dimensions, input click array, and output color buffer to a compute kernel in one dispatch.
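As an illustration of the layout and bind group split, the sketch below builds plain descriptor objects in the shape `createBindGroupLayout()` and `createBindGroup()` accept. The binding assignments (0 = grid dimensions, 1 = click array, 2 = output buffer) and the function names are assumptions made for this example, not the Harbor Commerce code.

```typescript
// GPUShaderStage.COMPUTE per the WebGPU spec, inlined so this runs anywhere.
const SHADER_STAGE_COMPUTE = 0x4;

type BufferBindingType = "uniform" | "storage" | "read-only-storage";
interface LayoutEntry {
  binding: number;
  visibility: number;
  buffer: { type: BufferBindingType };
}

// Hypothetical layout for a heatmap compute kernel: slots only, no resources.
function heatmapLayoutDescriptor(): { entries: LayoutEntry[] } {
  return {
    entries: [
      // binding 0: grid dimensions (small uniform block)
      { binding: 0, visibility: SHADER_STAGE_COMPUTE, buffer: { type: "uniform" } },
      // binding 1: input click array, read-only in the shader
      { binding: 1, visibility: SHADER_STAGE_COMPUTE, buffer: { type: "read-only-storage" } },
      // binding 2: output buffer, written by the shader
      { binding: 2, visibility: SHADER_STAGE_COMPUTE, buffer: { type: "storage" } },
    ],
  };
}

// Attach one concrete buffer to each slot, in slot order. The result matches
// the `entries` field of a bind group descriptor.
function bindGroupEntries<B>(layout: { entries: LayoutEntry[] }, buffers: B[]) {
  if (buffers.length !== layout.entries.length) {
    throw new Error("one buffer is required per layout entry");
  }
  return layout.entries.map((entry, i) => ({
    binding: entry.binding,
    resource: { buffer: buffers[i] },
  }));
}
```

Because the layout is separate from the resources, a second bind group for a different click dataset can reuse the same layout and pipeline.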
WGSL: writing shaders
WGSL (WebGPU Shading Language) looks like Rust-meets-HLSL. Vertex and fragment entry points drive rendering; `@compute @workgroup_size(64)` entry points drive parallel kernels. A minimal compute shader that increments every cell in a buffer:
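    // One u32 per element, readable and writable from the kernel.
    @group(0) @binding(0) var<storage, read_write> data: array<u32>;

    @compute @workgroup_size(64)
    fn main(@builtin(global_invocation_id) id: vec3<u32>) {
        let i = id.x;
        // The last workgroup may run past the end of the array.
        if (i >= arrayLength(&data)) { return; }
        data[i] = data[i] + 1u;
    }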
Create a `GPUShaderModule` from the WGSL source, then a `GPUComputePipeline` that pairs the module and its entry point with a pipeline layout (built from bind group layouts, or `'auto'` to derive one from the shader). Dispatch with `pass.dispatchWorkgroups(workgroupCountX, ...)`, where `workgroupCountX = ceil(elementCount / 64)` for the example above.
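The dispatch arithmetic is easy to get wrong by one, so here is a small sketch of it together with a CPU stand-in for the kernel. `emulateIncrementDispatch` is a hypothetical helper that only mimics the invocation pattern of the shader above; it shows why the `arrayLength` guard is needed when the element count is not a multiple of 64.

```typescript
const WORKGROUP_SIZE = 64; // must match @workgroup_size in the shader

// Number of workgroups needed so every element gets one invocation.
function workgroupCount(elementCount: number, workgroupSize = WORKGROUP_SIZE): number {
  return Math.ceil(elementCount / workgroupSize);
}

// CPU emulation of the dispatch: runs the increment once per invocation id
// and returns how many invocations were rejected by the bounds guard.
function emulateIncrementDispatch(data: Uint32Array): number {
  const invocations = workgroupCount(data.length) * WORKGROUP_SIZE;
  let guarded = 0;
  for (let id = 0; id < invocations; id++) {
    if (id >= data.length) { // mirrors `if (i >= arrayLength(&data))`
      guarded++;
      continue;
    }
    data[id] += 1;
  }
  return guarded;
}
```

For 100 elements this launches 2 workgroups (128 invocations), of which 28 hit the guard. The 1.2 million points from the introduction divide evenly into 18,750 workgroups, which is below the default per-dimension dispatch limit of 65,535.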
For rendering, a GPURenderPipeline bundles vertex + fragment shaders, vertex buffer layouts, primitive topology, depth/stencil state, and color target formats. Pipeline creation is expensive — cache pipelines per material, not per mesh.
Command encoding and the frame loop
GPUs execute recorded commands, not immediate API calls. Each frame:
- Acquire the canvas texture via `context.getCurrentTexture()`.
- Create a `GPUCommandEncoder`.
- Begin a compute pass (bin heatmap cells) and/or a render pass (draw a textured quad).
- End the passes, then call `encoder.finish()` and `queue.submit([commandBuffer])`.
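The per-frame steps above might look like the following sketch. The interfaces are hypothetical, pared-down stand-ins for `GPUDevice`, `GPUCommandEncoder`, and the pass encoders (only the members used here), so the function can be exercised without a GPU; the method names themselves match the WebGPU API.

```typescript
// Hypothetical minimal shapes of the WebGPU objects touched in one frame.
interface PassLike {
  setPipeline(pipeline: unknown): void;
  setBindGroup(index: number, group: unknown): void;
  end(): void;
}
interface ComputePassLike extends PassLike { dispatchWorkgroups(x: number): void }
interface RenderPassLike extends PassLike { draw(vertexCount: number): void }
interface EncoderLike {
  beginComputePass(): ComputePassLike;
  beginRenderPass(desc: { colorAttachments: object[] }): RenderPassLike;
  finish(): unknown;
}
interface FrameDeps {
  device: { createCommandEncoder(): EncoderLike; queue: { submit(buffers: unknown[]): void } };
  context: { getCurrentTexture(): { createView(): unknown } };
  compute: { pipeline: unknown; bindGroup: unknown; workgroups: number };
  render: { pipeline: unknown; bindGroup: unknown };
}

function encodeFrame({ device, context, compute, render }: FrameDeps): void {
  // 1. Acquire this frame's canvas texture.
  const view = context.getCurrentTexture().createView();
  // 2. Create the encoder that records all work for the frame.
  const encoder = device.createCommandEncoder();

  // 3a. Compute pass: bin heatmap cells.
  const computePass = encoder.beginComputePass();
  computePass.setPipeline(compute.pipeline);
  computePass.setBindGroup(0, compute.bindGroup);
  computePass.dispatchWorkgroups(compute.workgroups);
  computePass.end();

  // 3b. Render pass: draw three vertices that cover the screen.
  const renderPass = encoder.beginRenderPass({
    colorAttachments: [{ view, loadOp: "clear", storeOp: "store" }],
  });
  renderPass.setPipeline(render.pipeline);
  renderPass.setBindGroup(0, render.bindGroup);
  renderPass.draw(3);
  renderPass.end();

  // 4. Finish recording and submit; nothing runs on the GPU before this.
  device.queue.submit([encoder.finish()]);
}
```

Nothing in `encodeFrame` blocks on the GPU: it only records commands, which is why the steps can be unit-tested against a recording fake.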
Upload dynamic data (camera matrices, date-filter parameters) with `queue.writeBuffer()` before encoding. For readback, when JavaScript needs the result, copy the GPU buffer to a mappable staging buffer, await `mapAsync(GPUMapMode.READ)`, read the bytes through `getMappedRange()`, then call `unmap()`. Readback forces the CPU to wait for the GPU; avoid it every frame if the GPU can render directly to the canvas texture instead.
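A readback helper along these lines is one way to implement the staging-buffer pattern. The interfaces are hypothetical reductions of the real types, and `readBackU32` is an illustrative name; the sketch assumes the source buffer was created with `COPY_SRC` usage and that `byteLength` is a multiple of 4.

```typescript
// Flag values per the WebGPU spec, inlined so the sketch runs anywhere.
const MAP_MODE_READ = 0x0001;  // GPUMapMode.READ
const USAGE_MAP_READ = 0x0001; // GPUBufferUsage.MAP_READ
const USAGE_COPY_DST = 0x0008; // GPUBufferUsage.COPY_DST

// Hypothetical minimal shapes of the objects involved in a readback.
interface StagingBuffer {
  mapAsync(mode: number): Promise<void>;
  getMappedRange(): ArrayBuffer;
  unmap(): void;
  destroy(): void;
}
interface ReadbackDevice {
  createBuffer(desc: { size: number; usage: number }): StagingBuffer;
  createCommandEncoder(): {
    copyBufferToBuffer(
      src: unknown, srcOffset: number,
      dst: unknown, dstOffset: number,
      size: number,
    ): void;
    finish(): unknown;
  };
  queue: { submit(commandBuffers: unknown[]): void };
}

async function readBackU32(
  device: ReadbackDevice,
  source: unknown,
  byteLength: number,
): Promise<Uint32Array> {
  // Storage buffers cannot be mapped directly, so copy into a staging buffer.
  const staging = device.createBuffer({
    size: byteLength,
    usage: USAGE_MAP_READ | USAGE_COPY_DST,
  });
  const encoder = device.createCommandEncoder();
  encoder.copyBufferToBuffer(source, 0, staging, 0, byteLength);
  device.queue.submit([encoder.finish()]);

  // Resolves once the GPU has finished the copy: this is the sync point.
  await staging.mapAsync(MAP_MODE_READ);
  // Copy the bytes out, because the mapped range is detached by unmap().
  const result = new Uint32Array(staging.getMappedRange().slice(0));
  staging.unmap();
  staging.destroy();
  return result;
}
```

Allocating the staging buffer per call keeps the sketch short; a production version would more likely pool it, in line with the checklist later in this guide.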
Harbor Commerce heatmap refactor
The analytics team had three requirements: responsive date-range filters, smooth pan/zoom on the heatmap, and no main-thread jank on a 2020-era corporate laptop fleet.
Before
- 1.2M `{x, y}` points stored in a `Float32Array` on the main thread.
- JavaScript nested loops bin the points into a 512×512 grid, normalize counts, and map them to a color ramp.
- 380 ms blocking time per filter change; scroll jank during recompute.
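For reference, a CPU baseline of the binning step could look like the sketch below. It is not the Harbor Commerce code: the interleaved `[x0, y0, x1, y1, ...]` layout, the normalization of coordinates to the range 0 to 1, and the `binPoints` name are all assumptions made for illustration.

```typescript
// Bin interleaved, normalized points into a gridSize x gridSize count grid.
// Returns the counts (row-major) and the largest count, which the
// normalization step needs.
function binPoints(
  points: Float32Array,
  gridSize: number,
): { counts: Uint32Array; max: number } {
  const counts = new Uint32Array(gridSize * gridSize);
  let max = 0;
  for (let i = 0; i + 1 < points.length; i += 2) {
    // Clamp so that a coordinate of exactly 1.0 lands in the last cell.
    const gx = Math.min(gridSize - 1, Math.max(0, Math.floor(points[i] * gridSize)));
    const gy = Math.min(gridSize - 1, Math.max(0, Math.floor(points[i + 1] * gridSize)));
    const cell = gy * gridSize + gx;
    counts[cell] += 1;
    if (counts[cell] > max) max = counts[cell];
  }
  return { counts, max };
}
```

Each iteration depends only on its own point, apart from the shared counters, which is what makes the loop a natural fit for one GPU invocation per point with atomic increments.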
After (WebGPU compute + render)
- Upload once: click coordinates in a `STORAGE | COPY_DST` buffer; date filters pass a small uniform (min/max timestamp).
- Compute pass 1: parallel atomic adds into a 512×512 grid of 32-bit unsigned counters, held in a storage buffer as `array<atomic<u32>>` because WGSL atomics do not operate on `r32uint` textures (filter in-shader).
- Compute pass 2: normalize the grid to an `rgba8unorm` texture using a log-scale color ramp encoded as a 256-entry LUT buffer.
- Render pass: a full-screen triangle samples the texture; pan/zoom changes only a uniform view matrix, so the points are never re-uploaded or rebound.
- Fallback: Web Worker + OffscreenCanvas with a simplified 128×128 grid when `navigator.gpu` is unavailable.
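The log-scale lookup in compute pass 2 can be prototyped on the CPU before it is ported to WGSL. The sketch below is illustrative only: the blue-to-red ramp, the `log1p` scaling, and both function names are assumptions, since the source does not specify the actual ramp.

```typescript
// Build a 256-entry RGBA8 ramp packed as one u32 per entry, with red in the
// lowest byte. That matches the component order WGSL's unpack4x8unorm()
// uses, so a shader can decode an entry with a single call.
function buildColorLut(): Uint32Array {
  const lut = new Uint32Array(256);
  for (let i = 0; i < 256; i++) {
    const r = i;        // hypothetical ramp: blue (cold) to red (hot)
    const g = 0;
    const b = 255 - i;
    const a = 255;
    lut[i] = ((a << 24) | (b << 16) | (g << 8) | r) >>> 0;
  }
  return lut;
}

// Map a raw cell count to a LUT index on a log scale, so a few very hot
// cells do not flatten the rest of the heatmap to the coldest color.
function lutIndex(count: number, maxCount: number): number {
  if (count <= 0 || maxCount <= 0) return 0;
  const t = Math.log1p(count) / Math.log1p(maxCount);
  return Math.min(255, Math.round(t * 255));
}
```

The resulting `Uint32Array` is 1,024 bytes and can be uploaded once with `queue.writeBuffer()`; only the maximum count changes when the date filter changes.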
Result: 12 ms GPU time, main thread free for UI; filter changes feel instant. Development cost was higher than a Worker-only fix, but the same pipeline later accelerated a funnel-sankey layout prototype without rewriting the numeric core.
Technique decision table
| Approach | Best for | GPU required | Dev complexity | Portability |
|---|---|---|---|---|
| CPU JavaScript | <100k points, simple charts | No | Low | Universal |
| Web Worker + WASM | Heavy numeric loops, no drawing API | No | Medium | High |
| WebGL 2 | 3D scenes, wide legacy support | Yes | Medium–high | Very high |
| WebGPU compute + render | Large parallel workloads, modern pipelines | Yes | High | Good (detect fallback) |
| WebGPU + WASM | Complex sim logic + GPU buffers | Yes | Very high | Good |
Reach for WebGPU when data parallelism dominates and you need either compute or cleaner resource management than WebGL allows. Stay on Canvas2D or SVG for simple static charts. Pair WebGPU with Web Workers when preprocessing or networking should never block the render thread.
Common pitfalls
- No feature detection: calling `requestAdapter()` without checking `navigator.gpu` throws a `TypeError` in browsers without WebGPU, such as older Safari versions.
- Pipeline churn: creating a new render pipeline per UI state change stalls rendering while shaders compile; use uniform buffers to parameterize one pipeline.
- Alignment violations: uniform buffer offsets not aligned to 256 bytes cause validation errors that are easy to miss in devtools.
- Excessive readback: mapping GPU buffers every frame for CPU inspection destroys parallelism; keep data on the GPU when possible.
- Ignoring device loss: a GPU reset (driver update, sleep/wake) resolves the `device.lost` promise; request a new device, rebuild pipelines, and re-upload buffers.
- Main-thread encoding at scale: recording huge command buffers synchronously can still cause jank; batch work and profile with WebGPU timestamp queries.
- Assuming mobile parity: storage buffer size limits and float texture filterability differ; test on target hardware early.
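Device loss in particular is easier to handle when recovery is a loop rather than an afterthought. The sketch below is one possible shape, with hypothetical names throughout: `createDevice` would wrap the adapter and device requests, and `rebuild` would recreate pipelines and re-upload buffers. It relies only on `device.lost` being a promise that resolves with a `reason` and a `message`.

```typescript
interface LostInfo { reason: string; message: string }
interface LosableDevice { lost: Promise<LostInfo> }

// Create a device, set it up, then wait for it to be lost and start over.
// Returns how many devices were set up in total.
async function runWithRecovery<D extends LosableDevice>(
  createDevice: () => Promise<D | null>,
  rebuild: (device: D) => void,
  maxAttempts = 3,
): Promise<number> {
  let attempts = 0;
  while (attempts < maxAttempts) {
    const device = await createDevice();
    if (!device) break; // no GPU available any more: use the fallback path
    attempts++;
    rebuild(device); // pipelines and buffers belong to the old device

    const info = await device.lost; // resolves only when the device is lost
    // "destroyed" means the app called device.destroy() itself: stop here.
    if (info.reason === "destroyed") break;
  }
  return attempts;
}
```

The attempt cap is a guard against a driver that keeps resetting; what to do when it is reached (fallback renderer, error banner) is left to the application.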
Production checklist
- Feature-detect `navigator.gpu`; ship a Canvas2D, WebGL, or Worker fallback.
- Request the adapter with an appropriate `powerPreference`; log adapter info in debug builds.
- Handle `device.lost` and `uncapturederror` with telemetry and graceful recovery.
- Define bind group layouts before pipelines; reuse bind groups across frames.
- Validate WGSL at build time where possible (e.g. wgsl-analyzer in the editor or a WGSL compiler such as Naga in CI), and type-check API calls with `@webgpu/types`.
- Pool buffers and textures; avoid per-frame allocation.
- Profile with browser GPU inspectors; measure compute dispatch and render pass times separately.
- Document minimum GPU/driver assumptions for support teams.
- Test thermal throttling on laptops and low-power mode on phones.
Key takeaways
- WebGPU is the modern browser GPU API — explicit buffers, bind groups, and first-class compute shaders replace WebGL's implicit global state.
- WGSL shaders power both rendering and parallel compute; pipeline creation is expensive, so parameterize with uniforms instead of rebuilding pipelines.
- Command encoders record work submitted to a queue — keep heavy parallelism on the GPU and minimize CPU readback.
- Harbor-style heatmaps and simulations benefit when millions of elements need the same operation per frame.
- Always feature-detect, handle device loss, and keep a CPU or WebGL fallback for unsupported clients.
Related reading
- WebAssembly (WASM) explained — near-native CPU modules and when to pair WASM with GPU buffers
- Web Workers explained — off-main-thread JavaScript and OffscreenCanvas patterns
- Game rendering optimization explained — draw calls, batching, and frame budgets
- Edge AI and on-device inference explained — running models in the browser with WebGPU backends