
Game input lag and display latency systems explained

Harbor Brawl shipped rollback netcode and two-frame online input delay — players praised cross-region sets. Offline training mode still scored 6.8/10 on a “controls feel tight” survey. Profiling showed the network stack was innocent: end-to-end input lag on a wired fight stick averaged 89 ms from button press to visible character startup on a 60 Hz TV, even at a locked 60 fps simulation. Input was polled at the end of the frame after physics; triple buffering queued two extra rendered frames; and the UI compositor added another 8 ms on Windows. None of that is netcode — it is local pipeline latency engineers often ignore until players call the game sluggish.

The refactor moved input polling to the start of the simulation tick, capped GPU render-ahead to one frame in fighting mode, disabled triple buffering when vsync was on, and exposed a latency preset (Tournament / Standard / Quality) that trades tearing for milliseconds. Measured end-to-end lag on the same hardware fell from 89 ms to 41 ms; offline survey scores rose to 8.9/10 with no combat animation changes. This guide maps the full controller-to-pixel stack, separates input lag from display lag and network delay, covers measurement methods, vsync and frame-pacing tradeoffs, render-ahead and platform APIs, the Harbor Brawl refactor, a technique decision table versus buffering-only mitigations, pitfalls, and a production checklist. It pairs with rollback netcode, input buffering, and frame data.

Input lag vs display lag vs network delay

Players say “lag” for three different problems. Production teams need separate metrics:

  • Input lag (processing lag) — time from physical input (button closure, stick deflection past deadzone) until the game simulation registers the command on the intended frame. Includes USB polling, OS scheduler jitter, engine poll timing, and simulation queue placement.
  • Display latency (render lag) — time from the simulation frame that committed the action until photons on screen change. Includes render queue depth, GPU work, vsync presentation, scanout position, and panel response time.
  • Network delay — round-trip or one-way packet time plus any intentional input delay in netcode. Rollback reduces felt remote lag but does not shrink local display latency.

End-to-end latency for local play is input lag plus display latency. At 60 fps one frame is ~16.7 ms, and fighting-game players notice differences of fewer than three frames (about 50 ms). A game can simulate at 120 fps internally and still feel sluggish if the display presents only every other simulated frame, two buffers late.
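As a quick worked example of that arithmetic, the sketch below converts a frame rate to a frame time and sums the two local components. The function names and the 30 ms / 59 ms split are illustrative assumptions, not Harbor Brawl measurements; only the 89 ms total appears in this guide.

```python
def frame_ms(fps: float) -> float:
    """Duration of one frame in milliseconds at the given frame rate."""
    return 1000.0 / fps


def end_to_end_ms(input_lag_ms: float, display_latency_ms: float) -> float:
    """Local end-to-end latency: input lag plus display latency."""
    return input_lag_ms + display_latency_ms


# One frame and three frames at 60 fps.
one_frame = frame_ms(60)
three_frames = 3 * one_frame

# Hypothetical split of an 89 ms total into its two local components.
total = end_to_end_ms(input_lag_ms=30.0, display_latency_ms=59.0)

print(f"1 frame = {one_frame:.1f} ms, 3 frames = {three_frames:.1f} ms")
print(f"end-to-end = {total:.0f} ms ({total / one_frame:.1f} frames at 60 fps)")
```

Keeping both units visible helps when engineers think in milliseconds and players think in frames.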

The controller-to-pixel pipeline

A typical frame on PC or console looks like this:

  1. Device sampling — fight sticks poll at 1 ms on USB full speed; wireless adds 4–8 ms typical. DualSense and Xbox controllers batch reports unless configured for low-latency mode.
  2. OS delivery — input events land in a queue; game focus, background throttling, and compositor hooks add variable delay.
  3. Engine poll point — reading input at frame end costs up to one simulation frame versus poll-at-start. Some engines poll twice (start for gameplay, end for UI) — document which path drives combat.
  4. Simulation — deterministic tick advances state; buffered inputs from prior frames may be consumed here (see input buffering).
  5. Render submission — draw calls queued to GPU; render-ahead of 2–3 frames is common when vsync and triple buffering are enabled.
  6. Presentation — swap chain presents on vblank; late frames wait an extra interval. Variable refresh (G-Sync/FreeSync) can cut wait if configured correctly.
  7. Panel — LCD pixel response adds 1–5 ms on gaming monitors; TVs in Game Mode often add 10–20 ms versus PC panels.

Harbor Brawl's 48 ms surprise came from items 3, 5, and 6 combined — not from combat tuning. Fixing poll timing alone recovered 14 ms; render-ahead cap recovered 22 ms; disabling triple buffering recovered 12 ms on the test rig.
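The per-fix numbers above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the three savings are independent and simply add up, which held on the test rig described here but may not hold on other hardware. The dictionary keys and function name are hypothetical.

```python
from typing import Mapping

# Per-fix savings reported for the Harbor Brawl test rig, in milliseconds.
SAVINGS_MS = {
    "poll_at_start": 14,
    "render_ahead_cap": 22,
    "triple_buffering_off": 12,
}

BASELINE_MS = 89


def remaining_latency(baseline_ms: int, savings_ms: Mapping[str, int]) -> int:
    """Latency left after every listed fix, assuming the savings are additive."""
    return baseline_ms - sum(savings_ms.values())


total_saved = sum(SAVINGS_MS.values())
after_fixes = remaining_latency(BASELINE_MS, SAVINGS_MS)
print(f"saved {total_saved} ms, remaining {after_fixes} ms")
```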

Measuring latency honestly

You cannot optimize what you approximate. Useful methods, from cheap to rigorous:

  • High-speed camera — film button press and screen change at 240+ fps; count camera frames between an LED wired to the switch lighting up and the first visible motion. At 240 fps each camera frame is about 4.2 ms. Gold standard for total end-to-end latency; labor-intensive but catches compositor surprises.
  • Leo Bodnar or similar HDMI latency testers — photodiode on screen detects white flash from a test pattern; reports ms. Good for display path; combine with in-engine flash triggered by raw input for full stack.
  • In-engine instrumentation — on confirmed input edge, tag simulation frame ID and log presentation frame ID when the pose first renders; histogram per platform preset. Essential for regression CI on latency presets.
  • Platform APIs — NVIDIA Reflex Latency Marker and PC Latency Flash Indicator; Xbox GDK frame pacing queries; Steam Input latency overlay where available. Wire markers at poll, sim commit, and present.
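The in-engine instrumentation bullet can be sketched as a small probe that pairs each input edge with the presentation of the frame that first shows it. `LatencyProbe` and its method names are hypothetical, and the engine is assumed to call them at its own poll and present points. Note that this only measures engine poll to present call, so device, compositor, and panel delay still need a camera or photodiode.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LatencyProbe:
    """Pairs an input edge with the first presented frame that shows it."""
    pending: Dict[int, float] = field(default_factory=dict)  # sim frame -> input time (s)
    samples_ms: List[float] = field(default_factory=list)

    def on_input_edge(self, sim_frame: int, now: Optional[float] = None) -> None:
        # Keep the earliest edge tagged to a sim frame; later edges are ignored.
        t = time.perf_counter() if now is None else now
        self.pending.setdefault(sim_frame, t)

    def on_present(self, sim_frame: int, now: Optional[float] = None) -> None:
        # Call when the frame rendering this sim frame's pose is presented.
        t = time.perf_counter() if now is None else now
        start = self.pending.pop(sim_frame, None)
        if start is not None:
            self.samples_ms.append((t - start) * 1000.0)


# Usage with injected timestamps so the example is deterministic.
probe = LatencyProbe()
probe.on_input_edge(sim_frame=10, now=1.000)
probe.on_present(sim_frame=10, now=1.041)
probe.on_present(sim_frame=11, now=1.058)  # no input tagged, so no sample
print(probe.samples_ms)
```

Passing `now` explicitly is a testing convenience; in a running game the monotonic `time.perf_counter()` default would be used.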

Report percentiles, not means: p95 and p99 spike when background apps steal cores. Fighting games should budget latency per platform SKU, not one global number. Compare offline and online separately — rollback corrections are a different UX dimension from local sluggishness.
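To illustrate why percentiles matter, here is a minimal sketch using the nearest-rank method (one of several common percentile definitions). The sample data is invented: 90 presses at 40 ms and 10 presses at 120 ms, standing in for a session where a background app occasionally steals a core.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with pct% of data at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def latency_report(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Summarize a latency histogram as p50, p95, and p99."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}


# Invented data: mostly fast presses with a slow tail.
samples = [40.0] * 90 + [120.0] * 10
mean = sum(samples) / len(samples)
report = latency_report(samples)
print(f"mean = {mean:.0f} ms, report = {report}")
```

The mean lands at a value no player ever experienced, while p95 and p99 expose the slow tail directly.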

Poll timing and simulation placement

The cheapest win is choosing when you read input relative to FixedUpdate or your lockstep tick:

  • Poll-at-start — read devices immediately before advancing simulation; inputs apply same frame. Standard for fighting games and rhythm titles.
  • Poll-at-end — input applies next frame; adds 0–1 frame jitter depending on thread scheduling. Common in engines that render first on the main thread.
  • Dedicated input thread — timestamp samples at interrupt time; main thread consumes latest sample at sim start. Reduces OS jitter; requires monotonic clocks and careful sync with rollback savestates.
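The difference between the first two options can be shown with a toy tick loop. This is a simplified model, not engine code: `device_states[i]` is assumed to be the button state available during tick `i`, and the function name is hypothetical.

```python
from typing import List


def simulate(device_states: List[bool], poll_at_start: bool) -> List[bool]:
    """Return, per tick, the button state the simulation actually applied."""
    applied = []
    latched = False  # most recent state read from the device
    for state in device_states:
        if poll_at_start:
            latched = state      # read before advancing, applies this tick
        applied.append(latched)  # the simulation step consumes the latched state
        if not poll_at_start:
            latched = state      # read after physics, applies next tick
    return applied


# The button is first held during tick 2.
presses = [False, False, True, True]
start = simulate(presses, poll_at_start=True)
end = simulate(presses, poll_at_start=False)
print("poll-at-start first applies on tick", start.index(True))
print("poll-at-end first applies on tick", end.index(True))
```

The model ignores thread scheduling, so it shows the full one-tick penalty; the text's 0 to 1 frame range reflects when in the frame the press physically arrives.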

Rollback netcode assumes inputs are tagged to deterministic frame numbers. If poll timing drifts between clients, you get phantom desyncs. Align poll point with rollback input queues and never read raw HID differently on Windows versus Steam Deck without testing.

Vsync, triple buffering, and frame pacing

Vsync eliminates tearing by presenting only on vertical blank, but it queues frames:

  • Double buffering + vsync — if you miss vblank, you wait an extra full frame (stutter). Latency is often one to two frames when GPU keeps pace.
  • Triple buffering — GPU can render ahead while a frame waits to display; smooths fps dips but adds another full frame of latency. Many PC defaults enable this silently.
  • Fast Sync / Adaptive VSync variants — behavior varies by driver; measure, do not assume.
  • Variable refresh displays — when fps matches refresh and render-ahead is capped, latency can approach double-buffer levels without tearing.

Harbor Brawl's Tournament preset uses vsync off with a 1-frame render-ahead cap and optional tearing — unacceptable for menus, acceptable for locals-only bracket stations. Standard uses adaptive vsync with double buffering only. Quality restores triple buffering for story mode. Expose the tradeoff in settings with ms estimates from your instrumentation, not vague labels.
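One way to keep presets and their labels honest is to store them as data and build the menu text from measured numbers. The sketch below is an assumption about how such a table might look: the class, field names, and label format are hypothetical, the render-ahead cap for Standard and Quality is left unset because this guide does not state it, and vsync "on" for Quality is a guess.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LatencyPreset:
    name: str
    vsync: str                       # "off", "adaptive", or "on"
    triple_buffering: bool
    max_render_ahead: Optional[int]  # None means no cap is stated in the guide


PRESETS = {
    # From the guide: vsync off, 1-frame render-ahead cap, tearing allowed.
    "tournament": LatencyPreset("Tournament", "off", False, 1),
    # From the guide: adaptive vsync, double buffering only.
    "standard": LatencyPreset("Standard", "adaptive", False, None),
    # From the guide: triple buffering restored. vsync "on" is an assumption.
    "quality": LatencyPreset("Quality", "on", True, None),
}


def settings_label(preset: LatencyPreset, p50_ms: float) -> str:
    """Menu label that shows a measured estimate instead of a vague name."""
    buffering = "on" if preset.triple_buffering else "off"
    return (f"{preset.name}: ~{p50_ms:.0f} ms p50, "
            f"vsync {preset.vsync}, triple buffering {buffering}")


label = settings_label(PRESETS["tournament"], p50_ms=41.0)
print(label)
```

The `p50_ms` argument should come from instrumentation on the player's platform, not from a constant baked into the build.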

Render-ahead caps and platform low-latency modes

Modern GPUs and APIs expose hooks to limit queued frames:

  • Maximum frame latency / flip queue length — DXGI SetMaximumFrameLatency, Vulkan present modes, console SDK flip counts. Set to 1 for latency-sensitive modes.
  • NVIDIA Reflex — Low Latency Mode plus in-game markers; Reflex Boost keeps GPU clocks high to avoid CPU-bound present stalls.
  • AMD Anti-Lag+ — similar intent; validate per title build.
  • Console modes — 120 Hz output with 60 fps sim can halve presentation wait if the OS presents every sim frame on alternating refreshes; document whether you simulate 60 or 120 in fighting mode.
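The "halve presentation wait" claim for 120 Hz output follows from simple interval arithmetic. The sketch below assumes a finished frame is equally likely to arrive at any point in the refresh interval, which is a simplification; real frame pacing is rarely that uniform. The function name is hypothetical.

```python
from typing import Tuple


def vblank_wait_ms(refresh_hz: float) -> Tuple[float, float]:
    """Return (average, worst-case) wait for the next vblank in milliseconds.

    Assumes frame completion is uniformly distributed over the refresh
    interval, so the average wait is half the interval.
    """
    interval = 1000.0 / refresh_hz
    return interval / 2.0, interval


for hz in (60, 120):
    avg, worst = vblank_wait_ms(hz)
    print(f"{hz} Hz: average wait {avg:.1f} ms, worst case {worst:.1f} ms")
```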

Capping render-ahead can lower average fps during shader compilation spikes. Pair with async pipeline warmup in menus so Tournament mode does not hitch on first super cinematic. Latency and stability are jointly tuned, not independent sliders.

Interaction with hitstop, buffers, and frame data

Other feel systems stack on top of raw latency:

  • Hitstop freezes simulation briefly on impact — it does not add input lag but changes when inputs are legal; players perceive frozen games as laggy if hitstop exceeds design intent (see hitstop guide).
  • Input buffers — add intentional leniency (store early presses) without changing poll-to-pixel ms; conflating buffer windows with lag fixes leads to double-counting leniency.
  • Frame data — a 3-frame startup move plus 5 frames of end-to-end lag is effectively 8 frames before you see motion; competitive players internalize both numbers.

When tuning reversals or one-frame links, fix pipeline latency before widening buffer windows — buffers help execution consistency; they cannot fix sluggish display.
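The startup-plus-lag rule from the frame data bullet is easy to compute for any move. This sketch applies it to a 3-frame move using the 89 ms and 41 ms figures from this guide; the function names are hypothetical, and 60 fps is assumed as the simulation rate.

```python
def lag_frames(lag_ms: float, fps: float = 60.0) -> float:
    """Convert end-to-end lag in milliseconds to frames at the game's rate."""
    return lag_ms * fps / 1000.0


def frames_until_visible(startup_frames: int, lag_ms: float, fps: float = 60.0) -> float:
    """Effective frames before the player sees the move, per the startup-plus-lag rule."""
    return startup_frames + lag_frames(lag_ms, fps)


before = frames_until_visible(startup_frames=3, lag_ms=89.0)
after = frames_until_visible(startup_frames=3, lag_ms=41.0)
print(f"3-frame move: {before:.2f} frames at 89 ms lag, {after:.2f} frames at 41 ms lag")
```

No buffer window changes either number, which is why the pipeline fix has to come first.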

Harbor Brawl offline latency refactor

Problem statement: offline training felt worse than online rollback matches on the same build. Instrumentation tagged input edges and first rendered fighting pose per platform.

  • Baseline p50/p95 — 89 ms / 112 ms end-to-end on Windows + 60 Hz TV; 61 ms / 78 ms on 144 Hz monitor with triple buffering on.
  • Poll-at-start migration — moved HID read before SimTick; rollback input queue unchanged. −14 ms p50.
  • Render-ahead cap — DXGI max latency 1 in Tournament preset. −22 ms p50 on GPU-bound scenes.
  • Triple buffer off in fighting presets — double-buffer + adaptive vsync in Standard. −12 ms on TV test chain.
  • Result — 41 ms / 52 ms p50/p95 on TV; offline tight-controls survey 6.8 → 8.9; no change to online disconnect metrics.

Lesson: netcode excellence does not imply local responsiveness. Ship latency presets and measure per SKU.
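Measuring per SKU pays off most when the numbers gate a build. Below is a minimal sketch of such a check, assuming percentile results are already available as a dictionary. The SKU key and the budget values are hypothetical; the measured values are the before and after figures reported above.

```python
from typing import Dict, List

# Hypothetical budgets per SKU in ms; real values come from your own targets.
BUDGETS_MS = {
    "windows_tv_60hz": {"p50": 45.0, "p95": 60.0},
}


def budget_violations(sku: str, measured: Dict[str, float],
                      budgets: Dict[str, Dict[str, float]] = BUDGETS_MS) -> List[str]:
    """Return one message per percentile that exceeds the SKU's budget."""
    problems = []
    for name, limit in budgets[sku].items():
        value = measured[name]
        if value > limit:
            problems.append(f"{sku} {name} {value:.0f} ms exceeds budget {limit:.0f} ms")
    return problems


after_refactor = budget_violations("windows_tv_60hz", {"p50": 41.0, "p95": 52.0})
baseline = budget_violations("windows_tv_60hz", {"p50": 89.0, "p95": 112.0})
print(after_refactor)
print(baseline)
```

A CI job could fail whenever the returned list is non-empty, catching driver or OS updates that quietly re-enable deeper queues.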

Technique decision table

| Approach | Latency impact | Best for | Tradeoffs |
|---|---|---|---|
| Poll-at-start + 1-frame render-ahead cap | Large local win | Fighting, rhythm, competitive action | Fps dips may tear or stutter without tuning |
| Input buffering only | None on raw ms | Execution leniency, recovery cancels | Does not fix display sluggishness |
| Triple buffering + uncapped queue | +1–2 frames lag | Cinematic single-player, photo mode | Unacceptable for frame-perfect genres |
| Rollback netcode | Fixes remote feel, not local display | Peer fighting, sports, RTS | Determinism cost; separate local tuning still required |
| Variable refresh + low latency API | Moderate win without tearing | PC gaming monitors | TV support uneven; test Game Mode paths |
| Simulate 120 / display 60 | Mixed | Legacy 60 Hz consoles | Animation sampling complexity; measure, don't assume |

Common pitfalls

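  • Optimizing fps but not latency — a renderer that benchmarks at 300 fps still holds about 50 ms of queued frames once three frames back up behind a 60 Hz vsync, while a locked 60 with one queued frame holds about 17 ms.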
  • End-of-frame polling on the main thread — invisible +1 frame in profiling averages.
  • Assuming Steam Input / GDK wrappers are zero-cost — measure wired versus Bluetooth on shipping configs.
  • UI compositor on top of fullscreen — borderless windowed often adds a frame on Windows; exclusive fullscreen or proper flip-discard matters.
  • Confusing buffer leniency with lag fixes — widening buffers masks symptoms and breaks strict link training tools.
  • One latency number in patch notes — without platform and preset context, players cannot reproduce your claims.
  • Ignoring TV Game Mode — document recommended display settings; 20 ms of TV processing dwarfs small engine wins.

Production checklist

  • Define end-to-end latency budget per genre mode (fighting vs exploration).
  • Instrument input edge, sim commit frame, and first-presented pose frame.
  • Establish high-speed camera or photodiode baseline on reference hardware.
  • Poll input at simulation start; timestamp on dedicated thread if needed.
  • Cap GPU flip queue / maximum frame latency to 1 in competitive presets.
  • Ship latency presets with estimated ms and tearing tradeoffs documented.
  • Disable triple buffering in latency-sensitive presets; verify per graphics API.
  • Integrate platform low-latency APIs (Reflex, Anti-Lag) where available.
  • Regression-test latency histograms on OS, driver, and SKU matrix changes.
  • Separate offline display tuning from rollback input delay documentation.
  • Publish recommended display settings (Game Mode, refresh, wired controllers).
  • Do not widen input buffers to compensate for uncapped render-ahead.

Key takeaways

  • End-to-end latency is input processing plus display presentation — netcode is a third axis.
  • Poll-at-start and one-frame render-ahead caps are the highest-leverage local fixes.
  • Triple buffering smooths fps at the cost of a full frame of lag; fighting modes should opt out.
  • Measure with cameras or photodiodes; engine logs alone miss compositor delay.
  • Harbor Brawl cut offline lag from 89 ms to 41 ms without touching combat data or netcode.

Related reading