Game AI perception explained

A guard cannot chase what it never saw. Perception is the sensory layer that sits before utility AI, behavior trees, and steering behaviors decide what to do. It answers: does this agent know the player exists, where they were last seen, and how confident is that knowledge? Stealth games, open-world patrols, and squad shooters all depend on believable sight and hearing — not omniscient enemies that snap to your coordinates through walls. This guide covers vision cones and line-of-sight, hearing and noise propagation, memory and alert states, stealth detection meters, performance budgets, and how perception data flows into the rest of your AI stack.

What perception is (and what it is not)

Perception is a filter on world truth. The simulation knows exactly where every entity is; the NPC only knows what its sensors report. That gap is the game. A perception system typically outputs a small struct per agent:

Target list — entities currently sensed (player, allies, items).
Last known position — where a lost target was last confirmed.
Alert level — calm, suspicious, combat, or custom tiers.
Stimulus log — recent events (gunshot heard, corpse found).

Perception is not pathfinding: once you know where to go, A* or navmesh queries handle navigation. It is also not decision-making: perception writes to a blackboard that behavior trees (BTs) and utility scorers read. Keeping the boundary clean lets you tune "how sharp are the guards' eyes?" without rewriting chase logic.

Vision: field of view and line of sight

The classic guard cone is a sector in 2D or a cone (sometimes a frustum) in 3D: a maximum distance, a horizontal angle (often 90–120 degrees), and sometimes a vertical limit. Cheap rejection tests run first:

Distance — ignore targets beyond sight radius.
Angle — dot product between forward vector and vector-to-target; outside the cone means invisible even if close behind.
Line of sight (LOS) — one or more raycasts from eye height to target center (or feet/head for crouch/prone). A hit on world geometry blocks vision; a hit on the target confirms visibility.
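The three tests above can be sketched in 2D as follows. This is a minimal illustration, not an engine API: `can_see`, its parameters, and the `has_clear_ray` callback (a stand-in for whatever raycast your engine provides) are hypothetical names, and `forward` is assumed to be unit length.

```python
import math
from typing import Callable, Tuple

Vec2 = Tuple[float, float]


def can_see(
    eye: Vec2,
    forward: Vec2,  # assumption: already normalized
    target: Vec2,
    sight_radius: float,
    fov_degrees: float,
    has_clear_ray: Callable[[Vec2, Vec2], bool],  # hypothetical raycast hook
) -> bool:
    dx, dy = target[0] - eye[0], target[1] - eye[1]
    dist_sq = dx * dx + dy * dy

    # 1. Distance: compare squared lengths to skip the sqrt on rejects.
    if dist_sq > sight_radius * sight_radius:
        return False
    if dist_sq == 0.0:
        return True  # target is exactly at the eye position

    # 2. Angle: cosine of the angle between forward and the vector to target.
    cos_to_target = (forward[0] * dx + forward[1] * dy) / math.sqrt(dist_sq)
    if cos_to_target < math.cos(math.radians(fov_degrees / 2.0)):
        return False

    # 3. Line of sight: the expensive test runs last.
    return has_clear_ray(eye, target)
```

Ordering the checks from cheapest to most expensive means most targets are rejected before any ray is cast.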

LOS is the expensive step. Common optimizations:

Staggered checks — not every agent raycasts every frame; rotate through a budget of N tests per tick aligned with your frame budget.
Multi-sample rays — three rays (center, left shoulder, right shoulder) reduce "seen through a pixel gap" exploits without full mesh visibility.
Peripheral vs central vision — targets in the outer wedge of the cone require closer distance or slower detection buildup (see stealth meters below).
Spatial partitions — only test LOS against entities in the same grid cell or broad-phase bucket.
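One way to implement staggered checks is a round-robin queue with a fixed per-tick budget. The sketch below assumes agents are identified by integer ids and that `run_los_check` is whatever function performs the raycasts for one agent; both the class name `LosScheduler` and that callback are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List


class LosScheduler:
    """Rotates through agents so at most `budget_per_tick` run LOS each tick."""

    def __init__(self, agent_ids: Iterable[int], budget_per_tick: int) -> None:
        self._queue: Deque[int] = deque(agent_ids)
        self._budget = budget_per_tick

    def tick(self, run_los_check: Callable[[int], None]) -> List[int]:
        checked: List[int] = []
        # Never process the same agent twice in one tick.
        for _ in range(min(self._budget, len(self._queue))):
            agent_id = self._queue.popleft()
            run_los_check(agent_id)
            self._queue.append(agent_id)  # back of the line for next time
            checked.append(agent_id)
        return checked
```

With five agents and a budget of two, every agent is refreshed at least once every three ticks; the worst-case staleness is the number to weigh against how fast a player can cross a cone.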

Lighting and camouflage can modulate effective range: a dark corner might halve detection rate; a bright flashlight widens the cone. These modifiers attach to the perception layer as scalar multipliers, not hard-coded inside chase states.

Hearing: noise events and propagation

Vision requires facing the target. Hearing is omnidirectional but attenuates with distance. Games model sound as ephemeral stimuli:

{
  position: Vector3,
  loudness: 0.8,           // normalized 0.0–1.0, e.g. footstep vs suppressed pistol
  radius: 15.0,            // max audible distance at full loudness
  sourceTag: "gunshot",    // for bark selection / squad alert
  timestamp: gameTime
}

Each frame (or on event), agents within radius receive the stimulus if nothing blocks it. The occlusion test is often a simpler LOS ray than vision uses, or a 2D occlusion map for indoor levels. Loudness typically falls off linearly or with inverse distance; the linear form is heard = loudness * (1 - distance/radius). Below a threshold, the event is ignored.
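The linear falloff can be written as a small function. This is a sketch under stated assumptions: the function name `heard_loudness` and the default `threshold` of 0.1 are illustrative choices, and occlusion is left out entirely.

```python
import math
from typing import Sequence


def heard_loudness(
    listener: Sequence[float],
    source: Sequence[float],
    loudness: float,
    radius: float,
    threshold: float = 0.1,  # assumption: tune per game
) -> float:
    """Return perceived loudness in [0, 1], or 0.0 if the event is ignored."""
    distance = math.dist(listener, source)
    if radius <= 0.0 or distance >= radius:
        return 0.0
    # Linear falloff: full loudness at the source, zero at the radius.
    heard = loudness * (1.0 - distance / radius)
    return heard if heard >= threshold else 0.0
```

A full-loudness event halfway to its radius is heard at 0.5, while a quiet footstep near the edge of its radius falls below the threshold and is dropped.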

Hearing enables design beats vision cannot: a player behind a guard can distract with a thrown bottle, triggering investigation without direct sight. Squad games broadcast "heard gunfire" to nearby allies so one witness pulls the whole patrol into alert. Tie audio design to perception — if the player cannot hear their own footsteps, enemies should not react to silent animation cycles.

Memory, suspicion, and alert states

Real perception is temporal. When LOS breaks, the agent does not instantly forget. A typical state machine:

Unaware — patrolling; no valid target.
Suspicious — partial stimulus (peripheral glimpse, distant noise); agent walks toward last stimulus point.
Alert / combat — confirmed sight or loud close noise; chase or engage using utility-scored actions.
Search — lost LOS; move to last known position, sweep nearby cover points, then decay back to unaware after a timeout.

Store lastKnownPosition and lastSeenTime on the blackboard. Search behavior can query cover nodes within a radius of that point — no need to cheat and know the player's live coordinates. Decay timers prevent infinite hunt: if the player stays hidden for 30–60 seconds, alert drops one tier. Designers tune these numbers per enemy type: a bloodhound tracks longer than a distracted civilian.
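A minimal sketch of this memory and decay logic follows. The tier ordering (unaware, suspicious, search, combat), the 30-second default, and all names here are assumptions for illustration; stimulus-driven transitions into Suspicious are omitted to keep it short.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


class Alert(IntEnum):
    UNAWARE = 0
    SUSPICIOUS = 1
    SEARCH = 2
    COMBAT = 3


@dataclass
class AlertMemory:
    level: Alert = Alert.UNAWARE
    last_known_position: Optional[Vec3] = None
    last_seen_time: float = 0.0
    level_since: float = 0.0      # when the current tier was entered
    decay_seconds: float = 30.0   # assumption: tuned per enemy type

    def on_target_seen(self, position: Vec3, now: float) -> None:
        self.level = Alert.COMBAT
        self.last_known_position = position
        self.last_seen_time = now
        self.level_since = now

    def update(self, now: float, has_los: bool) -> None:
        if has_los or self.level == Alert.UNAWARE:
            return
        if self.level == Alert.COMBAT:
            # LOS just broke: search around the last confirmed position.
            self.level, self.level_since = Alert.SEARCH, now
        elif now - self.level_since >= self.decay_seconds:
            # Drop exactly one tier per elapsed decay window.
            self.level, self.level_since = Alert(self.level - 1), now
            if self.level == Alert.UNAWARE:
                self.last_known_position = None
```

Note that nothing in `update` reads the player's live position; search behavior only ever sees `last_known_position`.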

Corpse and evidence stimuli extend memory across agents: finding a body raises alert even if the killer is gone. Implement as a one-shot perception event when an agent's vision cone overlaps a tagged object.

Stealth detection meters

Binary visible/invisible feels harsh in immersive sims. A detection meter accumulates while the player sits in an NPC's vision cone with clear LOS:

detection += baseRate * distanceFactor * lightingFactor * movementFactor * dt
if detection >= 1.0: transition to Alert

Crouching lowers movementFactor; standing in shadow lowers lightingFactor; peripheral placement lowers baseRate. When LOS breaks, detection decays faster than it builds — rewarding brief exposures. UI feedback (icons, rim light, audio sting) maps meter bands to player-readable "almost spotted" tension without revealing exact numbers.

Meters also smooth multiplayer desync: a few frames of contested LOS do not flip combat mode instantly. Scale accumulation by elapsed time (the dt term above) and cap the gain per second so frame-rate differences do not change difficulty.
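The meter update can be sketched as a pure function. The default rates below are illustrative assumptions (the only constraint taken from the text is that decay is faster than buildup), and `update_detection` is a hypothetical name.

```python
def update_detection(
    detection: float,
    dt: float,
    in_cone_with_los: bool,
    base_rate: float = 0.5,        # assumption: meter units per second
    distance_factor: float = 1.0,
    lighting_factor: float = 1.0,
    movement_factor: float = 1.0,
    decay_rate: float = 1.0,       # assumption: faster than base_rate
) -> float:
    """Advance the meter by dt seconds and clamp it to [0, 1]."""
    if in_cone_with_los:
        detection += (
            base_rate * distance_factor * lighting_factor * movement_factor * dt
        )
    else:
        detection -= decay_rate * dt
    return max(0.0, min(1.0, detection))
```

Because every change is multiplied by `dt`, two half-second updates produce the same meter value as a single one-second update; the caller transitions to Alert when the returned value reaches 1.0.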

Feeding perception into behavior systems

Perception should write, not decide. Typical blackboard keys:

hasLineOfSightToPlayer: bool
playerDistance: float
lastKnownPlayerPos: Vector3?
alertLevel: enum
heardStimulusThisTick: Stimulus?

Behavior tree decorators gate branches: "Chase" only runs if hasLineOfSightToPlayer or alertLevel == Search. Utility AI considerations read the same keys: "Flee" utility rises when playerDistance < 5 and health is low. Steering behaviors execute after the decision: seek toward lastKnownPlayerPos during search, flee along a vector away from live position during combat.
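To illustrate the read-only relationship, here is a sketch of a blackboard with one decorator-style gate and one utility consideration. The field names mirror the keys above in Python style; the linear flee curve and the function names are assumptions, not a prescribed scoring formula.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


class AlertLevel(Enum):
    UNAWARE = auto()
    SUSPICIOUS = auto()
    COMBAT = auto()
    SEARCH = auto()


@dataclass
class Blackboard:
    has_line_of_sight_to_player: bool = False
    player_distance: float = float("inf")
    last_known_player_pos: Optional[Vec3] = None
    alert_level: AlertLevel = AlertLevel.UNAWARE


def chase_allowed(bb: Blackboard) -> bool:
    """Decorator condition: gate the Chase branch on perception output."""
    return bb.has_line_of_sight_to_player or bb.alert_level is AlertLevel.SEARCH


def flee_utility(bb: Blackboard, health_fraction: float) -> float:
    """Utility consideration: rises as health drops while the player is near."""
    if bb.player_distance >= 5.0:
        return 0.0
    clamped = max(0.0, min(1.0, health_fraction))
    return 1.0 - clamped  # assumption: simple linear response curve
```

Neither function mutates the blackboard, which keeps the perception layer the only writer of these keys.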

Run perception on a slower tick than physics (4–10 Hz) for distant agents; snap to every frame for agents in active combat. LOD for AI perception mirrors rendering LOD: far enemies use distance-only checks without raycasts until the player closes in.
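A perception LOD policy along these lines might look like the sketch below. The distance bands (`near`, `far`) and the choice of 10 Hz for nearby non-combat agents are assumptions; only the 4 to 10 Hz range and the every-frame combat rule come from the text.

```python
def perception_interval(
    distance_to_player: float,
    in_combat: bool,
    near: float = 20.0,  # assumption: world units
    far: float = 60.0,   # assumption: world units
) -> float:
    """Seconds between perception updates; 0.0 means update every frame."""
    if in_combat:
        return 0.0
    if distance_to_player <= near:
        return 0.1   # 10 Hz
    if distance_to_player >= far:
        return 0.25  # 4 Hz
    # Blend linearly from 10 Hz down to 4 Hz across the middle band.
    t = (distance_to_player - near) / (far - near)
    return 0.1 + t * 0.15


def should_raycast(distance_to_player: float, far: float = 60.0) -> bool:
    """Far agents fall back to distance-only checks."""
    return distance_to_player < far
```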

Multiplayer and authority

In server-authoritative shooters, the server owns perception. Clients may predict alert animations for responsiveness, but damage and chase targeting must use server-confirmed sight. Common patterns:

Server runs LOS from each AI agent against replicated player positions.
Stimuli (gunshots) are RPC events with position and loudness; the server fans them out to affected AI.
Last-known-position replicates so all clients see guards investigate the same spot.

Cheaters exploit AI if clients report "I am hidden." Never trust client stealth flags. For PvP stealth, some titles use server-side vision cones with latency-compensated positions — the same rewind techniques used for hit registration.

Performance checklist

Budget raycasts per frame; queue and round-robin across agents.
Broad-phase cull before narrow LOS — grid, BVH, or engine physics overlap queries.
Cache static occlusion where possible (precomputed visibility graphs for indoor levels).
Share hearing stimuli — one gunshot event, many listeners, single spatial query.
Disable perception for off-screen agents beyond a distance threshold unless gameplay requires it (e.g. stealth radar).
Profile: perception often dominates AI cost when 50+ agents raycast every frame.
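As one illustration of the broad-phase and shared-stimulus items, a uniform grid lets a single noise event find its listeners with one query. This sketch works in the horizontal plane only, and `ListenerGrid` with its methods is a hypothetical helper rather than any engine's API.

```python
import math
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Cell = Tuple[int, int]


class ListenerGrid:
    """Buckets agents by grid cell so one stimulus needs one spatial query."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self._cells: DefaultDict[Cell, List[Tuple[int, float, float]]] = (
            defaultdict(list)
        )

    def _cell(self, x: float, z: float) -> Cell:
        return (math.floor(x / self.cell_size), math.floor(z / self.cell_size))

    def add(self, agent_id: int, x: float, z: float) -> None:
        self._cells[self._cell(x, z)].append((agent_id, x, z))

    def listeners_in_radius(self, x: float, z: float, radius: float) -> List[int]:
        min_cx, min_cz = self._cell(x - radius, z - radius)
        max_cx, max_cz = self._cell(x + radius, z + radius)
        found: List[int] = []
        # Visit only the cells the stimulus circle can overlap.
        for cx in range(min_cx, max_cx + 1):
            for cz in range(min_cz, max_cz + 1):
                for agent_id, ax, az in self._cells.get((cx, cz), ()):
                    if (ax - x) ** 2 + (az - z) ** 2 <= radius * radius:
                        found.append(agent_id)
        return found
```

Only the agents returned here need the more expensive occlusion test, so the cost of a gunshot scales with nearby listeners instead of the total agent count.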

Production checklist

Define stimulus types (vision, hearing, touch, social alert) and blackboard schema before wiring BTs.
Visualize cones and last-known positions in debug builds — designers cannot tune blind.
Playtest peripheral vision and crouch edges; players exploit exactly your coarsest LOS sample.
Match hearing radii to audio falloff so feedback feels fair.
Tune alert decay and search duration per enemy archetype, not globally.
In multiplayer, verify server-only perception with packet-loss simulation.
Document cheat and debug toggles (for example, switching AI awareness off for QA) separately from shipping defaults.

Key takeaways

Perception filters world truth into what each NPC knows — it should not also choose actions.
Vision = distance + angle + LOS raycasts; hearing = attenuated omnidirectional stimuli.
Last known position and alert decay sell believable search behavior without omniscient AI.
Detection meters trade binary snaps for readable stealth tension.
Budget raycasts and stagger updates — perception is a common AI performance sink.