Game animation blending explained

A character that snaps instantly from idle to sprint looks broken. A character that crossfades smoothly between walk, jog, and run while turning feels alive. That difference is animation blending — the math and tooling that combines multiple skeletal poses each frame instead of playing one clip in isolation. Blending sits between raw keyframe data and what players perceive as weight, momentum, and responsiveness. This guide covers skeletal animation basics, crossfade transitions, 1D and 2D blend trees, layered upper-body actions, root motion trade-offs, inverse kinematics for foot planting, and the performance habits that keep 60 fps on mid-range hardware.

Skeletal animation in one minute

Most 3D (and many 2D) character animations are skeletal: a hierarchy of bones drives a mesh through skin weights. Each animation clip is a time series of bone transforms — position, rotation, sometimes scale — sampled at 30 or 60 Hz. At runtime the engine evaluates the clip at time t and produces a pose: the full set of bone transforms for that instant.

Blending enters when you need more than one pose at once. Locomotion might mix a walk clip at 40% weight with a run clip at 60% as speed increases. A reload might play on the spine and arms while legs continue a strafe cycle on a lower layer. Without blending, every state change is a hard cut — acceptable for pixel art, unacceptable for a third-person action game where players judge quality in the first three seconds.

Our game state machines guide covers when logic changes state; this guide covers what happens to the skeleton when those states overlap or interpolate.

Crossfade transitions

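The simplest blend is a crossfade: over N seconds, fade out clip A while fading in clip B. If w is the blend factor from 0 to 1, the engine blends each bone's local transform, conceptually pose = lerp(poseA, poseB, w). Positions (and scale) are linearly interpolated directly. Rotations are stored as quaternions and interpolated with slerp, or with a cheaper normalized lerp (nlerp); interpolating Euler angles component by component causes wobble and gimbal artifacts. Either way, negate one quaternion when their dot product is negative so the blend takes the shorter arc.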
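A minimal Python sketch of the crossfade above, assuming each pose is a dict from bone name to a local position plus a unit quaternion stored as (w, x, y, z). The names `BoneTransform`, `slerp`, and `crossfade` are illustrative, not any engine's API; real engines do this in optimized native code and often use nlerp instead of slerp.

```python
import math
from dataclasses import dataclass

Vec3 = tuple[float, float, float]
Quat = tuple[float, float, float, float]  # (w, x, y, z), unit length


@dataclass
class BoneTransform:
    position: Vec3
    rotation: Quat


def lerp_vec(a: Vec3, b: Vec3, w: float) -> Vec3:
    return tuple(x + (y - x) * w for x, y in zip(a, b))


def slerp(a: Quat, b: Quat, w: float) -> Quat:
    dot = sum(x * y for x, y in zip(a, b))
    if dot < 0.0:
        # q and -q encode the same rotation; flip b so we take the shorter arc.
        b = tuple(-x for x in b)
        dot = -dot
    if dot > 0.9995:
        # Nearly identical rotations: normalized lerp avoids dividing by sin(~0).
        q = tuple(x + (y - x) * w for x, y in zip(a, b))
        n = math.sqrt(sum(x * x for x in q))
        return tuple(x / n for x in q)
    theta = math.acos(dot)
    s = math.sin(theta)
    wa = math.sin((1.0 - w) * theta) / s
    wb = math.sin(w * theta) / s
    return tuple(wa * x + wb * y for x, y in zip(a, b))


def crossfade(pose_a: dict[str, BoneTransform],
              pose_b: dict[str, BoneTransform],
              w: float) -> dict[str, BoneTransform]:
    """Blend two poses bone by bone: lerp positions, slerp rotations."""
    return {
        bone: BoneTransform(
            lerp_vec(pose_a[bone].position, pose_b[bone].position, w),
            slerp(pose_a[bone].rotation, pose_b[bone].rotation, w),
        )
        for bone in pose_a
    }
```

For a spine bone that turns 90° about Y between the two poses, w = 0.5 yields a 45° turn, and thanks to the sign check, passing the negated target quaternion gives the same result.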

Choosing fade duration

Short fades (0.1–0.15 s) feel snappy for combat cancels and hit reactions. Longer fades (0.25–0.4 s) suit idle-to-walk transitions and emotional beats. Fades that are too long make input feel mushy; fades that are too short reintroduce pops. Profile on target hardware: pose evaluation runs on the CPU, and its cost scales with active clips and bone count, so on mobile the GPU is rarely what limits animation.
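To tune fade length per transition edge rather than globally, a crossfade can be a small timer that turns elapsed time into the blend factor w. This is a sketch: the `FADE_SECONDS` table, `DEFAULT_FADE`, and the `Crossfade` class are hypothetical, with values picked from the ranges above.

```python
from dataclasses import dataclass

# Hypothetical per-edge fade durations in seconds, picked from the ranges above.
FADE_SECONDS: dict[tuple[str, str], float] = {
    ("attack", "hit_react"): 0.12,  # snappy combat transition
    ("idle", "walk"): 0.3,          # softer locomotion start
}
DEFAULT_FADE = 0.2  # assumption: fallback for edges nobody has tuned yet


@dataclass
class Crossfade:
    source: str
    target: str
    duration: float
    elapsed: float = 0.0

    @property
    def weight(self) -> float:
        """Blend factor w for the target clip, clamped to [0, 1]."""
        if self.duration <= 0.0:
            return 1.0
        return min(self.elapsed / self.duration, 1.0)

    @property
    def done(self) -> bool:
        return self.weight >= 1.0

    def advance(self, dt: float) -> float:
        """Step the fade by one frame's delta time and return the new weight."""
        self.elapsed += dt
        return self.weight


def start_fade(source: str, target: str) -> Crossfade:
    return Crossfade(source, target, FADE_SECONDS.get((source, target), DEFAULT_FADE))
```

At 60 fps, an idle-to-walk fade reaches w = 0.5 after nine frames, while an untuned edge falls back to `DEFAULT_FADE`.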

Sync and matched transitions

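Naive crossfades between cycles that are out of phase cause foot slide: a foot that appears planted in the walk drifts mid-blend because the other clip has that foot in the air. Engines offer sync markers (foot-down events) so transitions start at matching phases. Inertialization (Unreal's Inertialization node, often paired with its Pose Search motion-matching system, and similar features elsewhere) switches to the new pose immediately and decays the leftover offset using the velocity at the moment of transition, so momentum continues instead of resetting. That matters most for parkour and shooters.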
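Sync markers boil down to phase matching: find how far clip A is between its last and next foot-down marker, then start clip B at the same fraction of its corresponding step. A minimal sketch, assuming both looping clips carry the same number of foot-down markers in the same foot order, expressed in normalized time (0 to 1); the function names are illustrative, not an engine API.

```python
import bisect


def _segment(markers: list[float], phase: float) -> tuple[int, float]:
    """Return (index of the marker that starts the current step, fraction through that step)."""
    if phase < markers[0]:
        phase += 1.0  # before the first marker: still in the wrap-around step from the last one
    i = bisect.bisect_right(markers, phase) - 1
    start = markers[i]
    end = markers[i + 1] if i + 1 < len(markers) else markers[0] + 1.0
    return i, (phase - start) / (end - start)


def matched_start_time(markers_a: list[float], markers_b: list[float], phase_a: float) -> float:
    """Normalized time at which to start looping clip B so it matches clip A's step phase.

    markers_a / markers_b: sorted normalized times of foot-down events, listed in the
    same foot order in both clips (for example [left_down, right_down]).
    """
    assert len(markers_a) == len(markers_b), "clips need matching sync markers"
    i, frac = _segment(markers_a, phase_a % 1.0)
    start = markers_b[i]
    end = markers_b[i + 1] if i + 1 < len(markers_b) else markers_b[0] + 1.0
    return (start + frac * (end - start)) % 1.0
```

With walk markers at [0.0, 0.5] and run markers at [0.1, 0.6], a walk at phase 0.25 (halfway through the left step) starts the run at 0.35, halfway through its own left step.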

Pair transition tuning with the fixed timestep discipline in our game loop and frame timing guide — animation updates should respect the same simulation step as physics.

Blend trees for locomotion

Hard-switching between idle, walk, jog, and run clips at speed thresholds creates visible seams. A blend tree treats speed (or strafe direction) as a continuous parameter and mixes clips proportionally.

1D blend trees

A 1D tree maps a single parameter — usually Speed — to weighted clips. At speed 0 you get idle; at 2 m/s walk dominates; at 5 m/s run takes over. Intermediate values blend neighbors. Designers place thresholds where weights shift; programmers drive the parameter from physics velocity or input magnitude.
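A 1D blend tree is piecewise linear interpolation between the two thresholds that bracket the parameter. This sketch uses the idle/walk/run thresholds from the paragraph above; the list layout and `blend_1d` are hypothetical, and a jog clip would simply be one more threshold entry.

```python
import bisect

# (Speed threshold in m/s, clip name), sorted ascending; values from the text above.
LOCOMOTION = [(0.0, "idle"), (2.0, "walk"), (5.0, "run")]


def blend_1d(thresholds: list[tuple[float, str]], value: float) -> dict[str, float]:
    """Weights for a 1D blend tree: the two neighboring clips share the weight linearly."""
    keys = [t for t, _ in thresholds]
    if value <= keys[0]:
        return {thresholds[0][1]: 1.0}
    if value >= keys[-1]:
        return {thresholds[-1][1]: 1.0}
    i = bisect.bisect_right(keys, value)  # keys[i - 1] <= value < keys[i]
    lo, lo_clip = thresholds[i - 1]
    hi, hi_clip = thresholds[i]
    t = (value - lo) / (hi - lo)
    return {lo_clip: 1.0 - t, hi_clip: t}
```

At 3.5 m/s, walk and run each get 0.5; above 5 m/s the run clip plays alone.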

2D blend trees

Strafing needs two axes: forward speed and lateral speed (or magnitude + direction angle). A 2D tree places clips at compass points — idle center, walk forward, walk left, run forward — and interpolates inside the polygon they form. Freeform directional modes handle arbitrary stick angles; cartesian modes suit tank controls.
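One simple way to interpolate inside the polygon is to split it into triangles and use the barycentric weights of the parameter point. Engines typically use more elaborate schemes for freeform modes (Unity's, for example, are based on gradient band interpolation), so treat this as a swapped-in, simpler technique. The sample positions, triangle list, and function names are hypothetical.

```python
Vec2 = tuple[float, float]


def barycentric_weights(p: Vec2, a: Vec2, b: Vec2, c: Vec2) -> tuple[float, float, float]:
    """Weights of p relative to triangle (a, b, c); all are >= 0 when p lies inside."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1.0 - wa - wb


def blend_2d(samples: list[tuple[Vec2, str]],
             triangles: list[tuple[int, int, int]],
             p: Vec2) -> dict[str, float]:
    """Find the triangle containing p and weight its three clips barycentrically."""
    for i, j, k in triangles:
        w = barycentric_weights(p, samples[i][0], samples[j][0], samples[k][0])
        if min(w) >= -1e-9:  # inside or on an edge
            return {samples[n][1]: wn for n, wn in zip((i, j, k), w)}
    # Outside every triangle: snap to the nearest sample (one simple policy among several).
    px, py = p
    nearest = min(samples, key=lambda s: (s[0][0] - px) ** 2 + (s[0][1] - py) ** 2)
    return {nearest[1]: 1.0}


# (lateral m/s, forward m/s) -> clip; positions are hypothetical tuning values.
SAMPLES = [((0.0, 0.0), "idle"), ((0.0, 2.0), "walk_fwd"),
           ((-2.0, 0.0), "walk_left"), ((0.0, 5.0), "run_fwd")]
TRIANGLES = [(0, 1, 2), (1, 2, 3)]
```

A stick input of (-0.5, 0.5) lands in the idle/walk_fwd/walk_left triangle and yields weights 0.5, 0.25, and 0.25.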

Blend trees do not replace physics. Our game physics guide explains how rigid bodies move; animation should follow or lead that motion via root motion (below), not fight it every frame.

Animation layers and masks

Layers stack independent blend trees on the same skeleton. Base layer runs full-body locomotion. Upper-body layer plays aim, reload, or gesture clips with an avatar mask that limits bones to spine, arms, and head. Layers have their own weights and can override or additive-blend.
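An override layer with a mask is a per-bone blend that only touches bones inside the mask. To keep the sketch short, each bone here is a single joint angle in degrees; a real layer would blend full transforms (slerping rotations). The bone names and `UPPER_BODY_MASK` are hypothetical.

```python
# Simplification: each bone is one joint angle in degrees instead of a full transform.
Pose = dict[str, float]

UPPER_BODY_MASK = {"spine", "chest", "neck", "head", "arm_l", "arm_r"}  # no pelvis, no legs


def apply_override_layer(base: Pose, layer: Pose, mask: set[str], weight: float) -> Pose:
    """Blend the layer over the base only on masked bones; everything else keeps the base pose."""
    out = dict(base)
    for bone in mask.intersection(layer):
        out[bone] = base[bone] + (layer[bone] - base[bone]) * weight
    return out
```

Because `pelvis` is absent from the mask, the reload clip's pelvis value never reaches the output, which is exactly the off-center drag a well-scoped mask prevents.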

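Additive layers add a delta pose on top of the base, which is useful for breathing, recoil kick, or hit flinch without duplicating every locomotion clip. Additive clips are authored as offsets from a reference pose, and the engine applies final = base + additive * weight. For translations that is literal addition; for rotations the "addition" is a quaternion product of the base rotation with the delta rotation, scaled toward identity by the weight.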
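For rotations, the additive step can be sketched as: compute the delta once from the reference pose, scale it toward identity by the layer weight, then compose it onto the base. This assumes unit quaternions in (w, x, y, z) order with deltas applied in bone-local space; all function names are illustrative.

```python
import math

Quat = tuple[float, float, float, float]  # (w, x, y, z), unit length
Vec3 = tuple[float, float, float]
IDENTITY: Quat = (1.0, 0.0, 0.0, 0.0)


def q_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def q_conj(q: Quat) -> Quat:
    w, x, y, z = q
    return (w, -x, -y, -z)  # equals the inverse for unit quaternions


def q_axis_angle(axis: Vec3, angle: float) -> Quat:
    s = math.sin(angle / 2.0)  # axis assumed unit length
    return (math.cos(angle / 2.0), axis[0] * s, axis[1] * s, axis[2] * s)


def q_scale(q: Quat, t: float) -> Quat:
    """Rotation t of the way from identity to q (same result as slerp(IDENTITY, q, t))."""
    w, x, y, z = q
    if w < 0.0:  # use the shorter arc
        w, x, y, z = -w, -x, -y, -z
    s = math.sqrt(x * x + y * y + z * z)
    if s < 1e-12:
        return IDENTITY
    half = math.atan2(s, w)  # half of the rotation angle
    k = math.sin(half * t) / s
    return (math.cos(half * t), x * k, y * k, z * k)


def make_additive(clip_rot: Quat, ref_rot: Quat) -> Quat:
    """Authoring step: delta rotation that takes the reference pose to the clip pose."""
    return q_mul(q_conj(ref_rot), clip_rot)


def apply_additive_rotation(base_rot: Quat, delta: Quat, weight: float) -> Quat:
    return q_mul(base_rot, q_scale(delta, weight))


def apply_additive_position(base_pos: Vec3, clip_pos: Vec3, ref_pos: Vec3, weight: float) -> Vec3:
    """Translations really are additive: base + (clip - ref) * weight."""
    return tuple(b + (c - r) * weight for b, c, r in zip(base_pos, clip_pos, ref_pos))
```

A 20° recoil delta applied at weight 0.5 produces a 10° kick, and at weight 1.0 applied on the reference pose it reproduces the original clip exactly.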

Layer explosion is a maintenance trap. Cap active layers (base + one override + one additive is a common budget). Document which bones each mask includes — a mask that accidentally includes the pelvis will make upper-body swings drag the whole character off-center.

Root motion vs in-place animation

In-place clips animate the skeleton but leave the root transform to gameplay code, which advances world position by velocity * deltaTime each frame. Root motion bakes translation and rotation into the root bone, so the animation drives where the character goes. Root motion improves foot fidelity on stairs and attack lunges but fights network prediction in multiplayer.
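The difference between the two ownership models fits in a few lines. This sketch samples a root track on the ground plane, extracts the delta between two clip times (handling a loop wrap), and applies it to the character; the in-place path just integrates velocity. It assumes a Y-up world where yaw rotates about Y and +Z is forward at yaw 0, and every name is hypothetical. Engines that extract root motion typically also remove it from the rendered pose so the offset is not applied twice.

```python
import bisect
import math

Vec2 = tuple[float, float]  # (x, z) on the ground plane


def sample_root(times: list[float], positions: list[Vec2], t: float) -> Vec2:
    """Linearly interpolate the root bone's ground-plane track at clip time t."""
    if t <= times[0]:
        return positions[0]
    if t >= times[-1]:
        return positions[-1]
    i = bisect.bisect_right(times, t)  # times[i - 1] <= t < times[i]
    f = (t - times[i - 1]) / (times[i] - times[i - 1])
    (x0, z0), (x1, z1) = positions[i - 1], positions[i]
    return (x0 + (x1 - x0) * f, z0 + (z1 - z0) * f)


def root_motion_delta(times: list[float], positions: list[Vec2], t0: float, t1: float) -> Vec2:
    """Clip-local root translation from t0 to t1 for a looping clip."""
    p0 = sample_root(times, positions, t0)
    p1 = sample_root(times, positions, t1)
    if t1 >= t0:
        return (p1[0] - p0[0], p1[1] - p0[1])
    # Wrapped past the end: finish this loop, then continue from the first key.
    end, start = positions[-1], positions[0]
    return (end[0] - p0[0] + p1[0] - start[0], end[1] - p0[1] + p1[1] - start[1])


def apply_root_motion(world_pos: Vec2, yaw: float, delta: Vec2) -> Vec2:
    """Root motion path: rotate the clip-local delta by the character's yaw and add it."""
    dx, dz = delta
    c, s = math.cos(yaw), math.sin(yaw)
    return (world_pos[0] + dx * c + dz * s, world_pos[1] - dx * s + dz * c)


def apply_in_place(world_pos: Vec2, velocity: Vec2, dt: float) -> Vec2:
    """In-place path: gameplay code owns movement and integrates velocity."""
    return (world_pos[0] + velocity[0] * dt, world_pos[1] + velocity[1] * dt)
```

For a clip that travels 1.4 m per one-second loop, stepping from clip time 0.9 to 0.1 across the wrap yields 0.28 m of forward motion, the same as 0.2 s of uninterrupted playback.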

Hybrid setups extract root motion for attacks and use in-place locomotion for run cycles. Always clarify which system owns rotation — camera-relative movement usually keeps facing in code while legs animate beneath.

Inverse kinematics and foot planting

Even good blends leave feet floating on slopes or sliding during turns. Foot IK raycasts down from above each ankle to the ground and adjusts foot position and rotation to match the terrain normal. Look-at IK rotates the head toward targets. Iterative chain solvers (FABRIK, CCD) and full-body IK solvers reach hands to ledges; they are expensive, so use them sparingly.
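FABRIK, one of the solvers named above, alternates two passes along a joint chain: pin the end effector to the target and walk back to the root, then re-pin the root and walk forward, restoring each bone's length as it goes. Here is a 2D sketch on a hip, knee, ankle chain (foot IK often uses an analytic two-bone solver instead); the names, tolerance, and iteration cap are assumptions.

```python
import math

Vec2 = tuple[float, float]


def _dist(a: Vec2, b: Vec2) -> float:
    return math.hypot(b[0] - a[0], b[1] - a[1])


def _toward(a: Vec2, b: Vec2, length: float) -> Vec2:
    """Point at distance `length` from a, in the direction of b."""
    d = _dist(a, b)
    if d < 1e-12:  # coincident joints: pick an arbitrary direction
        return (a[0] + length, a[1])
    return (a[0] + (b[0] - a[0]) * length / d, a[1] + (b[1] - a[1]) * length / d)


def fabrik(joints: list[Vec2], target: Vec2,
           tolerance: float = 1e-4, max_iterations: int = 32) -> list[Vec2]:
    """Solve a chain (root first) so the last joint reaches target; bone lengths are preserved."""
    joints = list(joints)
    lengths = [_dist(joints[i], joints[i + 1]) for i in range(len(joints) - 1)]
    root = joints[0]
    if _dist(root, target) >= sum(lengths):
        # Out of reach: straighten the chain toward the target.
        for i, length in enumerate(lengths):
            joints[i + 1] = _toward(joints[i], target, length)
        return joints
    for _ in range(max_iterations):
        # Backward pass: put the end effector on the target, then work toward the root.
        joints[-1] = target
        for i in range(len(joints) - 2, -1, -1):
            joints[i] = _toward(joints[i + 1], joints[i], lengths[i])
        # Forward pass: put the root back, then work out to the end effector.
        joints[0] = root
        for i in range(len(joints) - 1):
            joints[i + 1] = _toward(joints[i], joints[i + 1], lengths[i])
        if _dist(joints[-1], target) < tolerance:
            break
    return joints
```

A perfectly straight chain aimed along its own axis can stall in this basic form; production solvers add a pole target or a slight initial bend.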

IK runs after blending produces a pose. Order matters: blend clips, apply IK corrections, then skin the mesh. Over-aggressive foot IK on fast turns causes jitter; blend IK weight down when angular velocity spikes.
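Fading IK by angular velocity can be a target weight computed from turn rate, plus a rate limit so the weight itself never pops. In a frame update this runs after the blended pose exists and before skinning. The thresholds and rate are hypothetical tuning values.

```python
def ik_target_weight(angular_speed: float, fade_start: float = 3.0, fade_end: float = 6.0) -> float:
    """Full foot IK below fade_start rad/s, none above fade_end, linear ramp in between."""
    if angular_speed <= fade_start:
        return 1.0
    if angular_speed >= fade_end:
        return 0.0
    return 1.0 - (angular_speed - fade_start) / (fade_end - fade_start)


def move_toward(current: float, target: float, rate: float, dt: float) -> float:
    """Change current by at most rate * dt per frame so the IK weight changes smoothly."""
    step = rate * dt
    if abs(target - current) <= step:
        return target
    return current + step if target > current else current - step
```

Each frame, a caller would update its stored weight with something like `w = move_toward(w, ik_target_weight(abs(yaw_rate)), 4.0, dt)` and use `w` to blend between the uncorrected and IK-corrected foot.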

State machines meet blend trees

Production rigs combine animation state machines (nodes = states, edges = transitions with crossfade rules) with nested blend trees inside each state. "Locomotion" state holds a 2D blend tree; "Jump" state plays a one-shot clip with exit time back to locomotion when feet land.

Transition conditions — grounded, speed > threshold, trigger "Attack" — live in the same graph. Keep gameplay logic that sets parameters separate from the graph that consumes them. A common pattern: controller code writes Speed, IsGrounded, AimYaw each frame; the anim graph reads them blindly. That split mirrors the data-oriented split in our entity component system guide.
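The parameter split can be sketched as a tiny state machine: gameplay code writes an `AnimParams` struct, and the graph only reads it to pick transitions. The Locomotion/Jump graph, field names, and fade times are hypothetical, and becoming grounded stands in for the jump clip's exit time.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AnimParams:
    """Written by gameplay code each frame; the graph only reads it."""
    speed: float = 0.0
    is_grounded: bool = True
    aim_yaw: float = 0.0


@dataclass
class Transition:
    source: str
    target: str
    condition: Callable[[AnimParams], bool]
    fade: float  # crossfade seconds for this edge


@dataclass
class AnimStateMachine:
    state: str
    transitions: list[Transition]

    def update(self, params: AnimParams) -> Optional[Transition]:
        """Take the first outgoing transition whose condition holds; return it, or None."""
        for t in self.transitions:
            if t.source == self.state and t.condition(params):
                self.state = t.target
                return t
        return None


def make_character_graph() -> AnimStateMachine:
    # Hypothetical graph: "Locomotion" would hold the 2D blend tree, "Jump" a one-shot clip.
    return AnimStateMachine("Locomotion", [
        Transition("Locomotion", "Jump", lambda p: not p.is_grounded, fade=0.1),
        Transition("Jump", "Locomotion", lambda p: p.is_grounded, fade=0.2),
    ])


def write_params(params: AnimParams, velocity_xz: tuple[float, float],
                 grounded: bool, aim_yaw: float) -> None:
    """Gameplay side: derive parameters from physics and input; knows nothing about the graph."""
    params.speed = math.hypot(*velocity_xz)
    params.is_grounded = grounded
    params.aim_yaw = aim_yaw
```

Swapping the graph (adding states, retuning fades) never touches `write_params`, and gameplay changes never touch the graph, which is the point of the split.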

Engine notes

Unity — Animator Controller with blend trees, avatar masks, and optional Playables API for custom mixing.
Unreal — Animation Blueprints with State Machines, Blend Spaces (1D/2D), Control Rig for IK.
Godot 4 — AnimationTree with BlendSpace2D, AnimationNodeStateMachine, and SkeletonModifier3D for IK.
Browser / custom — glTF clips + Three.js AnimationMixer crossfades; weight normalization manual; skinning on GPU.

Performance and memory

Animation cost ≈ (bones evaluated) × (clips blended) × (characters on screen). Mitigations:

LOD for animation — distant NPCs update every 2–3 frames or use simpler bone sets (see our LOD guide).
Clip compression — key reduction, uniform sampling; verify feet still look planted.
GPU skinning — offload matrix palette to vertex shader; CPU only evaluates poses.
Shared graphs — one Animator Controller instance per archetype, not per enemy clone.
Curves off — disable scale curves and unused facial bones on background characters.
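With hypothetical numbers, the cost formula gives 60 bones × 4 blended clips × 50 characters = 12,000 bone samples per frame, which is why distant characters are the first place to cut. Below is a sketch of distance-based update intervals with staggering, assuming skipped frames simply reuse the last evaluated pose; the distance bands and names are illustrative.

```python
def anim_cost(bones: int, clips: int, characters: int) -> int:
    """Rough bone samples per frame, following cost ≈ bones × clips × characters."""
    return bones * clips * characters


def update_interval(distance_m: float) -> int:
    """Frames between pose updates; the distance bands are hypothetical tuning values."""
    if distance_m < 15.0:
        return 1
    if distance_m < 40.0:
        return 2
    return 3


def should_update(character_id: int, frame: int, interval: int) -> bool:
    """Stagger by id so distant NPCs don't all re-evaluate on the same frame."""
    return (frame + character_id) % interval == 0
```

With an interval of 3, each distant character is evaluated on one frame in three, and the id offset spreads that work evenly instead of spiking every third frame.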

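Profile with the engine's animation graph debugger open so you can see which nodes are actually being sampled. Cost grows roughly in proportion to the number of clips sampled, so a blend tree with eight simultaneous clips per character does roughly eight times the pose-sampling work of a single clip; cap active leaf nodes.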
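Capping active leaf nodes can be as simple as dropping clips below a weight threshold, keeping the heaviest few, and renormalizing so the survivors still sum to 1. The threshold and cap below are hypothetical defaults, not engine settings.

```python
def prune_weights(weights: dict[str, float], min_weight: float = 0.05,
                  max_clips: int = 4) -> dict[str, float]:
    """Keep at most max_clips clips at or above min_weight, then renormalize to sum to 1."""
    kept = sorted((kv for kv in weights.items() if kv[1] >= min_weight),
                  key=lambda kv: kv[1], reverse=True)[:max_clips]
    if not kept:  # everything fell below the threshold: keep the single heaviest clip
        clip = max(weights, key=weights.get)
        return {clip: 1.0}
    total = sum(w for _, w in kept)
    return {clip: w / total for clip, w in kept}
```

Clips with tiny weights contribute almost nothing visible but cost a full sample each, so skipping them trades a negligible pose difference for a proportional saving.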

Common mistakes

Linear blend across unrelated poses — elbows invert when mixing clips with different arm poses; use sync points or intermediate clips.
Driving speed from animation instead of physics — causes ice-skating; velocity should lead, animation should follow (unless using root motion by design).
Ignoring bind-pose differences — retargeted clips from another skeleton need humanoid retarget or bone name maps; otherwise blends look twisted.
One global crossfade time — combat needs snappy; emotional scenes need slow; tune per transition edge.
Upper-body layer without mask — full-body override accidentally locks legs during reload.
No fallback pose — when a clip fails to load, T-pose flashes; always default to idle.

Key takeaways

Blending combines multiple skeletal poses per frame; crossfades are the simplest form.
Blend trees map continuous parameters (speed, direction) to weighted clips — essential for locomotion.
Layers and masks let upper-body actions run over full-body movement without duplicating every clip.
Root motion trades control for foot fidelity; choose per action type and network model.
IK polishes feet and aim after blending; tune weights to avoid jitter on fast turns.
Keep parameter writers (gameplay) separate from animation graphs (presentation) for maintainability.