
Expected Shortfall (CVaR) explained

Harbor Capital's multi-asset sleeve reported a one-day 99% value at risk (VaR) of $1.8 million on a $120 million book in March 2024. When the desk replayed the twenty worst historical days in the prior decade, average losses on those days exceeded $3.4 million — nearly double the VaR threshold. VaR answered “how bad is the cutoff we breach 1% of the time?” but stayed silent on “how bad are the days beyond that cutoff?” Expected Shortfall (ES), also called Conditional VaR (CVaR) or average value at risk (AVaR), closes that gap: it is the expected loss given that loss exceeds VaR at the chosen confidence level. Basel's Fundamental Review of the Trading Book (FRTB) shifted bank capital rules toward ES for exactly this reason. This guide explains what ES measures, why it is a coherent risk metric while VaR is not, calculation methods (historical, parametric, Monte Carlo), backtesting and elicitability trade-offs, regulatory context, a Harbor Capital tail-risk review, a method decision table, pitfalls, and a practitioner checklist. For dependence modeling in simulation, see financial copulas; for crisis replay without parametric tails, see portfolio stress testing.

What Expected Shortfall actually means

ES at confidence level α (e.g. 97.5% or 99%) is the conditional expectation of loss in the worst (1 − α) fraction of outcomes. Take a one-day 97.5% ES of $2.1 million: on the worst 2.5% of days (roughly six to seven trading days per year if the model is well calibrated), the average loss is expected to be $2.1 million. VaR, by contrast, only states the threshold those days breach.

Formally, for loss random variable L and confidence α ∈ (0,1):

ES_α(L) = E[L | L ≥ VaR_α(L)]

When the loss distribution is continuous, ES at α equals the average of the losses beyond the α-quantile, that is, the worst (1 − α) fraction of outcomes. ES is always at least as large as VaR at the same confidence, and often materially larger when the return distribution is fat-tailed or left-skewed. That extra buffer is what risk committees want when sizing position limits and capital reserves.

ES vs median loss vs maximum drawdown

ES is not median loss (the 50% quantile) and not worst-case loss (the empirical minimum return). It sits between VaR and stress scenarios: a tail average that penalizes fat tails without assuming a single apocalyptic path. Pair ES with maximum drawdown for path-dependent survival analysis over multi-week crashes.

Why VaR fails coherence and ES does not

Artzner, Delbaen, Eber, and Heath (1999) defined four axioms a “good” risk measure should satisfy:

Monotonicity: if portfolio A always loses less than B, its risk score should be lower.
Translation invariance: adding cash reduces risk dollar-for-dollar.
Positive homogeneity: scaling positions scales risk proportionally.
Subadditivity: risk(A + B) ≤ risk(A) + risk(B) — diversification should not increase reported risk.

VaR can violate subadditivity when returns are not elliptically distributed: the VaR of a merged portfolio can exceed the sum of the standalone VaRs, so splitting a book can paradoxically lower total reported VaR while true tail exposure is unchanged. ES satisfies all four axioms (under mild conditions) and is therefore called a coherent risk measure. Regulators and institutional allocators treat coherence as more than academic hygiene: incoherent metrics can understate desk-level capital needs when books are split across legal entities or strategies.
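A standard textbook illustration of the subadditivity failure uses two independent bonds that each default with small probability. The sketch below is hypothetical: the 4% default probability, the loss of 100, and the helper names `var_es` and `combine` are illustrative assumptions, not Harbor data. It computes 95% VaR and ES for one bond and for the pair.

```python
from collections import defaultdict

def var_es(dist, alpha):
    """VaR and ES at level alpha for a discrete loss distribution.

    dist is a list of (loss, probability) pairs; losses are positive numbers.
    """
    tail = 1.0 - alpha
    # VaR: smallest loss x such that P(L <= x) >= alpha.
    cum, var = 0.0, None
    for loss, p in sorted(dist):
        cum += p
        if cum >= alpha - 1e-12:
            var = loss
            break
    # ES: average loss over the worst `tail` probability mass,
    # taking only part of an atom when it straddles the cutoff.
    remaining, total = tail, 0.0
    for loss, p in sorted(dist, reverse=True):
        take = min(p, remaining)
        total += loss * take
        remaining -= take
        if remaining <= 1e-12:
            break
    return var, total / tail

def combine(dist_a, dist_b):
    """Loss distribution of A + B, assuming A and B are independent."""
    out = defaultdict(float)
    for loss_a, p_a in dist_a:
        for loss_b, p_b in dist_b:
            out[loss_a + loss_b] += p_a * p_b
    return list(out.items())

bond = [(0.0, 0.96), (100.0, 0.04)]   # hypothetical: 4% chance of losing 100
pair = combine(bond, bond)

var_one, es_one = var_es(bond, 0.95)
var_pair, es_pair = var_es(pair, 0.95)
```

Under these assumptions each bond alone reports a 95% VaR of 0, while the pair reports 100, so the merged VaR exceeds the sum of the parts. ES is 80 for each bond and about 103.2 for the pair, which stays below the standalone sum of 160.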

Convex risk measures and optimization

ES is also convex in portfolio weights, which makes ES-constrained portfolio optimization tractable as a linear program when returns are scenario-based. Mean-variance optimizers ignore tail averages; ES-aware optimizers explicitly trade expected return against tail pain — a natural complement to modern portfolio theory for funds that publish tail-risk budgets to LPs.
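The linear-programming route usually rests on the Rockafellar-Uryasev representation, in which ES is the minimum over a threshold c of c + E[max(L − c, 0)] / (1 − α). Each max term can be replaced by an auxiliary variable with linear constraints, which is what turns the scenario problem into a linear program. The sketch below only checks the representation numerically on simulated data; the sample, seed, and function name `ru_objective` are assumptions for illustration, and no optimizer over portfolio weights is included.

```python
import random

def ru_objective(losses, alpha, c):
    """Rockafellar-Uryasev auxiliary function F(c) for an equally weighted sample."""
    n = len(losses)
    excess = sum(max(loss - c, 0.0) for loss in losses)
    return c + excess / ((1.0 - alpha) * n)

rng = random.Random(7)  # fixed seed so the sketch is reproducible
losses = [rng.gauss(0.0, 1.0) for _ in range(200)]  # hypothetical scenario losses
alpha = 0.95

# F is convex and piecewise linear in c, so a minimum sits at a sample point.
es_by_minimisation = min(ru_objective(losses, alpha, c) for c in losses)

# Direct tail average: the worst 5% of 200 scenarios is 10 observations.
worst = sorted(losses)[-10:]
es_by_tail_mean = sum(worst) / len(worst)
```

The two numbers agree up to floating-point error, which is why minimizing the auxiliary function jointly over c and the weights yields an ES-optimal portfolio.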

Three ways to calculate ES

Historical simulation

Sort N historical P&L observations ascending (losses as positive numbers). VaR at 97.5% is the loss at rank ceil(0.975 × N). ES is the arithmetic mean of all losses at or above that VaR rank. No distributional assumption required; weakness is that sparse tails in short windows underestimate crisis severity. Harbor uses at least ten years of daily returns for liquid sleeves, with bootstrap resampling when history is thin.
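The ranking rule above can be sketched in a few lines. This is a minimal illustration of the rank convention described in this section, not Harbor's production code; the function name `historical_var_es` and the toy inputs are assumptions.

```python
import math

def historical_var_es(losses, alpha):
    """Historical VaR and ES with losses expressed as positive numbers.

    VaR is the loss at 1-based rank ceil(alpha * N) in ascending order;
    ES is the mean of all losses at or above that rank.
    """
    ordered = sorted(losses)
    n = len(ordered)
    # round() guards against floating-point noise such as 94.99999999999999.
    rank = math.ceil(round(alpha * n, 9))
    rank = min(max(rank, 1), n)
    tail = ordered[rank - 1:]
    return ordered[rank - 1], sum(tail) / len(tail)

# Toy data: losses 1, 2, ..., 100.
var_95, es_95 = historical_var_es(range(1, 101), 0.95)

# With 2,520 observations at 99%, the rank is 2,495, so 26 observations fall in the tail.
var_99, es_99 = historical_var_es(range(1, 2521), 0.99)
```

Other quantile conventions (interpolated ranks, or a tail of exactly (1 − α) × N observations) give slightly different numbers on small samples, so the convention should be stated alongside the result.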

Parametric (variance-covariance)

Assume multivariate normal or Student-t returns. ES has a closed form under normality; under Student-t with ν degrees of freedom it also has a closed form, and the ES/VaR ratio rises as ν falls and the tails get heavier. Fast for large books; dangerous when volatility clusters or skewed crypto legs violate elliptical assumptions. Filter returns through GARCH before applying parametric ES when variance is clearly time-varying.
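For the Gaussian case the closed form is ES = μ + σ · φ(z) / (1 − α), where z is the standard normal α-quantile and φ is the standard normal density. A minimal sketch using only the Python standard library follows; the function name and the unit-variance inputs are illustrative assumptions, and the Student-t case is not covered.

```python
from statistics import NormalDist

def gaussian_var_es(mu, sigma, alpha):
    """VaR and ES of a normally distributed loss with mean mu and std sigma."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(alpha)                      # alpha-quantile of N(0, 1)
    var = mu + sigma * z
    es = mu + sigma * std_normal.pdf(z) / (1.0 - alpha)
    return var, es

var_99, es_99 = gaussian_var_es(0.0, 1.0, 0.99)
var_975, es_975 = gaussian_var_es(0.0, 1.0, 0.975)
```

For a standard normal loss this gives a 99% VaR near 2.33 and a 99% ES near 2.67 standard deviations. The 97.5% ES comes out near 2.34, close to the 99% VaR, which is one way to see how the two confidence levels line up under normality.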

Monte Carlo simulation

Generate M correlated scenarios (often via copulas on GARCH-filtered marginals), compute portfolio P&L per scenario, sort, and take the tail mean. Flexible for option convexity and path-dependent structures that parametric models miss. The cost is compute time and model risk in the copula choice: Gaussian copulas understate joint crashes, so stress-sensitive sleeves typically use t or vine copulas instead.
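The simulate, sort, and average loop can be sketched with a deliberately simplified model. The example below assumes a two-asset linear portfolio with bivariate Gaussian or bivariate Student-t shocks (which implies a t-copula with t marginals); it omits GARCH filtering and option repricing, and every weight, volatility, and correlation is a hypothetical input.

```python
import math
import random

def simulate_losses(weights, vols, rho, nu, n_scenarios, seed):
    """Two-asset portfolio losses under a bivariate Gaussian or Student-t model.

    nu=None gives Gaussian shocks; otherwise a shared chi-square mixing
    variable turns the correlated normals into a bivariate t with nu d.f.,
    rescaled to unit variance so both models share the same volatility.
    """
    rng = random.Random(seed)
    losses = []
    for _ in range(n_scenarios):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        scale = 1.0
        if nu is not None:
            chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu d.f.
            scale = math.sqrt(nu / chi2) * math.sqrt((nu - 2.0) / nu)
        pnl = scale * (weights[0] * vols[0] * z1 + weights[1] * vols[1] * z2)
        losses.append(-pnl)  # losses as positive numbers
    return losses

def tail_mean(losses, alpha):
    """Mean of the worst (1 - alpha) fraction of simulated losses."""
    ordered = sorted(losses)
    k = max(1, round((1.0 - alpha) * len(ordered)))
    return sum(ordered[-k:]) / k

weights, vols, rho = (0.5, 0.5), (0.01, 0.02), 0.3  # hypothetical inputs
es_gauss = tail_mean(simulate_losses(weights, vols, rho, None, 50_000, seed=1), 0.99)
es_t4 = tail_mean(simulate_losses(weights, vols, rho, 4.0, 50_000, seed=1), 0.99)
```

With volatility held equal, the 4-degree-of-freedom run produces a visibly larger 99% ES than the Gaussian run, because the shared mixing variable makes both legs crash together.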

Backtesting, elicitability, and model validation

VaR admits clean traffic-light backtests (count exceptions vs expected breach rate). ES backtesting is harder: ES is not elicitable on its own, meaning no scoring function is minimized in expectation only by the correct ES forecast. Practitioners therefore rely on approximate tests (McNeil and Frey; Acerbi and Székely) that compare realized tail losses with forecast ES on exception days.

Practical validation workflow:

Run VaR exception test first; if VaR is miscalibrated, ES built on the same window is suspect.
On VaR breach days, compare realized loss to forecast ES; persistent underestimation triggers model review.
Replay known crisis windows (2008, March 2020, 2022 rates) as stress scenarios independent of the ES model.
Document liquidity horizons: ES computed on mark-to-market returns can understate risk when assets cannot be sold at marks during freezes.
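The first two steps of this workflow can be sketched as a binomial exception check followed by a crude breach-day ratio. This is not the Acerbi and Székely or McNeil and Frey test; the function names and the three breach-day losses are hypothetical, while the 28 breaches in 2,520 days and the $3.37 million ES come from the Harbor example in this guide.

```python
from math import comb

def prob_at_least(n_days, breach_prob, observed):
    """P(X >= observed) for X ~ Binomial(n_days, breach_prob)."""
    below = sum(
        comb(n_days, k) * breach_prob**k * (1.0 - breach_prob) ** (n_days - k)
        for k in range(observed)
    )
    return 1.0 - below

def tail_ratio(breach_losses, es_forecast):
    """Mean realized loss on VaR breach days divided by forecast ES."""
    return (sum(breach_losses) / len(breach_losses)) / es_forecast

# Step 1: are 28 breaches in 2,520 days plausible for a 99% VaR?
p_value = prob_at_least(2520, 0.01, 28)

# Step 2: hypothetical breach-day losses ($ millions) against a 3.37 forecast ES.
ratio = tail_ratio([2.9, 3.6, 4.1], 3.37)
```

A p-value well above conventional cutoffs suggests the breach count is consistent with calibration; a ratio that stays above 1 across many breach days points to ES underestimation and would trigger the model review described above.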

Regulatory context: FRTB and internal limits

Basel III's FRTB replaced VaR-based market-risk capital with ES at 97.5% over a ten-day horizon, with stressed calibration windows and liquidity horizons per risk factor. Even non-bank allocators mirror FRTB language in investor reports because LPs now ask for ES alongside Sharpe ratio and drawdown stats. Internal desk limits often set ES at 99% for intraday books and 97.5% for monthly investor reporting — always state horizon and confidence in the same breath as the dollar figure.

Harbor Capital tail-risk sleeve: worked example

Harbor's global macro sleeve (rates, FX, equity index futures) ran historical ES on 2,520 trading days ending Q1 2024:

One-day 99% VaR: $1.82 million (breached on 28 days in sample, slightly above the roughly 25 expected but within statistical noise).
One-day 99% historical ES: $3.37 million (mean of the 26 worst days).
Parametric Gaussian ES at 99%: $2.41 million — 28% below historical ES, flagging tail underestimation.
Monte Carlo ES with t-copula (4 d.f.): $3.12 million after GARCH filtering.

The committee set a hard stop at 1.15× historical 99% ES ($3.88 million) rather than VaR, and required the t-copula Monte Carlo ES to stay within 10% of historical ES before adding convex option overlays. When the gap between ES and VaR widened during the 2022 rates shock, the desk cut gross leverage 18% before drawdown breached the maximum drawdown mandate. ES acted as an early tail-warning gauge that VaR alone missed.
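The arithmetic behind these figures is easy to reproduce. The sketch below uses only the numbers quoted in this worked example; the variable names are illustrative, and the 15% divergence threshold is taken from the practitioner checklist later in this guide.

```python
# Figures in $ millions, taken from the worked example above.
var_99 = 1.82
es = {"historical": 3.37, "gaussian": 2.41, "monte_carlo_t": 3.12}

es_var_ratio = es["historical"] / var_99   # how far the tail average sits above VaR
hard_stop = 1.15 * es["historical"]        # committee limit at 1.15x historical ES

# Relative shortfall of each model ES against historical ES.
gap_vs_historical = {
    name: (es["historical"] - value) / es["historical"]
    for name, value in es.items()
    if name != "historical"
}

# Flag any method that diverges from historical ES by more than 15%.
flagged = [name for name, gap in gap_vs_historical.items() if abs(gap) > 0.15]
```

The ES/VaR ratio comes out near 1.85, the Gaussian ES sits roughly 28% below historical ES and is the only method flagged, and the t-copula Monte Carlo ES is about 7% below, inside the committee's 10% tolerance.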

Method decision table

Method	Best when	Watch out for
Historical ES	Liquid linear books, long history, LP reporting transparency	Short windows miss unseen crises; ghost effects from single outliers
Parametric ES	Large factor models, real-time intraday limits, normal-ish tails	Gaussian tails understate joint crashes; static covariance in crises
Monte Carlo ES	Options, structured credit, copula-dependent multi-asset books	Copula and marginal misspecification; slow convergence on 99.9% tails
Stress replay (complement)	Regime breaks, liquidity freezes, validating ES model plausibility	Not a substitute for probabilistic ES; scenario selection bias

Common pitfalls

Quoting ES without horizon and confidence: “ES is $2M” is meaningless without “one-day 97.5%.”
Using Gaussian ES on crypto or EM sleeves: kurtosis blows up tail averages; historical or t-based methods required.
Ignoring liquidity: mark-to-market ES on illiquid private credit assumes positions can be exited at marks, so it understates realizable tail losses.
Same window for VaR and ES without review: if VaR backtest fails, ES inherits the calibration error.
Treating ES as a hard loss cap: ES is an expected tail average; individual days can still exceed it.
Comparing ES across funds with different horizons: ten-day regulatory ES is not comparable to one-day allocator ES without scaling.

Practitioner checklist

State confidence level, horizon, and portfolio scope on every ES report.
Compute historical ES alongside VaR; investigate when ES/VaR ratio exceeds ~1.5.
Validate VaR exceptions before trusting ES backtests.
Use GARCH-filtered or copula-based Monte Carlo ES for options and fat-tailed legs.
Replay 2008, COVID, and rate-shock windows as stress complements.
Set internal limits on ES, not VaR alone, for tail-sensitive sleeves.
Document liquidity assumptions and haircut marks for illiquid positions.
Disclose ES methodology consistently in LP quarterly letters.
Reconcile parametric, historical, and simulated ES; flag >15% divergence.
Pair ES with drawdown and stress testing for path-dependent survival.

Key takeaways

Expected Shortfall is the average loss in the worst tail fraction — not just the VaR threshold.
ES is coherent (subadditive); VaR is not — a material reason regulators prefer ES for capital.
Historical ES is transparent; parametric ES is fast; Monte Carlo ES handles convexity and copula dependence.
ES backtesting is harder than VaR; combine approximate tail tests with crisis stress replay.
Harbor-style desks set hard stops at a multiple of ES rather than VaR when tails are fat or regimes shift.