Expected Shortfall (CVaR) explained
Harbor Capital's multi-asset sleeve reported a one-day 99% value at risk (VaR) of $1.8 million on a $120 million book in March 2024. When the desk replayed the twenty worst historical days in the prior decade, average losses on those days exceeded $3.4 million — nearly double the VaR threshold. VaR answered “how bad is the cutoff we breach 1% of the time?” but stayed silent on “how bad are the days beyond that cutoff?” Expected Shortfall (ES), also called Conditional VaR (CVaR) or average value at risk (AVaR), closes that gap: it is the expected loss given that loss exceeds VaR at the chosen confidence level. Basel's Fundamental Review of the Trading Book (FRTB) shifted bank capital rules toward ES for exactly this reason. This guide explains what ES measures, why it is a coherent risk metric while VaR is not, calculation methods (historical, parametric, Monte Carlo), backtesting and elicitability trade-offs, regulatory context, a Harbor Capital tail-risk review, a method decision table, pitfalls, and a practitioner checklist. For dependence modeling in simulation, see financial copulas; for crisis replay without parametric tails, see portfolio stress testing.
What Expected Shortfall actually means
ES at confidence level α (e.g. 97.5% or 99%) is the conditional expectation of loss in the worst (1 − α) fraction of outcomes. Take a one-day 97.5% ES of $2.1 million: on the worst 2.5% of days (roughly six trading days per year if the model is well calibrated), the average loss is $2.1 million. VaR, by contrast, only marks the threshold those days exceed.
Formally, for a loss random variable L and confidence level α ∈ (0,1):
ES_α(L) = E[L | L ≥ VaR_α(L)]
When the loss distribution is continuous, ES at α equals the average of losses in the worst (1 − α) tail, that is, beyond the α-quantile. ES is always at least as large as VaR at the same confidence level, and often materially larger when returns are fat-tailed or negatively skewed. That extra buffer is what risk committees want when sizing position limits and capital reserves.
ES vs median loss vs maximum drawdown
ES is not median loss (the 50% quantile) and not worst-case loss (the empirical minimum return). It sits between VaR and stress scenarios: a tail average that penalizes fat tails without assuming a single apocalyptic path. Pair ES with maximum drawdown for path-dependent survival analysis over multi-week crashes.
Why VaR fails coherence and ES does not
Artzner, Delbaen, Eber, and Heath (1999) defined four axioms a “good” risk measure should satisfy:
- Monotonicity: if portfolio A always loses less than B, its risk score should be lower.
- Translation invariance: adding cash reduces risk dollar-for-dollar.
- Positive homogeneity: scaling positions scales risk proportionally.
- Subadditivity: risk(A + B) ≤ risk(A) + risk(B) — diversification should not increase reported risk.
VaR can violate subadditivity for non-elliptical distributions: the VaR of a merged portfolio can exceed the sum of the stand-alone VaRs, so splitting a book can lower total reported VaR even though true tail exposure is unchanged. ES satisfies all four axioms (under mild conditions) and is therefore called a coherent risk measure. Regulators and institutional allocators treat coherence as more than academic hygiene: incoherent metrics can understate desk-level capital needs when books are split across legal entities or strategies.
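A small discrete example makes the subadditivity failure concrete. The sketch below is illustrative only: the two loans, their 4% default probability, and the helper name `discrete_var_es` are assumptions, not Harbor data. It uses the tail-integral form of ES (the average of VaR_u over u in (α, 1]), which agrees with the conditional-expectation definition when the loss distribution is continuous.

```python
from itertools import accumulate

def discrete_var_es(losses, probs, alpha):
    """VaR and ES at level alpha for a discrete loss distribution."""
    pairs = sorted(zip(losses, probs))          # ascending by loss
    cdf = list(accumulate(p for _, p in pairs))
    # VaR: smallest loss whose cumulative probability reaches alpha
    var = next(loss for (loss, _), c in zip(pairs, cdf) if c >= alpha)
    # ES: probability-weighted average of losses over the (alpha, 1] tail
    es, prev = 0.0, 0.0
    for (loss, _), c in zip(pairs, cdf):
        tail_mass = max(0.0, c - max(prev, alpha))
        es += loss * tail_mass
        prev = c
    return var, es / (1.0 - alpha)

alpha = 0.95
# Hypothetical loan: lose 100 with probability 4%, else nothing
var_a, es_a = discrete_var_es([0, 100], [0.96, 0.04], alpha)
# Two independent copies of that loan held in one book
var_ab, es_ab = discrete_var_es(
    [0, 100, 200], [0.96 * 0.96, 2 * 0.96 * 0.04, 0.04 * 0.04], alpha
)
print(var_a, var_ab)   # stand-alone VaR vs merged VaR
print(es_a, es_ab)     # stand-alone ES vs merged ES
```

Each loan alone shows a 95% VaR of 0, yet the merged book shows 100, so VaR(A + B) > VaR(A) + VaR(B). ES behaves as the axiom requires: roughly 103.2 for the merged book against 80 + 80 for the two loans held separately.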
Convex risk measures and optimization
ES is also convex in portfolio weights, which makes ES-constrained portfolio optimization tractable as a linear program when returns are scenario-based. Mean-variance optimizers ignore tail averages; ES-aware optimizers explicitly trade expected return against tail pain — a natural complement to modern portfolio theory for funds that publish tail-risk budgets to LPs.
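The linear-programming claim rests on the Rockafellar-Uryasev representation: for S equally weighted scenarios, ES equals the minimum over a threshold t of t + (1 / ((1 − α) S)) Σ max(L_s − t, 0), which is piecewise linear in t and in the portfolio weights. The sketch below only checks that identity numerically on made-up scenario losses; a full optimizer would hand the same objective to an LP solver. The scenario data, seed, and function names are illustrative assumptions.

```python
import random

def ru_objective(t, losses, alpha):
    """Rockafellar-Uryasev objective: t plus the scaled average excess over t."""
    excess = sum(max(loss - t, 0.0) for loss in losses)
    return t + excess / ((1.0 - alpha) * len(losses))

def es_via_ru(losses, alpha):
    # The objective is convex and piecewise linear with kinks at the
    # scenario losses, so a minimum is attained at one of them.
    return min(ru_objective(t, losses, alpha) for t in losses)

def es_tail_mean(losses, alpha):
    # Direct definition: mean of the worst (1 - alpha) share of scenarios.
    k = round((1.0 - alpha) * len(losses))
    return sum(sorted(losses)[-k:]) / k

rng = random.Random(7)                      # fixed seed for repeatability
scenario_losses = [rng.gauss(0.0, 1.0) for _ in range(400)]
alpha = 0.975                               # tail = 10 worst of 400 scenarios
print(es_via_ru(scenario_losses, alpha), es_tail_mean(scenario_losses, alpha))
```

The two printed numbers agree up to floating-point error. Because the max(·, 0) terms can be replaced by auxiliary variables with linear constraints, minimizing ES over weights becomes a linear program rather than a general nonlinear problem.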
Three ways to calculate ES
Historical simulation
Express the N historical P&L observations as losses (positive numbers) and sort them ascending. VaR at 97.5% is the loss at rank ceil(0.975 × N). ES is the arithmetic mean of all losses at or above that VaR rank. No distributional assumption is required; the weakness is that short windows contain few tail observations and can underestimate crisis severity. Harbor uses at least ten years of daily returns for liquid sleeves, with bootstrap resampling when history is thin.
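The rank rule above translates almost line for line into code. This is a minimal sketch, assuming losses are already signed so that positive means money lost; the function name and the synthetic sample are hypothetical.

```python
import math

def historical_var_es(losses, alpha):
    """Historical VaR and ES using the ceil(alpha * N) rank rule.

    losses: P&L expressed as losses (positive = money lost).
    Returns (var, es, n_tail).
    """
    ordered = sorted(losses)                    # ascending
    n = len(ordered)
    rank = math.ceil(alpha * n)                 # 1-indexed VaR rank
    tail = ordered[rank - 1:]                   # losses at or above the VaR rank
    return ordered[rank - 1], sum(tail) / len(tail), len(tail)

# Hypothetical sample: 2,520 daily losses valued 1, 2, ..., 2520
sample = list(range(1, 2521))
var_99, es_99, n_tail = historical_var_es(sample, 0.99)
print(var_99, es_99, n_tail)
```

With 2,520 observations at 99%, the rule lands on rank 2,495 and averages the 26 largest losses. One caveat: `alpha * n` is a floating-point product, so production code may want to round it before taking the ceiling when the product should be a whole number.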
Parametric (variance-covariance)
Assume multivariate normal or Student-t returns. ES has a closed form under normality. Under Student-t with ν degrees of freedom a closed form also exists, and the ES/VaR ratio rises as ν falls and tails get heavier. The method is fast for large books but dangerous when volatility clusters or skewed crypto legs violate elliptical assumptions. Filter returns through GARCH before applying parametric ES when variance is clearly time-varying.
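For the Gaussian case the closed form is ES_α = μ + σ · φ(z_α) / (1 − α), where z_α is the standard normal α-quantile and φ the standard normal density. A minimal sketch using only the Python standard library follows; the function name and the unit-volatility inputs are illustrative, and the loss is assumed to be a single normal variable (for a portfolio, μ and σ would come from the weights and covariance matrix).

```python
from statistics import NormalDist

def gaussian_var_es(mu, sigma, alpha):
    """Closed-form VaR and ES for a normally distributed loss L ~ N(mu, sigma^2)."""
    std = NormalDist()
    z = std.inv_cdf(alpha)                       # alpha-quantile of N(0, 1)
    var = mu + sigma * z
    es = mu + sigma * std.pdf(z) / (1.0 - alpha)
    return var, es

# Zero-mean, unit-volatility loss (illustrative inputs)
var_99, es_99 = gaussian_var_es(0.0, 1.0, 0.99)
var_975, es_975 = gaussian_var_es(0.0, 1.0, 0.975)
print(round(var_99, 3), round(es_99, 3))    # approx 2.326 2.665
print(round(var_975, 3), round(es_975, 3))  # approx 1.96 2.338
```

Under normality, 97.5% ES (about 2.34σ) sits very close to 99% VaR (about 2.33σ), which is often cited as the reason a 97.5% ES was chosen to succeed a 99% VaR. Any real gap between the two on a live book is therefore a sign of non-Gaussian tails.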
Monte Carlo simulation
Generate M correlated scenarios (often via copulas on GARCH-filtered marginals), compute portfolio P&L per scenario, sort, and take the tail mean. This is flexible enough for option convexity and path-dependent structures that parametric models miss. The cost is compute time and model risk in the copula choice: Gaussian copulas understate joint crashes, so stress-sensitive sleeves should use t or vine structures instead.
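The effect of the dependence choice can be seen in a toy simulation. The sketch below is a deliberate simplification: it skips GARCH filtering, uses a two-asset equal-weight book, and draws from a multivariate Student-t (a t-copula with t marginals) rescaled to unit variance so that only tail shape differs from the Gaussian run. The correlation, degrees of freedom, seeds, and function name are all assumptions chosen for illustration.

```python
import math
import random

def simulate_es(n_scen, alpha, rho, nu=None, seed=0):
    """Monte Carlo ES of an equal-weight two-asset book with unit-variance returns.

    nu=None draws Gaussian returns; otherwise multivariate Student-t with nu
    degrees of freedom, rescaled to unit variance (requires nu > 2).
    """
    rng = random.Random(seed)
    losses = []
    for _ in range(n_scen):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        scale = 1.0
        if nu is not None:
            # A shared chi-square mixing variable creates joint tail dependence
            chi2 = rng.gammavariate(nu / 2.0, 2.0)
            scale = math.sqrt(nu / chi2) * math.sqrt((nu - 2.0) / nu)
        portfolio_return = 0.5 * scale * (z1 + z2)
        losses.append(-portfolio_return)
    losses.sort()
    tail = losses[math.ceil(alpha * n_scen) - 1:]   # rank rule as in historical ES
    return sum(tail) / len(tail)

es_gauss = simulate_es(20_000, 0.99, rho=0.5)
es_t4 = simulate_es(20_000, 0.99, rho=0.5, nu=4, seed=1)
print(es_gauss, es_t4)
```

Both runs have the same portfolio variance, yet the t run with 4 degrees of freedom produces a clearly larger 99% ES. That gap is the model risk hiding in the dependence and tail assumptions.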
Backtesting, elicitability, and model validation
VaR admits clean traffic-light backtests (count exceptions vs expected breach rate). ES backtesting is harder: ES is not elicitable on its own, meaning no scoring function is minimized in expectation exactly by the correct ES forecast (it is only jointly elicitable with VaR). Regulators therefore use approximate tests (McNeil and Frey; Acerbi and Székely) that compare realized tail losses to forecast ES on exception days.
Practical validation workflow:
- Run VaR exception test first; if VaR is miscalibrated, ES built on the same window is suspect.
- On VaR breach days, compare realized loss to forecast ES; persistent underestimation triggers model review.
- Replay known crisis windows (2008, March 2020, 2022 rates) as stress scenarios independent of the ES model.
- Document liquidity horizons — ES on mark-to-market returns can shrink when assets cannot be sold at marks during freezes.
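The first two steps of the workflow can be sketched as a single diagnostic. This is not the formal McNeil-Frey or Acerbi-Székely statistic, only a simplified screen: it counts VaR exceptions and averages the ratio of realized loss to forecast ES on breach days. The function name, the output keys, and the toy inputs are hypothetical.

```python
def tail_backtest(losses, var_forecasts, es_forecasts, alpha):
    """Simplified diagnostic: VaR exception count plus realized/forecast ES ratio."""
    breaches = [
        (loss, es)
        for loss, var, es in zip(losses, var_forecasts, es_forecasts)
        if loss > var
    ]
    expected = (1.0 - alpha) * len(losses)
    # Average realized loss on breach days relative to that day's ES forecast;
    # values persistently above 1 hint that ES is being underestimated.
    ratio = (
        sum(loss / es for loss, es in breaches) / len(breaches)
        if breaches else None
    )
    return {"exceptions": len(breaches), "expected": expected, "shortfall_ratio": ratio}

# Toy inputs (hypothetical): five days with constant forecasts
result = tail_backtest(
    losses=[0.5, 2.5, 1.0, 3.0, 0.2],
    var_forecasts=[2.0] * 5,
    es_forecasts=[2.5] * 5,
    alpha=0.975,
)
print(result)
```

In the toy data two days breach VaR and realized losses average about 1.1 times the ES forecast. A real validation would need a far longer sample and a significance test before triggering a model review.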
Regulatory context: FRTB and internal limits
Basel III's FRTB replaced VaR-based market-risk capital with ES at 97.5% over a ten-day horizon, with stressed calibration windows and liquidity horizons per risk factor. Even non-bank allocators mirror FRTB language in investor reports because LPs now ask for ES alongside Sharpe ratio and drawdown stats. Internal desk limits often set ES at 99% for intraday books and 97.5% for monthly investor reporting — always state horizon and confidence in the same breath as the dollar figure.
Harbor Capital tail-risk sleeve: worked example
Harbor's global macro sleeve (rates, FX, equity index futures) ran historical ES on 2,520 trading days ending Q1 2024:
- One-day 99% VaR: $1.82 million (breached on 28 days in sample, slightly above the roughly 25 expected but within statistical noise).
- One-day 99% historical ES: $3.37 million (mean of the 26 worst days).
- Parametric Gaussian ES at 99%: $2.41 million — 28% below historical ES, flagging tail underestimation.
- Monte Carlo ES with t-copula (4 d.f.): $3.12 million after GARCH filtering.
The committee set a hard stop at 1.15× historical 99% ES ($3.88 million) rather than VaR, and required the t-copula Monte Carlo ES to stay within 10% of historical ES before adding convex option overlays. When the gap between ES and VaR widened during the 2022 rates shock, the desk cut gross leverage 18% before drawdown breached the maximum drawdown mandate. ES acted as an early tail-warning gauge that VaR alone missed.
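The committee's numbers can be re-derived in a few lines. The sketch below takes the figures quoted in the example as inputs; the binomial benchmark for the breach count assumes independent breach days, which is a simplification added here rather than something the example states.

```python
import math

n_days, alpha = 2520, 0.99
es_hist, es_mc = 3.37, 3.12                     # $ millions, from the example
breaches_observed = 28

# Hard stop set at 1.15x historical ES
hard_stop = 1.15 * es_hist

# Monte Carlo ES must sit within 10% of historical ES
mc_gap = abs(es_mc - es_hist) / es_hist
mc_within_band = mc_gap <= 0.10

# Breach count versus a binomial benchmark (assumes independent breach days)
expected = n_days * (1 - alpha)
std_dev = math.sqrt(n_days * alpha * (1 - alpha))
z_score = (breaches_observed - expected) / std_dev

print(round(hard_stop, 2), round(mc_gap, 3), mc_within_band, round(z_score, 2))
```

The hard stop comes out near $3.88 million, the Monte Carlo figure sits about 7% below historical ES, and 28 breaches is roughly half a standard deviation above the 25.2 expected, consistent with the "within statistical noise" reading.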
Method decision table
| Method | Best when | Watch out for |
|---|---|---|
| Historical ES | Liquid linear books, long history, LP reporting transparency | Short windows miss unseen crises; ghost effects from single outliers |
| Parametric ES | Large factor models, real-time intraday limits, normal-ish tails | Gaussian tails understate joint crashes; static covariance in crises |
| Monte Carlo ES | Options, structured credit, copula-dependent multi-asset books | Copula and marginal misspecification; slow convergence on 99.9% tails |
| Stress replay (complement) | Regime breaks, liquidity freezes, validating ES model plausibility | Not a substitute for probabilistic ES; scenario selection bias |
Common pitfalls
- Quoting ES without horizon and confidence: “ES is $2M” is meaningless without “one-day 97.5%.”
- Using Gaussian ES on crypto or EM sleeves: excess kurtosis means a Gaussian fit understates tail averages; use historical or t-based methods instead.
- Ignoring liquidity: mark-to-market ES on illiquid private credit understates losses because it assumes positions can be exited at the marks.
- Same window for VaR and ES without review: if VaR backtest fails, ES inherits the calibration error.
- Treating ES as a hard loss cap: ES is an expected tail average; individual days can still exceed it.
- Comparing ES across funds with different horizons: ten-day regulatory ES is not comparable to one-day allocator ES without scaling.
Practitioner checklist
- State confidence level, horizon, and portfolio scope on every ES report.
- Compute historical ES alongside VaR; investigate when ES/VaR ratio exceeds ~1.5.
- Validate VaR exceptions before trusting ES backtests.
- Use GARCH-filtered or copula-based Monte Carlo ES for options and fat-tailed legs.
- Replay 2008, COVID, and rate-shock windows as stress complements.
- Set internal limits on ES, not VaR alone, for tail-sensitive sleeves.
- Document liquidity assumptions and haircut marks for illiquid positions.
- Disclose ES methodology consistently in LP quarterly letters.
- Reconcile parametric, historical, and simulated ES; flag >15% divergence.
- Pair ES with drawdown and stress testing for path-dependent survival.
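Two of the checklist thresholds (the ES/VaR ratio of about 1.5 and the 15% divergence between methods) are easy to automate. The sketch below is one possible encoding: the function name, the choice of historical ES as the reference for divergence, and the default limits as hard cutoffs are assumptions.

```python
def es_flags(var, es_by_method, ratio_limit=1.5, divergence_limit=0.15):
    """Flag checklist thresholds; es_by_method must include a 'historical' entry."""
    base = es_by_method["historical"]
    flags = []
    if base / var > ratio_limit:
        flags.append(f"ES/VaR ratio {base / var:.2f} exceeds {ratio_limit}")
    for method, es in es_by_method.items():
        gap = abs(es - base) / base
        if gap > divergence_limit:
            flags.append(f"{method} ES diverges {gap:.0%} from historical")
    return flags

# Figures from the Harbor example, in $ millions
flags = es_flags(
    var=1.82,
    es_by_method={"historical": 3.37, "parametric": 2.41, "monte_carlo": 3.12},
)
for flag in flags:
    print(flag)
```

On the Harbor figures this raises two flags: the ES/VaR ratio of about 1.85, and the parametric estimate sitting roughly 28% below historical ES. The Monte Carlo estimate stays inside the 15% band.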
Key takeaways
- Expected Shortfall is the average loss in the worst tail fraction — not just the VaR threshold.
- ES is coherent (subadditive); VaR is not — a material reason regulators prefer ES for capital.
- Historical ES is transparent; parametric ES is fast; Monte Carlo ES handles convexity and copula dependence.
- ES backtesting is harder than VaR; combine approximate tail tests with crisis stress replay.
- Harbor-style desks set hard stops on multiples of ES rather than on VaR when tails are fat or regimes shift.
Related reading
- Value at Risk (VaR) explained — quantile thresholds, three VaR methods, and where VaR stops
- Financial copulas explained — tail dependence for Monte Carlo ES simulation
- Portfolio stress testing explained — crisis replay when ES models break
- GARCH volatility modeling explained — filtering returns before parametric ES