
Sterling ratio explained

Harbor Capital's managed-futures sleeve posted a respectable Calmar ratio of 0.58 over three years, until a diligence analyst recomputed risk-adjusted performance using the Sterling ratio. The fund had suffered one sharp −14% drawdown early in the window (Calmar's denominator) but spent most of the period grinding 3–8% below its high-water mark in shallow, persistent dips. Average drawdown was −9.2%, against a maximum drawdown of only −14%. Measured by Sterling, which divides excess return by average drawdown rather than by the single worst peak-to-trough loss, the sleeve no longer looked Calmar-friendly: its score failed Harbor's 0.45 floor. The committee trimmed the allocation.

The Sterling ratio, usually attributed to Deane Sterling Jones and associated with commodity trading advisor (CTA) evaluation since the 1980s, answers one question: how much excess return does this strategy earn per unit of typical underwater pain?

This guide defines average drawdown, walks through the Sterling formula, contrasts Sterling with Calmar, Sharpe, and Sortino, works a Harbor Capital sleeve example, provides a metric decision table, lists pitfalls, and ends with an allocator checklist.

Average drawdown vs maximum drawdown

Every performance metric that adjusts returns for drawdown must decide how to summarize the equity curve's dips. Two common choices:

  • Maximum drawdown (MDD) — the single worst peak-to-trough percentage decline in the window. Used by Calmar and many risk dashboards.
  • Average drawdown — the arithmetic mean of all drawdown observations while the portfolio is below its running high-water mark (or the mean of completed peak-to-trough episodes, depending on vendor definition).

Maximum drawdown is intuitive and conservative: one bad month can dominate the denominator for years. Average drawdown captures chronic underwater time — a fund that never crashes −40% but lives permanently 5–10% below peak looks fine on Calmar if its worst episode was modest, yet still delivers a miserable investor experience. Sterling penalizes that pattern.

How average drawdown is computed

A common implementation: at each observation date (daily or monthly), compute drawdown as (current equity / running peak) − 1. Average drawdown is the mean of those values over periods where drawdown < 0 (or the mean across all dates, which pulls toward zero when the fund is at new highs). Some databases instead average only completed drawdown episodes from peak to recovery — always confirm the vendor methodology before comparing funds.
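The computation described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's method: the function names and the equity values are hypothetical, and the `underwater_only` flag simply switches between the two averaging conventions mentioned in the text.

```python
from itertools import accumulate

def drawdown_series(equity):
    """Drawdown at each date: (equity / running peak) - 1, always <= 0."""
    peaks = accumulate(equity, max)  # running high-water mark
    return [e / p - 1.0 for e, p in zip(equity, peaks)]

def average_drawdown(equity, underwater_only=True):
    """Mean drawdown over underwater dates only, or over all dates."""
    dd = drawdown_series(equity)
    if underwater_only:
        dd = [d for d in dd if d < 0]
    return sum(dd) / len(dd) if dd else 0.0

# Hypothetical equity curve, for illustration only.
equity = [100, 105, 98, 101, 110, 104, 112]
dd = drawdown_series(equity)
max_dd = min(dd)
avg_underwater = average_drawdown(equity)
avg_all_dates = average_drawdown(equity, underwater_only=False)

print(f"max drawdown:          {max_dd:.2%}")          # -6.67%
print(f"avg (underwater only): {avg_underwater:.2%}")  # -5.31%
print(f"avg (all dates):       {avg_all_dates:.2%}")   # -2.28%
```

On this toy curve the all-dates mean is less than half the underwater-only mean, which is the "pulls toward zero" effect in practice.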

For deeper treatment of peak-to-trough mechanics and recovery math, see maximum drawdown explained.

The Sterling ratio formula

The classical Sterling ratio over a lookback period (often 36 months):

Sterling ratio = (annualized return − risk-free rate) / |average drawdown|

Both numerator and denominator must cover the same window. The risk-free rate is typically Treasury bills matched to the measurement horizon. Some practitioners omit the risk-free subtraction and report “return / average drawdown” — label it clearly; the two are not interchangeable across databases.

Worked arithmetic

Suppose a CTA delivers a 7.2% annualized return over three years against a 4.5% annualized T-bill rate. Excess return = 7.2% − 4.5% = 2.7%. Average drawdown over the window = −8.5%. Sterling ≈ 2.7 / 8.5 = 0.32. A peer with the same excess return but an average drawdown of −5.0% scores Sterling ≈ 2.7 / 5.0 = 0.54: the same excess return, with less depth underwater on average.
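The same arithmetic as a small Python function, using the figures from the paragraph above. The function name `sterling_ratio` is an illustrative choice, and inputs are assumed to be decimals (0.072 for 7.2%).

```python
def sterling_ratio(annualized_return, risk_free_rate, average_drawdown):
    """(annualized return - risk-free rate) / |average drawdown|, inputs as decimals."""
    if average_drawdown == 0:
        raise ValueError("average drawdown is zero; Sterling is undefined")
    return (annualized_return - risk_free_rate) / abs(average_drawdown)

cta = sterling_ratio(0.072, 0.045, -0.085)
peer = sterling_ratio(0.072, 0.045, -0.050)
print(f"CTA  Sterling: {cta:.2f}")   # 0.32
print(f"Peer Sterling: {peer:.2f}")  # 0.54
```

Taking the absolute value of the drawdown means the function accepts either sign convention, which helps when vendors report drawdowns as positive numbers.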

Sterling vs Burke ratio

The Burke ratio (sometimes confused with Sterling) divides excess return by the square root of the sum of squared drawdowns. It weights large dips more than small ones but less than Calmar's single-max approach. Sterling sits between Sharpe (volatility in the denominator) and Calmar (worst drawdown) on the sensitivity spectrum. When a database reports “Sterling,” verify it is not actually Burke or a Calmar variant with a relabeled name.
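To see how the denominators differ, the sketch below computes the Sterling-style, Burke-style, and maximum-drawdown denominators from one set of drawdowns. The three episode depths and the 2.7% excess return are hypothetical, and which drawdowns enter the Burke sum (all episodes, the largest few, or periodic observations) varies by source, so treat the input list as an assumption.

```python
import math

def sterling_denominator(drawdowns):
    """Mean absolute drawdown."""
    return sum(abs(d) for d in drawdowns) / len(drawdowns)

def burke_denominator(drawdowns):
    """Square root of the sum of squared drawdowns."""
    return math.sqrt(sum(d * d for d in drawdowns))

def max_drawdown_denominator(drawdowns):
    """Largest absolute drawdown, as used by Calmar."""
    return max(abs(d) for d in drawdowns)

# Hypothetical drawdown episode depths and excess return, as decimals.
episodes = [-0.03, -0.04, -0.12]
excess_return = 0.027

sterling_den = sterling_denominator(episodes)   # about 0.063
burke_den = burke_denominator(episodes)         # about 0.13
max_den = max_drawdown_denominator(episodes)    # 0.12

print(f"Sterling: {excess_return / sterling_den:.2f}")
print(f"Burke:    {excess_return / burke_den:.2f}")
```

In this toy case the single −12% episode contributes most of the Burke denominator but only a third of the weight in the Sterling average, which is why a relabeled metric can shift a fund's score materially.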

How to interpret Sterling values

Like Calmar, Sterling has no universal theoretical distribution. Allocator heuristics from CTA and hedge fund databases:

  • Below 0.20 — excess return is thin relative to typical underwater depth; chronic drawdown drag likely dominates.
  • 0.20 – 0.40 — acceptable for diversifying alternatives in a multi-asset book; common for equity-heavy strategies after extended sideways markets.
  • 0.40 – 0.70 — strong average-drawdown efficiency; typical of well-run trend followers with controlled leverage.
  • Above 0.70 — exceptional on paper; verify window length, fee treatment, and whether average drawdown was computed on daily vs monthly data.

Sterling above 0.70 over a window that missed a full commodity cycle or rates shock is a marketing snapshot, not structural evidence. Pair any headline number with rolling 36-month charts and net-of-fee returns.
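A rolling 36-month Sterling series can be sketched as follows. Several choices here are assumptions rather than a standard: the running peak restarts at the beginning of each window, average drawdown is the mean of underwater months only, the risk-free rate is a constant monthly figure, and the return series is synthetic.

```python
def rolling_sterling(monthly_returns, monthly_rf=0.0, window=36):
    """Sterling ratio for each trailing `window`-month span (None if never underwater)."""
    results = []
    for end in range(window, len(monthly_returns) + 1):
        equity, peak, underwater = 1.0, 1.0, []
        for r in monthly_returns[end - window:end]:
            equity *= 1.0 + r
            peak = max(peak, equity)
            if equity < peak:
                underwater.append(equity / peak - 1.0)
        ann_return = equity ** (12.0 / window) - 1.0   # compound annualized return
        ann_rf = (1.0 + monthly_rf) ** 12 - 1.0
        if underwater:
            avg_dd = sum(underwater) / len(underwater)
            results.append((ann_return - ann_rf) / abs(avg_dd))
        else:
            results.append(None)
    return results

# Hypothetical monthly net returns: a repeating six-month pattern, 48 months.
returns = [0.02, -0.01, 0.015, -0.02, 0.01, 0.005] * 8
series = rolling_sterling(returns, monthly_rf=0.002)
print(len(series))  # 13 rolling windows
```

Plotting `series` against window end dates gives the rolling chart; a headline figure that sits well above most of the rolling values is the "marketing snapshot" case.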

When Sterling beats Calmar

Use Sterling when you care about persistent underwater periods, not just the single worst crash. A fund with one −25% event and quick recovery can show worse Calmar than a fund with endless −8% grinds — Sterling reverses that ranking. Use Calmar when mandate language references maximum loss tolerance or regulatory drawdown limits tied to peak-to-trough.
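The ranking reversal can be reproduced on two toy equity curves. Everything below is hypothetical: the month-end values, the 4% risk-free rate, and the choice of the all-dates mean as the average-drawdown convention.

```python
from itertools import accumulate

def drawdowns(equity):
    """(equity / running peak) - 1 at each observation."""
    return [e / p - 1.0 for e, p in zip(equity, accumulate(equity, max))]

def calmar_and_sterling(equity, ann_return, risk_free):
    dd = drawdowns(equity)
    max_dd = min(dd)
    avg_dd = sum(dd) / len(dd)          # all-dates mean (one of several conventions)
    calmar = ann_return / abs(max_dd)   # raw return over maximum drawdown
    sterling = (ann_return - risk_free) / abs(avg_dd)
    return calmar, sterling

# Hypothetical month-end equity values; both funds gain 10% over 12 months.
fund_a = [100, 102, 104, 78, 90, 104, 105, 106, 107, 108, 109, 110, 110]    # one -25% event
fund_b = [100, 109, 100, 100, 100, 101, 100, 100, 101, 100, 100, 101, 110]  # long grind near -8%

calmar_a, sterling_a = calmar_and_sterling(fund_a, 0.10, 0.04)
calmar_b, sterling_b = calmar_and_sterling(fund_b, 0.10, 0.04)
print(f"A: Calmar {calmar_a:.2f}, Sterling {sterling_a:.2f}")
print(f"B: Calmar {calmar_b:.2f}, Sterling {sterling_b:.2f}")
```

Fund A ranks lower on Calmar and higher on Sterling. The reversal here depends on the all-dates mean: averaging only the underwater months would give fund A the deeper average drawdown, which is one more reason to confirm the averaging convention before comparing funds.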

Sterling vs Calmar, Sharpe, and Sortino

Metric | Numerator | Denominator | Best when
Sterling | Excess return vs risk-free | Average drawdown | CTA due diligence, chronic underwater detection, multi-episode pain
Calmar | Annualized return | Maximum drawdown | Crisis tail tolerance, worst-case loss budgeting
Sharpe | Excess return vs risk-free | Total volatility (σ) | Broad fund comparison, mean-variance inputs
Sortino | Return above MAR hurdle | Downside deviation | Asymmetric profiles, spending-floor mandates

A strategy can show Sharpe 1.0 and Sterling 0.25 when its returns are volatile but symmetric and its equity curve still spends half the sample 10% below peak. Conversely, a positive-skew CTA with rare large wins and frequent small losses may look mediocre on Sharpe but strong on Sterling if average drawdown stays shallow between trends. Pair Sterling with capture ratios to see bull/bear participation alongside underwater time.

Worked example: Harbor Capital managed-futures sleeve

Harbor Capital allocates 10% to a trend-following CTA within its endowment diversifiers bucket. Over the 36 months ending Q1 2026 (net of fees):

  • Annualized return: +6.4%
  • Risk-free (3-month T-bills, annualized): +4.8%
  • Excess return: +1.6%
  • Maximum drawdown: −12.8% (single 2024 correlation spike)
  • Average drawdown: −7.4% (frequent shallow dips between trends)
  • Calmar ≈ 6.4 / 12.8 = 0.50
  • Sterling ≈ 1.6 / 7.4 = 0.22

Calmar cleared Harbor's 0.40 alternative hurdle; Sterling did not. Rolling analysis showed the fund spent 62% of trading days more than 4% below its high-water mark — acceptable for a crisis diversifier in theory, but the excess return per unit of typical pain was too thin for a 10% weight. The committee reduced allocation to 6% and added a parallel sleeve with Sterling 0.41 on the same window.
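A time-underwater statistic like the 62% figure can be computed with a short helper. This is a sketch on made-up data, not Harbor's series; the function name and the equity values are hypothetical, and the 4% threshold mirrors the one quoted above.

```python
def share_below_hwm(equity, threshold=0.04):
    """Fraction of observations more than `threshold` below the running high-water mark."""
    peak, hits = equity[0], 0
    for e in equity:
        peak = max(peak, e)
        if e / peak - 1.0 < -threshold:
            hits += 1
    return hits / len(equity)

# Hypothetical daily equity values, for illustration only.
equity = [100, 103, 97, 98, 99.5, 104, 99, 98, 101, 105]
share_4 = share_below_hwm(equity, threshold=0.04)
share_5 = share_below_hwm(equity, threshold=0.05)
print(f"{share_4:.0%} of days more than 4% below the high-water mark")  # 40%
print(f"{share_5:.0%} of days more than 5% below the high-water mark")  # 20%
```

Reporting the share at more than one threshold shows how sensitive the statistic is to the cutoff chosen.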

Implementation note

Harbor's performance team computes average drawdown on daily net returns with drawdown measured against a monthly high-water mark updated at month-end. Monthly data would understate average drawdown for a daily-traded CTA. Document frequency, fee gross/net treatment, and whether the running peak resets after redemptions.
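One way to read that description is sketched below: equity compounds daily, but the high-water mark only moves on the last trading day of each month. The month labels, the returns, and the decision to clamp intra-month gains above the mark to zero drawdown are all assumptions made for illustration, not Harbor's actual code.

```python
def average_drawdown_month_end_hwm(months, daily_returns):
    """Mean underwater drawdown on daily data against a high-water mark updated at month-end.

    months[i] is the month label (e.g. "2025-01") of daily_returns[i].
    """
    equity, hwm, underwater = 1.0, 1.0, []
    for i, r in enumerate(daily_returns):
        equity *= 1.0 + r
        dd = min(equity / hwm - 1.0, 0.0)  # gains above the mark count as zero drawdown
        if dd < 0:
            underwater.append(dd)
        is_month_end = i + 1 == len(months) or months[i + 1] != months[i]
        if is_month_end:
            hwm = max(hwm, equity)
    return sum(underwater) / len(underwater) if underwater else 0.0

# Hypothetical data: two months of four trading days each.
months = ["2025-01"] * 4 + ["2025-02"] * 4
daily_returns = [0.01, -0.03, 0.01, 0.02, -0.02, -0.01, 0.02, 0.015]
avg_dd = average_drawdown_month_end_hwm(months, daily_returns)
print(f"average drawdown: {avg_dd:.2%}")  # -1.82%
```

In this toy series both month-end equity values are new highs, so sampling only at month-end would record no drawdown at all, while the daily series averages about −1.8% on underwater days.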

Decision table: when Sterling is the right metric

Your question | Start here | Also check
Does this CTA earn enough excess return for how often it sits underwater? | 36-month Sterling net of fees vs 0.35 hurdle | Time below high-water mark, Calmar for tail
Fund looks good on Calmar but feels painful to hold | Sterling and average drawdown duration | Rolling underwater charts, investor redemption data
Compare two CTAs with similar max drawdown | Sterling on identical window and frequency | Upside/downside capture, worst monthly loss
Equity alternative with long sideways grind | Sterling vs Sharpe divergence | Sharpe ratio, information ratio vs benchmark
Retiree spending floor | Sterling on withdrawal-adjusted portfolio | Sortino, sequence-of-returns risk

Common pitfalls

  • Vendor definition mismatch. The mean of daily underwater observations and the mean of completed episodes produce different Sterling values; never compare across databases blindly.
  • Omitting risk-free rate. Some reports use raw return / average drawdown; confirm before benchmarking against a Sterling hurdle that includes T-bills.
  • Monthly data on daily strategies. CTAs and HFT-adjacent funds need daily or weekly drawdown sampling; monthly smooths average drawdown artificially.
  • Short windows. Three years with one regime can flatter Sterling; require a full commodity or rates cycle when possible.
  • Ignoring fees. Gross Sterling overstates what limited partners keep.
  • Conflating Sterling with Burke. Burke's squared-drawdown denominator punishes large dips differently; label metrics correctly in IC memos.
  • Survivorship bias. Dead funds that lingered underwater before liquidation disappear from databases, leaving a surviving sample whose Sterling ratios look inflated.
  • Redemption and subscription timing. High-water marks that ignore mid-month subscriptions and redemptions distort average drawdown for open-end funds.
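The vendor definition mismatch in the first pitfall is easy to demonstrate. The sketch below computes average drawdown two ways on the same hypothetical curve: as the mean trough depth of completed episodes, and as the mean of every underwater observation. The function names and data are illustrative, and treating a return to the prior peak as "recovery" is an assumption.

```python
def completed_episode_depths(equity):
    """Trough depth of each drawdown episode that has fully recovered to a new or equal peak."""
    depths, peak, trough = [], equity[0], 0.0
    for e in equity[1:]:
        if e >= peak:
            if trough < 0:
                depths.append(trough)  # episode closed: record its deepest point
            peak, trough = e, 0.0
        else:
            trough = min(trough, e / peak - 1.0)
    return depths  # an episode still open at the end of the data is excluded

def underwater_observations(equity):
    """Every observation's drawdown while below the running peak."""
    obs, peak = [], equity[0]
    for e in equity:
        peak = max(peak, e)
        if e < peak:
            obs.append(e / peak - 1.0)
    return obs

# Hypothetical equity curve ending in an unrecovered dip.
equity = [100, 105, 98, 101, 110, 104, 112, 108]
episodes = completed_episode_depths(equity)
observations = underwater_observations(equity)
avg_by_episode = sum(episodes) / len(episodes)
avg_by_observation = sum(observations) / len(observations)
print(f"completed episodes:      {avg_by_episode:.2%}")      # -6.06%
print(f"underwater observations: {avg_by_observation:.2%}")  # -4.88%
```

The same curve yields two different denominators, so two databases could report different Sterling values for an identical track record.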

Allocator checklist

  • Compute Sterling over at least 36 months of net daily or monthly returns.
  • Subtract a risk-free rate matched to the window (document the source).
  • Define average drawdown methodology and keep it consistent across managers.
  • Report Sterling alongside Calmar — tail risk and chronic pain need both lenses.
  • Plot rolling 36-month Sterling to detect regime deterioration.
  • Measure percentage of time spent more than 5% below high-water mark.
  • Cross-check with Sharpe, Sortino, and capture ratios for a full picture.
  • Confirm gross vs net returns and whether leverage is embedded in the track.
  • Recompute after strategy changes, fee revisions, or key-person departures.
  • Document frequency (daily vs monthly) in manager diligence files.

Key takeaways

  • Sterling divides excess return by average drawdown, penalizing chronic underwater time rather than only the worst peak-to-trough loss.
  • It complements Calmar: use Sterling when persistent shallow drawdowns matter; use Calmar when maximum loss tolerance is the binding constraint.
  • CTA and managed-futures allocators pioneered Sterling; it remains underused outside alternatives despite revealing grind that Sharpe and Calmar can miss.
  • Methodology details (daily vs monthly, completed episodes vs all underwater observations) dominate cross-fund comparisons.
  • Pair Sterling with capture ratios, maximum drawdown, and Sortino for complete manager due diligence.
