LLM RAG document freshness decay explained
Harbor HR’s employee policy bot indexed every handbook revision since 2019. When someone asked “How many PTO days do new hires get?” dense retrieval returned three near-identical chunks: the 2026 policy (15 days), the 2024 policy (12 days), and a 2021 archive (10 days). Cosine similarity scores clustered at 0.84, 0.83, and 0.82 — the model synthesized 12 days because that chunk had slightly richer wording. On 120 freshness-sensitive probes, outdated-policy answers hit 34% even though the correct 2026 document was always in the index.
Incremental sync was healthy; embeddings were current. The failure was ranking: semantic similarity alone treats a retired policy and its replacement as interchangeable. Engineers added document freshness decay at query time: a monotonic boost derived from effective_date metadata, plus hard supersession rules when supersedes_doc_id links exist. Outdated-policy answers fell to 8%; recall@5 on time-sensitive probes rose from 61% to 89%.
This guide covers decay functions, version graphs, query-time versus index-time boosting, interaction with incremental index updates, the Harbor HR refactor, a decision table comparing decay with hard date filters and clarification gates, pitfalls, and a production checklist. It complements metadata filtering and RAG evaluation for teams where wrong-era answers carry real compliance cost.
Why semantic similarity ignores document age
Embedding models optimize for paraphrase similarity, not temporal authority. A superseded travel-expense policy and its replacement share vocabulary, structure, and section headings — often producing vectors closer to each other than to unrelated current docs. Without an explicit freshness signal, top-k retrieval becomes a lottery among versions.
This is separate from stale vectors (old text still embedded after a source edit). Freshness decay addresses stale authority: multiple valid versions coexist in the index because archives are intentionally retained for audit, legal hold, or employee grandfathering. You need ranking logic that prefers the authoritative edition without deleting history.
Freshness metadata every chunk should carry
Decay functions are only as good as timestamps and version links. Minimum viable payload fields:
- effective_date: when the policy or doc became authoritative (ISO 8601 date, not file mtime).
- expires_at (optional): hard sunset for time-bounded notices; null means open-ended until superseded.
- doc_version: monotonic integer or semver string for human debugging.
- supersedes_doc_id / superseded_by: explicit version-graph edges; stronger than date alone when backdated corrections ship.
- status: active, deprecated, or archived; exclude archived rows from default retrieval with pre-filters.
- audience_scope (optional): region, role, or cohort when grandfathered rules apply to subsets only.
Ingest pipelines should reject chunks missing effective_date for policy-class sources. File modification time from SharePoint or S3 is a poor proxy: editors touch formatting without changing policy substance.
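A minimal sketch of that payload and the ingest check, in Python. The class name, the POLICY_CLASS_SOURCES set, and the extra sanity checks (unknown status, expiry before effective date) are illustrative assumptions, not part of any particular vector store's schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical: source types that must carry an effective_date.
POLICY_CLASS_SOURCES = {"hr_policy", "travel_policy", "security_bulletin"}
VALID_STATUSES = {"active", "deprecated", "archived"}


@dataclass
class ChunkMetadata:
    doc_id: str
    source_type: str
    effective_date: Optional[date]            # when it became authoritative, not file mtime
    status: str = "active"                    # active | deprecated | archived
    doc_version: Optional[str] = None         # integer or semver, for humans
    expires_at: Optional[date] = None         # None = open-ended until superseded
    supersedes_doc_id: Optional[str] = None   # version-graph edge (backward)
    superseded_by: Optional[str] = None       # version-graph edge (forward)
    audience_scope: Optional[str] = None      # region / role / cohort


def ingest_problems(meta: ChunkMetadata) -> list[str]:
    """Return reasons to reject the chunk; an empty list means it can be indexed."""
    problems = []
    if meta.source_type in POLICY_CLASS_SOURCES and meta.effective_date is None:
        problems.append("missing effective_date for policy-class source")
    if meta.status not in VALID_STATUSES:
        problems.append(f"unknown status {meta.status!r}")
    if meta.expires_at and meta.effective_date and meta.expires_at < meta.effective_date:
        problems.append("expires_at precedes effective_date")
    return problems


legacy_pdf = ChunkMetadata("pto-2019", "hr_policy", effective_date=None)
print(ingest_problems(legacy_pdf))  # ['missing effective_date for policy-class source']
```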
Decay functions: how to score recency
Let age_days = max(0, today - effective_date) and s be the base similarity score from dense or hybrid retrieval. Apply a multiplicative freshness factor f(age) in [0, 1]:
Exponential decay (default for policies)
f(age) = exp(-λ · age_days), with half-life t½ = ln(2) / λ. For example, λ = 0.01 gives a half-life of about 69 days: a 2024 chunk at age 730 days gets factor 0.0007, while a 2026 chunk at age 30 days gets 0.74. Final score: s' = s × f(age). Tune half-life per corpus: HR policies change yearly, while security bulletins need t½ < 14 days.
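The arithmetic above can be checked with a few lines of Python. The function names are illustrative; the λ = 0.01 input and the ages come from the example in the text.

```python
import math


def decay_rate(half_life_days: float) -> float:
    """λ = ln(2) / t½."""
    return math.log(2) / half_life_days


def exp_freshness(age_days: float, lam: float) -> float:
    """f(age) = exp(-λ · age_days); future-dated chunks are clamped to age 0."""
    return math.exp(-lam * max(0.0, age_days))


def rescore(base_score: float, age_days: float, lam: float) -> float:
    """s' = s × f(age)."""
    return base_score * exp_freshness(age_days, lam)


lam = 0.01
print(round(math.log(2) / lam, 1))          # 69.3 (half-life in days)
print(round(exp_freshness(730, lam), 4))    # 0.0007 (2024 chunk)
print(round(exp_freshness(30, lam), 2))     # 0.74 (2026 chunk)
```

To target a half-life directly, such as the 90 days used in the Harbor HR refactor, pass decay_rate(90) as lam instead of picking λ by hand.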
Step / plateau decay (regulatory corpora)
Full weight while status = active; zero weight (or exclusion from results) when deprecated. Use this when legal mandates exactly one current edition and archives must never surface in default answers.
Linear ramp (news and release notes)
f(age) = max(0, 1 - age_days / H) for horizon H (e.g. 180 days). Simpler to explain to stakeholders: weight falls at a constant rate and reaches exactly zero at H, whereas exponential decay drops fastest at first and never quite reaches zero.
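Step and linear decay are one-liners; this sketch shows both, plus the "exclude" variant of step decay. The dict-based candidate shape and the function names are assumptions for illustration.

```python
def step_freshness(status: str) -> float:
    """Plateau decay: full weight while active, zero once deprecated or archived."""
    return 1.0 if status == "active" else 0.0


def linear_freshness(age_days: float, horizon_days: float = 180.0) -> float:
    """f(age) = max(0, 1 - age_days / H); future-dated chunks are clamped to age 0."""
    return max(0.0, 1.0 - max(0.0, age_days) / horizon_days)


def exclude_inactive(candidates: list[dict]) -> list[dict]:
    """Exclusion form of step decay: drop zero-weight chunks instead of scoring them."""
    return [c for c in candidates if step_freshness(c["status"]) > 0.0]


print([linear_freshness(a) for a in (0, 90, 180, 365)])  # [1.0, 0.5, 0.0, 0.0]
```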
Supersession override
When chunk A has superseded_by = B, set f = 0 for A regardless of age if B is active in the same audience scope. This fixes backdated effective_date errors and same-day publish races better than decay alone.
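A sketch of the override, assuming a hypothetical docs_by_id map from document ID to its metadata dict. One deliberate extension beyond the rule as stated: it follows superseded_by links through intermediate deprecated editions, so a 2021 policy is still zeroed when its direct successor (2024) has itself been replaced by an active 2026 edition. A hop limit and a seen-set guard against cycles.

```python
def has_active_successor(doc: dict, docs_by_id: dict, max_hops: int = 50) -> bool:
    """True if a later edition in the superseded_by chain is active in the same audience scope."""
    seen = set()
    current = doc
    while len(seen) < max_hops:
        next_id = current.get("superseded_by")
        if next_id is None or next_id in seen:
            return False
        seen.add(next_id)
        successor = docs_by_id.get(next_id)
        if successor is None:  # dangling link: fall back to plain decay
            return False
        same_scope = successor.get("audience_scope") == doc.get("audience_scope")
        if successor.get("status") == "active" and same_scope:
            return True
        current = successor
    return False


def supersession_factor(doc: dict, docs_by_id: dict) -> float:
    """f = 0 for superseded editions regardless of age, else 1; multiply into the decay factor."""
    return 0.0 if has_active_successor(doc, docs_by_id) else 1.0


docs = {
    "pto-2021": {"status": "archived", "superseded_by": "pto-2024", "audience_scope": "US"},
    "pto-2024": {"status": "deprecated", "superseded_by": "pto-2026", "audience_scope": "US"},
    "pto-2026": {"status": "active", "superseded_by": None, "audience_scope": "US"},
}
print({d: supersession_factor(meta, docs) for d, meta in docs.items()})
# {'pto-2021': 0.0, 'pto-2024': 0.0, 'pto-2026': 1.0}
```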
Query-time vs index-time boosting
Query-time decay (Harbor HR approach): retrieve top-N by base similarity, then re-rank with s' = s × f(age) plus supersession rules. Pros: tune λ without re-embedding; A/B test half-lives in production. Cons: extra CPU per query; you must retrieve enough candidates that old high-similarity chunks do not crowd out recent moderate matches (Harbor used N=40, cut to 8 after rerank).
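A minimal sketch of the query-time path, assuming the vector store has already returned the top-N candidates (N=40 in Harbor's setup) as dicts with a base score, an effective_date, and a superseded flag computed by the override rule. The dict keys and the scores below are hypothetical, and the cross-encoder stage is omitted.

```python
import math
from datetime import date


def rerank_with_decay(candidates: list[dict], today: date,
                      half_life_days: float = 90.0, keep: int = 8) -> list[dict]:
    """Re-rank retrieved candidates by s' = s × f(age), zeroing superseded editions."""
    lam = math.log(2) / half_life_days
    rescored = []
    for c in candidates:
        age_days = max(0, (today - c["effective_date"]).days)
        factor = 0.0 if c.get("superseded") else math.exp(-lam * age_days)
        # Keep base score and factor alongside the final score for debugging stale answers.
        rescored.append({**c, "freshness": factor, "final_score": c["score"] * factor})
    rescored.sort(key=lambda c: c["final_score"], reverse=True)
    return rescored[:keep]


today = date(2026, 3, 1)
pool = [  # illustrative scores: the legacy chunk has the highest raw similarity
    {"id": "pto-2024", "score": 0.84, "effective_date": date(2024, 3, 1)},
    {"id": "pto-2026", "score": 0.83, "effective_date": date(2026, 1, 30)},
    {"id": "pto-2021", "score": 0.82, "effective_date": date(2021, 3, 1)},
]
print([c["id"] for c in rerank_with_decay(pool, today, keep=2)])  # ['pto-2026', 'pto-2024']
```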
Index-time boosting: bake decay into stored scores or maintain a separate freshness_vector dimension. Pros: cheaper at query time. Cons: stored factors go stale daily unless a scheduled job refreshes them; harder to experiment.
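The staleness cost of index-time boosting shows up clearly in a sketch of the refresh job. It assumes each index row stores a precomputed factor in a hypothetical stored_freshness field; if a nightly run is skipped, every row keeps a factor computed for a younger age, which overstates its freshness.

```python
import math
from datetime import date


def refresh_stored_factors(rows: list[dict], today: date, half_life_days: float) -> None:
    """Scheduled job: recompute the freshness factor baked into each index row."""
    lam = math.log(2) / half_life_days
    for row in rows:
        age_days = max(0, (today - row["effective_date"]).days)
        row["stored_freshness"] = math.exp(-lam * age_days)
        row["freshness_as_of"] = today.isoformat()  # lets queries detect a missed run


rows = [{"id": "pto-2026", "effective_date": date(2026, 1, 1)}]
refresh_stored_factors(rows, date(2026, 1, 1), half_life_days=90)
print(rows[0]["stored_freshness"])  # 1.0
refresh_stored_factors(rows, date(2026, 4, 1), half_life_days=90)
print(round(rows[0]["stored_freshness"], 3))  # 0.5
```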
Metadata pre-filter: restrict to effective_date >= cutoff when queries imply “current” (“today’s policy”, “latest”). Pair it with decay rather than replacing decay: users rarely say “current” explicitly.
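A sketch of the "implies current" check and the pre-filter it gates. The phrase list is a hypothetical starting point to extend from real query logs, and the filter is shown as a plain Python predicate rather than any specific vector database's filter syntax.

```python
import re
from datetime import date

# Hypothetical phrases that signal the user wants the current edition.
CURRENT_INTENT = re.compile(
    r"\b(current|currently|latest|today['’]s|this year['’]s)\b", re.IGNORECASE
)


def implies_current(query: str) -> bool:
    return CURRENT_INTENT.search(query) is not None


def prefilter(query: str, chunks: list[dict], cutoff: date) -> list[dict]:
    """Apply effective_date >= cutoff only when the query implies 'current'.

    Decay still ranks whatever survives, because most queries never say 'current'.
    """
    if not implies_current(query):
        return chunks
    return [c for c in chunks if c["effective_date"] >= cutoff]
```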
For hybrid BM25+dense pipelines, apply decay after reciprocal rank fusion so lexical ties on boilerplate headings do not bypass recency.
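A sketch of that ordering: fuse the BM25 and dense rankings with reciprocal rank fusion, then multiply in the freshness factor. k = 60 is the constant commonly used for RRF; the input shapes and document IDs are assumptions.

```python
import math
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    """Sum 1 / (k + rank) for each doc across every ranked list."""
    fused: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return dict(fused)


def fuse_then_decay(bm25_ids: list[str], dense_ids: list[str],
                    age_days: dict[str, float], lam: float) -> list[str]:
    """Decay multiplies the fused score, so recency applies whichever retriever surfaced a chunk."""
    fused = reciprocal_rank_fusion([bm25_ids, dense_ids])
    final = {d: s * math.exp(-lam * max(0.0, age_days[d])) for d, s in fused.items()}
    return sorted(final, key=final.get, reverse=True)


ages = {"pto-2024": 730, "pto-2026": 30}
print(fuse_then_decay(["pto-2024", "pto-2026"], ["pto-2024", "pto-2026"], ages, lam=0.01))
# ['pto-2026', 'pto-2024']
```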
When users need historical answers
Not every query wants the newest doc. Signals for historical mode:
- Explicit time phrases (“in 2022”, “before the merger”).
- Comparative intent (“how did PTO change?”) — route to a timeline synthesis path with multiple versions in context.
- Audit / legal role metadata on the user session.
Disable or invert decay when historical mode fires; otherwise compliance officers cannot retrieve retired rules they are required to cite. This overlaps with clarification gates when scope is unclear (“Which plan year?”).
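A rough sketch of the mode switch, using simple regexes for the three signals listed. The patterns, role names, and mode labels are hypothetical (a production system might use an intent classifier instead), and this version only disables decay rather than inverting it.

```python
import re

# Hypothetical patterns for the signals above.
TIME_PHRASE = re.compile(
    r"\b(in|before|after|during|since)\s+(19|20)\d{2}\b|\bbefore the merger\b", re.IGNORECASE
)
COMPARATIVE = re.compile(
    r"\bhow (did|has) .+ change|\bwhat changed\b|\bcompared? (to|with)\b", re.IGNORECASE
)
AUDIT_ROLES = {"auditor", "legal", "compliance"}


def retrieval_mode(query: str, session_roles: list[str]) -> str:
    if COMPARATIVE.search(query):
        return "timeline"  # route to synthesis with multiple versions in context
    if TIME_PHRASE.search(query) or AUDIT_ROLES & set(session_roles):
        return "historical"
    return "current"


def effective_factor(mode: str, decay_factor: float) -> float:
    """Decay only applies in current mode; historical and timeline modes disable it."""
    return decay_factor if mode == "current" else 1.0
```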
Harbor HR refactor (worked example)
Before: 14,200 policy chunks; nightly incremental sync via content-hash diff; pure cosine top-8; no date metadata on 22% of legacy PDFs.
- Backfill effective_date from legal’s version registry; flag unparseable PDFs for manual tagging.
- Build the supersession graph from “Replaces document ID” fields in the source CMS.
- Retrieve 40 candidates; apply exponential decay (t½ = 90 days) + supersession zeroing; cross-encoder rerank top 12 to final 8.
- Eval: 120 time-sensitive probes + 200 ahistorical controls; track outdated-policy rate and unnecessary-archive rate separately.
After: outdated-policy rate 34%→8%; p95 latency +11 ms; archive-only answers on ahistorical controls 2% (within tolerance); manual policy tickets −41% over six weeks.
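The two rates Harbor tracked separately can be computed from graded probe results. This sketch assumes a hypothetical schema in which each result lists the doc IDs the answer cited, and it treats any cited non-active edition as outdated; your grader may define "outdated" differently, for example by checking the stated answer against the current policy.

```python
def freshness_rates(results: list[dict], status_by_id: dict[str, str]) -> dict[str, float]:
    """Outdated-policy rate on time-sensitive probes; archive-only rate on controls."""
    def stale(doc_id: str) -> bool:
        return status_by_id[doc_id] != "active"

    probes = [r for r in results if r["probe_type"] == "time_sensitive"]
    controls = [r for r in results if r["probe_type"] == "control"]
    # A time-sensitive probe counts as outdated if the answer cited any non-active edition.
    outdated = sum(any(stale(d) for d in r["cited"]) for r in probes)
    # A control counts as archive-only if it cited something and every citation is non-active.
    archive_only = sum(bool(r["cited"]) and all(stale(d) for d in r["cited"]) for r in controls)
    return {
        "outdated_policy_rate": outdated / len(probes) if probes else 0.0,
        "archive_only_rate": archive_only / len(controls) if controls else 0.0,
    }
```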
Technique decision table
| Approach | Best when | Weak when |
|---|---|---|
| Similarity-only retrieval | Evergreen technical docs; single authoritative version per topic | Versioned policies, price lists, org charts, security advisories |
| Query-time freshness decay | Multiple coexisting versions; need tunable half-life; incremental sync already works | Massive candidate pools without rerank budget; missing date metadata |
| Hard effective_date filter | Queries always imply “current”; single active flag reliable | Grandfathered cohorts; historical comparisons; backdated corrections |
| Supersession graph only | Clean CMS with explicit replace links; legal needs full archive | Organic wiki sprawl without version discipline |
| Delete old chunks on publish | No audit requirement; minimize storage | Compliance retention; “what changed?” questions; rollback |
| Clarification gate (“which year?”) | Ambiguous scope with small candidate year set | Users expect silent current-default; high friction on mobile |
Freshness decay pairs with cross-encoder reranking when decay alone leaves a semantically noisy top-k; rerankers are relatively blind to calendar metadata unless you inject effective_date into the passage header at index time.
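A minimal sketch of that header injection, done at index time so the reranker sees the date as ordinary passage text. The bracketed header format is an assumption, not a format any particular reranker requires.

```python
from datetime import date


def passage_with_header(text: str, title: str, effective_date: date, status: str) -> str:
    """Prefix the chunk with calendar metadata the cross-encoder can read."""
    return f"[{title} | effective {effective_date.isoformat()} | status: {status}]\n{text}"


print(passage_with_header("New hires accrue 15 PTO days.", "PTO Policy", date(2026, 1, 1), "active"))
# [PTO Policy | effective 2026-01-01 | status: active]
# New hires accrue 15 PTO days.
```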
Common pitfalls
- Using file mtime as effective_date — cosmetic edits reset decay incorrectly; substantive backdates break supersession order.
- Decay without supersession — two active versions with the same effective_date still tie; graph edges disambiguate.
- Over-aggressive half-life: valid evergreen pages (ethics principles, API concepts) get suppressed; scope decay by content_class.
- Retrieving too few candidates: top-8 by similarity may be all legacy; expand the pool before the decay rerank.
- Ignoring audience_scope: a US policy outranks the grandfathered EU rule that actually applies to the employee asking.
- No eval split — tuning λ on the same set you ship hides regressions on historical-intent queries.
- Assuming incremental sync fixes ranking — fresh embeddings ≠ fresh authority; both layers matter.
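The half-life and evergreen pitfalls can be addressed by scoping decay per content_class. A sketch with hypothetical class names and defaults; the numbers echo this guide (90 days for HR, under 14 days for security bulletins, a 180-day linear horizon for release notes), and leaving unknown classes undecayed is an assumption you may want to invert.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DecayPolicy:
    family: str                   # "exponential", "linear", "step", or "none"
    half_life_days: float = 0.0   # used by "exponential"
    horizon_days: float = 0.0     # used by "linear"


# Hypothetical per-class defaults; document whatever you choose.
POLICIES = {
    "hr_policy": DecayPolicy("exponential", half_life_days=90),
    "security_bulletin": DecayPolicy("exponential", half_life_days=10),
    "release_notes": DecayPolicy("linear", horizon_days=180),
    "regulatory": DecayPolicy("step"),
    "evergreen": DecayPolicy("none"),  # ethics principles, API concepts
}


def freshness(content_class: str, age_days: float, status: str = "active") -> float:
    policy = POLICIES.get(content_class, DecayPolicy("none"))  # unknown classes: no decay
    age = max(0.0, age_days)
    if policy.family == "exponential":
        return math.exp(-math.log(2) / policy.half_life_days * age)
    if policy.family == "linear":
        return max(0.0, 1.0 - age / policy.horizon_days)
    if policy.family == "step":
        return 1.0 if status == "active" else 0.0
    return 1.0
```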
Production checklist
- Require effective_date and status on versioned source types.
- Model supersession links at ingest; validate that the version graph has no cycles.
- Choose decay family and half-life per content class; document defaults.
- Retrieve expanded candidate set; apply decay + supersession; then rerank.
- Pre-filter status != archived for default employee-facing bots.
- Implement historical-mode detection to bypass decay when appropriate.
- Log base score, decay factor, and final score for debugging stale answers.
- Build time-sensitive eval probes separate from general RAG QA.
- Re-tune half-life after major corpus reorganization or CMS migration.
- Coordinate with incremental sync owners so tombstones and decay rules agree.
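For the no-cycles check, a short sketch: because each document has at most one superseded_by successor, following edges from every node and watching for a node already on the current path is enough. The input shape (doc ID mapped to successor ID or None) is an assumption.

```python
from typing import Optional


def find_supersession_cycle(superseded_by: dict[str, Optional[str]]) -> Optional[list[str]]:
    """Return the doc IDs forming a cycle in the version graph, or None if it is acyclic."""
    state: dict[str, str] = {}  # doc_id -> "on_path" | "done"
    for start in superseded_by:
        path: list[str] = []
        node = start
        while node is not None and node not in state:
            state[node] = "on_path"
            path.append(node)
            node = superseded_by.get(node)
        if node is not None and state[node] == "on_path":
            return path[path.index(node):]  # the looping portion of this walk
        for visited in path:
            state[visited] = "done"
    return None


print(find_supersession_cycle({"pto-2021": "pto-2024", "pto-2024": "pto-2026", "pto-2026": None}))  # None
print(find_supersession_cycle({"a": "b", "b": "c", "c": "a"}))  # ['a', 'b', 'c']
```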
Key takeaways
- Semantic similarity treats retired policies and their replacements as near-duplicates — freshness must be explicit in ranking, not assumed from ingest cadence.
- Exponential decay with supersession graph overrides is the default pattern for versioned policy corpora; tune half-life per content class.
- Query-time decay lets you A/B tune recency without re-embedding; expand the candidate pool so old high-similarity chunks do not block reranking.
- Harbor HR cut outdated-policy answers from 34% to 8% with +11 ms p95 latency — ranking fix, not a new embedding model.
- Pair freshness decay with historical-mode routing and clarification gates when users need past editions or ambiguous plan years.
Related reading
- RAG incremental index updates explained — keep vectors aligned with source edits
- LLM vector metadata filtering explained — pre-filter active vs archived status
- LLM ambiguous query clarification explained — disambiguate plan year and scope
- RAG evaluation explained — time-sensitive probe sets and regression tracking