
LLM Graph RAG explained

Harbor Legal indexed 2,400 master service agreements for a procurement team. Associates asked questions that no single chunk could answer: “Which vendors changed liability caps after the 2024 cyber-insurance refresh?” Standard RAG returned isolated amendment paragraphs mentioning “liability” or “cyber” but missed the cross-contract pattern. Recall@10 on held-out portfolio-level questions sat at 47%.

Graph RAG (popularized by Microsoft’s GraphRAG pipeline, 2024) builds a knowledge graph from the corpus (entities, relationships, and text-unit provenance), then clusters densely connected regions into communities with LLM-generated summaries. At query time, a router chooses local search (entity-centric subgraph plus neighboring chunks) or global search (map-reduce over community summaries) depending on whether the question is specific or thematic. Harbor Legal shipped Graph RAG alongside its existing dense index; portfolio-level recall@10 rose from 47% to 78%, and attorney time per cross-contract review dropped 41%. This guide covers graph construction, community detection, local versus global search, the contrast with multi-hop RAG, the Harbor refactor, a technique decision table comparing Graph RAG with flat chunk RAG and agentic RAG, common pitfalls, and a production checklist.

What Graph RAG adds beyond flat chunk retrieval

Vector RAG assumes the answer lives in one or a few semantically similar chunks. That assumption breaks when:

Evidence is distributed — the same clause type appears across hundreds of contracts with small wording differences.
Questions are thematic — “What are our biggest concentration risks?” requires synthesizing many documents, not finding one.
Relationships matter — vendor → subsidiary → amendment → insurance rider chains are structural, not lexical.
Entity aliases abound — “Acme Corp”, “Acme Holdings LLC”, and “Supplier A” refer to one party.

Graph RAG precomputes structure at index time so query time can traverse neighborhoods instead of hoping embeddings align. The trade-off is upfront cost: entity extraction, graph storage, and community summarization are expensive one-time (or periodic) jobs compared to naive chunk-and-embed.

Index-time pipeline: from documents to communities

Text units and entity-relation extraction

Documents are split into text units (typically 300–800 tokens with overlap). An LLM or hybrid NER pipeline extracts entities (organizations, people, dates, clause types) and relationships (“AMENDS”, “GOVERNED_BY”, “CAPS_LIABILITY_AT”) with pointers back to source text units. Harbor Legal used an 8B-parameter extractor with a legal ontology schema and human-audited few-shot exemplars; extraction precision on entity types hit 91% after two revision passes.
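The splitting step can be sketched as a sliding window with overlap. This is a minimal illustration, not Harbor Legal's pipeline: whitespace tokens stand in for model tokens, and the `TextUnit` type, the `split_text_units` function, and the hash-based ID scheme are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TextUnit:
    unit_id: str               # stable across rebuilds while doc_id and text are unchanged
    doc_id: str
    start: int                 # index of the first token in the source document
    tokens: tuple[str, ...]


def split_text_units(doc_id: str, text: str, size: int = 400, overlap: int = 50) -> list[TextUnit]:
    """Slide a window of `size` tokens over the document, stepping by size - overlap."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    tokens = text.split()  # assumption: whitespace tokens stand in for model tokens
    units = []
    step = size - overlap
    for start in range(0, len(tokens), step):
        window = tuple(tokens[start:start + size])
        digest = hashlib.sha1(f"{doc_id}:{start}:{' '.join(window)}".encode()).hexdigest()[:12]
        units.append(TextUnit(f"{doc_id}-{digest}", doc_id, start, window))
        if start + size >= len(tokens):
            break  # the window already reached the end of the document
    return units
```

Deriving the ID from the document ID, offset, and content means an unchanged contract produces the same unit IDs on every rebuild, which keeps extracted entities and citations pointing at the same provenance.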

Graph construction and deduplication

Entities become nodes; relationships become weighted edges. An entity resolution step merges duplicates via embedding similarity plus deterministic rules (LEI, DUNS, normalized legal names). Without resolution, graphs sprawl and community detection fragments.
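The deterministic half of resolution can be sketched with a union-find over shared keys. The sketch below covers only the rule-based merges (shared LEI, shared DUNS, or identical normalized name) and leaves out the embedding-similarity pass; the suffix list, the mention schema, and the function names are illustrative assumptions.

```python
import re
from collections import defaultdict

# Assumption: a small, illustrative list of legal-form suffixes to strip.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "corporation", "co"}


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, and strip trailing legal-form suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


def resolve_entities(mentions: list[dict]) -> list[set[str]]:
    """Group mention IDs that share an LEI, a DUNS number, or a normalized name."""
    parent = {m["id"]: m["id"] for m in mentions}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups shallow
            x = parent[x]
        return x

    first_seen: dict[tuple[str, str], str] = {}
    for m in mentions:
        keys = [(k, m[k]) for k in ("lei", "duns") if m.get(k)]
        name = normalize_name(m["name"])
        if name:  # never merge on an empty normalized name
            keys.append(("name", name))
        for key in keys:
            if key in first_seen:
                parent[find(m["id"])] = find(first_seen[key])
            else:
                first_seen[key] = m["id"]

    groups = defaultdict(set)
    for m in mentions:
        groups[find(m["id"])].add(m["id"])
    return list(groups.values())
```

With this scheme, "Acme Corp" and "ACME Corp." merge on the normalized name, and "Acme Holdings LLC" joins them only if it carries the same LEI or DUNS. An alias such as "Supplier A" shares no key with either, so it would still need an alias table or the similarity pass.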

Community detection and hierarchical summaries

Algorithms like Leiden partition the graph into communities at multiple levels (fine to coarse). For each community, an LLM reads member entities, their incident edges, and representative text units, then writes a community report — a standalone summary of what that cluster is about. Coarse-level reports summarize child communities, forming a tree Harbor Legal attorneys could browse without running a query.
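The report tree can be sketched as a bottom-up pass over a community hierarchy that has already been produced by the clustering step. In this hedged example the `Community` type, `facts_for` (which returns text-unit snippets for a set of entities), and `summarize` (which stands in for the LLM call) are hypothetical names, and Leiden itself is not implemented.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Community:
    community_id: str
    level: int                       # 0 = leaf (finest); higher = coarser
    entity_ids: list[str] = field(default_factory=list)
    children: list["Community"] = field(default_factory=list)
    report: str = ""
    source_unit_ids: list[str] = field(default_factory=list)


def build_reports(node: Community,
                  facts_for: Callable[[list[str]], list[tuple[str, str]]],
                  summarize: Callable[[list[str]], str]) -> None:
    """Fill in reports bottom-up: leaves from source text, parents from child reports."""
    if not node.children:
        facts = facts_for(node.entity_ids)  # (text_unit_id, snippet) pairs
        node.source_unit_ids = sorted({uid for uid, _ in facts})
        node.report = summarize([snippet for _, snippet in facts])
        return
    for child in node.children:
        build_reports(child, facts_for, summarize)
    node.report = summarize([c.report for c in node.children])
    # Provenance rolls up so coarse reports stay traceable to text units.
    node.source_unit_ids = sorted({uid for c in node.children for uid in c.source_unit_ids})
```

Because parents read only their children's reports, the cost of a coarse-level summary depends on the number of children rather than on the size of the underlying corpus.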

Dual indexes

Production Graph RAG keeps both the knowledge graph (for traversal) and a standard index over text units (dense vectors, often paired with BM25) for semantic and lexical fallback. Queries rarely use only one store.
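One common way to combine hits from two stores is reciprocal rank fusion (RRF). The source does not say which fusion method Harbor Legal used, so treat the following as an assumed technique; the function name and the conventional constant k = 60 are choices made for this sketch.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60, top_n: int = 10) -> list[str]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per item."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, unit_id in enumerate(ranking, start=1):
            scores[unit_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID so the output is deterministic.
    return sorted(scores, key=lambda uid: (-scores[uid], uid))[:top_n]
```

RRF needs only ranks, not comparable scores, which is convenient when one list comes from graph traversal and the other from cosine similarity.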

Query-time routing: local search vs global search

Local search (entity-centric)

Best for questions anchored to named entities: “What termination rights does Vendor X have in the 2023 MSA?” Steps:

Map query entities to graph nodes (fuzzy match + alias table).
Expand 1–2 hops along high-salience edge types.
Collect linked text units; optionally rerank with a cross-encoder.
Generate answer with citations to text-unit IDs.

Local search resembles multi-hop RAG but uses precomputed structure instead of iterative LLM-planned retrieval steps at query time.
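The expansion and collection steps can be sketched as a breadth-first traversal bounded by hop count and edge type. The toy graph, the edge-type names, and the `local_expand` function are hypothetical; entity mapping, reranking, and answer generation are omitted.

```python
from collections import deque

# Hypothetical toy graph: node -> list of (edge_type, neighbor).
GRAPH = {
    "vendor_x": [("PARTY_TO", "msa_2023"), ("MENTIONED_IN", "newsletter_17")],
    "msa_2023": [("AMENDED_BY", "amendment_4")],
    "amendment_4": [("REFERENCES", "rider_9")],
}
# Hypothetical provenance: node -> text-unit IDs it was extracted from.
NODE_UNITS = {
    "vendor_x": ["tu-1"], "msa_2023": ["tu-2", "tu-3"],
    "amendment_4": ["tu-4"], "rider_9": ["tu-5"], "newsletter_17": ["tu-6"],
}


def local_expand(seeds, allowed_edges, max_hops=2):
    """Breadth-first expansion from seed nodes, limited by hop count and edge type."""
    depth = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # hop budget exhausted on this path
        for edge_type, neighbor in GRAPH.get(node, []):
            if edge_type in allowed_edges and neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    # Collect text units nearest-first; dict.fromkeys de-duplicates while keeping order.
    ordered = sorted(depth, key=lambda n: (depth[n], n))
    return list(dict.fromkeys(u for n in ordered for u in NODE_UNITS.get(n, [])))
```

Calling `local_expand(["vendor_x"], {"PARTY_TO", "AMENDED_BY", "REFERENCES"})` stops at the amendment: the rider sits three hops away, and the newsletter is excluded because its edge type is not in the allowed set.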

Global search (community map-reduce)

Best for thematic or aggregate questions: “How did indemnification language shift across healthcare vendors since 2022?” Steps:

Embed the question; score all community reports at a chosen hierarchy level.
Take top-N community summaries as context shards.
Map: ask the LLM partial answers per shard with strict citation to report IDs.
Reduce: synthesize a single answer; flag contradictions across shards.

Global search is where Graph RAG most clearly beats flat RAG — no single chunk embedding matches “shift across healthcare vendors,” but community reports already encode that synthesis.
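The four global-search steps can be sketched as a single function with the LLM calls injected. This is a rough outline under stated assumptions: reports are dicts with hypothetical `id` and `vec` fields, `map_fn` and `reduce_fn` stand in for LLM prompts, and the default of 12 shards simply mirrors the Harbor figure quoted later in this guide.

```python
import math
from typing import Callable


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def global_search(question: str,
                  question_vec: list[float],
                  reports: list[dict],
                  map_fn: Callable[[str, dict], str],
                  reduce_fn: Callable[[str, list[tuple[str, str]]], str],
                  top_n: int = 12) -> str:
    """Score reports, map a partial answer over the top-N, then reduce to one answer."""
    ranked = sorted(reports, key=lambda r: cosine(question_vec, r["vec"]), reverse=True)
    shards = ranked[:top_n]
    # Map: one partial answer per shard, keyed by report ID for citation.
    partials = [(r["id"], map_fn(question, r)) for r in shards]
    # Drop shards where the map step reported nothing relevant.
    partials = [(rid, text) for rid, text in partials if text]
    return reduce_fn(question, partials)
```

Keeping the report ID attached to each partial answer lets the reduce step cite its sources and flag shards that disagree.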

Query classifier

A lightweight classifier or LLM router assigns LOCAL, GLOBAL, or HYBRID. Harbor Legal routed ambiguous queries to hybrid: global summaries for framing, then local expansion on entities mentioned in the draft answer.
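As a rough illustration of the routing decision, the sketch below uses two cheap signals: whether the question mentions a known alias, and whether it contains thematic cue words. The cue list, the alias-table format (lowercase alias to node ID), and the `route_query` function are assumptions; a production router would more likely be a trained classifier or an LLM prompt, and substring matching on aliases can produce false positives.

```python
import re
from enum import Enum


class Route(Enum):
    LOCAL = "local"
    GLOBAL = "global"
    HYBRID = "hybrid"


# Assumption: illustrative cue words; a real router would be trained on labeled queries.
THEMATIC_CUES = {"across", "trend", "trends", "overall", "portfolio", "biggest", "shift", "themes"}


def route_query(question: str, alias_table: dict[str, str]) -> Route:
    """Route on two signals: known entity mentions and thematic cue words."""
    text = question.lower()
    has_entity = any(alias in text for alias in alias_table)
    has_theme = bool(THEMATIC_CUES & set(re.findall(r"[a-z]+", text)))
    if has_entity and not has_theme:
        return Route.LOCAL
    if has_theme and not has_entity:
        return Route.GLOBAL
    return Route.HYBRID  # both or neither signal: treat as ambiguous
```

Sending the "both or neither" case to hybrid matches the behavior described above, where ambiguous queries get global framing followed by local expansion.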

Harbor Legal refactor: numbers and architecture

Harbor’s corpus: 2,400 MSAs, 8,100 amendments, 1,200 insurance riders (~940k text units). The index build (entity extraction, Leiden clustering, and community summaries) ran overnight on a 4×L4 GPU pool; incremental nightly jobs processed only changed contracts.

Entity graph — 186k nodes, 412k edges after resolution.
Communities — 3 hierarchy levels; 2,400 leaf communities, 380 mid-level, 48 top-level theme clusters.
Eval split — 220 local questions, 180 global/thematic questions.
Flat RAG baseline — recall@10: local 71%, global 47%.
Graph RAG — recall@10: local 83%, global 78%.
End-to-end attorney rating — answer usefulness 62% → 84%.
Latency — local p95 3.2s (+0.8s vs flat); global p95 8.1s (map-reduce over 12 shards).

They kept flat RAG as a fallback when graph router confidence was low (for example, queries that BM25 matches on exact clause numbers) and wired graph provenance into their existing citation UI so attorneys could click through to PDF page anchors.

Technique decision table

Approach	Strength	Weakness	Use when
Flat chunk RAG	Simple, fast to ship	Misses distributed and thematic questions	Small corpus; answers in one passage
Graph RAG local	Structured multi-document entity context	Index build cost; needs good entity resolution	Named-entity questions with cross-doc links
Graph RAG global	Thematic synthesis over whole corpus	Higher latency; summary staleness on updates	Portfolio-wide trends, risk themes, audits
Multi-hop RAG	Flexible at query time	LLM plans each hop; cost and errors compound	Exploratory queries; graph too costly to keep fresh
Agentic RAG	Self-correcting retrieval loops	Unpredictable tool chains; harder to eval	Mixed tool + search workflows
Parent-child chunking only	Better context windows per hit	No explicit relationship model	Hierarchical docs; limited cross-doc needs

Common pitfalls

Skipping entity resolution: duplicate nodes split communities and break local search; invest in merge rules early.
Over-extracting relationships: noisy edges connect unrelated documents; use confidence thresholds and domain schemas.
Stale community summaries: amended contracts without re-summarization mislead global search; tie summary jobs to document versions.
Global search on entity-specific questions: wastes tokens and dilutes precision; train the router on labeled local and global question sets.
Treating Graph RAG as a replacement for all RAG: flat retrieval still wins on exact keyword and table lookups.
Unbounded graph expansion: 3-hop traversals on dense graphs stuff the context with irrelevant chunks; cap hops and threshold edge weights per query class.
No provenance in summaries: community reports must cite text-unit IDs or attorneys cannot verify claims.
Underestimating index cost: rebuilding the full graph monthly on a huge corpus blows the budget; design incremental entity upserts.
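Tying summary jobs to document versions can be sketched with a fingerprint stored alongside each community report. The record layout (`doc_ids`, `fingerprint`) and both function names are hypothetical; the idea is only that a report is re-summarized when the versions it was built from no longer match.

```python
import hashlib


def fingerprint(doc_versions: dict[str, str], doc_ids: list[str]) -> str:
    """Hash the (doc_id, version) pairs a community report was built from."""
    payload = "|".join(f"{d}@{doc_versions[d]}" for d in sorted(doc_ids))
    return hashlib.sha256(payload.encode()).hexdigest()


def stale_communities(communities: dict[str, dict], doc_versions: dict[str, str]) -> list[str]:
    """Return IDs of communities whose stored fingerprint no longer matches current versions."""
    return sorted(
        cid for cid, c in communities.items()
        if c["fingerprint"] != fingerprint(doc_versions, c["doc_ids"])
    )
```

A nightly job could call `stale_communities` after ingesting amendments and regenerate only the returned reports, plus their ancestors in the hierarchy.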

Production checklist

Define entity and relationship ontology for your domain before extraction.
Split documents into text units with stable IDs and PDF page anchors.
Run extraction + resolution; audit precision on a 200-unit gold set.
Build graph; run Leiden (or similar) at 2–3 hierarchy levels.
Generate community reports with mandatory provenance citations.
Keep parallel vector + BM25 indexes for fallback and hybrid fusion.
Train or prompt a local/global/hybrid query router; log routing decisions.
Implement incremental updates: new docs → entities → affected communities only.
Evaluate local and global question sets separately; track recall@k and citation accuracy.
Cap global map-reduce shards; use hierarchical summarization for very large corpora.
Expose graph neighborhoods in the UI for trust and debugging.
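For the evaluation item in the checklist, recall@k per question class can be computed as below. This sketch assumes one common definition (the fraction of a question's relevant text units found in the top k); the source does not state which variant Harbor Legal used, and the result-record fields are hypothetical.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of the relevant text units that appear in the top-k retrieved IDs."""
    if not relevant:
        raise ValueError("each eval question needs at least one relevant text unit")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_by_class(results: list[dict], k: int = 10) -> dict[str, float]:
    """Average recall@k separately for each question class (e.g. local vs global)."""
    totals: dict[str, list[float]] = {}
    for r in results:
        totals.setdefault(r["question_class"], []).append(
            recall_at_k(r["retrieved"], r["relevant"], k))
    return {cls: sum(vals) / len(vals) for cls, vals in totals.items()}
```

Reporting the two classes separately matters because a blended average can hide a weak global score behind a strong local one.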

Key takeaways

Graph RAG precomputes entity structure and community summaries so query time can traverse relationships instead of relying on chunk similarity alone.
Local search answers entity-specific questions; global search map-reduces over community reports for thematic portfolio questions.
Entity resolution and incremental summary refresh are as important as the retrieval algorithm.
Harbor Legal raised global recall@10 from 47% to 78% on cross-contract questions with a dual-index Graph RAG pipeline.
Keep flat RAG as fallback; Graph RAG complements rather than replaces dense retrieval.
Use multi-hop or agentic RAG when you cannot afford graph index maintenance; use Graph RAG when structure is stable and cross-document synthesis is core.