LLM Graph RAG explained
Harbor Legal indexed 2,400 master service agreements for a procurement team. Associates asked questions that no single chunk could answer: “Which vendors changed liability caps after the 2024 cyber-insurance refresh?” Standard RAG returned isolated amendment paragraphs mentioning “liability” or “cyber” but missed the cross-contract pattern. Recall@10 on held-out portfolio-level questions sat at 47%.
Graph RAG (popularized by Microsoft’s GraphRAG pipeline, 2024) builds a knowledge graph from the corpus — entities, relationships, and text-unit provenance — then clusters densely connected regions into communities with LLM-generated summaries. At query time, a router chooses local search (entity-centric subgraph + neighboring chunks) or global search (map-reduce over community summaries) depending on whether the question is specific or thematic. Harbor Legal shipped Graph RAG alongside their existing dense index; portfolio-level recall@10 rose to 78% and attorney time per cross-contract review dropped 41%. This guide covers graph construction, community detection, local vs global search, multi-hop contrast, the Harbor refactor, a technique decision table versus flat chunk RAG and agentic RAG, pitfalls, and a production checklist.
What Graph RAG adds beyond flat chunk retrieval
Vector RAG assumes the answer lives in one or a few semantically similar chunks. That assumption breaks when:
- Evidence is distributed — the same clause type appears across hundreds of contracts with small wording differences.
- Questions are thematic — “What are our biggest concentration risks?” requires synthesizing many documents, not finding one.
- Relationships matter — vendor → subsidiary → amendment → insurance rider chains are structural, not lexical.
- Entity aliases abound — “Acme Corp”, “Acme Holdings LLC”, and “Supplier A” refer to one party.
Graph RAG precomputes structure at index time so query time can traverse neighborhoods instead of hoping embeddings align. The trade-off is upfront cost: entity extraction, graph storage, and community summarization are expensive one-time (or periodic) jobs compared to naive chunk-and-embed.
Index-time pipeline: from documents to communities
Text units and entity-relation extraction
Documents are split into text units (typically 300–800 tokens with overlap). An LLM or hybrid NER pipeline extracts entities (organizations, people, dates, clause types) and relationships (“AMENDS”, “GOVERNED_BY”, “CAPS_LIABILITY_AT”) with pointers back to source text units. Harbor Legal used an 8B extractor with a legal ontology schema and human-audited few-shot exemplars; extraction precision on entity types hit 91% after two revision passes.
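The chunking and provenance idea can be sketched in a few lines. This is a minimal illustration, not Harbor Legal's pipeline: whitespace-separated words stand in for model tokens, and the `TextUnit` and `Relationship` types, the `split_into_text_units` helper, and the ID format are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextUnit:
    unit_id: str   # stable ID, here "<doc_id>:<index>" (assumed format)
    doc_id: str
    tokens: tuple  # whitespace tokens, a stand-in for real tokenizer output


@dataclass(frozen=True)
class Relationship:
    source: str        # entity name
    relation: str      # e.g. "AMENDS", "GOVERNED_BY", "CAPS_LIABILITY_AT"
    target: str
    text_unit_id: str  # provenance pointer back to the source text unit


def split_into_text_units(doc_id, text, size=400, overlap=50):
    """Slide a window of `size` tokens, stepping by `size - overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    tokens = text.split()
    units = []
    start = 0
    while start < len(tokens):
        window = tokens[start:start + size]
        units.append(TextUnit(f"{doc_id}:{len(units)}", doc_id, tuple(window)))
        if start + size >= len(tokens):
            break  # the last window already reached the end of the document
        start += size - overlap
    return units
```

Every extracted `Relationship` carries the `unit_id` of the window it came from, which is what later lets an answer cite a specific passage.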
Graph construction and deduplication
Entities become nodes; relationships become weighted edges. An entity resolution step merges duplicates via embedding similarity plus deterministic rules (LEI, DUNS, normalized legal names). Without resolution, graphs sprawl and community detection fragments.
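As a rough sketch of the deterministic half of resolution, the snippet below merges mentions that share a normalized legal name or an identifier, using a small union-find. The suffix list, the `lei` field name, and the helper names are assumptions made for illustration; the embedding-similarity pass is deliberately left out.

```python
import re
from collections import defaultdict

# Illustrative suffix list only; a real rule set would be domain-reviewed.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "corporation", "co", "holdings"}


def normalize_legal_name(name):
    """Lowercase, drop punctuation, and strip trailing legal-form suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


def resolve_entities(mentions):
    """mentions: list of dicts with 'name' and an optional 'lei' identifier.

    Returns clusters of mention names that share a normalized name or an ID.
    """
    parent = list(range(len(mentions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}
    for i, mention in enumerate(mentions):
        keys = []
        normalized = normalize_legal_name(mention["name"])
        if normalized:  # skip names that are nothing but suffixes
            keys.append(("name", normalized))
        if mention.get("lei"):
            keys.append(("lei", mention["lei"]))
        for key in keys:
            if key in first_seen:
                union(i, first_seen[key])
            else:
                first_seen[key] = i

    clusters = defaultdict(list)
    for i, mention in enumerate(mentions):
        clusters[find(i)].append(mention["name"])
    return list(clusters.values())
```

Note that an alias like “Supplier A” only merges here if it shares an identifier with another mention; name normalization alone cannot recover it, which is one reason to pair rules with similarity matching and an audited alias table.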
Community detection and hierarchical summaries
Algorithms like Leiden partition the graph into communities at multiple levels (fine to coarse). For each community, an LLM reads member entities, their incident edges, and representative text units, then writes a community report — a standalone summary of what that cluster is about. Coarse-level reports summarize child communities, forming a tree Harbor Legal attorneys could browse without running a query.
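The report tree itself is simple to sketch once the partition exists. The example below assumes community detection has already assigned each community a level and a parent (it does not implement Leiden), and it takes the summarizer as a plain callable so any LLM client can be plugged in. `Community`, `build_report_tree`, and the `summarize` signature are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Community:
    community_id: str
    level: int                       # 0 = finest (leaf); larger = coarser
    parent_id: Optional[str] = None
    text_unit_ids: set = field(default_factory=set)
    report: str = ""


def build_report_tree(communities, summarize):
    """Write reports fine-to-coarse.

    `summarize(community_id, inputs)` is an assumed callable wrapping an LLM.
    Leaves are summarized from their text-unit IDs; coarser communities are
    summarized from their children's reports and inherit their provenance.
    """
    children = defaultdict(list)
    for community in communities:
        if community.parent_id is not None:
            children[community.parent_id].append(community)

    # Ascending level guarantees children are summarized before parents,
    # assuming every parent has a higher level than its children.
    for community in sorted(communities, key=lambda c: c.level):
        kids = children.get(community.community_id, [])
        if kids:
            for kid in kids:
                community.text_unit_ids |= kid.text_unit_ids
            inputs = [kid.report for kid in kids]
        else:
            inputs = sorted(community.text_unit_ids)
        community.report = summarize(community.community_id, inputs)
    return {c.community_id: c for c in communities}
```

Rolling text-unit IDs up to the parent keeps every coarse report traceable to source passages, even though its summary is written from child reports rather than raw text.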
Dual indexes
Production Graph RAG keeps both the knowledge graph (for traversal) and a standard vector index over text units (for semantic fallback), often alongside a lexical index such as BM25 for exact-term matches. Queries rarely use only one store.
Query-time routing: local search vs global search
Local search (entity-centric)
Best for questions anchored to named entities: “What termination rights does Vendor X have in the 2023 MSA?” Steps:
- Map query entities to graph nodes (fuzzy match + alias table).
- Expand 1–2 hops along high-salience edge types.
- Collect linked text units; optionally rerank with a cross-encoder.
- Generate answer with citations to text-unit IDs.
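The first three steps above can be sketched as a bounded breadth-first expansion. This is an illustrative outline under simple assumptions: alias matching is exact substring lookup rather than true fuzzy matching, the graph is an in-memory adjacency dict, reranking is omitted, and all function names and edge types in the usage are hypothetical.

```python
from collections import deque


def map_query_entities(query, alias_table):
    """Return graph nodes whose aliases occur in the query (substring match)."""
    q = query.lower()
    return sorted({node for alias, node in alias_table.items() if alias.lower() in q})


def expand_neighborhood(adjacency, seeds, max_hops=2,
                        allowed_relations=None, min_weight=0.0):
    """Breadth-first expansion, capped by hop count, edge type, and weight.

    adjacency: {node: [(relation, neighbor, weight), ...]}
    Returns {node: hop_distance} for every node reached.
    """
    hops = {seed: 0 for seed in seeds}
    queue = deque(hops)
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand past the hop cap
        for relation, neighbor, weight in adjacency.get(node, []):
            if allowed_relations is not None and relation not in allowed_relations:
                continue
            if weight < min_weight:
                continue
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


def collect_text_units(hops, node_text_units):
    """Gather linked text-unit IDs, nearest nodes first, without duplicates."""
    ordered = []
    for node in sorted(hops, key=lambda n: (hops[n], n)):
        for unit_id in node_text_units.get(node, []):
            if unit_id not in ordered:
                ordered.append(unit_id)
    return ordered
```

The hop cap and weight threshold are the levers that keep dense neighborhoods from flooding the context window; the collected IDs would then go to a reranker and the generator.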
Local search resembles multi-hop RAG but uses precomputed structure instead of iterative LLM-planned retrieval steps at query time.
Global search (community map-reduce)
Best for thematic or aggregate questions: “How did indemnification language shift across healthcare vendors since 2022?” Steps:
- Embed the question; score all community reports at a chosen hierarchy level.
- Take top-N community summaries as context shards.
- Map: ask the LLM partial answers per shard with strict citation to report IDs.
- Reduce: synthesize a single answer; flag contradictions across shards.
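A skeleton of that map-reduce flow might look like the following. It is a sketch under stated assumptions: report embeddings are precomputed plain lists of floats, `llm` is any callable that takes a prompt string and returns a string, and the prompt wording and dict keys are invented for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def global_search(question, question_vec, reports, llm, top_n=12):
    """reports: list of dicts with 'report_id', 'vector', and 'text'."""
    # Score every community report at the chosen level and keep the top N.
    ranked = sorted(reports, key=lambda r: cosine(question_vec, r["vector"]),
                    reverse=True)
    shards = ranked[:top_n]

    # Map: one partial answer per shard, each tied to its report ID.
    partials = []
    for shard in shards:
        prompt = (
            f"Question: {question}\n"
            f"Community report [{shard['report_id']}]:\n{shard['text']}\n"
            "Answer only from this report and cite its ID."
        )
        partials.append({"report_id": shard["report_id"], "answer": llm(prompt)})

    # Reduce: merge the partial answers and ask for contradictions to be flagged.
    joined = "\n".join(f"[{p['report_id']}] {p['answer']}" for p in partials)
    reduce_prompt = (
        f"Question: {question}\nPartial answers:\n{joined}\n"
        "Synthesize one answer, keep the citations, and flag contradictions."
    )
    return {
        "answer": llm(reduce_prompt),
        "cited_reports": [p["report_id"] for p in partials],
    }
```

The number of LLM calls is `top_n + 1`, which is why capping shards matters for both latency and cost.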
Global search is where Graph RAG most clearly beats flat RAG — no single chunk embedding matches “shift across healthcare vendors,” but community reports already encode that synthesis.
Query classifier
A lightweight classifier or LLM router assigns LOCAL, GLOBAL, or HYBRID. Harbor Legal routed ambiguous queries to hybrid: global summaries for framing, then local expansion on entities mentioned in the draft answer.
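To make the three routes concrete, here is a deliberately naive rule-based router. It is not the classifier Harbor Legal used; the cue list and `route_query` are hypothetical, and a trained classifier or LLM prompt would replace the keyword checks in practice.

```python
# Hypothetical cue phrases that hint at a thematic, portfolio-wide question.
THEMATIC_CUES = ("across", "trend", "overall", "biggest", "shift", "portfolio")


def route_query(query, known_entities):
    """Return 'LOCAL', 'GLOBAL', or 'HYBRID' from simple lexical signals."""
    q = query.lower()
    has_entity = any(name.lower() in q for name in known_entities)
    has_theme = any(cue in q for cue in THEMATIC_CUES)
    if has_entity and not has_theme:
        return "LOCAL"
    if has_theme and not has_entity:
        return "GLOBAL"
    # Both signals or neither: treat as ambiguous and fall back to hybrid.
    return "HYBRID"
```

Whatever replaces these rules, logging the chosen route next to the eventual answer rating gives the labeled data needed to train a better router.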
Harbor Legal refactor: numbers and architecture
Harbor’s corpus: 2,400 MSAs, 8,100 amendments, 1,200 insurance riders (~940k text units). Index build (entity extract + Leiden + summaries) ran overnight on a 4×L4 GPU pool; incremental nightly jobs processed only changed contracts.
- Entity graph — 186k nodes, 412k edges after resolution.
- Communities — 3 hierarchy levels; 2,400 leaf communities, 380 mid-level, 48 top-level theme clusters.
- Eval split — 220 local questions, 180 global/thematic questions.
- Flat RAG baseline — recall@10: local 71%, global 47%.
- Graph RAG — recall@10: local 83%, global 78%.
- End-to-end attorney rating — answer usefulness 62% → 84%.
- Latency — local p95 3.2s (+0.8s vs flat); global p95 8.1s (map-reduce over 12 shards).
They kept flat RAG as a fallback for cases where the graph router's confidence was low, such as queries that BM25 matched on exact clause numbers, and wired graph provenance into their existing citation UI so attorneys could click through to PDF page anchors.
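For readers reproducing numbers like the recall@10 figures above, one common way to compute the metric is shown below. The guide does not say which variant Harbor Legal used, so this assumes per-question recall (fraction of gold text units found in the top k) averaged over questions; the function names are illustrative.

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("each question needs at least one relevant ID")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(runs, k=10):
    """runs: list of (retrieved_ids, relevant_ids) pairs, one per question."""
    if not runs:
        raise ValueError("no questions to evaluate")
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)
```

Computing this separately for the local and global question sets is what exposes the gap that a single blended score would hide.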
Technique decision table
| Approach | Strength | Weakness | Use when |
|---|---|---|---|
| Flat chunk RAG | Simple, fast to ship | Misses distributed and thematic questions | Small corpus; answers in one passage |
| Graph RAG local | Structured multi-document entity context | Index build cost; needs good entity resolution | Named-entity questions with cross-doc links |
| Graph RAG global | Thematic synthesis over whole corpus | Higher latency; summary staleness on updates | Portfolio-wide trends, risk themes, audits |
| Multi-hop RAG | Flexible at query time | LLM plans each hop; cost and error compounding | Exploratory queries; graph too stale to maintain |
| Agentic RAG | Self-correcting retrieval loops | Unpredictable tool chains; harder to eval | Mixed tool + search workflows |
| Parent-child chunking only | Better context windows per hit | No explicit relationship model | Hierarchical docs; limited cross-doc needs |
Common pitfalls
- Skipping entity resolution — duplicate nodes split communities and break local search; invest in merge rules early.
- Over-extracting relationships — noisy edges connect unrelated documents; use confidence thresholds and domain schemas.
- Stale community summaries — amended contracts without re-summarization mislead global search; tie summary jobs to document versions.
- Global search on entity-specific questions — wastes tokens and dilutes precision; train the router on labeled local vs global sets.
- Treating Graph RAG as replacement for all RAG — flat retrieval still wins on exact keyword and table lookups.
- Unbounded graph expansion — 3-hop traversals on dense graphs stuff irrelevant chunks; cap hops and edge weights per query class.
- No provenance in summaries — community reports must cite text-unit IDs or attorneys cannot verify claims.
- Underestimating index cost — full re-graph monthly on huge corpora blows budget; design incremental entity upserts.
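The stale-summary and index-cost pitfalls both come down to knowing which communities a change touched. One possible approach, sketched here with hypothetical names, is to store with each community report a snapshot of the document versions it was built from, then compare that snapshot against current versions before each nightly run.

```python
import hashlib


def content_version(text):
    """Version a document by hashing its content (one possible scheme)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stale_communities(current_versions, community_docs, report_snapshots):
    """Return IDs of communities whose reports need regeneration.

    current_versions: {doc_id: version} for the live corpus
    community_docs:   {community_id: iterable of member doc_ids}
    report_snapshots: {community_id: {doc_id: version}} saved at summary time
    """
    stale = set()
    for community_id, doc_ids in community_docs.items():
        expected = {d: current_versions[d] for d in doc_ids if d in current_versions}
        # Any changed, added, or removed document makes the snapshot differ.
        if report_snapshots.get(community_id, {}) != expected:
            stale.add(community_id)
    return stale
```

Only the returned communities, and their ancestors in the hierarchy, would need re-summarizing, which keeps incremental jobs proportional to the size of the change rather than the size of the corpus.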
Production checklist
- Define entity and relationship ontology for your domain before extraction.
- Split documents into text units with stable IDs and PDF page anchors.
- Run extraction + resolution; audit precision on a 200-unit gold set.
- Build graph; run Leiden (or similar) at 2–3 hierarchy levels.
- Generate community reports with mandatory provenance citations.
- Keep parallel vector + BM25 indexes for fallback and hybrid fusion.
- Train or prompt a local/global/hybrid query router; log routing decisions.
- Implement incremental updates: new docs → entities → affected communities only.
- Evaluate local and global question sets separately; track recall@k and citation accuracy.
- Cap global map-reduce shards; use hierarchical summarization for very large corpora.
- Expose graph neighborhoods in the UI for trust and debugging.
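For the hybrid fusion item in the checklist, reciprocal rank fusion (RRF) is one widely used way to merge vector and BM25 rankings without calibrating their scores. The guide does not name a fusion method, so treat this as one option; `k=60` is the conventional default constant, and the function name is illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: each item scores sum(1 / (k + rank)) over lists."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))
```

Because RRF uses only ranks, the same function can also fold in text units collected from graph expansion as a third ranking.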
Key takeaways
- Graph RAG precomputes entity structure and community summaries so query time can traverse relationships instead of relying on chunk similarity alone.
- Local search answers entity-specific questions; global search map-reduces over community reports for thematic portfolio questions.
- Entity resolution and incremental summary refresh are as important as the retrieval algorithm.
- Harbor Legal raised global recall@10 from 47% to 78% on cross-contract questions with a dual-index Graph RAG pipeline.
- Keep flat RAG as fallback; Graph RAG complements rather than replaces dense retrieval.
- Use multi-hop or agentic RAG when you cannot afford graph index maintenance; use Graph RAG when structure is stable and cross-document synthesis is core.