
LLM PII detection and redaction explained

Harbor Analytics' fraud-scoring assistant ingested merchant dispute PDFs, chunked them into a vector index, and called a hosted LLM on every new chargeback. A compliance audit found full card numbers and cardholder names in three places: raw PDF text in object storage, embedding payloads in the vector DB, and OpenTelemetry spans that logged the assembled prompt verbatim. Legal did not care that the team had a “no training on customer data” clause with the vendor — the breach surface was their own pipeline. The fix was not “stop using AI”; it was a PII detection and redaction pipeline with defined entity types, scanners at ingress and egress, reversible surrogate tokens for internal replay, and hard blocks before any third-party API call.

PII detection identifies names, government IDs, payment instruments, contact details, and health identifiers in unstructured text. Redaction replaces or removes those spans so downstream systems never store or transmit raw values. In LLM stacks this is distinct from broad data privacy policy and from content safety guardrails: a PII pipeline is hygiene enforced in code on every byte, not an instruction to the model. This guide covers entity taxonomy, detection techniques, where to place scanners, redaction strategies, the Harbor Analytics refactor, a technique decision table, pitfalls, and a production checklist.

What counts as PII in LLM pipelines

Regulators and enterprise contracts use overlapping definitions. For engineering, treat PII as any field whose presence in a prompt, log, or index could trigger a breach notification or contract violation if exposed to the wrong party.

Category	Examples	Typical detection difficulty
Direct identifiers	Full name, SSN, passport, national ID, email, phone	Medium — regex helps; names need NER
Financial	Card PAN, CVV, IBAN, bank account, crypto wallet tied to KYC	Low for PAN (Luhn); high for free-form account notes
Health (PHI)	Diagnosis codes, prescription text, MRN, provider notes	High — domain NER and allowlists
Location	Street address, precise GPS, IP in some jurisdictions	Medium — address parsers plus context
Quasi-identifiers	Employer + job title + ZIP, rare occupation strings	Very high — often policy, not automation
Secrets in user paste	API keys, session cookies, passwords in support tickets	Low for key patterns; treat as PII for logging

Quasi-identifiers rarely appear on checkbox compliance forms but matter when you embed millions of support tickets: three fields together can re-identify a person even after you mask their email.
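A minimal sketch of that re-identification check, which is the idea behind k-anonymity: count how many records share each quasi-identifier combination and flag combinations seen fewer than k times. The field names, the threshold, and the sample records are illustrative assumptions, not a compliance standard.

```python
from collections import Counter

# Hypothetical quasi-identifier fields; choose them per dataset.
QUASI_IDENTIFIERS = ("employer", "job_title", "zip")


def rare_combinations(records, keys=QUASI_IDENTIFIERS, k=5):
    """Return quasi-identifier combinations shared by fewer than k records.

    A combination that few records share can single out one person even
    after direct identifiers such as email have been masked.
    """
    counts = Counter(tuple(r.get(key) for key in keys) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


tickets = [
    {"employer": "Acme", "job_title": "Engineer", "zip": "94107"},
    {"employer": "Acme", "job_title": "Engineer", "zip": "94107"},
    {"employer": "Acme", "job_title": "Chief Actuary", "zip": "10001"},
]
print(rare_combinations(tickets, k=2))
# {('Acme', 'Chief Actuary', '10001'): 1}
```

Combinations that come back are candidates for generalization (for example, truncating ZIP codes) or exclusion before embedding; choosing k is a policy decision, not an engineering default.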

Detection techniques and trade-offs

No single detector catches everything at acceptable latency. Production stacks layer methods and tune per entity type.

Rule and regex scanners

Fast, auditable, and cheap. Use for credit cards (Luhn validation), emails, phones, SSN patterns, and JWT/API-key shapes. Weak on names, addresses with odd formatting, and international ID formats. Microsoft Presidio's pattern recognizers, custom regex identifiers in cloud DLP services, or in-house rule packs are common starting points.
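The rule layer can be sketched with nothing but Python's standard `re` module: a loose pattern proposes card-number candidates, and the Luhn checksum filters out order IDs and other digit runs that merely look like cards. The patterns are simplified assumptions (no BIN ranges, no international phone or ID formats), not a production rule pack.

```python
import re

# Loose candidate: 13-19 digits with optional single space or hyphen separators.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pans(text: str) -> list[tuple[int, int]]:
    """Spans of card-like numbers that also pass the Luhn check."""
    spans = []
    for m in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_valid(digits):
            spans.append(m.span())
    return spans


def find_emails(text: str) -> list[tuple[int, int]]:
    return [m.span() for m in EMAIL.finditer(text)]


text = "Card 4111 1111 1111 1111, ref 4111 1111 1111 1112, mail jane@example.com"
print(find_pans(text))  # [(5, 24)]: only the Luhn-valid number
```

The second number has the right shape but fails the checksum, which is the false-positive reduction Luhn buys you for free.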

Named-entity recognition (NER)

Statistical or transformer models label PERSON, LOCATION, ORG spans in context. Better than regex for “John sent payment to Maria” but produces false positives on product names and city names in shipping addresses. Run NER after regex so structured IDs are already masked; use confidence thresholds and entity-specific policies.
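A sketch of the ordering and per-entity thresholds described above. `toy_ner` is a hypothetical stand-in for whatever NER model you run, assumed here to return a confidence score per span; the SSN pattern and threshold values are also assumptions for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Span:
    start: int
    end: int
    label: str
    score: float


# Illustrative per-entity thresholds; tune them on labeled samples.
THRESHOLDS = {"PERSON": 0.80, "LOCATION": 0.90, "ORG": 0.95}
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def layered_scan(text: str, ner: Callable[[str], list[Span]]) -> str:
    # Layer 1: regex masks structured IDs, so the NER model never sees them.
    masked = SSN.sub("<SSN>", text)
    # Layer 2: keep NER spans only above the threshold for their label;
    # labels without a policy entry are ignored.
    kept = [s for s in ner(masked) if s.label in THRESHOLDS and s.score >= THRESHOLDS[s.label]]
    # Replace right to left so earlier offsets stay valid (assumes no overlaps).
    for s in sorted(kept, key=lambda s: s.start, reverse=True):
        masked = masked[:s.start] + f"<{s.label}>" + masked[s.end:]
    return masked


def toy_ner(text: str) -> list[Span]:
    """Hypothetical stand-in for a real NER model that reports per-span scores."""
    spans = []
    for word, score in (("John", 0.97), ("Paris", 0.55)):  # "Paris" is a product name here
        i = text.find(word)
        if i >= 0:
            spans.append(Span(i, i + len(word), "PERSON", score))
    return spans


print(layered_scan("John bought a Paris Gold card, SSN 123-45-6789", toy_ner))
# <PERSON> bought a Paris Gold card, SSN <SSN>
```

The low-confidence PERSON hit on the product name is dropped by the threshold, while the SSN never reaches the NER step at all.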

LLM classifiers and judges

A small model or LLM-as-judge pass can flag subtle PHI in clinical notes or policy numbers in legal text. It adds latency and cost, and like any model the classifier can miss spans that the main model will still read. Reserve it for high-risk document classes, not every chat turn.
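One way to keep the judge selective is to route on document class after the cheap layers have run. In this sketch `HIGH_RISK_CLASSES`, `scan_with_escalation`, and the `cheap` and `judge` callables are hypothetical; a real judge would call a model rather than return a fixed finding.

```python
from typing import Callable

# Assumed list of document classes that justify an extra model call.
HIGH_RISK_CLASSES = {"clinical_note", "medical_hardship", "legal_contract"}


def scan_with_escalation(
    text: str,
    doc_class: str,
    cheap_layers: Callable[[str], tuple[str, list[str]]],
    llm_judge: Callable[[str], list[str]],
) -> tuple[str, list[str]]:
    """Run regex/NER on everything; call the slower judge only for high-risk classes."""
    masked, findings = cheap_layers(text)
    if doc_class in HIGH_RISK_CLASSES:
        # The judge receives already-masked text, so it only has to find what
        # the cheaper layers missed and never sees the structured IDs they caught.
        findings = findings + llm_judge(masked)
    return masked, findings


judge_calls = []


def cheap(text):
    return text.replace("123-45-6789", "<SSN>"), ["SSN"]


def judge(text):
    judge_calls.append(text)  # record what the judge was sent
    return ["PHI"]


print(scan_with_escalation("SSN 123-45-6789", "public_faq", cheap, judge))
# ('SSN <SSN>', ['SSN'])
```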

Allowlists and blocklists

Merchant names, internal project codenames, and public company tickers should not be redacted as PERSON or ORG. Maintain tenant-specific allowlists to cut the false-positive rate, and keep entries narrow enough that an allowlisted string cannot hide a real name or ID.
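A minimal allowlist filter applied after detection. The tenant ID, the entries, and the rule that only PERSON and ORG hits can be suppressed are assumptions of this sketch; the last rule keeps an allowlist from ever hiding a card or ID number.

```python
# Hypothetical tenant allowlists. Entries are exact strings (compared
# case-insensitively), never patterns, so they stay narrow.
ALLOWLISTS = {
    "tenant_harbor": {"acme coffee roasters", "project bluefin"},
}
# Allowlists may suppress only fuzzy NER labels, never structured IDs like PAN or SSN.
SUPPRESSIBLE = {"PERSON", "ORG"}


def filter_allowlisted(text, spans, tenant_id):
    """Drop PERSON/ORG spans whose exact text is on the tenant's allowlist."""
    allowed = ALLOWLISTS.get(tenant_id, set())
    return [
        (start, end, label)
        for start, end, label in spans
        if label not in SUPPRESSIBLE or text[start:end].casefold() not in allowed
    ]


text = "Acme Coffee Roasters charged Jane Doe"
spans = [(0, 20, "ORG"), (29, 37, "PERSON")]
print(filter_allowlisted(text, spans, "tenant_harbor"))  # [(29, 37, 'PERSON')]
```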

Where to place scanners in the stack

Teams that only scrub user input still leak when the model quotes retrieved chunks or when tool results echo database rows. Map every surface:

Client ingress: browser or mobile scrubbing before upload keeps some raw PII off the wire, but a hostile client can skip it, so it cannot be the only control.
API gateway (authoritative ingress): scan the assembled prompt (system + history + RAG context + tool output) immediately before the vendor API call; block or strip on high-confidence hits.
RAG ingestion: redact at chunk time so embeddings never store raw PAN; see document ingestion patterns.
Tool and function returns: CRM lookups and SQL agents re-introduce PII; scan tool results before they enter the context.
Egress (model output): models paraphrase retrieved text and can re-surface masked values; scan completions before showing them to users or writing them to tickets.
Observability: traces and prompt logs are the most common audit failure; redact span attributes and store hashes instead of raw prompts where possible.
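A sketch of the gateway and egress checks wrapped around one vendor call. `AssembledPrompt`, `BlockedPrompt`, and `guarded_call` are hypothetical names; the detector and redactor are passed in so the same rule packs can run at both ends of the call.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class AssembledPrompt:
    system: str
    history: list[str]
    rag_context: list[str]
    tool_outputs: list[str]

    def parts(self) -> Iterator[tuple[str, str]]:
        """Yield (location, text) for every segment that will reach the model."""
        yield "system", self.system
        for name, seq in (("history", self.history),
                          ("rag", self.rag_context),
                          ("tool", self.tool_outputs)):
            for i, text in enumerate(seq):
                yield f"{name}[{i}]", text


class BlockedPrompt(Exception):
    """Raised instead of calling the vendor when high-confidence PII remains."""


def guarded_call(
    prompt: AssembledPrompt,
    call_vendor: Callable[[AssembledPrompt], str],
    has_pii: Callable[[str], bool],
    redact: Callable[[str], str],
) -> str:
    # Authoritative ingress: check the whole assembled context, not just the last user turn.
    hits = [loc for loc, text in prompt.parts() if has_pii(text)]
    if hits:
        raise BlockedPrompt("PII detected in " + ", ".join(hits))
    completion = call_vendor(prompt)
    # Egress: the model can still echo or reconstruct values, so scan the output too.
    return redact(completion)
```

Reporting the location (`rag[0]`, `tool[2]`) in the error makes it obvious which surface leaked, which is usually the first question in an incident review.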

Ingress without egress is half a fence. Harbor Analytics added egress scanning after a pilot model echoed a card number from a retrieved paragraph that ingress had missed because the chunk was ingested before scanners existed.

Redaction strategies

Detection without a consistent replacement strategy breaks downstream UX and forensics. Pick per entity type:

Strategy	What users/model see	Reversible?	Best for
Deletion	Gap or empty string	No	Logs, third-party APIs, public exports
Static mask	`****-****-****-1234`	No	Display to analysts; model still sees last-4 pattern
Surrogate token	`<PERSON_a7f3>`	Yes, with vault	Internal replay, eval sets, support handoff
Format-preserving encryption	Same shape, different digits	Yes, with key	Legacy systems that validate format
Hash fingerprint	`sha256:9b2c...` in logs only	No (one-way)	Correlating duplicate leaks without storing value
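A sketch of two irreversible strategies from the table. One deliberate deviation: the fingerprint uses a keyed HMAC-SHA256 rather than a bare SHA-256, because card numbers have low enough entropy that an unkeyed hash can be brute-forced by enumerating candidates. Where `key` comes from and how it rotates is left as an assumption.

```python
import hashlib
import hmac


def static_mask(pan: str) -> str:
    """Mask every digit except the last four, keeping separators in place."""
    total = sum(c.isdigit() for c in pan)
    seen = 0
    out = []
    for c in pan:
        if c.isdigit():
            seen += 1
            out.append(c if seen > total - 4 else "*")
        else:
            out.append(c)
    return "".join(out)


def fingerprint(value: str, key: bytes) -> str:
    """Keyed one-way hash for correlating repeats in logs without storing the value."""
    return "hmac-sha256:" + hmac.new(key, value.encode(), hashlib.sha256).hexdigest()


print(static_mask("4111-1111-1111-1234"))  # ****-****-****-1234
```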

Surrogate tokens map Jane Doe to <PERSON_8c21> in the prompt sent to the vendor while a tenant-scoped vault holds the mapping. Support staff with entitlement can de-tokenize in the UI; the vendor never sees the raw name. Rotate vault keys, and expire mappings on the schedule set by your data retention policy.
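A minimal in-memory sketch of the surrogate mapping. `SurrogateVault` is hypothetical, and deriving tokens from an HMAC of the value under a per-tenant key is this sketch's assumption; the text above does not say how tokens are generated. Determinism is what lets the model reason about one person across paragraphs.

```python
import hashlib
import hmac


class SurrogateVault:
    """In-memory sketch of a tenant-scoped surrogate-token vault.

    A real vault would sit in its own store with separate IAM, key rotation,
    and expiry; this class shows only the mapping logic.
    """

    def __init__(self, tenant_key: bytes):
        self._key = tenant_key
        self._forward: dict[tuple[str, str], str] = {}
        self._reverse: dict[str, str] = {}

    def tokenize(self, value: str, label: str) -> str:
        # Deterministic per tenant: the same value always maps to the same token.
        if (label, value) not in self._forward:
            digest = hmac.new(self._key, f"{label}:{value}".encode(), hashlib.sha256).hexdigest()
            n = 4
            token = f"<{label}_{digest[:n]}>"
            while token in self._reverse:  # short prefix collided: lengthen it
                n += 2
                token = f"<{label}_{digest[:n]}>"
            self._forward[(label, value)] = token
            self._reverse[token] = value
        return self._forward[(label, value)]

    def detokenize(self, text: str, entitled: bool) -> str:
        """Restore raw values only for callers with entitlement."""
        if not entitled:
            return text
        for token, value in self._reverse.items():
            text = text.replace(token, value)
        return text


vault = SurrogateVault(b"tenant-secret")
jane = vault.tokenize("Jane Doe", "PERSON")
prompt = f"{jane} disputed the charge; {jane} later called back."
print(prompt)  # exact token text depends on the tenant key
print(vault.detokenize(prompt, entitled=True))
```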

Harbor Analytics refactor (worked example)

The fraud team rebuilt ingestion and inference around a single pii_sanitize(text, policy_id) service:

Policy packs per document class — chargebacks: PAN, name, email, phone; KYC uploads: government ID types plus address; public FAQs: email only.
Layered detectors — regex for financial IDs, spaCy NER for names, allowlist for merchant DBA strings; LLM judge only on “medical hardship” free-text fields.
Ingestion gate — chunks written to the vector index store only sanitized text plus content_hash; originals in encrypted bucket with stricter ACL.
Gateway block mode — if PAN confidence > 0.9 after sanitize, abort API call and return a structured error to the analyst UI instead of silently truncating.
Trace redaction — OpenTelemetry exporter runs the same policy on span attributes; raw prompts never leave the compliance VPC.
Regression tests — golden files with synthetic PII in edge layouts (tables, OCR noise, JSON tool payloads) run in CI on every policy change.
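A compressed sketch of the policy-pack service and the gateway block. The name `pii_sanitize(text, policy_id)` comes from the refactor above, but this body is an illustration, not Harbor's code: the detector patterns, the pack contents (names are left out because they need the NER layer), and the `pan_confidence` scorer passed to the hypothetical `prepare_for_vendor` are all assumptions.

```python
import re
from typing import Callable

# Illustrative detectors only; a real pack would add NER, Luhn checks, and allowlists.
DETECTORS = {
    "PAN": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "PHONE": re.compile(r"\+\d[\d ().-]{7,}\d"),  # '+'-prefixed numbers only, for brevity
}

# Policy packs per document class; entities run in the listed order (PAN first).
POLICY_PACKS = {
    "chargeback": ["PAN", "EMAIL", "PHONE"],
    "public_faq": ["EMAIL"],
}


def pii_sanitize(text: str, policy_id: str) -> str:
    for entity in POLICY_PACKS[policy_id]:
        text = DETECTORS[entity].sub(f"<{entity}>", text)
    return text


def prepare_for_vendor(text: str, policy_id: str,
                       pan_confidence: Callable[[str], float],
                       threshold: float = 0.9) -> dict:
    """Sanitize, then block if a separate scorer still sees a likely PAN."""
    sanitized = pii_sanitize(text, policy_id)
    score = pan_confidence(sanitized)
    if score > threshold:
        # Structured error for the analyst UI rather than silent truncation.
        return {"ok": False, "error": "PAN_DETECTED", "score": score}
    return {"ok": True, "text": sanitized}


raw = "Cardholder jane@example.com, card 4111 1111 1111 1111, phone +1 (415) 555-0100."
print(pii_sanitize(raw, "chargeback"))
# Cardholder <EMAIL>, card <PAN>, phone <PHONE>.
```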

Chargeback summarization quality dropped 2% on name-heavy narratives until they switched from deletion to surrogate tokens so the model could still reason about “the same person” across paragraphs without seeing literal names.

Technique decision table

Approach	Use when	Avoid when
Regex-only gateway	Low-risk internal copilots, strict input length, financial IDs dominate	Clinical, legal, or HR text with names and addresses
NER + regex hybrid	General enterprise RAG and support bots	You lack allowlists and false positives flood human review
LLM classifier pass	High-risk document classes, after cheaper layers	Every chat turn under a sub-200 ms latency budget
Block on detect	Regulated data must never reach vendor	UX requires best-effort answer and risk is acceptable
Surrogate tokens	Internal replay, eval, and analyst UI de-tokenize	No vault ops or key management maturity
Vendor-native DLP	Single cloud, fast MVP, audit accepts shared responsibility	Multi-vendor routing or data must not leave your region
Rely on a system-prompt instruction	Never in production	Always: models ignore “do not repeat PII” under retrieval pressure

Common pitfalls

Scanning only the latest user message: RAG context and tool results carry most PII in production incidents.
Redacting logs but not embeddings: vectors and their payloads stay searchable for years, so ingest-time sanitization is mandatory.
Over-redaction without allowlists: stripping merchant names destroys fraud signals and trains users to paste elsewhere.
Irreversible masks in eval golden sets: you cannot debug regressions when every name became ****.
Surrogate vault in the same DB as prompts: one SQL injection restores full PII; isolate the vault with tighter IAM.
Ignoring model egress: completions can quote retrieved chunks verbatim; egress scanning is not optional.
Locale-blind regex: EU phone formats and national IDs slip through US-centric rules.
No metrics on detector drift: the false-negative rate rises when users paste new formats; sample and label monthly.
PII in few-shot examples: real tickets in prompts become a permanent leak surface; use synthetic or surrogate-only exemplars.
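For the drift pitfall, a sketch of the monthly check: score the detector against a freshly labeled sample and compare recall (1 minus the false-negative rate) with a baseline. The exact-span matching rule, the tolerance, the `email_only` detector, and the sample data are assumptions; an overlap-based matching rule is a reasonable alternative.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledSample:
    text: str
    gold: frozenset  # (start, end, label) spans a reviewer marked as PII


def evaluate(samples, detect):
    """Exact-span precision and recall of `detect` against human labels."""
    tp = fp = fn = 0
    for s in samples:
        predicted = set(detect(s.text))
        tp += len(predicted & s.gold)
        fp += len(predicted - s.gold)
        fn += len(s.gold - predicted)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def drifted(recall, baseline_recall, tolerance=0.02):
    """Alert when recall falls more than `tolerance` below the baseline."""
    return baseline_recall - recall > tolerance


EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


def email_only(text):
    return [(m.start(), m.end(), "EMAIL") for m in EMAIL.finditer(text)]


sample = [
    LabeledSample("mail a@b.io", frozenset({(5, 11, "EMAIL")})),
    LabeledSample("ring +44 20 7946 0958", frozenset({(5, 21, "PHONE")})),
]
print(evaluate(sample, email_only))  # (1.0, 0.5): the UK number is missed
```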

Production checklist

Inventory PII categories per workflow and document class.
Define policy packs (entities, actions: mask, surrogate, block).
Run authoritative scan at API gateway on full assembled context.
Sanitize at RAG ingestion; never embed raw regulated fields.
Scan tool outputs and model completions (egress) before persist or display.
Redact OpenTelemetry and prompt logs; store hashes for correlation.
Maintain tenant allowlists for product, merchant, and org names.
Use surrogate tokens where internal replay needs reversible mapping.
CI golden tests with synthetic PII in tables, JSON, and OCR noise.
Alert on block-mode triggers and rising false-positive review queue depth.
Document which vendor regions receive sanitized vs blocked traffic.
Re-scan historical indexes after policy tightening; backfill is not automatic.
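For the CI golden-test item, a minimal sketch: synthetic card numbers in a table row, a JSON tool payload, and OCR-style spacing, with a check that the synthetic digits do not survive sanitization. The fixtures and both sanitizers are hypothetical; the point is that the single-separator pattern fails the OCR case, which is the kind of regression golden files exist to catch.

```python
import json
import re

# Synthetic test values only; never copy real customer data into fixtures.
SYNTHETIC_PAN_DIGITS = "4111111111111111"
GOLDEN_CASES = {
    "table_row": "| Jane Roe | 4111 1111 1111 1111 | 2024-01-05 |",
    "json_tool_payload": json.dumps({"customer": {"card": "4111-1111-1111-1111"}}),
    "ocr_noise": "Card No: 4111  1111 1111 1111",  # double space from OCR
}


def leaked_cases(sanitize) -> list[str]:
    """Names of golden cases whose sanitized output still contains the synthetic PAN."""
    failures = []
    for name, text in GOLDEN_CASES.items():
        digits_only = re.sub(r"\D", "", sanitize(text))
        if SYNTHETIC_PAN_DIGITS in digits_only:
            failures.append(name)
    return failures


def naive(text):
    # Allows at most one separator between digits.
    return re.sub(r"\b(?:\d[ -]?){12,18}\d\b", "<PAN>", text)


def tolerant(text):
    # Allows runs of separators, as OCR output often contains.
    return re.sub(r"\b(?:\d[ -]*){12,18}\d\b", "<PAN>", text)


print(leaked_cases(naive))     # ['ocr_noise']
print(leaked_cases(tolerant))  # []
```

In CI, a single `assert leaked_cases(sanitize) == []` against the real sanitizer would run on every policy change.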

Key takeaways

PII pipelines are engineering controls — not a sentence in the system prompt.
Scan ingress, egress, ingestion, and logs — four different leak paths.
Layer regex, NER, and selective LLM judges by entity risk and latency budget.
Surrogate tokens preserve utility for analysts without sending raw values to vendors.
Block mode beats silent truncation when contract or law forbids external exposure.