
LLM agent PII detection and redaction pipeline systems explained

Harbor Benefits deployed an enrollment assistant that read member intake forms, verified eligibility against a payer API, and opened Zendesk tickets for edge cases. When a member pasted their full Social Security number into chat — “my SSN is 523-44-8912, please confirm my plan” — the agent echoed the number into the ticket body and a CRM note field visible to 40 tier-1 support reps. A compliance scan flagged the incident 11 days later. In the prior quarter, 23% of agent-generated outbound records (tickets, emails, webhook payloads, trace exports) contained at least one regulated identifier the model had copied from context.

After Harbor rebuilt the stack around a dedicated PII detection and redaction pipeline — layered detectors, reversible token vaults, scrub points before every model call and tool write, and redacted audit trails — regulated-field leaks fell to 0.2%, false-positive blocks on legitimate eligibility checks dropped from 8% to 1.1%, and mean time to produce a regulator-ready incident packet shrank from days to 18 minutes. This guide explains PII taxonomy for agents, how redaction differs from guardrails and secret injection, detector stacks, vault patterns, where to place scrubbers in the agent loop, integration with tenant isolation, the Harbor refactor, a technique decision table, pitfalls, and a production checklist.

PII pipelines vs guardrails vs secrets vs tenant isolation

Privacy failures in agent systems rarely come from one missing checkbox. Four layers address different leak classes:

Guardrails — validate structure and policy on model output (schema, toxicity, disallowed actions). They do not reliably spot a nine-digit number formatted like prose.
Secret injection — keep API keys and database passwords out of prompts. SSNs are not secrets in the secrets-manager sense; they are user data the model may need to reason about within a run but must not persist broadly.
Tenant isolation — prevent Customer A’s vectors and runs from mixing with Customer B’s. Isolation does not stop a single tenant’s agent from writing PII into a shared support queue.
PII pipeline — detect regulated and sensitive fields in text and structured payloads, replace or block them before they cross trust boundaries (model providers, logs, third-party tools, human-visible tickets).

Harbor’s Zendesk leak passed guardrails (valid JSON ticket body), secret hygiene (no API keys in the payload), and tenant routing (correct payer namespace). What failed was data minimization on egress — the class of problem a PII pipeline owns.

PII taxonomy for agent systems

Build detectors against an explicit catalog, not ad hoc regex for “things that look sensitive”:

Direct identifiers

Government IDs — SSN, national ID, passport number, driver's license number (with jurisdiction-specific formats and checksum rules).
Financial account numbers — bank account and routing numbers, IBAN, card PAN (PCI scope; often block entirely rather than tokenize in agent context).
Health identifiers — MRN, Medicare HICN and its successor, the MBI, and insurance member IDs where regulated as PHI under HIPAA or local equivalents.
Biometric templates — rare in text agents, but block them if base64-encoded embeddings appear in uploads.

Quasi-identifiers and contact data

Email, phone, street address, GPS coordinates, device advertising IDs.
Full name combined with other quasi-fields (single names are noisy; use composite rules).
Date of birth, age under 18, school name in K–12 contexts.

Contextual sensitive content

Free-text medical diagnoses, legal allegations, salary figures in HR workflows.
Authentication artifacts — OTP codes, password reset links, session cookies (treat like secrets; block, do not tokenize).

Tag each class with an action policy: mask for logs, vault token for model reasoning, hard block for PCI PAN, and allow-with-consent only for verified internal APIs.
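One way to make that tagging enforceable is a single lookup table that every scrubber consults. The Python sketch below is illustrative only: the `Action` names, catalog entries, and destination labels ("model", "log", "internal_api") are hypothetical, not a standard schema. Unknown types or destinations fail closed to a block.

```python
from enum import Enum


class Action(Enum):
    MASK = "mask"                         # irreversible; logs, traces, human-visible tickets
    VAULT = "vault"                       # reversible token; model reasoning + tool hydration
    BLOCK = "block"                       # reject the request outright
    ALLOW_CONSENT = "allow_with_consent"  # raw value, verified internal APIs only


# Hypothetical catalog entries keyed by PII type, then by destination.
CATALOG: dict[str, dict[str, Action]] = {
    "SSN": {"model": Action.VAULT, "log": Action.MASK, "internal_api": Action.ALLOW_CONSENT},
    "MEMBER_ID": {"model": Action.VAULT, "log": Action.MASK, "internal_api": Action.ALLOW_CONSENT},
    "EMAIL": {"model": Action.VAULT, "log": Action.MASK},
    "CARD_PAN": {"model": Action.BLOCK, "log": Action.BLOCK},
    "OTP_CODE": {"model": Action.BLOCK, "log": Action.BLOCK},
}


def action_for(pii_type: str, destination: str) -> Action:
    """Look up the policy; unknown PII types or destinations fail closed to BLOCK."""
    return CATALOG.get(pii_type, {}).get(destination, Action.BLOCK)
```

Failing closed means a newly introduced field type is blocked until someone assigns it a policy, so catalog gaps show up as visible errors rather than silent leaks.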

Detection stack: regex, validators, NER, and ensemble scoring

Production pipelines combine cheap high-precision rules with slower ML layers:

Format regex with checksums — Luhn for cards, SSA invalid ranges for SSN, IBAN mod-97. High precision; misses obfuscated variants (“five two three dash…”).
Normalization pass — strip punctuation and spoken-number words before re-running regex; catches common user workarounds.
Named-entity recognition — transformer NER for PERSON, LOCATION, ORG; tune on domain forms (benefits intake has different false-positive rates than legal contracts).
Classifier ensemble — score spans with a lightweight model trained on labeled agent traffic; require two agreeing signals before blocking high-friction fields.
Allowlists — test member IDs matching internal patterns, demo sandbox fixtures, known fake SSN ranges in staging.
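The first two layers (format rules with checksums, plus a normalization pass) can be sketched with only Python's standard library. This is a minimal illustration, not a production detector: the regexes, the English-only digit-word list, and the three-word minimum for spoken runs are assumptions, and NER, ensemble scoring, and allowlists are left out. The checks themselves are standard: SSA never issues area 000, 666, or 900-999, group 00, or serial 0000; card numbers must pass Luhn; and a rearranged IBAN must leave remainder 1 modulo 97.

```python
import re
import unicodedata

# English digit words for the normalization pass (an illustrative assumption).
_WORDS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
          "six": "6", "seven": "7", "eight": "8", "nine": "9", "dash": "-"}
_WORD_ALT = "|".join(_WORDS)
# Three or more digit words in a row, e.g. "five two three dash ...".
_SPOKEN_RUN = re.compile(rf"\b(?:{_WORD_ALT})\b(?:\s+(?:{_WORD_ALT})\b){{2,}}", re.IGNORECASE)

SSN_RE = re.compile(r"\b(\d{3})-?(\d{2})-?(\d{4})\b")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")          # 13-19 digits, optional separators
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")   # compact form, 15-34 chars


def normalize(text: str) -> str:
    """NFKC folds fullwidth characters to ASCII; spoken digit runs become numerals."""
    text = unicodedata.normalize("NFKC", text)
    return _SPOKEN_RUN.sub(lambda m: "".join(_WORDS[w.lower()] for w in m.group(0).split()), text)


def ssn_ok(area: str, group: str, serial: str) -> bool:
    """SSA never issues area 000, 666, or 9xx, group 00, or serial 0000."""
    return area not in ("000", "666") and area[0] != "9" and group != "00" and serial != "0000"


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def iban_ok(iban: str) -> bool:
    """IBAN mod-97 check: move the first four chars to the end, map A-Z to 10-35."""
    rearranged = iban[4:] + iban[:4]
    return int("".join(str(int(ch, 36)) for ch in rearranged)) % 97 == 1


def detect(text: str) -> list[tuple[str, str]]:
    """Return (pii_type, value) pairs; values are taken from the normalized text."""
    norm = normalize(text)
    hits = [("SSN", m.group(0)) for m in SSN_RE.finditer(norm) if ssn_ok(*m.groups())]
    for m in CARD_RE.finditer(norm):
        digits = re.sub(r"\D", "", m.group(0))
        if luhn_ok(digits):
            hits.append(("CARD_PAN", digits))
    hits += [("IBAN", m.group(0)) for m in IBAN_RE.finditer(norm) if iban_ok(m.group(0))]
    return hits
```

Reported values come from the normalized copy, so a real pipeline also needs an offset map back to the original text before it can replace spans in place.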

Run detectors on structured fields separately — JSON keys like ssn, dob, member_id should trigger even when values lack regex signatures. Harbor’s worst leaks came from the model renaming fields in tool args (“tax_id”) while copying the same value.
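A sketch of key-based scanning for tool arguments, assuming the ingress scrubber has recorded the raw values it detected for the current run. The synonym table and function names are hypothetical. Matching on canonicalized keys catches `tax_id` and `taxId`; matching on previously seen values catches a renamed key the synonym table misses. Free-text fields still need the text detectors.

```python
from typing import Any, Iterator

# Hypothetical key synonyms, compared after lowercasing and dropping separators,
# so "tax_id", "taxId", and "Tax-ID" all canonicalize to "taxid".
SENSITIVE_KEYS = {
    "ssn": "SSN", "socialsecuritynumber": "SSN", "taxid": "SSN", "tin": "SSN",
    "dob": "DOB", "dateofbirth": "DOB", "memberid": "MEMBER_ID",
}


def canon(s: str) -> str:
    return "".join(ch for ch in s.lower() if ch.isalnum())


def scan_payload(obj: Any, known_values: set[str], path: str = "$") -> Iterator[tuple[str, str]]:
    """Yield (json_path, pii_type) for sensitive keys, plus any string value equal
    (after canon) to a value already detected at ingress, whatever its key."""
    if isinstance(obj, dict):
        for key, val in obj.items():
            child = f"{path}.{key}"
            pii_type = SENSITIVE_KEYS.get(canon(str(key)))
            if pii_type and val not in (None, ""):
                yield child, pii_type
            else:
                yield from scan_payload(val, known_values, child)
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            yield from scan_payload(item, known_values, f"{path}[{i}]")
    elif isinstance(obj, str) and canon(obj) in known_values:
        yield path, "KNOWN_VALUE"
```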

Redaction strategies: mask, vault token, hash, and block

Strategy	Model sees	Reversible for authorized tools?	Typical use
Mask	`***-**-8912`	No	Logs, traces, human tickets
Vault token	`[PII:SSN:v8k2m]`	Yes, via broker	Model reasoning + selective tool hydration
Hash (salted)	`sha256:abc…`	No	Dedup and fraud signals without storing raw
Hard block	Request rejected	N/A	PCI PAN, live session tokens
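The mask and hash rows can be sketched in a few lines. One substitution to note: the table's salted hash is implemented here as a keyed HMAC-SHA256. A random per-record salt would make two copies of the same SSN hash differently and defeat dedup, while an unkeyed SHA-256 of a nine-digit number can be brute-forced quickly. The key is assumed to live in a secrets manager outside the agent's context; function names are illustrative.

```python
import hashlib
import hmac


def mask_ssn(ssn: str) -> str:
    """Keep only the last four digits, e.g. '523-44-8912' -> '***-**-8912'."""
    digits = "".join(ch for ch in ssn if ch.isdigit())
    return f"***-**-{digits[-4:]}"


def keyed_hash(value: str, key: bytes) -> str:
    """Deterministic for a given key, so hashes can be joined for dedup or fraud
    signals; without the key, enumerating all SSNs offline does not help."""
    canonical = "".join(ch for ch in value if ch.isalnum()).upper()
    digest = hmac.new(key, canonical.encode(), hashlib.sha256).hexdigest()
    return f"sha256:{digest}"
```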

Reversible token vault pattern

When the model must call an eligibility API that needs a real SSN, do not put the raw value in context. Flow:

Ingress scrubber detects SSN, stores ciphertext in a vault keyed by [PII:SSN:v8k2m] scoped to run_id and tenant_id.
Prompt and tool observations contain only the token.
Pre-tool hook on verify_eligibility resolves token to plaintext inside the broker, calls the API, returns redacted summary to the model (“member eligible, plan Gold”).
Vault entry expires when the run completes or after TTL.

This mirrors credential injection but for user PII with stricter retention and access logging.
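The flow above, sketched as an in-memory Python broker. Everything here is hypothetical: the `PIIVault` and `call_tool` names, the 15-minute default TTL, and the plain dict standing in for an encrypted store (a real vault would keep only ciphertext and manage keys separately). Tokens follow the `[PII:TYPE:id]` shape used in this guide. Resolution succeeds only for the same tenant and run, before expiry, and for tools allowlisted for that PII type; every attempt is audited by token ID, never by raw value.

```python
import re
import secrets
import time
from dataclasses import dataclass, field

TOKEN_RE = re.compile(r"\[PII:[A-Z_]+:[0-9a-f]+\]")


@dataclass
class VaultEntry:
    value: str        # stand-in for ciphertext; a real vault encrypts at rest
    pii_type: str
    tenant_id: str
    run_id: str
    expires_at: float


@dataclass
class PIIVault:
    """In-memory sketch of a run-scoped token vault (hypothetical API)."""
    ttl_seconds: float = 900.0
    resolvers: dict[str, set[str]] = field(default_factory=dict)  # tool -> PII types it may resolve
    entries: dict[str, VaultEntry] = field(default_factory=dict)
    audit: list[dict] = field(default_factory=list)                # token IDs only, never raw values

    def tokenize(self, value: str, pii_type: str, tenant_id: str, run_id: str) -> str:
        token = f"[PII:{pii_type}:{secrets.token_hex(4)}]"
        while token in self.entries:                               # avoid rare collisions
            token = f"[PII:{pii_type}:{secrets.token_hex(4)}]"
        expires = time.monotonic() + self.ttl_seconds
        self.entries[token] = VaultEntry(value, pii_type, tenant_id, run_id, expires)
        return token

    def resolve(self, token: str, tool: str, tenant_id: str, run_id: str) -> str:
        entry = self.entries.get(token)
        allowed = (
            entry is not None
            and entry.tenant_id == tenant_id                       # no cross-tenant hydration
            and entry.run_id == run_id
            and time.monotonic() < entry.expires_at
            and entry.pii_type in self.resolvers.get(tool, set())
        )
        self.audit.append({"token": token, "tool": tool, "tenant": tenant_id, "run": run_id, "allowed": allowed})
        if not allowed:
            raise PermissionError(f"{tool} may not resolve {token}")
        return entry.value

    def end_run(self, run_id: str) -> None:
        for token in [t for t, e in self.entries.items() if e.run_id == run_id]:
            del self.entries[token]


def call_tool(vault: PIIVault, tool: str, fn, args: dict, tenant_id: str, run_id: str):
    """Pre-tool hook: hydrate tokens inside the broker, then call the tool.
    The model only sees fn's return value, which should be a redacted summary."""
    hydrated = {
        k: TOKEN_RE.sub(lambda m: vault.resolve(m.group(0), tool, tenant_id, run_id), v)
        if isinstance(v, str) else v
        for k, v in args.items()
    }
    return fn(**hydrated)
```

In this sketch `end_run` is what makes entries expire when the run completes; the TTL is the backstop for abandoned runs.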

Pipeline placement in the agent loop

Scrub at every trust-boundary crossing, not only on user ingress:

User message ingress — before first model call; tokenize fields needed for later tools.
Retrieved RAG chunks — knowledge bases often contain legacy tickets with PII; scrub at retrieval time.
Tool results — payer APIs return full member rows; project to allowlisted columns before appending to context.
Model output pre-tool — scan proposed tool arguments; block or rewrite before side effects.
Outbound integrations — Zendesk, Slack, email, webhooks get a final egress scrub.
Observability — traces and span attributes store token IDs or masks, never raw values; align with audit policy.

Harbor replaced per-integration one-offs with a single scrubber service that applies consistent policies. Before the change, the Zendesk adapter redacted emails while the CRM adapter did not.
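A minimal sketch of that consolidation, assuming a pluggable detector and redaction policy (both hypothetical interfaces; the email regex is a toy stand-in for the full detector stack). The adapters return the scrubbed text in place of actually calling Zendesk or a CRM, and `project` illustrates the tool-result step of keeping only allowlisted columns.

```python
import re
from typing import Callable

Detector = Callable[[str], list[tuple[str, str]]]
Policy = Callable[[str, str, str], str]  # (pii_type, value, boundary) -> replacement


class Scrubber:
    """One scrubber, one policy, shared by every boundary (hypothetical API)."""

    def __init__(self, detect: Detector, policy: Policy) -> None:
        self.detect = detect
        self.policy = policy
        self.events: list[tuple[str, str]] = []  # (boundary, pii_type); no raw values

    def scrub(self, text: str, boundary: str) -> str:
        for pii_type, value in self.detect(text):
            text = text.replace(value, self.policy(pii_type, value, boundary))
            self.events.append((boundary, pii_type))
        return text


def project(row: dict, allowed: set[str]) -> dict:
    """Tool-result boundary: keep only allowlisted columns before appending to context."""
    return {k: v for k, v in row.items() if k in allowed}


# Toy detector and policy for the demo; a real deployment plugs in the full detector stack.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
scrubber = Scrubber(
    detect=lambda text: [("EMAIL", m.group(0)) for m in EMAIL_RE.finditer(text)],
    policy=lambda pii_type, value, boundary: f"[{pii_type} redacted]",
)


# Both outbound adapters call the same egress scrub, so their behavior cannot drift apart.
def send_to_zendesk(body: str) -> str:
    return scrubber.scrub(body, "egress")


def send_to_crm(note: str) -> str:
    return scrubber.scrub(note, "egress")
```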

Harbor Benefits refactor walkthrough

Before: privacy relied on system prompt instructions (“never include SSN in tickets”) and quarterly manual ticket audits.

After:

Published a PII catalog with 47 field types mapped to mask, vault, or block.
Deployed ingress + egress scrubbers on the shared agent middleware hook pipeline.
Introduced vault tokens for SSN and member ID; only verify_eligibility and update_enrollment could resolve tokens, with per-call audit events.
Structured tool schemas marked sensitive fields; guardrails rejected payloads where redacted tokens were stripped by the model.
Weekly eval set of 2,400 synthetic intakes measuring leak rate and false-positive block rate.
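The guardrail in the fourth step can be approximated with a closed schema check that runs before the token broker. The tool names come from the walkthrough; the argument names, `PLAIN_ARGS`, and the token regex are assumptions for illustration. A sensitive argument must carry a vault token of the right type, and any argument the schema does not know about is rejected, which also catches a value smuggled in under a renamed key such as `tax_id`.

```python
import re

TOKEN_RE = re.compile(r"\[PII:([A-Z_]+):[0-9a-z]+\]")

# Hypothetical schema annotations: sensitive args per tool and the PII type each must carry.
SENSITIVE_ARGS = {
    "verify_eligibility": {"ssn": "SSN", "member_id": "MEMBER_ID"},
    "update_enrollment": {"member_id": "MEMBER_ID"},
}
# Non-sensitive args each tool accepts; anything else is rejected as a possible renamed field.
PLAIN_ARGS = {"verify_eligibility": {"plan_year"}, "update_enrollment": {"plan"}}


def check_tool_args(tool: str, args: dict) -> list[str]:
    """Return violations; an empty list means the call may proceed to the broker."""
    problems = []
    expected = SENSITIVE_ARGS.get(tool, {})
    for name, pii_type in expected.items():
        value = args.get(name)
        m = TOKEN_RE.fullmatch(value) if isinstance(value, str) else None
        if m is None:
            problems.append(f"{name}: expected a vault token, got a raw or missing value")
        elif m.group(1) != pii_type:
            problems.append(f"{name}: token type {m.group(1)} does not match {pii_type}")
    for name in sorted(set(args) - set(expected) - PLAIN_ARGS.get(tool, set())):
        problems.append(f"{name}: not in schema (possible renamed sensitive field)")
    return problems
```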

Outcomes over two quarters: the regulated-field leak rate fell from 23% to 0.2%, false-positive eligibility blocks from 8% to 1.1%, and regulator packet assembly from days to 18 minutes, with zero repeat SSN-in-ticket incidents in production.

Technique decision table

Approach	Best when	Weak when
Prompt-only (“don’t leak PII”)	Prototypes, low-risk internal bots	Any regulated egress or third-party model
Regex at ingress only	Known fixed formats (SSN, email)	Model paraphrases or tool-returned PII
Full pipeline + vault tokens	Tools need raw IDs; minimize exposure	Vault ops and key management overhead
Hard block all PII	Public-facing Q&A with no identity tools	Enrollment, KYC, support workflows
On-prem model only	Strict data residency	Does not fix ticket and log leaks

Common pitfalls

Scrubbing prompts but not tool args — models copy PII from memory into create_ticket.body.
Irreversible mask before tool needs value — eligibility APIs fail; teams bypass pipeline “temporarily.”
Vault without tenant scope — token from Tenant A resolved in Tenant B’s run.
Over-redaction breaking utility — masking every digit kills order-number troubleshooting; tune per field class.
Logging scrubber bypass — debug flags dump raw prompts; gate behind break-glass with mandatory audit.
Third-party model retention — redact before sending to external APIs if contracts prohibit PII processing.
False negatives on obfuscation — spoken numbers and unicode homoglyphs; invest in normalization.

Production checklist

Inventory PII field types per workflow with mask, vault, or block policy.
Scrub at ingress, RAG retrieval, tool results, pre-tool output, and egress.
Use vault tokens for fields tools must resolve; TTL-bound to run scope.
Validate structured tool schemas; reject stripped or renamed sensitive keys.
Log token IDs in traces, not raw values; align with compliance retention.
Measure leak rate and false-positive rate on a fixed eval set weekly.
Document break-glass debug path with dual-control and auto-expiry.
Review third-party model and tool DPAs for PII processing allowances.
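The two weekly metrics can be computed as below. This is a sketch under assumptions: `EvalCase` is a hypothetical record produced by replaying one synthetic intake, leak rate is counted per outbound record (matching how the 23% baseline was described), and false-positive blocks are counted per legitimate run. Verbatim substring matching misses reformatted leaks such as `523448912`, so a stricter harness would also run the detectors over the outputs.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    planted: list[str]   # raw PII values seeded into one synthetic intake
    should_block: bool   # True only when policy requires rejecting the run (e.g. a card PAN)
    outputs: list[str]   # every outbound record: tickets, emails, webhook payloads, trace exports
    blocked: bool        # whether the pipeline actually blocked the run


def leak_rate(cases: list[EvalCase]) -> float:
    """Share of outbound records containing any planted raw value verbatim."""
    records = [(case, out) for case in cases for out in case.outputs]
    leaked = sum(1 for case, out in records if any(v in out for v in case.planted))
    return leaked / len(records) if records else 0.0


def false_positive_block_rate(cases: list[EvalCase]) -> float:
    """Share of legitimate runs (should_block=False) that were blocked anyway."""
    legit = [case for case in cases if not case.should_block]
    return sum(case.blocked for case in legit) / len(legit) if legit else 0.0
```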

Key takeaways

PII pipelines minimize data on egress — distinct from guardrails, secrets, and tenant walls.
Layer regex, NER, and schema rules — models paraphrase and tools return fresh PII.
Vault tokens enable safe tool use without keeping raw SSNs in context.
Scrub every trust boundary — especially tool writes to Zendesk, email, and webhooks.
Harbor cut regulated leaks from 23% to 0.2% with a PII catalog, middleware scrubbers, and vault brokers.