Guide

Differential privacy explained

Harbor Analytics shipped a gradient-boosted fraud classifier trained on 4.2 million labeled card transactions. Accuracy beat the prior rules engine by eleven points, but legal paused the launch with one question: could a competitor or attacker learn whether a specific customer appeared in the training set, or reconstruct their spending pattern from published model weights? Anonymizing columns and hashing account IDs is necessary but not sufficient; models memorize outliers, and membership inference attacks can succeed on nominally “aggregated” releases.

Differential privacy (DP) answers with a formal guarantee: changing one individual's data in the dataset can only shift the probability of any output by a bounded factor. That bound is parameterized by epsilon (ε) and optionally delta (δ), and it composes across queries so teams can account for a finite privacy budget.

This guide defines (ε, δ)-DP intuitively and mathematically, covers Laplace and Gaussian output perturbation, DP-SGD for neural networks, local versus central DP, and privacy budget accounting, works through a Harbor Analytics deployment, provides a method decision table, lists pitfalls, and ends with a production checklist. For retention and logging policy, see LLM data privacy; for training without centralizing raw data, see federated learning.

What differential privacy guarantees

A randomized algorithm M satisfies (ε, δ)-differential privacy if, for any two neighboring datasets D and D' that differ in exactly one individual's record, and for any set of possible outputs S:

Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D') ∈ S] + δ

Intuition: whatever an observer believed beforehand, seeing the output of M can change the odds that any single person was included by at most a factor of e^ε (roughly 1 + ε when ε is small), up to the δ slack. Smaller ε means stronger privacy. The δ term allows a small probability that this bound fails; it is usually set well below 1/n, for example 1/n², where n is the dataset size.
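To make the inequality concrete, the sketch below checks it exhaustively for randomized response on a single bit, a mechanism chosen here only because its output space is small enough to enumerate. The function names and the two-argument check (the ε the mechanism was built with versus the ε being claimed) are illustrative assumptions, not part of any library.

```python
import math

def rr_output_probs(true_bit: int, mechanism_eps: float) -> dict:
    """Output distribution of randomized response on one bit:
    report the truth with probability e^eps / (1 + e^eps), else flip."""
    p_truth = math.exp(mechanism_eps) / (1.0 + math.exp(mechanism_eps))
    return {true_bit: p_truth, 1 - true_bit: 1.0 - p_truth}

def satisfies_dp(mechanism_eps: float, claimed_eps: float, delta: float = 0.0) -> bool:
    """Check Pr[M(D) in S] <= e^eps * Pr[M(D') in S] + delta for every
    output set S, in both directions, where D and D' differ in the one bit."""
    p_d = rr_output_probs(0, mechanism_eps)
    p_dprime = rr_output_probs(1, mechanism_eps)
    bound = math.exp(claimed_eps)
    tol = 1e-12  # guards against floating-point rounding at the exact boundary
    for subset in [(), (0,), (1,), (0, 1)]:
        a = sum(p_d[o] for o in subset)
        b = sum(p_dprime[o] for o in subset)
        if a > bound * b + delta + tol or b > bound * a + delta + tol:
            return False
    return True

print(satisfies_dp(mechanism_eps=1.0, claimed_eps=1.0))  # claim matches the mechanism
print(satisfies_dp(mechanism_eps=2.0, claimed_eps=1.0))  # claim is too strong
```

The first call passes because the ratio of the two output probabilities is exactly e^ε; the second fails because a mechanism built with ε = 2 cannot honor a claim of ε = 1.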

Neighboring datasets

Two standard definitions:

Bounded DP: D' has the same number of rows as D but one row differs (replace-one neighbor).
Unbounded DP: D' has one fewer or one more row than D (add/remove-one neighbor). This is common in production when users can delete accounts.

Document which neighbor relation your ε applies to; auditors will ask. Harbor Analytics uses unbounded neighbors because GDPR deletion requests remove a row entirely.
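The choice of neighbor relation changes sensitivity, and therefore noise. As a small illustration (a toy histogram with hypothetical helper names, not Harbor's pipeline), the sketch below enumerates every neighbor of one dataset under both definitions and measures the largest L1 change in a histogram.

```python
from collections import Counter

def histogram(records, bins):
    """Count how many records fall in each bin, in a fixed bin order."""
    counts = Counter(records)
    return [counts[b] for b in bins]

def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def unbounded_neighbors(records):
    """Datasets reachable by removing one record (add/remove-one)."""
    for i in range(len(records)):
        yield records[:i] + records[i + 1:]

def bounded_neighbors(records, bins):
    """Datasets reachable by replacing one record with a different value."""
    for i, original in enumerate(records):
        for value in bins:
            if value != original:
                yield records[:i] + [value] + records[i + 1:]

def max_histogram_change(records, neighbors, bins):
    """Largest L1 change in the histogram over the given neighbors."""
    base = histogram(records, bins)
    return max(l1_distance(base, histogram(n, bins)) for n in neighbors)

records = ["a", "b", "b"]
bins = ["a", "b", "c"]
print(max_histogram_change(records, unbounded_neighbors(records), bins))
print(max_histogram_change(records, bounded_neighbors(records, bins), bins))
```

Removing a record moves one bin by 1, while replacing a record moves two bins by 1 each, so the same histogram needs twice the noise under replace-one neighbors. True sensitivity is a maximum over all datasets; this sketch only inspects one, which happens to reach that maximum.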

What DP does and does not promise

Does: limit inference about any specific individual's presence or attributes from model outputs, statistics, or gradients.
Does not: prevent learning population-level patterns — that is the point of analytics.
Does not: replace access control, encryption, or consent management — DP is one layer in a defense stack.
Does not: automatically stop model misuse at inference time; pair with policy and monitoring.

Core mechanisms: Laplace and Gaussian noise

The simplest DP algorithms add calibrated noise to a numeric output.

Laplace mechanism

For a function f with sensitivity Δf (maximum change in f when one neighbor changes), release:

M(D) = f(D) + Laplace(0, Δf / ε)

Example: counting users in a cohort has sensitivity 1 (one person changes the count by at most 1), so a count of 12,483 becomes 12,483 plus Laplace noise with scale 1/ε. At ε = 1 the noise has scale 1 (standard deviation √2 ≈ 1.41); at ε = 0.1 the noise is ten times wider and the count is much less precise.
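A minimal sketch of the count example using NumPy's Laplace sampler follows. The function name `laplace_count` and its defaults are illustrative; a production release would normally go through a maintained DP library rather than raw floating-point sampling.

```python
import numpy as np

def laplace_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Release a count with Laplace noise of scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = np.random.default_rng()
    scale = sensitivity / epsilon  # Laplace scale b; standard deviation is sqrt(2) * b
    return true_count + rng.laplace(loc=0.0, scale=scale)

rng = np.random.default_rng(seed=7)
print(laplace_count(12_483, epsilon=1.0, rng=rng))   # typically within a few units
print(laplace_count(12_483, epsilon=0.1, rng=rng))   # typically within a few tens
```

Each call spends its own ε, so releasing the same count twice with fresh noise costs twice the budget under basic composition.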

Gaussian mechanism

For (ε, δ)-DP with δ > 0, add Gaussian noise:

M(D) = f(D) + Normal(0, σ²)

where σ is calibrated from Δf (measured in the L2 norm), ε, and δ via analytic formulas (often implemented by libraries). Gaussian noise composes more tightly across many steps, which is why DP-SGD uses it.
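As one concrete calibration, the sketch below uses the classical bound σ = Δf · sqrt(2 ln(1.25/δ)) / ε, which is only valid for 0 < ε < 1 and where Δf is the L2 sensitivity. This is an assumption about which formula to show: libraries often use tighter analytic calibrations, and the function names here are hypothetical.

```python
import math
import numpy as np

def classical_gaussian_sigma(l2_sensitivity, epsilon, delta):
    """Noise std for the classical Gaussian mechanism (requires 0 < epsilon < 1)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("the classical bound only holds for 0 < epsilon < 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    return l2_sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

def gaussian_release(value, l2_sensitivity, epsilon, delta, rng=None):
    """Add Gaussian noise calibrated by the classical bound to a numeric value."""
    if rng is None:
        rng = np.random.default_rng()
    sigma = classical_gaussian_sigma(l2_sensitivity, epsilon, delta)
    return value + rng.normal(loc=0.0, scale=sigma)

sigma = classical_gaussian_sigma(l2_sensitivity=1.0, epsilon=0.5, delta=1e-6)
print(round(sigma, 2))
```

Note how slowly σ grows as δ shrinks (through the logarithm) compared with how quickly it grows as ε shrinks.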

Sensitivity is the design lever

Lower sensitivity means less noise for the same ε. Techniques that shrink sensitivity include clipping gradients, aggregating into coarse bins, and limiting how much one user can contribute (per-user row caps). Harbor caps each account at 500 transactions per training window so a single whale cannot dominate a gradient sum.
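A per-account row cap can be sketched in a few lines. The version below keeps the first rows it sees for each account, which is an assumption made for simplicity; keeping the most recent rows or a random sample are equally plausible policies, and the field name `account_id` is hypothetical.

```python
from collections import defaultdict

def cap_rows_per_account(rows, max_rows_per_account=500, key="account_id"):
    """Keep at most max_rows_per_account rows for each account.

    Rows are kept in input order, so the first rows seen for an account win.
    """
    kept = []
    seen = defaultdict(int)
    for row in rows:
        account = row[key]
        if seen[account] < max_rows_per_account:
            kept.append(row)
            seen[account] += 1
    return kept

rows = (
    [{"account_id": "whale", "amount": 10.0}] * 1200
    + [{"account_id": "typical", "amount": 25.0}] * 40
)
capped = cap_rows_per_account(rows, max_rows_per_account=500)
print(len(capped))  # 500 rows from "whale" plus 40 from "typical"
```

With the cap in place, adding or removing one account changes at most 500 rows, which bounds how far any sum over rows can move.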

DP-SGD: private deep learning

Publishing raw gradients from neural network training leaks training examples (membership inference and reconstruction attacks). Differentially private stochastic gradient descent (DP-SGD), introduced by Abadi et al., privatizes training:

Per-example gradient clipping: bound each sample's gradient norm to C (e.g. 1.0).
Noise addition: add Gaussian noise to the sum of clipped gradients before the optimizer step.
Privacy accounting: track the cumulative (ε, δ) consumed across all training steps using composition theorems or tight accountants (RDP, PLD).
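The clipping and noise steps above can be sketched for a single batch in plain NumPy. This is a toy under stated assumptions: the model is logistic regression so per-example gradients have a closed form, the function name `dp_sgd_step` is hypothetical, and no privacy accounting is performed. Real training would rely on a library such as Opacus or TensorFlow Privacy.

```python
import numpy as np

def dp_sgd_step(weights, X, y, clip_norm, noise_multiplier, lr, rng):
    """One DP-SGD update for logistic regression on a batch (X, y)."""
    # Per-example gradient of the logistic loss: (sigmoid(x . w) - y) * x
    preds = 1.0 / (1.0 + np.exp(-(X @ weights)))
    per_example_grads = (preds - y)[:, None] * X  # shape (batch, dim)

    # Step 1: clip each example's gradient to L2 norm at most clip_norm
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale

    # Step 2: add Gaussian noise with std noise_multiplier * clip_norm to the sum
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=weights.shape)
    noisy_sum = clipped.sum(axis=0) + noise

    # Average over the batch and take an ordinary gradient step
    return weights - lr * noisy_sum / len(X)

rng = np.random.default_rng(0)
X = rng.normal(size=(2048, 5))
y = (rng.random(2048) < 0.5).astype(float)
w = dp_sgd_step(np.zeros(5), X, y, clip_norm=1.0, noise_multiplier=1.1, lr=0.1, rng=rng)
print(w.shape)
```

Because every example contributes at most `clip_norm` to the sum, the noise scale depends only on `clip_norm` and `noise_multiplier`, not on the data.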

Libraries like Opacus (PyTorch), TensorFlow Privacy, and JAX implementations wrap standard training loops. Expect accuracy trade-offs: Harbor's document-embedding classifier lost 2.1 F1 points at ε = 3 versus non-private training, acceptable for a compliance-facing release.

Hyperparameters that matter

Clip norm C: too low clips signal; too high requires more noise.
Noise multiplier: directly trades privacy for utility; tune jointly with ε target.
Batch size: larger batches improve signal-to-noise per step but change privacy accounting.
Epochs: more passes consume more budget — early stopping is a privacy tool.

Privacy budget accounting and composition

Every DP release spends from a finite privacy budget. If query A uses (ε₁, δ₁) and query B uses (ε₂, δ₂), basic (sequential) composition gives (ε₁ + ε₂, δ₁ + δ₂), and this holds even when later queries are chosen adaptively. Advanced composition and Rényi differential privacy (RDP) accountants yield tighter bounds, which is critical when DP-SGD runs thousands of steps.
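To show how much tighter the bounds can get, the sketch below compares basic composition with the advanced composition theorem of Dwork, Rothblum, and Vadhan for k identical (ε, δ) mechanisms: total ε = ε · sqrt(2k ln(1/δ′)) + kε(e^ε − 1) at total δ = kδ + δ′. The function names and the example values are illustrative; RDP and PLD accountants in DP libraries are tighter still.

```python
import math

def basic_composition(epsilon, delta, k):
    """k-fold basic composition: parameters simply add up."""
    return k * epsilon, k * delta

def advanced_composition(epsilon, delta, k, delta_slack):
    """k-fold advanced composition with extra failure probability delta_slack."""
    if not 0.0 < delta_slack < 1.0:
        raise ValueError("delta_slack must be in (0, 1)")
    eps_total = (
        epsilon * math.sqrt(2.0 * k * math.log(1.0 / delta_slack))
        + k * epsilon * (math.exp(epsilon) - 1.0)
    )
    return eps_total, k * delta + delta_slack

k, eps_step = 1000, 0.01
print(basic_composition(eps_step, 0.0, k))           # epsilon grows linearly in k
print(advanced_composition(eps_step, 0.0, k, 1e-6))  # epsilon grows roughly with sqrt(k)
```

For a small number of queries the advanced bound can be worse than the basic one, so accountants typically report the minimum of the available bounds.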

Practical budget policies

Single release: one model training run with total ε declared in the model card.
Dashboard analytics: allocate ε per metric per quarter; stop publishing when budget exhausts.
Per-user budgets: in local DP on devices, each user has their own ε; server aggregates without seeing raw events.

Harbor's privacy board caps any production model at ε = 8, δ = 10⁻⁶ per training snapshot, with mandatory re-approval above ε = 4 for external API exposure.
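A budget policy like this is easier to enforce when spend is recorded in code. The sketch below is a hypothetical ledger that sums charges with basic composition and refuses any charge that would exceed the cap; the class name, labels, and the split of the run into two charges are assumptions for illustration, not Harbor's actual tooling.

```python
from dataclasses import dataclass, field

class BudgetExceeded(Exception):
    """Raised when a charge would push cumulative spend past the cap."""

@dataclass
class PrivacyLedger:
    epsilon_cap: float
    delta_cap: float
    entries: list = field(default_factory=list)  # (label, epsilon, delta) tuples

    def spent(self):
        """Cumulative (epsilon, delta) under basic composition."""
        eps = sum(e for _, e, _ in self.entries)
        delta = sum(d for _, _, d in self.entries)
        return eps, delta

    def charge(self, label, epsilon, delta):
        """Record a release, or raise if it would exceed either cap."""
        eps_spent, delta_spent = self.spent()
        if eps_spent + epsilon > self.epsilon_cap or delta_spent + delta > self.delta_cap:
            raise BudgetExceeded(f"{label} would exceed the privacy budget")
        self.entries.append((label, epsilon, delta))

ledger = PrivacyLedger(epsilon_cap=8.0, delta_cap=1e-6)
ledger.charge("training run", epsilon=3.2, delta=4e-7)
ledger.charge("hyperparameter sweep", epsilon=3.2, delta=4e-7)
print(ledger.spent())
```

Summing is the most conservative choice; a ledger backed by an RDP accountant would allow more releases under the same cap.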

Local DP vs central DP

Model	Where noise is added	Trust assumption	Typical use
Central (curator) DP	Trusted aggregator adds noise to statistics or training	Curator trusted with raw data; recipients of outputs are not	Census statistics, enterprise ML on warehouse data
Local DP	Each device adds noise before sending anything	Server untrusted; users protect themselves	Browser telemetry, keyboard emoji frequency, mobile analytics
Distributed DP / secure aggregation	Clients clip; server adds noise to sum of encrypted gradients	Hybrid: crypto hides raw gradients, server adds final noise	Cross-device federated learning at scale

Local DP needs much larger ε (or many more users) for the same utility as central DP because noise is added per user before aggregation. Choose central DP when you already operate a governed data warehouse; choose local DP when raw events must never leave the device.
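The utility gap can be made concrete for a simple count. The sketch below implements randomized response as the local mechanism, debiases the aggregated reports, and compares the estimator's standard deviation with that of a central Laplace count at the same ε. All function names are illustrative, and the cohort is simulated.

```python
import numpy as np

def truth_probability(epsilon):
    """Probability that a device reports its true bit under randomized response."""
    return np.exp(epsilon) / (1.0 + np.exp(epsilon))

def local_randomize(bits, epsilon, rng):
    """Each device flips its own bit with probability 1 - truth_probability."""
    keep = rng.random(bits.shape) < truth_probability(epsilon)
    return np.where(keep, bits, 1 - bits)

def local_count_estimate(reports, epsilon):
    """Unbiased estimate of the number of true 1-bits from noisy reports."""
    p = truth_probability(epsilon)
    return (reports.sum() - len(reports) * (1.0 - p)) / (2.0 * p - 1.0)

def local_count_std(n, epsilon):
    """Standard deviation of the local estimate for n users."""
    p = truth_probability(epsilon)
    return np.sqrt(n * p * (1.0 - p)) / (2.0 * p - 1.0)

def central_count_std(epsilon):
    """Standard deviation of a Laplace-noised count with sensitivity 1."""
    return np.sqrt(2.0) / epsilon

rng = np.random.default_rng(0)
bits = (rng.random(200_000) < 0.3).astype(int)  # simulated per-user indicator
reports = local_randomize(bits, epsilon=1.0, rng=rng)
print(bits.sum(), round(local_count_estimate(reports, epsilon=1.0)))
print(local_count_std(len(bits), 1.0), central_count_std(1.0))
```

The local error grows with the square root of the number of users while the central error stays constant, which is why local DP only becomes usable at large scale or with a larger ε.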

Harbor Analytics: fraud classifier with DP-SGD

Harbor's payments risk team retrained a tabular neural network (four hidden layers, ~180k parameters) on 4.2M transactions across 890k accounts. Legal required a documented privacy guarantee before exposing scores via partner API.

Threat model

Adversary: semi-honest API consumer with model weights or repeated score queries.
Goal: infer membership of a known cardholder or reconstruct high-value spending outliers.
Neighbor: unbounded — remove or add one account's full transaction history.

Implementation

Per-account row cap of 500 transactions in the 90-day window (sensitivity control).
Feature normalization on bounded scales; no raw merchant strings in the model (hashed buckets only).
DP-SGD via Opacus: clip norm 1.0, noise multiplier 1.1, batch size 2048, 15 epochs.
RDP accountant reported final (ε = 3.2, δ = 10⁻⁶) for the training run.
Membership inference attack benchmark (shadow models) showed attack AUC 0.52 private vs 0.71 non-private baseline.
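The attack AUC in the last item is the probability that the attack scores a true training member above a non-member. A minimal way to compute it from attack scores is sketched below; the shadow-model attack that produces the scores is not shown, and the score values are made up for illustration.

```python
import numpy as np

def attack_auc(member_scores, nonmember_scores):
    """Probability that a random member outscores a random non-member.

    Ties count as half. 0.5 means the attack does no better than guessing.
    """
    members = np.asarray(member_scores, dtype=float)[:, None]
    nonmembers = np.asarray(nonmember_scores, dtype=float)[None, :]
    wins = (members > nonmembers).mean()
    ties = (members == nonmembers).mean()
    return float(wins + 0.5 * ties)

# Hypothetical attack confidences that each record was in the training set
member_scores = [0.62, 0.48, 0.55, 0.51]
nonmember_scores = [0.50, 0.47, 0.58, 0.44]
print(attack_auc(member_scores, nonmember_scores))
```

The pairwise comparison is quadratic in the number of records, which is fine for a benchmark of a few thousand held-out examples; larger evaluations would use a rank-based formula.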

Utility outcome

Test AUC dropped from 0.941 (non-private) to 0.918 (DP). False-positive rate at the production threshold rose from 0.8% to 1.1%, a 0.3 pp increase that stays within the 0.5 pp budget operations had accepted. Legal approved external API access with ε documented in the model card and an annual retrain budget of ε = 3.2 per snapshot (no mid-year fine-tune without new accounting).

Method decision table

Scenario	Recommended approach	Avoid
Publishing aggregate counts or histograms	Laplace or Gaussian mechanism with known sensitivity	Raw counts with “we rounded” as the only protection
Training neural nets on sensitive warehouse data	DP-SGD with RDP accountant; per-user contribution caps	Assuming anonymization alone prevents memorization
Mobile telemetry, server cannot be trusted	Local DP with randomized response or local noise	Central DP claims when raw events leave the device unnoised
Cross-silo training (hospitals, banks)	Federated learning + secure aggregation + central DP noise on updates	Federated averaging without DP when gradients leak patient rows
Releasing ML model weights publicly	DP training + empirical membership attack eval; consider not releasing weights	Publishing exact weights from non-private training on PII
LLM fine-tune on customer support logs	DP-SGD or PATE; strict ε budget; synthetic data supplement	Full fine-tune without extraction testing or DP

Common pitfalls

ε-washing. Claiming “differential privacy” without stating ε, δ, neighbor definition, and composition scope.
Ignoring post-processing. DP outputs can be transformed freely without spending more budget — but combining DP and non-DP sources breaks guarantees.
Unbounded sensitivity. A single user contributing millions of rows makes gradient sums huge; clip or cap contributions first.
Composition blind spots. Each hyperparameter sweep, dashboard refresh, and A/B test spends budget unless isolated.
Utility shock without stakeholder buy-in. ε = 0.5 may be unusable; negotiate acceptable accuracy loss before engineering starts.
Confusing DP with encryption. Encrypted training data is not DP until noise and accounting are applied to released outputs.
Stale ε in model cards. Retraining with the same nominal ε but more epochs or data changes the actual guarantee.
Local DP with tiny cohorts. Noise dominates when only hundreds of users contribute; need scale or higher ε.

Production checklist

Define neighbor relation (bounded vs unbounded) and document it in the privacy policy.
Set target (ε, δ) with legal and product sign-off before training.
Bound per-user contribution (row caps, clipping) to control sensitivity.
Use a maintained DP library (Opacus, TF Privacy) rather than hand-rolled noise.
Run a privacy accountant for the full training schedule, not a single step.
Benchmark utility loss vs non-private baseline on held-out test data.
Run membership inference or extraction attacks as a sanity check.
Publish ε, δ, and training date in the model card or API documentation.
Track cumulative privacy spend across releases in a budget ledger.
Pair DP with access control, retention limits, and data governance.

Key takeaways

Differential privacy bounds how much any single individual's data can influence model outputs, parameterized by (ε, δ).
Laplace and Gaussian mechanisms add calibrated noise; DP-SGD privatizes gradient-based training at a measurable utility cost.
Privacy budgets compose — every query and training run spends ε; accountants prevent silent over-spend.
Central DP suits governed warehouses; local DP suits untrusted servers and device telemetry at scale, at a utility cost.
DP complements but does not replace encryption, access control, or adversarial testing.