
HashiCorp Vault explained

Your staging database password lives in a Slack thread, production copies it from a shared 1Password vault, and the CI pipeline still injects DB_PASS as a plain environment variable anyone with repo access can read. When an engineer leaves or a laptop is stolen, you rotate manually and hope nothing was cached.

HashiCorp Vault is a secrets management platform that stores sensitive values encrypted at rest, issues short-lived dynamic credentials on demand, and logs every access for audit. Its community edition was open-source until 2023 and is now source-available under the Business Source License. It is not a password manager for humans; it is infrastructure that applications and operators call over TLS to lease secrets with explicit TTLs and path-based policies.

This guide covers Vault’s architecture (storage, seal/unseal, namespaces), secrets engines (KV v2, database, PKI, Transit), authentication methods (AppRole, Kubernetes, JWT/OIDC), policy design in HCL, integration with Kubernetes and Helm, a Harbor Fleet order-service worked example, a tooling decision table, common pitfalls, and a production checklist alongside our zero trust overview and OAuth guide.

What Vault is — and what it is not

Vault is a centralized secrets broker. Clients authenticate, receive a token (or wrapped token) scoped by policies, then read, write, or generate secrets through HTTP APIs. Unlike dropping secrets into config files — which violates twelve-factor config discipline when those files are committed — Vault keeps ciphertext in a storage backend (Consul, integrated Raft, cloud object store) and decrypts only while unsealed.

Vault is not a replacement for OAuth user login. OIDC auth in Vault lets humans obtain tokens for break-glass admin; application workloads use AppRole, Kubernetes service account JWTs, or cloud IAM. Vault also does not replace your database: the database secrets engine connects with an account that holds CREATE USER (or equivalent) privileges on the target server and uses it to create temporary DB users.

Core concepts

  • Storage backend — durable encrypted blob store; loss of quorum here is catastrophic.
  • Seal / unseal — master key shards (Shamir) or auto-unseal via cloud KMS; Vault serves no secrets while sealed.
  • Secrets engine — plugin mounted at a path (secret/, database/, pki/) that implements read/write/generate semantics.
  • Auth method — how callers prove identity before receiving a token.
  • Policy — HCL document granting capabilities (read, create, update, delete, list) on path prefixes.
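  • Lease and TTL — every dynamic secret and token carries a time-to-live; renewal extends life until max TTL; revocation takes effect immediately. Static KV values have no lease and stay until changed or deleted.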
  • Audit device — append-only log of every request (success and denial) for compliance.
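
The lease and TTL concept above can be sketched as a small state machine. This is a toy model for intuition only, not Vault's implementation: the class name Lease, its fields, and the use of plain integer seconds since issue are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Lease:
    """Toy lease model; all times are seconds since the lease was issued."""

    ttl: int       # seconds granted at issue and at each renewal
    max_ttl: int   # hard ceiling, measured from the issue time
    expires_at: int = field(init=False, default=0)
    revoked: bool = field(init=False, default=False)

    def __post_init__(self) -> None:
        self.expires_at = min(self.ttl, self.max_ttl)

    def renew(self, now: int) -> int:
        """Extend the lease from `now`, never past max_ttl; return the new expiry."""
        if self.revoked or now >= self.expires_at:
            raise ValueError("lease is expired or revoked")
        self.expires_at = min(now + self.ttl, self.max_ttl)
        return self.expires_at

    def revoke(self) -> None:
        """Revocation ends validity immediately, regardless of remaining TTL."""
        self.revoked = True

    def is_valid(self, now: int) -> bool:
        return not self.revoked and now < self.expires_at


lease = Lease(ttl=3600, max_ttl=7200)
print(lease.renew(now=2880))  # 6480: a full TTL counted from the renewal time
print(lease.renew(now=6000))  # 7200: capped by max_ttl
```

Once the max TTL is reached, renewal no longer helps and the client has to request a fresh secret.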

Seal, unseal, and high availability

On startup Vault is sealed: the encryption key protecting the storage backend is split and unknown to the process. Operators provide enough key shards to reconstruct the master key, or an auto-unseal integration (AWS KMS, GCP Cloud KMS, or Azure Key Vault) decrypts it on Vault’s behalf. Only then can Vault decrypt stored secrets and serve traffic.
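
The idea of reconstructing a key from "enough" shards can be illustrated with a toy Shamir secret sharing sketch. This is not Vault's implementation; the prime field, the integer secret, and the function names split and combine are assumptions chosen for readability.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the toy secret must be smaller than this


def split(secret: int, shares: int, threshold: int) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not (1 <= threshold <= shares) or not (0 <= secret < PRIME):
        raise ValueError("invalid parameters")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    points = []
    for x in range(1, shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, highest coefficient first
            y = (y * x + c) % PRIME
        points.append((x, y))
    return points


def combine(points: list[tuple[int, int]]) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shards = split(secret=123456789, shares=5, threshold=3)
print(combine(shards[:3]) == 123456789)  # True: any three shards suffice
```

With fewer shards than the threshold, interpolation yields an unrelated value, which is why custody of the shards can be spread across several operators.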

Production clusters run 3 or 5 Vault nodes behind a load balancer with integrated Raft storage or Consul as the backend. One node is the active leader handling writes; standbys forward or replicate. If the leader dies, Raft elects a new leader — but if the cluster seals (e.g. all nodes restart without auto-unseal), every dependent service loses secret access until someone unseals. Runbooks must document who holds unseal shards and test disaster recovery quarterly.
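
The choice of 3 or 5 nodes follows from majority quorum arithmetic, sketched below. The function names are hypothetical; the formula is the standard majority rule used by Raft-style consensus.

```python
def quorum(nodes: int) -> int:
    """Smallest majority of a cluster with `nodes` voting members."""
    return nodes // 2 + 1


def failures_tolerated(nodes: int) -> int:
    """How many nodes can be lost while a majority still remains."""
    return nodes - quorum(nodes)


for n in (3, 4, 5):
    print(n, quorum(n), failures_tolerated(n))
# 3 2 1
# 4 3 1
# 5 3 2
```

A fourth node adds cost without adding failure tolerance, which is why odd cluster sizes are the norm.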

Auto-unseal trade-offs

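Auto-unseal delegates trust to a cloud KMS. Convenience wins for always-on SaaS; regulated environments sometimes mandate manual Shamir shards so no single cloud API call can decrypt everything. The two do not combine as a fallback: with auto-unseal enabled, the Shamir shares become recovery keys that authorize operations such as generating a new root token, but they cannot unseal Vault while the KMS is unavailable. Treat KMS availability and key protection as part of the recovery plan.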

KV secrets engine (version 2)

The KV v2 engine, mounted in this guide at secret/ and addressed as secret/data/ in the API and in policies, is the starting point for API keys, TLS bundles, and third-party tokens. Version 2 adds soft-delete, version history, and check-and-set writes via the cas option, which is critical when two deploy pipelines might overwrite the same path.

# Write API key (CLI); for KV v2 the kv subcommand inserts data/ into the path itself
vault kv put secret/harbor-fleet/stripe api_key=sk_live_...

# Read with version metadata
vault kv get -format=json secret/harbor-fleet/stripe

# Policy snippet: read only this path (policies use the API path, which includes data/)
path "secret/data/harbor-fleet/stripe" {
  capabilities = ["read"]
}

KV is static secrets: you still manage rotation schedules. For databases and cloud IAM, prefer dynamic engines below. Store only what must be static (vendor API keys with no rotation API) in KV; everything else should lease and expire.
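
The check-and-set behaviour mentioned above is carried in the write request itself. A minimal sketch of the request shape, assuming the engine is mounted at secret/ as in this guide; the helper name kv2_write_request and the placeholder value are hypothetical, and nothing is sent over the network.

```python
import json
from typing import Optional


def kv2_write_request(mount: str, path: str, data: dict,
                      cas: Optional[int] = None) -> tuple[str, str]:
    """Return (url_path, json_body) for a KV v2 write.

    cas=0 means "write only if the key does not exist yet"; cas=N means
    "write only if the current version is N". Omitting it disables the check.
    """
    body: dict = {"data": data}
    if cas is not None:
        body["options"] = {"cas": cas}
    return f"/v1/{mount}/data/{path}", json.dumps(body, sort_keys=True)


url, body = kv2_write_request(
    "secret", "harbor-fleet/stripe", {"api_key": "PLACEHOLDER"}, cas=3
)
print(url)   # /v1/secret/data/harbor-fleet/stripe
print(body)  # {"data": {"api_key": "PLACEHOLDER"}, "options": {"cas": 3}}
```

If another pipeline has already written version 4, a write carrying cas=3 is rejected instead of silently overwriting it.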

Dynamic database credentials

The database secrets engine connects to PostgreSQL, MySQL, MongoDB, and others with a privileged root connection (itself stored in Vault). When an app requests credentials at database/creds/orders-readonly, Vault runs SQL to create a user named like v-token-orders-r-abc123, grants the privileges defined in the role's creation statements, and returns username/password with the role's TTL (1 hour in this guide; a role without its own TTL inherits the mount or system default, which is much longer). When the lease expires, Vault drops the user.

Why dynamic beats shared passwords

  • Blast radius — a leaked credential works only until TTL; no long-lived app_user in every container.
  • Audit — each lease maps to a Vault token identity; database logs show distinct usernames per service.
  • Rotation without deploys — apps renew leases via sidecar or SDK; no weekend password rotation window.

Configure connection URLs, roles (creation and revocation SQL), and statement timeouts. Test revocation paths — orphaned DB users after failed revokes fill pg_user over months.
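
Creation SQL for a role is written as a template with placeholders that Vault fills in for each lease. The sketch below mimics that substitution locally to show what ends up running on the database; the render helper is hypothetical, and the SQL (read-only grants on the public schema) is an assumed example rather than this guide's actual role definition.

```python
# Template in the style used for PostgreSQL roles; {{name}}, {{password}} and
# {{expiration}} are the placeholders substituted per lease.
CREATION_SQL = (
    'CREATE ROLE "{{name}}" WITH LOGIN PASSWORD \'{{password}}\' '
    "VALID UNTIL '{{expiration}}'; "
    'GRANT SELECT ON ALL TABLES IN SCHEMA public TO "{{name}}";'
)


def render(statement: str, name: str, password: str, expiration: str) -> str:
    """Substitute the per-lease values into a creation statement template."""
    values = {"name": name, "password": password, "expiration": expiration}
    for key, value in values.items():
        statement = statement.replace("{{" + key + "}}", value)
    return statement


print(render(CREATION_SQL, "v-token-orders-r-abc123", "example-password",
             "2030-01-01 00:00:00+00"))
```

Setting VALID UNTIL gives the database its own expiry as a backstop in case Vault's revocation statement fails.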

PKI, Transit, and other engines

Beyond KV and database, three engines appear frequently in production:

  • PKI — internal certificate authority; issue short-lived TLS certs for service-to-service mTLS aligned with zero trust models. Define roles with allowed_domains, max TTL, and key types; automate renewal via cert-manager or Vault Agent.
  • Transit — encryption-as-a-service; apps send plaintext, Vault encrypts with a named key and returns ciphertext. Useful when you cannot store keys in the app but need field-level encryption in your database.
  • AWS/GCP/Azure secrets engines — mint cloud IAM credentials with scoped policies instead of long-lived access keys on EC2 instances.

Enable only engines you use. Each mount increases attack surface and operator cognitive load.
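
For the Transit engine listed above, the main practical detail is that plaintext travels base64-encoded in both directions. A minimal sketch of building an encrypt request and decoding a decrypt response, assuming the engine is mounted at transit/; the key name orders-pii and both helper names are hypothetical, and no request is actually sent.

```python
import base64
import json


def transit_encrypt_request(key_name: str, plaintext: bytes,
                            mount: str = "transit") -> tuple[str, str]:
    """Return (url_path, json_body) for a Transit encrypt call."""
    body = {"plaintext": base64.b64encode(plaintext).decode("ascii")}
    return f"/v1/{mount}/encrypt/{key_name}", json.dumps(body)


def decode_transit_plaintext(b64_plaintext: str) -> bytes:
    """Decode the base64 plaintext field returned by a Transit decrypt call."""
    return base64.b64decode(b64_plaintext)


url, body = transit_encrypt_request("orders-pii", b"hello")
print(url)   # /v1/transit/encrypt/orders-pii
print(body)  # {"plaintext": "aGVsbG8="}
```

The ciphertext that comes back is a string prefixed with the key version (of the form vault:v1:...), so it can be stored in an ordinary text column and still be decrypted after the key rotates.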

Authentication methods for workloads

Auth translates identity into a Vault token. Pick one method per deployment target:

AppRole (VMs and CI)

A role ID (like a username) and secret ID (like a password, deliverable once via CI OIDC or wrapped token) authenticate the pipeline or VM. Bind secret_id_ttl and token_ttl tightly; use cidr_list when callers have fixed egress IPs.

Kubernetes auth

Pods present their service account JWT; Vault validates it through the cluster's TokenReview API and maps namespace + service account to a policy. This is the standard pattern on Kubernetes: deploy Vault Agent Injector or the Secrets Store CSI driver to mount secrets as files without baking them into images. With the Agent Injector, Helm charts set pod annotations such as vault.hashicorp.com/agent-inject: "true".
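
What a pod actually sends during Kubernetes auth is small. A minimal sketch, assuming the auth method is mounted at kubernetes/ and the token sits at the usual projected service account path; the helper name k8s_login_request is hypothetical, and the function only builds the request rather than sending it.

```python
import json
from pathlib import Path

# Conventional location of the pod's service account token.
SA_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def k8s_login_request(role: str, token_path: str = SA_TOKEN_PATH,
                      mount: str = "kubernetes") -> tuple[str, str]:
    """Return (url_path, json_body) for a Kubernetes auth login."""
    jwt = Path(token_path).read_text().strip()
    return f"/v1/auth/{mount}/login", json.dumps({"role": role, "jwt": jwt})


# Inside a pod: url, body = k8s_login_request("orders-api")
```

Vault Agent and the CSI provider perform this exchange on the application's behalf, so application code rarely needs it directly.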

JWT / OIDC (humans)

Operators log in via Google or Okta; Vault maps group claims to policies. Restrict human tokens to break-glass paths — day-to-day app secrets should never flow through engineer laptops.

Policy design and least privilege

Policies are HCL files listing path patterns and capabilities. Follow least privilege by service: the orders API policy reads database/creds/orders-readonly and secret/data/harbor-fleet/stripe; it cannot reach secret/data/harbor-fleet/admin.

# orders-api.hcl
path "database/creds/orders-readonly" {
  capabilities = ["read"]
}
path "secret/data/harbor-fleet/stripe" {
  capabilities = ["read"]
}
path "sys/leases/renew" {
  capabilities = ["update"]
}

Use templated policies with identity metadata when many services share a pattern. Test policies with vault token capabilities <token> <path> before production cutover. Deny-by-default: if a path is not listed, access fails.
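
Deny-by-default and the effect of a trailing glob can be reasoned about with a small matcher. This is a simplified sketch, not Vault's evaluation logic: it handles only exact paths and trailing * prefixes, and the function names are hypothetical. Real policies also support segment wildcards and further precedence rules.

```python
POLICY = {
    "database/creds/orders-readonly": {"read"},
    "secret/data/harbor-fleet/stripe": {"read"},
}


def capabilities(policy: dict[str, set[str]], path: str) -> set[str]:
    """Exact match wins; otherwise the longest matching 'prefix*' pattern."""
    if path in policy:
        return policy[path]
    best = ""
    for pattern in policy:
        if (pattern.endswith("*") and path.startswith(pattern[:-1])
                and len(pattern) > len(best)):
            best = pattern
    return policy[best] if best else set()  # nothing listed: no capabilities


def allowed(policy: dict[str, set[str]], path: str, capability: str) -> bool:
    caps = capabilities(policy, path)
    return capability in caps and "deny" not in caps


print(allowed(POLICY, "secret/data/harbor-fleet/stripe", "read"))   # True
print(allowed(POLICY, "secret/data/harbor-fleet/admin", "read"))    # False
print(allowed({"secret/*": {"read"}}, "secret/data/harbor-fleet/admin", "read"))  # True
```

The last line shows how a single broad glob quietly grants access that the per-service policy was written to prevent.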

Namespaces (Enterprise)

Vault's community edition is single-tenant. Enterprise namespaces isolate teams (e.g. harbor-fleet/ vs harbor-payments/) with delegated administration. Without Enterprise, run separate clusters or strict path prefixes per team.

Worked example: Harbor Fleet order service

Harbor Fleet runs a Node.js order API on Kubernetes. Requirements: PostgreSQL read/write creds, Stripe API key, and internal mTLS to the inventory service. Before Vault, values lived in K8s Secrets synced from a CI variable — rotation meant redeploying three Deployments.

Setup

  1. Enable database engine; configure postgresql://vault-admin@db.internal/harbor connection.
  2. Create role orders-readwrite (creation SQL: CREATE USER plus GRANT on the orders schema) with a TTL of 3600 s.
  3. Store Stripe key at secret/data/harbor-fleet/stripe via CI with short-lived OIDC token.
  4. Enable PKI mount; issue orders.harbor-fleet.svc certs with 24 h TTL.
  5. Configure Kubernetes auth: role orders-api bound to namespace fleet, SA orders-api, policy orders-api.

Runtime

Vault Agent sidecar authenticates with the pod SA, renders /vault/secrets/db-creds.json and renews the database lease at 80% TTL. The app reads files on startup and watches for SIGHUP on renewal. Stripe key injects similarly. cert-manager fetches PKI certs; nginx sidecar terminates mTLS to inventory.
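
The "read files on startup, reload on SIGHUP" behaviour can be sketched as follows. The example is in Python for brevity even though the order API is Node.js; the class and function names are hypothetical, and the JSON keys username and password are an assumed shape that depends on the Agent template in use. SIGHUP is available on POSIX systems only.

```python
import json
import signal
from pathlib import Path


class CredentialStore:
    """Holds the most recently rendered credentials from a file on disk."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self.creds: dict = {}
        self.reload()

    def reload(self, signum=None, frame=None) -> None:
        # Signature matches what signal handlers receive, so the bound
        # method can be registered directly as the SIGHUP handler.
        self.creds = json.loads(self.path.read_text())


def install_reload_handler(store: CredentialStore) -> None:
    """Re-read the credentials file whenever the process receives SIGHUP."""
    signal.signal(signal.SIGHUP, store.reload)


# In the application entrypoint:
#   store = CredentialStore("/vault/secrets/db-creds.json")
#   install_reload_handler(store)
```

New database connections should read from the store each time they are opened, so a reload takes effect without restarting the process.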

Outcome

A credential leaked from a pod snapshot stays usable only for the remaining lease time, at most 60 minutes (the 3600 s TTL) once renewal stops, and revoking the lease cuts access off immediately. Audit log ties each lease to fleet/orders-api identity. Stripe rotation updates one KV path; Agent reloads without image rebuild. On-call runbook shrinks from “rotate password in five repos” to “update KV version and verify Agent reload.”

Tooling decision table

| Approach | Best for | Strengths | Watch for |
|---|---|---|---|
| HashiCorp Vault | Multi-cloud, dynamic DB/IAM, audit requirements | Dynamic creds, rich engines, unified audit | Operational complexity, unseal runbooks |
| Kubernetes Secrets | Small clusters, non-sensitive config | Native, simple | Base64 is not encryption; etcd encryption at rest only |
| Cloud SM (AWS/GCP) | Single-cloud workloads | Managed HA, IAM integration | Vendor lock-in; dynamic DB credentials not available for every engine |
| SOPS + Git | GitOps static secrets | Versioned, reviewable in PRs | No dynamic leases; key management still hard |
| .env / CI variables | Local dev, prototypes | Fast | No audit, broad access, rotation pain |

Vault Agent, CSI driver, and sidecar patterns

Applications should not embed Vault SDK calls everywhere. Standard patterns:

  • Vault Agent — sidecar or systemd unit that authenticates, renders templates to disk, renews leases, and optionally executes reload hooks.
  • Secrets Store CSI driver — mounts secrets as volumes; provider talks to Vault; pods see files not env vars (safer against /proc leaks).
  • Init container + shared volume — lighter than perpetual sidecar when TTL exceeds pod lifetime and renewal is unnecessary.

Prefer files over environment variables for secret delivery — env vars appear in crash dumps and process listings. If you must use env, set them in the entrypoint from a file Agent renders, never in the Dockerfile.

Common pitfalls

  • Root token in production — revoke after bootstrap; use limited admin policies and OIDC for humans.
  • No auto-unseal plan — sealed cluster during region outage blocks all deploys; test unseal drills.
  • Overly broad policies — path "secret/*" { capabilities = ["read"] } negates segmentation.
  • Ignoring lease renewal — apps crash when DB passwords expire mid-request; implement renew or recycle pods before TTL.
  • Failed revocation — monitor revocation errors; orphaned dynamic users accumulate.
  • Audit device on slow disk — blocking audit sinks add latency; use buffered devices and monitor drop rates.
  • Storing Vault tokens in git — use wrapped tokens or CI OIDC with minutes-long TTL.
  • Vault as only backup of secrets — export break-glass procedures; loss of Raft quorum without backup is unrecoverable.

Production checklist

  • Deploy HA cluster (3+ nodes) with integrated Raft or supported storage backend.
  • Configure auto-unseal or document Shamir shard custody and quarterly drill.
  • Enable audit device(s) to immutable storage (S3 Object Lock, separate account).
  • Revoke root token after initial policy and auth setup.
  • Enable KV v2, database, and PKI only as needed; define roles with minimal SQL grants.
  • Create per-service policies; test with token capabilities before binding auth roles.
  • Integrate Kubernetes auth or AppRole for each workload; no long-lived shared secret IDs.
  • Deploy Vault Agent or CSI driver; deliver secrets as files with reload hooks.
  • Set sensible default and max TTLs; alert on renewal failures and revocation errors.
  • Backup Raft snapshots off-cluster; verify restore in staging.
  • Document break-glass: human OIDC login, emergency KV rotation, sealed-cluster recovery.

Key takeaways

  • Vault centralizes secrets with encryption, leases, and audit — replacing shared passwords in env vars and chat logs.
  • Dynamic database and cloud credentials shrink blast radius; static secrets belong in KV v2 with version history.
  • Authentication maps workloads (AppRole, Kubernetes JWT) and humans (OIDC) to least-privilege HCL policies.
  • Agent sidecars and CSI drivers keep application code ignorant of Vault API details while renewing leases automatically.
  • HA, unseal, audit, and backup are non-negotiable — a sealed or un-audited Vault is worse than no Vault.
