
Cryptographic hashing explained

A cryptographic hash function takes input of any size and produces a fixed-length fingerprint — a string of bytes that looks random but is completely determined by the input. Change one character and the output changes unpredictably. Hashing is everywhere: password storage, file integrity checks, digital signatures, Bitcoin mining, and commit-reveal schemes in provably fair games. This guide explains what makes a hash function cryptographically secure, which algorithms to use for which job, and the mistakes that turn a strong primitive into a data breach waiting to happen.

What hashing is (and is not)

Hashing is one-way: given a hash output, you cannot feasibly recover the original input. That is different from encryption, which is reversible with a key. It is also different from encoding (Base64, hex) — encoding is for transport, not security. Anyone can decode Base64; nobody should be able to invert SHA-256.

A good cryptographic hash function has a few core properties:

Deterministic. The same input always produces the same output.
Fast to compute. For integrity checks, hashing a gigabyte file should take seconds, not hours. (Password hashing deliberately violates this; see below.)
Preimage-resistant. Given only a hash, it is computationally infeasible to find an input that produces it, so you cannot reverse-engineer the input from its hash.
Second-preimage-resistant. Given one input, it is computationally infeasible to find a different input with the same hash.
Collision-resistant. It is computationally infeasible to find any two different inputs that produce the same hash.
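A quick way to see determinism, fixed output length, and the avalanche effect is to hash two nearly identical strings with Python's standard hashlib module. The sample strings are arbitrary, and the bit-counting helper is an illustrative addition, not a standard API.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many of the 256 output bits differ between two hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


original = fingerprint("transfer 100 coins")
repeat = fingerprint("transfer 100 coins")
tweaked = fingerprint("transfer 900 coins")  # one character changed

print(original == repeat)  # True: same input, same output
print(len(original))  # 64 hex characters, i.e. 32 bytes
print(differing_bits(original, tweaked))  # usually close to half of the 256 bits
```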

These properties make hashes ideal for verification: you store or publish the hash, and later hash the candidate data to see if it matches. You never need to store the original secret alongside the hash.
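A file-integrity check follows exactly this pattern: someone publishes a SHA-256 digest, and you re-hash the candidate data and compare. The sketch below reads the data in chunks so large files never sit fully in memory; the function names and the 1 MiB chunk size are illustrative assumptions.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream chunk by chunk and return the hex digest."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def matches_published(stream: BinaryIO, published_hex: str) -> bool:
    """Re-hash the candidate data and compare it with the digest published earlier."""
    return sha256_stream(stream) == published_hex.strip().lower()


# In-memory stand-ins for a downloaded file; for a real file, pass open(path, "rb").
published = hashlib.sha256(b"release-1.0 contents").hexdigest()
print(matches_published(io.BytesIO(b"release-1.0 contents"), published))  # True
print(matches_published(io.BytesIO(b"release-1.0 tampered"), published))  # False
```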

Common hash algorithms

SHA-256 and the SHA-2 family

SHA-256 outputs 256 bits (32 bytes), usually displayed as 64 hexadecimal characters. It powers Bitcoin's proof-of-work (applied twice, as double SHA-256), TLS certificate fingerprints, Git's optional SHA-256 object format, and content-addressed storage. SHA-256 is fast on modern CPUs and GPUs, which is exactly why you must not use it alone for password storage: attackers can compute billions of SHA-256 hashes per second on commodity hardware.

SHA-3 (Keccak)

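SHA-3 uses a different internal design (a sponge construction) from SHA-2, which also makes it immune to the length-extension attacks that affect SHA-256. SHA3-256 has the same 32-byte output size but produces entirely different digests, so it is not a drop-in replacement wherever outputs must match existing systems; its main value is an independent security margin if SHA-2 were ever broken. Most web and Bitcoin-derived stacks still default to SHA-256 for compatibility. Ethereum uses Keccak-256, the pre-standardization variant of SHA-3, whose output differs from SHA3-256.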

BLAKE2 and BLAKE3

BLAKE2 and BLAKE3 are modern alternatives optimized for speed and security. BLAKE3 parallelizes well across CPU cores — useful for hashing large files or building Merkle trees quickly. Neither is standard for password storage.
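Python's hashlib exposes SHA-256, SHA3-256, and BLAKE2b directly, which makes it easy to confirm that these algorithms are not interchangeable: each produces a 32-byte digest here, but the values differ. BLAKE3 is not in the standard library and would need a third-party binding, so it is left out of this sketch.

```python
import hashlib

MESSAGE = b"The quick brown fox jumps over the lazy dog"

# Each constructor returns a hash object; hexdigest() renders the bytes as hex.
digests = {
    "SHA-256": hashlib.sha256(MESSAGE).hexdigest(),
    "SHA3-256": hashlib.sha3_256(MESSAGE).hexdigest(),
    # digest_size=32 selects a 256-bit BLAKE2b output. BLAKE2 bakes the length into
    # its parameters, so this is not the same as truncating the default 64-byte digest.
    "BLAKE2b-256": hashlib.blake2b(MESSAGE, digest_size=32).hexdigest(),
}

for name, hex_digest in digests.items():
    print(f"{name:<12} {hex_digest}")
```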

Legacy algorithms to avoid

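MD5 and SHA-1 are broken for security purposes: practical collision attacks exist. Do not use them for signatures, certificates, or anything an adversary might target. Migrating systems that still rely on SHA-1 (legacy APIs, old signing schemes) should be a priority. Git still defaults to SHA-1 object IDs, using a collision-detecting variant, and offers an opt-in SHA-256 object format.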

Password hashing: bcrypt, scrypt, and Argon2

Storing passwords requires a deliberately slow, ideally memory-hard hash: an algorithm expensive enough that offline brute-force attacks cost real time and RAM. Never store plaintext passwords. Never store reversible encryption of passwords unless you have a compelling key-management story (you probably do not).

Salt: defeating rainbow tables

A salt is a unique random value generated for each password and fed into the hash together with the password. Without salts, two users whose password is password123 get identical hashes, and attackers can precompute rainbow tables of common passwords once and crack every matching account instantly. With per-user salts, each hash is unique even when passwords repeat, so every account must be attacked separately. Salts are not secret; store them alongside the hash in your database.
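Python's standard library ships scrypt (hashlib.scrypt, available when Python is built against a modern OpenSSL) but not Argon2id or bcrypt, so this sketch uses scrypt to show per-user salting. The cost parameters are illustrative only, and the helper names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Illustrative scrypt costs: n=2**14, r=8, p=1 uses about 16 MiB of memory per hash.
# Benchmark and tune on your own hardware; these are not a recommendation.
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**14, 8, 1


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte hash from a password and its per-user salt."""
    return hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=32
    )


def new_user_record(password: str) -> tuple[bytes, bytes]:
    """Generate a fresh random salt and return (salt, hash) to store together."""
    salt = secrets.token_bytes(16)
    return salt, hash_password(password, salt)


def verify_password(password: str, salt: bytes, stored_hash: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    return hmac.compare_digest(hash_password(password, salt), stored_hash)


# Two users pick the same weak password; their stored hashes still differ.
alice_salt, alice_hash = new_user_record("password123")
bob_salt, bob_hash = new_user_record("password123")
print(alice_hash == bob_hash)  # False (with overwhelming probability)
print(verify_password("password123", alice_salt, alice_hash))  # True
print(verify_password("password124", alice_salt, alice_hash))  # False
```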

bcrypt

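bcrypt embeds the salt and a cost factor (work factor) in its output string. Increase the cost as hardware gets faster: cost 12 is a reasonable starting point in 2026, but benchmark on your own servers and target roughly 250–500 ms per hash. bcrypt only uses the first 72 bytes of input, and many implementations silently truncate anything longer. If you must support longer passphrases, pre-hash them (for example with SHA-256) and encode the digest as Base64 or hex before passing it to bcrypt, because raw digest bytes can contain NUL bytes that some bcrypt implementations treat as the end of the string.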
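The pre-hashing approach can be sketched with the standard library alone. The helper name bcrypt_input and the SHA-256-then-Base64 convention are assumptions for illustration; the 44-byte result would then go to a bcrypt library, for example bcrypt.hashpw(data, bcrypt.gensalt(rounds=12)) in the third-party Python bcrypt package. Whatever convention you pick, apply it to every password, because switching later invalidates existing hashes.

```python
import base64
import hashlib

BCRYPT_MAX_INPUT_BYTES = 72  # bcrypt ignores or rejects input beyond this length


def bcrypt_input(password: str) -> bytes:
    """Hypothetical pre-hash: SHA-256 the password, then Base64-encode the digest.

    Base64 keeps the result printable (no NUL bytes) and 44 bytes long,
    comfortably under bcrypt's 72-byte limit.
    """
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    return base64.b64encode(digest)


# Two long passphrases that share their first 72 bytes.
prefix = "correct horse battery staple " * 3  # 87 ASCII characters
phrase_a = prefix + "alpha"
phrase_b = prefix + "omega"

# Truncated to 72 bytes, the two phrases would be indistinguishable to bcrypt.
print(phrase_a.encode()[:BCRYPT_MAX_INPUT_BYTES] == phrase_b.encode()[:BCRYPT_MAX_INPUT_BYTES])  # True
print(bcrypt_input(phrase_a) == bcrypt_input(phrase_b))  # False
print(len(bcrypt_input(phrase_a)))  # 44
```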

Argon2

Argon2 won the Password Hashing Competition in 2015, and its Argon2id variant is the recommended choice for new systems. It exposes separate tuning knobs for time cost, memory cost, and parallelism, making GPU and ASIC attacks more expensive than bcrypt alone. Use Argon2id unless your language runtime lacks a vetted implementation (then bcrypt is still far better than SHA-256).

Pepper: an optional server secret

A pepper is a secret key stored outside the database (environment variable, HSM) and mixed into every password before hashing. If the database leaks but the pepper does not, offline cracking becomes harder. Peppers are not a substitute for proper slow hashing and per-user salts — they are defense in depth.
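One common way to mix in a pepper is to run the password through HMAC-SHA256 keyed with the pepper, then feed that result to the slow, salted hash. The sketch assumes this HMAC-then-scrypt layering, uses illustrative scrypt costs, and takes the pepper as a parameter; where it is loaded from (an environment variable, a secrets manager, an HSM) is up to your deployment.

```python
import hashlib
import hmac
import secrets


def peppered_hash(password: str, salt: bytes, pepper: bytes) -> bytes:
    """HMAC the password with the server-side pepper, then apply salted scrypt."""
    keyed = hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).digest()
    return hashlib.scrypt(keyed, salt=salt, n=2**14, r=8, p=1, dklen=32)


# In production the pepper lives outside the database; it is generated here only for the demo.
pepper = secrets.token_bytes(32)
salt = secrets.token_bytes(16)  # per-user, stored next to the hash

stored = peppered_hash("password123", salt, pepper)
print(hmac.compare_digest(stored, peppered_hash("password123", salt, pepper)))  # True
# An attacker holding only the database (salt and stored hash) lacks the pepper:
print(hmac.compare_digest(stored, peppered_hash("password123", salt, b"wrong guess")))  # False
```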

HMAC: hashing with a secret key

HMAC (Hash-based Message Authentication Code) combines a hash function with a secret key to produce a MAC — proof that a message was not tampered with and was created by someone who knows the key. HMAC-SHA256 secures webhook signatures, API request authentication, and cookie integrity tokens.

HMAC is not the same as simply hashing secret + message. The nested construction defined in RFC 2104 prevents the length-extension attacks that break naive hash(secret + message) schemes built on SHA-256 and other Merkle–Damgård hashes. When verifying HMACs, use a constant-time comparison function: early-exit string equality leaks timing information that helps attackers forge signatures byte by byte.
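A minimal webhook-style sketch with Python's hmac module: the sender computes HMAC-SHA256 over the raw request body, and the receiver recomputes it and compares with hmac.compare_digest, which is designed to avoid early-exit timing differences. How the signature travels (a header name, hex vs Base64) is an assumption here; real providers document their own format.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Return the hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_body(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Recompute the MAC and compare in constant time (as bytes, so non-ASCII input is safe)."""
    expected = sign_body(secret, body).encode("ascii")
    return hmac.compare_digest(expected, received_signature.encode("utf-8"))


secret = b"shared-webhook-secret"  # hypothetical; use a long random key in practice
body = b'{"event":"payment.succeeded","amount":1000}'
signature = sign_body(secret, body)

print(verify_body(secret, body, signature))  # True
print(verify_body(secret, body.replace(b"1000", b"9000"), signature))  # False: body tampered
```

Verify against the raw bytes you received; re-serializing parsed JSON can change whitespace or key order and break the MAC.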

For new designs needing authenticated encryption (confidentiality plus integrity), prefer AES-GCM or ChaCha20-Poly1305 over rolling your own encrypt-then-HMAC scheme — but HMAC remains the right tool for signing payloads you do not need to encrypt.

Merkle trees and blockchain integrity

A Merkle tree hashes leaf nodes (transactions, file chunks), then hashes pairs of child hashes upward until a single Merkle root summarizes the entire dataset. Change any leaf and the root changes — a compact integrity proof. Bitcoin blocks include a Merkle root of transactions; Ethereum uses similar structures in its state trie.

Merkle proofs let light clients verify a single transaction is included in a block without downloading every transaction — they need only a logarithmic path of sibling hashes from leaf to root. The same pattern appears in Git (commit trees), IPFS (content addressing), and certificate transparency logs.
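The sketch below builds a Merkle root over SHA-256, then produces and checks an inclusion proof. Two choices are assumptions for illustration rather than any specific system's format: leaves and internal nodes get different one-byte prefixes (0x00 and 0x01, the domain-separation idea used by Certificate Transparency in RFC 6962), and an unpaired node at the end of a level is carried up unchanged. Bitcoin instead uses double SHA-256 and duplicates the last node, so this code will not reproduce Bitcoin Merkle roots.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def next_level(level: list[bytes]) -> list[bytes]:
    """Hash adjacent pairs; an unpaired last node moves up unchanged."""
    parents = [node_hash(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
    if len(level) % 2 == 1:
        parents.append(level[-1])
    return parents


def merkle_root(leaves: list[bytes]) -> bytes:
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [leaf_hash(x) for x in leaves]
    while len(level) > 1:
        level = next_level(level)
    return level[0]


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True when the sibling sits on the left."""
    level = [leaf_hash(x) for x in leaves]
    proof = []
    while len(level) > 1:
        sibling = index ^ 1
        if sibling < len(level):  # no sibling means this node is carried up unchanged
            proof.append((level[sibling], sibling < index))
        level = next_level(level)
        index //= 2
    return proof


def verify_proof(leaf_data: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    current = leaf_hash(leaf_data)
    for sibling, sibling_is_left in proof:
        current = node_hash(sibling, current) if sibling_is_left else node_hash(current, sibling)
    return current == root


transactions = [f"tx{i}".encode() for i in range(5)]
root = merkle_root(transactions)
proof = merkle_proof(transactions, 2)
print(len(proof))  # 3 sibling hashes for 5 leaves
print(verify_proof(b"tx2", proof, root))  # True
print(verify_proof(b"tx2-forged", proof, root))  # False
```

The proof grows with the logarithm of the leaf count, which is why a light client can check inclusion without the full block.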

Solana does not use proof-of-work mining, and transactions are authorized with Ed25519 signatures. Hash functions still matter there: program-derived addresses are computed with SHA-256, and commit-reveal fairness proofs in Solana-based games rely on a server publishing hash(server_seed) before revealing the seed.

Hashes in web security and APIs

Content Security Policy nonces

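CSP can allow inline scripts only when they carry a nonce matching a per-request random value. The nonce must come from a cryptographically secure random source and be fresh on every response. Hashing enters CSP through hash sources: instead of a nonce, the policy can list a value such as 'sha256-...' computed over the exact text of an allowed inline script.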
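A sketch of both CSP mechanisms in Python: a per-response nonce from the secrets module, and a 'sha256-...' hash source computed over the exact inline script text (whitespace included), Base64-encoded. The policy string is a minimal illustration; in practice the nonce goes both into the Content-Security-Policy header and into the script tag's nonce attribute.

```python
import base64
import hashlib
import secrets


def new_csp_nonce() -> str:
    """16 random bytes from the OS CSPRNG, Base64-encoded; generate one per response."""
    return base64.b64encode(secrets.token_bytes(16)).decode("ascii")


def csp_hash_source(inline_script: str) -> str:
    """Build a CSP hash source over the exact script text between the script tags."""
    digest = hashlib.sha256(inline_script.encode("utf-8")).digest()
    return "'sha256-" + base64.b64encode(digest).decode("ascii") + "'"


nonce = new_csp_nonce()
inline_script = "console.log('hello');"
policy = f"script-src 'nonce-{nonce}' {csp_hash_source(inline_script)}"
print(policy)
```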

JWT and signing

JWTs are signed, not encrypted. HS256 signs the header and payload with HMAC-SHA256; RS256 uses RSA signatures over a SHA-256 hash of the same signing input. The security of the token depends on the signing algorithm and key strength. Pin the expected algorithm on the server: never accept alg: none, and never let the token's own header switch or downgrade the algorithm (for example, from RS256 to HS256).
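To show where HMAC-SHA256 sits inside HS256, here is a stdlib-only sketch of signing and verifying with the algorithm pinned. The function names are hypothetical, and production code should use a maintained JWT library that also checks claims such as exp.

```python
import base64
import hashlib
import hmac
import json
import secrets


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(payload: dict, key: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        b64url_encode(json.dumps(obj, separators=(",", ":")).encode("utf-8"))
        for obj in (header, payload)
    ]
    signing_input = ".".join(parts).encode("ascii")
    signature = hmac.new(key, signing_input, hashlib.sha256).digest()
    return ".".join(parts + [b64url_encode(signature)])


def verify_hs256(token: str, key: bytes) -> dict:
    """Return the payload, or raise ValueError if the token is malformed or forged."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    # Pin the algorithm: reject "none" or anything else the token header claims.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(payload_b64))


key = secrets.token_bytes(32)  # hypothetical signing key
token = sign_hs256({"sub": "user-42"}, key)
print(verify_hs256(token, key))  # {'sub': 'user-42'}
```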

Subresource Integrity (SRI)

Browsers can verify CDN-hosted scripts and stylesheets using integrity="sha384-..." attributes. If the downloaded file's hash does not match the value in the HTML, the browser refuses to use it, so a compromised CDN cannot silently inject malicious code.
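An SRI value is the algorithm name, a hyphen, and the Base64 (not hex) encoding of the raw digest of the exact bytes served. A minimal sketch; the CDN URL and script contents are placeholders, and the crossorigin attribute is included because cross-origin SRI checks require a CORS-enabled request.

```python
import base64
import hashlib


def sri_sha384(content: bytes) -> str:
    """Return an SRI integrity value: 'sha384-' plus the Base64 of the raw digest."""
    return "sha384-" + base64.b64encode(hashlib.sha384(content).digest()).decode("ascii")


script_bytes = b"console.log('vendor library');"  # stand-in for the exact file the CDN serves
tag = (
    '<script src="https://cdn.example.com/lib.js" '
    f'integrity="{sri_sha384(script_bytes)}" crossorigin="anonymous"></script>'
)
print(tag)
```

On the command line, openssl dgst -sha384 -binary lib.js | openssl base64 -A produces the same Base64 digest.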

API idempotency keys

Hashing request payloads to derive idempotency keys is common in payment APIs. Use a canonical serialization (sorted JSON keys) before hashing so equivalent requests produce identical keys.
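A minimal sketch of a payload-derived idempotency key, assuming canonicalization means sorted keys and compact separators via json.dumps. This does not normalize numbers (1000 and 1000.0 serialize differently) or Unicode; RFC 8785 (JSON Canonicalization Scheme) defines a stricter form if you need one. Scoping the key to the calling account is also worth considering so identical requests from different customers do not collide.

```python
import hashlib
import json


def idempotency_key(payload: dict) -> str:
    """SHA-256 over a canonical JSON form: sorted keys, no insignificant whitespace."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = idempotency_key({"amount": 1000, "currency": "usd", "customer": "cus_123"})
retry = idempotency_key({"customer": "cus_123", "currency": "usd", "amount": 1000})
changed = idempotency_key({"amount": 1001, "currency": "usd", "customer": "cus_123"})
print(first == retry)  # True: same request, different key order
print(first == changed)  # False
```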

Provably fair commit-reveal

In commit-reveal fairness schemes, the server generates a secret seed, publishes commitment = hash(seed) before the player acts, then reveals the seed after the outcome. The player verifies hash(revealed_seed) == commitment to confirm the server did not change the seed after seeing the bet. The hash function must be collision-resistant: if the server could find two seeds with the same hash, it could commit to one and reveal another. The seed must also be long and random, so nobody can brute-force it from the commitment before the reveal.

Use SHA-256 or SHA-3 for commitments. Derive each outcome from both the server seed and the player's client seed (typically together with a per-bet nonce), so the server cannot precompute favorable outcomes before the player chooses. Document the exact concatenation order and encoding (hex vs Base64) so third parties can reproduce verification independently.
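A sketch of the commit, play, and reveal steps under stated assumptions: the server seed is a 64-character hex string, the commitment is SHA-256 over that string's UTF-8 bytes, and each outcome is HMAC-SHA256 keyed with the server seed over "client_seed:nonce", with the first 52 bits scaled into [0, 1). That outcome rule is a hypothetical convention chosen for illustration; the point is that whatever rule you use must be written down exactly.

```python
import hashlib
import hmac
import secrets


def commit(server_seed: str) -> str:
    """Commitment published before play: SHA-256 of the seed's hex text."""
    return hashlib.sha256(server_seed.encode("utf-8")).hexdigest()


def outcome(server_seed: str, client_seed: str, nonce: int) -> float:
    """Hypothetical rule: HMAC-SHA256(server_seed, "client_seed:nonce"), first 52 bits in [0, 1)."""
    message = f"{client_seed}:{nonce}".encode("utf-8")
    digest = hmac.new(server_seed.encode("utf-8"), message, hashlib.sha256).hexdigest()
    return int(digest[:13], 16) / 16**13  # 13 hex digits = 52 bits, exact as a float


def verify_reveal(commitment: str, revealed_seed: str) -> bool:
    """Anyone can recheck that the revealed seed matches the earlier commitment."""
    return commit(revealed_seed) == commitment.lower()


# 1. Server picks a high-entropy seed and publishes only the commitment.
server_seed = secrets.token_hex(32)
commitment = commit(server_seed)
# 2. Player supplies a client seed after seeing the commitment; bets are numbered by nonce.
client_seed = "player-chosen-string"
roll = outcome(server_seed, client_seed, nonce=0)
# 3. Server reveals the seed; the player rechecks the commitment and recomputes the roll.
print(verify_reveal(commitment, server_seed))  # True
print(roll == outcome(server_seed, client_seed, nonce=0))  # True
```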

Common mistakes

SHA-256 for passwords. Too fast — attackers brute-force billions of guesses per second. Use Argon2id or bcrypt.
No per-user salt. Identical hashes for identical passwords enable rainbow-table attacks across your entire user base.
MD5 or SHA-1 for security. Collision attacks are practical; migrate legacy uses immediately.
Timing leaks in HMAC verification. Use a constant-time comparison; never compare attacker-supplied MACs with ordinary string equality (== or ===).
Hashing for secrecy. Hashes do not encrypt. Publishing hash(password) still lets attackers offline-crack weak passwords.
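Assuming hash uniqueness. Collisions must exist in theory, since infinitely many inputs map to a finite set of outputs, but finding one for full-length SHA-256 is infeasible. The risk becomes real when you truncate hashes or use short non-cryptographic hashes as keys: by the birthday bound, an n-bit hash expects its first collision after roughly 2^(n/2) items, so check that bound at your scale before using hashes as primary database keys.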
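The birthday bound can be checked with the standard approximation p ≈ 1 - exp(-k(k - 1) / 2N), where k is the number of items and N = 2^n is the number of possible n-bit hash values. The helper below is an illustrative calculator, not a library function; math.expm1 keeps the result accurate when p is tiny.

```python
import math


def collision_probability(num_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_items random hash values collide."""
    exponent = num_items * (num_items - 1) / (2.0 * 2.0**hash_bits)
    return -math.expm1(-exponent)  # equals 1 - exp(-exponent), precise for small values


for bits in (32, 64, 128, 256):
    p = collision_probability(1_000_000_000, bits)
    print(f"{bits:>3}-bit hash, one billion items: p = {p:.3g}")
```

A 32-bit key is essentially guaranteed to collide at that scale, a 64-bit key has a few-percent chance, and full SHA-256 stays negligible.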

Key takeaways

Cryptographic hashes are one-way fingerprints — deterministic, fast (for integrity), and collision-resistant.
SHA-256 suits integrity checks, blockchains, and commitments; Argon2id/bcrypt suit passwords with per-user salts.
HMAC authenticates messages with a secret key — use constant-time verification.
Merkle trees compress integrity proofs for large datasets — foundational to Bitcoin and content-addressed systems.
Match the algorithm to the threat model: speed for file checksums, slowness for password defense, signatures for tamper-evident tokens.