Explainer · 7 June 2026

How checksums and error detection codes work

Digital storage and transmission are not perfect. Cosmic rays flip bits in RAM, capacitors leak charge on SSDs, and noisy links garble packets. Before you trust a file, a database page, or a network frame, something has to answer a cheap question: did this blob of bits arrive exactly as sent? Error detection codes — parity bits, additive checksums, cyclic redundancy checks (CRC), and Hamming error-correcting codes (ECC) — are the first line of defense. They are fast, small, and everywhere. They are also widely misunderstood: a CRC detects accidental corruption; it does not prove authenticity. This explainer walks through how each family works, where you meet them in production systems, and what they cannot do.

The problem: silent corruption

Computers represent data as sequences of bits. Any step — copying memory, writing a disk block, transmitting over Wi-Fi — can flip a 0 to a 1 or drop a byte. Without a guard, corruption propagates silently: a bad TCP segment becomes a bad HTTP response; a flipped inode pointer destroys a filesystem; a wrong floating-point coefficient corrupts a scientific result. The goal of error detection is to make corruption audible: attach a compact redundancy tag computed from the payload. On read, recompute the tag; mismatch means reject and retry (or repair, if you have ECC).

Error detection trades overhead (extra bits and CPU) against coverage (probability of catching errors). A single parity bit catches any odd number of bit flips in a word but misses even numbers. A 32-bit CRC catches every burst error of 32 bits or fewer, and longer or scattered corruption slips past it with a probability of only about 2⁻³². Hamming codes go further: they can correct a single-bit error, and with one extra overall parity bit they also detect double-bit errors. Choosing the right code depends on error model, latency budget, and whether hardware already provides ECC.

Parity: the simplest redundancy

Even parity adds one bit so the total number of 1-bits in a codeword is even. Odd parity does the opposite. RAID-5 stripes parity across disks so one failed drive can be rebuilt from the others — parity here is XOR across sectors, which is equivalent to even parity over the stripe. DIMM ECC memory extends the idea: extra bits per 64-bit word let the memory controller detect (and on server-grade ECC, correct) single-bit errors in the chip.
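The RAID-style XOR parity described above can be sketched in a few lines. This is a toy illustration, not how any particular RAID implementation lays out stripes; the block contents and the `xor_blocks` helper are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks in one stripe, plus their parity block.
data = [b"disk0-aa", b"disk1-bb", b"disk2-cc"]
parity = xor_blocks(data)

# Simulate losing disk 1: XOR the surviving blocks with the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])  # the lost block is recovered
```

XOR-ing every block in the stripe, parity included, gives all zeros, which is the "even parity over the stripe" property in byte form.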

Parity is O(1) to update but weak alone. Two flipped bits in the same word leave parity unchanged. That is why production systems layer stronger codes on top: disk sectors carry CRCs in addition to ECC, network frames end with a 32-bit frame check sequence, and filesystems like ZFS store full-width checksums per block.
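A minimal sketch of that weakness, assuming an 8-bit word and even parity (the function names are illustrative):

```python
def even_parity_bit(word: int) -> int:
    """Return the bit that makes the total count of 1-bits even."""
    return bin(word).count("1") % 2

def parity_ok(word: int, parity: int) -> bool:
    """True when the stored parity bit still matches the word."""
    return even_parity_bit(word) == parity

word = 0b1011_0010               # four 1-bits, so the parity bit is 0
p = even_parity_bit(word)

one_flip = word ^ 0b0000_0100    # single-bit error
two_flips = word ^ 0b0001_0100   # two-bit error in the same word

print(parity_ok(one_flip, p))    # False: the error is detected
print(parity_ok(two_flips, p))   # True: the error goes unnoticed
```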

Additive checksums: fast but collision-prone

The classic Internet checksum, used over the IPv4 header and (over a pseudo-header plus payload) in UDP and TCP, treats the covered bytes as a sequence of 16-bit words, sums them with carry wraparound into a 16-bit accumulator, and stores the one's complement of that sum. Verification sums every word including the checksum field; a valid packet yields 0xFFFF, which is one of the two representations of zero in one's-complement arithmetic, so complementing the result gives 0.
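The procedure can be sketched as follows. This is a plain-Python illustration in the spirit of RFC 1071, not an optimized implementation; padding odd-length input with a zero byte is the usual convention, and the sample bytes are the worked example from that RFC.

```python
def internet_checksum(data: bytes) -> int:
    """One's-complement of the one's-complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

payload = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(payload)
print(hex(checksum))                         # 0x220d

# Receiver: sum everything including the checksum field, then complement.
received = payload + checksum.to_bytes(2, "big")
print(internet_checksum(received))           # 0 for an intact packet
```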

Additive checksums are cheap in software (a tight loop with few branches), which mattered when routers were slow. They are also weak: because addition is commutative, any reordering of 16-bit words produces the same sum, so whole classes of error patterns slip through. IPv4 kept the checksum for historical reasons; IPv6 dropped the header checksum entirely, relying on link-layer CRC and transport-layer integrity (TCP checksum or application-level checks). When you need stronger detection at line rate, CRC wins.
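A small demonstration of the reordering blind spot, comparing a hand-rolled Internet checksum with `zlib.crc32` from the Python standard library. The sample bytes are arbitrary.

```python
import zlib

def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum (even-length input assumed here)."""
    total = sum((data[i] << 8) | data[i + 1] for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

original = bytes.fromhex("0001f203f4f5f6f7")
swapped = bytes.fromhex("f4f5f2030001f6f7")   # first and third words exchanged

same_sum = internet_checksum(original) == internet_checksum(swapped)
same_crc = zlib.crc32(original) == zlib.crc32(swapped)

print(same_sum)   # True: the additive checksum cannot see the swap
print(same_crc)   # False: the CRC changes
```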

CRC: polynomial division over GF(2)

A cyclic redundancy check treats the bitstream as a polynomial with coefficients in GF(2) (binary field: addition is XOR). Fix a generator polynomial G(x) of degree k (for CRC-32, degree 32). Append k zero bits to the message polynomial M(x), divide by G(x), and take the remainder as the CRC tag. On receive, divide message-plus-CRC by the same polynomial; remainder zero means no detectable error under the CRC's error model. Variants that preset the register or invert the tag before sending, CRC-32 among them, leave a fixed nonzero constant instead of zero.
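The division can be written out bit by bit. The sketch below uses the textbook form only (zero initial value, no reflection, no final XOR) with the degree-8 polynomial x⁸ + x² + x + 1 as an arbitrary example; `mod_poly`, `crc_tag`, and `check` are illustrative names, and production CRCs usually add the presets and inversions noted above.

```python
def mod_poly(stream: bytes, poly: int, k: int) -> int:
    """Remainder of the bitstream (MSB first) divided by poly over GF(2).

    poly includes the leading x^k term, e.g. 0x107 for x^8 + x^2 + x + 1.
    """
    reg = 0
    for byte in stream:
        for shift in range(7, -1, -1):
            reg = (reg << 1) | ((byte >> shift) & 1)   # bring down the next bit
            if reg >> k:                               # degree reached k
                reg ^= poly                            # subtract (XOR) the divisor
    return reg

def crc_tag(message: bytes, poly: int = 0x107, k: int = 8) -> int:
    """Append k zero bits (k must be a multiple of 8 here) and take the remainder."""
    return mod_poly(message + bytes(k // 8), poly, k)

def check(frame: bytes, poly: int = 0x107, k: int = 8) -> bool:
    """A frame of message plus tag divides evenly when it is intact."""
    return mod_poly(frame, poly, k) == 0

message = b"hello"
frame = message + bytes([crc_tag(message)])

damaged = bytearray(frame)
damaged[1] ^= 0x10                                     # flip one bit in transit

print(check(frame), check(bytes(damaged)))             # True False
```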

Hardware implements this efficiently with a linear feedback shift register (LFSR): each incoming bit shifts through a register, XORing taps where the polynomial has a 1 coefficient. Software uses lookup tables or carry-less multiply instructions (e.g. Intel PCLMULQDQ) for multi-byte chunks. Common polynomials include:

  • CRC-32 (IEEE 802.3) — Ethernet frame check sequence (FCS), PNG chunks, ZIP archives, many file formats. Polynomial 0x04C11DB7 with reflected input/output.
  • CRC-32C (Castagnoli) — iSCSI, SCTP, ext4 and Btrfs metadata checksums; better Hamming distance for certain lengths than classic CRC-32.
  • CRC-16-CCITT — X.25, HDLC, Bluetooth payloads.
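The lookup-table approach mentioned above can be sketched for CRC-32 as follows. It assumes the common reflected formulation: 0xEDB88320 is 0x04C11DB7 with its bits reversed, the register starts at 0xFFFFFFFF, and the result is inverted at the end. The output is compared against `zlib.crc32` as a cross-check.

```python
import zlib

def make_table(poly_reflected=0xEDB88320):
    """Precompute the CRC of every possible byte value, one bit at a time."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly_reflected if crc & 1 else crc >> 1
        table.append(crc)
    return table

_TABLE = make_table()

def crc32(data: bytes) -> int:
    """Byte-at-a-time CRC-32 using the precomputed table."""
    crc = 0xFFFFFFFF                          # preset register
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF                   # final inversion

print(hex(crc32(b"123456789")))               # 0xcbf43926
print(crc32(b"123456789") == zlib.crc32(b"123456789"))
```

The table trades 1 KiB of memory for eight fewer shift-and-XOR steps per byte; wider tables and carry-less multiply push the same idea further.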

CRCs excel at burst errors — consecutive bit flips from timing glitches or magnetic dropouts — because remainder arithmetic spreads influence across the tag. They do not resist an adversary who can choose message bits to collide with a target CRC; for that you need a MAC or cryptographic hash with a secret key.
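The burst guarantee can be checked empirically. The sketch below flips random error patterns no wider than 32 bits at random offsets and counts how many `zlib.crc32` fails to notice. Bits are numbered least-significant-first within each byte, which is the order the reflected CRC-32 processes them in; the function name, trial count, and seed are arbitrary choices for the example.

```python
import random
import zlib

def count_missed_bursts(payload: bytes, trials: int = 2000, seed: int = 1) -> int:
    """Count bursts of at most 32 bits that leave the CRC-32 unchanged."""
    rng = random.Random(seed)
    good = zlib.crc32(payload)
    nbits = len(payload) * 8
    as_int = int.from_bytes(payload, "little")     # bit i = byte i // 8, bit i % 8
    missed = 0
    for _ in range(trials):
        pattern = rng.randrange(1, 1 << 32)        # nonzero, at most 32 bits wide
        offset = rng.randrange(0, nbits - 31)      # keep the burst inside the payload
        damaged = (as_int ^ (pattern << offset)).to_bytes(len(payload), "little")
        if zlib.crc32(damaged) == good:
            missed += 1
    return missed

print(count_missed_bursts(bytes(range(64))))       # 0
```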

Hamming codes: detect and correct

Hamming codes, named for Richard Hamming, place parity bits at power-of-two positions (1, 2, 4, 8, …) so each parity bit covers a distinct subset of positions. When you read the word, recompute the parity checks to form a syndrome: zero means no error was seen, and any other value is the position of the flipped bit. A (7,4) Hamming code encodes 4 data bits into 7 bits and corrects any single-bit error. Extended Hamming adds an overall parity bit for double-error detection.
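A compact sketch of the (7,4) code, assuming the conventional layout with parity at positions 1, 2, and 4 and data at positions 3, 5, 6, and 7. The syndrome is computed as the XOR of the positions of all set bits, which is equivalent to recomputing the three parity checks.

```python
def hamming74_encode(nibble: int) -> list:
    """Encode 4 data bits into a 7-bit codeword (list index 0 is position 1)."""
    code = [0] * 8                                   # index 0 unused
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]            # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]            # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]            # covers positions 4, 5, 6, 7
    return code[1:]

def hamming74_decode(bits):
    """Return (nibble, syndrome); a nonzero syndrome is the corrected position."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    if syndrome:
        code[syndrome] ^= 1                          # repair the single-bit error
    data = (code[3], code[5], code[6], code[7])
    return sum(bit << i for i, bit in enumerate(data)), syndrome

word = hamming74_encode(0b1011)
word[4] ^= 1                                         # corrupt position 5
print(hamming74_decode(word))                        # (11, 5)
```

Two flipped bits produce a nonzero syndrome that points at the wrong position, which is why SECDED variants add the overall parity bit.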

Server DRAM uses SECDED (single error correct, double error detect) variants — often Hamming-based with extra bits for 64-bit words. SSD controllers apply stronger BCH or LDPC codes over flash pages where raw bit error rates rise with wear. The theme is the same: spend redundancy bits proportional to expected error rate so reads stay reliable without retransmitting from a remote host.

Where you meet these codes in production

Ethernet appends a 32-bit FCS (CRC-32) to every frame; the NIC validates before delivering to the kernel. Bad frames are dropped; TCP never sees them. TCP and UDP still carry a checksum (pseudo-header plus payload) so end hosts can catch corruption introduced between link-layer checks, for example by a buggy driver or on paths where no hardware check applies.

Filesystems store checksums with data. ZFS uses Fletcher-4 or SHA-256 per block (the checksum algorithm is a dataset property) and verifies on every read, catching silent corruption that ECC missed. ext4's optional metadata checksums use CRC-32C, and Btrfs checksums both data and metadata, with CRC-32C as the default. Content-addressable systems (git objects, IPFS CIDs, Merkle trees) go further: the identifier is a cryptographic hash, giving detection plus identity, but at higher CPU cost than a line-rate CRC.
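The content-addressable idea can be illustrated with a toy in-memory store. `BlockStore` and its methods are hypothetical names for this sketch, not the API of git, IPFS, or any filesystem; only `hashlib.sha256` is a real library call.

```python
import hashlib

class BlockStore:
    """Toy store that keys each block by its SHA-256 and verifies on every read."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blocks[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blocks[key]
        if hashlib.sha256(data).hexdigest() != key:
            raise IOError("checksum mismatch: block is corrupt")
        return data

store = BlockStore()
key = store.put(b"important record")
print(store.get(key))                       # b'important record'

# Simulate silent corruption of the stored bytes.
store._blocks[key] = b"important recorD"
try:
    store.get(key)
    outcome = "returned bad data"
except IOError:
    outcome = "rejected"
print(outcome)                              # rejected
```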

Compression formats carry integrity trailers: gzip ends with a CRC-32 and zlib with an Adler-32, so truncated or corrupted streams fail loudly instead of returning garbage bytes. Pairing compression with weak detection is dangerous: a single undetected bit flip can propagate through the rest of a DEFLATE stream; always verify after decompressing.
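The trailers are easy to inspect with the standard library. In this sketch the zlib stream's last four bytes are read as a big-endian Adler-32 and the gzip stream's last eight bytes as a little-endian CRC-32 followed by the uncompressed length modulo 2³², as laid out in RFC 1950 and RFC 1952. The payload and the `try_decompress` helper are made up for the example.

```python
import gzip
import struct
import zlib

payload = b"error detection " * 64

z = zlib.compress(payload)
zlib_trailer = struct.unpack(">I", z[-4:])[0]          # Adler-32, big-endian

g = gzip.compress(payload)
gzip_crc, gzip_size = struct.unpack("<II", g[-8:])     # CRC-32, then length

def try_decompress(stream: bytes) -> str:
    """Report whether zlib accepts or rejects a stream."""
    try:
        zlib.decompress(stream)
        return "ok"
    except zlib.error:
        return "rejected"

bad_trailer = z[:-1] + bytes([z[-1] ^ 0x01])           # one flipped trailer bit
truncated = z[:-5]                                     # stream cut short

print(zlib_trailer == zlib.adler32(payload), gzip_crc == zlib.crc32(payload))
print(try_decompress(z), try_decompress(bad_trailer), try_decompress(truncated))
```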

Checksums are not hashes (and not MACs)

A cryptographic hash (SHA-256, BLAKE3) is designed for collision resistance against adversaries: finding two messages with the same hash should be computationally infeasible. A CRC is designed for random and burst errors on the wire — collisions are expected and easy to craft intentionally. Do not use CRC-32 as a password fingerprint or deduplication key in an untrusted environment.

A message authentication code (HMAC-SHA256, Poly1305) requires a secret key and proves both integrity and origin to someone who knows the key. TLS record protection uses AEAD ciphers (e.g. AES-GCM, ChaCha20-Poly1305) that combine encryption with authentication — stronger than any standalone checksum. Use the right tool: CRC for accidental corruption at line rate, hashes for content identity, MACs/AEAD for authenticated channels.
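The difference shows up as soon as someone tampers on purpose. In the sketch below, an attacker without the key replaces the message: recomputing the CRC is trivial, while reusing an old HMAC tag fails verification. The key, messages, and framing (tag appended to the message) are invented for the illustration.

```python
import hashlib
import hmac
import zlib

KEY = b"demo-key-not-for-production"         # hypothetical shared secret

def crc_frame(msg: bytes) -> bytes:
    return msg + zlib.crc32(msg).to_bytes(4, "big")

def crc_valid(frame: bytes) -> bool:
    return zlib.crc32(frame[:-4]) == int.from_bytes(frame[-4:], "big")

def mac_frame(msg: bytes) -> bytes:
    return msg + hmac.new(KEY, msg, hashlib.sha256).digest()

def mac_valid(frame: bytes) -> bool:
    msg, tag = frame[:-32], frame[-32:]
    expected = hmac.new(KEY, msg, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)   # constant-time comparison

genuine = mac_frame(b"pay alice 10")

# The attacker does not know KEY.
forged_msg = b"pay mallory 9999"
forged_crc = crc_frame(forged_msg)              # anyone can recompute a CRC
forged_mac = forged_msg + genuine[-32:]         # best effort: reuse an old tag

print(crc_valid(forged_crc))    # True: the CRC accepts the forgery
print(mac_valid(forged_mac))    # False: the MAC rejects it
print(mac_valid(genuine))       # True
```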

Common pitfalls

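  • Checksum offloading surprises — NICs verify the Ethernet FCS in hardware and usually strip it, so packet captures typically contain no FCS at all. With TCP checksum offload, Wireshark on the sending host often flags outgoing segments as having a bad checksum because the NIC fills in the field after the capture point. Offload can also hide errors from userspace unless you validate end-to-end.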
  • Endianness and polynomial reflection — CRC-32 implementations differ on bit order (reflected vs non-reflected). Mixing libraries produces valid-looking but incompatible tags.
  • Assuming detection means repair — detecting corruption without replicas or ECC leaves you with only "read failed." ZFS checksums plus mirrors enable self-healing; a lone ext4 volume without checksums may return bad data silently on consumer hardware.
  • Truncated streams — a trailer can only vouch for a stream that arrives complete, so always check length and magic bytes, not just the trailer. Adler-32 is also weaker than CRC-32 for some error classes, notably short inputs.
  • Using CRC for security — an attacker can flip payload bits and recompute CRC in linear time. Never treat checksum match as proof of trust.
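The parameter-mismatch pitfall above is easy to reproduce. The sketch uses `binascii.crc_hqx`, which applies the CCITT polynomial 0x1021 without reflection, and changes only the initial register value; it then shows that a correct CRC-32 still yields different bytes on the wire depending on serialization order. The variable names are illustrative.

```python
import binascii
import zlib

CHECK = b"123456789"

# Same polynomial, two common initial values.
xmodem = binascii.crc_hqx(CHECK, 0x0000)
ccitt_false = binascii.crc_hqx(CHECK, 0xFFFF)
print(hex(xmodem), hex(ccitt_false))         # 0x31c3 0x29b1

# Same CRC-32 value, two byte orders on the wire.
crc = zlib.crc32(CHECK)
as_big = crc.to_bytes(4, "big")
as_little = crc.to_bytes(4, "little")
print(as_big.hex(), as_little.hex())         # cbf43926 2639f4cb
```

Both 16-bit results are "valid" CRCs of the same input, which is why agreeing on polynomial, initial value, reflection, final XOR, and byte order matters before two implementations exchange tags.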

Practical checklist

  • Match the code to the error model: burst noise favors CRC; memory favors ECC.
  • Verify after decompress and after network reassembly, not only at one layer.
  • Enable filesystem checksums (ZFS, Btrfs, ext4 metadata) on data you cannot lose.
  • Use cryptographic hashes or Merkle trees when content identity must be tamper-evident.
  • Test CRC implementations against known vectors (e.g. "123456789" → CRC-32 0xCBF43926) before interoperating with hardware.
  • Monitor ecc_errors / machine check logs on servers — rising correctable counts predict DIMM failure.
