Explainer · 7 June 2026
How IEEE 754 floating point works
Open a JavaScript console and type 0.1 + 0.2. The result is not
0.3 — it is 0.30000000000000004. That is not a bug in
your browser. It is the predictable consequence of how almost every modern CPU
stores real numbers: IEEE 754 binary floating point. A
float64 (JavaScript's Number type, Rust's f64,
Python's float) is not a decimal fraction. It is a compact encoding
of a binary fraction with a sign bit, a biased exponent, and a mantissa.
Most decimal literals you write cannot be represented exactly in that format, so
arithmetic happens on nearby approximations and rounding error accumulates. This
explainer walks through the bit layout, special values, precision limits, and the
engineering habits that keep money, science, and blockchains from silently
drifting.
Scientific notation in binary
Humans write large and small numbers in scientific notation:
6.022 × 10²³. IEEE 754 does the same in base 2:
1.10011… × 2^exponent. A normalized
binary64 value packs three fields into 64 bits:
- Sign (1 bit) — zero for positive, one for negative. Surprisingly, +0.0 and -0.0 are distinct bit patterns but compare equal in most languages.
- Exponent (11 bits) — stored with a bias of 1023 so unsigned comparison works. The true exponent ranges from −1022 to +1023 for normal numbers.
- Mantissa / significand (52 bits) — the fractional part after the implicit leading 1 (except for subnormals). Together with the implicit bit, you get 53 bits of precision, about 15–17 significant decimal digits.
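To make the three fields concrete, here is a minimal sketch that pulls them out of a JavaScript number with a DataView. The helper name decodeFloat64 and its returned field names are invented for this illustration; they are not part of any standard API.

```javascript
// Hypothetical helper: split a binary64 value into its three bit fields.
function decodeFloat64(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x);             // DataView reads and writes big-endian by default
  const bits = view.getBigUint64(0); // the same 64 bits, viewed as an unsigned integer

  const sign = Number(bits >> 63n);                      // top bit
  const biasedExponent = Number((bits >> 52n) & 0x7ffn); // next 11 bits
  const fraction = bits & ((1n << 52n) - 1n);            // low 52 bits

  // The unbiased exponent is only meaningful for normal numbers.
  return { sign, biasedExponent, unbiasedExponent: biasedExponent - 1023, fraction };
}

console.log(decodeFloat64(1));   // sign 0, biasedExponent 1023, fraction 0n
console.log(decodeFloat64(-2));  // sign 1, biasedExponent 1024, fraction 0n
console.log(decodeFloat64(-0).sign, decodeFloat64(0).sign); // 1 0
console.log(-0 === 0);           // true, despite the different sign bits
```

The fraction is kept as a BigInt because 52 bits plus shifting is easier to reason about there than with 32-bit bitwise operators on Numbers.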
Binary32 (float in C, f32 in Rust)
uses 1 + 8 + 23 bits — only ~7 decimal digits of precision. GPUs, ML tensors,
and game physics often trade accuracy for bandwidth with half-precision
(float16) or bfloat16, which is even coarser.
The encoding is clever: multiply the significand (the implicit leading 1 plus the stored fraction) by two raised to the unbiased exponent, apply the sign, and you recover the value. Floating-point hardware operates on these fields directly. The catch is that representable values grow sparse at large magnitudes: the gap between adjacent floats grows with the exponent, which is why adding a tiny epsilon to a billion can round back to a billion unchanged.
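The growing gap can be measured directly. The sketch below assumes a positive, finite input below Number.MAX_VALUE and uses a made-up helper, gapAbove, that bumps the bit pattern by one to reach the next representable float.

```javascript
// Hypothetical helper: distance from a positive finite x to the next float above it.
function gapAbove(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x);
  // For positive finite values, the next bit pattern is the next larger float.
  view.setBigUint64(0, view.getBigUint64(0) + 1n);
  return view.getFloat64(0) - x;
}

console.log(gapAbove(1) === Number.EPSILON); // true: the gap at 1 is 2^-52
console.log(gapAbove(1e9) === 2 ** -23);     // true: about 1.19e-7
console.log(gapAbove(2 ** 53));              // 2: odd integers no longer fit

console.log(1e9 + 1e-8 === 1e9); // true: 1e-8 is less than half the gap at 1e9
console.log(1 + 1e-8 === 1);     // false: the gap near 1 is far smaller
```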
Why 0.1 is not exact
In decimal, 0.1 = 1/10 terminates cleanly. In binary, 1/10
is a repeating fraction — just like 1/3 = 0.333… in decimal. The
hardware stores the closest representable binary float, which is slightly above
the true tenth. Add two such approximations and the error shows up in the last
decimal places when you print in base 10.
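One way to see the stored approximation, sketched here with BigInt so the comparison itself is exact. The variable names are illustrative. Multiplying by a power of two only changes the exponent, so it exposes the integer numerator of the stored fraction without further rounding.

```javascript
// The stored value of 0.1 is numerator / 2^55 for some integer numerator.
const numerator = BigInt(0.1 * 2 ** 55); // exact: scaling by a power of two
const denominator = 2n ** 55n;

// Exact integer check: is numerator / denominator greater than 1 / 10?
const storedIsAboveTenth = numerator * 10n > denominator;

console.log(numerator);               // 3602879701896397n
console.log(storedIsAboveTenth);      // true
console.log((0.1).toFixed(20));       // "0.10000000000000000555"
console.log((0.3).toFixed(20));       // "0.29999999999999998890"
console.log((0.1 + 0.2).toFixed(20)); // "0.30000000000000004441"
```

The last two lines show why the sum fails the equality test: the literal 0.3 is stored slightly below three tenths, while the sum of the two approximations lands slightly above it.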
This is the same class of problem as mishandling Unicode and UTF-8: the human-facing representation (decimal text, Unicode code points) is not what the machine stores (binary floats, variable-width bytes). Bugs appear at the boundary when you assume the abstraction is exact.
// JavaScript — all IEEE 754 binary64
0.1 + 0.2 === 0.3 // false
0.1 + 0.2 - 0.3 // 5.551115123125783e-17
Number.EPSILON // smallest diff between 1 and next float: ~2.22e-16
Never use === for floats in production logic. Compare with a
tolerance scaled to the magnitudes involved, for example
Math.abs(a - b) <= epsilon * Math.max(Math.abs(a), Math.abs(b)), or use an epsilon-aware
helper like Python's math.isclose. For sorting and hashing,
remember that tiny differences break key equality — our
hash tables explainer
calls out floating-point keys as fragile for exactly this reason.
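JavaScript has no built-in tolerance comparison, so the following is a minimal sketch of one, loosely modeled on the documented behavior of Python's math.isclose. The name isClose, the option names, and the default tolerances are assumptions chosen for this example.

```javascript
// Hypothetical helper: relative comparison with an optional absolute floor.
function isClose(a, b, { relTol = 1e-9, absTol = 0 } = {}) {
  if (a === b) return true; // exact matches, including equal infinities
  if (!Number.isFinite(a) || !Number.isFinite(b)) return false; // NaN or a lone infinity
  const scale = Math.max(Math.abs(a), Math.abs(b));
  return Math.abs(a - b) <= Math.max(relTol * scale, absTol);
}

console.log(isClose(0.1 + 0.2, 0.3));             // true
console.log(isClose(1, 1.1));                     // false
console.log(isClose(1e-12, 0));                   // false: relative tolerance alone never matches zero
console.log(isClose(1e-12, 0, { absTol: 1e-9 })); // true
```

The absolute floor matters when one side can be exactly zero; pick it from the units of the quantity being compared rather than reusing Number.EPSILON.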
Rounding modes and operation order
Every arithmetic result that does not land exactly on a representable value must
be rounded. IEEE 754 defines rounding modes; most languages
default to round to nearest, ties to even (banker's rounding).
That minimizes bias over millions of operations. Applied to rounding to an
integer, the same rule sends 2.5 to 2 and 3.5 to 4;
both are valid under the standard.
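Ties to even can be observed in JavaScript at the point where integers stop being exactly representable. The sketch below also includes a made-up helper, roundHalfEven, because the built-in Math.round uses a different tie rule (halves go toward positive infinity).

```javascript
// Above 2^53 only even integers are representable, so odd sums are exact ties.
console.log(2 ** 53 + 1); // 9007199254740992: tie resolved down to the even significand
console.log(2 ** 53 + 3); // 9007199254740996: tie resolved up to the even significand

// Math.round is not ties-to-even.
console.log(Math.round(2.5), Math.round(3.5)); // 3 4

// Hypothetical helper: round to an integer, sending exact halves to the even neighbor.
function roundHalfEven(x) {
  const floor = Math.floor(x);
  const diff = x - floor;
  if (diff < 0.5) return floor;
  if (diff > 0.5) return floor + 1;
  return floor % 2 === 0 ? floor : floor + 1; // exact tie
}

console.log(roundHalfEven(2.5), roundHalfEven(3.5)); // 2 4
console.log(roundHalfEven(-2.5));                    // -2
```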
Operation order matters. Summing a large array of small numbers after one huge number loses the small contributions entirely (swamping). Classic fixes:
- Kahan summation — track a running compensation term for lost low-order bits.
- Pairwise / tree reduction — add in balanced pairs so magnitudes stay similar at each step.
- Fused multiply-add (FMA) — compute a × b + c with a single rounding at the end, improving matrix multiply accuracy.
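As an illustration of the first fix, here is a small sketch of Kahan summation next to a plain loop. The function names are chosen for this example, and the input is deliberately built so that swamping is visible: one huge value followed by several ones.

```javascript
function naiveSum(values) {
  let sum = 0;
  for (const x of values) sum += x;
  return sum;
}

// Kahan (compensated) summation: carry the rounding error of each add forward.
function kahanSum(values) {
  let sum = 0;
  let compensation = 0; // rounding error of the previous add (zero at the start)
  for (const x of values) {
    const y = x - compensation;   // feed the previously lost part back in
    const t = sum + y;            // low-order bits of y may be rounded away here
    compensation = (t - sum) - y; // what was actually added, minus what should have been
    sum = t;
  }
  return sum;
}

const values = [1e16, 1, 1, 1, 1];
console.log(naiveSum(values)); // 10000000000000000: every 1 is swamped
console.log(kahanSum(values)); // 10000000000000004: the exact total
```

Near 1e16 adjacent floats are 2 apart, so each lone 1 rounds away in the plain loop; the compensation term accumulates those losses until they are large enough to register.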
Catastrophic cancellation is a separate trap: subtracting two nearly equal numbers cancels their shared leading digits and leaves only the low-order digits, where earlier rounding error lives. Reformulate algebraically (e.g. the quadratic formula's stable variant) or use higher precision for the intermediate step.
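The quadratic formula makes a compact demonstration. The sketch below assumes real roots and a non-zero b; the function names are illustrative. The stable variant computes one root by adding quantities of the same sign and derives the other from the product of the roots, c / a.

```javascript
// Textbook formula: one of the two roots subtracts nearly equal numbers.
function naiveRoots(a, b, c) {
  const sqrtDisc = Math.sqrt(b * b - 4 * a * c);
  return [(-b + sqrtDisc) / (2 * a), (-b - sqrtDisc) / (2 * a)];
}

// Stable variant (assumes real roots and b !== 0).
function stableRoots(a, b, c) {
  const sqrtDisc = Math.sqrt(b * b - 4 * a * c);
  const q = -0.5 * (b + Math.sign(b) * sqrtDisc); // same signs, so no cancellation
  return [q / a, c / q];                          // second root from x1 * x2 = c / a
}

// x^2 - 1e8 x + 1 = 0 has roots very close to 1e8 and 1e-8.
const [, naiveSmall] = naiveRoots(1, -1e8, 1);
const [, stableSmall] = stableRoots(1, -1e8, 1);

console.log(Math.abs(naiveSmall - 1e-8) / 1e-8);  // large relative error
console.log(Math.abs(stableSmall - 1e-8) / 1e-8); // tiny relative error
```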
Special values: NaN, infinity, subnormals
IEEE 754 reserves bit patterns beyond normal numbers:
- ±Infinity — exponent all ones, mantissa zero. Overflow of finite values, or explicit division by zero in languages that allow it.
- NaN (Not a Number) — exponent all ones, mantissa non-zero. Results like 0/0 or sqrt(-1). NaN poisons comparisons: NaN == NaN is false; use isNaN (in JavaScript, preferably Number.isNaN, which does not coerce its argument) or Object.is(x, NaN).
- Subnormal (denormal) numbers — exponent zero, implicit leading bit is 0. Extend range toward zero at the cost of gradually losing precision — important for gradual underflow in scientific code, slow on some CPUs.
Signaling vs quiet NaNs exist for legacy FPUs; most application code only sees quiet NaNs. When debugging GPU shaders or WASM, unexpected NaNs often trace back to uninitialized memory or a divide before a clamp.
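These special values are easy to poke at from a JavaScript console. A short sketch follows; the constant name smallestNormal is ours, and its literal is the smallest positive normal binary64 value.

```javascript
// Infinities: overflow and division by zero.
console.log(1 / 0, -1 / 0);        // Infinity -Infinity
console.log(Number.MAX_VALUE * 2); // Infinity

// NaN: never equal to anything, including itself.
console.log(0 / 0, Math.sqrt(-1)); // NaN NaN
console.log(NaN === NaN);          // false
console.log(Number.isNaN(0 / 0));  // true
console.log(Object.is(0 / 0, NaN)); // true
console.log(Number.isNaN(NaN + 1)); // true: NaN propagates through arithmetic

// Subnormals: values below the smallest normal number still exist, with fewer bits.
const smallestNormal = 2.2250738585072014e-308;
console.log(smallestNormal / 2 > 0);     // true: gradual underflow into the subnormal range
console.log(Number.MIN_VALUE);           // 5e-324, the smallest positive subnormal
console.log(Number.MIN_VALUE / 2 === 0); // true: nothing smaller is representable
```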
Float32 vs float64 in practice
| Type | Bits | Decimal digits | Max finite (~) | Typical use |
|---|---|---|---|---|
| float16 | 16 | ~3 | 6.5×10⁴ | ML inference, GPU bandwidth |
| bfloat16 | 16 | ~2–3 | 3.4×10³⁸ | ML inference, GPU bandwidth |
| binary32 | 32 | ~7 | 3.4×10³⁸ | Graphics, audio, embedded sensors |
| binary64 | 64 | ~15–17 | 1.8×10³⁰⁸ | General-purpose languages, science |
| float128 (rare) | 128 | ~34 | 1.2×10⁴⁹³² | Specialized numerics libraries |
Choose width based on error budget, not habit. A simulation that integrates positions over millions of timesteps in float32 will drift visibly; ML training often tolerates float16 with loss scaling. Profile before promoting everything to float64 — memory bandwidth is often the bottleneck.
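JavaScript can emulate binary32 arithmetic with Math.fround, which rounds a value to the nearest float32. The sketch below uses that to compare accumulated drift at both widths; the accumulate helper and the step count are assumptions made for the demonstration.

```javascript
console.log(Math.fround(0.1) === 0.1); // false: float32 stores a coarser approximation
console.log(Math.fround(16777217));    // 16777216: 2^24 + 1 needs more than 24 bits

// Hypothetical helper: add `step` to a running total `count` times,
// rounding after every addition with the supplied function.
function accumulate(step, count, round) {
  let total = 0;
  for (let i = 0; i < count; i++) total = round(total + step);
  return total;
}

const count = 1000000;
const sum32 = accumulate(Math.fround(0.1), count, Math.fround); // float32 behavior
const sum64 = accumulate(0.1, count, (x) => x);                 // plain float64

console.log(Math.abs(sum32 - 100000)); // error far larger than the float64 run
console.log(Math.abs(sum64 - 100000)); // error still tiny
```

Once the float32 total is large, each added 0.1 is rounded to the nearest representable step at that magnitude, so the per-step error is systematic rather than random and grows with the number of steps.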
Why money and blockchains avoid floats
Currency ledgers need exact arithmetic: 0.01 + 0.01
must equal 0.02 every time, for every user, on every continent.
Binary floats cannot guarantee that for decimal money. Production systems use:
- Integer minor units — store cents, lamports, or satoshis as integers; divide only for display.
- Decimal types — Python Decimal, Java BigDecimal, Rust rust_decimal — base-10 coefficients with explicit scale.
- Fixed-point rationals — on-chain programs often use u64 amounts with a declared number of decimal places (e.g. USDC with 6 decimals).
DeFi pricing touches the same lesson: AMM formulas look continuous, but on-chain execution is integer math with rounding toward the pool or the trader. Our stablecoin peg explainer covers redemption economics; the implementation detail is that peg integrity depends on exact ledger entries, not approximate floats. Off-chain analytics may use doubles for charts, but settlement layers must not.
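A minimal sketch of the integer-minor-units approach, using BigInt so amounts stay exact at any size. The helper names toMinorUnits and formatMinorUnits are hypothetical, and the parser is deliberately strict: it rejects input with more decimal places than the declared scale instead of rounding it.

```javascript
// Hypothetical helper: parse a decimal string into integer minor units.
function toMinorUnits(text, decimals) {
  const match = /^(-?)(\d+)(?:\.(\d+))?$/.exec(text);
  if (!match) throw new Error(`not a decimal amount: ${text}`);
  const [, sign, whole, frac = ""] = match;
  if (frac.length > decimals) throw new Error(`too many decimal places: ${text}`);
  const units = BigInt(whole + frac.padEnd(decimals, "0"));
  return sign === "-" ? -units : units;
}

// Hypothetical helper: format integer minor units for display only.
function formatMinorUnits(units, decimals) {
  const negative = units < 0n;
  const digits = (negative ? -units : units).toString().padStart(decimals + 1, "0");
  const whole = digits.slice(0, digits.length - decimals);
  const frac = digits.slice(digits.length - decimals);
  return (negative ? "-" : "") + whole + (decimals > 0 ? "." + frac : "");
}

const total = toMinorUnits("0.10", 2) + toMinorUnits("0.20", 2);
console.log(total === toMinorUnits("0.30", 2)); // true
console.log(formatMinorUnits(total, 2));        // "0.30"
console.log(toMinorUnits("1.5", 6));            // 1500000n, e.g. a 6-decimal token amount
```

All arithmetic happens on the integers; the decimal point exists only in the parsing and formatting functions at the edges.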
Alternatives when floats are not enough
- Arbitrary-precision floats — MPFR, mpmath — for reproducible science at configurable precision.
- Interval arithmetic — track upper and lower bounds per operation; certifies error envelopes in numerical libraries.
- Rational numbers — exact fractions where denominators stay small; symbolic math systems use them before floating approximations.
- Posits (a proposed alternative to IEEE 754) — tapered precision; niche adoption but interesting for some HPC workloads.
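Of the alternatives above, exact rationals are the easiest to sketch without a library. The following is a toy illustration built on BigInt, with invented helper names; a real project would more likely reach for an existing rational or decimal package.

```javascript
// Greatest common divisor of two non-negative BigInts.
function gcd(a, b) {
  while (b !== 0n) {
    [a, b] = [b, a % b];
  }
  return a;
}

// Hypothetical helper: build a fraction in lowest terms with a positive denominator.
function makeRational(num, den) {
  if (den === 0n) throw new RangeError("zero denominator");
  if (den < 0n) {
    num = -num;
    den = -den;
  }
  const g = gcd(num < 0n ? -num : num, den);
  return { num: num / g, den: den / g };
}

function addRational(x, y) {
  return makeRational(x.num * y.den + y.num * x.den, x.den * y.den);
}

const tenth = makeRational(1n, 10n);
const fifth = makeRational(2n, 10n); // stored as 1/5
const sum = addRational(tenth, fifth);

console.log(sum);                              // { num: 3n, den: 10n }
console.log(sum.num === 3n && sum.den === 10n); // true: exactly three tenths
```

The cost is that numerators and denominators can grow without bound over long computations, which is why the list above limits this approach to cases where denominators stay small.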
For most application developers the actionable split is simpler: floats for measurements, graphics, and statistics where tiny error is acceptable; integers or decimals for anything that must reconcile to zero.
Common pitfalls
- Equality tests on computed floats — use epsilon comparisons or decimal types.
- Serializing floats as JSON strings for IDs — different parsers can round-trip differently; prefer integers.
- Mixing float and decimal in one expression — Python promotes int to float silently and raises TypeError for Decimal plus float; keep types explicit at money boundaries.
- Assuming associativity — (a + b) + c may differ from a + (b + c); parallel reductions change results.
- Display rounding hiding drift — UI shows two decimals while internal state diverges; audit raw integer balances.
- Comparing across architectures — strict cross-platform reproducibility may require fixed libraries or integer-only paths.
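The associativity pitfall is the quickest of these to reproduce. A small sketch with illustrative variable names:

```javascript
// Same three numbers, different grouping.
const left = (0.1 + 0.2) + 0.3;
const right = 0.1 + (0.2 + 0.3);
console.log(left, right);    // 0.6000000000000001 0.6
console.log(left === right); // false

// A more extreme case: the grouping decides whether the 1 survives at all.
const swamped = (1e16 + 1) + -1e16;   // the 1 is rounded away first
const preserved = (1e16 + -1e16) + 1; // the big values cancel first
console.log(swamped, preserved);      // 0 1
```

A parallel reduction effectively picks a grouping based on how the work was split, so two runs over the same data can legitimately return different totals.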
Practical checklist
- Know your type width (f32 vs f64) and the precision it buys you.
- Never compare floats with exact equality in business logic.
- Store money and token amounts as integers with a fixed decimal scale.
- Use compensated summation or sorted accumulation for large float totals.
- Watch for NaN propagation after divides, logs, and square roots.
- Document whether APIs return decimals as strings, integers, or floats.
Related on Solana Garden: Hash tables explained, Unicode and UTF-8 explained, Stablecoin peg mechanics, Liquidity pools and AMMs guide, Explainers hub.