Explainer · 7 June 2026
How Git version control internals work
Most developers use Git daily without knowing what lives under
.git/. That is fine until a merge goes sideways, a rebase
rewrites history someone else built on, or a CI pipeline deploys the
wrong commit. Git is not magic — it is a content-addressable
object store plus a handful of movable pointers. Every
git add, commit, branch, and
push is just creating, linking, or updating hashed objects
on disk. Once you see the four object types and the directed acyclic
graph (DAG) of commits, the porcelain commands stop feeling arbitrary.
Snapshots, not deltas
Older systems like CVS and Subversion stored differences between
file versions. Git stores snapshots: each commit records
the full tree of file contents at that moment. Unchanged files point to
the same underlying blob object as before — deduplication happens
automatically because identical content hashes to the same ID. Two
commits that both contain README.md with identical bytes
share one blob; only the tree and commit objects differ.
This design trades some disk space for simplicity and speed. Looking up "what did this file look like at commit X?" is a single tree walk, not replaying a chain of patches. Branching is cheap because a new branch is just a new pointer to an existing commit — no file copying required.
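The deduplication described above can be sketched with a toy store. This is a simplification, not Git's on-disk format: the `ToyStore` class and `snapshot` helper are hypothetical names, and the key is a plain SHA-1 of the bytes, without the header Git adds.

```python
import hashlib


class ToyStore:
    """A toy content-addressable store: objects are keyed by the hash of their bytes."""

    def __init__(self):
        self.objects = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        self.objects[key] = data  # identical content always lands on the same key
        return key


def snapshot(store: ToyStore, files: dict) -> dict:
    """Record a full snapshot: map every path to the ID of its content."""
    return {path: store.put(content) for path, content in files.items()}


store = ToyStore()
first = snapshot(store, {"README.md": b"hello\n", "main.py": b"print(1)\n"})
second = snapshot(store, {"README.md": b"hello\n", "main.py": b"print(2)\n"})

# README.md is unchanged, so both snapshots reference the same stored object.
print(first["README.md"] == second["README.md"])  # True
print(len(store.objects))  # 3 stored objects for 4 file entries
```

Each snapshot is complete on its own, yet the unchanged file costs nothing extra to store.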
The four object types
Everything in .git/objects/ is one of four kinds. Each object is stored
with a short header naming its type and size, and is identified by the
SHA-1 (or SHA-256 in newer repos) hash of that header plus its contents.
This is the same content-addressable idea used in blockchains and IPFS:
the name is the hash of the data.
- Blob: raw file bytes. A blob stores content only;
no filename, no permissions.
git hash-object -w myfile.txt writes one.
- Tree: a directory listing. Each entry has a mode (file type + permissions), a name, and a pointer to a blob or subdirectory tree. The root tree of a commit is the project snapshot.
- Commit: metadata plus a pointer to a root tree, zero or more parent commit hashes, author/committer name and timestamp, and the commit message. Parents form the DAG; the first commit has none, merges have two or more.
- Tag: an annotated pointer to a commit (or other object) with a tagger, a message, and optionally a GPG signature. Lightweight tags are just refs, not objects.
Run git cat-file -t <hash> to see the type and
git cat-file -p <hash> to pretty-print contents.
Plumbing commands like these are the low-level building blocks that the everyday porcelain commands are built on.
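To make the object format concrete, here is a minimal sketch of how an object ID is derived and how a loose object could be written and read back. It assumes a SHA-1 repository and loose (unpacked) objects only. The function names are illustrative and not part of Git.

```python
import hashlib
import zlib
from pathlib import Path


def git_object_id(obj_type: str, content: bytes) -> str:
    """Hash the header '<type> <size>\\0' plus the content, as Git does for SHA-1 repos."""
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def write_loose_object(objects_dir: Path, obj_type: str, content: bytes) -> str:
    """Store a zlib-compressed object under <first 2 hex chars>/<remaining chars>."""
    oid = git_object_id(obj_type, content)
    path = objects_dir / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    path.write_bytes(zlib.compress(header + content))
    return oid


def read_loose_object(objects_dir: Path, oid: str):
    """Return (type, content), roughly what cat-file -t and cat-file -p report for a blob."""
    raw = zlib.decompress((objects_dir / oid[:2] / oid[2:]).read_bytes())
    header, _, content = raw.partition(b"\x00")
    obj_type, _size = header.decode().split(" ")
    return obj_type, content


# The empty blob has the same ID in every SHA-1 repository.
print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the type is part of the hashed header, a blob and a tree with identical bytes would still get different IDs.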
The staging index
The index (also called the staging area) is a flat
binary file listing paths, modes, blob hashes, and stat metadata for
the next commit. git add does not copy files into a
magical "staging folder" — it hashes the working-tree content into a
blob (if new) and updates the index entry for that path.
git commit builds a tree from the index, wraps it in a
commit object, and moves the current branch ref forward.
git status compares three trees: HEAD (last commit), the
index, and the working directory. "Changes staged" means index differs
from HEAD; "Changes not staged" means working tree differs from index.
Understanding this three-way split explains why you can stage part of a
file with git add -p and why
git checkout -- file restores from the index, not from
HEAD.
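The three-way comparison can be modelled with plain dictionaries that map paths to content IDs. This is a sketch under simplifying assumptions: untracked files are ignored, and the IDs are placeholder strings, not real hashes.

```python
def status(head: dict, index: dict, worktree: dict):
    """Compare path -> content-ID maps the way the two status sections are derived."""
    # Staged: the index disagrees with the last commit (includes adds and deletes).
    staged = sorted(p for p in head.keys() | index.keys() if head.get(p) != index.get(p))
    # Not staged: the working tree disagrees with the index for a tracked path.
    unstaged = sorted(p for p in index if worktree.get(p) != index[p])
    return staged, unstaged


head = {"a.txt": "id1", "b.txt": "id2"}
index = {"a.txt": "id1", "b.txt": "id3"}     # b.txt was staged with new content
worktree = {"a.txt": "id9", "b.txt": "id3"}  # a.txt was edited but not yet added

print(status(head, index, worktree))  # (['b.txt'], ['a.txt'])
```

A file can appear in both lists at once, which is exactly what happens after staging only part of it.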
Branches, HEAD, and refs
A branch is a ref: typically a file in .git/refs/heads/
containing a 40-character commit hash (64 characters in SHA-256 repos;
refs can also be packed into .git/packed-refs). main might point to
a3f2b1c.... Creating feature/login copies
that hash into a new ref — both branches now share history until one
advances. HEAD usually points to a branch ref (a
"symbolic ref"); detached HEAD means HEAD points directly at a commit
hash instead.
Tags live under refs/tags/. Remote-tracking branches like
origin/main sit in refs/remotes/ and update
when you fetch — they record where the remote last was,
not where your local main is. The reflog
(.git/logs/) keeps a local journal of where refs used to
point, which is how git reflog rescues "lost" commits after
a bad rebase.
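Resolving HEAD by hand shows how little machinery is involved. The sketch below assumes the ref exists as a loose file; refs stored in packed-refs are not handled, and `resolve_head` is a hypothetical helper, not a Git API.

```python
from pathlib import Path


def resolve_head(git_dir: Path):
    """Return (branch_ref_or_None, commit_hash) by reading HEAD and the loose ref file."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]  # e.g. refs/heads/main
        # Assumption: the ref is a loose file, not an entry in packed-refs.
        return ref, (git_dir / ref).read_text().strip()
    return None, head  # detached HEAD: the file holds a commit hash directly
```

Called as `resolve_head(Path(".git"))` from a repository root, it returns the current branch ref and the commit it points to, or `None` and a hash when HEAD is detached.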
Merge, rebase, and the commit DAG
History is a directed acyclic graph, not a straight line. A fast-forward merge simply moves the branch pointer forward when your branch is a direct ancestor of the target. A true merge creates a new commit with two parents, combining two diverged lines of work. Git runs a three-way merge using the common ancestor, your changes, and their changes; conflicts appear when the same lines were edited differently.
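Both the fast-forward test and the common ancestor used by a three-way merge are graph queries. The following sketch runs them on a toy DAG stored as a dictionary from commit name to parent list. The commit names are made up, and real Git uses faster algorithms than this brute-force walk.

```python
def ancestors(parents: dict, commit: str) -> set:
    """All commits reachable from `commit` via parent links, including itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def can_fast_forward(parents: dict, current: str, target: str) -> bool:
    """A fast-forward is possible when the current tip is an ancestor of the target."""
    return current in ancestors(parents, target)


def merge_bases(parents: dict, a: str, b: str) -> set:
    """Common ancestors that are not themselves ancestors of another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d) for d in common)}


# A <- B <- C (main) and B <- D (feature)
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}

print(can_fast_forward(parents, "B", "C"))  # True: B is an ancestor of C
print(can_fast_forward(parents, "C", "D"))  # False: the lines have diverged
print(merge_bases(parents, "C", "D"))       # {'B'}
```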
Rebase replays your commits on top of another base: it creates new commit objects with new hashes (same patches, different parents). The old commits are no longer reachable from the branch, though the reflog keeps them recoverable until garbage collection prunes them. Rebase produces linear history but rewrites commits, so never rebase commits others have already pulled. Merge preserves exact history at the cost of merge commits. Teams pick a policy; the underlying mechanism is always pointer movement and new objects.
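Why replayed commits get new hashes follows directly from the parent being part of what is hashed. Here is a toy illustration, assuming a commit ID that covers only the parent ID and a patch description. Real commits also hash the tree, author, timestamps, and message, and `commit_id` and `replay` are hypothetical names.

```python
import hashlib


def commit_id(parent: str, patch: str) -> str:
    """Toy commit ID: a hash over the parent ID and the change."""
    return hashlib.sha1(f"{parent}\n{patch}".encode()).hexdigest()[:8]


def replay(new_base: str, patches: list) -> list:
    """Re-create a chain of commits on top of new_base, one commit per patch."""
    ids, parent = [], new_base
    for patch in patches:
        parent = commit_id(parent, patch)
        ids.append(parent)
    return ids


patches = ["add login form", "validate password"]
original = replay("base-old", patches)
rebased = replay("base-new", patches)

# Same patches, different parents: every ID in the chain changes.
print(set(original).isdisjoint(rebased))  # True
```

The change cascades: even a commit whose patch is untouched gets a new ID once any ancestor's ID changes.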
Remotes, fetch, and push
A remote is a nickname for another object database —
usually on a server. git fetch downloads missing objects
and updates remote-tracking refs; it does not touch your working tree or
local branches. git pull is fetch plus merge (or rebase, if
configured). git push sends objects the remote lacks and
asks the server to move a branch ref forward — the server rejects the
push if that would discard commits someone else already pushed
(non-fast-forward).
The wire protocol is efficient: after negotiation, only missing objects transfer. Large repos use shallow clones (truncated history) or partial clones (lazy blob fetch) to reduce initial download size. For deployment pipelines, see how CI/CD pipelines pin builds to immutable commit SHAs so "deploy main" always means an exact, reproducible tree.
Packfiles, deltas, and garbage collection
Loose objects (one file per hash) waste inodes. git gc
packs them into .git/objects/pack/ — single files
containing many objects, often stored as deltas against similar objects
for compression. So Git does use deltas internally; it just
does not expose delta chains as the user-facing model. Unreachable
objects (orphaned by rebase, amended commits, or deleted branches)
are pruned after the reflog retention window expires.
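Pruning is at heart a reachability question. The sketch below models only commits and treats reflog entries as extra roots. Trees, blobs, and grace periods are left out, and the names are illustrative.

```python
def reachable(parents: dict, tips) -> set:
    """Walk parent links from every tip and collect each commit visited."""
    seen, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def prunable(parents: dict, refs: dict, reflog: list) -> set:
    """Commits that neither a ref nor a surviving reflog entry can reach."""
    return set(parents) - reachable(parents, list(refs.values()) + list(reflog))


# Toy history: C was amended into C2, so main now points at C2.
parents = {"A": [], "B": ["A"], "C": ["B"], "C2": ["B"]}
refs = {"main": "C2"}

print(prunable(parents, refs, reflog=["C"]))  # set(): the reflog entry still protects C
print(prunable(parents, refs, reflog=[]))     # {'C'}: entry expired, C can be pruned
```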
The index and pack machinery mirror ideas from lossless compression and hash-based lookup: content addressing gives O(1) deduplication, while pack deltas shrink storage without changing the logical snapshot model.
Agent-built repos and operational discipline
When autonomous agents commit in parallel — as described in agent-first repository harnesses — Git's immutability and merge semantics become load-bearing. Each agent works on a branch; integration is merge or rebase into a shared mainline. Conflict markers in published HTML are a symptom of unresolved merge state, not a Git bug. Operational rules: never force-push shared branches, always fetch before merge, grep for conflict markers before deploy, and treat commit SHAs as the unit of audit and rollback.
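The "grep for conflict markers before deploy" rule can be automated with a few lines. This sketch flags only the opening and closing markers, on the assumption that a bare row of equals signs is too common in ordinary text to be a reliable signal. `find_conflict_markers` is a hypothetical helper name.

```python
# Build the markers programmatically so this file never trips its own check.
START = "<" * 7 + " "
END = ">" * 7 + " "


def find_conflict_markers(text: str) -> list:
    """Return 1-based line numbers of lines that open or close a conflict block."""
    return [lineno
            for lineno, line in enumerate(text.splitlines(), start=1)
            if line.startswith((START, END))]


page = "\n".join([
    "<p>intro</p>",
    START + "HEAD",
    "<p>ours</p>",
    "=" * 7,
    "<p>theirs</p>",
    END + "feature",
])

print(find_conflict_markers(page))  # [2, 6]
```

A deploy step could run this over every file to be published and fail the build when the list is non-empty.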
Practical checklist
- Think in objects (blob, tree, commit) and refs (branches, HEAD), not in mystic command names.
- Stage intentionally — the index is the contract for the next commit.
- Prefer merge on shared branches; reserve rebase for local cleanup before opening a PR.
- Fetch often; understand that origin/main is not main.
- Pin deployments and CI to full commit hashes, not floating branch names.
- After history rewrite, use reflog before panicking; objects linger until GC.
- Run git fsck if you suspect repository corruption; clones are cheap disaster recovery.