Explainer · 7 June 2026

How Git version control internals work

Most developers use Git daily without knowing what lives under .git/. That is fine until a merge goes sideways, a rebase rewrites history someone else built on, or a CI pipeline deploys the wrong commit. Git is not magic: it is a content-addressable object store plus a handful of movable pointers. Every git add, commit, branch, and push comes down to writing hashed objects to disk or moving the pointers that name them. Once you see the four object types and the directed acyclic graph (DAG) of commits, the porcelain commands stop feeling arbitrary.

Snapshots, not deltas

Older systems like CVS and Subversion stored differences between file versions. Git stores snapshots: each commit records the full tree of file contents at that moment. Unchanged files point to the same underlying blob object as before — deduplication happens automatically because identical content hashes to the same ID. Two commits that both contain README.md with identical bytes share one blob; only the tree and commit objects differ.

This design trades some disk space for simplicity and speed. Looking up "what did this file look like at commit X?" is a single tree walk, not replaying a chain of patches. Branching is cheap because a new branch is just a new pointer to an existing commit — no file copying required.

The four object types

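Each object in .git/objects/ is one of four kinds. Git prefixes the content with a short header naming the type and byte length, names the object by the SHA-1 hash of header plus content (SHA-256 in repositories created with --object-format=sha256), and stores the result zlib-compressed. This is the same content-addressable idea used in blockchains and IPFS: the name is the hash of the data.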
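A minimal Python sketch of that naming rule, assuming the default SHA-1 object format. The helper names hash_object and loose_path are hypothetical; they only mirror what git hash-object computes and are not part of any Git API.

```python
import hashlib
import zlib


def hash_object(data: bytes, obj_type: str = "blob") -> tuple[str, bytes]:
    """Return (hex ID, uncompressed stored bytes) for a Git object (SHA-1 format assumed)."""
    # Header: type name, a space, the decimal byte length, then a NUL byte.
    header = f"{obj_type} {len(data)}".encode() + b"\0"
    store = header + data
    # The ID covers header plus content, not the content alone.
    return hashlib.sha1(store).hexdigest(), store


def loose_path(oid: str) -> str:
    # Loose objects live at .git/objects/<first 2 hex digits>/<remaining 38>.
    return f".git/objects/{oid[:2]}/{oid[2:]}"


oid, store = hash_object(b"hello world\n")
compressed = zlib.compress(store)  # the bytes that would actually land on disk
print(oid)              # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(loose_path(oid))  # .git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Because the ID depends only on the type and the bytes, the same README.md contents in two different commits hash to the same blob ID, which is where the automatic deduplication comes from.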

Blob: raw file bytes. A blob stores content only; no filename, no permissions. git hash-object -w myfile.txt writes one.
Tree: a directory listing. Each entry has a mode (file type and permissions), a name, and the hash of a blob or a subdirectory tree. The root tree of a commit is the project snapshot.
Commit: a pointer to a root tree, zero or more parent commit hashes, author and committer identities (name, email, and timestamp), and the commit message. Parents form the DAG: a root commit (usually the first) has none, an ordinary commit has one, and a merge has two or more.
Tag: an annotated tag object pointing at a commit (or any other object) and recording the tagger, a date, and a message, optionally with a GPG signature. Lightweight tags are just refs, not objects.
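The tree and commit formats can be sketched the same way. This assumes SHA-1 IDs and uses hypothetical helpers (git_hash, tree_body, commit_body); the identity and timestamp in the usage lines are made up. It reuses one identity for author and committer and fixes the time zone at +0000 for brevity.

```python
import hashlib


def git_hash(obj_type: str, body: bytes) -> str:
    # Same naming rule as for blobs: SHA-1 over "<type> <size>\0" + body.
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: list[tuple[str, str, str]]) -> bytes:
    """entries are (mode, name, hex ID); modes used here: '100644' regular file,
    '100755' executable, '40000' subdirectory."""
    def sort_key(entry: tuple[str, str, str]) -> bytes:
        mode, name, _ = entry
        # Git orders entries bytewise by name, treating a subtree name as if it ended in "/".
        return (name + "/" if mode == "40000" else name).encode()

    out = b""
    for mode, name, oid in sorted(entries, key=sort_key):
        # Entry layout: "<mode> <name>\0" followed by the 20 raw bytes of the SHA-1 ID.
        out += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
    return out


def commit_body(tree: str, parents: list[str], ident: str, timestamp: int, message: str) -> bytes:
    """ident is 'Name <email>'; reused as committer, time zone assumed +0000."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # none for a root commit, two or more for a merge
    lines.append(f"author {ident} {timestamp} +0000")
    lines.append(f"committer {ident} {timestamp} +0000")
    # A blank line separates the header lines from the message.
    return ("\n".join(lines) + "\n\n" + message).encode()


# Made-up identity and timestamp, for illustration only.
blob = git_hash("blob", b"hello world\n")
tree = git_hash("tree", tree_body([("100644", "hello.txt", blob)]))
commit = git_hash("commit", commit_body(tree, [], "A U Thor <author@example.com>", 1780000000, "Initial commit\n"))
print(git_hash("tree", tree_body([])))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904, the empty tree
```

Note how the commit names only its root tree: the tree names the blob, so hashing the commit transitively fixes every byte of the snapshot.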

Run git cat-file -t <hash> to see an object's type and git cat-file -p <hash> to pretty-print its contents. Plumbing commands like these are the low-level layer that everyday porcelain commands such as commit and log are built on.
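Reading a loose object back is the reverse: inflate it and split off the header. This is a rough sketch of what cat-file reports for a loose blob; read_object is a hypothetical name, it ignores packed objects, and it does not pretty-print the binary body of a tree the way git cat-file -p does.

```python
import zlib


def read_object(raw_compressed: bytes) -> tuple[str, bytes]:
    """Inflate a loose object and return (type, body)."""
    data = zlib.decompress(raw_compressed)
    header, _, body = data.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    # The recorded length lets us sanity-check the payload.
    if int(size) != len(body):
        raise ValueError("size mismatch: corrupt object")
    return obj_type, body


# In a real repository you would read .git/objects/<2>/<38> in binary mode; here we fake one.
sample = zlib.compress(b"blob 12\0hello world\n")
print(read_object(sample))  # ('blob', b'hello world\n')
```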

The staging index

The index (also called the staging area) is a flat binary file listing paths, modes, blob hashes, and stat metadata for the next commit. git add does not copy files into a magical "staging folder" — it hashes the working-tree content into a blob (if new) and updates the index entry for that path. git commit builds a tree from the index, wraps it in a commit object, and moves the current branch ref forward.
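The add-then-commit flow can be modeled with a plain dictionary standing in for the index. This sketch assumes a simplified index of path to (mode, blob ID) with no stat data, always records mode 100644 (ignoring executable bits), and uses hypothetical function names stage and write_tree; the latter mirrors the idea behind git write-tree, building one tree object per directory from the flat path list.

```python
import hashlib


def git_hash(obj_type: str, body: bytes) -> str:
    # SHA-1 over "<type> <size>\0" + body, as for every Git object.
    return hashlib.sha1(f"{obj_type} {len(body)}".encode() + b"\0" + body).hexdigest()


def stage(index: dict[str, tuple[str, str]], path: str, data: bytes) -> None:
    # Roughly what `git add` does: hash the content as a blob, record it for the path.
    index[path] = ("100644", git_hash("blob", data))


def write_tree(index: dict[str, tuple[str, str]]) -> str:
    """Build nested tree objects from flat 'dir/file' paths; return the root tree ID."""
    files: dict[str, tuple[str, str]] = {}
    subdirs: dict[str, dict[str, tuple[str, str]]] = {}
    for path, entry in index.items():
        head, sep, rest = path.partition("/")
        if sep:
            subdirs.setdefault(head, {})[rest] = entry  # group by first directory component
        else:
            files[head] = entry
    entries = [(mode, name, oid) for name, (mode, oid) in files.items()]
    # Each subdirectory becomes its own tree object, built bottom-up by recursion.
    entries += [("40000", name, write_tree(sub)) for name, sub in subdirs.items()]
    entries.sort(key=lambda e: (e[1] + "/" if e[0] == "40000" else e[1]).encode())
    body = b"".join(f"{m} {n}".encode() + b"\0" + bytes.fromhex(o) for m, n, o in entries)
    return git_hash("tree", body)


index: dict[str, tuple[str, str]] = {}
stage(index, "README.md", b"# demo\n")
stage(index, "src/main.py", b"print('hi')\n")
root = write_tree(index)  # the tree ID a commit made now would point to
```

A commit would then wrap root in a commit object and move the branch ref to it; nothing in the working directory is consulted at that point, only the index.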

git status compares three trees: HEAD (last commit), the index, and the working directory. "Changes staged" means index differs from HEAD; "Changes not staged" means working tree differs from index. Understanding this three-way split explains why you can stage part of a file with git add -p and why git checkout -- file restores from the index, not from HEAD.
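A simplified model of those two comparisons, assuming each state is a dictionary from path to blob ID (real Git also consults stat data and file modes, and detects renames). The names status and checkout_file are hypothetical, and the IDs "1", "2", ... are placeholders for real hashes.

```python
def status(head: dict[str, str], index: dict[str, str], worktree: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths the way git status does, comparing blob IDs only."""
    # "Changes staged": the index differs from HEAD (added, modified, or removed paths).
    staged = sorted(p for p in head.keys() | index.keys() if head.get(p) != index.get(p))
    # "Changes not staged": the working tree differs from the index (modified or deleted).
    unstaged = sorted(p for p in index if worktree.get(p) != index[p])
    # Untracked: present in the working tree but unknown to the index.
    untracked = sorted(worktree.keys() - index.keys())
    return {"staged": staged, "unstaged": unstaged, "untracked": untracked}


def checkout_file(index: dict[str, str], worktree: dict[str, str], path: str) -> None:
    # `git checkout -- path` copies the index version back, not the HEAD version.
    worktree[path] = index[path]


head = {"a.txt": "1", "b.txt": "2"}
index = {"a.txt": "1", "b.txt": "3"}                    # b.txt edited and staged
worktree = {"a.txt": "9", "b.txt": "3", "c.txt": "4"}   # a.txt edited, c.txt new
print(status(head, index, worktree))
# {'staged': ['b.txt'], 'unstaged': ['a.txt'], 'untracked': ['c.txt']}
```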

Branches, HEAD, and refs

A branch is a ref: usually a file in .git/refs/heads/ containing a commit hash (40 hex characters with SHA-1), although Git may consolidate refs into a single .git/packed-refs file. main might point to a3f2b1c.... Creating feature/login copies that hash into a new ref; both branches now share history until one advances. HEAD usually points to a branch ref (a "symbolic ref"); detached HEAD means HEAD points directly at a commit hash instead.
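A sketch of ref resolution, assuming the classic files-based ref storage (not the newer reftable backend). read_ref is a hypothetical helper, and the fake .git directory and the hash in it are made up for the demo.

```python
import os
import tempfile


def read_ref(git_dir: str, name: str) -> str:
    """Resolve 'HEAD' or 'refs/heads/<branch>' to a commit ID, following symbolic refs."""
    path = os.path.join(git_dir, name)
    if os.path.isfile(path):
        with open(path) as f:
            value = f.read().strip()
        # A symbolic ref (the usual HEAD) stores "ref: refs/heads/<branch>".
        if value.startswith("ref: "):
            return read_ref(git_dir, value[len("ref: "):])
        return value  # detached HEAD or a loose branch file: a raw hash
    # No loose file: the ref may have been packed into packed-refs.
    packed = os.path.join(git_dir, "packed-refs")
    if os.path.isfile(packed):
        with open(packed) as f:
            for line in f:
                if line.startswith(("#", "^")):
                    continue  # header comment or peeled-tag line
                oid, _, ref = line.strip().partition(" ")
                if ref == name:
                    return oid
    raise KeyError(name)


FAKE = "ab" * 20  # made-up commit hash
with tempfile.TemporaryDirectory() as git_dir:
    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write("ref: refs/heads/main\n")      # HEAD is a symbolic ref
    with open(os.path.join(git_dir, "packed-refs"), "w") as f:
        f.write(f"{FAKE} refs/heads/main\n")    # main exists only in packed form
    resolved = read_ref(git_dir, "HEAD")
print(resolved == FAKE)  # True
```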

Tags live under refs/tags/. Remote-tracking branches like origin/main sit in refs/remotes/ and update when you fetch — they record where the remote last was, not where your local main is. The reflog (.git/logs/) keeps a local journal of where refs used to point, which is how git reflog rescues "lost" commits after a bad rebase.
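Each line of a reflog file such as .git/logs/HEAD records an old value, a new value, who made the change, when, and why. The parser below is a sketch assuming that line layout; parse_reflog, nth_previous, and the sample log with its hashes are hypothetical.

```python
from typing import NamedTuple


class ReflogEntry(NamedTuple):
    old: str
    new: str
    message: str


def parse_reflog(text: str) -> list[ReflogEntry]:
    """Lines look like '<old> <new> <name> <<email>> <timestamp> <tz>\t<message>'."""
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        header, _, message = line.partition("\t")
        old, new = header.split(" ", 2)[:2]
        entries.append(ReflogEntry(old, new, message))
    return entries


def nth_previous(entries: list[ReflogEntry], n: int) -> str:
    # Like HEAD@{n}: the file is oldest-first, so count back from the end.
    return entries[-1 - n].new


Z, A, B, C = "0" * 40, "aa" * 20, "bb" * 20, "cc" * 20  # made-up hashes
log = (
    f"{Z} {A} Dev <dev@example.com> 1780000000 +0000\tcommit (initial): start\n"
    f"{A} {B} Dev <dev@example.com> 1780000100 +0000\tcommit: add feature\n"
    f"{B} {C} Dev <dev@example.com> 1780000200 +0000\trebase (finish): returning to refs/heads/main\n"
)
entries = parse_reflog(log)
print(nth_previous(entries, 1) == B)  # True: where HEAD was before the rebase
```

In practice you let Git do the lookup: git reset --hard HEAD@{1}, or git branch rescue <hash> (rescue being any branch name you choose), restores the pre-rebase commit as long as it has not been pruned.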

Merge, rebase, and the commit DAG

History is a directed acyclic graph, not a straight line. A fast-forward merge simply moves the branch pointer forward when the current branch tip is an ancestor of the commit being merged in, so no new commit is needed. A true merge creates a new commit with two parents, combining two diverged lines of work. Git runs a three-way merge using the common ancestor (the merge base), your changes, and their changes; conflicts appear when both sides changed the same lines in different ways.
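A toy model of these decisions, assuming history is given as a hypothetical parents dictionary (commit name to list of parent names). merge_bases picks the best common ancestors, can_fast_forward checks the ancestry condition, and merge_entry applies the trivial per-path rule. Where merge_entry raises, real Git falls back to a line-by-line three-way merge that this sketch does not implement.

```python
from collections import deque


def ancestors(parents: dict[str, list[str]], commit: str) -> set[str]:
    """All commits reachable from `commit`, including itself."""
    seen, queue = {commit}, deque([commit])
    while queue:
        for p in parents[queue.popleft()]:
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen


def merge_bases(parents: dict[str, list[str]], a: str, b: str) -> set[str]:
    # Common ancestors that are not strict ancestors of another common ancestor.
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common if not any(c in ancestors(parents, o) - {o} for o in common)}


def can_fast_forward(parents: dict[str, list[str]], current: str, incoming: str) -> bool:
    # Fast-forward is possible when the current tip is an ancestor of the incoming commit.
    return current in ancestors(parents, incoming)


def merge_entry(base, ours, theirs):
    """Trivial three-way resolution for one path's blob ID (None means absent)."""
    if ours == theirs:
        return ours    # both sides agree, or neither changed it
    if ours == base:
        return theirs  # only they changed it
    if theirs == base:
        return ours    # only we changed it
    raise ValueError("conflict: both sides changed this path; needs a line-level merge")


# Hypothetical history: A <- B <- C (main) and B <- D (feature)
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
print(merge_bases(parents, "C", "D"))       # {'B'}
print(can_fast_forward(parents, "B", "C"))  # True
print(can_fast_forward(parents, "C", "D"))  # False: diverged, needs a merge commit
```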

Rebase replays your commits on top of another base: it creates new commit objects with new hashes (same patches, different parents). The original commits are no longer reachable from the branch, but they stay in the object database, still listed in the reflog, until garbage collection prunes them. Rebase produces linear history, but because it rewrites commits, never rebase commits others have already pulled. Merge preserves exact history at the cost of merge commits. Teams pick a policy; the underlying mechanism is always pointer movement and new objects.
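Why a replayed commit must get a new hash: the parent ID is part of the hashed bytes. This sketch (hypothetical commit_id helper, made-up identity and parent IDs) holds the tree and message fixed and changes only the parent.

```python
import hashlib


def commit_id(tree: str, parent: str, message: str) -> str:
    # Minimal commit object; the fixed identity line is a made-up example.
    ident = "A U Thor <author@example.com> 1780000000 +0000"
    body = f"tree {tree}\nparent {parent}\nauthor {ident}\ncommitter {ident}\n\n{message}".encode()
    return hashlib.sha1(f"commit {len(body)}".encode() + b"\0" + body).hexdigest()


tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # the empty tree, for illustration
old_base, new_base = "11" * 20, "22" * 20          # hypothetical parent IDs
before = commit_id(tree, old_base, "Add login form\n")
after = commit_id(tree, new_base, "Add login form\n")
print(before == after)  # False: a new parent means a new commit object
```

In a real rebase the committer timestamp and usually the tree change as well, so the IDs differ for several reasons at once; the parent alone is already enough.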

Remotes, fetch, and push

A remote is a nickname for another object database, usually on a server. git fetch downloads missing objects and updates remote-tracking refs; it does not touch your working tree or local branches. git pull is fetch plus merge (or rebase, if configured). git push sends objects the remote lacks and asks the server to move a branch ref forward; unless you force it, the push is rejected if that would discard commits someone else already pushed (a non-fast-forward update).
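Both halves of that exchange can be sketched over a commit graph. This assumes a single hypothetical parents dictionary covering both sides and tracks commits only (real transfers also include the trees and blobs they reference); objects_to_send and accept_push are illustrative names, not Git internals.

```python
from typing import Optional


def reachable(parents: dict[str, list[str]], tips: list[str]) -> set[str]:
    """Commits reachable from any of the tips."""
    seen, stack = set(), list(tips)
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def objects_to_send(parents: dict[str, list[str]], want: str, have: list[str]) -> set[str]:
    # After negotiation, send only what the receiver cannot already reach.
    return reachable(parents, [want]) - reachable(parents, have)


def accept_push(parents: dict[str, list[str]], old_tip: Optional[str], new_tip: str) -> bool:
    # Fast-forward rule: the branch's current tip must be an ancestor of the new tip.
    return old_tip is None or old_tip in reachable(parents, [new_tip])


# Hypothetical graph: server has A <- B; we built C <- D on B; someone else pushed X on B.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "X": ["B"]}
print(sorted(objects_to_send(parents, "D", ["B"])))  # ['C', 'D']
print(accept_push(parents, "B", "D"))  # True: fast-forward
print(accept_push(parents, "X", "D"))  # False: would discard X
```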

The wire protocol is efficient: after negotiation, only missing objects transfer. Large repos use shallow clones (truncated history) or partial clones (lazy blob fetch) to reduce initial download size. For deployment pipelines, see how CI/CD pipelines pin builds to immutable commit SHAs so "deploy main" always means an exact, reproducible tree.

Packfiles, deltas, and garbage collection

Loose objects, stored one file per hash, waste inodes and disk space. git gc packs them into .git/objects/pack/: packfiles (each with an .idx index) that hold many objects, often stored as deltas against similar objects for compression. So Git does use deltas internally; it just does not expose delta chains as the user-facing model. Unreachable objects (orphaned by rebase, amended commits, or deleted branches) are pruned only after the reflog entries referencing them expire and they are older than the prune grace period (two weeks by default).
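To make the delta idea concrete, here is a sketch that encodes a target as copy and insert instructions against a base, the same two instruction kinds Git's pack deltas use. It relies on Python's difflib to find matching runs and a made-up tuple encoding, so it shows the principle, not Git's binary delta format or its heuristics for choosing delta bases.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list[tuple]:
    """Encode target as ('copy', offset, size) and ('insert', data) ops against base."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))      # reuse a run of base bytes
        elif j2 > j1:
            ops.append(("insert", target[j1:j2]))  # new bytes carried literally
        # 'delete' needs no op: unused base bytes are simply not copied.
    return ops


def apply_delta(base: bytes, ops: list[tuple]) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, size = op
            out += base[offset:offset + size]
        else:
            out += op[1]
    return bytes(out)


base = b"def greet():\n    print('hello')\n" * 20
target = base.replace(b"hello", b"hi", 1)
delta = make_delta(base, target)
print(apply_delta(base, delta) == target)  # True
```

The reconstructed object is byte-for-byte the full snapshot, which is why packing changes storage without changing what a commit logically contains.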

Packfiles and their .idx indexes mirror ideas from lossless compression and hash-based lookup: content addressing turns deduplication into a check for whether a name already exists, while pack deltas shrink storage without changing the logical snapshot model.

Agent-built repos and operational discipline

When autonomous agents commit in parallel — as described in agent-first repository harnesses — Git's immutability and merge semantics become load-bearing. Each agent works on a branch; integration is merge or rebase into a shared mainline. Conflict markers in published HTML are a symptom of unresolved merge state, not a Git bug. Operational rules: never force-push shared branches, always fetch before merge, grep for conflict markers before deploy, and treat commit SHAs as the unit of audit and rollback.
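A pre-deploy check for leftover conflict markers might look like the sketch below. It assumes Git's default marker length of seven characters and treats a bare ======= line as a marker only inside an open <<<<<<< block, since the same line can be legitimate content (a reStructuredText underline, for example). find_conflict_markers is a hypothetical helper; failing a build when it returns anything is one possible policy, not a Git feature.

```python
import re

START = re.compile(r"<{7}( |$)")  # <<<<<<< ours
END = re.compile(r">{7}( |$)")    # >>>>>>> theirs


def find_conflict_markers(text: str) -> list[int]:
    """Return 1-based line numbers of conflict-marker lines."""
    hits, inside = [], False
    for lineno, line in enumerate(text.splitlines(), start=1):
        if START.match(line):
            inside = True
            hits.append(lineno)
        elif inside and (line == "=======" or line.startswith("|||||||")):
            hits.append(lineno)  # separator, or diff3-style base section
        elif END.match(line):
            inside = False
            hits.append(lineno)
    return hits


page = "<p>ok</p>\n<<<<<<< HEAD\n<h1>New</h1>\n=======\n<h1>Old</h1>\n>>>>>>> feature/login\n"
print(find_conflict_markers(page))  # [2, 4, 6]
```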

Practical checklist

Think in objects (blob, tree, commit) and refs (branches, HEAD), not in opaque command names.
Stage intentionally — the index is the contract for the next commit.
Prefer merge on shared branches; reserve rebase for local cleanup before opening a PR.
Fetch often; understand that origin/main is not main.
Pin deployments and CI to full commit hashes, not floating branch names.
After a history rewrite, check the reflog before panicking: objects linger until their reflog entries expire and GC prunes them.
Run git fsck if you suspect repository corruption; clones are cheap disaster recovery.