But try to understand
Try to understand
Try try try to understand
Git's a magic command.
– Heart 💕

Once upon a time, I believed git was storing diffs somewhere. But then I learned I was wrong.

It's challenging to wield git's clunky interface when you have a broken mental model of its internals. Learning more about what's happening inside git transformed me into a more effective git user.

In this post, I'll attempt to explain all the deep details of `git diff` to my past self.

📍 Git add makes blobs

We can add files to repos using `git add`. But behind the porcelain, git's busy compressing and storing this file deep in its bowels. Git terms the results of this process a "blob."

Git stores blobs (among other things) inside the `.git/objects` directory.

```
$ git init
Initialized empty Git repository in /tmp/bar/.git/
$ echo "Hi, I'm blob" > foo
$ git add foo
$ tree .git/objects/
.git/objects/
└── 26
    └── 45aab142ef6b135a700d037e75cd9f1f1c94dc
```

But what's in a blob? And why is this blob stored as `./26/45aab142ef6b135a700d037e75cd9f1f1c94dc`?

🗃️ Git stores things by their hash

Why did `git add foo` store the contents of `foo` as `2645aab142ef6b135a700d037e75cd9f1f1c94dc`?

Git mapped our file to a number via a hash function. A hash function maps data to a unique number (mostly): whenever the data changes, the hash function's output changes dramatically. SHA1 is the hash function git uses by default.

And when we `git add foo`, git applies SHA1 to the contents of `foo` (`Hi, I'm blob\n`, prefixed with a short header recording the object type and size), and that spits out `2645aab142ef6b135a700d037e75cd9f1f1c94dc`. The first two hex digits become the directory name, and the rest become the filename.

Blobs are all about content. The filename "foo" doesn't matter at all! We could have named the file "🌈" and git still would have stored it in the same place. If the file contents are EXACTLY the same, then the hash will be exactly the same.

🌱 Git commit creates commits and trees

You already know `git commit` creates a commit, but what is a commit?

A commit is a type of object. Git uses the word "object" to mean: a commit, a folder or directory (tree), a file (blob), or a tag. Git stores objects in its object database, which is everything inside the `.git/objects` directory.

```
$ git commit -m 'Initial Commit'
[main (root-commit) 0644991] Initial Commit
 1 file changed, 1 insertion(+)
 create mode 100644 foo
$ tree .git/objects/
.git/objects/
├── 06
│   └── 449913ac0e43b73bfbd3141f5643a4db6d47f8
├── 26
│   └── 45aab142ef6b135a700d037e75cd9f1f1c94dc
└── 41
    └── 81320a57137264d436b2ef861c31f430256bf4
```

After our commit, the object database has three objects: `06449913`, `2645aab1`, and `4181320a`.

So now we've established that one of these three objects is our blob (`2645aab1`). Let's see if we can suss out the others.

✨ The magic command

The magic command to learn about any object is `git cat-file -p`. We can use that command to find out more about our mystery objects:

```
$ git cat-file -p 06449913ac0e43b73bfbd3141f5643a4db6d47f8
tree 4181320a57137264d436b2ef861c31f430256bf4
author Tyler Cipriani <tcipriani@wikimedia.org> 1652310544 -0600
committer Tyler Cipriani <tcipriani@wikimedia.org> 1652310544 -0600

Initial Commit
```

This object (`06449913`) appears to be our commit. A commit is metadata compressed and stored inside git's object database.

Some of the metadata is obvious, but then there's a tree. And that tree points to our other mystery object, `4181320a`. Let's see what we can learn about our last remaining mystery object using our magic command:

```
$ git cat-file -p 4181320a57137264d436b2ef861c31f430256bf4
100644 blob 2645aab142ef6b135a700d037e75cd9f1f1c94dc    foo
```

So a tree is an object that stores a directory listing of objects by their SHA1s. And a commit is an object that points at a tree by recording the tree's SHA1!

📈 Git's dependency graph

Commits point to trees, and trees point to blobs and other trees. Neat!

So if we graphed the state of dependencies in our object database, we'd get something like this:

[Diagram: commit 06449913 → tree 4181320a → blob 2645aab1 (foo)]

The commit incorporates our tree, which includes our blob. Everything depends on our blob!

So if we change even a single bit inside a single file, git will notice: everything is entirely traceable from the commit down to the bit level.

We get this for free by hashing objects and including those hashes in other objects. This is the whole concept of a Merkle Directed Acyclic Graph (Merkle DAG)!

🍔 So, where's the diff?

When we type `git diff`, git presents us a diff. We know there are blobs and trees and commits, so where's the diff!?

Git doesn't store diffs anywhere at all! It derives diffs from what's stored in the object database.

```
$ echo "I'm ALSO blob" > baz
$ git add baz
$ git commit -m 'Add baz'
$ tree .git/objects/
.git/objects/
├── 06
│   └── 449913ac0e43b73bfbd3141f5643a4db6d47f8
├── 26
│   └── 45aab142ef6b135a700d037e75cd9f1f1c94dc
├── 41
│   └── 81320a57137264d436b2ef861c31f430256bf4
├── 95
│   └── 42599fac463c434456c0a16b13e346787f25da
├── 9b
│   └── 2716e4540c11e8d590e906dd8fa5a75904810a
└── e6
    └── 5a7344c46cebe61d052de6e30d33636e1cd0b4
```

We made a new commit, and now we have three new objects. We added a new file (blob), which made our directory different (tree), and we committed it (commit).

Our graph now looks like this:

[Diagram: commit e65a7344 → tree 9542599f → blobs 2645aab1 (foo) and 9b2716e4 (baz)]

You might be surprised by a few things in the graph:

[List missing from source]

So now what happens when we try `git diff`:

```
$ git diff 064499..e65a73
diff --git a/baz b/baz
new file mode 100644
index 0000000..9b2716e
--- /dev/null
+++ b/baz
@@ -0,0 +1 @@
+I'm ALSO blob
```

Git compares the two commits, finds their trees, sees a new blob in the second commit, and shows you the diff of `/dev/null` and `baz`.

🌠 The More You Know

No diffs. Just Merkle DAGs. And now you know.

Thanks to Joe Swanson for providing excellent early feedback on this post. And thanks to Kostah Harlan for reading an early draft of this post and making it less terrible. <3


Instead of lifting each sheet from the bottom and pulling it off of the pad in a vertical direction, lift each sheet from the side.
– post-it.com, Tips for creating Post-it® Super Sticky Note pixel art

💡 tl;dr: Always peel Post-Its from the side, not the bottom.

This is a public service announcement: you're peeling your sticky notes all wrong.

But you're not alone; I only learned how to properly peel a Post-It at an in-person offsite in 2015.

As our team worked, we noticed some of our Post-Its were curlier than others. And the curlier Post-Its tended to fall off the wall.

Our facilitator diagnosed our despair: "If you peel post-its from the side, they don't curl up."

🧐 Why this works

3M scientists Spencer Silver and Art Fry gifted the Post-It to humanity.

The Post-It's magic is its microsphere adhesive, discovered in 1968 by Silver; it affords Post-Its the singular ability to stick and unstick from paper without damaging it.

A Post-It's easy unstickability is its biggest feature and its downfall.

When you yank a note from the bottom, it curls. That curling is just enough force to lift the note off the paper, the whiteboard, or whatever it's stuck to.

So the next time you're slingin' stickies, just remember one rule you must abide: always peel a Post-It® from the side.
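One footnote to the git post, for anyone who wants to poke at the "git stores things by their hash" idea without a repo handy. Here's a minimal Python sketch of how a blob gets its name. It assumes git's loose-object format, where the hashed bytes are a `blob <size>\0` header followed by the file contents. The function names (`git_blob_id`, `object_path`, `loose_object_bytes`) are made up for illustration and are not part of git.

```python
import hashlib
import zlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA1 object name git would give a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def object_path(object_id: str) -> str:
    """Map an object name to its loose-object path under .git/objects."""
    # The first two hex digits become the directory, the rest the filename.
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


def loose_object_bytes(content: bytes) -> bytes:
    """Return zlib-compressed header + content, the shape of a loose blob."""
    # The exact bytes on disk depend on git's compression settings; only the
    # decompressed payload is guaranteed to match.
    return zlib.compress(b"blob %d\x00" % len(content) + content)


if __name__ == "__main__":
    # Sanity check against the well-known name of the empty blob.
    assert git_blob_id(b"") == "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"

    foo = b"Hi, I'm blob\n"  # what `echo "Hi, I'm blob" > foo` writes
    blob_id = git_blob_id(foo)
    print(blob_id)
    print(object_path(blob_id))
```

If the header assumption holds, the printed name can be compared against the object that showed up in `.git/objects` after `git add foo`. Note that the filename never enters the hash, which is why renaming `foo` to "🌈" wouldn't move the blob.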
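Another footnote to the git post: the Merkle DAG claim (change one bit in a file and the tree and commit names change too) is easy to try in a few lines. This is a rough sketch, assuming git's object serialization as I understand it: a `<type> <size>\0` header, tree entries of `<mode> <name>\0` plus 20 raw hash bytes, and a plain-text commit body. The author identity and the helper names (`object_id`, `tree_body`, `commit_body`, `snapshot`) are hypothetical. It handles only flat directories of regular files.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """SHA1 over git's "<kind> <size>\\0" header plus the object body."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: dict[str, str]) -> bytes:
    """Serialize {filename: blob id} as tree entries for regular files."""
    body = b""
    for name in sorted(entries):
        # Each entry is "<mode> <name>\0" followed by the 20 raw hash bytes.
        body += b"100644 " + name.encode() + b"\x00" + bytes.fromhex(entries[name])
    return body


def commit_body(tree_id: str, message: str) -> bytes:
    """Serialize a parentless commit with a fixed, made-up author line."""
    who = "A U Thor <author@example.com> 0 +0000"  # hypothetical identity
    return f"tree {tree_id}\nauthor {who}\ncommitter {who}\n\n{message}\n".encode()


def snapshot(files: dict[str, bytes], message: str) -> tuple[str, str]:
    """Hash blobs, then the tree naming them, then the commit naming the tree."""
    blobs = {name: object_id("blob", data) for name, data in files.items()}
    tree = object_id("tree", tree_body(blobs))
    commit = object_id("commit", commit_body(tree, message))
    return tree, commit


if __name__ == "__main__":
    # The empty tree has a well-known name, which checks the header format.
    assert object_id("tree", b"") == "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

    before = snapshot({"foo": b"Hi, I'm blob\n"}, "Initial Commit")
    after = snapshot({"foo": b"Hi, I'm blob!\n"}, "Initial Commit")
    print(before)
    print(after)  # both the tree id and the commit id differ from `before`
```

The commit ids here won't match the ones in the post, because the author line and timestamp are different. The point is only that each object's name is derived from the names of the objects below it.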
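And one last footnote to the git post, on "git derives diffs from what's stored in the object database." Below is a toy Python sketch of that idea: two directory snapshots, a lookup table of blobs, and a diff computed on demand with `difflib`. It is not how git implements `git diff`. The dictionaries stand in for the object database, the ids are the abbreviated ones from the post, and I'm assuming `9542599` is the second commit's tree, since the other two new objects are the `baz` blob and the commit.

```python
import difflib

# A toy object database: object names mapped to contents. Real git stores
# zlib-compressed objects on disk and looks them up by their full SHA1.
BLOBS = {
    "2645aab": "Hi, I'm blob\n",
    "9b2716e": "I'm ALSO blob\n",
}
TREES = {
    "4181320": {"foo": "2645aab"},
    "9542599": {"foo": "2645aab", "baz": "9b2716e"},  # assumed to be the new tree
}


def diff_trees(old_tree: str, new_tree: str) -> str:
    """Build a unified diff by comparing two directory snapshots."""
    old, new = TREES[old_tree], TREES[new_tree]
    chunks = []
    for name in sorted(old.keys() | new.keys()):
        if old.get(name) == new.get(name):
            continue  # same blob id means same content, so nothing to diff
        before = BLOBS[old[name]].splitlines(keepends=True) if name in old else []
        after = BLOBS[new[name]].splitlines(keepends=True) if name in new else []
        chunks.extend(difflib.unified_diff(
            before,
            after,
            fromfile=f"a/{name}" if name in old else "/dev/null",
            tofile=f"b/{name}" if name in new else "/dev/null",
        ))
    return "".join(chunks)


if __name__ == "__main__":
    print(diff_trees("4181320", "9542599"), end="")
```

Running it prints a hunk that adds `I'm ALSO blob` to `b/baz` from `/dev/null`, and nothing for `foo`, because both trees point at the same blob. No diff was stored anywhere; it was computed from the two snapshots.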