05 - Tyler Cipriani

Explaining git diff to myself

But try to understand
Try to understand
Try try try to understand
Git’s a magic command.

– Heart 💕

Once upon a time, I believed git was storing diffs somewhere. But then I learned I was wrong.

It’s challenging to wield git’s clunky interface when you have a broken mental model of its internals. Learning more about what’s happening inside git transformed me into a more effective git user.

In this post, I’ll attempt to explain all the deep details of git diff to my past self.

📍 Git add makes blobs

We can add files to repos using git add. But behind the porcelain, git’s busy compressing and storing this file deep in its bowels. Git terms the results of this process a “blob.”

Git stores blobs (among other things) inside the .git/objects directory.

$ git init
Initialized empty Git repository in /tmp/bar/.git/
$ echo "Hi, I'm blob" > foo
$ git add foo
$ tree .git/objects/
.git/objects/
└── 26
  └── 45aab142ef6b135a700d037e75cd9f1f1c94dc

But what’s in a blob? And why is this blob stored as ./26/45aab142ef6b135a700d037e75cd9f1f1c94dc?

🗃️ Git stores things by their hash

Why did git add foo store the contents of foo as 2645aab142ef6b135a700d037e75cd9f1f1c94dc?

Git mapped our file to a number via a hash function.

A hash function maps data to a unique number (mostly)—whenever the data changes, the hash function’s output changes dramatically.

SHA1 is the hash function git uses by default. And when we git add foo, git applies SHA1 to the contents of foo (Hi, I'm blob\n), prefixed with a short header that records the object's type and size, and that spits out 2645aab142ef6b135a700d037e75cd9f1f1c94dc.

Blobs are all about content. The filename “foo” doesn’t matter at all! We could have named the file “🌈”—git still would have stored it in the same place. If the file contents are EXACTLY the same, then the hash will be exactly the same.
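Here's a rough sketch of that hashing step in Python. It assumes a SHA1-based repo and that git hashes a small header (the word blob, the size in bytes, and a NUL byte) followed by the file's bytes. The helper name git_blob_hash is made up for this post; it isn't part of git.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Hash bytes the way a SHA1-based git repo hashes a blob (sketch)."""
    # Git prepends a header: object type, size in bytes, then a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The filename never enters the calculation, only the bytes do.
foo = git_blob_hash(b"Hi, I'm blob\n")
rainbow = git_blob_hash(b"Hi, I'm blob\n")  # same bytes, pretend the file is named 🌈
print(foo)
print(foo == rainbow)
print(git_blob_hash(b"Hi, I'm blob!\n"))  # one extra byte, a very different hash
```

If the assumptions hold, the first line printed should match the name git gave our blob; running git hash-object foo is the real way to check.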

🌱 Git commit creates commits and trees

You already know git commit creates a commit, but what is a commit?

A commit is a type of object. Git uses the word “object” to mean: a commit, a folder or directory (tree), a file (blob), or a tag. Git stores objects in its object database—everything inside the .git/objects directory.

$ git commit -m 'Initial Commit'
[main (root-commit) 0644991] Initial Commit
1 file changed, 1 insertion(+)
create mode 100644 foo
$ tree .git/objects/
.git/objects/
├── 06
│   └── 449913ac0e43b73bfbd3141f5643a4db6d47f8
├── 26
│   └── 45aab142ef6b135a700d037e75cd9f1f1c94dc
└── 41
  └── 81320a57137264d436b2ef861c31f430256bf4

After our commit, the object database has three objects: 06449913, 2645aab1, and 4181320a.
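To make the layout of .git/objects less mysterious, here's a minimal sketch of how a loose object might be written and read back. It assumes objects are zlib-compressed and that the first two hex characters of the hash become the directory name. The function names and the temporary objects directory are stand-ins invented for this example, and real git has more going on (packfiles, for one).

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir: Path, obj_type: str, body: bytes) -> str:
    """Store an object roughly the way git stores a loose object."""
    header = obj_type.encode("ascii") + b" " + str(len(body)).encode("ascii") + b"\0"
    data = header + body
    digest = hashlib.sha1(data).hexdigest()
    # First two hex characters name the directory, the other 38 name the file.
    path = objects_dir / digest[:2] / digest[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(data))
    return digest


def read_loose_object(objects_dir: Path, digest: str) -> tuple[str, bytes]:
    """Decompress a loose object and split its header from its body."""
    raw = zlib.decompress((objects_dir / digest[:2] / digest[2:]).read_bytes())
    header, body = raw.split(b"\0", 1)
    obj_type, size = header.decode("ascii").split(" ")
    assert int(size) == len(body)  # the header records the body length
    return obj_type, body


with tempfile.TemporaryDirectory() as tmp:
    objects = Path(tmp) / "objects"  # stand-in for .git/objects
    digest = write_loose_object(objects, "blob", b"Hi, I'm blob\n")
    stored = sorted(p.relative_to(objects).as_posix() for p in objects.rglob("*") if p.is_file())
    loaded = read_loose_object(objects, digest)
    print(stored)
    print(loaded)
```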

So now we’ve established that one of these three objects is our blob (2645aab1)—let’s see if we can suss out the others.

✨ The magic command

The magic command to learn about any object is git cat-file -p. We can use that command to find out more about our mystery objects:

$ git cat-file -p 06449913ac0e43b73bfbd3141f5643a4db6d47f8
tree 4181320a57137264d436b2ef861c31f430256bf4
author Tyler Cipriani <tcipriani@wikimedia.org> 1652310544 -0600
committer Tyler Cipriani <tcipriani@wikimedia.org> 1652310544 -0600

Initial Commit

This object (06449913) appears to be our commit. A commit is metadata compressed and stored inside git’s object database.

Some of the metadata is obvious, but then there’s a tree. And that tree points to our other mystery object, 4181320a. Let’s see what we can learn about our last remaining mystery object using our magic command:

$ git cat-file -p 4181320a57137264d436b2ef861c31f430256bf4
100644 blob 2645aab142ef6b135a700d037e75cd9f1f1c94dc    foo

So a tree is an object that stores a directory listing of objects by their SHA1s. And a commit is an object that points at a tree by recording the tree’s SHA1!

Commits point to trees, and trees point to blobs and other trees. Neat!
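The pointing described above can be sketched in Python by serializing a tree and a commit by hand. This assumes a SHA1 repo, a tree that contains only regular files, and a commit with no signature or extra headers. The helper names are invented for illustration.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    """SHA1 over '<type> <size>' + NUL + body, as in a SHA1-based git repo."""
    header = obj_type.encode("ascii") + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: dict[str, tuple[str, str]]) -> bytes:
    """Serialize {name: (mode, hex_hash)}; files only, sorted by name."""
    out = b""
    for name in sorted(entries, key=lambda n: n.encode("utf-8")):
        mode, hex_hash = entries[name]
        # Each entry: mode, space, name, NUL, then the 20 raw hash bytes.
        out += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0" + bytes.fromhex(hex_hash)
    return out


def commit_body(tree: str, author: str, message: str, parents: tuple[str, ...] = ()) -> bytes:
    """Serialize commit metadata in the text form git cat-file -p shows."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {author}", "", message]
    return ("\n".join(lines) + "\n").encode("utf-8")


who = "Tyler Cipriani <tcipriani@wikimedia.org> 1652310544 -0600"
blob = hash_object("blob", b"Hi, I'm blob\n")
tree = hash_object("tree", tree_body({"foo": ("100644", blob)}))  # tree records the blob's hash
commit = hash_object("commit", commit_body(tree, who, "Initial Commit"))  # commit records the tree's hash
print(blob, tree, commit, sep="\n")
```

The point to notice is the nesting: the blob's hash is baked into the tree's bytes, and the tree's hash is baked into the commit's bytes.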

📈 Git’s dependency graph

So if we graphed the state of dependencies in our object database, we’d get something like this:

[Figure: Simple git repo’s object dependency graph]

The commit incorporates our tree, which includes our blob—everything depends on our blob!

So if we change even a single bit inside a single file, git will notice. Everything is entirely traceable from the commit down to the bit level. We get this for free by hashing objects and including those hashes in other objects.

This is the whole concept of a Merkle Directed Acyclic Graph (Merkle DAG)!
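A toy version of that idea, for the skeptical: the node format below is made up and is not git's real serialization. It only assumes that each node's id is a hash covering its own payload plus the ids of its children, which is the Merkle part.

```python
import hashlib


def node_id(kind: str, payload: bytes, children: tuple[str, ...] = ()) -> str:
    """Toy Merkle node: the id covers the payload AND the ids of its children."""
    h = hashlib.sha1()
    h.update(kind.encode("ascii") + b"\0" + payload)
    for child in children:
        h.update(b"\0" + child.encode("ascii"))
    return h.hexdigest()


def snapshot(file_bytes: bytes) -> dict[str, str]:
    """Build blob -> tree -> commit ids for a one-file repo."""
    blob = node_id("blob", file_bytes)
    tree = node_id("tree", b"foo", (blob,))
    commit = node_id("commit", b"Initial Commit", (tree,))
    return {"blob": blob, "tree": tree, "commit": commit}


original = bytearray(b"Hi, I'm blob\n")
flipped = bytearray(original)
flipped[0] ^= 0b00000001  # flip a single bit in the first byte

before = snapshot(bytes(original))
after = snapshot(bytes(flipped))
for kind in ("blob", "tree", "commit"):
    print(kind, before[kind][:8], "->", after[kind][:8])
```

One flipped bit changes the blob's id, which changes the tree's id, which changes the commit's id.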

🍔 So, where’s the diff?

When we type git diff, git presents us a diff. We know there are blobs and trees and commits—so where’s the diff!?

Git doesn’t store diffs anywhere at all! It derives diffs from what’s stored in the object database.

$ echo "I'm ALSO blob" > baz
$ git add baz
$ git commit -m 'Add baz'
$ tree .git/objects/
.git/objects/
├── 06
│   └── 449913ac0e43b73bfbd3141f5643a4db6d47f8
├── 26
│   └── 45aab142ef6b135a700d037e75cd9f1f1c94dc
├── 41
│   └── 81320a57137264d436b2ef861c31f430256bf4
├── 95
│   └── 42599fac463c434456c0a16b13e346787f25da
├── 9b
│   └── 2716e4540c11e8d590e906dd8fa5a75904810a
└── e6
   └── 5a7344c46cebe61d052de6e30d33636e1cd0b4

We made a new commit, and now we have three new objects. We added a new file (blob), which made our directory different (tree), and we committed it (commit).

Our graph now looks like this:

[Figure: Simple git repo’s updated object dependency graph]

You might be surprised by a few things in the graph:

Our new commit stores its parent commit as metadata
Our new tree points to our old blob, and our NEW blob
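Both surprises fall out of content addressing, and a toy object store shows why. This sketch keeps objects in a Python dict instead of .git/objects, leaves author lines out of the commit, and handles files only. All the names here are invented for the example.

```python
import hashlib
from typing import Optional

store: dict[str, bytes] = {}  # stand-in for .git/objects, keyed by hash


def put(obj_type: str, body: bytes) -> str:
    """Add an object to the toy store; identical content lands on the same key."""
    data = obj_type.encode("ascii") + b" " + str(len(body)).encode("ascii") + b"\0" + body
    key = hashlib.sha1(data).hexdigest()
    store[key] = data
    return key


def put_tree(files: dict[str, bytes]) -> str:
    """Store one blob per file, then a tree that lists them by hash."""
    body = b""
    for name in sorted(files):
        blob = put("blob", files[name])
        body += b"100644 " + name.encode("utf-8") + b"\0" + bytes.fromhex(blob)
    return put("tree", body)


def put_commit(tree: str, message: str, parent: Optional[str] = None) -> str:
    """Store a commit that records its tree and, if present, its parent."""
    text = f"tree {tree}\n" + (f"parent {parent}\n" if parent else "") + f"\n{message}\n"
    return put("commit", text.encode("utf-8"))


first = put_commit(put_tree({"foo": b"Hi, I'm blob\n"}), "Initial Commit")
count_after_first = len(store)
second = put_commit(
    put_tree({"foo": b"Hi, I'm blob\n", "baz": b"I'm ALSO blob\n"}),
    "Add baz",
    parent=first,
)
print(count_after_first, len(store))
```

This prints 3 6: the second commit adds a blob, a tree, and a commit, while the unchanged foo blob hashes to the key it already had and is simply reused.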

So what happens now when we try git diff?

$ git diff 064499..e65a73
diff --git a/baz b/baz
new file mode 100644
index 0000000..9b2716e
--- /dev/null
+++ b/baz
@@ -0,0 +1 @@
+I'm ALSO blob

Git compares the two commits, finds their trees, sees a new blob in the second commit, and shows you the diff of /dev/null and baz.
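The same derive-it-on-demand idea can be sketched with Python's difflib. The snapshots below are plain dicts standing in for what each commit's tree resolves to. Git's real diff machinery is far more capable; this only shows that two snapshots are enough to compute a diff.

```python
import difflib

# Two snapshots: what each commit's tree resolves to, {path: file contents}.
old_snapshot = {"foo": "Hi, I'm blob\n"}
new_snapshot = {"foo": "Hi, I'm blob\n", "baz": "I'm ALSO blob\n"}


def derive_diff(old: dict[str, str], new: dict[str, str]) -> str:
    """Compute a unified diff on demand from two snapshots; nothing is stored."""
    chunks = []
    for path in sorted(old.keys() | new.keys()):
        before, after = old.get(path), new.get(path)
        if before == after:
            continue  # same content would mean the same blob hash, so skip it
        chunks += difflib.unified_diff(
            (before or "").splitlines(keepends=True),
            (after or "").splitlines(keepends=True),
            fromfile=f"a/{path}" if before is not None else "/dev/null",
            tofile=f"b/{path}" if after is not None else "/dev/null",
        )
    return "".join(chunks)


print(derive_diff(old_snapshot, new_snapshot), end="")
```

Only baz shows up in the output; foo is identical in both snapshots, so it's never compared line by line.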

No diffs. Just Merkle DAGs. And now you know.

Thanks to Joe Swanson for providing excellent early feedback on this post. And thanks to Kostah Harlan for reading an early draft of this post and making it less terrible. <3

On the Proper use of Post-Its®

Instead of lifting each sheet from the bottom and pulling it off of the pad in a vertical direction, lift each sheet from the side.

– post-it.com, Tips for creating Post-it® Super Sticky Note pixel art

💡 tl;dr: Always peel post-its from the side, not the bottom.

🌠 The More You Know

This is a public service announcement: you’re peeling your sticky notes all wrong.

But you’re not alone; I only learned how to properly peel a Post-It at an in-person offsite in 2015.

As our team worked, we noticed some of our Post-Its were curlier than others. And the curlier Post-Its tended to fall off the wall.

Our facilitator diagnosed our despair: “If you peel post-its from the side, they don’t curl up.”

🧐 Why this works

3M scientists Spencer Silver and Art Fry gifted the Post-It to humanity.

The Post-It’s magic is its microsphere adhesive, discovered in 1968 by Silver; it affords Post-Its the singular ability to stick and unstick from paper without damaging it.

Post-It® at 400× magnification, The microsphere adhesive, top: “beautiful, bright, clear, crystalline spheres — like little glass balls,” – Spencer Silver

A Post-It’s easy unstickability is its biggest feature and its downfall.

When you yank a note from the bottom, it curls. That curling is just enough force to lift the note off the paper, the whiteboard, or whatever it’s stuck to.

So the next time you’re slingin’ stickies, just remember the one rule you must abide: always peel a Post-It® from the side.

May 2022
14 May: Explaining git diff to myself
28 May: On the Proper use of Post-Its®