Exploiting a long-standing git bug for my own amusement.
And I think there is one known race: the index mtime itself is not race-free.
- Linus Torvalds, Re: git bugs, 2008
A well-known race condition skulks through git's plumbing.
And I can demo it via a git magic trick 🪄[1]:
$ tree -L 1 -a .
.
├── file
└── .git
$ cat file
okbye
$ git status
On branch main
Changes to be committed:
new file: file
$ git ls-files --modified # No output. A clean working directory.
$ git commit --message="The file sez $(cat file)"
$ git log --oneline HEAD -1
600fcac (HEAD -> main) The file sez okbye
$ git status
On branch main
nothing to commit, working tree clean
Nothing up my sleeves:
- The git repo had one staged file, "file", containing "okbye" and nothing else
- I committed it
- And the commit message is "The file sez okbye"
Now, the big reveal:
$ cat file
okbye
$ git show HEAD:file
hello
$ git status
nothing to commit, working tree clean
Boom. ✨Magic✨
Git is clueless that the wrong file is in your work tree.
Even git restore has no effect:
$ git restore file
$ cat file
okbye
Git maintainers know this sleight of hand well; they dubbed it the "Racy Git Problem" circa 2006.
But it can still generate heisenbugs in unexpected places.
What is the racy git problem?
The two biggest problems in computer science are:
- cache invalidation,
- naming things, and
- off-by-one
Racy git is a cache invalidation problem.
Git speeds operations by stowing two bits of data about each file in your work tree:
- The file size
- The last time the file was modified: its mtime
So, if you tweak a file without changing its mtime or size, how would git know?
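To make that question concrete, here is a toy model of the size-plus-mtime shortcut. It's a sketch, not git's code: the fingerprint helper is hypothetical, it assumes GNU coreutils, and it only looks at whole seconds (git also records ctime, inode, and a few other fields). Pinning both writes to the same timestamp with touch stands in for "both writes landed in the same second".

```shell
#!/usr/bin/env bash
# Toy model of git's stat-cache shortcut: a file "looks unchanged" when its
# size and whole-second mtime match what was recorded earlier.
# fingerprint is a hypothetical helper; needs GNU coreutils (stat -c, touch --date).
set -eu

fingerprint() {
  # %s = size in bytes, %Y = mtime in whole seconds since the epoch
  stat -c '%s %Y' -- "$1"
}

demo=$(mktemp -d)
cd "$demo"

echo hello > file
touch --date='2020-01-01 00:00:00' file   # pin the mtime
before=$(fingerprint file)

echo okbye > file                         # same length: 6 bytes
touch --date='2020-01-01 00:00:00' file   # pretend it landed in the same second
after=$(fingerprint file)

if [ "$before" = "$after" ]; then
  echo "looks unchanged, but contains: $(cat file)"
fi
# prints: looks unchanged, but contains: okbye
```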
Before 2006, git was oblivious.
Now it's wised up. If a file looks unchanged (same size and mtime), git performs another check: if the file's mtime >= the mtime of the index file (.git/index), git won't trust its cached data and compares the file's actual contents against the index.
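A rough model of that extra check, again assuming GNU coreutils and whole-second timestamps (some git builds also compare sub-second times; the is_racy name is made up). It also shows why the trick's touch --date on .git/index works: pushing the index's mtime past the file's makes the file look safely old, so the content comparison never happens.

```shell
#!/usr/bin/env bash
# Rough model of git's "racily clean" check: when a file's mtime is the same
# as or newer than the index file's mtime, its cached size/mtime can't be
# trusted and the contents must be compared instead.
# is_racy is a hypothetical helper; needs GNU coreutils (stat -c, touch --date).
set -eu

is_racy() {
  local file_mtime index_mtime
  file_mtime=$(stat -c %Y -- "$1")
  index_mtime=$(stat -c %Y -- "$2")
  [ "$file_mtime" -ge "$index_mtime" ]
}

demo=$(mktemp -d)
cd "$demo"
mkdir .git                                     # stand-in only, not a real repo

echo hello > file
touch --date='2020-01-01 00:00:00' file
touch --date='2020-01-01 00:00:00' .git/index  # index "written in the same second"
if is_racy file .git/index; then
  echo "same second: compare contents"
fi

# The magic trick: push the index's mtime one second into the future.
touch --date='2020-01-01 00:00:01' .git/index
if ! is_racy file .git/index; then
  echo "index is newer: trust size and mtime"
fi
```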
Thus, the core of my lame magic trick:
#!/usr/bin/env bash
echo hello > file
# Stage the file for commit
git update-index --add file
# Send .git/index's mtime INTO THE FUTURE!!!1!
touch --date='1 second' .git/index
# Now modify "file" behind git's back. So. Sneaky.
echo okbye > file
# CAVEAT OF DOOoooM: this has to happen within a single second to work---yeah. git's good. :D
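git status leans on that cached data, but git hash-object rereads and hashes the file's bytes, and git rev-parse :file prints the blob id the index recorded. Comparing the two is a minimal way to catch an edit made behind git's back, and deleting the file before git restore forces git to write it out again. This is a sketch under those assumptions; really_modified and force_restore are hypothetical helper names, and git restore needs git 2.23 or newer.

```shell
#!/usr/bin/env bash
# Catch, and undo, a working-tree edit that git's stat cache missed.
# really_modified and force_restore are hypothetical helper names.
set -eu

really_modified() {
  # Blob id of the file as it is on disk right now. hash-object rereads the
  # bytes (by default it also applies the path's input filters), so the stat
  # cache is never consulted.
  local disk_id index_id
  disk_id=$(git hash-object -- "$1")
  # Blob id the index recorded for this path (stage 0).
  index_id=$(git rev-parse ":$1")
  [ "$disk_id" != "$index_id" ]
}

force_restore() {
  # A missing file can't "look unchanged", so git has to write it out again.
  rm -- "$1"
  git restore -- "$1"
}

# Replay the trick in a scratch repository.
demo=$(mktemp -d)
cd "$demo"
git init --quiet .
echo hello > file
git update-index --add file
touch --date='1 second' .git/index
echo okbye > file

git ls-files --modified      # may print nothing if the trick worked
if really_modified file; then
  echo "file differs from the index; restoring it"
  force_restore file
fi
cat file                     # hello
```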
How this happens in the real world
This problem can still happen in the real world.
I stumbled onto this while working with git fat (a forerunner of GitHub's "Git Large File Storage", git-lfs) in 2017.
At that time, we had a tool that pulled git code onto hundreds of servers. And every so often, a server would fail to fetch large files.
The basic steps were:
- ssh into 100s of servers in parallel
- git clone <repo>
- git fat pull

git fat pull rsync'd large binary files, mostly jars, into the working directory.
Frustratingly, when this failed, I could jump on the server and rerun git fat pull, which worked every time.
This was the racy git problem in disguise.
See, git fat uses git filters (just like git-lfs). Filters are scripts that git runs automagically at checkout time, or when you stage files for a commit.
But git is smart: it only triggers filters when it has to, that is, when your working directory differs from its index.
So, when we ran git clone within the same second as git fat pull, the files on disk were the same as in the index, so git never triggered the filter.
Why did it work when I ran it manually? Because git fat was smart, too: it touch'd files on disk, which invalidated the index, which caused git to run the filter command.
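A toy filter makes this concrete. The sketch below assumes GNU coreutils and a reasonably recent git; the filter name shout and its case-flipping commands are invented for illustration, while the wiring (filter.<name>.clean, filter.<name>.smudge, and a filter=<name> line in .gitattributes) is how git filter drivers are configured. It shows smudge running when a file is missing, being skipped for a file whose cached stat data still matches, and running again once touch moves the mtime, which is roughly the invalidation git fat leaned on.

```shell
#!/usr/bin/env bash
# Toy clean/smudge filter showing when git bothers to run a filter.
# The "shout" filter name and its case-flipping commands are made up;
# git fat and git-lfs register their own filter programs.
set -eu

log=$(mktemp)                 # every smudge run appends a line here
demo=$(mktemp -d)
cd "$demo"
git init --quiet .

# clean: work tree -> repository (runs when a file is staged)
git config filter.shout.clean "tr '[:upper:]' '[:lower:]'"
# smudge: repository -> work tree (runs at checkout), logging each run
git config filter.shout.smudge "echo smudge >> '$log'; tr '[:lower:]' '[:upper:]'"
echo '*.txt filter=shout' > .gitattributes

echo hello > greeting.txt
git add .gitattributes greeting.txt
git -c user.name=demo -c user.email=demo@example.com commit --quiet -m demo
git cat-file -p HEAD:greeting.txt     # hello (the stored, cleaned blob)

# 1. File missing: git must write it, so the smudge filter runs.
rm greeting.txt
git checkout -- greeting.txt
cat greeting.txt                      # HELLO

# 2. File still matches the index's stat data: git skips it, no smudge.
git checkout -- greeting.txt

# 3. Move the mtime: git no longer trusts the file and smudges it again.
touch --date='2 seconds' greeting.txt
git checkout -- greeting.txt

grep -c smudge "$log"                 # number of smudge runs
```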
And so the racy git problem persists. It plagues every system that relies on git filters[2] to. this. day.
[1] This will only seem like magic if you can distinguish normal git operations from magic. For nerds only.
[2] This means almost all large file storage mechanisms in git. git annex's "indirect mode" eschewed git's smudge and clean filters. But I see both "direct" and "indirect" mode are now deprecated. I'm unclear how this works today ¯\_(ツ)_/¯.