To see posts by date, check out the archives

Git Annex Metadata Filtered Views
Tyler Cipriani Posted

Git annex goes mindbogglingly deep.

I use git annex to manage my photos – I have tons of RAW photos in ~/Pictures that are backed-up to a few git annex special remotes.

Whenever I'm done editing a set of raw files I go through this song and dance:

And magically my raw files are available on s3 and my home NAS.

If I jump over to a different machine I can run:

Now that raw file is available on my new machine, I can open it in Darktable, I can do whatever I want to it: it's just a file.

This is a pretty powerful extension of git.

While I was reading the git annex internals page today I stumbled across an even more powerful feature: metadata. You can store and retrieve arbitrary metadata about any git annex file. For instance, if I wanted to store EXIF info for a particular file, I could do:

And I can drop that file and still retrieve the EXIF data

This is pretty neat, but it can also be achieved with git notes so it's nothing too spectacular.

But git annex metadata doesn't quite stop there.

Random background

My picture directory is laid out like this:

I have directories for each year, under those directories I create directories that are prefixed with the ISO 8601 import date for the photo, some memorable project name (like mom-birthday or rmnp-hike). Inside that directory I have 2 directories: raw and edit. Inside each one of those directories, I have photos that are named with ISO 8601 import date, project name, and 5-digit import number and a file extension – raw files go in raw and edited/finished files edit.

I got this system from Riley Brandt (I can't recommend the Open Source Photography Course enough – it's amazing!) and it's served me well. I can find stuff! But git annex really expands the possibilities of this system.

Fictional, real-world, totally real actually happening scenario

I go to Rocky Mountain National Park (RMNP) multiple times per year. I've taken a lot of photos there. If I take a trip there in October I will generally import those photos in October and create 2015/2015-10-05_RMNP-hike/{raw,edit}, and then if I go there again next March I'd create 2016/2016-03-21_RMNP-daytrip-with-blazey/{raw,edit}. So if I want to preview my RMNP edited photos from October I'd go:

But what happens if I want to see all the photos I've ever taken in RMNP? I could probably cook up some script to do this. Off the top of my head I could do something like find . -iname '*rmnp*' -type l, but that would undoubtedly miss some files from a project in RMNP that I didn't name with the string rmnp. Git annex gives me a different option: metadata tags.

Metadata Tags

Git annex supports a special type of short metadata – --tag. With --tag, you can tag individual files in your repo.

The WMF reading team offsite in 2016 was partially in RMNP, but I didn't name any photos RMNP because that wasn't the most memorable bit of information about those photos (reading-team-offsite seemed like a better project name) nor did RMNP represent all the photos from the offsite. I should tag a few of those photos rmnp with git annex:

Also, when my old roommate came to town we went to RMNP, but I tagged those photos cody-family-adventure-time. So let's tag a few of those rmnp, too:

Metadata views

Now the thing that was really surprising to me, you can filter the whole pictures directory based on a particular tag with git annex by using a metadata driven view.

I can even filter this view using other tags with git annex vfilter tag=whatever. And I can continue to edit, refine, and work with the photo files from there.

This feature absolutely blew my mind – I dropped what I was doing to write this – I'm trying to think of a good way to work it into my photo workflow :)

License: Creative Commons Attribution-ShareAlike License