09 - Tyler Cipriani

My significant other and I bought our very first home in July. We just finished moving. I learned a lot during this process. I’m writing it down while it’s still fresh in my mind.

Why not rent?

I’ve always rented and thought I would continue to rent forever. When I was still in high school I read Rich Dad, Poor Dad, which repeatedly makes the point that your house is a liability. I have been, and still am, convinced by the arguments in that book.

While I was renting I liked that I wasn’t responsible for the furnace or the fence or the stove – those were all someone else’s problem.

Except our fence was falling down; our stove was crappy and there was nothing we could do about it; and our furnace hadn’t even had a filter change in…a while.

We had the freedom to move, but not freedom to create our own space as we wanted. In the end the desire for our own space won out.

Buying a house in 2020

Sometime in 2018 we decided we wanted to buy instead of rent. At the time people were predicting an upcoming recession and I felt like the Fed lowering the interest rate could only artificially inflate the economy for so long.

Summer 2020 seemed like a good target. Housing prices were pretty flat in 2019. By summer 2020, I predicted, there would be some slack in the local housing market and low interest rates for mortgages. In some ways, I guess, I was right, but I really could not have predicted much of anything about 2020 (I mean, murder hornets? What even?).

Our process

Back in January, Blazey and I sat down with a big stack of post-it notes and sharpies to try to determine what we wanted in a house. We each wrote down everything we could think of wanting in a house; one item per post-it. Once we wrote down everything, we took a step back and grouped all our post-its into “MUST”, “SHOULD”, and “nice to have”.

We ended up with a list of 25 things with 5 “MUST”s.

Must

Not in flood zone (this came from witnessing the 2013 flood)
Can cook macaroni on day one
3+ Bedroom / 2+ Bath (one office each, one bathroom each)
Private outdoor space
Walkability

The “macaroni” bit was one I came up with. The idea behind day-one macaroni is that, while I’m looking forward to being able to work on our shared space, I shouldn’t need to work on our shared space to move in.

We had increasingly esoteric needs farther down the list. Things like, “Pre-World War II era/Craftsman/Bungalow” with “Access to NextLight™ municipal gigabit fiber internet service”.

As unlikely as it seems to find a 1910 craftsman house with municipal gigabit internet, we got it. In the end, we got 20 out of our 25 things. We sacrificed indoor space for location. Given that the pandemic has made location largely irrelevant, I’m hoping that was the right choice 😅

Moving

We closed on the 3rd of July and informed our landlords that August would be our last month – 2 months to move.

During that time we:

Sanded and painted the cabinetry in the kitchen and installed new pulls
Replaced the base of a few cabinets
Painted the dining room and living room
Removed the wallpaper in the library (yeah, there’s a library)
Skim coated a couple of walls that we have every intention of wallpapering Soon™

This left us with about 6 days to move from our ~3,000 square foot rental house to our new 1,300 square foot home. By dint of several truly heroic days of hauling and cleaning we managed to haphazardly move all of our crap from one place to the other and dispose of a good amount of the true crap in the process (How does one get rid of old computers? 🤔 I still have my laptop from college it seems…).

Things I learned during this move:

If you buy a new thing, get rid of the thing you are replacing. This is particularly easy to forget when you’re caught up in an ever expanding hobby. Selling the old thing should be part of your consideration about getting the new shiny thing.
Just because the box is fancy, doesn’t mean you have to keep the box. I had a whole cabinet full of boxes for fancy electronics: cameras, lenses, drones, laptops, networking equipment, weather station. What is wrong with me‽
Deadlines are good. Renting a u-haul is a good artificial deadline you can give yourself. I woke up the morning I rented the u-haul ready to haul things and spent 8 hours doing that. It was the most productive day of the whole move for me.
Just buy moving boxes. We tried to save our amazon boxen, but it wasn’t enough. I ended up buying a ton of boxes at Lowes. I should have bought those from the get go.
Even though the move is short, treat it like a long move. As a friend of Blazey’s said recently: you don’t forget your toothbrush if you’re moving cross-country. We were moving 5 blocks from our old place. The short relocation distance resulted in countless exchanges along the lines of, “Where’s the tape measure?” “Oh, I left it at the old place, I’ll be right back” – it turns out, those add up quickly.
Prioritize pre-move tasks. We had a long list of things we wanted to do before moving and I’m glad we did them before we moved in (we probably never would have painted the kitchen cabinets otherwise), but we grossly underestimated the amount of work needed. The end result was that we wasted a lot of time that we couldn’t make up later.

Final thoughts

Summer 2020 is either a genius time to buy a house or a really fucking stupid time to buy a house.

We took a lot of precautions. Managing the process thoughtfully during a pandemic was a key interview question for all of our buyer’s agents. Still.

The whole process was strange and fraught with problems I never thought I’d have. This also describes 2020 for me.

On the plus side, we locked in a very low interest rate – we’re paying 3% on a 30 year fixed mortgage. On the negative side, I have no idea if what I love about this neighborhood will survive the pandemic.

The neighborhood I want to live in is a neighborhood with a diversity of use – there are offices and restaurants and houses and stores and bars all within walking distance. There are parks. There are eyes on the street at all times of day and night. The presence of people makes this neighborhood strong. If main street collapses then demand to live near main street will collapse, and so will the neighborhoods near main street. Quoting Jane Jacobs, “When a city heart stagnates or disintegrates, a city as a social neighborhood of the whole begins to suffer.” I worry about this.

Also I worry about having signed a document agreeing to pay a staggering amount of money with the date “2050” on it 🙈

Migrating git data with rsync

This is a cautionary tale about keeping git data in sync between two machines with rsync. There aren’t really a lot of pitfalls here, but we stumbled into one of them, and I’ve been meaning to write this up since.

tl;dr: to keep git repos in sync using rsync use the command:

rsync --archive --verbose --delete <dir1> <dir2>

Background

Almost a year ago we upgraded the hardware for our primary git host at work. We run our primary git server on bare metal in one of the Equinix data centers in Virginia and it was starting to show its age. Our git host was coming up on the end of its warranty, but – more importantly – we’d simply outgrown the hardware. We run Gerrit as our code review system and its hunger for heap led to more than one late night caused by java.lang.OutOfMemoryError. After spending more time than I probably should have tuning various GC parameters, I put in a request for new hardware.

The plan for the upgrade was pretty simple: Set up a new machine seeded with all of our git data and run it as a replica of the current machine until the switchover window. Prevent the new machine from writing to Gerrit’s database entirely. When the switchover window rolls around: take both machines offline, run one final rsync of the data, swap DNS records, allow database writes from the new machine, and bring the new machine online.

We finished up the migration at the end of my day and all seemed to go fine; we sent out the all clear and claimed victory. Over my night the European cohort began to see the first inklings of a problem: there were revisions and Gerrit comments missing on the new server! Patches that had been merged were showing up as unmerged! Day was night! Dogs and cats were best friends! Chaos reigned.

Data integrity problems are alarming, but they are especially acute when the data whose integrity is in doubt is the canonical source code to a gigantic open source project backing one of the most important free knowledge projects in existence. No pressure.

NoteDB and things to know

The first thing to know is that code reviews in Gerrit aren’t stored in a real database, but are stored instead in NoteDB – which is just a bunch of namespace conventions on top of git. In fact, as of today, the latest version of Gerrit stores nothing in the database and stores everything in git.

Everything being stored in git has some uhhh…I’ll say “interesting”…. side-effects. For example, users are stored in a git repo called All-Users.git and in our version of that repository there are >22,000 refs pointing to the blob ce7b81997cf51342dedaeccb071ce4ba3ed0cf52. Why tag a blob? What could be in that blob?

$ git show ce7b81997cf51342dedaeccb071ce4ba3ed0cf52
star

That’s right, there are 22,000 refs pointing to a single blob with the contents, star. Each ref is of the format refs/starred-changes/XX/YYYYXX/ZZZZ. This is how Gerrit stores starred changes.

I don’t know if that’s normal or sane: there are no rules out here in git-is-your-database-now land.

All of the above background about NoteDB is to say that any knowledge you might have about how reviews might disappear from a database doesn’t hold in Gerrit. All the lovely persistence guarantees of an RDBMS mean fuck all. This is a pop quiz about git knowledge.

How reviews are stored

OK, so Gerrit doesn’t use an RDBMS, so we’ll need to know how reviews are stored in order to understand how they might disappear.

Gerrit stores patchsets for review in refs, using the “changes” ref namespace for all changes. For example, the first revision of the first change for the repo “foo” would be stored in /srv/gerrit/git/foo.git under the ref refs/changes/01/0001/1. The next revision of the first change would be stored under refs/changes/01/0001/2. Any commentary about the first change is also stored in git, in a special ref in the same namespace: refs/changes/01/0001/meta.
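
To make the naming concrete, here’s a minimal shell sketch that builds ref names in the layout used in the example above. change_ref is a hypothetical helper, not part of Gerrit or git; the two-digit shard directory (taken from the last two digits of the change number) and the four-digit padding are assumptions read off the example paths.

```sh
# change_ref CHANGE PATCHSET: print the ref name for one patchset.
# Hypothetical helper; the padding widths mirror the example paths above.
change_ref() {
    change=$1
    patchset=$2
    shard=$((change % 100))   # last two digits of the change number
    printf 'refs/changes/%02d/%04d/%s\n' "$shard" "$change" "$patchset"
}

change_ref 1 1      # refs/changes/01/0001/1
change_ref 1 2      # refs/changes/01/0001/2
change_ref 1 meta   # refs/changes/01/0001/meta
```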

How refs are stored

Git refs are stored in the refs directory inside a repository’s git directory. A Gerrit change stored in loose refs on disk might look like:

refs/changes
└── 01
    └── 0001
        ├── 1
        └── meta

Each file there points to a commit (or a tree or a blob, but in practice it’s usually a commit).

Periodically (i.e., whenever git runs a garbage collection cycle) that directory is emptied out and the info is shoved into a packed-refs file.

But what happens when there are both? When there is a loose refs/heads/foo file and a packed-refs entry for refs/heads/foo? When you do git rev-parse which one “wins”? This is a common scenario. It happens whenever you update a ref that has already been packed:

$ git init
Initialized empty Git repository in /home/thcipriani/tmp/git-pack/.git/
$ echo "foo" > README
$ git add . && git commit -m 'Initial commit'
[main (root-commit) 8c1ba31] Initial commit
 1 file changed, 1 insertion(+)
 create mode 100644 README
$ git update-ref refs/changes/1 HEAD
$ cat .git/refs/changes/1
8c1ba312abe6b25948011d05e0ded8bc581b6bb0
$ echo 'bar' > README
$ git commit -a -m 'update'
[main 93791e4] update
 1 file changed, 1 insertion(+), 1 deletion(-)
$ git gc
Enumerating objects: 6, done.
Counting objects: 100% (6/6), done.
Delta compression using up to 4 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (6/6), done.
Total 6 (delta 0), reused 0 (delta 0), pack-reused 0
$ ls -lh .git/refs/changes/
total 0
$ git update-ref refs/changes/1 HEAD
$ cat .git/refs/changes/1
93791e4e3fbf39cd2d90d678eb2530ce03e5eaf4
$ cat .git/packed-refs
# pack-refs with: peeled fully-peeled sorted
8c1ba312abe6b25948011d05e0ded8bc581b6bb0 refs/changes/1
93791e4e3fbf39cd2d90d678eb2530ce03e5eaf4 refs/heads/main
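
In the transcript above the loose file has the newer value and packed-refs still has the old one. As far as I understand git’s on-disk ref storage, the loose file is the one that wins, and packed-refs is only consulted when no loose file exists. Here’s a rough sketch of that lookup order in plain shell; resolve_ref is a hypothetical stand-in for what git rev-parse does, not git’s actual code, and it ignores symbolic refs and peeled entries.

```sh
# resolve_ref GIT_DIR REFNAME: print the object id a ref points to,
# checking the loose file first and falling back to packed-refs.
# Hypothetical helper, not git's implementation.
resolve_ref() {
    git_dir=$1
    ref=$2
    if [ -f "$git_dir/$ref" ]; then
        cat "$git_dir/$ref"
    elif [ -f "$git_dir/packed-refs" ]; then
        # packed-refs lines look like "<object id> <ref name>"
        awk -v ref="$ref" '$2 == ref { print $1 }' "$git_dir/packed-refs"
    fi
}

# Rebuild the end state of the transcript above in a scratch directory.
demo=$(mktemp -d)
mkdir -p "$demo/refs/changes"
echo 93791e4e3fbf39cd2d90d678eb2530ce03e5eaf4 > "$demo/refs/changes/1"
cat > "$demo/packed-refs" <<'EOF'
# pack-refs with: peeled fully-peeled sorted
8c1ba312abe6b25948011d05e0ded8bc581b6bb0 refs/changes/1
93791e4e3fbf39cd2d90d678eb2530ce03e5eaf4 refs/heads/main
EOF

resolve_ref "$demo" refs/changes/1    # loose file wins: 93791e4e...
resolve_ref "$demo" refs/heads/main   # only in packed-refs: 93791e4e...
```

Delete the loose file and the same lookup falls back to the older 8c1ba312 entry in packed-refs.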

The punchline

OK, so what happened to our changes? Trying to be cautious we used the rsync command:

rsync --archive --verbose <dir1> <dir2>

We purposely omitted --delete because objects in git are deterministic: who cares if they were packed? Why risk deleting things? We knew we didn’t lose any objects in the transfer. The problem was we didn’t lose any of the unpacked refs either. When we seeded the git directories on the new server a month before the maintenance window, some of these repositories had loose refs. Over that month those refs were updated on the old server and subsequently packed into packed-refs, which removed the loose files there. The final rsync copied the new packed-refs to the new server but, without --delete, left the old loose files in place. Git reads a loose ref in preference to its entry in packed-refs, so the newer refs in packed-refs were shadowed by the older refs on disk, and the Gerrit interface appeared to be in an older state.
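
Here’s a small simulation of that sequence using nothing but scratch directories. It’s a sketch under loud assumptions: cp -Rp stands in for rsync --archive without --delete (both copy and overwrite, neither removes extra files on the destination), aaaa and bbbb are placeholder object ids, and the gc step is faked by hand.

```sh
old=$(mktemp -d)   # stands in for foo.git on the old server
new=$(mktemp -d)   # stands in for foo.git on the new server

# Seed: change 1 exists as a loose ref pointing at revision aaaa.
mkdir -p "$old/refs/changes"
echo aaaa > "$old/refs/changes/1"
cp -Rp "$old/." "$new/"           # first sync, a month before the window

# On the old server the ref moves to bbbb and a gc packs it: the loose
# file disappears and packed-refs holds the new value.
rm "$old/refs/changes/1"
echo 'bbbb refs/changes/1' > "$old/packed-refs"

# Final sync without deletion: packed-refs arrives, but the stale loose
# file on the new server is never removed.
cp -Rp "$old/." "$new/"

cat "$new/refs/changes/1"         # aaaa (stale, and the loose file wins)
cat "$new/packed-refs"            # bbbb refs/changes/1
```

With --delete, the final rsync would have removed refs/changes/1 from the new server because it no longer exists on the old one.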

The moral of the story here is to never omit --delete from rsync if you’re trying to keep repos in sync.
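
If you’ve already synced without --delete, one way to look for damage is to list loose ref files that exist on the destination but not on the source. A minimal sketch, assuming both git directories are reachable as local paths and the source is authoritative; stale_loose_refs is a hypothetical helper, and it only compares file names under refs/, not their contents.

```sh
# stale_loose_refs SRC DST: print loose refs that exist under DST/refs
# but have no matching file under SRC/refs. Hypothetical helper.
stale_loose_refs() {
    from=$1
    to=$2
    [ -d "$to/refs" ] || return 0
    (cd "$to" && find refs -type f) | sort | while IFS= read -r ref; do
        # A loose ref missing from the source was deleted or packed there.
        [ -e "$from/$ref" ] || printf '%s\n' "$ref"
    done
}

# Scratch directories standing in for the two copies of a repository.
src_git=$(mktemp -d)
dst_git=$(mktemp -d)
mkdir -p "$src_git/refs/heads" "$dst_git/refs/heads" "$dst_git/refs/changes/01/0001"
echo bbbb > "$src_git/refs/heads/main"
echo bbbb > "$dst_git/refs/heads/main"
echo aaaa > "$dst_git/refs/changes/01/0001/1"   # packed, so gone, on the source

stale_loose_refs "$src_git" "$dst_git"   # refs/changes/01/0001/1
```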

Sep 2020
4: Buying a house 🏠 in 2020
22: Migrating git data with rsync