09 - Tyler Cipriani

Software staging clusters only grow.

As production accrues more services, staging’s costs ramp up.

And maintaining a single, massive, production-like staging may no longer be the right answer.

But several small staging clusters, each fit for its purpose, offer a cheaper, more maintainable alternative.

🍝 There is no perfect staging; there are only perfect stagings

Each reason for having a staging cluster requires a different level of “production-likeness” to fit its use.

You could put Apache and PHP on a Raspberry Pi and call it Wikipedia’s “staging.”

And that’d be a fine place to demo a MediaWiki patch. But to be confident deploying that patch into Wikipedia’s production (to paraphrase “Jaws” 🦈): you’re gonna need a bigger staging.

In the 1970s, the pasta sauce brand Ragu tasked the psychophysicist Howard Moskowitz with finding the perfect pasta sauce.¹

Moskowitz concluded the perfect pasta sauce doesn’t exist—it depends on each individual’s wants and needs. There is no perfect pasta sauce; there are only perfect pasta sauces.

Howard’s work is why you’ll find Ragu Old World Style® next to Ragu Chunky Garden Vegetable next to tens of other Ragus.

And much like pasta sauce²: there can be no perfect staging; only perfect stagings.

💱 Staging trades cost vs. production-likeness

The requirements for a staging cluster depend on its use.

Demos – Demoing new code requires the same software, maybe a few microservices, and (possibly) a subset of production data.
- 🟢 Low cost
- 🟢 Low production-likeness
Exploratory testing – QA requires the same software, services, and a subset of production data. It’d be nice to run it on the same type of infrastructure, too.
- 🟡 Medium cost
- 🟡 Medium production-likeness
Deployment confidence – 100% confidence requires a parallel universe you destroy whenever a deployment goes wrong.
- 🔴 High cost
- 🔴 High production-likeness

Staging trades costs for nearness to production

Staging is a trade-off: resources (money, people, time) against asymptotically approaching actual production.

The closer you get to production, the higher the costs and complexity.

🔁 Production should be reproducible

Setting up a staging server should be easy. If it is not easy, you already have a problem in your infrastructure, you just don’t know it yet.

– Patrick McKenzie 🐉, Staging Servers, Source Control & Deploy Workflows, And Other Stuff Nobody Teaches You

Configuration management should make it easy to rebuild production from scratch. Otherwise, you’ve got a disaster in the offing.

This creates an environment suitable for demos and end-to-end test automation.

But organizations are evolving away from using pre-production staging to build deployment confidence.

They’ve replaced their high-cost staging with a mix of canary deployments and advanced feature flagging.

Small environments with narrow scope—like testing or demoing—seem like a reasonable trade-off of cost vs. benefit.

But using pre-production staging as insurance for your deployments—requiring snapshots of production data and maybe even replayed traffic—seems too. darn. expensive.
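The feature-flag half of that replacement can be sketched in a few lines of shell. This is a minimal illustration, not any particular flagging product: the `bucket` and `flag_enabled` helpers, the user names, and the 10% rollout are all hypothetical. It hashes each user into a stable bucket from 0 to 99 and turns the new code path on only for buckets below the rollout percentage.

```shell
#!/bin/sh
# Hypothetical percentage rollout: enable a flag for roughly ROLLOUT percent
# of users, keeping each user's answer stable between requests.

# bucket USER: map a user id to a stable bucket in 0..99 using the cksum CRC.
bucket() {
    crc=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
    echo $((crc % 100))
}

# flag_enabled USER ROLLOUT: succeed when USER's bucket falls below ROLLOUT.
flag_enabled() {
    [ "$(bucket "$1")" -lt "$2" ]
}

# Assumed example: the new code path is live for about 10% of users.
for user in alice bob carol; do
    if flag_enabled "$user" 10; then
        echo "$user: new code path"
    else
        echo "$user: old code path"
    fi
done
```

Raising the rollout number widens the audience gradually, which is the same idea a canary deployment applies to servers instead of users.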

📚 Further reading

¹ This is an anecdote related by Malcolm Gladwell in a 2007 TED Talk, “Choice, Happiness, and Spaghetti Sauce.” ↩︎
² There’s a joke here about “spaghetti code” that I’m too lazy to find. ↩︎

github is a perfectly fine hosting site, and it does a number of other things well too, but merges is not one of those things.

– Linus Torvalds

Git possesses parts of a decent software forge.

When git was developed by the Linux kernel community, they already had bug tracking, documentation, and a mailing list, so (unlike fossil) git has none of those things.

Enter GitHub. It uses “issues” for bug tracking and discussion, and its code browser is unrivaled.

But for all of its features, GitHub implements only a subset of git. For instance, GitHub lacks git’s default merge behavior: fast-forward when possible, and create a merge commit only when necessary.

And after some pondering I realized there’s a good reason for that: it’s a cop-out.

`git log` can be clean or accurate, not both

I want clean history, but that really means (a) clean and (b) history.

– Linus Torvalds

Git log will always suck for someone.

An eternal war rages between team “git log should be clean” vs. team “git log should have an accurate history.”

📚 Team History
- Method: git merge --no-ff
- 🟢 Pros: A complete history of how everything was developed
- 🔴 Cons: You’ve opened a Pandora’s box of strange git situations. And your git log looks like this now¹:
```
* (refs/heads/B)
* * Merge 'C' into 'B'
* |\
* | | * (refs/heads/C -- git revert B8)
* | | * Merge 'B' into 'C'
* | |/|
* | |/
* |/|
* * | B8
* | * C3
* |/
* * A (refs/heads/A)
```

Team history uses `git merge --no-ff` to ensure a merge commit is always created

✨ Team Clean
- Methods: git merge --ff-only or git rebase && git merge (extreme clean freaks add the --squash option)
- 🟢 Pros: Linear history, git log is easy to read, git revert requires no thought.
- 🔴 Cons: You’re erasing history—you can no longer tell if two commits were written together on a single feature branch.

Team clean uses `git rebase` && `git merge --ff-only` to make it appear there was never a feature branch at all

Why is `git merge` bad?

git merge opted out.

If a branch can be fast-forwarded, git merge sticks the commits on the end of the branch and never tells you there was a merge—team clean.

But if the branches have diverged, git merge creates a merge commit, and if there are conflicts you’ll need to fix them first and say what you did: team history.

Sometimes there’s a merge commit; sometimes not: Madness.
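Here is a minimal sketch of that madness in a throwaway repository. The branch names, commit messages, and the `parents` helper are made up for illustration. The same `git merge` command runs twice: once when main has not moved, and once when both branches have new commits.

```shell
#!/bin/sh
# Sketch: plain `git merge` fast-forwards when it can and creates a
# merge commit when the branches have diverged.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # branch name is an assumption
git config user.name "Example"
git config user.email "example@example.com"

# Count the parents of the commit at HEAD (1 = ordinary commit, 2 = merge).
parents() { git rev-list --parents -n 1 HEAD | awk '{ print NF - 1 }'; }

git commit -q --allow-empty -m "base"

# Case 1: main has not moved, so the merge is a fast-forward.
git checkout -q -b feature-a
git commit -q --allow-empty -m "feature a"
git checkout -q main
git merge -q --no-edit feature-a
ff_parents=$(parents)

# Case 2: main and the feature branch have both moved, so git records
# a merge commit.
git checkout -q -b feature-b HEAD~1
git commit -q --allow-empty -m "feature b"
git checkout -q main
git merge -q --no-edit feature-b
diverged_parents=$(parents)

echo "fast-forward: $ff_parents parent(s); diverged: $diverged_parents parent(s)"
```

The first merge leaves no trace in the log; the second leaves a two-parent commit. Same command, different history.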

What does GitHub do?

When you mash “merge” in GitHub, it never executes plain git merge.

And sussing out what git command it will run is Kafkaesque. I spent some time mapping all the checkboxes and merge strategies into something you could type into bash.

| Command | GitHub | Alignment |
| --- | --- | --- |
| `git merge` | not implemented | `¯\_(ツ)_/¯` |
| `git merge --ff-only` | not implemented | ✨ Team Clean |
| `git rebase && git merge --ff-only` | Rebase and Merge | ✨ Team Clean |
| `git merge --no-ff` | Create a merge commit | 📚 Team History |
| `git merge --squash --ff-only BRANCH` | Squash and merge | ✨ Team Clean |
| `git merge-base --is-ancestor && git merge --no-ff` | Create a merge commit + Require linear history | ✨ Team Clean² |
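The last row is the least obvious one, so here is a rough local approximation of it. This is a sketch of the idea, not what GitHub runs on its servers; the `semi_linear_merge` function and the branch and file names are hypothetical. It refuses to merge unless the current branch tip is already an ancestor of the incoming branch, and then always records a merge commit.

```shell
#!/bin/sh
set -e

# semi_linear_merge BRANCH: merge BRANCH into the current branch with a merge
# commit, but only if the current branch tip is an ancestor of BRANCH.
semi_linear_merge() {
    if git merge-base --is-ancestor HEAD "$1"; then
        git merge -q --no-ff --no-edit "$1"
    else
        echo "refusing: $1 is not based on the current branch tip" >&2
        return 1
    fi
}

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # branch name is an assumption
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "base"

# Let feature and main diverge.
git checkout -q -b feature
echo a > a.txt && git add a.txt && git commit -q -m "add a"
git checkout -q main
echo m > m.txt && git add m.txt && git commit -q -m "add m"

# feature is behind main, so the merge is refused.
if semi_linear_merge feature 2>/dev/null; then first_try=merged; else first_try=refused; fi

# After rebasing feature onto main, the same merge goes through.
git checkout -q feature
git rebase -q main
git checkout -q main
if semi_linear_merge feature; then second_try=merged; else second_try=refused; fi
merge_parents=$(git rev-list --parents -n 1 HEAD | awk '{ print NF - 1 }')

echo "first try: $first_try; after rebase: $second_try ($merge_parents parents)"
```

You still get a merge commit for every branch, but each branch sits neatly on top of the one before it.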

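There is a distinction between git rebase && git merge --ff-only and git merge --ff-only. Rebasing a branch that has fallen behind rewrites its commits: each one gets a new parent and committer date, so you end up with different SHA-1s. A plain fast-forward keeps the original commits untouched.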
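A quick way to see the difference is to compare commit IDs before and after each approach. This is a sketch in a scratch repository with made-up branch and file names: one branch is fast-forwarded as-is, the other has to be rebased first.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # branch name is an assumption
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "base"

# Plain fast-forward: main ends up pointing at the very same commit.
git checkout -q -b feature-a
echo a > a.txt && git add a.txt && git commit -q -m "add a"
sha_a=$(git rev-parse HEAD)
git checkout -q main
git merge -q --ff-only feature-a
main_after_ff=$(git rev-parse HEAD)

# Rebase, then fast-forward: the commit is rewritten onto the new main tip,
# so its SHA-1 changes before main ever sees it.
git checkout -q -b feature-b HEAD~1
echo b > b.txt && git add b.txt && git commit -q -m "add b"
sha_b=$(git rev-parse HEAD)
git rebase -q main
sha_b_rebased=$(git rev-parse HEAD)
git checkout -q main
git merge -q --ff-only feature-b
main_after_rebase=$(git rev-parse HEAD)

echo "ff-only kept the commit:   $sha_a -> $main_after_ff"
echo "rebase rewrote the commit: $sha_b -> $sha_b_rebased"
```

Anyone who had the pre-rebase commit checked out now holds a commit that no longer exists on main.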

By not using the “merge if necessary” strategy of git merge, GitHub forces you to choose a side in the eternal war. And that’s a good thing.

¹ This is a contrived example of a “criss-cross merge.” ↩︎
² In GitLab this is “Merge Commit with Semi-linear history” which seems like a nicer UI vs the buried option to “Require linear history”. This option mitigates some of the pain of an ugly git log. ↩︎

Sep 2022
- Sep 15: Staging is a trap
- Sep 30: GitHub's missing merge option