
Of Git Commits, GitHub, and Gerrit
Tyler Cipriani Posted

Impassioned ranting about the format of commit messages[1] often feels like cringe-inducing gatekeeping. Many times such rants seem to be written using language both bombastic and bellicose – the message is: fuck off if you don’t agree.

The fact that the authors of many of these rants tend to be respected in the software community is confounding to many. Defense of these rants is commonplace and, likewise, is swift, absolute, and equally, seemingly, meant as a giant middle-finger to the uninitiated.

Explaining new concepts to people unfamiliar with them is a good way of testing your understanding. Explaining new concepts repeatedly, in a seemingly unending cycle, is a good test of your patience. Neither lack of understanding nor lack of patience excuses the bad behavior that is all-too-typical of the software hegemony.

This post is meant to explain when and why commit messages matter. Additionally, it explains a few thoughts about the Gerrit code review system.

Pull Request vs Commit

The common wisdom is that commit messages don’t matter on GitHub. When I collaborate on GitHub I commit often and my commit messages frequently contain “.gif”, ¯\_(ツ)_/¯, various curse words, and copious emoji. This is because the unit of change in GitHub is the pull request. My pull requests are thoughtful, and attempt to explain why I developed this particular patch, and try to provide means of testing for this change. When trying to bisect history in a repository developed on GitHub the merges of pull requests are the thing; i.e., Merge pull request #763 is meaningful. Pull request #763 probably has an explanation for what changed (even if your git log doesn’t).

The information contained in a pull request is useful, but is – by design – only accessible online with a web browser through github.com.

My current job uses Gerrit instead of GitHub. Gerrit doesn’t have pull requests, it has patches. The code review interface is arguably not as good as GitHub – I can’t set a unified diff view by default in my preferences, for instance. (edit 2019-03-21. unified diff is available in top-level preferences) Gitiles is no substitute for using GitHub as a repo browser – you can’t link to blocks of code, for instance. Gerrit/Gitiles URLs are hard to remember, unlike GitHub’s. There is little in the way of a “prescribed workflow”. You as a developer must decide how to split patches meaningfully, and how to do that in such a way that adheres to any shared agreements about particular branches (e.g., master must be deployable).

Despite its flaws, I really like Gerrit.

Gerrit’s use of git aligns well with git’s design. The git history that Gerrit produces is beautiful, available offline, distributed, and useable via the git command line interface (more usable than in the browser, unfortunately). This is partially because the unit of change in Gerrit is a commit. As a result, git log --oneline is totally readable and totally useful!

I prefer the repo produced by Gerrit to the repo produced by GitHub for reasons that relate to development, operations, and values.

Development

If I’m considering making a change in a repository, especially when this change is an obvious or simple one, I worry. I worry: why wasn’t this approach chosen in the first place? Is there a bug that is being worked around by using a non-obvious approach? Am I in an area of code that is used in many areas of the code base, or is it used very seldom? Is there test coverage for this function that I can use to ensure I am not creating a regression? A good commit message would answer all of these questions, and maybe more I haven’t thought of yet.

This information in Gerrit lives with the repository, in GitHub the information lives on github.com. It’s very important that the information exist somewhere.

I like the freedom to work on code when I’m on an airplane or somewhere else without WiFi. There may be software that can make this happen for GitHub. In Gerrit the unit-of-change is the Commit, so the commit has most information that I want. With the addition of the review-notes Gerrit plugin, code reviews live in a special git note namespace (/refs/notes/review) and are also available with the repository offline.

Gerrit makes no recommendations whatsoever about how you develop, and nothing in the patch interface aligns with your local view of your changes necessarily. I feel like this is confusing for people new to using Gerrit, but it is also, after initiation to the concept, not a bad way to develop.

Operations

An appreciable portion of my job is grepping through git history in the shaky moments following an embarrassing production outage. This exercise has given me a deep appreciation for well-formatted git commit messages. I need to know: what changed, who changed it (not just who merged it), why it was changed, and why a particular approach was chosen.

Good commit message information helps me determine what to do with a change: do I need to wake up the person who made this commit, or can I simply revert it? Is this change merely setting an unused variable, or is it a feature flag that will unleash new functionality?

Having all this information at my fingertips, rather than having to dig through the 188(!) different repositories that are composed to create a production deployment of MediaWiki and extensions for all 933 wikis that exist in Wikimedia’s production infrastructure is important – I already have too many browser tabs open without having to dig through GitHub for repository histories.

Values

This section speaks more to my feelings about Gerrit vs GitHub as projects than about Gerrit vs GitHub repositories. In the case of GitHub’s use of pull requests, I feel like the two are inextricably linked – GitHub has opted out of the open source implementation of git, using proprietary software to implement this feature, and the resultant repository is less usable as an artifact on its own because of this decision.

The essay Free software needs Free tools is probably a better summary of this topic than anything I could write here; however, the short version of this is: without the freedom to run, modify, study, and redistribute the software on which your project depends, your project is at the mercy of corporate caprice.

Corporations are beholden to shareholders, not to users. When a corporation pivots away from the customers that made it a success to serve a different market that it perceives as more valuable that is not an uncommon or remarkable event: that is the design of a corporation.

If you are working on a software project that’s important – if your software provides an essential service, or essential infrastructure that is meant to last many years (even beyond a single human lifetime) – you cannot afford to lose a part of your infrastructure in the event that a (possibly erroneous) business analysis has identified a more efficient profit-center for a business.

There are many counter-arguments to the ones I’ve made above; however, to summarize a few counter-arguments (possibly unfairly) they are:

  1. <Closed Source/Hosted provider>’s core business is acting in their customers’ best interest; if its goal is to provide value to shareholders, then doing so means being a better service for its existing customers. Our interests and the business’s interests are aligned.
  2. <Closed Source/Hosted provider> is based on an open standard, so the information is portable; if they become a bad actor, we can port information to a different solution.

To argument one: there are myriad examples of business decisions made at the expense of customers. This is particularly true once a business entity becomes a monopoly power as so many tech companies are at this moment. I think anyone would concede the example of large cable companies offering poor service to customers despite the fact that customers provide their revenue. Business interests and customer interests, even when they are currently aligned, may not always be.

To argument two: in the instance of GitHub above (and this is applicable to other services as well) they have proprietary features (i.e., “pull requests”) that make them incompatible with portability. In other instances, the compatibility with a portable solution is invalidated by the point of business caprice.

How I commit now

If commit messages are important (as I argue above), then it is a valuable exercise to evaluate the way in which you write your commit messages.

Earlier this year I came across Vicky Lai’s post “Git Commit Practices Your Future Self Will Thank You For”, which (for me anyway) highlighted the use of the git commit.template setting. Vicky provided an example in the post that I’ve been refining for myself.

I’ve followed git commit best practices[2][3][4] for years, so initially the template wasn’t proving too useful. I started to think about what was missing from my commit messages, where I could improve formatting, and where I could save myself some time searching.

The first issue I identified is that I can’t remember the names of commit message fields like Signed-off-by or Requested-by. Also, my capitalization and ordering of those fields were all over the place. What are the best practices for using these fields? I could never remember. I put all these fields in my template, along with a link to the kernel patch submission guidelines for easy reference.

The next issue I had was that there are myriad schools of thought about commit message bodies. Bullet points vs Problem/Solution vs “answers the following questions”. The basic questions of “What is wrong with the code that this patch fixes?” sometimes hindered my ability to write a commit message that made sense. I wanted options. I wanted examples. I wanted links in case I felt like reading more. I added all this to my template.

Finally, I noticed that vim does some syntax highlighting in the commit screen. Specifically, it highlights lines that end with :. So I made sure that the only lines that ended with : were important sections.

I think I have a template I can live with for a while. It’s verbose. Probably too verbose, really. But my mind works in ways I don’t understand sometimes. To keep it on track, I need all the information it craves at my fingertips in a context-dependent way. I think this template is ideal for that.

Git Commit ZOMG!!1!

The commit.template below is in my dotfiles as .git-commit-zomg. I install the template using the command git config --global commit.template ~/.git-commit-zomg.

I named this template .git-commit-zomg because I have mixed feelings about commit messages. I think a lot about commit messages. I think they’re important. I evidently feel that they’re “rant worthy” in some context. I still, however, know people will decide the value of commit messages on their own. You can tell people the value you think commit messages will have, and they’ll maybe acknowledge your concerns are valid. Maybe they’ll even make changes in their process. But no one groks to fullness.

Someday production will be down. Rollback will have failed with an opaque error. Your mind will be screaming too loud for you to think clearly. You’ll frantically grep git log output for the error message – something, anything – and you’ll come face-to-face with a commit (probably authored by you) that reads, simply, ¯\_(ツ)_/¯. To paraphrase Jack Handey: when this moment comes, if you’re drinking milk, I bet it will make milk come out your nose.


# ^^ Subject: summary of your change:
# * 50 Characters is a soft limit (aim for < 80)
# * "If applied, this commit will..."
# * Use the imperative mood; e.g.,
#   (Change/Add/Fix/Remove/Update/Refactor/Document)
# * Do not end the subject with a period (full stop)
# * Optionally, prefix the subject with the relevant component
#   (the general area of code being modified)
#
# Example[0]
#
#     jquery.badge: Add ability to display the number zero
#
# [0]. <https://www.mediawiki.org/wiki/Gerrit/Commit_message_guidelines#Subject>
#
# Leave this blank line below your subject:

# Body: Additional information about a commit:

# Think about these questions
#
# * Why should this change be made?
#   What is wrong with the current code?
# * Why should it be changed in this way?
#   Are there other ways?
# * How can a reviewer confirm that your change works as intended?[0]
#
# * An alternative format may be a problem/solution commit as used in
#   ZeroMQ[1]; e.g.
#
#       * Problem: Windows build script requires edit of VS version
#       * Solution: Use CMD.EXE environment variable to extract
#         DevStudio version number and build using it.
#
# [0]. <https://www.mediawiki.org/wiki/Gerrit/Commit_message_guidelines#Body>
# [1]. <http://zeromq.org/docs:contributing#toc3>

# ---
#
# Bug number:
#
# Bug: TXXXXXX
#
# ---
#
# Gerrit specific:
#
# Change-Id: IXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
# Depends-On: IXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
#
# ---
#
# Sign your work:
#
# > The sign-off is a simple line at the end of the explanation for the
# > patch, which certifies that you wrote it or otherwise have the right to
# > pass it on as a open-source patch [0]
#
# Signed-off-by: Example User <user@example.com>
#
# [0]. <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=4e8a2372f9255a1464ef488ed925455f53fbdaa1>
#
# ---
#
# Other Nice Things:
#
# If you worked on a patch with others it's nice to credit them for
# their contributions; however, these tags should not be added without
# your collaborator's permission!

# Acked-by: Example User <user@example.com>
# Cc: Example User <user@example.com>
# Co-Authored-by: Example User <user@example.com>
# Requested-by: Example User <user@example.com>
# Reported-by: Example User <user@example.com>
# Reviewed-by: Example User <user@example.com>
# Suggested-by: Example User <user@example.com>
# Tested-by: Example User <user@example.com>
# Thanks: Example User <user@example.com>
#
# ---
#        _ _                                   _ _
#   __ _(_) |_    ___ ___  _ __ ___  _ __ ___ (_) |_   _______  _ __ ___   __ _
#  / _` | | __|  / __/ _ \| '_ ` _ \| '_ ` _ \| | __| |_  / _ \| '_ ` _ \ / _` |
# | (_| | | |_  | (_| (_) | | | | | | | | | | | | |_   / / (_) | | | | | | (_| |
#  \__, |_|\__|  \___\___/|_| |_| |_|_| |_| |_|_|\__| /___\___/|_| |_| |_|\__, |
#  |___/                                                                  |___/
#
# Save to `~/.git-commit-zomg` Then run:
#
#     git config --global commit.template ~/.git-commit-zomg
#
# The idea for this template came from Vicky Lai[0]
#
# [0]. <https://vickylai.com/verbose/git-commit-practices-your-future-self-will-thank-you-for/>
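
The subject rules at the top of the template are simple enough to check mechanically. What follows is only a minimal Python sketch under stated assumptions: `check_subject` and its messages are hypothetical names invented for illustration, not something git or the template provides, and the two limits are read straight from the "50 Characters is a soft limit (aim for < 80)" line.

```python
SOFT_LIMIT = 50  # "50 Characters is a soft limit"
HARD_LIMIT = 80  # "(aim for < 80)"


def check_subject(subject):
    """Return a list of complaints about a commit subject line.

    Hypothetical helper: it only covers the length and trailing-period
    rules from the template, not the imperative-mood rule.
    """
    if not subject.strip():
        return ['subject is empty']

    problems = []
    if len(subject) >= HARD_LIMIT:
        problems.append(
            'subject is %d characters, aim for < %d' % (len(subject), HARD_LIMIT)
        )
    elif len(subject) > SOFT_LIMIT:
        problems.append(
            'subject is over the %d character soft limit' % SOFT_LIMIT
        )

    if subject.rstrip().endswith('.'):
        problems.append('subject ends with a period')

    return problems


# Try it on the example subject from the template.
for complaint in check_subject(
        'jquery.badge: Add ability to display the number zero'):
    print(complaint)
```

Something like this could be called from a commit-msg hook, but a soft limit is a judgment call, so printing a warning seems more in keeping with the template than rejecting the commit.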

  1. https://github.com/torvalds/linux/pull/17#issuecomment-5654674

  2. https://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html

  3. https://www.mediawiki.org/wiki/Gerrit/Commit_message_guidelines

  4. https://juffalow.com/other/write-good-git-commit-message

CI Files in Their Own Namespace
Tyler Cipriani Posted

I recently had a random idea to fix a problem that evidently only bothers me. It has become de rigueur to litter every project with a random smattering of top-level dotfiles. I’m thinking of the .travis.yaml-type files that are regarded as a necessary evil to perform some kind of testing or deployment in a particular repo.

I have designed several systems that use this trope. Yes, I feel appropriately ashamed.

My problem is simple: dotfiles clutter my work tree. Given all of the myriad problems with the current state of software “engineering” this particular issue may seem a relatively minor one; however, if this issue is minor, a minor fix may well be all that’s needed.

The idea I had recently is to use a ref namespace in git for these files.

Something like:

$ git checkout --orphan CI
$ git rm -rf .
$ mkdir travis
$ cat > travis/config.yaml
language: ruby
rvm:
 - 2.2
 - jruby

$ git add travis/config.yaml
$ git commit -m 'Add travis config for testing'
$ git push origin refs/heads/CI:refs/meta/ci

In GitHub, using the example above, refs/meta/ci doesn’t appear in the branch dropdown in the web UI, but the ref is stored on the server and accessible by people fetching the repo.

In CI (or any tooling hoping to use these configuration files) this namespace can be easily accessed via:

$ git fetch origin refs/meta/ci:refs/meta/ci
$ git show refs/meta/ci:travis/config.yaml
language: ruby
rvm:
 - 2.2
 - jruby

A whole refs/meta/ci namespace with folders for each service seems so much nicer to me.

There are a few downsides I can think of to this method. Firstly, it’s git-specific; however, I am sure the concept of a branch is not entirely foreign to most version control systems and it could be supported to some degree. In practical terms, supporting git seems to be good enough for most tools.

Another downside is opacity. That is, CI files are less discoverable using this method; however, if this style of configuration file were to become the de facto standard then much tooling could be written to surface interfaces for these files and their namespaces.

Halloween Nerd Projects 👻
Tyler Cipriani Posted

Last year I got 229 trick-or-treaters at my house.

Candy, yo!

The previous year it was 194. The year before that it was 176. The year before that I lived in a duplex that was roughly two blocks from my current house; that year I didn’t think to track how many trick-or-treaters showed up at my door – it was less than 10.

The only logical conclusion we can draw from the above data is that our current house is imbued with dark Halloween voodoo. We’ve made every attempt to decorate accordingly. The first year it was spider webs and a black light. For every subsequent year we’ve tried to out-do our previous selves.

It begins

In order to create an authentic movie-experience at home I own a projector (enjoy with flavacol for extra authenticity). In 2015 I discovered AtmosFearFX which is a company that produces content meant to be projected onto walls or out of windows or onto curtains. For our front-room I project a ghostly apparition onto a sheer curtain so that it roughly looks like it’s walking around inside. This has been a big hit with trick-or-treaters, so much so that I bought a(n extremely crappy) mini projector that I would never be able to use for movies, but – nonetheless – works great for projecting weird zombie hands groping for freedom on the window of our porch.

This was the same year I began incorporating microcontrollers to trigger effects. I built a project with a Particle core attached to a reed switch on my front door. The Particle board sent a web request to my hue light bulb on the porch (scream sound warning):

Halloween 2015: Controlling Hue bulbs with a Particle and Reed Switch

Now

Fast-forward a few years and my setup has become even more elaborate. Sometime around 2016 I began fooling around with pneumatic pop-ups. After a few DuckDuckGos I stumbled on FrightProps and subsequently began spending too much money there.

My big project this year is a pneumatic pop-up that uses an air-compressor. This is controlled by a motion-sensing Raspberry Pi.

The key component is the solenoid:

3-way solenoid with 1/4" ports

This controls the 30PSI (or so) coming from my air compressor to a pneumatic cylinder. The solenoid is controlled via 12V. This is more than my Raspberry Pi is capable of pushing out over GPIO. Fortunately, I have an Adafruit MotorHAT. The MotorHAT has its own 12V power supply and can be controlled by a Raspberry Pi. I also used a PIR sensor to detect motion and trigger my popup.

Raspberry Pi with MotorHAT

The pop-up dummy itself is made out of 1/2" PVC tubing, zip ties, a skeleton I bought at Home Depot, and love.

PVC and love

I control the whole contraption with some python code running on the Pi. I’ve wrapped the Adafruit MotorHAT code in (what I think is) a nicer interface for this project:

Which lets me test-drive the setup pretty easily via ipython:

>>> from scary import Pneumatic
>>> motor_output = 4
>>> popup = Pneumatic()
>>> popup.up()
>>> popup.is_up
True
>>> popup.down()
Halloween 2018: Controlling a pneumatic with python
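
The wrapper code itself is not reproduced above. As a rough sketch of the shape such an interface could take (this is not the original `scary` module), the class below keeps the `up()`, `down()`, and `is_up` names from the ipython session and swaps the MotorHAT for a stand-in object. `FakeSolenoid`, its `run()` method, and the `'open'`/`'close'` commands are all assumptions made so the example runs without any hardware attached.

```python
class FakeSolenoid:
    """Hypothetical stand-in for the 12V output that drives the solenoid.

    It records the commands it receives instead of switching anything.
    """

    def __init__(self):
        self.commands = []

    def run(self, command):
        self.commands.append(command)


class Pneumatic:
    """Track and toggle the state of a pneumatic pop-up."""

    def __init__(self, solenoid=None):
        # Assumption: whatever drives the real solenoid exposes run().
        self.solenoid = solenoid if solenoid is not None else FakeSolenoid()
        self.is_up = False

    def up(self):
        """Open the solenoid so air reaches the cylinder."""
        self.solenoid.run('open')
        self.is_up = True

    def down(self):
        """Close the solenoid so the cylinder retracts."""
        self.solenoid.run('close')
        self.is_up = False


popup = Pneumatic()
popup.up()
print(popup.is_up)
popup.down()
```

Passing the driver in through the constructor is a design choice of this sketch: it lets the same class be exercised on a laptop before the Pi, compressor, and skeleton are involved.
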
Git Advice
Tyler Cipriani Posted

Git can be confusing. It’s more confusing fumbling with git under pressure. As a release engineer, fumbling with git under pressure is an appreciable chunk of my job; as such, I’ve learned a trick or two.

Recently I added some general git advice to our shared wiki page for deployers. I am reproducing this advice here for posterity. The goal of this advice is to ensure that deployers are seeing all the information they need to make smart decisions about the current state of a git repository. There are times when this advice has allowed me to figure out a problem with a git repository simply by cd-ing into its worktree.

  1. Use a git-aware prompt. The git-prompt.sh script that is included in git’s contrib tree is my preferred prompt. There are instructions for use in comments at the beginning of the file. One simple way to use it (on Debian machines anyway) is to add the following to your shell initialization file:

  2. Set status.submoduleSummary. By default submodules have limited visibility in git status which makes it easy to miss a git submodule update step. After adding status.submoduleSummary to your ~/.gitconfig, git will show you a short summary of submodule changes in the output of git status. Set it by executing:

    you@computer:~$ git config --global status.submoduleSummary true
Reverse Polish Notation, Lambdas, and Currying in Python
Tyler Cipriani Posted

I just finished reading The Python Corner’s post “Lambdas and Functions in Python.” The post acts as an introduction to the use of functions as first-class objects in python. The demo code is the implementation of a “Reverse Polish Notation” calculator.

I had never heard of reverse Polish notation (RPN) before this post. The short explanation of RPN is available on Wikipedia:

In reverse Polish notation, the operators follow their operands; for instance, to add 3 and 4, one would write 3 4 + rather than 3 + 4.
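
To make the notation concrete, here is a minimal sketch of an RPN evaluator. It is not the code from either post: `evaluate_rpn` and `OPERATORS` are names made up for this example, and it only handles the four binary arithmetic operators on whitespace-separated tokens.

```python
import operator

# Hypothetical operator table for this sketch.
OPERATORS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,
}


def evaluate_rpn(expression):
    """Evaluate a whitespace-separated RPN expression."""
    stack = []
    for token in expression.split():
        if token in OPERATORS:
            # The operand pushed last is the right-hand side.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            stack.append(float(token))
    return stack.pop()


print(evaluate_rpn('3 4 +'))  # 7.0
```

Because every operator acts on whatever is already on the stack, no parentheses or precedence rules are needed.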

In that blog post RPN is implemented as a stack of operands and after an operator is pushed onto the stack, the compute() method is called which triggers the evaluation of the lambda specified by the operator. Like this:

The point of the post is to show a python dict using operators as keys with lambdas as values; this demonstrates that lambdas are functions and functions are first-class objects. This allows the compute method of the RPNEngine class to look up a lambda in a dict, and pop() off the stack using the signature function of the inspect module to determine how many arguments are needed for a particular lambda. From there, lambda evaluation is handed off to helper functions named, for instance, compute_operation_with_two_operands and compute_operation_with_one_operand.
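
The core trick, looking a lambda up in a dict and asking `inspect.signature` how many operands it wants, can be sketched in a few lines. This is an illustration rather than the Python Corner code: `catalog`, `arity`, and `apply_operation` are names chosen for this example.

```python
from inspect import signature

# Hypothetical catalog: operators as keys, lambdas as values.
catalog = {
    '+': lambda x, y: x + y,
    'SQRT': lambda x: x ** 0.5,
}


def arity(func):
    """Number of parameters a callable declares."""
    return len(signature(func).parameters)


def apply_operation(operation, stack):
    """Pop as many operands as the lambda needs and push the result."""
    func = catalog[operation]
    operands = [stack.pop() for _ in range(arity(func))]
    operands.reverse()  # restore the order in which they were pushed
    stack.append(func(*operands))
    return stack[-1]


stack = [3, 4]
print(apply_operation('+', stack))  # 7
```

Since the number of operands is read from the lambda itself, adding a new operator only means adding an entry to the dict.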

Currying

One other functional concept that could have helped the example code is that of currying. Currying involves changing a function that takes multiple arguments into a series of functions that each take a single argument (an arity of 1).

This is a fancy way to say:
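
A minimal sketch of the idea, using a made-up three-argument `add` function (the names `add` and `curried_add` are illustrative only):

```python
def add(x, y, z):
    """An ordinary function with an arity of 3."""
    return x + y + z


def curried_add(x):
    """The same computation as a chain of one-argument functions."""
    def take_y(y):
        def take_z(z):
            return x + y + z
        return take_z
    return take_y


# One call with three arguments versus three calls with one argument each.
print(add(1, 2, 3) == curried_add(1)(2)(3))  # True

# Each intermediate step is itself a usable function.
add_to_three = curried_add(1)(2)
print(add_to_three(10))  # 13
```

The useful property for a stack-based calculator is the second one: arguments can be supplied one at a time, as they come off the stack.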

By turning the compute_operation_with_n_operands-type functions into curried functions, the code gets much cleaner. That is, instead of a switch like:

You can implement a curried function using a callable python object and do something like:

func = self.catalog[operation]

while not func.resolved:
    func(self.pop())

This gets rid of the clunky compute_operation_with_n_operands functions. Here is the full code for a solution using currying:

#!/usr/bin/env python3
"""
Engine class for RPN Calculator
"""

import math

from functools import partial
from inspect import signature


class Curry(object):
    """
    Curry a callable

    Given a callable, returns an object that can be used like a curried
    callable.

    >>> c1 = Curry(lambda x, y: x + y)
    >>> c2 = Curry(lambda x, y: x + y)
    >>> c1(2, 2).answer == c2(2)(2).answer
    True

    :func: callable
    """
    def __init__(self, func):
        self.func = func
        self.argc = len(signature(self.func).parameters)
        self.resolved = False
        self.answer = None

    def __call__(self, *args):
        if len(args) == self.argc:
            self.answer = self.func(*args)
            self.resolved = True

        for arg in args:
            self.func = partial(self.func, arg)
            self.argc = len(signature(self.func).parameters)

        return self


class RPNEngine(object):
    """
    Reverse Polish Notation (RPN) Engine

    A RPN calculator
    >>> rpn = RPNEngine()
    >>> rpn.push(2)
    >>> rpn.push(2)
    >>> rpn.compute('+') == 4
    True
    >>> rpn.compute('AC')
    >>> rpn.push(2)
    >>> rpn.compute('^2') == 4
    True
    """
    def __init__(self):
        self.stack = []
        self.functions = self._get_functions()

    def _get_functions(self):
        # compute() pops operands top-of-stack first, so each curried
        # function receives the right-hand operand (y) before the left (x).
        return {
            '+': Curry(lambda y, x: x + y),
            '-': Curry(lambda y, x: x - y),
            '*': Curry(lambda y, x: x * y),
            '/': Curry(lambda y, x: x / y),
            '^2': Curry(lambda x: x * x),
            "SQRT": Curry(lambda x: math.sqrt(x)),
            "C": Curry(lambda: self.stack.pop()),
            "AC": Curry(lambda: self.stack.clear()),
        }

    def push(self, item):
        self.stack.append(item)

    def pop(self):
        try:
            return self.stack.pop()
        except IndexError:
            pass

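    def compute(self, operation):
        # A Curry object is stateful and cannot be reused once resolved,
        # so look the operation up in a freshly built table each time.
        func = self._get_functions().get(operation)

        if func is None:
            raise ValueError('%s not a valid function' % operation)

        if len(self.stack) < func.argc:
            raise ValueError(
                '%s requires %d operands, %d given' % (
                    operation,
                    func.argc,
                    len(self.stack)
                )
            )

        # Zero-argument operations (C, AC) resolve with a bare call.
        if func.argc == 0:
            func()

        # Feed operands from the stack one at a time until resolved.
        while not func.resolved:
            func(self.pop())

        return func.answer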

Reading the final code in the Python Corner post made me really itchy to implement the solution I posted here.