To see posts by date, check out the archives

Reverse Polish Notation, Lambdas, and Currying in Python
Tyler Cipriani Posted

I just finished reading The Python Corner’s post “Lambdas and Functions in Python.” The post acts as an introduction to the use of functions as first-class objects in python. The demo code is the implementation of a “Reverse Polish Notation” calculator.

I had never heard of reverse polish notation(RPN) before this post. The short explanation of RPN is available on Wikipedia:

In reverse Polish notation, the operators follow their operands; for instance, to add 3 and 4, one would write 3 4 + rather than 3 + 4.

In that blog post RPN is implemented as a stack of operands and after an operator is pushed onto the stack, the compute() method is called which triggers the evaluation of the lambda specified by the operator. Like this:

rpn = rpn_engine()
rpn.push(2)
rpn.push(2)
print(rpn.compute('+'))  # Prints 4

The point of the post is to show a python dict using operands as keys with lambdas as values; this demonstrates that lambdas are functions and functions are first-class objects. This allows the compute method of the RPNEngine class to look up a lambda in a dict, and pop() off the stack using the signature function of the inspect module to determine how many arguments are needed for a particular lambda. From there, lambda evaluation is handed off to helper functions named, for instance, compute_operation_with_two_operands and compute_operation_with_one_operand

Currying

One other functional concept that could have helped the example code is that of currying. Currying involves changing a function with multiple arity into a series of evaluations of multiple functions each with an arity of 1.

This is a fancy way to say:

add = lambda x, y: x + y
add_curried = lambda x: lambda y: x + y

assert(add(2, 2) == add_curried(2)(2))

By turning the compute_operation_with_n_operands-type functions into curried functions, the code gets much cleaner. That is, instead of a switch like:

if number_of_operands == 2:
    self.compute_operation_with_two_operands(self.catalog[operation])

if number_of_operands == 1:
    self.compute_operation_with_one_operand(self.catalog[operation])

You can implement a curried function using a callable python object and do something like:

func = self.catalog[operation]

while not func.resolved:
    func(self.pop())

This gets rid of the clunky compute_operation_with_n_operands functions. Here is the full code for a solution using currying:

#!/usr/bin/env python3
"""
Engine class for RPN Calculator
"""

import math

from functools import partial
from inspect import signature


class Curry(object):
    """
    Curry a callable

    Given a callable, returns a an object that can be used like a curried
    callable.

    >>> c1 = Curry(lambda x, y: x + y)
    >>> c2 = Curry(lambda x, y: x + y)
    >>> c1(2, 2) == c2(2)(2)
    True

    :func: callable
    """
    def __init__(self, func):
        self.func = func
        self.argc = len(signature(self.func).parameters)
        self.resolved = False
        self.answer = None

    def __call__(self, *args):
        if len(args) == self.argc:
            self.answer = self.func(*args)
            self.resolved = True

        for arg in args:
            self.func = partial(self.func, arg)
            self.argc = len(signature(self.func).parameters)

        return self


class RPNEngine(object):
    """
    Reverse Polish Notation (RPN) Engine

    A RPN calculator
    >>> rpn = RPNEngine()
    >>> rpn.push(2)
    >>> rpn.push(2)
    >>> rpn.compute('+') == 4
    True
    >>> rpn.compute('AC')
    >>> rpn.push(2)
    >>> rpn.compute('^2') == 4
    True
    """
    def __init__(self):
        self.stack = []
        self.functions = self._get_functions()

    def _get_functions(self):
        return {
            '+': Curry(lambda x, y: x + y),
            '-': Curry(lambda x, y: x - y),
            '*': Curry(lambda x, y: x * y),
            '/': Curry(lambda x, y: x / y),
            '^2': Curry(lambda x: x * x),
            "SQRT": Curry(lambda x: math.sqrt(x)),
            "C": Curry(lambda: self.stack.pop()),
            "AC": Curry(lambda: self.stack.clear()),
        }

    def push(self, item):
        self.stack.append(item)

    def pop(self):
        try:
            return self.stack.pop()
        except IndexError:
            pass

    def compute(self, operation):
        func = self.functions.get(operation)

        if not func:
            raise BaseException('%s not a valid function' % operation)

        if len(self.stack) < func.argc:
            raise BaseException(
                '%s requires %d operands, %d given' % (
                    operation,
                    func.argc,
                    len(self.stack)
                )
            )

        if func.argc == 0:
            func()

        while not func.resolved:
            func(self.pop())

        return func.answer

Reading the final code in the Python Corner post made me me really itchy to implement the solution I posted here.

Bloop! Audible Error Logs
Tyler Cipriani Posted

As a frequent deployer of software, I’m usually watching several different flavors of system monitoring all at the same time. A few logstash dashboards in a few browser windows, a few grafana dashboards, a few different logs being tailed in a few different tmux windows.

If it all goes wrong during a deployment – if the error log suddenly explodes – I want to know. Even if I’m looking at another metric, or at IRC, or at my email – I want to know about log explosion immediately.

So I made a thing that plays a little noise whenever a new line is recieved on stdin. I call it Bloop!

And I’ve been using it to tail logs like:

ssh logsever -- tail -f /var/log/important/error.log | bloop -s

So now my stomach drops when I hear, “bloop…bloop…blobloopboopbloop”.

I’m sure this thing has been build a million times because it’s a braindead-simple idea, but I couldn’t find the right words to search for it so I made new one. Now all I need is an EKG monitor so I can see how much this bloop thing makes my heart rate match the overall error rate.

Using AWK like grep with envvars
Tyler Cipriani Posted

This is a little tip that was impossible for me to DuckDuckGo (that’s right, I use DuckDuckGo like a verb). Like everyone who lives the daily joy (and, frequently, deep frustration) of using the command line to accomplish day-to-day tasks I am a fairly deft manipulator of strings. Often, I turn to AWK to achieve powerful results quickly.

Using AWK like grep

There is usually no reason to use grep with AWK. AWK lets you use /[whatever]/ to search for lines containing [whatever]. You can also use /! [whatever]/ to search for the opposite of [whatever]. After you’ve found the line you want to act upon, you can manipulate it further using the powerful functionality of AWK.

Today, however, I wanted to use AWK to find the rotation of my current display using xrandr. The name of current display device was stored in the environment variable $DISPLAY_DEVICE (and in this instance was LVDS-1). I could not figure out, nor could I find the right words to search for, how to do this.

I tried:

xrandr -q | awk -v device="$DISPLAY_DEVICE" '/device/ {print}'

I tried

xrandr -q | awk '/'"$DISPLAY_DEVICE"'/ {print}'

Finally, I figured it out! This is something I thought I should document for others using the same search terms (and for myself in a month):

The Magic

awk -v device="$DISPLAY_DEVICE" '$0 ~ device {print}'

To get the rotation of a device from xrandr if you know the device name it’s:

xrandr -q | awk -v device='LVDS-1' '$0 ~ device {gsub("([()]| primary)", ""); print $4}'

<3 AWK so darn much.

2017 by the numbers
Tyler Cipriani Posted

These are some meaningless numbers.

I felt pretty powerless against the rising tide of horribleness that seemingly permeated every aspect of 2017 – that is not captured by these numbers.

Ssh Key Fingerprints, Identicons, and ASCII art
Tyler Cipriani Posted

Security at the expense of usability comes at the expense of security.

AviD

Public key authentication is confusing, even for “professionals”. Part of the confusion is that base64-encoded public keys and private keys are just huge globs of meaningless letters and numbers. Even the hashed fingerprints of these keys are just slightly smaller meaningless globs of letters and numbers.

It is a known fact in psychology that people are slow and unreliable at processing or memorizing meaningless strings

Hash Visualization: a New Technique to improve Real-World Security

Ensuring that two keys are the same means comparing key hashes— fingerprints. When using the md5 hash algorithm, comparing a key fingerprint means comparing 16, 16-bit numbers (and for the uninitiated that means blankly staring at 32 meaningless letters and numbers). In practice, that usually means not comparing 32 meaningless letters and numbers except when strictly necessary: Security at the expense of usability comes at the expense of security.

SSH Tricks

I am constantly troubleshooting ssh. I spend a lot of time looking at the authlog and comparing keys.

I’ve learned some fun tricks that I use constantly:

Get fingerprint from public key ssh-keygen(1)
ssh-keygen -l [-E md5] -f [public key]
Generate a public key given a private key ssh-keygen(1)
ssh-keygen -y -f [private key]
Automatically add server key to known_hosts file ssh-keyscan(1):
ssh-keyscan -H [hostname] >> ~/.ssh/known_hosts
List key fingerprints in ssh-agent ssh-agent(1)
ssh-add [-E md5] -l

When I get the message, Permission denied (publickey), I have a protocol.

  1. Find the fingerprint of the key being used by the authenticating host. This will either be in ssh-agent or I may have to use ssh-keygen -l -E md5 -f [publickey] on the authenticating host.
  2. Find the authorized_keys file on the target machine: grep 'AuthorizedKeysFile' /etc/ssh/sshd_config
  3. Compare the fingerprint of the public key from the authenticating host is among the fingerprints listed in the authorized_keys file.

Most ssh problems are caused by (SURPRISE!) the public key of the authenticating host not being present in the AuthorizedKeysFile on the target.

The Worm Bishop

Most of the time when I “compare fingerprints” of keys, I copy, I paste, and finally I use the time-honored global regular expression print command. This is insecure behavior for myriad reasons. The secure way to compare keys is by manually comparing fingerprints, but meaningless string comparison is hard, which makes security hard, and, so, most folks simply aren’t secure. Security at the expense of usability comes at the expense of security.

In the release announcement for OpenSSH 5.1 a different approach to comparing fingerprints was announced:

Visual fingerprinnt [sic] display is controlled by a new ssh_config(5) option “VisualHostKey”. The intent is to render SSH host keys in a visual form that is amenable to easy recall and rejection of changed host keys.

Announce: OpenSSH 5.1 released

The “VisualHostKey” is the source of the randomart ascii that you see if you add the -v flag to the ssh-keygen command from above:

$ ssh-keygen -lv -E md5 -f ~/.ssh/id_rsa.old.pub
2048 MD5:b2:c7:2a:77:84:3a:62:97:56:d0:95:49:63:fd:5d:2b tyler@tylercipriani.com (RSA)
+---[RSA 2048]----+
|       .++       |
|       .+..     .|
|     . .   . . ..|
|    . .     .E.. |
|     ...S     .  |
|      o+.        |
|     +..o        |
|  o B .o.        |
| . + +..         |
+------[MD5]------+

This is something like an identicon for your ssh key:

Identicons’ intended applications are in helping users recognize or distinguish textual information units in context of many.

– Don Park

Although it is important to note that while the intent of an identicon is to distinguish against many, the intent of the VisualHostKey is more ambiguous.

The work to add this randomart was based on the paper Hash Visualization: a New Technique to improve Real-World Security, and the algorithm that generates these bits of art is discussed in detail in the paper The drunken bishop: An analysis of the OpenSSH fingerprint visualization algorithm. Also, interestingly, while the above paper contains a reference to the apocryphal drunken bishop leaving stacks of coins in each square he’s visited, the code comments in OpenSSH refer to a “worm crawling over a discrete plane leaving a trace […] everywhere it goes”.

Regardless of whether the algorithm’s protagonist is a worm or a bishop the story is similar. There is a discrete plane (or an atrium), the protagonist is in the middle (and possibly drunk), and moves around the room leaving a trail (either because they are slimy or because they are…drunk? and dropping coins because they’re drunk? I guess.), the more times they visit a particular part of the plane/atrium, the slimier/more-coin-filled it becomes. The direction of movement is determined by the little-endian processing of each word in the md5 checksum, so each run should create the same unique randomart for the same unique checksum.

I wrote a simple python version that visualizes the algorithm step-by-step which may be a better explainer than any meaningless strings I can group together:

The Drunken Slime Bishop
The Drunken Slime Bishop

ASCII Art is meaningless characters

Confession time: I have never used randomart even when copying and pasting is impossible—I just compare strings. I have VisualHostKey yes in my ~/.ssh/config, but I almost never look at since OpenSSH warns me if a host-key has changed, so mostly it’s just taking up vertical space.

But why?

I think the reason I don’t use VisualHostKey to help distinguish between public-keys is that it failed to meet the regularity property of hash visualization:

Humans are good at identifying geometric objects (such as circles, rectangles, triangles, and lines), and shapes in general. We call images, which contain mostly recognizable shapes, regular images. If an image is not regular, i.e. does not contain identifiable objects or patterns, or is too chaotic (such as white noise), it is difficult for humans to compare or recall it.

Hash Visualization: a New Technique to improve Real-World Security

Randomized ASCII art that is composed of common letters and frequently used symbols is very nearly the same as a string. The constituent parts are the same. This is the first problem: while Unicode contains symbols that are more recognizable, ASCII contains a very limited set of characters. This is a property that makes ASCII as an art-medium so charming, but makes abstract ASCII-art hard to remember as it fails to form geometric patterns that are easily distinguished from noise.

Secondly, ASCII randomart lacks any color, which is a property mentioned for hash visualizations as well as one of the more distinguishing features of identicons:

humans are very good at identifying geometrical shapes, patterns, and colors, and they can compare two images efficiently

Hash Visualization: a New Technique to improve Real-World Security

Can I do better?

No. Probably not if I had to work under the constraint that these hash visualizations need to work everywhere OpenSSH works. OpenSSH is an amazing piece of software and they are solving Hard™ problems while folks like me write blogs about ASCII art.

Donate to them now…I’ll wait.

For the purposes of differentiation, under the constraint that it must work on all terminals, I like the current solution. My liking the solution didn’t, of course, stop me from fiddling with the existing solution.

Add color

The first stab I took at this was to add color. As an initial try I simply used the first 6 hex digits of the md5 checksum, converted those to rgb, and converted those to true-color ansi output:

def to_rgb(color):
    return int(color[:2], 16), int(color[2:4], 16), int(color[4:], 16)


def to_ansi_rgb(color):
    r, g, b = to_rgb(color)
    return '\x1b[38;2;{};{};{}m'.format(r, g, b)

This actually makes a huge difference in my ability to quickly differentiate between on key and another:

b2c72a77843a629756d0954963fd5d2b
+-----------------+
|       .++       |
|       .+..     .|
|     . .   . . ..|
|    . .     .E.. |
|     ...S     .  |
|      o+.        |
|     +..o        |
|  o B .o.        |
| . + +..         |
+-----------------+

vs

161bc25962da8fed6d2f59922fb642aa
+-----------------+
|      o .        |
|     = +         |
|    . = o        |
|       = +       |
|      . S  .     |
|       o..o .    |
|       o. o=     |
|      . ..=..    |
|    E.   o.+.    |
+-----------------+

Becomes purple-ish vs green, which is easy for me, but probably less easy for a sizable portion of the population. Further, while this solution works in terminals that support true-color output, I didn’t even take the time to make it fail gracefully to 8-bit color. Of course, with 8-bit color there are fewer means to differentiate via color. While color is visually distinctive to me, it is likely infeasible to implement, or unhelpful to a large portion of the population. Fair enough.

Different Charsets

Unicode encodes some interesting characters to represent slime nowadays. There is the block element set, the drawing set, and even a micellaneous symbols set. I’m on linux so obviously some of these sets just look like empty boxes to me. But in my slime-bishop repo I did add support for a few symbols.

Oddly, I don’t think the symbols add too much to help differentiation.

b2c72a77843a629756d0954963fd5d2b
+-----------------+
|       ░▓▓       |
|       ░▓░░     ░|
|     ░ ░   ░ ░ ░░|
|    ░ ░     ░E░░ |
|     ░░░S     ░  |
|      ▒▓░        |
|     ▓░░▒        |
|  ▒ ▆ ░▒░        |
| ░ ▓ ▓░░         |
+-----------------+

I think the problem of memory persists (sensible-chuckle.gif). We’re good at remembering regular pictures, but do abstract pictures count as regular?

Final thoughts

Visual hash algorithms are a novel concept, and may become more useful over time. At the very least the ability to quickly reject that two keys match at a glance is a step in the right direction. The drunken bishop is certainly a fun algorithm to try to improve (while paying absolutely no attention to reasonable constrains like “terminal background color” or “POSIX compatibility”). The drunken slimy worm bishop, with their coins and/or their slime trails is certainly useful for differentiation (just like identicons), but it is not useful for identification. I hope that the distinction between differentiation and identification is clear for users of OpenSSH, but I’m not entirely sure that it is.

Improving the usability of security tooling is as important as it’s ever been; however, the corollary of, “security at the expense of usability comes at the expense of security.” is: usability that creates a false sense of security is at the expense of security. And we must remain mindful that we are not providing usability that is meaningless or (worse) usability that widens the attack surface without providing any real security benefit. Part of that comes from a shared understanding of the available features in the tools we already collectively use. In this way we can build on the work of one another, each bringing a unique perspective, and ultimately, maybe (just maybe) create tools that are usable and secure.