Maybe Not: notes
Thoughts on the shape and optionality of data.
After a recent rewatch of the talk Maybe Not by Rich Hickey I decided to share some of its messages.
I will lift some “quotes” from the talk and add my take. I also briefly discuss how a case I encountered at work matches what Hickey presents.
The main point of the talk is that the shape of data and the provision of data are different concepts. “A name is a string” is a separate idea from “A name is required”.
“If type systems make you jam those two things together, they’re wrong”, Hickey says, referring to type systems with Maybe types (sometimes called Option types), hence the title of the talk.
He also phrases this sentiment in a way that can be more easily used as general wisdom: “Like all design things, this is just what was wrong: two things were combined that shouldn’t have been combined.”
Despite its title, the talk isn’t about the Maybe type being bad, although some good-natured shots are fired. Rather, Hickey uses one particular way of using Maybe to illustrate the main topic of the talk: the friction caused by combining shape and provision, and possible solutions.
We end with two more minor topics that Hickey mentions: the ‘openness’ of specifications, and the fact that types give useful information but not all the information you want.
An important piece of context is that Hickey talks about software of a particular nature:
“Part of the idea of [my solution] is that you can make systems that you can change - that you can enhance - over time. That is the game. Saying today that you can do X or Y, that is not enough. Every program changes. Every program grows. You need the ability to talk about type-like things in ways that are compatible with program evolution.”
In my experience this is very applicable to web development. I worked as a server developer for a start-up, a scale-up and an established corporation (alas, all different companies - it would have been a nice story arc). In all three environments new and not entirely compatible requirements were commonplace.
I can imagine the degree of evolution is noticeably less for OS-related or embedded software.
Let’s say we’re building a web server that deals with users and shipping goods to them. We’re going to look at two use-cases out of many more. Here’s what they need to know about a user:
- create_user requires name and email; address can be provided but is not required.
- ship requires only name and address.
Typically (combining shape and optionality), there are two choices. We can either make one specification that accommodates all use-cases, or a specification for each use-case.
One spec to rule them all:
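A minimal Python sketch of this option, assuming dataclasses stand in for the spec. The field types and function bodies are illustrative assumptions, not taken from the talk:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    # Shape and optionality are declared together, once, for every use-case.
    name: str
    email: str
    address: Optional[str] = None  # optional, because create_user allows it to be absent


def create_user(user: User) -> str:
    # Hypothetical body: everything create_user needs is guaranteed by User.
    return f"created {user.name} <{user.email}>"


def ship(user: User) -> str:
    # User permits a missing address, so ship has to re-check at runtime.
    if user.address is None:
        raise ValueError("ship requires an address")
    return f"shipping to {user.name} at {user.address}"
```

Note that ship also has to be handed an email it never uses, simply because the shared spec demands one.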
or one for each use-case:
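Again as a rough Python sketch, with hypothetical class names and function bodies:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CreateUserInput:
    name: str
    email: str
    address: Optional[str] = None


@dataclass
class ShipInput:
    # Repeats name and address; nothing ties these fields to CreateUserInput.
    name: str
    address: str


def create_user(user: CreateUserInput) -> str:
    return f"created {user.name} <{user.email}>"


def ship(user: ShipInput) -> str:
    # No runtime check needed: ShipInput guarantees an address.
    return f"shipping to {user.name} at {user.address}"
```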
In the first case, we have code reuse and can see that both create_user and ship want to talk about a User. However, there is friction. ship requires an address, but for create_user address is optional. To accommodate create_user, the User description is no longer a nice description of the data that ship needs.
In the second case both create_user and ship have fitting descriptions of the data they need. However, we have some duplication (users have names), which in larger descriptions becomes error-prone. Also, the communication that both functionalities share a common interest, Users, is much weaker.
“Not being able to reuse, it’s a recipe for error. If you have to define car and I have to define car, well, maybe you’ll call it ‘make’ and ‘model’ and I call it ‘brand’ and ‘model’. And now we’ve got no connection where we absolutely should have had a connection.”
The problem is that the shape of data is general, but which parts of the data are required changes per use-case. “Optionality is context-dependent.” In both examples shape and optionality are defined together, causing friction.
If we could somehow generally describe a User and per case put a ‘requirement mask’ over it, it could look something like this:
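One way to sketch this in Python, assuming plain dicts for the data and a hypothetical `check` helper (neither comes from the talk):

```python
# Shape: described once, with no statement about what is required.
USER_SHAPE = {"name": str, "email": str, "address": str}

# Requirement masks: one per use-case, referring to fields of the shape.
CREATE_USER_REQUIRES = {"name", "email"}
SHIP_REQUIRES = {"name", "address"}


def check(data, shape, required):
    """Return a list of problems; an empty list means data is acceptable."""
    unknown = set(required) - set(shape)
    if unknown:
        # A mask that names a field the shape does not know is a spec error.
        raise KeyError(f"mask refers to unknown fields: {sorted(unknown)}")
    problems = [f"missing required field: {f}" for f in sorted(required) if f not in data]
    problems += [
        f"wrong type for field: {f}"
        for f, expected in shape.items()
        if f in data and not isinstance(data[f], expected)
    ]
    return problems
```

A user with only a name and an email passes the `CREATE_USER_REQUIRES` mask, while the same data checked against `SHIP_REQUIRES` is reported as missing an address. Keys the shape doesn't mention are simply ignored, and a mask naming a field that the shape doesn't have raises a `KeyError`.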
Now it is clear that both functionalities talk about the same type of data, Users, while also specifying exactly what they need.
The duplication is also gone. Sure, name is repeated a couple of times, but we can no longer make mismatching mistakes. Let’s say we want to change name to full_name. All instances of name have to be replaced with full_name at the same time, or we would get an error.
To actually implement this separation of shape and provision you might have to get a little creative, depending on the language you’re using.
Here’s an example that’s not directly related to programming.
I once worked for a big player in the medical domain. Besides engineering, I was part of a team that specialized in modeling medical data using the FHIR industry standard. The second major purpose of this team was to represent our company in the ongoing development of FHIR, a collaboration of many organizations from all over the world.
FHIR offers an extensive collection of ‘resources’, the building blocks to model with. MedicationRequest is such a resource. Among other things, the specification defines a resource’s fields, including their types and cardinalities.
One of MedicationRequest’s fields is doNotPerform of type boolean. Its cardinality is 0..1, zero or one, meaning optional. Any cardinality that starts at zero, like 0..*, specifies optionality. Shape and optionality combined.
Different organizations contribute to refining these resources by sharing and discussing use-cases for them. Fields, types and cardinalities are adjusted to meet the needs.
Medicine is practised differently in different parts of the world. The only way to have a resource accommodate all (or at least many) use-cases is to weaken restrictions, for example by making things optional.
After a while, for certain use-cases resources were found to not be specific enough. For the particular case of requesting medication in the United States the spec for the MedicationRequest resource had to be more constrained to be of proper use.
To overcome this friction FHIR introduced the notion of a profile: a set of constraints on a resource. US Core is an example of a profile for MedicationRequest.
Resources are similar to the User: name, email, address (optional) of before. The specification, including both shape and optionality, was made to accommodate so many different use-cases that for particular use-cases the spec is no longer specific enough.
A profile is similar to the ‘requirement mask’ of before. Making shape and optionality more orthogonal decreased friction.
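To make the analogy concrete, here is a small Python sketch of a profile as a set of cardinality constraints layered over a resource. The field subset, the example profile and the helper names are illustrative assumptions; they are not the actual US Core definition or a FHIR API:

```python
import math

# Simplified, illustrative subset of a resource: field -> (min, max) cardinality.
BASE_RESOURCE = {
    "doNotPerform": (0, 1),
    "note": (0, math.inf),  # 0..*
}

# Hypothetical profile: it only states the constraints it adds.
EXAMPLE_PROFILE = {"doNotPerform": (1, 1)}


def apply_profile(base, profile):
    """Return the base cardinalities with the profile's constraints applied."""
    constrained = dict(base)
    for field, (lo, hi) in profile.items():
        base_lo, base_hi = base[field]
        # A profile may narrow a cardinality, never widen it.
        if lo < base_lo or hi > base_hi or lo > hi:
            raise ValueError(f"{field}: {lo}..{hi} does not fit within {base_lo}..{base_hi}")
        constrained[field] = (lo, hi)
    return constrained


def is_required(cardinalities, field):
    # A field is required when its minimum cardinality is at least one.
    return cardinalities[field][0] >= 1
```

The resource keeps describing the shape for everyone, and each profile only says what one context additionally requires.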
Hickey mentions that a specification should only dictate what should be there, not what should not be there:
“I have an expectation of data I want to see. You could give me more, but I won’t care. Maybe my job is to pass things along to somebody else who might need it. I think that’s an important part of making flexible systems.”
“These are minimal requirements. I am not going to help you write closed, brittle, breaking systems. This is about minimal requirements, minimal provision. This is not a boundary around things.”
This aligns well with what is part of the Unix philosophy, as Doug McIlroy wrote in 1978: “Expect the output of every program to become the input to another, as yet unknown, program.”
Being open might be more strenuous than you think if you work with classes. Inheritance has its limits.
Strictly using classes, I can never just pass some information to a third-party library to enrich it for me. I have to first mold my data into a class that the library understands, because for sure it won’t be able to handle my proprietary classes. And afterwards I have to extract data from whatever class the library returned and mold it back into my proprietary classes.
If you work with open maps, you can just throw a map into the library, which will add some data to it and pass it right back to you seamlessly. Admittedly, this comes at the cost of having to check at runtime that everything is still OK.
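A small Python sketch of that round trip, using plain dicts as open maps. `enrich_with_domain` is a hypothetical stand-in for a third-party function, and `require_keys` is an assumed helper for the runtime check:

```python
def enrich_with_domain(record: dict) -> dict:
    # Stand-in for a third-party function (hypothetical): it reads the one key
    # it cares about, adds a key, and passes everything else along untouched.
    domain = record["email"].split("@", 1)[1]
    return {**record, "email_domain": domain}


def require_keys(record: dict, keys: set) -> dict:
    # The runtime check we pay for openness: verify what we rely on is present.
    missing = keys - set(record)
    if missing:
        raise KeyError(f"missing keys: {sorted(missing)}")
    return record


user = {"name": "Ada", "email": "ada@example.com", "loyalty_tier": "gold"}
enriched = require_keys(enrich_with_domain(user), {"name", "email", "email_domain"})
```

The `loyalty_tier` key, which the enriching function knows nothing about, survives the trip without any conversion code.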
To stay in the spirit of passing output to the next program, let’s look at another example of friction from combining shape and optionality. Hickey mentions this is also common in an information pipeline, a sequence of processes that build towards a final result.
Let’s say the final result consists of an a, a b and a c:
result: a, b, c
We start with nothing and three steps each add one element. a, b and c can be obtained independently so the order of the steps doesn’t really matter. What does each step require and what does each step provide?
If we do one spec for all, we get something like this:
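As a Python sketch, assuming integer values and hypothetical step functions add_a, add_b and add_c:

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Result:
    # Every field must be optional, because any step may run first.
    a: Optional[int] = None
    b: Optional[int] = None
    c: Optional[int] = None


# Each step requires a Result and provides a Result.
def add_a(r: Result) -> Result:
    return replace(r, a=1)


def add_b(r: Result) -> Result:
    return replace(r, b=2)


def add_c(r: Result) -> Result:
    return replace(r, c=3)
```

The steps can run in any order, but the type of the final value still claims that a, b and c might all be missing.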
Now the spec for result has become pretty useless. Nothing is guaranteed to be there, but at the end we want everything to be there.
We can also do one spec per step:
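Sketched in Python with hypothetical intermediate types, one per stage of the pipeline:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nothing:
    pass


@dataclass(frozen=True)
class HasA:
    a: int


@dataclass(frozen=True)
class HasAB:
    a: int
    b: int


@dataclass(frozen=True)
class Result:
    a: int
    b: int
    c: int


# Each signature states exactly what the step requires and provides.
def add_a(x: Nothing) -> HasA:
    return HasA(a=1)


def add_b(x: HasA) -> HasAB:
    return HasAB(a=x.a, b=2)


def add_c(x: HasAB) -> Result:
    return Result(a=x.a, b=x.b, c=3)
```

Every value is now precisely described, but only the composition `add_c(add_b(add_a(Nothing())))` fits the signatures.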
Although originally the steps were independent of each other, what each step is specified to require and provide now locks them into this particular order. Incidental complexity, yikes!
Types are not enough
Hickey mentions something briefly, which I’m taking as an opportunity to promote testing. He says that although types provide information, they don’t always provide all the information you want. You need to do more.
As a specific example he uses the function reverse. Its signature may indicate that reverse takes a list of As and returns a list of As. Alright, but what does reverse actually do? What are its semantics? The types alone aren’t enough to convey this.
Sometimes you can model semantics with types to help your compiler check for errors. For example, instead of using a primitive number type for temperature measurements, introduce separate Celsius and Fahrenheit types to prevent laboratories from blowing up.
However, for functions, expressing semantics as types isn’t so easy.
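A minimal Python sketch of the temperature idea. Python does not enforce annotations at runtime, so this assumes a static checker such as mypy does the compiler’s job; `is_safe` and its threshold are made up for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Celsius:
    degrees: float


@dataclass(frozen=True)
class Fahrenheit:
    degrees: float


def to_celsius(t: Fahrenheit) -> Celsius:
    return Celsius((t.degrees - 32) * 5 / 9)


def is_safe(t: Celsius) -> bool:
    # Hypothetical safety threshold, for illustration only.
    return t.degrees < 100.0
```

Passing a `Fahrenheit` value directly to `is_safe` is something a type checker can flag, whereas with bare floats the mix-up would go unnoticed.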
A simple way to succinctly express a function’s semantics is with testing. Tests are documentation! Seeing reverse([1, 2, 3]) == [3, 2, 1] makes things immediately obvious.
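As a small Python sketch: `identity` below is an assumed counterexample that shares the signature of reverse, so only the test tells the two apart.

```python
from typing import List, TypeVar

A = TypeVar("A")


def reverse(items: List[A]) -> List[A]:
    # Return a new list with the elements in opposite order.
    return items[::-1]


def identity(items: List[A]) -> List[A]:
    # Same signature as reverse, entirely different semantics.
    return list(items)


def test_reverse() -> None:
    assert reverse([1, 2, 3]) == [3, 2, 1]
    assert reverse([]) == []
    # Reversing twice gives back the original list.
    assert reverse(reverse([1, 2, 3])) == [1, 2, 3]
```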