Ideas are funny things. It can take hours or days or months of noodling on a concept before you’re even able to start putting your thoughts into a shape that others will understand. And by then, you’ve explored the contours of the problem space enough that the end result of your noodling doesn’t seem interesting anymore: it seems obvious.
It’s a lot easier to feel like you’re doing real work when you’re writing code or collecting data or fixing firewall rules. These are tasks with a visible, concrete output that correlates roughly with the amount of time you put into them. When you finish writing a script, you can hold it up to your colleagues and they can see how much effort you put into it.
But as you get into more senior-type engineering roles, your most valuable contributions start to take the form not of concrete labor, but of conceptual labor. You’re able to draw on a rich mental library of abstractions, synthesizing and analyzing concepts in a way that only someone with your experience can do.
One of the central challenges in growing into a senior engineer consists in recognizing and acknowledging the value of the conceptual labor. We have to learn to identify and discard discouraging self-talk, like:
Time spent thinking is time spent not doing. When you’re a senior engineer, thinking is often the most valuable kind of doing you can do.
It’s not worth telling anyone about this connection I thought of. As a senior engineer, you will make conceptual connections that no one else will make. If they seem worthwhile to you, then why shouldn’t it be worthwhile to communicate them to others?
My coworkers are all grinding through difficult work and I’m just over here ruminating. Working with concepts is difficult work, and it’s work whose outcome can be immensely beneficial to your team and your org, but only if you follow through with it. Don’t talk yourself down from it.
Perhaps for the first time in your career, one of the skills you have to develop is faith: faith in the value of your conceptual labor. When an idea looks important to you, have faith that your experience is leading you down a fruitful path. Find that idea’s shape, see how it fits into your cosmos, and put that new knowledge – knowledge that you created by the sweat of your own brow – out into the world.
Imagine you’re an extremely bad doctor. Actually, chances are you don’t even have to imagine. Most people are extremely bad doctors.
But imagine you’re a bad doctor with a breathtakingly thorough knowledge of the human body. You can recite chapter and verse of your anatomy and physiology textbooks, and you’re always up to date on the most important research going on in your field. So what makes you a bad doctor? Well, you never order tests for your patients.
What good does your virtually limitless store of medical knowledge do you? None at all. Without data from real tests, you’ll almost never pick the right interventions for your patients. Every choice you make will be a guess.
There’s another way to be an extremely bad doctor, though. Imagine you don’t really know anything about how the human body works. But you do have access to lots of fancy testing equipment. When a patient comes in complaining of abdominal pain and nausea, you order as many tests as you can think of, hoping that one of them will tell you what’s up.
This rarely works. Most tests just give you a bunch of numbers. Some of those numbers may be outside of normal ranges, but without a coherent understanding of how people’s bodies behave, you have no way to put those numbers into context with each other. They’re just data – not information.
In medicine, data is useless without theory, and theory is useless without data. Why would we expect things to be any different in software?
Observability as signal and theory
The word “observability” gets thrown around a lot, especially in DevOps and SRE circles. Everybody wants to build observable systems, then make their systems more observable, and then get some observability into their observability so they can observe while they observe.
But when we look for concrete things we can do to increase observability, it almost always comes down to adding data. More metrics, more logs, more spans, more alerts. Always more. This makes us like the doctor with all the tests in the world but no bigger picture to fit their tests results into.
Observability is not just data. Observability comprises two interrelated and necessary properties: signal and theory. The relationship between these two properties is as follows:
Signal emerges from data when we interpret it within our theory about the system’s behavior.
Theory reacts to signal, changing and adapting as we use it to process new information.
In other words, you can’t have observability without both a rich vein of data and a theory within which that data can be refined into signal. Not enough data and your theory can’t do its job; not enough theory and your data is meaningless. Theory is the alchemy that turns data into knowledge.
What does this mean concretely?
It’s all well and good to have a definition of observability that looks nice on a cocktail napkin. But what can we do with it? How does this help us be better at our job?
The main takeaway from the understanding that observability consists of a relationship between data and theory, rather than simply a surfeit of the former, is this: a system’s observability may be constrained by deficiencies in either the data stream or our theory. This insight allows us to make better decisions when promoting observability.
Making better graph dashboards
However many graphs it contains, a metric dashboard only contributes to observability if its reader can interpret the curves they’re seeing within a theory of the system under study. We can facilitate this through many interventions, a few of which are to:
Add a note panel to the top of every dashboard which give an overview of how that dashboard’s graphs are expected to relate to one another.
Add links to dashboards for upstream and downstream services, so that data on the dashboard can be interpreted in a meaningful context.
When building a dashboard, start with a set of questions you want to answer about a system’s behavior, and then choose where and how to add instrumentation; not the other way around.
Making better alerts
Alerts are another form of data that we tend to care about. And like all data, they can only be transmogrified into signal by being interpreted within a theory. To guide this transmogrification, we can:
Present alerts along with links to corresponding runbooks or graph dashboards.
Document a set of alerts that, according to our theory, provides sufficient coverage of the health of the system.
Delete any alerts whose relevance to our theory can’t be explained succinctly.
Engaging in more effective incident response
When there’s an urgent issue with a system, an intuitive understanding of the system’s behavior is indispensable to the problem solving process. That means we depend on the system’s observability. The incident response team’s common ground is their theory of the system’s behavior – in order to make troubleshooting observations meaningful, that theory needs to be kept up to date with the data.
To maintain common ground over the course of incident response, we can:
Engage in a regular, structured sync conversation about the meaning of new data and the next steps.
Seek out data only when you can explicitly state how the data will relate to our theory (e.g. “I’m going to compare these new log entries with the contents of such-and-such database table because I think the latest deploy might have caused an inconsistency”).
Maintain an up-to-date, explicit record of the current state of problem solving, and treat it as the ultimate source of truth.
Data is just data until theory makes it signal.
The next time you need to build an observable system, or make a system more observable, take the time to consider not just what data the system produces, but how to surface a coherent theory of the system’s workings. Remember that observability is about delivering meaning, not just data.