The value of post-mortems is apparent: failures present opportunities to learn about unexpected behaviors of the system, and learning lets us make improvements to the system’s reliability.
The value of post-mortem documents is much less apparent.
Many R&D orgs will insist that the final draft of a post-mortem document have a particular structure. Common components of this structure include:
- Start and end time of customer impact
- Time to detection
- Time to diagnosis
- Time to recovery
- A list of action items, each with a link to a ticket
- Mitigation action items broken down into categories (e.g. “Prevention,” “Detection,” “Impact”)
- Specific sections (e.g. “Timeline,” “What went wrong?,” “What can we do better?”)
- Signoffs/approvals
None of these structural requirements facilitate learning. The benefits of post-mortem analysis come not from the document, but rather from the sense-making processes of which the document is an artifact. In order to understand a given failure, we invent hypotheses and test them against our mental model and the observable facts. In order to choose effective strategies for improvement, we converse and debate. And in order to make any of this matter, we establish accountability for seeing those strategies through.
These social processes are the source of the value of post-mortem analysis. The document is just a souvenir.
But what if you want to do meta-analysis? What if you want to analyze trends in incident characteristics over time, or categorize incidents according to some scheme? Don’t you need structure then?
I suppose you do. But good luck getting any useful information. No matter how much structure you insist on, the data set will be hopelessly noisy. Just try to state a hypothesis that can realistically be tested by a meta-analysis of a single organization’s incident data. I don’t think you can.
But what if structure helps me learn?
If structuring the post-mortem process helps you learn, then by all means: categorize! Prompt! But recognize structure as a tool rather than an end in itself. Your learning process may benefit from one kind of structure, while somebody else’s may benefit from a different kind, or from more or less structure altogether. What matters is the structure of the learning, not the structure of the document.
Organizational legibility
Requiring post-mortem documents to have a specific, consistent structure doesn’t help us learn or improve. So why do we require it?
If you ask me, it’s all about the bureaucratic drive for legibility. Centralized power craves legibility.
Idiosyncratic processes like sense-making and learning are illegible to a command-and-control power structure. They come in diverse and intricate forms, instead of the standardized, codified forms that bureaucracy can parse. In service of legibility, a company’s power structure will insist that the post-mortem process culminate in spreadsheet-ready data items such as “customer impact duration,” “time to recovery,” and “severity level.” Centralized power demands these simplifications even if they inhibit learning and improvement. To the bureaucracy, legibility itself is the goal.
As an employee, you probably can’t totally disregard these bureaucratic impositions. But, to the extent you can disregard them, you should. Focus on what really matters: learning and improvement.