Managing entropy in complex systems: the Maxwell’s Demon approach


Take a nontrivial software system and put it on the internet. Problems will emerge. Some problems will be serious; others less so. We won’t notice most of them.

A software system in production is a bucket filled with fluid. Each particle of the fluid is a discrete problem. The problems bounce around and collide with each other and do all kinds of stochastic stuff from moment to moment.

At the very bottom of the bucket are problems so minute that they can hardly be called problems at all. They have low energy. They don’t interact much with each other or with anything else.

Higher up, you find higher-and-higher-energy particles. Problems that cause small hiccups, or sporadic bouts of sluggishness.

Somewhere near the top, there’s a threshold. When a problem gets enough energy to cross this threshold, we passively notice it. Maybe it causes an outage, or maybe it just causes a false positive alert. Maybe a support ticket gets filed. Maybe it’s just a weird spike in a graph. However we perceive it, we’re forced to take it seriously.

Once problems get enough energy, we can’t help but notice them. But before that, they already exist.

What happens before a particle jumps this energy threshold?

Perhaps the problem is entirely novel – no part of it existed before now. A code deploy with a totally self-contained bug. A DOS attack. If it’s something like that: oh well.

But more often, a problem we just perceived has been acted upon by a more gradual process. Problems bounce around in the bucket, and occasionally they bounce into each other and you get a problem with higher energy than before. Or circumstances shift, and a problem that was once no big deal becomes a big deal. Over time, particles that started in the middle – or even at the bottom – can work their way up to the passive perception line.

If problems usually hang out below the perception threshold for a while before they cross it, then we can take advantage of that in two ways. One way is to lower the threshold for passive perception. Raise the sensitivity of our monitors without sacrificing specificity. This is hard, but worthwhile.

The other way to take advantage of the fluid-like behavior of problems is to spend energy finding and fixing problems before they boil. I call this the Maxwell’s demon approach. You go looking for trouble. You poke around in dashboards and traces and logs, find things that look weird, turn them around in your hands until you understand them, and ultimately fix them. Maybe you have a ticket backlog of possible problems you’ve found, and it’s somebody’s job to burn down that backlog. Ideally it’s the job of a team using a shared-context system like differential diagnosis.

If you make it somebody’s job to be Maxwell’s demon, you can find and fix all sorts of problems before they become bigger problems. If you don’t make it someone’s job, then no problem will get taken seriously until it’s an outage.

8 thoughts on “Managing entropy in complex systems: the Maxwell’s Demon approach

  1. Dean Kekoolani

    Dan,

    Thanks so much. Everything in your blog is amazing. I am using IMPULSE as a framework to dampen panic and anxiety; it’s helping me aim my talent (which is considerable) long enough to complete.

    I like your simple approach to keeping the train on its tracks.

    Thank you thank you thank you.

    Dean

  2. Pingback: Big Problems and Small Problems under load – Dan Slimmon

  3. Pingback: Descriptive engineering: not just for post-mortems – Dan Slimmon

  4. Pingback: Descriptive engineering: not just for post-mortems - My Blog

  5. Pingback: Upstream-local queueing – Dan Slimmon

  6. Pingback: Why I only page on downtime. ONLY. – Dan Slimmon

  7. Pingback: Huh! as a signal – Dan Slimmon

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s