The paradox of the bloated backlog

I want to point out a paradox you may not have noticed.

A team of software engineers or SREs invariably has more good ideas than time. We know this very well. Pick any system we own, and we’ll come up with a list of 10 things we could do to make it better. Off the top of our head.

On the other hand, when our team is confronted with the opportunity to purge some old features or enhancements out of the backlog, there’s resistance. We say, “we might get around to this some day,” or “this still needs to get done.”

These two beliefs, taken together, reveal a deep lack of team self-confidence.

If our team always has more good ideas than time, then we’re never going to implement all the good ideas in our backlog. If we add more people, we’ll just get more good ideas, and the backlog will just get more bloated.

Why are we reluctant to ruthlessly remove old tickets, then? We know that we’re constantly generating good ideas. In fact, the ideas we’re generating today are necessarily (on average) better than most of the ideas in the backlog:

  1. We have more information about the system now then we used to, so our new ideas are more aligned with real-world facts, and
  2. We have more experience as engineers now, so we have developed better intuition about what kind of interventions will create the most value.

Seen in this light, a hesitance to let go of old ideas is revealed as a symptom of a deep pathology. It means we don’t believe in our own creativity and agency. If we did, we would have easy answers to the questions we always ask when we consider closing old tickets:

  • What if we forget about this idea? That’s okay. We’ll have plenty of other, better, more relevant ideas. We never stop generating them.
  • What if this problem gets worse over time? If the risk is enough that we should prioritize this ticket over our other work, then let’s do that now. Otherwise, we can cross that bridge if we ever get to it.
  • Will the reporter of this ticket even let us close it? Nobody owns our backlog but us. When we decide to close a ticket, all we owe the reporter is an honest explanation of why.

Leave behind the paradox of the bloated backlog and start believing in your team’s own agency and creativity. Hell, maybe even cap your backlog. A team with faith in its competence is a team unleashed.

Latency- and Throughput-Optimized Clusters Under Load

It’s good to have accurate and thorough metrics for our systems. But that’s only where observability starts. In order to get value out of metrics, we have to focus on the right ones: the ones that tell us about outcomes that matter to our users.

In The Latency/Throughput Tradeoff, we talked about why a given cluster can’t be optimized for low latency and high throughput at the same time. In conclusion, we decided that separate clusters should be provisioned for latency-centric use cases and throughput-centric use cases. And since these different clusters are optimized to provide different outcomes, we’ll need to interpret their metrics in accordingly different ways.

Let’s consider an imaginary graph dashboard for each type of cluster. We’ll walk through the relationships between metrics in each cluster, both under normal conditions and under excessive load. And then we’ll wrap up with some ideas about evaluating the capacity of each cluster over the longer term.

Metrics for comparison

In order to contrast our two types of clusters, we’ll need some common metrics. I like to employ the USE metrics at the top of my graph dashboards (more on that in a future post). So let’s use those:

  • Utilization: The total CPU usage for all hosts in the cluster.
  • Saturation: The number of queued requests.
  • Error rate: The rate at which lines are being added to hosts’ error logs across the cluster.

In addition to these three metrics, we want to see a fourth metric. For the latency-optimized cluster, we want to see latency. And for the throughput-optimized cluster, we want to see throughput.

One of the best things about splitting out latency-optimized and throughput-optimized clusters is that the relationships between metrics become much clearer. It’s hard to tell when something’s wrong if you’re not sure what kind of work your cluster is supposed to be doing at any given moment. But separation of concerns allows us to develop intuition about our systems’ behavior and stop groping around in the dark.

Latency-optimized cluster

Let’s look at the relationships between these four important metrics in a latency-optimized cluster under normal, healthy conditions:

2019-03-11 blog charts latency-oriented healthyUtilization will vary over time depending on how many requests are in flight. Saturation, however, should stay at zero. Any time a request is queued, we take a latency hit.\

Error rate would ideally be zero, but come on. We should expect it to be correlated to utilization, following the mental model that all requests carry the same probability of throwing an error.

Latency, then – the metric this cluster is optimized for – should be constant. It should not be affected by utilization. If latency is correlated to utilization, then that’s a bug. There will always be some wiggle in the long tail (the high percentiles), but for a large enough workload, the median and the 10th percentile should pretty much stay put.

Now let’s see what happens when the cluster is overloaded:

2019-03-12 blog charts latency-oriented loaded

We start to see plateaus in utilization, or at least wider, flatter peaks. This means that the system is low on “slack”: the idle resources that it should always have available to serve new requests. When all resources are busy, saturation rises above zero.

Error rate may still be correlated with utilization, or it may start to do wacky things. That depends on the specifics of our application. In a latency-optimized cluster, saturation is always pathological, so it shouldn’t be surprising to see error rates climb or spike when saturation rises above zero.

Finally, we start to see consistent upward trends in latency. The higher percentiles are affected first and most dramatically. Then, as saturation rises even higher, we can see the median rise too.

Throughput-optimized cluster

The behavior of our throughput-optimized cluster, on the other hand, is pretty different. When it’s healthy, it looks like this:

2019-03-22 blog charts throughput-oriented healthy.png

Utilization – which, remember, we’re measuring via CPU usage – is no longer a fluffy cloud. Instead, it’s a series of trapezoids. When no job is in progress, utilization is at zero. When a job starts, utilization climbs up to 100% (or as close to 100% as reality allows it to get), and then stays there until the job is almost done. Eventually, the number of active tasks drops below the number of available processors, and utilization trails off back to zero.

Saturation (the number of queued tasks) follows more of a sawtooth pattern. A client puts a ton of jobs into the system, bringing utilization up to its plateau, and then as we grind through jobs, we see saturation slowly decline. When saturation reaches zero, utilization starts to drop.

Unlike that of a latency-optimized cluster, the error rate in a throughput-optimized cluster shouldn’t be sensitive to saturation. Nonzero saturation is an expected and desired condition here. Error rate is, however, expected to be follow utilization. If it bumps around at all, it should plateau, not sawtooth.

And finally, in the spot where the other cluster’s dashboard had a latency graph, we now have throughput. This we should measure in requests per second per job queued! What our customers really care about is not how many requests per second our cluster is processing, but how many of their requests per second. We’ll see why this matters so much in a bit, when we talk about this cluster’s behavior under excessive load.

Throughput should be tightly correlated to utilization and nothing else. If it also seems to exhibit a negative correlation to saturation, then that’s worth looking into: it could mean that queue management is inappropriately coupled to job processing.

Now what if our throughput-optimized cluster starts to get too loaded? What does that look like?

2019-03-23 blog charts throughput-oriented loaded

Utilization by itself doesn’t tell us anything about the cluster’s health. Qualitatively, it looks just like before: big wide trapezoids. Sure, there a bit wider than they were before, but we probably won’t notice that – especially since it’s only on average that they’re wider.

Saturation is where we really start to see the cracks. Instead of the sawtooth pattern we had before, we start to get more sawteeth-upon-sawteeth. These are jobs starting up on top of jobs that are already running. A few of these are to be expected even under healthy circumstances, but if they become more frequent, it’s an indication that there may be too much work for the cluster to handle without violating throughput SLOs.

Error rate may not budge much in the face of higher load. After all, this cluster is supposed to hold on to big queues of requests. As far as it’s concerned, nothing is wrong with that.

And that’s why we need to measure throughput in requests per second per job queued. If all we looked at was requests per second, everything would look hunky dory. But when there are two jobs queued and they’re constrained to use the same pool of processors, somebody’s throughput is going to suffer. On this throughput graph, we can see that happen right before our eyes.

Longer-term metrics

Already, we’ve seen how decoupling these two clusters gives us a much clearer mental model of their expected behavior. As we get used to the relationships between the metrics we’ve surfaced, we’ll start to build intuition. That’s huge.

But we don’t have to stop there! Having a theory (which is what we built above) also allows us to reason about long-term changes in the observable properties of these clusters as their workloads shift.

Take the throughput-optimized cluster, for instance. As our customers place more and more demand on it, we expect to see more sawteeth-upon sawtooth and longer intervals between periods of zero saturation. This latter observation is key, since wait times grow asymptotically as utilization approaches 100%. So, if we’re going to want to do evaluate the capacity needs of our throughput-optimized cluster, we should start producing these metrics on day one and put them on a dashboard:

  • Number of jobs started when another job was already running, aggregated by day. This is the “sawteeth-upon-sawteeth” number.
  • Proportion of time spent with zero saturation, aggregated by day.
  • Median (or 90th percentile) time-to-first-execution for jobs, aggregated by day. By this I mean: for each customer job, how much time passed between the first request being enqueued and the first request starting to be processed. This, especially compared with the previous metric, will show us how much our system is allowing jobs to interfere with one another.

An analogous thought process will yield useful capacity evaluation metrics for the latency-optimized cluster.

TL;DR

UnsteadyBlueAardvark-size_restricted

Separating latency- and throughput-optimized workloads doesn’t just make it easier to optimize each. It carries the added benefit of making it easier to develop a theory about our system’s behavior. And when you have a theory that’s consistent with the signal and a signal that’s interpretable within your theory, you have observability.

The Latency/Throughput Tradeoff: Why Fast Services Are Slow And Vice Versa

Special thanks to the graceful and cunning Ben Ng for consulting on this post.

I’m finally getting around to reading that DevOps* book everybody’s been raving about, Site Reliability Engineering: How Google Runs Production Systems. My verdict so far: it’s pretty good.

Here’s one of the first passages to jump out to me, from Chapter 3: Embracing Risk:

The low-latency user wants Bigtable’s request queues to be (almost always) empty so that the system can process each outstanding request immediately upon arrival. (Indeed, inefficient queuing is often a cause of high tail latency.) The user concerned with offline analysis is more interested in system throughput, so that user wants request queues to never be empty. To optimize for throughput, the Bigtable system should never need to idle while waiting for its next request.

This is a profound and general insight. When I read this passage, my last decade of abject suffering suddenly came into focus for me.

When I say “abject suffering,” I’m of course talking about ElasticSearch administration. When a storage system like ElasticSearch has to serve both high-latency and high-throughput workloads, it is guaranteed to get ugly. This fact is super important, which is why I’m devoting this blog post to exploring the relationships among latency, throughput, and capacity from a queueing perspective. I hope I can make these relationships stick in your mind like they’ve stuck in mine.

* Go ahead. Tell me DevOps and SRE aren’t the same thing. I dare you.

The tradeoff between throughput and latency

Consider a service that responds to requests. As an example, let’s say it’s a service that takes as input a picture of a dog and returns a picture of that dog wearing a silly hat.

2019-02-25 dog wearing hat
Artist’s rendering

Like almost any service (exception: Tourbillon), our service can only handle a certan number of requests per second [to put hats on dogs (RPSTPHOD)]. We’ll call this number its capacity. If we have 200 processes devoted to dog-hatting, and dogs take on average 400 milliseconds to haberdash, then the theoretical capacity of the system is

(200) / (0.4s) = 500 hats per second

Now let’s consider the two types of users that depend on our service:

  • On-the-spot dog hatters. At any given time, these users have a single dog picture that requires a hat as soon as possible. Perhaps they’re using our service to support a website that generates a single dog-hat picture per page load, and they want their page to load quickly. These users are interested primarily in how quickly they can get a hat on a dog. In a word: latency.
  • Bulk dog-hatters. These users tend to have massive data sets that they want processed as quickly as possible. The most obvious example would be a law enforcement agency wanting to compare their large database of pet photos to surveillance footage of a particular dog robbing a bank while wearing a hat. Bulk dog-hatters care not about the latency of any individual dog-hatting, but about the throughput they can achieve. In other words: how close they can get to our service’s theoretical capacity of 500 hats per second.

But here’s the problem: no single cluster of dog-hatting servers can be optimal for both types of users. And the better we make the service for one kind of user, the worse we make it for the other.

The needs of on-the-spot users

In order to minimize latency for our on-the-spot users (without dropping any of their requests), we need to make sure that there’s always a processor idle when their request comes in. If we fail to make sure of this, then new requests will have to be queued while we wait for a spot to open up, thus inflating latency. The system needs some “slack.”

2019-02-25 blog charts low-latency

Since we need slack, we don’t ever want throughput to approach capacity. The closer we get to our system’s capacity, the more drastically latencies will balloon, like I talked about in this post.

The needs of bulk users

Our bulk dog-hatters, on the other hand, don’t care so much about request latency. Some of their individual requests might take seconds, or minutes, or even hours to complete. What they care about is how quickly our service can process their entire data set. In other words, they care about getting throughput as close as possible to capacity.

This means that, whenever a job is running, bulk dog-hatters want there to be (virtually) zero slack. Every processor should be active at all times. Consequently, our queue sizes will explode as soon as the job starts, and our queues will stay occupied until the job is almost done.

2019-02-25 blog charts high-throughput

In this case, we want our queues to be full whenever there’s a bulk job running. Anything else would give sub-optimal throughput.

Splitting the cluster up

The needs of on-the-spot and bulk users are incompatible. One group needs minimal latency, while the other group needs maximal throughput.

If both of these groups are using the same cluster, we’re going to have serious problems. On-the-spot users’ latencies will vary widely depending on whether there’s currently a bulk job in progress, and bulk users’ job times will vary depending on the number of on-the-spot users currently using the system. No matter how much we scale or tweak tuning parameters, neither group will get what they need. And what’s worse, we’ll be stuck in a perpetual tug-of-war between the priorities of these two groups.

So let’s split our cluster in two: a “low latency” cluster and a “high throughput” cluster. And let’s let our users pick the right one for their use case. This way, we’ll have much clearer expectations about the performance and scaling characteristics of our service, and we’ll avoid the frustrating priority tug-of-war that characterized our mixed-use cluster.

The split doesn’t have to be complete. Instead of having two wholly separate clusters, we could have some kind of load balancer that reserves a certain portion of our fleet for low-latency traffic and slots bulk jobs onto dedicated segments of the cluster. The details of every solution will vary. What matters is that on-the-spot and bulk dog-hatters aren’t drawing on the same pool of resources.

Once we do split up our cluster, then, what should we expect the performance characteristics of the new clusters to be? What will their graph dashboards look like when they’re healthy, or near capacity, or over capacity? In an upcoming post, I’ll use some more queueing reasoning to answer these questions. So get hype for that!

[UPDATE: It’s here!]

Let’s Stop Pretending Estimates Are Exact

Some days I feel like estimates and plans are worthless. Other days I find myself idly speculating that if only we were better at planning and estimating, everything would be beautiful.

In an engineering organization, we want realistic plans. Realistic plans and roadmaps will hypothetically help us utilize our capacity effectively and avoid overcommitting. And our plans are usually built on estimates; that’s why we believe that “correct” estimates are more likely to generate a realistic plan. Therefore, we believe that correct estimates are useful for planning.

On the other hand, if you’ve ever worked with engineers, you also believe that estimates are never correct. Something always gets in the way, or we pad the estimate out to avoid being held responsible for missing a deadline, or (occasionally) the task ends up being much simpler than we expected. The time we estimate is never the time it actually takes to do the job.C7ofJ_scaled

These two statements seem to be in contradiction, but they are not. There is a key to holding both these beliefs in your head at once. And I believe that if an entire engineering org were to grasp this key together, it would unlock enormous potential.

The key is to treat uncertainty as a first-class citizen.

Why This Is Hard

Developing organizational intuition in about estimates is really hard. It is in fact so hard that I’m not aware of any organization that has done it. I see two big reasons for this:

  1. We don’t have good awareness of the amount of uncertainty in our estimates.
  2. We pay lip service to the idea that estimates aren’t perfect, but we plan as if they are.

1. Uncalibrated estimates

When we give estimates, those estimates aren’t calibrated to a particular uncertainty level. Everybody has different levels of confidence in their estimates, and those levels are neither discussed nor adjusted. As a result, engineers and project managers find it impossible to turn estimates into meaningful plans.

2. Planning with incorrect use of uncertainty

You’ve surely noticed uncertainty being ignored. It usually goes something like this:

PM: How soon do you think you can have the power converters re-triangulated?
ENG: I’m not sure. It could take a day, or it could take five.
PM: I understand. What do you think is most likely? Can we say three days?
ENG: Sure, three or four.
PM: Okay. I’ll put it down for four days, just to be safe.

Both sides acknowledge that there is a great deal of uncertainty in this estimate. But a number needs to get written down, so they write down 4. In this step, information about uncertainty is lost. So it shouldn’t be a surprise later when re-triangulating the power converters takes 7 days and the whole plan gets thrown off.

Example

Suppose 5 teams are working independently on different parts of a project. Each team has said they’re 80% confident that their part will be done by the end of the year. The project manager will want to know how likely is it that all five parts will be done by the end of the year?

When we answer questions like this, we don’t usually think in terms of probability. And even if we do think in terms of probability, it’s easy to get it wrong. If you ask non-mathy people this question (and most project managers I’ve worked with are not mathy), their intuition will be: 80%.

DreidelsBut that intuition is extremely misleading. What we’re dealing with is a set of 5 independent events, each with 80% probability. Imagine rolling five 5-sided dice. Or, since 5-sided dice are not really a thing, imagine spinning five 5-sided dreidels (which are also not really a thing, but are at least geometrically possible). You spin your five dreidels and if any of them comes up ש‎, the project fails to meet its end-of-year deadline. The probability of success, then, is:

(80%)⁵ = 80% ⋅ 80% ⋅ 80% ⋅ 80% ⋅ 80% = 32.8%

Awareness of the probabilistic nature of estimates makes it obvious that a bunch of high-confidence estimates do not add up to a high-confidence estimate.

To make matters even worse, most organizations don’t even talk about how confident they are. In real life, we wouldn’t even know the uncertainties associated with our teams’ estimates (this is problem 1: uncalibrated estimates). Nobody would even be talking about probabilities. They’d be saying “All five teams said they’re pretty sure they can be done by the end of the year, so we can be pretty sure we’ll hit an end-of-year target for the project.” But they’d be dead wrong.

Why This Is Worth Fixing

I believe that poor handling of uncertainty is a major antagonist not just of technical collaboration, but of social cohesion at large. When uncertainty is not understood:

  • Project managers feel let down. Day in and day out they ask for reasonable estimates, and they’re left explaining to management why the estimates were all incorrect and the project is going to be late.
  • Engineers feel held to unreasonable expectations. They have to commit to a single number, and when that number is too low they have to work extra hard to hit it anyway. It feels like being punished for something that’s no one’s fault.
  • Managers feel responsible for failures. Managers are supposed to help their teams collaborate and plan, but planning without respect for uncertainty leads to failure every time. And what’s worse, if you don’t understand the role of uncertainty, you can’t learn from the failure.
  • Leadership makes bad investments. Plans may look certain when they’re not at all. Return on an investment often degrades quickly with delays, making what looked like a good investment on paper turn out to be a dud.

Fostering an intuition for uncertainty in an organization could help fix all these problems. I think it would be crazy not to at least try.

How To Fix It

Okay, here’s the part where I admit that I don’t quite know what to do about this problem. But I’ve got a couple ideas.

1. Uncalibrated estimates

To solve the problem of uncalibrated estimates (estimates whose uncertainty is wrong or never acknowledged), what’s needed is practice. Teams need to make 80%-confidence estimates for everything – big and small – and record those estimates somewhere. Then when the work is actually done, they can review their estimates and compare to the real-world completion time. If they didn’t get about 80% of their estimates correct, they can learn to adjust their confidence.

Estimating with a certain level of confidence is a skill that can be learned through practice. But practice, as always, takes a while and it needs to be consistent. I’m not sure how one would get an organization to put in the work, but I do believe it would pay off if they did.

2. Planning with incorrect use of uncertainty

Managers and project managers should get some training in how to manipulate estimates with uncertainty values attached.

Things get tricky when tasks need to be done one after the other instead of in parallel. But I like to think we’re not adding complexity to the process of planning; rather we’re just revealing the ways in which the current process is broken. And recognizing the limits of our knowledge is a great way to keep our batch sizes small and our planning horizons close.

I think we need to get used to calibrating estimates before we can see our way clear to address the planning challenges.

Can It Be Fixed?

It strikes me as super hard to fix the problems I’ve outlined here in a company that already has its old habits. Perhaps a company needs to be built from the ground up with a culture of uncertainty awareness. But I hope not.

If there is a way to make uncertainty a first-class citizen, I truly believe that engineering teams will benefit hugely from it.

Falsifiability: why you rule things out, not in

This June, I had the honor of speaking at O’Reilly Velocity 2016 in Santa Clara. My topic was Troubleshooting Without Losing Common Ground, which I’ve written about and written about before that too.

I was pretty happy with my talk, especially the Star Trek: The Next Generation vignette in the middle. It was a lot of ideas to pack into a single talk, but I think a lot of people got the point. However, I did give a really unsatisfactory answer (30m46s) to the first question I received. The question was:

In the differential diagnosis steps, you listed performing tests to falsify assumptions. Are you borrowing that from medicine? In tech are we only trying to falsify assumptions, or are we sometimes trying to validate them?

I didn’t have a real answer at the time, so I spouted some bullshit and moved on. But it’s a good question, and I’ve thought more about it, and I’ve come up with two (related) answers: a common-sense answer and a pretentious philosophical answer.

The Common Sense Answer

My favorite thing about differential diagnosis is that it keeps the problem-solving effort moving. There’s always something to do. If you’re out of hypotheses, you come up with new ones. If you finish a test, you update the symptoms list. It may not always be easy to make progress, but you always have a direction to go, and everybody stays on the same page.

But when you seek to confirm your hypotheses, rather than to falsify others, it’s easy to fall victim to tunnel vision. That’s when you fixate on a single idea about what could be wrong with the system. That single idea is all you can see, as if you’re looking at it through a tunnel whose walls block everything else from view.

Tunnel vision takes that benefit of differential diagnosis – the constant presence of a path forward – and negates it. You keep running tests to try to confirm your hypothesis, but you may never prove it. You may just keep getting tests results that are consistent with what you believe, but that are also consistent with an infinite number of hypotheses you haven’t thought of.

A focus on falsification instead of verification can be seen as a guard against tunnel vision. You can’t get stuck on a single hypothesis if you’re constrained to falsify other ones. The more alternate hypotheses you manage to falsify, the more confident you get that you should be treating for the hypotheses that might still be right.

Now, of course, there are times when it’s possible to verify your hunch. If you have a highly specific test for a problem, then by all means try it. But in general it’s helpful to focus on knocking down hypotheses rather than propping them up.

The Pretentious Philosophical Answer

I just finished Karl Popper’s ridiculously influential book The Logic of Scientific Discovery. If you can stomach a dense philosophical tract, I would highly recommend it.

karl-popper-4.jpg
Karl “Choke Right On It, Logical Positivism” Popper

Published in 1959 – but based on Popper’s earlier book Logik der Forschung from 1934 – The Logic Of Scientific Discovery makes a then-controversial [now widely accepted (but not universally accepted, because philosophers make cats look like sheep, herdability-wise)] claim. I’ll paraphrase the claim like so:

Science does not produce knowledge by generalizing from individual experiences to theories. Rather, science is founded on the establishment of theories that prohibit classes of events, such that the reproducible occurrence of such events may falsify the theory.

Popper was primarily arguing against a school of thought called logical positivism, whose subscribers assert that a statement is meaningful if and only if it is empirically testable. But what matters to our understanding of differential diagnosis isn’t so much Popper’s absolutely brutal takedown of logical positivism (and damn is it brutal), as it is his arguments in favor of falsifiability as the central criterion of science.

I find one particular argument enlightening on the topic of falsification in differential diagnosis. It hinges on the concept of self-contradictory statements.

There’s an important logical precept named – a little hyperbolically – the Principle of Explosion. It asserts that any statement that contradicts itself (for example, “my eyes are brown and my eyes are not brown”) implies all possible statements. In other words: if you assume that a statement and its negation are both true, then you can deduce any other statement you like. Here’s how:

  1. Assume that the following two statements are true:
    1. “All cats are assholes”
    2. “There exists at least one cat that is not an asshole”
  2. Therefore the statement “Either all cats are assholes, or 9/11 was an inside job” (we’ll call this Statement A) is true, since the part about the asshole cats is true.
  3. However, if the statement “there exists at least one cat that is not an asshole” is true too (which we’ve assumed it is) and 9/11 were not an inside job, then Statement A would be false, since neither of its two parts would be true.
  4. So the only way left for Statement A to be true is for “9/11 was an inside job” to be a true statement. Therefore, 9/11 was an inside job.
  5. Wake up, sheeple.

plhpico

The Principle of Explosion is the crux of one of Popper’s most convincing arguments against the Principle of Induction as the basis for scientific knowledge.

It was assumed by many philosophers of science before Popper that science relied on some undefined Principle of Induction which allowed one to generalize from a finite list of experiences to a general rule about the universe. For example, the Principle of Induction would allow one to deduce from enough statements like “I dropped a ball and it fell” and “My friend dropped a wrench and it fell” to “When things are dropped, they fall.” But Popper argued against the existence of the Principle of Induction. In particular, he pointed out that:

If there were some way to prove a general rule by demonstrating the truth of a finite number of examples of its consequences, then we would be able to deduce anything from such a set of true statements.

Right? By the Principle of Explosion, a self-contradictory statement implies the truth of all statements. If we accepted the Principle of Induction, then the same evidence that proves “When things are dropped, they fall” would also prove “All cats are assholes and there exists at least one cat that is not an asshole,” which would prove every statement we can imagine.

So what does this have to do with falsification in differential diagnosis? Well, imagine you’ve come up with these hypotheses to explain some API slowness you’re troubleshooting:

Hypothesis Alpha: contention on the table cache is too high, so extra latency is introduced for each new table opened

Hypothesis Bravo: we’re hitting our IOPS limit on the EBS volume attached to the database server

There are many test results that would be compatible with Hypothesis Alpha. But unless you craft your tests very carefully, those same results will also be compatible with Hypothesis Bravo. Without a highly specific test for table cache contention, you can’t prove Hypothesis Alpha through a series of observations that agree with it.

What you can do, however, is try to quickly falsify Hypothesis Bravo by checking some graphs against some AWS configuration data. And if you do that, then Hypothesis Alpha is the your best remaining guess. Now you can start treating for table cache contention on the one hand, and attempting the more time-consuming process (especially if it’s correct!) of falsifying Hypothesis Alpha.

Isn’t this kind of abstract?

Haha OMG yes. It’s the most abstract. But that doesn’t mean it’s not a useful idea.

If it’s your job to troubleshoot problems, you know that tunnel vision is very real. If you focus on generating alternate hypotheses and falsifying them, you can resist tunnel vision’s allure.

A Moral Thought Experiment That Breaks My Brain

Note: This blog post is not about computers or math or DevOps. I like those things, and I write about them usually. But not today.

Sometimes I read something and I’m like “that can’t be right,” but then I think about it for a while and I can’t figure out why it’s not right. This happens to me especially often with arguments about moral intuition.

We humans make moral judgements and decisions all day, every day, without even thinking about it. It’s a central part of what makes us us.

Try this: pick a moral belief that you hold very firmly. For example, I went with, “It’s wrong to treat people with less respect because of their sexual orientation.” Now, try to imagine no longer believing that thing. Imagine that everything about your mind is the same, except you no longer hold to that one belief or its consequences. It’s hard, isn’t it? It really feels like you wouldn’t be the same person.

We know that our moral beliefs change over the course of our lives, and so that feeling of our selves being tightly coupled to those beliefs must be an illusion. But still, it’s very disturbing when a well constructed thought experiment forces us to reevaluate our basic moral intuitions.

What follows is a thought experiment that has such a brain-breaking effect on me. We can’t all be moral geniuses who cut right through the Trolley Problem like this kid:

Making People Happy, or Making Happy People?

On this week’s episode of Sam Harris’s Waking Up podcast, the Scottish moral philosopher and effective altruism wunderkind Will MacAskill gave a very brief but very brain-breaking argument. It’s had me scratching my chin intensely for a couple days now.

It seems intuitive to me that we, as humans, have no moral obligation to make sure that more humans come into existence in the future. After all, it’s not like you owe anything to hypothetical people who will never come into existence. There seems to me to be no way that such a moral obligation could arise. I think a lot of people would agree with this intuition.

Will MacAskill’s argument (and I’m not sure it’s his originally, but I heard it from him) goes like this. Imagine what we’ll call World A. World A has some number of people in it, living their lives. And let’s also imagine World B, which has all the same people as World A, plus another person named Harry. Everybody in World B is exactly as well-off as their counterparts in World A, and Harry’s well-being is at a 6 out of 10. He’s moderately well-off.

blog_world_a_b
Photo credit: Brett Swanson

According to my intuition that there is no moral reason to prefer World B, in which Harry exists, to World A, in which Harry never came into being. If we say the total moral value of World A is a and the moral value of World B is b, I believe that a = b.

Alright, MacAskill says, now let’s introduce World C. This world is identical to World A, except it includes a person named Harry whose well-being is an 8 out of 10. He’s very happy almost all the time!

blog_world_c.png

Now, by the same logic I used before, letting World C’s total moral value equal c, I have to say that a = c. This world with a Harry at well-being level 8 is not preferable to a world in which Harry never existed.

This puts me in a bit of a pickle. Because, by straight-up math, we know that if a = b and a = c then b = c. In other words, there’s no moral reason to prefer World C to World B. But come on! World C is exactly like World B except that Harry is better off. Obviously it’s an objectively preferable world.

I feel reductio ad absurdum‘d and it makes me very uncomfortable. Does this mean I have an obligation to have kids if I think I can give them happy lives? I don’t believe that, but I’m not sure what to believe now.

3 Things That Make Encryption Easier

Almost everyone (especially in ops) knows they should be better about encrypting secret data. And yet most organizations have at least a few passwords and secret keys checked into Git somewhere.

The ideal solution would be for everyone at your company to use PGP all the time, but that is a huge pain. Encryption tools are annoying to use, and a significant time investment is required to learn to use them correctly. And if security is hard, people will always find a way to avoid it.

In the last few months, I’ve adopted 3 new technologies that make secure storage and exchange of secret information at least bearable.

1: Blackbox

StackExchange’s blackbox tool makes it easy to store encrypted data in a Git repository. First you need to import into your personal keyring all the PGP keys you want to grant access to. Then you initialize the blackbox directory structure:


dan@george:/tmp/secrets$ blackbox_initialize
Enable blackbox for this git repo? (yes/no) yes
VCS_TYPE: git
NEXT STEP: You need to manually check these in:
git commit -m'INITIALIZE BLACKBOX' keyrings /private/tmp/secrets/.gitignore
dan@george:/tmp/secrets$ git commit -m'INITIALIZE BLACKBOX' keyrings /private/tmp/secrets/.gitignore
[master 695d29a] INITIALIZE BLACKBOX
2 files changed, 3 insertions(+)
create mode 100644 .gitignore
create mode 100644 keyrings/live/blackbox-files.txt

view raw

gistfile1.txt

hosted with ❤ by GitHub

Once you’ve initialized blackbox, you can start adding administrators, which are keys that will be granted access to the secret data in the repository:


dan@george:/tmp/secrets$ blackbox_addadmin dan@danslimmon.com
gpg: /private/tmp/secrets/keyrings/live/trustdb.gpg: trustdb created
gpg: key A9FD8CCF: public key "Dan Slimmon " imported
gpg: Total number processed: 1
gpg: imported: 1 (RSA: 1)
NEXT STEP: You need to manually check these in:
git commit -m'NEW ADMIN: dan@danslimmon.com' keyrings/live/pubring.gpg keyrings/live/trustdb.gpg keyrings/live/blackbox-admins.txt
dan@george:/tmp/secrets$ git commit -m'NEW ADMIN: dan@danslimmon.com' keyrings/live/pubring.gpg keyrings/live/trustdb.gpg keyrings/live/blackbox-admins.txt
[master (root-commit) 94108f6] NEW ADMIN: dan@danslimmon.com
3 files changed, 2 insertions(+)
create mode 100644 keyrings/live/blackbox-admins.txt
create mode 100644 keyrings/live/pubring.gpg
create mode 100644 keyrings/live/trustdb.gpg

view raw

gistfile1.txt

hosted with ❤ by GitHub

Now you can start adding secrets securely:


dan@george:/tmp/secrets$ echo "nuclear launch code: SashaMalia44" > launchcode.txt
dan@george:/tmp/secrets$ blackbox_register_new_file launchcode.txt
========== PLAINFILE launchcode.txt
========== ENCRYPTED launchcode.txt.gpg
========== Importing keychain: START
gpg: Total number processed: 1
gpg: unchanged: 1
========== Importing keychain: DONE
========== Encrypting: launchcode.txt
========== Encrypting: DONE
========== Adding file to list.
========== CREATED: launchcode.txt.gpg
========== UPDATING REPO:
NOTE: "already tracked!" messages are safe to ignore.
[master 2a5eb65] registered in blackbox: launchcode.txt
2 files changed, 1 insertion(+)
create mode 100644 launchcode.txt.gpg
========== UPDATING VCS: DONE
Local repo updated. Please push when ready.
git push

view raw

gistfile1.txt

hosted with ❤ by GitHub

I really like how this tool gives my team a distributed, version-controlled repository of secret information. We can even give other teams access to the repository without worrying about exposing secrets!

My team uses this tool for shared passwords and SSL private keys, and it works great. Check it out.

2: Salt

At my company, we use Salt for config management. Like most config management systems, Salt lets you decouple the values in a config file from the file itself. You make a template of the config file that will appear on the node, and you put the values in a pillar (equivalent to a Chef databag, or a Puppet… whatever it’s called in Puppet).

So instead of storing a config file like this:


[myapp]
username = app_user
password = hunter2

view raw

blah.ini

hosted with ❤ by GitHub

You store a template like this:


[myapp]
username = {{ pillar['myapp']['username'] }}
password = {{ pillar['myapp']['password'] }}

view raw

blah.ini

hosted with ❤ by GitHub

and a pillar (which is just a YAML file) like this:


myapp:
username: app_user
password: hunter2

view raw

pillar.yaml

hosted with ❤ by GitHub

Now suppose you don’t want to commit that super-secure password directly to your Salt repository. Instead, you can create a PGP keypair, and give the private key to your Salt server. Then you can encrypt the password with that key. Your pillar will now look like this:


#!yaml|gpg
myapp:
username: app_user
password: |
—–BEGIN PGP MESSAGE—–
—–END PGP MESSAGE—–

view raw

pillar.yaml

hosted with ❤ by GitHub

When processing your template on the target node, Salt will seamlessly decrypt the password for you.

I love that I can give non-admins access to our Salt repo, and let them submit pull requests, without worrying about leaking passwords. To learn more about this Salt functionality, you can read the documentation for salt.renderers.gpg.

3: SecretShare

Salt’s GPG renderer and blackbox are great ways to store shared secret data, but what about transmitting secrets to particular people? In most organizations, when passwords and such need to be transmitted from employee to employee, insecure methods are used. Email, chat, and Google docs are very common media for transmitting secrets. They’re all saved indefinitely, meaning that an attacker who gains access to your account can gain access to all the secret info you’ve ever sent or received.

To make transmitting secrets as easy and secure as possible, my teammate Alex created secretshare. It lets you transmit arbitrary secret data to others in your organization, and it has immense advantages over other systems:

  • Secrets are never transmitted or stored in the clear, so a snooper can’t even read them if they manage to compromise the Amazon S3 bucket in which they’re stored.
  • Secrets are deleted from S3 after 24-48 hours, so a snooper can’t go back through the recipient’s or sender’s communication history later and retrieve them.
  • Secrets are encrypted with a one-time-use key, so a snooper can’t use the key from one secret to steal another.
  • Users don’t need Amazon AWS credentials, so a snooper can’t steal those credentials from a user.

Right now, secretshare only exists as a command-line utility, but we’re very close to having a web UI as well, which will make it even easier for non-technical people to use.

 

Security’s worst enemy is bad UX. It’s critical to make the most secure path also the easiest path. That’s what these three solutions aim to do, and they’ve made me feel much more comfortable with the security of secret data at my company. I hope they can do the same for you.

Start a Paper Club!

A few months ago, a work friend and I were commiserating about how we never make time to read any research. There’s all this fascinating, challenging stuff being written, we agreed, and we’re missing it all.

When a few more coworkers chimed in, saying they’d also like to push themselves to read more academic literature, we realized we were onto something. The next day, the Exosite Paper Club was born.

It’s been really fun to organize and participate in Paper Club these last few months. I’ve learned a lot, not just about the fields on which our reading material focused, but also about my coworkers and my company. Now I want to share some my excitement about this project and encourage you to start a Paper Club at your own company.

What Paper Club is and does

Paper Club is a group that meets every other week over lunch. We all prepare by reading a particular academic paper, and we discuss it in a free-form group session. Sometimes we teach each other about concepts from the paper, sometimes we brainstorm ways to apply ideas to our work at Exosite, and sometimes we just chat.

The paper for each session is selected by a participant from the previous session, and can be about any topic from project management to statistics to psychology to medicine. Readers of any skill level in the paper’s subject matter are warmly welcomed at meetings, but you can always skip a session if the topic doesn’t interest you. All we ask is that, when the material is hard for you, you push yourself and try to grasp it anyway.

We want a wide variety of participants, from DevOps to UI to sales & marketing, but we also want to read papers that are detailed enough to be challenging. To these ends, we have three guidelines for submitting a paper:

  1. Papers should be accessible. Try to pick papers at a technical level such that at least some of your peers will be able to understand most of the content. You want the session to be a discussion, not a lecture.
  2. Papers should be challenging. While accessibility is important, you don’t want your peers to have too easy a time. The best conversations happen when people are forced to push themselves a bit. It’s okay to suggest papers that are only accessible to those with a particular academic background, as long as you think the paper will create a good discussion with some subset of your coworkers.
  3. Papers should be deep. The best discussions tend to come from papers that dive pretty deep into a topic. Reviews and textbook chapters can be interesting, but we tend to prefer papers that go into detail on a specific topic.

What we’ve read so far

We’ve been doing Paper Club at Exosite since mid-November, and we’ve discussed 8 papers to date. I thought we’d be reading mostly computer science papers, but I couldn’t be happier with the variety we’ve gotten!

Here are a few of the papers that produced the most interesting discussions:

  • Confidence in Judgment: Persistence of the Illusion of Validity. This classic behavioral study from 1978 takes the well-established observation that people (even especially experts) are usually more confident in their judgments than they should be. The authors build a simple but powerful model for the mental processes responsible for this overconfidence, and present some suggestions for systematically curtailing it.
  • On Bullshit. If you’re like most engineers, you’re utterly allergic to bullshit. But have you ever thought about what makes bullshit bullshit? How it’s different from outright lying, and how it’s different from normal speech? This famous philosophical essay tries to answer these questions, and it provides some valuable insights for someone trying to excise bullshit from their life.
  • Why Johnny Can’t Encrypt. This 1999 analysis of PGP 5.0 usability raises some points that are crucial for anyone trying to design intuitive user experiences. The paper led us into a super productive critique of the metaphorical structure used in our own company’s documentation.

How it’s going

I have been really happy with the intellectual diversity of our Paper Club participants. Even when the paper under discussion is an especially wonky one, we get project managers, devs from all different software teams, salespeople, marketers, managers, and occasionally even company executives. Seeing all these people engage in thoughtful dialogue on such a wide variety of topics is inspiring. It takes me out of my DevOps bubble and reminds me that I work with some very interesting and smart folks.

I’d like to see how far we can stretch ourselves. Selfishly, I’d like us to read a math or linguistics paper some time, and help each other through it. I think that would be very rewarding.

Overall, I’m very glad we have a Paper Club here. I’d recommend it to anyone who likes people and/or learning. If you start one at your company, let me know how it goes!

The 10 Best Books I Read In 2015

I read 56 books in 2015, which is more than I’d read in the previous 5 years combined. Turns out books are pretty cool. Who knew?

Here are the 10 books I liked the most this year, in no particular order.

1. Metaphors We Live By (1980)

Goodreads___Metaphors_We_Live_By_by_George_Lakoff_—_Reviews__Discussion__Bookclubs__Lists.pngI got real into linguistics this year, and this book offers an interesting perspective on semantics. We usually think of metaphor as a poetic device. But this book argues that the whole human conceptual system is based on metaphor! According to George Lakoff, every concept we understand (short of concepts corresponding to our direct experience) is understood by analogy with a more concrete concept.

There are some very intriguing ideas in here. I didn’t necessarily buy (or even understand) them all, but I’m really glad I read this book.

2. The One World Schoolhouse: Education Reimagined (2012)
one_world_schoolhouse_-_Google_Search.png

This thought-provoking book on education was written by the Khan Academy guy. He presents a lot of research pointing toward the hypothesis that self-paced “mastery learning” is much more broadly effective than the contemporary American model of arbitrarily delineated, one-speed-fits all classes.

Discussing this book with my friends (who tend to be smart academic underachievers), really brought home the point that our education system underserves anyone who understands a concept more slowly, or more quickly, than the rest of the class.

3. The Orphan Master’s Son (2012)

Goodreads___The_Orphan_Master_s_Son_by_Adam_Johnson_—_Reviews__Discussion__Bookclubs__ListsI’m endlessly fascinated by North Korea, and this novel stoked my fascination. That alone would probably have been enough, but it’s also super well written. I found it beautiful and sad and gripping the whole way through.

 

 

 

4. The Immortal Life of Henrietta Lacks (2010)

Goodreads___The_Immortal_Life_of_Henrietta_Lacks_by_Rebecca_Skloot_—_Reviews__Discussion__Bookclubs__ListsThe author of this book tracked down the family of the long-deceased woman whose incredibly robust tumor cells became the most widely studied strain of human tissue in the world. Her cells have been used by scientists to make countless discoveries in genetics and immunology.

Through interviews with Henrietta Lacks’ descendants, all of whom still live in abject poverty, Rebecca Skloot raises important and nuanced questions about the interplay between science and culture and race. It’d be hard to read this book and still think of science as “pure” or “objective.” Scientists aspire to objectivity, but they’re just as boxed in by their cultural preconceptions as anyone else.

5. Red Rising (2014)

Goodreads___Red_Rising__Red_Rising_Trilogy___1__by_Pierce_Brown_—_Reviews__Discussion__Bookclubs__ListsThis is a super fun young-adult novel about badass vicious teens trying to kill each other. The premise is pretty similar to that of The Hunger Games, but I found the character development and storytelling way better. And the second book in the trilogy, Golden Son, is awesome too! The third one is coming out very soon, and I can’t wait.

Don’t expect any deep truths or transcendent prose. This book is just really fun to read.

6. The Road (2006)

Goodreads___The_Road_by_Cormac_McCarthy_—_Reviews__Discussion__Bookclubs__Lists.pngHey, speaking of books that are really fun to read, this book is not one. It’s a painfully stark novel about a father and son trying to survive in post-apocalyptic North America. I definitely wouldn’t call it a “feel-good” book.

But despite the author’s unremittingly bleak vision, the relationship between the man and the boy (those are the only names given for the characters) is very touching. A friend of mine claims to use this book as a parenting handbook of sorts. I don’t know if I’d go that far, but I do see what he means.

7. History in Three Keys: The Boxers as Event, Experience, and Myth (1997)

Goodreads___History_in_Three_Keys__The_Boxers_as_Event__Experience__and_Myth_by_Paul_A__Cohen_—_Reviews__Discussion__Bookclubs__Lists.pngMy favorite non-fiction book of the year. On one level, this is a history book about the Boxer Rebellion: a grimly compelling episode in its own right. But, moreover, this book is about history itself. Paul A. Cohen describes the Boxer Rebellion through 3 different lenses – experience, history, and myth – each of which represents part of the way we engage with the past.

I came away from this book with a newfound appreciation for the role of the historian in creating history, and a newfound skepticism about the idea that history is composed of objective facts and dates.

[I did a lightning talk (transcript in the first slide’s speaker notes) relating what I learned from this book to post-mortem analysis in DevOps.]

8. The Martian Chronicles (1950)

Goodreads___The_Martian_Chronicles_by_Ray_Bradbury_—_Reviews__Discussion__Bookclubs__ListsI tried to read The Martian Chronicles in high school, and I was like “Pfft! This isn’t science fiction. The science doesn’t make any sense.” I’ve read a lot more sci-fi now, so I decided to give this classic another shot.

What I’ve learned since high school is that the sci-fi I like most isn’t about cool technology or mind-bending thought experiments (although those can add some nice seasoning to a story). It’s about humanity: what continues to define us as human, even when the things we think of basic to our humanity – Earth, language, gender, our bodies – are stripped away?

Bradbury knew exactly what he was on about. After reading these stories with the human question in mind, I finally understand why he’s been so influential in science fiction.

9. The New Jim Crow: Mass Incarceration in the Age of Colorblindness (2009)

Goodreads___The_New_Jim_Crow__Mass_Incarceration_in_the_Age_of_Colorblindness_by_Michelle_Alexander_—_Reviews__Discussion__Bookclubs__ListsI already had strong objections to the War on Drugs on account of the senseless imprisonment of people who’ve done nothing harmful. But this book shows how drug arrest quotas, mandatory minimum sentences, and felon profiling work together as an insidious system to maintain white supremacy in America.

I’d recommend this book to any American. We all need to understand that racial oppression didn’t go away on July 2, 1964; it just adopted more clever camouflage.

10. Anathem (2008)

anathem_cover_-_Google_Search.pngThis is now my favorite Neal Stephenson book. It has his trademark mathyness and detail to a truly engaging story in a richly-imagined universe.

Like most Neal Stephenson books, it’s not for everyone. A lot of time is spent on epistemological ruminations and physics. I love that shit, so I loved this book all the way from page 1 to page 937.

 

Welp, those are some books I loved this year.

But I want to see what you’re reading, so I can find more books to love. Friend-poke me on GreatBooks!

Minding Your Pees And Queues

A couple weeks ago, my wife & I went to Basilica Block Party, a local music festival. It was a good time, and OH MANS you have to see Fitz & The Tantrums live. Their sax player is a hero unit.

Anyhoo, we walked over to the porta-potties between sets. The lines were about 8–10 people long. And my spouse suggested an intriguing strategy for minimizing her wait time. I will call this strategy The Strategy:

The Strategy: All else being equal, get in the line with the most men.

The reasoning behind The Strategy is obvious: women take longer to pee than men, so if 2 queues are the same length, then the faster-moving queue should be the one with fewer women. It’s intuitive, but due to my current obsession with queueing theory, I became intensely interested in the strategy’s implications. In particular, I started to wonder things like:

  • How much can you expect to shave off your wait time by following The Strategy?
  • How does the effectiveness of The Strategy vary with its popularity? Little’s Law tells us that the overall average wait time won’t be affected, but is The Strategy still effective if 10% of the crowd is using it? 25%? 90%?

And then I thought to myself “I could answer these questions through the magic of computation!”

qsim

Lately I’ve been seeing queueing systems everywhere I go, so I figured it’d be worthwhile to write a generic queueing system simulator to satisfy my curiosity. That’s what I did. It’s called qsim.

To quote the README,

A queueing system in qsim processes arbitrary jobs and is composed of 5 pieces:

  • The arrival process controls how often jobs enter the system.
  • The arrival behavior defines what happens when a new job arrives. When the arrival process generates a new job, the arrival behavior either sends it straight to a processor or appends it to a queue.
  • Queues are simply holding pens for jobs. A system may have many queues associated with different processors.
  • A queueing discipline defines the relationship between queues and processors. It’s responsible for choosing the next job to process and assigning that job to a processor.
  • Processors are the entities that remove jobs from the system. A processor may take differing amounts of time to process different jobs. Once a job has been processed, it leaves the queueing system.

qsim provides a framework for implementing these building blocks and putting them together, and it also provides hooks that can be used to gather data about a simulation as it runs. I’m really looking forward to using qsim to gain insight into all sorts of different systems.

For now: porta-potties.

The porta-potty simulation

You’ll recall The Strategy:

The Strategy: All else being equal, get in the line with the most men.

To determine the effectiveness of The Strategy, I implemented PortaPottySystem using qsim. Here are some of the assumptions I made:

  • People arrive very frequently, but if all the queues are too long (8 people) they leave.
  • There are 15 porta-potties, each with its own queue. Once a person enters a queue, they stay in that queue until the corresponding porta-potty is vacant.
  • Shockingly, I couldn’t find any reliable data on the empirical distribution of pee times by sex, so I chose a normal distribution with a mean of 40 seconds for men and 60 seconds for women.
  • Most people just pick a random queue to join (as long as it’s no longer than the shortest queue), but some people use The Strategy of getting into the queue with the highest man:woman ratio (again, as long as it’s no longer than the shortest queue).
  • Nobody’s going number 2 because that’s gross.
  • Everyone is either a man or a woman. I know all about gender being a spectrum, and if you want to submit a pull request that smashes the gender binary, please do.

The first question I wanted to answer was: how does using The Strategy affect your wait time?

To answer this question, I ran a simulation where the probability of a given person deciding to use The Strategy is 1%. The other 99% of people simply join one of the shortest queues without regard to its man:woman ratio. I ran 20 simulations, each for the equivalent of 2 weeks, and came up with these wait time distributions:

waittimes

More of a box-plot person? I hear ya:

boxplot

The Strategy is definitely not a huge win here. On average, your wait time will be reduced by about 10–15 seconds (4–6%) if you use The Strategy. Still, it’s not nothing, right?

Now how does the benefit of using The Strategy vary with its popularity? This is actually really interesting. I never would have guessed it. But the data shows that you should always use The Strategy, even if everybody else is using it too.

Here I’ve charted average wait times against the proportion of people using the strategy. Colors are more prominent where the data set in question is large (and therefore heavily influences the overall average):

eff_v_pop

You’ll notice that the overall average (dark green) does not vary with Strategy popularity. This is good news, because otherwise we’d be violating Little’s Law, which would probably just mean our simulation was broken.

The interesting thing here is that the benefit of using The Strategy decreases pretty much linearly as its popularity increases, but at the same time there accrues a disadvantage to not using it. If everybody else in the system is using The Strategy, and you come along and decide not to, you can still expect to wait 10 seconds longer than everybody else. Therefore using The Strategy is unequivocally better than not using The Strategy.

Unless of course you don’t really care about those 10 seconds, in which case you should do whatever you want.