Metrics that Matter

Fewer is surprisingly better

Metrics are critical: they are how we assess many aspects of our business, applications, and infrastructure. When implemented properly, they allow us to measure and report on key indicators, and we can then use that information to make decisions or to adjust direction for improved performance, stability, and overall quality.

You might think that collecting and analyzing metrics is the hard part of the work; yet oddly, in many respects, the most challenging aspect of working with metrics is identifying which ones you should collect in the first place. Engineering resources are constrained in every organization, and you want them focused only on the things that deliver value. You want to spend your team's valuable time collecting metrics that actually matter and avoid "vanity metrics," by which I mean those that don't accurately represent your environment, or that provide extraneous information that might seem useful but does not actually improve outcomes.

At first, this is surprisingly hard to do - especially when a team or organization is collecting metrics for the first time. A first instinct is to capture everything and sort it out on a dashboard. There are multiple challenges with this approach. First, you will be tempted to show as many of the captured metrics on the dashboard as possible (after all, you went through the effort of capturing them) - leading to a cluttered, confusing mess. Worse, it leads to a noise problem. Metrics are supposed to represent key characteristics of your systems, not every possible thing. If every possible metric is important, then, believe me on this, eventually none of them will be. Too much noise (too many metrics) will drown out the things that are really important. This is the same problem that occurs in classic engineering scenarios: an alarm or alert goes off so often that it is eventually dismissed automatically as not being meaningful. This is not idle speculation; it really happens.

Second, even if you magically avoid the first problem, you are still collecting everything regardless of what actually gets displayed on the dashboard. In the real world, the consequences of this include increased network traffic, CPU cycles stolen from business processes, and an increased memory footprint. Nothing is ever free; it is always a trade-off with something else.

Vanity Metrics - what they are and why they should be avoided

Vanity metrics are things that usually tell us that we're succeeding, even if they probably mean nothing. As an example from the marketing world, think of measuring and reporting on the total number of downloads from your site; it might look impressive, but it rarely tells you what your adoption actually is, especially when you consider that merely downloading something doesn't indicate that it has ever been used more than once. Yet, this is the kind of metric that you always see touted on mobile applications - even GitHub does this with the projects they have available. In no way is this an indication of how successful an application is or how widely it has actually been adopted. But breaking free of these sorts of worthless metrics is hard because it means giving up a psychological reward, not just adopting some new stats. If you actually want to drive your organization towards success, you need to be able to take an honest look at what is going on.

While it is psychologically rewarding to choose metrics that make you look good and please your stakeholders, they tend to provide a false sense of security that all is good in the world (or at least your piece of it). Even worse, these sorts of vanity metrics can lead you into focusing on the wrong things. Just as premature optimization is a bad idea when developing a software solution, making decisions about where you go and what you will work on next based on only "happy" information can result in a lot of work that provides little to no value.

Metrics should be used to drive action and should highlight aspects of your system that are critical to its health and continued performance. It is vital that you identify the right metrics to track.

What makes a Metric matter?

Before I get too deep into what indicates a "good" metric, I want to share a quick analogy that I often think of when speaking to teams or organizations about metrics. Coming from an engineering background, I am often tasked with helping in goal setting - for myself, for my teams, and sometimes even for the organization as a whole. One technique that I have found particularly useful when doing this is to use S.M.A.R.T goals. The whole purpose of this approach is to ensure that the goals you have set are achievable and meaningful. If you are not familiar with S.M.A.R.T goals, I definitely recommend that you spend some time learning about them.

Just as S.M.A.R.T goals help clarify and focus what we should be doing at a goal level, we should be able to identify and adopt a simple framework for doing the same thing with metrics. Metrics that matter will, of course, vary widely from effort to effort and even to some extent from industry to industry. But, in my experience, good metrics all seem to share some common characteristics - the Four A's.

ACCOUNTABLE

Before you measure anything, know who the owner is. If the ownership of a metric is ambiguous or shared between multiple teams, you may encounter problems tying the success or failure of the metric to a particular effort or team; even worse, the success of one team may be eclipsed by the sub-optimal performance of another. Establish a clear owner for each metric. Without a clear owner, who will address any issues raised?

ACTIONABLE

Any metrics which you track must be actionable and should be tied to specific performance or health targets. If there isn't an action you can take in response to a given metric, the value of that particular metric becomes questionable and you should consider carefully if you want to proceed with it.

ACCESSIBLE

You can have the greatest minds in your organization curate a report with every possible permutation of raw and derived metrics, but if the info isn't in an easily digestible format, it's not going to be addressed by those in the trenches. Metrics and the reports containing them should be easy to understand for anyone who might read them. A corollary to this is that any captured metrics should be displayed for those individuals or teams that are responsible for acting on them.

AUDITABLE

Finally, ensure that you can validate the accuracy of metrics. Using metrics as a foundation from which to drive progress only works if the improvement can be measured and if the measurements being instrumented are accurate. Making organizational decisions based on bad or inaccurate data is often worse than making them with no data at all.
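
One way to keep the Four A's in front of a team is to write each metric down as a small record and check it before anything gets instrumented. The sketch below is only an illustration of that idea, not a prescribed tool: the `MetricDefinition` class, its field names, the `missing_characteristics` helper, and the example metrics are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetricDefinition:
    """A hypothetical record describing one candidate metric."""
    name: str
    owner: str = ""       # Accountable: the single team or person who owns it
    action: str = ""      # Actionable: what you do when it misses its target
    audience: str = ""    # Accessible: where it is shown to the people who act on it
    validation: str = ""  # Auditable: how its accuracy can be checked


def missing_characteristics(metric: MetricDefinition) -> List[str]:
    """Return the A's that have not been filled in for this metric."""
    checks = [
        ("accountable", metric.owner),
        ("actionable", metric.action),
        ("accessible", metric.audience),
        ("auditable", metric.validation),
    ]
    return [label for label, value in checks if not value.strip()]


# A typical vanity metric: nobody owns it and nothing happens when it moves.
vanity = MetricDefinition(name="total_downloads")

# A metric that has an answer for each of the Four A's (values are made up).
useful = MetricDefinition(
    name="checkout_error_rate",
    owner="payments team",
    action="roll back the latest release if the rate exceeds its target",
    audience="payments team dashboard",
    validation="compare against error counts in the application logs",
)

print(missing_characteristics(vanity))
print(missing_characteristics(useful))
```

A metric that comes back with an empty list is not automatically a good one, but a metric that cannot fill in all four fields is a strong candidate to leave out.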

When implemented correctly, and with the right collection of metrics, effective monitoring can ensure that your systems are running efficiently and effectively. Additionally, if things take a turn for the worse, developers and production support personnel should be able to use the data available from your monitoring system to quickly isolate the cause and formulate a way to resolve it. Effective monitoring enables teams and engineers to identify potential challenges before they become issues.

Considerations for effective metrics

While these aren't hard and fast rules, I've always found myself recommending the following whenever helping teams identify and select metrics that are useful.

Choose Rates rather than Totals

Having come from an engineering background, I can tell you that rates of change are far more important than an absolute total value. For example, in a nuclear power plant, I, of course, care about the current power level. However, I am much more interested in how fast it is changing, either up or down. If the power meter is changing quickly, you had better believe that I will be paying attention to it. In short, avoid the temptation to monitor total counts or running-total metrics. Total counts are a big part of the vanity metric problem. What you want to monitor instead is a specific rate over time. Calculating the number of transactions performed over a distinct period will be significantly more useful to developers and stakeholders alike. You can even measure how that rate changes over time to highlight whether your system is improving or deteriorating. Rates also give you a much better idea of the adoption of a system, and by watching them, your teams can anticipate where and when bottlenecks may occur in the future.
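
To make the difference concrete, here is a minimal Python sketch that turns a running total into rates. It assumes you already have periodic samples of a cumulative transaction count; the sample values and the function names `rates_per_second` and `rate_changes` are made up for illustration, and the sketch does not handle counters that reset to zero.

```python
from typing import List, Tuple

# One sample: (timestamp in seconds, cumulative transaction count).
Sample = Tuple[float, float]


def rates_per_second(samples: List[Sample]) -> List[float]:
    """Convert consecutive cumulative samples into per-second rates."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order samples
        rates.append((c1 - c0) / elapsed)
    return rates


def rate_changes(rates: List[float]) -> List[float]:
    """Difference between consecutive rates: is throughput rising or falling?"""
    return [later - earlier for earlier, later in zip(rates, rates[1:])]


# Hypothetical samples taken one minute apart. The total only ever goes up,
# which looks like steady success if that is all you watch.
samples = [(0.0, 1000.0), (60.0, 1600.0), (120.0, 2500.0), (180.0, 2800.0)]

rates = rates_per_second(samples)
changes = rate_changes(rates)

print(rates)    # [10.0, 15.0, 5.0] transactions per second
print(changes)  # [5.0, -10.0]
```

The running total climbs from 1000 to 2800 without ever hinting at trouble, while the rates show throughput dropping to a third of its peak in the final minute. That drop is the signal worth acting on.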

Don't be afraid

Make your metrics visible - post them on a page and let anyone in your organization see them. The right collection of metrics should and will highlight both successes and failures. Ensure your metrics are displayed prominently to the owners of the metrics and other key stakeholders within the organization. Emphasize that a downturn in the metrics represents an opportunity for improvement, whereas good metrics should spur continuous improvement onward. After all, the purpose of metrics is to help drive change for the good of the organization.

Start small

It is much better to start with a small number of metrics and add to them over time than to start with a large collection and try to winnow it down. Capturing and displaying metrics on dashboards is probably a new experience for many developers, and we don't want to inundate them out of the gate. Let teams get used to seeing a few metrics, and let them add more where and when they see value. Avoid the noise problem.

Captured Metrics will change - more than you would expect

A general rule of thumb that I have found to be almost universally true is that the metrics that matter will change over time. This happens because the needs of stakeholders and teams change as they and your applications mature. It also happens because you will, inevitably, choose metrics that seem like they would be valuable but turn out not to be. This happens to EVERY team, so don't be discouraged if (when) it happens to yours. Simply use this information to change your approach in the future.

Additional Resources

There are countless articles from both vendors and thought leaders on key metrics that you might want to start with. I recommend looking at many of them, but I caution you to always consider their suggestions in the context of the Four A's above and your business needs. A little common sense and business knowledge can go a long way in selecting your own metrics. Having said that, here is a list of some resources to get you started.

. . .

Metrics aren't magic, but choosing the right things to measure and monitor takes a little forethought.
