(Note: this is a somewhat incomplete thought.)
There's been a lot of talk of late about security metrics, but I'm increasingly inclined to think that we're shooting at the wrong elusive target. Why do we keep chasing after measuring relatively immeasurable things? Instead, I think we should be starting with the things that we can measure. After all, security is a feeling, not a tangible outcome, right?
Instead of measuring something so squishy, let's instead look at the operational metrics that we can absolutely measure. For example:
- Uptime
- Availability
- Performance (e.g., TPS, MIPS)
- Time-to-Fix
- MTBF (for a broad definition of "failure")
- SNR (representative here - i.e., how much "background noise" do we get from scans vs. detected legit attacks)
- Visibility (into code, into environment, etc.)
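Most of these roll up from simple arithmetic over incident records. As a minimal sketch (the outage windows and 30-day reporting period below are made-up illustration data, not anything from a real environment), here's how availability, MTTR, and MTBF could be derived:

```python
from datetime import datetime, timedelta

# Hypothetical outage windows (start, end) over a 30-day reporting period.
outages = [
    (datetime(2024, 1, 3, 2, 0), datetime(2024, 1, 3, 2, 45)),
    (datetime(2024, 1, 17, 14, 0), datetime(2024, 1, 17, 14, 30)),
]
period = timedelta(days=30)

# Total downtime is just the sum of the outage durations.
downtime = sum((end - start for start, end in outages), timedelta())

availability = 1 - downtime / period                  # fraction of period up
mttr = downtime / len(outages)                        # mean time to repair
mtbf = (period - downtime) / len(outages)             # mean time between failures

print(f"availability={availability:.5f}, MTTR={mttr}, MTBF={mtbf}")
```

The same incident log can feed Time-to-Fix (detection timestamp to resolution timestamp) with the identical pattern, which is what makes these metrics cheap to operationalize compared with "how secure are we?".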
These strike me as useful metrics to track, at least operationally, with an ability to then roll them up into tactical, and even strategic, reports. Thinking about this all in terms of survivability, then, we want to be able to answer these questions:
- Have operations been negatively impacted?
- Were we able to continue operations despite degraded conditions?
- What measurable impact occurred while operations were degraded?
- How quickly can we resolve issues once detected?
These, I think, are very useful metrics to monitor. One could rightly argue that they're primarily IT operations metrics, but they go directly toward key infosec objectives, too. In terms of survivability, they help us gain a better picture of resiliency, such as by benchmarking recoverability and, to a degree, defensibility.
Defensibility, of course, is where we start potentially getting back into squishiness. We have a similar problem with performing a FAIR risk analysis, too, when we look at the "Vulnerability" factor, since there's no simple, reliable, consistent way to measure it (i.e., this is one of the more subjective values in the overall scheme of things).
Putting this thought into a properly framed risk management context, based around survivability as the main driver, I think that metrics developed along these lines are more useful today, while also being reasonably accurate and precise. It's time to put aside fuzzy "security" metrics in favor of something that tells the business just how reliable its systems and applications are.
Interesting thoughts. For me it relates to the formal logic space and "Safety Properties" and "Liveness Properties" of computer programs. When I read up on those things a few years back many puzzle pieces fell into place in my head. Just a taste ...
A safety property specifies that "nothing bad happens during execution." Informally, it states that once a bad action has taken place (thereby excluding the execution from the property) there is no extension of that execution that can remedy the situation.
For example, access-control policies are safety properties since once the restricted resource has been accessed the policy is broken. There is no way to "un-access" the resource and fix the situation afterward.
A liveness property, in contrast to a safety property, is a property in which nothing exceptionally bad can happen in any finite amount of time. Any finite sequence of actions can always be extended so that it lies within the property.
Availability is a liveness property. If the program has acquired a resource, we can always extend its execution so that it releases the resource in the next step.
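The "no extension can remedy it" character of a safety property is easy to see in a runtime monitor. Below is a minimal sketch (the class name, the action vocabulary, and the restricted path are all invented for illustration): once a restricted resource is accessed, the violation flag is set and nothing observed later can clear it.

```python
class SafetyMonitor:
    """Toy monitor for an access-control safety property: once a
    restricted resource has been accessed, the trace is permanently bad."""

    def __init__(self, restricted):
        self.restricted = set(restricted)
        self.violated = False

    def observe(self, action, resource):
        # A bad action taints the trace; there is no "un-access" action,
        # so the flag is never reset by any later extension of the trace.
        if action == "access" and resource in self.restricted:
            self.violated = True
        return not self.violated  # True while the trace still satisfies the property

m = SafetyMonitor(restricted={"/etc/shadow"})
ok1 = m.observe("access", "/tmp/scratch")   # allowed resource
ok2 = m.observe("access", "/etc/shadow")    # violation occurs here
ok3 = m.observe("release", "/etc/shadow")   # too late: violation is permanent
print(ok1, ok2, ok3)
```

A liveness property, by contrast, can't be falsified by any finite prefix like this, which is why a monitor of this shape can enforce safety but not liveness.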
Check out, for instance, "More Enforceable Security Policies": http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3048
It also has good references back to the original work.
Agreed. However, I wouldn't mix metrics (measurement vs. target value) with risk assessment (loss predictions). I know you get that; curious what inspired this post?