My friend Wade recently posted his thoughts on how to go about building a security team. For the most part, I found his comments to be spot-on, with one major, glaring exception. At the end of his post, he starts talking about getting into planning and measurements once you have your team in place, overlooking one major area: risk management.
Now, in his defense, Wade's objective was sharing his thoughts on how to build a good security team, starting with a good security manager who actually understands things. You cannot simply put a well-connected talking head in place and expect them to be successful managing security without the necessary technical know-how to grok what is going on. That being said, choosing who to hire and when to hire them, as well as making decisions about what technologies to leverage within your security team, must be based on sound risk management principles.
When I'm talking about "risk management" here, I'm really talking from a high level, and I'm including risk assessment and measurement as part of the equation. Plain and simple, if you're charged with building a security team and managing security objectives, one of your top challenges will be prioritization of work and resources. With security, it's very easy to let oneself slip into semi-anarchic ways where you are quickly overwhelmed by all that needs to be done. In order to keep the tigers at bay, you need to make use of sound decision-making practices that prioritize your workload based on a few criteria.
The criteria that I've found to be most useful are:
* Estimated Risk - the overall risk of each "todo" item, including the potential impact of a failure on the enterprise
* Ease of remediation - this is the "low-hanging fruit" principle - high risk items that are easily remediated should top most lists
* Cost of remediation - one factor that is particularly vexing for smaller companies is that the easiest fix for the highest risk may cost hundreds of thousands of dollars
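As a rough illustration of how these three criteria might be combined, here is a minimal Python sketch. Everything in it is an assumption made for the example: the 1-to-5 rating scales, the `TodoItem` and `priority` names, the backlog entries and their ratings, and especially the formula, which simply divides risk by effort and cost. It is one simple way to rank a worklist, not a standard method.

```python
from dataclasses import dataclass


@dataclass
class TodoItem:
    name: str
    risk: int    # hypothetical scale: 1 (low risk) to 5 (high risk)
    effort: int  # hypothetical scale: 1 (easy to fix) to 5 (hard to fix)
    cost: int    # hypothetical scale: 1 (cheap) to 5 (expensive)


def priority(item: TodoItem) -> float:
    # Assumed scoring rule: higher risk raises the score,
    # higher effort and higher cost both lower it.
    return item.risk / (item.effort * item.cost)


def prioritize(items):
    # Highest score first.
    return sorted(items, key=priority, reverse=True)


# Illustrative backlog; the ratings are made up for this example.
backlog = [
    TodoItem("Deploy commercial log management", risk=5, effort=2, cost=5),
    TodoItem("Restrict SSH daemon to protocol 2", risk=2, effort=1, cost=1),
    TodoItem("Formalize change management", risk=3, effort=2, cost=2),
]

for item in prioritize(backlog):
    print(f"{priority(item):.2f}  {item.name}")
```

With these made-up ratings the cheap, easy fix lands at the top and the expensive high-risk project at the bottom, which is exactly the kind of result that should prompt a conversation rather than end one. A real process would likely weight the criteria differently.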
Perhaps the best way to grok how these criteria work together is to put them through a couple scenarios.
Scenario #1: Log Management
One of the biggest challenges to organizations is effectively and efficiently monitoring environments. The most common approach is to send system, application, and monitoring system logs to a central host and then use some sort of tool to make sense of that information. Logging and monitoring is a huge component under the PCI DSS. But what does it really mean?
From a risk management perspective, the ability to monitor your environment is very important. If you don't know what's going on, then how can you head off problems and respond to incidents? If your organization does not have any logging and monitoring in place, then you're effectively flying blind. Flying blind is a high risk to an organization that has any sort of public exposure and that handles any sensitive data. Thus, from a risk management perspective, the need for a log management solution (or a SIEM or other similar kind of tool) rates as a high risk item.
Now to the ease of remediation. From a purely technical perspective, it is relatively straightforward to get key systems and applications logging to a central host. Thanks to free tools like syslog-ng and Snare Agent for Windows you can securely export logs back to your main log host for analysis and reporting.
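To give a feel for how little is involved on the application side, here is a minimal sketch using Python's standard `logging.handlers.SysLogHandler` to forward records to a central syslog host. The logger name, the helper function, and the host in the usage comment are all hypothetical. Note also that this sends plain UDP syslog, which is not encrypted; getting the "securely" part right would mean using an encrypted transport on the forwarding agent, which this sketch does not attempt.

```python
import logging
import logging.handlers


def build_forwarding_logger(host: str, port: int = 514) -> logging.Logger:
    """Return a logger that forwards records to a syslog host over UDP.

    Assumption: the central host accepts plain UDP syslog on `port`.
    """
    logger = logging.getLogger("app.security")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    # SysLogHandler defaults to UDP when given a (host, port) address.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(
        logging.Formatter("%(name)s: %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


# Usage (hypothetical host name):
#   log = build_forwarding_logger("loghost.example.com")
#   log.warning("login failed for user %s", "alice")
```

Calling the helper more than once would attach duplicate handlers, so a real application would build the logger once at startup.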
Even better than that, there is a plethora of solid commercial log management tools that can aid you with log management, monitoring, and reporting (e.g. LogLogic, ArcSight, netForensics, Intellitactics). On top of the commercial solutions, there are also several free or open-source tools that could be brought to bear on the task of log management (e.g. logcheck, or the free edition of Splunk). However, when you get down to it, there are significant limits to what can be accomplished using free tools, meaning that in the long run you will need a commercial solution.
And thus we have an interesting challenge. While we won't call the ease of remediation "simple," it's really not all that difficult given the right tools. But those tools are not cheap, and likely will require some professional services and training beyond the base software cost in order to get installed and operating properly. Add to this the need for servers with lots of disk space, backups, archiving, etc. (since PCI requires you maintain at least a year's worth of logs), and the cost starts piling up. In fact, it's not inconceivable that a small organization handling a couple million credit card transactions a year may be looking at a couple hundred thousand dollars in cost using commercial solutions.
If you're not looking at commercial solutions, then you need to look at personnel costs. In most major cities in the US, that means you're going to need a highly skilled technical security resource who's going to run you close to a 6-figure salary, not to mention the cost of benefits, training, and beyond.
Thus, the risk is high and remediation is relatively easy, but the cost of remediation is fairly high. The question then becomes "can we afford to implement such a system? can we afford not to?" In the current economic climate, these are not, by any means, easy questions to answer.
Scenario #2: Operational Security
This scenario is more generic. In talking about "operational security" we're really talking about all the little security bits that need to be done as part of normal operations. These little bits include things like hardening servers, securing databases, generating the logs that would be used in Scenario #1, and so on. These responsibilities would also include the formalization of key processes, such as around change management and patch management.
From a risk perspective, these changes will vary in importance from low to high risk. For an externally exposed Windows IIS server, patch management and hardening are extremely important, whereas for a base Gentoo install with Apache that is only internally available and not handling sensitive data, hardening may be less of a concern. Similarly, people might bristle at the suggestion, but patch management is often far more important for Windows workstations and servers than for other types of systems.
While many of these activities may rate fairly low on the risk rating scale, the effort needed to remediate them is also often very low. Configuring an SSH daemon to only accept SSHv2 connections is literally a single configuration change that would take at most a couple minutes. In other cases it may be more difficult to implement changes - such as formalizing change management processes - but overall they are still not high-burden changes.
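For the SSH example, the change in question is the `Protocol` directive in `sshd_config` (setting it to `2`). A small check like the following could confirm that a host's config has it. This is a sketch under assumptions: the function name is made up, comment handling is simplified, and a missing directive is treated as "not explicitly restricted" because the default has varied between OpenSSH versions. Recent OpenSSH releases have removed SSHv1 support altogether, so this only matters on older daemons.

```python
def accepts_only_sshv2(sshd_config_text: str) -> bool:
    """Return True if the config explicitly restricts sshd to protocol 2.

    Simplified parsing: anything after '#' is treated as a comment, and
    the first Protocol line found wins (sshd uses the first value it
    obtains for each keyword).
    """
    for raw_line in sshd_config_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        # sshd_config keywords are case-insensitive.
        if parts[0].lower() == "protocol":
            if len(parts) < 2:
                return False
            versions = {v.strip() for v in parts[1].split(",")}
            return versions == {"2"}
    # No Protocol directive: assume nothing about the daemon's default.
    return False


# Usage with inline config text:
print(accepts_only_sshv2("Port 22\nProtocol 2\n"))    # explicit v2 only
print(accepts_only_sshv2("Port 22\nProtocol 2,1\n"))  # still allows v1
```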
Cost is also one of those areas where the actual cost of implementing a change could be very, very low. In the SSH daemon configuration example, we're talking about a cost equivalent to about 2 minutes of your administrator's time. In other cases, the cost may be more in terms of added overhead responsibilities for a group of system administrators. In the end, though, you likely would consider most of these responsibilities to be built into the positions of your support staff, meaning the cost is negligible.
So, in looking at it, the decision is easy, right? You implement the changes because they're cheap and easy, even if they're not addressing high-risk items. This example might suggest, then, that the risk rating isn't all that important overall, but that would be wrong. While in this scenario it makes sense to just implement the cheap and easy changes as part of routine duties, failing to recognize the risk management value would be to miss the big picture. Simply put, why do we care about using security best practices as part of standard operational procedures? The answer is that, while it may look like a lot of low-risk items being addressed, the bigger picture is that of secure systems management, where we most likely rate at least a medium risk, if not a high risk. Taken individually, all of the little changes seem low risk, but when taken collectively these risks add up to something much more significant.
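One way to see how small risks add up is a quick back-of-the-envelope calculation. The sketch below assumes each unaddressed item has some independent chance of leading to an incident in a given period; both the independence assumption and the 2% figure are invented purely for illustration, not measured from anything.

```python
def chance_of_at_least_one(probabilities):
    """Probability that at least one of several independent events occurs.

    Computed as 1 minus the probability that none of them occur.
    """
    none_occur = 1.0
    for p in probabilities:
        none_occur *= (1.0 - p)
    return 1.0 - none_occur


# Hypothetical: 20 "low-risk" items, each with a 2% chance of causing
# an incident this year if left unaddressed.
low_risk_items = [0.02] * 20
combined = chance_of_at_least_one(low_risk_items)
print(f"Combined chance of at least one incident: {combined:.0%}")
```

Under those made-up numbers, twenty items that each look ignorable combine to roughly a one-in-three chance of at least one incident, which is hardly a low-risk position.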
The point of these scenarios is to highlight the need for incorporating a risk management philosophy into all decision-making processes. Even when the topic does not seem to be about risk management - such as talking about how to build out a security team - it is imperative that risk still be part of the conversation. Especially in these times, budgets are tight and resources are constrained. You may want to hire 10 people to tackle all your security needs, but the reality is that you're likely only going to get half that many people allocated. In that case, you will need to prioritize your work and check that against the skills you already have in-house when deciding what kind of resources to bring onboard. The same is true for tools. Sure, implementing a commercial log management solution would make life a lot easier, but are there better things you can be doing with that $50-250k? Using a risk-based approach will help you make better decisions.