Organizations trying to cope with securing their expanding attack surfaces eventually find themselves at a crossroads: they need to move beyond finding risks to effectively mitigating risk.
Making that transition starts with a shift from using “risks found” as the KPI to “risks remediated” as the true measure of success. That change realigns the security team’s incentives and focuses its effort on remediation. For that to work at scale, organizations must get away from a “firefighting” mode when it comes to risk reduction – meaning they must stop chasing the latest critical issue – and become more proactive.
Here are seven steps you can take to transition your existing vulnerability and risk management processes and workflows from firefighting to proactively managing risk reduction at scale.
Step #1: Collect – Create one backlog to rule them all
Taking a findings-based approach with your security testing tools means that your typical remediation process begins with logging into the dashboard of each tool. This, of course, entails learning the different functionality of each tool and understanding each tool’s findings language.
To transition to a fixing-based approach, start by creating a single backlog. The first step is collecting all the findings from all the testing tools into one centralized location, whether that’s a spreadsheet, database, or some other system.
Step #2: Consolidate – Normalize, deduplicate, and enrich with context
Now that you have a single backlog, take it a step further and normalize all findings so that they use uniform terminology, which lets you execute your remediation processes uniformly. After all, if you want to measure outcomes, you need to execute the same process across all findings. This normalized backlog of findings is the foundation for all subsequent activities.
You will see that your now normalized list has duplicate findings. Remove the redundant ones to cut down the length of your backlog.
The normalized list also enables you to identify different findings that affect the same resource. At this point you should enrich findings with the ownership context you will need down the road. For instance, you can collect metadata from a configuration management database (CMDB) so that you can later determine who owns a vulnerable machine.
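As a rough illustration of the normalize, deduplicate, and enrich steps, here is a minimal Python sketch. Everything in it is an assumption made for the example: the field names, the severity mapping, the placeholder issue IDs, and the `CMDB_OWNERS` dictionary, which stands in for a real CMDB lookup. It treats two findings as duplicates when they share the same resource and issue.

```python
from dataclasses import dataclass

# Hypothetical mapping from each tool's severity vocabulary to one shared scale.
SEVERITY_MAP = {"crit": "critical", "critical": "critical", "p1": "critical",
                "high": "high", "p2": "high", "medium": "medium", "low": "low"}

# Hypothetical CMDB extract: resource identifier -> owning team.
CMDB_OWNERS = {"web-01": "Engineering", "db-01": "DevOps"}


@dataclass(frozen=True)
class Finding:
    resource: str
    issue: str      # e.g. a CVE ID or a scanner rule name
    severity: str
    owner: str


def normalize(raw: dict) -> Finding:
    """Map one tool-specific record onto the shared schema and attach an owner."""
    resource = raw["resource"].strip().lower()
    return Finding(
        resource=resource,
        issue=raw["issue"].strip().upper(),
        severity=SEVERITY_MAP.get(raw["severity"].strip().lower(), "unknown"),
        owner=CMDB_OWNERS.get(resource, "unassigned"),
    )


def consolidate(raw_findings: list) -> list:
    """Normalize every record and keep the first finding per (resource, issue)."""
    unique = {}
    for record in raw_findings:
        finding = normalize(record)
        unique.setdefault((finding.resource, finding.issue), finding)
    return list(unique.values())


raw = [
    {"resource": "WEB-01", "issue": "cve-0000-0001", "severity": "Crit"},  # scanner A
    {"resource": "web-01 ", "issue": "CVE-0000-0001", "severity": "P1"},   # scanner B, same issue
    {"resource": "db-01", "issue": "open-port-5432", "severity": "high"},
]
backlog = consolidate(raw)
for finding in backlog:
    print(finding)
```

In this sample the two scanner records for web-01 collapse into one finding, so the backlog holds two entries, each tagged with an owning team.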
Step #3: Choose – Decide what, who, how, where to perform remediation actions
With all findings normalized, you can now choose how to remediate through a multi-dimensional prioritization method, which includes:
a. WHAT: Choose whether you want to prioritize a finding according to external context (e.g., a known exploit in the wild) or according to internal context (e.g., which domain – cloud, code, etc. – it is in).
b. WHO: Choose to whom to send the remediation item. To identify the right team, analyze the resource metadata that you collected in step #2.
c. HOW: Prioritize the outcomes, not the problems, by aggregating around remediation actions. This means that if you have the same solution for different resources or for different problems, you generate just a single remediation item.
d. WHERE: Choose where and under which project to open the ticket for the remediation team (e.g., in Jira, ServiceNow or any other ticketing system the fixer uses).
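One way to picture the four dimensions together is the following Python sketch. It is only an illustration under assumed data: the finding fields, the `TICKET_TARGETS` mapping, and the ordering rule (known-exploited items first, then items that resolve the most findings) are hypothetical choices, not a prescribed method.

```python
from collections import defaultdict

# Hypothetical normalized findings; "action" is the fix that resolves the finding.
findings = [
    {"resource": "web-01", "owner": "Engineering",
     "action": "upgrade libfoo to 2.1", "exploited": True},
    {"resource": "web-02", "owner": "Engineering",
     "action": "upgrade libfoo to 2.1", "exploited": False},
    {"resource": "db-01", "owner": "DevOps",
     "action": "close port 5432", "exploited": False},
]

# Hypothetical WHERE mapping: owning team -> (ticketing system, project key).
TICKET_TARGETS = {"Engineering": ("jira", "ENG"), "DevOps": ("servicenow", "OPS")}


def build_remediation_items(findings):
    """Collapse findings that share an owner and a fix into one remediation item."""
    grouped = defaultdict(list)
    for f in findings:
        grouped[(f["owner"], f["action"])].append(f)          # WHO + HOW

    items = []
    for (owner, action), group in grouped.items():
        items.append({
            "owner": owner,
            "action": action,
            "resources": sorted(f["resource"] for f in group),
            "exploited": any(f["exploited"] for f in group),  # WHAT (external context)
            "target": TICKET_TARGETS.get(owner),              # WHERE
        })

    # Known-exploited items first, then items that resolve the most findings.
    items.sort(key=lambda item: (not item["exploited"], -len(item["resources"])))
    return items


items = build_remediation_items(findings)
for item in items:
    print(item["owner"], item["action"], item["resources"], item["target"])
```

Here three findings become two remediation items, because the two Engineering findings share the same fix.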
Step #4: Route – Get the remediation item into the hands of the remediation team
Now that you know who will perform each fix and which remediation actions to send to each team, you can start routing those items.
At this stage, you’ll realize that you’re able to remediate in parallel, and not in a sequential manner as is typically done today.
As a simple example, imagine that you have two remediation teams, Engineering and DevOps, and 150 critical findings. Assume that the first 100 findings are all to be fixed by Engineering and the remaining 50 by DevOps. Working through the list sequentially would leave the Engineering team overloaded with fixes while the DevOps team was underutilized. However, once you work the list based on remediation actions, where you know which team will do the remediation work, you can remediate parts of the backlog in parallel.
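The scenario above can be sketched in a few lines of Python. The data structure is an assumption made for illustration; only the counts (150 findings, the first 100 for Engineering, the remaining 50 for DevOps) come from the example.

```python
from collections import defaultdict

# 150 critical findings: the first 100 belong to Engineering, the rest to DevOps.
findings = [
    {"id": i, "team": "Engineering" if i < 100 else "DevOps"}
    for i in range(150)
]


def route(findings):
    """Split one sequential backlog into a separate queue per remediation team."""
    queues = defaultdict(list)
    for f in findings:
        queues[f["team"]].append(f["id"])
    return dict(queues)


queues = route(findings)

# In the sequential list, this many findings sit ahead of the first DevOps item.
ahead_of_devops = next(i for i, f in enumerate(findings) if f["team"] == "DevOps")

print({team: len(ids) for team, ids in queues.items()})
print("Findings ahead of the first DevOps item:", ahead_of_devops)
```

With per-team queues, DevOps can start on its 50 items immediately instead of waiting behind the 100 Engineering items.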
Step #5: Receive – Be solution-oriented, not security-dependent
This is the step that allows you to truly scale: automate backlog management by creating programmatic workflows. The key to this is in synchronizing with other organizational processes, and making security data available to the remediation teams when they need it, not when a finding is discovered.
How? As a start, there should be a workflow between the remediation items and the ticketing system that each remediation team uses. That way, when an issue is found, the ticket is automatically opened and directed to the right team, as defined in Step #3. You can even take it a step further and create a unified template for each ticketing system.
Your automated workflow should be bi-directional so that when a ticket is closed in a ticketing system, you can verify that with the results of the next testing scan. If you find any discrepancy, highlight it by reopening the ticket in the remediation team’s workflow tool with the relevant details.
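The verification half of that bi-directional workflow might look like the following Python sketch. It assumes, hypothetically, that each ticket stores a key identifying its finding as a (resource, issue) pair and that the next scan yields a set of the same keys; the ticket IDs and issue names are placeholders, and the actual reopening would go through whatever API the team's ticketing system provides.

```python
def tickets_to_reopen(closed_tickets, next_scan_keys):
    """Return closed tickets whose finding still appears in the next scan."""
    return [t for t in closed_tickets if t["finding_key"] in next_scan_keys]


# Hypothetical tickets that the remediation teams marked as closed.
closed_tickets = [
    {"ticket": "ENG-101", "finding_key": ("web-01", "CVE-0000-0001")},
    {"ticket": "OPS-7", "finding_key": ("db-01", "OPEN-PORT-5432")},
]

# Hypothetical keys reported by the next scan: the web-01 issue is still present.
next_scan_keys = {("web-01", "CVE-0000-0001")}

reopen = tickets_to_reopen(closed_tickets, next_scan_keys)
for ticket in reopen:
    print("Reopen", ticket["ticket"], "- finding still detected:", ticket["finding_key"])
```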
Step #6: Remediate – Where the hard work gets done
This is the actual fix, mitigation or risk acceptance that is done to remediate the security issue. This is a critical part of the remediation process, but as a security team, it is out of your direct control.
Step #7: Report – Measure actual performance, efficiency and risk reduction
Having an automated routing process that sends the remediation actions to the right remediation team allows you to see the entire backlog and the status of every item at once, not just whether an item was remediated. That enables you to track and measure your risk reduction process.
With that data, you can measure performance and compare remediation performance between different teams or groups across the organization. For instance, you could analyze and compare your different applications in terms of critical findings, total findings, and how teams handle their tickets.
You can also now provide stakeholders with reports on the organization’s remediation program that help everyone understand the program’s cadence and performance. Such reports can include statistics like the ratio of new to resolved findings, the average time to remediation, and the overall backlog status.
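As a hedged example of how such statistics could be computed, the Python sketch below derives them from a list of remediation items. The record layout, the sample dates, and the `report` function are all assumptions for illustration; a real report would pull this data from the ticketing system.

```python
from datetime import date
from statistics import mean

# Hypothetical remediation items; "resolved" is None while an item is still open.
items = [
    {"team": "Engineering", "opened": date(2024, 1, 2), "resolved": date(2024, 1, 12)},
    {"team": "Engineering", "opened": date(2024, 1, 5), "resolved": None},
    {"team": "DevOps", "opened": date(2024, 1, 3), "resolved": date(2024, 1, 7)},
    {"team": "DevOps", "opened": date(2024, 1, 20), "resolved": None},
]


def report(items, period_start, period_end):
    """Summarize new vs. resolved items, time to remediate, and open backlog."""
    new = [i for i in items if period_start <= i["opened"] <= period_end]
    resolved = [
        i for i in items
        if i["resolved"] is not None and period_start <= i["resolved"] <= period_end
    ]
    days = [(i["resolved"] - i["opened"]).days for i in resolved]
    return {
        "new": len(new),
        "resolved": len(resolved),
        "new_to_resolved_ratio": len(new) / len(resolved) if resolved else None,
        "mean_days_to_remediate": mean(days) if days else None,
        "open_backlog": sum(1 for i in items if i["resolved"] is None),
    }


summary = report(items, date(2024, 1, 1), date(2024, 1, 31))
print(summary)
```

Running the same function on the items of a single team gives a per-team view that can be compared across the organization.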
This kind of tracking lets you identify any issues in the remediation process itself and provides the security team with data they can use to collaborate more closely with the respective remediation teams to enhance their processes and address any areas that require improvement.
It’s precisely this approach of moving from output to outcome that should lead the way in keeping security from becoming a bottleneck in the remediation process and in enabling that process to scale.