Help Net Security

The endpoint recovery gap many teams discover during an incident


In this interview with Help Net Security, IGEL CTO Matthias Haas explains why backups alone do not equal recovery. He makes the case that endpoint recovery is often overlooked, leaving organizations exposed when thousands of devices go down at once.

Haas walks through what a well-planned recovery looks like, where the bottlenecks appear, and why restoring trusted user access matters more than counting blocked threats. He also shares how security leaders can convince a CFO to fund recovery capability before an incident proves it was worth the spend.

What is the most expensive recovery assumption you have watched an enterprise make and regret?

The most expensive assumption is that backups alone equal recovery.

Backups are essential. They help restore data, applications, and systems. Without them, recovery may not even start. But they do not automatically restore user access, clinical workflows, branch operations, call centers, or the endpoint environments people depend on to work.

Both questions matter: “Can we get our data and applications back?” and “Can users regain trusted access to them quickly and safely?”

Many organizations discover during an incident that they planned for infrastructure recovery, but not endpoint recovery. That is where the real cost appears: reimaging devices, replacing hardware, validating trust, coordinating users, and deciding who comes back online first.

The better recovery architecture accounts for both sides: restore the systems, and restore trusted access to those systems.

You argue recovery architecture is one of the most underused levers a CISO has. Most security budgets still pour into prevention and detection. What did you see that convinced you the leverage sits on the recovery side instead?

Prevention and detection are essential, but they do not determine how long the business stays down after disruption hits.

Recovery is where cybersecurity becomes operational. It is where endpoint state, identity, policy, applications, and business continuity meet. If every endpoint must be rebuilt before people can work, the organization has designed recovery around friction.

The leverage sits in restoring trusted access before the full rebuild is complete. That changes the conversation from “time to detect” or “time to remediate” to “time to safely resume critical work.”

By reducing endpoint risk through an immutable operating system and trusted endpoint foundation, you can give recovery architecture a stronger starting point.

Walk me through what a well-architected endpoint recovery looks like when ten thousand laptops are bricked at once. Where does the bottleneck usually show up, and is it technical or human?

The first mistake is treating it as 10,000 repair tickets.

At that scale, recovery has to treat the endpoint fleet as a managed recovery surface. A well-architected model gives users a trusted alternative path into critical SaaS, VDI, DaaS, browser-based, or cloud desktop resources without depending on the failed local environment.

This model works when organizations can boot devices into a trusted alternate operating system through planned recovery options such as dual boot or USB boot, preserving access while the primary environment is investigated and remediated.

The bottleneck is usually both technical and human. Technical, because many organizations have no trusted recovery path staged in advance. Human, because service desk, security, infrastructure, legal, communications, and business teams all enter triage at once.

At scale, recovery fails when it depends on heroics. The goal is to replace heroics with architecture.

If you could replace one number on the CISO dashboard, which security metric is overrated and which recovery metric deserves a seat next to it?

I would not remove threat-blocking metrics, but I would stop treating “number of blocked threats” as a proxy for resilience.

It is useful, but incomplete. It tells you controls are active. It does not tell you whether the business can keep operating when controls fail, systems are disrupted, or endpoints can no longer be trusted.

The recovery metric I would add is time to trusted user access.

Not just time to restore a server. Not just time to close an incident. How long does it take for defined user groups to return to a trusted workspace with the right applications and controls?

That metric answers the business continuity question executives care about: when can critical work safely resume?
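As an illustration only, the metric can be sketched as the elapsed time between incident declaration and the moment each defined user group is confirmed back on a trusted workspace. The group names, timestamps, and the function name `time_to_trusted_access` below are hypothetical and do not come from the interview or any IGEL product.

```python
from datetime import datetime, timedelta

# Hypothetical incident data: all group names and timestamps are illustrative.
incident_declared = datetime(2025, 1, 10, 8, 0)

# When each defined user group was confirmed back on a trusted workspace
# with the right applications and controls.
restored_at = {
    "clinical_staff": datetime(2025, 1, 10, 9, 30),
    "call_center": datetime(2025, 1, 10, 12, 0),
    "back_office": datetime(2025, 1, 11, 8, 0),
}


def time_to_trusted_access(declared, restored):
    """Return elapsed time from incident declaration to trusted access, per group."""
    return {group: ts - declared for group, ts in restored.items()}


ttta = time_to_trusted_access(incident_declared, restored_at)

# Report groups in the order they regained trusted access.
for group, elapsed in sorted(ttta.items(), key=lambda kv: kv[1]):
    hours = elapsed.total_seconds() / 3600
    print(f"{group}: {hours:.1f} h to trusted access")
```

Tracking the value per user group, instead of as one fleet-wide average, keeps the focus on which critical work resumed first.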

Downtime cost is easy to wave at and hard to pin down. How do you get a CFO to fund recovery capability before the incident that proves it was worth it?

Stop selling fear and start modeling dependency.

A CFO needs to understand which workflows depend on endpoint access, how many users are affected, what the per-hour interruption cost looks like, which teams must return first, and how long manual recovery would realistically take.

That turns recovery from an insurance conversation into an operating resilience conversation.

The business case is not, “An incident might happen.” It is, “These workflows cannot tolerate endpoint unavailability, and today recovery depends on manual work at the worst possible moment.”

Recovery capability should be framed in that context: reducing time to trusted access, reducing dependency on replacement hardware, preserving forensic options, and keeping critical users productive while remediation continues. That is not fear-based spending. It is measurable resilience.
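A rough sketch of the dependency model described above might look like the following. Every figure, workflow name, and field is a made-up assumption for illustration. The calculation simply multiplies affected users by a per-user hourly interruption cost and by the expected hours without trusted access, once for manual recovery and once for a pre-staged recovery path.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    """A workflow that depends on endpoint access (all values hypothetical)."""
    name: str
    users: int
    cost_per_user_hour: float     # assumed interruption cost per user per hour
    manual_recovery_hours: float  # assumed time to recover by reimaging/replacing
    staged_recovery_hours: float  # assumed time with a pre-staged recovery path

    def hourly_cost(self) -> float:
        return self.users * self.cost_per_user_hour

    def exposure(self, hours: float) -> float:
        return self.hourly_cost() * hours

    def avoided_cost(self) -> float:
        return self.exposure(self.manual_recovery_hours) - self.exposure(
            self.staged_recovery_hours
        )


workflows = [
    Workflow("call_center", users=250, cost_per_user_hour=60.0,
             manual_recovery_hours=72, staged_recovery_hours=4),
    Workflow("clinical", users=400, cost_per_user_hour=120.0,
             manual_recovery_hours=48, staged_recovery_hours=2),
]

# Teams with the highest per-hour interruption cost return first.
priority = sorted(workflows, key=lambda w: w.hourly_cost(), reverse=True)

for w in priority:
    print(f"{w.name}: {w.hourly_cost():,.0f}/h, "
          f"avoided cost {w.avoided_cost():,.0f}")
```

The inputs matter more than the arithmetic: the model is only as credible as the user counts, hourly costs, and recovery-time estimates the business agrees on in advance.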
