Seven ways to be sure you can restore from backup


A disaster recovery plan is only effective if you can restore data. But despite the growing risks – especially from ransomware – not all organisations are sure they actually can recover from their backups.

When it comes to backup and recovery, regular and rigorous testing should be a core part of any plan. But there are other steps IT managers can take, such as auditing backup processes, following the 3-2-1 backup rule, and checking the integrity of backup files.

Backup testing needs to go hand in hand with a thorough understanding of which systems and data are the most critical, and how systems depend on each other in the production environment.

Here, we summarise some of the key questions IT leaders and business continuity teams should be asking. 

What are the key elements of reliable restores from backup?

Organisations need to know their backups work – that they can recover data and restore systems with the minimum of disruption and without loss or corruption of data.

This breaks down into several interlocking elements. Each organisation’s backup and recovery plan will set out the recovery time objective (RTO), meaning how quickly data must be recovered, and the recovery point objective (RPO), meaning how far back you are willing to go to find the last good copy of data.
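As a rough illustration of how these two targets can be checked after a recovery test, the Python sketch below compares measured times against an RTO and an RPO. The function name, the timestamps and the four-hour and eight-hour targets are assumptions made for the example, not figures from any particular plan.

```python
from datetime import datetime, timedelta


def meets_objectives(incident_at, last_good_backup_at, restored_at, rto, rpo):
    """Return (rto_met, rpo_met) for a single recovery test."""
    # Time from the incident until systems were usable again.
    downtime = restored_at - incident_at
    # How far back the last good copy was: the data that would be lost.
    data_loss_window = incident_at - last_good_backup_at
    return downtime <= rto, data_loss_window <= rpo


# Illustrative values only.
result = meets_objectives(
    incident_at=datetime(2024, 1, 10, 9, 0),
    last_good_backup_at=datetime(2024, 1, 10, 3, 0),
    restored_at=datetime(2024, 1, 10, 12, 30),
    rto=timedelta(hours=4),
    rpo=timedelta(hours=8),
)
print(result)  # (True, True): 3.5 hours of downtime, 6 hours of lost data
```

Recording these two numbers for every test makes it easier to see whether the targets are realistic, rather than merely written down.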

These parameters set out what a successful recovery looks like for the business. With ransomware, though, there is another key metric. And that’s whether the business can achieve a clean restore of data.

There’s no point recovering systems after a cyber attack if doing so re-infects systems with ransomware code. And it could be that the RPO for, say, a power outage, is different from the RPO for ransomware. It all comes down to the business’s tolerance of risk.
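To see why the two RPOs can differ, consider a simplified sketch in Python. It assumes nightly backups and an infection time established by forensics, both invented for the example, and picks the newest backup taken before that time. The helper `latest_clean_backup` is hypothetical, and real products use far richer signals than a single timestamp.

```python
from datetime import datetime


def latest_clean_backup(backup_times, infected_since):
    """Newest backup taken strictly before the suspected infection time."""
    clean = [t for t in backup_times if t < infected_since]
    return max(clean) if clean else None


# Assumed: nightly backups at 02:00 from 1 to 10 January.
backups = [datetime(2024, 1, day, 2, 0) for day in range(1, 11)]
outage_at = datetime(2024, 1, 10, 9, 0)
infected_since = datetime(2024, 1, 6, 14, 0)  # assumed forensic finding

newest = max(backups)
clean = latest_clean_backup(backups, infected_since)

print("Power outage, data lost:", outage_at - newest)
print("Ransomware, data lost:  ", outage_at - clean)
```

With these figures, a plain outage loses seven hours of data, while a clean ransomware restore has to go back more than four days.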

Reliable restoration also depends on the integrity of recovered data. Do recovered files work as they should, or did some data fail to restore or become corrupted?
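One simple way to answer that question is to compare checksums recorded at backup time with checksums of the restored files. The Python sketch below is a minimal version of the idea using SHA-256. The file names and contents are made up, and it only proves the bytes match, not that an application can actually open the data.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_restore(source_manifest, restored_root):
    """Return (missing, corrupted) file names after a restore."""
    restored = build_manifest(restored_root)
    missing = sorted(set(source_manifest) - set(restored))
    corrupted = sorted(
        name
        for name, checksum in source_manifest.items()
        if name in restored and restored[name] != checksum
    )
    return missing, corrupted


# Demonstration with throwaway files (contents are invented).
with tempfile.TemporaryDirectory() as tmp:
    source, restored_dir = Path(tmp, "source"), Path(tmp, "restored")
    source.mkdir()
    restored_dir.mkdir()
    (source / "a.txt").write_text("ledger")
    (source / "b.txt").write_text("orders")
    (source / "c.txt").write_text("payroll")
    (restored_dir / "a.txt").write_text("ledger")
    (restored_dir / "b.txt").write_text("orderz")  # simulated corruption
    # c.txt is deliberately not restored.
    missing, corrupted = compare_restore(build_manifest(source), restored_dir)

print("Missing:", missing)
print("Corrupted:", corrupted)
```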

Firms also need to consider the order in which they restore data. Some systems are mission critical, or need to be restored first because other applications depend on them. A recovery test should check whether systems come back online in the right order.
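Where dependencies are documented, a restore order can be derived from them rather than kept in someone’s head. The sketch below uses Python’s standard `graphlib` module (Python 3.9 or later) to do that, and to check whether the order observed in a test respects the dependencies. The system names and their dependencies are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical systems: each key depends on the systems in its set.
depends_on = {
    "directory": set(),
    "database": {"directory"},
    "erp": {"database", "directory"},
    "web_portal": {"erp"},
}


def planned_order(deps):
    """Return an order in which every system follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def order_is_valid(observed, deps):
    """Check that each system came online after everything it depends on."""
    position = {name: index for index, name in enumerate(observed)}
    return all(
        position[dependency] < position[system]
        for system, dependencies in deps.items()
        for dependency in dependencies
    )


order = planned_order(depends_on)
print(order)
print(order_is_valid(["database", "directory", "erp", "web_portal"], depends_on))
```

If two systems depend on each other, `TopologicalSorter` raises a `CycleError`, which is itself a useful finding for a recovery plan.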

This, in turn, depends on access to media. Cloud backups require bandwidth while local copies need backup systems to be up and running. Off-site backup media must be retrieved and brought on site, or uploaded to a standby system or the cloud.
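The bandwidth point is easy to underestimate, so a back-of-the-envelope calculation can help. The Python sketch below estimates how long a cloud restore would take over a given link. The 10TB volume, the 1Gbps link and the 70% efficiency factor are assumptions chosen for illustration, and real throughput depends on the provider, the network and the backup software.

```python
def restore_hours(data_tb, link_mbps, efficiency=0.7):
    """Estimate hours needed to download data_tb terabytes.

    link_mbps is the nominal link speed in megabits per second.
    efficiency is an assumed fraction of that speed actually achieved.
    """
    bits_to_move = data_tb * 1e12 * 8          # decimal terabytes to bits
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return bits_to_move / usable_bits_per_second / 3600


hours = restore_hours(data_tb=10, link_mbps=1000)
print(round(hours, 1))  # 31.7
```

On those assumptions, the download alone takes more than a day, which would already break a four-hour RTO before any system has been rebuilt.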

Firms should also check that failover or standby systems come online as planned. This includes cloud capacity and disaster recovery facilities, if these need to be invoked. Last, can the organisation access the supporting services it needs to recover data?

These include power and cooling, communications and key staff. Simply checking that the backup software worked as intended is not enough. 

How do you audit backup processes?

A backup audit – or a backup and recovery audit – is a formalised process to check that backup and recovery work as they should.

Backup audits should include checks on where data is held and which applications it supports, what data protection exists, and the location of backup targets. This includes data held in the cloud, and backed up to the cloud.

Then, the audit will look at data recovery, including compliance with RPO and RTO targets, and examine the organisation’s backup and recovery policy and procedures. This includes technical criteria as well as who will manage the recovery process.

The result will be a report, with recommendations for action. 

What is the 3-2-1 backup rule?

The 3-2-1 backup rule is a long-standing method to ensure data is adequately protected. The rule states that organisations should keep three copies of their data, on at least two types of media or storage systems. One copy of the data should be off site.
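The rule is simple enough to check automatically against an inventory of copies. The Python sketch below is one possible way to do it. The `Copy` record and the example inventory are invented, and it assumes the production copy counts as one of the three, which is a common reading of the rule but worth confirming against your own policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    media: str      # for example "disk", "tape" or "object storage"
    offsite: bool


def check_321(copies):
    """Report which parts of the 3-2-1 rule an inventory satisfies."""
    return {
        "three_copies": len(copies) >= 3,
        "two_media": len({c.media for c in copies}) >= 2,
        "one_offsite": any(c.offsite for c in copies),
    }


# Hypothetical inventory for one dataset.
copies = [
    Copy("production", "disk", offsite=False),
    Copy("local backup", "disk", offsite=False),
    Copy("cloud backup", "object storage", offsite=True),
]
report = check_321(copies)
print(report)
```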

Keeping to the 3-2-1 rule is much easier now the industry offers a plethora of cloud-based backup services. But there is still a case for physical, off-site backup in many industries, not least as a safeguard against ransomware.

And, all parts of the 3-2-1 rule need to be checked for effective restoration and data integrity. 

How do you test the integrity of a backup?

Backup is useless if it fails to restore correctly. This might sound self-evident, but testing the integrity of backups is an essential part of any backup and recovery or business continuity plan.

Files can be corrupted or infected. Physical media such as tape can degrade over time, become inaccessible or even be destroyed during a disaster. Cloud services might become unavailable or degraded, which affects the ability to recover time-sensitive data in the right order.

Backup software uses tools that include checksum validation and hashing to check logical recovery. Vendors have also introduced machine learning and AI to check for unusual patterns in data – such as a jump in entropy, a measure of randomness – to spot ransomware and other forms of corruption.
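To give a sense of what an entropy check measures, the Python sketch below calculates Shannon entropy in bits per byte. Ordinary text scores low, while encrypted data looks close to random and scores near the maximum of eight. This is a toy illustration rather than any vendor’s method. Random bytes stand in for encrypted content, and compressed files also score high, so entropy on its own is not proof of ransomware.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )


text = b"Quarterly sales report. " * 400
random_like = os.urandom(len(text))  # stand-in for encrypted content

print(round(shannon_entropy(text), 2))
print(round(shannon_entropy(random_like), 2))
```

A tool built on this idea would typically compare each backup with earlier ones and flag a sudden rise across many files.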

The only surefire way to test a backup’s integrity is to try to restore from it. This does raise practical issues, especially when restoring data to mission-critical production systems which are in constant use. IT teams might need to test recovery one system at a time, or to virtual machines.

Some vendors have developed alternatives. Commvault, for example, has a “cleanroom” recovery product that will allow customers to restore data to a virtual replica of their environment in the cloud.

But it remains critical to test recovery to physical hardware too, especially for older systems that cannot easily be replicated on cloud technologies. 

Why is it important to test backup and restore procedures?

Testing procedures is as important as testing technology, but it’s easy to overlook.

Much of backup testing is rightly focused on technical aspects, such as whether backup software runs as intended and whether backup files can be retrieved and restored.

But often when recovery fails it is for non-technical reasons. In both a conventional disaster recovery situation and a ransomware attack, staff are under pressure, lines of communication are disrupted and it is hard to maintain command and control.

Backup and recovery procedures should set out what needs to happen, when, and who is responsible for it.

A clear plan and solid procedures will help enormously when the worst does happen. But that means procedures need to be tested in as realistic a way as possible.

That way, any weaknesses can be identified and addressed before procedures need to be used in anger. Can backups be found and recovery systems activated? Do systems restore in the right order? Does the disaster recovery environment – physical or cloud – work as intended? And does everyone know their role?

Disaster recovery is one of those cases where it really is about tools, processes and people. All elements need to be stress tested. 

What are the objectives of backup testing?

The main objective of backup testing is to ensure files can be restored from backup copies to production systems.

Testing needs to ensure that production systems function as they should after recovery. If a business plans to fail over to a standby setup in its own datacentre, with a disaster recovery vendor or in the cloud, it needs to check that failover works. And, crucially, that it can recover to the production system when the time comes.

However, there is often rather more to backup testing than the purely technical question of whether the backup works. As we’ve seen, firms need to test their overall procedures to ensure plans are carried out in the correct sequence, that communications and command and control work as intended, and that everyone knows their roles.

Drill deeper, however, and comprehensive backup testing can reveal a lot more about an organisation’s readiness and resilience.

Are, for example, RPOs and RTOs being met? And if they are, are they appropriate for the organisation? Businesses change, and an RTO that was acceptable, say, five years ago, might not be now.

Firms also need to take into account any regulatory requirements around business continuity and downtime. 

How often should you test restores from backups?

The simple answer is, “as often as you can”. Large-scale backup and recovery testing is disruptive and potentially expensive, and might only be carried out annually. Other tests can be more frequent. These could include spot checks on critical applications, or testing built in as part of application updates, for example.

It might well be that some systems are tested daily, but this will depend on the system’s criticality, the importance of its data and, of course, the business’s view of risk.
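One way to keep track of this is to give each system a criticality tier with its own test interval, and flag anything overdue. The Python sketch below shows the idea. The tiers, the intervals and the systems listed are purely illustrative, and each business would set its own according to its view of risk.

```python
from datetime import date, timedelta

# Assumed tiers and intervals, for illustration only.
TEST_INTERVAL = {
    "critical": timedelta(days=1),
    "important": timedelta(days=30),
    "standard": timedelta(days=365),
}


def overdue_tests(systems, today):
    """Return names of systems whose last restore test is older than allowed."""
    return [
        name
        for name, (tier, last_tested) in systems.items()
        if today - last_tested > TEST_INTERVAL[tier]
    ]


# Hypothetical systems: name -> (tier, date of last restore test).
systems = {
    "payments": ("critical", date(2024, 6, 1)),
    "crm": ("important", date(2024, 5, 20)),
    "archive": ("standard", date(2023, 5, 1)),
}
overdue = overdue_tests(systems, today=date(2024, 6, 3))
print(overdue)  # ['payments', 'archive']
```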


