How to Create a Disaster Recovery Plan
Losing sensitive data is any organization’s worst nightmare. However, the quickest way to recover after any type of disaster – from cyberattack to hurricane to building fire – is to have a business continuity plan (BCP) in place outlining the company’s approach to crises.
And in our opinion, the most important aspect of the BCP is the IT disaster recovery plan. In this article we’ll highlight the key features of the IT disaster recovery plan and how to go about creating one.
What is an IT disaster recovery plan?
The IT disaster recovery plan (IT DRP) is a documented process used to outline the recovery path of IT infrastructure in case of disaster.
An IT disaster recovery plan is the lynchpin of organizational business continuity strategy that maintains (at least) a minimum level of service while restoring the usual operations. When businesses fail to put an IT DRP in place, they risk losing reputation, funding, and customers if/when disaster strikes.
Why is IT disaster recovery so important?
The overarching benefit of an IT DRP is that it provides detailed, accurate, simple, and up-to-date information about your organization’s IT operations. This document should have a coherent format and be easy for employees to consume, so they are ready to take action when necessary.
A solid IT disaster recovery plan can help your organization:
- Minimize interruption to normal operations and establish alternative operations to utilize if necessary
- Limit the extent of a cyberattack or natural disaster
- Train staff in emergency procedures
- Cut costs regarding relief efforts
These types of preventative measures will reduce the risk of a man-made disaster and will hopefully improve your customer service by reducing the risk of downtime and securing customer care (and retention!) after a disaster.
How to create a disaster recovery plan
Building an IT DRP requires research, a strong understanding of your organizational risks, and coordination with stakeholders. The plan should be tested and continuously updated by relevant team members.
Consider the following suggestions when creating and testing your disaster recovery plan:
- Include the processes for contacting support and escalating issues. This information will help to avoid prolonged downtime.
- Evaluate the business impact of application failures.
- Choose a cross-region recovery architecture for mission-critical applications.
- Identify one specific owner of the disaster recovery plan, including automation and testing.
- Document the process, particularly any manual steps. Automate the process as much as possible.
- Establish a backup strategy for all reference and transactional data, and test backup restoration regularly.
- Train operations staff to execute the plan.
- Perform regular disaster simulations to validate and improve the plan.
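One way to test backup restoration, as suggested in the list above, is to compare checksums of the original data and the restored copy. The following is a minimal sketch, assuming file-based backups; the function names `sha256_of` and `restore_matches_source` are hypothetical and not part of any backup tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches_source(source: Path, restored: Path) -> bool:
    """True when the restored file is byte-identical to the source file."""
    return sha256_of(source) == sha256_of(restored)
```

A check like this could run after each scheduled restore drill, with a mismatch treated as a failed backup rather than a warning.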
5 steps to create a successful IT disaster recovery plan
1. Identify critical operations
Begin by identifying the business operations that are critical to the functioning of your organization. Outline the following:
- Comprehensive list of services and products you provide
- Any known vulnerabilities that could impact your organization
- The extent to which you must operate from your company’s headquarters
Determine what data is crucial to keep your business operational in any situation. Consider including the following in your data backup plan:
- Business-critical data and assets
- Alternative meeting channels
- Crisis and post-disaster communications
- Proactive security measures
Next, determine the priority level of services and products with the following classifications:
- Absolutely mission-critical: The major revenue generators requiring the least amount of downtime possible, measured in minutes or hours.
- Semi-important services and products: Minor revenue generators with larger acceptable downtimes.
- Low-tier services and products: Little to no revenue-generating impact. These might have a downtime of several hours to days with little or no impact on the mission-critical services and products.
Each tier should have its own SLA (Service-level agreement) detailing potential downtime losses and explaining how the risks will affect business operations and growth. Emphasis should be placed on two key elements:
- Recovery Time Objective (RTO): The maximum acceptable time that your services and products can be offline.
- Recovery Point Objective (RPO): The maximum acceptable window of data loss, measured back in time from a major incident. For example, an RPO of one hour means backups or replication must capture changes at least hourly.
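The tiers and the two objectives above can be written down as data and checked against what your backups actually deliver. This is a minimal sketch: the tier targets are hypothetical placeholders (the article gives no specific values), and `ServiceTier`, `meets_rpo`, and `meets_rto` are illustrative names.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ServiceTier:
    name: str
    rto: timedelta  # maximum acceptable time offline
    rpo: timedelta  # maximum acceptable window of lost data


# Hypothetical targets; replace with the values agreed in your own SLAs.
TIERS = {
    "mission-critical": ServiceTier("mission-critical", timedelta(hours=1), timedelta(minutes=15)),
    "semi-important": ServiceTier("semi-important", timedelta(hours=8), timedelta(hours=4)),
    "low-tier": ServiceTier("low-tier", timedelta(days=2), timedelta(days=1)),
}


def meets_rpo(tier: ServiceTier, backup_interval: timedelta) -> bool:
    """Worst-case data loss is the time between backups, so it must fit the RPO."""
    return backup_interval <= tier.rpo


def meets_rto(tier: ServiceTier, measured_restore_time: timedelta) -> bool:
    """A restore time measured in a drill must fit within the RTO."""
    return measured_restore_time <= tier.rto
```

For instance, `meets_rpo(TIERS["mission-critical"], timedelta(hours=1))` is false under these assumed targets, which would signal that hourly backups are too infrequent for that tier.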
This effort may require several meetings with leaders and executives who can help identify which risks would impede operations in their departments. To ensure accountability, we recommend appointing one person on your team to own the planning process, which includes defining the essential elements of your business, the sensitive data assets, and a financial plan to maintain disaster recovery.
2. Evaluate disaster scenarios
A one-size-fits-all IT DRP doesn’t necessarily work for all scenarios. That’s why it’s critical to evaluate a variety of scenarios, review how they impact your business and how to react to each, and formulate several DR plans.
Here are a few examples of the types of disaster scenarios your IT DRP could/should cover:
- Natural disasters and physical incidents (fire, hurricane, datacenter destruction)
- Cyberattacks (e.g. DDoS attack)
- Hardware or software failures
We recommend working closely with department leaders to identify possible scenarios and formulate procedures for each. As a result, you’ll gain a big picture overview of your recovery objectives, timelines, and processes.
3. Create a communications plan
In the case of disaster, it’s critical to keep staff, suppliers, business partners, stakeholders, and customers informed of your responses and actions via a thoughtful and efficient communications plan.
As a first step, we recommend defining clearly articulated communications roles. If you’re a small team, you’ll likely appoint just one person (often the business owner, though it’s wise to also identify a backup) to be in charge of all disaster/recovery communications. A larger organization may assemble a comms team with a variety of disaster-related roles.
When developing a communications plan, consider using a few possible disaster examples, such as:
- Example #1: Building fire. In the case of fire/fire damage, the maintenance supervisor is responsible for notifying the CEO. The CEO then triggers a cascade of communications to staff.
- Example #2: Natural disaster (e.g. hurricane). In this case, daily operations will likely need to be moved to another location. Assign a POC to ensure customers are communicated with and know how to get in touch regarding questions/concerns.
- Example #3: Data breach. Your communication plan should include both the required regulatory communications (for example, under GDPR) and appropriate PR communications to assure stakeholders and customers of the actions you are taking to protect them.
Finally, create a task list using a who/what/when format along with the audiences that should be contacted. Messaging about the situation should be honest and clear, outline consequences, and highlight the action steps you’re taking in response. Ensure your plan can be implemented without delay by preparing templates for press releases, website notifications, emails, and social media.
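The who/what/when task list described above can be kept as structured data, so it is easy to sort and review. This is a sketch only: the `CommsTask` fields, the sample tasks, and the deadlines are hypothetical illustrations based on the building-fire example, not a prescribed format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommsTask:
    who: str           # person or role responsible for sending
    what: str          # message or template to send
    when_minutes: int  # deadline, in minutes after the incident is declared
    audience: str      # group that must receive the message


def ordered_by_deadline(tasks: list[CommsTask]) -> list[CommsTask]:
    """Return the tasks sorted so the most urgent notification comes first."""
    return sorted(tasks, key=lambda task: task.when_minutes)


# Hypothetical task list for a building fire.
FIRE_PLAN = [
    CommsTask("CEO", "staff safety notice", 30, "all staff"),
    CommsTask("Maintenance supervisor", "incident report", 10, "CEO"),
    CommsTask("Comms lead", "website notification", 120, "customers"),
]

for task in ordered_by_deadline(FIRE_PLAN):
    print(f"+{task.when_minutes} min: {task.who} sends {task.what} to {task.audience}")
```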
4. Develop a data backup and recovery plan
For an IT DRP, these three elements should be addressed:
- Emergency response procedures: An outline of the appropriate emergency responses to a fire, natural disaster, or any other event, in order to protect lives and limit damage.
- Backup operations procedures: Steps to ensure that essential data processing operational tasks can be conducted after the disruption.
- Recovery actions procedures: Steps to facilitate the rapid restoration of a data processing system following a disaster.
Following the identification of a disaster recovery incident, having a documented set of procedures will help carry out the disaster recovery strategy. The DRP should be in accordance with the already established RTO and RPO standards. Both automated and manual processes should be neatly documented for maximum efficiency of the DRP.
It’s critical that, at the end of the disaster recovery procedure, all recovered data is in an operational state.
The extent of the appropriate IT DRP for your enterprise will depend on your BIA (Business Impact Analysis). It might be one of the following:
- Pilot light: A small implementation in another region that can be easily spun up to take full production traffic.
- Cold site: The backup region has sufficient resources to take over when the primary region goes offline. All backups from the primary region are replicated to the cold site.
- Warm site: The backup region has infrastructure set up and configured in an offline state. This is a larger implementation, with frequent replication going back and forth between regions.
- Hot site: Both regions may serve the same amount of traffic, but each region has sufficient resources so if one region goes offline, the other region can take all the traffic seamlessly.
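Choosing among these options usually comes down to how tight the RTO is, since faster recovery costs more to keep ready. The sketch below maps an RTO to one of the four options. The cut-off values and the ordering (cold site slowest, then pilot light, warm site, and hot site fastest) are assumptions for illustration; your BIA should set the real thresholds.

```python
from datetime import timedelta

# Hypothetical RTO cut-offs, ordered from the fastest (most costly) option down.
STRATEGY_BY_MAX_RTO = [
    (timedelta(minutes=5), "hot site"),
    (timedelta(hours=1), "warm site"),
    (timedelta(hours=8), "pilot light"),
]


def suggest_strategy(rto: timedelta) -> str:
    """Return the first option whose assumed cut-off accommodates the RTO."""
    for max_rto, strategy in STRATEGY_BY_MAX_RTO:
        if rto <= max_rto:
            return strategy
    # Anything more relaxed than the last cut-off falls back to a cold site.
    return "cold site"
```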
As organizations review the options for a given application as well as the impact of cost and budget, it is common for the IT DRP to be updated and changed over time.
5. Plan, test, repeat
Once you’ve developed an IT DRP, we highly recommend testing it out. Simulate a breach or a natural disaster to ensure your plan has no gaps and your team is ready to address any issue. The secret to a strong IT DRP lies in regular reviews and updates, especially when hiring new people, connecting with new suppliers, or expanding to new locations. Ensure that essential data and contact details are always up-to-date.