The Basic Principles of Disaster Recovery Planning
Data disaster comes in many forms, from hurricanes to ransomware attacks, and security is always a major issue. But no matter the source of the problem, recovery will be faster, easier, cheaper and less traumatic with a solid disaster recovery plan in place.
So will your existing plan be effective in the event of disaster? If it is poorly planned, is not updated to cover ongoing changes, or is not regularly and rigorously tested, the answer is no.
These basic steps will get you to yes.
Use it or lose it.
The best plan ever devised is useless if it sits gathering dust on a shelf. Once you have a good plan, revisit it on a regular schedule and keep improving it so it is ready to implement when the time comes.
Calculate how much downtime you can handle.
Effective planning begins with this baseline metric. To determine how long you can afford to be down, you must identify how much it costs for key elements to be idle over a period of time. Tolerable downtime can range from seconds to days, depending on the size and type of the business. That is why disaster recovery is different for every business.
Establish a recovery point objective (RPO) and a recovery time objective (RTO) for each application. Divide your applications into tiers, with Tier 1 being those that should be restored as rapidly as possible. Tier 2 apps are important but not as urgent, and can be put on hold for some hours. Tier 3 apps can wait for a few days if necessary. Over time, some apps will become more or less important, so revisit the tier assignments periodically.
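As a rough illustration of tiering, the sketch below assigns each application a tier based on its RTO. The application names, the RTO and RPO figures, and the tier cutoffs (one hour and one day) are hypothetical assumptions for the example, not recommendations; substitute values that fit your business.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class App:
    name: str
    rto: timedelta  # longest tolerable time to restore service
    rpo: timedelta  # longest tolerable window of lost data


def assign_tier(app: App) -> int:
    """Map an app to a tier using assumed cutoffs of 1 hour and 1 day."""
    if app.rto <= timedelta(hours=1):
        return 1  # restore as rapidly as possible
    if app.rto <= timedelta(days=1):
        return 2  # important, but can be put on hold for some hours
    return 3      # can wait a few days if necessary


# Hypothetical applications and objectives, for illustration only.
apps = [
    App("order-processing", rto=timedelta(minutes=30), rpo=timedelta(minutes=5)),
    App("internal-wiki", rto=timedelta(hours=8), rpo=timedelta(hours=4)),
    App("archive-reporting", rto=timedelta(days=3), rpo=timedelta(days=1)),
]

tiers = {app.name: assign_tier(app) for app in apps}
for name, tier in tiers.items():
    print(f"{name}: Tier {tier}")
```

Keeping the objectives in one structure like this makes it easier to re-run the assignment whenever an application's importance changes.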
While systems are down, the cost meter keeps running, and these must all be part of your calculations:
- Ordinary business costs – From salaries to physical plant and utilities to subscriptions and dues to travel expenses, money keeps going out at the same speed, even if the income machine is frozen.
- Loss of data – A disaster can corrupt or destroy data, causing widespread and often unanticipated damage to present and future operations. Worse yet, it can harm your customers’ data.
- Reputation suffers – If operations stay down for long, public perception can be at risk. If downtime persists, you will start to lose customers.
- Recovery has its price – Getting back to normal can be expensive, but a good recovery plan keeps costs down.
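The cost factors above can be turned into a back-of-the-envelope estimate. The sketch below is one simple way to do it, assuming costs accrue linearly per hour; the function name, parameters, and all dollar figures are hypothetical. Data loss and reputational harm are left out because they are hard to express as an hourly rate, so treat the result as a lower bound.

```python
def downtime_cost(hours_down, fixed_cost_per_hour, lost_revenue_per_hour, recovery_cost):
    """Estimate outage cost: hourly running costs plus a one-time recovery cost.

    Assumes costs accrue linearly; real outages may not behave this way.
    """
    running = hours_down * (fixed_cost_per_hour + lost_revenue_per_hour)
    return running + recovery_cost


# Hypothetical figures, for illustration only.
estimate = downtime_cost(
    hours_down=4,
    fixed_cost_per_hour=2_000,    # salaries, utilities, subscriptions, etc.
    lost_revenue_per_hour=5_000,  # income that stops while systems are down
    recovery_cost=10_000,         # one-time cost of getting back to normal
)
print(f"Estimated cost of a 4-hour outage: ${estimate:,}")
```

Running the same calculation for several outage lengths shows how quickly costs climb, which helps justify the RTO you choose.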
Inventory your systems and applications.
Include every hardware and software item, because any or all of them could be involved in a disaster. Put the inventory into a regularly updated spreadsheet or table and make it widely available at all times.
Along with your inventory records, all vendor-supplied product data (specifications, manuals, contracts, etc.) should be kept on file, and employees should become familiar with these documents. Maintain relationships with vendors so that if and when you need their help, they will know you and your system and you will know them.
Also include articles, commentaries and reviews regarding your hardware and software products.
Review the inventory every quarter for completeness and accuracy.
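If the inventory is kept as structured data rather than free text, the quarterly review can be partly automated. The sketch below flags entries that have not been reviewed recently. The field names, the sample items, and the 90-day interval standing in for "every quarter" are all assumptions made for this example.

```python
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=90)  # assumed stand-in for "every quarter"


def stale_items(inventory, today):
    """Return names of items whose last review is older than the interval."""
    return [
        item["name"]
        for item in inventory
        if today - item["last_reviewed"] > REVIEW_INTERVAL
    ]


# Hypothetical inventory rows, for illustration only.
inventory = [
    {"name": "db-server-01", "kind": "hardware", "last_reviewed": date(2024, 1, 10)},
    {"name": "payroll-app", "kind": "software", "last_reviewed": date(2024, 5, 2)},
]

overdue = stale_items(inventory, today=date(2024, 6, 1))
print("Overdue for review:", overdue)
```

Passing `today` in explicitly, rather than reading the system clock inside the function, keeps the check easy to test.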
Understand your data.
Be aware that data comes in many categories. Factors include:
- The importance of the data to business operations and service to your customers
- The ease or difficulty of replacing specific data
- The sensitivity and confidentiality of the data
During the planning process, take care to ensure the safest and most careful handling of sensitive data.
Assign everyone a role in disaster recovery.
This conveys the importance and nature of the process while promoting fast and thorough recovery. Decide who does what, who steps in if backup is needed, and who takes up the slack if key contributors are absent.
Devise a process for deciding if a disaster is taking place and make specific individuals responsible for the decision. Distribute responsibilities as if you were organizing a new company, with appropriate hierarchies and accountabilities.
Include key vendors in the plan and keep them in the loop. Set a procedure for notifying all concerned parties as soon as a disaster strikes.
Schedule recovery drills. Even a minor file recovery test can reveal weaknesses to remediate before a disaster occurs.
Make communication a priority at every level.
Every employee needs to be informed as soon as a disaster is detected, and a formal communications system – in both electronic and paper form – should be at the ready. Keep vendors and customers informed as well. With internal email and telephone services at risk, alternate communications should be employed, and everyone needs to know when and how to switch to safe platforms.
As soon as a disaster occurs, issue a corporate statement on company web sites and social media. Most of this statement can be prepared in advance, ready for specifics to be filled in. Be truthful about the problem and how it will be solved. Resist the impulse to cover up or minimize the event, and do not promise a better outcome than you are likely to deliver. Release updates of your progress, and if events undermine an announced recovery response, get the news out ahead of the grapevine.
Check your SLAs.
Make sure that service level agreements with vendors include disaster contingencies. If the companies are in any way involved in a disaster, they should be contractually obligated to begin working on problems within a specified time. In some cases, it is possible to reach a binding agreement regarding how long it would take to resolve problems.
Plan your escape.
When disaster strikes, you may need to move operations to a safe site, with the right kind of work space in addition to the necessary systems, equipment and communications. People must be able to get to the place, walk in, sit down and get to work quickly. Everyone should know what to do and how to do it. Training and drills reduce surprise and confusion.
In the event of an area-wide catastrophe such as an extreme weather event, people should be able to telework productively enough to discharge their most important duties.
Test thoroughly and often.
When disaster strikes, the best recovery plan can be bedeviled by small failures (an absent employee, an Internet glitch, an unfamiliar new app). Quarterly testing is your best safeguard. Never forget that your testing processes must support your RTO and RPO goals.
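One way to tie testing back to your objectives is to record what each drill actually achieved and compare it with the RTO and RPO. The sketch below shows that comparison; the function name and all the measurements are hypothetical.

```python
from datetime import timedelta


def drill_meets_objectives(restore_time, data_loss_window, rto, rpo):
    """True only if the drill restored service within the RTO
    and lost no more data than the RPO allows."""
    return restore_time <= rto and data_loss_window <= rpo


# Hypothetical drill results and objectives, for illustration only.
passed = drill_meets_objectives(
    restore_time=timedelta(hours=3),          # measured during the drill
    data_loss_window=timedelta(minutes=20),   # age of the newest usable backup
    rto=timedelta(hours=2),
    rpo=timedelta(minutes=30),
)
print("Drill met objectives" if passed else "Drill missed an objective")
```

In this made-up case the data loss is within the RPO, but the restore took longer than the RTO, so the drill exposes a gap to fix before a real disaster.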
To keep their emergency response performance sharp, everyone who interacts with the system should be tested from time to time.
Remember that failing a test is not bad, since it prompts improvement and reduces risk. Failing to test is what does the damage.