The recovery of data and critical applications, from granular recovery of a single email through full site restoration, is still a risky proposition for most organizations. All too often, data recovery, failover, and failback don’t perform as expected, especially during an outage, when they are needed most. There are several notable reasons for this:
- Recovery processes typically involve several manual steps that are time-consuming and prone to human error.
- Data center configurations “drift” over time, in both the physical and virtual layers.
- Recovery often simply fails to execute properly, forcing IT staff into a “fire-fighting” troubleshooting mode during an outage, which is a worst-case scenario.
- Previous recovery testing is too limited to identify true failover requirements.
The costs of recovery from failures can be staggering in terms of business service downtime, lost revenue, and damaged reputation. Research from DCIG reported that businesses lose an average of about $5,000 per minute in an outage. When it comes to service recovery, speed matters.
To significantly improve disaster recovery (DR) preparedness, companies should consider the five dimensions of disaster recovery described below.
Steps to Take to Avoid Costly Outages
There are 5 critical dimensions to the disaster recovery (DR) problem that must be considered and addressed.
Dimension 1: Build a DR plan that accounts for everything you need to recover
There is a wide continuum of interrelated items that IT professionals must recover, as highlighted in the diagram below. Companies need to protect their data, files, folders, emails, and so on. They also need to recover applications, business services, or even their entire site in the event of an outage or disaster. After all, what good are your applications if you don’t have your data? What good is your data if you don’t have access to your applications or business services? The most accurate way to build your DR plan is through testing. Conducting a failover test quickly identifies an application’s dependencies on data, network, and other applications, and results in a usable, proven DR plan.
Dimension 2: Defining recovery time objectives (RTO)
RTOs define how quickly you need to recover, that is, the amount of downtime that is tolerable in the event of an outage or disaster. RTOs are not “one-objective-fits-all” and therefore must be defined granularly, based on factors such as business criticality. Even though every department in an organization will say its application is critical, most organizations can compile a list of applications and rank their importance, usually by attaching the revenue contribution of each application. For example, for an online business, the ecommerce application is critical and would most likely have the lowest RTO requirement. Because ecommerce runs around the clock, dividing the company’s annual revenue by the number of hours in a year (24 hours a day, 365 days a year) gives the average revenue lost for each hour this application is down. That figure helps justify setting the RTO for the online application as low as possible.
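The revenue arithmetic described above can be sketched in a few lines of Python. The revenue figure and the function name are hypothetical, chosen only for illustration, and the sketch assumes revenue is spread evenly across the year, which is a simplification for most businesses.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, as described in the text


def revenue_loss_per_hour(annual_revenue: float) -> float:
    """Average revenue at risk for each hour the application is down."""
    return annual_revenue / HOURS_PER_YEAR


# Hypothetical online business with $87.6 million in annual revenue.
hourly = revenue_loss_per_hour(87_600_000)
print(f"${hourly:,.0f} per hour, ${hourly / 60:,.0f} per minute")
# prints: $10,000 per hour, $167 per minute
```

Running the same calculation for each application on the ranked list gives a rough, comparable number to attach to each RTO decision.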
Dimension 3: Defining recovery point objectives (RPO)
RPOs describe the amount of data you’re willing to risk losing in an outage or disaster. For example, with a tape backup taken once per day, the RPO is up to 24 hours. Like RTOs, RPOs must be defined granularly. Usually, the critical applications identified as needing the lowest RTO are the same ones that need the lowest RPO. The ecommerce application can afford to lose very little data, so its data must be backed up as frequently as possible.
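The relationship between backup frequency and RPO can be sketched as follows. This is a minimal illustration that assumes evenly spaced backups; the function names and the 15-minute RPO are hypothetical, not values from the article.

```python
import math
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """A failure just before the next backup loses a full interval of data."""
    return backup_interval


def backups_per_day_for(rpo: timedelta) -> int:
    """Minimum number of evenly spaced backups per day to stay within the RPO."""
    return math.ceil(timedelta(days=1) / rpo)


# Daily tape backup, as in the text: up to 24 hours of data at risk.
print(worst_case_data_loss(timedelta(hours=24)))   # 1 day, 0:00:00

# Hypothetical 15-minute RPO for the ecommerce application.
print(backups_per_day_for(timedelta(minutes=15)))  # 96
```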
Dimension 4: Testing to validate that you actually will recover within RTO/RPO
In a recent study published by www.drbenchmark.org, 50% of the respondents to its online survey test their DR plans only once or twice a year. Believe it or not, 13% never test their DR plans. When companies do test their DR plans, 70% do not pass their own tests. And very few of those who do test have the ability to validate that they can recover within SLAs (RTOs and RPOs). How comfortable would your CEO feel if you couldn’t validate at any time that you can actually recover within RTOs and RPOs in the event of an outage or disaster? It is also important to note that the documentation associated with these tests is often what auditors require to prove regulatory compliance.
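As a rough illustration of what validating recovery against RTOs and RPOs might look like, the sketch below compares measured test results with each application’s objectives. The `RecoveryTest` class, the application names, and all of the timings are hypothetical; a real test harness would collect the measurements from actual failover runs.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryTest:
    application: str
    rto: timedelta                      # tolerable downtime
    rpo: timedelta                      # tolerable data loss
    measured_recovery_time: timedelta   # observed during the DR test
    measured_data_loss: timedelta       # observed during the DR test

    def passed(self) -> bool:
        """A test passes only if both objectives are met."""
        return (self.measured_recovery_time <= self.rto
                and self.measured_data_loss <= self.rpo)


# Hypothetical results from a single DR test run.
tests = [
    RecoveryTest("ecommerce", timedelta(minutes=30), timedelta(minutes=15),
                 timedelta(minutes=50), timedelta(minutes=10)),
    RecoveryTest("email", timedelta(hours=4), timedelta(hours=24),
                 timedelta(hours=2), timedelta(hours=6)),
]

failures = [t.application for t in tests if not t.passed()]
print(failures)  # ['ecommerce']
```

A dated record of results like these is also the kind of evidence that can support the audit documentation mentioned above.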
Dimension 5: How much you’re willing to spend on DR
The cost of disaster recovery varies tremendously depending on the solution deployed. Most business cases tally the potential revenue loss resulting from an outage of critical applications to justify the DR budget. The current industry average for DR solutions is 2 to 8% of an IT budget. In addition to the business costs implied by the chosen RTOs and RPOs, the cost of equipment and required services must be factored in.
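One way to frame the business case is to compare DR spending with the cost of downtime it is meant to avoid. The sketch below uses the 2 to 8% range and the $5,000-per-minute outage figure cited in this article; the $5 million IT budget and the function names are hypothetical.

```python
COST_PER_MINUTE = 5_000  # average outage cost cited earlier in the article (DCIG)


def dr_budget_range(it_budget: float) -> tuple[float, float]:
    """Return the 2% and 8% bounds of the IT budget."""
    return it_budget * 2 / 100, it_budget * 8 / 100


def break_even_minutes(dr_spend: float,
                       cost_per_minute: float = COST_PER_MINUTE) -> float:
    """Minutes of avoided downtime needed to offset the DR spend."""
    return dr_spend / cost_per_minute


low, high = dr_budget_range(5_000_000)  # hypothetical $5 million IT budget
print(f"DR budget: ${low:,.0f} to ${high:,.0f}")
print(f"Break-even: {break_even_minutes(low):.0f} to "
      f"{break_even_minutes(high):.0f} minutes of avoided downtime")
```

Under these assumptions, the DR budget pays for itself after roughly 20 to 80 minutes of avoided downtime; an organization’s own per-minute cost would replace the cited average.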
About the author:
Steve Kahan serves as chief marketing officer for PHD Virtual, a pioneer in virtual backup and innovator of disaster recovery assurance solutions. He has more than 25 years of experience building high energy, high commitment organizations that produce breakthrough revenue growth. Prior to PHD Virtual, Kahan was Senior Vice President of Global Marketing for Quest Software.
PHD Virtual offers a robust yet affordable solution called Recovery Management Suite that provides data, application, and site-wide recovery and enables IT professionals to verify recoverability against granular RTOs and RPOs. Recovery Management Suite automates labor-intensive manual recovery processes and constantly verifies that recovery occurs within acceptable SLAs.