Replication Strategies for Disaster Recovery: Storage-centric or Database-integrated?

Setting up a replication configuration is a fairly standard way to enable disaster recovery (DR) for business-critical databases. In such a configuration, changes from a production or primary system are propagated to a standby or secondary system. One of the important technology decisions that organizations make upfront is the choice of the replication architecture.

Two popular ways of implementing this architecture are:

  • Using technology provided by the underlying storage array (referred to as storage mirroring).
  • Using technology integrated with the database.

The latter technologies are often referred to as host mirroring because they are done using server or host processes. For the purpose of this article, we will look into the pros and cons of using storage mirroring versus database-integrated replication technologies for DR.

The main advantage of using a storage mirroring technology is that it supports all application data resident in the corresponding storage systems. If a large company has its business-critical data scattered over several heterogeneous databases from various vendors, as well as in file systems, it can employ a common replication solution at the storage level. Any change to any application data will be mirrored to the secondary storage array using a single technology.

While this may give an apparent sense of simplicity, the devil is in the details. Let's outline some of the technical issues.

Data Corruption

A significant drawback of storage mirroring is that it may not protect against data corruption. When corruption strikes, it is difficult to debug and the impact is often disastrous. Storage mirroring copies bytes without interpreting them, so there is a strong likelihood that corrupted bits on the primary storage system will be propagated to the secondary storage system, rendering the DR system useless. In contrast, because database-integrated replication technologies understand the database block structure, they can perform physical and logical consistency checks on each block and so isolate the secondary database from production-side corruption.
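The difference can be illustrated with a minimal sketch. It is not any vendor's implementation: the `Block` structure, the function names, and the use of CRC32 are assumptions made for illustration, standing in for the richer physical and logical checks a real database would perform. The byte-level copy passes a damaged block through untouched, while the validating path refuses to apply it.

```python
import zlib
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Block:
    """Hypothetical database block: payload plus the checksum stored with it."""
    block_id: int
    payload: bytes
    checksum: int  # CRC32 computed when the block was originally written


def make_block(block_id: int, payload: bytes) -> Block:
    return Block(block_id, payload, zlib.crc32(payload))


def flip_first_bit(block: Block) -> Block:
    """Simulate corruption: damage the payload, leave the stored checksum alone."""
    damaged = bytes([block.payload[0] ^ 0x01]) + block.payload[1:]
    return Block(block.block_id, damaged, block.checksum)


def is_valid(block: Block) -> bool:
    """Recompute the checksum and compare it with the stored one."""
    return zlib.crc32(block.payload) == block.checksum


def storage_mirror(blocks: Iterable[Block]) -> List[Block]:
    """Byte-level copy: everything is propagated, corrupt or not."""
    return list(blocks)


def validated_apply(blocks: Iterable[Block]) -> Tuple[List[Block], List[Block]]:
    """Structure-aware apply: only blocks that pass validation reach the standby."""
    applied: List[Block] = []
    rejected: List[Block] = []
    for block in blocks:
        (applied if is_valid(block) else rejected).append(block)
    return applied, rejected


if __name__ == "__main__":
    primary = [
        make_block(1, b"order 1001"),
        flip_first_bit(make_block(2, b"order 1002")),  # corrupted on the primary
    ]
    print(len(storage_mirror(primary)))  # 2: the corrupt block is copied too
    applied, rejected = validated_apply(primary)
    print([b.block_id for b in applied], [b.block_id for b in rejected])  # [1] [2]
```

In this sketch the rejected block would be reported and re-fetched or repaired rather than silently applied; how that is handled in practice depends on the database product.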

Bandwidth Consumption

Another consideration is network bandwidth usage. Storage mirroring technologies propagate every write generated at the production system, in write order, so that the secondary database comes up in a consistent state at failover time. In an OLTP environment, a single transaction can cause writes to the redo logs, archived logs, data files, control file, specialized on-disk backup logs, and so on. Because each of those writes is mirrored, storage mirroring can consume far more bandwidth than the transaction's logical change requires. In contrast, database-integrated technologies do not send every production-side write to the target system. They send the minimal set of changes necessary to keep the secondary database synchronized with production and reconstitute the data at the secondary system from those changes, which minimizes bandwidth use and keeps network costs low.
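A back-of-the-envelope comparison makes the point concrete. The sketch below is illustrative only: the per-transaction byte counts are invented numbers, not measurements, and it assumes that the database-integrated approach ships only the redo stream, which is one common design but not the only one.

```python
from typing import Iterable, Mapping


def mirrored_bytes(writes: Mapping[str, int]) -> int:
    """Storage mirroring: every write to every file type crosses the network."""
    return sum(writes.values())


def shipped_bytes(writes: Mapping[str, int],
                  shipped: Iterable[str] = ("redo_log",)) -> int:
    """Database-integrated replication (assumed here to ship only the redo stream)."""
    return sum(writes[name] for name in shipped if name in writes)


if __name__ == "__main__":
    # Hypothetical bytes written per transaction, by destination.
    per_txn = {
        "redo_log": 2000,
        "archived_log": 2000,
        "data_files": 8192,
        "control_file": 512,
    }
    print(mirrored_bytes(per_txn))  # 12704
    print(shipped_bytes(per_txn))   # 2000
```

Substituting write volumes measured on a real system into `per_txn` gives a rough first estimate of the link capacity each approach would need.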

Recovery Time Objective

Another factor to consider is how the choice of technology affects the recovery time objective (RTO). A failover operation at the storage level requires splitting the mirror, mounting the database on the secondary storage volumes, starting up the database, performing crash recovery, and then opening the database for read/write access. Additionally, because the secondary data is not validated in real time, the database start operation may fail because of corrupted or inconsistent data. In contrast, database-integrated technologies offer intelligent middleware interfaces through which both database-level and application-level failover can be completed automatically and within seconds. In essence, this approach gives organizations a combined high availability (HA) and DR solution.
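The storage-level failover described above is a strictly sequential runbook, and a failure at any step stalls everything after it. The following sketch models that behavior; the step names follow the text, while the `run_failover` helper and the stubbed step functions are hypothetical and do not call any real storage or database tooling.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_failover(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; stop at the first one that reports failure."""
    completed: List[str] = []
    for name, action in steps:
        if not action():
            return False, completed
        completed.append(name)
    return True, completed


if __name__ == "__main__":
    # Stubs: the start-up step fails, as it might if unvalidated mirrored
    # data turned out to be corrupt or inconsistent.
    steps: List[Step] = [
        ("split mirror", lambda: True),
        ("mount secondary volumes", lambda: True),
        ("start database", lambda: False),
        ("crash recovery", lambda: True),
        ("open read/write", lambda: True),
    ]
    ok, done = run_failover(steps)
    print(ok)    # False
    print(done)  # ['split mirror', 'mount secondary volumes']
```

Because the problem only surfaces at failover time, the time spent diagnosing it counts directly against the RTO.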

This particular factor leads to the next point. Depending upon the choice of the database-integrated replication mechanism, the secondary database may be available for real-time application access while changes are being applied to it from the production database. This delivers three significant benefits:

  • Increased Return on Investment: the DR system employed is being used productively to support active-active deployment, or specialized uses such as real-time reporting, Quality Assurance, development, and backups;
  • Increased Performance: end users can improve production system performance by offloading reporting and/or backup tasks to the standby system, resulting in increased CPU horsepower available for production applications and improved application throughput; and,
  • Less Risk: the production-ready status of the secondary database can be continuously validated since the secondary database can be open at all times (e.g., for real-time reports, queries) without compromising data protection. This is not possible with storage mirroring.

Business Implications

A final consideration for organizations choosing between storage mirroring and host mirroring architectures for their DR systems is cost. Storage mirroring technology is often cost-prohibitive because it can require an identical storage subsystem at the secondary site or, in some cases, because the mirroring feature itself is an extra-cost option. Deploying an identical subsystem also may not be possible because of technology evolution or for reasons related to mergers and acquisitions.

On the other hand, database-integrated replication technologies are storage-subsystem agnostic, so organizations may choose a more cost-effective storage array for the DR system. Database-integrated technology is also more likely to be available at no extra cost, depending on the specific database edition. The license terms of the standby database, whether it is used in a storage mirroring configuration or a database-integrated replication configuration, will vary depending upon the database vendor.

In summary, while replication is a common way to ensure geographic disaster recovery, there are various options available for the replication architecture. Organizations should make the final choice by considering the various technical and business issues, and weighing the relative importance of these issues with respect to their short-term and long-term requirements.