You would be hard-pressed not to hear an IT department talk about virtualization. When virtualization first arrived, IT departments went gangbusters using this revolutionary change to get better performance out of their servers. In all the excitement of implementation, something far from trivial was overlooked: backup and recovery. The lack of proper planning caused backup jobs and recoveries to fail, and backup admins started feeling backed into a corner. They felt stuck with great new virtualization technology that could not be backed up properly.
Thankfully, times have changed. IT departments, now well aware of these issues, have gotten savvy at avoiding the potential pains of virtualization infrastructure. But a new challenge has emerged. They know what to avoid, but how to avoid it comes with a smorgasbord of choices. Many vendors claim to help but fall short of the mark. The following is an outline of the architectural choices, with some pros and cons of each, so you can make better virtualization backup decisions.
Standard Backup Applications to Deduplicated Disk
This solution keeps the existing legacy backup application, puts deduplicated disk in place as the backup target, and either replicates the backups to a remote site or exports them to tape for off-site storage. It is the crudest, most brute-force answer, but also the most convenient, requiring minimal change from IT administrators, and from backup administrators in particular.
The immediate impact is improved speed and reliability from replacing tape with disk. It's important to avoid using plain, non-deduplicated disk, however, as that is only half the solution: without some form of deduplication, retention times are insufficient and timely replication to a second site cannot be achieved. With deduplication, once the first full backup and synchronization is done, only the incremental block changes of the data are sent over the wire. If replication is not possible or not desired, the other option is exporting copies to tape and sending them to a remote storage site.
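The mechanism behind "only the changed blocks go over the wire" can be sketched in a few lines. This is a minimal, illustrative model of content-hash deduplication, not any vendor's implementation; the class and method names are invented for the example, and real systems use block sizes in the 4 KB to 128 KB range rather than the tiny blocks here.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size for illustration only

def split_blocks(data: bytes, size: int = BLOCK_SIZE):
    """Split a byte stream into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]

class DedupTarget:
    """Toy dedup target: stores each unique block once, keyed by content hash."""

    def __init__(self):
        self.store = {}          # hash -> block bytes
        self.bytes_received = 0  # data actually transferred "over the wire"

    def backup(self, data: bytes) -> list:
        """Back up a stream, transferring only blocks not already stored.

        Returns the recipe (ordered list of hashes) needed to restore it."""
        recipe = []
        for block in split_blocks(data):
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.store:        # new block: send and store it
                self.store[digest] = block
                self.bytes_received += len(block)
            recipe.append(digest)               # known block: reference only
        return recipe

    def restore(self, recipe: list) -> bytes:
        """Reassemble a backed-up stream from its recipe."""
        return b"".join(self.store[h] for h in recipe)

target = DedupTarget()
first = target.backup(b"AAAABBBBCCCC")   # full baseline: all 12 bytes sent
second = target.backup(b"AAAABBBBDDDD")  # only the 4 changed bytes sent
```

After the first backup the target has received 12 bytes; the second backup of mostly unchanged data adds only 4 more, which is exactly why replication bandwidth drops so sharply once the initial synchronization completes.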
The appeal of this solution is its ease of deployment. With little change to the current backup application, deduplicated disk can quickly take the place of tape or aging disk targets. It can shorten backup windows and reduce the performance hit on applications in a virtualized environment, but it does not solve those problems completely, leaving more to be desired. Proponents of these solutions often undersize them to keep costs down, so buyer beware. The bottom line: these solutions cost about the same as other virtualization backup options, but with less functionality.
Snapshots of Data on Primary Disk
Instead of moving copies off the primary disk in the first pass, this solution uses array functionality to take a snapshot of the data. Snapshots are quite powerful: they have very low impact on production servers at backup time, and they allow quick restores and easy management. How long snapshots can be retained should be a real concern here, as it directly affects how useful this solution will be in your environment. Array performance for production applications is also affected, because copy-on-write snapshots are inefficient and only allow for short retention times.
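The copy-on-write overhead mentioned above can be made concrete with a toy model. This is an illustrative sketch, not how any particular array implements snapshots: taking a snapshot is essentially free, but the first overwrite of each block after a snapshot costs an extra copy into the snapshot area, and that tax is paid during production writes.

```python
class CopyOnWriteVolume:
    """Toy copy-on-write volume: snapshots are instant, but each first
    overwrite of a block after a snapshot triggers an extra copy I/O."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # live data, one entry per block
        self.snapshots = []         # each snapshot: {block_index: old_data}
        self.copy_ops = 0           # extra I/O operations caused by snapshots

    def snapshot(self):
        """Taking a snapshot copies no data; it just opens an empty map."""
        self.snapshots.append({})

    def write(self, index, data):
        """Before overwriting, preserve the old block for every snapshot
        that has not yet seen a write to this block (the copy-on-write tax)."""
        for snap in self.snapshots:
            if index not in snap:
                snap[index] = self.blocks[index]  # extra read + write
                self.copy_ops += 1
        self.blocks[index] = data

    def read_snapshot(self, snap_id, index):
        """Snapshot reads fall back to live data for untouched blocks."""
        return self.snapshots[snap_id].get(index, self.blocks[index])

vol = CopyOnWriteVolume(["a", "b", "c"])
vol.snapshot()
vol.write(0, "A")  # first overwrite since the snapshot: one extra copy
vol.write(0, "X")  # same block again: no additional copy needed
```

In this simplified model, every retained snapshot that has not yet seen a given block adds to the write cost, which is the intuition behind keeping copy-on-write retention short on a production array.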
Snapshots should not be considered a total replacement of traditional backup, but instead an enhancement. Once a primary disk snapshot is taken, the data should be copied off to another independent disk array. Without this additional copy, snapshot backup solutions do not meet necessary requirements to protect data. Admins may be forced to run two backup schemes to adequately protect the data (if the solution does not already have this functionality built in). This is a vital factor in evaluating this type of solution, so don’t overlook it.
The largest benefit of snapshot-based backup schemes is their low impact and ease of use with the right array and backup application. Implementation details drive the cost, making this approach competitive in some instances and not in others. Designed efficiently, it should be cost-competitive while offering more functionality and less impact on production applications in a virtualized environment.
LAN-Free VMware Integration Backups
"Agentless backup" is an inaccurate and misleading, though often-used, term for LAN-free backup solutions. These solutions integrate with VMware (or other hypervisors) to mount and back up snapshot copies of VMs directly, sparing the hypervisor host that runs the production VMs from having to process the backup. Application integration at the VM level, and the ability to make a secondary copy through either replication or export, should be critical considerations here. Without a backup agent correctly installed on the VM alongside the application, the application backup may be corrupted and unusable: an application must be quiesced and its cache flushed to get a confirmed backup. Agentless, therefore, is not the correct name for this type of backup, and any virtualization backup solution that claims completely agentless backups, including for major applications, deserves critical examination.
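The quiesce-then-snapshot ordering that an in-guest agent enforces can be sketched as follows. The hook names here (`flush_cache`, `freeze_io`, `thaw_io`) are hypothetical stand-ins for whatever the real agent calls (on Windows, for instance, agents typically coordinate through VSS); the point is the ordering and the guarantee that I/O resumes even if the snapshot fails.

```python
from contextlib import contextmanager

class AppAgent:
    """Hypothetical in-guest agent; real agents call application/OS
    quiesce APIs. The method names here are illustrative only."""

    def __init__(self):
        self.log = []  # records the order of operations

    def flush_cache(self):
        self.log.append("flush")   # push dirty in-memory data to disk

    def freeze_io(self):
        self.log.append("freeze")  # hold new writes so on-disk state is consistent

    def thaw_io(self):
        self.log.append("thaw")    # resume normal writes

@contextmanager
def quiesced(agent):
    """Hold the application in a consistent state for the snapshot's duration."""
    agent.flush_cache()
    agent.freeze_io()
    try:
        yield
    finally:
        agent.thaw_io()  # always resume I/O, even if the snapshot fails

def take_vm_snapshot(agent):
    with quiesced(agent):
        agent.log.append("snapshot")  # hypervisor snapshot happens here
```

A hypervisor-level snapshot taken without the flush/freeze steps captures whatever happens to be on disk mid-write, which is exactly why a truly agentless backup of a cache-heavy application cannot be confirmed as consistent.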
Unfortunately, it is difficult to use this solution alone, as most environments have some physical servers that require backup. It is an adequate solution for an enterprise that is 100-percent virtualized, but those are rare. If the backup application can combine this type of functionality with one of the other solutions outlined here, there is a place for this type of backup in the total solution.
Block-Based Backup to External Disk
This is the most exciting solution to me — it’s as close to complete as there is. The backup environment is created by using a block-based backup solution that establishes a baseline copy of the data and subsequent block updates. No duplicate blocks of the data ever leave the clients. This can be in the form of source-side deduplication schemes or array-based functionality. The array solutions should not be confused with simple array snapshots. An array-based functionality example would be NetApp’s SnapVault functionality. Other vendors have their own versions of this solution.
The source-side deduplication solutions can usually span heterogeneous array types and may be cheaper than array-based versions, but do not offer the same backup or restore times. The benefits of this type of solution are obvious: faster backups, faster restores, longer retention times, easier administration and a better total cost of ownership.
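The baseline-plus-block-updates flow described above can be sketched with a toy changed-block tracker. This is an illustrative model under invented names, not NetApp's SnapVault or any vendor's product: the volume remembers which blocks were written since the last backup, so the target receives one full baseline and then only deltas.

```python
class TrackedVolume:
    """Toy volume with changed-block tracking: the backup target receives a
    full baseline once, then only the blocks written since the last backup."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.dirty = set(range(len(blocks)))  # every block is "new" at first

    def write(self, index, data):
        self.blocks[index] = data
        self.dirty.add(index)                 # remember what changed

    def backup_to(self, target: dict) -> int:
        """Send only dirty blocks; return how many went over the wire."""
        sent = 0
        for index in sorted(self.dirty):
            target[index] = self.blocks[index]
            sent += 1
        self.dirty.clear()                    # reset tracking after the backup
        return sent

vol = TrackedVolume(["a", "b", "c", "d"])
target = {}
vol.backup_to(target)  # baseline: all four blocks transferred
vol.write(2, "C")
vol.backup_to(target)  # incremental: only the one changed block
```

Because no unchanged block is ever re-read or re-sent, backup windows shrink to be proportional to the change rate rather than the data size, which is the core advantage this section describes.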
What’s the best backup for you? The answer is likely a combination of these solutions, tailored to your specific environment. Data centers are all built differently, so start with a thorough evaluation to assess your needs. Realistically, only a handful of backup applications can span heterogeneous primary storage environments and still offer the functionality of both LAN-free VM backups and array block-based backups. Start by looking for solutions that offer unique restore options, such as instant virtualization of servers and volumes, for significant recovery-point and recovery-time advantages. These technologies extend beyond the backup window and address the issues outlined here. While the array of choices can seem daunting, one thing is certain: in 15 years of experience with data management, the available options and the complexity of the issues have never been more abundant, or more interesting, than they are today.
About the author:
Jean-Paul Bergeaux is chief technology officer at SwishData Corporation.