Database - Abstracted

Oct 15, 2011

By Kevin Kline

I started a column series a couple of months ago about emerging but significantly disruptive technologies, with a post entitled, "2012 Might Really Be the End of the World as We Know It." I called out four disruptive technologies that will significantly change, if not outright overturn, the day-to-day work of database professionals. Those technologies are virtualization, cloud computing, solid state drives (SSD), and advanced multi-core CPUs.

To summarize my last two columns, SSDs enable databases to overcome the IO performance bottleneck we've faced, while new multi-core CPUs make precise and highly tuned SQL code less important. In other words, they enable IT departments to throw more hardware at a problem rather than turning to us, the highly trained database experts. Job security? Not so much.

I'm going to continue an analysis of these disruptive technologies in inverse order. Today, let's discuss virtualization and databases.

We've actually had virtualized databases and operating systems for eons. Both IBM and DEC VAX systems had long histories of virtualized memory and operating systems. Much later, Intel-based systems running Windows gained virtualization capabilities. But for many folks new to IT, especially those who started their careers after the turn of this century, virtualization seemed like an entirely new concept.

We've also had a limited form of virtualized databases since the early days of relational databases, called multi-instancing. Under multi-instancing, multiple database servers (such as Oracle or SQL Server) run side by side on one host server. While the OS is shared rather than duplicated, the database instances are completely separate copies, and each can process its own separate workload.

Adoption of Virtualization

Today, hypervisor-based virtualization takes the multi-instancing concept all the way down to the OS level. When virtualizing with a popular hypervisor such as VMware vSphere or Microsoft Hyper-V, a host system runs entirely separate copies of the OS and database (or other applications). Each copy can have its own hardware resources and configuration settings. For example, one might run under a Unicode collation, another under binary, and a third under the standard Latin alphabet collation.

Virtualization was slow on the adoption curve just a few years ago. Even where it was adopted, databases were the last applications to be considered for virtualization. But now I see a very large number of my customers adopting virtualization for database applications, probably about 40 percent of my clients. Recent releases of the various hypervisor products have low overhead, and, in all but the most extreme cases, the benefits and cost savings of virtualization compensate for the small CPU overhead penalty.

Disruption Caused by Virtualization

So if virtualization is such a good thing, why is it disruptive to a DBA? There are some immediate reasons, as well as some that are still a few months down the road.

New Planning Processes Needed

The first disruption to our day-to-day work is the necessity for virtualization to have a more rigorous planning process. Heck, some of my customers don't even have a planning process for server deployment. Their process is simply, "Got a new application? Get a new server!"

One of the key benefits of virtualization is that it provides cost savings through maximizing your hardware usage. But that means you have to know a lot about both the servers' potential and the demands of the applications running on those servers. You'll need both a good inventory of servers and their hardware, and a good monitoring system to tell how close to the edge the applications are pushing the servers. When new applications come along, you'll need a decently accurate estimate of each application's workload so you can assign it to a hypervisor with adequate capacity. Monitoring after deployment is equally important for making sure your estimates were good.
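To make the planning step concrete, here is a minimal sketch of matching estimated workloads to hosts. It is an illustration only, not a sizing tool: the host names, database names, workload figures, and the 20 percent headroom policy are all made-up assumptions, and the placement rule is a simple first-fit pass (largest memory estimate first) that I've swapped in for whatever method your shop actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A hypervisor host from your server inventory (all figures hypothetical)."""
    name: str
    cpu_cores: float
    memory_gb: float
    headroom: float = 0.2  # assumed policy: keep 20% of each resource free
    placed: list = field(default_factory=list)
    cpu_used: float = 0.0
    memory_used: float = 0.0

    def fits(self, cpu, memory_gb):
        """True if the workload fits without eating into the headroom."""
        cpu_limit = self.cpu_cores * (1 - self.headroom)
        memory_limit = self.memory_gb * (1 - self.headroom)
        return (self.cpu_used + cpu <= cpu_limit
                and self.memory_used + memory_gb <= memory_limit)

    def place(self, name, cpu, memory_gb):
        self.placed.append(name)
        self.cpu_used += cpu
        self.memory_used += memory_gb


def place_workloads(workloads, hosts):
    """First-fit placement, largest memory estimate first.

    workloads maps a name to (estimated peak CPU cores, estimated peak memory GB).
    Returns the names that did not fit on any host.
    """
    unplaced = []
    ordered = sorted(workloads.items(), key=lambda item: item[1][1], reverse=True)
    for name, (cpu, memory_gb) in ordered:
        for host in hosts:
            if host.fits(cpu, memory_gb):
                host.place(name, cpu, memory_gb)
                break
        else:
            unplaced.append(name)  # no host had room: time to buy or rebalance
    return unplaced


# Hypothetical inventory and workload estimates taken from monitoring data.
hosts = [Host("host-a", cpu_cores=8, memory_gb=64),
         Host("host-b", cpu_cores=8, memory_gb=64)]
workloads = {
    "erp_db": (4, 32),
    "crm_db": (2, 16),
    "reporting_db": (3, 24),
    "dw_db": (6, 48),
}
unplaced = place_workloads(workloads, hosts)
for host in hosts:
    print(host.name, host.placed)
print("unplaced:", unplaced)
```

With these invented numbers, one database is left without a home, which is exactly the kind of answer you want before deployment rather than after. The estimates going in still have to come from your own inventory and monitoring data.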

Troubleshooting Gets More Complex

Each added layer of abstraction means more things to decipher when troubleshooting. When your monitoring system emails you a CPU alarm (you do have an alarm system in place, right?), is it the hypervisor, the guest virtual machine (VM), or the database that's consuming the CPU? When an IO alarm goes off, do the VMs have fixed, pass-through disk drives, or is the storage virtual and dynamic too? Did you or your VM administrator over-allocate VMs onto the hypervisor, perhaps 10 VMs onto a system with only eight CPUs on the bare metal? What happens when all VMs start making CPU requests? All of these scenarios, and many more, are possible with VM-based SQL Server, making your life more complex.
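The over-allocation question comes down to simple arithmetic. The sketch below works through the 10-VMs-on-eight-CPUs example, assuming (my assumption, since it varies by deployment) that each VM is assigned a single virtual CPU. The function name is hypothetical.

```python
def vcpu_overcommit_ratio(vcpus_per_vm, physical_cpus):
    """Ratio of virtual CPUs promised to guests versus CPUs on the bare metal."""
    if physical_cpus <= 0:
        raise ValueError("physical_cpus must be positive")
    return sum(vcpus_per_vm) / physical_cpus


# Ten VMs, assumed to have one vCPU each, on a host with eight physical CPUs.
ratio = vcpu_overcommit_ratio([1] * 10, 8)
print(f"{ratio:.2f}")  # 1.25
```

Any ratio above 1.0 means the guests can collectively ask for more CPU than the host has, so when every VM gets busy at once, some of them wait. How much overcommitment is tolerable depends on the workloads, so treat the number as a prompt for investigation rather than a verdict.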

Ad Hoc Resource Allocation

Something that CIOs love about virtualization, but of which I'm not a fan, is ad hoc resource allocation. CIOs love the idea that if an application running on a VM needs more CPU, the VM administrator can simply assign more CPU to the VM. If the VM needs more IO, that can be added too. Same thing for memory. Why don't I like this? In a word - sloppiness.

One fact I always teach in my performance tuning classes is that bad SQL code can always overmatch your hardware, no matter how much hardware you have. Sure, you can always throw more hardware resources at a problem. But you'll always exhaust those resources if the underlying application code is sloppy. And believe me, the world is full of bad SQL code. The last thing I need, as a DBA, is one more reason for developers to get lazy or sloppy.

New Methods of Backup and Recovery

I'm constantly and unhappily surprised by the number of DBAs who don't know how to put a decent SQL Server backup and recovery system in place. Don't get me wrong - most DBAs put good backup systems in place, but only a minority of them currently institutionalize recovery testing as well. In my opinion, recovery testing is essential.

But that's a bit of an aside, because virtualization introduces new opportunities and methods of doing SQL Server backups, as well as new means for high availability (HA). VMware's vMotion, for example, enables you to seamlessly move a running VM from one physical host to another. That means that DBAs can now do OS and bare metal patching and upgrades with virtually no downtime. Great news for DBAs with tough SLAs to support! You can also easily duplicate VMs. That means you can rebuild an entire physical environment within a set of VMs or, by extension, clone one set of production VMs into another set of disposable VMs, say, for QA or development.

In Summary

All of this means that virtualization will require DBAs and other database professionals to get up to speed on virtualization technology features, at a minimum, or perhaps actually learn how to use and administer virtual machines. Don't we all have ample time for that?!?