I get the chance to talk to a lot of DBAs in my line of work, and that sometimes gets me thinking about the things that DBAs spend time on and what they feel are the most important aspects of their job. Too often, DBAs conflate frequency with importance. In other words, just because you do something a lot does not make it the most important thing that you do!
Too little emphasis overall is placed on the integrity and recoverability of the data, and too much is placed on performance. Yes, performance is probably the most visible aspect of database systems, at least from the perspective of the end user. But the end user’s underlying assumption is always that the data they access is accurate and, usually, up to date. What good does it do to quickly access the wrong data? Anybody can provide rapid access to the wrong data!
The real trick is to provide rapid access to the correct data. By this I mean that you had better get the integrity of the data right before you even start to worry about performance. Anybody can give a wrong answer quickly, but most of us would rather wait a little if it means getting the right answer, wouldn’t we? Database design, proper data types, referential integrity and other constraints, and so on need more emphasis in DBA training and in actual database implementations.
Taking this a step further, how sure are you (you, meaning the DBA) that every database under your care is recoverable to a useful point in time should an error or failure occur? Is a database backup job running in a timely manner for each and every database structure, so that recovery can be achieved within the agreed service level? What’s that? You don’t have time-to-recovery service levels for each database, table space, and/or table? You should. These are commonly known as recovery time objectives (RTOs), and they are every bit as important as your performance-based service-level agreements (SLAs).
RTOs need to be negotiated with the business users and should be expressed as the time allowed to recover should an error or problem occur. And no, the proper answer is not that every table must be recovered immediately. To ensure that a reasoned approach to RTOs is established, be sure to include the cost the business unit will incur to achieve each RTO. Lower cost means a longer time to recover; higher cost means a shorter time to recover.
Finally, when was the last time you tested the recoverability of your database(s) using the backups you’ve made? Or do you just assume that they are all there, working as planned, and will be available as soon as you need to recover? Failing to conduct periodic, planned tests of all your backup and recovery plans and their implementation is a surefire way to lose data when you need to recover in a hectic situation (after all, aren’t all recovery situations hectic?).
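Neither question (is every structure backed up in time, and has recovery actually been tested?) can be answered from memory across dozens of databases. A minimal sketch, assuming you can export an inventory with each object's last successful backup time and last successful test restore: the `DbObject` record, the `audit` function, the 90-day test interval, and the sample object names are all hypothetical illustrations, not part of any DBMS catalog or tool. It simply flags anything whose backup is older than its agreed window or whose recovery has not been tested recently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DbObject:
    # Hypothetical inventory record; populate it from your own backup catalog.
    name: str                               # database, table space, or table
    last_backup: Optional[datetime]         # last successful backup, None if never
    max_backup_age: timedelta               # backup window agreed to meet the service level
    last_recovery_test: Optional[datetime]  # last successful test restore, None if never


def audit(objects, now, test_interval=timedelta(days=90)):
    """Return (name, problem) pairs for objects that violate backup or test policy.

    The 90-day default test interval is an assumed policy, not a standard.
    """
    findings = []
    for obj in objects:
        # Check 1: is there a recent enough backup at all?
        if obj.last_backup is None:
            findings.append((obj.name, "no backup on record"))
        elif now - obj.last_backup > obj.max_backup_age:
            findings.append((obj.name, "backup older than agreed window"))
        # Check 2: has a restore from backup actually been tested recently?
        if obj.last_recovery_test is None:
            findings.append((obj.name, "recovery never tested"))
        elif now - obj.last_recovery_test > test_interval:
            findings.append((obj.name, "recovery test overdue"))
    return findings


if __name__ == "__main__":
    now = datetime(2024, 6, 1, 12, 0)
    inventory = [
        DbObject("ORDERS", now - timedelta(hours=2), timedelta(hours=24),
                 now - timedelta(days=30)),
        DbObject("PAYROLL_TS", now - timedelta(days=3), timedelta(hours=24),
                 None),
        DbObject("ARCHIVE", None, timedelta(days=7), now - timedelta(days=200)),
    ]
    for name, problem in audit(inventory, now):
        print(f"{name}: {problem}")
```

Run against the sample inventory, it reports PAYROLL_TS and ARCHIVE and says nothing about ORDERS. The two checks are deliberately independent: a fresh backup that has never been test-restored is still flagged, because an untested backup is only an assumption.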
So, DBAs, take a moment away from focusing solely on performance and spend some time on the integrity and recoverability of your databases. You’ll be glad you did.