Bad Database Standards Can Cause Performance Problems

Just about every company with a DBMS has that binder full of corporate and/or IT standards. You know what I’m talking about  that one over there in the corner with the cobwebs on it  the one that you only use when you need an excuse to avoid work. Okay, well, maybe it’s not quite that bad. Your standards documents could be on the company intranet or some other online mechanism (but chances are there will be virtual cobwebs on your online standards manuals, too).

How do standards become standards? Well, it is like some wise sage said several years ago: “The nice thing about standards is that there are so many to choose from!”

Basically, what happens is that some well-meaning authority comes up with a “rule” or “guideline” that makes sense at some point – and then decides to enshrine it for eternity in the standards manual. Now don’t get me wrong, company standards can be a very good thing. It is the eternity part that I take exception with. Standards need to be a living, breathing “thing” that changes with the times.

Standards can be worthwhile as a measuring stick to work from, hopefully ensuring that reliable and efficient databases and applications are built in a standard manner. But a rule that made sense 5, 10, or even more years ago probably is no longer reasonable. Every standard at your site should be reviewed at least annually to determine whether it is still reasonable to enforce.

An example of a bad standard would be to place an artificial limit on indexing. There should be no arbitrary limit on the number of indexes that you can create for any database table. Indexes are undoubtedly one of the most important factors in creating efficient queries. Relational optimizers rely on indexes to build fast access paths to data. Without indexes data must be scanned – and that can be a long, inefficient means by which to retrieve your data. When a rule such as this exists, it usually is stated in the standards manual using verbage something like this:

  1. “Each table can have at most five indexes created for it”  or 
  2. “Do not create more than three indexes for any single table in the database.”

These are bad standards. If you already have three indexes, or five indexes, or even 32 indexes, and another index will improve performance why would you arbitrarily want to avoid creating that index?

Anyway, a good indexing standard, if you choose to have one, should read something like this: “Create indexes as necessary to support your database queries. Limitations on creating new indexes should only be entertained when they begin significantly to impede the efficiency of data modification.” Now that is a good standard!

But most standards do not read that way because they are not easy to impose without arbitrary numbers and restrictions embedded within them.

Let’s quickly take a look at another bad database standard. One of the simplest mistakes made by many SQL programmers is including too many columns in the SELECT-list of their SQL statements. The only columns that should be included in the SELECT-list are those that are absolutely needed to meet the business requirements of the query.

Sometimes this notion of SELECT-ing only what you need is simplified as “avoid SELECT *”; that is actually not a bad standard (excepting, of course, quick and dirty ad hoc queries). Using SELECT * in your application programs is not a good idea because a column can be added to the table causing your program to stop working.

But using “avoid SELECT *” as a standard does not go far enough because it does not capture the true requirement, which is this: reference only exactly what is needed for the business requirement and nothing more. With this in mind, a bad standard may read something like this: Every column referenced in a WHERE clause of your SQL statement(s) should also be included in the SELECT-list of that statement. Now consider the following statement:

WHERE EMPNO = '700′;

At first glance you might ask “What is so wrong with that statement?” Well, there is no reason for EMPNO to be in the SELECT-list. We know its value will always be 700 because of the WHERE clause.

But isn’t that a small issue? Maybe. What if this statement runs hundreds, or even thousands of times a day? Every column that the DBMS must pick up and return to the application requires additional resources  a small amount of additional resources, to be sure, but additional resources none-the-less. If we remove the column from the SELECT-list we remove the requirement to use those resources. Now multiply that by the thousands of times the statement runs and we’re saving something!

The bottom line is that a standard forcing the column into the SELECT-list has no viable reason to exist — and at worst, it can cause a performance problem.

So review your standards and remove the ones that don’t make any sense. And plan on doing it every year around this time  sort of like an "end of spring" cleaning for your standards. You will be glad you did.