Next-Gen Data Management: Why Is Tuning Hard?

There are many reasons we all get tuning wrong at times. Here is a list of some I can think of (but is by all means not exhaustive):

  • We tune based on object definitions and how they relate to each other, not based on actual workload or how those objects are accessed.
  • Once we tune, we set and forget—assuming all is good until someone complains again.
  • We tune a query without regard for broader impact (such as hinting an index or creating an index)—things that can have negative impacts down the road.
  • We tend to look at queries and indexes when the issue may be with the data model, optimizer, or statistics.
  • We tune in development or test where datasets and workload don’t represent production.
  • We lack the skills to read a plan correctly to know what to tune.
  • “Good” can be ambiguous—compared to what? Do we have a baseline? What is acceptable?

Tuning Headwinds

There are a lot of folks out there who make a living (many of them quite a good one) doing database performance tuning. Why? Tuning requires a high degree of knowledge and performance skill, is time-consuming, and means knowing the right diagnostics to collect when performance hits occur. Because of these and many other reasons, database performance experts thrive—and let’s not forget about job security. No one that I have ever talked to has the privilege of working in a static environment. Things that are constantly changing include the following:

  • New application releases or updates
  • Database upgrades or patches
  • OS upgrades or patches
  • Migrating/motion to new physical nodes (if virtualized) with different workload, available resources, and resource consumption patterns
  • Seasonal patterns occur (end of month, end of quarter, holidays, etc.)
  • Data grows, statistics change (hopefully)
  • User needs change (i.e., a transactional system is now needed for my real-time reports)
  • Business needs change (i.e., BI and reporting)

Another headwind to tuning is this question: Why bother? As mentioned, as soon as we tune, things start changing immediately with the potential to invalidate any good work we’ve done. Also, the availability of resources is much improved. Hardware costs have come down significantly per unit while scalability has gone way up. In addition, the time to provision in a virtualized environment can now be measured in minutes or hours instead of weeks or months. So, regarding tuning, some take the stance of “Let’s just feed the beast.” And yes, that is a strategy—one that can mask a lot of inefficiencies in SQLs, indexes, and data models. I would like to go on record that I feel this is not a good strategy. Especially if any workload is running in or moving to the cloud where we trade green dollars for more resources.

Introducing Table Tuning

I’d like to propose a new way to look at optimizing. I’m going to call it “table tuning.” Here’s the reason: You don’t index a query, you index a table. However, you index a table so that queries can access data more efficiently. When we do index the table, there are two significant considerations beyond the query improvement that we should be looking at:

  • ?What is the cost of maintaining the index? Meaning with data churn (inserts, updates, and deletes), there will be a cost (overhead) for maintaining the index as data changes in the table.
  • ?What other queries might be impacted? This could be either negatively or positively. An index could help more than one query to access required data more efficiently.

The benefit to this approach to tuning is that we can look for any inefficient data access against a table regardless of the reason (of which there can be many), meaning that we should look for low ROI queries—queries that are asking the database engine to do a lot of work when it really doesn’t care about most of the data read. This will ensure that we find impactful, inefficient workloads. Some causes include:

  • Missing indexes
  • Data type conversions (either implicit or explicit)
  • Functions in the WHERE clause
  • Spools
  • Lookups
  • Stale statistics

Inefficiencies Can Be Caused by Many Things

In summary, tuning is difficult because inefficiencies can be caused by many things. At times, we need to get into the detail of what needs to be tuned by parsing the plan. It is practically impossible to look at everything—you need to have some system or method that highlights the most impactful, inefficient workload; aggregate that workload at the table level; and make better decisions with that higher viewpoint. And, because things are always changing, we need to continually watch for new things popping up. Easy, right?