Through the many changes in IT over the years, one constant has been a concern for performance. With database systems this is especially true. Even with the many advances in relational database technology, SQL performance remains a key concern for IT professionals and management. Writing SQL for performance is one of the single biggest opportunities for professionals to contribute efficient, effective, cost-saving deliverables to projects. It can also avoid having to respond to an urgent performance problem in a production environment. To a considerable extent, a person can choose whether they are running because it is lunch or whether they are running because they are lunch, by following a few simple techniques for writing SQL for performance.
The first, and the most important, technique for writing SQL for performance is to know the indexes. An index is like a bridge that allows the SQL query to jump directly to the data rows of interest in a table, or to jump from the specific intermediate result data rows into the data rows of interest for the next table to be joined in the SQL query. Without using indexes, the query will wade through all of the data rows, scanning the entire table. Depending on the joins, it could cause a table to be scanned hundreds or even thousands of times. When joining multiple tables, the performance implications of indexing are magnified many times. Before ever writing a single line of SQL, a writer must know exactly what indexes support which tables and how to use those indexes effectively.
A SQL writer should always have a clear picture, or list, of the indexes for the tables that will be used in the SQL. There are many GUI tools to look at index information. A simple SQL query can also be used to report on the indexes directly from the database manager catalog. This might even be better, as it allows a person to build a custom report for just those tables of interest. Here are a couple of examples of queries that can be customized to report on the indexes for a list of tables that are going to be used in a SQL query to be written:
For a DB2 for z/OS database, try the following query:
--THIS QUERY WILL REPORT THE COLUMNS IN EACH INDEX FOR A LIST OF TABLES
SELECT substr(I.TBNAME,1,18) as TBNAME,
substr(K.IXNAME,1,18) as IXNAME,
substr(K.COLNAME,1,18) as COLNAME,K.COLSEQ,
I.UNIQUERULE, INTEGER(I.FIRSTKEYCARDF) AS FIRSTKEYCARD
FROM SYSIBM.SYSINDEXES I,
SYSIBM.SYSKEYS K
WHERE I.TBCREATOR = '<schema>' <--SCHEMA <<
AND I.TBNAME IN ('<table1>','<table2>') <--TABLES <<
AND I.CREATOR = K.IXCREATOR
AND I.NAME = K.IXNAME
ORDER BY I.TBNAME,I.TBCREATOR,K.IXNAME,K.COLSEQ
For a DB2 database on Linux, UNIX, and Windows, try the following query:
select substr(a.tabname,1,18) as tabname,
substr(a.indname,1,18) as indname,
substr(b.colname,1,18) as colname,b.colseq,
dec(a.firstkeycard,9,0) as fstkeycard,
dec(a.first2keycard,9,0) as fst2keycard
from syscat.indexes a,
syscat.indexcoluse b,
syscat.tables t
where a.tabschema = '<schema>' <--SCHEMA <<
and a.tabname in ('<table1>','<table2>') <--TABLES <<
and a.indschema = b.indschema
and a.indname = b.indname
and t.tabschema = a.tabschema
and t.tabname = a.tabname
order by a.tabname,b.indname,b.colseq
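The same idea carries over to other database engines. As a minimal sketch, SQLite (through Python's sqlite3 module) exposes equivalent catalog information through PRAGMA index_list and PRAGMA index_info; the table and index names below are hypothetical stand-ins for the tables a query is being written against, not objects from the text's DB2 catalogs:

```python
import sqlite3

# Hypothetical tables and indexes standing in for the DB2 examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABA (REGION INTEGER, STORE_NUM INTEGER, PURCHASE_TS TEXT)")
conn.execute("CREATE INDEX IX_TABA ON TABA (REGION, STORE_NUM)")
conn.execute("CREATE INDEX IX2_TABA ON TABA (PURCHASE_TS)")

def index_report(table):
    """Report (table, index, colseq, column) rows, like the catalog queries above."""
    report = []
    # PRAGMA index_list rows: (seq, index name, unique, origin, partial)
    for _, ixname, *_ in conn.execute(f"PRAGMA index_list('{table}')"):
        # PRAGMA index_info rows: (seqno within index, column id, column name)
        for seqno, _, colname in conn.execute(f"PRAGMA index_info('{ixname}')"):
            report.append((table, ixname, seqno, colname))
    return sorted(report)

for row in index_report("TABA"):
    print(row)
```

Whatever the engine, the point is the same: produce a compact, custom report of exactly the indexes for the tables about to be queried.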
Now that the indexes and their column order are known, the second technique is to write SQL that makes the most of these critical structures. To make the most of an index, the predicates need to supply values for the preceding (leading) columns of that index. For example, consider the following indexes:
TBNAME   IXNAME     COLNAME
TABA     IX_TABA    REGION
TABA     IX_TABA    STORE_NUM
TABA     IX2_TABA   PURCHASE_TS
A predicate like the following is not using all preceding index columns:
WHERE TABA.STORE_NUM = 4789
The above predicate will likely cause an index scan. For a single execution that may not be very costly, but if the predicate were instead part of a join between tables, the index scan could occur many times, even thousands of times. If the process that drives this SQL can make a value available for the preceding index column in an additional predicate, the database manager can perform a direct index access with improved performance and less I/O. For example, suppose the process is able to add a predicate searching on region 3:
WHERE TABA.REGION = 3
AND TABA.STORE_NUM = 4789
If the values for the preceding columns are truly not available, then there needs to be a serious review into the possibility of adding a new index.
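The effect of the leading index column can be demonstrated on any engine that exposes its access plan. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for DB2 (the planners differ, but the leading-column rule is the same); TABA and IX_TABA are the hypothetical objects from the example above, and EXPLAIN QUERY PLAN plays the role of a DB2 explain:

```python
import sqlite3

# Hypothetical table mirroring the TABA example; SQLite stands in for DB2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABA (REGION INTEGER, STORE_NUM INTEGER, PURCHASE_TS TEXT)")
conn.execute("CREATE INDEX IX_TABA ON TABA (REGION, STORE_NUM)")

def plan(sql):
    # EXPLAIN QUERY PLAN reports whether the query scans the table
    # or searches through an index; the detail text is the last column.
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# STORE_NUM alone skips IX_TABA's leading REGION column: a full scan.
missing_lead = plan("SELECT * FROM TABA WHERE STORE_NUM = 4789")

# Supplying the leading column enables a direct index search.
with_lead = plan("SELECT * FROM TABA WHERE REGION = 3 AND STORE_NUM = 4789")

print(missing_lead)
print(with_lead)
```

The first plan reports a scan of TABA; the second reports a search through IX_TABA using both columns.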
In writing SQL predicates to make the most of the indexes, it is very important that the predicates are written in a manner that does not cause the database manager to avoid using the indexes. A scalar function, or an expression on an indexable column, is a point for special consideration. Take the following SQL predicate:
WHERE TABA.PURCHASE_TS + 1 SECOND BETWEEN CURRENT TIMESTAMP - 13 MONTHS AND CURRENT TIMESTAMP
Adding 1 second to the column turns it into an expression and causes the database manager to avoid using the index. The above filtering predicate will cause the database manager to perform a full table scan: every single row is read, a calculation is performed on the column's data, and the result is compared against the range. If we think algebraically, we can treat both sides like an equation and subtract 1 second from both sides, moving the expression off the column. Resulting in:
WHERE TABA.PURCHASE_TS BETWEEN CURRENT TIMESTAMP - 13 MONTHS - 1 SECOND AND CURRENT TIMESTAMP - 1 SECOND
The result is a filtering predicate whose range is calculated one time and then compared through an index search.
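The same rewrite can be sketched in SQLite through Python's sqlite3 module, with datetime() modifiers standing in for DB2's duration arithmetic ("+ 1 SECOND", "- 13 MONTHS"); the table and index names are the hypothetical ones from the text:

```python
import sqlite3

# Hypothetical TABA with a timestamp index; SQLite stands in for DB2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABA (PURCHASE_TS TEXT, STORE_NUM INTEGER)")
conn.execute("CREATE INDEX IX2_TABA ON TABA (PURCHASE_TS)")

def plan(sql):
    # Join the detail text of each EXPLAIN QUERY PLAN row.
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Expression applied to the column: computed per row, index avoided.
per_row = plan(
    "SELECT * FROM TABA WHERE datetime(PURCHASE_TS, '+1 second') "
    "BETWEEN datetime('now', '-13 months') AND datetime('now')")

# Expression moved to the other side: the range is computed once and the
# bare column is probed through IX2_TABA.
one_time = plan(
    "SELECT * FROM TABA WHERE PURCHASE_TS "
    "BETWEEN datetime('now', '-13 months', '-1 second') "
    "AND datetime('now', '-1 second')")

print(per_row)
print(one_time)
```

The per-row form plans as a table scan; the rewritten form plans as a range search through the index.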
A common challenge for SQL writers occurs when joining tables across different schemas. These schemas could have been designed for applications written at different times, or one or both could have been purchased from different vendors. The challenge happens when a column is defined with a different data type in each schema. Consider what happens when REGION is defined as INTEGER in a table in one schema and STORE_ZONE is defined as CHAR(10) in a table in a different schema. A SQL statement might have been written as:
WHERE OTAB.SALES_PLAN_DESC = 'JANUARY PROMOTION'
AND CHAR(TABA.REGION) = OTAB.STORE_ZONE
AND CHAR(TABA.STORE_NUM) = OTAB.STORE
The columns have to be converted with a scalar function to the same data type before they can be compared. However, choosing which table's columns to convert can have a dramatic effect on performance. In the preceding example, the filtering predicate is on the SALES_PLAN_DESC column of the OTAB table. For the resulting rows, the database manager joins to TABA based on the two join predicates. As written, the two columns for each row in TABA have to be converted before they can be compared to the two values from the intermediate result set of OTAB rows that qualified on the filtering predicate. The database manager will not be able to use the index on TABA, because of the CHAR scalar functions applied to its columns. A significant performance improvement can be realized by switching which columns are converted. The database manager still starts by retrieving rows from OTAB with the filtering predicate. Since those rows have already been retrieved, a scalar function can be applied to their columns without causing index avoidance. The SQL statement would instead be written with:
WHERE OTAB.SALES_PLAN_DESC = 'JANUARY PROMOTION'
AND TABA.REGION = INT(OTAB.STORE_ZONE)
AND TABA.STORE_NUM = INT(OTAB.STORE)
Using that query rewrite, the database manager will use an index to retrieve the rows from OTAB that qualify for the filtering predicate on OTAB.SALES_PLAN_DESC = 'JANUARY PROMOTION'. For those already-retrieved rows, the database manager converts the two columns to integer. Now it can use the index on TABA to retrieve rows that satisfy the two join predicates. The performance savings can be significant.
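A sketch of the same choice in SQLite through Python's sqlite3 module, with CAST standing in for the CHAR()/INT() scalar functions of the DB2 example; OTAB, TABA, and their indexes are the hypothetical objects from the text:

```python
import sqlite3

# Hypothetical tables mirroring the cross-schema join example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABA (REGION INTEGER, STORE_NUM INTEGER, PURCHASE_TS TEXT)")
conn.execute("CREATE INDEX IX_TABA ON TABA (REGION, STORE_NUM)")
conn.execute("CREATE TABLE OTAB (SALES_PLAN_DESC TEXT, STORE_ZONE TEXT, STORE TEXT)")
conn.execute("CREATE INDEX IX_OTAB ON OTAB (SALES_PLAN_DESC)")

def plan(sql):
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Converting TABA's columns hides them from IX_TABA: TABA is scanned.
convert_inner = plan(
    "SELECT * FROM OTAB, TABA "
    "WHERE OTAB.SALES_PLAN_DESC = 'JANUARY PROMOTION' "
    "AND CAST(TABA.REGION AS TEXT) = OTAB.STORE_ZONE "
    "AND CAST(TABA.STORE_NUM AS TEXT) = OTAB.STORE")

# Converting OTAB's already-retrieved columns leaves IX_TABA usable.
convert_outer = plan(
    "SELECT * FROM OTAB, TABA "
    "WHERE OTAB.SALES_PLAN_DESC = 'JANUARY PROMOTION' "
    "AND TABA.REGION = CAST(OTAB.STORE_ZONE AS INTEGER) "
    "AND TABA.STORE_NUM = CAST(OTAB.STORE AS INTEGER)")

print(convert_inner)
print(convert_outer)
```

The first plan shows TABA being scanned on every probe of the join; the second shows the join reaching TABA through IX_TABA.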
In some situations, the SQL writer may find that no index is available to support the query's requirements. A discussion with the database administrator is a good next step to review and evaluate the options and to determine whether an existing index could be modified, or a new index created, to support the query. In some rare situations, a change may even be required to the underlying table to support such an index. For example, consider a table with a name column that must store its data in mixed case for display purposes. If a business requirement develops to search against this column, it is typical to fold the data to upper case for comparison:
WHERE UPPER(RETAIL_STORE_NAME) = 'MCBETH'
However, using that UPPER scalar function would prevent the use of an index on RETAIL_STORE_NAME, whether the index already existed or was newly created. The alternative is to add another column to the table, with its data folded to upper case, and index that column. Then the query can be written to search against this new, indexed column:
WHERE RETAIL_STORE_NAME_UC = 'MCBETH'
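The shadow-column technique can be sketched in SQLite through Python's sqlite3 module; the table, column, and index names below are hypothetical stand-ins for the example in the text:

```python
import sqlite3

# Hypothetical store table with a mixed-case name and an upper-folded shadow column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STORES (RETAIL_STORE_NAME TEXT, RETAIL_STORE_NAME_UC TEXT)")
conn.execute("CREATE INDEX IX_STORES_UC ON STORES (RETAIL_STORE_NAME_UC)")
conn.execute("INSERT INTO STORES VALUES ('McBeth', upper('McBeth'))")

def plan(sql):
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# UPPER() on the stored column defeats any index on it: full scan.
folded = plan("SELECT * FROM STORES WHERE upper(RETAIL_STORE_NAME) = 'MCBETH'")

# The pre-folded shadow column is searched directly through its index.
shadow = plan("SELECT * FROM STORES WHERE RETAIL_STORE_NAME_UC = 'MCBETH'")

# The shadow column still finds the mixed-case display value.
rows = conn.execute(
    "SELECT RETAIL_STORE_NAME FROM STORES WHERE RETAIL_STORE_NAME_UC = 'MCBETH'"
).fetchall()

print(folded)
print(shadow)
print(rows)
```

Some engines can also index the expression itself (SQLite supports indexes on expressions), but the shadow-column approach above matches the technique described in the text and keeps the query's predicate a plain column comparison.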
Always know the indexes that are available for the tables the query will run against. Knowing the available indexes may change how a SQL query is written to navigate the tables. In rarer situations, it may lead to a new index being defined to support that navigation. The key is to know the indexes, like the express routes on a road map. The performance implications are typically measured in orders of magnitude: the more tables being joined, or the more data in the tables, the larger the implications. Following these simple tips can often save 70% to 80% on query run time.