Code for Performance from the Start

DBAs spend a great deal of time monitoring and managing the performance of database systems and applications. But it doesn’t have to be such a large percentage of their time. Yes, changing data patterns, requirements, and time will always conspire together to create performance problems. But more can be done to avoid the up-front poor performance of most database applications.

Application developers need more guidance and education on how to build performance into their applications by design. The earlier you tackle the performance issue, the less of an overarching problem it will be, and you can’t get much earlier than during development.

The goal should be to enable developers to write code that does not require remedial after-the-fact modifications by performance analysts. The first step to becoming a successful database programmer who codes with performance in mind is to develop a relational mindset. But what does this mean? Well, first we need to understand what a relational database system is and how that differs from other types of data storage and management.

Every operation performed on a relational database operates on a table (or set of tables) and results in another table. This is known as relational closure. All SQL data manipulation operations—that is, Select, Insert, Update, and Delete statements—are performed at a set level. One retrieval statement can return multiple rows; one modification statement can modify multiple rows. Application developers accustomed to processing data a record-at-a-time will make very poor SQL database programmers without some training in the set-at-a-time nature of accessing data in a relational database.

If you have programmed using flat files before you must unlearn the “flat file” mentality. Forget sequentially accessing data record by record. Access what you need using the features of SQL. Master file processing is not appropriate for optimal SQL applications. With master file processing two or more files are read with one driving reads to the other. The SQL approach simply joins the tables together in a single SQL statement.

Instead of explicitly programming how to get the data, the relational approach uses a built-in optimizer that determines the best method for accessing and modifying data based on your SQL statements and information about your system and data. The same SQL statement can be optimized by the DBMS to do the same work in many different ways. This is a key benefit of using SQL instead of writing host language code.

An important guideline for coding relationally is to let the DBMS do as much work as possible by coding as many of your requirements into the SQL as possible. The more work that can be done in the DBMS, the more efficient your program will tend to be. And when things change, the relational optimizer can change access paths without the programmer needing to modify their code.

Another aspect of coding relationally is to understand cursors. Because a SQL Select statement can return multiple rows, a cursor is used to enable application programs to access individual rows. The Select statement can be assigned to a cursor, which is opened by the program, and then rows are fetched from the cursor one by one. When you open a cursor in your program to process a SQL statement it is not the same as opening a file. Opening a cursor can cause a lot of activity to occur (e.g. sorting) whereas opening a file is a pretty benign operation.

There are, of course, numerous other things that are required to code your applications for performance. Knowledge of indexes and data clustering is extremely important. Indexes can significantly speed up access to data based on the Where clauses in your SQL. Instead of scanning the entire table for the data you need, an index can be used to read only the required rows. But not every column or group of columns can be indexed, so programmers will need to work with DBAs to ensure that the proper indexes are created for the business requirements of their application.

Other considerations include coding for concurrency, minimizing passes through the data, testing multiple variations of SQL, using functions properly, and more. DBAs should work with their developers to help them understand these things, because it will minimize the needed DBA effort when the application becomes operational.

The bottom line is that programmers need more education on how to use SQL to achieve upfront performance in their applications. The more you know, the better you will code … and the fewer built-in performance problems there will be. And that is a goal worth striving for…