The Mysteriously Hidden Order of the Database

Mar 15, 2009

By Todd Schraml

In formulating the tenets of relational theory, issues anent to order were explicitly addressed. These relational theory tenets included defining a relational database so that it need have no concern with the order of columns in a row, or with the order of rows in a table. And yet, such a stance seems counter-intuitive since the database brings structure and organization to content. Chaos is the primordial soup from which all things originated. Thus it seems only reasonable that a relational database, being the best and brightest of its kind, should abhor such chaos and bring ever more order instead, right? This expectation implies that relational theory should remove all chaos and force order into everything. Yet the relational theory precept held that the order of rows and columns is not important. The simple reasoning for this relational non-order position is that mathematical sets do not impose order. The order of attributes in and of themselves is not logically relevant. Since this aspect of mathematics provides the basis of relational theory, abandoning the non-ordering position means abandoning mathematics. And in abandoning mathematics, chaos would emerge. Or at least the support of a coherent and consistent suite of rules would be lost. So a little bit of chaos helps tame a greater chaos.

However, the majority of developers are aware that in building tables, while using most of the Database Management System [DBMS] products today, it often proves true that order is important. Some prominent examples include the following: placing variable-length columns at the end of a row; placing columns contributing to a composite key or index together and ordered by most used, or cardinality; placing primary or clustering columns first in a row; perhaps even making sure that columns get used in a specific order when acquired within a query retrieving data. These mindfully ordered things play a role in optimizing query response performance. Such DBMS-required quirks create increased value for those folks whose skill sets include the arcane and mysterious knowledge of how to invoke the "right" kinds of order at the "right" time and disperse glorious speed from the DBMS optimizer. DBMS vendors often go to great lengths describing these rough edges and skirting of relational theory as advantageous features that clearly demonstrate the innovations of their product over others.

ike the biblical poor, performance issues remain with us always. The problem is not exactly that performance issues exist, nor is it that the processes optimizing a query are sped up or slowed down based on a given order. Rather, the problem with order inside most DBMSs involves the level of transparency the DBMS maintains, or actually fails to maintain, for these performance issues. The physical designer certainly has every right to control and move things around in order to optimize data usage response. But an end-user, someone out there writing a query, should not have to worry about a thing. The user should write a query that runs in an effective manner chosen by the optimizer within the DBMS. The submitted query, or any variation of a query intended to return an identical result, should be optimized and consistently return data at the same "speed." Users, regardless of description as a novice query writer, or an experienced SQL-minded application developer, should not need to concern themselves with the ordering of query text geared solely to obtain favorable DBMS performance. In fact, the ideal DBMS product of the future might very well be a product in which all optimization and ordering for performance is handled within the optimizer itself, dynamically rearranging physical storage based on the trends derived from executed queries.

Newsletters

The Mysteriously Hidden Order of the Database

White Papers

Sponsors