Referencing Dimensions in Data Models

Bookmark and Share

Many businesses seem to believe that dimension tables and reference tables are indistinguishable objects. Apparently, the only difference worthy of note seems to be altering the object’s name from “something reference,” or “something xref,” to “something dimension.”  As these organizations build multidimensional data marts, they often place a view on top of their reference table and feel good about how quickly and efficiently they complete their data mart. Repeating this swap-in approach throughout the development process, results in data marts with data models which no longer resembleva star schema, or even a snowflaked star schema. Instead, these data models have one or more fact tables buried within a forest of normalized reference tables, interconnecting in all sorts of fashions, resulting in a star-spaghetti schema.

Are such muddled approaches doomed to failure? Not necessarily, as code can always be written to maneuver around the worst possible data representations. Any data model can be reduced to three or four very high-level abstractions; and given an implementation having infinite storage and cycles, this minimalistic model can be made to work. The secret is in having infinite complexity buried within the code that is accessing the data. However, being able to “make it work” is not a certification of a good implementation. Facts wedged within 3NF reference web-work can function. Shortcomings arise in problems with scaling, or problems with adhering to the assumptions that reporting and analysis tools have embedded within them.  Many OLAP tools assume that the data they access is a valid star schema that is following expected multidimensional ideas, not more-normal-than-not shortcuts. These disparities between thought and expression can easily lead to performance issues showing up down the road. A small change oftencan result in much larger than expected areas needing to be reworked.  Another issue is that users can be confused by having too many tables with too many relationships to consider in querying. The simplicity of a star schema is the power of the star schema.

Just the term “dimension” should help invoke for the database designer ideas that go well beyond an unassuming and solitary reference table. The intention of a dimension is to cover a business area, I.e.,everything of importance about a customer, a product, an employee, and so forth. A dimension is also expected to contain one or more hierarchies of these business area descriptors whenever such interrelationships are found.  If a data mart is required to retain changes in data content, isolated references impressed into service as a dimension, also may result in problems. Having a Type 2 Slowly Changing Dimension means capturing changes. When you have a simple reference table serving as the dimension, the only change that may be captured is an altered description or a new value added. When one has a rich set of data covering a business area as a proper dimension should, changes between the interrelationships of the data should count as a change that is captured, such as an already existing department being reassigned to a different-than-yesterday’s but already existing division value. In this fashion, the quick-and-easy implementation of views on top of reference tables has deprived the resulting structure of proper change data capture.

Dimensions are rich structures. And the function that dimensions serve is beyond a simple code-and-description reference table. To equate a reference table with a dimension table is often shirking the responsibility of designing a star schema. If a data modeler believes that multidimensional design is as simple as such a this-is-that approach, be very afraid, be very afraid. Your organization may be heading down a path where, for unknown reasons, things that were considered done alwaysseem to need tweaking and changing. The cost of a quick initial implementation often results in the ongoing gift of unexpected rework.