Don't Equate Dimensions with Look-Up Tables

Bookmark and Share

Naïve approaches to business intelligence will occasionally trap designers as they juggle operational data stores and data warehouses.  The trap results from an honest endeavor to simplify designs and increase consistency throughout the solution.  Under the umbrella of consistency a designer may plan for a reference table used for operational look-ups to perform a second service as a star schema dimensional table.  Some or all reference tables then are declared by fiat to also be dimensions.  While on a superficial level there are similarities between dimension tables and more normalized look up or reference tables, fundamentally these two concepts are separate things. 

Dimensions will often contain several levels of a hierarchy.  Even if these dimensions don't contain an explicit hierarchy, a dimension may easily incorporate into a single structure a group of various look-ups sharing a commonality from a single subject area.  Reference tables, on the other hand, should be focused on only a single level of such hierarchies or groupings.  In defining primary keys, dimensions should lean toward implementing surrogate keys, while reference tables functions best using natural keys.

The deference towards surrogate keys for dimensions is a consequence of the next distinction between look-ups and dimensions.  Dimensions are maintained via one of several strategies generically referred to as Slowly Changing Dimensions.  These strategies are named Type 1, Type 2, and Type 3.  These strategies support managing the aspects of dimensional contained value changes which are specifically meant to be visible for reporting and analysis purposes.  Type 1 allows for all fact data to have access to only current dimensionally referenced values.  Type 2 provides for facts to always have the values that were current at the time data was originally processed.  And Type 3 allows for both current as well as some previous important value from a set point-in-time to be visible to a fact row.  Theses Slowly Changing Dimension strategies are established column-by-column within a dimension, so one column my use a Type 1 strategy while the next column uses a Type 2 approach.  Reference tables have other auditing needs; more normalized approaches are captured using approaches such as start and stop dates, shadow tables containing historical values sets, or perhaps using an individualized audit table.

As Ralph Waldo Emerson wrote, "A foolish consistency is the hobgoblin of little minds."  And trying to force consistency between a reference table and a dimension does fall into this foolish category.  Even under the best case scenario, hoping that the reference entity and the dimension could sync up and serve double duty as both a look-up on the operational side and a descriptive group on the dimensional side, it's best to avoid the idea.  Don't give in to the temptation of having one single structure instantiated within the database.  The reason to avoid this over-consistency is that building databases in this fashion hamstrings the future.  Time goes by, perspectives can change, needs for using the structures can change.  Should a need arise for a relatively minor change on the business intelligence side, for example a change to decisions on which columns are handled using Type 1 versus Type 2 Slowly-Changing-Dimension strategies, this minimized approach could have an adverse impact on the operational side.  Therefore, every change to such double-function tables has to go through two layers of vetting before any change can be made.  And some changes may accidentally require a much larger effort as more code in more places must be altered; or in the worst case, a decision to physically split the table into two is made after much more code has been placed based on the assumption that only one table exists.