Newsletters




Yet Another Type 3 Slowly Changing Dimension Variation


As the number of types of slowly changing dimensions (SCDs) increased, things have ended with Types 0 through 7, making essentially eight of them. But I do not believe full consensus exists among current practitioners on what actually differentiates each of these eight SCD types. Some confusion may result from the fact that when the first three SCD types were defined, each could be equated to a result for a dimensional attribute. Type 1 had facts associated to dimension values as they are currently, or always current, Type 2 had facts associated with dimension values as were current when the facts were processed, Type 3 had facts associated with both current values and values current when processed.

The original concept was for each attribute or column to be addressed by a single SCD Type. Once things became confused on the implementation side, some assumed an SCD type applied to the entire dimensional table implementation. Type 6 and Type 7 became two different ways to implement a Type 3 result across all columns of a dimension. In the Kimball Group’s version, Type 6 has the dimension duplicating every column on a dimension, one holding an original value and the other the current value. Type 7 has a second key on the fact, one pointing to the original, the second pointing to a current. But again, these are both implementation options for arriving at a state of a fact that is able to be associated with the original dimension value and the current dimension value concurrently. Therefore, Type 6 and Type 7 are implementation options under a general Type 3 result approach.

Another Type 3 SCD Variation

There is another implementation option, and as such it should likely be considered yet another SCD Type, but, not believing in SCD-type proliferation, I suggest only that this is yet another variation of a SCD Type 3 result approach. The implementation for this new approach is quite simple by using an SCD Type 2 implementation approach. The dimension table has new rows created as anything changes, with start and stop dates. The facts always and only point to the dimension row that is active at the point in time of the fact, with one slight addition. On the dimension table, an additional column exists that is a self-join foreign key to the same dimension pointing to the “current” version of the natural key in the dimension. Using this foreign key, the dimension can be joined to itself where that “second” copy can expose “current” versus “original” dimension values. Views can be established allowing users to join into an SCD Type 1 dimension, an SCD Type 2 dimension, or even an SCD Type 6 dimension. A recursive join into a dimension should not be something that becomes a drag on DBMS optimizer performance, as indexing and clustering should overcome any issue, if one might arise. As a new SCD Type 2 row is inserted, the previously active row has its stop date populated and all previous versions have their new foreign key updated to point to the newly inserted row.

Greater Flexibility

Applying this Type 3-result-variant approach ubiquitously allows for great flexibility for users, as they can have a choice of current or original values across all dimensional data items. Or, things could be biased for one group versus another. Group 1 could have access to the current/SCD Type 1 versions of dimensions; Group 2 could have access to historical/SCD Type 2 versions; and the savvy data scientists in Group 3 could have access to both sets of columns. Practitioners should find this approach very useful in establishing a star schema environment that can service a wide variety of users effectively. The SCD Type 3 variant can become a standard that enhances a solution’s success.


Sponsors