The Consequences of Dimension Degeneration When Building a Star Schema

The grain of a fact table is derived by the dimensions with which the fact is associated. For example, should a fact have associations with a Day dimension, a Location dimension, a Customer dimension, and a Product dimension, then the usual assumption would be for the fact to be described as being at a “by Day,” “by Location,” “by Customer,” “by Product” metrics level. Evidence of this specific level of granularity for the fact table is seen by the primary key of the fact being the composite of the Day dimension key, Location dimension key, Customer dimension key, and Product dimension key. However, this granularity and these relationships are easily disrupted.

In establishing a star schema, the fact table is comprised of metrics, i.e., the number values to be analyzed. The expectation is that the only attributes existing on the fact, other than the foreign keys into the associated dimensions, are the numeric values to be summed, averaged, or otherwise manipulated. As long as such a numbers-only state is true, then fact tables fall into the above mentioned condition of having a granularity defined by the composite primary key containing the individual foreign keys into the dimension tables. However, one tactic that is often employed in designing star schemas is to take what might otherwise be considered a “trivial” dimension and collapse those values right onto the fact table. This action is referred to as “degenerating” the dimension onto the fact. The dimension that now does not exist is referred to as a degenerate dimension.

Once any dimension has been degenerated, it is now harder to identify the primary key of the fact table. For instance, if the above example had a degenerated “Order” dimension that consisted of Order Number and Order Taker ID as columns added onto the fact table, it is possible that more than one fact row might be required with the same values for the Day dimension, the Location dimension, the Customer dimension, and the Product dimension. The primary key is the composite of all the dimension keys. However, with one or more of the dimensions degenerated, it is not likely that a surrogate key was created that represents the items within the degenerated dimensions. Therefore, no clean composite of the dimension keys works as a primary key for the fact. Possibly, one might be able to try using a combination of all dimension keys plus all degenerated attributes, but that could be overkill if more than one column is coming from each degenerated dimension. It is not unusual for designers to ignore the degeneration that was done and then be confused when granularity issues arise. A simplified approach to having a primary key for the fact table is generating a sequential surrogate key for such use. The dimensional key values now are downgraded to become simply foreign keys, and they do not serve as part of the primary key at all. In this way, the fact gets treated similarly to a dimension, in that it simply exists by fiat; the process creates a row and there it is.

No one is suggesting that degeneration be avoided on a star schema design. But it is recommended that designers be conscious of degeneration when they are employing it as part of a design. Initial designs should show those logical dimensions even if they will not be physically generated. This makes the overall process more clear to everyone. Degeneration of a dimension is a valid design option when building a star schema arrangement.