Dimensional Degeneration Do’s and Don’ts

When one hears the term “degenerate dimension,” thoughts of teenagers, leather jackets, motorcycles, and petty crime may come to mind. After all, the word “degenerate” is associated with something that is “below normal” or “corrupt.” Science tries to rehabilitate degenerate’s meaning by using it for a manner of simplification. A degenerate dimension is an easy, and, at the same time, subtle, aspect of multidimensional data modeling. Effectively, a degenerate dimension is a dimension that has not been implemented as a standalone table structure. Instead, this dimension has been implemented by merging its columns onto the fact table with which it is associated. In this way, the dimension is constituted conceptually as a subset of the fact columns.

So, the simplification is in making a single table out of what might otherwise be two. Due to its nature, one never needs to think of a degenerate dimension as something that might be a conformed dimension, because there is no table to reuse. If a degenerated dimension’s information applies to multiple fact tables, chances are, one needs to duplicate the dimension’s columns on each of those fact tables.

Degenerating a dimension can be a very natural result if one has a dimension with only a natural key as its total content. Ignore the idea of a possible generated surrogate key for a moment. If the dimension were to be linked into the fact table, its natural key would be placed onto the fact as the foreign key into the dimension. Even if one created a dimension table, it would not be a table that would ever need to be accessed: All its content is already on the fact table. Another circumstance wherein degenerating a dimension should be considered arises when a dimension is expected to have the same, or nearly the same, number of rows as the fact table. In a classic star schema, facts are expected to be “narrow and deep” (few columns, many rows), while dimensions are expected to be “wide and shallow” (many columns, few rows).

Dimensions act as a variation of a mega-index into the fact table. If fact and dimension are virtually one-for-one row-wise, any performance benefit of a dimension as a “mega-index” is lost.

Therefore, placing this dimension’s content onto the fact is a practical move. The example of this kind of degeneration that would be seen most often is something like degenerating an Order Dimension onto a Shipment Fact.

There should be no need for dimensional degeneration outside of the “naturally degenerated” or “one-for-one rows” reasons. In fact, if one is degenerating without one of these reasons behind their actions, an argument could be made that one is simply denormalizing and not degenerating at all. When folks start denormalizing things, often their denormalizations are simply a habit and nothing more. Excuses of “denormalizing for performance” are offered, but nothing has yet been implemented to experience any performance issue at all. Often, those choosing these actions are just denormalizing things because that is how they were initially shown and have always done it. If one has a performance issue, denormalizing is only one of many options available that can address such issues. But alternative options will never be considered if one is always assuming there are performance issues before an issue is actually seen and assuming that denormalizing will fix this mythical issue. If you only have a hammer, you tend to see every problem as a nail.

Degenerate dimensions are a valid tool within the multidimensional toolbox. When opportunities are seen in which a dimension might be naturally degenerated, or when a dimension has one-for-one rows with its associated fact, then do take advantage of this degeneration practice. But don’t confuse degeneration with potentially excessive denormalization.