Junk dimensions are often misunderstood and avoided, and they should not be. They offer a strategy for remaining true to dimensional intentions, for sharpening the focus of one’s design, and sometimes for gaining new insights into the data. A junk dimension is a collection of data items that may not relate to each other at all, although all relate to the fact at hand.
In building a dimensional design, novice designers often take a lazy approach. When they run across an odd or isolated textual data element, their first thought may be, “That’s OK, I will just degenerate that one and add it onto the fact.” Even if the design eventually ends up there, that is not the proper way to start one’s thinking. Every textual element should first be considered as part of its own dimension.
An odd, solitary data item may initially be considered a dimension of one data element. If one has several of these items, one ends up with several dimensions of only one or a few attributes each. That multiplicity of dimensions is exactly how an initial draft design should look. The problem with the lazy approach is that the designer often ends up degenerating many standalone items into the fact without giving a thought to having done so.
As the initial design evolves, the junk dimension can arise to group these standalone items together and optimize one’s design, as opposed to sloppily adding random things onto one’s fact. While degenerating dimensions onto a fact is a valid tactic, it is a limited rather than broad-sweeping approach. Good candidates for degeneration are items that have as many distinct values as there are rows in the fact table. For example, it might be best to degenerate an order number onto an order fact, because doing so avoids joining two equally large tables. Or, at the end of the initial design, if one has only a single “odd duck” attribute and that attribute is unlikely to be useful to any other fact, degenerating might be the best option. But blindly degenerating multiple text attributes onto a fact as soon as one sees them is simply a flawed approach.
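The cardinality test described above can be sketched in a few lines of Python. This is only an illustration: the function name `is_degenerate_candidate`, the sample order rows, and the 0.95 threshold are all assumptions made for the example, not part of any standard method.

```python
def is_degenerate_candidate(fact_rows, column, threshold=0.95):
    """Return True when `column` has roughly one distinct value per fact row.

    The threshold is an arbitrary illustrative cutoff (an assumption).
    """
    if not fact_rows:
        return False
    distinct_values = {row[column] for row in fact_rows}
    return len(distinct_values) / len(fact_rows) >= threshold


# Hypothetical order fact rows, invented for this example.
orders = [
    {"order_number": "SO-1001", "gift_wrap": "Y"},
    {"order_number": "SO-1002", "gift_wrap": "N"},
    {"order_number": "SO-1003", "gift_wrap": "N"},
    {"order_number": "SO-1004", "gift_wrap": "Y"},
]

for column in ("order_number", "gift_wrap"):
    print(column, is_degenerate_candidate(orders, column))
```

Here the order number is unique per row and so qualifies for degeneration, while the low-cardinality gift-wrap flag does not and is better placed in a dimension.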
As one deals with a group of lonely, small dimensions, a pattern may arise that causes the designer to gather these items into more than one “junk” dimension. It could be a minor relationship or category underlying a subgroup, rates of change activity, or almost anything. Often, a junk dimension arises from an assembly of various flags or indicators. In that case, one has a choice in how to go about building the actual dimension content. One may write a process that simply generates a table composed of the Cartesian product across these flags and indicators. Execute it once and you have your data; the process never needs to be run again.
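A minimal sketch of such a one-time Cartesian-product build follows, using Python's standard `itertools.product`. The flag names, their value lists, the `junk_key` surrogate key column, and the function name `build_junk_dimension` are hypothetical, chosen only to illustrate the idea.

```python
from itertools import product


def build_junk_dimension(flag_domains):
    """Build one junk dimension row per combination of flag values.

    `flag_domains` maps each flag name to its list of possible values.
    Surrogate keys are assigned sequentially starting at 1.
    """
    names = list(flag_domains)
    combinations = product(*(flag_domains[name] for name in names))
    rows = []
    for key, combo in enumerate(combinations, start=1):
        row = {"junk_key": key}
        row.update(zip(names, combo))
        rows.append(row)
    return rows


# Hypothetical flags and indicators for an order fact.
flag_domains = {
    "is_gift": ["Y", "N"],
    "payment_type": ["Cash", "Card", "Credit"],
    "is_rush": ["Y", "N"],
}

junk_dim = build_junk_dimension(flag_domains)
print(len(junk_dim))  # 2 * 3 * 2 = 12 rows
```

The row count is the product of the number of values for each flag, so it grows quickly as flags are added.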
However, there will be circumstances where that kind of process generates an excessively large number of rows, and perhaps only a very few of those rows will ever be used by the related fact content.
In those circumstances, the alternative approach is to generate rows only for the combinations of junk dimension values actually encountered within the data. Each processing cycle then looks for new combinations to add.
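The incremental alternative might look something like the sketch below, which holds the dimension in memory as a list of dictionaries. The function name `add_new_combinations`, the `junk_key` column, and the sample batches are assumptions for illustration; a real load process would typically do the same lookup against the dimension table in the database.

```python
def add_new_combinations(junk_dim, source_rows, attributes):
    """Append a junk dimension row for each combination not yet present.

    Returns the number of rows added during this processing cycle.
    """
    # Index the existing combinations by their attribute values.
    lookup = {
        tuple(row[a] for a in attributes): row["junk_key"] for row in junk_dim
    }
    next_key = max(lookup.values(), default=0) + 1
    added = 0
    for source in source_rows:
        combo = tuple(source[a] for a in attributes)
        if combo not in lookup:
            lookup[combo] = next_key
            junk_dim.append({"junk_key": next_key, **dict(zip(attributes, combo))})
            next_key += 1
            added += 1
    return added


# Hypothetical incoming data for two processing cycles.
junk_dim = []
attributes = ["is_gift", "is_rush"]
first_cycle = [
    {"is_gift": "Y", "is_rush": "N"},
    {"is_gift": "Y", "is_rush": "N"},
    {"is_gift": "N", "is_rush": "N"},
]
second_cycle = [
    {"is_gift": "N", "is_rush": "N"},
    {"is_gift": "N", "is_rush": "Y"},
]

added_first = add_new_combinations(junk_dim, first_cycle, attributes)
added_second = add_new_combinations(junk_dim, second_cycle, attributes)
print(added_first, added_second, len(junk_dim))
```

Only combinations that actually occur are stored, so the dimension stays as small as the data allows, at the cost of running the check on every cycle.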
The bottom line is that every multidimensional designer should be frugal about degenerating attributes onto the fact. Degeneration should be restricted to items that are one-for-one, row-wise, with the fact, or to a truly solitary data item that will likely never be used by any other fact. Designs with many textual attributes living on the fact structure are often the result of suboptimal design effort. Those creating dimensional structures should evaluate their efforts with greater discretion.