Database Elaborations: When Is a Dimension a Fact?

In dimensional modeling, business events are typically designated as facts while descriptive information elements are dimensions. However, events (or information about them) occasionally serve as dimensions as well as facts.  Metaphorically, this event information is somewhat similar to a gerund. Sometimes a verb is a verb, sometimes a verb can be used as a noun—as in the distinction between “gathering wood” versus being at “the gathering.”

A good data architect must watch their p’s and q’s and be certain when it is appropriate for a fact to also serve as a dimension—or when the dual function is not appropriate. Data modelers too often normalize dimensional components. Frequently, two distinct facts are associated into parent-child relationships, such as Order and Order Line—which is a normal design thing, not a dimensional design thing. It is only to be expected that subtler issues can prove even more confusing.

Sometimes, a business event may have both numeric elements and non-numeric elements; for example, Order will have a numeric Order-Total-Purchase-Amount, a non-numeric such as Order-Status-Code, and an Order-Number. The numbers to be summed are obviously grist for a fact table. The non-numeric items are supposed to be items for one or more dimensions.

However, going through a two-step conversion process may suit everyone better. Conceptually, having an Order-Total-Purchase-Amount, an Order-Status-Code, and an Order-Number could be viewed as having both an Order Fact and an Order Dimension. The Order Fact will contain the Order-Total-Purchase-Amount, and the Order Dimension can have the Order-Number and the Order-Status-Code. Having these two separate objects at the logical level keeps the ideas distinct and manageable; this is the first step.

Step two involves making a big decision. Does the Order Dimension physically stand alone as an isolated table, or is there a good reason to merge it into the Order Fact as a degenerated dimension? The reason for this merging might be that any such Order Dimension is going to have the same number of rows as the Order Fact; therefore, having two large tables to join for virtually every query could result in negative performance.

However, on the other side of the coin, a good reason to keep things separated might be justified by many updates of the Order-Status-Code. If the status changes often, it implies the need to update one’s fact often, which is never a great idea. If one wishes to degenerate the Order Dimension and avoid updating the Order Fact, then one needs to think through the ETL processing necessary to maintain the table. A bookkeeping approach can be used to maintain the Order Fact. The bookkeeping approach would require that each time an order’s status value is changed, then TWO rows are inserted into the Order Fact. One row is an exact negation of the first row; so, if the original row had a $100 Order-Total-Purchase-Amount value, the negation would be -$100.

In this way these two rows, when added, would sum to $0. The second row added would have the new Order-Status-Code value and all the normal current values. If nothing had changed, the new Order-Total-Purchase-Amount value would be $100. When all three rows are added together, the total amount result is $100, the desired value.

The bookkeeping approach is effective in identifying changes over time; the downside is that it has more rows, and queries against the fact must always be summations. But on the positive side, as mentioned above, the fact is only ever inserted into, no updates are processed. The decision on whether to degenerate the Order Dimension is not a stylistic choice but an architectural decision that has great impact on the flow of data into the Order Fact. The data architect should always be involved in such decisions, carefully deliberating the options.