Future Facts and Type 2 Dimensions

One creates the potential for some interesting anomalies when building a star schema wherein the fact table contains future-dated metrics and any of the dimensions are Type 2.  A Type 2 dimension tracks changes to the data items contained within it. Effectively, each dimension contains a surrogate key, a natural key with a start and stop date, and additional descriptor columns. If any of the descriptor column values change, the existing dimension row has the stop date populated while a new row is inserted with the same natural key, new start date, and new descriptor values. 

What becomes a potential issue is that, if a fact is for a future date, the Type 2 dimensional assignment is with the row that is today’s current version of the dimension value instead of what will be the current version of the dimension value when that future date is reached. One could say, it is an estimated dimension assignment. Obviously, older fact rows, if processed today, can be associated correctly with the values that were active during the point-in-time the metric is from. As well, should the fact table be one that is truncated and re-loaded every cycle, then things will always self-adjust. But if the fact table is handled properly, having only inserts applied, then future-dated facts will initially be associated with dimension row versions that may or may not still be active when the fact row reaches currency.

It is possible that having a fact row associated with one point-in-time and that same fact row having a dimension row version that is from a prior point-in-time (the point-in-time the fact was processed) may not be considered a issue for business and such anomalies may be ignored. However, if there is a desire for this to be cleaned up, then there is more work to be done within the ETL logic. If metric data may change over time and should future-dated metrics exist, it is probably best to assume things could change, then a proper insert-only fact should have ETL processing that addresses changes. Most likely, fact rows are inserted using some variation on an accounting-like approach wherein after the initial version of a metric is inserted, any future insertions are delta values for that metric. This means that user queries against these fact tables would always be summations that aggregate the original and all change values to arrive at the correct current value for any metric.

If every future-dated fact is changed when that future is reached, then the existing accounting processes would assign the new values to the correct current version of the dimension and all is well. However, if that is not true, and future-dated facts are not guaranteed to be changed, then in order to ensure the metric and dimension are for the same point-in-time the ETL logic must be expanded. The ETL changes needed are fairly subtle. 

Initially, the assumption would be that existing change logic is focused on changes to the metrics to invoke new adjustment fact row inserts. For this circumstance, that logic would need to expand to encompass metric and/or Type 2 dimension changes as well. And for this specific circumstance, those Type 2 dimension changes could be represented by previous and current surrogate key assignment comparisons.

This expanded comparison logic would ensure that as the associated dimension natural key goes through updates, the metrics shuffle along with them to the new estimated dimension row assignment.  As the actual date of these future facts arrive, they will then be guaranteed to be associated with the correct point-in-time dimension values. Obviously, it is much more desired that these date association anomalies be considered of no importance to business so that one need not apply such changes, but sometimes business wants what business wants.