Customizing the Fact Table

Bookmark and Share

Occasionally the database under design becomes more than a one-off project. For example, the data mart or data warehouse might act as a profit center for the organization, and as a solution sold to many customers. The possibility exists to capture a basic truth for a given industry in a simple universal design exactly suiting everyone. But unless the situation really qualifies as a valid "one-size-fits-all" circumstance, one needs a level of customization within the architecture. Given these stipulations, the ultimate success or failure of the product rests on exactly how and how well such customizing works. Multidimensional designs consist of two basic table types - dimension tables and fact tables. Dimension tables contain the descriptors used to group, sort, or select data. And fact tables consist of measures, those numeric values added, averaged and otherwise utilized for analysis. While adding uniqueness to a design by incorporating new dimension tables progresses in a fairly straightforward fashion, adding in uniqueness on the fact-side of the design often proves a bit more awkward.

Applying a classic typing or polymorphic angle, to this posited group of standard fact elements and group of custom fact elements, leads to a slightly out-of-the-usual-box outlook. The basic fact elements comprise the supertype of the object under scrutiny, while each version or set of customizations provides a separate sub-type. Jumping from the conceptual (and normal, or at least object-oriented) perspective into the multidimensional standpoint takes a bit of deliberation. Building a secondary fact table offers one option. This second fact table may end up being used in the same query as the initial fact table, with the two fact tables then joined on shared values via the dimension tables. Multiple fact table queries such as this often end up performance-challenged. Some design approaches may overcome such performance shortcomings. Rather than consider the unique data items as a separate fact table, consider looking at them as if they present a unique extension to the fact table. Under an extension perspective, one needs to have a shared primary key. This means that the original fact table needs to have a generated column used as a primary or candidate key, rather than relying on a combination of the dimensional key values to act as a composite key. Then one can use this single key as a primary key on the extended fact table containing the unique data elements (including foreign keys into unique dimension tables), which allows the basic fact table and the extended fact table to join based on this generated single-column key. Additionally, if partitioning or clustering occurs in the basic fact table, then the extended fact table should end up partitioned or clustered in exactly the same way. A direct fact-to-fact single-column join performs much better than linking two potentially large fact tables through shared dimension values. A completely discretionary option includes deciding whether to go an extra step from this point and create a database-supported view that pre-joins the two fact tables, which then allows them to appear to users as a single relation.

Alternately, leveraging metadata can handle the divisions between a basic fact table and a customer-specific set of extensions to the same fact table. An active metadata repository fully supports completely unique customer-level tables containing both the standard and the custom data elements. This repository generates and maintains the table structure descriptions. The processes creating the data definition language [DDL] would grab the standard bits and then grab the custom bits to write out a single CREATE statement containing both groupings. These examples explain what kinds of techniques can provide the means for maintaining a standard suitable for all, while embracing customized extensions user-by-user.