The Case of the Missing Fact

In far too many organizations, processes become automatic. Requests are made, and requests are fulfilled. People wish to please; they like to show how fast they can respond or how agile they are.

The same is true when addressing database design. This delivery-focused desire comes from a good place. However, regardless of how well-intended, blindly fulfilling requests is dangerous and harmful to one’s overall results. Optimization of processes, especially processes involving database design, should never remove the step where questions are asked and rationale is provided.

If a request is made to create a dimension, for example, questions should be raised about where and how the proposed dimension is to be used. If no answer exists for these basic questions, then why should someone be creating a dimension table in the first place? Where is the fact table that needs this dimension to explain it? I would suggest that the idea of a dimension without at least one related fact is not a dimension at all. Instead, one has a simple table standing alone, unattended and purposeless. One does not clutter up a database with purposeless data structures.

Ideally, the issue of the lone dimension table as described above is a problem of communication. The questioner has asked the wrong person, or perhaps the questioned individual is, for one reason or another, unable to articulate it properly. Surely a reason exists for the request, there is substance behind the dimension request, and we are not in an anti-reason organization more akin to Alice’s Wonderland, believing six impossible things before breakfast.

In many organizations, there would be a steering or governance function driving workloads; dimension requests wouldn’t necessarily come from engineers. A planned bus matrix would define the scope of everything to be built, and dimensions may be created for associations into facts still in the planning stage.

Therefore, a dimension may be built prior to any fact that will need to use the dimension. But even so, that circumstance should be known and able to be spoken.

Alternately, perhaps an engineer is wishing to expand some level of functionality, and the requested dimension is actually an outrigger that is planned to extend data availability across multiple existing dimensions. Instead of “hanging off” a fact, this requested structure might join into multiple dimensions for use.

Again, this is something that should be able to be discussed, perhaps even debated. If one has a policy against outriggers, maybe other approaches could be considered. But discussions such as these must start somewhere, and that somewhere usually begins with, “Why do we want this structure?” From such conversations, knowledge grows. The request may be for an inappropriate solution that needs further evolution.

If someone is responsible for a database’s design, they must question the purpose of every structure within that database. Not every structure will fit nicely—that is pretty much a given circumstance. While building up a database, all kinds of circumstances arise, and suboptimal solutions often make their way into existence whether you wish them to or not. However, a responsible designer should, of necessity, know why every data structure exists and have plans on how to correct structures with known issues in the future as opportunities arise. Dimensions do not pop up and serve a need in complete isolation.

Dimensions serve to describe facts by filtering and grouping. When the data structures within a database are known, and necessary improvements are also identified, then one has a database that is cared for, designed, and managed. Otherwise, one has a hodgepodge of data structures that apparently have no one responsible for them. Objects within these hodgepodge databases are data orphans left to fend for themselves against the unkind winds of change.