In Designing Databases, Normal Expected Use Plays a Part

Choices are pervasive when designing a database. The data modeler must progress through a series of issues: What ideas are important? Which objects stand out?  Which concepts can take a back seat? Adding to all those decisions comes understanding the target structures one is shooting at. A normalized design may lean one way, while a dimensional design may lean another.

As an example, let’s say one has several critical parts, for example engine parts, body parts, and interior parts. In a normalized design, it is entirely possible that the data modeler may choose to have a Part Type structure with a row for “Engine,” and a row for “Body,” and so forth. There may be a Part structure having a foreign key into the Part Type structure. The combination of two tables and their interconnection is simple and straightforward. Over time, if new kinds of parts arise, then a new row is simply inserted into the Part Type table, say “Exhaust,” and life continues without a need for any database structural changes.

 Translating the above normalized design into a multidimensional star schema could involve creating a Part Fact table summarizing counts of parts, and the Part Type may become a Part Type Dimension. The Part Fact would have a Part Count column. What kind of Part is counted would be known by the associated Part Type Name value in the dimension. A design such as this could be a valid and useful approach.

However, other arrangements are possible. In choosing between various options, the concept of “normal expected use” should carry some weight. In the above design, having a Part Type Dimension and using the values within that dimension to provide full meaning for the Part Count has as consequence that the natural result of queries against the structure will provide for each kind of part to have its associated count on a separate row. What if the user community always will want to see the counts horizontally across a single row? Certainly, manipulations can be done to force the multi-row result into a single row, but it would be manipulations that need to be done over again for every single query that the users may ever wish to execute.

Under such circumstances, another option for consideration would be to explicitly define within the Part Fact a column for each kind of part, an engine part count, a body part count, an interior part count, and so forth. With this second approach the queries would naturally have the kind of parts and counts across a single row. The multiple rows returned would be partitioned based on any other dimensions within the design.

Some might lean towards the original approach simply because of expected future change. The idea that new kinds of parts may be added in the future means that a new column would then need to be added to the fact table in the latter approach. Therefore, the perspective appears to be that database changes are an event to be avoided.

While there is no denying that with the latter approach there would be a database change involved whenever the kinds of parts change, the idea that database changes should be viewed with terror or fear is extreme. The ALTER TABLE ADD COLUMN command can be written and executed swiftly; but our procedures, priorities, and preferences do tend to add some time too. Nevertheless, our goals ought to be that building and changing star schema data marts should be a task to be accomplished quickly. 

Changing tables shouldn’t be as hard as we sometimes make it. The former approach is more generic, but demands more post processing, while the latter approach is more direct for what users ultimately want, but requires more table preparation in the design. Absolutely, each approach is valid. The question the designer must answer is fitness for use under a given set of circumstances at hand.