Physical Data Models Are Not Necessarily One-For-One With Logical

Bookmark and Share

It seems as if it should be simple. Within a normalized database model, a one-to-many relationship is represented by establishing a foreign key on the entity at the many-side of the binary relationship. Inside the ensuing physical tables, the foreign key column(s) equi-join to the primary key column(s) on the one-side to provide the usual and expected query results combining both tables. These one-to-many relationships are a bread-and-butter pattern as we build our database models. If the relationship is not one-to-many, it may be many-to-many, as in, say, customers to accounts at a bank. A customer may have many accounts; and a given account may sometimes be for more than one customer. The usual and expected pattern for a many-to-many relationship is an associative entity, a third object sitting between the two. This associative table resolves the many-to-many relationships with rows for each valid primary key combination by pairing the primary keys from the objects on each side of the relationship. By injecting this third table in the middle, the initial many-to-many relationship becomes two one-to-many relationships.

Under usual circumstances, the one-to-many or many-to-many relationship, alone, drives the pattern used within the database model. Certainly, the logical database model should represent the proper business semantics of the situation. But on the physical side, there may exist extenuating circumstances that would cause a data modeler to consider including an associative table construct for a one-to-many relationship. Reasons for this deviation from the usual could be that the relationship is sourced from a different system from the rest; and sometimes, even when sharing a source, there may exist timing issues wherein the relationship may not be defined until after the other objects are well-established. Another situation that could drive implementation of an associative table is if the relationship changes frequently, either due to data quality issues or simple expected business processing. A separate associative table allows for a smaller table structure going through these updates, with possibly its own unique tracking of historical changes. Lastly, if one has a circumstance where it is possible, or even likely, that in the future the one-to-many relationship could become a many-to-many relationship, then utilizing an associative table might be considered forward-thinking.

When assessing design choices, the data modeler or architect must reflect on the bigger picture, as well as color in all the details necessary to arrive at an appropriate design for implementation. Among the differences between these two implementation choices, “foreign key” and “association,” is enforcement of the relationship rules. When a foreign key is used, the structure itself enforces that the relationship must be one-to-many as only one value can exist within the foreign key’s columns. The associative entity allows as many relationships as one wants; any enforcement of a rule, such as a maximum limit, or a limit at specific points in time may only be enforced programmatically—regardless of whether it is application, ETL, database triggers, stored procedures, or other choices.

There are data modeling approaches that as a standard practice decompose object relationships into their own structures, i.e., associative tables. Most of these approaches fall under the umbrella of Anchor Modeling. Effectively, this treats every single relationship as if it were a many-to-many relationship. The most prevalent Anchor Modeling variant here in the U.S. is Data Vault. Within Data Vault, these associative tables are called Links. Remember, while decomposing relationships as standalone association tables is usual and expected, within Anchor Modeling, taking a one-to-many relationship and treating it as a many-to-many is a physical trick within a normalization practice, not a logical one. The logical database model should still reflect the actual one-to-many relationship. One might even go so far as to document the design choice within the notes inside one’s data modeling tool.