There Ain’t No Such Thing as a Partial Foreign Key

A foreign key is a relationship between two entities. Generally speaking, inter-entity relationships are one-to-many, e.g., an ORDER has many ORDER LINEs, and so forth. Even a case that may start out as a many-to-many relationship, like a PERSON has many ADDRESSes but an ADDRESS may have many PERSONs, becomes two one-to-many relationships once an associative entity is added into the design. A PERSON has many PERSON ADDRESSes and an ADDRESS has many PERSON ADDRESSes.

Across physical tables, shared columns are the evidence of a foreign key relationship. The columns comprising the primary key of the table on the one-side of the relationship are repeated on the table on the many-side of the relationship. These common columns would be used in a join between these tables within a query. From our above example, an ORDER LINE table would likely have an Order Number within it to join back into the ORDER table. The beauty of the concept is the basic simplicity.

In the data warehousing world, even the simple may get trampled on and mangled. One of the great differences between data warehousing and operational solutions is that data warehouses gather history. Gathering history very often means that each object within the database design may have columns like a “Start Date” and an “End Date” designating the date range for when a particular set of values was active. And because of this temporality, the “Start Date” column becomes part of the table’s primary key, since rows are no longer unique inside the table without this column. And it is within this Start Date inclusion where data warehouse implementations create confusion.

If the primary key includes a Start Date, the allied foreign keys must also include that same Start Date value. Very often someone, somewhere, convinces implementers that following such rules is too much work, or not really necessary. The result is that these object-specific Start Date columns do not follow along as part of the foreign keys across the other tables.

Data modeling tools want to enforce the basic rules; how a foreign key works is one of those rules. Therefore, to support this misrepresentation of foreign keys the data modeler may have to exclude the Start Date from the primary key, misleading users as to how a unique row is identified. Or sometimes a modeling tool may allow a way to hide columns, so the model masks away these “offending columns.” The only thing enhanced by the exclusion of these Start Date columns is that the development team saves a few lines of code and a little bit of effort.

Now, of course, it is the users who will have to do more work to understand the actual data versus the data model, because now the data model does not describe the data, just some of it. Users will now need to be trained about an additional set of rules to apply when querying data. Often these rules would be something along the line of, each table must have a separate test done to it regardingwhich date within its Start Date / End Date range that one is trying to pull in with everything else. And those users must also make sure that they double check for tables not having those Start Date / End Date columns, or some other variation that may have been done, here and there.

Obviously, if one is using techniques outside of normalization, such as anchor modeli