Referential Integrity Rules Must Stand Alone

Bookmark and Share

Referential integrity helps manage data by enforcing validation between related entities.   This enforcement follows logical semantics behind the database design -- i.e., an employee can only work for an already defined department; a prescription can only be written by a health care practitioner with the proper authority.  A Foreign Key on an Employee table rejects data when any attempt is made to insert or update a row with a department value that does not already exist as a department identifier within a Department table.

Apprentice database designers can become overly obsessive when defining referential integrity rules.  A novice designer may agree to abstract reference tables into a single universal code table containing codes for all purposes, yet also feel compelled to explicitly define foreign keys for all of those codes used across all the other tables. If a Foreign Key is defined, it requires a definition that provides for a full evaluation of allowed values.  Superficially, a Foreign Key referencing an abstract code table, only will test that a specific code is one of all the generically defined codes.  When an abstracted "all codes" table is in place, mistaken values can creep in for any specific use, as a value might be for a code of the wrong type. For example, an ethnicity code would be allowed to contain a gender code value without a problem, which is not actually what is intended.  In this circumstance, the application must shoulder the testing burden for value validation.  And if the application must do the test, the referential integrity rule is invalid and likely should not be defined inside the database.  If the database rule cannot stand entirely on its own, it is not an appropriate rule to enforce.  Certainly, one option to overcome the situation might be creating two columns for each code, one indicating the code type and the second containing the code value.  The Foreign Key could be defined against both columns instead of just the one.  However, using one column to determine the meaning of another column generally signals a weak design, which truly is the failing of a generic code table.

Ultimately, a Foreign Key is a restriction on allowable data values; and in practice, Foreign Key constraints are like check constraints.  A defined check constraint can state that a column must contain only a "Y" or an "N" value.  Or a table could be defined with separate rows for Y and N, and a Foreign Key testing that relationship requires the same adherence to a Y or N value set.  On a grander scale, a reference table containing the whole numbers 0 through infinity can be used in place of defining a column as an integer data type. 

Data types and check constraints function by restricting values for a column, just like referential integrity, as in Foreign Keys, operates similarly.  Some level of design confusion does exist because database management systems have not truly embraced a complete support for domains.  If the DBMS had more mature domain functionality, we would not have such compelling need for so many reference tables in normalized database designs.  Robust domains would enable dynamic user-defined data types to have display options and functions operating as seamlessly as those surrounding a Date data type.  As an industry, we have not truly defined when such reference tables should or should not be necessary.  Look up tables, and Foreign Keys into them, are a legitimate way to have the DBMS support validation of content; but there is a fuzzy line between legitimate object-to-object referential integrity and "simple" domain value support.  This fuzzy issue, between referential integrity and domain behavior, would benefit by further advancement from us to provide clear designs and application-friendly databases.