The Importance of Referential Constraints for Data Integrity

Referential integrity (RI)is a method for ensuring the “correctness” of data within a DBMS. People tend to oversimplify RI, stating that it is merely the identification of relationships between relational tables. It is actually much more than this. RI embodies the integrity and usability of a relationship by establishing rules that govern that relationship.

By defining referential constraints, you can set up the database to control the semantic accuracy of the data it contains. To define a referential constraint, you must create a primary key in the parent table and a foreign key in the dependent table. You must also define what actions are allowed when data is added or modified. This is accomplished with a set of rules applied to each relationship. The combination of the primary and foreign key columns and the rules that dictate the data that can be housed in those key columns can be defined to ensure correct and useful relational databases.

Referential Constraint Rules

Three types of rules can be attached to each referential constraint: an INSERT rule, an UPDATE rule, and a DELETE rule.

The INSERT rule indicates what will happen if you attempt to insert a value into a foreign key column without a corresponding primary key value in the parent table. In general, it is never permissible to insert a row into a dependent table with a foreign key value that does not correspond to a primary key value, unless the foreign key is null.

The UPDATE rule controls data modification such that a foreign key value cannot be updated to a value that does not correspond to a primary key value in the parent table. There are, however, two ways to view the update rule: from the perspective of the foreign key and from that of the primary key. From the perspective of the foreign key, once you have assigned a foreign key to a row, either at insertion or afterward, you must decide whether that value can be changed. This is determined by looking at the business definition of the relationship and the tables it connects. If you permit a foreign key value to be updated, the new value must either be equal to a primary key value currently in the parent table or be null. From the perspective of the primary key, it is good database design to forbid modifying primary keys. If a primary key value is updated, three options exist for handling foreign key values: (1) Restricted UPDATE. The modification of the primary key column(s) is not allowed if foreign key values exist; (2) Neutralizing UPDATE. All foreign key values equal to the primary key value(s) being modified are set to null. (3) Cascading UPDATE. All foreign key columns with a value equal to the primary key value(s) being modified are modified as well.

Finally, RI DELETE rules define what happens when an attempt is made to delete a row from the parent table. Similar to the primary key perspective of the update rule, three options exist when deleting a row from a parent table: Restricted DELETE, Neutralizing DELETE, and Cascading DELETE.

Referential Integrity Guidance

Although it is true that a database setup without RI will likely perform better and be easier to administer, that does not mean you should avoid RI! If you do not define the referential constraints into the database DDL, then application developers will have to code similar logic into all of their application programs. And, in most cases, the application logic will be less efficient than the built-in DBMS logic. Furthermore, simply failing to code the referential constraints in the DDL does not mean that the relationships cease to exist. It just means that the database system will not enforce them, so your data will wind up with integrity problems.

In terms of administering backup and recovery jobs, the same advice applies. If you do not code the referential constraints, then your DBMS will permit you to do improper things such as backing up related tables on different schedules. That means data integrity issues can arise if you have to recover using the backups without applying log records.

So, it is a good physical database design practice to implement referential integrity using database constraints. Ignoring the relationships does not eliminate the business requirement.

Proper database RI ensures data integrity for both planned application programs and for queries in ad hoc SQL statements and BI tools.