Don’t Do the Too-Many-Foreign-Keys Thing!

Occasionally one sees an entity-relationship diagram (ERD) having an unusual characteristic. The unusual element is an entity (or more than one) that seems to have a relationship with every other entity in the entire diagram. These focal point entities turn the ERD into a hornet’s nest of lines crossing everywhere. Superficially, these ERDs look fine, the excessive relationships appear to be something reasonable. But looking more closely, it seems somehow a bit weird. There are generally two reasons causing all these relationship lines to appear. The first reason is the fault of our DBMS tools. Most DBMS’ have spotty support for the concept of domains.

At their heart, domains are solution-based data types, having not only specific value sets, but also potentially specific transformations. The best example of good support for a domain is to consider the date data type. Inside most DBMS’ one can display a date or even date parts in many differing formats, driven by a great variety of internal functions. Applying similar flexibility to other concepts, one should be able to define a “department” domain, with limited valid values, the ability to display an ID versus a text department name, or a budget code, or even the department manager’s name. Databases support date logic, but not-so-much the more arbitrary needs. Some might say that dates were different because they don’t ever change. Well, dates have changed at points in the past. If we had had computers, the adjustments that transitioned us from the Julian Calendar into the Gregorian calendar might never have been allowed.

The design entities driving the many thousands of relationships might be “User” or “Version” or “Company.” At some level these references, which are the heart of the web-like array of relationships, really are not entities, but are domains. Considering the reference as an entity allows for limiting values to the valid value list and allows for the varying expressions of values (ID, name, budget code…) to be implemented as columns that may be queried. This domain-as-entity approach is often the easiest choice to acquiring a reasonable approximation of true domain support. However, the cost is paid in many places. Starting with a muddied and possibly unreadable ERD. When domains are modeled out into the ERD as an entity, one is going beyond modeling the business and instead beginning to model the implementation, becoming more of a meta-description and coloring outside of the expected solution box.

Is the proposed implementation going to rely on error codes returned by the DBMS to recognize an invalid value being used? If not, then maybe the domain references are better off removed from the ERD. Reference entities may be gathered off to a side. Treating these as domains may help clean things up.

One will not need all the FKs spider webbing the data model. The reference table may still exist, but instead of being used for all the FK references, a function or other processing could be defined that verifies a value’s existence. Retaining an infinitude of Foreign Keys may cause future performance bottlenecks as transactional volume will get a performance hit by maintaining all those foreign keys and the indexes behind them. Each insert involves potentially many more steps. Each delete could become much worse. And keep in mind, any DBMS will have a physical limit on the number of foreign key references defined on a table; if a design requires all entities to have at least one relationship with a single reference table, then as a design grows, some day, a limit will be reached. What happens then? Why set a trap to fail at some point in the future? When you arrive at a design wherein an entity needs to have a relationship to almost every other entity, stop and think about what is happening, and review your reasoning before proceeding.