The Simple and Ubiquitous Reference Table

Bookmark and Share

It is rare to find an implemented database devoid of reference tables.  Reference tables provide valid values for drop-down lists and in a slightly obtuse way, also allow for expression of a domain or user-defined data type within the database design.  Nominally a reference list is but a list of values to be referenced.  In practice, the implemented structure of a reference table is driven by small nuances regarding how each list is used. The signifiers may consist of a code or number that creates a short-hand way of expressing a state or value.  Yet in addition to such simple codes, a "medium length" name column or even a column to contain a lengthy text description should exist in support of these codes.  The names may be used in drop-down lists so that people need not memorize many code values, and the descriptions can assist help-functions by providing users with more understanding of a value's meaning or intended use.  For operational applications it may be worthwhile to have an attribute expressing the order in which items should be displayed, unless alphabetical display is a preference.  If codes are maintained via online screens, an indicator may be helpful in flagging values that are part of the system and not to be removed or deactivated via those online screens. 

In general, the short designation (or code-value) serves well as the primary key of the reference table.  However, when time is an important element - such as when code "ABC" was valid a week ago, is not valid today, but "ABC" will be valid again tomorrow - then primary key options other than "ABC" must exist.  Either the code plus the start date serves as the primary key, or a generated surrogate key is used.  The rules for generating a new row key value would incorporate looking up the offered code value and the valid start and stop dates.  Beyond the start and stop dates, it may be a helpful short-hand for application logic to define an active flag that could be used as part of the reference table structure and highlighting those items available "right now."

Just as with virtually every other table structure, reference tables should have columns designating who added those rows and when they were added, as well as columns designating by whom and on what date a row was last updated.  To track changes over time, a shadow table logging such changes is never a bad idea.  Most DBMSs allow for triggers to help maintain such shadow tables so that applications remain unaware of the log table's existence.

Some databases are established with a catch-all decode table, a structure meant to hold all reference tables.  Every valid value list is collapsed into a single master-list of all values for all codes used throughout the application universe-of-discourse.  Generally, an additional meta-code will need to designate into which list each row participates.  As a clever option, the list of these codes designating the value-groupings may also be one of the lists stored within the master reference table.  While such structures demonstrate that the designer understands abstraction (or, at least how to generalize and abstract things such as valid value lists), it negates any possible usage of DBMS-supported referential integrity.  Therefore, enforcement and validation of adherence to these valid values is left entirely in the hands of application code.  While application code can be written to validate such things, the same validation can be implemented more organically by simply having the individual reference tables and associated referential integrity constraints defined inside the database design.  The cost for this more verbose approach comes in accepting the "clutter" of many small reference tables.