Effective Database Design Avoids the Semantically Ambiguous Use of NULL-Surrogates

Oct 9, 2013

By Todd Schraml

When databases are designed, they serve as semantic expressions of the organizations that employ them. Those expressions describe the data items of importance and articulate the business rules as the organization views these rules. Ideally, each business also has standards which help frame the databases which are designed and implemented. Some organizations have a rule that says something like, “NULLs should not be allowed in a properly designed table.” However, things become strange when, rather than designing structures to avoid the need for a NULL, NULLs are avoided simply by using a specific value that serves as a surrogate for NULL. As an example, a date column may be populated with “1900-01-01”. Anytime anyone sees “1900-01-01” those individuals need to know that “1900-01-01” means “NULL.” The strangeness that is ignored by everyone in such an implementation is that, semantically, having a “value” that means “I-have-no-value” is still NOT having a value. What was nothing still is nothing.

The Different Between NULLs and NULL-Surrogates

The only alteration between allowing NULLs and using NULL-surrogates is in the operations coded/performed, and the implementation foreknowledge that must exist within the user’s command. A user must explicitly know to recognize these values and treat them appropriately instead of dealing with more generic “IS NULL” operators. Or, alternately, you could say that by using a meaning-augmented NULL-surrogate a user need not understand “IS NULL”-related functions. It is much like the difference between saying, “I have nothing” versus saying, “I do not have anything.” Maybe one could shade perspectives, consider one approach as “positive” and the other as “negative.” More preferable might be a description of “direct-NULL” when using a NULL and an “indirect-NULL” when using a “NULL value” surrogate. Ultimately, whether a direct-NULL or an indirect-NULL approach is used, the idea is the same. There is no additional semantic meaning to be found due to the approach used within the database.

How to Avoid the Semantically Wishy-Washy Use of NULL-Surrogates

How does one avoid the semantically wishy-washy use of NULL-surrogates; and instead of avoiding NULLs, actually design structures wherein NULLs are not necessary? It makes sense to consider attributes that might otherwise be optional as belonging to specific extensions of the base object, or possibly as sub-types. These extended or sub-type objects would need to be semantically defined, so that should a row exist, the values to those objects would be mandatory. Optionality is effectively upgraded to apply at the entity/table level; an instance exists within the table or not, and every column has a meaningful value. Instead of the column “Telephone Number” being NULL, or having a “999.999.9999” value, a “Customer Telephone Number” table exists and if one did not have the Customer’s Telephone Number, then there is no entry made. Or perhaps for another circumstance, separate tables exist for “Actual Meter Reading” versus “Estimated Meter Reading” or “Default Meter Reading.” Approaching data in this fashion actually does expose a much more detailed perspective. And while this deeper-meaning tactic might be considered overly tedious, it is correct and accurate. Precision is good, but there may be times where such precision is unnecessary or simply too costly.

Designing Data Structures to Meet the Existing Needs and Circumstances

Sadly, each of these approaches does require user knowledge and understanding. The user must know the exact NULL-surrogate used to extract meaning appropriate to what the user intends to gain from analyzing the data. Or, the user needs to understand NULL-related functions and how to use them to navigate to a proper answer. Or, the users need to understand outer join operations or that various tables and columns representing subtle variations in meaning must be coalesced properly to obtain a fuzzier desired result. Even more sad are the debates that may erupt as one implementation perspective tries to claim dominance over others in a database design war. The primary lesson here is that knowledge is important. Whatever approach is used, it should be chosen because the method fits the existing needs and circumstances, and not that it is “the only way.”