Inside any database management system (DBMS), one can designate a specific data item as “null.” The null represents the “existence” of a non-value, the nonexistence of a value, or…nothing.
This sounds a bit like an oxymoron, a nonvalue value, but there it is. Each DBMS has its own implementation of null support, so what it does to be able to share with you that “there is no value” can differ. For example, rather than a value, there may be a group of bit flags associated with an individual data element, with one of those bits being an “I am null” flag. And because every DBMS has its own way of doing this, it is best not to think that by using a null, one is greatly saving on space usage. Space may be saved, or maybe not so much.
But the null does not explain why an attribute has no value, and that is part of the confusion. In general, it is said that if an attribute is null, it could mean that the value for some reason does not apply in this circumstance. Or, it might be that our data collection efforts have been unsuccessful.
There may be a value out there that should be inside that database column, but we currently do not know what that value is. In dealing with any logical operation testing a possibly null attribute, our logic fails. Is this attribute equal to, greater than, or less than any specific value?
When a data item is null, the answer is not true or false. The answer is, “I do not know.” Normal truth tables evaluate combinations of true and false, so we do not even have a guide to answer the three-value logic needed to evaluate true/false/unknown.
Some DBMSs further splash in this illogical puddle by saying that variable-length character string columns that “just happen” to be of length zero are not at all the same thing as a null. Apparently, those DBMSs believe there is yet another, special, kind of nothing. Ultimately, when nullable columns are used as filters within a query, data may return in an unexpected way, such as, all null rows are either thrown out or included, depending on the DBMS.
One can design database schemas to minimize or even eliminate nulls by splitting off the possibly null columns into distinct tables. Functionally, these tables would be similar to a supertype/subtype arrangement. Rows would only exist in these “subtype” tables when data exists for a column or column grouping. In this fashion, the database is not storing a null. However, users may still create queries from these tables using outer joins wherein the results may contain nulls.
There are many null-related operators and functions that a DBMS or data platform may have. Proficient data users will learn to employ the ones that they need for their data usage.
In the early days of data warehousing, many data modelers obsessed over null prevention. The usual strategy was to define null surrogate values to use instead of the null. A name might contain “Unknown,” a code might contain “?,” a start date might be “1900/01/01,” a stop date “9999/12/31,” and so on. In this fashion all inconstant null logic is completely circumvented.
There are still some in the database industry who are philosophically opposed to the use of nulls. The reasons for being against them are indeed obvious—inconsistent query logic, unknown meanings about why something is null.
But nulls are here, and I doubt they will be deprecated anytime soon, or ever. So, when nulls are used, be cautious, try to avoid them when possible, and make sure data users know how to handle them properly for their data needs.