Missing Information and the Relational Null

Mar 12, 2015

By Craig S. Mullins

One of the trickiest aspects of relational database management can be dealing with missing information. The standard method of representing missing information is to set the “value” to null.

A null represents missing or unknown information at the column level. The column must be defined to be able to accept a null; a column defined as “not null” cannot be set to null. If a column “value” can be null, it can mean one of two things: the attribute is not applicable for certain occurrences of the entity, or the attribute applies to all entity occurrences, but the information may not always be known.

At a high level, it is important to understand that a null is not the same as 0 or blank. Null means no entry has been made for the column and it implies that the value is either unknown or not applicable. A 0 or a blank are actual values stored for the column.

When a DBMS supports nulls you can distinguish between a deliberate entry of 0 (for numerical columns) or a blank (for character columns) and an unknown or inapplicable entry (NULL for any data type). Null indicates that the user did not explicitly make an entry or has explicitly chosen <NULL> for the column in that specific row. For example, a null in the Price column of the ITEM table does not mean that the item is being given away for free; instead it means that the price is not known or has not yet been set. Whereas a Price set to 0 would seem to indicate that the item is available free of charge.

Nulls sometimes are inappropriately referred to as “null values.” Using the term value to describe a null is inaccurate because a null implies the lack of a value. Therefore, it is a good practice to simply use the term null or nulls (without appending the term “value” or “values” to it).

Because a null is not a value it is not greater than, less than, or equal to any other value. And one null does not equal another null. It is invalid to test if a column is = NULL, < NULL, <= NULL, > NULL, or >= NULL. These are all meaningless because null is the absence of a value. So what if you wish to find rows where a particular column is null?

To test for the existence of nulls, use the special predicate IS NULL in the WHERE clause of the SELECT statement. You cannot simply state WHERE column = NULL. You must state WHERE column IS NULL, for example:

SELECT *

FROM ITEM

WHERE PRICE IS NULL;

This will find all rows in the ITEM table where PRICE is null. Similarly, you can use NULL to set a column to null in an UPDATE statement like:

UPDATE DEPT

SET MGRNO = NULL

WHERE MGRNO = '000010';

But what if you want to avoid returning any nulls in your SQL statements? This is possible using the Coalesce function. The Coalesce function returns the value of the first non-null expression. Here is an example:

SELECT *

FROM EMP

WHERE COALESCE(HIREDATE,DATE(’1959-12-31’)) < ’1960-01-01’;

Here we assume that the HireDate column is nullable. This query then, selects all employee rows for which HireDate is either unknown or earlier than Jan 1, 1960.

To this point the discussion focused on numbers and characters, but there are other data types. Date and time data types, as used in the previous example, benefit greatly from the concept of null. This is so because the RDBMS will check to ensure that any Date, Time or Timestamp value is actually a valid date/time value. As such, without nulls, missing information must be stored as a valid date/time.

So, if a date/time column can be unknown, it should be defined to be nullable because null is the only viable option for the recording of missing dates, times, and timestamps.

Summary

Nulls are one of the most misunderstood features of SQL database systems. Although nulls can be confusing, you cannot bury your head in the sand and ignore them. Get to know how nulls work and how to access them.

Newsletters

Missing Information and the Relational Null

White Papers

Sponsors