More Database Date and Data Type Functionality Please

Sep 11, 2013

By Todd Schraml

Database management systems support numerous unique date and time functions. And while the date-related functions are many, they do not go far enough. One date-driven circumstance often encountered has to do with objects having a type of date range that needs to be associated with it. The date range may describe the active timeframe for sales of a product, or possibly use date ranges to track the timeframes of validity for combinations of row values. While there are some exceptions, this date range need generally ends up implemented via two distinct date columns—one signaling the “start” and the other designating the “end.”

When one of these dates’ values is unknown, or not yet defined, then the column is allowed either to be NULL or, more often, a specific “nodate-

value” date-value is employed, like “1/1/1900” or “12/31/9999.” By having separate start and stop columns, queries become complex. Instead of being direct we must ask, “Which occurrence has a start date equal-to-or-less than that of the timeframe we are concerned with and has a stop date that is greater than that same timeframe, or possibly a NULL stop date?” And in truth all we really wish to ask is, “Which occurrence was active at the specific point-in-time the order was placed?” Or, “What is the effective value as of now?”

Even more painful implementations use only a start date, and the end dates must be calculated using recursive joins and the start date of the next “highest” row. These kinds of questions could be simplified were a date-based “range” data type to exist. What should exist is a “range” or perhaps an “interval” complex data type that consists of a start and a stop date, yet could still be considered a single attribute and could be “matched” directly against an external date to respond positively/negatively saying this “interval” was/wasn’t active at that point-in-time.

Maybe “data type” is an imprecise term, maybe “domain” would be more appropriate, but the terms both have spilled across each other. How we describe things drives how we think about things. Focusing on the implementation, in this case two different properties (start and stop), obfuscates the ideas that are at the heart of the matter. And in this specific area, thinking about a particular start date and a distinct end date has kept us from addressing our questions more simply.

In addition to simplifying queries, range-data-types could ease maintenance tasks as well. Today, with separated start and stop dates, application logic must be written that attempts to prevent the population of start and stop dates with inappropriately overlapping values across some critical attribute.

For example, if employee ID “123”is covered by insurance plan “A” from January 1 through March 30, something needs to prevent a row from being created having employee ID “123” covered by insurance plan “B” from March 25 through December 31. If the range-data-type were engaged properly, it could be established in such a way as to prevent any of these undesired overlaps.

But the most important aspect regarding this functionality is that it would be managed within the DBMS by the definition of the range-data-type

column instance, and not by external application code that must be hand-written case-by-case. Maybe, should the creative juices of DBMS builders’ flow, such things as numeric-range-datatypes could be created in addition to a date-range data-type. A numeric version might assist in postal/step-type function support. An easy example for its use would be in helping support the maintenance of a pricing table that has quantity-based discounts.

(And, of course, there would be a need for integer versions versus floating point or decimal versions.) Who knows where things could end up?