Support Your Local Reference Data

Reference data is everywhere. References serve as the mortar holding together the walls of enterprise. Codes, indicators, types, and flags squirm all over each application in every organization. Solutions depend on known values existing at known points in the lifecycle of many business objects.

Reporting can only be interpreted via the control breaks and groupings of well-understood values appearing in expected columns. In fact, businesses live and die based on knowing the clear distinctions between “in process,” “shipped,” “received,” “billed,” and “paid,” or a similar progression of status.

If something that should be identified as a “closet” gets mislabeled as a “warehouse,” then bad decisions might be made. Terms such as “active,” “inactive,” and “canceled” may seem mundane and inconsequential, and when folks hear the term,“valid values,” their eyes glaze and expectations of interest diminish. But exciting or not, reference values and an understanding of them are important to every organization.

The meaning behind values may be scattered all over the place. The listing of valid values and their intended business meaning may only exist as a mention inside a text definition in some data dictionary or an appendix of an application specification.

Some of these lists may have become important enough to be defined as unique rows inside a database table defining the answer domain for the spotlighted code or type. Or, perhaps a generic valid values table is defined. The downside of such a generic structure is that first, a new code now must exist that allows users of the generic table to identify the specific list within the many contained by the table in which they are interested. Second, the list within a conglomerated table can no longer serve as the source for a DBMS-defined referential integrity rule.

When they have a defined place to exist, these codes and meanings need to be kept up-to-date. Some lists may simply be defined once with no intention for them ever to be altered. These petrified values likely were populated via a hand-written insert script, and nothing may ever have been established to allow for changes. Some organizations may have developed patchwork applications based on tools such as MS Access or the like. Other organizations may have completely avoided worrying about a user interface and instead simply have a means of loading spreadsheets or comma-delimited files from a shared location. Larger organizations may have an ERP tool in place that manages reference data as a matter of course. Still other organizations may even have acquired a specific tool for reference data management. In addition to having a means of maintaining these value lists, any organization may not have clear identification of the individuals responsible for maintaining these lists. Business may want IT to handle it, while IT may want business to manage it.

As an organization matures in its data governance function, responsibilities, approaches, and even tools used will likely change. Regardless of the means employed, reference data needs to be supported across every organization. If business shirks responsibility for reference data, IT needs to step up to the plate and manage this area for the good of the business. A time will come when responsibilities will change. Certainly, the effort should be to maintain this data at a minimal cost; but at the same time, the chosen method needs to be stable and sustainable.

Not all costs are explicit, and not all costs hit the balance sheet today. Expending resources that provide for maintaining reference data, in a fashion that may seem costlier than an organization may want to spend today, can very well save money and prevent potential future errors and problems that may result from not thinking ahead.