Normalization in Data Modeling Does Not Always Mean Separation

When normalizing data structures, attributes congregate around the business keys that identify the grain at which those attributes derive their values. Attributes directly related to a person, deriving their value from that individual, will appear on an Employee entity, or a Customer entity, and so forth. However, in the process of normalizing, the data modeler must identify the objects that are meaningful and useful to the organization.

The goal of data modeling is not to define an objective reality. If an objective reality were the goal, generic industry data models would be a ubiquitous solution every organization would leverage. Every business in the same line would have identical data models that would function perfectly. Instead, it is the sometimes-messy subjective view of the business that is the reality that matters for establishing a data model. Linguists understand a single word may have many meanings; and when using words to communicate, what a word means to one person may or may not align with what that same word means to someone else.

A data model represents a collective agreement within the business over what objects are important and how those objects interrelate within the context of the business—what they mean. As well, every organization has unique “quirks” that help make them who they are, establishing an organizational personality. Important ideas in focus along with the way the organization views connections between those objects are the “special sauce” driving the business. Generic industry data models do have a place, but they serve as a kick-start to the modeling process, not the destination.

Consider an address; organizations may break address components apart in varying fashions based on how they use them. A likely breakdown could be Line 1, Line 2, Community Name, Region Name, and Postal Code. In this fashion, easy queries can be executed to group addresses by zip code, or state, or city. Perhaps the Line 1 could be parsed apart into a house number, directional, street name, street type, but is that breaking apart helpful and necessary? Or is one setting up a coding monster to cause future pain? Similarly, where should these address elements go? Objectively, addresses do stand on their own. One could model an Address entity and every other entity needing an address can have a foreign key into that abstracted Address object. But is such a concept useful for a given organization? Do objects having addresses (customers, vendors, etc.) have more than one single address used by the organization? Does the business need to run extra processing to clean up addresses? Is the organization into setting up a geographical information system? Maybe all the organization needs is the address components listed once as attributes on each of these address-needy objects. A simple implementation, for a simple and straight-forward need. At times, splitting out an address as a separate object only creates more coding and logic and maintenance, but not so much value.

If going for a simplified approach, keep in mind the subtleties between an attribute and a domain. On an abstracted Address entity, Community Name and Address Line 1 are fine attribute names. But if one has, for example, a Customer entity and a Store entity each needing a set of address attributes, the attribute names should be more qualified—Customer Address Community Name, Store Address Line 1, and so on. Once simplified in this entity-by-entity way, Community Name is now a domain of which Customer Address Community Name and Store Address Community Name are domain uses. And each of these attributes will have slightly different definitions including details about each one’s specific use. Simple can be very useful for an organization. Knowing when to simplify, and when to abstract is the skill and the art for which the data modeler is responsible.