The Art of Logical Data Models

Occasionally one may hear that a data model is “over-normalized,” but just what does that mean? Normalization analyzes the functional dependencies across a set of data; the goal is to understand which data elements relate to which other data elements. The context of a normalization exercise is the semantically constructed reality within a chosen organization. Industry-based data models are available because, across an industry, there certainly is a degree of semantic alignment, but ultimately each organization has its own unique ways of speaking and thinking. That organizational uniqueness is why industry-based data models serve only as a jumpstart and are not implemented as-is within any specific business. In normalizing the data, efforts are made to describe data structures that reflect how the business thinks about the business. “Over-normalized,” when referring to a data model, implies that in some way, shape, or form the data model has moved beyond the semantic framing of the business.

One place where over-normalization may be spotted is in excessive decomposition of a given data area. Such decomposition may be natural in an object-role modeling approach, or in a variant strategy such as anchor modeling, but for normalization it is often out of place. For example, if one simply wants to store an address line for use in shipping, then breaking an address down into explicit house number, street name, street type, pre-street directional, post-street directional, and any other additional elements is excessive. The goal is alignment with the business, and while it may seem quite logical to break things down to their most atomic point, few businesses think in that fashion. Such a detailed address breakdown may be useful in building a focused solution to parse and reformat addresses correctly, but that situation does not arise very often.
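The contrast can be sketched in code. The following Python sketch (the class and field names are illustrative, not drawn from any particular model) shows a shipping-oriented address next to an over-decomposed one; the decomposed form only earns its keep when the business genuinely needs to parse and reassemble addresses.

```python
from dataclasses import dataclass

# How a shipping-focused business typically thinks of an address.
@dataclass
class ShippingAddress:
    address_line: str   # e.g. "221B Baker Street"
    city: str
    postal_code: str

# An over-decomposed view: atomic parts few businesses reason about.
# Useful only in a focused address-parsing solution.
@dataclass
class ParsedStreetAddress:
    house_number: str     # "221B"
    pre_directional: str  # "N" in "100 N Main St"; empty if absent
    street_name: str      # "Baker"
    street_type: str      # "Street"
    post_directional: str # "NW" in "Main St NW"; empty if absent

def as_address_line(parsed: ParsedStreetAddress) -> str:
    """Reassemble the atomic parts into the line the business uses."""
    parts = [parsed.house_number, parsed.pre_directional,
             parsed.street_name, parsed.street_type,
             parsed.post_directional]
    return " ".join(p for p in parts if p)

parsed = ParsedStreetAddress("221B", "", "Baker", "Street", "")
print(as_address_line(parsed))  # 221B Baker Street
```

Note that the decomposed form forces every consumer through a reassembly step just to recover the value the business actually talks about, which is the practical cost of modeling past the semantic framing.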

Another place to spot over-normalization is in unwarranted data abstractions. Abstraction is a very powerful data modeling tool. Generically, abstraction is when one perceives an object as it relates to other similar objects. The defined relationship is an “is-a” relationship, in the sense that a house is a building, a skyscraper is a building, and a parking ramp is a building. As these objects are defined within the data model, objects outside the group may connect at varying levels with subtle distinctions; some things relate directly to a house, while others relate directly to a building and therefore cover all kinds of buildings. However, within the logical data model the goal is to express the business rules and logic. Therefore, abstractions need not be avoided, but they should be applied surgically. When an area of the organization is fluid, constantly changing or adding ideas, that may be an area where abstraction is useful. If an area is static, unchanging, and well known, there is more reason to be explicit and concrete rather than abstract. Sometimes a data model may suffer from unconscious abstraction: the biases of a data modeler may have resulted in defining only a high-level object, such as a Party or a Person, when the business needs explicit Employee and Customer objects.
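The Party example can be made concrete. This Python sketch (again with illustrative names, not from any specific model) contrasts a single abstract object, where the distinction lives in a type-code column, with explicit objects whose required attributes encode the business rules directly.

```python
from dataclasses import dataclass

# Unconscious abstraction: one high-level object flattens two concepts,
# pushing the real semantics into a type-code attribute.
@dataclass
class Party:
    party_id: int
    name: str
    party_type: str  # "EMPLOYEE" or "CUSTOMER" -- enforced only by convention

# Explicit objects: each carries the attributes the business states it needs.
@dataclass
class Employee:
    employee_id: int
    name: str
    hire_date: str       # an employee must have a hire date

@dataclass
class Customer:
    customer_id: int
    name: str
    credit_limit: float  # a customer must have a credit limit

# With explicit types, the rule is structural: an Employee cannot be created
# without a hire date, whereas a generic Party would silently allow it.
alice = Employee(employee_id=1, name="Alice", hire_date="2021-04-01")
acme = Customer(customer_id=7, name="Acme Ltd", credit_limit=50_000.0)
```

The trade-off runs both ways: the abstract Party absorbs new party types without schema change, which is valuable in a fluid area, while the explicit types keep the rules visible and enforceable in a stable, well-known one.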

If you suspect a data model may be over-normalized, look for excessive decompositions, unwarranted abstractions, or unconscious abstractions. The artistry in logical data models is understanding the business deeply enough to know when and where to apply abstraction, keeping objects within the language and meanings the organization actually uses. Over-normalization can result in a loss of semantic detail and integrity. Data modeling is a semantic process and, as such, will always remain more art than science. Consider your logical data model as if it were a poem expressing the thoughts of the business.