Let’s Not Get Physical

The value of normalization is in understanding the data well enough to create the normalized design. Pulling out the business rules, business terms, and relationships from the mass of jumbled together raw content is critical. Those moments of uncovering a relational design are when a data architect “learns” about the data in focus. Keep in mind that normalization is the analysis of functional dependencies betwixt and between the elements within the universe of discourse. Since knowledge is power, the understanding that normalization brings is always worthwhile. Having that knowledge retained in an entity-relationship diagram, or within a data modeling tool, is of value to any organization.

Regardless of how one ultimately stores the data for use, an organization is in the dark about many aspects of their data unless a normalization exercise has been completed. Those data-based functional dependencies will drive the rolling out of change impacts across one’s database—even if enforcement is moved from the DBMS into the data pipelines preparing the data.  Similarly, there is value in understanding how business end users relate to their data, by which descriptive items group together and which levels of granularity make sense for potential facts in a dimensional design. The value is there even if physical structures do not follow it. Normalized and multidimensional perspectives enhance one’s understanding of the data, as well as how that data interacts with itself and with users.

Far too many folks just do not understand relational theory. Many seem to believe that relational designs must only ever be physically instantiated in relational databases that are implemented internally as tables of rows and columns. The value of understanding the data at that level of detail, regardless of how it may be physically implemented, is lost to many. But that narrow-minded perspective is flawed. No matter how data is stored, the functional dependencies throughout that data will drive how change to one data item impacts all the other data items. Physically persisting one’s data in a columnar or document store, hierarchical or relational, JSON or comma-delimited, does not alter those inter-dependencies and relationships. In fact, understanding those functional dependencies supports the establishment of subtly more powerful designs for data structures in any implementation.

Relational theory is based on the idea that users of the data can interact with that data as if it were tables of rows and columns. It was always assumed that technology would advance and that the best and fastest ways of doing things would morph and change. How that data is physically stored is truly a moot point. If we had insanely wide flat files but presented the data to the user as logical and normalized tables, that’s fine and that’s still relational. The tables of rows and columns is the user experience, not necessarily the physical implementation. Every Non-SQL vendor should realize that once they establish an SQL-ish interface, they now have defined their tool as relational—whether they intended to or not. And normalization is largely a business function identifying the detailed level understanding of the organization’s data.

The business rules that result from performing the normalization exercise establish the requirements that need to be satisfied by solutions, whether they are either built or purchased. When an organization creates and maintains a normalized design for the data within the important areas of their business, they reduce work on all future systems. Those normalized designs will only change when there are new innovations to the business. Otherwise, those normalized designs are reusable across initiatives. Organizations always seem to need to start from scratch on each project because previous initiatives were short-circuited and swept away rather than completed. Truly normalized designs are more powerful because they are independent of implementation.