Blurred Lines and Sloppy Data Models

Many neophyte data modelers have trouble distinguishing between logical and physical data models.  These novices likely cannot explain why each model exists, or the differences expressed between them. Sadly, such confusion also exists in the realm of the experienced data modeler.  Not to say an experienced data modeler can’t express the difference between a logical and a physical data model; but across a group of experienced data modelers one would not get a consistent answer.  Data modeling tools only add to the confusion, as limited programming skills applied to tool development either have provided skewed, or occasionally no transformations, between what is logical and what is physical. 

Initially, the logical model was intended to be the item to which the users were exposed.  Queries were to be executed against those logical constructs.  Entity Relationship Diagrams arose to help express that logical perspective, as well as to expose interconnections trailed across the structures from applying the normalization rules that were intended as part and parcel of the relational approach.  The physical model had no diagram because it was physical; the data definition language itself comprised that physical layer.  Then data modeling tools arose to help create database designs, which started presenting a “physical model” which was accepted as given.  But these physical diagrams really were not very physical, as storage groups, partitions, indexes, and many other things were not visually incorporated. Often there were places inside the screens and pop-ups within the modeling tools for storing values related to these physicalities, but it was all just hidden parameters.  Diagram-wise there was a logical model and a second logical model that was called physical.

As the modeling tools evolved, some tools started allowing variations between logical and physical diagrams.  Two logical entities could become one physical table, or vice versa.  Attributes and entities could be flagged as “logical only,” and tables and columns could be flagged as “physical only.”  And super types and sub-types from the logical layer could be converted into physical objects with some flexibility.  Again data modelers bought into a newer fiction that the logical model expressed the business rules while the physical was what was established “for performance.” 

If such an idea were entirely true, maybe things could have settled into that newer explanation.  But unfortunately, the tools were not always maintaining the proper variations between the layers.  When building a logical model today, far too often the designer will start adding in attributes under the guise of “denormalizing for performance.”  Further, those denormalizations likely may be of primary key values for other entities.  The expressions of business rules becomes murky and associative derivatives or worse are expressed.  But then again, the actual relationship line in the ERD may or may not be defined.  And even when it is defined, thanks to the flexibility of the tools, there are times where the relationship is defined as a zero-to-many, when it actually should be a one-to-many.  The designer may simply make the attribute mandatory while not properly noting the relationship itself.  The end result is a sloppy logical data model that does not really express the business rules and logical functional dependencies across the data in scope.  Ultimately, one could suggest that instead of a logical and physical model, one has two variations of a physical data model.

It is not unusual for designers to follow some of these confusing practices, or claim they have distinct logical and physical data models while really having two logical or two physical models.  As a profession, data modelers and data architects should try to reach consensus, and bring tools into alignment with that consensus.  What should the roles of the logical and physical data models be?  We have not yet made ourselves clear.