How Many Data Models Does It Take?

Bookmark and Share

Back in the 1970s, the ANSI SPARC three-tiered model arose, foreshadowing a smooth intertwining of data and architectural design.  The three tiers concept isolated the physical storage needs of data structures independent of business' perception of these structures.  The three levels were comprised of schemas labeled external, conceptual, and internal, with each level describing the data in focus from varying perspectives.

The external "model" represented the perspective of an external user of the data.  Each variety of user would have their own concerns regarding the data and its content, resulting in a multitude of external schemas within each organization.  The conceptual level applied the semantic "magic" to integrate all external schemas into a single data model for everyone.  Finally, the internal schema incorporated the conceptual details in an actual implementation, presumably within some database management system (DBMS). 

Like the external schemas, many instances could exist that allowed for physical implementation of varying pieces at separate times and possibly utilizing separate technologies.  Because this model was well ahead of its time, it was many years before any DBMS could support the requirements to explicitly maintain such functional separation. 

The three-tier approach was incubated at a time when people believed that a single enterprise database serving all of an organization's needs was the best solution.  Entity-Relationship Diagrams (ERD) became a standard, at least for the conceptual model. 

A decade later, the arrival of the Zachman Framework helped articulate an enterprise approach to data intending to account for more perspectives than those initially identified in the ANSI SPARC method.  Planners, owners, designers, builders, and sub-contractors all had views with varying requirements and priorities driving their focus and direction.  Ultimately, a limit of three schemas or perspectives may not sufficiently cover all the bases in a data-focused ballgame.  Despite this fact, educators continued to promote the three-tiered approach, with the external schema renamed "conceptual," the original conceptual renamed "logical," and the internal one dubbed "physical."

Confusion increased as some proponents claimed the ERDs should be used for the new conceptual model, while also agreeing such models need not be attributed, and many-to-many relationships could remain as-is without being configured into associative entities.  Others argued for a conceptual model to remain completely non-formalized, presuming that any notation could be used as long as business users related well to it.  Everyone agreed that the physical schema had such unique elements only the definitional code (DDL) was truly descriptive of it; regardless of such agreement, tools started generating "physical ERDs." 

Although the original intention of a separate physical layer was to allow for differences in how database models and engines work, developers pushed the flexibility door so far open that any changes could be made at a physical level as long as someone justified it "for performance."  The real relationship between logical and physical models degenerated into the idea of sameness, even if it required a very convoluted perspective.  As analytics and business intelligence evolved, multidimensional approaches arose.  Star schemas by their nature might have been pigeon-holed as only a physical implementation, until tools added further ambiguity by supporting the creation of logical multidimensional ERDs.

What data models should be created by using which specific techniques?  Should some hard-and-fast ties of models transitioning between conceptual, logical, and physical provide for sure identification of a valid inter-schema model transformation?  How are valid transformations between these models controlled?  What level of precision or integration should apply to each model?  How many models are enough?  For any organization the choices are many; it is only the decisions that are hard.  While ideas and approaches have been presented, our industry has yet to agree on a gold standard.