The Importance of Canonical Data Models

An enterprise conceptual data model is often seen as a high mountain to be climbed, a journey that will last a lifetime. People envision 10 feet or more of corporate office wall papered with an entity relationship diagram (ERD) containing zillions of teeny, tiny boxes and more relationship lines than the queues at every Disney resort at full capacity. Seen that way, an enterprise conceptual data model is a daunting task, not to be taken lightly. But in today’s world, that enterprise conceptual data model can simply be a list of subject areas.

Great care should be taken to ensure that the subject area list is composed of the right subject areas and uses business terms that have value and impact within the organization. The list should be short: 20, 30, or 40 areas are likely acceptable, but 100, 200, or 300 are probably excessive. A presentation of the list can be enhanced with a graphic of one’s choosing whose components are labeled with subject areas from the list. The graphic could be Greek columns holding up a roof, a Ferris wheel, a map of a fictional territory, or a variation of a pie chart. Whatever the form, the critical subject areas should be shown prominently and the supporting ones in a supporting role.

Underneath the subject area list that makes up the enterprise conceptual data model comes the heart of the matter. This next layer should consist of a set of canonical data models. These canonical data models express the primary and vital business rules of the corporation. Each organization may determine how many canonical data models it needs.

Every canonical data model should cover a functional area or need of the organization and should document the subject areas it involves. The canonical data model is most often a normalized logical ERD, but it is not necessarily fully attributed.

Primary keys, when listed, should be natural keys. The only non-key attributes included should be those that are critical to the organization. Individual entities may appear in more than one canonical data model, when appropriate.
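As a minimal sketch of these guidelines, the structure of a canonical data model could be captured in a few lines of Python. Every name here (Entity, CanonicalModel, the Product and Part entities, their keys, and the functional and subject areas) is a hypothetical illustration, not something prescribed by any particular modeling tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Entity:
    """One entity in a canonical data model (hypothetical structure)."""
    name: str
    natural_key: Tuple[str, ...]               # business-meaningful key, not a surrogate ID
    critical_attributes: Tuple[str, ...] = ()  # only attributes critical to the organization


@dataclass
class CanonicalModel:
    """A canonical data model covering one functional area."""
    functional_area: str
    subject_areas: List[str]                   # subject areas this model involves
    entities: Dict[str, Entity] = field(default_factory=dict)

    def add(self, entity: Entity) -> None:
        self.entities[entity.name] = entity


# Assumed example entities; a real model would use the organization's own terms.
product = Entity("Product", natural_key=("product_code",),
                 critical_attributes=("product_name",))
part = Entity("Part", natural_key=("part_number",))

manufacturing = CanonicalModel("Manufacturing", ["Product", "Supply Chain"])
manufacturing.add(product)
manufacturing.add(part)

# The same entity may appear in more than one canonical data model.
sales = CanonicalModel("Sales", ["Product", "Customer"])
sales.add(product)

shared = set(manufacturing.entities) & set(sales.entities)
print(sorted(shared))  # entities common to both models
```

Keeping the entity definition in one place and reusing it across models is one way to make sure a duplicated entity carries the same natural key everywhere it appears.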

The relationships defined should likewise be mission critical. Everything within these canonical data models needs to be of vital importance because of how the models will be used. Once defined, they serve as a standard against which to evaluate implemented solutions, whether created in-house or purchased off the shelf.

Evaluating a solution against the canonical data model, whether that solution is being designed or considered for purchase, is a qualitative exercise. There is no expectation that the implemented data structures be one-to-one representations of the items within the canonical data models. Instead, the intent is to ensure the varying data models properly support the critical business rules of the organization. For example, if the canonical data model defines a many-to-many relationship between a part and a product, but the solution being evaluated defines that same relationship as one-to-many, then that solution relationship is an inconsistency. Such inconsistencies can become serious problems down the road. Understanding them before purchasing an off-the-shelf solution can be of great benefit to any organization.
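The part and product example can be sketched as a simple comparison of relationship cardinalities. This is only an aid to the qualitative review described above, not a replacement for it: it assumes each model has already been reduced by hand to a mapping of entity pairs to cardinalities, and the names Cardinality, find_inconsistencies, and vendor_solution are hypothetical.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Cardinality(Enum):
    ONE_TO_ONE = "1:1"
    ONE_TO_MANY = "1:N"
    MANY_TO_MANY = "M:N"


Pair = Tuple[str, str]
Relationships = Dict[Pair, Cardinality]
Finding = Tuple[Pair, Cardinality, Optional[Cardinality]]


def find_inconsistencies(canonical: Relationships,
                         solution: Relationships) -> List[Finding]:
    """List canonical relationships the solution lacks or defines differently."""
    findings: List[Finding] = []
    for pair, expected in canonical.items():
        actual = solution.get(pair)  # None if the solution has no such relationship
        if actual != expected:
            findings.append((pair, expected, actual))
    return findings


# Assumed inputs mirroring the part/product example in the text.
canonical = {("Part", "Product"): Cardinality.MANY_TO_MANY}
vendor_solution = {("Part", "Product"): Cardinality.ONE_TO_MANY}

for pair, expected, actual in find_inconsistencies(canonical, vendor_solution):
    found = actual.value if actual else "missing"
    print(f"{pair[0]}-{pair[1]}: canonical {expected.value}, solution {found}")
```

Each finding is a prompt for a person to judge whether the difference actually violates a critical business rule; the sketch deliberately makes no attempt to decide that itself.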

Alternatively, if a solution is being built internally, an existing canonical data model can be a valuable jump-start for building a physical database design that covers one or more areas of the canonical models. Most importantly, these canonical data models can be established one by one over time. There is no requirement to address the whole set in a big-bang fashion. Attack the important priorities first, and work through the rest of the organizational elements as is practical.
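As a rough illustration of the jump-start mentioned above, a canonical entity could be turned into a first-draft table definition. This sketch assumes the entity is described only by a name, a natural key, and a few critical attributes; the function name draft_create_table, the Part entity, and the TEXT placeholder column type are all hypothetical choices that a physical designer would revisit.

```python
from typing import Sequence


def draft_create_table(entity: str,
                       natural_key: Sequence[str],
                       attributes: Sequence[str] = ()) -> str:
    """Return a first-draft CREATE TABLE statement for one canonical entity.

    Every column is typed TEXT as a placeholder; real data types, indexes,
    and any surrogate keys are physical design decisions left to the designer.
    """
    columns = [f"    {name} TEXT NOT NULL" for name in natural_key]
    columns += [f"    {name} TEXT" for name in attributes]
    columns.append(f"    PRIMARY KEY ({', '.join(natural_key)})")
    return f"CREATE TABLE {entity.lower()} (\n" + ",\n".join(columns) + "\n);"


# Assumed example entity; not taken from any real canonical model.
ddl = draft_create_table("Part", ["part_number"], ["part_name"])
print(ddl)
```

Because the draft can be produced one entity at a time, it fits the incremental approach: each canonical data model can seed its own physical design as soon as it is established.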