Only the Strong Survive

In building a logical data model, some entities are considered strong, other entities are considered weak. Strong entities are the most foundational elements within a nascent Entity-Relationship Diagram (ERD) and comprise the list of objects that would likely come to mind first. Strong entities are independent in that they exist all on their own and can be created without having to meet any pre-conditions. Being independent, these entities often have obvious single-attribute primary keys such as Customer Number, Employee Id, Product SKU, or License Number. Weak entities are dependent on external things. A classic example is an Order Line. An Order Line is dependent on the Order existing first. This is often represented in a logical data model by the primary key of the Order Line being a composite key; the Order Number from the Order entity, and a simple sequential number within that Order Number, as in lines 1, 2, 3…. This pattern is often repeated with weak entities—part of the key is the primary key of the related strong entity, and the second piece is either a sequential number, a date, or a datetime. The composite key explicitly points to the related strong object that the weak entity depends upon, so that anyone viewing the ERD can see these obvious relationships.

An associative entity or association arises for many-to-many relationships. Associations are special kinds of weak entities. They are special in that, as opposed to a dependency on a single entity, they represent complex interrelationships between two, or more, entities. Similarly, the natural key of an associative entity is a composite of the primary key of the other entities that are members of the association. Under circumstances where simply using the strong entity's primary keys are not enough—meaning there are times where more than one row or instance will exist—then the primary key must be expanded. Like the simple weak entity case, a sequence number, or datetime may be fine. Other times when there are multiple associations between two objects, each pairing might be said to account for a different and specific role, then a role object may be added into the data model and into the association to add clarity in choosing a single instance for usage.

With the rise of object-oriented approaches, it has become quite fashionable to have every entity within an application use a single-column generated-surrogate key as its primary key. Superficially, this makes every entity a strong entity, and can be created independently. The weakness of dependencies still exists—nothing has really changed the semantics of the business itself. BUT those dependencies are no longer explicit within the data model. (Of course, it is possible that some organization has a logical model as originally described, and then a physical data model that has all the generated surrogate keys, but such a division is not easily accomplished via data modeling tools and therefore no one usually goes to such lengths.) Ultimately, these business keys become obscured. Foreign keys may exist. Maybe they are defined as mandatory. One can assume the existence of foreign keys represents a dependency, but largely even these are defined as optional. The application may always populate values so they are virtually mandatory, but you would have to search the application logic to arrive at such a determination.

Natural keys can have a downside too.  Many early data warehouses had normalized structures with start dates added onto each primary key. Trickling down composite PKs onto weak entities physically means that even small changes generate large changes across many impacted tables and rows. One needs to choose one’s poison, do you lose ERD semantics or complicate the build logic? Data architects must consider where they find balance.