Database Elaborations

Database Elaborations explores the human side of building and managing databases. The discussion revolves around data modeling as a semantic process, and how normalization only functions within a business rule context. Communication is hard; the meaning of everything is more fluid than we might wish to admit. Large gaps between logical versus physical or theory versus implementation are always worth reflection.

Language is Always Hard

Words such as "taxonomy" and "ontology" are often thrown around by data architects as if these terms were interchangeable. Generically, the ideas of taxonomy and ontology are similar, but not synonymous.

Posted January 08, 2026

Differences Between Operational and Analytical Data Structures

It is an unfortunate result of being human that we are biased by our first impressions. For example, we are often blinded when we encounter a subject area's data for the first time. The initial data structures we analyze and learn to understand set our internal paradigms. Once we understand that initial structure, we tend to believe that the way that structure is configured is the best, or the only way, that a specific data set of that kind of data should be presented.

Posted December 11, 2025

Remember What the Dormouse Said

Fourth and fifth normal forms send us down a pretty, yellow brick road leaving behind functional dependencies to gather up join dependencies and their special child, multi-valued dependencies. Explanations for these extra dependencies can be conflicting, which makes them perfectly at home in Wonderland with Alice. We'll try to explain them a little here, but be prepared, because others will claim this is wrong.

Posted November 13, 2025

Intuitively Obvious to the Most Casual Observer

When we draw lines within an entity-relationship diagram (ERD) that represent a "relationship," the diagram only shows the exterior presentation. A relates to B in a one-to-many fashion. But what is the actual nature of that A-to-B relationship? Is the relationship one of composition where A is the whole and each B is a part of A? Can any B exist without an A?

Posted October 09, 2025

The Mind of a Data Modeler

Years ago, before the rise of the gig economy, people developed their careers over time, often more slowly than desired. Folks who became DBAs, Data Modelers, or Data Architects had a rule-following gene they were either born with or grew. And that aspect of their mentality was greatly leveraged into how procedures and change progressed in their areas of influence. Such perspectives created a serious consistency, and constancy of purpose that fit well into the demands of the job. The rules were guardrails on how tasks within the framework were done.

Posted September 11, 2025

Hauling Dimensional Junk

Junk dimensions are often misunderstood and avoided. And they should not be. Junk dimensions offer a strategy to remain true to dimensional intentions and to better focus one's design and sometimes provide new insights into your data. The junk dimension is a collection of data items that may not relate to each other at all, although all relate to the fact at hand.
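
As a rough illustration of the idea, one common way to build a junk dimension is to enumerate every combination of a few unrelated, low-cardinality attributes and give each combination a single key. The sketch below assumes three hypothetical attributes (payment type, gift-wrap flag, order channel); none of these names come from the column itself.

```python
from itertools import product

# Hypothetical, unrelated low-cardinality attributes that all describe an order fact.
payment_types = ["cash", "card"]
gift_wrap_flags = ["Y", "N"]
order_channels = ["web", "store", "phone"]

# One junk dimension row per combination, each with its own key.
junk_dimension = [
    {"junk_key": key, "payment_type": p, "gift_wrap": g, "order_channel": c}
    for key, (p, g, c) in enumerate(
        product(payment_types, gift_wrap_flags, order_channels), start=1
    )
]

# While loading facts, the combination of values is swapped for the single junk key.
lookup = {
    (row["payment_type"], row["gift_wrap"], row["order_channel"]): row["junk_key"]
    for row in junk_dimension
}
```

The fact table then carries one junk key instead of three separate flag columns or three tiny dimensions.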

Posted August 14, 2025

The Meaning of Done

In the analytics world, when building data marts or other data areas meant to support end users, the goal is to present the "needed data" allowing users to perform their queries and analyses. This means that if an "XYZ Value" is required, then the "XYZ Value" is presented.

Posted July 10, 2025

What to Do When You Have Nothing

Inside any database management system (DBMS), one can designate a specific data item as "null." The null represents the "existence" of a non-value, the nonexistence of a value, or…nothing. This sounds a bit like an oxymoron, a nonvalue value, but there it is. Each DBMS has its own implementation of null support, so what it does to be able to share with you that "there is no value" can differ. For example, rather than a value, there may be a group of bit flags associated with an individual data element, with one of those bits being an "I am null" flag. And because every DBMS has its own way of doing this, it is best not to think that by using a null, one is greatly saving on space usage. Space may be saved, or maybe not so much.
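
However a DBMS stores a null internally, its behavior in queries is easier to observe. The following is a minimal sketch using SQLite through Python's standard sqlite3 module; the customer table and its columns are hypothetical, and other DBMSs may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, middle_name TEXT)")
conn.executemany(
    "INSERT INTO customer (id, middle_name) VALUES (?, ?)",
    [(1, "Ann"), (2, None), (3, None)],  # Python None is stored as SQL NULL
)

# Comparing null to null yields null (unknown), not true.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# IS NULL is the test that actually finds the missing values.
missing = conn.execute(
    "SELECT COUNT(*) FROM customer WHERE middle_name IS NULL"
).fetchone()[0]

# An equality test against NULL matches no rows at all.
matched_by_equals = conn.execute(
    "SELECT COUNT(*) FROM customer WHERE middle_name = NULL"
).fetchone()[0]

# COUNT(*) counts rows; COUNT(column) skips rows where the column is null.
count_rows, count_values = conn.execute(
    "SELECT COUNT(*), COUNT(middle_name) FROM customer"
).fetchone()
```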

Posted June 12, 2025

The Supernatural Many-to-Many Relationship

In the realm of data modeling, many-to-many relationships are often considered an "odd duck." Unlike one-to-one or one-to-many relationships, which can be directly implemented in physical database schemas, many-to-many relationships must remain in the abstract, the conceptual, the supernatural, and never the physical. This insubstantial nature makes them troublesome for greener data modelers.
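
To make the point concrete, here is a minimal sketch of how a conceptual many-to-many relationship is usually resolved physically: an associative table turns it into two one-to-many relationships. The student, class, and enrollment names are hypothetical, and SQLite (via Python's sqlite3 module) stands in for whatever DBMS is in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE class (class_id INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- The associative table: one row per (student, class) pairing.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student (student_id),
        class_id   INTEGER NOT NULL REFERENCES class (class_id),
        PRIMARY KEY (student_id, class_id)
    );

    INSERT INTO student VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO class VALUES (10, 'Algebra'), (20, 'Biology');
    INSERT INTO enrollment VALUES (1, 10), (1, 20), (2, 10);
    """
)

# Walking both one-to-many relationships recovers the many-to-many pairing.
rows = conn.execute(
    """
    SELECT s.name, c.title
    FROM enrollment AS e
    JOIN student AS s ON s.student_id = e.student_id
    JOIN class AS c ON c.class_id = e.class_id
    ORDER BY s.name, c.title
    """
).fetchall()
```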

Posted May 08, 2025

Putting Fun in Functional Dependency

Everyone knows and loves the first three normal forms. We go through the process of normalization to remove redundancies in our data structures. But the redundancies we remove have nothing to do with trying to save space. Instead, the desire is to prevent maintenance anomalies. The normalization process involves stepping through the evaluation of normal forms that decompose a data structure into multiple structures based on the needs of these normal forms.
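
As a small illustration, a functional dependency X -> Y says that rows agreeing on X must also agree on Y. The sketch below checks whether sample rows contradict a proposed dependency; the function name and the order data are hypothetical. Sample data can only refute a dependency, never prove it, since the dependency itself is a business rule.

```python
def holds(rows, determinant, dependent):
    """Return False if any two rows agree on `determinant` but differ on `dependent`."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        value = tuple(row[col] for col in dependent)
        # setdefault records the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical un-normalized order rows that repeat the customer name.
orders = [
    {"order_id": 1, "customer_id": "C1", "customer_name": "Acme"},
    {"order_id": 2, "customer_id": "C1", "customer_name": "Acme"},
    {"order_id": 3, "customer_id": "C2", "customer_name": "Bolt"},
]

name_depends_on_customer = holds(orders, ["customer_id"], ["customer_name"])
order_depends_on_name = holds(orders, ["customer_name"], ["order_id"])
```

Here the customer name depends on the customer ID rather than on the order, which is the kind of redundancy that invites update anomalies when left in the order structure.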

Posted April 10, 2025

Sticks, Stones, Broken Bones, and Harmless Names

Even when we try to discuss a nameless thing, we apply a pseudo-name to allow us to articulate our thoughts. The "pink place," "the big, big road," "the evil-looking tree" …. To discuss a thing, it must have an identity. A name supplies an identity for a thing. Once named, we may now speak of it. The bedrock of any good set of standards starts with names. Regardless of the role we may play in managing technological solutions, everything needs a name because everything will be discussed.

Posted March 13, 2025

The Many Faces of Denormalization

Normalization clusters data items together based on functional dependencies within the data items. This normalized arrangement expresses the semantics of the business items being presented. Denormalization means that, for some reason, grouping the data strictly by functional dependencies has been ignored. Why might one cast aside reasonable designs for something less rational? There are many reasons for denormalizing.

Posted February 13, 2025

The Beast Known as a Surrogate Key

A surrogate is "one appointed to act in place of another." A surrogate key is appointed to act in place of a natural key. Why? Well, one could argue that surrogate keys have existed since the beginning of the computer, as the computer uses internal addresses for finding everything, and in the early days that usage had to be explicitly managed by those programming. It wasn't until relational theory came along that the concept of using data values as identifiers arose and was functionally implemented.

Posted January 09, 2025

Really, Views Don’t “Exist”

The beauty and joy of a relational database is the concept of relational closure—everything is a table. Beyond the eponymous table, query results are also "tables." Any query serves as a table to be queried by any other query, which is why queries may be nested almost infinitely within queries.
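
A minimal sketch of closure in practice, using SQLite through Python's sqlite3 module with a hypothetical order_line table: the result of an inner query is queried as if it were a table, and a view simply gives that inner query a name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE order_line (order_id INTEGER, amount REAL);
    INSERT INTO order_line VALUES (1, 10.0), (1, 5.0), (2, 7.5);

    -- A view is a stored query definition, not a stored copy of the rows.
    CREATE VIEW order_total AS
        SELECT order_id, SUM(amount) AS total
        FROM order_line
        GROUP BY order_id;
    """
)

# The inner query's result is itself a "table" for the outer query.
big_orders_nested = conn.execute(
    """
    SELECT order_id
    FROM (SELECT order_id, SUM(amount) AS total
          FROM order_line
          GROUP BY order_id) AS t
    WHERE total > 10
    ORDER BY order_id
    """
).fetchall()

# The same question, asked of the named view.
big_orders_view = conn.execute(
    "SELECT order_id FROM order_total WHERE total > 10 ORDER BY order_id"
).fetchall()
```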

Posted December 12, 2024

Avoiding Dangerous Data

Far too often information engineers and others take a dismissive stance about managing data structures. This indifferent attitude is largely because whatever tool the engineers use provides fast performance in resolving queries. They believe that quick performance means all is well, so nothing else matters, right?

Posted November 14, 2024

A Bridge Too Far

In multidimensional data modeling practice, there is the concept of a bridge dimension. A bridge dimension, being the dimensionalized equivalent of a normalized data design's associative entity, allows for the resolution of a many-to-many relationship. By its very nature, the bridge is saying, "Yes, let's normalize our dimensional designs." The bridge dimension exists to resolve a many-to-many relationship between two other dimensions.

Posted October 10, 2024

The Lines Between

The lines between entities within an Entity-Relationship Diagram (ERD) represent an interdependency between the involved entities. In a normalized design, this interdependency is both semantic and functional, as in, "a HOUSE has one-or-more DOORs," "an ORDER has one-or-more ORDER LINEs," or "a STUDENT enrolls in zero-to-many CLASSes." These object pairings would have a line drawn between them in an ERD representation. There also would be markings from whatever notation one is employing, designating the "one" side of the relationship and the "many" side of the relationship.

Posted September 12, 2024

Fuzzy Wuzzy Was a Bear, Or Was He/She a Diagram?

Everyone likes a diagram. Diagrams often help make an issue clear—except when they don't. There are plenty of diagrams that really do not convey information to the viewer, and that is a sad state for any diagram to find itself in. Maybe, just maybe, not everyone has the proper temperament to create a useful diagram? Before composing a diagram, the diagram-builder needs to be clear about what viewers of the created diagram should gain from viewing it.

Posted August 08, 2024

Remake/Remodel

Eventually the day comes when an existing data mart or warehouse needs to be re-examined. Maybe new workloads are desired, new business models arise, a new platform is considered, or new supporting tools in the environment are wanted. Especially when a new platform is part of the equation, many in charge think first of "lift and shift" as the approach. Taking as much existing data and processes as possible and keeping them virtually unchanged as things are moved onto new foundations is almost always the cost-saving approach—for the immediate short term.

Posted July 11, 2024

Needed? Unnecessary? Yes!

Data modeling has always had a push-me/pull-you relationship in the IT world. The idea of complete logical and physical data models, insightful definitions, coherent data flow diagrams, and other related informational caches is always desirable. However, far too often, the expectation is that these activities take virtually no time to create. When the expenditure of resources goes beyond a certain level, the activities are jettisoned or otherwise given short shrift. Actually getting code in place is viewed as the only evidence of true progress.

Posted June 13, 2024

I am, Therefore I am

A first and important step in being successful at almost any job or task is understanding oneself. This is true for engineers, modelers, or even architects. Do you know how to perform your tasks? Are you faster or slower than others doing the same tasks? Do you have more or fewer errors in your output than others? It seems that a rising number of people are so uncomfortable with competing that they may even avoid asking these kinds of questions of themselves. But this is not about competition; it is about understanding oneself, being responsible, and holding oneself accountable.

Posted May 09, 2024

Relational vs. Relational

There is an old quote attributed to various origins that says, "When I hear the word 'culture,' I reach for my gun." The saying expresses frustration at a term, in this case "culture," being used, abused, and politicized beyond all forbearance. Even when it was first popularized, it was a line meant to draw laughter from the audience of a play, rather than suggesting anyone carry an actual firearm. And no one is suggesting anyone do so now. "Relational" is a term that, over the last few decades, has undergone its own level of use, abuse, and politicization.

Posted April 11, 2024

Learning Is Forever

As disheartening as it may sound to new graduates, even when you leave school never to return, you still must continue to learn. There are always new tasks to comprehend, and new tools to drive. And when one serves as a data modeler, one is always going to be exposed to new data. Fortunately, if one has a good foundation, becoming acquainted with new data can be easily managed. Every new conceptual or logical data model is an opportunity to develop a deeper understanding about another subject area within an organization.

Posted March 14, 2024

The Case of the Missing Fact

In far too many organizations, processes become automatic. Requests are made, and requests are fulfilled. People wish to please; they like to show how fast they can respond or how agile they are. The same is true when addressing database design. This delivery-focused desire comes from a good place. However, regardless of how well-intended, blindly fulfilling requests is dangerous and harmful to one's overall results. Optimization of processes, especially processes involving database design, should never remove the step where questions are asked and rationale is provided.

Posted February 08, 2024

Don’t Let Your Tools Lull You to Sleep!

When I was in high school, a little German lady would show up as a substitute teacher for almost any subject. Apparently, she had a very broad skill set. As I mentioned, this was high school, so eventually we students would become disruptive, unruly, loud (generally annoying), and our substitute teacher would give us "the speech," spending what I recall as far too much time explaining to us that "knowledge is power" and how we would be better served by changing our attitudes. At the time I did not believe her exhortations had any impact on 15-or-so-year-old me, but over the years it turns out they did.

Posted January 11, 2024

Dimensional Degeneration Do’s and Don’ts

When one hears the term "degenerate dimension," thoughts of teenagers, leather jackets, motorcycles, and petty crime may come to mind. After all, the word "degenerate" is associated with something that is "below normal" or "corrupt." Science tries to rehabilitate degenerate's meaning by using it for a manner of simplification.

Posted December 14, 2023

The Rationality of Surrogate Keys

The original intention of surrogate keys across multidimensional database designs was to help optimize joins by keeping all keys, used for joining between facts and dimensions, both numeric and single-columned. Often such surrogates were generated as simple sequential numbers, 1, 2, 3…. Early versions of many relational database products had sub-optimal performance when joining data together via character strings—especially large quantities of data.

Posted November 09, 2023

Rise Above Your Fear of Normalization

Many people seem to become filled with anxiety over the word "normalization." Mentioning the word causes folks to slowly back away toward the exits. Why? What might have caused this data modeling phobia? Do people have images flashing through their minds of data modelers running around wearing a hockey mask and carrying a chainsaw screaming, "Give me your primary keys"? I hope not.

Posted October 12, 2023

If You Have the Opportunity, Please Be Clear

Data models allow for the expression of a great deal of clarity and precision, when the data modeler chooses to allow for it. Many designers seem to work in "sloppy" or "imprecise" mode. Entities are defined containing many null-allowed attributes. Certainly, if in the existing situation the source data is so dirty that every defined attribute "should" apply, but randomly things are not passed on, then yes, the data model is accurate. However, if the condition is such that the object has many very similar sub-objects, and various combinations of attributes must be populated based on which sub-type is being instantiated, then the data model is not expressing those rules very well.

Posted September 14, 2023

‘You’ve Got All the Weapons You Need. Now Fight!’

When Apple first released the iPhone, the company did something that was a bit wild, a bit innovative, especially for a new piece of technology. The iPhone did not come with a 500-page instruction manual explaining how everything worked. There was very little documentation for users. Apple worked hard designing the interface to allow users to apply their own intuitions and navigate around. While some grumbled, it was successful enough, and people adapted.

Posted August 10, 2023

The Art of Logical Data Models

Occasionally one may hear that a data model is "over-normalized," but just what does that mean? Normalization is intended to analyze the functional dependencies across a set of data. The goal is to understand which data elements relate to what other data elements. The context of a normalization exercise is the semantically constructed reality within a chosen organization.

Posted July 13, 2023

Programming and Attitudes

In dealing with databases, there are times when one must do some level of programming. With the rise of the variety of ETL, data pipeline, and data platform tools, programming elements have become mushy, confused, and scattered. This confusion has arisen because what happens, and where that "what" happens, has devolved into everything, everywhere, all at once.

Posted June 08, 2023

Grok the Data

In 1961, a clever science fiction author named Robert Heinlein coined a new term in one of his novels. The term was "grok." Grok is a wonderful word that means "to empathize or communicate sympathetically; also, to experience enjoyment," and is a term similar to the later sixties phrase "dig it." Grok meanings also include "to drink," "to love," and "to be one with." Grok can apply when one experiences those moments of insight as a new realization coalesces in one's mind, or as that joyful experience continues to provide a thrill. Perhaps grok can be a mental state one achieves as one attains the peacefulness advertised in another 1960s term—"be here now." Grokking can be vital in working through the creation of a new logical data model. If one does not understand the data, then one cannot model the data. Alternately, if one groks the data, the data modeling efforts will flow freely.

Posted May 11, 2023

Documentation Sucks, but You Can’t Google Everything

Sooner or later, in business, even in IT, almost everyone must write something more than simple emails. Procedures, standards, business cases, proposals, plans: All kinds of things may need to be authored by everyone and contain descriptive narrative about the subject at hand. Much documentation can be found on the internet, but your organization's internal rules, goals, and desires cannot be found via a Google search. Even database designers and data architects must contribute to corporate internal literature.

Posted April 13, 2023

Confused About Database Dates? Many Are

Across database structures, dates are ubiquitous: sales date; order date; shipment date; receipt date; created date; last updated date. You might think that with such an abundant presence, dates would be well understood. Sadly, dates hide in the shadows—and many are confused over how dates work. People run a query, get a result, and seem to expect that what one sees, is what one gets. If they see a two-digit year, they believe that is exactly how the data is stored.
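
The gap between what is stored and what is displayed can be sketched outside any particular DBMS. In the Python analogy below (the variable names are hypothetical), one date value is rendered two different ways; the two-digit year exists only in the display string, not in the value itself.

```python
from datetime import date

# One stored value: a full calendar date with a four-digit year.
order_date = date(2023, 3, 9)

# Two presentations of that same value, chosen at display time.
short_display = order_date.strftime("%m/%d/%y")  # two-digit year in the output only
iso_display = order_date.isoformat()             # ISO 8601 rendering

# The underlying value never lost its century.
stored_year = order_date.year
```

A DBMS behaves analogously: session or client settings typically decide the display format, while the stored date is an internal value independent of that format.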

Posted March 09, 2023

Avoid the “Can’t-Get-There-From-Here” Syndrome

Designing a database is not a magic trick—the design process does not involve trap doors, mirrors, or unseen wires holding things up. Sometimes database designers and data architects fail to think ahead as their designs are composed. Rather than checking on each item, the designer believes that whoever provided them the requirement "knew what they were doing."

Posted February 09, 2023

What Traits Make a Good Data Architect?

A good data architect is a gift to their organization because they achieve results that advance an organization's data maturity. What kinds of characteristics lead to a data architect being considered good to great? One of the first traits needed is attention-to-detail. Many tasks that fall under the data architecture umbrella may have very repetitive and tedious aspects.

Posted January 12, 2023

To Be of Value, Data Requires Meaning, Obviously

Often data is categorized into very high-level groupings of structured or unstructured. Generally, structured data is considered data that conforms to an easily identifiable pattern and, as part of this conforming, may be easily loaded into a relational database table "as is." Examples of this might be fixed-format files, or comma-separated files having an agreed-upon pattern to each record within them. Unstructured data supposedly cannot be loaded "as is" into a relational table. Unstructured data is, by name, lacking an identifiable structure to make sense of the data, right? Not exactly.

Posted December 08, 2022

What Becomes a Logical Data Model Most?

In the simplest terms, a logical data model is a visual representation of the business rules and requirements covering the universe-of-discourse for a given solution or enterprise, along with some textual support. Keeping this in mind, the logical data model is a metaphor describing the piece of the organization under analysis.

Posted November 10, 2022

Databases, Continuous Deployment, Integration, and Cheesiness

In a Continuous Deployment/Continuous Integration (CD/CI) world, most commentary dances around references to database changes as being a bottleneck, hard, awkward, or even painful. The reasoning supporting this perspective of suffering seems to arise from a desire for all changes to be fairly isolated matters. After all, a new function within an API, driving a new behavior on a screen, can just slip in, start executing, and life continues.

Posted October 06, 2022

Only the Strong Survive

In building a logical data model, some entities are considered strong, other entities are considered weak. Strong entities are the most foundational elements within a nascent Entity-Relationship Diagram (ERD) and comprise the list of objects that would likely come to mind first. Strong entities are independent in that they exist all on their own and can be created without having to meet any pre-conditions.

Posted September 08, 2022

Entities I Have Known and Loved

In science fiction movies, the word "entity" pops up when discussing an as-yet-unseen but at least suspected alien presence, presumed evil (or sometimes evil presence, presumed alien). This entity is not named until half the cast has vanished, then we might hear, "It's Cthulhu's daughter and she's angry!" But not all entities are evil or angry.

Posted August 11, 2022

Modeling Data is Important

Data modeling has always been a task that seems positioned in the middle of a white-water rapids with a paddle but no canoe. On one side of the data modeling rapids are the raging agilists who are demanding working software and decrying virtually all documentation. To this agilists' group, data modeling is often seen as too simple to matter. But at the same time, their implementations will miss standardization in naming or data model patterns. And results may be so far off course that major rework is unavoidable. Sadly, far too many agile practices have been set up to place things under the technical debt umbrella, when in reality those practices never allow the re-factoring closet door to be opened. Poor data models are "overcome" by creating ever more complex logic around the data in order to get to a more proper result, as developers learn what really needs to be accomplished along the way, maybe. The results may work but can be a nightmare to maintain.

Posted July 07, 2022

Modeling Data Is a Many-Splendored Thing

Data modeling is the process of defining datapoints and structures at a detailed or abstract level to communicate information about the data shape, content, and relationships to target audiences. Data models can be focused on a very specific universe of discourse or an entire enterprise's informational concerns. The final product of a data modeling exercise ranges from a list of critical subject areas, to an entity-relationship diagram (ERD) with or without details about attributes, or even to a data definition language (DDL) script containing all the SQL commands to build a set of physical structures within some chosen database management system (DBMS).

Posted June 02, 2022

Let’s Not Get Physical

The value of normalization is in understanding the data well enough to create the normalized design. Pulling out the business rules, business terms, and relationships from the mass of jumbled-together raw content is critical. The business rules that result from performing the normalization exercise establish the requirements that need to be satisfied by solutions, whether they are built or purchased. When an organization creates and maintains a normalized design for the data within the important areas of its business, it reduces work on all future systems.

Posted May 04, 2022

Arriving at Data Model Quality

Data architects live in a world caged by bars of process, standards, and documented procedures—things many would consider a high ceremony lifestyle. As an industry, information technology has been migrating more and more into agile frameworks for some time now. High ceremony is often seen as an earmark of "waterfall" approaches, which constitute the evil empire that agile frameworks are fighting to replace. The result of this opposition is that formal data architecture groups often do not fold easily into agile approaches.

Posted April 07, 2022

Unwinding Relational Relationships

Often one reads a book or hears a presenter making a pun about relational theory being called "relational" because of entities being "related." Such references are nothing but misplaced puns. Relational theory derives the relational in the name from the idea that a "relation" is a mathematical term synonymous with a "set" and each entity represents a set of some sort. However, relationships between entities are still a very important concept, albeit not an eponymous one.

Posted March 11, 2022

It May Amuse, But Sometimes Views Confuse

Folks relate to physical tables; even the most non-relational-minded person can picture a fixed structure file and equate that to a table and its columns. The spreadsheet image is ubiquitous. DBMS-defined views are logically similar to tables and in usage are certainly interchangeable with tables.

Posted February 08, 2022

Being Agile While Data Comes In and Goes Out

Dealing with data warehouses, data marts, and even data lakes, can be awkward in an agile environment. While adding a single metric onto a dashboard can be very natural, no one builds a dimension table a few columns at a time. This awkwardness has caused many variants in how an agile methodology might be applied to one's analytics databases.

Posted January 03, 2022

True Surrogate Keys Silently Live in the Shadows

Using surrogate keys within a database is often considered a technique to improve performance. The assumption is that using anything other than a numeric data value to join tables provides "bad" performance. Therefore, whatever the natural key may be—one column, multiple columns, alphanumeric, etc.—the surrogate key can be a 100% numeric single value, standing in for that natural key value set. Some DBMSs have key generators that are numeric, others may be more wide-ranging in values. Some organizations may choose to use surrogate keys generated from hashed natural key values. Will surrogate keys improve everyone's query performance? As with the stock market, specific circumstances differ everywhere, so the individual results may vary.
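
As one illustration of the hashed variant mentioned above, the sketch below derives a surrogate key from a multi-column natural key. The function name, the choice of SHA-256, and the delimiter are all assumptions made for this example, not a prescribed standard.

```python
import hashlib


def hashed_surrogate_key(*natural_key_parts: str) -> str:
    """Derive a fixed-width surrogate from one or more natural key columns."""
    # A delimiter unlikely to appear in the data keeps ("ab", "c") distinct from ("a", "bc").
    joined = "\x1f".join(natural_key_parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical two-column natural key: country code plus national customer number.
key_a = hashed_surrogate_key("US", "000123")
key_b = hashed_surrogate_key("US", "000123")
key_c = hashed_surrogate_key("US0", "00123")
```

The same natural key always yields the same surrogate, so independent load processes can compute keys without a shared sequence generator. The trade-off is a wider, non-numeric key, which is one reason performance results may vary.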

Posted December 08, 2021
