Newsletters

Database Elaborations

Database Elaborations explores the human side of building and managing databases. The discussion revolves around data modeling as a semantic process, and how normalization only functions within a business-rule context. Communication is hard; the meaning of everything is more fluid than we might wish to admit. Large gaps between logical and physical, or between theory and implementation, are always worth reflection.



There is an old quote attributed to various origins that says, "When I hear the word 'culture,' I reach for my gun." The saying expresses frustration at a term, in this case "culture," being used, abused, and politicized beyond all forbearance. Even when it was first popularized, it was a line meant to draw laughter from the audience of a play, rather than suggesting anyone carry an actual firearm. And no one is suggesting anyone do so now. "Relational" is a term that, over the last few decades, has undergone its own level of use, abuse, and politicization.

Posted April 11, 2024

As disheartening as it may sound to new graduates, even when you leave school never to return, you still must continue to learn. There are always new tasks to comprehend and new tools to drive. And when one serves as a data modeler, one is always going to be exposed to new data. Fortunately, if one has a good foundation, becoming acquainted with new data can be easily managed. Every new conceptual or logical data model is an opportunity to develop a deeper understanding of another subject area within an organization.

Posted March 14, 2024

In far too many organizations, processes become automatic. Requests are made, and requests are fulfilled. People wish to please; they like to show how fast they can respond or how agile they are. The same is true when addressing database design. This delivery-focused desire comes from a good place. However, regardless of how well-intended, blindly fulfilling requests is dangerous and harmful to one's overall results. Optimization of processes, especially processes involving database design, should never remove the step where questions are asked and rationale is provided.

Posted February 08, 2024

When I was in high school, a little German lady would show up as a substitute teacher for almost any subject. Apparently, she had a very broad skill set. As I mentioned, this was high school, so eventually we students would become disruptive, unruly, loud—generally annoying—and our substitute teacher would give us "the speech," spending what I recall as far too much time explaining to us that "knowledge is power" and how we would be better served by changing our attitudes. At the time I did not believe her exhortations had any impact on 15-or-so-year-old me, but over the years it turned out they did.

Posted January 11, 2024

When one hears the term "degenerate dimension," thoughts of teenagers, leather jackets, motorcycles, and petty crime may come to mind. After all, the word "degenerate" is associated with something that is "below normal" or "corrupt." Science tries to rehabilitate the word's reputation by using it to denote a manner of simplification.
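
As a minimal sketch of that simplification (hypothetical names throughout): once an order's descriptive attributes have all landed in other dimensions, the order number remains on the fact table with no dimension table of its own.

```sql
-- Hypothetical retail star schema: order_number rides on the fact
-- table as a degenerate dimension. It identifies the business
-- transaction, but no separate order dimension table remains to join to.
CREATE TABLE sales_fact (
    date_key     INTEGER NOT NULL,       -- joins to a date dimension
    product_key  INTEGER NOT NULL,       -- joins to a product dimension
    order_number VARCHAR(20) NOT NULL,   -- degenerate dimension
    quantity     INTEGER NOT NULL,
    sale_amount  NUMERIC(12,2) NOT NULL
);

-- The degenerate dimension still earns its keep, grouping line items
-- back into their original orders:
SELECT order_number, SUM(sale_amount) AS order_total
FROM sales_fact
GROUP BY order_number;
```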

Posted December 14, 2023

The original intention of surrogate keys across multidimensional database designs was to help optimize joins by keeping all keys, used for joining between facts and dimensions, both numeric and single-columned. Often such surrogates were generated as simple sequential numbers, 1, 2, 3…. Early versions of many relational database products had sub-optimal performance when joining data together via character strings—especially large quantities of data.
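
A sketch of that convention, with hypothetical names: every fact-to-dimension join travels over a single, sequentially assigned integer rather than a character string.

```sql
-- Dimension keyed by a simple sequential integer surrogate
-- (hypothetical names; sequence-generation syntax varies by DBMS).
CREATE TABLE product_dim (
    product_key  INTEGER PRIMARY KEY,   -- surrogate: 1, 2, 3, ...
    product_code VARCHAR(30) NOT NULL,  -- natural key from the source system
    product_name VARCHAR(100) NOT NULL
);

CREATE TABLE order_fact (
    product_key  INTEGER NOT NULL,      -- single numeric join column
    order_amount NUMERIC(12,2) NOT NULL
);

-- The fact-to-dimension join is one numeric equality, never a
-- comparison of long character strings:
SELECT d.product_name, SUM(f.order_amount) AS total_amount
FROM order_fact f
JOIN product_dim d ON d.product_key = f.product_key
GROUP BY d.product_name;
```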

Posted November 09, 2023

Many people seem to become filled with anxiety over the word "normalization." Mentioning the word causes folks to slowly back away toward the exits. Why? What might have caused this data modeling phobia? Do people have images flashing through their minds of a data modeler running around wearing a hockey mask and carrying a chainsaw, screaming, "Give me your primary keys"? I hope not.

Posted October 12, 2023

Data models allow for the expression of a great deal of clarity and precision—when the data modeler chooses to allow for it. Many designers seem to work in "sloppy" or "imprecise" mode. Entities are defined containing many nulls-allowed attributes. Certainly, if the source data is so dirty that every defined attribute "should" apply but values randomly fail to arrive, then yes, the data model is accurate. However, if the condition is such that the object has many very similar sub-objects, and various combinations of attributes must be populated based on which sub-type is being instantiated, then the data model is not expressing those rules very well.
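
One way to express such rules with precision, sketched here with hypothetical names: model the sub-objects as subtype tables whose attributes can be declared NOT NULL, instead of one wide entity full of optional columns.

```sql
-- Imprecise: one wide entity, every subtype-specific attribute nullable.
CREATE TABLE account_loose (
    account_id    INTEGER PRIMARY KEY,
    account_name  VARCHAR(100) NOT NULL,
    credit_limit  NUMERIC(12,2),   -- only applies to credit accounts
    interest_rate NUMERIC(5,4)     -- only applies to savings accounts
);

-- Precise: supertype plus subtypes; each business rule becomes NOT NULL.
CREATE TABLE account (
    account_id   INTEGER PRIMARY KEY,
    account_name VARCHAR(100) NOT NULL
);

CREATE TABLE credit_account (
    account_id   INTEGER PRIMARY KEY REFERENCES account (account_id),
    credit_limit NUMERIC(12,2) NOT NULL
);

CREATE TABLE savings_account (
    account_id    INTEGER PRIMARY KEY REFERENCES account (account_id),
    interest_rate NUMERIC(5,4) NOT NULL
);
```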

Posted September 14, 2023

When Apple first released the iPhone, the company did something that was a bit wild, a bit innovative, especially for a new piece of technology. The iPhone did not come with a 500-page instruction manual explaining how everything worked. There was very little documentation for users. Apple worked hard designing the interface to allow users to apply their own intuitions and navigate around. While some grumbled, it was successful enough, and people adapted.

Posted August 10, 2023

Occasionally one may hear that a data model is "over-normalized," but just what does that mean? Normalization is intended to analyze the functional dependencies across a set of data. The goal is to understand which data elements relate to what other data elements. The context of a normalization exercise is the semantically constructed reality within a chosen organization.

Posted July 13, 2023

In dealing with databases, there are times when one must do some level of programming. With the rise of the variety of ETL, data pipeline, and data platform tools, programming elements have become mushy, confused, and scattered. This confusion has arisen because what happens, and where that "what" happens, has devolved into everything, everywhere, all at once.

Posted June 08, 2023

In 1961, a clever science fiction author named Robert Heinlein coined a new term in one of his novels. The term was "grok." Grok is a wonderful word that means "to empathize or communicate sympathetically; also, to experience enjoyment," and is a term similar to the later sixties phrase "dig it." Grok meanings also include "to drink," "to love," and "to be one with." Grok can apply when one experiences those moments of insight as a new realization coalesces in one's mind, or as that joyful experience continues to provide a thrill. Perhaps grok can be a mental state one achieves as one attains the peacefulness advertised in another 1960s term—"be here now." Grokking can be vital in working through the creation of a new logical data model. If one does not understand the data, then one cannot model the data. Alternately, if one groks the data, the data modeling efforts will flow freely.

Posted May 11, 2023

Sooner or later, in business, even in IT, almost everyone must write something more than simple emails. Procedures, standards, business cases, proposals, plans: All kinds of things may need to be authored by everyone and contain descriptive narrative about the subject at hand. Much documentation can be found on the internet, but your organization's internal rules, goals, and desires cannot be found via a Google search. Even database designers and data architects must contribute to corporate internal literature.

Posted April 13, 2023

Across database structures, dates are ubiquitous: sales date; order date; shipment date; receipt date; created date; last updated date. You might think that with such an abundant presence, dates would be well understood. Sadly, dates hide in the shadows—and many are confused over how dates work. People run a query, get a result, and seem to expect that what one sees is what one gets. If they see a two-digit year, they believe that is exactly how the data is stored.
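
A small illustration of the gap between display and storage, assuming the TO_CHAR formatting function found in PostgreSQL and Oracle: the two-digit year exists only in the formatting, never in the stored value.

```sql
-- The column stores a complete date value, not the characters shown.
CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    order_date DATE NOT NULL
);

INSERT INTO orders VALUES (1, DATE '2023-03-09');

-- Two different presentations of the same stored value:
SELECT TO_CHAR(order_date, 'MM/DD/YY')   AS looks_two_digit,
       TO_CHAR(order_date, 'YYYY-MM-DD') AS full_precision
FROM orders;
-- looks_two_digit shows '03/09/23', but the DATE never lost the century.
```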

Posted March 09, 2023

Designing a database is not a magic trick—the design process does not involve trap doors, mirrors, or unseen wires holding things up. Sometimes database designers and data architects fail to think ahead as their designs are composed. Rather than checking on each item, the designer believes that whoever provided them the requirement "knew what they were doing."

Posted February 09, 2023

A good data architect is a gift to their organization because they achieve results that advance an organization's data maturity. What kinds of characteristics lead to a data architect being considered good to great? One of the first traits needed is attention to detail. Many tasks that fall under the data architecture umbrella may have very repetitive and tedious aspects.

Posted January 12, 2023

Often data is categorized into very high-level groupings of structured or unstructured. Generally, structured data is considered data that conforms to an easily identifiable pattern and, as part of this conforming, may be easily loaded into a relational database table "as is." Examples of this might be fixed-format files, or comma-separated files having an agreed-upon pattern to each record within them. Unstructured data supposedly cannot be loaded "as is" into a relational table. Unstructured data is, by name, lacking an identifiable structure to make sense of the data, right? Not exactly.

Posted December 08, 2022

In the simplest terms, a logical data model is a visual representation of the business rules and requirements covering the universe-of-discourse for a given solution or enterprise, along with some textual support. Keeping this in mind, the logical data model is a metaphor describing the piece of the organization under analysis.

Posted November 10, 2022

In a Continuous Integration/Continuous Deployment (CI/CD) world, most commentary dances around references to database changes as being a bottleneck, hard, awkward, or even painful. The reasoning supporting this perspective of suffering seems to arise from a desire for all changes to be fairly isolated matters. After all, a new function within an API, driving a new behavior on a screen, can just slip in, start executing, and life continues.

Posted October 06, 2022

In building a logical data model, some entities are considered strong, other entities are considered weak. Strong entities are the most foundational elements within a nascent Entity-Relationship Diagram (ERD) and comprise the list of objects that would likely come to mind first. Strong entities are independent in that they exist all on their own and can be created without having to meet any pre-conditions.
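
Sketched as DDL with hypothetical names: the strong entity stands alone, while the weak entity borrows part of its identity from its parent and cannot exist without it.

```sql
-- Strong entity: exists on its own, no preconditions.
CREATE TABLE purchase_order (
    order_id   INTEGER PRIMARY KEY,
    order_date DATE NOT NULL
);

-- Weak entity: identified partly by its parent; a line cannot
-- exist unless the order it belongs to already does.
CREATE TABLE order_line (
    order_id    INTEGER NOT NULL REFERENCES purchase_order (order_id),
    line_number INTEGER NOT NULL,
    quantity    INTEGER NOT NULL,
    PRIMARY KEY (order_id, line_number)
);
```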

Posted September 08, 2022

In science fiction movies, the word "entity" pops up when discussing an as-yet-unseen but at least suspected alien presence, presumed evil (or sometimes evil presence, presumed alien). This entity is not named until half the cast has vanished, then we might hear, "It's Cthulhu's daughter and she's angry!" But not all entities are evil or angry.

Posted August 11, 2022

Data modeling has always been a task that seems positioned in the middle of white-water rapids with a paddle but no canoe. On one side of the data modeling rapids are the raging agilists who are demanding working software and decrying virtually all documentation. To this group of agilists, data modeling is often seen as too simple to matter. But at the same time, their implementations will miss standardization in naming or data model patterns. And results may be so far off course that major rework is unavoidable. Sadly, far too many agile practices have been set up to place things under the technical debt umbrella, when in reality those practices never allow the refactoring closet door to be opened. Poor data models are "overcome" by creating ever more complex logic around the data in order to get to a more proper result, as developers learn what really needs to be accomplished along the way, maybe. The results may work but can be a nightmare to maintain.

Posted July 07, 2022

Data modeling is the process of defining datapoints and structures at a detailed or abstract level to communicate information about the data shape, content, and relationships to target audiences. Data models can be focused on a very specific universe of discourse or an entire enterprise's informational concerns. The final product for a data modeling exercise varies from a list of critical subject areas, to an entity-relationship diagram (ERD) with or without details about attributes, to even a data definition language (DDL) script containing all the SQL commands to build a set of physical structures within some chosen database management system (DBMS).

Posted June 02, 2022

The value of normalization is in understanding the data well enough to create the normalized design. Pulling out the business rules, business terms, and relationships from the mass of jumbled-together raw content is critical. The business rules that result from performing the normalization exercise establish the requirements that need to be satisfied by solutions, whether those solutions are built or purchased. When an organization creates and maintains a normalized design for the data within the important areas of its business, it reduces work on all future systems.

Posted May 04, 2022

Data architects live in a world caged by bars of process, standards, and documented procedures—things many would consider a high ceremony lifestyle. As an industry, information technology has been migrating more and more into agile frameworks for some time now. High ceremony is often seen as an earmark of "waterfall" approaches, which constitute the evil empire that agile frameworks are fighting to replace. The result of this opposition is that formal data architecture groups often do not fold easily into agile approaches.

Posted April 07, 2022

Often one reads a book or hears a presenter quip that relational theory is called "relational" because entities are "related." Such references are nothing but misplaced puns. Relational theory derives the "relational" in its name from the idea that a "relation" is a mathematical term synonymous with a "set," and each entity represents a set of some sort. However, relationships between entities are still a very important concept, albeit not an eponymous one.
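
In those mathematical terms, a relation over attribute domains is simply a set of tuples:

```latex
% A relation R over attribute domains D_1 through D_n is a set:
% a subset of the Cartesian product of those domains, i.e., a set
% of n-tuples (the rows).
R \subseteq D_1 \times D_2 \times \cdots \times D_n
```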

Posted March 11, 2022

Folks relate to physical tables; even the most non-relational-minded person can picture a fixed-structure file and equate that to a table and its columns. The spreadsheet image is ubiquitous. DBMS-defined views are logically similar to tables and in usage are certainly interchangeable with tables.
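
A minimal sketch of that interchangeability, with hypothetical names: once defined, the view is queried exactly as a table would be.

```sql
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    full_name   VARCHAR(100)  NOT NULL,
    dept_code   VARCHAR(10)   NOT NULL,
    salary      NUMERIC(12,2) NOT NULL
);

-- A view defined over the table...
CREATE VIEW engineering_staff AS
SELECT employee_id, full_name, salary
FROM employee
WHERE dept_code = 'ENG';

-- ...is used in queries exactly like a table:
SELECT full_name
FROM engineering_staff
WHERE salary > 100000;
```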

Posted February 08, 2022

Dealing with data warehouses, data marts, and even data lakes, can be awkward in an agile environment. While adding a single metric onto a dashboard can be very natural, no one builds a dimension table a few columns at a time. This awkwardness has caused many variants in how an agile methodology might be applied to one's analytics databases.

Posted January 03, 2022

Using surrogate keys within a database is often considered a technique to improve performance. The assumption is that using anything other than a numeric data value to join tables provides "bad" performance. Therefore, whatever the natural key may be—one column, multiple columns, alphanumeric, etc.—the surrogate key can be a 100% numeric single value, standing in for that natural key value set. Some DBMSs have key generators that are numeric, others may be more wide-ranging in values. Some organizations may choose to use surrogate keys generated from hashed natural key values. Will surrogate keys improve everyone's query performance? As with the stock market, specific circumstances differ everywhere, so the individual results may vary.
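
One of those hash-based variations, sketched under the assumption of PostgreSQL 12+ and its md5() function: a multi-column, alphanumeric natural key collapsed into a single surrogate value.

```sql
-- Hypothetical shipment table with a three-column alphanumeric
-- natural key (generated-column and md5() syntax as in PostgreSQL 12+).
CREATE TABLE shipment (
    region_code  VARCHAR(5)  NOT NULL,
    carrier_code VARCHAR(10) NOT NULL,
    waybill_no   VARCHAR(25) NOT NULL,
    -- A hash-style surrogate standing in for the whole natural key set:
    shipment_key TEXT GENERATED ALWAYS AS
        (md5(region_code || '|' || carrier_code || '|' || waybill_no)) STORED,
    PRIMARY KEY (region_code, carrier_code, waybill_no)
);
```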

Posted December 08, 2021

When normalizing data structures, attributes congregate around the business keys that identify the grain at which those attributes derive their values. Attributes directly related to a person, deriving their value from that individual, will appear on an Employee entity, or a Customer entity, and so forth. However, in the process of normalizing, the data modeler must identify the objects that are meaningful and useful to the organization.

Posted November 01, 2021

Marvel should have an evil villain named "Null." Nulls have always been trouble in the relational world. Certainly, nulls are used all over the place by virtually everyone. Still, that does not mean that nulls are harmless.
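
A small demonstration of the villainy, using standard three-valued logic: a predicate that looks exhaustive silently drops the NULL rows.

```sql
CREATE TABLE ticket (
    ticket_id INTEGER PRIMARY KEY,
    status    VARCHAR(10)   -- NULL allowed, and that is the trap
);

INSERT INTO ticket VALUES (1, 'OPEN'), (2, 'CLOSED'), (3, NULL);

-- "Everything that is not closed": NULL <> 'CLOSED' evaluates to
-- UNKNOWN, so ticket 3 quietly vanishes from the result.
SELECT ticket_id FROM ticket WHERE status <> 'CLOSED';   -- returns only 1

-- The NULL row must be asked for explicitly:
SELECT ticket_id FROM ticket
WHERE status <> 'CLOSED' OR status IS NULL;              -- returns 1 and 3
```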

Posted October 05, 2021

It is entertaining to be in a room filled with people all claiming that they love data. What becomes more entertaining is discovering how each one views data uniquely and enjoys doing different things with or to that data. The data community has become one that is far more diverse than many realize.

Posted September 16, 2021

The meaning and interrelationships of and between data are important. If you are the designer of a database and are being lobbied to allow the creation of a table without a primary key, make sure you understand how the table is to be used, and that people will not be writing queries against the structure that will potentially be multiplying data unexpectedly.
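
A short sketch of that multiplication risk, with hypothetical names: without a primary key, nothing stops duplicate rows, and a join against them fans out.

```sql
-- No primary key, so duplicate rows can and do accumulate.
CREATE TABLE customer_contact (
    customer_id INTEGER,
    phone       VARCHAR(20)
);

INSERT INTO customer_contact VALUES (42, '555-1234');
INSERT INTO customer_contact VALUES (42, '555-1234');  -- accidental reload

CREATE TABLE invoice (
    invoice_id  INTEGER PRIMARY KEY,
    customer_id INTEGER,
    amount      NUMERIC(12,2)
);

INSERT INTO invoice VALUES (1, 42, 100.00);

-- The single $100 invoice now appears twice, and any SUM over
-- this join quietly doubles the revenue:
SELECT i.invoice_id, i.amount
FROM invoice i
JOIN customer_contact c ON c.customer_id = i.customer_id;
```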

Posted August 02, 2021

Within IT, testing has matured as an industry. Many tools exist, and many IT shops have testing groups. But, often those testing groups are unable to assist on data-related projects. The heart of the problem is that the focus of the testing practice has been perverted. The testing industry is concerned primarily with validating code, specifically the kinds of code that interact with people. With data-related projects, the need is to test the accuracy of the data at rest within the structures.

Posted July 15, 2021

Inside a relational database management system, the principal persisted data structure is considered a logical relation. Operations performed against that data within the RDBMS result in a logical relation too. In other words, everything is a table. To step away a little from the term "table," let's use the word "grid."
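
That closure property can be sketched briefly, with hypothetical names: the result of one query is itself a grid that another query can read.

```sql
CREATE TABLE staff (
    staff_id  INTEGER PRIMARY KEY,
    dept_code VARCHAR(10)   NOT NULL,
    salary    NUMERIC(12,2) NOT NULL
);

-- The inner SELECT produces a grid, and the outer query treats
-- that result exactly like any stored table:
SELECT dept_code, avg_salary
FROM (
    SELECT dept_code, AVG(salary) AS avg_salary
    FROM staff
    GROUP BY dept_code
) AS dept_averages
WHERE avg_salary > 75000;
```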

Posted June 10, 2021

Anything worth doing is worth doing again and again. Right? When building out and standardizing new subject areas or new sources for one's data warehouse, hub, or other analytics area, a task often initially overlooked is the logic bringing in the data. Obviously, everybody knows that the new data must be processed. What many ignore is the idea that establishing the process to bring the data in often must be done twice, or more.

Posted April 29, 2021

All data architects should consider themselves change agents for the organizations in which they work. But at the same time, the business often wants to keep much the same. Such discoveries can be confusing when the business says that change is desired, but its actions seem focused on preventing change. It can bring to mind the Albert Einstein quote (or the Baba Ram Dass quote, depending on which version of history one subscribes to) that "We cannot solve our problems with the same thinking we used when we created them."

Posted April 06, 2021

Dashboard users doing unsophisticated, largely repetitive, operational reports and analyses ask their IT support personnel to provide data in a simple way; they demand fast performance from their queries; and they demand new functionality be provided quickly. While often not stated explicitly, the "simple" data presentation implies several characteristics.

Posted March 01, 2021

The data lakehouse is a merging of the organization's data lake and data warehouse into one platform, eliminating data redundancy and loss of time moving data around from one place to another. Is this newest data savior really better than all the data saviors of the past?

Posted February 10, 2021

An architecture derives its strength from a level of consistency in how things are implemented. However, that is not to say that a mindless devotion to absolute consistency is a good thing. Times will arise when exceptions to almost any rule are necessary. The skill, the art, the balance in applying decisions that result in a good data architecture across an organization is based on a prudent sense of when to conform and when an exception is needed. If there are too many exceptions, it can rightfully be declared by observers that there are no rules and that chaos reigns.

Posted January 07, 2021

A recent study indicated that IT professionals were three times more likely to disagree with their leadership than professionals in other industries. Identifying a three-times-more-likely difference of opinion seems a significant variance. I don't know what detailed insights the study arrived at to account for this level of disagreeableness. Maybe the study considered IT folks as being more educated, more logical, and therefore more difficult?

Posted December 10, 2020

Data modeling has an intimate relationship with abbreviations. Since the creation of the very first data model, there have been circumstances where fully worded names for tables or columns were simply too long to implement within one tool or another. Occasionally one runs into an individual who cannot conceive that anyone on the planet would abbreviate something in a different fashion than they do; but more often data modelers tend to enjoy consistency, and when possible, employ rules to support consistent outcomes. Abbreviations are no exception.

Posted November 04, 2020

Plenty of analytics environments have landing areas, plenty have staging areas, and some have both. So, are landing and staging just synonyms for the same thing? A survey of usage would show there is much overlap in implementations, and even some confusion.

Posted October 08, 2020

Why set a trap to fail at some point in the future? When you arrive at a design wherein an entity needs to have a relationship to almost every other entity, stop and think about what is happening, and review your reasoning before proceeding.

Posted September 09, 2020

IT management can often succumb to repeated patterns of behavior. One of those repeated patterns is accusing developers of being "perfectionists." The developer or developers in question will be told that they should not strive for perfection, because perfection is not needed for the circumstance, and, more importantly, attempts at perfection take too long. Is the developer a closet perfectionist? Maybe, maybe not.

Posted August 11, 2020

An enterprise conceptual data model is often seen as a high mountain to be climbed, a journey that will last a lifetime. People have visions of 10 feet or more of wall in the corporate offices wallpapered with an entity-relationship diagram (ERD) that has zillions of teeny, tiny boxes and more relationship lines than the combined lines of queuing patrons in all Disney Resorts, when full. In this context, an enterprise conceptual data model is a daunting task not to be taken lightly. But in today's world, that enterprise conceptual data model can simply be a list of subject areas.

Posted July 01, 2020

As the number of types of slowly changing dimensions (SCDs) has increased, things have ended with Types 0 through 7, making essentially eight of them. But it is unclear whether full consensus exists among current practitioners on what actually differentiates each of these eight SCD types. Some confusion may result from the fact that when the first three SCD types were defined, each could be equated to a result for a dimensional attribute. Type 1 had facts associated with dimension values as they are currently (always current); Type 2 had facts associated with dimension values as they were when the facts were processed; and Type 3 had facts associated with both current values and the values current when the facts were processed.

Posted June 10, 2020

One creates the potential for some interesting anomalies when building a star schema wherein the fact table contains future-dated metrics and any of the dimensions are Type 2. A Type 2 dimension tracks changes to the data items contained within it. Effectively, each dimension contains a surrogate key, a natural key with a start and stop date, and additional descriptor columns. If any of the descriptor column values change, the existing dimension row has the stop date populated while a new row is inserted with the same natural key, new start date, and new descriptor values.
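
The mechanics described above, sketched with hypothetical names: expire the current row, then insert its replacement.

```sql
CREATE TABLE customer_dim (
    customer_key  INTEGER PRIMARY KEY,   -- surrogate key
    customer_code VARCHAR(30) NOT NULL,  -- natural key
    city          VARCHAR(50) NOT NULL,  -- tracked descriptor
    start_date    DATE NOT NULL,
    stop_date     DATE                   -- NULL while the row is current
);

-- A tracked descriptor changed: close out the current row...
UPDATE customer_dim
SET stop_date = DATE '2020-05-13'
WHERE customer_code = 'C-1001'
  AND stop_date IS NULL;

-- ...and insert the replacement with the same natural key, a new
-- surrogate, a new start date, and the new descriptor value.
INSERT INTO customer_dim
    (customer_key, customer_code, city, start_date, stop_date)
VALUES
    (2001, 'C-1001', 'Denver', DATE '2020-05-13', NULL);
```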

Posted May 13, 2020

As we have moved forward with APIs and microservices, every organization has even more data stores to manage and more sources of data to consider. Sorting through data structures for operational solutions can become mind-numbing due to the variety, or even frustrating due to a lack of detail from many vendors. Source systems are no longer the monoliths they once were.

Posted April 08, 2020

From the CEO to presidents, VPs, and directors, down to any number of mid-to-low-level managers, the concept of hierarchies is pervasive in organizations. But if one is dealing with a relational DBMS, a hierarchy remains an awkward concept.
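
One standard way to tame the awkwardness, sketched with hypothetical names using a self-referencing table and a recursive common table expression (WITH RECURSIVE as in PostgreSQL; the keyword varies by DBMS):

```sql
-- Each row points at its manager; the CEO's manager_id is NULL.
CREATE TABLE org_member (
    member_id  INTEGER PRIMARY KEY,
    full_name  VARCHAR(100) NOT NULL,
    manager_id INTEGER REFERENCES org_member (member_id)
);

-- Walk the hierarchy top-down with a recursive CTE.
WITH RECURSIVE org_tree AS (
    SELECT member_id, full_name, manager_id, 1 AS depth
    FROM org_member
    WHERE manager_id IS NULL            -- start at the top
    UNION ALL
    SELECT m.member_id, m.full_name, m.manager_id, t.depth + 1
    FROM org_member m
    JOIN org_tree t ON m.manager_id = t.member_id
)
SELECT depth, full_name
FROM org_tree
ORDER BY depth;
```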

Posted March 05, 2020
