Understanding the Basics of Database Hygiene

Many people claim to understand the basics of good data and database hygiene. Often, these same people claim it is all very simple and very obvious. However, a look into existing code and databases suggests that good practices are not as obvious as people say. “A2” as the name of a column may sound ridiculous, but it has happened. Databases around the world are also full of column names that are just as unhelpful as “A2”: “Type,” “Flag,” “Good,” “Name,” “Code,” “Value,” and even “ABC,” for example.

These naming abominations have been created by developers and others who simply did not worry about establishing names that were more clear, understandable, or self-explanatory. These names were “good enough” often because the individuals who created them were the same ones who used them every day, and they understood exactly what they meant.
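To make the naming point concrete, here is a minimal sketch using Python's built-in sqlite3 module. Both tables, and every column name in them, are hypothetical and invented purely for illustration; they only assume the kind of data an order table might hold. The sketch lists the columns of each table the way a newcomer would first see them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical "good enough" table: only its author knows what a2 or flag mean.
conn.execute(
    "CREATE TABLE orders_cryptic ("
    "id INTEGER PRIMARY KEY, a2 TEXT, type TEXT, flag INTEGER, value REAL)"
)

# The same hypothetical data, with names that state what each column holds.
conn.execute(
    "CREATE TABLE customer_order ("
    "customer_order_id INTEGER PRIMARY KEY, "
    "shipping_region_code TEXT, "
    "payment_method_name TEXT, "
    "is_gift_wrapped INTEGER, "
    "order_total_usd REAL)"
)


def column_names(connection, table):
    """Return the column names of a table in declared order."""
    rows = connection.execute(f"PRAGMA table_info({table})").fetchall()
    return [row[1] for row in rows]  # index 1 of each row is the column name


cryptic_columns = column_names(conn, "orders_cryptic")
descriptive_columns = column_names(conn, "customer_order")

print(cryptic_columns)
print(descriptive_columns)
```

Someone reading the second list can guess at each column's purpose without tracking down the original author; the first list offers no such help.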

Column names aren’t the only area in which healthy practices suffer. When columns are added to a table based on coding need, regardless of whether they are germane to the initial intent of the table, the developers responsible believe they are simply being efficient. Their reasoning runs like this: The results are “normal” enough. SQL joins just slow things down. Whether the added items truly depend on the key is moot; doing it otherwise is simply more work for no good reason. Besides, the table’s key is just a generated sequential number; it doesn’t really mean anything other than that the developer decided to create another row. Everyone knows that “denormalizing for performance” is an acceptable practice. The only thing the outside world will ever see is the interface displayed. As long as the code works and provides a result that cannot be proven incorrect, that is all that matters, right? The problem with this mentality is that it expresses myopia and hubris.
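One way to see why dependence on the key is not moot is the following sketch, again using Python's sqlite3 module. The tables and columns (order_wide, customer, customer_order, customer_city) are hypothetical and chosen only for illustration. It assumes a city that depends on the customer rather than on the order, stored first in the order table and then in a table of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical wide table: customer_city depends on customer_id,
# not on the sequential order_id that serves as the key.
conn.execute(
    "CREATE TABLE order_wide ("
    "order_id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "customer_city TEXT, order_total REAL)"
)
conn.executemany(
    "INSERT INTO order_wide VALUES (?, ?, ?, ?)",
    [(1, 10, "Austin", 25.0), (2, 10, "Austin", 40.0)],
)

# The customer moves, but the fix touches only one of the two rows.
conn.execute("UPDATE order_wide SET customer_city = 'Denver' WHERE order_id = 1")
cities_wide = conn.execute(
    "SELECT DISTINCT customer_city FROM order_wide "
    "WHERE customer_id = 10 ORDER BY customer_city"
).fetchall()

# Same data with the city stored once, in the table whose key determines it.
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_city TEXT)"
)
conn.execute(
    "CREATE TABLE customer_order ("
    "order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customer(customer_id), "
    "order_total REAL)"
)
conn.execute("INSERT INTO customer VALUES (10, 'Austin')")
conn.executemany(
    "INSERT INTO customer_order VALUES (?, ?, ?)",
    [(1, 10, 25.0), (2, 10, 40.0)],
)

# One update now changes the city for every order of that customer.
conn.execute("UPDATE customer SET customer_city = 'Denver' WHERE customer_id = 10")
cities_normalized = conn.execute(
    "SELECT DISTINCT c.customer_city FROM customer_order AS o "
    "JOIN customer AS c ON c.customer_id = o.customer_id "
    "WHERE o.customer_id = 10"
).fetchall()

print(cities_wide)
print(cities_normalized)
```

In the wide table the same customer ends up with two conflicting cities, and nothing in the result proves which one is wrong. In the second design the join costs a little at query time, but the contradiction cannot be stored in the first place.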

Those good-enough opinions are not well-founded. Databases and code are no longer isolated to a lone developer. The individual who initially wrote something may be long gone, leaving others to open up what’s been left behind and enhance, change, or fix the issues. Even if the originator is still in-house, that person may no longer be the appropriate individual to drop everything and work on an issue. Thinking that data structures or process blocks are any developer’s playground, to do with as each chooses, is no longer an appropriate perspective. Everything today is shared, open, exposed, and reused, and data created in one place is used elsewhere, sometimes even everywhere.

It is in this context of a shared data resource that cryptic names and sloppy modeling techniques cause problems. Such methods mean that it takes more time for others to learn, decode, and deconstruct a design and set it right in their minds. When dealing with bad names, bad designs, and sloppy practices, the organization spends far greater resources to relearn, re-explore, and redo the shortsighted work of the past to accomplish even small fixes. In defense of these sloppy tactics, it has been offered that time-crunches and managerial whip-cracking often force these shortcuts to speed up delivery. Sadly, it is instead the whip-cracking and time-crunching that expose the developers’ lack of faith that doing things properly matters. Those who do not name a column well and clearly, or do not see to it that data structures are based on proper functional dependencies, are not saying they do not have time to do that work; they are saying they really do not understand how to do that work. Because when one does truly understand, there is no corner to cut.