Beware the Frankenmart!

Novice developers are often confused about database design. And when they mature into experienced personnel, they unknowingly pass on bad habits to the next generation of developers. The problem at hand is a very human one. When putting together data marts, particularly data marts intended to be dimensional, some developers tend to think the way a typical end user might think.  End users are very creative. If end users have a need, a function, a task at hand that the given application does not directly support, they will find a way to instantiate that missing functionality. It may require very unexpected uses of the existing solution, but the need will win out somehow. Developers may or may not understand the tenets and approaches that comprise a “dimensional data mart”; and often those data marts are implemented inside relational DBMSs. Therefore, the developer often feels that anything the DBMS will do is “fair game” for their data mart.

A relational DBMS is a very generic tool, whereas a multidimensional data mart is a specific kind of approach. Therefore, including “anything” that is relational actually spans a great many features that are not multidimensional in nature. The result of such a composite approach likely results in a data mart that has dimensional nomenclature applied to it, but that in practice is really an amalgamation of third normal form, denormalized, and dimensional structures. Most optimistically, one could think of these data marts as a pastiche, a bricolage, or to take a little poetic license, a data-based portmanteau morph, containing the best of all possible elements needed to deliver a solution. In practice, results are often less lofty. The data mart may be a complete pile of junk, or a Frankenstein’s monster comprised of pieces from here and there that function, sort of.  These Frankenmarts exist in many places. The limitations of these beasts often are associated with two main concepts. One, writing SQL to access data correctly means having a detailed understanding of all the pieces. Consequently, there may be a lengthy learning curve for a new querier to become functional. And next, because of the numerous differing approaches contained within the single solution, scaling may not be straightforward. The simplest of changes to the solution’s requirements might force massive, or even complete, refactoring that could prove quite costly.

Unlike our end user, data mart builders must understand what they are working to accomplish. The DBMS is not going to magically guide them to a solution. The builder is responsible for knowing how dimensional techniques work, why they work, and what options may exist within the dimensional framework. The DBMS has a lot of functions for serving all sorts of purposes. Architecture means in part that there is a place for the elements used, and the elements used are all in their proper place. The person serving as the data architect must be the enforcer of the rules, or else there will be none. A Frankenmart has pieces put together, but likely not in their proper place, and that is the ultimate weakness. Simple is almost always best. If the target is a multidimensional data mart, then work toward exactly that. Exceptions may arise, but those exceptions should be very rare and should be very well-documented. Therefore, the best results are most often simple star schemas, with no snowflaking and no strange normalized components dancing around facts. There should be no expectation of direct fact table-to-fact table joins. And ideally, even though they are “legal moves,” there should be an absolute minimum of bridge and outrigger tables. Enforcement is completely in the hands of the data architect in charge, and these individuals need to have the fortitude and belief to imprint their will on the solution’s database design.