Newsletters




Differences Between Operational and Analytical Data Structures


It is an unfortunate result of being human that we are biased by our first impressions. For example, we are often blinded when we encounter a subject area’s data for the first time. The initial data structures we analyze and learn to understand set our internal paradigms. Once we understand that initial structure, we tend to believe that the way that structure is configured is the best, or the only way, that a specific data set of that kind of data should be presented. Our expectations have been set by our understanding. When data modelers transition from crafting operational solutions into designing analytical solutions, the biases and preconceived notions the data modelers bring with them can be a problem.

Operational solutions and analytical solutions have differing goals, priorities, and data patterns to be used. What is done with data structures to run the applications that execute our business processes is quite distinct from what is done to perform analysis. The design differences are best highlighted when one deals with an operational solution idea called “custom attributes” or “additional characteristics” or some similar name. The operational goal is to provide flexibility for the solution administration. If a new characteristic is needed in the associated area, there is ultimately a table having the associated object primary key, a characteristic name or identifier, and lastly the characteristic value. New attribute needs are handled by simply allowing a new row to be added into this structure. The new row will have a new characteristic name existing in a name column, and a value existing in a value column with its associated object primary key. An infinite number of new attributes could be added for that same associated object primary key. This design is open, generic, and abstract, which provides the desired flexibility. And this can be managed dynamically through a user facing a set of screens.

As mentioned previously, analytical solutions have different needs. For analytics one needs to be simple, explicit, and set up for repeatable answers. Having to bring special knowledge to the analysis just to find the correct data is working against the intended purposes. If the operational approach of adding an infinite number of characteristics vertically, as rows within a table were used, this would complicate the coding needed to extract the data. The user would need to know that the characteristic is a name and value pair within a row rather than an actual and distinct column. A user in a rush to get an answer might easily just join into the structure and not notice that they have now multiplied their numeric values by the number of characteristic rows being retrieved. Ouch. Ideally, the correct analytic design should prevent this type of misunderstanding. These attributes would be flipped into a horizontal arrangement within a dimension-like table. In the rotated fashion, the characteristic names become the headers/names for the columns. Queriers would then simply select the attributes of interest by name, and by being within a single row any joins will not multiply one’s results. Yes, this means that as new items are needed the data structure must be expanded to include new columns. The operational flexibility to add new characteristics on the fly is not there. The task of adding a column is one that a modern data platform can implement quickly and easily. And the task of adding a column on occasion is more beneficial than setting up a “flexible” approach that leads to easy user errors. Stability is important. As is leading users down a path of never getting a wrong result. As a designer, make sure structures are fit for the kind of environment they will be inhabiting as part of crafting a data structure’s fitness for use.


Sponsors