Jack Be Nimble, Jack Be Quick: Providing What Analytics Users Need

Dashboard users who run unsophisticated, largely repetitive operational reports and analyses ask their IT support personnel for three things: data presented in a simple way, fast query performance, and quick delivery of new functionality. While often not stated explicitly, the “simple” data presentation implies several characteristics.

Simple presentation requires that data be grouped in meaningful assemblies, such as a subject area. If a subject area naturally contains items from multiple sources, those items should be integrated coherently, with data quality checks and standardizations applied. The simple presentation should let users view data as it is today, or request and obtain the data as it looked at a previous point in time. Simple also implies that when users run the same query for the same point in time, they get the same answers, or at least can break the answer down to recover what they received previously, along with knowledge of what changes were applied (and why).
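To make the point-in-time requirement concrete, here is a minimal sketch in Python of an "as of" lookup over versioned records. It assumes each version of a record carries an effective date range; the names CustomerVersion, valid_from, valid_to, and as_of are hypothetical and chosen only for illustration, not taken from any particular warehouse product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class CustomerVersion:
    """One historical version of a customer record (hypothetical shape)."""
    customer_id: int
    region: str
    valid_from: date           # first day this version was in effect
    valid_to: Optional[date]   # first day it stopped being in effect; None = current


def as_of(history: Iterable[CustomerVersion],
          customer_id: int,
          when: date) -> Optional[CustomerVersion]:
    """Return the version of the customer that was in effect on `when`."""
    for row in history:
        if row.customer_id != customer_id:
            continue
        still_open = row.valid_to is None or when < row.valid_to
        if row.valid_from <= when and still_open:
            return row
    return None  # the customer did not exist yet on that date


# Illustrative data only: customer 1 moved from East to West on 2022-06-01.
HISTORY = [
    CustomerVersion(1, "East", date(2020, 1, 1), date(2022, 6, 1)),
    CustomerVersion(1, "West", date(2022, 6, 1), None),
]

if __name__ == "__main__":
    print(as_of(HISTORY, 1, date(2021, 3, 15)))
    print(as_of(HISTORY, 1, date.today()))
```

Because old versions are closed out rather than overwritten, asking the same question for the same date returns the same row no matter when the query is run, which is the repeatability the paragraph above describes.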

Developers and data scientists can deal with complexity and inconsistencies in their data sources. It is understood that beyond changing content, developers may have to handle record structures that change shape over time. Data scientists largely understand that they will need to cope with inconsistent codes from differing sources. The average dashboard user cannot handle much complexity, nor should they need to.
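As an illustration of the "inconsistent codes from differing sources" problem, the following Python sketch shows the kind of standardization an integration layer might apply so that dashboard users never see it. The source system names (crm, billing), the raw codes, and the standard values are all invented for this example.

```python
# Hypothetical cross-reference: (source system, raw code) -> standard value.
# Real mappings would normally live in a reference table, not in code.
GENDER_CODE_MAP = {
    ("crm", "M"): "MALE",
    ("crm", "F"): "FEMALE",
    ("billing", "1"): "MALE",
    ("billing", "2"): "FEMALE",
}

UNKNOWN = "UNKNOWN"


def standardize(source: str, raw_code: str) -> str:
    """Translate a source-specific code into the shared standard value.

    Unmapped combinations fall back to UNKNOWN instead of raising, so a
    data quality check can count and report them later.
    """
    key = (source.strip().lower(), raw_code.strip().upper())
    return GENDER_CODE_MAP.get(key, UNKNOWN)


if __name__ == "__main__":
    for source, code in [("CRM", " m "), ("billing", "2"), ("web", "x")]:
        print(source, repr(code), "->", standardize(source, code))
```

Applying the mapping once, during integration, means every downstream report sees a single set of values regardless of which source a row came from.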

Data warehousing and data marts, as concepts and architectures, came into being to resolve the kinds of issues mentioned above. If simple integrated data is not provided, either in a data warehouse or a multidimensional data mart, it may not be the end of the world. But the reporting and analysis tools must then hide any shortcomings in the physical data structures from those analytics users.

These analytic tools will still need to present data in a simple fashion to users, and then turn around and provide rock star query performance. The workaday reports and basic analyses still must be supported. And even for these essential reports, requests cannot be logjammed in a queue while IT creates custom reports one by one. Self-service is necessary. Users need to be enabled to fish the data for themselves.

Building physical data warehouses and physical data marts was the simplest approach, which is why they were created. Someday more complex configurations will work in which data remains in one single structure, and that data’s “appearance” across more normalized, data-vaulted, or dimensional structures is nothing but an illusion instantiated via the latest and greatest data virtualization tool.

And that will be fine. The measure of success or failure for those tools will be the same as it is today: does a user still get what they want and need in a simple perspective of the available data, with good query performance? Most tools today do not yet overcome all complexity and performance issues. Maybe they will in the future. Or perhaps, like the proverbial poor, data warehouses and data marts will always be with us.

Keep implementations as simple as possible. The bottom line is that there is a need for simplified, unified, subject-oriented, integrated, time-variant, and non-volatile data sources, for all kinds of reasons. Not everyone can fish convoluted source data structures for themselves; the attempt would simply be too costly for most organizations. People would spend too much time and energy, and the lack of results would generate enormous opportunity costs. There is still a need for the data warehouse.