Database Elaborations: The Data Virtualization Blues


Operational systems that also support reporting often become bogged down with too many overly complex views. No one really understands why any given view contains its extraordinarily large number of joins and other elements, and everyone is afraid to remove a single piece. Meanwhile, more and more reports are built on top of these views.

Ultimately, no one is comfortable changing a single line of any view, for fear of triggering a cascade of failures across existing reports. In this environment, change is avoided, and the cost of reworking everything is considered prohibitive.

Different kinds of data structures support different kinds of workloads: front-office business functions generally perform better on normalized structures, while light analytics and reporting are easier to accomplish with multidimensional structures. These differing approaches exist because each serves a purpose and fulfills a need. If an organization's data volume is fairly small, the performance penalty of using a less suitable structure is negligible. For example, with modest data volumes one can run reporting and analytics against normalized structures without suffering queries that run for hours. However, doing so forces the reporting staff to understand the complexities of a normalized operational design, which may in itself be reason enough to create simpler multidimensional structures. Creating physical data structures does take work, but in such environments the processing cost of moving the data and the additional storage should not be overly burdensome to the organization.

Data virtualization allows data structures to appear different from how they physically exist, and allows data structures to be joined across platforms. Virtualization should be one tool in one’s arsenal. Sometimes a data store really is too big to move multiple times, and in that circumstance data virtualization can be a valuable technique for establishing a coherent data warehousing and analytics environment.
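As a minimal sketch of the idea, assuming Python's built-in sqlite3 module as a stand-in for a real virtualization platform, the example below keeps a hypothetical orders table and a hypothetical customers table stored once, in two separate in-memory databases, and presents them through a view in a flattened, report-friendly shape. The table names, columns, and sample values are invented for illustration, and SQLite's ATTACH only loosely imitates the cross-platform federation a commercial virtualization product provides.

```python
import sqlite3


def build_virtual_layer() -> sqlite3.Connection:
    """Hypothetical demo: two physical stores, one virtual reporting structure."""
    conn = sqlite3.connect(":memory:")
    # A second in-memory database stands in for a separate source system.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")

    # Normalized operational tables, each physically stored exactly once.
    conn.execute(
        "CREATE TABLE main.orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    conn.execute(
        "CREATE TABLE crm.customers (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT)"
    )
    conn.executemany(
        "INSERT INTO main.orders VALUES (?, ?, ?)",
        [(1, 10, 25.0), (2, 10, 15.0), (3, 20, 40.0)],
    )
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?, ?)",
        [(10, "Acme", "East"), (20, "Globex", "West")],
    )

    # The "virtual" structure: a summarized shape computed at query time.
    # SQLite only lets TEMP views reference tables in an attached database.
    conn.execute(
        """
        CREATE TEMP VIEW sales_by_region AS
        SELECT c.region, SUM(o.amount) AS total_amount, COUNT(*) AS order_count
        FROM main.orders AS o
        JOIN crm.customers AS c ON c.customer_id = o.customer_id
        GROUP BY c.region
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_virtual_layer()
    for row in conn.execute("SELECT * FROM sales_by_region ORDER BY region"):
        print(row)
```

Querying sales_by_region recomputes the per-region totals from the two source tables each time; no copy of the data is made. That is the appeal for a store too big to move, and also the cost: every query pays for the join.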

Data virtualization makes it feasible to work with one or more data stores that would break the bank, processing-wise, if they had to be copied, because they can physically exist once but logically exist in multiple transformed structures. Occasionally, IT managers get the idea that data virtualization is a more general answer, presuming that if it works for big data, it can work for all data. If data virtualization works for all data, then there is no need to write code and move data: ETL processes and databases don’t have to be built, and the IT world can be largely virtual. In this view, the virtual solution promises savings everywhere.

At its most basic level, data virtualization is a set of views across one’s operational systems. There are a few more bells and whistles than in a simple single-DBMS view, but logically the two concepts are very much the same. In a virtualize-only world, all business rules and transformations are buried within the data virtualization views.
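To make that concrete, here is a minimal sketch, again assuming Python's sqlite3 module as a stand-in for a virtualization layer, in which two hypothetical business rules (exclude closed accounts; treat negative amounts as refunds and leave them out of revenue) exist only inside stacked view definitions. The schema, the rules, and the helper view_definitions are illustrative assumptions, not features of any particular product.

```python
import sqlite3


def build_layered_views() -> sqlite3.Connection:
    """Hypothetical demo: business rules that live only inside stacked views."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, status TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Acme', 'open'), (2, 'Globex', 'closed'), (3, 'Initech', 'open');
        INSERT INTO orders VALUES (1, 1, 100.0), (2, 2, 50.0), (3, 3, -5.0), (4, 3, 30.0);

        -- Layer 1: the rule "ignore closed accounts" is recorded only here.
        CREATE VIEW v_active_customers AS
            SELECT customer_id, name FROM customers WHERE status <> 'closed';

        -- Layer 2: the rule "negative amounts are refunds" is recorded only here.
        CREATE VIEW v_customer_revenue AS
            SELECT a.name, SUM(o.amount) AS revenue
            FROM v_active_customers AS a
            JOIN orders AS o ON o.customer_id = a.customer_id
            WHERE o.amount > 0
            GROUP BY a.name;
        """
    )
    return conn


def view_definitions(conn: sqlite3.Connection) -> dict:
    """Return {view name: stored SQL}; the view text is the rules' only record."""
    return dict(
        conn.execute("SELECT name, sql FROM sqlite_master WHERE type = 'view' ORDER BY name")
    )


if __name__ == "__main__":
    conn = build_layered_views()
    print(conn.execute("SELECT * FROM v_customer_revenue ORDER BY name").fetchall())
    for name, sql in view_definitions(conn).items():
        print(name, "->", sql)
```

A report that queries v_customer_revenue silently inherits both rules, and the only place either rule is written down is the view text stored in the catalog, which is the documentation gap this article warns about.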

Despite their best intentions, few organizations establish the level of data governance and data administration needed to document each of those virtualization views, including its purpose and the reason each quirk was added, well enough that anyone can later recognize when something may be removed because of changes elsewhere. Eventually, the data virtualization platform becomes bogged down with many overly complex views. No one really understands why any given view contains its extraordinarily large number of joins and other elements, everyone is afraid to remove a single piece, and so more and more reports are built on top of these views.
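Where that documentation is missing, one partial mitigation is mechanical impact analysis before touching a view. The sketch below assumes SQLite via Python's sqlite3 module and a hypothetical set of layered views; the helper views_broken_by_dropping is an illustrative name, not a library function. It drops a candidate view inside a transaction, records which other views no longer compile, and rolls the change back. It relies on SQLite not blocking the drop of a view that other views reference, and it only catches views that fail outright: a change that silently alters results, such as loosening a filter, would pass unnoticed.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    # isolation_level=None puts the connection in autocommit mode, so the
    # BEGIN/ROLLBACK issued below fully control the transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(
        """
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL);
        CREATE VIEW v_clean_orders AS SELECT * FROM orders WHERE amount > 0;
        CREATE VIEW v_region_totals AS
            SELECT region, SUM(amount) AS total FROM v_clean_orders GROUP BY region;
        CREATE VIEW v_top_region AS
            SELECT region FROM v_region_totals ORDER BY total DESC LIMIT 1;
        CREATE VIEW v_order_count AS SELECT COUNT(*) AS n FROM orders;
        """
    )
    return conn


def views_broken_by_dropping(conn: sqlite3.Connection, view_name: str) -> list:
    """Trial-drop a view, list the other views that stop compiling, then undo."""
    broken = []
    conn.execute("BEGIN")
    try:
        # Naive identifier quoting; adequate for the trusted names in this demo.
        conn.execute(f'DROP VIEW "{view_name}"')
        remaining = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'view' ORDER BY name"
            )
        ]
        for name in remaining:
            try:
                # LIMIT 0 forces the view to be compiled without reading rows.
                conn.execute(f'SELECT * FROM "{name}" LIMIT 0').fetchall()
            except sqlite3.OperationalError:
                broken.append(name)
    finally:
        # Schema changes are transactional in SQLite, so the drop is undone.
        conn.execute("ROLLBACK")
    return broken


if __name__ == "__main__":
    conn = build_demo()
    print(views_broken_by_dropping(conn, "v_clean_orders"))
```

In this demo, trial-dropping v_clean_orders reports the two views stacked above it, while v_order_count, which reads the base table directly, is unaffected. A real virtualization platform would need its own dependency metadata; this only shows the shape of the check.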

Ultimately, no one is willing to change a single line of any view, for fear of triggering a cascade of failures across existing reports. Change is avoided, and so is the excessive cost of reworking everything. Isn’t this the uncomfortable situation where some folks began? This does not need to be another “circle of life”; instead, use data virtualization when you need to virtualize, not just because you can.

