Last New Horizon of Computing - the Dataverse


In the Beginning - Information was Unknown

In the beginning, information was unknown.  Eventually, the growing populace absorbed information as it was passed from generation to generation, person to person, but only the elite had access to that knowledge base.  Gutenberg's press and the King James Bible made information more accessible to the general public, but access was still very limited.  Galileo, the famous Italian physicist, mathematician, and astronomer, originally published in Latin his findings supporting the Copernican view that the Earth moves around the Sun. Even though this science contradicted church doctrine, the reaction remained largely muted. When Galileo then published those same findings in Italian, he was tried by the Inquisition and condemned. By sharing this knowledge outside the "inner circle" he became an outcast. The world was not quite ready to share the knowledge base with the common man. As time passed, the ability of the privileged to keep that knowledge contained to a favored few waned.

Moving the calendar forward, information became progressively more organized and more discrete, and it eventually became known as data.  Toward the end of the 20th century, a new discipline emerged whose focus was the management, manipulation, and protection of data. That discipline, known as "information technology," produced methodologies for analyzing and accessing data in a seemingly endless variety of useful ways.

Initially, data was managed in primitive two-dimensional tables, but as understanding of the nature of data and its interrelationships progressed, mathematicians working on the periphery of data management formalized the mathematics of discrete data relationships, which came to be known as the relational model (first formalized by E. F. Codd in 1970).  Cumbersome as that strict model was, it fully defined how data should be logically stored so as to eliminate incorrect inferences drawn from analysis and inconsistencies introduced by manipulating improperly related data.  The relational database management system (RDBMS) was born, as was the multibillion-dollar industry behind products such as Oracle, Sybase, Ingres, and Microsoft SQL Server, all of which have their roots in the relational model.

Beyond the Relational Model

From the last decades of the 20th century into the 21st, other methods of storing data emerged. These included the object model and in-memory data management systems, which essentially bypassed the relational model, moving data from tables into programmatic structures.

Tools for creating different logical models tailored to the utilization needs of users became commonplace.  Terms such as dimensions, fact tables, star schemas, materialized views, and data marts grew out of the idea that the relational model's requirement of "normalization," although mathematically flawless, imposed an impossible burden of strict limitations and regulations on both users and developers of data access tools. Many concluded that the technical requirements of the relational model were, in most cases, unnecessary.  To escape the burdensome mathematical strictures, some data analysts claimed that the relational model itself was no longer a valid intellectual premise from which to begin data analysis.  However, the relational model is not a technology but a mathematical model, and it will no more become obsolete or irrelevant than the algebra a high school freshman might wish away.
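
To make the contrast concrete, here is a minimal sketch, using Python's built-in sqlite3 module and hypothetical table and column names, of a fully normalized design alongside a star-schema layout with a central fact table and descriptive dimension tables. It is illustrative only, not a recommendation for any particular product.

```python
import sqlite3

# Minimal sketch (hypothetical names) contrasting a normalized design with a
# star-schema layout. SQLite is used here purely for illustration.
conn = sqlite3.connect(":memory:")

# Normalized design: each fact is stored exactly once; analysis requires joins.
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE sale     (sale_id     INTEGER PRIMARY KEY,
                       customer_id INTEGER REFERENCES customer(customer_id),
                       product_id  INTEGER REFERENCES product(product_id),
                       sale_date   TEXT, amount REAL);
""")

# Star schema: a central fact table keyed to descriptive dimension tables,
# organized around how analysts will query the data rather than strict normal form.
conn.executescript("""
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales   (date_key INTEGER, customer_key INTEGER,
                           product_key INTEGER, amount REAL);
""")

# A typical analytic query against the star schema: total sales by region and month.
query = """
SELECT c.region, d.year, d.month, SUM(f.amount) AS total
FROM fact_sales f
JOIN dim_customer c ON c.customer_key = f.customer_key
JOIN dim_date     d ON d.date_key     = f.date_key
GROUP BY c.region, d.year, d.month;
"""
print(conn.execute(query).fetchall())   # [] until the tables are populated
```

The point of the dimensional layout is purely practical: the fact table is shaped around the questions analysts ask, trading strict normal form for simpler, faster access paths.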

As time went on, these extended data management techniques began to formally manage data outside the structures of the relational model.  As memory became less expensive and developers learned how to create data structures that bypassed the relational model without jeopardizing data integrity, different models and products appeared.  Some were based on "de-normalization": combining relational structures into new structures designed around how the data would actually be manipulated, so that no manipulation would inadvertently destroy its logical integrity.
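
As a simple, hypothetical illustration of that idea (again using Python's sqlite3 module, with invented table names), the sketch below keeps the normalized tables as the sole target of writes and rebuilds a prejoined, denormalized copy from them for reading, so manipulation of the derived structure cannot corrupt the underlying data.

```python
import sqlite3

# Hypothetical sketch of "de-normalization" done safely: the normalized tables
# remain the source of truth for all writes, and a combined, prejoined table is
# periodically rebuilt from them for fast reads.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE orders   (order_id INTEGER PRIMARY KEY,
                       customer_id INTEGER REFERENCES customer(customer_id),
                       amount REAL);
INSERT INTO customer VALUES (1, 'Acme', 'West');
INSERT INTO orders   VALUES (10, 1, 99.50), (11, 1, 12.00);
""")

def rebuild_order_summary(conn):
    """Rebuild the denormalized read table from the normalized source tables."""
    conn.executescript("""
    DROP TABLE IF EXISTS order_summary;
    CREATE TABLE order_summary AS
    SELECT o.order_id, c.name AS customer_name, c.region, o.amount
    FROM orders o JOIN customer c ON c.customer_id = o.customer_id;
    """)

rebuild_order_summary(conn)
print(conn.execute("SELECT * FROM order_summary").fetchall())
```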

The in-memory data management systems, which generally used the object model, matured to the point that developers learned how to provide reasonable guarantees of durability and the other ACID properties.  Finally, unstructured data, sometimes referred to as "big data," which had only a loose connection to traditional data management techniques, emerged as one of the most significant concepts in data management.  Unstructured data dwarfed all other forms of data in size and, at times, in importance.  In fact, managing unstructured data was often so daunting that an RDBMS was used to manage it without storing the physical data itself; only reference locations were actually stored within the RDBMS.
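
The following is a minimal, hypothetical sketch of that last pattern, again using Python's sqlite3 module: the database holds only metadata and a reference location for each unstructured asset, while the bytes themselves live in a file system or object store outside the RDBMS.

```python
import sqlite3

# Hypothetical catalog of unstructured assets: the RDBMS stores metadata and a
# reference location only; the unstructured bytes live outside the database.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE asset_catalog (
    asset_id    INTEGER PRIMARY KEY,
    description TEXT,
    media_type  TEXT,
    location    TEXT   -- a file path or object-store URI, not the data itself
)""")

# Register an asset: only its description and location go into the RDBMS.
conn.execute(
    "INSERT INTO asset_catalog (description, media_type, location) VALUES (?, ?, ?)",
    ("Q3 sensor capture", "application/octet-stream",
     "s3://example-bucket/captures/q3.bin"),
)

# Structured queries run against the catalog; the unstructured payload is
# fetched separately from wherever the stored location points.
for asset_id, location in conn.execute("SELECT asset_id, location FROM asset_catalog"):
    print(asset_id, location)
```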

History Leads to an Interesting Conclusion

All this history leads one to the interesting conclusion that the mindset that has dominated the last half-century of techno-thought must evolve, because the reality of data and data management has itself evolved. However, as mentioned above, the traditional mathematical axioms of data management will not disappear as if they were an obsolete 19th-century science project. It should also be noted that the relational model itself cannot diminish in scientific, mathematical, or intellectual significance, because it is simply a mathematical description of how data relates.

Despite this conundrum, it is clear that those who limit their thinking to these early philosophies are doomed to the two-dimensional space to which the history of 20th-century computing is confined.  To progress naturally, data management specialists must open their minds to the variety of methods that exist now and are constantly emerging to manage data.  Each of the methods described above has profound value and influence. Depending on the usage context, whether from the perspective of manipulation or of access, each of these models, or even a combination of them, may constitute or contribute to the correct model for a given set of data.

The Common Denominator - Virtualization

The one common denominator among these extended data management methods is virtualization.  Virtualization brings the elasticity, flexibility, and efficiency that allow developers to construct systems that optimize data access and guarantee data protection, while allowing users to form data sets that meet their access requirements and needs.

In summary, modern information technology, and the high priests of data management in particular, must evolve their thinking from the two-dimensional world of 20th-century data management to the 21st-century universe of n-dimensional data management. Database administrators must become the custodians of the "dataverse." The more sophisticated practitioners have already evolved.

The database administrator has evolved into a vDBA, and the RAC-DBA has evolved into a vRAC-DBA. These new custodians of the dataverse have started to embrace extended data management strategies in which the relational database is part of a data hub that exists to support companies' informational needs.  This elite group is the first to recognize that we have entered the last new horizon of computing - the dataverse.

