The amount of data being generated, captured and analyzed worldwide is increasing at a rate that was inconceivable a few years ago. Exciting new technologies and methodologies are evolving to address this phenomenon of science and culture, creating huge new opportunities. These new technologies are also fundamentally changing the way we look at and use data.
The rush to monetize “big data” makes the appeal of various “solutions” undeniable. But companies must perform proper due diligence to fully understand the current state of their data management systems. Companies must learn to recognize the various disparate and seemingly extraneous forms of information as data and develop a plan to manage and utilize all their data assets as a single, more powerful whole. The integration of modern data management techniques with classic data modeling methodologies presents an interesting and sometimes counterintuitive conundrum. The fundamental challenge stems from the continued validity of classic data modeling techniques and the recognition that for many datasets those classic methodologies will always apply.
Below, Figure 1 shows a sampling of the many modern systems being utilized to manage different forms of disparate data. Companies need to adopt a comprehensive and holistic approach to managing these many systems and incorporating them into a combined strategy.
A Unified Data Strategy (UDS) is an abstract, broad-based concept that describes how massive amounts of data in a multitude of forms can (and should) be understood and managed as distinct dimensions and manifestations. A UDS is also a specific, individualized methodology developed by each data owner to manage that data in all its forms in a comprehensive but interrelated manner. By adopting a UDS, owners of data will be able to develop comprehensive, customized methodologies to manage their data. They will achieve the maximum value by taking into account the interconnected nature of the various sources of data and tailoring the management of that data to their specific business requirements.
The transition from traditional relationally structured data to a UDS is complex, but it can be navigated effectively with a disciplined approach. To successfully adopt a Unified Data Strategy, companies should focus on the following:
1. Developing a rigorous understanding of how the business consumes, produces, manipulates and uses information of all types.
2. Determining how the business can use data both to understand external behavior and to assist in making internal decisions, and understanding how the data itself is relevant to influencing a critical moment of attention.
3. Analyzing the “personality” of each data form so that it can be matched with tools that appropriately acquire, filter, store, safeguard and disperse the data into useful information.
4. Selecting infrastructure and tools that automate or eliminate traditional high-cost tasks such as import, provisioning, scalability, and disaster tolerance. A highly virtualized infrastructure with complementary tooling should provide the majority of these capabilities.
5. Committing to the process of learning an entirely new approach to technology, and to adopting it in risk-appropriate increments.
Any company with a significant data infrastructure is aware of the pitfalls of rushing into this brave new world. However, thorough analysis can lead to an understanding of the current state of its data management systems, and subsequently to better control of its existing data. Ultimately, such companies will be able to recognize, manage and utilize new forms of disparate and seemingly extraneous information as data. Companies that develop a plan to comprehensively address their concerns around managing and utilizing all useful data will gain significant strategic advantages.
Interestingly enough, the other major IT phenomenon of this era – cloud computing – may provide the solution to this data management and recognition problem.
Throughout this multi-part article we will endeavor to define the idea of a UDS as a way to address the daunting task of comprehensive data management. Virtualization, the foundation of cloud computing, is the cornerstone of this strategy. The capabilities and architecture enabled by a virtual/cloud infrastructure can help enterprises develop an elegant UDS to address the seminal global shift in data management philosophy and practice. This is the “data cloud.” This is the “dataverse.”
Yesterday’s Data, before the Dataverse
Over the past half century the management of data has progressed through a number of different evolutionary stages. As is often the case, academia produces ideas that are later transformed into useful products. “Data” is simply a collection of discrete units of information, but like the stars in the night sky, those units taken together form an infinite array of stellar mosaics that overwhelm the mind and spirit.
A “database,” or at least the common use of the term, is generally interchangeable with the term “relational database.” A database is a logical construct created by one of the systems that together form the historically dominant force in data, the Relational Database Management System or RDBMS. RDBMS products are the result of mathematical studies in the 1960s and earlier that led to the “relational model,” developed from “relational calculus.” However, before the relational model came to dominate all thought pertaining to data, other systems were used to manage data. Simple flat file structures were used in early mainframe systems, and hierarchical and network (CODASYL) systems were used in that timeframe as well. Since the 1980s, however, the great majority of data management systems have been implemented using the products that grew from the relational model. Oracle, SAP Sybase, IBM Informix, Microsoft SQL Server and others constitute a trillion-dollar industry that continues to dominate global data management.
Throughout these past 40 years, expensive software application products and sophisticated hardware architectures have been developed and deployed for the sole purpose of supporting the applications that could make use of the RDBMS products. Data was shaped to fit, and management methodologies were adapted to, the structures and limitations of the available products. This was because data professionals and their management believed that these products provided the most effective and safest way to manage and protect critical data. Throughout this era the data controlled the company and the culture.