Classically, the systems selected to manage and protect the most critical asset of any company were chosen using a narrow set of criteria. Those criteria were usually based on, and limited by, the capabilities and limitations of the most prominent tools available when the initial architecture was developed. As the lifecycle of a data system progressed, attempts were often made to move a company’s data from older, out-of-date software systems and hardware platforms to more modern technologies. The process was expensive and cumbersome in every conceivable dimension. Data could not flow between applications according to an application’s interest in that data or the data’s relevance to the application’s logical requirements. It was therefore unnecessarily restricted and, consequently, minimally accessible.
The Asteroid on the Way – Data’s K-T Boundary
Sixty-five million years ago, a massive asteroid or comet struck the earth near the Yucatan Peninsula in Mexico, bringing about the end of the 150-million-year reign of the dinosaurs. The sudden transition between the two geologic periods is known as the Cretaceous-Tertiary (K-T) boundary. Today, information technology is experiencing a similar transition. Although the planet will not be set on fire for a decade, the profundity and abruptness of the transformation are no less significant, as the dinosaurs of data must adapt to a massive evolutionary shift. In the case of IT, the asteroid is the massive volume of data being generated at an ever-increasing rate from a variety of disparate sources. The asteroid’s target is the legacy RDBMS.
For many decades, data has been ruled by products based on the relational data model, known as relational database management systems (RDBMS). The relational model is simply a mathematical model that describes how data relates, but it is not the sole proprietor of data in general, at least not in the 21st century. In fact, the most significant factor in this evolutionary development, and our metaphorical asteroid, is the pervasive presence of unstructured data.
Unstructured data comes in an endless number of possible configurations, including pictures, videos, audio recordings, PDF files, spreadsheets, documents and many other forms we have yet to conceive. Sometimes unstructured data lives within a database. Sometimes the database acts as an index for the unstructured data. Often the metadata (information about the data) associated with the unstructured data is larger than the data itself. Consider the example of a set of videos. Although the files may be small in size, the information stored about the content of a particular video may be enormous.
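The idea of a database acting as an index for unstructured files can be sketched in a few lines. This is only an illustration under stated assumptions: the table name `media_index`, the helper functions, the file paths and the sample metadata are all hypothetical, and an in-memory SQLite database stands in for whatever system would hold the catalog in practice.

```python
import json
import sqlite3

# Hypothetical catalog: the database holds a pointer to each file plus its
# metadata, while the unstructured bytes themselves live somewhere else.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE media_index ("
    " path TEXT PRIMARY KEY,"
    " size_bytes INTEGER NOT NULL,"
    " metadata TEXT NOT NULL)"
)


def register(path, size_bytes, metadata):
    """Record where a file lives and what is known about its content."""
    conn.execute(
        "INSERT INTO media_index VALUES (?, ?, ?)",
        (path, size_bytes, json.dumps(metadata)),
    )


def find_by_tag(tag):
    """Return the paths of files whose metadata lists the given tag."""
    rows = conn.execute("SELECT path, metadata FROM media_index ORDER BY path")
    return [path for path, meta in rows if tag in json.loads(meta)["tags"]]


def metadata_heavier_than_file():
    """Return the paths whose stored metadata is longer than the file itself."""
    rows = conn.execute(
        "SELECT path FROM media_index"
        " WHERE length(metadata) > size_bytes ORDER BY path"
    )
    return [path for (path,) in rows]


# Made-up sample entries: a tiny clip with a long transcript, and a large
# clip with almost no descriptive information.
register(
    "videos/clip_001.mp4",
    200,
    {"tags": ["interview"], "transcript": "word " * 100},
)
register("videos/clip_002.mp4", 5_000_000, {"tags": ["demo"]})

print(find_by_tag("interview"))
print(metadata_heavier_than_file())
```

In this sketch the files are never opened. Every question is answered from the metadata alone, which is why the description of a file can end up mattering more, and weighing more, than the file itself.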
Often the notion of “unstructured data” is referred to casually as “big data.” As the subject evolves, we may see a clearer definition of the term “big data.” Whatever the label, the trillions of files of disparate content, each of which has individual value to someone, somewhere, at some time, will only become more prevalent and dominant in 21st-century IT. So, although the relational model will likely survive, its dominance will ultimately be diminished.
The Future of Data Management – A Tower of Babel?
Data management is quickly evolving into many different forms, each of which has a particular value to add to the broader scope of data-dependent functionality. Modern IT systems must be able to ingest, access, store, manipulate and protect data within a wide array of disparate conditions and constraints. Often the strictures of an RDBMS preclude the flexibility, elasticity and alacrity that many modern business functions require. In some circumstances data must be accessed so quickly that the speed of light is considered an inconvenient limitation.
Certain business functions may require analysis of massive amounts of data, using a seemingly infinite assortment of parameters, performed within whimsically narrow time slices. Data also exists in an unlimited number of forms and structures. Each data management system recognizes a particular style of data with a fairly well-defined set of attributes and manages that data to satisfy a particular thematic business function. Figure 2, the “Database Landscape Map” being developed as an ongoing project by 451 Research, displays the overlapping and interconnected reality of modern data management systems and, in a single elegant graphic, highlights the daunting nature of selecting the appropriate technologies to manage each distinct set of data.
However, an ordered outcome is not guaranteed, and innovation is accompanied by risk and cost. Whether the overall evolution is contrived or organic, the rapid development of these new data management technologies has left them isolated from one another in terms of communication and management. If left unchecked, this will inevitably lead to each technology evolving its own “language,” including separate processes and protocols, making intercommunication between systems a challenge. Absent a methodology that allows for seamless, comprehensive management of all these forms of data and the smooth flow of that data between data management systems, a Tower of Babel will exist and the Dataverse will be entirely dysfunctional. A Unified Data Strategy must be employed to avoid this chaos. Part 2 of this article will introduce and describe “An Alternative Approach” to the long-standing methods of accumulating, managing and being managed by data.