Figure 5 offers an alternate view of the enterprise data architecture. We take representative physical technologies (memory, disks, clusters) and their software usage paradigms (in-memory complex event processing, relational databases and data warehouses, and Hadoop distributed computing).
The Y axis shows the data sizes handled by the representative technologies, and along the top we indicate the potential best-case time for handling the data. Enterprises have used relational databases (RDBMSs) as the “System of Record” to capture data persistently. However, RDBMSs have bottlenecks in the size of data they can handle. The traditional solution for scaling relational data is to run parallel database instances and “shard” the entire data space among them. For reporting, data warehouses are used: data is “de-normalized” from the RDBMS store into a form suited to reporting. Furthermore, advanced large data warehouses use specialized hardware and map “dimensions of data” onto cube-like structures so that reporting speeds can be improved.
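The sharding idea above can be sketched in a few lines. This is a minimal, hypothetical illustration of hash-based sharding, assuming a fixed number of parallel instances and a string record key; the names and the routing function are illustrative, not a specific product's API.

```python
import zlib

NUM_SHARDS = 4  # assumed number of parallel RDBMS instances

def shard_for(key: str) -> int:
    """Map a record key deterministically onto one of the instances.

    CRC32 is used here only because it is deterministic across runs;
    a real system would use its database's partitioning scheme.
    """
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS

# Route incoming records to per-shard buckets (stand-ins for instances).
shards = {i: [] for i in range(NUM_SHARDS)}
for customer_id in ["cust-001", "cust-002", "cust-003", "cust-004"]:
    shards[shard_for(customer_id)].append(customer_id)
```

The key property is that every record maps to exactly one instance, so the full data space is covered without any single instance holding all of it.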
Data value with representative technology
If we take the data value proposition curves from figure 1 and superimpose them onto the representative technologies used in enterprises, we obtain figure 6. The picture succinctly captures the value proposition of data alongside the enabling technologies.
There is a one-to-one correlation between the two. Highly valued data at creation time is handled by an in-memory program that tracks the event and, together with other events, makes a complex decision. Over time, the data is stored in databases and/or warehouses. Eventually, the data is sent to distributed clusters for historical and exploratory analytics using Hadoop technologies. We also gain an idea of the potential time scales for each technology used to process data over its life cycle as it moves through an enterprise.
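The first stage of that life cycle, an in-memory program combining events into a complex decision, can be sketched as follows. This is a toy illustration under assumed names (`Trade`, `PriceSpikeDetector`, a 10% deviation threshold), not a real CEP engine: it keeps a sliding window of recent events in memory and flags a composite condition as each new event arrives.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Trade:
    symbol: str
    price: float

class PriceSpikeDetector:
    """Hold a sliding window of recent prices in memory and flag a
    'complex event' when a new price deviates sharply from the
    window average (an assumed, illustrative rule)."""

    def __init__(self, window: int = 5, threshold: float = 0.10):
        self.prices = deque(maxlen=window)  # in-memory state only
        self.threshold = threshold

    def on_event(self, trade: Trade) -> bool:
        spike = False
        if self.prices:
            avg = sum(self.prices) / len(self.prices)
            spike = abs(trade.price - avg) / avg > self.threshold
        self.prices.append(trade.price)
        return spike

detector = PriceSpikeDetector()
events = [Trade("ACME", p) for p in (100.0, 101.0, 99.5, 100.5, 125.0)]
flags = [detector.on_event(t) for t in events]
# Only the last event deviates more than 10% from the running average.
```

In a real pipeline, events flagged here would trigger an immediate action while the raw records flow on to the database, warehouse, and eventually the Hadoop cluster for historical analysis.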