
Big Data: The Battle Over Persistence and the Race for Access Hill

Page 2 of 4


Decades ago, multidimensional databases, or MOLAP cubes, were optimized to persist and work with data differently than row-based relational database management systems (RDBMSs) did. It wasn't just about representing data in star schemas derived from a dimensional modeling paradigm (both of which are very powerful) but about how that data should be persisted when you knew how users would access and interact with it. OLAP cubes delivered the first highly interactive user experience: the ability to swiftly "slice and dice" through summarized dimensional data, a behavior that relational databases could not provide given the price-performance of computing resources at the time.
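A MOLAP cube essentially precomputes these aggregates across every dimension; the interactive operation itself can be sketched in a few lines. The sales records, dimension names, and the `slice_and_aggregate` helper below are all hypothetical, purely to illustrate what "slice and dice" means:

```python
from collections import defaultdict

# Hypothetical fact records with three dimensions: region, product, quarter.
sales = [
    {"region": "East", "product": "Widget", "quarter": "Q1", "amount": 100},
    {"region": "East", "product": "Gadget", "quarter": "Q1", "amount": 150},
    {"region": "West", "product": "Widget", "quarter": "Q1", "amount": 200},
    {"region": "West", "product": "Widget", "quarter": "Q2", "amount": 250},
]

def slice_and_aggregate(records, dimension, filters=None):
    """Filter records on fixed dimension values (the "slice"),
    then roll up amounts along another dimension (the "dice")."""
    filters = filters or {}
    totals = defaultdict(int)
    for rec in records:
        if all(rec[d] == v for d, v in filters.items()):
            totals[rec[dimension]] += rec["amount"]
    return dict(totals)

# Slice to Q1, then roll up by region.
print(slice_and_aggregate(sales, "region", {"quarter": "Q1"}))  # {'East': 250, 'West': 200}
```

A cube makes this fast by materializing such rollups ahead of time, which is exactly the persistence trade-off the article describes.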

Persisting data in two different data stores for different purposes has been part of BI architecture for decades. Today's debates, however, challenge a core assumption about transactional and analytical workloads: in the near future, both could be run from the same data store.

The New Big Data Paradigm: Data Is Data

The NoSQL family of data stores was born out of the business demand to capitalize on the orders-of-magnitude growth in data volume and complexity inherent to instrumented data acquisition: first from websites and search engines tracking your every click, then from the mobile revolution tracking your every post. What's different about NoSQL and Hadoop is the paradigm on which they're built: "Data is data."

Technically speaking, data is free; what costs money, and what feeds return-on-investment calculations, is the infrastructure needed to store and access it. So, tackling the orders of magnitude that big data represented required software designed for the lowest-cost infrastructure, operating costs, and footprint: the lowest capital cost of servers, the lowest data center costs for power and cooling, and the highest density of servers packed into the smallest space. With the "data is data" mantra, we don't need to understand how the data should be structured beforehand, and we accept that the applications creating the data may continuously change its structure or introduce new data elements. At the heart of this abstraction and flexibility is the key-value pair, and this simple elemental data unit enables the highest scalability.
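The "data is data" idea can be made concrete with a minimal sketch. The in-memory dictionary standing in for a key-value store, and the event records in it, are assumptions for illustration, not any particular product's API:

```python
import json

# A stand-in for a key-value store: keys map to opaque values
# (here, JSON documents serialized to strings).
store = {}

# An early application version writes clickstream events with one shape...
store["event:1"] = json.dumps(
    {"user": "alice", "action": "click", "url": "/home"})

# ...a later version adds new fields; no schema migration is required,
# because the store never imposed a structure in the first place.
store["event:2"] = json.dumps(
    {"user": "bob", "action": "post", "url": "/feed",
     "device": "mobile", "geo": {"lat": 40.7, "lon": -74.0}})

# Readers deal with structure at access time ("schema on read").
for key in sorted(store):
    event = json.loads(store[key])
    print(key, event["user"], event.get("device", "unknown"))
```

Because the store treats every value as an opaque blob keyed by a string, it can be partitioned across many cheap servers by key, which is where the scalability claim comes from.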

A Modern Data Platform Has Emerged

The Battle Over Persistence principle argues that there are multiple databases (or data technologies), each with its own clear strengths and each best suited to particular kinds of data and particular workloads. For now, the pendulum has swung back to distributed and federated data architecture. We can embrace the flexibility and overall manageability of big data platforms such as Hadoop and MongoDB. Entity-relationship-modeled data in enterprise data warehouses and master data management systems fuses consistent, standard context into schemas and supports the temporal aspects of richly attributed reference data to fuel analytics. Analytics-optimized databases, such as columnar, MPP (massively parallel processing), appliance, and even multidimensional databases, can be combined with in-memory databases, cloud computing, and high-performance networks. Separately, highly specialized NoSQL and analytic databases, such as graph databases, document stores, and text-based analytic engines, have their place, with their workloads executed natively in those specialized engines.


For more articles related to big data, download DBTA's Big Data Sourcebook.

