Page 1 of 4 next >>

The Data Warehousing Sanity Check

The “big data” era is still very much upon us, ushering in an age of constantly evolving technologies and techniques. Many wonder whether the enterprise data warehouse (EDW) still has relevance in the industry, particularly since many new alternatives exceed the technical capabilities of the traditional EDW at a drastically reduced cost. A year ago, I wrote that the EDW is still a sound concept, albeit one that needs to evolve. That sentiment is still true today; however, the rise of several groundbreaking technologies in the last year makes it clear that meaningful evolution is occurring.

At the core, all EDWs share the central concepts of integration and consolidation of data from disparate sources while governing that data to provide reliability and trust, enabling credible reporting and analytics. But in today’s world of high demand for analytics-driven business decisions, is credible reporting enough?

The Legacy of the Data Warehouse

Over the past few decades, building an EDW has proven to be a worthy but daunting undertaking, with a relatively low success rate. In fact, generally accepted survey data indicates that 70% of data warehouses ultimately fail. Of the 30% of “successful” EDWs, many will never achieve ROI or strong user acceptance. EDW failures can largely be attributed to legacy interpretations of the design and to the traditional waterfall software development lifecycle (SDLC) approach. A current trend that is helping EDW projects succeed is the use of more modern, agile techniques. These techniques allow EDW implementations to grow naturally and remain malleable as the central requirements for both data and business evolve.


Another point of failure is that the traditional EDW does not fulfill all of the data analytics needs of a modern organization. Many organizations, particularly large corporations, view the EDW as the sole solution for all data analytics problems. Consumers of data have been conditioned to believe that if they want analytics support, their only choice is to integrate data and business processes into the EDW program. This often leads to a situation in which extreme IT effort is put into modeling and loading new subject areas into a rigid, governed system before the true requirements and value of the data are known.

In many cases, the core design and technology of the EDW alone are simply not effective at solving the business problem at hand. Newer requirements are all ill-served by traditional EDW methods backed by relational database technology: analysis of semi-structured, semi-governed data; analysis of streaming data from IoT devices; network analysis; search; and data discovery and exploration.

Years ago, data mostly came from rigid, on-premises transaction systems backed by relational database technology. However, use cases such as those listed above have become more common in the era of big data. On-premises systems still exist, but many have moved to the cloud as SaaS offerings. Additionally, many no longer run on relational platforms, and our method of interaction with them is often via APIs with JSON or XML responses.
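To make the mismatch concrete, here is a minimal sketch of what it takes to fit a nested JSON API response into the flat rows a relational EDW table expects. The payload, its field names, and the `flatten_orders` helper are all hypothetical, invented for illustration; they do not describe any particular SaaS API.

```python
import json

# Hypothetical payload, shaped like a response a SaaS API might return.
# Note that the second order has no "region" field at all.
payload = json.loads("""
{
  "orders": [
    {"id": 1, "customer": {"name": "Acme", "region": "EU"},
     "items": [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 1}]},
    {"id": 2, "customer": {"name": "Globex"},
     "items": [{"sku": "A-1", "qty": 5}]}
  ]
}
""")


def flatten_orders(doc):
    """Turn nested order documents into one flat row per line item."""
    rows = []
    for order in doc.get("orders", []):
        customer = order.get("customer", {})
        for item in order.get("items", []):
            rows.append({
                "order_id": order["id"],
                "customer_name": customer.get("name"),
                # Optional fields become None (NULL) in a fixed-column row.
                "customer_region": customer.get("region"),
                "sku": item["sku"],
                "qty": item["qty"],
            })
    return rows


rows = flatten_orders(payload)
for row in rows:
    print(row)
```

Even in this small case, the fixed set of columns has to be decided up front, and any field the source adds later is silently dropped until the model is changed. That is the kind of modeling effort the rigid, governed approach demands before the value of the data is known.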
