Until recently, industry pundits were relegating data warehouse technology to the backwaters of the big data revolution. But a funny thing is happening on the way to the big data future. The enterprise data warehouse is undergoing a rapid evolution: It’s faster, more analytical, more open, and more cloud-friendly. Even new-platform proponents have begun to acknowledge that Hadoop and NoSQL alone can’t effectively present data that is well-vetted, indexed, and digestible to core business applications and decision makers. It takes a data warehouse.
Data managers say most of their big data activity at this time is coming out of enterprise data warehouses and associated business intelligence environments. Close to half of data executives responding to a Unisphere Research/Information Today, Inc. survey on big data say that analytical data and related data warehouse environments are strong factors in the expansion of big data in their organizations. This is topped only by growing business demand and the “push to compete on analytics,” which is fueling the need to prep and store data within analytical platforms and tools (“Achieving Enterprise Data Performance: 2013 IOUG Database Growth Survey,” underwritten by Oracle, July 2013).
Enterprise data warehouses aren’t going away anytime soon. Despite claims that Hadoop will usurp the role of data warehousing, Hadoop needs data warehouses, just as data warehouses need Hadoop.
However, making the leap from established data warehouse environments—the kind most companies still have, based on extract, transform, and load (ETL) inputs feeding a relational data store with query and analysis tools—to the big data realm isn’t a quick hop. For starters, data warehouses simply don’t have the flexibility of newer big data tools and frameworks: Because warehouses are bound to predefined schemas and data models, accommodating new forms of big data analysis means time-consuming realignments. At the same time, efforts around unstructured data analysis may be taking place outside the rigor of established data warehouse operations, in more informal modes not tied to business monitoring or metrics.
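The rigidity gap can be illustrated in a few lines. The following is a minimal sketch, not drawn from any particular product: it uses Python’s standard-library sqlite3 module to stand in for a schema-on-write warehouse table and plain JSON lines to stand in for a schema-on-read big data store; the table and field names are hypothetical.

```python
import json
import sqlite3

# Schema-on-write: the warehouse table must be defined up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.execute("INSERT INTO sales VALUES ('east', 1200.0)")

# A record carrying an unanticipated attribute cannot be loaded as-is;
# the schema itself must first be realigned (an ALTER TABLE here,
# often a full remodel in practice).
new_record = {"region": "west", "amount": 900.0, "sensor_id": "A7"}
try:
    conn.execute("INSERT INTO sales VALUES (?, ?, ?)",
                 tuple(new_record.values()))
except sqlite3.OperationalError as err:
    print("warehouse load failed:", err)

# Schema-on-read: a big data store keeps raw records and applies
# structure only at analysis time, so a new attribute costs nothing.
raw_store = [json.dumps(new_record)]
for line in raw_store:
    rec = json.loads(line)
    print(rec.get("sensor_id", "n/a"), rec["amount"])
```

The same asymmetry is what makes Hadoop-style stores attractive for exploratory analysis, and what makes the warehouse’s fixed schema valuable once the data is vetted and the questions are known.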
To successfully integrate data warehouses into today’s unstructured data world, organizations need to rethink their enterprise data architecture and identify the business goals that will be accomplished through such efforts. The good news is that the next-generation data warehouse that is evolving in conjunction with big data initiatives doesn’t require a total restructuring of the data environment. Instead, it builds upon existing relational data warehouse technology, with the data warehouse either remaining at the core of the enterprise data network, or playing a role as a powerful citizen of that network.
Best practices that will help enterprises move into this new realm include the following:
Let the business drive data priorities.
Cost isn’t the only factor when it comes to selecting data technology, but it’s an important one. The business needs to set the tone for how much, and where, it is willing to invest in various types of data. Storing and managing data for analysis through data warehouse environments often carries greater cost, since these environments are purpose-built to extract, transform, and load secure, vetted data on a batch-processing basis. Many big data projects, on the other hand, may simply require on-the-fly analysis against large sets of sensor-generated data. Such data wasn’t even on the radar screen when data warehousing was the only option for analysis, and it often got discarded.
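The cost contrast can be made concrete: rather than landing every reading in a warehouse through batch ETL, sensor data is often summarized in a single pass and the raw records discarded. A minimal Python sketch under that assumption (the stream shape and function name are hypothetical):

```python
from collections import defaultdict

def rolling_summary(readings):
    """One-pass, on-the-fly aggregation: per-sensor average temperature,
    computed without ever storing the raw readings."""
    count = defaultdict(int)
    total = defaultdict(float)
    for sensor_id, value in readings:
        count[sensor_id] += 1
        total[sensor_id] += value
    return {sid: total[sid] / count[sid] for sid in count}

# A small stand-in for a large sensor stream.
stream = [("A7", 21.5), ("B2", 19.0), ("A7", 22.5)]
print(rolling_summary(stream))  # {'A7': 22.0, 'B2': 19.0}
```

Only the summary would ever be a candidate for the warehouse; the raw feed never incurs warehouse storage or ETL cost.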