While there have been many questions about the future of data warehouses, industry observers tend to agree that they are evolving and their role is just as crucial as ever.
There is speculation today about how data warehouses and data lakes will share the enterprise stage.
Data lakes, repositories that store diverse datasets without requiring a model to be built first, have seen adoption rise as data managers seek ways to rapidly capture and store data from a multitude of sources in various formats. Overall, 38% of organizations in a recent Unisphere Research survey are employing data lakes as part of their data architecture, up from 20% in the 2016 survey. Another 15% are currently considering adoption.
At the same time, data warehouses aren’t going away anytime soon because they “allow analysts to slice and dice the data to determine important insights derivable for a variety of purposes, such as driving better decisions, and laying out better organizational and pricing strategies,” according to VoltDB CTO Dheeraj Remella. “Not all insights need to be utilized in a real-time manner.”
Gerrit Kazmaier, EVP of HANA and analytics at SAP, agreed that data warehouses continue to play a vital role in enterprises. “Every enterprise needs a consistent and interconnected view of its data across its operations, customers, suppliers, and employees to understand its business. To make this possible, it is crucial to consider the key parts of the data value chain: the storage, quality and semantics, and usage of the data. Without a comprehensive, secure, consistent, and fast data management system, essential data loses its value and can even become a burden on a business.”
Traditional data management systems are still deeply embedded within enterprises, added Kim Kaluba, senior manager of data management solutions at SAS. Such systems “will continue to be viable offerings because of their maturity and strong user community, as well as plentiful resources to support these environments.” Nonetheless, with emerging players and technologies, data warehousing solutions will have to get cheaper and more efficient, Kaluba observed.
For companies that don’t have a data warehouse implementation, “the outlook appears quite different, with a ‘data lake first’ strategy generally taking precedence,” said Rob Small, principal consultant with Dell Technologies Consulting. “For the time being, data warehouses will remain at the core of enterprises, serving as the source of truth for operational reporting and BI. What is changing drastically is the underlying technical and data management architecture.” There is a broad, cross-industry trend to move away from the traditional vendor offerings and to migrate to different platforms and alternative data integration strategies to reduce cost, improve operational efficiency, and better support user needs around self-service analytics, Small said.
Data warehouses play a functional role in helping organizations to have data integrated in an understandable format that is efficient for queries and in capturing all business rules, added Pedro Desouza, principal for solutions at Dell Technologies Consulting. “The technology part is about how it is implemented. The functional part will remain, but the technology part will change dramatically with every new wave of technological innovations.”
ENTER THE DATA LAKE
Are data lakes—enterprise repositories that capture and organize information before it is transformed for consumption—taking the place of data warehouses? Opinions vary across the industry as to how these two environments will mesh—or clash. According to some experts, there may be a great melding underway. “Data warehouses are in the process of becoming part of data lakes,” said Desouza. “There will be no such thing as one or the other anymore.”
“Data warehouses aren’t necessarily being positioned against data lakes, they’re working in tandem,” Pete Brey, senior product marketing manager at Red Hat, concurred. However, he added, they organize data in different ways. Data lakes are an overarching technology that can accommodate all types of data—structured and unstructured. Data warehouses are specifically designed for structured data.
Not all industry leaders are convinced that data lakes are ready to storm the enterprise, however. “Data lakes are billed as the repositories for big data—data that is too large and diverse, semi-structured and unstructured, to be stored in relational databases,” said Monte Zweben, CEO of Splice Machine. Proponents of data lakes positioned schema-on-read as an advantage so businesses no longer had to worry about the tedious process of defining which tables contained what data and how they were connected to each other, a process that took months and did not allow a single data warehouse query to be executed before it was complete, he said. “We believe the build-it-and-they-will-come philosophy of schema-on-read has failed.”
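For readers less familiar with the terms, the difference between schema-on-write (the warehouse approach) and schema-on-read (the lake approach) can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the sample records, the field names, and the two helper functions are assumptions, not drawn from any product or source quoted here.

```python
import json

# Hypothetical raw events as they might land in a data lake.
# Note that the records do not share a common set of field names.
RAW_EVENTS = [
    '{"customer": "A17", "amount": "19.99", "channel": "web"}',
    '{"customer": "B02", "amount": 5}',
    '{"cust_id": "C33", "total": "12.50"}',
]


def load_schema_on_write(lines):
    """Warehouse style: enforce a fixed schema before anything is stored."""
    accepted, rejected = [], []
    for line in lines:
        record = json.loads(line)
        if "customer" in record and "amount" in record:
            # Conform the record to the agreed model (names and types).
            accepted.append({
                "customer": str(record["customer"]),
                "amount": float(record["amount"]),
            })
        else:
            # Records that do not fit the model are kept out.
            rejected.append(line)
    return accepted, rejected


def total_schema_on_read(lines):
    """Lake style: keep everything as-is and interpret it at query time."""
    total = 0.0
    for line in lines:
        record = json.loads(line)
        # The reader must know every variant of the field name.
        value = record.get("amount", record.get("total", 0))
        total += float(value)
    return total


accepted, rejected = load_schema_on_write(RAW_EVENTS)
warehouse_total = sum(row["amount"] for row in accepted)
lake_total = total_schema_on_read(RAW_EVENTS)
```

In this sketch the modeling effort is paid up front in the first function and deferred to every query in the second, which is the trade-off at the center of the disagreement described above.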
SELECTING ONE OR THE OTHER
When is it preferable to rely on a data warehouse instead of a data lake for enterprise insights, and vice versa? It depends on the situation, as both bring advantages to the table. Data lakes, for instance, are a great source of primary data, said Jake Freivald, VP at Information Builders. “They’re ideal for non-repeated processes where users already know the data’s context, such as an AI model that looks at raw data to discover new patterns in it. A data warehouse has assumptions already built into its model, which would distort attempts to find new models.”