The New Data Analytics: Riding on Data Lakes, Data Warehouses, and Clouds


Structured data repositories such as data warehouses may be essential within businesses turning to data-intensive approaches such as AI and machine learning. “Many analytics jobs running in an environment like Spark create structured features from unstructured data,” said VanHook. “As features like this feed more machine learning models, as well as traditional reporting, the need for a structured data store to hold the composite features emerges. You can imagine traditional data warehouse capabilities evolving to hold feature tables and becoming a high-performance feature store that can drive both the training and inference activities of machine learning models. At the bottom line, warehouses become the querying point, while lakes are the analytics point.”

The data within data warehouses is generally trusted as the central version of truth because it’s highly curated and processed, said Anjan Kundavaram, chief product officer at Precisely. “For analytics, the structured format of data warehouses makes it easier for standardized access, queries, and reporting. The predetermined structure also offers ready-to-use, clean data that is ideal for organizations that need to conduct operational analysis or reporting.”

The platform “should have a data fabric to drive data flow orchestration and automation to deliver information and intelligence to users,” Winfield said. “The platform will also need shared management and security services and support for a range of clients to meet the application development requirements for different users, including data engineers, data scientists, business analysts, and business users.”

Data fabrics also offer a way to bring these environments closer together to deliver analytics as needed. “Today’s data warehouses are collecting immense amounts of data, more than may have been anticipated when these technologies were originally implemented,” Gnau said. “While data lakes have helped organize this raw data into central repositories, they still are not typically involved in operational and transactional data flows. This is where modern data architectures, such as data fabrics, come into play. Not only do data fabrics effectively organize the datasets into fields that help identify the most actionable and high-quality resources but each one [also] tends to meet a unique, IT-driven purpose. Without a well-orchestrated architecture, the data remains either inaccessible and wasted or not efficiently addressable, regardless of where it sits within the data lake or warehouse.”


Are analytical platforms such as data warehouses, lakes, or lakehouses going to the cloud? Are there scenarios where on-premises approaches are still preferable? A recent survey of IT leaders found that the majority, 53%, see hybrid or multi-cloud data warehousing as one of the most important data warehousing-related trends of this year, more than any other trend. The question isn’t really about “why” to use cloud anymore, said Minnick. “Increasingly, we’re seeing customers now ponder ‘which’ clouds.” Minnick noted that the majority of Databricks’ enterprise customers work with at least two cloud providers today. “As a result, it’s become much more important that organizations adopt solutions that offer a consistent experience for their employees, regardless of where the data resides.”
