The New Data Analytics: Riding on Data Lakes, Data Warehouses, and Clouds

Page 2 of 5

In all scenarios, business requirements should be the ultimate determinant, and this is very much the case with data analytics platforms. “Ultimately, it is always about how well business challenges, such as customer churn, fraud detection, or security threat perception, are addressed,” said Sri Raghavan, director of data science and advanced analytics, product marketing for Teradata. “Data platforms that are also accompanied by a palette of analytics capabilities—algorithms, visualizations, workflows—that can be used by a wide range of personas will always be dominant.”

There are considerable choices above and beyond the data environment itself. “Today’s analytics requirements range from real-time and time-series-based analysis right down to standard BI,” said Kelker. Some analytics processes are too expensive to be done in the cloud due to ingestion costs, and, therefore, they require edge AI solutions with local algorithms and tinyML (tiny machine learning).
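The cost argument here can be made concrete. One common pattern is to run a lightweight model on the device and upload only summaries and anomalies, so the raw stream never incurs cloud ingestion fees. The sketch below illustrates the idea with a simple z-score test standing in for a tinyML model; the names (`THRESHOLD`, `process_window`, the `upload` callback) are illustrative, not from any particular edge framework.

```python
# Sketch: edge-side filtering so only salient events incur cloud ingestion cost.
# The z-score check stands in for an on-device (tinyML-style) model.
from statistics import mean, stdev

THRESHOLD = 3.0  # flag readings more than 3 standard deviations from the mean

def summarize_window(readings):
    """Reduce a window of raw sensor readings to a compact summary."""
    return {"mean": mean(readings), "stdev": stdev(readings), "count": len(readings)}

def anomalies(readings):
    """Run the local 'model' (here a simple z-score test) on the device."""
    m, s = mean(readings), stdev(readings)
    return [r for r in readings if s and abs(r - m) / s > THRESHOLD]

def process_window(readings, upload):
    """Upload only the summary plus anomalies, not the raw stream."""
    payload = summarize_window(readings)
    payload["anomalies"] = anomalies(readings)
    upload(payload)  # a fraction of the bytes the raw window would cost
```

In this sketch, a window of 10,000 raw readings collapses to one small payload, which is the trade-off the edge AI argument rests on.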

“The days of master data management are over,” Kelker stated. The focus has moved away from data models toward algorithms, he noted. “The explosion of external data fields intermeshed with internal data makes this increasingly difficult to design upfront. Modern concepts, such as streaming architectures and data meshes, are blending the worlds of data storage and analytics together.”
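The “blending” Kelker describes can be pictured as a single ingest path that both persists each event and folds it into running aggregates, instead of landing data first and analyzing it in a later batch. The following is a minimal sketch of that idea; the event shape and field names (`customer_id`, `amount`) are hypothetical.

```python
# Sketch: one ingest path serves both storage (append-only log) and
# analytics (incrementally maintained aggregates), in the same pass.
from collections import defaultdict

class StreamingStore:
    def __init__(self):
        self.log = []                     # storage side: append-only event log
        self.totals = defaultdict(float)  # analytics side: running sums per key
        self.counts = defaultdict(int)

    def ingest(self, event):
        self.log.append(event)            # persist the raw event
        key = event["customer_id"]        # and update analytics immediately
        self.totals[key] += event["amount"]
        self.counts[key] += 1

    def running_average(self, key):
        """Answer an analytic query without a separate batch load."""
        return self.totals[key] / self.counts[key]
```

Real streaming platforms add durability, partitioning, and windowing on top of this pattern, but the core point is the same: the analytic state is maintained as data arrives.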

The main issue is that “organizations need real-time, actionable insights to inform critical decision making,” said Scott Gnau, VP of data platforms at InterSystems. “Seamless, cross-silo access to the right data at the right time is difficult due to increasing complexity and latency challenges. Scalable, high-performance data platforms that connect distributed data to the composable stack need to serve as the foundation of modern analytics strategies.”


What kind of role is emerging for today’s data warehouses, and how have data lakes shaped this role? It’s important that “data warehouses and data lakes operate in unison if businesses want to stay ahead of the game,” said Adams. “Data warehouses typically ingested information from relational databases that was then extracted by business intelligence tools for further analysis.” Data lakes have created “an undercurrent for warehouses,” whereby they are now able to store all business data—contacts, user information, documents, pictures, logs—or any other data the business and its users generate, he noted. “While having a breadth of diverse information in data lakes makes transforming data a more difficult task, it gives organizations a wealth of information that was previously inaccessible.”

A converged data warehouse-lake architecture is the best path forward for supporting increasingly complex analytic data environments, Raghavan said. “Data lakes, or data swamps, require robust solutions to understand, search, and analyze the data in a context-sensitive manner, while not losing the associated lineage and provenance information,” he pointed out. “Data warehouses have been rearchitected to meet the emerging need of analytics that is near real-time and can handle large volumes of data.”

Raghavan also pointed out that “today’s data warehouses have become high-efficiency, super-compute clusters where not only are ETL processes used to deliver clean data but also combined with state-of-the-art feature engineering and modeling capabilities to deliver high performance models and operationalizations at scale.” Data lakes have contributed to these super data warehouses “by simply increasing the volume and the breadth of data that could be ingested into a data warehouse. The presence of a loosely coupled compute-storage architecture ensures that subsets of the data can be selected for ETL [processes] and more production-ready work within the warehouse.”
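The pipeline Raghavan describes, selecting a subset from the lake, cleaning it via ETL, then engineering features in the warehouse, can be sketched in a few lines. The rows and column names below are invented for illustration, and a real system would push the selection predicate down to the storage layer rather than filter in application code.

```python
# Sketch: subset selection from a lake, then ETL cleaning, then feature
# engineering, mirroring the decoupled compute-storage flow in the text.

def select_subset(lake_rows, predicate):
    """Compute reads only the slice it needs from decoupled storage."""
    return [row for row in lake_rows if predicate(row)]

def etl_clean(rows):
    """'Deliver clean data': drop incomplete rows, normalize types."""
    return [
        {**row, "amount": float(row["amount"])}
        for row in rows
        if row.get("amount") is not None
    ]

def engineer_features(rows):
    """Feature engineering alongside ETL, inside the warehouse."""
    for row in rows:
        row["is_large"] = row["amount"] > 100.0
    return rows

# Hypothetical lake contents: mixed quality, mixed regions.
lake = [
    {"region": "eu", "amount": "250.0"},
    {"region": "us", "amount": "40.0"},
    {"region": "eu", "amount": None},
]
features = engineer_features(
    etl_clean(select_subset(lake, lambda r: r["region"] == "eu"))
)
```

Because compute and storage are loosely coupled, only the selected subset ever moves into the warehouse's compute tier; the rest of the lake stays untouched.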
