New tools and technologies are becoming established in enterprises, helping organizations extract more value from data. To explore the ways that data-centric companies are managing their data more effectively, we asked industry experts about the trends that loom large in the future and those that appear to be waning.
Data Lakes and the Rise of Spark
Data lakes, which can handle large, diverse sets of data in their original state, continue to expand, and organizations are increasingly using Spark to process and analyze that data. While Hadoop MapReduce is still used for big data processing, many industry experts say that its use is leveling off.
“Most Global 2000 companies have data lake initiatives underway, and many of these deployments have moved beyond exclusive use of MapReduce to use a mix of MapReduce and Apache Spark as their primary means of processing and analyzing data,” said Kelly Stirman, CMO and VP, strategy, at Dremio. “When these companies have moved to cloud deployments on AWS and Azure, many are using object stores like Amazon S3 and Azure Data Lake Store instead of HDFS [Hadoop Distributed File System] to store their data, while processing of the data is handled by Spark and some cloud-native processing services. Because the data lake does not provide sufficient performance and concurrency for BI workloads, many companies move data from their data lake into a data warehouse such as Teradata on-prem, and cloud services such as Redshift and Azure SQL Data Warehouse.” In addition, Stirman noted, data-as-a-service platforms are being used with the data lake and the data warehouse to provide a uniform access layer that is capable of joining and accelerating data between the two.
Spark is being adopted because its power and capabilities are “significantly more advanced” than most analytics tool sets available within the Hadoop ecosystem, agreed Brian Schwarz, VP of product management at Pure Storage, who also pointed out that Spark’s core strengths are its flexibility and its power to simplify companies’ environments. “Spark is generally perceived as the most dexterous and agile technology for big data processing given that it can handle large amounts and different types of analytics.” Instead of deploying three or four smaller, more specialized tools, the libraries within Spark give companies wide flexibility to handle tasks such as SQL queries or other types of unstructured analysis, pattern matching, machine learning (ML) library searches, and regression analysis, said Schwarz.
Hadoop is still an important foundational technology that many customers are using within their data lake with Apache Spark, stated Mike Lehmann, VP of product management, Oracle. However, he added, “Oracle is seeing a transition happening where there is less of a dependence on the core Hadoop ecosystem and more focus on putting data directly into cloud object storage, doing data processing and manipulation using Apache Spark, real-time interactive queries directly against the data lake, and the overall infrastructure running on cloud-native foundations such as Kubernetes.”
Analytics Gets Closer to the Edge
A cacophony of streaming data from devices and sensors is contributing to the demand for real-time analytics, which necessitates edge processing, executives noted.
The need for real-time analytics is driven by IoT, said Avanti Sané, product marketing, Internet of Things, VMware. “There are now millions of devices across the globe transmitting billions of bits of data. The majority of data being generated by these devices is time-sensitive, but by the time a piece of data is sent up to the cloud and brought back down, it has already become obsolete,” Sané said. As a result, edge computing, where processing occurs at the extremes of a network near where the data is generated, “greatly accelerates” the steps, making it critical to real-time analytics. “In addition, the advent of 5G will bring faster networks that are capable of accommodating mass volumes of data and will thus bring IoT adoption further into the mainstream,” she noted.