What to Look for When Modernizing the Data Lake

Data lake adoption has more than doubled over the past three years. The technologies and best practices surrounding data lakes continue to evolve – and so do the challenges.

Currently in use by 45% of DBTA subscribers to support data science, data discovery, and real-time analytics initiatives, data lakes are still underpinned by Hadoop in many cases, although cloud-native approaches are on the rise. From data governance and security to data integration and architecture, new approaches are required for success.

DBTA recently held a webinar with Ali LeClerc, director of product marketing, Alluxio, and Ritu Jain, director of product marketing, Qlik, who discussed how leading companies are optimizing their data lakes for speed, scale, and agility.

The drivers behind enterprise data lakes include:

  • Speed to store data and provide rapid access
  • Lower cost than traditional data warehousing
  • Scale and flexibility for the volume and variety of data
  • A single source of trusted, timely data for AI and ML

Data lakes can deliver trusted data for business insights without the constraints of data warehousing, according to Jain.

However, data lakes have a high failure rate when it comes to delivering these benefits. Qlik's data integration offering for data lake creation can pinpoint the pain points and identify the gaps and requirements for a data lake that can deliver, Jain said.

Data lake architectures are evolving, so users need to choose a solution that adapts to their company's changing and growing needs. AI and ML require continuously updated data, Jain explained. Organizations should evaluate metadata management, data lineage, security, and governance when choosing a data lake. And establishing IT and business alignment is crucial.

The challenges to independently scaling data-driven workloads include data locality, data abstraction, and data accessibility, according to LeClerc.

To reach a truly independent scaling of the data stack, a new layer needs to emerge between compute and storage. That’s where Alluxio comes in, said LeClerc.

The Alluxio platform enables any framework to run on data stored anywhere, LeClerc explained. Alluxio makes remote data local to the compute without creating copies. Data is immediately available, delivering quicker data-driven insights, and more cloud computing power can be applied to solve problems faster.
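The idea of a layer that makes remote data local to the compute can be illustrated with a read-through cache. The sketch below is a minimal, hypothetical illustration of the pattern, not Alluxio's actual API; the class names `RemoteStore` and `CacheLayer` are invented for this example.

```python
# Illustrative sketch of a data layer between compute and storage.
# Hypothetical names only -- this is NOT Alluxio's API.

class RemoteStore:
    """Stands in for slow, remote storage (e.g., cloud object storage)."""
    def __init__(self, objects):
        self._objects = objects
        self.reads = 0  # counts remote round-trips

    def get(self, key):
        self.reads += 1
        return self._objects[key]


class CacheLayer:
    """Caches remote objects locally so repeat reads skip the network."""
    def __init__(self, remote):
        self._remote = remote
        self._cache = {}

    def get(self, key):
        if key not in self._cache:            # cache miss: fetch once
            self._cache[key] = self._remote.get(key)
        return self._cache[key]               # cache hit: served locally


remote = RemoteStore({"events.parquet": b"raw bytes"})
layer = CacheLayer(remote)
layer.get("events.parquet")  # first read goes to remote storage
layer.get("events.parquet")  # repeat read is served from the local cache
print(remote.reads)          # -> 1: only one remote round-trip occurred
```

The point of the pattern is that compute frameworks keep reading through one interface while the layer decides, transparently, whether data must travel over the network.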

An archived on-demand replay of this webinar is available here.