Uncovering the Best Data Lake Strategy

Jun 4, 2018

By Stephanie Simone

As Hadoop adoption in the enterprise continues to grow, so does commitment to the data lake strategy.

DBTA recently held a webinar with Mark Van de Wiel, CTO, HVR, Dale Kim, Sr. director, products/solutions, Arcadia Data, and Rick Golba, product marketing manager, Percona, who discussed unlocking the power of the data lake.

Data lakes organize large, diverse sets of data, enable access to data with minimal latency, store data in its raw, detailed state, and support multiple use cases and architectures, Van de Wiel explained.

The three challenges of adopting a data lake are continuous feed, security, and trusting the data. To overcome these issues, Van de Wiel recommended:

Continuous feed: Log-Based CDC
Security: Encryption - Certificates
Trust the data: Data Compare
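
To make the "data compare" idea concrete, here is a minimal sketch of checking that rows copied into a data lake still match their source. It is an illustration only and is not described in the webinar: the function names (row_digest, compare_tables), the sample rows, and the assumption that the first column is a unique key are all hypothetical.

```python
import hashlib


def row_digest(row):
    """Return a stable SHA-256 digest of a row's values."""
    # Join the repr of each value with a separator unlikely to appear in data.
    payload = "\x1f".join(repr(value) for value in row)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compare_tables(source_rows, target_rows, key_index=0):
    """Compare two row sets by key and report missing, extra, and changed keys."""
    source = {row[key_index]: row_digest(row) for row in source_rows}
    target = {row[key_index]: row_digest(row) for row in target_rows}
    return {
        # Keys present in the source but absent from the target.
        "missing": sorted(k for k in source if k not in target),
        # Keys present in the target but absent from the source.
        "extra": sorted(k for k in target if k not in source),
        # Keys present in both whose row contents differ.
        "changed": sorted(
            k for k in source if k in target and source[k] != target[k]
        ),
    }


# Hypothetical sample data: (id, name, amount)
source_rows = [(1, "alice", 10.0), (2, "bob", 20.0), (3, "carol", 30.0)]
target_rows = [(1, "alice", 10.0), (2, "bob", 25.0), (4, "dave", 40.0)]

report = compare_tables(source_rows, target_rows)
print(report)
```

Comparing digests rather than full rows keeps the check cheap when rows are wide. A production tool would typically compute the hashes inside each database and compare in batches rather than pulling every row into memory.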

Because data is so widespread in enterprises, Kim suggests creating two separate BI standards: data warehouse BI and data lake BI.

BI built for data warehouses fails in data lakes because it scales inefficiently, cannot handle diverse data, and is agile in name only.

BI for data lakes must be architected for scale and performance, Kim said. Native BI unleashes the power and flexibility of the data lake by:

Scaling without compromise
Enabling real-time, streaming analytics
Unlocking complex data not easily reachable before
Acting directly from your data discovery
Optimizing and productionizing based on usage and need

Attributes of a successful data lake include data movement, data storage, analytic options, and support for machine learning, Golba said.

Hosting a data lake in the cloud is another option and can provide the following benefits:

Low-cost storage enables large volumes of data to be stored
Different storage options for different access needs
Scalability is built in
Access is easily made available to authorized users
Flexibility of the cloud permits new technologies to be spun up and spun down to try out different applications

An archived on-demand replay of this webinar is available here.
