Data Lake Boot Camp

From centralized data acquisition and offloading to data discovery and data science projects, data lake adoption continues to rise, with enterprise deployments more than doubling over the past three years, according to a recent DBTA study. The benefits of inexpensive storage, data democratization, and greater flexibility and scalability are easily understood. Establishing effective processes for integrating, securing, and governing that data, however, is a far more complicated endeavor, and this is where the rubber meets the road. Register for Data Lake Boot Camp today for a deep dive into the latest supporting technologies, best practices, real-world success factors, and expert insights.


Tuesday, May 19

Track C: Data Lake Boot Camp

Abhik Roy, Lead Big Data Architect, Transamerica Data Engineering

C101. Modern Data Lake Essentials

Tuesday, May 19: 10:45 a.m. - 11:45 a.m.

A data lake has certain critical elements; without them, implementations often fail.

Essentials of a Modern Data Lake

10:45 a.m. - 11:45 a.m.

Roy describes the essential capabilities and architecture patterns that any modern data lake should possess, as well as the real challenges and opportunities to confront while building an enterprise data lake. Issues include on-premise and cloud integration, performance-tuning and scaling-up strategies, audits and controls, storage layers, compute patterns, presentation layers, data discovery, and governance.


Abhik Roy, Lead Big Data Architect, Transamerica Data Engineering


C102. Drilling Down on Data Lake Architecture

Tuesday, May 19: 12:00 p.m. - 12:45 p.m.

Data lake adoption is increasing to support initiatives such as data science, data discovery, and real-time analytics.

Cruising the Data Lake: From Zero to Scale

12:00 p.m. - 12:45 p.m.

The Highly Automated Driving (HAD) group at HERE Technologies is building the High-Definition Map (HDMap) of the real world to power autonomous-driving use cases. Given the complexity of its data-enrichment pipelines and the petabyte scale of the content, the company needed a mechanism to avoid data silos and a centralized way to analyze, predict, and evaluate the data. Chaphalkar highlights the principles and technology behind the company's data lake architecture and presents strategies that can serve as guidelines for others seeking to stand up and run a data lake at scale.


Chaphalkar, Senior Engineering Manager, HERE Technologies


C103. Cloud Data Strategy and Data Lakes

Tuesday, May 19: 2:00 p.m. - 2:45 p.m.

Across all industries, companies are investing heavily in modernizing data infrastructures to manage big data by creating “data lake” environments that process large volumes and varieties of data for reporting and analytics.

Cloud Data Strategy: Multiple Lakes Versus One Enterprise Lake

2:00 p.m. - 2:45 p.m.

Organizations with a strong growth focus in their corporate strategy allow internal business divisions to build their own lakes to enable faster time-to-market for analytics, resulting in a myriad of data lakes and a complex enterprise data landscape. Conversely, organizations with a strong defend focus tend to build a highly bureaucratic, centralized data lake to enable regulatory and compliance analytics. Learn why a balanced focus is essential to define a robust data strategy.


, Senior Manager, Data Management & Governance, Vanguard


C104. Building a Transactional Data Lake

Tuesday, May 19: 3:15 p.m. - 4:00 p.m.

With the proliferation of data in recent years, business-critical decision making is now heavily influenced by deep data analysis.

Building Large-Scale, Transactional Data Lakes Using Apache Hudi

3:15 p.m. - 4:00 p.m.

Hudi, which stands for "Hadoop Upserts Deletes and Incrementals," is a storage abstraction library that improves data ingestion. Our Uber speakers explain what Hudi offers and why it is needed, including how Hudi can provide ACID semantics to a data lake, and some of the basic primitives required to achieve acceptable latencies in ingestion, while also providing high-quality data by enforcing schematization on datasets.


, Engineering Manager, Uber

, Senior Software Engineer, Uber


C105. Securing the Modern Data Lake

Tuesday, May 19: 4:15 p.m. - 5:00 p.m.

The transition from storing data in an on-premise data warehouse to using a hybrid infrastructure has enabled tremendous agility and scale, but has also created a security and privacy risk.

Overcoming Challenges to Securing Modern Data Lakes

4:15 p.m. - 5:00 p.m.

Organizations that are concerned about the quality of their data, protecting their brand and intellectual property, and complying with evolving privacy regulations must understand how the modern infrastructure has broken the relationship between data and metadata, and how this in turn impacts the quality and security of their data. What's needed is a new approach that sits in the “data plane” and enforces metadata creation on write, manages user access, and performs data transformations.


, CEO & Co-Founder, Okera
