Hadoop Day Conference at Data Summit

May 15 - 17, 2017 // New York, NY

The Elephant is coming back to NYC ...

Since its beginning as a project aimed at building a better web search engine for Yahoo – inspired by Google’s well-known MapReduce paper – Hadoop has grown to occupy the center of the big data marketplace. From data offloading to preprocessing, Hadoop is not only enabling the analysis of new data sources amongst a growing legion of enterprise users; it is changing the economics of data. Alongside this momentum is a budding ecosystem of Hadoop-related solutions, from open source projects like Spark, Hive and Drill, to commercial products offered on-premises and in the cloud. These new technologies are solving real-world big data challenges today.

Whether your organization is currently considering Hadoop and Hadoop-related solutions or already using them in production, Hadoop Day is your opportunity to connect with the experts in New York City and expand your knowledge base. This unique event has all the bases covered:

Enterprise Use Cases Today
Architecting a Scalable Hadoop Platform
Building Hadoop Applications
Data Warehouse Optimization with Hadoop
Troubleshooting Hadoop Performance Issues
Data Science with Hadoop
Machine-Learning with Spark
Data Analysis with Hive & Pig
Taking Advantage of SQL-on-Hadoop Solutions
Running Hadoop in the Cloud
Securing Data in Hadoop
Optimizing ETL with Hadoop
Diving into the Data Lake

Tuesday, May 16, 2017

8:00 a.m. - 9:00 a.m.

Continental Breakfast

9:00 a.m. - 9:45 a.m.

WELCOME & KEYNOTE: The Human Side of the Data Revolution
For more than a decade “data” has been at or near the top of the enterprise agenda. A robust ecosystem has emerged around all aspects of data: collection, management, storage, exploitation and disposition. And yet, more than 66% of Global 2000 senior executives are dissatisfied with their data investments and capabilities. This is not a technology problem. This is not a technique problem. This is a people problem. Futurist Thornton May, in a highly interactive session, shares research results of his multi-institution examination of the human side of the data revolution.
Thornton A. May, CEO, FutureScapes Advisors, Inc.

9:45 a.m. - 10:00 a.m.

SPONSORED KEYNOTE: Architecting Business Disruption with Machine Learning
"Data" is the new differentiator and companies who can successfully adapt their businesses based on insights gleaned from data will have a significant advantage. In this session, Rob Thomas will provide a brief overview of Machine Learning and the use case patterns that clients are using to disrupt their industry.
Rob Thomas, General Manager, IBM Analytics

10:00 a.m. - 10:45 a.m.

COFFEE BREAK in the Data Solutions Showcase

10:45 a.m. - 11:45 a.m.

H101: Unleashing the Power of Hadoop
Hadoop is here to stay, but so are a host of other approaches. To be effective, they must all work together in the enterprise.
Accelerating Big Data Implementations Through Hadoop Interoperability
During the last 10 years, Apache Hadoop has proven to be a popular platform among developers who require a technology that can power large, complex applications. For customers, partners, and application ISVs who write on top of Hadoop, there is still one huge issue that remains—interoperability. Steve Jones and John Mertic take a closer look at how Apache Hadoop can become more interoperable to accelerate Big Data implementations.
John Mertic, Director, ODPi
Steve Jones, Global VP, Capgemini
SQL on Hadoop & Big Data Systems
SQL has been with us for more than 40 years and Hadoop for about 10. Although Hadoop was born without a SQL interface, bringing SQL-on-Hadoop solutions to market has become imperative. This talk provides an overview of SQL on Hadoop, including low-latency SQL on Hadoop for analytic workloads, and how SQL engines are innovating.
Sumit Pal, Strategic Technology Director, Graphwise.ai

12:00 p.m. - 12:45 p.m.

H102: Harnessing Big Data With Spark
Open source platforms and frameworks such as Apache Spark have paved the way for commodity-priced processing on a massive scale.
Build Machine-Learning Algorithms Powered by Spark
One of the most exciting use cases of Apache Spark is the development of self-service, interactive predictive analytics platforms. We can now integrate machine learning model generation and prediction with data visualization, powered by the distributed processing capabilities of Apache Spark. In this presentation we explore this capability to see how you can 'see' your data in full color.
Marcin Tustin, Consulting Data Engineer

12:45 p.m. - 2:00 p.m.

ATTENDEE LUNCH in the Data Solutions Showcase

2:00 p.m. - 2:45 p.m.

H103: The Streaming Future of Big Data
Real-time utilization of streaming data requires a modern architecture that can scale. Learn about the technologies that can help.
Event-Driven Microservices With Streams & Docker
This presentation covers how to build a multi-location, event-driven architecture that uses streaming data to interconnect Docker-hosted microservices, allowing scalable, redundant, and highly available services to be implemented across multiple data centers. Using Docker containers and single-purpose microservices, this presentation demonstrates how these services are interconnected with event-driven streams and how this architecture can be deployed.
Paul Curtis, Senior Field Enablement Engineer, MapR Technologies
Streaming of Big Data Over the Decades
Streaming analytics has been around since before big data itself. Proprietary streaming engines such as Software AG Apama (2000) and IBM Streams (2003) came first, followed by open source engines such as Yahoo S4 (2010) and Apache Storm (2011). For a while, Big Data seemed to refer only to Hadoop (paper in 2003 and development in 2006). But now, streaming is all the rage - whether you call it stream computing, streaming pipelines, streaming analytics or fast data, it means people care about analyzing data as it's created, not after it's been indexed and stored in some persistent repository. With so many choices - on-prem, cloud, resource managers, virtualized machines, containers, and nearly 40 streaming offerings - it's hard to know where to begin. Come learn about the current landscape and hear some thoughts on where it's going in the future.
Roger C. Rea, IBM Streams Product Manager, IBM Watson and Cloud Platform

2:45 p.m. - 3:15 p.m.

COFFEE BREAK in the Data Solutions Showcase

3:15 p.m. - 4:00 p.m.

H104: Building an Enterprise Data Lake
The concept of an enterprise data lake is enticing. Find out what’s needed and which technologies are available to help build a data lake for the enterprise.
Open Source, Code-Free Data Pipelines
An enterprise data lake typically requires substantial effort to ingest, process, store, secure, and manage data from a variety of sources. Cask Data Application Platform (CDAP) is an open source solution that offers a self-service user interface for creating data lakes and simplifies the building and managing of production data pipelines on Spark, Spark Streaming, MapReduce and Tigon. This talk discusses how to achieve broad, self-service access to Hadoop while maintaining the controls and monitors necessary within the enterprise.
Jonathan Gray, CEO & Founder, Cask

4:15 p.m. - 5:00 p.m.

H105: Integrating Hadoop Into Your BI Environment
A recent Unisphere Research survey on data management found that Apache Hadoop is gaining significant traction. About 40% of respondents now have a Hadoop installation.
The Do’s & Don’ts for Success With BI on Big Data
Think Hadoop is not in your future? According to a recent survey, 97% of organizations working with Hadoop anticipate onboarding analytics and BI workloads to Hadoop. When this happens, companies that have disregarded the Big Data opportunity may be left behind. The good news is that onboarding your business intelligence workloads to Hadoop is not as complicated as it used to be. If you understand some key concepts, the transition can be simpler and more successful, allowing you to reuse current skill sets while avoiding either a rip-and-replace of your technical stack or the elimination of business analysts in order to hire data scientists.
Josh Klahr, VP, AtScale
