The Elephant is coming back to NYC ...


Since its beginning as a project aimed at building a better web search engine for Yahoo – inspired by Google’s well-known MapReduce paper – Hadoop has grown to occupy the center of the big data marketplace. From data offloading to preprocessing, Hadoop is not only enabling the analysis of new data sources amongst a growing legion of enterprise users; it is changing the economics of data. Alongside this momentum is a budding ecosystem of Hadoop-related solutions, from open source projects like Spark, Hive and Drill, to commercial products offered on-premises and in the cloud. These new technologies are solving real-world big data challenges today.

Whether your organization is currently considering Hadoop and Hadoop-related solutions or already using them in production, Hadoop Day is your opportunity to connect with the experts in New York City and expand your knowledge base. This unique event has all the bases covered:

  • Enterprise Use Cases Today
  • Architecting a Scalable Hadoop Platform
  • Building Hadoop Applications
  • Data Warehouse Optimization with Hadoop
  • Troubleshooting Hadoop Performance Issues
  • Data Science with Hadoop
  • Machine-Learning with Spark
  • Data Analysis with Hive & Pig
  • Taking Advantage of SQL-on-Hadoop Solutions
  • Running Hadoop in the Cloud
  • Securing Data in Hadoop
  • Optimizing ETL with Hadoop
  • Diving into the Data Lake
Tuesday, May 16, 2017
8:00 a.m. - 9:00 a.m.
  • Continental Breakfast
9:00 a.m. - 9:45 a.m.
  • WELCOME & KEYNOTE: The Human Side of the Data Revolution
  • For more than a decade, “data” has been at or near the top of the enterprise agenda. A robust ecosystem has emerged around all aspects of data: collection, management, storage, exploitation, and disposition. And yet, more than 66% of Global 2000 senior executives are dissatisfied with their data investments/capabilities. This is not a technology problem. This is not a technique problem. This is a people problem. Futurist Thornton May, in a highly interactive session, shares research results of his multi-institution examination of the human side of the data revolution.

    Thornton A May, CEO, FutureScapes Advisors, Inc.
9:45 a.m. - 10:00 a.m.
  • SPONSORED KEYNOTE: Architecting Business Disruption with Machine Learning
  • "Data" is the new differentiator, and companies that can successfully adapt their businesses based on insights gleaned from data will have a significant advantage. In this session, Rob Thomas will provide a brief overview of machine learning and the use case patterns that clients are using to disrupt their industries.

    Rob Thomas, General Manager, IBM Analytics
10:00 a.m. - 10:45 a.m.
  • COFFEE BREAK in the Data Solutions Showcase
10:45 a.m. - 11:45 a.m.
  • H101: Unleashing the Power of Hadoop
  • Hadoop is here to stay, but so are a host of other approaches. To be effective, they must all work together in the enterprise.

  • Accelerating Big Data Implementations Through Hadoop Interoperability

    During the last 10 years, Apache Hadoop has proven to be a popular platform among developers who require a technology that can power large, complex applications. For customers, partners, and application ISVs who write on top of Hadoop, one huge issue remains: interoperability. Steve Jones and John Mertic take a closer look at how Apache Hadoop can become more interoperable to accelerate Big Data implementations.

    John Mertic, Director, ODPi
    Steve Jones, Global VP, Capgemini
  • SQL on Hadoop & Big Data Systems

    SQL has been with us for more than 40 years, and Hadoop for about 10. Although Hadoop had no SQL interface when it was born, it has become imperative that SQL-on-Hadoop solutions be brought to market. This talk provides an overview of SQL on Hadoop, including low-latency SQL on Hadoop for analytic workloads, and how SQL engines are innovating.

    Sumit Pal, Big Data and Data Science Architect, Independent Consultant
12:00 p.m. - 12:45 p.m.
  • H102: Harnessing Big Data With Spark
  • Open source platforms and frameworks such as Apache Spark have paved the way for commodity-priced processing on a massive scale.

  • Build Machine-Learning Algorithms Powered by Spark

    One of the most exciting use cases of Apache Spark is the development of self-service, interactive predictive analytics platforms. We can now integrate machine learning model generation and prediction with data visualization capabilities that are powered by the distributed processing capabilities of Apache Spark. In this presentation, we will explore this capability to see how you can 'see' your data in full color.

    Marcin Tustin, Consulting Data Engineer
12:45 p.m. - 2:00 p.m.
  • ATTENDEE LUNCH in the Data Solutions Showcase
2:00 p.m. - 2:45 p.m.
  • H103: The Streaming Future of Big Data
  • Real-time utilization of streaming data requires a modern architecture that can scale. Learn about the technologies that can help.

  • Event-Driven Microservices With Streams & Docker

    This presentation covers how to build a multi-location, event-driven architecture that uses streaming data to interconnect Docker-hosted microservices, allowing scalable, redundant, and highly available services to be implemented across multiple data centers. Using Docker containers and single-purpose microservices, this presentation demonstrates how these services are interconnected with event-driven streams and how this architecture can be deployed.

    Paul Curtis, Senior Field Enablement Engineer, MapR Technologies
  • Streaming of Big Data Over the Decades

    Streaming analytics has been around since before big data itself. Proprietary streaming engines such as Software AG Apama (2000) and IBM Streams (2003) were followed by open source streaming engines such as Yahoo S4 (2010) and Apache Storm (2011). For a while, Big Data seemed to refer only to Hadoop (paper in 2003 and development in 2006). But now, streaming is all the rage. Whether you call it stream computing, streaming pipelines, streaming analytics, or fast data, it means people care about analyzing data as it's created, not after it's been indexed and stored in some persistent repository. With so many choices (on-prem, cloud, resource managers, virtualized machines, containers, and nearly 40 streaming offerings), it's hard to know where to begin. Come learn about the current landscape and hear some thoughts on where it's going in the future.

    Roger C. Rea, IBM Streams Product Manager, IBM Watson and Cloud Platform
2:45 p.m. - 3:15 p.m.
  • COFFEE BREAK in the Data Solutions Showcase
3:15 p.m. - 4:00 p.m.
  • H104: Building an Enterprise Data Lake
  • The concept of an enterprise data lake is enticing. Find out what’s needed and which technologies are available to help build a data lake for the enterprise.

  • Open Source, Code-Free Data Pipelines

    An enterprise data lake typically requires substantial effort to ingest, process, store, secure, and manage data from a variety of sources. Cask Data Application Platform (CDAP) is an open source solution that offers a self-service user interface for creating data lakes and simplifies the building and managing of production data pipelines on Spark, Spark Streaming, MapReduce, and Tigon. This talk discusses how to achieve broad, self-service access to Hadoop while maintaining the controls and monitors necessary within the enterprise.

    Jonathan Gray, CEO & Founder, Cask
4:15 p.m. - 5:00 p.m.
  • H105: Integrating Hadoop Into Your BI Environment
  • A recent Unisphere Research survey on data management found that Apache Hadoop is gaining significant traction. About 40% of respondents now have a Hadoop installation.

  • The Do’s & Don’ts for Success With BI on Big Data

    Think Hadoop is not in your future? According to a recent survey, 97% of organizations working with Hadoop anticipate onboarding analytics and BI workloads to Hadoop. When this happens, companies that have disregarded the Big Data opportunity may be left behind. The good news is that onboarding your business intelligence workloads to Hadoop is not as complicated as it used to be. If you understand some key concepts, the transition can be simpler and more successful, allowing you to recycle current skill sets while avoiding either a rip-and-replace of your technical stack or the elimination of business analysts to hire data scientists.

    Josh Klahr, VP, AtScale

Don’t Miss These Special Events