The Elephant is coming to NYC ...
With a flourishing ecosystem and central position in the big data marketplace, Hadoop continues to grow. In a recent poll of DBTA magazine subscribers, 30% of organizations reported having Hadoop deployed, while 26% indicated they are planning to adopt it over the next 12 months. From data offloading to preprocessing, Hadoop is not only enabling the analysis of new data sources; it is also changing the value equation of maintaining an active archive of all your data.
Whether your organization is currently considering Hadoop or already using it in production, Hadoop Day is your opportunity to connect with the experts in New York City and advance your knowledge base. This unique educational event has all the bases covered:
- Integrating Hadoop into Your IT Environment
- Hadoop Cluster Administration
- MapReduce Programming
- Hadoop Architectures
- Hadoop and Your Data Warehouse
- Building Hadoop Applications
- Using Hadoop for Analytics
- Hadoop and the Cloud
- Data Security and Hadoop
- Harnessing the Hadoop Ecosystem (Spark, YARN, Hive, Pig... the list goes on)
Tuesday, May 12, 2015
8:00 a.m. - 9:00 a.m.
WELCOME & KEYNOTE - Understanding the Data Value Chain
9:00 a.m. - 9:45 a.m.
Creating value from data requires a new mind-set. It’s hard to escape silos, whether they are technical or conceptual. To exploit fully the opportunity of Big Data tools and architectures, we need a new way of thinking that frames data as a raw material of business. The answer is to focus not on the functional components (what you do to data) but on business outcomes and how they can be achieved (what you do with data). This new approach can be cultivated through looking at the data value chain.
MODERATOR: Marydee Ojala, Editor-in-Chief, Online Searcher magazine
Edd Dumbill, VP Strategy, Silicon Valley Data Science
Agile "AppStore" (SBA) Creation on a Rich Search Index
9:45 a.m. - 10:00 a.m.
The bad reputation of enterprise search will change as more powerful technology allows the extension of search to many enterprise data sources. Once enterprises have done the ground work of indexing all or most of their data, they can do things they had never thought of before, such as easily and rapidly developing Search Based Applications (SBAs) to meet user needs. This talk will present several of the SBAs that populate the AstraZeneca "AppStore" on top of the Sinequa platform.
Hans-Josef Jeanrond, VP Marketing, Sinequa
Rob Hernandez, Data Analytics Lead, CTO Office, AstraZeneca
COFFEE BREAK in the Data Solutions Showcase
10:00 a.m. - 10:45 a.m.
H101: The Current State of Hadoop
10:45 a.m. - 11:45 a.m.
Apache Hadoop has become the predominant Big Data platform for storing and analyzing data. Companies use Hadoop to get value and gain competitive differentiation from their ever-increasing wealth of data. Knowing where and how to start exploring Hadoop's rich set of tools is a “Big Data” challenge of its own. Learn the key differences between the most popular Hadoop distributions so you can start using Hadoop today.
The Hadoop Ecosystem
James Casaletto, Principal Solutions Architect, Professional Services, MapR
Hadoop: Whose to Choose
David Teplow, Founder & CEO, Integra Technology Consulting
H102: Hadoop and Your Data Warehouse
12:00 p.m. - 12:45 p.m.
Elliott Cordo shares real-world insights across a range of topics, including the evolving best practices for building a data warehouse on Hadoop that also coexists with multiple processing frameworks and additional non-Hadoop storage platforms, the place for massively parallel-processing and relational databases in analytic architectures, and the ways in which the cloud offers the ability to quickly and cost-effectively establish a scalable platform for your Big Data warehouse.
Building a Real-World Data Warehouse
Elliott Cordo, Chief Architect, Caserta Concepts
Snowflake and Data Warehouses
Greg Rahn, Director of Product Management, Snowflake Computing
ATTENDEE LUNCH in the Data Solutions Showcase
12:45 p.m. - 2:00 p.m.
H103: Hadoop in the Cloud
2:00 p.m. - 2:45 p.m.
To get your Big Data job done right, you need to use the right Big Data tools. How can you make sure you are leveraging the right tools? Learn from Ben Sgro about how Simulmedia, a pioneer in audience-based advertising on TV, is using a custom Python framework to programmatically create EMR clusters, move data to and from Amazon Simple Storage Service, and load data into its Redshift data warehouse. Xplenty’s Yaniv Mor talks about how using Hadoop in a coding-free, cloud-based environment ensures that businesses can benefit from Big Data without having to invest in hardware, software, or related personnel.
Python and EMR for MapReduce ETLs in the Cloud
Ben Sgro, Director of Data Engineering, Simulmedia
Offloading Data Integration/ETL to the Cloud (Using Hadoop)
Yaniv Mor, Founder & CEO, Xplenty
COFFEE BREAK in the Data Solutions Showcase
2:45 p.m. - 3:15 p.m.
H104: Harnessing the Hadoop Ecosystem
3:15 p.m. - 4:00 p.m.
Big Data is transforming how companies analyze information and enabling them to connect with customers in ways never possible before. Radius, which provides companies with a real-time marketing intelligence platform, is moving its core infrastructure from Hadoop to Spark. Hear Spotright’s Nathan Halko talk about his experiences moving from Hadoop to Spark. Qubole’s Jason Huang provides an overview of Apache Hive, the key differences between Hive and traditional data warehouses built on top of RDBMSs, and key techniques to increase performance and simplify Hive.
Moving From Hadoop to Spark: The Business Case
Nathan P. Halko, Data Scientist, Spotright
Deep Dive Into Apache Hive
Jason Huang, Senior Solutions Architect, Qubole
H105: Panel Discussion: The Data Lake: From Hype to Reality
4:15 p.m. - 5:00 p.m.
There has been a lot of hype around data lakes and their relevance to Big Data challenges. The data lake approach is being championed by some as a way to realize the promise of Big Data, allowing organizations to move data in its raw form into a central storage reservoir until it is needed. There has also been much scrutiny in the marketplace over the potential pitfalls of data lakes. To find out what you need to know before you dive into the data lake, join Venkat Eswara of GE, Joe Caserta of Caserta Concepts, and George Corugedo of RedPoint Global for a lively panel discussion about using Hadoop to create a centralized processing pool where data is captured, cleansed, linked, and structured in a consistent way.
Joe Caserta, CEO and Founder, Caserta Concepts
George Corugedo, Chief Technology Officer, RedPoint Global Inc.
NETWORKING RECEPTION in the Data Solutions Showcase
5:00 p.m. - 6:00 p.m.