Tuesday, May 10, 2016

Track A — Moving to a Modern Data Architecture
Track B — Analytics and Applications
Track C — Big Data Tools for the Oracle DBA
Hadoop Day

CONTINENTAL BREAKFAST

8:00 a.m. - 9:00 a.m.

WELCOME & KEYNOTE - How Statistics (And a Little Public Data) Can Change a City

9:00 a.m. - 9:45 a.m.

The creator of I Quant NY, a data science and policy blog that focuses on insights drawn from New York City's public data, Ben Wellington, advocates the analysis of open data to affect policy. The Open Data movement is growing, and governments are releasing vast amounts of data to the public. As citizens push for more transparency, it is fair to ask what we can actually do to derive actionable insights from this data. How can this data help us improve the cities we live and work in, whether we are policymakers, businesses, or residents? Wellington explores how he's used his blog and some simple data science techniques to make changes in New York City. He discusses best practices for data science in the policy space, explores how storytelling is an important aspect of data science, and highlights the various data-driven interactions he's had with city agencies. He contends that data science need not use complicated math: It's often more about curiosity and the questions we ask than the complexity of the equations we use.

Ben Wellington, I Quant NY

Rethink Data Management

9:45 a.m. - 10:00 a.m.

From On-Premises to Cloud

It is undeniable that data continues to power business growth, competitive advantage and customer experience. In the midst of a monumental transformation fueled by social, mobile, and cloud, business and IT leaders alike are rethinking their roles and how they manage and deploy technology to accelerate business growth. Leveraging Oracle Database 12c, customers can rethink data management and gracefully evolve architectures from on-premises to the Cloud. Learn about the latest data management trends and how transforming to the Cloud can help organizations innovate faster, improve time to market, and stay ahead of the pack.

Nicholas Chandra, VP, Cloud Computing Success, Oracle

COFFEE BREAK in the Data Solutions Showcase

10:00 a.m. - 10:45 a.m.

Track A — Moving to a Modern Data Architecture

Moderator: John O'Brien, Principal Advisor & Industry Analyst, Radiant Advisors

A101: Enabling Data Architecture

10:45 a.m. - 11:45 a.m.

A major shift is taking place at the IT architectural level, changing how enterprise data is captured, stored, and processed. Modern data management practices can help address the increasing volumes and varieties of data as well as the growing imperative to deliver data where and when it is needed.

Designing and Implementing a Data Architecture for the New World

This presentation discusses the design and implementation of a modern data architecture and enabling technologies to support the new world of data (structured, semi-structured, and unstructured), data integration (ETL, iPaaS, data cleansing, MDM, data preparation) and analytics (traditional BI, data discovery, predictive analytics). We discuss moving to a Logical Data Warehouse, leveraging various database technologies, incorporating an integration technology portfolio, creating analytical sandboxes, and building data science hubs.

Enabling Governed Data Discovery in Modern Data Architectures

Data governance is a charged topic. As data discovery becomes increasingly important, analysts must be able to move through discovery with as little friction—and as much IT-independence—as possible. However, with more data and capabilities, ensuring data is trustworthy and protected becomes more difficult and yet imperative. Data governance and discovery seem at odds: IT wants to ensure data is accurate and secure, while users want to explore without handcuffs. This session showcases Radiant Advisors’ research with Fortune 500 companies to share accepted, principles-led practices that will become best practices for extending data governance frameworks to enable governed data discovery.

John O'Brien, Principal Advisor & Industry Analyst, Radiant Advisors

A102: New Approaches to Data Management

12:00 p.m. - 12:45 p.m.

The speed at which global business operates today is fueling a demand for systems that can enable insight about everything from sentiments to machines in the moment. Real-time data systems are offering the ability to accelerate business reaction time as never before.

How to Count Tens of Billions of Daily Events in Real Time Using Open Source Technologies

Twitter’s 300 million-plus users generate tens of billions of tweet views per day. Aggregating these events in real time—in a robust enough way to incorporate into our products—presents a massive scaling challenge. This talk introduces TSAR (the TimeSeries AggregatoR), a robust, flexible, and scalable service for real-time event aggregation designed to solve this problem and a range of similar ones. The discussion covers how Twitter built TSAR using Python and Scala from the ground up, almost entirely on open source technologies (Storm, Summingbird, Kafka, Aurora, and others) and describes some of the challenges Twitter faced in scaling it to process tens of billions of events per day.

Claudia Perlich, Chief Scientist, Dstillery

From Data Management to Data Driven

With the advent of machine learning, cognitive computing, graph technology, IoT, and data marketplaces, decision management will never be the same again. Khanna explores how emerging trends such as cloud, mobile, social, and big data analytics are making data management a disruptive technology in this new age of the consumer. He reviews some of the cutting-edge use cases in data management making a real impact on our lives and introduces possibilities for the future.

Ajay Khanna, Vice President, Product Marketing, Reltio

ATTENDEE LUNCH in the Data Solutions Showcase

12:45 p.m. - 2:00 p.m.

A103: Supporting Modern Applications

2:00 p.m. - 2:45 p.m.

Organizations are increasingly seeking to use their data for business advantage. However, they often find difficulty in leveraging data as fast as they would like. New data management technologies and techniques help organizations meet the changing requirements posed by modern applications for speed, scale, and flexibility.

Eliminating the Data Constraint in Application Development

Many consider the No. 1 bottleneck in application development to be the provisioning of parity data from production data sources to use in development environments, QA, UAT, and integration testing, including building and managing development and QA environments. This session explores what the impact is, as well as how to solve this bottleneck with virtual data and define metrics to track to validate the solution.

Kyle Hailey, Technical Evangelist, Delphix

The New Stack for Content Management: Supercharging the Content Repository with Elasticsearch and MongoDB

We strongly think SQL and NoSQL technologies should be used side by side to deliver a storage solution that can scale to the sky while preserving data security, and guaranteeing transactional write operations. Nuxeo is an open source platform centered around a content repository. Over the years, our use of the technologies has evolved. Among other changes, our content repository migrated from an Object Database to SQL. For a couple of years, we’ve been leveraging NoSQL technologies, enabling us to provide a full range of solutions, including a new hybrid SQL + NoSQL architecture, depending on the challenges and the requirements of each project. This presentation explains the technical and design choices we had to make to keep the content repository up-to-date and provide incredible performance to our users.

Vladimir Pasquier, Senior Software Developer, Nuxeo

COFFEE BREAK in the Data Solutions Showcase

2:45 p.m. - 3:15 p.m.

A104: Going Beyond Relational

3:15 p.m. - 4:00 p.m.

The era of relying on one relational database system is winding down. Today, data is flowing into organizations from new sources outside of and within organizations. The question is: How can companies leverage these new data types as they seek to go beyond relational?

An Introduction of Big Data Concepts for Relational Users

Big Data cannot be ignored if you wish to continue pursuing a data management career. But what is Big Data? Join this session to find out how Big Data technologies differ greatly from Oracle, SQL Server, DB2, and other relational database systems.

Craig S. Mullins, President & Principal Consultant, Mullins Consulting, Inc.; IBM Gold Consultant

A105: The Future of Data Warehousing

4:15 p.m. - 5:00 p.m.

Many organizations have invested huge quantities of time and money perfecting their data warehousing systems to support analytics and business intelligence. However, new data sources are abounding, resulting in not only greater volumes of data but also in greater data variety. To gain a competitive edge in the era of Big Data, organizations must modernize their data warehousing environments.

Introducing NoSQL Into Your Traditional RDBMS EDW

Finding your way through the maze of NoSQL products and technologies is often mind-numbing for companies heavily entrenched in RDBMS. This session discusses how to introduce NoSQL into your environment and make the best use of it in places where it might bring the most ROI.

Chuck Ezell, Practice Lead of Integration & Development Services, Development, Tuning & Automation, Datavail

Gajanan Gaidhane, Tier3 US - SQL, DBA, DBT, Datavail

Augmenting Physical Data Warehouse with Logical Data Warehouse

Data warehouses could not deliver on the promise of a single source of the truth since the rate of proliferation of data sources outstripped the ability to integrate all that data into the data warehouse. Leading companies are using data virtualization to build logical data warehouses that unify the data across data warehouses and other data sources to enable analytics and reporting. Moxon covers the architecture and performance characteristics of the logical data warehouse, supported by successful customer case studies.

Paul Moxon, Senior Director of Product Management, Denodo

NETWORKING RECEPTION in the Data Solutions Showcase

5:00 p.m. - 6:00 p.m.

Track B — Analytics and Applications

Moderator: Lindy Ryan, Professor & Research Faculty, Montclair State University; Rutgers University

B101: Moving From Traditional BI to Data Discovery

10:45 a.m. - 11:45 a.m.

More data is available for analytics than ever, but in order for that information to make an impact on an organization’s processes and decisions, it needs to be accessible to more people. Data discovery offers an additional approach to traditional business intelligence for extracting insights from Big Data.

Transitioning From BI to Data Discovery and Analytics: Best Practices and Pitfalls to Avoid

Businesses large and small recognize the need to become data-driven, but transitioning from a focus on reporting through business intelligence (BI) to data discovery and analytics can seem like an insurmountable task. This does not have to be the case. This talk explores both the technical aspects and business strategy of making this transition from the perspective of a data analytics director helping to drive change.

Justin Smith, Enterprise Director, Data Analytics, Sanford Health

Improving Agility and Insights Through Visual Discovery

As an extension of BI, data discovery should drive BI to support data-driven competencies for evolving data-centric organizations. This session discusses ways to frame the discovery processes in order to reduce friction in revealing insights, how to navigate the four forms of discovery to maximize business value, and the emergence of visual discovery in the increasingly visual processes of exploration and analysis. The session, based on content from Lindy Ryan’s new book The Visual Imperative, also touches on the role of visual discovery in the days to come, and the unique challenges and opportunities it will bring as it becomes an increasingly fundamental strategic process.

Lindy Ryan, Professor & Research Faculty, Montclair State University; Rutgers University

B102: Becoming a Data-Driven Enterprise

12:00 p.m. - 12:45 p.m.

Data-driven enterprises have more information than ever available for scrutiny. In today’s fast-paced digital economy, it is understood that effective data management strategies can have significant impact, and leading vendors are providing innovative products and services aimed at helping organizations derive more value from their data.

The Two Faces of Data Strategy: Why It Can Be Hard for Business and IT to See Eye to Eye

There is no doubt in the boardroom or throughout the organization that data is a valuable asset, but despite the ardent agreement about its value, business and IT leaders alike have distinct differences of opinion on how data should be managed and used. In this session, we look at the heart of the data strategy arguments, share why the different mindsets exist, and provide methods to develop an enterprise approach that allows for optimized, efficient data management and maximizes business value generation opportunities. It is important to understand what elements of data strategy are best under the innovative wing of the business and which portions need the careful administration and oversight of IT.

Anne Buff, Business Solutions Manager, SAS Best Practices, SAS Institute

Move the Data That Moves Your Business

To manage growing volumes, varieties and velocities of data today, and business demands to use it, you need to rapidly enable data for analytics. But integration across data warehouse, Hadoop, and other platforms, on premises or in the cloud, can be prohibitively complex and costly. Traditional data movement and transformation tools are slow and require manually coded commands that tie up your best programmers. Learn how you can ingest data at high speed into modern data warehouses with automation that accelerates analytics projects, enables timely insights, and improves agility.

Kevin Petrie, Senior Director of Marketing, Attunity, Inc.

ATTENDEE LUNCH in the Data Solutions Showcase

12:45 p.m. - 2:00 p.m.

B103: Big Data Analytics in Action

2:00 p.m. - 2:45 p.m.

There are a myriad of ways Big Data is being used to measure things that were previously unmeasurable, anticipate events that could never before be foreseen, and analyze data that never before could have been collected. Today, Big Data analytics is having real-world impact.

A Data-Driven Approach to Environmental Regulation at U.S. Environmental Protection Agency (EPA)

It was only 45 years ago that U.S. cities had smog-filled skies, and rivers were so contaminated, they burst into flames. Today, the EPA collects toxic chemical data released by industries and monitors our air/water/land using sensors to ensure a healthy and sustainable environment for our citizens. In this session, Robin A. Thottungal, the EPA’s first chief data scientist, highlights how the agency is becoming more data-driven and shares some of the challenges and the innovative solutions taken by the agency in implementing real-time monitoring of environmental parameters to gain a better understanding of the current state of our ecosystem.

Robin Augustine Thottungal, Chief Data Scientist, U.S. Environmental Protection Agency

ThoughtSpot's Presentation To Be Announced

Mike Booth, Analytic Search for Big Data, ThoughtSpot

COFFEE BREAK in the Data Solutions Showcase

2:45 p.m. - 3:15 p.m.

B104: Supercharging Your Marketing With Big Data

3:15 p.m. - 4:00 p.m.

There are more ways than ever to reach customers with marketing messages and also for them to talk back to you about what they like and what they don’t. Big Data technologies are increasingly being put to big use in marketing today to collect, analyze, and measure everything from sentiments to webpage clicks.

Using Big Data Analytics to Optimize Email Marketing Campaigns

This session takes you behind the curtain to unveil brand new (and incredibly comprehensive) email engagement insights uncovered by Constant Contact's data scientists. Compiled from the analysis of the billions of emails Constant Contact customers send every year, these insights dig deep into the data behind what factors influence subscribers to sign up, open, click, and convert. The session proves invaluable for data scientists who are looking to expand their teams’ value into areas of marketing, data vendors looking for fresh ideas on how their services can be innovatively applied to other areas of their clients’ companies, and, of course, marketers looking for the latest best practices for their own email campaigns.

Matt Laudato, Director, Big Data & Analytics, Constant Contact

B105: Data Science for the Enterprise

4:15 p.m. - 5:00 p.m.

Much work is being done at the enterprise level to make use of data that is widely acknowledged to be growing at exponential rates. Rapidly evolving techniques in the field of data science promise to help enterprises extract valuable information from data in all its forms.

Democratizing Data Science in the Enterprise

“Data science” has become one of the most common buzz phrases in the modern enterprise software dictionary. This session explores the patterns and techniques of real-world enterprise data science solutions. The session covers the fundamental building blocks of modern data science architectures such as batch and real-time data processing, visualization, analysis, machine learning, and data access. To keep things practical, the session illustrates the platforms that provide the capabilities to implement data science solutions in today’s enterprise.

Jesus Rodriguez, Managing Partner, Tellago

NETWORKING RECEPTION in the Data Solutions Showcase

5:00 p.m. - 6:00 p.m.

Track C — Big Data Tools for the Oracle DBA

Moderator: Suzanne Prezorski, Lead Programmer/Analyst, Cablevision

C101: Get Real-Time Oracle Data Into Kafka to Unlock the Value of Your Data

10:45 a.m. - 11:45 a.m.

Kafka is gaining momentum as a very popular and fast messaging platform. It is extremely good at quickly integrating different types of data. Kafka makes this data available as an up-to-date, real-time data stream for enterprise users who are no longer satisfied making business decisions with static data. So much hidden data is available in our Oracle databases that the issue becomes how to turn the database inside out and make this data available in real time to Kafka along with other enterprise data sources. This session presents use cases and a live demo for Oracle real-time data streaming as well as an introduction to Kafka and how to use Oracle logical replication to get real-time data into Kafka.

Chris Lawless, Product Manager, Dbvisit Software Limited

C102: Spark for Oracle Developers

12:00 p.m. - 12:45 p.m.

This session covers moving from an Oracle-only pipeline to a real-time Spark pipeline.

Anant Asthans, Big Data Consultant/Data Scientist, Pythian

ATTENDEE LUNCH in the Data Solutions Showcase

12:45 p.m. - 2:00 p.m.

C103: Integrate Big Data with Master Data on Hadoop and Spark

2:00 p.m. - 2:45 p.m.

Big Data analytics furnishes actionable information to help business decisions (e.g., "Which of our products got a four-star rating on social media in the last quarter?"). Producing actionable information requires both Big Data (i.e., facts, usually stored in HDFS or a NoSQL DBMS) and Master Data (i.e., products and customers, usually stored in an Oracle database). You either ship Master Data to Hadoop (ETL) or access it directly at the storage location through a storage handler. This technical session demonstrates Oracle Table Access for Hadoop and Spark (OTA4H), which turns Oracle database tables into Hadoop or Spark data sources, thereby allowing Hive SQL or Spark SQL queries and joins without shipping data around.

Kuassi Mensah, Director of Product Management, Oracle Corporation

COFFEE BREAK in the Data Solutions Showcase

2:45 p.m. - 3:15 p.m.

C104: Hadoop Data Store and Retrieval: Perceptions and Reality

3:15 p.m. - 4:00 p.m.

Hadoop-based data stores have attracted the imagination of technologists, management, and technology leaders. Hadoop data stores are distinctly different from traditional structured relational databases. They have opened the door for new paradigms in storing structured and unstructured data and led to a concept of "store first, think later" as to how to use it. This session focuses mainly on how data can co-exist between traditional RDBMS and emerging data types within the Hadoop ecosystem. It discusses how data formats and query engines differ from the traditional access patterns. Many complex data management issues exist within Hadoop. With many open source projects running in parallel, there are new ideas, new query modules, and new storage methods evolving. We discuss perceptions and realities from a data administrator and data manager perspective.

Madhu Tumma, Director, IT Engineering, TIAA-CREF

C105: Oracle NoSQL for the Oracle RDBMS DBA

4:15 p.m. - 5:00 p.m.

Numerous NoSQL databases are on the market, under evaluation, and implemented in shops that are traditionally RDBMS shops. Where does the Oracle NoSQL Database fit in? The Oracle RDBMS database administrator has an opportunity to learn Oracle NoSQL and add it to the portfolio.

Charles Pack, Technical Director, CSX Technology; IOUG, NFOUG

NETWORKING RECEPTION in the Data Solutions Showcase

5:00 p.m. - 6:00 p.m.

Hadoop Day

Moderator: Joe McKendrick, Principal Researcher, Unisphere Research

H101: Unleashing the Power of Hadoop

10:45 a.m. - 11:45 a.m.

Data analytics has emerged as the must-have strategy of organizations around the world, helping them understand customers and markets and predict shifts before they happen. At the center of the new Big Data movement is the Hadoop framework, which provides an efficient file system and related ecosystem of solutions to store and analyze big datasets. Find out how to make the power of Hadoop work for you.

Harnessing the Hadoop Ecosystem

People who are new to Big Data lack a big-picture view of how end-to-end solutions are actually constructed. Non-adopters are confronted with a vast amount of disparate information with no understanding of how to use the underlying tools. As a result, they are left with an incomplete understanding of how Hadoop may be used to solve their problems. This session addresses that knowledge and experience gap.

James Casaletto, Principal Solutions Architect, Professional Services, MapR

HBase Data Model—The Ultimate Model on Hadoop

Hadoop HDFS has many limitations. HBase, the Database of Hadoop, helps overcome these issues. HBase is a NoSQL (nonrelational) database and an Apache Project. It is a column-oriented database management system that runs on top of HDFS, is modeled after Google’s BigTable, and is suited for hosting very large tables that store semi-structured, sparse datasets. Attend this session to learn more.

Tassos Sarbanes, Mathematician / Data Scientist, Investment Banking, Credit Suisse; City University of New York

H102: Querying Hadoop and NoSQL Data Stores

12:00 p.m. - 12:45 p.m.

The pressure is growing for organizations to react faster to changing opportunities and risks by using data to improve decision making. Hadoop and NoSQL data stores provide new options for organizing and aggregating data in all its forms. Find out what you need to know about querying Hadoop and NoSQL data stores.

De-Siloing Data Using Apache Drill

Study after study shows that data scientists spend 50–90% of their time gathering and preparing data. In many large organizations, this problem is exacerbated by data being stored on a variety of systems, with different structures and architectures. Apache Drill is a relatively new tool that can help solve this difficult problem by allowing analysts and data scientists to query disparate datasets in-place using standard ANSI SQL without having to define complex schemata or rebuild their entire data infrastructure. This session introduces the audience to Apache Drill and presents a case study of how Drill can be used to query a variety of data sources.

Jair Aguirre, Lead Data Scientist, Booz Allen Hamilton

ATTENDEE LUNCH in the Data Solutions Showcase

12:45 p.m. - 2:00 p.m.

H103: Harnessing Big Data With Spark

2:00 p.m. - 2:45 p.m.

Apache Spark, an engine for large-scale data processing, can be complementary to Hadoop but it can also be deployed without it. Find out what Spark offers and why it is gaining ground in the world of Big Data.

Apache Spark and Effective Machine Learning

The introduction of Hadoop MapReduce (MR) allowed the application of algorithms to data of unprecedented scale using systems built from cheap commodity hardware. However, MR is slow, significantly curtailing its applicability to advanced iterative machine learning (ML) algorithms. These algorithms frequently need to be run multiple times in order to effectively train and optimally parameterize. Spark changed this, and, by providing speedups of 100X or more, fundamentally introduced the possibility of applying ML to Big Data and extracting meaningful insights in actionable timeframes. This presentation provides an overview of the framework Alpine Data developed and presents results for a variety of well-known datasets, illustrating how the software can largely eliminate the repetitive, trial-and-error nature of today's data science and reduce the time to an effective model.

Lawrence Spracklen, VP, Engineering, Alpine Data

COFFEE BREAK in the Data Solutions Showcase

2:45 p.m. - 3:15 p.m.

H104: Building Hadoop Applications

3:15 p.m. - 4:00 p.m.

Whether Hadoop becomes the de facto management platform of the future for Big Data or simply a key component in a hybrid architecture comprised of numerous technologies, by now it is certain that Hadoop, along with its larger ecosystem, is no fly-by-night technology. Find out the key issues involved in leveraging Hadoop for Big Data applications.

Building Scalable Machine Learning Applications on Apache Spark

There is a growing demand in the industry for highly scalable data processing platforms that are built around simplicity of architecture, compatibility, and robustness. Apache Spark is one such example, and is one of the most exciting emerging technologies in today's computing landscape. It is finding many exciting use cases, one of them being machine learning. In this session, a brief description of the Apache Spark architecture will be presented, along with a case study of a machine learning algorithm to demonstrate the versatility of the Spark architecture.

Abhik Roy, Engineering Principal, Wells Fargo

H105: The Great Data Lake Roundtable

4:15 p.m. - 5:00 p.m.

The concept of the data lake is intriguing for Big Data because it allows data to be retained in its original format so it can be used now and in the future by different users for different purposes. But what are the issues that need to be considered in terms of data lake governance, regulatory compliance, security, and access, as well as data cleansing and validation to make sure data is accurate and up-to-date?

Anne Buff, Business Solutions Manager, SAS Best Practices, SAS Institute

Abhik Roy, Engineering Principal, Wells Fargo

Tassos Sarbanes, Mathematician / Data Scientist, Investment Banking, Credit Suisse; City University of New York