Data Summit Panel Discussion Focuses on the Future of Big Data

May 19, 2015

Organizations have begun to realize the value that big data can provide them. The technologies and issues involved in becoming a data-driven enterprise were explored in a panel discussion at Data Summit 2015 in New York City, featuring David Mariani, CEO of AtScale; Andy Schroepfer, chief strategy officer with HOSTING; and Wendy Gradek, senior manager, R&D Analytics, EMC.

Although organizations are beginning to come around on big data and realize the benefits it may provide, there are still issues with how they go about it. “You need to know your goal. Too many people fail to know why they are crunching the data. Having a generic goal is not going to deliver the desired outcome,” stated Schroepfer. Mariani and Gradek both cited two causes of data failing businesses: too large a gulf between the time data is prepared and the time it is queried, and disparate data sources.

“With the time it takes for you to answer a question, it doesn’t always work for a business,” stated Gradek. The data may not produce an answer for 6 months to a year, and businesses typically work on weekly or monthly cycles, so by the time they receive the answer it is too late. Both championed the data lake as a tool that will help solve the disparate data issue and, they hope, improve the speed at which businesses can gain value from their data.

However, an issue that has emerged with data lakes is data governance. “Having data distributed throughout the organization on numerous people’s desktops is far worse than having a centralized store,” said Mariani.

Data warehouses have been a tremendous storage option and are still useful today, though time has exposed some of their flaws. Mariani noted that data volume has become so massive that we can’t pre-process or pre-structure it. Gradek pointed out the importance of having a defined question that needs to be answered and the specific data for that question.

“From the standpoint of the data miner, it is important to not mine for everything and just address the problem.” It is crucial, she said, to provide customers their information in a timely manner and to have a laser focus on only the pertinent information.

Schroepfer discussed “biting the bullet” on the one big expense of moving data to the cloud. “Once you’ve got it to the cloud, all the solutions being made allow you to be portable with your data with at least 1/10th of the cost of what it used to be going from a dedicated system to the cloud for the first time. If you learn a lesson, bite that bullet once so you can be in the cloud and have a data lake that is going to cost a lot less as it expands compared to a data warehouse.”

All of the panelists agreed that the cloud is the future. With the amount of data being collected, it will eventually make too much financial sense for organizations not to move to the cloud.

Looking ahead, the panel generally agreed that by Data Summit 2020 we may have too much data. Gradek noted that eventually, we may even want less data. Added Mariani, “we’re going to have too much data because we are going to figure out how to capture it and store it cheaply, but we are not going to know how to parse it.”

To access the slide decks from Data Summit presentations, go to www.dbta.com/datasummit/2015/presentations.aspx