Three things people need to think about in a big data implementation are persistence, context and access, John O'Brien, founder and principal, Radiant Advisors, told attendees during his keynote, "The Big Data Paradigm," at DBTA's Big Data Boot Camp. O'Brien's talk provided an overview of the technologies and issues that attendees would learn about during the conference, which took place this week in New York City.
In terms of persistence, where data is stored, leverage the strengths of many databases in a data platform; in terms of context, recognize that semantic context has many forms to govern; and in terms of access, the value of big data comes from making it accessible to as many users as possible. “Hadoop doesn’t have the open accessibility that it needs yet, but it will soon. It is maturing very quickly,” observed O’Brien. Ultimately, he noted, if you want to unleash the value of big data you have to consider accessibility.
Let the Math Do the Work
O’Brien, along with Fred Gallagher, general manager, Vectorwise, Actian Corp.; Stephen Arnold, consultant, Arnold Information Technology; and Jack Norris, CMO, MapR Technologies, participated in a wide-ranging and spirited panel discussion on big data management and data warehouse modernization. The discussion was moderated by Peter Auditore, Big Data Boot Camp conference chair and principal, Asterias Research.
Reflecting on the discussion, O’Brien noted that two main themes emerged: first, that it is best to start out with a small project; and second, that the work is messy, hard and iterative. “It’s not like classic BI projects at all,” he noted. “Everyone agreed that it was important to get started with a project that is small in scope, but another important aspect was to remind everyone that analytics and big data is messy and hard. A lot of times, we want to push a button and get a quick answer. But really, working with data in a mathematical sense means taking yourself out of the picture and trying to forget about the business. Let the math do the work. Let the math come up with answers. It’s a very hard thing because we have all been trained to analyze, and think and hypothesize.”
Big Data Research from SAP and Unisphere Research
Following the opening address, David Jonker, senior director, Big Data Marketing, SAP, highlighted the results of a new big data survey, the “2013 Big Data Opportunities Survey,” which revealed a variety of practical approaches that organizations are adopting to manage and capitalize on big data.
The study was conducted by Unisphere Research, a division of Information Today, Inc., and sponsored by SAP. Jonker covered the key findings of the research, the attributes of Hadoop in a data environment, and also highlighted some ways in which big data is being leveraged by SAP customers to gain insight in fields as diverse as cancer treatment, connecting with customers in real time in retail environments, and analyzing athletic performance with sensor data during sporting events.
Reflecting on the big data study, Jonker said there were two things that stood out for him. The first was that when people think of big data now, they are still thinking about how to deal with more traditional, transactional data. For many customers, the ability to deal with sensor data or social media data is far out in the future. They are still dealing with large amounts of relational data and not yet uncovering the benefits of unstructured data. “In fact, the research suggests quite strongly that what a lot of people are doing is trying to improve existing processes, and the number of organizations that are seeking to tap big data to start whole new business models or embark on new areas of business is relatively small,” said Jonker. “That was a hunch that the data confirmed.”
The second key finding, said Jonker, is that the struggle organizations have with relational databases and how they store data largely revolves around speed: collecting the information and getting to it faster. “That came out loud and clear,” he said. With the adoption of technologies such as columnar databases, Hadoop, and massively parallel processing solutions, organizations are all trying to get around that speed bump, and in many ways the data confirmed it, he said. “Mostly, what they are deploying is relational databases and mostly what they have got are speed bumps.”
While it is still batch processing, Hadoop helps with the speed bump in a couple of senses, said Jonker. “If I don’t know what the data is, I don’t want to spend a lot of time cleansing the data and prepping it to load into a data warehouse. Hadoop lets you plunk it there and figure it out later.” It also helps in the sense that you can scale out across lots of machines that are each processing smaller amounts of data, he added.
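As an illustration of that schema-on-read idea (a minimal sketch, not an example from the session), the Python Hadoop Streaming job below parses raw, uncleansed JSON lines only at read time and spreads the work across many mapper processes, each handling a slice of the data. The "event" field name and the counting task are hypothetical choices made for the example.

    # mapper.py -- illustrative Hadoop Streaming mapper (schema-on-read):
    # raw JSON lines are parsed when the job runs, not cleansed up front.
    # Records that do not fit the assumed shape are skipped, not fixed.
    import sys
    import json

    for line in sys.stdin:
        try:
            record = json.loads(line)
        except ValueError:
            continue  # malformed line: ignore rather than fail the job
        if not isinstance(record, dict):
            continue
        event = record.get("event", "unknown")  # hypothetical field
        print("%s\t1" % event)

    # reducer.py -- sums counts per event type; Hadoop Streaming delivers
    # mapper output sorted by key, so a simple running total is enough.
    import sys

    current_key, total = None, 0
    for line in sys.stdin:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current_key:
            if current_key is not None:
                print("%s\t%d" % (current_key, total))
            current_key, total = key, 0
        total += int(value)
    if current_key is not None:
        print("%s\t%d" % (current_key, total))

A job like this would typically be launched with the hadoop-streaming jar, passing the two scripts as the -mapper and -reducer and pointing -input at the raw files sitting in HDFS; the point is that the data can be "plunked there" as-is and interpreted later, when a question arises.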
“At SAP, we believe in-memory is the future. Everything that these solutions are doing – data warehouses, columnar databases, Hadoop – they are all trying to get around the speed bump; they are all just taking different approaches to getting around the limitations of disk,” said Jonker. SAP’s perspective, he noted, is: why not just mine the data in-memory? While companies may have different specialty data stores in their data environments, in-memory can tie them together.
The “2013 Big Data Opportunities Survey” report, authored by Unisphere Research analyst Joe McKendrick, is available from DBTA at www.dbta.com/DBTA-Downloads/ResearchReports/3905-2013-Big-Data-Opportunities-Survey.htm.