Leveraging Big Data with Hadoop, NoSQL and RDBMSs

In recent years, data volumes have grown faster than many companies can keep track of.

The deluge of big data available for analysis presents a great opportunity, but many organizations are having difficulty managing it. As a result, some companies cannot process valuable data, while others must delete data sets to make room for the massive amounts of new data arriving.

In DBTA’s most recent webcast, titled “The Big Data Trifecta: Using Hadoop, NoSQL and RDBMS,” presenters Dale Kim of MapR, Rich Reimer of Splice Machine, and Mark Davis of the Dell Software Group outlined some optimal approaches for dealing with emerging big data challenges.

Big data is characterized by the well-known three V’s: volume, velocity, and variety. Traditionally, when there has been an influx of new data, systems have scaled up. Recently, though, given the variety and volume of data, organizations have struggled with these traditional methods, the presenters said.

Scale-Out Versus Scale-Up Data Management

“Data has been growing 30%-40% per year,” said Rich Reimer, vice president of marketing and product management for Splice Machine. “When traditional databases hit 1 terabyte, performance traditionally suffers.” As a result, many of the newer databases provide options for scaling out.

MapR, which provides an enterprise platform to support consolidated deployment of both Hadoop and NoSQL, has focused on helping customers on an individual basis. MapR is about open source and helping its customers select the project that is best for them, said Dale Kim, director of industry solutions for MapR. “When we work with you, we understand what your specific requirements are so we follow the mantra of using the right tool for the job,” said Kim. A key feature is backward compatibility, which allows customers to update individual components of the database without having to update the entire database.

Splice Machine aims to help customers whose traditional data management approaches cannot handle large amounts of data. Splice Machine can cut costs by up to 75% compared with traditional databases while increasing query speeds three to seven times, said Reimer. It is unique because it is the only database that combines Hadoop and an RDBMS, he noted. “We bring the best of both worlds.”

Emerging Use Cases for Big Data

There have been five main uses for Splice Machine, said Reimer: digital marketing, creation of an operational data lake, the Internet of Things, personalized medicine, and fraud detection. With the growth of digital marketing, companies have massive amounts of advertising data about their customers to sift through. According to Reimer, Splice Machine is able to aggregate that data and help companies build a better profile of their target customer. Operational data lakes have allowed customers to offload data from expensive data warehouse systems. Splice Machine has also enabled the tracking and prediction of trends using Internet of Things data. On the personalized medicine front, Splice Machine is bringing together more sets of information and helping doctors coordinate that information for their patients. Lastly, in the area of fraud detection, Splice Machine is improving fraud algorithms to make detection easier, and also improving security measures for credit card holders.

Mark Davis, big data distinguished engineer, Dell Software Group, agreed that many organizations are having a hard time scaling to handle enormous amounts of data. “Early use of Hadoop was to analyze social media from web servers, group users based on pattern behavior, and then build machines to group the data,” Davis explained.

Today, big data use cases include not only social network analysis but also government defense intelligence and IoT analytics, in which sensor data is analyzed to identify failing nodes and predict potential failures. Information management within the Dell Software Group includes database management, application and data integration, and business analytics and big data analytics. This allows organizations to manage data across different sources and integrate it.

A replay of DBTA’s “The Big Data Trifecta: Using Hadoop, NoSQL and RDBMS” webcast is available for 90 days at