Hadoop is contributing to the success of data analytics. Anad Rai, IT manager at Verizon Wireless, examined the differences between traditional and big data analytics at Data Summit 2015 in a session titled “Analytics: Traditional Versus Big Data.” The presentation, which was part of the IOUG track moderated by Alexis Bauer Kolak, education manager at the IOUG, showed how big data technologies are aiding data discovery and improving the transformation of information and knowledge into wisdom.
There are two types of techniques for dealing with data analytics: supervised and unsupervised learning, Rai said.
Supervised learning is the machine learning task of inferring a function from labeled training data, while unsupervised learning tries to find hidden structure in unlabeled data, Rai explained.
After defining those two methods, Rai outlined the differences between Hadoop, NoSQL, and the RDBMS, and how they coexist.
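To make the distinction concrete, here is a minimal sketch in plain Python. It is not taken from Rai's session; the function names `fit_line` and `two_means` and the sample numbers are illustrative assumptions. The first function infers a function from labeled pairs (supervised), and the second groups unlabeled values into two clusters (unsupervised).

```python
# Supervised: labeled pairs (x, y) are used to infer a function y = slope * x + intercept.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


# Unsupervised: no labels are given; a simple 1-D k-means with two clusters
# looks for structure (two groups) hidden in the raw values.
def two_means(values, iterations=10):
    centers = [min(values), max(values)]  # start from the two extremes
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


# Hypothetical sample data, chosen only for illustration.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
centers = two_means([1.0, 1.2, 0.8, 9.0, 9.5, 8.5])
print(slope, intercept)
print(centers)
```

In the supervised case the known answers (the `ys`) guide the fit; in the unsupervised case the algorithm only sees the values and has to discover the grouping on its own.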
Hadoop is a data storage and processing framework made up of HDFS, which stores data, and MapReduce, which processes data. “It allows companies to process the data using cheap resources and easily storing it,” Rai said. “Since it is a cheaper resource it helps building that ROI and you can use that data.”
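The MapReduce processing model can be illustrated with a small in-memory word count. This is only a sketch of the idea in plain Python, not the Hadoop API: the helper names `map_phase`, `shuffle`, and `reduce_phase` and the sample lines are assumptions made for illustration, and a real job would run these phases in parallel across a cluster over data stored in HDFS.

```python
from collections import defaultdict


def map_phase(line):
    # Map: emit a (key, value) pair for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all emitted values by key before reducing.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce: combine the values for one key into a single result.
    return key, sum(values)


# Hypothetical input records standing in for lines of a file.
lines = ["big data big insights", "data drives insights"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each map call looks at one record and each reduce call looks at one key, the work can be spread across many inexpensive machines, which is the cost argument Rai makes.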
A NoSQL database provides a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases. A relational database management system (RDBMS) is a database management system (DBMS) that is based on the relational model.
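The difference in data modeling can be sketched as follows. The example uses Python's built-in `sqlite3` module for the relational side and a plain dictionary of JSON strings as a stand-in for a document-style NoSQL store; the table names, keys, and sample customer are hypothetical and are not from the session.

```python
import json
import sqlite3

# Relational model: data is split into tables and recombined with a join.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE phones (customer_id INTEGER, number TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO phones VALUES (?, ?)",
    [(1, "555-0100"), (1, "555-0101")],
)
rows = conn.execute(
    "SELECT c.name, p.number FROM customers c "
    "JOIN phones p ON p.customer_id = c.id ORDER BY p.number"
).fetchall()

# Document model (stand-in for a NoSQL store): the whole record is kept
# as one nested document and fetched by key, with no join required.
document_store = {}
document_store["customer:1"] = json.dumps(
    {"name": "Ada", "phones": ["555-0100", "555-0101"]}
)
doc = json.loads(document_store["customer:1"])

print(rows)
print(doc["phones"])
```

Both approaches hold the same information; they differ in how the data is shaped and queried, which is why the stores can complement one another.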
“All these are different data stores and each one of them is complementary,” Rai said.
Apache Hive then comes along to help Hadoop, Rai said, working much like an external table over data on HDFS. It provides SQL-like access to data by using MapReduce, he noted, and Hive can also be used with tools like Tableau.
“In the past we were using several techniques to solve problems; now we are adding insights to value by exploring things from Hadoop,” Rai said. “In order to explore and understand data we need the necessary analytical tools.”