Leveraging Real-Time Big Data with Spark, Kafka, and Hadoop

As the volume of data entering the enterprise grows, companies are looking to build real-time big data architectures to keep pace.

In order to do this efficiently, they need to tap into a variety of tools, said Shane Johnson, senior manager of product marketing at Couchbase, in a DBTA webinar. Those tools combine NoSQL for operational data access, Spark for streaming analytics, Kafka for distributed messaging, and Hadoop for offline processing.

“No one product solves your whole puzzle. It is extremely unlikely, if not impossible, that you can buy one data tool and it solves all of your needs,” Johnson explained. “The real art is picking a variety of tools and connecting them together to create a data flow that does what you need it to accomplish.” 

A database can be the first stop on the road to building the best platform, followed by pushing out data to Kafka, Hadoop, or streaming for asynchronous work, Johnson said.

He noted how Couchbase combines NoSQL and Hadoop by using a “bucket of data” that is then split into sub-buckets. The sub-buckets are distributed across clusters, which open multiple connections to move the data in parallel. Adding Kafka then lets enterprises achieve a degree of continuous data streaming or analyze data in real time.
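The split-and-move-in-parallel idea can be illustrated with a small sketch. This is not Couchbase's API or its actual partitioning scheme. It assumes a bucket is a plain dictionary of documents, picks a sub-bucket with a CRC32 hash of the key, and uses a hypothetical `transfer` function as a stand-in for one connection shipping one sub-bucket downstream.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_SUB_BUCKETS = 4  # assumption: illustrative count, not a Couchbase default


def sub_bucket_for(key: str, num_sub_buckets: int = NUM_SUB_BUCKETS) -> int:
    """Map a document key to a sub-bucket index using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_sub_buckets


def split_bucket(bucket: dict, num_sub_buckets: int = NUM_SUB_BUCKETS) -> list:
    """Split one bucket (key -> document) into a list of sub-buckets."""
    subs = [dict() for _ in range(num_sub_buckets)]
    for key, doc in bucket.items():
        subs[sub_bucket_for(key, num_sub_buckets)][key] = doc
    return subs


def transfer(sub_bucket: dict) -> int:
    """Hypothetical stand-in for one connection moving one sub-bucket.

    A real pipeline would write to Hadoop or Kafka here; this sketch
    only reports how many documents it was handed.
    """
    return len(sub_bucket)


def move_in_parallel(bucket: dict, num_sub_buckets: int = NUM_SUB_BUCKETS) -> int:
    """Move every sub-bucket over its own worker and return the total moved."""
    subs = split_bucket(bucket, num_sub_buckets)
    with ThreadPoolExecutor(max_workers=num_sub_buckets) as pool:
        return sum(pool.map(transfer, subs))
```

Because each key always hashes to the same sub-bucket, the workers never share documents, which is what lets them run side by side without coordination.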

Kafka can give enterprises the ability to gain insights closer to real time, instead of moving data in bulk. Data streams into the Kafka message queue and, once the queue reaches a certain tipping point, moves on downstream, producing a semi-continuous stream, according to Johnson.
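The “tipping point” behavior can be sketched as a buffer that hands off its contents once enough messages have accumulated. This is a simplified model in plain Python, not Kafka's client API. The `ThresholdBuffer` class, its `threshold` parameter, and the `sink` callback are all hypothetical names chosen for illustration.

```python
from typing import Callable, List


class ThresholdBuffer:
    """Collect messages and pass them downstream in batches.

    Assumption: the tipping point is modeled as a simple message count.
    A real system might also flush on elapsed time or on byte size.
    """

    def __init__(self, threshold: int, sink: Callable[[List[dict]], None]):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.sink = sink  # called with each completed batch
        self._pending: List[dict] = []

    def append(self, message: dict) -> None:
        """Queue one message and flush if the tipping point is reached."""
        self._pending.append(message)
        if len(self._pending) >= self.threshold:
            self.flush()

    def flush(self) -> None:
        """Send whatever is pending downstream, if anything."""
        if self._pending:
            batch, self._pending = self._pending, []
            self.sink(batch)
```

For example, with `ThresholdBuffer(3, batches.append)` and seven appended messages, the sink receives two batches of three, and the seventh message waits until the next flush. Those small, frequent batches are what make the stream semi-continuous rather than bulk.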

Enterprises can also add either Storm or Spark into the equation to process data in real time while Hadoop is archiving. As Storm analyzes the continuous stream of data, it puts results in dashboards so people can see what’s happening, Johnson said.

To view a replay of this webinar, go here.

