Hadoop Has Come a Long Way in 10 Years


With Hadoop marking its 10th anniversary this year, Sean Suchter, CEO of Pepperdata, recently reflected on his experience with the platform and speculated on what the next 10 years may bring.

Ten years ago, in 2006, Suchter noted, he was head of the Yahoo web search team when the company decided to experiment with an open source project that would store and process large data sets on clusters.

Hadoop was created by Doug Cutting and Mike Cafarella. Cutting, who was working at Yahoo at the time, famously named it after his son's toy elephant.

It was originally developed to support distribution for the Nutch search engine project.

“I believed in the value of what was going on, so I said, ‘Hey, I want to be your first customer and we’re going to commit to this,’” Suchter said. “And we did, we were moving critical infrastructure to this thing when it was barely stable.”

To test the platform’s capabilities, the team threw everything at it, Suchter explained.

“Hadoop started as a big batch analytics platform,” Suchter said. “Then the really exciting stuff that’s happened, in 2010 and beyond, is that there’s been an explosion of different possible use cases.”

Hadoop has come a long way since 2006, Suchter noted, and organizations continue to find innovative uses for the platform.

“The clusters were going from one little science experiment to a departmental thing to a companywide thing. The vision that you can take all the data assets a company has and correctly use them to apply to an arbitrary problem and stream data in is actually happening now,” Suchter said. “It’s not just an analytics platform now; it’s an arbitrary distributed computing platform.”

However, a barrier to adoption is cropping up as the systems, applications, and data being thrown into Hadoop become more complex.

“I just want organizations to not have to miss a step,” Suchter said. “Our product is focused on this wall that everyone is hitting where you can’t trust, if you throw one more app on, that the cluster is not going to blow up.”

While he was working at Yahoo, Suchter said, his team ran into this issue, and it wound up taking down the entire Yahoo search engine.

“The root problem was the same pattern: one tenant mucking with the hardware that another tenant needs,” Suchter said. “The solution to that was expensive and painful at the time.”

He predicts that in the next decade people will realize Hadoop is a system that can run a myriad of different applications, not just large-scale data processing, and that this realization will change the industry.

“The more advanced users realize this and are starting to use Hadoop in this way where it is about many different things,” Suchter said. “People are going to realize this is a fabric that can be used for a lot of different things.”



