Decoding the Mixed Messages of Hadoop

Someone new to big data and Hadoop might be forgiven for feeling a bit confused after reading some of the recent press coverage on Hadoop.

On one hand, Hadoop has received very bullish coverage in mainstream media. Consider headlines such as “The joys and hype of software called Hadoop” and “Why you’ll be hearing ‘Hadoop’ a lot in 2015.” Furthermore, analysts remain virtually unanimous on the significance of Hadoop. For example, “Forrester: Hadoop to become an enterprise priority in 2015” was the title of a recent post.

However, counter to this positive coverage, there have been a number of claims that Hadoop is overhyped—for instance, in Forbes’ “The End of the Hadoop Bubble?”—as well as numerous reports from the field citing problems with Hadoop’s ease of use and value proposition.

What’s a person to make of all these mixed messages?

Gartner’s Hype Cycle describes how new technologies frequently undergo a peak of inflated expectations prior to a trough of disillusionment. These peaks and troughs precede what Gartner calls the “plateau of productivity.” To some extent, we are seeing this inevitable hype cycle play out for Hadoop. However, there are also some more specific factors at work.

Hortonworks, one of the major Hadoop vendors, recently filed for an IPO, which required it to reveal details of its revenue and customer base. Although the Hortonworks IPO was quite successful, the disclosures of Hortonworks’ revenues confirmed what many had suspected but few knew for certain: services make up a large proportion of revenues for Hadoop, and although uptake of the commercial Hadoop distributions is healthy, the route to profitability is unclear.

Hortonworks is a good example of the dilemma that faces many companies whose business model revolves around open source software. Companies such as Hortonworks and Cloudera want to be the Red Hat of Hadoop—driving growth through the commercialization of an open source core product. However, Red Hat is increasingly being seen as a unique case, unlikely to be repeated. Where multiple vendors fight for dominance over an open source core technology, and where the free version of the technology itself is robust, it’s difficult to become another Red Hat. Consequently, part of the reservation around Hadoop is not so much about the future of Hadoop as a technology but rather a fear that the Hadoop vendors are overvalued.

We are also entering the phase in which the “rubber hits the road” with respect to big data projects. The premise of big data is that we can derive competitive advantage through the more intelligent analysis of larger volumes and varieties of data. Hadoop to a great extent solves the storage problem for big data: it provides economical storage for large volumes of data and is capable of storing both structured and unstructured data. However, Hadoop alone does nothing to expedite the intelligent analysis side of big data. Currently, there is no single technology that solves the problems of big data analytics. Open source projects such as R, MLlib, and mlpy are tools suitable for advanced data scientists but not for mainstream business, and even these tools have scalability issues in the big data arena. So, Hadoop is a necessary but not sufficient contributor to the big data value proposition.

And, of course, as Hadoop projects graduate from pilot to production, issues with manageability, security, and performance arise. These factors lead to a “Hadoop hangover” which is often the result of unrealistically high expectations.

However, Hadoop remains a core proven technology which will be part of big data projects for the foreseeable future. New technologies such as Spark represent significant innovations, but for all intents and purposes increase rather than diminish the importance of Hadoop. My expectation is that we’ll continue to see growing adoption of Hadoop over the next few years, albeit with possibly more realistic expectations around the costs and benefits that Hadoop entails.

Guy Harrison is an executive director of R&D at Dell and author of the Oracle Performance Survival Guide (Prentice Hall, 2009).