A unified data platform has been the holy grail for data managers for some time now. But anyone paying attention to the big data space over the past several years might look at today's data integration problems and ask, "Weren't Hadoop and data lakes supposed to solve all those issues?" according to Pythian's Danil Zburivsky, who offered a presentation titled "Dismantling Data Silos Through Cloud Integration" at Data Summit 2019.

"Why are we still talking about data lakes versus the data warehouse?" said Zburivsky. "Well, the problem is that I think Hadoop hasn't quite played out as the unified data platform. And the reasons behind this, as we see it, in general in the industry, is that it turns out to be quite complex to use and navigate, use and operate. It's a big system with many moving pieces and just the operations aspect of it took quite a bit of effort. You have to basically have people dedicated, especially if you are working at large scales. Nodes will fail, you will need to fix them, you will need to keep the services updated, things like that."

In addition, Hadoop is complex and hard to use, said Zburivsky. "The promise of having a Hadoop [system] which scales by adding more servers into it, and the whole idea that you scale by buying servers, it may work easily for companies that have a buy-new-server process kind of nailed down. If you have Google-scale, Yahoo-scale, yeah, you're constantly buying new servers."

But in reality, according to Zburivsky, buying a server in an average company takes months of procurement time. "The Hadoop clusters that I've seen in the past are either dramatically underutilized, so you buy more capacity than you actually need, or they're dramatically overutilized because you can't scale them fast enough, because buying stuff is kind of a real-world process; you can only do this fast. So, basically, these factors together killed the Hadoop cost proposition."

