Pre-integrated solutions and engineered systems also break enterprise IT silos by forcing companies to build a single cross-skilled team responsible for the whole engineered system.
The Future for Hadoop and NoSQL
Whether Hadoop is the best big data platform from a technology perspective or not, it has such a broad (and growing) adoption in the industry nowadays that there is little chance for it to be displaced by any other technology stack.
While core Hadoop has traditionally been thought of as the combination of HDFS and MapReduce, today both components are really optional. For example, the MapR Hadoop distribution uses MapR-FS, and Amazon EMR uses S3 in place of HDFS. The same applies to MapReduce: Cloudera Impala ships its own parallel execution engine, Apache Spark is a newer low-latency parallel execution framework, and more alternatives are gaining popularity. Even Apache Hive and Apache Pig are moving from pure MapReduce to Apache Tez, yet another distributed execution framework, this one aimed at low-latency processing.
Hadoop is here to stay, and that means the Hadoop ecosystem at large. It will evolve and add new capabilities at a blazing-fast pace. Some components will die out while others move into the mainstream. “Core Hadoop” as we know it will change.
There are many commercial off-the-shelf (COTS) applications available that use relational databases as a data platform—CRM, ERP, ecommerce, health records management, and more. Deploying a COTS application on one of its supported relational database platforms is a relatively straightforward task, and application vendors have a proven track record of deployments with clearly defined guidelines. It can be argued that the majority of relational database deployments today host a third-party application rather than an in-house developed one.
Big data projects, on the other hand, are almost entirely custom-developed solutions that are not easily repeatable at another company. As Hadoop has become the standard platform of the big data industry, expect a slew of COTS applications to be deployed on top of Hadoop just as they are deployed today on top of relational databases such as Oracle and SQL Server.
For example, all retail players have to solve the challenge of providing customers a seamless experience across both physical and online channels. All city governments share the need for traffic planning and real-time traffic control that minimizes congestion while also minimizing the cost of operations and ownership. Companies will be able to buy a COTS application and deploy it on their own Hadoop infrastructure, regardless of the underlying Hadoop distribution.
It is, however, quite possible that the new big data COTS applications will be dominated by software as a service (SaaS) offerings or fully integrated appliances (an evolution of today's engineered systems), which would mean a very different, repeatable deployment model for big data.
Unlike Hadoop, however, the NoSQL world is still represented by a huge variety of incompatible platforms, and it's not obvious which will dominate the market. Each NoSQL technology has its own specialization; unlike relational databases, no one size fits all.
Relational Databases Are Not Going Anywhere
While there is much speculation about how modern data processing technologies are displacing proven relational databases, the reality is that most companies will be better served with relational technologies for most of their needs.
As the saying goes, if all you have is a hammer, everything looks like a nail. When database professionals drink enough of the big data Kool-Aid, many of their challenges start to look like big data problems. In reality, though, most of those problems are self-inflicted. A bad data model is not a big data problem. Running on seven-year-old hardware is not a big data problem. The lack of a data purging policy is not a big data problem. Misconfigured databases, operating systems, and storage arrays are not big data problems.