Grasping Hadoop’s Full Potential


Relational transaction systems rely on the ACID properties of a record or transaction to maintain integrity in most RDBMSs. In a multi-tenant environment where many users and systems are updating data concurrently, the ability to maintain data consistency correctly is critical. Future Hadoop data engines and applications will need to maintain transaction integrity in order to replace the current generation of RDBMS-based applications.
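
As a minimal sketch of the guarantee at stake, the hypothetical JDBC snippet below groups two updates into a single transaction so that they either both commit or both roll back; the connection URL and the accounts table are illustrative placeholders, not part of any particular Hadoop engine.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TransferExample {
    // Moves funds between two rows of a hypothetical "accounts" table.
    // Both updates commit together or not at all, so concurrent readers
    // never observe a half-applied transfer.
    public static void transfer(String jdbcUrl, long fromId, long toId, long amount)
            throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
            conn.setAutoCommit(false); // group both updates into one transaction
            try (PreparedStatement debit = conn.prepareStatement(
                     "UPDATE accounts SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = conn.prepareStatement(
                     "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
                debit.setLong(1, amount);
                debit.setLong(2, fromId);
                debit.executeUpdate();
                credit.setLong(1, amount);
                credit.setLong(2, toId);
                credit.executeUpdate();
                conn.commit();   // atomic: both changes become visible together
            } catch (SQLException e) {
                conn.rollback(); // failure: neither change is applied
                throw e;
            }
        }
    }
}
```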

Transactional systems also bring requirements for backup, recovery, fault tolerance, and disaster recovery to Hadoop clusters, which are typically isolated systems in the data center. These requirements will be met in third-generation Hadoop, and compatibility with the existing IT tools that perform these functions for incumbent RDBMS systems will be a factor in accelerating the adoption of next-generation data operating systems.
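
One building block Hadoop already offers toward these requirements is HDFS snapshots, which capture a read-only, point-in-time image of a directory that backup tooling can then copy off-cluster. The sketch below uses the standard Hadoop FileSystem API; the NameNode URI and warehouse path are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

import java.net.URI;

public class SnapshotBackup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address and warehouse directory.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        Path dataDir = new Path("/warehouse/orders");

        // Admin step: mark the directory snapshottable (normally done once).
        if (fs instanceof DistributedFileSystem) {
            ((DistributedFileSystem) fs).allowSnapshot(dataDir);
        }

        // Create a read-only, point-in-time image under /warehouse/orders/.snapshot/.
        Path snapshot = fs.createSnapshot(dataDir, "nightly");
        System.out.println("Snapshot created at " + snapshot);
        // A backup tool (e.g., DistCp) can now copy the snapshot off-cluster.
    }
}
```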

Now that Hadoop has activated resource management in the cluster with YARN, response-time performance and near-real-time processing have become operational requirements. These requirements are driving the development of the Apache Storm project and of distributed in-memory architectures that couple big data scalability with high performance and a responsive user experience. Operational workloads are also supported by recent performance improvements in the ecosystem's SQL engines, Apache Hive and Apache Drill, as well as by higher-performance HDFS replacements such as MapR-FS and General Parallel File System (GPFS).
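
To make the streaming model concrete, here is a minimal sketch of a Storm topology using the org.apache.storm Java API: a spout emits timestamped events and a bolt reports per-event latency, the response-time metric discussed above. The spout, bolt, and topology names are illustrative stand-ins, not a production design.

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

import java.util.Map;

public class LatencyDemoTopology {

    // Spout: stands in for a real event feed, emitting one timestamped event at a time.
    public static class EventSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private long seq;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            Utils.sleep(100); // throttle the synthetic feed
            collector.emit(new Values("event-" + seq++, System.currentTimeMillis()));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("id", "ts"));
        }
    }

    // Bolt: measures how long each event took to reach it.
    public static class LatencyBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            long latencyMs = System.currentTimeMillis() - input.getLongByField("ts");
            System.out.println(input.getStringByField("id") + " latency=" + latencyMs + "ms");
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // terminal bolt: emits nothing downstream
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("events", new EventSpout(), 1);
        builder.setBolt("latency", new LatencyBolt(), 2).shuffleGrouping("events");

        LocalCluster cluster = new LocalCluster(); // in-process cluster for testing
        cluster.submitTopology("latency-demo", new Config(), builder.createTopology());
        Utils.sleep(10_000);
        cluster.shutdown();
    }
}
```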

The Next Wave of Data

Hadoop has democratized big data, making it more accessible for viable use within data-driven businesses. As the next wave of data begins to swell with the Internet of Things, we are starting to realize that "big data" might have been just a warm-up for the even greater volume, velocity, and variety soon to come.

Forward-thinking data-centric companies are already planning for how data architectures will need to operate to meet the business demands and opportunities of the next wave of big data. Understanding Hadoop’s origins, evolution, and accomplishments allows us to think about its future. When you factor in decades of computing architecture evolution and vision, you realize that Hadoop has a seat at the long-term strategic table for enabling enterprise data architecture.

You can look at Hadoop as a cluster. You can look at a data lake as storage. But until you fully embrace the concept that Hadoop is data persistence with a data management layer, you're missing its full potential.
