EMC Corporation has announced a new alliance with Cloudera, a provider of Hadoop-based data management software and services, to integrate Cloudera's technology with the Greenplum technology of EMC's Data Computing Products division. The integration is intended to help businesses better manage and analyze large and continuously growing amounts of structured and unstructured information, such as log files, sensor data, streaming data, sales receipts, emails, research data and images, collectively known as "big data."
The integration of Cloudera's Distribution for Hadoop (CDH), which collects, consolidates and analyzes data, with EMC's Greenplum massively parallel processing database and enterprise data cloud platform will provide an architecture for collaborative analysis of large amounts of structured and unstructured data. The connector between the two products will be supported by both Greenplum and Cloudera.
Cloudera's data management platform is built on the Apache Hadoop open source software package that consolidates data into a single repository for comprehensive analysis at lower costs while enabling fast, detailed processing and analysis of the data. Data staged by Cloudera's Distribution for Hadoop will be integrated with the EMC Greenplum Chorus platform, which uses cloud computing techniques and social collaboration for enterprise data warehousing and analytics. As a result, users will be able to discover, access and analyze data from both Greenplum databases and Hadoop infrastructure seamlessly.
"From the Greenplum perspective, we are all about big data, and about allowing flexible analysis and allowing people to analyze large volumes of data, but we want to support both structured and unstructured data and all of the different types of data people have," explains Ben Werther, director of product strategy, Greenplum / EMC Data Computing Products division. "There is a very good complement between Hadoop and Greenplum, between Hadoop's ability to do unstructured data and some of the transformations for loading data, and Greenplum's ability to do complex analysis on very large volumes of data," he tells 5 Minute Briefing. For Greenplum, which has worked with Hadoop, and with Cloudera specifically, Werther says, "this announcement really signals the new wave of this relationship," going beyond an initial phase of getting things up and running and finding synergy on an account-by-account basis to building integrated offerings that combine the best of Greenplum and Cloudera's Hadoop offering.
While customers are using the technologies together already, offerings that represent the fruit of this relationship will first appear in early 2011. Under the agreement, in addition to technical integration, EMC and Cloudera will work together on joint sales activities. EMC will be exhibiting and presenting on its relationship with Cloudera at the annual Hadoop World conference taking place in New York City on October 12.