EMC Corporation, a provider of storage and infrastructure solutions, has announced that it will ship a data warehouse appliance built on Apache Hadoop, the open-source software framework for data-intensive distributed applications. The company's high-performance data co-processing appliance, the Greenplum HD Data Computing Appliance, integrates Hadoop with the EMC Greenplum Database, allowing structured and unstructured data to be co-processed within a single solution. EMC says the appliance will run either of two Hadoop-based software editions: EMC Greenplum HD Community Edition or EMC Greenplum HD Enterprise Edition.
"We expect customers to run applications that process unstructured batch applications on the Greenplum HD and use the Greenplum Database for building interactive applications for structured data," Susheel Kaushik, director of product management for EMC's Data Computing Division, tells 5 Minute Briefing. He anticipates a number of use cases, including log processing, ETL processing, fraud pattern identification in customer data, video conversion, and index generation.
The solution supports Hadoop external tables, which let users access data residing on the Hadoop Distributed File System (HDFS) without first materializing it in the database. Administrators can read and write files in parallel between Greenplum and HDFS, making data sharing fast and simple. Greenplum SQL and its advanced analytic functions can also be run directly against data on HDFS, enabling cross-platform analysis. According to EMC, the combined solution is the industry's only complete big data analytics platform.
EMC Greenplum HD Enterprise Edition is an interface-compatible implementation of the Apache Hadoop stack. By maintaining compatibility with the Hadoop interfaces, the Enterprise Edition keeps applications portable while delivering the advanced features that larger organizations require.
EMC Greenplum HD Community Edition is a certified and supported version of the open-source Apache Hadoop stack, comprising HDFS, MapReduce, ZooKeeper, Hive, and HBase. EMC Greenplum adds fault tolerance for the NameNode and JobTracker, both of which are single points of failure in standard Hadoop implementations.
More information is available at the EMC website.