EMC Corporation has announced version 4.2 of EMC Greenplum Database, which includes a high-performance gNet for Hadoop; simpler, scalable backup with EMC Data Domain Boost; an extension framework and turnkey in-database analytics; language and compatibility enhancements for faster migrations to Greenplum; and targeted performance optimization.
The 4.2 release advances the EMC Greenplum Unified Analytics Platform (UAP), William Davis, EMC Greenplum marketing manager, tells 5 Minute Briefing.
Greenplum announced the UAP in December 2011 as a platform to support big data analytics, combining the co-processing of structured and unstructured data with a productivity engine for collaboration among data scientists. The UAP brings together the MPP EMC Greenplum Database for structured data, the enterprise Hadoop offering EMC Greenplum HD for analysis and processing of unstructured data, and EMC Greenplum Chorus, its productivity engine for data science teams.
One of the most important enhancements in Greenplum 4.2, says Davis, is high-performance import and export of all data, compressed and uncompressed, between Greenplum Database and Hadoop using gNet for Hadoop, a parallel communications transport that provides direct query interoperability between the two systems. According to the company, this is necessary because customers who want to expand the range of data integration and processing solutions they can build, and to run queries for mission-critical complex analysis, need the most efficient and flexible data exchange possible between Greenplum Database and Hadoop, in addition to the existing parallel data access.
Greenplum 4.2 also launches the new Greenplum Command Center, a web-based big data infrastructure management console that provides a unified dashboard for administration and for real-time and historical health monitoring of all Greenplum products. As data passes between Greenplum HD and Greenplum Database, users can manage not only resources but also performance across both platforms "under one single pane of vision," says Davis. Supported Greenplum Database administrative operations include starting, stopping, and initializing Greenplum Database; searching for, prioritizing, or canceling any query; and recovering and rebalancing data mirrors. The initial release of Greenplum Command Center is available with Greenplum Data Computing Appliance version 1.2.
In addition, Greenplum 4.2 offers advanced integration with EMC Data Domain deduplication storage systems via EMC Data Domain Boost, which enables "what's changed backup," says Davis. According to the company, this faster, more efficient backup delivers 10x to 30x data reduction on average. The integration distributes parts of the deduplication process to the Greenplum Database servers so that they send only unique data to the Data Domain system, which increases throughput, reduces the amount of data transferred over the network, and eliminates the need to create and manage virtual drives.
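The general idea of source-side deduplication, where the sending server does part of the work and ships only unique data, can be illustrated with a minimal Python sketch. This is not Data Domain Boost's implementation: the fixed 4-byte chunk size, the SHA-256 fingerprints, and the function names dedup_for_transfer and restore are illustrative assumptions (production systems use much larger, often variable-length, chunks and query the target for known fingerprints). The sender fingerprints each chunk, transfers only chunks the target has not seen, and sends a list of fingerprints (a "recipe") from which the target can rebuild the data.

```python
import hashlib

CHUNK_SIZE = 4  # assumption: tiny fixed-size chunks, for illustration only


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def dedup_for_transfer(data: bytes, known: set[str], size: int = CHUNK_SIZE):
    """Return (recipe, new_chunks): the fingerprint order needed to rebuild
    the data, plus only those chunks the target does not already hold."""
    recipe: list[str] = []
    new_chunks: dict[str, bytes] = {}
    for chunk in split_chunks(data, size):
        fp = hashlib.sha256(chunk).hexdigest()
        recipe.append(fp)
        # Skip chunks the target already has or that repeat within this backup.
        if fp not in known and fp not in new_chunks:
            new_chunks[fp] = chunk
    known.update(new_chunks)  # the target will now hold these fingerprints
    return recipe, new_chunks


def restore(recipe: list[str], store: dict[str, bytes]) -> bytes:
    """Target side: reassemble data from the recipe and the chunk store."""
    return b"".join(store[fp] for fp in recipe)


known: set[str] = set()       # fingerprints the target already holds
store: dict[str, bytes] = {}  # target-side chunk store

recipe1, sent1 = dedup_for_transfer(b"AAAABBBBAAAA", known)
store.update(sent1)
recipe2, sent2 = dedup_for_transfer(b"AAAABBBBCCCC", known)
store.update(sent2)
print(len(sent1), len(sent2))  # 2 1
```

In this example the first backup repeats one chunk internally, and the second backup shares its first two chunks with the first, so only one new chunk crosses the network the second time.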
Addressing database manageability and performance, Greenplum Database delivers an agile, extensible platform for in-database analytics that leverages the system's massively parallel architecture. With Release 4.2, Greenplum enables turnkey in-database analytics via Greenplum Extensions, which can be downloaded from EMC Subscribenet and installed using the new Greenplum Package Manager, a utility that automates the installation and updating of functional extensions to simplify enabling and managing advanced in-database functionality across a cluster.
Release 4.2 also supports dynamic partition elimination, which can drastically reduce the amount of data scanned for a query, and query memory optimization; together they significantly accelerate query processing and allow greater concurrency.
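Dynamic partition elimination generally means deciding at run time, for example from the key values produced by the other side of a join, which partitions of a partitioned table can possibly contain matching rows, and skipping the rest. The following Python sketch shows only that general idea; it does not reflect Greenplum's planner or executor, and the Partition class, its range bounds, and the functions eliminate and dynamic_scan are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    name: str
    lo: int      # inclusive lower bound of the partition key
    hi: int      # exclusive upper bound of the partition key
    rows: tuple  # (key, value) pairs stored in this partition


def eliminate(partitions, keys):
    """Keep only partitions whose key range contains at least one runtime key."""
    return [p for p in partitions if any(p.lo <= k < p.hi for k in keys)]


def dynamic_scan(partitions, join_keys):
    """Scan only the surviving partitions; return their names and matching rows."""
    wanted = set(join_keys)
    selected = eliminate(partitions, wanted)
    rows = [r for p in selected for r in p.rows if r[0] in wanted]
    return [p.name for p in selected], rows


# Hypothetical fact table, range-partitioned on an integer key.
sales = [
    Partition("p1", 0, 10, ((3, "a"), (7, "b"))),
    Partition("p2", 10, 20, ((12, "c"), (15, "d"), (18, "e"))),
    Partition("p3", 20, 30, ((25, "f"),)),
]

# Keys known only at run time, e.g. produced by filtering a dimension table.
runtime_keys = [12, 15]
scanned, matches = dynamic_scan(sales, runtime_keys)
print(scanned, matches)  # ['p2'] [(12, 'c'), (15, 'd')]
```

Because the keys are not known until the query runs, a static plan would have to scan all three partitions; deciding at run time lets the scan touch only p2.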
Complete details about EMC Greenplum Database are available.