EMC Updates Greenplum Big Data Analytics Appliance

EMC Corporation has updated its appliance-based unified big data analytics offering. The new EMC Greenplum Data Computing Appliance (DCA) Unified Analytics Platform (UAP) Edition expands the system's analytics capabilities and solution flexibility, achieves performance gains in data loading and scanning, and adds integration with EMC's Isilon scale-out NAS storage for enterprise-class data protection and availability.

The Greenplum DCA enables analysis of structured and unstructured data together. Within a single appliance, the DCA integrates Greenplum Database for analytics-optimized SQL, Greenplum HD for Hadoop-based processing, and Greenplum partner business intelligence, ETL, and analytics applications.

The appliances have been able to host both a relational database and Hadoop for some time now, Bill Jacobs, director of product marketing for EMC Greenplum, tells 5 Minute Briefing. “The significance of this launch is that we tightened that integration up even more. We make those two components directly manageable with a single administrative interface and also tighten up the security. All of that is targeted at giving enterprise customers what they need in order to use Hadoop in very mission-critical applications without having to build it all up in Hadoop themselves.”

According to Greenplum, the new DCA offers the power of a massively parallel processing (MPP) architecture, performance gains of more than 70% over the prior generation for data loading and scanning, and a 100% performance increase for concurrent query workloads.

While Greenplum Data Integration Accelerators (DIAs) enable faster data loading, Jacobs emphasizes that there is more to them than that: the goal is to help customers get down to a single appliance. “We decided to open the appliance up and make options available in the appliance that allow things like ETL to be installed inside of the appliance under controlled conditions. This gives users flexibility but still keeps it in a single cabinet or a single set of cabinets that can be moved in, turned on, and easily administered from a single console. We call those Data Integration Accelerators (DIAs).”

The ETL, file staging, and loading improvements were the original impetus for the DIAs, but the company plans to go further. With this Greenplum DCA announcement, “we will very shortly add additional DIAs that have very large computational capacity and that give us a platform for running some very high-end analytics applications, most particularly SAS visual analytics, which is very common for a big SAS user,” says Jacobs.

Support within the appliance for emergent, compute-intensive analytic applications, data visualization, and applications doing packaged analytics for specific types of data or specific industries is important for efficiency, says Jacobs. “All of those can find a home on these DIAs within the DCA without forcing another cabinet, another rack, another system, another administrator, another network connection.”

The Greenplum DCA also enables increased system and data availability through integration with EMC's storage solutions. According to the company, integrating the DCA with EMC Data Domain deduplication storage systems provides backup and recovery for Greenplum Database modules at rates up to 13 TB/hour, with services for wide-area replication for enhanced disaster recovery. The DCA offers both HDFS triple-redundant storage on direct-attach devices and integration with EMC Isilon scale-out NAS, which supplies HDFS storage with added data protection through snapshots, mirroring, backup, recovery, and replication. Isilon also simplifies data loading and permits independent scaling of compute and storage resources.

“What is not available with most Hadoop distributions is an easy way to achieve disaster recovery preparations that will satisfy an auditor,” says Jacobs. With Isilon, he contends, it is much easier than with traditional Hadoop HDFS to achieve the enterprise-class values that auditors demand for mission-critical systems. “People who are used to running big data centers and large banks will look at this and think it is very interesting because they can solve a problem they have to solve in order to deploy, and do so in a way that is familiar. It becomes a very attractive option for the 5-10% of our customer base that has to have it.”
