Cloudera Delivers Complete Enterprise Data Hub


Cloudera has introduced an update to its Cloudera Enterprise offering, with three new editions aligned to how customers commonly use Hadoop. Cloudera also announced commercial support for Apache Spark (incubating), which offers fast interactive analytics and stream processing.

The three editions are the Cloudera Enterprise Basic Edition, which focuses predominantly on storage and processing use cases; the Cloudera Enterprise Flex Edition, which lets a user take the Basic Edition and add a single option on an à la carte basis, depending on the use case; and the Cloudera Enterprise Data Hub Edition, “which is essentially the all you can eat offering for all of our products,” explained Clarke Patterson, senior director, product marketing at Cloudera.

Cloudera Enterprise Support for Apache Spark

Along with these announcements, Cloudera Enterprise has added support for Apache Spark, an open source, parallel data processing framework that complements Hadoop, making it easy to develop fast, unified big data applications that combine batch, streaming, and interactive analytics. Spark integrates with Hadoop through common data, metadata, security, and resource management; is faster than Hadoop MapReduce for data processing; and enables easy development of stream processing applications for the Hadoop ecosystem.

Databricks, the inaugural partner in the Cloudera program, was spun out of AMPLab at the University of California (UC), Berkeley, and is the company behind the Apache Spark framework. With Spark, Cloudera users can perform rapid, resilient processing of in-memory datasets stored in Hadoop, as well as general data processing.

Key Features of Cloudera Enterprise Data Hub

According to the company, the Cloudera Enterprise Data Hub Edition gives customers everything they need to build an enterprise data hub. It includes unlimited supported use of all of Cloudera’s advanced components, including Cloudera Impala for interactive analytic SQL queries; Cloudera Search for interactive search; Cloudera Navigator for data management, including data auditing, lineage, and discovery; Apache Spark for interactive analytics and stream processing; and Apache HBase for online NoSQL storage and applications.

Cloudera Enterprise Flex Edition is intended for dedicated mission-critical applications that use one Cloudera Enterprise Data Hub component of the customer’s choosing on a given Hadoop cluster.

Cloudera Enterprise Basic Edition is designed for customers who need only core Hadoop for batch processing and storage, at an economical price.

According to Cloudera, using an enterprise data hub customers can:

  • Manage all their data from one unified platform, with faster time to insight and business value than was ever before possible.
  • More quickly detect fraud or security attacks, and remain in compliance with the regulatory requirements of their industry, with greater data retention across the organization.
  • Achieve better operational efficiency by freeing up resources and keeping more data readily available for analysis by business users.

Greater Understanding of Hadoop

Understanding of what a Hadoop-based platform is actually capable of is accelerating, driven by key capabilities such as security, said Patterson. Many organizations have been reluctant to do much more than store and process second-class data in Hadoop. They don’t want to put anything of importance there because they worry that it will be exposed and they don’t have specific policies around data protection.

The addition of technologies such as Sentry, which offers fine-grained security for the data in the cluster itself, starts to open people’s eyes to what they can do, Patterson noted. Sentry is an independent security module that integrates with the open source SQL query engines Apache Hive and Cloudera Impala, providing authorization controls to enable multi-user applications and cross-functional processes for enterprise data sets.