Cloudera Enterprise 5.5 Introduces Cloudera Navigator Optimizer

By Joyce Wells

Nov 19, 2015

Hadoop distribution provider Cloudera has introduced Cloudera Enterprise 5.5, including Cloudera Navigator Optimizer, a new product targeted at helping organizations improve big data workload performance and efficiency. Cloudera Navigator Optimizer, now in beta, is expected to be generally available in 2016.

The new release of Cloudera Enterprise has three main areas of focus, according to Anupam Singh, head of data management at Cloudera: community, making data analysts more successful, and helping people get onto Hadoop as quickly as possible.

Cloudera Navigator Optimizer Is Launched

With the 5.5 release, Cloudera is introducing Cloudera Navigator Optimizer, a rebranding of the Big Data Integration Service (BDIS) product from Xplain.io, which Cloudera acquired in February 2015, said Singh, who was a cofounder and CEO of Xplain.io before joining Cloudera. Describing the capabilities of Cloudera Navigator Optimizer as very similar to those of Oracle Enterprise Manager, Singh said the idea is to address four pain points customers have now: ETL is notorious for being very brittle and having data quality issues, analysts are waiting too long for reports, there is increasing pressure to give ad hoc data access to data scientists and statisticians, and customers have extremely long queries that make it difficult to identify problems.

Cloudera Navigator Optimizer instantly analyzes existing workloads, providing visibility into which ones are the most critical, which data is accessed most, and how it is being used. It then automatically turns this information into a full optimization strategy for fast success with Hadoop. Through an intuitive dashboard, customers get prioritization guidance on where to focus development efforts to achieve the biggest impact, centered on identifying duplication, exposing complexity, and leveraging compatibilities with ecosystem tools such as Impala and Apache Hive.
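To make the idea of workload analysis concrete, here is a minimal Python sketch of how duplicate queries and frequently accessed tables might be spotted in a SQL query log. It is an illustration only and not how Cloudera Navigator Optimizer works internally: the function names (`fingerprint`, `tables_used`, `summarize`) are hypothetical, and the regular expressions assume simple queries rather than full SQL parsing.

```python
import re
from collections import Counter


def fingerprint(sql):
    """Reduce a query to its shape by dropping literals, case, and extra spaces."""
    s = sql.strip().rstrip(";").lower()
    s = re.sub(r"'[^']*'", "?", s)          # replace string literals
    s = re.sub(r"\b\d+(\.\d+)?\b", "?", s)  # replace numeric literals
    s = re.sub(r"\s+", " ", s)              # collapse whitespace
    return s


def tables_used(sql):
    """Return table names that follow FROM or JOIN (simple queries only)."""
    return re.findall(r"\b(?:from|join)\s+([a-z_][\w.]*)", sql.lower())


def summarize(queries):
    """Count repeated query shapes and how often each table is referenced."""
    shapes = Counter(fingerprint(q) for q in queries)
    tables = Counter(t for q in queries for t in tables_used(q))
    return shapes, tables


if __name__ == "__main__":
    # Hypothetical sample log, invented for this example.
    log = [
        "SELECT * FROM orders WHERE id = 42;",
        "select *  from orders where id = 7",
        "SELECT name FROM customers c JOIN orders o ON c.id = o.cust_id "
        "WHERE c.region = 'EMEA'",
    ]
    shapes, tables = summarize(log)
    for shape, count in shapes.most_common():
        print(count, shape)
    for table, count in tables.most_common():
        print(count, table)
```

Queries that differ only in literal values collapse to the same shape, which gives a rough count of duplication, while the table counts hint at which data is accessed most. A real tool would need a proper SQL parser to handle subqueries, comments, and quoted identifiers.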

“Customers have thousands and thousands of SQL queries and that is what is driving their need for the product,” Singh said.

Cloudera Continues Support for Open Source Community

In terms of community news, Impala, a SQL-on-Hadoop database system for interactive workloads that was first released in 2012, is being donated by Cloudera to the Apache Software Foundation (ASF) for governance, along with Kudu, a columnar store for Hadoop that enables fast analytics on fast data. By donating its analytic database and columnar storage projects to the ASF, Cloudera hopes to accelerate the growth and diversity of the respective developer communities.

Furthermore, this year, Cloudera has focused on Impala and Kudu together to improve the speed of writes. And on the security side, Cloudera has started integrating RecordService with Impala’s I/O manager so that people can get classical database security from Impala. In addition, Kudu is addressing a longstanding request from customers to make storage updatable.

Improved Analytics in Cloudera 5.5

On the analytics side, said Singh, one of the biggest requests has been for support for JSON, which is the most popular form of nested data, and that is being added with this release. Impala now supports nested data types, including JSON, for expanded data discovery and business intelligence. More importantly, he noted, as customers give more analysts access to more data, security is becoming more critical, and this release introduces column-level security to enable fine-grained access controls.

And, with both Impala and Hive, there is better traditional management in Cloudera Manager, and features are being introduced for better archiving of data to support information lifecycle management. Cloudera Navigator enforces full data lifecycle workflows, including retention and archiving, so the right data is available. Additionally, building on the Cloudera Navigator Accelerator Program, a new Cloudera Navigator SDK opens up lineage and metadata capabilities to partner tools, for expanded visibility no matter what tools are used to integrate, wrangle, or analyze data.

In terms of processing engines, Cloudera’s philosophy has always been that people should be able to use different processing engines. In keeping with that philosophy, Cloudera is offering users two additions with this release: the Spark SQL engine, another SQL engine that is focused on Spark use cases, and MLlib, Spark’s machine learning (ML) library. Finally, there is the ability to use Spark on S3 storage so data can be queried using Spark.

Helping Customers Get on to Hadoop More Quickly

In total, with this release, said Singh, Cloudera is helping customers use the best SQL engine based on the use case, continuing its commitment to open source by making Impala and Kudu Apache Software Foundation-governed, and helping organizations get productive on Hadoop more rapidly, rather than having to figure out all the technology underpinning it, so they can “quickly jump on to the Hadoop train.”

For more information about Cloudera 5.5, go to the Cloudera Engineering Blog, and access the Release Notes.
