Cloudera Enables Fast Data Analytics with New Storage

Cloudera has announced a public beta of a new storage engine that enables faster analytics in Hadoop. Kudu, a new columnar store for Hadoop, makes it possible to run fast analytics on fast data. Complementing the existing Hadoop storage options, HDFS and Apache HBase, Kudu is a native Hadoop storage engine that supports both low-latency random access and high-throughput analytics, dramatically simplifying Hadoop architectures for increasingly common real-time use cases.

The public beta of Kudu is available under the Apache open source license, and Kudu will be transitioned to the Apache Software Foundation incubator in the future.

Cloudera has also launched a public beta release of RecordService, a new high-performance security layer for Apache Hadoop that centrally enforces role-based access control policies across the platform. Complementing Apache Sentry (incubating), which provides unified policy definition, RecordService delivers complete row- and column-based security, as well as dynamic data masking, for every Hadoop access engine.

With the rise of streaming data, there has been growing demand for combining low-latency random access with high-throughput analytics to build real-time analytic applications on changing data, leading developers to create complex architectures with the storage options available. Kudu complements the capabilities of HDFS and HBase, providing fast inserts and updates alongside efficient columnar scans. This combination enables real-time analytic workloads on a single storage layer, eliminating the need for complex architectures.

"We've been making Hadoop better since the very beginning," said Charles Zedlewski, vice president, products, Cloudera. "We have an ambitious mission: to constantly drive innovation within the community to usher in the next generation of analytics supported by Hadoop, so companies can adapt to the latest technologies. Cloudera has already transformed what's possible with Hadoop, enabling interactive data discovery and analytics with Impala and flexible data processing and streaming with Apache Spark. Kudu continues this trend by revolutionizing Hadoop's storage architecture to better support development of real-time analytic applications, and serves as a crucial step towards solidifying Hadoop as the leading platform for modern analytics."

Kudu's architecture streamlines the developer experience for building analytic applications, supporting common use cases that include time series analysis, machine data analytics, and online reporting. Additionally, Kudu is designed to take advantage of changing trends in hardware and in-memory processing. It delivers outstanding CPU performance, takes advantage of RAM and Flash, and drives high I/O efficiency as a true columnar store. Finally, as a native, open component within Hadoop, Kudu is integrated with, and provides faster query performance for, the most powerful analytic frameworks that users already rely upon, including Impala and Spark, for end-to-end analytic applications in a single platform.

Kudu was jointly engineered by Cloudera and Intel in anticipation of the changing hardware landscape. Intel has actively contributed to Kudu to help it take full advantage of current and future Intel processor and memory technologies. Kudu was designed to use new persistent memory (pmem) innovations being developed through Intel's pmem project.

To access a technical white paper on Kudu, go to