Hortonworks Data Platform 3.0 Expands Support for Hybrid Deployments, Containerization

At DataWorks Summit, its conference in San Jose this week, Hortonworks is unveiling a major new release of the Hortonworks Data Platform (HDP) as well as expanded partnerships with Microsoft, Google, and IBM.

A priority for Hortonworks with HDP 3.0 is to make hybrid cloud deployments easier for customers, said Scott Clinton, vice president of product marketing at Hortonworks, who spoke to DBTA before the company’s announcement.

As a result, Hortonworks is continuing to evolve its products and services to help organizations move data, workloads, and applications more efficiently between cloud and on-premises installations, and to provide the comprehensive metadata and security they need to comply with rapidly changing regulations, said Clinton. Importantly, he added, Hortonworks is also cloud-agnostic, supporting a variety of cloud platforms so that organizations, which increasingly rely on multiple cloud vendors, have choice and flexibility.

In particular, the 3.0 release of the company’s flagship platform delivers new enterprise features in support of the modern data architecture, including containerization for faster and easier deployment of applications and increased developer productivity, support for deep learning workloads, and real-time database query optimizations for fast time to insight.

Unlike other Hadoop-based distributions, Hortonworks emphasizes, many of the new enhancements in HDP 3.0 are based on Apache Hadoop 3.1.

Hortonworks was also recognized last week in the DBTA 100 2018 list of companies that matter in data.

Four Key Areas of Enhancement in HDP 3.0

  • HDP support for application deployment via containerization enables apps to be launched quickly, allowing users to save time and resources. The flexibility enabled by containerization is at this point highly valued in the market, but it has been traditionally used for web apps, said Clinton, who noted that the ability to use it for data applications is a notable point of differentiation.
  • The new release also provides support for deep learning applications, allowing customers to run workloads such as machine learning and deep learning that require substantial, and expensive, GPU resources. This feature leverages pooling and isolation, which enable data scientists to expand GPU access.
  • Supporting the real-time database, the new release delivers query optimization to process more data at a faster rate by closing the performance gap between low-latency and high-throughput workloads. Enabled via Apache Hive 3.0, HDP 3.0 offers a unified SQL solution that can perform interactive queries at scale, regardless of whether the data lives on-premises or in the cloud.
  • With growing pressures stemming from new regulatory requirements such as compliance with GDPR, HDP 3.0 also puts a focus on enhancing security and governance, promoting greater regulatory compliance through full chain of custody of data as well as fine-grained auditing of events, said Clinton. These new features offer the ability to track the lineage of data from its origin to the data lake. They also enable auditors to view data without making changes, apply time-based policies, and audit events around third parties with encryption protection.

HDP supports Spark, enabling organizations to leverage its processing power for workloads when needed. Memory- and CPU-intensive Spark-based applications can coexist with other workloads deployed in a YARN-enabled cluster.

The HDP platform also includes engineered support for all of the major cloud object stores: Amazon S3 with support for native EDW, Windows Azure Storage Blob (WASB), and Google Cloud Storage (GCS). This includes enhancements across the platform that deliver a consistency layer for non-consistent cloud stores. Customers also benefit from shared services for enterprise security, data governance, and operations across public clouds, and from automatic cluster scaling based on usage or time metrics for added efficiency.

To illustrate HDP's ability to handle diverse customer workloads, Hortonworks has showcased Geisinger Health System, a large health service organization that turned to HDP to consolidate structured and unstructured data, and TMW Systems, which develops enterprise management software for the transportation services industry and is using Hortonworks to develop BI and big data tools that enable critical insights for transportation application users.

Hortonworks Data Platform 3.0 is expected to be generally available in Q3 2018.