MapR Advances Support for Flexible and High Performance Analytics on JSON and S3 Data with Apache Drill 1.15

Jan 30, 2019

MapR Technologies, provider of a data platform for AI and analytics, has announced support for Apache Drill 1.15. The new release adds enhancements for querying complex nested data structures, including files, MapR JSON database tables, and cloud data sources, specifically S3 (Amazon Simple Storage Service).

Apache Drill is an open source distributed SQL query engine integrated into the MapR Converged Data Platform that enables self-service BI SQL analytics at scale. Drill’s distributed shared-nothing architecture enables incremental scale-out with low-cost hardware to meet the increasing demands of query response and user concurrency.

“The latest Drill release is aimed at further improving intuitive access to different data types across on-premises and cloud data sources as well as enhancing performance and usability,” said Neeraja Rentachintala, vice president of product management, MapR. “We evolved Drill by closely listening to our customers, and it is exciting to see our customers achieve true self service data exploration without compromising on analytic flexibility and performance.”

Drill 1.15 expands ANSI SQL compliance and improves query performance for both Parquet and MapR-DB JSON tables. With the new release, it is easier to deploy Drill in multi-tenant environments alongside other analytic frameworks such as Hive and Spark while achieving predictable SLAs, so that interactive analytics can be conducted at any scale.

As a result of the S3 plug-in support, customers can now access data in S3 through Drill and join it with other supported data sources such as Parquet, Hive, and JSON, all through a single query. The spill-to-disk capability for memory-intensive queries has also been expanded to include all SQL operations that rely on memory, such as GROUP BY, JOIN, ORDER BY, and DISTINCT. Memory controls can now be put in place so that large memory-intensive queries that pass a defined threshold spill to disk.
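As a rough illustration of what "a single query" across sources might look like, the sketch below builds a Drill SQL statement that joins a Parquet file in S3 with a Hive table and wraps it in the JSON body that Drill's REST query endpoint accepts. The storage plugin names (`s3`, `hive`), the file path, the table, and the column names are all hypothetical placeholders; actual names depend on how the storage plugins are configured in a given deployment. Nothing is sent over the network.

```python
import json


def build_cross_source_query(s3_path: str, hive_table: str, join_key: str) -> str:
    """Build a Drill SQL string joining a file in S3 with a Hive table.

    Assumes storage plugins registered under the names `s3` and `hive`;
    these names and the selected columns are illustrative only.
    """
    return (
        "SELECT o.`order_id`, c.`name` "
        f"FROM s3.`{s3_path}` AS o "
        f"JOIN hive.`{hive_table}` AS c "
        f"ON o.`{join_key}` = c.`{join_key}`"
    )


def build_rest_payload(sql: str) -> str:
    """Wrap a SQL string in the JSON body used by Drill's REST query API."""
    return json.dumps({"queryType": "SQL", "query": sql})


# Hypothetical path, table, and key, used only to show the shape of the query.
query = build_cross_source_query("sales/orders.parquet", "customers", "customer_id")
payload = build_rest_payload(query)
print(payload)
```

The resulting payload could be posted to a Drillbit's `/query.json` endpoint or the SQL pasted into a Drill shell; either way, the join across S3 and Hive is expressed in one statement rather than in separate per-source queries.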

Customers can now also spin up multiple Drill clusters within a single shared MapR cluster to support multi-tenancy, segregate workloads by user persona, and set CPU resource limits through cgroups. This allows isolated Drill compute workloads with guaranteed minimum resources.

The new release also allows users to leverage MapR Document Database Secondary Indexes for Complex Types and provides deeper integration with Apache Parquet, an open source column-oriented data storage format.

More information about the new capabilities is available from MapR Technologies.