Dremio Enables BI Directly on Cloud Data Lakes

Dremio has introduced new capabilities that deliver sub-second query response times directly on cloud data lakes and support thousands of concurrent users and queries.

The latest Dremio product release enables companies to run production BI workloads, including interactive dashboards, directly on Amazon S3 and Azure Data Lake Storage (ADLS)—without having to move data into data warehouses, cubes, aggregation tables or extracts. According to Dremio, the new capabilities deliver simple, self-service access to data and enable analysts to see results immediately, eliminating their dependency on manual ETL processes or data engineering while reducing the costs associated with data warehousing.

In addition, Dremio now includes a built-in integration with Microsoft Power BI, enabling users to launch the data visualization software from Dremio and query data via a direct connection.

“The fact that organizations don't need to copy their data into a data warehouse for BI workloads has been unthinkable for the last 30 years,” said Tomer Shiran, Dremio co-founder and chief product officer. “Today, our users can leverage Dremio to power live dashboards and reports directly on S3 and ADLS, instead of waiting weeks to have data moved into a data warehouse. We’re removing limitations, accelerating time to insight and empowering data teams.”

Key new features of Dremio’s cloud data lake engine are designed to enable high-concurrency, low-latency SQL workloads, including BI dashboards, directly on the cloud data lake. These include:

  • Apache Arrow caching- Dremio can now cache data reflections (physically optimized representations of data) in the Apache Arrow format so the data can be loaded directly into memory with zero compute processing overhead. This eliminates the need to decode and decompress data at runtime, enabling sub-second query response times for BI dashboards. 
  • Scale-out query planning- Dremio supports horizontal scaling for coordinator nodes, in addition to executor nodes, allowing companies to run high-concurrency workloads consisting of thousands of simultaneous users and queries.
  • Runtime filtering- By automatically leveraging runtime intelligence from dimension tables, Dremio drastically reduces the amount of data that must be read from a fact table. This results in a performance speedup of more than 100x for star-schema workloads, which have traditionally been run only on data warehouses.
  • Enhanced Power BI integration- Microsoft and Dremio have partnered to develop a deeper integration between Power BI and Dremio that enables users to launch Power BI Desktop directly from the Dremio interface with the click of a button. Power BI automatically connects to Dremio using a native connector, so users can easily transition from building a dataset in Dremio to analyzing their data in Power BI.
  • External queries- Dremio enables users to incorporate explicit SQL queries on their relational databases within Dremio virtual datasets. This makes it easy to join data between large datasets in a cloud data lake and smaller datasets in existing relational databases.
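
The idea behind runtime filtering can be illustrated with a minimal sketch. It is a conceptual illustration only, not Dremio's implementation: the table contents and the names DIM_STORES, FACT_SALES, build_runtime_filter and scan_fact are hypothetical, and a real engine would more likely use the filter to skip whole files or row groups than to test individual rows. The sketch first collects the join keys that survive the predicate on the small dimension table, then uses that set to discard non-matching fact rows during the scan.

```python
# Hypothetical star schema: a small dimension table and a larger fact table.
DIM_STORES = [
    {"store_id": 1, "region": "west"},
    {"store_id": 2, "region": "east"},
    {"store_id": 3, "region": "west"},
]

FACT_SALES = [
    {"store_id": 1, "amount": 10.0},
    {"store_id": 2, "amount": 25.0},
    {"store_id": 3, "amount": 5.0},
    {"store_id": 2, "amount": 40.0},
    {"store_id": 1, "amount": 7.5},
]


def build_runtime_filter(dim_rows, key, predicate):
    """Collect the join keys of the dimension rows that satisfy the predicate."""
    return {row[key] for row in dim_rows if predicate(row)}


def scan_fact(fact_rows, key, runtime_filter):
    """Yield only the fact rows whose join key appears in the runtime filter."""
    for row in fact_rows:
        if row[key] in runtime_filter:
            yield row


def total_sales_for_region(region):
    """Return (total amount, fact rows kept) for one region."""
    # Step 1: evaluate the dimension-side predicate first, since that table is small.
    keys = build_runtime_filter(
        DIM_STORES, "store_id", lambda row: row["region"] == region
    )
    # Step 2: apply the resulting key set while scanning the fact table.
    rows = list(scan_fact(FACT_SALES, "store_id", keys))
    return sum(row["amount"] for row in rows), len(rows)


if __name__ == "__main__":
    total, kept = total_sales_for_region("west")
    print(f"west: total={total}, fact rows kept={kept} of {len(FACT_SALES)}")
```

In this toy data, only the fact rows for stores in the requested region pass the scan, so the aggregation touches a subset of the fact table. The saving grows with the selectivity of the dimension predicate.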

More information about deployment options is available here.