Dremio 2.0 Puts the Data Lake to Work

Dremio has announced Dremio 2.0, its first release since the company launched in the data analytics market in July 2017 with the Dremio Self-Service Data Platform.

A central theme of the Dremio 2.0 release is embracing the data lake and helping companies run more of their data science and BI workloads directly against it, without moving the data. This focus is exemplified by four key advancements in the new release, said Kelly Stirman, VP, strategy, and CMO, Dremio. These are the new Starflake Data Reflections capabilities, a new learning engine, support for Looker, and greater controls on data access in support of regulatory compliance, he noted.

Dremio’s open source, self-service data platform addresses data analysis issues that challenge modern organizations. These include acceleration of queries on massive datasets for interactive analytics with BI and data science tools; the complexity of connecting data from disparate sources like data lakes, NoSQL databases, and relational databases; and long lead times for data engineering tasks. Dremio accelerates time-to-insight by helping analysts and data scientists to be independent and self-directed in their use of data across the enterprise, while preserving governance and security.

New “Starflake” Data Reflections

Dremio maintains physically optimized representations of source data known as Data Reflections. Dremio’s query optimizer can accelerate a query by 100x or even 1000x over other technologies by utilizing one or more Data Reflections to partially or entirely satisfy that query, rather than processing the raw data in the underlying data source. Unlike traditional cubes, extracts, and data marts, Data Reflections leverage Apache Arrow for performance gains and are invisible to end users.
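The general idea of answering a query from a precomputed representation instead of the raw data can be illustrated with a small sketch. This is not Dremio's implementation or API. The names `RAW_SALES`, `build_aggregation`, and `answer_sum` are hypothetical, and the matching rule (same measure, grouping columns covered) is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical raw data standing in for the underlying source system.
RAW_SALES = [
    {"region": "east", "product": "a", "amount": 10},
    {"region": "east", "product": "b", "amount": 5},
    {"region": "west", "product": "a", "amount": 7},
    {"region": "west", "product": "a", "amount": 3},
]


def build_aggregation(rows, dimensions, measure):
    """Precompute SUM(measure) grouped by the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return {
        "dimensions": tuple(dimensions),
        "measure": measure,
        "totals": dict(totals),
    }


def answer_sum(rows, materializations, group_by, measure):
    """Answer SUM(measure) GROUP BY group_by, preferring precomputed results."""
    for m in materializations:
        # Assumed matching rule: same measure, and every requested grouping
        # column is one of the precomputed dimensions.
        if m["measure"] == measure and set(group_by) <= set(m["dimensions"]):
            idx = [m["dimensions"].index(col) for col in group_by]
            result = defaultdict(int)
            for key, total in m["totals"].items():
                # Roll the finer-grained totals up to the requested grouping.
                result[tuple(key[i] for i in idx)] += total
            return dict(result), "materialization"
    # No suitable precomputed result: fall back to scanning the raw rows.
    return build_aggregation(rows, group_by, measure)["totals"], "raw"


by_region_product = build_aggregation(RAW_SALES, ["region", "product"], "amount")
totals, source = answer_sum(RAW_SALES, [by_region_product], ["region"], "amount")
print(source, totals)
```

In this sketch the query grouped by region never touches `RAW_SALES`, because the totals by region and product already contain everything it needs. The user issuing the query does not have to know which path was taken.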

With this release, Dremio is adding a specialized Data Reflection for star and snowflake schemas. Using the new “Starflake” Data Reflections, Dremio can automatically detect a star schema or snowflake schema in data stores such as a data lake in Hadoop or Amazon S3, and build Reflections that give users interactive speed on their data regardless of the data size. This avoids the need to load that data into a specialized data warehouse. For example, said Stirman, instead of moving data from S3 to Redshift on AWS, a company can leave its data in S3, which is more cost-effective and easier to operate, with Dremio creating Starflake Reflections on top to provide fast access to that data for tools such as Tableau, Power BI, or Python.

Dremio Learning Engine for Ease of Use

The new release also introduces the Dremio Learning Engine, which uses artificial intelligence to make Dremio smarter and easier to use. Using AI, Dremio can recommend complementary datasets to users as they curate data for analysis. Dremio also learns how datasets can be joined based on observed behavior across all workloads, with support for all join types. In addition, Dremio observes data during query execution to detect schema changes in source systems and then adapts its Data Catalog automatically. This is essential for modern sources such as Elasticsearch, MongoDB, and JSON, where schema can vary from record to record, and for systems with evolving schemas.
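To make the schema-learning idea concrete, here is a minimal sketch of a catalog that widens as records with new fields or new value types are observed. It only illustrates the concept and is not how Dremio's Learning Engine works internally. The `observe` function and the catalog layout (field name mapped to a set of type names) are hypothetical.

```python
def observe(catalog, record):
    """Merge one record's fields into the catalog.

    Returns the fields whose entry changed, either because the field is new
    or because it appeared with a value type not seen before.
    """
    changed = []
    for field, value in record.items():
        seen_types = catalog.setdefault(field, set())
        type_name = type(value).__name__
        if type_name not in seen_types:
            seen_types.add(type_name)
            changed.append(field)
    return changed


catalog = {}
# Hypothetical JSON-like records whose shape varies from record to record.
for record in [
    {"id": 1, "name": "alpha"},
    {"id": "x7", "tags": ["new"]},
    {"id": 2},
]:
    changes = observe(catalog, record)
    if changes:
        print("schema updated for fields:", changes)
```

The second record changes the catalog twice: `id` now has two observed types and `tags` is a new field. The third record matches what is already known and triggers no update.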

Integration with Looker

This release also adds support for the Looker BI tool, allowing Looker users to access new data sources, said Stirman. Looker users can now take advantage of Dremio’s data acceleration capabilities for relational databases, for NoSQL databases such as MongoDB and Elasticsearch, and for data lakes built on Hadoop, Amazon S3, and Azure Data Lake Store. Dremio can query across multiple data stores without consolidating the data, enabling organizations to save time and resources.

Fine-Grained Access Controls in Support of GDPR

With customers newly concerned about GDPR and other regulations governing data collection and use, the new release of Dremio introduces fine-grained access controls that can layer on top of the existing technologies organizations use to limit users’ data access across their infrastructure, said Stirman. Dremio now integrates with centralized security controls such as LDAP and Kerberos. Users can programmatically control access at both the row and column level to dynamically secure, mask, and transform data for end-user access.
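Row- and column-level control of this kind can be pictured with a short sketch. It is a generic illustration and does not use Dremio's syntax or API. The `apply_policy` and `mask` functions, the policy layout, and the sample data are all hypothetical, and columns not listed in the policy are assumed to be denied.

```python
def mask(value):
    """Hide all but the last four characters of a value."""
    text = str(value)
    if len(text) <= 4:
        return "*" * len(text)
    return "*" * (len(text) - 4) + text[-4:]


def apply_policy(rows, policy, user):
    """Return only the rows and columns this user may see."""
    visible = []
    for row in rows:
        # Row-level control: skip rows the user is not entitled to.
        if not policy["row_filter"](row, user):
            continue
        out = {}
        for column, value in row.items():
            # Column-level control: unlisted columns are denied by default.
            rule = policy["columns"].get(column, "deny")
            if rule == "allow":
                out[column] = value
            elif rule == "mask":
                out[column] = mask(value)
        visible.append(out)
    return visible


# Hypothetical data and policy.
customers = [
    {"region": "eu", "name": "Ana", "ssn": "123-45-6789", "salary": 50},
    {"region": "us", "name": "Bob", "ssn": "987-65-4321", "salary": 60},
]
policy = {
    "row_filter": lambda row, user: row["region"] == user["region"],
    "columns": {"region": "allow", "name": "allow", "ssn": "mask"},
}
print(apply_policy(customers, policy, {"region": "eu"}))
```

Here an analyst assigned to the EU region sees only the EU row, with the identifier masked and the salary column omitted entirely.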

For more information, visit