Ahana Moves Presto Forward


Launched in June during the COVID-19 crisis, Ahana is a Presto-based analytics company founded by Steven Mih (formerly Couchbase, Aviatrix, Alluxio) and Dipti Borkar (formerly IBM, Couchbase, Alluxio). Bob Wiederhold, former Couchbase CEO, is also an Ahana advisor.

The company, backed by GV, provides a federated SQL engine for data engineers and analysts to run interactive, ad hoc analytics on large amounts of data.

Since emerging from stealth mode this summer, Ahana has released the open source PrestoDB Amazon Machine Image (AMI) on the AWS Marketplace as well as a PrestoDB container on DockerHub. The free PrestoDB offerings are intended to make it easier for data platform teams to get started with Presto in the cloud, particularly for interactive, ad hoc analytics on S3 data lakes and other popular data sources such as AWS RDS, Redshift, and Amazon’s Elasticsearch service.

Recently, Mih, who is CEO, and Borkar, who is chief product officer, discussed the analytics challenges data professionals are facing now and their involvement with the Presto Foundation.

How did you become involved with Presto?
Steven Mih: Dipti and I have been following Presto and involved with it since 2018. We saw a big shift toward ad hoc analytics, as well as toward doing analytics with a federated architecture. Those were two key parts of what we identified. However, we actually found that Presto was quite confusing because in 2019 it had a fork that became PrestoDB, and then there was another fork called PrestoSQL. We were fairly confused about which one to use.

Which did you go with?
SM: We tried to work with both of them. But then Facebook donated PrestoDB to the Linux Foundation and created a new foundation called Presto Foundation. Given our experience and everything we saw, we joined right away. We became part of the Presto Foundation early on, and we have rejoined as Ahana.

Why did Facebook donate it?
SM: Facebook donated it so that it could become a vendor-neutral open source project with neutral governance, transparency, and multiple vendors that can participate. However, Presto remained very complicated, and our focus is to make it much easier for the community to use PrestoDB. That is how we started the company. In addition to Ahana, Uber is another big user of PrestoDB and a founding member of the foundation, along with Twitter, Facebook, and others.

What are you providing?
SM: In addition to the PrestoDB AMI on the AWS Marketplace and the PrestoDB container on DockerHub that we recently announced, we're also providing commercial support for people to use PrestoDB. That is another area in which some users may need operational help from people who are involved with the project.

How is analytics changing?
Dipti Borkar: We have seen that analytics has changed from more batch analytics using Hive and Spark to more interactive, ad hoc analytics. By that we mean as needed, when needed, and in an interactive fashion with low latency for queries. We're talking about seconds rather than minutes and hours. That is how we define ad hoc, interactive querying.

Is this new?
DB: We are seeing a major shift to this kind of approach. The role of the analyst is changing from a more traditional data analyst who focuses only on enterprise data to what we call a "data hacker analyst," which is an evolving persona focused more on connecting all the different data sources that may be located in different systems, different databases, and different data lakes. Gartner calls it the "citizen data scientist." For this new persona, the requirement that we're seeing is the ability to do interactive, ad hoc querying. And, obviously, to be able to do that, the technology needs to be sophisticated, flexible, and low-latency. That is what we're seeing in the market.

Is part of the approach also to alleviate the pressure on IT departments?
DB: Absolutely. Data is distributed in many different databases and data lakes, and the platform teams within the company are typically responsible for making the data accessible through systems like BI tools, dashboards, notebooks, and various other approaches, all of which speak SQL. And so the platform team needs to copy this data and massage it or ETL it, make other connections, and move it around to make it accessible. We hear that 60% of data engineers' time is spent in ETL processes rather than more useful tasks. With Presto, which is a federated query engine, you don't need to move the data. You can query data in place, which means that you can push down the analysis to the data source, get access to the data that matches, and then perform the rest of the analysis in memory within Presto.
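To make the "query in place" idea concrete, here is a minimal sketch of the pattern Borkar describes: filter at the source, return only the matching rows, and finish the work in memory. It is a toy model and not Presto's implementation or API; `ToySource`, `federated_join`, and the sample rows are all hypothetical names and data invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class ToySource:
    """Hypothetical stand-in for a remote data source (not a Presto connector)."""
    name: str
    rows: List[Row]

    def scan(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # The filter is evaluated "at the source", so only matching rows
        # are handed back to the caller.
        return [row for row in self.rows if predicate(row)]


def federated_join(orders: ToySource, users: ToySource, min_total: float) -> List[Row]:
    # Step 1: push the filter down to the orders source.
    big_orders = orders.scan(lambda r: r["total"] >= min_total)

    # Step 2: ask the users source only for the ids that are still needed.
    wanted_ids = {o["user_id"] for o in big_orders}
    matching_users = users.scan(lambda r: r["id"] in wanted_ids)

    # Step 3: finish the analysis in memory with a simple hash join.
    users_by_id = {u["id"]: u for u in matching_users}
    return [
        {"order_id": o["order_id"], "total": o["total"],
         "user": users_by_id[o["user_id"]]["name"]}
        for o in big_orders
        if o["user_id"] in users_by_id
    ]


# Assumed sample data: one source imitates a data lake table, the other
# imitates a relational table. Neither is copied anywhere before querying.
orders = ToySource("lake_orders", [
    {"order_id": 1, "user_id": 10, "total": 250.0},
    {"order_id": 2, "user_id": 11, "total": 40.0},
    {"order_id": 3, "user_id": 12, "total": 120.0},
])
users = ToySource("rdbms_users", [
    {"id": 10, "name": "Ana"},
    {"id": 11, "name": "Ben"},
    {"id": 12, "name": "Cy"},
])

result = federated_join(orders, users, min_total=100.0)
```

In this sketch, `result` holds one joined row per order at or above the threshold, and neither source had to be loaded into a separate store first.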

What does this add to analytics?
SM: Instead of analysts dealing with smaller datasets that are backward-looking and that are more about reporting, they can now be innovative. They can now get access to lots of datasets. Without having to go through a lot of ETL pipelines, the data platform teams can just connect to those datasets with Presto and then provide access, and this means that these analysts can now "hack on" data. Think about the kind of innovation that a company such as Uber can provide to make data-driven offers, which is the kind of innovation every company wants to be able to have. That takes a system like Presto, which gives interactive types of performance on a wide set of data sources. We firmly believe that data is distributed. It is not going to all be ingested in one big place.

What are the data stores that Ahana can use to query?
DB: Given the big shift toward data lakes, one of the prime resources that Presto can query is object storage—whether it's AWS S3 or other object stores, including file systems like HDFS, which was born out of the Hadoop ecosystem. In addition to that, then you have the structured relational databases such as MySQL or PostgreSQL. And then you have the NoSQL databases like MongoDB, the JSON-based systems, and you have Elasticsearch, which is a search kind of system that also stores data. You can see that there are many flavors of different data systems that are supported—object stores, file systems, structured, NoSQL, RDBMSs, data warehouses, and the newer emerging streaming column stores, or high-performance column stores, as well, like Druid and Pinot.

Why is it important to have a single approach?
DB: Given that we're moving toward a disaggregated stack as opposed to a fully integrated database or data warehouse (which has been the norm and is still useful for certain use cases), more users want to have a single approach, an abstraction on top of the sources, that allows them to query the sources using one mechanism and one query engine, and that is Presto.

Can you explain Ahana's approach to PrestoDB and what you're adding?
SM: We believe PrestoDB is a pretty widely used system. It's only going to get more widely used, but not every company has the type of data platform engineering teams that Uber and Facebook have. For those companies that want to get help, we provide commercial support for the PrestoDB open source project. So that is the model that we've announced.

What have you provided?
DB: We recently added a PrestoDB Sandbox AMI on the AWS Marketplace. We believe that it's very important for users to get started, to have an environment that they can very easily have up and running. Our focus is cloud, and within that, we focus on making it easier for users by providing pre-integrated, pre-configured environments, such as AMIs and containers.

What are the next steps for Ahana?
DB: We believe that Presto is a good foundation to build on top of, but, in terms of some of the performance and other characteristics, there is room for improvement. We have seen in 50 years of databases and relational data systems that they have evolved and there's a lot to learn from them—particularly with the query optimizer and the query execution. We hope to make significant improvements to the core of Presto over time, and make it available to users and obviously support those improvements as well.

Interview conducted and edited by Joyce Wells.



