Making Data Accessible: Q&A with Dremio's Kelly Stirman


For many companies, designing and implementing a data platform for analytics is a critical task. With data growing internally at a rapid pace, plus the challenges of mergers and acquisitions adding new systems and data silos, accessing data for exploration and insight can be a problem.

Recently, Kelly Stirman, vice president of strategy at Dremio, a VC-backed firm that emerged from stealth in 2017, discussed how companies can use open source projects, open standards, and cloud services to deliver data as a service to data consumers across critical lines of business. By combining capabilities and technologies into a solution that enables access, transformation, security, and governance, Stirman contends, data as a service represents a new approach to vexing analytics challenges, delivering data at scale with high performance.

Where is data as a service most useful?

Kelly Stirman: It is targeted at data consumers, people who depend on data to do their jobs effectively. These are analysts, data scientists, and users of BI tools, and, if you think about it, that is a lot of people. Most days, if you have a question about the world around you, whether you want to find a restaurant or know what the weather is going to be over the next few days, you can get an answer instantly. But when you are at work and you have basic questions, there is no simple way to answer them, and it can take weeks or months to get an answer. It is completely different from your experience in your personal life.

What does it enable?

KS: Data as a service is an underlying movement to change people’s relationship to data and enable data consumers to get what they need just as easily as they can in their personal lives. They are not beholden to IT and they are not waiting for their turn to get IT to do something on their behalf.

What type of organization is it best suited for?

KS: I think it applies to every company, no matter how big or small. It would be hard to find any company that would say, "Oh, we have plenty of folks in IT; IT is very responsive." Everyone is short-staffed and waiting on IT to help them get their jobs done. One of the big trends over the past decade has been lines of business coming to own the bulk of the budget, because companies have realized that if IT owns the budget, things just never get done. So, data as a service applies to virtually all companies. Even for small companies, as they grow, data proliferates across different systems and different fiefdoms in the business.

Is it necessary to move data?

KS: The answer to the problem of having data in different systems has been to try to copy it all into a new system. But then you create a new silo, and you never finish consolidating all the data into this new system; that is the history of data warehouses, data lakes, and even the cloud. You are never going to have all of your data in one place. Whether your data is in the data warehouse, the data lake, an ERP system, a supply chain system, or a cloud service should not matter. You should be able to access and work with the data wherever it is, no matter what the underlying technology is and no matter how big it is; data as a service needs to work with data the way it already is.

What level of skills or expertise are required?

KS: The skills required are on the order of Office 365 or Google Docs. Most people who use data to do their jobs have some sort of tool, such as Tableau or Power BI or, if you are a data scientist, something like Python or Jupyter Notebook. Those are tools that people like and are good at using, and that is ultimately how they take data and visualize it or plug it into a predictive model or recommendation system. That part of the equation is not broken. The really hard problem is getting the data into those tools and making it accessible: searching for and finding datasets, and blending or curating data from different sources to get the data you need for your analysis. That is what Dremio is focused on.

How is it accessed?

KS: The experience of using Dremio is logging in through a browser, doing a search, finding different datasets, clicking around, and sharing data with other people. There is no coding involved; if you can use something like Office 365 or Google Docs, you can use Dremio. Once you find or build the dataset you need, you click a button to launch your favorite tool, and it connects to that dataset.

Data as a service sounds similar to data virtualization.

KS: There are some things that feel and sound similar, but data virtualization was never a tool for the data consumer. First, data virtualization always required programming, complex APIs, and low-level machinery designed for an IT user. Data as a service, in contrast, is focused on the data consumer. It is about removing the complexity: streamlining access, making work collaborative, and using existing tools. The second big difference is that data virtualization never really solved a problem a lot of companies have today, which is performance. It was designed for smaller datasets. Part of the underlying challenge for any company dealing with data is making it perform at a speed that lets people do their jobs.

Why is open source critical to data as a service?

KS: It is a reflection of modern IT and data infrastructure that newer projects are predominantly based on open source technology. In our experience, companies expect mission-critical infrastructure to be open source. That allows us to build on the efforts of hundreds of companies collaborating on these core building blocks, and to bring a more robust product to market more quickly.

There is some very key open source technology in the Dremio stack, the most important of which is Apache Arrow, a project we helped start 3 years ago that is now a standard for how people do analytics. It has become the cornerstone for how newer systems are built.

Apache Parquet is another. If you look back over the past 10 years or so, as companies have embraced data lakes, Parquet has overwhelmingly been the standard they use to store their data for analytics. We build on that in our own product, so if you already have your data in Parquet, Dremio is the fastest way to access that data.
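To make the relationship between the two projects concrete, here is a minimal sketch, not from the interview, that uses pyarrow, Arrow's Python library, to read a Parquet file into an in-memory Arrow table; the file name and the subsequent conversion to pandas are illustrative assumptions.

```python
import pyarrow.parquet as pq

# Read a Parquet file (hypothetical name) into an Arrow Table,
# Arrow's columnar, in-memory representation.
table = pq.read_table("events.parquet")

print(table.schema)    # column names and types stored in the file
print(table.num_rows)  # row count, available without materializing rows

# Arrow interoperates with the wider analytics ecosystem, e.g.,
# converting to a pandas DataFrame for exploration in a notebook.
df = table.to_pandas()
```

The pairing reflects the division of labor described above: Parquet is the on-disk storage format in the data lake, while Arrow is the in-memory format that analytics engines and tools exchange.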

What else?

KS: There are a number of other open source projects that we build on in our product, but Dremio itself is also open source, because we want it to be something everyone can access, whether you are a small startup or a Fortune 10 company. We monetize Dremio by selling subscriptions that include an enterprise version with advanced security and maintenance features beyond what is available in open source. The open source version of Dremio is in use by thousands of companies in more than 100 countries.

Where does privacy and security fit into the platform?

KS: Every company is focused on security, and if they are not, they should be. However, one of the fundamental causes of security vulnerability is the friction created by IT processes and legacy systems. People are going to find a way to get their jobs done. If you make the process of accessing data too hard, too cumbersome, or too slow, they will take matters into their own hands: moving data into Dropbox and spreadsheets, emailing information to other people, and moving out of any kind of controlled environment. One of the philosophies of data as a service is that you attract bees with honey. We integrate with existing security standards. Data as a service keeps data access in a controlled, protected environment by letting people be more productive and removing the temptation to take matters into their own hands.

Looking to the future, what are the challenges you see on the horizon?

KS: There is an interesting transition coming up as organizations move part, or all, of their infrastructure to the cloud. Whether you are 100% in the cloud or you have a mix of things in your data center and in the cloud, security is one of the challenges, but data access in general is an issue, because the services that cloud providers make available to their customers are different from what people are using in their own data centers. That should not matter, and it should not matter what the underlying technology is. Data consumers should not have to care that over the past 3 months their company has moved pieces of its data center to the cloud. As companies make the critical transition to the cloud, data as a service lets them do it in a way that is much more seamless for their data consumers.

This interview has been edited and condensed.


