Why It’s Time for TSDBs: Q&A with InfluxData’s Evan Kaplan

InfluxData provides an open source platform built for metrics, events, and other time-based data. Recently, Evan Kaplan, CEO of InfluxData, reflected on the future of databases and on why time-series databases represent the next wave in managing data from humans, sensors, and machines.

What is a time-series database?
A time-series database, or TSDB, is built specifically for handling metrics, events, and other measurements that are time-stamped. A TSDB is optimized for measuring change over time. If you know that the data you are collecting is best visualized and understood in the context of time, then you should use a purpose-built TSDB to store and process that data. Time-series databases are not new: they have long been used in finance, industrial process control, and scientific research. What is relatively new is the increasing use of time-series data in more traditional IT environments, and this calls for a more “modern” approach.
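Part of what "optimized for measuring change over time" can mean is storage that exploits the fact that time-stamped data tends to arrive in order and at roughly regular intervals. As a minimal sketch of one general technique, delta-of-delta encoding of timestamps, the Python below stores the first timestamp, the first interval, and then only how much each interval differs from the previous one. This illustrates the idea under that assumption; it is not a description of InfluxDB's actual storage engine, and the function names are hypothetical.

```python
from typing import List


def delta_of_delta_encode(timestamps: List[int]) -> List[int]:
    """Encode sorted timestamps as [first, first_delta, dod1, dod2, ...]."""
    if not timestamps:
        return []
    if len(timestamps) == 1:
        return [timestamps[0]]
    prev_delta = timestamps[1] - timestamps[0]
    encoded = [timestamps[0], prev_delta]
    for prev, cur in zip(timestamps[1:], timestamps[2:]):
        delta = cur - prev
        encoded.append(delta - prev_delta)  # change in interval, often 0
        prev_delta = delta
    return encoded


def delta_of_delta_decode(encoded: List[int]) -> List[int]:
    """Invert delta_of_delta_encode by re-accumulating the intervals."""
    if not encoded:
        return []
    timestamps = [encoded[0]]
    if len(encoded) == 1:
        return timestamps
    delta = encoded[1]
    timestamps.append(timestamps[0] + delta)
    for dod in encoded[2:]:
        delta += dod
        timestamps.append(timestamps[-1] + delta)
    return timestamps


# Samples every 10 seconds, with one arriving a second late.
ts = [1000, 1010, 1020, 1031, 1041, 1051]
enc = delta_of_delta_encode(ts)
print(enc)  # [1000, 10, 0, 1, -1, 0]
assert delta_of_delta_decode(enc) == ts
```

Regular sampling turns most entries into small numbers or zeros, which a later bit-packing or run-length stage (omitted here) could store in very few bits.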

What do you mean by modern?
The fundamental conditions of computing have changed dramatically over the last decade. Everything has become compartmentalized. Monolithic mainframes have vanished, replaced by servers, containers, microservices, and serverless functions. Today, everything that can be a component is a component. Hence the need for a database that is architected for this new componentized world, built on an open source core for maximum developer productivity, and optimized to deliver real-time results with fast business time-to-value.

What is the sweet spot for the technology?
There really are two key sweet spots, or markets, for “modern” TSDBs: IoT and IT systems. Both are best understood in the context of monitoring and control systems. IoT can be thought of as using sensors to monitor and control systems in the “physical world.” That can mean monitoring anything from machines to buildings, devices, people, or even electrons. Data from anything a physical sensor can be attached to is best stored and managed in a TSDB. As the IoT market takes off, TSDBs are poised to grow in lockstep.

And IT systems?
This is best thought of as monitoring and controlling the “virtual,” or software-driven, world. IT systems are instrumented to continuously generate events and metrics that indicate their state. The entire software and hardware ecosystem, from applications and hardware to VMs, containers, microservices, and networks, is monitored as a function of change in state over time. As “software eats the world,” the monitoring and control load is growing exponentially. In step with this, we are seeing more and more enterprises, SaaS and cloud service providers, and network providers turn away from general-purpose NoSQL or relational databases to purpose-built TSDBs like InfluxDB.
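Many IT metrics are exposed as ever-increasing counters (total requests served, bytes sent), so monitoring "change in state over time" often means turning a counter into a rate. The Python sketch below does this and skips any interval where the counter drops, on the assumption that a drop signals a reset such as a process restart. It is similar in spirit to InfluxQL's `non_negative_derivative()` function but does not reproduce its exact semantics; the function name and sample data are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (unix_seconds, counter_value)


def non_negative_rate(points: List[Point]) -> List[Point]:
    """Convert a monotonically increasing counter into per-second rates.

    Points must be ordered by time. A decrease between samples is treated
    as a counter reset, and that interval is skipped rather than reported
    as a negative rate.
    """
    rates: List[Point] = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        elapsed = t1 - t0
        if elapsed <= 0 or v1 < v0:
            continue  # duplicate/out-of-order timestamp or counter reset
        rates.append((t1, (v1 - v0) / elapsed))
    return rates


# A request counter sampled every 10 s; the process restarts before t=30.
requests_total = [(0, 100), (10, 150), (20, 250), (30, 20), (40, 80)]
print(non_negative_rate(requests_total))  # [(10, 5.0), (20, 10.0), (40, 6.0)]
```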

Are there additional use cases for time-series databases beyond this area?
Yes. Broadly speaking, TSDBs are ideal for real-time applications such as autonomous vehicles, health monitoring, and process control systems. They are a strong fit for any use case where vast amounts of time-series data are used to determine appropriate actions in real time.

What does it allow that is not possible with other database technologies?
Relational technologies like MySQL and Oracle do a great job of keeping references to other related data. But relational technologies do a terrible job with problems like search, and systems such as Solr and Splunk were created to solve the search problem. Both relational and search-oriented databases do a terrible job with time-series data. They are simply not designed for the intricacies of time, such as compressing time-stamped data for better resource utilization; handling the millions of writes per second required to process sensor data; answering time-dependent queries (for example, has this sensor’s reading broken its 14-day moving average more than twice this week?); or downsampling the time precision of data as it ages. Quite frankly, it boils down to using the right tool for the job. If change over time is a critical vector in the business, a modern time-series database is required.
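The moving-average question can be made concrete. The Python sketch below assumes one pre-aggregated reading per day, interprets "broken the 14-day moving average" as a day whose reading exceeds the mean of the 14 days before it, and counts such days among the most recent 7. Both that interpretation and the function name are assumptions for illustration, not how InfluxDB evaluates such a query.

```python
from collections import deque
from typing import List


def count_breaches(daily: List[float], window: int = 14, recent: int = 7) -> int:
    """Count days in the last `recent` days whose reading exceeded the mean
    of the `window` days immediately before it.

    Assumes one (already aggregated) reading per day, oldest first.
    """
    if len(daily) < window + recent:
        raise ValueError("need at least window + recent days of data")
    trailing = deque(daily[:window], maxlen=window)
    running_sum = sum(trailing)
    breaches = 0
    for i in range(window, len(daily)):
        value = daily[i]
        in_recent_period = i >= len(daily) - recent
        if in_recent_period and value > running_sum / window:
            breaches += 1
        # Slide the window forward one day: drop the oldest, add today.
        running_sum += value - trailing[0]
        trailing.append(value)  # maxlen evicts the oldest element
    return breaches


# Two weeks at 10.0, then this week's readings.
readings = [10.0] * 14 + [9, 12, 10, 15, 8, 10, 11]
print(count_breaches(readings) > 2)  # True (3 breaches this week)
```

Keeping a running sum makes each step constant-time instead of re-averaging all 14 days, which matters when the same check runs continuously across many sensors.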

What are the requirements for big data analytics and IoT that make it particularly useful?
First, for IoT, time series is the lingua franca: sensor data is time-series data, so you are more efficient working on a time-series platform. Second, for analytics in general, certain kinds of analysis look for real-time patterns and changes and learn from those changes, and that work is easier in a time-series database.

Is it considered a NoSQL technology?
Generally, yes. It is considered a NoSQL technology because the underlying data store isn’t relational. However, for InfluxDB we have provided a SQL-like query language, InfluxQL, that lets people who are familiar with SQL be productive and derive value in a very short period of time.
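To give a flavor of the SQL-like style, an InfluxQL query such as `SELECT mean("usage") FROM "cpu" WHERE time > now() - 1h GROUP BY time(10m)` averages raw points into fixed time windows (the measurement and field names here are hypothetical). The Python sketch below shows roughly what such a `GROUP BY time()` aggregation computes; the same operation is also how older data can be downsampled to coarser precision. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_time_mean(points: List[Tuple[int, float]], interval: int) -> Dict[int, float]:
    """Average raw points into fixed, epoch-aligned time buckets.

    Each (unix_seconds, value) point lands in the bucket starting at
    timestamp - (timestamp % interval); the result maps each bucket
    start to the mean of the values that fell into it.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for ts, value in points:
        bucket = ts - ts % interval
        sums[bucket] += value
        counts[bucket] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


raw = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0), (150, 2.0)]
print(group_by_time_mean(raw, interval=60))  # {0: 2.0, 60: 6.0, 120: 2.0}
```

One difference: InfluxQL by default reports intervals with no data as null, whereas this sketch simply omits buckets that received no points.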

What separates the InfluxData approach from other time-series databases?
First, InfluxDB is written in Go, a high-performance, server-side language. Second, it’s oriented around time to value: the idea is that a developer can be up and running in a short amount of time. Lastly, it’s not just a database but a full-stack solution built on the open source TICK stack. There are very few full-stack implementations, if any, that are easy to set up and use, highly scalable, and fully clustered. It can handle a couple of million points per second, and it’s all based on open source.

What is the TICK stack?
InfluxData provides a comprehensive platform that supports the collection, storage, monitoring, visualization, and alerting of time-series data. The TICK stack enables developers to collect, analyze, visualize, and act on their data. It is made up of four open source projects: Telegraf, InfluxDB, Chronograf, and Kapacitor. Telegraf is a plugin-driven server agent for collecting and reporting metrics; InfluxDB is a time-series database built from the ground up to handle high write and query loads; Chronograf is a graphing and visualization application for ad hoc exploration of data; and Kapacitor is a data processing framework providing alerting, anomaly detection, and action frameworks.
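Telegraf and InfluxDB exchange points in InfluxDB's text-based line protocol, where each line carries a measurement name, optional comma-separated tags, one or more fields, and an optional nanosecond timestamp. The simplified Python encoder below is a sketch of that format: it covers the common escaping rules, integer, float, boolean, and string field types, and sorted tag keys, but it is not a complete or official implementation, and its helper names are hypothetical. In practice Telegraf or an official client library would produce these lines.

```python
from typing import Dict, Optional, Union

FieldValue = Union[float, int, bool, str]


def _escape_key(s: str) -> str:
    # Tag keys, tag values, and field keys escape commas, equals signs, and spaces.
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _format_field(value: FieldValue) -> str:
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    # String fields are double-quoted; escape backslashes and quotes inside.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement: str, tags: Dict[str, str],
                     fields: Dict[str, FieldValue],
                     timestamp_ns: Optional[int] = None) -> str:
    """Render one point as a (simplified) line protocol string."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = measurement.replace(",", r"\,").replace(" ", r"\ ")
    for key in sorted(tags):  # sorting tags by key is recommended for write performance
        head += f",{_escape_key(key)}={_escape_key(tags[key])}"
    body = ",".join(f"{_escape_key(k)}={_format_field(v)}" for k, v in fields.items())
    line = f"{head} {body}"
    if timestamp_ns is not None:  # if omitted, the server assigns a timestamp
        line += f" {timestamp_ns}"
    return line


point = to_line_protocol(
    "cpu",
    {"region": "us-west", "host": "server 01"},
    {"usage_idle": 0.64, "cores": 8},
    1465839830100400200,
)
print(point)
# cpu,host=server\ 01,region=us-west usage_idle=0.64,cores=8i 1465839830100400200
```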

What is changing in IT environments that make a time-series database necessary?
In the IoT space, we are seeing increasing “sensorfication” of everything. Every sensor is streaming data that needs to be made sense of, monitored, and controlled. Time-series platforms were built for this purpose. And in the cloud and DevOps space, the infrastructure is increasingly fragmented: there is a movement from VMs to containers to microservices to serverless. This fragmentation, together with the portability of every infrastructure component, leads to a bigger monitoring workload. Time-series platforms sit underneath all of that.

Why is a time-series database needed now?
The amount of data being generated every year is increasing dramatically, and time-series databases are at the heart of making sense of all that data.

If all goes according to your expectations, 5 years from now, what do you think will be different in enterprise environments?
First, there will be a common hub for all incoming metrics and events. Many applications that use time-series data will store it in that common metrics and events hub so they can process and act on it.

Second, I think all IoT will simply mean “business application.” Any application that is customer-facing will be perceived as important, to the point that the term IoT becomes irrelevant. All business-facing applications will be instrumented around customer experience and around changes in the physical world. Everything we think of as a business application today will be an IoT application, from tracking customers to tracking materials and equipment. And all of it will be tracked in real time. It won’t strictly need to be, but it will be, to enable increased visibility.

With the advent of sensor data, containerization, and microservices, we are seeing this design pattern everywhere. A modern time-series data platform is required for IoT, DevOps, and real-time analytics. Everybody who is architecting the next generation of solutions in any of those three areas is starting with a time-series database rather than a relational or general-purpose database.


Subscribe to Big Data Quarterly E-Edition