Newsletters




Managing Time Series Data with Informix


Time series data is a sequence of data points typically measured at successive times and may be spaced at uniform time intervals.  Time-stamped data can be analyzed to extract meaningful statistics or other characteristics of the data.  It can also be used to forecast future events based on known past events.  Time series data enables applications such as economic forecasting, census analysis and forecasting, fleet management, stock market analysis, and smart energy metering.

Because it is time-stamped, time series data has a special internal structure that differs from relational data.  Additionally, many applications such as smart metering store data at frequent intervals that require massive storage capacity.  For these reasons, it is not sufficient to manage time series information using the traditional relational approach of storing one row for each time series entry.  Doing so creates performance challenges as the data volumes grow exponentially.  Some relational databases offer solutions to address the performance issue but compromise on storage.  Some non-relational solutions address the storage and performance issues but suffer from limitations such as lack of generality, extendibility, and interoperability.

Hence, efficient time series data management requires a solution that reduces storage, ensures high performance, is easy to extend, and interoperates with other relational information and other products.  IBM's Informix TimeSeries feature provides a solution to these problems with breakthrough technology.

Time series technology has been a proven technology with Informix for 15 years, and with 11.70,  it comes with the server free of charge. Originally, the time series technology was used in the financial markets (for time stamped data such as stock ticks). Now, with the modernization of energy and utilities industry, with advanced metering infrastructures (AMIs), and smart grid initiatives, the time series capability is finding a unique opportunity to manage the meter or sensor data for utility companies.

The Informix TimeSeries feature is a combination of a new time series data type and a large set of in-build analytics.  It reduces the storage requirement by more than 50%, improves performance by orders of magnitude, and improves application development productivity when compared to a relational approach.  Because the time series data type is a native data type in Informix, administrative overhead and learning effort is minimal. Informix TimeSeries also interoperates well with other relational components and other products such as Cognos, Streams, and Optim.

Major Challenges With a Relational Approach

The traditional relational approach stores one row for every time series entry resulting in the storage of redundant information.  For example, smart meter applications may need to collect meter readings every 15 minutes for millions of meters.  In a purely relational system a meter_id and timestamp would be stored with each meter record.  To improve retrieval time a composite index on the meter_id column and the timestamp column (Table 1.1) would be required.  This approach leads to massive amounts of data that adds administrative overhead and performance challenges, likely requiring additional hardware investments.  These challenges often become major road blocks for further enhancements in solution and services.

Table 1.1: Representation of data stored in traditional RDBMS approach

Meter_id

Timestamp

Current

Voltage

Resistance

1

2010-12-01 01:00:00.00000

4

160

40

1

2010-12-01 01:00:10.00000

4.5

155

35

1

2010-12-01 01:00:20.00000

5

165

33

.

.

.

.

.

.

.

.

2

2010-12-01 01:00:00.00000

4

175

44

2

2010-12-01 01:00:10.00000

4.5

160

35

.

.

.

.

.

.

.

.

Due to the challenges of managing and processing massive data the frequency that data is collected might need to be less frequent than desired, for example, every half hour, in order to reduce the amount of data that is stored.

Another major challenge is to perform time-based analytical operations on relational data. Most of the time, an application developer must use complex and proprietary program logic to perform time-based analysis.  Quite often these functions are not available in the native SQL language that most databases speak.

The Informix Approach to Time Series Data

 IBM Informix offers a new perspective with the implementation of time series as a native data type.

The Informix TimeSeries data type uses a single row for each tag/tick identifier, such as a meter, and appends subsequent readings from the same meter to that row, as shown in Table 1.2. The data is ordered physically on disk.  Hence, the number of rows in a TimeSeries table is the same as the number of meters being monitored.  No index is required on the data since it is stored in sorted order.   An index may be created on the meter_id column or any other non-TimeSeries column.  As a result, the storage requirement is reduced by almost 50-80% compared to a traditional relational approach, and performance is improved by an order of magnitude.  This is due to not having to create an index on the data, not duplicating the Meter_id on every row, and other efficiencies.

Table 1.2: Representation of data stored in an Informix TimeSeries column

Meter_id

TimeSeries column

1

[(4,160,40), (4.5,155,35)....]

2

[(4,175,44), (4.5,160,35)..]

..

..

Smart metering is an example where data is stored with at a regular fixed frequency and hence it is considered to be a regular time series. But there are certain solutions where we need to manage information for arbitrary points in time, such as event data. Informix supports irregular time series for such cases.  Informix supports irregular time series just as efficiently as it does regular time series.

Regular time series use the concept of offset; they have a mapping between the time point associated with an element and its position relative to the start of the time series. Individual timestamps are not stored in regular time series; instead, they are computed from the element's offset.  Because of this Informix is able to save 11 bytes per record compared to relational storage.

How does Informix store time series information? How is it accessed? How is the frequency of data collection defined?  How is incoming data validated?

To get the answer for all these questions, let's look at the Informix time series objects.

Container: A container is a space created to hold time series data.  The data is stored in the container clustered by id (for instance meter id) and the sorted by time.  Multiple time-series can be stored in a single container.  A container is created by executing an SQL command.

Calendar: The calendar defines the frequency of data collection and times when data should or should not be collected.  It allows different combinations of frequencies.  For example, if you want to collect data every day in the week accept Saturday and Sunday, you define the calendar as "{5 on, 2 off}, day".  The incoming data is validated automatically against the defined calendar.

SQL Routines and APIs: Informix provides over 80 time series-specific SQL routines to perform SQL operations.  It also provides a set of C APIs and JAVA APIs to develop your own analytics.  The TimeSeries SQL routines provided perform operations such as creating, aggregating, moving averages, and arithmetic functions on one or more time series as well as add data to or change data in a time series

Virtual Table Interface: A view of the time series data is also provided that makes time series data appear to be in a traditional relational schema.  This view is called the Virtual Table Interface (VTI) and shows time series data in relational format.  This view allows regular SQL operations to be performed on the underlying time series data.  Note that a VTI table is just a view and does not actually store any data directly.  Virtual tables can be used in interoperability with other products.

TimeSeries Table

Meter_id

TimeSeries column

1

[(4,160,40), (4.5,155,35)....]

2

[(4,175,44), (4.5,160,35)..]

..

..

Virtual Table

Meter_id

Timestamp

Current

Voltage

Resistance

1

2010-12-01 01:00:00.00000

4

160

40

1

2010-12-01 01:00:10.00000

4.5

155

35

1

2010-12-01 01:00:20.00000

5

165

33

.

.

.

.

.

.

.

.

2

2010-12-01 01:00:00.00000

4

175

44

2

2010-12-01 01:00:10.00000

4.5

160

35

.

.

.

.

.

.

.

.

 Advantages of Informix

Less Storage: Due to the small index size and the specific way of storing the timestamp value, the storage requirement is drastically reduced. Storage reduction can be up to 80% in the case of regular time series and up to 30% in the case of irregular time series.

Greater Performance: Because the index size is small and the data is ordered physically on disk, much less I/O is required to select data. This results in extremely high performance, in both uploading and retrieving data.  Also, since less disk space is required, administration tasks like backing up and purging data are faster.

This table illustrates the results of an actual customer proof-of-concept comparing Informix against another well-known database:

Other Database

Informix

Load time for 1,000,000 meters

7 hours

18 minutes

Time to run required reports

2 - 7 hours

25 sec to 6 min

Storage required for 1 Million Meters

1.3 TB

350 GB

Total Cost of Ownership: TCO is much lower since there is no additional administration, no extra license cost, and no long learning curve.  On the contrary, the results show huge disk space saving and extremely high performance.

Faster application development: Developers can rely on existing built-in SQL routines, C APIs and JAVA APIs for accurate and faster development.

Interoperability: The Informix TimeSeries data type, being a native data type, provides interoperability with database features as well as other products with no extra tweaking.

Summary

The Informix time series approach is a breakthrough technology for managing and analyzing time series data, offering distinct advantages of huge savings on disk storage and extremely high performance.  The built-in SQL routines and extensions of C APIs and JAVA APIs allow users to develop and incorporate their own program logic into the database engine and develop faster.  Because it is easy to learn and use, Informix is being widely accepted by the software community as meeting time series data management requirements, compared with other database and non-RDBMS solutions.


Sponsors