How Throwing Away Data Limits the Power of the IoT

The volume of data collected through the many devices and connections of the Internet of Things (IoT) will lead many people, in the coming years, to question the decisions we make today with so little information. IoT will have a profound impact on healthcare, where sensors detect problems at the level of seconds (or even milliseconds) rather than minutes, and on autonomous vehicles, where real-time data is delivered and acted upon.

The IoT data discussion has understandably focused on near real-time analytics, which will soon power automated decisions based on information such as sensor data. But there is also the question of what happens to data after the real-time window has closed. The time value of data is an important thing for IoT-enabled businesses to consider. Weighing cost against benefit is the first step organizations must take when deciding whether to store data, which data to store, and how to secure it.

Data analyzed in near real time enables many interesting IoT use cases. Health monitoring equipment can trigger faster action than doctors or nurses watching the same signals manually. Monitoring the status of airline equipment while a plane is flying avoids relying on a post-flight download, which would come too late for an adjustment that needs to be made in flight. And monitoring the distance of objects around an autonomous vehicle can feed directly into braking and speed adjustments in order to prevent collisions.

However, IoT data that is not immediately used is not without value. The primary use of long-term data is to look back at trends. Industries that rely on trends, positions, and snapshots in time must store data in order to analyze it later. Commodities and securities data is a classic example, since its value comes from trend analysis. If you threw away the data, you would not have the picture that analysis provides. Weather data is another classic example. Looking back 5 or 10 years to discover seasonal trends or important differences based on environmental factors provides information that will only increase in value. In IoT environments, sensor data comes in rapidly and serves real-time functions, but the knowledge provided by long-term data can also add value.

The Cost of Storing IoT Data

Cost is often the first factor IT departments consider when looking at data storage. In the past, the relational database management systems that many organizations used were not designed to scale to the amount of data IoT-enabled businesses are collecting. The cost of a relational database at that scale, and the increased likelihood of downtime, proved a hurdle to data storage. The highly scalable open source and cloud-based storage systems available today have removed the penalty for storing too much data, with essentially limitless amounts of storage available at a small cost.

However, removing the storage cost barrier has led to poor practice in database management. Organizations utilizing or newly adopting IoT technologies are often approaching data with the argument that if there is any value in IoT data, storing everything is the best bet. This mindset—essentially using a Ferrari as a cargo truck—is something that IT departments must address as an operational problem. Database administrators who need a push out of this poor practice should consider that storage cost is no longer the only expense.

The time to sort and analyze data can create excessive cost if organizations are not careful in the data they choose to store and later analyze. The mentality of procrastination among database administrators leads to a “store it now, figure out what to do with it later” approach.

Instead, assigning value to datasets will reduce the amount of data stored and the time and complexity of analyzing that data at a later date. Take the example of data produced by lighting sensors. Storing every reading that signals a light remained “on” would result in millions of redundant, and therefore useless, data points. Assigning a higher value to the deltas, the readings that indicate when a light turned from “on” to “off” or back, would decrease the overall volume of data stored and provide a less complex series of data for analysis.
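
As a rough illustration of the lighting example, the following Python sketch keeps only the readings where a light's state changes. The reading format (a timestamp in seconds paired with an "on" or "off" string) and the function name state_changes are assumptions made for this example, not part of any particular sensor platform.

```python
from typing import Iterable, Iterator, Tuple

# Assumed reading format: (timestamp in seconds, state such as "on" or "off").
Reading = Tuple[int, str]


def state_changes(readings: Iterable[Reading]) -> Iterator[Reading]:
    """Yield only the readings whose state differs from the previous reading."""
    previous_state = None
    for timestamp, state in readings:
        if state != previous_state:
            yield (timestamp, state)
            previous_state = state


# Six raw readings collapse to three stored deltas.
readings = [(0, "on"), (1, "on"), (2, "on"), (3, "off"), (4, "off"), (5, "on")]
changes = list(state_changes(readings))
print(changes)  # [(0, 'on'), (3, 'off'), (5, 'on')]
```

The first reading is always kept so that the initial state is known. How long a light stayed in a given state can still be recovered from the gap between consecutive stored timestamps.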

Security and Privacy

Data privacy and security are often raised as concerns when organizations start talking about storing data. Of course, the safest data is the data that is not there, so storing only the data that provides business value is key. Desensitizing data by storing only certain datasets will limit the damage of a security breach. For instance, IoT sensors in the home can track when a person is in his or her residence. That can be very sensitive information, particularly in the wrong hands. A business that wants aggregate trend data from these sensors (for instance, the average amount of time a person is at home during the week) can avoid the problem by not storing the times of day a person is at home. This kind of data cleansing for IoT sensors addresses both security and data volume.
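
One way to picture this kind of desensitizing is the Python sketch below. It assumes the sensor reports presence as (arrival, departure) pairs, and it reduces them to a single average before anything is stored. The interval format, the sample dates, and the field name avg_hours_home_per_day are all hypothetical.

```python
from datetime import datetime


def average_hours_home_per_day(intervals, days):
    """Return average hours at home per day, without keeping any times of day."""
    total_seconds = sum((end - start).total_seconds() for start, end in intervals)
    return total_seconds / 3600 / days


# Hypothetical presence intervals: (arrival, departure).
intervals = [
    (datetime(2024, 1, 1, 18, 0), datetime(2024, 1, 2, 8, 0)),  # 14 hours
    (datetime(2024, 1, 2, 19, 0), datetime(2024, 1, 3, 7, 0)),  # 12 hours
]

# Only this summary would be stored; the raw intervals are discarded.
summary = {"avg_hours_home_per_day": average_hours_home_per_day(intervals, days=2)}
print(summary)  # {'avg_hours_home_per_day': 13.0}
```

The stored record answers the trend question but reveals nothing about when the home is empty.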

Applying a value to the data that is being stored will also help organizations determine which security measures to apply. Simpler techniques such as encryption will likely be a first step. Encryption is the equivalent of locking a car when parking at a shopping mall. This may not seem as secure as a steering wheel lock, but the simplest of criminals will jiggle door handles and lose interest in cars with locked doors. Data theft is similar: attackers often start by taking files that contain unencrypted data.

Where to Begin

Data storage does not provide business value for every industry. Stock trade data, for instance, loses its real-time value within milliseconds. Weather station data is similar: within a 5-minute window the data is useful, but beyond that, older data can only provide insight into much larger trends. Essentially, the case for data storage is application-specific.

For organizations launching IoT technologies and collecting massive amounts of sensor data, the value of the data long-term may remain unknown for the immediate future. Rather than taking a “store it all” approach by default, the retention time for IoT data should be based on the value of that data. Once the value has been determined, data tiering can be applied as an easy operational practice.

In the first tier, raw data can be stored for a specified amount of time, with the appropriate security measures applied. Once that time period has passed, the data can be summarized and stored as a smaller set. Data can continue to be summarized over time, which keeps the volume of stored IoT data from exploding and lets administrators see a picture of the long-term data without compromising analysis time or security.
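
A minimal Python sketch of this tiering idea follows. It assumes readings arrive as (timestamp in seconds, value) pairs, keeps one day of raw data, and rolls older readings up into hourly summaries. The retention period, the bucket size, and the summary fields are illustrative assumptions, and would in practice be set by the value assigned to the data.

```python
from collections import defaultdict
from statistics import mean

RAW_RETENTION_SECONDS = 24 * 3600  # assumed: keep one day of raw readings
BUCKET_SECONDS = 3600              # assumed: summarize older readings per hour


def tier(readings, now):
    """Split (timestamp, value) readings into recent raw data and hourly summaries."""
    cutoff = now - RAW_RETENTION_SECONDS
    raw = [(ts, value) for ts, value in readings if ts >= cutoff]

    # Group readings older than the cutoff by the start of their hour.
    buckets = defaultdict(list)
    for ts, value in readings:
        if ts < cutoff:
            buckets[ts - ts % BUCKET_SECONDS].append(value)

    summaries = {
        start: {"count": len(vs), "min": min(vs), "max": max(vs), "mean": mean(vs)}
        for start, vs in sorted(buckets.items())
    }
    return raw, summaries


readings = [(0, 20.0), (1800, 22.0), (3600, 21.0), (150000, 19.0)]
raw, summaries = tier(readings, now=200000)
print(raw)        # [(150000, 19.0)]
print(summaries)  # two hourly summaries, keyed by 0 and 3600
```

The same step could be repeated on the summaries themselves, for example rolling hourly summaries into daily ones, so that each older tier is smaller than the one before it.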

This article first appeared in the Summer issue of Big Data Quarterly Magazine.


