IoT Makes Big Data Work Even Harder

Here we go again – yet another world-changing innovation. Billions of dollars are being invested to develop and deploy this next-generation industry. Many existing methods of doing business, and businesses themselves, will be disrupted and replaced by this new wave of technology.

What is this wondrous new innovation? Why, it is the Internet of Things (IoT), of course. But before we get into its specifics and its impact, it is instructive to review recent history.

In June of 2007, Apple launched the first iPhone. Today, more people access the Internet with smart phones than with any other single device. According to Cisco, global mobile data traffic has grown 4,000-fold over the last 10 years and has reached 3.7 exabytes a month. To put this in perspective, data-enabled mobile phones represent only one class of new Internet-connected devices.

Now let’s shift over to the Internet of Things (IoT). The plural “things” is critical. We are talking about many as-yet-undefined things, and each of these things will be spewing data. Let’s drill into what kinds of data we can expect, independent of what the things might be:

  • Operational and Health Data – If you expect something connected to the Internet to provide you value as a result of that connection, the device and its connection must work. Look at the trouble companies have keeping their online systems working properly all of the time (when was the last time you were on a slow web site?). When was the last time you restarted your phone because it became slow or unresponsive? Are you looking forward to restarting your thermostat, toaster, or oven when they do not work? To address and prevent these problems, massive streams of systems management data will need to be collected and continuously analyzed.
  • User Experience Data – This is data about how users interact with these devices and, most importantly, how these devices respond to their users. Within each Internet-connected thing will be one or more “apps” that control the interaction of that device with its users. This is “systems management” data in that it is typically collected by the same kinds of management software that collect the operational and health data. But it is also data that is of enormous business value, like…
  • Business Data – In today’s Internet, business data is well understood. It’s almost always related to commerce in some form. It is business transaction data about how suppliers, partners, vendors and consumers interact with each other. With IoT we will face reams of business data of many new types: the temperature at which thermostats are set, the electricity used by the average oven per day, all of the attributes of where your car went on its trips for that day.

How Does This Change Big Data?

To date, big data has focused on two primary use cases. The first is batch business data. This is data collected on a periodic basis (monthly, quarterly), analyzed with batch analytics and returned in the form of reports that allow the business to make better decisions about products, pricing, messaging, targeting of consumers, and purchasing.

The second is in the realm of systems management, and this is log data. Streams of log data are being collected and stored in various types of log management back ends. While many of these streams of logs are being collected on a continuous basis, analysis of these logs is done only with after-the-fact ad hoc queries. So the user of the log-based system has to both know what they are looking for and know how to form the appropriate query.

Both modern online application systems and the forthcoming Internet of Things share some attributes that require a completely different approach to big data:

  • Streams of Metrics – A modern online application system produces streams of metrics, starting with the response time (user experience) data and including data about how all layers of the software and hardware that support that interaction are behaving. Similarly, an Internet-connected device is going to produce streams of data periodically as it transmits its status, and a stream every time it interacts with a human or another Internet-connected device.
  • Streams of Relationships – In a modern online application system, a transaction of interest is related to the Java virtual machine (JVM) in which it runs, the operating system upon which the JVM runs, the virtualized hardware where the operating system runs and then the entire virtualized and physical infrastructure (down to the spindle on the hard disk) that supports that transaction. Similarly, for every interaction between an Internet-connected device and a human or another device, there will be a commensurate set of relationships.
  • Streams of State – Most enterprises have Configuration Management Databases (CMDBs). These were intended to store the current state of the entire software and hardware environment of the enterprise. But online systems now change too rapidly for CMDBs to be kept up to date, relegating them to a “ghetto” of worse-than-useless legacy technologies. The Internet of Things is going to exponentially increase both the number of things for which state needs to be understood and stored, and the rate at which state changes across these devices.
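
As a rough illustration of the three kinds of streams just described, the sketch below models each as a small record type. The class names, field names, and sample identifiers are all hypothetical, chosen only for this example; the article does not prescribe any particular schema.

```python
from dataclasses import dataclass
from typing import Iterable, Union


@dataclass(frozen=True)
class Metric:
    source: str   # hypothetical id of the device or component reporting
    name: str     # e.g. "response_time_ms"
    value: float
    ts: float     # timestamp, seconds since the epoch


@dataclass(frozen=True)
class Relationship:
    child: str    # e.g. a JVM
    parent: str   # e.g. the operating system the JVM runs on
    kind: str     # e.g. "runs_on"
    ts: float


@dataclass(frozen=True)
class StateChange:
    source: str
    key: str      # e.g. "firmware_version"
    value: str
    ts: float


Event = Union[Metric, Relationship, StateChange]


def summarize(events: Iterable[Event]) -> dict:
    """Count how many events of each stream type were seen."""
    counts = {"Metric": 0, "Relationship": 0, "StateChange": 0}
    for event in events:
        counts[type(event).__name__] += 1
    return counts
```

A single interaction would typically produce events of all three kinds, which is why the back end has to treat them as related streams rather than as separate silos.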

IoT Means New Requirements for Big Data

The existing approaches to big data, focused respectively on log analysis and on batch processing of business data with technologies like Hadoop, are completely inadequate when it comes to processing streams of metrics, streams of relationships, and streams of state in real time. Processing here means the ability to ingest these streams as they arrive, continuously perform operations on them to add value (perspective, context) to the data, and then transform the resulting data and relationships into forms useful to people using market-leading query and visualization tools.
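
The three stages named above (ingest, add context, transform) can be sketched as a chain of Python generators, so that each reading flows through every stage as soon as it arrives rather than waiting for a batch. This is a minimal sketch under stated assumptions: the function names, the dictionary keys, and the location lookup table are hypothetical and stand in for whatever a real streaming back end would provide.

```python
from typing import Dict, Iterable, Iterator, Tuple


def ingest(raw: Iterable[dict]) -> Iterator[dict]:
    """Yield readings one at a time as they arrive (no batching)."""
    for reading in raw:
        yield reading


def enrich(readings: Iterable[dict], context: Dict[str, str]) -> Iterator[dict]:
    """Add context to each reading; here, a location looked up by device id."""
    for reading in readings:
        location = context.get(reading["device"], "unknown")
        yield {**reading, "location": location}


def to_rows(readings: Iterable[dict]) -> Iterator[Tuple[str, str, str, float]]:
    """Flatten enriched readings into rows a query or visualization tool could consume."""
    for r in readings:
        yield (r["device"], r["location"], r["metric"], r["value"])


def process(raw: Iterable[dict], context: Dict[str, str]) -> Iterator[tuple]:
    """Compose the stages; nothing runs until a consumer pulls the next row."""
    return to_rows(enrich(ingest(raw), context))
```

Because the stages are lazy, `raw` could equally be an unbounded source such as a socket or a message queue reader, and rows would still be produced one by one.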

This directly leads to the need for the following new approaches to big data:

  • Data “Pushed” not “Pulled” – Today most of the data that is “collected” into big data back ends is collected by having someone or something query the data from its source. Smart phones broke this paradigm, as it is impossible to query billions of smart phones and ask them for their data. In this new world it must become the responsibility of each end device to provide or push its data into the new real-time back end for these metrics, relationships and state. All systems management software that relies upon querying things for data is now legacy software unsuited for the modern world. The modern data collection paradigm needs to be based upon streams of data pushed from each device to a back end that is capable of ingesting and processing them all in real time.
  • Data Collection In Real-Time – Legacy systems management products collect data at best every 5 minutes, and many collect data as infrequently as hourly. This leaves too long a gap between when something bad happens and when the management system learns about it. Modern data collection needs to be real-time and continuous.
  • Comprehensive Data Collection – Because legacy management systems cannot deal with large volumes of data, they pursue “sparse” approaches to data collection that sample and that fail to collect data comprehensively across all of the aspects of the software and hardware systems that support an end user, a device or a “thing.” Modern data collection needs to comprehensively collect data from every layer of the hardware and software ecosystem that supports an interaction or a transaction.
  • Open Multi-Vendor Data Collection – The diversity in the sources of new management data is too great for any single vendor to stand a chance of collecting everything, or even of collecting all of the metrics, relationships and states that pertain to even a single set of interactions or transactions. The pace of innovation is simply too fast for any one vendor to be able to keep up. Therefore only an approach that recognizes that there will be many sources of data, and many vendors who specialize in collecting various types of data, will succeed.
  • Relationships Established at Ingest Time – It is impossible to know ahead of time, and to plan ahead of time for, how future streams of data will be related to existing streams of data. Therefore these relationships must be established at the time that each new stream of metrics and state is added into the system. All previous attempts to pre-define a model of an environment, like the Common Information Model (CIM) of computing, are now invalid, since anything defined by a committee cannot keep up with the pace of innovation in these new environments.
  • No More Extract, Transform and Load (ETL) – Data needs to arrive in real time (no more Extract) and be stored in a useful form on a continuous basis (no more Load). Instead, streaming ingest, coupled with continuous and real-time transformation and streaming writes of the resulting useful data, needs to replace the “Excruciating – Torture – Lose” paradigm.
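
Several of the requirements above (pushed data, relationships established at ingest time, and continuous transformation in place of a batch load) can be illustrated together in a few lines. The sketch below is a toy, in-memory stand-in for a real-time back end; the `StreamBackend` class, its `push` method, the `"reports_via"` relationship label, and the device names are all assumptions made for this example.

```python
class StreamBackend:
    """Toy in-memory back end that devices push to; it never polls them."""

    def __init__(self):
        self.relationships = set()   # (child, kind, parent) triples
        self.latest = {}             # (device, metric) -> most recent value
        self.running = {}            # metric -> [count, total]

    def push(self, device, metric, value, parent=None):
        """Called by the device itself whenever it has new data."""
        # Relationship is recorded at ingest time, not from a predefined model.
        if parent is not None:
            self.relationships.add((device, "reports_via", parent))
        # State is overwritten continuously, so it is always current.
        self.latest[(device, metric)] = value
        # Continuous transformation: the aggregate is updated on arrival,
        # so no separate extract or load step is needed later.
        stats = self.running.setdefault(metric, [0, 0.0])
        stats[0] += 1
        stats[1] += value

    def mean(self, metric):
        """Return the running mean for a metric, or None if nothing has arrived."""
        count, total = self.running.get(metric, (0, 0.0))
        return total / count if count else None
```

The point of the design is that every query (`mean`, `latest`, `relationships`) reads a result that was already maintained as the data arrived, instead of triggering a scan over stored raw data.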

IoT Signals a New Approach to Data Management

Both modern online systems and the forthcoming Internet of Things require a completely different approach to collecting data, processing data, and making this data useful to users and analysts. Real-time streaming combined with continuous transformation needs to replace existing batch processes. Single-vendor approaches need to be replaced by an ecosystem of vendors that can collectively keep up with the pace of innovation.