A jet airliner generates 20TB of diagnostic data per hour of flight. The average oil platform has 40,000 sensors, generating data 24/7. 80% of all households in Germany (32 million) will need to be equipped with smart meters by 2020, in accordance with the European Union market guidelines. These examples alone represent a staggering amount of data that must be captured, analyzed and acted upon.
More “things” are now connected to the internet than people, a phenomenon dubbed the Internet of Things. Fueled by machine-to-machine (M2M) data, the Internet of Things promises to make our lives easier and better, from more efficient energy delivery and consumption to mobile health innovations that let doctors monitor patients from afar. However, the resulting tidal wave of machine-generated data streaming in from smart devices, sensors, monitors and meters is testing the capabilities of traditional database technologies. They simply can’t keep up, or, when pushed to scale, become cost-prohibitive.
Just 10 years ago, the largest data warehouse in the world was 30TB; today, petabyte-sized data warehouses are common, and the volumes continue to grow. According to a 2012 Information Difference survey, most of the 209 customers surveyed said they were experiencing data growth of 20-50% annually. So how can companies deal with this massive uptick in M2M data, while also supporting increasing demands for real-time insight?
More Data, More Connections … More Insight?
Google’s Eric Schmidt and Jared Cohen recently released The New Digital Age: Reshaping the Future of People, Nations and Business, which describes an imminent future where everyone is connected. In a recent interview about the book, the authors talked about the intersection of the “Internet of People” with the Internet of Things. We’re living in a world that’s not only more connected and more efficient, but also more informed; machine-generated data is helping us better serve people. Schmidt and Cohen discussed how the Boston Marathon bombings offer a small glimpse into this transformation: the massive volume of photos and videos from mobile phones at the scene helped police identify the perpetrators, while the cell phone left in the carjacked vehicle allowed police to track it with precision.
The smart grid serves as another example. As Opower, a provider of energy usage and efficiency reports for utilities, continues to generate more and more data, it needs analytic solutions that help it extract valuable business intelligence without breaking the bank on hardware or database administration. It is at the intersection of the Internet of Things and the Internet of People that Opower can turn data into true insight. The company quickly analyzes behavioral, demographic and log data showing, for example, which of its products utilities are leveraging most successfully and how utility customers interact with the information Opower provides.
Meeting the Data Challenge with Investigative Analytics
Raw data, whatever the source, is only useful once it has been transformed into knowledge through analysis. And anywhere that machines or devices generate information, there is a need for investigative analytics: Which Boston Marathon photos uploaded by spectators fell within a particular time window? Why did the smart grid sensor fail? When there is more and more data to mine, investigative analytics cuts through the clutter with precision, delivering accurate, immediate results even as machine-generated data grows beyond the petabyte scale.
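As a toy illustration of this kind of time-window query, consider filtering upload records by timestamp. The record format, identifiers and times below are invented purely for the example:

```python
from datetime import datetime

# Hypothetical upload records: (uploader id, upload timestamp).
uploads = [
    ("spectator_17", datetime(2013, 4, 15, 14, 48, 10)),
    ("spectator_42", datetime(2013, 4, 15, 14, 50, 5)),
    ("spectator_03", datetime(2013, 4, 15, 15, 10, 0)),
]

# Investigative query: which uploads fall inside the window of interest?
start = datetime(2013, 4, 15, 14, 45, 0)
end = datetime(2013, 4, 15, 14, 55, 0)
in_window = [uid for uid, ts in uploads if start <= ts <= end]
print(in_window)  # → ['spectator_17', 'spectator_42']
```

The query itself is trivial; the hard part at petabyte scale is making this kind of ad hoc scan fast without pre-building an index for every question an investigator might ask.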
With investigative analytics, companies can take action immediately, as well as identify patterns to either capitalize on or prevent in the future. This is especially important since most failures result from a confluence of factors, not a single red flag. Only by extracting rich, real-time insight from the onslaught of machine-generated data can companies use the M2M explosion to their advantage. However, this requires a technology foundation that accounts for two critical requirements: speed and scale.
The need for speed. M2M data is generated extremely quickly and often must be investigated within a short window. For example, a mobile carrier may want to automate location-based smartphone offers based on incoming GPS data, or a utility may need smart meter feeds that show spikes in energy usage to trigger demand response pricing. If it takes too long to process and analyze this kind of data, or if applications are confined to predefined queries and canned reports, the resulting intelligence arrives too late to be useful.
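The smart-meter case can be sketched in a few lines. The rolling-baseline rule, the threshold factor, and the sample readings below are all invented for illustration, not a description of any particular utility's system:

```python
def detect_spikes(readings, window=3, factor=1.5):
    """Return the indices of readings that exceed `factor` times the
    mean of the previous `window` readings."""
    spikes = []
    for i in range(window, len(readings)):
        baseline = sum(readings[i - window:i]) / window
        if readings[i] > factor * baseline:
            spikes.append(i)
    return spikes

kwh = [1.0, 1.1, 0.9, 1.0, 2.4, 1.0, 1.1]  # hourly usage in kWh
print(detect_spikes(kwh))  # → [4]
```

A production pipeline would run incrementally over a stream rather than a list, but the shape of the check is the same: compare each new reading against recent history and act on the anomaly before the moment to respond has passed.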
The need for scale. As demand for investigative analysis of M2M data increases, businesses also need highly scalable solutions that can handle current and future data growth. At some point, traditional, hardware-based infrastructure runs out of headroom in storage and processing capacity, and additional data centers, servers and disk storage subsystems are expensive to buy and maintain; costs can begin to outweigh the benefits. Solutions deployed to leverage big data need the speed and performance to match, as well as the ability to operate efficiently and cost-effectively. A range of technologies has a role to play in data management: software tools designed specifically for efficient investigative analytics, distributed processing frameworks like Hadoop, NoSQL databases, log data text analytics, cloud solutions, or a combination of these. To find the right database technology to capture, connect and drive meaning from data, consider the checklist of requirements below.
• Real-time analysis. Businesses can’t afford for data to get stale; they need data solutions that can quickly and easily load, dynamically query, analyze and communicate M2M information in real time, without huge investments in IT administration, support and tuning.
• Flexible querying and ad hoc reporting. In fast-paced business and operational environments (smart grids are a great example), intelligence needs change quickly, so analytic tools can’t be constrained by data schemas that limit the number and type of queries that can be performed. This deeper analysis requires a flexible solution that doesn’t require a lot of “tinkering” or time-consuming manual configuration (such as indexing and managing data partitions) to create and change analytic queries.
• Efficient compression. Efficient data compression is key to managing M2M data whether it lives in a network node, a smart device or a massive data center cluster. Better compression means less storage capacity is needed overall, of course. But compression also enables tighter data sampling increments as well as longer historical data sets, both of which increase the accuracy of query results.
• Ease of use and cost. In a time of still-constrained budgets, data analysis needs to be affordable, as well as easy to use and implement, in order to justify the investment. This demands low-touch solutions that are optimized to deliver fast analysis of large volumes of data, with minimal hardware, administrative effort or customization needed to set up or change query and reporting parameters.
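The compression point above is easy to demonstrate: machine-generated telemetry is highly repetitive, so even a generic general-purpose compressor shrinks it dramatically (a purpose-built columnar analytic store typically does far better). The sensor readings below are synthetic:

```python
import zlib

# Synthetic telemetry: one sensor emitting a near-constant reading.
rows = "".join(f"sensor_042,{20.0 + (i % 5) * 0.1:.1f}\n" for i in range(10_000))
raw = rows.encode()
packed = zlib.compress(raw, 9)  # level 9 = maximum compression

ratio = len(raw) / len(packed)
print(f"{len(raw):,} bytes -> {len(packed):,} bytes ({ratio:.0f}x smaller)")
```

The same storage budget therefore holds many times more history, which is what makes tighter sampling increments and longer retention windows feasible.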
Big data demands a big change in thinking. Companies that stick with the status quo will find themselves spending progressively more on servers, storage and DBAs, an approach that’s difficult to sustain and still risks not getting the answers they need. By maximizing insight into their data, companies can make better decisions at the speed of business, thereby reducing costs, identifying new revenue streams and gaining a competitive edge.
About the author:
Don DeLoach is CEO of Infobright, which develops and markets a high performance, self-tuning analytic database designed for applications and data marts that analyze large volumes of “machine-generated data.”