We’ve all heard the statistics on the amount of data out there. More data will be created this year than last, and more data has been created, cataloged and stored in the past couple of years than in all of human history. In fact, the very terms used to describe the amount of data are changing. Petabytes are so 2005; it’s exabytes and even zettabytes now. For data to grow to such large volumes in such short time periods, it must be generated at extreme velocity. Consequently, volume is only one of the challenges organizations face. Real-time processing of in-motion, high-velocity feeds is crucial to truly unlock big data’s potential.
The Impact of Data Velocity is Broad
A look at where data is originating and being consumed puts the opportunity and importance of velocity processing into context. The impact is broad and deep—it affects daily customer experience and reaches all the way to our core infrastructure.
- Consumer: A majority of US consumers are online; 91% of households with incomes over $50k are broadband customers.
- Mobile: You can intuit the importance of mobile big data by noting that all top US retailers promote branded mobile applications. Wireless broadband penetration continues to grow. The US is currently ranked 8th in wireless broadband subscriptions per capita, with 76 subscriptions per 100 people. By 2017, mobile traffic is estimated to exceed desktop bandwidth usage.
- B2B: Real-time advertisement auction markets have already transformed the advertising vertical. Real-time B2B will continue to penetrate additional verticals including infrastructure and energy. Smart grid and smart-meter technologies are emerging that use real-time and historical data to improve service, reduce cost and eliminate waste.
As big data continues to explode and broaden its impact, processing it becomes a bigger and more challenging problem. It helps to consider two fundamental choices for handling large-volume, high-velocity data feeds:
- Archive and analyze data after the fact using warehousing and batch tooling (like data cubes, columnar databases, or newer techniques like MapReduce/Hadoop and friends); or,
- Analyze and make decisions on incoming data in real time, as it’s generated.
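To make the contrast between the two choices concrete, here is a minimal Python sketch, assuming a simple numeric feed. The batch path can only compute an average once the data has been archived; the streaming path keeps a running mean and can act on each event the moment it arrives. The names `batch_average` and `RunningAverage` and the "more than twice the mean" alert rule are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


def batch_average(archived: list[float]) -> float:
    """After the fact: analyze only once the full data set has been stored."""
    return sum(archived) / len(archived)


@dataclass
class RunningAverage:
    """In-stream: constant-size state updated as each event arrives."""
    count: int = 0
    mean: float = 0.0

    def update(self, value: float) -> float:
        self.count += 1
        # Incremental mean, so earlier events never need to be re-read.
        self.mean += (value - self.mean) / self.count
        return self.mean


events = [10.0, 12.0, 11.0, 30.0]

stream = RunningAverage()
alerts = []
for value in events:
    # Hypothetical rule: flag a value more than twice the mean seen so far,
    # at the moment it arrives rather than at the next batch run.
    if stream.count and value > 2 * stream.mean:
        alerts.append(value)
    stream.update(value)

print(batch_average(events), stream.mean, alerts)  # 15.75 15.75 [30.0]
```

Both paths reach the same average, but only the streaming path could have reacted to the 30.0 reading when it was fresh.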
Data is of Most Value When it is Fresh
Historical analysis tools are designed to answer research-style queries (e.g., finding correlations, identifying statistical clusters or calculating recommendations based on similar customers). However, storing and batch processing data is not sufficient when the goal is improved customer experience, fraud detection, security enforcement, regulatory compliance or market efficiency. In these cases, real-time analytics with “now decisioning” is critical.
One good example is authorization and security. High-velocity authorization appears in many mobile telco policy engines and cannot be pre-computed. The question “does a mobile user have a valid balance for a pre-paid phone?” is one of the many real-time questions telcos must answer. Refining mobile policy enforcement allows providers to bring cost-effective solutions to second- and third-tier markets, and it provides the accuracy and personalization needed to serve established markets where the cost of customer acquisition is high.
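The prepaid-balance question shows why such decisions cannot be pre-computed: the answer depends on the balance at the instant of the request, and the check must be combined with the debit so that two concurrent requests cannot spend the same funds. Below is a minimal sketch that uses Python’s built-in `sqlite3` module purely as a stand-in transactional store; the `balances` table, the `authorize` function and the amounts are hypothetical, and a production policy engine would look quite different.

```python
import sqlite3

# In-memory SQLite stands in for a transactional store; in autocommit mode
# each statement below runs as its own atomic transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE balances (subscriber TEXT PRIMARY KEY, cents INTEGER NOT NULL)")
conn.execute("INSERT INTO balances VALUES ('alice', 500)")


def authorize(conn: sqlite3.Connection, subscriber: str, cost_cents: int) -> bool:
    """Approve a charge only if the current balance covers it, debiting in the same step."""
    cur = conn.execute(
        # The balance check lives in the WHERE clause, so check and debit are one atomic statement.
        "UPDATE balances SET cents = cents - ? WHERE subscriber = ? AND cents >= ?",
        (cost_cents, subscriber, cost_cents),
    )
    return cur.rowcount == 1  # 0 rows updated: unknown subscriber or insufficient balance


first = authorize(conn, "alice", 300)   # True: 500 cents covers 300
second = authorize(conn, "alice", 300)  # False: only 200 cents remain
```

Folding the check into the `WHERE` clause of a single `UPDATE` keeps the read and the write atomic; a separate `SELECT` followed by an `UPDATE` would need an explicit transaction to avoid a race between concurrent requests.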
Another good example is micro-personalization. Consumers expect personalization, and online advertisers have to deliver. There’s no question that personalized display ads are more effective. Monitoring and responding to campaign trends in real time further improves targeting. The cost savings can be measured in tens of thousands of dollars for even medium-sized online campaigns, while improving customer targeting and response.
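Responding to campaign trends in real time usually means tracking a metric over a recent time window rather than over all history. As an illustration, assuming a feed of timestamped ad impressions with a clicked/not-clicked flag, a sliding-window click-through rate could be maintained in Python as follows. The `SlidingWindowCTR` class, the 60-second window and the sample events are hypothetical.

```python
from collections import deque


class SlidingWindowCTR:
    """Click-through rate over impressions seen in the last `window_seconds`."""

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self.events: deque = deque()  # (timestamp, clicked) pairs, oldest first
        self.clicks = 0

    def record(self, timestamp: float, clicked: bool) -> None:
        """Ingest one impression as it happens; timestamps are assumed non-decreasing."""
        self.events.append((timestamp, clicked))
        self.clicks += int(clicked)
        # Evict impressions that have fallen out of the window (now - window, now].
        while self.events and self.events[0][0] <= timestamp - self.window:
            _, old_clicked = self.events.popleft()
            self.clicks -= int(old_clicked)

    def rate(self) -> float:
        return self.clicks / len(self.events) if self.events else 0.0


ctr = SlidingWindowCTR(window_seconds=60)
for ts, clicked in [(0, True), (10, False), (20, False), (30, False)]:
    ctr.record(ts, clicked)
early = ctr.rate()      # 0.25: 1 click in 4 impressions
ctr.record(70, False)   # impressions at t=0 and t=10 age out
later = ctr.rate()      # 0.0: no clicks in the last 60 seconds
```

Because old impressions are evicted incrementally, each update costs amortized constant time, which is what lets the rate be checked after every event instead of in a nightly report.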
Monitoring sensor networks for safety and product quality is another established real-time market growing as sensor use penetrates additional industries. The auto industry, hospitals, mineral and gem mines and public transportation are all deploying sensor networks to monitor and locate equipment, ensure safety and improve customer, patient and commuter experience. These time-sensitive use cases are not well solved by batch processing.
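A common pattern in sensor monitoring is to evaluate each reading as it arrives and raise an alert as soon as a condition holds, rather than discovering it in a later batch report. The sketch below assumes a hypothetical rule (a sensor above a safety limit for several consecutive readings); the `ThresholdMonitor` class, sensor IDs, limit and streak length are illustrative only.

```python
from collections import defaultdict


class ThresholdMonitor:
    """Alert when a sensor stays above `limit` for `consecutive` readings in a row."""

    def __init__(self, limit: float, consecutive: int) -> None:
        self.limit = limit
        self.consecutive = consecutive
        self.streaks = defaultdict(int)  # sensor_id -> current run of over-limit readings

    def ingest(self, sensor_id: str, reading: float) -> bool:
        """Evaluate one reading as it arrives; return True if it triggers an alert."""
        if reading > self.limit:
            self.streaks[sensor_id] += 1
        else:
            self.streaks[sensor_id] = 0
        # Fire once per streak, at the reading that reaches the required length.
        return self.streaks[sensor_id] == self.consecutive


monitor = ThresholdMonitor(limit=80.0, consecutive=3)
feed = [("pump-1", 82.0), ("pump-2", 70.0), ("pump-1", 85.0), ("pump-1", 90.0), ("pump-1", 91.0)]
alerts = [sensor for sensor, reading in feed if monitor.ingest(sensor, reading)]
print(alerts)  # ['pump-1']
```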
All of these examples underscore a key consideration in the discussion of big data: an individual piece of data is of greatest value at the moment it arrives at your organization. New data is often interesting alone or in the context of its immediate peers; however, its value declines over time as it takes its place within larger aggregate data sets (which, in turn, increase in value as time passes).
Given these types of use cases, it’s clear that organizations must be able to act on data at its point of highest individual value. The question many are grappling with is how. Velocity and volume place unique technical requirements on the implementing technology. Warehousing products architected for write-once, read-later patterns with complex analytics are a poor technical fit for high-velocity data that mutates rapidly, is extremely write-intensive and requires real-time, non-batched analytics.
Velocity-Oriented Databases Support Real-Time Analytics
The solution is a velocity-oriented database that can support real-time analytics and complex, ACID, real-time transactional decision making while processing a relentless incoming data feed. It must be able to:
- Process many thousands of events per second. The sensors, mobile equipment, machine-to-machine and internet-of-things data feeds driving the explosion of data must be ingested and processed without batching. Velocity data management requires high-velocity write throughput.
- Make transactional decisions based on a combination of historical and incoming data. Decisions require transactions—the capacity to read and write multiple pieces of data in a consistent and repeatable fashion.
- Deliver an extremely low price per transaction. This requires an architecture that takes full advantage of modern hardware performance and can scale horizontally, both on-premises and in the cloud.
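The second requirement, a transactional decision that combines historical and incoming data, can be sketched with the smart-meter scenario mentioned earlier. In this minimal Python example, the built-in `sqlite3` module stands in for a transactional database: each incoming reading is compared with the meter’s historical average, an alert is recorded if usage looks anomalous, and the history is updated, all within one transaction so the read, the decision and both writes succeed or fail together. The schema, the `process_reading` function and the 3x threshold are hypothetical, and the sketch makes no claim about reaching the throughput described above.

```python
import sqlite3

# isolation_level=None lets us issue BEGIN/COMMIT/ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript(
    """
    CREATE TABLE meter_history (meter TEXT PRIMARY KEY, readings INTEGER, total_kwh REAL);
    CREATE TABLE alerts (meter TEXT, kwh REAL);
    """
)


def process_reading(conn: sqlite3.Connection, meter: str, kwh: float, factor: float = 3.0) -> bool:
    """Read history, decide and write the outcome as one ACID transaction."""
    conn.execute("BEGIN IMMEDIATE")  # acquire the write lock before reading history
    try:
        row = conn.execute(
            "SELECT readings, total_kwh FROM meter_history WHERE meter = ?", (meter,)
        ).fetchone()
        # Hypothetical rule: flag usage above `factor` times this meter's historical average.
        anomalous = row is not None and kwh > factor * (row[1] / row[0])
        if anomalous:
            conn.execute("INSERT INTO alerts VALUES (?, ?)", (meter, kwh))
        if row is None:
            conn.execute("INSERT INTO meter_history VALUES (?, 1, ?)", (meter, kwh))
        else:
            conn.execute(
                "UPDATE meter_history SET readings = readings + 1, total_kwh = total_kwh + ? "
                "WHERE meter = ?",
                (kwh, meter),
            )
        conn.execute("COMMIT")
        return anomalous
    except Exception:
        conn.execute("ROLLBACK")  # leave history and alerts untouched on any failure
        raise


results = [process_reading(conn, "m-42", kwh) for kwh in (1.0, 1.5, 0.5, 9.0)]
print(results)  # [False, False, False, True]
```

The final 9.0 kWh reading is flagged because the meter’s average over the three earlier readings is 1.0 kWh; had any statement failed, the rollback would have discarded both the alert and the history update.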
Big data might still sound like hype to many, but mobile and machine-to-machine business models and sensor data feeds are going to have a deep and disruptive impact. Data can usefully be processed historically (in batch against warehouses), but the combination of new data sources, emerging velocity data management products and the opportunity for customer value and market efficiency drives home the need for real-time processing. The volume of big data and the rate at which it is produced create a need to examine and transact faster than legacy systems can manage. High-velocity OLTP systems are essential to fully realize big data’s disruptive potential.
About the author:
Ryan Betts is CTO at VoltDB, a Massachusetts-based company that delivers a NewSQL in-memory relational database that offers high-velocity data ingestion, ACID compliance, high scalability, and real-time analytics. One of the initial developers of VoltDB’s commercial product, Betts closely collaborates with customers, partners and prospects to understand their data management needs and help them to realize the business value of VoltDB and related technologies. He can be reached at firstname.lastname@example.org.