The term "big data" refers to the massive amounts of data generated daily by businesses and consumers alike - data that cannot be processed using conventional data analysis tools owing to its sheer size and, in many cases, its unstructured nature. Convinced that such data hold the key to improved productivity and profitability, enterprise planners are searching for tools capable of processing big data, and information technology providers are scrambling to develop solutions that capitalize on new big data market opportunities.
How Big is Big Data?
The McKinsey Global Institute (MGI) "estimates that enterprises globally stored more than 7 exabytes of new data on disk drives in 2010, while consumers stored more than 6 exabytes of new data on devices such as PCs and notebooks. One exabyte of data is the equivalent of more than 4,000 times the information stored in the U.S. Library of Congress. Indeed, we are generating so much data today that it is physically impossible to store it all. Health care providers, for instance, discard 90 percent of the data that they generate (e.g., almost all real-time video feeds created during surgery)."
The situation today is somewhat analogous to the data management dilemmas of the early 1990s, when enterprises were unable to properly process large amounts of customer and other structured data. That big data problem was ameliorated, at least in part, through the fusion of inexpensive storage and a new technology called massively parallel processing, which enabled the creation of large-scale data warehouses - repositories through which enterprise planners could sift terabytes of data to gain critical insight into how to improve productivity and profitability.
That experience, as much as any other factor, convinces today's planners that there's real value in today's version of big data, provided next-generation tools are developed to efficiently and economically store and process the text, audio, video, and other complex data that surround and pervade enterprise operations.
Processing Big Data
IBM, which is working on its own big data platform, believes that big data spans three dimensions: volume, velocity, and variety.
Volume - Enterprises are awash with ever-growing data of all types, easily amassing terabytes - even petabytes - of information. How can an enterprise, for example:
- Turn 12 terabytes of Tweets created daily into improved product sentiment analysis?
- Convert 350 billion meter readings per annum to better predict power consumption?
Velocity - Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data must be used as it streams into the enterprise in order to maximize its value. How can an enterprise:
- Scrutinize 5 million trade events per day to identify potential fraud?
- Analyze 500 million call detail records per day in real time to predict customer churn faster?
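To make the velocity point concrete, the sketch below evaluates each event the moment it arrives instead of waiting for a batch job. It is a minimal illustration under stated assumptions: the event format of `(timestamp, account_id)` pairs and the rule "more than `max_events` trades from one account inside a sliding window is suspicious" are both hypothetical, and real fraud detection relies on far richer models than a burst counter.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp_in_seconds, account_id).
Event = Tuple[float, str]


def flag_bursts(events: Iterable[Event], window_s: float = 60.0,
                max_events: int = 3) -> Iterator[Event]:
    """Yield each event that pushes its account over max_events
    within the trailing window_s seconds (a hypothetical fraud rule).

    Events are handled one at a time, so an alert is raised as soon as
    the offending event arrives. Assumes events arrive in time order.
    """
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    for ts, account in events:
        window = recent[account]
        window.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while window and ts - window[0] > window_s:
            window.popleft()
        if len(window) > max_events:
            yield (ts, account)


if __name__ == "__main__":
    stream = [(0, "A"), (5, "B"), (10, "A"), (20, "A"), (30, "A"), (200, "A")]
    for alert in flag_bursts(stream):
        print("possible fraud:", alert)  # possible fraud: (30, 'A')
```

Because the function is a generator over an iterable, the same code works whether `events` is a finished list or an unbounded feed; only per-account state for the current window is kept in memory.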
Variety - Big data is any type of data - structured and unstructured data such as text, sensor data, audio, video, click streams, log files, and more. New insights are found when analyzing these data types together. How can an enterprise:
- Use hundreds of live video feeds from surveillance cameras to monitor points of interest?
- Take advantage of the 80 percent data growth in images, video, and documents to improve customer satisfaction?
These are questions which would-be big data vendors like IBM are trying to answer.
As for the composition of this still nascent market, big data will create opportunities for:
- Large-scale storage providers, like IBM and EMC
- Large-scale cloud storage providers, like Amazon and Google
- Data analysis companies, like Oracle and SAP
In addition to the "usual suspects," McKinsey & Company analysts believe that "Big Data will also help to create new growth opportunities and entirely new categories of companies, such as those that aggregate and analyze industry data. Many of these will be companies that sit in the middle of large information flows where data about products and services, buyers and suppliers, and consumer preferences and intent can be captured and analyzed. Examples are likely to include companies that interface with large numbers of consumers buying a wide range of products and services, companies enabling global supply chains, companies that process millions of transactions, and those that provide platforms for consumer digital experiences."
Perhaps the single biggest contributor to big data is the machine-to-machine (M2M) movement. The goal of M2M (sometimes referred to as the "Internet of Things") is to make every individual machine "addressable" and capable of communicating and interacting with every other machine.
At its most basic, M2M involves four simple machine-to-machine functions:
- Collection - Select data is extracted from "Machine A," a temperature sensor, for example.
- Transmission - The data is forwarded from Machine A, via a wired or wireless connection, to "Machine B" for analysis.
- Assessment - The data is evaluated by Machine B to determine what, if any, action should be taken (for example, the room temperature recorded by the temperature sensor, Machine A, may be too high).
- Reaction - Machine B initiates the appropriate response, either activating the HVAC unit, or alerting a human operator. In the first instance, Machine B would interact directly with the HVAC system, essentially starting a second M2M transaction.
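The four functions above can be sketched in a few lines of Python. This is a single-process illustration, not an M2M protocol: the class names, the 26 °C threshold, and the rule "drive the HVAC unit if one is connected, otherwise alert an operator" are all hypothetical, and a direct method call stands in for the network link used in the transmission step.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical threshold above which Machine B decides the room is too warm.
MAX_TEMP_C = 26.0


@dataclass
class Reading:
    sensor_id: str
    temp_c: float


class TemperatureSensor:
    """Machine A: the device that produces data."""

    def __init__(self, sensor_id: str, temp_c: float) -> None:
        self.sensor_id = sensor_id
        self.temp_c = temp_c

    def collect(self) -> Reading:
        # Collection: extract select data from the device.
        return Reading(self.sensor_id, self.temp_c)


class HvacUnit:
    """The HVAC system Machine B can drive in a second M2M transaction."""

    def __init__(self) -> None:
        self.cooling = False

    def start_cooling(self) -> None:
        self.cooling = True


class Controller:
    """Machine B: receives, assesses, and reacts to readings."""

    def __init__(self, hvac: Optional[HvacUnit]) -> None:
        self.hvac = hvac
        self.operator_alerts: List[str] = []

    def receive(self, reading: Reading) -> None:
        # Transmission: a direct method call stands in for a wired or
        # wireless network link in this single-process sketch.
        if self.assess(reading):
            self.react(reading)

    def assess(self, reading: Reading) -> bool:
        # Assessment: decide whether any action is needed.
        return reading.temp_c > MAX_TEMP_C

    def react(self, reading: Reading) -> None:
        # Reaction: drive the HVAC unit if one is connected (machine to
        # machine); otherwise fall back to alerting a human operator.
        if self.hvac is not None:
            self.hvac.start_cooling()
        else:
            self.operator_alerts.append(
                f"{reading.sensor_id}: {reading.temp_c:.1f} C exceeds {MAX_TEMP_C} C"
            )


if __name__ == "__main__":
    hvac = HvacUnit()
    machine_b = Controller(hvac)
    machine_a = TemperatureSensor("room-101", 29.5)
    machine_b.receive(machine_a.collect())
    print("HVAC cooling:", hvac.cooling)  # HVAC cooling: True
```

Keeping assessment and reaction as separate methods mirrors the list above: the decision rule can change without touching the code that acts on it.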
M2M is the foundational technology that undergirds the "Smart Grid," our national effort to reduce energy consumption by fully automating the generation, distribution, and utilization of electricity.
To indicate the pervasive nature of M2M today, more than 30 million networked sensor nodes are present in the transportation, automotive, industrial, utilities, and retail sectors. The number of these sensors is increasing at a rate of more than 30 percent a year. Collectively, these sensors produce a lot of data; in other words, big data.
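A quick compound-growth calculation shows how fast that sensor base expands. The sketch treats "more than 30 million" and "more than 30 percent" as exactly 30 million and 30 percent and assumes the rate holds steady, so its projections are rough lower bounds rather than forecasts.

```python
import math


def project_nodes(base: float, annual_growth: float, years: int) -> float:
    """Installed base after `years` of steady compound growth."""
    return base * (1 + annual_growth) ** years


def doubling_time(annual_growth: float) -> float:
    """Years for a quantity growing at a constant rate to double."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    base = 30_000_000  # "more than 30 million" nodes, treated as exactly 30M
    growth = 0.30      # "more than 30 percent a year", treated as exactly 30%
    for year in range(1, 4):
        print(year, round(project_nodes(base, growth, year) / 1e6, 1), "million")
    print("doubling time:", round(doubling_time(growth), 2), "years")
```

At 30 percent a year the installed base roughly doubles every 2.6 years, passing 65 million nodes within three years.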
Big Data Beneficiaries
Among those industrial sectors that will benefit most from the harnessing of big data are:
- Healthcare, especially in the US, where escalating costs are rendering healthcare increasingly unaffordable
Private sector companies are expected to devote more resources to big data analysis than public sector agencies owing to competitive pressures. One exception might be public sector administration in Europe where many countries are driving austerity programs - programs tied to increasing productivity and eliminating extravagant expenditures. Big data, it is hoped, will produce big productivity.
Big Data - Small Data Push-Pull
At least for now, the big data push is being opposed by a small data pull. Rather than embracing the mountains of information generated each day, especially emails, many organizations, concerned about regulation, litigation, and other potential exposures like e-discovery demands, have imposed strict data retention policies aimed at ensuring that "legacy" data are purged on a regular schedule. As big data analysis tools become available, these organizations will have to be persuaded that the value of processing all, or most, of their data is worth the risk of keeping it.
Although it's certainly premature to predict the big winners in the big data race, industry handicappers are betting on some established firms like AWS, EMC, and IBM - organizations that can combine massive storage with powerful analytics.
Amazon Web Services (AWS)
Amazon Web Services (AWS) offers a set of cloud-based services, such as Amazon Elastic MapReduce, that make it easy and cost-effective for customers to process and extract information from massive volumes of data. Customers are using AWS for big data projects such as mapping genomes, analyzing web logs, and analyzing financial services data.
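Elastic MapReduce runs the Hadoop MapReduce model on a managed cluster. The toy below runs the same three stages (map, shuffle, reduce) in a single Python process to show the shape of a web-log job, here counting requests per URL path. It does not use any AWS API, and the four-field log format is a hypothetical simplification of real access logs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (path, 1) for each request line; skip malformed lines.

    Assumes a hypothetical simplified format: "<ip> <method> <path> <status>".
    """
    parts = line.split()
    if len(parts) == 4:
        yield parts[2], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts emitted for one key."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    logs = [
        "10.0.0.1 GET /index.html 200",
        "10.0.0.2 GET /products 200",
        "10.0.0.1 GET /index.html 304",
        "garbage line",
    ]
    print(run_job(logs))  # {'/index.html': 2, '/products': 1}
```

On a real cluster the map and reduce functions run in parallel across many machines and the framework performs the shuffle; the appeal of the model is that the per-record logic stays this simple.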
EMC
In September 2011, EMC introduced the EMC Greenplum Modular Data Computing Appliance (DCA), which, according to the company, is the industry's first complete big data analytics platform.
Several Greenplum Data Computing Appliance modules are available today, including:
- The Greenplum Database Module - a purpose-built, highly scalable data warehousing appliance module that architecturally integrates database, computing, storage, and network components into an enterprise-class system.
- The Greenplum Database High Capacity Module - a module designed to host multiple petabytes of data without a surge in power consumption or floor space. For businesses that require detailed analysis of extremely large amounts of data - or those looking for a longer-term archive - this high-capacity version offers the lowest cost-per-unit data warehouse.
IBM
IBM's big data platform has four core capabilities:
- Hadoop-based analytics - Processes and analyzes any data type across commodity server clusters
- Stream Computing - Drives continuous analysis of massive volumes of streaming data with sub-millisecond response times
- Data Warehousing - Delivers deep operational insight with advanced in-database analytics
- Information Integration and Governance - Allows clients to understand, cleanse, transform, govern, and deliver trusted information to their critical business initiatives
The platform is supported by the following services:
- Visualization & Discovery - Helps end users explore large, complex data sets
- Application Development - Streamlines the process of developing big data applications
- Systems Management - Monitors and manages big data systems for secure and optimized performance
- Accelerators - Speeds time to value with analytical and industry-specific modules
IBM plans to build out its big data platform through in-house development and strategic acquisitions. For example, IBM has acquired Vivisimo, Inc., a move that enables federated discovery and navigation of structured and unstructured sources via scalable and secure indexing.
Big Data Technology and Services Market
In March 2012, International Data Corporation (IDC) released a worldwide big data technology and services forecast showing the market is expected to grow from $3.2 billion in 2010 to $16.9 billion in 2015. This represents a compound annual growth rate (CAGR) of 40 percent or about seven times that of the overall information and communications technology market.
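The 40 percent figure can be checked against IDC's start and end values with the standard CAGR formula, (end / start)^(1 / years) - 1. The sketch below applies it to the $3.2 billion (2010) and $16.9 billion (2015) figures quoted above.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    rate = cagr(3.2, 16.9, 2015 - 2010)  # IDC figures, billions of US dollars
    print(f"{rate:.1%}")  # 39.5%
```

The result, about 39.5 percent a year, is consistent with IDC's rounded 40 percent and with its description of the five-year CAGR as "nearly 40 percent."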
According to Dan Vesset, program vice president, Business Analytics Solutions at IDC, "There are ... opportunities for both large IT vendors and start ups. Major IT vendors are offering both database solutions and configurations supporting Big Data by evolving their own products as well as by acquisition. At the same time, more than half a billion dollars in venture capital has been invested in new Big Data technology."
Additional findings from IDC's study include the following:
- While the 5-year CAGR for the worldwide market is expected to be nearly 40 percent, the growth of individual segments varies from 27.3 percent for servers and 34.2 percent for software to 61.4 percent for storage.
- Today there is a shortage of trained big data technology experts, in addition to a shortage of analytics experts. This labor supply constraint will act as an inhibitor of adoption and use of big data technologies, and it will also encourage vendors to deliver big data technologies as cloud-based solutions.
Individuals, both employees and consumers, are concerned about the confidentiality and integrity of their personally identifiable information (PII), fearing, for example, that data compromises can foster identity theft and other personal crimes. Big data initiatives have the potential to exacerbate these privacy issues, since more PII will be gathered and retained, and breaching a big data warehouse will likely expose a "big" volume of sensitive information.
Accordingly, the promoters of big data will have to ensure that big data repositories are properly locked down. Moreover, they will have to convince an already skeptical public that big data operations are both worthwhile and secure.
Since big data projects require big storage, enterprise clients are likely to turn, at least initially, to cloud providers for their big data infrastructure. This should give a boost to companies like Amazon, Google, and Rackspace that offer Infrastructure as a Service (IaaS). By leveraging third-party storage - and, in some cases, third-party analytical resources - enterprise planners can conduct big data pilots without unduly taxing their on-premises assets.
As previously discussed, the pursuit of big data imposes a responsibility to review and revise enterprise data management policies, particularly in the privacy and security arena.
According to McKinsey, "Big Data's increasing economic importance also raises a number of legal issues, especially when coupled with the fact that data are fundamentally different from many other assets. Data can be copied perfectly and easily combined with other data. The same piece of data can be used simultaneously by more than one person. All of these are unique characteristics of data compared with physical assets. Questions about the intellectual property rights attached to data will have to be answered."
From the perspective of a cynical IT manager, big data might appear to be the "next big thing," the "technology du jour." Unfortunately, most IT departments are still busy trying to comprehend and implement the last "big thing," cloud computing, and the phenomenon that preceded it, virtualization.
If enterprise business planners believe in big data, they will likely have to inject additional resources into their IT departments, or establish big data as the number-one enterprise data management priority.
Big Data Analysts
As IDC observed, there is a shortage of trained big data technology experts. Before investing in any big data projects, enterprise officials may need to engage the services of prominent IT consulting firms for big data design, development, and deployment planning. As an object lesson, these officials should consider recent enterprise experience with virtualization, in which some companies and agencies did not achieve the level of resource reduction they had anticipated - due, in part, to misconceptions about the technology as well as poor planning.
About the author:
James G. Barr is a leading business continuity analyst and business writer with more than 30 years' IT experience. A member of "Who's Who in Finance and Industry," Barr has designed, developed, and deployed business continuity plans for a number of Fortune 500 firms. He is the author of several books, including How to Succeed in Business BY Really Trying, a member of Faulkner's Advisory Panel, and a senior editor for Faulkner's Security Management Practices. Barr can be reached via email at firstname.lastname@example.org.
This article is based on a report published by Faulkner Information Services, a division of Information Today, Inc. that provides a wide range of reports in the IT, telecommunications, and security fields. For more information, visit www.faulkner.com.