Let’s say that you are intrigued by the possibility of big data and want to begin capitalizing on its potential for your business and industry. What do you do first? Buy a big Hadoop cluster for your data center? Hire a bunch of data scientists? Copy all the internet data ever created and store it in your data center?
Hold on! First, you need to do some thinking about where big data fits into your business. The most important step is to decide on a particular strategy for big data. You need to assemble your senior management team and start talking about what big data can do for your company and which of its many possibilities you want to pursue. That process should start with some serious thinking about the objectives you want big data to fulfill.
What’s Your Big Data Objective?
Like many new information technologies, big data can bring about dramatic cost reductions, substantial improvements in the time required to perform a computing task, or new product and service offerings. Like traditional analytics, it can also support internal business decisions. Which of these benefits are you seeking? The technologies and concepts behind big data allow organizations to achieve a variety of objectives, but you need to focus, at least at first. Deciding what your organization wants from big data is a critical decision. It has implications not only for the outcome and financial benefits of big data but also for the process: who leads the initiative, where it fits within your organization, and how you manage the project.
Cost Reduction from Big Data Technologies
If you’re primarily seeking cost reduction, you’re probably aware that MIPS (millions of instructions per second, a measure of how fast a computer system crunches data) and terabyte storage for structured data are now most cheaply delivered through big data technologies like Hadoop clusters (Hadoop is a unified storage and processing environment for big data across multiple servers). One company’s cost comparison, for example, estimated that storing 1 terabyte for a year cost $37,000 in a traditional relational database, $5,000 in a data appliance, and only $2,000 in a Hadoop cluster. The Hadoop cluster also processed data fastest under most circumstances.
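To make the per-terabyte figures concrete, here is a small Python sketch that scales them to a larger data volume and expresses each one as a multiple of the Hadoop figure. It assumes, purely for illustration, that cost grows linearly with the number of terabytes stored. Real pricing rarely scales that neatly, and the 100 TB volume is a hypothetical example, not part of the comparison.

```python
# Annual cost of storing 1 TB, as reported in one company's comparison.
COST_PER_TB_YEAR = {
    "relational database": 37_000,
    "data appliance": 5_000,
    "Hadoop cluster": 2_000,
}


def annual_storage_cost(platform: str, terabytes: float) -> float:
    """ASSUMPTION: cost scales linearly with volume (illustrative only)."""
    return COST_PER_TB_YEAR[platform] * terabytes


def cost_ratio(platform: str, baseline: str = "Hadoop cluster") -> float:
    """How many times more a platform costs per TB than the baseline."""
    return COST_PER_TB_YEAR[platform] / COST_PER_TB_YEAR[baseline]


# Hypothetical volume of 100 TB, chosen only to show the scaling.
for name in COST_PER_TB_YEAR:
    yearly = annual_storage_cost(name, 100)
    print(f"{name}: ${yearly:,.0f} per year, {cost_ratio(name):.1f}x Hadoop")
```

On these figures, the relational database costs 18.5 times as much per terabyte as the Hadoop cluster, and the data appliance 2.5 times as much.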
Of course, these comparisons are not entirely fair, because the more traditional technologies may be somewhat more reliable, more secure, and more easily managed. And to implement a new Hadoop cluster and all its associated tools, you may need to hire some expensive engineers and data scientists. One retailer, GameStop, decided not to pursue work with Hadoop because it didn’t want to train its engineers in the software or bring in consultants to help with it. But if those considerations don’t matter (perhaps you already have the necessary people, for example, and the application doesn’t need much security), a Hadoop-based approach to big data could be a great bargain for your company.
If you’re focusing primarily on cost reduction, then the decision to adopt big data tools is relatively straightforward. It should be made primarily by the IT organization on largely technical and economic criteria. Just make sure that they take a broad perspective on the cost issues, pursuing a total cost of ownership approach. You may also want to involve some of your users and sponsors in debating the data management advantages and disadvantages of this kind of storage, but that’s about it. No detailed discussions about the future of your industry are necessary.
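A total cost of ownership view can reverse a storage bargain once people costs enter the picture. The Python sketch below is a deliberately simplified model, not a real TCO method. It adds a hypothetical annual staffing cost (new engineers, consultants, or training, as in the GameStop example) to the per-terabyte storage figures quoted earlier. It ignores everything else, such as licenses, hardware refresh, security, and support. The class and function names and the $300,000 staffing figure are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    name: str
    storage_per_tb_year: float   # annual cost of storing 1 TB (from the comparison)
    extra_staff_per_year: float  # HYPOTHETICAL: added engineers, consultants, training

    def total_cost(self, terabytes: float, years: int) -> float:
        """Simplified TCO: linear storage cost plus added staffing, over `years`."""
        annual = self.storage_per_tb_year * terabytes + self.extra_staff_per_year
        return annual * years


def break_even_terabytes(cheap_storage: Platform, incumbent: Platform) -> float:
    """Volume above which the cheaper-storage platform wins on this simplified TCO."""
    extra_staff = cheap_storage.extra_staff_per_year - incumbent.extra_staff_per_year
    storage_saving = incumbent.storage_per_tb_year - cheap_storage.storage_per_tb_year
    return extra_staff / storage_saving


relational = Platform("relational database", 37_000, 0)  # assumes existing staff suffice
hadoop = Platform("Hadoop cluster", 2_000, 300_000)      # HYPOTHETICAL new-hire cost

for tb in (5, 50):
    print(f"{tb} TB over 3 years: {relational.total_cost(tb, 3):,.0f} "
          f"vs {hadoop.total_cost(tb, 3):,.0f}")
print(f"break-even: {break_even_terabytes(hadoop, relational):.1f} TB")
```

Under these made-up numbers, the Hadoop option costs more over three years at 5 TB ($930,000 versus $555,000) but far less at 50 TB ($1.2 million versus $5.55 million). The break-even volume is about 8.6 TB. The point is the shape of the comparison, not the numbers. Staffing costs stay fixed, while storage savings grow with volume.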
Cost reduction is the primary objective for one large US bank, for example. The bank is actually known for its experimentation with new technologies, but like many such institutions these days, it has become a bit more conservative. The bank’s current strategy is to execute well at lower cost, so its big data plans need to fit into that strategy. The bank has several objectives for big data, but the primary one is to exploit “a vast increase in computing power on dollar-for-dollar basis.” The bank bought a Hadoop cluster with fifty server nodes and eight hundred processor cores that is capable of handling a petabyte of data. It estimates order-of-magnitude savings over a traditional data warehouse. Its data scientists (though most were hired before that title became popular) are busy converting existing analytical procedures into Hive’s SQL-like query language to run on the Hadoop cluster. According to the manager of the project:
This was the right thing to focus on, given our current situation. Unstructured data in financial services is sparse anyway, so we are doing a better job with structured data. In the near to medium term, most of our effort is focused on practical matters—those where it’s easy to determine ROI—driven by the state of technology and expense pressures in our business. We need to self-fund our big data projects in the near term. There is a constant drumbeat of “We are not doing ‘build it and they will come’”; we are working with existing businesses, building models faster, and doing it less expensively. This approach is less invigorating intellectually, but more sustainable. We hope we will generate more value over time and be given more freedom to explore more interesting things down the road.
Reprinted by permission of Harvard Business Review Press. Excerpted from "Big Data at Work: Dispelling the Myths, Uncovering the Opportunities" by Thomas H. Davenport. Copyright 2014. All rights reserved.