2015 Is Evolving into a Big Year for Big Data


In the coming year, big data will become more specifically targeted to company needs. In the past, the term “big data” had a fairly broad definition. Now that companies are putting big data solutions online, the market is coming into focus.

Many businesses have gone from investigation to experimentation to actual implementation. With installations coming online, and more to come in 2015 and beyond, big data will become more efficient and more customer-focused. Essentially, what many saw as hype will now turn into real implementations.

The list of companies adopting big data solutions includes such notables as Netflix, Southwest Airlines, American Airlines, NASDAQ, Citibank, Visa, MasterCard, Bank of America, AT&T, BT, Sears, Macy’s, Starbucks, and Coca-Cola. What these companies have in common is the need to expand their analytics systems to include the new sources of data coming from social media, the Internet of Things, and mobile transactions. To complicate matters further, the focus is no longer just big data but real-time big data: the stream of data from sensors, website metrics, and distributed sales monitoring.

Now that things are shaking out, what does the landscape of big data look like? A review of the market shows that use cases fall into several categories: customer relationship marketing, Internet of Things, data warehousing, smart data recovery, and hybrid transaction/analytical processing.

Customer Relationship Marketing: CRM

The better a company can know its customers, the better its chance of tapping the market they represent. Much has been written about gathering information about customer transactions, preferences, loyalty cards, response to advertising and promotions, to say nothing of the flood of information coming from social media. It’s all a matter of understanding the relationship between the business and the customer.

As an example, major chain stores are now tracking sales in their stores in real time. They use this data to change marketing campaigns on the fly depending on factors such as the competition’s pricing, customers’ reactions to online promotions, and even a customer’s actual position in the store. Technologies such as Apple’s iBeacon allow businesses to target your exact location in retail stores, airports, and even ballparks. Some people find this a scary specter of Big Brother from George Orwell’s novel Nineteen Eighty-Four; others consider it the new reality of highly targeted marketing.

The Internet of Things: IoT

An increasing number of devices generate data that is available to interested companies. For example, power companies can monitor energy usage and provide feedback to consumers about smart energy consumption. Biometric information is being collected in real time from millions of fitness devices to help the people wearing them live a healthier life.

Another factor is the emergence of “bring your own device” (BYOD), the trend in which companies support whatever personal smartphones their employees own. Corporate email was previously limited to the people in sales and upper management who were issued early email-enabled phones; now everyone is sending and receiving data. This trend has left IT departments scrambling to support more and more smart devices, along with the flood of data from them.

The convergence of BYOD with the IoT is predicted to result in more than 25 billion devices connected online and streaming data constantly. The challenge is not only to collect this vast amount of data but also to process it in real time and extract information to support business decisions.

An interesting example is a city in China that is gathering data from taxi cabs, buses, and traffic cameras. Its goal is to monitor traffic and identify usage patterns in the volume of people being transported along hundreds of routes. The results of this project will be extrapolated to predict traffic patterns, which will aid in planning the future development of public transportation. This is a typical example of a combination of technologies, such as NoSQL, IoT, and big data, that would hardly have been possible only a few years ago.

Data Warehousing

The field of data warehousing has been around for years, if not decades. The new twist, thanks to big data, is that data can be actively mined as it comes in. This is possible because the data warehouse is becoming less of a physical archive and more of a “logical layer.” This logical layer sits between the analytical systems and the large amount of data beneath them. Companies are increasingly taking advantage of the distributed capabilities of Hadoop to handle the variety and velocity of data arriving from multiple sources.

The traditional approach to data warehousing was to use ETL (extract, transform, load) tools to consolidate data from multiple sources into a centralized database. This often meant filtering out data that did not fit into a central, relational data model. In the modern context, valuable data comes from a variety of separate, incompatible sources. If you are filtering out data simply because it does not fit the central model, you are leaving valuable information untapped. The new generation of ETL tools connects to a logical data warehouse layer, which is an abstraction layer for the analytics tools that process the data. This brings us to the next point: smart data recovery.
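To make the contrast between the two approaches concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the column names, the sample records, and the functions `strict_etl` and `lenient_load` are hypothetical and do not come from any particular ETL product.

```python
# Hypothetical central relational model: only these columns are accepted.
CENTRAL_COLUMNS = ("customer_id", "amount")


def strict_etl(records):
    """Traditional approach: keep only records that match the central model exactly."""
    return [r for r in records if set(r) == set(CENTRAL_COLUMNS)]


def lenient_load(records):
    """Logical-layer style: keep every record and park non-model fields under 'extra'."""
    loaded = []
    for r in records:
        row = {c: r.get(c) for c in CENTRAL_COLUMNS}  # missing columns become None
        row["extra"] = {k: v for k, v in r.items() if k not in CENTRAL_COLUMNS}
        loaded.append(row)
    return loaded


# Made-up sample data: a plain sale, a sale with a social media comment, a sensor reading.
records = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 2, "amount": 5.5, "tweet": "love this store"},
    {"sensor_id": "door-7", "visitors": 42},
]

print(len(strict_etl(records)))    # 1
print(len(lenient_load(records)))  # 3
```

The strict version silently discards two of the three records, including the sale that merely carried an extra field. The lenient version keeps all three and leaves it to the analytics tools to decide what the extra fields are worth.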

Smart Data Recovery

By its very nature of accommodating unstructured data, big data offers a solution for companies needing to integrate data from multiple sources.

Different business units in the same company often use different data formats, sometimes because they were previously independent companies. The IT departments charged with integrating data from these sources often turned to massive data extraction and conversion projects. These projects could take months to complete and supported only a subset of the original data. Big data tools offer a new solution: mapping data from many sources into a single data store.

A similar example is the need for different companies in the same industry, say the airline industry, to exchange data with each other to ensure a seamless experience for the customer. Although a data exchange protocol may exist for the industry, it may be decades old and support only a subset of the information gathered nowadays. Data that falls outside of the industry-defined protocol may not comply with any existing standards. Big data offers a way to cope with this essentially unstructured data.

Modern business intelligence (BI) tools have become more intelligent when extracting data. They can add semantics to data and build a data dictionary on the fly. These tools look at data, identify what it is, and decide what to do with it. The goal is to let you select tables from a database and have the BI tool work out the relationships between them. For example, if you selected tables containing invoices, customers, and locations, the tool would find the relationships between them: each invoice belongs to a customer and each customer is in a location. From this inference it could decide how you might want to display the information, such as creating an infographic showing sales on a map. In this way the tool automatically brings information to the decision maker instead of merely displaying unprocessed data.
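As a rough illustration of how such an inference might work, the Python sketch below guesses relationships purely from column names. The naming convention (a column called `customer_id` points at a table called `customers`), the table layouts, and the function `infer_relationships` are all assumptions for the example; real BI tools use far richer heuristics than this.

```python
def infer_relationships(tables):
    """Guess foreign keys: a column '<x>_id' links to a table named '<x>s' with an 'id' column."""
    links = []
    for name, columns in tables.items():
        for col in columns:
            if not col.endswith("_id"):
                continue
            target = col[:-3] + "s"  # e.g. 'customer_id' -> 'customers'
            if target != name and "id" in tables.get(target, []):
                links.append((name, col, target))
    return links


# Hypothetical tables selected by the user, described by their column names.
tables = {
    "invoices": ["id", "customer_id", "total"],
    "customers": ["id", "name", "location_id"],
    "locations": ["id", "city", "latitude", "longitude"],
}

for source, column, target in infer_relationships(tables):
    print(f"{source}.{column} -> {target}.id")
# invoices.customer_id -> customers.id
# customers.location_id -> locations.id
```

Once the chain from invoices to customers to locations is known, a tool could notice that locations carry coordinates and propose a map of sales.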

Of course, that example involves tables in a relational database, so the relationships between the tables are well defined. Some tools go beyond this: they are smart enough to handle unknown data structures. In the earlier examples of integrating information from different business units and different companies, the schema is well known. The new trend is to integrate data from sources where the schemas are not known in advance, such as the sensors and social media feeds discussed earlier. The tools must figure out the schemas automatically, which is why they are called “smart.”
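A very simple version of that idea can be sketched in a few lines of Python: scan the incoming records and note which fields appear and which types they carry. The sample events and the function `infer_schema` are hypothetical, and this is only one naive way to approach the problem.

```python
def infer_schema(records):
    """Return {field: sorted list of type names seen} across all records."""
    schema = {}
    for record in records:
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return {field: sorted(types) for field, types in schema.items()}


# Made-up records from two sensors and a social media feed, with no schema agreed in advance.
events = [
    {"device": "thermo-1", "temp_c": 21.5},
    {"device": "thermo-2", "temp_c": 22, "battery": 0.8},
    {"user": "@ana", "text": "great coffee"},
]

schema = infer_schema(events)
for field, types in schema.items():
    print(field, types)
```

Even this toy version surfaces useful facts, for instance that `temp_c` arrives as both an integer and a float, which a smarter tool would reconcile before analysis.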

Hybrid Transaction/Analytical Processing: HTAP

Traditionally, transactions and analytics have been two separate fields with inherently different needs. Transaction processing places an emphasis on throughput and data persistence. Analytics relies on volumes of transient data. In the past, analytics could wait for data to be gathered from the transaction processing systems, usually by batch jobs running overnight. Today, businesses are interested in making decisions in real time. For example, they may want to send a coupon to your smartphone while you are walking down a certain aisle in a store.

With IoT and the explosion of data streaming from sensors in real time, analytics need to happen in real time without giving up on transaction processing. Enter the world of hybrid transaction/analytical processing.

HTAP integrates high-speed transaction processing with analytics so that business decisions can be made on the fly. It relies on in-memory computing for the speed to gather data from transactions as they happen and pipe it to an analytics engine. More database vendors and BI tools can be expected to tap into this market.
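The core idea can be shown with a toy Python sketch in which every recorded sale updates an in-memory aggregate at the same moment it is stored, so an analytics query never waits for a batch job. The class `TinyHtapStore`, its methods, and the sample sales are invented for illustration; a real HTAP database also has to deal with durability, concurrency, and scale, none of which appear here.

```python
from collections import defaultdict


class TinyHtapStore:
    """Toy in-memory store: each sale is visible to analytics as soon as it is recorded."""

    def __init__(self):
        self.sales = []                             # transactional record of every sale
        self.revenue_by_aisle = defaultdict(float)  # analytics view kept in memory

    def record_sale(self, aisle, amount):
        """Transaction path: store the sale and update the running aggregate."""
        self.sales.append((aisle, amount))
        self.revenue_by_aisle[aisle] += amount

    def top_aisle(self):
        """Analytics path: answer from the live aggregate, with no overnight batch."""
        if not self.revenue_by_aisle:
            return None
        return max(self.revenue_by_aisle, key=self.revenue_by_aisle.get)


store = TinyHtapStore()
store.record_sale("coffee", 4.0)
store.record_sale("snacks", 2.5)
store.record_sale("snacks", 3.0)
print(store.top_aisle())  # snacks
```

A decision such as which aisle should trigger a coupon can then be made from data that is only as old as the last transaction.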

2015 Will Be an Important Year for Big Data

It looks like 2015 will be an important year for big data and many other technologies such as HTAP and in-memory computing. It may be too soon to see market standards being defined. The best advice is to keep an eye on the market and be prepared to experiment with new things. If you are still thinking about how big data can help you, you are probably late to the party ... but maybe just fashionably late. Many of the early adopters are still experimenting with the technology. Although there are promising new trends, these projects are not cheap, and they are high risk. So you have to decide if you want to join the party while it is still wild or wait for it to settle down a little bit. Just make sure you don’t miss the fun.