The essence of big data is the leveraging of more data—more granular, varied, and voluminous—for greater effect through smarter and more sophisticated algorithms. In almost every market segment, better leveraging of more data leads to competitive advantage and, hence, virtually every business is trying to work out what big data means for them.
Most businesses need to overcome multiple hurdles in order to fully exploit big data. They need to establish a software and hardware infrastructure, acquire data science expertise, and acquire the necessary data. Generally, we tend to focus on the technologies and algorithms of big data, but, without the right data, these capabilities lead nowhere.
The pioneers of big data, such as Google, Amazon, and eBay, generated a “data exhaust” from their core operations that was more than sufficient to allow them to create data-driven process automation. But, for smaller enterprises, data might be the scarcest commodity.
Emergence of Data Marketplaces
Hence, the emergence of data marketplaces. Just as a traditional marketplace brings the buyer and seller together in one physical location, the data marketplace provides those who have data with an opportunity to sell it to those who need data. For instance, Google search data for Italian foods broken down by geography might be of interest to Olive Garden’s planning departments. Google could make this data available for purchase on the data market (not that they need the money) and various retail chains would buy it.
The sale of data is nothing new: Dun & Bradstreet has been selling credit-oriented data since the mid-1800s. But, as the number and types of data that businesses might need proliferate, the emergence of data brokers that can act as intermediaries between data sellers and buyers seems logical.
As the big data movement gained momentum in the late 2000s, many companies attempted to establish data marketplaces. Some, such as Infochimps and DataMarket, made the exchange and provision of data their core business. Others, including Microsoft and Amazon, make data available as a value-add to their core business. Both Microsoft and Amazon offer Hadoop processing within their clouds (Azure and AWS, respectively), and both offer datasets through a marketplace. In all the above cases, the datasets include both publicly available free data (government census data, for instance) and commercial datasets made available on a subscription or pay-per-user basis.
Access to Data Marketplaces as Part of a Big Data Strategy
In an ideal world, access to data in these data marketplaces would be via a standard interface that allows for merging and comparing data from different services: an interface that provides for data marketplaces what SQL provided for relational databases.
Open Data Protocol (OData) was designed to serve this purpose. OData is built upon Atom, JSON, and REST technologies, and is conceptually similar to JDBC or ODBC for databases. It allows a client to examine the structure and relationships within a data offering that may include multiple related tabular datasets, what we would call master-detail relationships in SQL. OData has been endorsed by the Open Government Initiative, and is the API used by the Azure data marketplace.
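To make the JDBC/ODBC comparison concrete, here is a minimal sketch of how a client might compose OData requests in Python. The query options `$select`, `$filter`, `$top`, and `$format`, along with the `$metadata` document that describes a service's structure, are part of the OData protocol. The service root, the entity set `SearchVolumes`, and the field names are hypothetical placeholders, not a real marketplace offering.

```python
from urllib.parse import quote, urlencode


def build_odata_url(service_root, entity_set, *, select=None, filter_expr=None, top=None):
    """Compose an OData query URL from standard system query options."""
    options = {}
    if select:
        options["$select"] = ",".join(select)   # columns to return
    if filter_expr:
        options["$filter"] = filter_expr        # row predicate
    if top is not None:
        options["$top"] = str(top)              # maximum number of rows
    options["$format"] = "json"                 # ask for JSON rather than Atom
    # Keep "$" and "," readable; percent-encode everything else.
    query = urlencode(options, safe="$,", quote_via=quote)
    return f"{service_root.rstrip('/')}/{entity_set}?{query}"


def metadata_url(service_root):
    """URL of the metadata document describing entity sets and relationships."""
    return f"{service_root.rstrip('/')}/$metadata"


if __name__ == "__main__":
    # Hypothetical service root and dataset, for illustration only.
    root = "https://example.invalid/odata"
    print(metadata_url(root))
    print(build_odata_url(root, "SearchVolumes",
                          select=["Region", "Searches"],
                          filter_expr="Cuisine eq 'Italian'",
                          top=5))
```

A client would typically fetch the metadata document first to discover the available datasets and how they relate, much as a database tool inspects a schema before issuing SQL.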
The early enthusiasm around data marketplaces has waned somewhat over the past few years. Most of the marketplaces have failed to accumulate sufficiently diverse and valuable data collections, and early support for OData has decreased: notable supporters such as eBay and Netflix have deprecated their OData interfaces.
It’s clear that we haven’t yet delivered on the promise of the data marketplace. Nevertheless, the need remains strong, and, for many smaller enterprises, the data marketplace remains an essential prerequisite for a big data strategy.