While unstructured data may represent one of the greatest opportunities of the big data revolution, it is also one of its most perplexing challenges. In many ways, the very core of big data is that it is unstructured, and many enterprises are not yet equipped to handle this kind of information in an organized, systematic way.
Most of the world’s enterprise databases, based on a model designed in the 1970s and 1980s that has served enterprises well in the decades since, suddenly seem out of date, and clunky at best, when it comes to managing and storing unstructured data. However, insights from these disparate data types, including weblogs, social media, documents, images, text, and graphics files, are increasingly being sought by the business.
There’s no doubt that unstructured data is becoming a greater part of enterprise data environments. In a recent survey of 304 data managers conducted by Unisphere Research (“2013 Big Data Opportunities Survey,” sponsored by SAP, May 2013), an overwhelming majority, 87%, expect the amount of unstructured data in their organizations to increase over the next 3 years. Close to one out of five say this amount will double during this time period. A large segment, 42%, estimate that a significant portion of their data stores (25% or more) is already in the form of unstructured data. Interestingly, unstructured data is as much a part of the picture for small firms as it is for large, global organizations. In addition, the prevalence of unstructured data cuts across most major industry groups in the survey.
However, a separate Unisphere Research survey of 264 data managers conducted early last year (“Big Data Is Real and It Is Here,” sponsored by MarkLogic, January 2012) finds that organizations are struggling with both understanding and managing the unstructured data available to them. They do not see traditional enterprise relational database systems as up to the task, and most data managers say it’s getting difficult to line up all the hardware and IT resources to meet the challenge.
Adding to the challenge, a majority of respondents, 55%, report that their organizations do not know what types of unstructured data, or how much of it, are even available to them. Technical issues aside, most organizations simply are not ready to take advantage of unstructured data from a management standpoint.
While every enterprise has its share of individuals or teams working with unstructured data, such activity usually happens on an ad hoc basis for single, one-off projects. For example, marketing departments are aggressively exploring and participating in social networking dialogs, and working with social media data with solutions such as sentiment analysis tools to develop a picture of customer adoption of products and services. In the IT shop, website managers keep track of log data to gain a better sense of end-user experiences on corporate websites. On the shop floor, managers scope out machine log data to better predict when breakdowns will occur and avoid disruptions to their supply chains.
Typically, for integration efforts up to this point, enterprises have relied on data warehouse environments, into which data is loaded through extract, transform, and load (ETL) systems and archived for follow-up analysis. Unstructured data in its various formats is often beyond the capabilities of ETL systems, which were built and optimized to handle structured, relational data. Now, with unstructured data a key part of core enterprise information assets, data integration needs to extend well beyond ETL and data warehouses.
In fact, the challenge with unstructured data is that it is in the same state that relational data was in more than a decade ago: stored and maintained in silos within enterprises. Such information needs to be brought out, in an automated way, to the rest of the enterprise. What if customer service managers could integrate production data into their projections, to look for potential ups and downs in customer dissatisfaction as a result of supply chain disruptions? What if application development managers had access to weblog analytics to see where online application performance may be lagging?
Many agree that once enterprises can get their arms around unstructured data, the benefits to their organizations will be far-reaching. Managing unstructured data resources and making these assets available to the organization can lead to greater productivity, more insightful decision making, and the ability to employ more advanced analytics, enabling businesses to compete more effectively in global markets.
What’s the key to bringing potential new silos of unstructured data into the enterprise fold? To accomplish this, enterprises need to recognize and understand the role unstructured data will play in future growth. Business decision makers and data managers need to work closely together to meet the challenge with the following approaches: