Achieving Value From Machine- and User-Generated Unstructured Data

Aug 22, 2012

Big data is tough for enterprises to handle, and adding to the challenge is the fact that much of it is unstructured data: business documents, presentations, log files, and social media data. Respondents to a survey of 264 data managers and professionals, all subscribers to Database Trends and Applications, almost unanimously agree that unstructured data is on the rise and ready to engulf their current data management systems.

The trouble is, their management typically does not understand the scope of the challenge and is failing to recognize the significance of unstructured data assets to the business. The survey, conducted by Unisphere Research, a division of Information Today, Inc., in partnership with MarkLogic, finds that a significant amount of the data now passing through enterprises is unattended.

“Most of that data is ‘lost’ in the sense that it’s not captured and stored,” says David Jonker, director of product marketing for data management and analytics at SAP. At best, he says, unstructured data that is captured “tends to be stored in pockets, for example, in databases, email servers, intranets, and more.” Unlike relational data, unstructured information lacks defined data types and rules that enforce where the data is stored. In addition, unstructured data is created by a broad cross-section of end users and is kept in user-defined directories, beyond the reach of enterprise rules. The result is chaos.

Access the Debut Edition of DBTA’s Thought Leadership Series, Achieving Value from Machine- and User-Generated Unstructured Data, in the DBTA Best Practices section on Analytics, Business Intelligence & Reporting.

“You know the sinking feeling you have when you reach for your wallet only to discover it’s not where you thought you put it?” HK Bain, CEO of Digitech, explains. “You cannot rest until you actually have it in your hands and can verify that everything you need is safely inside. This is similar to the current surge in attention businesses are giving to unstructured data. We think we know what we have until we actually reach for something specific only to discover we can’t place our fingers on the piece of information we need. Panic ensues as we begin to realize that we only sort of know where our unstructured information is, but we don’t know exactly what we have or how to get to it.”

Many organizations are still in the early stages of identifying and discovering their unstructured data assets. Only 14% of respondents in the DBTA-MarkLogic survey say that a majority of the data they manage could be considered unstructured, and a total of 40% see more than one-fourth of their data as being in unstructured formats. These shares are likely to grow over time.

“Structured data has been traditionally viewed as the bread and butter of an organization,” says Sam Alapati, president of Miro Consulting, Inc.

“However, data driven by social media such as Twitter and Facebook is growing at an annual rate that’s three to four times higher than that for traditional structured data. This means that vast amounts of very useful information, especially that pertaining to customer relationship management and relationship marketing, are lying dormant and unused in the heaps of unstructured data in the possession of firms.”

Sometimes the volume of unstructured data may seem overwhelming, says Fiona McNeill, global text analytics lead at SAS. “Businesses and individuals may not pay attention to it because it seems like too big of an elephant to eat,” she points out. “The key to dealing with this data is to distinguish between what is relevant to them, and what is simply white noise.” The net result of leaving this data unexamined is a major gap in the view of the business that analytics provides.

“Without the ability to capture all data available, analytics efforts can’t render the full picture, which translates to inaccurate feedback and reporting,” says Nicolas Maquaire, CEO of EntropySoft.

Previously, unstructured data “was primarily seen as a cost of doing business (it’s hard to make sense of semi-structured human- and machine-generated data); now it could very well be a profit center,” says Grant Ingersoll, chief scientist for Lucid Imagination. “We’ve seen a number of companies who are using this data to effectively drive better relevance and recommendations for their customers.”

The continuation of this article is available in the Debut Edition of DBTA’s Thought Leadership Series, Achieving Value from Machine- and User-Generated Unstructured Data, in the DBTA Best Practices section on Analytics, Business Intelligence & Reporting.
