Collaborate closely to understand how unstructured data assets can be applied to solve particular business problems. Since the notion of engaging with massive stores of machine-generated or user-generated data may be intimidating to business executives, data managers and professionals need to work closely with these users to help them understand how to realize these opportunities. As recent Unisphere Research surveys confirm, more of the information flowing into enterprises will be unstructured data—16% of respondents to the “Big Data Is Real and It Is Here” survey acknowledge that they already have more unstructured than structured data within their enterprises, and another 34% see it surpassing relational within the decade. Business leaders need to be willing to fund initiatives to mine this rising and abundant new source of value. The proliferation of unstructured data offers unprecedented opportunities to expand insightful decision making, customer service, and innovative thinking beyond the flat, columns-and-rows exchange of information seen today.
Inventory unstructured data assets available to the organization. Just as relational data is well documented and measured, decision makers and data managers need to understand the scope and degree of unstructured data that is moving through their enterprises. As part of the collaborative effort mentioned above, data managers and professionals need to reach out across all business units to explore what types of unstructured data are already being used or stored, and how an enterprise-wide approach can streamline these efforts. As noted in the Unisphere Research surveys, a majority of business leaders are not even fully aware of how much unstructured data is flowing through their enterprises.
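As a rough illustration of what a first-pass inventory might look like, the sketch below walks a shared file store and tallies file counts and sizes by content category. The `CATEGORIES` mapping and the `inventory` function are hypothetical names chosen for this example, and a real inventory would also need to cover sources that do not live on a file system, such as email servers and application logs.

```python
from collections import Counter
from pathlib import Path

# Hypothetical mapping from file extension to a coarse content category.
# An actual organization would define its own categories.
CATEGORIES = {
    ".txt": "text", ".log": "machine log", ".eml": "email",
    ".pdf": "document", ".docx": "document",
    ".jpg": "image", ".png": "image", ".mp4": "video",
}


def inventory(root):
    """Count files and total bytes per category under the given directory."""
    counts, sizes = Counter(), Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Unknown extensions fall into a catch-all bucket.
            category = CATEGORIES.get(path.suffix.lower(), "other")
            counts[category] += 1
            sizes[category] += path.stat().st_size
    return {c: {"files": counts[c], "bytes": sizes[c]} for c in counts}
```

Running `inventory` against each business unit's shared storage would give a comparable summary per unit, which is one simple way to start the cross-unit conversation described above.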
Help design and implement databases and platforms that effectively leverage unstructured data stores. The good news is that this doesn’t necessarily mean large up-front investments in new technologies. Rather, new approaches and platforms can be positioned right alongside existing relational databases. As a prime example, the open source Apache Hadoop and MapReduce framework (and its ecosystem of complementary open source tools) is considered to be a very effective platform for capturing and packaging large streams of unstructured data into manageable files that can be consumed by the rest of the enterprise. To adequately store unstructured data, organizations may need to acquire so-called NoSQL databases, which are often open source and lightweight as well.
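For readers unfamiliar with the MapReduce pattern, the following is a minimal in-process sketch of its three stages (map, shuffle, reduce) applied to raw log lines. It does not use the Hadoop API and runs on a single machine; the log format (a timestamp followed by a severity level) and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (level, 1) for one raw log line; skip lines that do not parse."""
    parts = line.split()
    if len(parts) >= 2:
        # Assumed format: "<timestamp> <LEVEL> <free-text message>"
        yield parts[1], 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between stages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

The result is a small structured summary (a count per severity level) derived from free-form input, which is the kind of manageable output that downstream relational systems can consume.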
Seek out and develop unstructured data management skills. The ability to analyze and manage unstructured data for nuggets of business value is not part of the heritage or training of today’s database professionals. Nonetheless, many are assuming unstructured data analysis tasks as part of their day-to-day jobs.
In a recent survey of data managers and professionals conducted by Unisphere Research among members of the Independent Oracle Users Group, 46% say that their job roles, as well as those of their team, are evolving closer to the skills associated with those of data scientists. Most organizations have long had database administrators and analysts on staff, with institutional knowledge and capabilities to tackle emerging big data opportunities. (“Big Data Visionaries: 2013 IOUG Data Science Skills Survey,” February 2013.)
Develop an architectural approach to data management, so that unstructured data is sourced and ingested as readily as relational data. Unstructured data integration capabilities should be built into technology planning, processes, and workflows right alongside relational data.
The challenge is that many organizations have siloed and fragmented data environments, supporting applications and interfaces that are built to operate independently. An enterprise data architecture approach means that unstructured data integration will not occur as one-off projects, but rather become a seamless process that takes place almost automatically as data flows into organizations from various sources.
Ideally, unstructured data sources should be just one click away: decision makers should only have to point at a new data source to have it quickly brought into the enterprise within established rules and processes, so that it is cleansed and trustworthy. Since a significant portion of the unstructured data flowing through enterprises is external data, and thus of questionable quality, the challenge is establishing processes and interfaces that vet and cleanse that data, just as is the case with relational data entering the enterprise.
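The vet-and-cleanse step described above can be sketched as a small ingestion gate. This is only an illustration under stated assumptions: the approved-source list, the `Record` type, and the three rejection rules (unapproved source, empty text, duplicate text) are hypothetical stand-ins for whatever rules an organization has actually established.

```python
import re
from dataclasses import dataclass

# Hypothetical list of sources that governance has approved for ingestion.
APPROVED_SOURCES = {"support_email", "web_reviews"}


@dataclass
class Record:
    source: str
    text: str


def cleanse(text):
    """Drop control characters and collapse runs of whitespace."""
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", text)
    return re.sub(r"\s+", " ", text).strip()


def ingest(records):
    """Split incoming records into accepted (cleansed) and rejected (with reason)."""
    accepted, rejected = [], []
    seen = set()
    for record in records:
        text = cleanse(record.text)
        if record.source not in APPROVED_SOURCES:
            rejected.append((record, "unapproved source"))
        elif not text:
            rejected.append((record, "empty after cleansing"))
        elif text.lower() in seen:
            rejected.append((record, "duplicate"))
        else:
            seen.add(text.lower())
            accepted.append(Record(record.source, text))
    return accepted, rejected
```

Keeping the rejected records together with a reason, rather than silently dropping them, makes it possible to audit the rules and to tell a source owner why their data was not brought in.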
With an enterprise architectural approach, this integration can be transformed into an ongoing and repeatable process. Any and all information sources, no matter what their format, are opened up to the enterprise. That’s because, when implemented properly, enterprise architecture is not tied to a particular data type: the data can be relational, unstructured, or semistructured. Nor does an architectural approach favor any type of technology or integration style, whether that means supporting existing ETL-based interfaces or a combination of ETL, data federation, and virtualization. Data may be processed within NoSQL databases, relational databases, operational data stores, data warehouses, appliances, or Hadoop.
Effectively capturing and capitalizing on unstructured data isn’t just a technical challenge; it is also an organizational challenge. A flexible and agile enterprise environment, supported and embraced by all business units, will elevate unstructured data processing and analysis to a position in which it can help drive the business.