Data Integration Evolves into a Science as Much as an Art

As companies learn to embrace "big data" - terabytes and gigabytes of bits and bytes, strung across constellations of databases - they face a new challenge: making the data valuable to the business. To accomplish this, data needs to be brought together to give decision makers a more accurate view of the business. "Data is dirty and it's hard work; it requires real skills to understand data semantics and the different types of approaches required for different data problems," Lawrence Fitzpatrick, president of Computech, Inc., tells DBTA. "It's too easy to see data as 'one thing.'"

Thanks to a proliferating variety of databases, devices, and data feeds from both inside and outside the organization, the variety and scope of data have changed dramatically for enterprises in just the last few years. "Ten years ago, the main focus was on transactional data from the operational systems and working to integrate it across the enterprise," Rob Armstrong, director of data warehouse support for Teradata, tells DBTA. "The data was very structured, well known, and for the most part it was under the control of the corporations themselves or under the control of a very trusted partner. Now, the data is not just transactional but interactive in nature."

A recent survey of 289 data and IT managers finds that companies are only beginning to grasp the complexities that are arising with the proliferation of data and databases. The study, "Data Cross-Currents: 2011 Survey on Cross-Platform Database Administration," produced by Unisphere Research, a division of Information Today, Inc. (and publisher of DBTA), in partnership with Quest Software, finds integrating the data moving between multiple databases to be the greatest challenge within multidatabase management system environments. In most cases, homegrown solutions are used to manage across different brands. In addition, the survey finds, more than one-fourth of responding companies already have databases in the cloud - in most cases, private clouds. There is a lack of clarity or certainty as to how these environments will be handled, however. Most respondents do not know if new tools or skills will be required to manage these new types of environments.

New Wealth

One thing is certain, however: Companies that don't do the hard work of data integration risk missing out on a new wealth of information. "Companies recognize the wealth of information, and potential for lost opportunities if business analysis does not handle data variety as a whole," Anjul Bhambhri, vice president of Big Data Products for IBM, tells DBTA. "The diverse data now becoming available represents new opportunity for companies to use the new insights available."

The types of opportunities data integration makes plausible include helping organizations make better product decisions, reduce fraud, and reduce costs in real time, Bhambhri continues. "This integrated approach also provides a 360-degree customer view that helps predict customer preferences. Further, profitability and growth are accelerated since companies create efficiencies quicker and impact the entire business sooner."

In addition, the impact of data integration on internal operations should not be underestimated, as it is "the key to process automation, data quality, and other operating efficiencies - including putting the data integration tools into the hands of non-technical users," George Gallegos, CEO of Jitterbit, tells DBTA. "To the business this means fewer technical resources bogged down on integration issues, and a greater number of employees focused on customer-facing solutions that drive growth."

While the growing diversity of data may seem to be a challenge, it is also a strength that savvy organizations can leverage. "The companies that are not willing to diversify their data management systems, and are stuck on arcane IT policies - such as regarding themselves as 'an Oracle shop' - will not be able to leverage new technologies and become a real-time enterprise," Dirk Bartels, vice president of strategic product management at Versant, tells DBTA. "Data, data integration and data management should be viewed as a means to an end. For example, emerging NoSQL technologies are born out of urgent needs, and they are worth considering for new application development."

Management issues often stem not only from data diversity but also from sheer data size. Companies immersed in the information delivery business, for example, are struggling with sorting through the massive proliferation of data. "The primary issue we're concerned with is the amount of time a specialist has to devote to simply finding the data they need," Jud Dunham, senior product manager science and technology for Elsevier, tells DBTA. "We know for researchers, that this is an unsolved problem and an important one - the amount of time wasted simply looking for data. This is a pointless task and a waste of human resources. Our goal is to keep scientists doing science instead of rooting around in the corners of the internet looking for data, which is frankly about as interesting as digging around in their couch looking for their car keys."

New Initiatives

The rise of well-connected enterprises with views across all their information realms is fueling new opportunities for organizations. "Many of the core building blocks and technologies are in place to empower the intelligent society, such as mobile devices that allow access to data from anywhere to anyone," Shawn McPherron, director of cloud marketing at Fujitsu America, Inc., tells DBTA. "While there has been tremendous progress in technologies that help to integrate and share data, the real challenges that will slow progress are often not technical but political challenges such as data ownership, data governance and compliance. If regulatory or business issues prevent sharing data, then it will not be shared."

This is creating a new role for technology in the business, no longer simply about cost savings and efficiencies but also "about enabling revenue growth, improved customer satisfaction, and faster time to market," says Margaret Dawson, vice president of product management for Hubspan. "Integration, within and beyond an enterprise, plays a vital role in this mandate. How quickly users can access, use, and exchange information directly influences the business impact. Customers want their order information now, manufacturers want to know if a supplier can deliver a part now, and a research institution needs to have the latest DNA cloning information now."

Robust data integration practices have a direct impact on business profitability and growth, since "they make the data presentation and analytics so much easier," Scott Staples, head of data and analytic services and president of the Americas for MindTree Ltd., tells DBTA. "This helps to present data to business decision makers in a timely manner to make right decisions. Such data and analytics presented enable businesses to take right decisions to gain competitive advantage and win higher market share. Having good sources of data with established data integration practices can have a catalytic effect on business profitability and growth."

New Roles

Databases aren't what they used to be - and the people who run and manage them are rapidly evolving as well. "In-memory, appliance and columnar databases are changing the way we think of databases," Chris Hagans, vice president of operations for WCI, tells DBTA. "Included in this is how the database is meshed together with cloud-based applications. More and more underlying databases are being hidden and managed by cloud-based application vendors. The type of database is becoming irrelevant. The questions are now about how one integrates into those, how data can be moved in or out."

To bring this all together, there is a need for a new breed of data professional - one who can turn data into actionable insights. At one level, there are new job classifications emerging for specialists who do nothing else but ponder the business value of data. At the other extreme, there are many business professionals assuming data management roles themselves.

For example, the Unisphere-Quest study finds most companies now have databases under their roofs which are managed informally by someone other than a trained database professional. In many cases, these are single-purpose or edge databases, and companies simply don't have enough database administrators (DBAs) to go around to properly manage these environments. Most respondents support applications that can run across multiple databases - but licensing costs hold back multidatabase adoption.

There are a number of forces converging that are opening up databases and corporate information to new caretakers. "Data sources used to be strictly controlled by DBAs, tied to massive stored procedures that took years to construct and optimize, and very closely integrated into applications using a handful of interfaces like JDBC and ODBC," Jamie Ryan, partner solutions architect for Layer 7 Technologies, tells DBTA. "Now, even the largest and most conservative enterprises are recognizing the need to open up their information and application systems through APIs, driven by cross-departmental integration requirements, partner connectivity, expansion to the cloud, and mobile paradigms. This introduces new challenges around security and governance."

The new complexity is exacerbated by the proliferation of new data technologies as well. "There are now so many different ways to skin the cat," says Deirdre Mahon, vice president of marketing for Rainstor. "Nowadays, we have Oracle, MySQL, Hadoop and multiple data warehouses which requires enterprises to have more DBAs and developers to integrate all disparate data sets," she tells DBTA. "And it's not just about which vendor you have decided to invest in but really which mix of data management languages - whether it's SQL, NoSQL, Hadoop, or MapReduce."

To meet these new diverse data challenges, companies need a "data scientist" on board or, as WCI's Hagans puts it, a "mad scientist." According to Hagans, "someone who is able to patch very different sources and formats into something that makes sense for the company will become very valuable in the future. I think of it like Frankenstein; he had lots of different parts, from different sources, but he was stitched together in a way that gave him the ability to walk and talk. Being able to see the big picture and get these strange data sets together, into something that works is going to be a very important role in the future."

These new or evolving roles - data stewards, data scientists and business analysts - will help companies leverage the new opportunities that big data presents, MindTree's Staples tells DBTA. While many companies have made progress in linking data, "few are doing anything with it," he says. "They are crippled by the sheer volume, complexity, and lack the experience to mine it. They need a few key roles to lead the way, individuals who can work closely together to ensure the data is accurate, their analysis is meaningful, and the learning and actions are understood and distributed across the organization."

Data scientists may even be able to address major business issues before they spiral out of control. "Today, the job of the data scientist is to make sense of all this raw data and draw intelligent information that really tells business decision makers what is going on, ahead of time before it becomes too late to do something," says Mahon. For example, she notes, if a car manufacturer had a potential design flaw in one of its models, a data scientist might discern a number of technical service issues from around the world that form a pattern, perhaps allowing the company to take swift action that could alleviate the possibility of negative PR and lawsuits.

While Elsevier's Dunham regards the term "data scientist" as "a somewhat ambiguous term," he agrees "there is certainly a growing need for people who understand data and can work with it to create solutions to meet specific needs. This requires strong analytical skills, subject matter expertise and software development abilities, which is a very unique and powerful combination." Dunham sees additional data roles emerging, including those of data publisher and data archivist. MindTree's Staples also sees an emerging role for "dashboard specialists" who can "create a distribution medium that will captivate users."

Still, many industry watchers feel that the job of translating data into actionable business value is too much for any single type of professional. "The art of getting the value from the data is too big for one person in most cases," Peter Duffy, CTO of Sumerian, tells DBTA. "They'll either need to build up specialist teams, or create strategic partnerships with vendors that have this capability."

Teradata's Armstrong points out that the new skills required for data management are reaching into a broad range of disciplines typically not associated with databases - such as "business systems and economics, plus consumer behavioral psychology and influencer intelligence." This then needs to be complemented with "linguistic and social terminology skill sets, almost akin to a translator," he explains. "People who are well-versed in human nature, thought processing, and understand causal relationships are in growing demand."

Whither the DBA?

Industry experts are divided on what roles DBAs will have in this new multidatabase, multisource, multidimensional world. Layer 7's Ryan, for one, sees a gradual shift away from the technical administrative roles that DBAs have filled over the years. "The emphasis has definitely moved away from the DBA and into other areas," he opines. For example, he illustrates, "most modern data sources have been simplified to the point where anyone can be up and running with an application in a matter of hours. It's just as common to see a database evangelist role, someone who knows the ins and outs of using MongoDB, or optimizing database calls at a higher level using Memcached, or tying data to computation for a specific application using Hadoop." Plus, he adds, a lot of innovation and management now occurs at the middleware level, versus directly on databases.

Still, many industry leaders see DBAs as continuing to be the main go-to professionals in the emerging data-driven enterprise. "The DBA role is still critical, both in the structured and unstructured world," IBM's Bhambhri points out. "While traditional DBAs look at database queries and storage efficiencies, the new forms of data management and integration expand the scope of DBAs to handle these functions at scale across a distributed set of computing power."

Hubspan's Dawson agrees that the DBA role is absolutely still relevant, but adds that "today's DBA needs to understand more than SANs and servers; they need to have a deep knowledge of virtualization, integration, data security and cloud computing." Emerging roles also include cloud strategists, integration specialists, data managers, SOA architects, and business process architects, Dawson says.

As platforms and technologies change, traditional DBAs will have to as well, agrees WCI's Hagans. "Their role will need to evolve into a broader data specialist role, instead of just a concentration on database platform."