Page 1 of 2 next >>

Making It Measurable—Justifying Investments in Data and Data Quality for AI and Machine Learning

Many organizations are experimenting with AI programs, but most of them face a significant and seemingly intractable problem: although proof-of-concept (POC) projects and minimum viable products (MVPs) may show value and demonstrate a potential capability, they are frequently difficult to scale.

One major issue is the quality, completeness, and availability of production data. POCs and MVPs are built in sandboxes with curated and cleansed data that is frequently adapted by hand, and production data does not get that treatment.

However, it can be difficult to build executive support for investments in upstream data processes. Senior executives would often rather put their organizational and social capital behind something sexier than “data quality” or “governance,” such as applying AI to address a problem or better serve customers. The problem is that you can’t deploy the sexy apps unless the data foundation is in place.

One organization trying to create a 360-degree view of its customers encountered the following impediments:

  • Sales and marketing technologies were disconnected.
  • Basic analytic processes were not fully leveraging available data.
  • The company lacked a clear understanding of the full customer lifecycle.
  • Its data governance maturity was rudimentary at best.
  • Data was in inconsistent formats across the technology ecosystem.
  • No data curation and quality metrics were being developed.
  • Ownership of data sources was unclear.
  • No mechanisms were in place to monitor or enforce compliance with standards.
  • Many analytics projects were not coordinated and lacked consistent approaches.
  • New sources and formats lacked a standardized approach for onboarding.
  • The complex technology stack had many stakeholders and users whose interests were sometimes in conflict.

Because of these issues, data quality, completeness, and consistency suffered. People did not trust that the data was up-to-date or reliable. Multiple efforts were made to fix the data issues downstream, but usually only after the data had already been consumed by some applications.

For more articles like this one, go to the 2020 Data Sourcebook

Because of the complexity of the problem, which had numerous causes and contributors, no senior executive wanted to tackle it. Although important, the task was neither sexy nor fun, and it was not fixable through shiny new tools. Not only was the challenge daunting, but the organizational structure did not allow clear ownership of the problem, its costs, or the benefits of a solution. The problem was woven into numerous processes and applications that spanned departments and functional areas. As I often say, “There is no budget for the greater good.” Solving this challenge would have had benefits across the enterprise, but it required sponsorship and accountability at the most senior levels of the enterprise.

The Problem of Data Accountability

Data and data remediation efforts are frequently treated as infrastructure, part of the cost of doing business, rather than as something that can provide clear, measurable ROI. Because these efforts are difficult to tie to a specific business outcome, funding for them is difficult to secure. Data is also considered “an IT problem,” with little accountability on the part of the business side. But many data problems cannot be solved by technology. One potential source of poor-quality data, for example, is salespeople who do not enter complete and accurate information into the CRM system. Fixing that is the responsibility of the business, not the IT department.
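To make the CRM example concrete, here is a minimal sketch of one way a completeness metric could be computed and attributed to the business users who own the records. It is an illustration only, not a method described in this article: the field names, the `owner` key, and the sample records are all hypothetical and invented for the example.

```python
from collections import defaultdict

# Hypothetical list of fields the business has agreed every CRM record needs.
REQUIRED_FIELDS = ("company", "contact_email", "industry", "deal_size")


def is_filled(value):
    """Treat None and blank strings as missing."""
    return value is not None and str(value).strip() != ""


def completeness_by_owner(records, required=REQUIRED_FIELDS):
    """Return, per record owner, the share of required fields that are filled."""
    filled = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        owner = record.get("owner", "unknown")
        for field in required:
            total[owner] += 1
            if is_filled(record.get(field)):
                filled[owner] += 1
    return {owner: filled[owner] / total[owner] for owner in total}


# Made-up sample data, for illustration only.
sample_records = [
    {"owner": "alice", "company": "Acme", "contact_email": "a@acme.example",
     "industry": "Retail", "deal_size": 50000},
    {"owner": "alice", "company": "Globex", "contact_email": "",
     "industry": "Energy", "deal_size": None},
    {"owner": "bob", "company": "Initech", "contact_email": None,
     "industry": "", "deal_size": None},
]

scores = completeness_by_owner(sample_records)
for owner, score in sorted(scores.items()):
    print(f"{owner}: {score:.0%} of required fields filled")
```

Grouping the score by record owner rather than by system is a deliberate choice in this sketch: it ties the number to the people who enter the data, which is where the accountability discussed above sits.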


