
AI: Data Quality’s New Frontier


Interestingly, at the same time that machine learning and AI applications have raised the stakes for data quality, they may also be part of the solution. A new breed of cloud-based data quality solutions has begun to emerge. Even more important, machine learning algorithms are being used to enhance and enrich data quality tools, a trend Gartner has dubbed “augmented analytics.” In Gartner’s view, augmented analytics promises to disrupt the data quality marketplace in significant ways.

Improving Data Quality

Identifying the ways that machine learning can augment and improve data quality is not difficult. It can identify duplicate records. It can automate aspects of data capture, a key source of errors. Machine learning algorithms can detect anomalies or, conversely, identify missing data and the sources from which to obtain it. Perhaps the most compelling aspect of incorporating machine learning into data quality processes is that the process is iterative and should improve over time. One global asset management company reported that, in little more than a year, it reduced mismatches between its metadata and the actual data associated with it from millions of instances to thousands.
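To make two of these ideas concrete, here is a minimal sketch of the kinds of checks such tools automate: statistical anomaly detection (here, a simple median-based modified z-score) and near-duplicate record matching (here, string similarity). The function names, thresholds, and sample data are illustrative assumptions, not any vendor’s actual implementation; production tools apply far more sophisticated, learned models.

```python
import statistics
from difflib import SequenceMatcher

def find_anomalies(values, threshold=3.5):
    """Flag numeric values with a large modified z-score.

    Uses the median and median absolute deviation (MAD), which are
    robust to the very outliers being hunted. Illustrative stand-in
    for the anomaly checks ML-augmented quality tools automate.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

def find_near_duplicates(records, cutoff=0.9):
    """Pair up records whose string similarity exceeds the cutoff.

    A toy version of duplicate-record detection; real systems use
    trained matching models, blocking, and richer features.
    """
    pairs = []
    for i, a in enumerate(records):
        for b in records[i + 1:]:
            if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= cutoff:
                pairs.append((a, b))
    return pairs

# Hypothetical sample data: one mis-keyed amount, one duplicate vendor name
amounts = [102.5, 99.0, 101.2, 98.7, 100.4, 9999.0, 100.9, 99.8]
vendors = ["Acme Corp.", "Acme Corp", "Globex Inc"]
print(find_anomalies(amounts))        # the mis-keyed 9999.0
print(find_near_duplicates(vendors))  # the two Acme variants
```

The point of the iterative-improvement claim above is that flagged records, once confirmed or rejected by users, become labeled training data, so thresholds like these can be replaced by models that sharpen over time.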

The problems that poor data quality poses for businesses have been well-known and well-documented for decades. Recent research indicates that poor data quality costs the average business $10 million to $15 million a year, a number that continues to climb as companies operate in more complex, data-rich environments. Moreover, the effective application of data in business processes, product development, and customer service is making an increasingly critical contribution to many organizations’ overall competitiveness.

But the relationship of data quality to machine learning has, in many ways, qualitatively changed the equation. Fundamentally, the use of machine learning output is built on trust. Few if any end users truly understand the algorithms used to produce the results from these applications. But even if the algorithms themselves are sound, their results will be flawed if the data used to train them is flawed. If end users can’t trust the results produced by machine learning models, those models are essentially useless.

What’s Ahead

If machine learning poses a special challenge for data quality, it may also hold the key to the solution. Cloud-based data quality tools that integrate machine-learning techniques can both accelerate and enhance data quality efforts. Of course, applying new tools requires an investment.

With that in mind, business managers need to build a business case for improving data quality. The business case should be based on the company’s priorities and focus on metrics that can be tied to important business outcomes. And the plan should identify the costs and benefits of implementing new procedures for enhancing data quality.

While many companies have long given lip service to the need for improved data quality, efforts in that area are often unheralded and even overlooked. But the rise of AI has put a spotlight on data quality. Inaccurate data no longer simply costs companies money. It can derail their entire decision-making process.


