RESEARCH@DBTA: The Data Quality Confidence Gap Keeps Widening

Data managers, administrators, engineers, and analysts know better than anyone what kind of data is available to enterprises seeking to compete in the AI and analytics age. However, their confidence in that data's quality is slipping—even as data quality has grown in importance due to the rise of analytics, AI, IoT, data monetization, and other next-generation initiatives.

Data leaders see the quality of their enterprise data deteriorating. Fewer than 1 in 4 express full confidence in their data, down 7 percentage points from 2 years ago.

These are the findings from a new survey of 202 data decision makers by Unisphere Research, a division of Information Today, Inc. The survey, conducted in partnership with Melissa, reports no letup in the growing confidence gap in the data needed to support next-generation initiatives.

This is the third survey in this annual series, and its results show that trends have evolved in the data quality space since the first survey back in 2021.

The survey finds that confidence in data quality is slipping. Only 23% express full confidence in their organization’s data. This is down 7 percentage points from a similar survey conducted 2 years ago.

In addition, data quality is likely impeding efforts to move forward with the data-driven initiatives that are essential to the 2020s’ enterprise. Close to one-third say data quality is a constant, ongoing issue, which is up from 26% 2 years ago.

Why does confidence in data quality appear to be waning while issues associated with data quality are on the rise? Evidence from this survey suggests that it is likely tied to the growing demand for AI and analytics workloads, which require the best data at a moment’s notice. Additionally, the push to build AI capabilities is exposing weaknesses in companies’ data supply chains that urgently need to be addressed.

Confidence in data quality is slipping, yet organizations have taken their eye off the ball when it comes to addressing data quality issues. Lack of organizational support for data quality efforts—along with difficulty demonstrating ROI—may be at the root of lagging progress. Fifty percent of respondents cite lack of internal support, up 8 percentage points from 2 years ago.

The need is urgent, as initiatives such as advanced analytics and AI are drawing greater attention to data quality. When asked how data issues are discovered, respondents report that most are now uncovered through data analytics and AI projects, surpassing more routine occasions such as database upgrades or changes. A majority, 57%, report that they found out about issues through the implementation of next-generation data projects, up from 43% 2 years ago. This points to the criticality of data quality to the success of AI and analytics initiatives.

How do data managers and administrators monitor and discover quality issues within their datasets? The most prevalent means of discovery is still human engagement rather than automated or systems-based discovery. Typically, dealing with data quality issues has involved time-consuming manual processes.

The survey shows that there are efforts to step up automation to address data quality, while still relying on human input. Efforts to build custom solutions to address data quality issues have accelerated, reaching 64% today, up 6 percentage points from 2 years ago. Still, 40% expect business users to take greater responsibility for the quality of the data they oversee, a trend that has also accelerated somewhat during the past 2 years.
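To make the shift from manual to automated discovery concrete, the following is a minimal sketch of the kind of custom-built data quality check the survey describes organizations building. The rules and field names (`id`, `email`, `age`) are hypothetical illustrations, not drawn from the survey itself.

```python
# Minimal sketch of an automated data quality check: completeness,
# uniqueness, and validity rules run over a batch of records.
# Field names and thresholds here are illustrative assumptions.
import re

def check_records(records):
    """Apply simple quality rules to a list of dicts; return (row, issue) pairs."""
    issues = []
    seen_ids = set()
    for i, rec in enumerate(records):
        # Completeness: required fields must be present and non-empty
        for field in ("id", "email"):
            if not rec.get(field):
                issues.append((i, f"missing {field}"))
        # Uniqueness: ids must not repeat across the batch
        rid = rec.get("id")
        if rid in seen_ids:
            issues.append((i, "duplicate id"))
        seen_ids.add(rid)
        # Validity: email must look like an address; age within a sane range
        email = rec.get("email", "")
        if email and not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
            issues.append((i, "malformed email"))
        age = rec.get("age")
        if age is not None and not (0 <= age <= 120):
            issues.append((i, "age out of range"))
    return issues

if __name__ == "__main__":
    sample = [
        {"id": 1, "email": "a@example.com", "age": 34},
        {"id": 1, "email": "not-an-email", "age": 34},  # duplicate id, bad email
        {"id": 2, "email": "", "age": 150},             # missing email, bad age
    ]
    for row, problem in check_records(sample):
        print(f"row {row}: {problem}")
```

In practice, checks like these run automatically on a schedule or at ingestion, surfacing issues to the business users who, per the survey, are increasingly expected to own the quality of the data they oversee.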

It’s also important to note that the survey confirms that the cloud is not a panacea for data quality issues. While close to half of executives say a majority of their data now resides in the cloud, this has not resolved data quality issues. Respondents acknowledge that data quality needs to start at home, and that these issues persist even after data has been migrated to, or generated in, the cloud. Despite the expanded use of cloud for data storage and management, results show little or no improvement from previous surveys: 42% report an increase in issues, in line with previous studies.

Within today’s environments, businesses are aggressively moving into AI—both generative and operational—which requires massive amounts of accurate and timely data. The rise of large language models—both those publicly available and those contained within enterprises—to support business decision making and customer communications means data is being pressed into service in new and highly demanding ways, such as training data and real-time streaming feeds. The models need to support growing demand for intelligent, customer-facing applications such as chatbots and conversational interfaces, as well as intelligent assistants for internal enterprise operations. With all this, the need for high-quality, reliable data is urgent.