Data Observability: Is It Data Quality Monitoring or More?

Feb 27, 2023

By Petr Travkin, Senior Manager, Data Analytics Consulting, at EPAM Systems, Inc.


What Is Observability?

Observability, by definition, is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs. In other words, a system’s behavior is inferred from its outputs, that is, from information obtained through measurement.

Though observability as a term has been used for many years, the modern understanding originates in software observability, which is the ability to collect data about program execution, internal states of modules, and communication between components.

When used in the broader IT context, the term observability usually describes the ability to understand and manage the performance of all the systems, servers, and applications. The overall objective of observability here is to ensure that all resources are available and performing as expected.

Observability Versus Monitoring

Observability and monitoring are often mentioned together. Monitoring has supported businesses since early IT systems emerged, but it’s only been a decade since companies discovered the need for more extensive visibility across ever-expanding software and infrastructure.

There are similarities between observability and monitoring: Both involve the collection of diverse sets of data and serve the goal of identifying problems within the IT landscape.

The difference between observability and monitoring hinges on whether the data pulled from an IT system is predetermined or not. With monitoring, issues or abnormalities are usually anticipated, and there are defined criteria to measure against. By contrast, observability collects metrics across the entire IT landscape to proactively notify of potential issues. Observability systems use various machine learning techniques to learn what could be wrong. While monitoring simply displays data, observability measures all the outputs across multiple applications and systems to understand the relationships between them and offer actionable insights into the IT landscape’s health.
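The contrast between a predetermined criterion and a learned baseline can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the latency values, and the 500 ms limit are hypothetical, and a simple z-score stands in for the machine learning techniques mentioned above rather than representing any particular product’s method.

```python
from statistics import mean, stdev

def threshold_alert(value, limit):
    """Monitoring-style check: compare a value against a predetermined limit."""
    return value > limit

def baseline_anomaly(history, value, z_limit=3.0):
    """Observability-style check: flag a value that deviates strongly
    from the baseline learned from past observations (z-score)."""
    if len(history) < 2:
        return False  # not enough history to learn a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_limit

# Hypothetical response times in milliseconds (illustrative data only).
latencies_ms = [102, 98, 105, 99, 101, 97, 103, 100]
new_value = 180

print(threshold_alert(new_value, limit=500))      # False: below the fixed limit
print(baseline_anomaly(latencies_ms, new_value))  # True: far from usual behavior
```

The fixed limit stays silent because nobody anticipated that 180 ms would be a problem, while the baseline check notices that the value is unusual for this system.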

At the same time, monitoring makes observability possible, and the two usually work together. Monitoring alerts us that an issue has occurred, while observability helps to find and resolve its root cause. With monitoring alone, we would only know that something is broken, not how to potentially fix it.

Data Observability Scope

There are multiple opinions about what data observability should include. Some individuals believe that since data is running through the veins of the whole enterprise IT landscape, data observability must provide insights into infrastructure and applications, not only data pipelines.

After all, those pipelines run on that infrastructure or integrate with those applications. While that is logically correct, because everything is interconnected in the modern IT world, we should look at data observability from the perspective of DataOps. If we are calling it data, let’s talk about data, not infrastructure.

Good, Old Data Quality

The data quality discipline has been around for many decades, helping to determine whether data is fit for its intended uses in operations, decision making, and planning. Implementing data quality practices, processes, and tools has never been easy because there is no single definition of high-quality data. It’s always context-dependent.

Studying this context, gathering requirements, making a trade-off between “fixing” the data or filtering “bad data” out, measuring actual state, and reporting on it have always been challenging.

Data quality management should be considered in relation to the goals and overall strategy of an organization, its management, culture, business processes, and technical architecture. That’s why it is difficult to select data quality dimensions (timeliness, completeness, etc.) and apply them properly at every stage of the data lifecycle.
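As a rough illustration of two of these dimensions, the sketch below computes completeness and timeliness for a small batch of records. The record layout, the field names (`email`, `loaded_at`), and the 24-hour freshness window are assumptions made for the example; what counts as an acceptable score remains context-dependent.

```python
from datetime import datetime, timedelta, timezone

def completeness(records, field):
    """Share of records where `field` is present and not None or empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

def timeliness(records, ts_field, now, max_age):
    """Share of records whose timestamp is no older than `max_age`."""
    if not records:
        return 0.0
    fresh = sum(
        1 for r in records
        if r.get(ts_field) is not None and now - r[ts_field] <= max_age
    )
    return fresh / len(records)

# Hypothetical batch of order records (illustrative data only).
now = datetime(2023, 2, 27, 12, 0, tzinfo=timezone.utc)
orders = [
    {"id": 1, "email": "a@example.com", "loaded_at": now - timedelta(hours=1)},
    {"id": 2, "email": None, "loaded_at": now - timedelta(hours=30)},
    {"id": 3, "email": "c@example.com", "loaded_at": now - timedelta(hours=2)},
    {"id": 4, "email": "", "loaded_at": now - timedelta(hours=3)},
]

print(completeness(orders, "email"))                              # 0.5
print(timeliness(orders, "loaded_at", now, timedelta(hours=24)))  # 0.75
```

The same 50% completeness might be acceptable for an optional marketing field and unacceptable for a billing contact, which is why the dimensions cannot be applied without the business context.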

Data quality management rarely takes place in a vacuum. It should be part of the overall effort of an organization to manage and govern its data. A proper data governance program has always been key to a successful data quality management practice.

What remains clear are the capabilities data quality management should have:

Data quality standards: have definitions of high-quality data and requirements for data quality
Data quality assessment: identify errors, risks, and obstacles to use, and quantify data quality levels
Data quality monitoring: track quality levels to detect unexpected conditions and act in response to them
Data quality reporting: communicate the state of data quality to consumers
Data quality issue management: remediate root causes of data issues
Data quality improvement: have processes and technologies to prevent data issues, enforce data quality standards, and improve trustworthiness of data
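
A few of the capabilities above (standards, assessment, monitoring, and reporting) can be sketched together in a minimal form. Everything here is hypothetical: the `Standard` class, the rule names, the target pass rates, and the sample batch are assumptions chosen for illustration, not a reference design for a data quality tool.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Standard:
    name: str
    rule: Callable[[dict], bool]  # returns True when a record passes
    target: float                 # minimum acceptable pass rate (0..1)

def assess(records, standards):
    """Assessment: quantify the pass rate for each standard."""
    results = {}
    for s in standards:
        passed = sum(1 for r in records if s.rule(r))
        results[s.name] = passed / len(records) if records else 0.0
    return results

def monitor(results, standards):
    """Monitoring: list the standards whose pass rate fell below target."""
    targets = {s.name: s.target for s in standards}
    return [name for name, rate in results.items() if rate < targets[name]]

def report(results, breaches):
    """Reporting: summarize the state of data quality for consumers."""
    lines = []
    for name, rate in results.items():
        status = "BREACH" if name in breaches else "ok"
        lines.append(f"{name}: {rate:.0%} [{status}]")
    return "\n".join(lines)

# Hypothetical standards and batch (illustrative only).
standards = [
    Standard("amount_non_negative", lambda r: r["amount"] >= 0, target=1.0),
    Standard("currency_present", lambda r: bool(r.get("currency")), target=0.7),
]
batch = [
    {"amount": 10.0, "currency": "USD"},
    {"amount": -5.0, "currency": "EUR"},
    {"amount": 7.5, "currency": ""},
    {"amount": 3.0, "currency": "USD"},
]

results = assess(batch, standards)
breaches = monitor(results, standards)
print(report(results, breaches))
```

Issue management and improvement are deliberately left out of the sketch, since remediating root causes depends on the organization’s processes rather than on a check that can be expressed in a few lines.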
