Modern Data Governance: Active Metadata, Data Catalogs, and Data Observability


Last year, hackers broke into AT&T’s customer database, as documented in a report by the Mozilla Foundation, grabbing the phone records of almost all current and former customers, 73 million in total. “The data breach involved data AT&T was storing on a third-party cloud storage company that was left poorly secured. It includes records of calls and texts—including information about who users called and texted, when, and for how long.” The third-party vendor, a database cloud provider, purportedly lacked proper access controls. AT&T may be one of the most technologically savvy companies on the planet, as is its cloud vendor, yet it still fell short in proactive governance practices.

There’s no shortage of examples of data governance falling short. PwC’s data governance experts identified five areas where it is struggling:

  1. Data security vulnerabilities: exposure to cyber threats: Poor data governance weakens security frameworks, creating vulnerabilities that can be exploited by cybercriminals or internal threats. Without clear policies and controls, organizations struggle to safeguard sensitive information such as customer data and intellectual property, increasing the risk of breaches and data leaks.
  2. Data quality issues: inaccurate, inconsistent, or incomplete data: Poor data governance often results in inaccurate, inconsistent, or incomplete data, leading to errors such as duplicate records, outdated information, and missing entries. Moreover, high-quality data is a prerequisite for successful AI and generative AI adoption, as these technologies rely on accurate, well-structured data to generate reliable insights and outcomes.
  3. Compliance risks: non-compliance with data protection laws: Without strong governance practices, organizations may fail to implement crucial data protection measures, such as data encryption, consent management, and data anonymization.
  4. Lack of data accessibility and integration: data silos and fragmentation: Teams may lack access to the most up-to-date or relevant information, impacting their ability to make informed decisions. Data silos also result in duplicated efforts, where multiple departments collect and manage the same data in different ways, further compounding inefficiencies.
  5. Inefficient resource management: wasted effort and redundant work: In organizations with poor data governance, inefficiencies often arise from unclear data ownership, lack of coordination between teams, and the absence of standardized processes.

Modern data governance is also essential to support and accelerate AI development, especially for achieving greater agility, scalability, and trust. DBTA’s latest survey of 424 data executives found that 44% rank data governance as a top priority for their organizations, second only to the 46% seeking greater data quality.

To achieve modern data governance, organizations need to animate and automate their data governance practices, moving from manual processes and siloed treatments to governance as a business enabler, built on real-time insights, context-aware tools, and proactive monitoring and observability across enterprise data pipelines. These capabilities map to the three essential ingredients of governance: active metadata, data catalogs, and data observability.

The need is clear: Data team members—as well as business leaders—need to have a clear understanding of what data they have within their purviews, its lineage, who is generating the data, who is requesting the data, and how securely it is locked down.

Often, much data is “dark data,” according to a survey of 1,288 executives from Hitachi Vantara. The survey shows that 75% of data executives “save everything, then never touch half of it.”

In a related statistic, 44% state that “no one in their organization knows all the data they are collecting and storing.”

As David Hand, author of Dark Data: What You Don’t Know Matters, puts it, “A lack of awareness of what you are missing can lead to distorted understanding, incorrect conclusions, and mistaken actions.” Even more distressing, the presence of dark data across the enterprise brings with it massive security exposures, from mishandled personally identifiable information to potentially exposed corporate secrets.
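One way to start surfacing dark data is to scan an inventory of datasets for those with no recent reads. The short Python sketch below is only an illustration: the inventory layout, the dataset names, and the one-year idle threshold are all assumptions, not figures from the survey.

```python
from datetime import date, timedelta
from typing import Optional


def find_dark_data(
    last_access: dict[str, Optional[date]],
    today: date,
    idle_after: timedelta = timedelta(days=365),  # assumed threshold
) -> list[str]:
    """Return dataset names that were never read or not read within idle_after."""
    dark = []
    for name, accessed in last_access.items():
        # None means no read was ever recorded for this dataset
        if accessed is None or today - accessed > idle_after:
            dark.append(name)
    return sorted(dark)


# Hypothetical inventory: dataset name -> date of last recorded read
inventory = {
    "call_records": date(2025, 5, 1),
    "old_exports": None,
    "legacy_logs": date(2022, 1, 1),
}

dark = find_dark_data(inventory, today=date(2025, 6, 1))
```

A list like this is only a starting point; each flagged dataset still needs an owner to decide whether it should be secured, archived, or deleted.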

Here is how the three pillars of modern data governance (active metadata, data catalogs, and data observability) can help enterprises and their decision-makers get a better grasp on the data available to them:

Active metadata. Metadata, or data about data, is the very core of modern data governance. It is the reference point for the entire enterprise to understand what data assets are available, where they came from, when they came online or went offline, and what parts of the business they serve. Metadata comes in many forms depending on industry and function. It may be associated with knowledge bases and academic resources, it may be the technical and billing details associated with cloud providers, or it may reflect production statistics from factory floors. It’s also important to understand the data that is being managed or stored by third-party contractors or partners.
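To make the idea concrete, a metadata record can be pictured as a small structure that captures where an asset came from, when it was available, which parts of the business it serves, and whether a third party holds it. The Python sketch below is a minimal illustration; the class, its field names, and the sample records are all hypothetical rather than drawn from any particular metadata product.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record; every field name is an assumption."""
    name: str
    source: str                                   # where the data came from
    came_online: date                             # when the asset became available
    went_offline: Optional[date] = None           # None while the asset is still live
    business_units: list[str] = field(default_factory=list)  # parts of the business served
    third_party_custodian: Optional[str] = None   # set when a contractor or partner holds the data


def is_online(record: DatasetMetadata, on: date) -> bool:
    """Return True if the asset was available on the given date."""
    if on < record.came_online:
        return False
    return record.went_offline is None or on < record.went_offline


def held_by_third_parties(records: list[DatasetMetadata]) -> list[str]:
    """List the names of assets stored or managed outside the enterprise."""
    return [r.name for r in records if r.third_party_custodian is not None]


# Illustrative records only
records = [
    DatasetMetadata("call_records", "billing_system", date(2021, 3, 1),
                    business_units=["finance"],
                    third_party_custodian="cloud_vendor"),
    DatasetMetadata("factory_sensor_stats", "plant_floor", date(2019, 6, 15),
                    went_offline=date(2023, 1, 1),
                    business_units=["operations"]),
]

external = held_by_third_parties(records)
```

Metadata becomes "active" when records like these are updated and queried continuously by pipelines and tools rather than maintained by hand in a static document.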

Data catalogs. These are built upon metadata to provide a searchable service for enterprise end users and data teams. Importantly, from a productivity perspective, robust data catalogs enable users to find and access data without having to put in requests with their IT or data engineering teams. The data catalog also provides for standardized and consistent data naming and mapping, enabling seamless movement between applications. There are security benefits as well: Data catalogs help identify and categorize the sensitivity of data for privacy and compliance purposes.
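The two catalog functions described here, self-service search and sensitivity tagging, can be sketched in a few lines of Python. This is a toy model under stated assumptions: the entry structure, the keyword list, and the "restricted" and "general" labels are invented for illustration, and a production catalog would use far more thorough classification.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """Hypothetical catalog entry built from dataset metadata."""
    name: str
    description: str
    columns: list[str]


# Assumed keyword rules; real classifiers are considerably more thorough
SENSITIVE_KEYWORDS = {"phone", "email", "ssn", "address"}


def classify_sensitivity(entry: CatalogEntry) -> str:
    """Label an entry 'restricted' if any column name suggests personal data."""
    for column in entry.columns:
        if any(keyword in column.lower() for keyword in SENSITIVE_KEYWORDS):
            return "restricted"
    return "general"


def search(catalog: list[CatalogEntry], term: str) -> list[CatalogEntry]:
    """Case-insensitive match on entry names and descriptions."""
    term = term.lower()
    return [e for e in catalog
            if term in e.name.lower() or term in e.description.lower()]


# Illustrative entries only
catalog = [
    CatalogEntry("customer_contacts", "Customer contact details",
                 ["customer_id", "phone_number", "email"]),
    CatalogEntry("plant_output", "Daily production statistics from factory floors",
                 ["plant_id", "units_produced"]),
]

hits = search(catalog, "production")
labels = {e.name: classify_sensitivity(e) for e in catalog}
```

Here a user searching for "production" finds the factory dataset without filing a request, while the contact dataset is flagged for privacy review before anyone is granted access.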

Data observability. When it comes to data management, “hope is not a strategy,” as articulated by Andy Thurai, principal analyst with Constellation Research. That means data managers and professionals need to have a clear understanding of how data is moving through various enterprise pipelines to ensure it is serving its intended purpose. Today’s enterprises are complex, relying on an ever-wider range of data sources for a variety of applications, from AI to time-series monitoring. In addition, the need for observability extends well beyond data flows: We need to incorporate observability of surrounding applications and systems. It’s a form of continuous intelligence, especially in the age of AI. In essence, observability is about a continuous, holistic view of what’s taking place within the enterprise. This is where automation is required, to assist operations or DevOps teams in connecting data to business outcomes. Again, it’s also important to understand and be able to manage data moving to and from third parties.
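Automated observability often starts with simple checks on each pipeline run, such as whether the data is fresh and whether the volume looks normal. The Python sketch below shows one possible shape for such checks; the run structure, the 24-hour freshness window, and the 50% volume tolerance are assumptions chosen for illustration, not recommended values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PipelineRun:
    """Hypothetical summary of one pipeline run."""
    pipeline: str
    finished_at: datetime
    row_count: int


def check_run(
    run: PipelineRun,
    now: datetime,
    expected_rows: int,
    max_age: timedelta = timedelta(hours=24),  # assumed freshness window
    tolerance: float = 0.5,                    # assumed allowed deviation in volume
) -> list[str]:
    """Return alert messages for stale data or unusual row counts."""
    alerts = []
    # Freshness: the latest run should have finished recently
    if now - run.finished_at > max_age:
        alerts.append(f"{run.pipeline}: stale, last run finished "
                      f"{run.finished_at.isoformat()}")
    # Volume: the row count should fall near the expected value
    lower = expected_rows * (1 - tolerance)
    upper = expected_rows * (1 + tolerance)
    if not lower <= run.row_count <= upper:
        alerts.append(f"{run.pipeline}: row count {run.row_count} "
                      f"outside expected range")
    return alerts


# Illustrative runs only
now = datetime(2025, 1, 2, 12, 0)
healthy = PipelineRun("orders_daily", datetime(2025, 1, 2, 6, 0), 1000)
troubled = PipelineRun("partner_feed", datetime(2024, 12, 30, 6, 0), 100)

healthy_alerts = check_run(healthy, now, expected_rows=1000)
troubled_alerts = check_run(troubled, now, expected_rows=1000)
```

The same checks apply to feeds arriving from third parties, where a stale or shrunken delivery is often the first visible sign of a problem upstream.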

Modern data governance has many moving pieces, and with initiatives such as AI, IoT, and edge computing increasingly being a part of data environments, it’s more important than ever to have a solid data foundation. These three pillars—metadata, data catalogs, and data observability—need to be part of any data governance strategy moving forward.


