Page 1 of 2 next >>

Data Governance Gets Meshy


Data mesh is all the rage. The objective? To eliminate artificial roadblocks and extend the means of data production across the enterprise—thereby expanding the scope of data products the organization generates. And, ultimately, increasing the value and use of data in decision making and operational practice.

The data mesh framework moves the means of data production closest to the point of creation, giving those with the most knowledge that both agency and flexibility provide in its management. This eliminates handoffs between disparate, often centralized and resource constrained groups responsible for different stages of the data and analytics lifecycles (i.e., data ingestion, integration, provisioning). A data mesh strategy does not eliminate the need for enterprise governance wholesale. It does, however, call for rethinking some commonly held beliefs regarding who is accountable for what aspects of an organization’s data assets.

What Data Product Ownership Entails

A key shift in mindset and accountability is that business system owners are responsible for making the data created or captured by said systems widely available. Stated differently, adoption of a data mesh obliges all business domains to participate in the organization’s internal data economy. Data products should, in time, become a core part of every business domain’s product portfolio.

In this model, business domain owners—hereafter referred to as data producers—must:

  • Proactively seek out customers for their data wares
  • Conduct ongoing “market” surveillance to identify emerging needs
  • Take an outside-in approach (aka customer-centric) to data product design
  • Adapt data products to meet the needs of diverse customer segments
  • Collaborate with other data producers to address crossover and compound data needs
  • Actively manage a portfolio of data products

The obvious trick is for data product owners to decide what is credibly needed by whom. The less obvious but prevalent trap is for them to also try to control how their data products are used. With this in mind, adoption of the data mesh approach does not:

  • Require delivery of perfect data. It mandates that producers disclose how imperfect the data asset is, such that consumers can make appropriate decisions about its use. For raw or lightly refined datasets this may include metrics on data freshness, completeness, quality, source, and intended use. For data analytics products, this may also include the intended use, the circumstances/environment the model was explicitly designed against, sources of error, expected outcomes, and so on of the delivered model or insight.
  • Mandate the right mix of data products for any given data producer (aka business domain). The nature of the data created and the profile of consumers will influence the composition of each domain’s data product portfolio. All business domains will retain a spectrum of data and analytics skills. However, some may skew more toward data engineering and Ops, while others may be more weighted to analytic development.
  • Leave product design to the producer’s sole discretion. The data product mix is not based on the path of least resistance for each domain owner. An extreme position would be that data producers are responsible for exposing raw data assets only, thereby requiring each consumer to subsequently refine or analyze the data for themselves. For obvious reasons, this would be neither productive nor desirable. Instead, as with a traditional retailer or service provider, the onus in this model is on the data producer to meet the needs of multiple customer segments, each of whom may need to consume the data differently. Examples might include lightly refined datasets for use by data science teams, model outputs in the form of graphical visualizations, or fully encapsulated analytic models for incorporation into other products and services.
  • Allow any tool to be used for any purpose. A common fallacy is that a data mesh architecture leaves full control of the tools used in each domain. A data mesh—and the federated governance and management models central to this model—promotes flexibility and nimbleness. This is not the same as unfettered choice. The mish-mosh of technologies and skills resulting from an “anything goes as long as the job gets done” approach creates both organizational and technical inefficiencies. Choice may be provided, for example, between algorithmic modeling tools with similar capabilities or when selecting net new capabilities. However, in the interest of organizational learning and interoperability, promoting the use of common toolsets and infrastructure is still encouraged when possible.
  • Support a “my data, my rules” mentality. The inevitable slipups is for data producers to try to exercise control over the use of their data products once they are released into the world, or to want a say in any product or service that utilizes the provisioned data asset.

But, as noted above, the data mesh paradigm shifts accountability—and indeed creates a requirement—for comprehensive disclosure by data producers. There is a corresponding shift in accountability for using the data product appropriately to the consumer. This is a fundamental change in the de facto operating model for most data owners and organizations, one that sounds simple, but in practice, requires a sustained commitment to changing historical accountabilities and decision structures.

Data Buyer Beware

If data producers are not allowed to create moats around their data, who guards the gate? Where are the checks and balances? Who decides what is appropriate? Herein lies the critical role of governance and an equally important shift in thinking about the responsibilities of the data consumer.

For data product consumers, adoption of a data mesh does not:

  • Provide plausible deniability for negative consequences. As the saying goes: Enter here at ye own peril. Data consumers have a responsibility to ensure the products they are consuming are fit for (their) purpose—complete with requisite due diligence to ensure the product is as advertised. This is true whether a data product is poorly documented, doesn’t match the specification, or fails to meet established performance benchmarks. Far from alleviating consumers from responsibility, consumers must exercise care and common sense when deciding which data products to use and how. Long paragraph short: The onus is on the consumer to use acquired data products appropriately.
  • Prohibit refactoring or integration of provisioned data products. Data products may run the gamut from raw datasets to encapsulated, packaged analytics models. Consumers have the right to modify provided data inputs or integrate/embed the model into other products and services, as long as doing so doesn’t violate prevailing data usage policies. More importantly, this creates a dependency between the provisioning provider and the consumer. The ability to actively track and manage such dependencies is a key, albeit not new, challenge. Adapting product management practices, including prescribed SLA and providing clear expectations for asset maintenance, is part of the solution. Active metadata and other mechanisms to track interdependencies may also play a role.
  • Promote ignorance as an excuse for malfeasance. Or, stated nicely, being able to easily access or acquire a data product does not divorce the consumer of responsibility for how it is used. Rather, consumers of data products have a singular and heightened responsibility to be both aware and in compliance with internal data policies as well as external legal and regulatory requirements.

A mature, disciplined governance practice provides support by defining primary authorities and resources, defining guardrails, and providing standard mechanisms for validating compliance— including policies to mediate conflicts between producers and consumers.

However, the lack of such mechanisms does not excuse the consumer from potential liabilities. Governance can help ensure data products include/account for features required for compliance (e.g., seatbelts in cars). But should a consumer choose to take a data product offroad—either because seatbelts were not included or they simply chose not to use them—responsibility for negative consequences, intended or otherwise, lies with them.

Page 1 of 2 next >>


Newsletters

Subscribe to Big Data Quarterly E-Edition