Data Wants to Be Free—Why Good Governance Is Critical

Operational systems are where data is born. These systems either require people to enter their details or acquire those details from trusted sources. Names, addresses, merchandise selections, and credit card numbers are consumed. The operational solutions interact with users and companion applications to produce their raison d’être, be it purchase orders, payroll checks, or any of the thousands of other documents and transactions.

Operational systems are the data factories helping our organizations execute business. In serving those functions, clean database structures that reflect the semantics inherent in the contained data are usually best formulated as normalized designs. By expressing these business-focused semantics, the database structures naturally support the logical consequences of those very relationships. When normalized data and process are in harmony, delivery speeds up as systems are built and expanded. The databases behind these applications are the primordial seas churning with the proto-information that breeds, evolves, and eventually grows legs to crawl out from those birthing seas and into the downstream operational data stores, data warehouses, data marts, and data lakes.

Data wants to be free, and if an organization is not careful, data finds its way out of the source systems and ends up all over the place. If the business has not found a way to properly support even simple standardized reporting, desperate managers will leverage personal relationships to get what they need, somewhere and somehow. As time goes on, more people find more ways. Even when simple reporting is supported, data consumers with more complex needs will travel down the same “by hook or by crook” path. Eventually, data is replicated everywhere. As the sources change over time, the copies drift apart and the answers drawn from them vary, so trying to clean things up and get everyone on the “same page” becomes very difficult. Good data governance can help control and manage these downstream data uses, but often data governance as a practice isn’t even considered until long after the data horse has left the proverbial barn. Therefore, data governance’s first job is usually a game of catch-up in tracking down users, data stores, and reports.

Data warehousing, business intelligence, and data mining are all areas downstream from running the operational side of the organization. Given that distance, data designs are much easier, or much harder, depending on one’s perspective. Where operational solutions should follow a semantic path in their structuring, downstream data must be structured to support whatever use is made of it. This usage-based modeling perspective opens the doors to all sorts of different ideas. Data users are a varied group. Managers reviewing progress on assigned corporate goals have very different needs from data scientists trying to create new statistical models that might provide fresh insights to the organization. There is no such thing as a one-size-fits-all data solution to serve everyone’s needs and skills.

For analysis of a specific business event or set of metrics surrounding some event, a multidimensional approach is most useful. Dimensional schemas allow users to play with their numbers, moving levels of granularity up or down by each of the business groupings defined as dimensions. Data lakes, holding copies of all kinds of incoming data, support the many data scientists who want data as raw as possible because they wish to mash it up in fashions of their own choice and caprice. Data vault, anchor modeling, or similar structures can support building a data warehouse that absorbs change easily and can aid in building new data marts quickly. The choices that work best for an organization depend on the complexity of the sources, the users, and the data needs. Organizations that believe a simple one-stop data solution works for all these data needs are thinking very small. Data functions best when it is free and available for expression in many forms, so that each user can view answers in a way that suits the task at hand.
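
As a rough illustration of the dimensional idea above, moving granularity up or down by business groupings, here is a minimal Python sketch. The sales rows, the dimension names (region, store, month), and the summarize helper are all hypothetical, invented for this example; a real dimensional schema would live in a database with separate fact and dimension tables.

```python
from collections import defaultdict

# Hypothetical fact rows: each sale carries dimension values and one measure.
SALES = [
    {"region": "East", "store": "E1", "month": "2024-01", "amount": 100},
    {"region": "East", "store": "E1", "month": "2024-02", "amount": 150},
    {"region": "East", "store": "E2", "month": "2024-01", "amount": 200},
    {"region": "West", "store": "W1", "month": "2024-01", "amount": 300},
]


def summarize(facts, dimensions, measure="amount"):
    """Sum the measure at the grain defined by the chosen dimensions."""
    totals = defaultdict(int)
    for row in facts:
        # The grouping key is the tuple of values for the chosen dimensions.
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# Rolled up: one total per region (coarse grain).
by_region = summarize(SALES, ["region"])

# Drilled down: one total per region and store (finer grain).
by_store = summarize(SALES, ["region", "store"])
```

The same facts answer both questions; only the list of dimensions passed in changes. That is the sense in which users can "play with their numbers" without a new data structure for each question.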