An Architecture or ‘Anarchytecture’ for Data?

What exactly is a data architecture? As the Zachman Framework made clear long ago, different people look for different kinds of details and documentation to answer fundamental questions about an enterprise’s architecture.

Someone involved with infrastructure needs to understand which tools are used, how data is moved, and how security will be enforced. But these aspects are only part of the overall architecture. A simple diagram of the tools in use is therefore incomplete and insufficient as a comprehensive view of data architecture.

Others in the organization need to understand what kinds of data are being retained and where. How are the data structures composed? What is the organizational purpose of each stopping point where data rests? Is the persisted data a landing area, intended to keep things as close to the source as possible? Or is the database a place where the sources have been integrated and normalized? Or has the data content been twisted into star schemas? Or is something else, such as a data vault, being used? How long should the data be kept? Who are the expected user groups of the data? What questions will these user groups ask of the data, and what functions will they perform against it? What naming standards are applied, and where do they appear?
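One way to keep the answers to these questions from living only in people's heads is to record them in a machine-readable catalog of resting points. The sketch below is a hypothetical illustration, not a method prescribed by this article: the zone names (`landing`, `integrated`, `presentation`), the `Zone` fields, the retention values, and the prefix-based naming rule are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Modeling(Enum):
    """Hypothetical modeling styles a resting point might use."""
    SOURCE_ALIGNED = "kept as close to the source as possible"
    NORMALIZED = "sources integrated and normalized"
    STAR_SCHEMA = "dimensional star schemas"
    DATA_VAULT = "data vault hubs, links, and satellites"


@dataclass(frozen=True)
class Zone:
    """One documented resting point for data (all fields are illustrative)."""
    name: str
    purpose: str
    modeling: Modeling
    retention_days: int            # how long data is kept
    user_groups: tuple[str, ...]   # who is expected to use the data
    table_prefix: str              # naming standard applied within the zone


# A hypothetical catalog; a real one would reflect an organization's own decisions.
CATALOG = {
    z.name: z
    for z in (
        Zone("landing", "raw copies of source extracts", Modeling.SOURCE_ALIGNED,
             30, ("data engineering",), "lnd_"),
        Zone("integrated", "cross-source integration", Modeling.NORMALIZED,
             7 * 365, ("data engineering", "analysts"), "int_"),
        Zone("presentation", "reporting and analytics", Modeling.STAR_SCHEMA,
             7 * 365, ("analysts", "business users"), "rpt_"),
    )
}


def follows_naming_standard(zone_name: str, table_name: str) -> bool:
    """Check a table name against the prefix documented for its zone."""
    zone = CATALOG.get(zone_name)
    if zone is None:
        # An undocumented zone has no agreed purpose, retention, or naming rule.
        raise KeyError(f"zone {zone_name!r} is not part of the architecture")
    return table_name.startswith(zone.table_prefix)
```

With such a catalog, `follows_naming_standard("landing", "lnd_orders")` returns `True`, while asking about an unlisted zone raises an error instead of silently accepting data in a place nobody has defined.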

Many organizations seem to focus wholly on the tools and then simply let things evolve: which data structures are created where, and what rules they follow, is left to chance.

Under such a limited perspective, no real architecture exists. At best, the result might be called an “anarchytecture,” a name for the chaos filling in the gaps. Once the tools are chosen and connected, survival of the fittest rules. Anything can happen, as data ends up wherever an individual developer feels moved to place it.

However, that approach is problematic. Architecture means that data-for-a-purpose has a place. It does not mean that data put anywhere is acceptable. Without a vision that gives data-for-a-purpose a known and understood place, the organization does not know how its solutions should be built. Under these undefined practices, finding a place for new data becomes a random function, a roll of the dice or a spin of a roulette wheel.

And if new data can go anywhere, then when someone later needs to find that data, they had better know the name of the right person to ask. With a proper architecture in place, everyone understands where data must be placed, and why. It should also be understood, over time, why and when new places for data must be created. The fuzzier the answers to these questions about data location, the less clear your organization’s data architecture is.

Beyond clarity about where data should be placed, the overall understanding must also cover how data moves. Ideally, there is a scheme, a pattern, to the flow of data that makes sense and seems rational to the organization. Any data going anywhere, or everywhere, leaves the impression that all is chaos, and we are back to having an anarchytecture. Where possible, the architecture should call out which tools are allowed, which practices are used between resting points, and where change transactions rather than complete refreshes flow.
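Flow rules like these can be written down as declarations and checked, rather than rediscovered for every solution. The following is a minimal, hypothetical sketch: the zone names (`landing`, `integrated`, `presentation`), the placeholder tool names `cdc_tool` and `etl_tool`, and the `LoadType` values are assumptions made for illustration, not real products or a prescribed design. Each permitted movement between resting points records the approved tool and whether it carries change transactions or complete refreshes; any undeclared movement is rejected.

```python
from dataclasses import dataclass
from enum import Enum


class LoadType(Enum):
    """How data moves along a flow (illustrative categories)."""
    CHANGE_TRANSACTIONS = "incremental changes only"
    COMPLETE_REFRESH = "full replacement of the target"


@dataclass(frozen=True)
class Flow:
    """One declared movement of data between two resting points."""
    source: str
    target: str
    tool: str          # hypothetical placeholder for an approved tool
    load_type: LoadType


# Hypothetical declared flows; anything not listed here is not part of the architecture.
ALLOWED_FLOWS = {
    (f.source, f.target): f
    for f in (
        Flow("landing", "integrated", "cdc_tool", LoadType.CHANGE_TRANSACTIONS),
        Flow("integrated", "presentation", "etl_tool", LoadType.COMPLETE_REFRESH),
    )
}


def validate_flow(source: str, target: str, tool: str) -> Flow:
    """Return the declared flow, or raise if the movement or tool is not approved."""
    flow = ALLOWED_FLOWS.get((source, target))
    if flow is None:
        raise ValueError(f"no declared flow from {source!r} to {target!r}")
    if tool != flow.tool:
        raise ValueError(f"{tool!r} is not the approved tool for {source} -> {target}")
    return flow
```

A proposed pipeline that moves data straight from `landing` to `presentation` fails this check. That forces a deliberate decision about whether the architecture itself should change, rather than letting a new path appear silently.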

These architectural answers provide the lines within which each solution should color. Rules can grow and change over time, and they will, as that is a fact of life. But changes should be made in a controlled fashion. Developers and engineers should not be left to fend for themselves, creating new and unique rules for every solution. The Old West was a colorful place with many unique and captivating characters, but, architecturally, we do not wish to revisit the wild frontier.