The Importance of Metadata

Today, nobody argues about the importance of data. It is a given that data must be collected, managed, and analyzed to conduct business successfully in the modern era. But all too often the meaning of the data is not being preserved. This can result in large stores of data that are difficult to interpret and use in any meaningful manner.

Situations that create unusable data can be minimized with appropriate metadata management. But this requires effort and time, and therefore metadata is frequently ignored (much like documentation).

But let’s step back a moment and define what I’m talking about. What is metadata? Metadata describes and defines data. It is used to provide documentation such that data can be understood and more readily consumed by your organization. Metadata answers the who, what, when, where, why, and how questions for users of the data.

Think about it: users of data must be able to put their data in context before the data becomes useful as information. Metadata makes data useful by embellishing it with details such as data type, length, a textual description, and other characteristics of the data. So, for example, metadata allows the user to know that the customer number is a six-digit numeric field, whereas the data itself might be 123024. That number, without context, doesn’t really mean much at all.
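To make the customer number example concrete, here is a minimal sketch of how that kind of metadata might be recorded next to a value. The FieldMetadata class, its attribute names, and the description text are illustrative assumptions rather than any particular product's catalog format, and the length of 6 is chosen simply to match the sample value 123024.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMetadata:
    """Hypothetical metadata record for a single data element."""
    name: str          # business name of the element
    data_type: str     # e.g. "numeric" or "text"
    length: int        # number of digits or characters
    description: str   # textual description for people using the data


def describe(value, meta):
    """Combine a raw value with its metadata into a readable statement."""
    return (
        f"{value} is a {meta.name}: a {meta.length}-digit "
        f"{meta.data_type} field ({meta.description})"
    )


# Assumed definition for illustration only.
customer_number = FieldMetadata(
    name="customer number",
    data_type="numeric",
    length=6,
    description="unique identifier assigned to each customer",
)

print(describe(123024, customer_number))
```

On its own, 123024 could be a quantity, a date, or a postal code; paired with the definition, it reads as a customer number.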

I like to use examples to clarify concepts, so let’s clarify metadata using an example that most people will recognize. Have you ever watched the Antiques Roadshow program on television? In this show people bring items to professional antique dealers to have them examined and evaluated. The participants hope to learn that their items are treasures of immense value. The antique dealers always spend a lot of time talking to the owners about their items. They ask questions such as “Where did you get this item?” and “What can you tell me about its history?” That is interesting, because the item is sitting right there in front of them!

Why do they need to ask these questions? The answer is metadata! The answers to the questions provide details about the authenticity and nature of the item. The dealer also carefully examines the item looking for markings and dates that provide clues to the item’s origin. So, the actual item being evaluated by the experts is the “data.” The answers to the antique dealer’s questions and the markings on the item are the “metadata.” Value is assigned to an item only after the metadata about that item is discovered and evaluated.

Several trends are driving up the need to capture and manage metadata. The first thing that metadata helps with is simply understanding the data that is being used by your organization. For the most part, data that is used by application programs must have at least some basic metadata in order for it to be processed. What is the name used to access and process the data? Is the data textual or numeric? How big can it be?

But not all data is used by application programs to drive transactional workloads. Organizations are generating and capturing more types of data in data lakes for eventual analysis. Simply plopping a bunch of data without context into a data lake is a nice way to waste storage and create useless data. Metadata is required if you ever want to make sense of that data.

Understanding and describing data is a big part of regulatory compliance. As such, metadata’s importance cannot be overstated. Regulations dictate what data must be protected and managed, along with requirements for how that is accomplished.

If you do not know what data you have because you have not defined the metadata, then complying with regulations becomes impossible.

Additionally, improving data quality requires accurate and up-to-date metadata. Without accurate data definitions, it is not possible to implement the controls to assure that the data is correct.
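As a rough illustration of how a data definition enables a quality control, the sketch below checks raw values against a declared type and length. The check_value function, its parameters, and the sample values are hypothetical; real data quality tooling would cover many more rules.

```python
def check_value(value, data_type, length):
    """Return a list of problems found when comparing a raw string
    value against its declared data type and length."""
    problems = []
    # A "numeric" field is assumed here to mean ASCII digits only.
    if data_type == "numeric" and not (value.isascii() and value.isdigit()):
        problems.append("not numeric")
    if len(value) != length:
        problems.append(f"expected length {length}, got {len(value)}")
    return problems


# Assumed definition: a six-digit numeric field.
for raw in ["123024", "12302", "12A024"]:
    issues = check_value(raw, "numeric", 6)
    print(raw, issues if issues else "ok")
```

Without the declared type and length, there would be nothing to check the values against, which is the point: the control can only be as good as the definition behind it.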

Poor data quality is pervasive and quite costly. According to software marketing and technology expert Hollis Tibbetts, “Incorrect, inconsistent, fraudulent, and redundant data cost the U.S. economy over $3 trillion a year.”

And these trends are not independent; they interact with and impact one another. Think about it: if the data is not accurate, how can you be sure that it is being treated properly to comply with the appropriate regulations?

So, we must be as proactive in identifying and managing our metadata as we are with our data. For data to be useful, metadata is required. Without metadata, data has no identifiable meaning—it is merely a collection of digits, characters, or bits. Metadata gives data its form and makes it usable by information professionals. How does your organization manage metadata?