Master Data Management’s Unsung Heroes in the Quest for the Golden Record


When it comes to integrating Master Data Management (MDM) into an IT infrastructure, it is important for IT professionals to “open the hood” to make sure the system is capable of handling today's complex information environment. To realize real long-term value, information teams should move beyond the MDM marketing hype and instead focus on some key and often-overlooked technical functionality - matching, classifying and coding.

In many MDM systems, the setup of matching algorithms is embedded within the data quality functionality.  If done correctly, different match algorithms may co-exist for separate records, e.g., one for matching at the individual person level and one for matching members of a household, so that the various individuals can be related by reference.  In this example, householding is typically set up to evaluate the relationship between the individual member records of a party entity.  Golden records, on the other hand, are the result of combining attribute values from linked member records for a party entity.

The ability of an MDM system to manage multiple match algorithms can allow for a given party to be a part of multiple logical entities.  This is essential for companies who want to resolve person records across sources into a Person entity, but also want to know which persons are part of the same household.  In this scenario, IT professionals would have separate Person and Household matching algorithms. 
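As a minimal sketch of how separate Person and Household match definitions can place the same record in two logical entities, the Python below groups a few made-up records in two ways.  It rests on assumptions: the record fields, the `normalize` helper and the exact-key comparison are all hypothetical, and the exact keys stand in for the scored matching a real MDM system would use.

```python
from collections import defaultdict

# Hypothetical source records; the field names are illustrative only.
RECORDS = [
    {"id": "crm-1", "first": "Ann", "last": "Lee", "born": "1980-02-01", "address": "12 Oak St"},
    {"id": "web-7", "first": "ann", "last": "LEE", "born": "1980-02-01", "address": "12 Oak Street"},
    {"id": "crm-2", "first": "Bob", "last": "Lee", "born": "1978-11-30", "address": "12 Oak St."},
]


def normalize(text):
    """Lowercase, drop periods, and expand one common street abbreviation."""
    words = text.lower().replace(".", "").split()
    return " ".join("street" if word == "st" else word for word in words)


def person_key(rec):
    """Assumed Person definition: same name and same date of birth."""
    return (normalize(rec["first"]), normalize(rec["last"]), rec["born"])


def household_key(rec):
    """Assumed Household definition: same last name at the same address."""
    return (normalize(rec["last"]), normalize(rec["address"]))


def resolve(records, key_fn):
    """Group record ids into entities using the given match definition."""
    groups = defaultdict(list)
    for rec in records:
        groups[key_fn(rec)].append(rec["id"])
    return sorted(groups.values())


persons = resolve(RECORDS, person_key)
households = resolve(RECORDS, household_key)
print(persons)
print(households)
```

Under these assumptions, the two Ann Lee records resolve to one Person entity while all three records fall into one Household entity, so each record belongs to two entities at once.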

Having multiple match algorithms is also important in meeting the needs of various divisions or functions within an organization.  These groups will often have differing definitions or tolerances for how persons should come together into entities.  For example, a critical health system would want a very high threshold for matching with little tolerance for false positives, and support for multiple algorithms makes that possible.  Marketing, on the other hand, may not have the same need, as it tends to have a higher tolerance for false positives.  After all, it’s not life-threatening if marketing sends two emails to the same person.

The Correlation Between Automated Match Actions and the Golden Record

Unlike earlier versions, today's MDM solutions make it possible to define what constitutes an entity and to have that definition applied automatically when a match is found within an acceptable tolerance.  Entity management supports two levels of tolerance: auto linkage and clerical review.  The end result of matched and linked member records is a Golden Record, a single trusted 360-degree view of the customer.

With the auto-link threshold, matches scoring above this level lead to the record being linked into an entity that contains all Person records that are a match.  From that entity, a Golden Record can be either persisted or dynamically created in real time from the attribute values of the linked member records, according to preconfigured attribute value survivorship rules.  With clerical review, matches scoring above the clerical review threshold but below the auto-link threshold can be directed to a specified clerical review or data stewardship workflow, which allows a user to manually decide whether the two records match.

Having both automatic and human-based data steward capabilities enables the handling of completely different types of matching scenarios including:

  • High-Volume Automatic Matching: Uses an auto-link threshold and a calculated match score to reduce the need for manual intervention when the match score is above a certain confidence level.  This approach naturally requires careful tuning of the matching algorithm unless quality is not critical.
  • Clerical Review: Here, all matches above a specified threshold (i.e. a Clerical Review threshold) but below the auto-link threshold are automatically sent through clerical review.  This is useful when it is essential to not have false positive matches.
  • Miscellaneous: Many other threshold combinations can reach the desired balance of clerical review work needed versus the risk of false positive/false negative outcomes.
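The two-threshold routing described above can be sketched in a few lines of Python.  This is an illustration under stated assumptions rather than any vendor's implementation: the `Thresholds` class, the `route_match` function, the 0-to-1 score scale and the sample threshold values are all hypothetical, and scores exactly equal to a threshold are treated as meeting it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical threshold pair on an assumed 0.0 to 1.0 match score scale."""
    clerical_review: float
    auto_link: float

    def __post_init__(self):
        if self.clerical_review > self.auto_link:
            raise ValueError("clerical_review must not exceed auto_link")


def route_match(score, thresholds):
    """Decide what happens to a candidate match with the given score."""
    if score >= thresholds.auto_link:
        return "auto_link"          # link the record into the entity automatically
    if score >= thresholds.clerical_review:
        return "clerical_review"    # send to a data steward for a manual decision
    return "no_match"               # leave the records unlinked


# Assumed settings: a clinical team tolerates few false positives,
# a marketing team tolerates more.
CLINICAL = Thresholds(clerical_review=0.80, auto_link=0.98)
MARKETING = Thresholds(clerical_review=0.70, auto_link=0.85)

for name, thresholds in (("clinical", CLINICAL), ("marketing", MARKETING)):
    print(name, route_match(0.90, thresholds))
```

With these sample values, the same candidate match scoring 0.90 goes to clerical review under the clinical settings but is linked automatically under the marketing settings, which is the kind of trade-off between review workload and false-positive risk that threshold tuning controls.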

Another key consideration when defining the Golden Record is to establish the survivorship rules that determine which source data (name, attribute, reference) should survive.  Two common principles are Most Recent, which takes the value that was most recently updated, and Trusted Source, which takes the value from the source deemed most trustworthy.  Both are defined at the attribute level.  Survivorship rules can be specified for each individual attribute or reference type, or a specific rule can be established as the default in the absence of other contributing attribute values.  In another scenario, certain attributes might be excluded from a particular Golden Record view for security reasons.
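To make the attribute-level survivorship idea concrete, here is a small Python sketch.  Everything in it is assumed for illustration: the member record layout, the `SOURCE_TRUST` ranking, the per-attribute `RULES` table, and the choice to apply `DEFAULT_RULE` to any attribute without its own rule.

```python
from datetime import date

# Hypothetical member records already linked into one entity.
MEMBERS = [
    {"source": "crm", "updated": date(2023, 1, 10),
     "values": {"email": "ann@old.example", "phone": "555-0100"}},
    {"source": "web", "updated": date(2024, 3, 5),
     "values": {"email": "ann@new.example", "phone": None}},
]

SOURCE_TRUST = {"crm": 2, "web": 1}  # assumed ranking, higher is more trusted


def most_recent(candidates):
    """Most Recent: pick the member record that was updated last."""
    return max(candidates, key=lambda m: m["updated"])


def trusted_source(candidates):
    """Trusted Source: pick the member record from the most trusted source."""
    return max(candidates, key=lambda m: SOURCE_TRUST.get(m["source"], 0))


RULES = {"email": most_recent, "phone": trusted_source}  # per-attribute rules
DEFAULT_RULE = most_recent  # assumed fallback for attributes without a rule


def golden_record(members, excluded=()):
    """Build a golden record view, skipping attributes listed in excluded."""
    attributes = {name for m in members for name in m["values"]}
    golden = {}
    for name in sorted(attributes - set(excluded)):
        # Only members that actually carry a value can contribute.
        candidates = [m for m in members if m["values"].get(name) is not None]
        if not candidates:
            continue
        rule = RULES.get(name, DEFAULT_RULE)
        golden[name] = rule(candidates)["values"][name]
    return golden


print(golden_record(MEMBERS))
print(golden_record(MEMBERS, excluded=("phone",)))
```

The second call shows a restricted view in which the phone attribute is left out, in the spirit of excluding attributes from a particular Golden Record view for security reasons.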

Exploding Channel Options Highlight the Importance of Auto-Classification

The ability to automatically classify data types and assets is also critical to MDM success.  As any IT professional knows, it can be extremely cumbersome and time-consuming to manually classify products into multiple classifications that represent different websites, brands, channels or standard hierarchies such as eCl@ss, ETIM, UNSPSC or GPC.  Leading MDM solutions should come equipped with auto-classification capabilities, which provide the ability to specify rules for automatically classifying products into multiple classification hierarchies.

This is important for both IT and the business users in the following scenarios:

  • Multi-Brand Retail Websites: Retailers sometimes have multiple websites under different brands that are supplied from the same product set.  Using auto-classification, retailers can automatically assign new products to these websites and significantly reduce the time it takes to onboard new products.
  • Electronic Catalogs with Standard Hierarchies: Manufacturers and distributors often need to produce electronic catalogs where products are categorized into many different standard hierarchies like UNSPSC, GPC, eCl@ss and ETIM.  Furthermore, each of the standard hierarchies may exist in multiple versions that vary slightly.  Using auto-classification, companies can automatically maintain links into the various hierarchies.

Frequently, auto-classification is driven by separately governed rule-set objects, which allow a number of different rule sets to be defined and edited.  Rule sets can contain a large number of rules that belong to different categories.  The most notable are Allow Rules, which specify which products to consider by limiting the scope to specific product brands or specific types of products, and Link Rules, which specify where to link products by naming a number of target classification nodes and the conditions under which products should be linked into each node.

In each case, the rule is specified in terms of conditions on object type, attribute values or the hierarchy position of the product.  Auto-classification rules may also specify exceptional conditions.  
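A rule set of this kind might be sketched in Python as follows.  The product fields, the classification node names and the `classify` function are hypothetical, and the sketch assumes that a product must satisfy every Allow Rule before any Link Rule is evaluated.

```python
# Hypothetical products; the attribute names are illustrative only.
PRODUCTS = [
    {"id": "p1", "type": "Drill", "brand": "Acme", "voltage": 18},
    {"id": "p2", "type": "Drill", "brand": "Other", "voltage": 12},
    {"id": "p3", "type": "Saw", "brand": "Acme", "voltage": 18},
]

# Allow Rules limit which products the rule set considers at all.
ALLOW_RULES = [
    lambda p: p["brand"] == "Acme",
]

# Link Rules pair a target classification node with a condition.
LINK_RULES = [
    ("acme-site/power-tools/drills", lambda p: p["type"] == "Drill"),
    ("acme-site/cordless", lambda p: p["voltage"] >= 18),
]


def classify(product, allow_rules, link_rules):
    """Return the classification nodes this product should be linked into."""
    if not all(rule(product) for rule in allow_rules):
        return []  # out of scope for this rule set
    return [node for node, condition in link_rules if condition(product)]


links = {p["id"]: classify(p, ALLOW_RULES, LINK_RULES) for p in PRODUCTS}
for product_id, nodes in links.items():
    print(product_id, nodes)
```

Here the Acme drill lands in two nodes, the Acme saw in one, and the product from the other brand is skipped because it fails the Allow Rule.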

Emerging Capabilities for Automatically Applying Business Rule Actions

Another advancement in today's MDM architecture is the ability to automatically apply business rule actions at different points in time including:

  • At the time of import, in order to quickly place a new product in the right classification.  A possible next step is to have someone approve the assignment.
  • In a workflow transition when the product has reached a status where enough information is available to properly evaluate rules.
  • At the time of approval to ensure that products are always classified when messages are sent out about a new product.
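One way to picture these trigger points is a small registry that runs business rule actions when an event fires.  The Python below is only a sketch: the trigger names, the `on` and `fire` helpers, the product fields and the classification paths are all invented for illustration and do not describe any specific MDM product.

```python
from collections import defaultdict

ACTIONS = defaultdict(list)  # trigger name -> registered business rule actions


def on(trigger):
    """Decorator that registers an action for a hypothetical trigger point."""
    def register(func):
        ACTIONS[trigger].append(func)
        return func
    return register


def fire(trigger, product):
    """Run every action registered for the trigger against the product."""
    for action in ACTIONS[trigger]:
        action(product)
    return product


@on("import")
def provisional_classification(product):
    # Place the new product somewhere right away and flag it for approval.
    product.setdefault("classification", "unclassified/new")
    product["needs_approval"] = True


@on("workflow_transition")
def classify_when_ready(product):
    # Re-evaluate once enough information (here, a type) is available.
    if product.get("type"):
        product["classification"] = "catalog/" + product["type"].lower()


@on("approval")
def require_classification(product):
    # Guarantee that approved products are always classified.
    if not product.get("classification"):
        raise ValueError("product must be classified before approval")
    product["needs_approval"] = False


item = fire("import", {"id": "p1", "type": "Drill"})
fire("workflow_transition", item)
fire("approval", item)
print(item)
```

Keeping the actions separate from the triggers means the same rule could be attached to more than one point in the product lifecycle without being rewritten.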

It is also possible to invoke a bulk update to automatically classify a selected group of products.  So before standardizing on a specific MDM solution, take the time to look under the hood to make sure the MDM platform is capable of keeping up with your business.

After all, by understanding the advantages of today's technology, you may find that you are in fact the unsung hero.


About the Author

Bjarne Hald is Chief Technology Officer at Stibo Systems, provider of a leading and award-winning multidomain Master Data Management (MDM) solution.  For more information, contact him at or visit