A Perfect Match

Integrating data from multiple sources poses constant challenges, and the greatest of these is the need to cross-map values from completely different identifier sets that are meant to represent the same population. For example, consider the set of whole numbers. English speakers identify the members of this set as “one, two, three…,” while Spanish speakers use “uno, dos, tres…” English and Spanish speakers thus have different identifiers for the same population of “things.” In building systems that integrate data, these object identifiers may represent customers, products, locations, or almost anything under the sun. While “a rose by any other name would smell as sweet,” hopefully a customer does not get identified by smell at all. And if different systems use completely different identifiers, then the process of unambiguously matching two or more identifiers as representing exactly the same customer (or location, or product, or whatever) may blossom into a career path all by itself. Unlike in the whole-number example, objects will rarely map one-for-one across systems. One system may define locations at the floor level, another may go down to a suite or room number, another may not look below the building level, and yet another may use a developer’s lot numbers instead of street addresses. One-to-many, and even many-to-many, relationships are often the working reality when cross-referencing identifiers for “common” things throughout an enterprise.

Common relational patterns are often used to handle such cross-mapping. A “direct method” defines a table containing the key combinations from each source; every row of this direct association table constitutes a mapping between source values. Each source’s keys may also have a separate table holding the information that source supplies. An alternative “indirect method” generates “yet another” new key inside the integration system. New key values must follow agreed-upon business rules: an item the rules judge to be a new object receives a new key, while items the rules match to an existing object share that object’s key. Each source system key must then be mapped to the integration system’s key. This indirect approach requires care, because the business rules may be hard to work out or may take several proof-of-concept iterations before they become clear.

And let’s not forget about time. Any or all of the components in these patterns may incorporate start and stop dates, so that associations can be tracked over time as necessary. Keep in mind, too, that two classes of change may arise: “real” change, as when two companies merge into one, and corrections of previously mismarked or otherwise wrongly associated data. Variations on these central themes abound. For any given situation, the nature of the source identifiers, the intended use of the integrated solution, and the amount of manual and automated effort invested in maintaining the integration will drive the details and variants under consideration.
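To make the direct and indirect patterns concrete, here is a minimal sketch in Python, using in-memory structures as stand-ins for relational tables. Everything named in it is a hypothetical illustration rather than part of any particular product: the source systems (“crm” and “billing”), the `INT-` key format, the far-future `OPEN_END` stop date, and the tax-ID matching rule. `DirectLink` rows play the role of the direct association table, while `IntegrationKeyMap` generates integration keys through a pluggable match rule and keeps a dated crosswalk from each source key to its integration key.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional

# Sentinel stop date for associations still in effect (an assumption, not a standard).
OPEN_END = date(9999, 12, 31)


# --- Direct method: one association table holding key pairs from each source ---
@dataclass(frozen=True)
class DirectLink:
    crm_key: str          # hypothetical source A ("crm") key
    billing_key: str      # hypothetical source B ("billing") key
    start: date
    stop: date = OPEN_END


def billing_keys_for(links: list[DirectLink], crm_key: str, as_of: date) -> set[str]:
    """All billing keys linked to a CRM key on a given date.

    Several rows may share either key, so one-to-many and
    many-to-many mappings need no special handling.
    """
    return {link.billing_key for link in links
            if link.crm_key == crm_key and link.start <= as_of < link.stop}


# --- Indirect method: each source key maps to a generated integration key ---
MatchRule = Callable[[dict, dict[str, dict]], Optional[str]]


class IntegrationKeyMap:
    def __init__(self, match_rule: MatchRule):
        # match_rule stands in for the agreed business rules: given a record's
        # attributes and the known objects, it returns an existing integration
        # key, or None if the record represents a new object.
        self.match_rule = match_rule
        self.objects: dict[str, dict] = {}   # integration key -> attributes
        # (source system, source key) -> list of [integration key, start, stop]
        self.crosswalk: dict[tuple[str, str], list[list]] = {}
        self._next_id = 1

    def register(self, system: str, source_key: str, attrs: dict, start: date) -> str:
        key = self.match_rule(attrs, self.objects)
        if key is None:                       # no rule-based match: new object
            key = f"INT-{self._next_id:05d}"
            self._next_id += 1
            self.objects[key] = attrs
        self.crosswalk.setdefault((system, source_key), []).append([key, start, OPEN_END])
        return key

    def remap(self, system: str, source_key: str, new_key: str, on: date) -> None:
        # A "real" change (e.g. a merger): close the open row, keep the history.
        history = self.crosswalk[(system, source_key)]
        history[-1][2] = on
        history.append([new_key, on, OPEN_END])

    def lookup(self, system: str, source_key: str, as_of: date) -> Optional[str]:
        for key, start, stop in self.crosswalk.get((system, source_key), []):
            if start <= as_of < stop:
                return key
        return None


def same_tax_id(attrs: dict, objects: dict[str, dict]) -> Optional[str]:
    # Toy rule (hypothetical): records with the same tax ID are the same customer.
    return next((k for k, a in objects.items() if a["tax_id"] == attrs["tax_id"]), None)


if __name__ == "__main__":
    links = [DirectLink("C-17", "B-9", date(2020, 1, 1)),
             DirectLink("C-17", "B-10", date(2021, 6, 1))]
    print(sorted(billing_keys_for(links, "C-17", date(2022, 1, 1))))  # ['B-10', 'B-9']

    keys = IntegrationKeyMap(same_tax_id)
    a = keys.register("crm", "C-17", {"tax_id": "12-345"}, date(2020, 1, 1))
    b = keys.register("billing", "B-9", {"tax_id": "12-345"}, date(2020, 1, 1))
    c = keys.register("crm", "C-42", {"tax_id": "99-000"}, date(2020, 1, 1))
    print(a, b, c)  # INT-00001 INT-00001 INT-00002
    # Real change: C-42's company merges into customer INT-00001.
    keys.remap("crm", "C-42", a, date(2023, 3, 1))
    print(keys.lookup("crm", "C-42", date(2022, 1, 1)),
          keys.lookup("crm", "C-42", date(2024, 1, 1)))  # INT-00002 INT-00001
```

Here `remap` treats a merger as a “real” change by closing the old association and opening a new one, so history is preserved. A correction of a mismarked association might instead overwrite or delete the wrong row; which treatment fits depends on whether the organization needs to report on what it believed at the time.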

Finding the perfect match takes more than filling out a list of interests on a dating website. In fact, a match of object identifiers may be imperfect yet still necessary and workable. A data modeler may choose from several implementation options for combining multiple keys sourced by multiple applications. No single choice is right under all circumstances, and more than one alternative may suit a given situation. The option selected determines the resulting data structures, and it should also support the process flow that maintains the linkage between values. The data architect works to balance source data content, processing limitations, and business usage requirements. Striking that balance, so that the solution delivers significant value to the organization, is the “perfect match” that counts.