For data architects, it is not unusual to use a data modeling tool to reverse-engineer existing solutions’ databases. The reverse-engineering could occur for a functional reason, such as gathering information to evaluate a replacement option, or to comprehend a solution, seeking to work out what data should be extracted for a downstream business intelligence need.
When the solution being reverse-engineered is a third-party vendor-packaged solution, then the data architect evaluating the raw details from the reverse-engineering tool needs to tread cautiously through the evidence. Any automatic assumptions based on the structure alone are not sufficient. The data content needs to be queried too. Every primary key should be evaluated, especially composite keys, and every likely foreign key should be explored as well.
The evaluator may discover that things are not as they seem. Regardless, problems may result by assuming that the vendor’s data model represents the purchasing organization’s data model. The vendor’s model expresses how that vendor expected their product to be used. The organizations utilizing that package very well may have set up their organizational usage so that the original vendor’s intentions have been circumvented.
As an example, many packaged software solutions assume a hierarchy of data ownership within their content. This owner object may be called a Company Code, or Org ID, or Entity ID, or anything else that conveys a similar idea. The software expectation is that the product customer will divide up their customer data across business units, divisions, sub-organizations, or whatever.
As these solutions are put in place, some institutions will use the Company Code, others may not. Those not using the data item may populate it with a single default value. Or in a subtler alteration, many objects are defined with the Company Code and then an Object ID used in combination to define a composite Primary Key.
Object IDs were expected to be unique only within Company Code values; but if the package user establishes the Object ID value as a unique identifier across the enterprise, then the Company Code is no longer functioning as part of any primary key. For organizations that do not actually use these items in a meaningful way, the data model for their organization now differs from the data model originating from the vendor. There are many variations on this kind of circumstance that cause changes to the identities and interrelationships within the initial vendor-defined database structures.
Changes, as defined above, are both real and substantial. Presumably, previous implementers were doing the best they could. Maybe long-term needs weren’t completely understood. Perhaps the reasons the package was designed as it was went unappreciated, or for the implementor those reasons truly were not applicable. Regardless, changes happened. Far too often, organizations do not document, publish, and distribute such detailed information. Downstream data users simply hunt and peck to find what they need. Therefore, it becomes likely that many of their queries are based on exactly how the data works. The great danger is that if a day comes when the organization wishes to leverage the functionality of what was previously worked around, such a change could break a great many downstream reports and extracts.
Just because the “application supports it” doesn’t mean the downstream users ever were notified to include such expectations inside the code they created. The business has a governance responsibility to manage its data knowledge. And if the decision is made to alter rules and relationships, the business should publish its uniquely variant models to share that knowledge with users. Unconscious implementation options obviously are undocumented, but organizations doing so should be ashamed of not knowing and communicating what they are doing. The truth needs to be out there to prevent problems and unexpected rework when changes occur.
Todd Schraml has more than 20 years of IT management, project development, business analysis, and database design experience across many industries from telecommunications to healthcare. He can be reached at TWSchraml@gmail.com.