Newsletters




The Referential Integrity Workaround


It often seems that working around things is a full-time task in every area of information technology. When workarounds are conceived and deployed, people are not always in agreement. Beyond disagreements over tactics, there are often disagreements over the longer-term impact of short-circuiting things. Some believe that overall quality starts to crumble and never recovers; others disagree. Those who disagree probably also don’t see a problem with an ever-increasing pile of technical debt that rarely seems to rise to the level of being important enough to address. These items often languish and are never refactored. It seems that once a workaround is used, magically the workaround is considered as substantial and solid, as if the work were done properly and well, a bit like those infomercials chiming, “Set it and forget it.”

Consider loading a database table that has many rows. Long ago, many database management system vendors had performance issues when loading extremely large datasets. Datasets started having load times that were unacceptable. In order to address degraded load performance, organizations started dropping referential integrity constraints before loads, maybe dropping indexes or primary keys too. Those objects would be rebuilt once the load completed. These kinds of tactics were workarounds that were put in place to deal with deficiencies. And lo and behold, these kinds of workarounds have now largely been labelled as “best practices” by DBMS vendors. Today, one is likely to find many relational database implementations that simply do not even take the time to define referential integrity constraints. People are considering referential integrity as not worth the cost of coping with the potential problems. Referential integrity has not disappeared as something of value; instead, it has remained where it always was before databases, i.e., when needed, it is done inside application code.

Aren’t we clever enough to find a way to make referential integrity work inside the DBMS and not become a performance bottleneck? Computerized algorithms for the Simplex method were hard before we developed clever ways to resolve those equations; so something such as referential integrity should be far easier. Made it to the moon? Check. Found a way to grow diamonds? Check. Found a way to enforce referential integrity without making data loads crawl? Nope. Seemingly the real shortcoming has been that once the workaround was socialized, vendors just shrugged and moved on. It became one of those technical debt items hidden in the back closet to be forgotten, and just as there is nothing as permanent as a temporary tax, there is nothing as permanent as a workaround.

Some vendors hide behind the idea that customers aren’t clamoring for a change, and therefore they need to use their limited resources in other areas. The DBMS vendors don’t see that the same issue applies to customers, as in the customers have limited resources, so why would those customers waste time adding to lists of gripes for the vendor, when the vendor doesn’t really respond to those gripes? Doing things the right way should not require a special request. Obviously, vendors can only focus on shiny new buzzwords instead of actually surprising and delighting customers by making workarounds go away and doing things more straightforwardly. Customers generally accept what they have, and assume the limitations are immutable because, if they weren’t immutable, the vendors would have addressed the issues, right? This ultimately results in a vicious circle that leaves everything in a sad state. Some vendors have customized ways to quiesce referential integrity checking, having a catch-up-later function, but it seems no one has figured out how to simply provide a load that can deal with massive datasets and referential integrity. The big data world offers hope of motivating vendors into renewed vigor, but it may also set them off into other more confusing directions. No wonder the sneakier bits of relational theory will never be addressed by these industry leaders.

Todd Schraml has more than 20 years of IT management, project development, business analysis, and database design experience across many industries from telecommunications to healthcare. He can be reached at TWSchraml@gmail.com.


Sponsors