In a world where data is among the most valuable assets a company can have, a focus on data integrity is critical. As new data management systems come into the enterprise, however, a variety of issues are emerging that must be addressed by data professionals, says Craig S. Mullins, who will present a talk titled, “Ensuring Data Integrity in Operational Databases Systems” at Data Summit 2017. Mullins is president and principal consultant of Mullins Consulting, an IBM Gold Consultant, and a DBTA columnist.
Data Summit 2017 takes place May 16-17 at the New York Hilton Midtown, with pre-conference workshops on May 15.
With the advent of NoSQL database systems that may not even support true ACID, data integrity issues can be compounded. DBAs must understand how their DBMS works, the locking and isolation options available, and how to appropriately code both the database access code and the application code to achieve proper data integrity. And, unfortunately, not every developer has the proper background in database systems and data integrity to code applications with a focus on the integrity of the data, according to Mullins.
“Data integrity is the single most important aspect of managing database systems that a DBA must deal with,” says Mullins. “The data is the raison d’etre of the database management system. That is, the only reason we have a DBMS is to safely house corporate data and to be able to serve that data correctly and safely to the business stakeholders so even though there are a myriad of tasks undertaken by DBAs, assuring the accuracy and integrity of the data is paramount.”
Often, says Mullins, DBAs wrongly assume that performance monitoring and tuning is their most important mission, but, while necessary, that is still not as critical as assuring data integrity.
“Problems that can arise if data integrity is given short shrift range from simple errors that are easy to spot to more subtle issues that result in risky business decisions,” says Mullins. “For example, by choosing the wrong isolation level for your database queries it is possible to read data that is never actually committed to the database. If this data is then used to populate other tables or even simply written to a report that is used by business analysts to make critical decisions, those decisions might well be wrong because the data is wrong. Depending on what was being decided this could be a minor issue or a major catastrophe.”
A common mistake Mullins says he sees all the time is specifying dirty read isolation – also known as uncommitted read – on all SELECT statements in a program. “This is commonly done to speed up performance because the DBMS can forgo locking for uncommitted reads – and obviously that will perform better than the same query with locking. However, the implications of reading uncommitted data can cause problems.” One problem is returning rows that are never committed to the database, for example, when work is rolled back before being committed, said Mullins noting that other types of problems can occur, as well. “A dirty read can cause duplicate rows to be returned where none exist. Alternatively, a dirty read can cause no rows to be returned when one (or more) actually exists. Obviously, these problems must be taken into consideration before using uncommitted reads.”
The best thing that a DBA can do is to achieve an understanding of the database system they are using – whether it is relational, NoSQL, or anything else – and to learn the options that are available in terms of assuring data integrity. “Learn about ACID support, locking, SQL options, DBMS parameters, program settings and coding options and how they work together to provide varying degrees of data protection, isolation, and accuracy, Mullins advises.
“And be sure to focus on the integrity and accuracy of your data as the number-one thing,” says Mullins. “Remember, if the data is wrong then there is no reason to even keep it. So do everything you can to ensure that it is correct!”
Mullins’ presentation will be given during a session at Data Summit 2017, titled “Database Management Today” on Tuesday, May 16, 2017, at 2 pm as part of the Moving to a Modern Data Architecture track.
Data Summit 2017 takes place May 16-17 at the New York Hilton Midtown, with pre-conference workshops on May 15. To register, go here.