Ensuring data quality is an important aspect of data management, and these days DBAs are being called upon to deal with the quality of the data in their database systems more than ever before. The importance of quality data cannot be overstated.
Poor data quality costs the typical company between 10% and 20% of its revenue. According to software marketing and technology expert Hollis Tibbetts, “Incorrect, inconsistent, fraudulent and redundant data cost the U.S. economy over $3 trillion a year.”
The cost of poor data quality notwithstanding, high-quality data is crucial for complying with regulations. Think about it. If the data is not accurate, then how can you be sure that the proper controls are being applied to the right pieces of data to comply with the appropriate regulations?
Good data quality starts with metadata. Metadata characterizes data, providing documentation such that data can be understood and more readily consumed by your organization. Metadata answers the who, what, when, where, why, and how questions for users of the data. Suffice it to say, accurate data definitions are required in order to apply the controls for compliance to the correct data.
Metadata is required to place the data into proper categories for determining the controls that are needed to assure quality and to determine which regulations apply to the data. For example, SOX applies to financial data, HIPAA applies to healthcare data, and so on. Some data will be subject to multiple regulations and some data will not be regulated at all. Without proper metadata definitions, it is impossible to apply the right compliance controls to your data.
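As a rough illustration of how metadata categories can drive the choice of regulations, here is a minimal Python sketch. The category names, the column names, and the `applicable_regulations` helper are all hypothetical, and the mapping covers only the two regulations named above; a real metadata repository would be far richer.

```python
# Hypothetical mapping from a metadata category to the regulations that
# govern it. Only the two examples from the text are included.
REGULATIONS_BY_CATEGORY = {
    "financial": {"SOX"},
    "healthcare": {"HIPAA"},
}

# Hypothetical column-level metadata: each column lists its categories.
COLUMN_CATEGORIES = {
    "invoice.amount": ["financial"],
    "claim.diagnosis_code": ["healthcare"],
    "claim.billed_amount": ["financial", "healthcare"],
    "product.color": [],  # unregulated data
}


def applicable_regulations(column):
    """Return the sorted list of regulations that apply to a column."""
    regulations = set()
    for category in COLUMN_CATEGORIES.get(column, []):
        regulations |= REGULATIONS_BY_CATEGORY.get(category, set())
    return sorted(regulations)
```

In this sketch a column with no recorded categories returns an empty list, which is exactly the risk described above: data that is missing its metadata looks the same as data that is unregulated.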
After the data has been accurately defined, it is important to put in place procedures to assure the accuracy of the data. Imposing controls on the wrong data does no good at all. That raises the question: how good is your data quality? Estimates show that poor data quality is an industry-wide problem. According to data quality expert Thomas C. Redman, payroll record changes have a 1% error rate, billing records have a 2% to 7% error rate, and the error rate for credit records is as high as 30%.
But what can a DBA do about poor quality data? Data quality is a business responsibility, but the DBA can help by instituting technology controls. Building constraints into the database can improve overall data quality. This includes defining referential integrity in the database. Additional constraints should be defined in the database as appropriate to control uniqueness, as well as data value ranges, using check constraints and triggers.
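To make the idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `customer` and `invoice` tables, their columns, and the `try_insert` helper are assumptions made up for illustration; the same kinds of constraints (referential integrity, uniqueness, and check constraints) exist in other DBMSs, though the syntax and defaults vary.

```python
import sqlite3


def build_schema(conn):
    """Create two hypothetical tables with constraints built in."""
    # SQLite does not enforce foreign keys unless this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            email       TEXT NOT NULL UNIQUE          -- uniqueness
        );
        CREATE TABLE invoice (
            invoice_id  INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL
                        REFERENCES customer(customer_id),  -- referential integrity
            amount      NUMERIC NOT NULL CHECK (amount >= 0)  -- value range
        );
    """)


def try_insert(conn, sql, params):
    """Run an INSERT; return False if a constraint rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With this schema, a second customer with a duplicate email, an invoice that points at a nonexistent customer, or an invoice with a negative amount is rejected by the database itself, no matter which application attempts the insert.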
Another technology tactic that can be deployed to improve data quality is data profiling. Data profiling is the process of examining the existing data in the database and collecting statistics and other information about that data. With data profiling, you can discover the quality, characteristics, and potential problems of information. Using the statistics collected by the data profiling solution, business analysts can undertake projects to clean up problematic data in the database.
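The kind of statistics a profiling pass collects can be sketched in a few lines of Python. This is only an illustration, not how any particular profiling product works; the `profile_column` function and the statistics it reports are assumptions chosen for the example.

```python
from collections import Counter


def profile_column(values):
    """Collect simple statistics about one column's values.

    `values` is a list of column values, with None standing in for NULL.
    """
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        # The most frequent values often expose inconsistent codes.
        "most_common": counts.most_common(3),
    }
```

Profiling a hypothetical state-code column such as `["CA", "NY", None, "CA", "ca"]` would report one null and three distinct values, hinting that "CA" and "ca" are inconsistent spellings of the same code and are candidates for cleanup.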
Data profiling can dramatically reduce the time and resources required to find problematic data. Furthermore, it gives business analysts and data stewards more control over the maintenance and management of enterprise data.
Data governance programs are essential for organizations as they work to comply with the ever-growing mountain of industry and governmental regulations. A data governance program oversees the management of the availability, usability, integrity, and security of enterprise data. A sound data governance program includes a governing body or council, a defined set of procedures, and a plan to execute those procedures.
An organization with a strong data governance practice will have better control over its information. When data management is instituted as an officially sanctioned mandate of an organization, data is treated as an asset. That means data elements are defined in business terms, data stewards are assigned, data is modeled and analyzed, metadata is defined, captured, and managed, and data is archived for long-term retention.
The Good News
All of this should be good news to data professionals who have wanted to better define and use data within their organizations. In other words, regulations are finally catching up with what we knew our companies should have been doing all along.