The Integrity of Your Data is of Utmost Importance

Mar 3, 2016

By Craig S. Mullins

There are two important, yet different aspects of integrity with respect to databases: database structure integrity and semantic data integrity.

Ensuring that each database object is created, formatted and maintained properly is the goal of database structure integrity. Each DBMS uses its own internal format and structure to support the databases, table spaces, tables, and indexes under its control. System and application errors at times can cause faults within these internal structures. The DBA must identify and correct such faults before insurmountable problems occur. Semantic data integrity refers to the meaning of data and relationships that need to be maintained between different types of data. The DBMS provides options, controls and procedures to define and assure the semantic integrity of the data stored within its databases.

If the structural integrity of the database is not sound, everything else will be suspect, too. There are multiple types of structural problems that can occur. Indexing problems are one. Database maintenance can cause such problems and DBAs need to be able to recognize the problem, and rebuild the indexes to correct their structural integrity. Whenever database pointers are used to maintain integrity your database structures will be susceptible to problems. For example, many DBMSs use pointers to store large text documents and multimedia data, which can become corrupted. DBMS products usually ship with commands and utility programs to identify and correct structural integrity issues. Another, though somewhat rare, database structural integrity problem is table data page header corruption. Each individual physical page (or block) in the database stores housekeeping information, known as the header. If the header becomes corrupted, the DBMS may not be able to interpret the data stored on the page. Such situations usually require the database file to be recovered from backup files. But problems can lurk in your backup files, too. If the backup file is not formatted correctly, or if data is in the wrong location in the backup file, it cannot be used by the DBMS for recovery purposes. Media failures, tape degradation, and bugs can cause such problems.

The more difficult and more pervasive problem, however, is semantic data integrity. Getting that right requires accurate metadata, proper design, processes that match your business requirements, good communication skills, and constant vigilance.

Perhaps the number one cause of data integrity problems is improperly designed databases. Just getting the data type and length correct for each column can go a long way to making sure the right data is stored. For example, if you need to store dates but the column is defined as CHAR(8) how can you enforce that only valid dates are stored? You need to code program logic to accomplish that. But if the column is defined as a DATE then the DBMS takes care of it -- and more of the data will be accurate.

Entity integrity is the most basic level of data integrity provided by relational databases. Entity integrity means that each occurrence of an entity must be uniquely identifiable. In other words, a primary key is required for each entity. In practice, few database systems enforce entity integrity, so tables can be created without specifying a primary key. But that is considered a bad practice because it makes identifying individual rows difficult.

A user-defined data type, or UDT can be used to customize the type of data that can be stored in the database. The DBA can create UDTs to further clarify the permissible values for a column. UDTs can be beneficial when you need to store data that is specifically tailored to your organization’s requirements. For example, organizations that handle multiple currencies can benefit by creating a separate data type for each currency. By doing so, errors are less likely to be made in monetary conversion calculations.

Sometimes it is necessary to restrict the values that can be stored in a column to a subset of the domain of values as defined by the data type. This can be done using a check constraint, which is a DBMS-defined restriction placed on the data values that can be stored in a column or columns of a table. When a check constraint is defined, it restricts the values that can be stored in the column to which it is applied. For more complex requirements, triggers can be used instead of check constraints. Triggers are event-driven specialized procedures that are attached to database tables. A trigger is a piece of code that is executed in response to a data modification statement; that is, an INSERT, UPDATE, or DELETE. Triggers can be thought of as an advanced form of rule or constraint that is written using an extended form of SQL. Once a trigger is created, it is always executed when its “firing” event occurs, making triggers automatic, implicit, and non-bypassable. All good things.

Enforcing uniqueness is another key consideration of sematic data integrity. This can be accomplished with a unique constraint, but you may have to create a unique index, too.

The DBA can also set up data relationships properly in the database. This is done using referential integrity (RI), a method for ensuring the "correctness" of data within a DBMS. People tend to over-simplify RI stating that it is merely the identification of relationships between relational tables. It is actually much more than this. The combination of the relationship and the rules attached to that relationship is referred to as a referential constraint. The rules that accompany the RI definition are just as important as the relationship. These rules define how data is to be properly added to the databases and what happens when it is removed. Basically, RI guarantees that an acceptable value is always in the foreign key column.

Summary

Every DBA should take advantage of the mechanisms provided by the DBMS to ensure data integrity. When DBMS-provided methods are used, fewer data integrity problems are likely to be found. Fewer data integrity problems mean higher quality databases and more proficient end users. You have to know what integrity rules are proper for the DBMS to enforce. But once defined, many of those rules can be enforced by the DBMS.

Newsletters

The Integrity of Your Data is of Utmost Importance

White Papers

Sponsors