The Danger of Dirty Reads



If you build an application program that accesses a database, it is crucial to understand the potential for concurrency problems. When one program attempts to read data that is in the process of being changed by another, the database management system must forbid access until the modification is complete in order to ensure data integrity. Most DBMS products use a locking mechanism to protect data items that are being changed. Therefore, when one task is updating data on a page, another task cannot read or update data on that same page until the modification is complete and committed.

Programs that read database data can touch numerous rows and are therefore susceptible to concurrency problems. To get around this issue, most major RDBMS products support read-through locks, also known as “dirty read” or “uncommitted read.” When using uncommitted reads (UR), an application program can access data that has been changed but not yet committed. Dirty read capability is commonly implemented as an isolation level, but the exact naming and implementation differ by DBMS vendor.
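As a quick illustration of how the syntax varies, Db2 accepts a statement-level WITH UR clause, while SQL Server sets the isolation level for the session:

-- Db2: request uncommitted read for this statement only
SELECT FIRST_NAME
  FROM EMP
 WHERE EMPNO = 10020
  WITH UR;

-- SQL Server: request uncommitted read for the whole session
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;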

A program using dirty reads will read data without taking locks. This enables the application program to read data contained in the table as it is being manipulated. And it generally improves performance and availability of data because no locking mechanism is invoked during the process. Consider the following sequence of events:

1. At 9:00 AM, a transaction is executed containing the following SQL to change a specific value.

UPDATE EMP
   SET FIRST_NAME = 'MICHELLE'
 WHERE EMPNO = 10020;

The transaction is a long-running one and continues to execute without issuing a COMMIT.

2. At 9:01 AM, a second transaction attempts to SELECT the data that was changed, but not committed.

If the second transaction uses a dirty read, it will select the changed data even though that data has yet to be committed. And because a program using dirty reads need not wait to take a lock, merely reading the data in whatever state it happens to be at that moment, it will execute faster than a program that must wait for locks to be taken and resources to be freed before processing.
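To make the scenario concrete, here is a minimal Db2-style sketch of the second transaction; the WITH UR clause is what permits reading the uncommitted change:

-- 9:01 AM, while the 9:00 AM UPDATE is still uncommitted
SELECT FIRST_NAME
  FROM EMP
 WHERE EMPNO = 10020
  WITH UR;
-- Returns 'MICHELLE', the uncommitted value.
-- Without WITH UR, this SELECT would wait on the updater's lock.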

However, the implications of reading uncommitted data must be carefully examined before it is implemented. Several types of “problems” can occur. A dirty read can cause duplicate rows to be returned where none exist, or it can cause no rows to be returned when one (or more) actually exists; these anomalies typically arise when a concurrent update moves a row while the dirty reader is scanning. In some cases, dirty reads can return data that was never in the database at all (e.g., a change that is rolled back before being committed). Obviously, these “problems” must be taken into consideration before using the UR isolation level.
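The rolled-back case is easy to demonstrate. In this sketch (EMPNO 10099 is a hypothetical value, not from the example above), the dirty reader sees a row that never becomes committed data:

-- Session 1: inserts a row but does not commit
INSERT INTO EMP (EMPNO, FIRST_NAME) VALUES (10099, 'PAT');

-- Session 2: a dirty read sees the uncommitted row
SELECT FIRST_NAME
  FROM EMP
 WHERE EMPNO = 10099
  WITH UR;   -- returns 'PAT'

-- Session 1: changes its mind
ROLLBACK;    -- row 10099 never existed as committed data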

Benefits and Drawbacks

So when might you want to consider using dirty reads in your applications? The general rule of thumb is to avoid dirty reads whenever the results must be 100% accurate. For example, avoid UR if calculations must balance, data is being retrieved from one source to modify another, or for any production, mission-critical work that cannot tolerate data integrity problems. Succinctly stated, it is my opinion that most production, mission-critical applications are not candidates for dirty reads. Yet I see UR used in production programs more frequently than is probably wise.

There are a few specific situations in which the dirty read capability may make sense. Consider the following cases:

  • Access is required to a reference, code, or look-up table that is basically static in nature. Due to the non-volatile nature of the data, a dirty read would be no different than a normal read the majority of the time. In those cases when the code data is being modified, any application reading the data would incur minimal, if any, problems.
  • Statistical processing must be performed on a large amount of data. For example, you may wish to determine the average age of female employees within a certain pay range. The impact of an uncommitted read on an average of many rows is likely to be minimal because a single changed value will not greatly affect the result. (A sketch of such a query appears after this list.)
  • Dirty read can prove invaluable in a data warehousing environment. A data warehouse is used for online analytical processing and, other than periodic data propagation and/or replication, access is read-only. An uncommitted read works well in such a read-only environment because the data is generally not changing while it is being queried.
  • In those rare cases when a table, or set of tables, is used by a single user only, UR can make a lot of sense. If only one individual can be modifying and accessing the data, locking is only adding overhead.
  • Finally, if the data being accessed is already inconsistent, little harm can be done using a dirty read to access the information.
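Here is a sketch of the statistical query described above (the AGE, SEX, and SALARY columns are illustrative assumptions; the article's EMP table shows only EMPNO and FIRST_NAME):

SELECT AVG(AGE)
  FROM EMP
 WHERE SEX = 'F'
   AND SALARY BETWEEN 50000.00 AND 80000.00
  WITH UR;

Even if a few of the underlying rows are mid-update when this runs, an average over thousands of rows will barely move, which is why UR can be acceptable here.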

The dirty read capability can provide relief to concurrency problems and deliver faster performance in very specific situations. Be certain to understand the implications of the UR isolation level and the “problems” it can cause before diving headlong into implementing it in your production applications.

