Data on the Move: Techniques and Tips to Maintain Control

Page 1 of 2 next >>

Data is not sedentary. Once data has been created, organizations tend to move it around to support many different purposes—different applications, different geographies, different users, different computing environments, and different DBMSs.

Rarely is a single copy of any piece of data good enough. Data is copied and transformed and cleansed and duplicated and stored many times throughout the organization. Different copies of the same data are used to support transaction processing and analysis; test, quality assurance, and operational systems; day-to-day operations and reporting; data warehouses, data marts, and data mining; and distributed databases. 

Controlling this vast sea of data falls on the DBA who uses many techniques and technologies to facilitate data movement and distribution. 

LOAD and UNLOAD Utilities

One of the simplest ways for the DBA to move data from one place to another is to use the LOAD and UNLOAD utilities that come with the DBMS. The LOAD utility is used to populate tables with new data, and the UNLOAD utility is used to read data from a table and put it into a data file. A LOAD utility is used to perform bulk inserts of data into database tables. It typically can support

  • Adding rows to a table, retaining the current data, or
  • Replacing all existing rows with the new data

It is important to note that each DBMS may call the actual LOAD utilities by different names, but the functionality is the same or similar from product to product. For example, Microsoft SQL Server and Sybase provide the BCP utility, and Oracle offers SQL*Loader.

Factors to Ensure Success 

When loading data, the DBA must take into account many factors to ensure success. Is the LOAD utility restartable in the event it fails? Although it takes more time to implement a restartable load process, it is easier to support. For the load process to be restartable, the DBA must ensure that any work files used are allocated and that the LOAD utility can restart from where it left off.

It is usually a good idea to create all required indexes before loading data into a table. The LOAD utility is usually more efficient at populating the indexes during the load process than creating new indexes for a fully populated table. Of course, the DBA should verify this to be the case for the DBMS and version in use.

If the LOAD utility is capable of performing tasks in parallel, the DBA should take advantage of this when large amounts of data are being loaded. The LOAD utility might be capable of accepting multiple input files for concurrent loading into different segments or table partitions, or it might be able to build multiple indexes in parallel rather than building each sequentially. Another performance-related aspect of loading data is the need to plan for concurrent processing if you are loading multiple tables. Judicious scheduling of concurrent workloads can enable some LOAD utilities to operate concurrently.

If the LOAD utility provides an option to turn off logging, consider using it to speed up the load process and minimize overhead. However, the DBA will have to back up the data after the LOAD completes to ensure recoverability. If logging is turned off, the DBMS cannot recover the table without an image copy backup.

Database information frequently needs to be moved or copied to other locations. For example, you may want to move data to a different database, from a table to a sequential file for external processing, or possibly to another relational database system or platform. Certain database schema changes require database objects to be dropped and recreated—and when the objects are dropped, so is the data. Therefore, you need to unload the data before making database object changes. Perhaps you just want to extract a subset of rows from a table for use as test data. Even to reorganize a database object typically requires the data to be unloaded, optimized, and then reloaded.

Modern UNLOAD Utilities

The purpose of the UNLOAD utility is to read data from a database and write it to an output data file. Without an UNLOAD utility, database users are forced to use SQL SELECT statements issued by an interactive SQL facility, report writer, or application program in order to unload data. However, these methods are error-prone and slow for large quantities of data.

Many modern UNLOAD utilities are capable of unloading data from an image copy backup. Such a capability is useful because it enhances concurrent data access. Unloading from an image copy backup can be beneficial because the live data is unaffected—meaning no locks are taken on the live data nor is any data read from the actual table on disk. Because the UNLOAD utility reads the data from an image copy backup, the performance and availability of applications running against the live data will be unaffected by the concurrent unload operation.

Page 1 of 2 next >>