SQL*Loader

White Paper by Marty Weber
mweber11@earthlink.net

SQL*Loader is an Oracle utility that allows you to load ASCII files into database tables. SQL*Loader reads from an external file and loads data into an existing table while the Oracle database is open. SQL*Loader requires two types of input: the data to be loaded, and a control file. The control file is a script that tells SQL*Loader how to read the input data, what tables and columns to assign the data, and how to insert the data. The data can be placed inside the control file itself or in a separate file.

With SQL*Loader, you can specify multiple datafiles in the control file and load records of different formats. You also can use SQL functions to modify data before it is inserted. This includes many of the basic SQL conversion functions (for example, text characters to dates). You can use multiple physical records to create one or more logical record. You can put conditions on the incoming data to select which records will be inserted. (This is called filtering the data.)

Control File
SQL*Loader is controlled by its own data definition language, which is kept in the control file. You have to build control files. The control file describes the data to be loaded, the destination tables of the data and describes the interdependency between the data and the columns within the tables. The control file will control the SQL*Loader run, telling it where to find the input files and where to place the log file and the bad file. The control file contains required and optional keywords. Data conversions are specified as functions in the control file. Data can be filtered or selectively loaded using the WHEN clause. Using the DEFAULTIF clause, you can enable specification of default values. The control file isn't case sensitive except in quoted strings. The convention is to put reserved words in uppercase. To place comments in a control file, precede them with two dashes (--). Comments are not allowed in the data to be loaded.

The control file will specify the load type with one of the following:

INSERT requires the table to not have any data; if data is found, an error is returned.

APPEND adds rows to a table that already has data.

REPLACE removes the table's current data and then loads new data. This option uses the SQL delete command.

TRUNCATE removes the data in the table and then loads the new data. This option requires you to disable the table's referential integrity constraints.

SQL*Loader Datafile
Every line in the datafile is considered a physical record by SQL*Loader. Physical records can be combined into one logical record for loading (CONCATENATE, CONTINUEIF clauses). Physical records can also be broken down into numerous logical records for loading. Data can be sourced in various formats, such as FIXED, VARIABLE, and DELIMITED. Fixed format is similar to the old flat files. Fixed format requires that a column of data be the same size for all loaded records and that the columns be delimited by position and length. The variable format for a record requires that columns to be delimited in two ways: terminated and enclosed. Termination delimiters are characters such as a comma or period (as in Monday, Tuesday). An enclosed delimited field has a special character in front of the field and after the field (as in "Monday"). The control file for delimited format is easier to write. Although, I have always found it easier to get data from mainframes in fixed format.

SQL*Loader Log File
Detailed information about the load is stored in the log file. Any errors found during parsing of the control file are stored in the log file. The log file also identifies the number of records successfully loaded. The log file must be available during the entire run of the SQL*Loader. When loading data with SQL*Loader, nothing should be assumed without reviewing the log files.

SQL*Loader Bad File
The bad file contains records that weren't loaded into the target table. These records could have been rejected by SQL*Loader due to an invalid format. They also could have been rejected by the Oracle database if they violate an integrity constraint or had an invalid data type for the target table.

SQL*Loader Discard File
The discard file is created only if specified by the control file. It will contain records that didn't meet the criteria of the WHEN clauses in the control file.

Two Methods of Loading Data
SQL*Loader can be run in conventional mode or direct mode. During a conventional load of the database, the SQL*Loader acts as a normal background process, using all the Oracle database resources. The direct load option is highly optimized and can be significantly faster than the conventional method. It does not use SQL statements to place rows into tables, but has certain restrictions that might preclude it as an alternative for normal day-to-day processing.

Conventional Load
The conventional load uses an INSERT statement with a bind-array buffer for processing. A SQL INSERT statement is created from the field specification of the control file. SQL*Loader then stores the data in the database by finding the field in the datafile, mapping the data, and passing it on to the Oracle server. The database will accept or reject the record. The conventional load competes with other Oracle processes and allows certain other transactions against the table being loaded. Use this method if the table needs to be available for other processing or when a load is done over SQL*Net. A conventional load is great whenever a small number of rows need to be inserted. You can use this method when applying SQL functions to data fields. The conventional load competes with all other Oracle7 processes for database resources. It is likely to be slower than the direct path.

Direct Load
The direct mode works faster than the conventional load. It is faster because the data is loaded by formatting the data into an Oracle data block configuration and then loading it directly into the database file. This bypasses the SGA buffer cache, although a few calls are made to the Oracle database at the beginning and end of the direct load.

You can make the direct load even faster by using the unrecoverable option. This option will not log redo entries into the redo log. The only problem with this option is that the transaction is not recorded in the redo log, so you have to repeat the load process if the database must be recovered.

Referential integrity isn't checked during the data load, but after the load is complete. This means that the referential integrity constraints are disabled during the load and re-enabled after the successful completion of the load. Use the REENABLE clause of the INTO TABLE phrase in the control file to reenable constraints automatically. To determine if the load was successfully and the constraints have been enabled, review the log file for any error results. To manually reenable the constraints use the SQL ENABLE command. Database insert triggers are also disabled with a direct path load. Disabled triggers are reenabled after rows have been loaded and indexes built. The constraints that remain enabled are not null, unique, and primary keys. Default data values specified at the column level aren't used during a direct load. You can use a direct load when a table can be locked for the duration of a load. That means no transactions can occur against the tables during the load. This might preclude the use of the direct mode for normal day-to-day processing.

Specify a direct load with the parameter DIRECT=TRUE on the command line. Example below:

$ sqlldr USERID=scott/tiger CONTROL = test.ctl LOG = test.log DIRECT = TRUE

Advice in Transforming Data
You may have to change the format and content of the data to make it ready for Oracle tables. You have two options for transforming the data. The first is to use SQL*Loader's more advanced features, such as filters and functions. The filters allow you to eliminate unnecessary data while the tables are being loaded. Features allow you to convert character text to dates. However, you could simply bring the data into Oracle in whatever format the source system puts out and then use the full power of SQL and PL/SQL scripts to filter, convert, and massage the data. You would load the data into temporary tables (with no constraints), then write your conversion scripts to go against the temporary table. Once the data is transformed, it can be moved from the temporary tables to the permanent storage tables.

I like the simple load process. A simplified load process can make for a more complete and easier load. Temporary tables with no constraints mean records that might fail because they violate an integrity constraint are loaded. You then have an opportunity to write scripts that will resolve the integrity constraint problem. You may also find that the business rules for the data conversion do not exist until you and your team work them though. Pulling the data into temporary tables and working with it helps you develop those rules. If those business rules are changing or evolving, you can minimize rework by writing flexible conversion scripts. You may also be required to keep all rejected records and the reason for their failure in a separate table. If you filter the data, you will not be able to include those records in the rejected table.

Copyright 2001