
SQL*Loader is an Oracle utility that allows you to load ASCII files into database tables. SQL*Loader reads from an external file and loads data into an existing table while the Oracle database is open. SQL*Loader requires two types of input: the data to be loaded, and a control file. The control file is a script that tells SQL*Loader how to read the input data, what tables and columns to assign the data, and how to insert the data. The data can be placed inside the control file itself or in a separate file.
With SQL*Loader, you can specify multiple datafiles in the control file and load records of different formats. You also can use SQL functions to modify data before it is inserted. This includes many of the basic SQL conversion functions (for example, text characters to dates). You can use multiple physical records to create one or more logical record. You can put conditions on the incoming data to select which records will be inserted. (This is called filtering the data.)
Control File
SQL*Loader is controlled by its own data definition language,
which is kept in the control file. You have to build control files.
The control file describes the data to be loaded, the destination
tables of the data and describes the interdependency between the
data and the columns within the tables. The control file will
control the SQL*Loader run, telling it where to find the input
files and where to place the log file and the bad file. The control
file contains required and optional keywords. Data conversions
are specified as functions in the control file. Data can be filtered
or selectively loaded using the WHEN clause. Using the DEFAULTIF
clause, you can enable specification of default values. The control
file isn't case sensitive except in quoted strings. The convention
is to put reserved words in uppercase. To place comments in a
control file, precede them with two dashes (--). Comments are
not allowed in the data to be loaded.
The control file will specify the load type with one of the following:
INSERT requires the table to not have any data; if data is found, an error is returned.
APPEND adds rows to a table that already has data.
REPLACE removes the table's current data and then loads new data. This option uses the SQL delete command.
TRUNCATE removes the data in the table and then loads the new data. This option requires you to disable the table's referential integrity constraints.
SQL*Loader Datafile
Every line in the datafile is considered a physical record by
SQL*Loader. Physical records can be combined into one logical
record for loading (CONCATENATE, CONTINUEIF clauses). Physical
records can also be broken down into numerous logical records
for loading. Data can be sourced in various formats, such as FIXED,
VARIABLE, and DELIMITED. Fixed format is similar to the old flat
files. Fixed format requires that a column of data be the same
size for all loaded records and that the columns be delimited
by position and length. The variable format for a record requires
that columns to be delimited in two ways: terminated and enclosed.
Termination delimiters are characters such as a comma or period
(as in Monday, Tuesday). An enclosed delimited field has a special
character in front of the field and after the field (as in "Monday").
The control file for delimited format is easier to write. Although,
I have always found it easier to get data from mainframes in fixed
format.
SQL*Loader Log File
Detailed information about the load is stored in the log file.
Any errors found during parsing of the control file are stored
in the log file. The log file also identifies the number of records
successfully loaded. The log file must be available during the
entire run of the SQL*Loader. When loading data with SQL*Loader,
nothing should be assumed without reviewing the log files.
SQL*Loader Bad File
The bad file contains records that weren't loaded into the target
table. These records could have been rejected by SQL*Loader due
to an invalid format. They also could have been rejected by the
Oracle database if they violate an integrity constraint or had
an invalid data type for the target table.
SQL*Loader Discard File
The discard file is created only if specified by the control file.
It will contain records that didn't meet the criteria of the WHEN
clauses in the control file.
Two Methods of Loading Data
SQL*Loader can be run in conventional mode or direct mode. During
a conventional load of the database, the SQL*Loader acts as a
normal background process, using all the Oracle database resources.
The direct load option is highly optimized and can be significantly
faster than the conventional method. It does not use SQL statements
to place rows into tables, but has certain restrictions that might
preclude it as an alternative for normal day-to-day processing.
Conventional Load
The conventional load uses an INSERT statement with a bind-array
buffer for processing. A SQL INSERT statement is created from
the field specification of the control file. SQL*Loader then stores
the data in the database by finding the field in the datafile,
mapping the data, and passing it on to the Oracle server. The
database will accept or reject the record. The conventional load
competes with other Oracle processes and allows certain other
transactions against the table being loaded. Use this method if
the table needs to be available for other processing or when a
load is done over SQL*Net. A conventional load is great whenever
a small number of rows need to be inserted. You can use this method
when applying SQL functions to data fields. The conventional load
competes with all other Oracle7 processes for database resources.
It is likely to be slower than the direct path.
Direct Load
The direct mode works faster than the conventional load. It is
faster because the data is loaded by formatting the data into
an Oracle data block configuration and then loading it directly
into the database file. This bypasses the SGA buffer cache, although
a few calls are made to the Oracle database at the beginning and
end of the direct load.
You can make the direct load even faster by using the unrecoverable option. This option will not log redo entries into the redo log. The only problem with this option is that the transaction is not recorded in the redo log, so you have to repeat the load process if the database must be recovered.
Referential integrity isn't checked during the data load, but after the load is complete. This means that the referential integrity constraints are disabled during the load and re-enabled after the successful completion of the load. Use the REENABLE clause of the INTO TABLE phrase in the control file to reenable constraints automatically. To determine if the load was successfully and the constraints have been enabled, review the log file for any error results. To manually reenable the constraints use the SQL ENABLE command. Database insert triggers are also disabled with a direct path load. Disabled triggers are reenabled after rows have been loaded and indexes built. The constraints that remain enabled are not null, unique, and primary keys. Default data values specified at the column level aren't used during a direct load. You can use a direct load when a table can be locked for the duration of a load. That means no transactions can occur against the tables during the load. This might preclude the use of the direct mode for normal day-to-day processing.
Specify a direct load with the parameter DIRECT=TRUE on the command line. Example below:
$ sqlldr USERID=scott/tiger CONTROL = test.ctl LOG = test.log DIRECT = TRUE
Advice in Transforming Data
You may have to change the format and content of the data to make
it ready for Oracle tables. You have two options for transforming
the data. The first is to use SQL*Loader's more advanced features,
such as filters and functions. The filters allow you to eliminate
unnecessary data while the tables are being loaded. Features allow
you to convert character text to dates. However, you could simply
bring the data into Oracle in whatever format the source system
puts out and then use the full power of SQL and PL/SQL scripts
to filter, convert, and massage the data. You would load the data
into temporary tables (with no constraints), then write your conversion
scripts to go against the temporary table. Once the data is transformed,
it can be moved from the temporary tables to the permanent storage
tables.
I like the simple load process. A simplified load process can make for a more complete and easier load. Temporary tables with no constraints mean records that might fail because they violate an integrity constraint are loaded. You then have an opportunity to write scripts that will resolve the integrity constraint problem. You may also find that the business rules for the data conversion do not exist until you and your team work them though. Pulling the data into temporary tables and working with it helps you develop those rules. If those business rules are changing or evolving, you can minimize rework by writing flexible conversion scripts. You may also be required to keep all rejected records and the reason for their failure in a separate table. If you filter the data, you will not be able to include those records in the rejected table.