Improving the performance of your SQL can be achieved using a number of techniques. You could re-word your SQL, create or change indexes, or use hints. Determining the combination of approaches which is likely to result in performance improvements requires an understanding of how Oracle processes SQL, how you can influence Oracle's processing, and how to make the most of indexes.
Optimization
Optimization is the process of choosing the most efficient way
to execute a SQL statement. This is an important step in processing
any Data Manipulation Language statement (SELECT, INSERT, UPDATE
OR DELETE). There may be many different ways for Oracle to execute
such a statement, for example, varying which tables or indexes
are accessed in which order. The procedure used to execute a statement
can greatly affect how quickly the statement executes. A part
of Oracle called optimizer chooses the way that it believes to
be most efficient. The Oracle optimizer has three primary modes
of operation: RULE, COST or CHOOSE for the optimizer_mode parameter
in your database's init.ora file. You can override the optimizer's
default operations at the query and session level.
Setting optimizer_mode to RULE invokes the rule-based optimizer (RBO). The basic approach of the rule-based optimizer is that for each table in the WHERE clause, every possible access path is considered and ranked. The access path with the lowest rank is selected. The remaining tables are then selected based on their rankings. The rule-based optimizer will almost always favor an access path involving an index to one involving a full table scan. This is because without table sizes, the index is the safer choice. Star queries are not recognized by the rule-based optimizer. Star queries are used to optimize a data warehousing design called star schema. The rule-based optimizer does not consider Bitmap indexes.
Setting optimizer_mode to COST invokes the cost-based optimizer (CBO). Cost-based optimizer incorporates many features of the rule-based optimizer, but has the advantage of being able to take into account statistical information relating to volume and distribution of data within tables and indexes. This optimizer can therefore distinguish between a two row table and a two million row table and may generate different execution plans for each. To use the cost-based optimizer, you should use the analyze command to analyze the tables involved. Create histograms for columns that have an uneven distribution and/or a small number of distinct values.
Setting optimizer_mode to CHOOSE invokes the cost-based optimizer if the tables have been analyzed and the rule-based optimizer if the tables have not been analyzed. Oracle warns against using CHOOSE.
Analyze Command
The ANALYZE command collects and stores table and index statistics
which are essential for efficient operation of the cost-based
optimizer. The purpose of the ANALYZE command is to perform one
of the following functions on an index, table, or cluster:
- To collect statistics about the object used by the optimizer
and store them in the data dictionary
- To delete statistics about the object from the data dictionary
- To validate the structure of the object
- To identify migrated and chained rows of the table or cluster
Developer Note: The Analyze Command can slow processing time if the statistics are not kept current or if all the objects have not been analyzed. The Analyze command will need to be run regularly to maintain current statistics.
Histograms
The histograms are created as part of the ANALYZE command. You
can choose to generate histograms for all columns in the table,
for all indexed columns or for selected columns only. Normally,
you would choose to create histograms only for columns where the
data was not evenly distributed (skewed). The histogram stores
information about the frequency of various column values which
Oracle can use to decide whether or not to use the index. For
example, a table with yes/no column that 90% of the rows are yes
and only 10% are no. The optimizer would use the index for queries
on no, but not on queries on yes.
Full Table Scan Versus Indexes
The two most commonly used methods to retrieve rows from a table
are full table scan or index lookup. With a full table scan, all
rows from the table are read and compared against the selection
criteria. A full table scan can be faster than an index lookup
if your SQL statement accesses a large percent of the table's
data. Although it is not possible to generalize across all types
of SQL statements, hardware and data distribution, a rule of thumb
can help a programmer decide whether to use a full table scan
or an index lookup. One such rule of thumb is when accessing more
than five percent of the table's data, full table scans may be
fater than indexes. Based on your experience, you may find that
percentage to be closer to 10% or even 25%.
Full Table Scan
The full table scan is the simplest way for Oracle to get your
data. In a full table scan, every row of data in the table is
read into memory. To perform a full table scan, Oracle reads all
blocks (the basic unit of data storage) allocated to the table,
starting with the first block and continuing until it reaches
the high water mark. The high water mark is the "highest"
block in the table that has ever held data. Think of it as the
high tide mark.
To improve performance of a full table scan, you can reduce the number of block reads required for the scan. If the table is smaller than it once was, you could lower the high water mark by rebuilding the table. Rebuilding the table establishes a new high water mark. Another way to improve a full table scan's performance is to squeeze more rows into each block by reducing PCTFREE and increasing PCTUSED.
Indexes
The B-tree is the default Oracle index type. The B-tree has a
hierarchical tree structure. A header block contains pointers
to the appropriate branch block for any given range of key values.
Branch blocks point to leaf blocks. The leaf block contains a
list of key values and ROWIDs (row identifiers).
Each row of a table has a ROWID. The ROWID is a code (a logical address) reflecting the physical location of the row. The fastest SELECT statements are those where the WHERE clause contains a condition based on the ROWID.
The B-tree index provides flexible and efficient query performance. However, maintaining the B-tree with changing data can be expensive. To insert a row in the appropriate leaf block requires an index split, if there is no free space within the leaf block. Index splits are expensive operations. New blocks must be allocated and index entries moved from one block to another. Index splits can be avoided by using artificial keys or reduced by increasing the amount of free space kept within the index. Use the PCTFREE clause of the CREATE INDEX statement to define free space. Also look at the REBUILD clause of the ALTER INDEX statement. This allows an index to be rebuilt using the index itself as the source of data. Indexes should be in a separate table_space.
It is important that all your indexes contribute to query performance, since these indexes will otherwise needlessly degrade Data Manipulation Language performance. Inserts and deletes will be significantly slower for tables with a large number of indexes. It may be possible (with the appropriate scheduling) to drop the indexes make a large number of inserts/deletes and then rebuild the indexes. Dropping indexes will dramatically slow performance for other user transactions so this technique may not be an option.
Bitmapped indexes were introduced in Oracle 7.3. Oracle creates a bitmap for each unique value of the column in question. Each bitmap contains a single bit (0 or 1) for every row in the table. A "1" indicates that the row has the value specified by the bitmap and a "0" indicates that it does not. Bitmaps suit columns with few distinct values and which are often queried together. Oracle is unable to lock a single bit, and consequently, locking for bitmapped indexes is at the block (or page) level.
Even if there is an appropriate index available, the optimizer
may not be able to take advantage of the access because of the
wording of the SQL statement. Indexes are disabled if a column
is subjected to any modification. Two examples of modification
are UPPER(column_name) or column_name +1. Some additional query
types that prevent indexes being used are:
!=
IS NULL
NOT IN
LIKE '%Smith' (with wildcard as the first letter)
If the WHERE clause uses the column of only one index, Oracle performs a range scan on it to retrieve the ROWIDs of the selected rows, and then accesses the table by these ROWIDs. If the WHERE clause uses columns of more that five indexes, Oracle performs an index merge. When performing an index merge, Oracle retrieves rows from each index matching the appropriate condition. Rows, which are common to all "lists", are returned.
A composite index is an index that is made up of more than one column with a maximum of 16 columns. Using composite indexes for queries can be faster than single-column indexes. The WHERE clause must use all the columns of a composite index in equality conditions combined with AND operators. If the composite index has three or more columns, the index may be used if the WHERE clause references the leading columns. If your query specifies values for only the nonleading columns of the index, then the index will not be used to resolve the query.
Hints
Hints are instructions that you can include in your SQL statement
to guide the optimizer. Using hints, you can specify join orders,
type of access paths, indexes to be used, the optimization goal
and other instructions. Optimizer hints appears as a comment following
the first word of the SQL statement. A hint is different from
a comment in that they are prefixed with a plus sign ("+").
Example follows:
SELECT /*+ RULE*/
Subqueries
A subquery is a SQL statement that occurs within another SQL statement.
A WHERE clause can contain a subquery SELECT. Subquery selects
can be nested. You can nest subqueries down more levels than you
would ever need. Multiple sources recommend you not go down more
than 16 levels.
Correlated Subqueries
A correlated subquery is one that the subquery refers to values
in the parent query. A correlated subquery can return the same
result as a join, but can be used where a join cannot, such as
in an UPDATE, INSERT and DELETE statement. In a correlated subquery,
the subquery executes repeatedly, once for each value of a candidate
row selected by the main query. This is why a correlated update
can take more processing time.
Join
The join operation allows the results from two or more tables
to be merged on some common column values. The inner join is the
most common type of join operation. Rows are pulled from the tables
based on some common ("key") value. Rows that have no
match to the other table are not included. The outer join allows
rows to be included even if they have no match in the other table.
In Oracle, the outer join operator is "(+)". An anti-join
is used to select all rows from a table that do not have a matching
row in the other result set. You could create an anti_join using
NOT EXISTS or NOT IN. In a self join, a table is joined to itself.
A common reason for a self join is a recursive relationship. The
self join is accomplished by using multiple aliases for the same
table.
Set Operation
SQL has operations that allow result sets to be concatenated,
subtracted or overlaid. These operators are UNION, MINUS, and
INTERSECT. UNION returns the sum of two result sets without duplicates.
MINUS returns all rows in the first result set which do not appear
in the second result set. INTERSECT returns only the rows that
appear in both result sets ignoring nulls. All set operations
require that the component queries return the same number of columns
and those columns need to be compatible datatypes. The join operations
described above do not have these columns and compatible datatype
limitations.
Bind Variables
Variables can be specified as literals or as bind variables. Bind
variables are fixed references to variables contained elsewhere
in the programming language or development tool. Bind variables
are recognizable because they are prefixed by a colon. If you
are going to re-execute your SQL, but use different parameters,
use bind variables to define parameters. This will minimize parsing.
Bitmap indexes do not work with bind variables.
Null
Null values are used to indicate that a data item is missing or
undefined. The use of null values in relational databases can
lead to unexpected results. NULL values extend the coding logic
of TRUE/FALSE to TRUE/FALSE/UNKNOWN. Null values impact the query
results. Oracle recognizes null values as the highest values in
a column, which means in an ORDER BY ascending clause, null values
are placed at the bottom. Arithmetic operations can not be performed
on fields that contain null values. The result of adding, subtracting,
multiplying or dividing a null value and any number is a null
value. A field query using "!=" will not return records
where the query field is null. When an indexed column is NULL,
that row will not have an entry in the index. In other words,
nulls are not indexed. When possible you may want to define columns
as NOT NULL, for fields that will be indexed.
Truncate Table versus Delete
The TRUNCATE TABLE command allows all rows to be removed from
a table with minimal overhead. TRUNCATE TABLE has no rollback
segment and no commit. Emptying a table using the DELETE command
results in higher overhead in rollback segments and redo log entries.
Do not issue TRUNCATE TABLE unless you are sure you want all the
table data gone.
Execution Plan
To execute a Data Manipulation Language statement, Oracle may
have to perform many steps. Each of these steps either physically
retrieves rows of data from the database or prepares them in some
way for the user issuing the statement. The combination of the
steps Oracle uses to execute a statement is called an execution
plan. Oracle provides tools that reveal the way in which your
SQL is processing and the resources expended. These tools are
the EXPLAIN PLAN, SQL_TRACE facility and the TKPROF utility.
Explain Plan
Explain Plan is used to determine the execution plan Oracle7 follows
to execute a specified SQL statement. This command inserts a row
describing each step of the execution plan into a "plan table".
If you are using cost-based optimization, this command also determines
the cost of the executing the statement. If you examine the plan,
you can see how the server executes the statement.
Example of Explain Plan follows:
SQL> EXPLAIN PLAN SET STATEMENT_ID = 'MyInitialsTodaysDate'
for
SELECT * FROM
The results of the explain plan go into the plan_table. To view the explain plan, use a variation of the following statement. The key to the statement below is the CONNECT BY and the STATEMENT_ID = 'MyInitialsTodaysDate'. The CONNECT BY gives you the format and the STATEMENT_ID isolates the data to your Explain Plan. This format is the key to interpreting the execution plan. The more indented the access path, the earlier it is executed. If two statements are indented at the same level, the uppermost statement is executed first.
COL OPERATION FORMAT A30
COL OPTIONS FORMAT A20
COL OBJECT FORMAT A20
SELECT LPAD(' ', 2*LEVEL) || OPERATION ||
DECODE (ID, 0, ' COST = ' || POSITION) "OPERATION",
OPTIONS "OPTIONS",
OBJECT_NAME "OBJECT"
FROM PLAN_TABLE WHERE STATEMENT_ID = 'MyInitialsTodaysDate'
CONNECT BY PRIOR ID = PARENT_ID
START WITH ID = 0
/
Because the procedure for running explain plan is awkward, many Oracle users write scripts to automate the process. In addition, there are a number of commercial and shareware products which can display explain plans and perform additional analysis of your SQL statement.
SQL Trace Facility
Oracle provides a facility for tracing SQL statement execution,
which can provide all of the information provided by EXPLAIN PLAN
together with details of CPU and I/O requirements and even the
number of rows processed by each step in the execution plan. The
ALTER SESSION SET SQL_TRACE TRUE statement allows SQL tracing
to be initiated. The TKPROF command allows the trace files generated
to be formatted in a meaningful way. The SQL trace facility generates
the following statistics for each statement:
- Parse, execute and fetch counts
- CPU and elapse times
- Physical reads and logical reads
- Number of rows processed
- Missed on the library cache
To enable the SQL trace facility first set the following parameters:
SQL> ALTER SESSION SET SQL_TRACE = TRUE;
SQL>ALTER SESSION SET TIMED_STATISTICS=TRUE;
This parameter enables the collection of timed statistics. The
default is false. Setting it to true will reduce performance very
slightly.
SQL> MAX_DUMP_FILE_SIZE = N
This parameter specifies the maximum size of trace files. The
default is 500.
SQL> USER_DUMP_FILE_SIZE = DIRECTORY_PATH
This parameter specifies the destination for the trace file. The
default value for this parameter is the default destination for
system dumps.
Developer Note: SQL Trace increases system overhead. You should enable it only when tuning your SQL statements, and disable it when you are finished.
Having enabled SQL_TRACE, your next challenge will be to find your trace file. Your trace file will be in the user_dump_file_size directory you specified. In UNIX and many other operating systems, the name will start with "ora' or "oracle_sid" and end with ".trc". In between will be the process identifier for the Oracle server process. The timestamp will help to identify your file. Once the trace file is found, TKPROF utility is used to create a usable file.
TKPROF TRACEFILE OUTPUTFILE [SORT=OPTION] PRINT=N]
[EXPLAIN=USERNAME/PASSWORD] [INSERT=FILENAME] [SYS=NO]
[RECORD=FILENAME] [TABLE=SCHEMA.TABLENAME]
TRACEFILE the name of the trace output file
OUTPUTFILE the name of the formatted file
SORT=OPTION the order in which to sort the statements
PRINT=N prints the first n statements
EXPLAIN=USER/PASSWORD runs explain plan in the specified username
INSERT=FILENAME generates INSERT statements
SYS=NO ignores recursive SQL statements run as user sys.
RECORED=FILENAME records statement found in the trace file
TABLE-SCHEMA.TABLENAME puts execution plan into specified table
(default is plan_table)
Autotrace
SQL_TRACE and TKPROF are powerful tools, but they are not always
easy to use. Each time you use SQL_TRACE you have to find the
trace file then format and interpret the TKPROF output. Starting
with SQL*Plus version 3.3 (Oracle 7.3) you have an alternative.
AUTOTRACE can generate execution plans and execution statistics
for SQL statements executed from SQL*Plus. The set autotrace on
command will cause each query, after being executed, to display
both its execution path and high-level trace information about
the processing involved to resolving the query. Unlike the explain
plan command, the statement is actually run, even if you choose
not to see the output. When you use set autotrace on, records
are inserted into plan_table to show the order of operations executed.
To enable AUTOTRACE set the following parameters:
SQL> SET AUTOTRACE ON
To skip seeing the output of the SQL statement use the following
command:
SQL>SET AUTOTRACE TRACE
Summary
Tuning SQL can improve the response time, throughput and scalability
of an application and can help avoid costly hardware upgrades.
SQL tuning is a very cost-effective way of improving system performance
and can be done at any stage of a system's life cycle. Ideally,
SQL should be tuned as it is written. It is hoped that the information
presented here will help you understand how Oracle processes SQL,
how you can influence Oracle's processing, and how you can make
the most of indexes.