SQL Tuning for Oracle

White Paper by Marty Weber
mweber11@earthlink.net

Improving the performance of your SQL can be achieved using a number of techniques. You could re-word your SQL, create or change indexes, or use hints. Determining the combination of approaches which is likely to result in performance improvements requires an understanding of how Oracle processes SQL, how you can influence Oracle's processing, and how to make the most of indexes.

Optimization
Optimization is the process of choosing the most efficient way to execute a SQL statement. This is an important step in processing any Data Manipulation Language statement (SELECT, INSERT, UPDATE OR DELETE). There may be many different ways for Oracle to execute such a statement, for example, varying which tables or indexes are accessed in which order. The procedure used to execute a statement can greatly affect how quickly the statement executes. A part of Oracle called optimizer chooses the way that it believes to be most efficient. The Oracle optimizer has three primary modes of operation: RULE, COST or CHOOSE for the optimizer_mode parameter in your database's init.ora file. You can override the optimizer's default operations at the query and session level.

Setting optimizer_mode to RULE invokes the rule-based optimizer (RBO). The basic approach of the rule-based optimizer is that for each table in the WHERE clause, every possible access path is considered and ranked. The access path with the lowest rank is selected. The remaining tables are then selected based on their rankings. The rule-based optimizer will almost always favor an access path involving an index to one involving a full table scan. This is because without table sizes, the index is the safer choice. Star queries are not recognized by the rule-based optimizer. Star queries are used to optimize a data warehousing design called star schema. The rule-based optimizer does not consider Bitmap indexes.

Setting optimizer_mode to COST invokes the cost-based optimizer (CBO). Cost-based optimizer incorporates many features of the rule-based optimizer, but has the advantage of being able to take into account statistical information relating to volume and distribution of data within tables and indexes. This optimizer can therefore distinguish between a two row table and a two million row table and may generate different execution plans for each. To use the cost-based optimizer, you should use the analyze command to analyze the tables involved. Create histograms for columns that have an uneven distribution and/or a small number of distinct values.

Setting optimizer_mode to CHOOSE invokes the cost-based optimizer if the tables have been analyzed and the rule-based optimizer if the tables have not been analyzed. Oracle warns against using CHOOSE.

Analyze Command
The ANALYZE command collects and stores table and index statistics which are essential for efficient operation of the cost-based optimizer. The purpose of the ANALYZE command is to perform one of the following functions on an index, table, or cluster:
- To collect statistics about the object used by the optimizer and store them in the data dictionary
- To delete statistics about the object from the data dictionary
- To validate the structure of the object
- To identify migrated and chained rows of the table or cluster

Developer Note: The Analyze Command can slow processing time if the statistics are not kept current or if all the objects have not been analyzed. The Analyze command will need to be run regularly to maintain current statistics.

Histograms
The histograms are created as part of the ANALYZE command. You can choose to generate histograms for all columns in the table, for all indexed columns or for selected columns only. Normally, you would choose to create histograms only for columns where the data was not evenly distributed (skewed). The histogram stores information about the frequency of various column values which Oracle can use to decide whether or not to use the index. For example, a table with yes/no column that 90% of the rows are yes and only 10% are no. The optimizer would use the index for queries on no, but not on queries on yes.

Full Table Scan Versus Indexes
The two most commonly used methods to retrieve rows from a table are full table scan or index lookup. With a full table scan, all rows from the table are read and compared against the selection criteria. A full table scan can be faster than an index lookup if your SQL statement accesses a large percent of the table's data. Although it is not possible to generalize across all types of SQL statements, hardware and data distribution, a rule of thumb can help a programmer decide whether to use a full table scan or an index lookup. One such rule of thumb is when accessing more than five percent of the table's data, full table scans may be fater than indexes. Based on your experience, you may find that percentage to be closer to 10% or even 25%.

Full Table Scan
The full table scan is the simplest way for Oracle to get your data. In a full table scan, every row of data in the table is read into memory. To perform a full table scan, Oracle reads all blocks (the basic unit of data storage) allocated to the table, starting with the first block and continuing until it reaches the high water mark. The high water mark is the "highest" block in the table that has ever held data. Think of it as the high tide mark.

To improve performance of a full table scan, you can reduce the number of block reads required for the scan. If the table is smaller than it once was, you could lower the high water mark by rebuilding the table. Rebuilding the table establishes a new high water mark. Another way to improve a full table scan's performance is to squeeze more rows into each block by reducing PCTFREE and increasing PCTUSED.

Indexes
The B-tree is the default Oracle index type. The B-tree has a hierarchical tree structure. A header block contains pointers to the appropriate branch block for any given range of key values. Branch blocks point to leaf blocks. The leaf block contains a list of key values and ROWIDs (row identifiers).

Each row of a table has a ROWID. The ROWID is a code (a logical address) reflecting the physical location of the row. The fastest SELECT statements are those where the WHERE clause contains a condition based on the ROWID.

The B-tree index provides flexible and efficient query performance. However, maintaining the B-tree with changing data can be expensive. To insert a row in the appropriate leaf block requires an index split, if there is no free space within the leaf block. Index splits are expensive operations. New blocks must be allocated and index entries moved from one block to another. Index splits can be avoided by using artificial keys or reduced by increasing the amount of free space kept within the index. Use the PCTFREE clause of the CREATE INDEX statement to define free space. Also look at the REBUILD clause of the ALTER INDEX statement. This allows an index to be rebuilt using the index itself as the source of data. Indexes should be in a separate table_space.

It is important that all your indexes contribute to query performance, since these indexes will otherwise needlessly degrade Data Manipulation Language performance. Inserts and deletes will be significantly slower for tables with a large number of indexes. It may be possible (with the appropriate scheduling) to drop the indexes make a large number of inserts/deletes and then rebuild the indexes. Dropping indexes will dramatically slow performance for other user transactions so this technique may not be an option.

Bitmapped indexes were introduced in Oracle 7.3. Oracle creates a bitmap for each unique value of the column in question. Each bitmap contains a single bit (0 or 1) for every row in the table. A "1" indicates that the row has the value specified by the bitmap and a "0" indicates that it does not. Bitmaps suit columns with few distinct values and which are often queried together. Oracle is unable to lock a single bit, and consequently, locking for bitmapped indexes is at the block (or page) level.

Even if there is an appropriate index available, the optimizer may not be able to take advantage of the access because of the wording of the SQL statement. Indexes are disabled if a column is subjected to any modification. Two examples of modification are UPPER(column_name) or column_name +1. Some additional query types that prevent indexes being used are:
!=
IS NULL
NOT IN
LIKE '%Smith' (with wildcard as the first letter)

If the WHERE clause uses the column of only one index, Oracle performs a range scan on it to retrieve the ROWIDs of the selected rows, and then accesses the table by these ROWIDs. If the WHERE clause uses columns of more that five indexes, Oracle performs an index merge. When performing an index merge, Oracle retrieves rows from each index matching the appropriate condition. Rows, which are common to all "lists", are returned.

A composite index is an index that is made up of more than one column with a maximum of 16 columns. Using composite indexes for queries can be faster than single-column indexes. The WHERE clause must use all the columns of a composite index in equality conditions combined with AND operators. If the composite index has three or more columns, the index may be used if the WHERE clause references the leading columns. If your query specifies values for only the nonleading columns of the index, then the index will not be used to resolve the query.

Hints
Hints are instructions that you can include in your SQL statement to guide the optimizer. Using hints, you can specify join orders, type of access paths, indexes to be used, the optimization goal and other instructions. Optimizer hints appears as a comment following the first word of the SQL statement. A hint is different from a comment in that they are prefixed with a plus sign ("+"). Example follows:
SELECT /*+ RULE*/

Subqueries
A subquery is a SQL statement that occurs within another SQL statement. A WHERE clause can contain a subquery SELECT. Subquery selects can be nested. You can nest subqueries down more levels than you would ever need. Multiple sources recommend you not go down more than 16 levels.

Correlated Subqueries
A correlated subquery is one that the subquery refers to values in the parent query. A correlated subquery can return the same result as a join, but can be used where a join cannot, such as in an UPDATE, INSERT and DELETE statement. In a correlated subquery, the subquery executes repeatedly, once for each value of a candidate row selected by the main query. This is why a correlated update can take more processing time.

Join
The join operation allows the results from two or more tables to be merged on some common column values. The inner join is the most common type of join operation. Rows are pulled from the tables based on some common ("key") value. Rows that have no match to the other table are not included. The outer join allows rows to be included even if they have no match in the other table. In Oracle, the outer join operator is "(+)". An anti-join is used to select all rows from a table that do not have a matching row in the other result set. You could create an anti_join using NOT EXISTS or NOT IN. In a self join, a table is joined to itself. A common reason for a self join is a recursive relationship. The self join is accomplished by using multiple aliases for the same table.

Set Operation
SQL has operations that allow result sets to be concatenated, subtracted or overlaid. These operators are UNION, MINUS, and INTERSECT. UNION returns the sum of two result sets without duplicates. MINUS returns all rows in the first result set which do not appear in the second result set. INTERSECT returns only the rows that appear in both result sets ignoring nulls. All set operations require that the component queries return the same number of columns and those columns need to be compatible datatypes. The join operations described above do not have these columns and compatible datatype limitations.

Bind Variables
Variables can be specified as literals or as bind variables. Bind variables are fixed references to variables contained elsewhere in the programming language or development tool. Bind variables are recognizable because they are prefixed by a colon. If you are going to re-execute your SQL, but use different parameters, use bind variables to define parameters. This will minimize parsing. Bitmap indexes do not work with bind variables.

Null
Null values are used to indicate that a data item is missing or undefined. The use of null values in relational databases can lead to unexpected results. NULL values extend the coding logic of TRUE/FALSE to TRUE/FALSE/UNKNOWN. Null values impact the query results. Oracle recognizes null values as the highest values in a column, which means in an ORDER BY ascending clause, null values are placed at the bottom. Arithmetic operations can not be performed on fields that contain null values. The result of adding, subtracting, multiplying or dividing a null value and any number is a null value. A field query using "!=" will not return records where the query field is null. When an indexed column is NULL, that row will not have an entry in the index. In other words, nulls are not indexed. When possible you may want to define columns as NOT NULL, for fields that will be indexed.

Truncate Table versus Delete
The TRUNCATE TABLE command allows all rows to be removed from a table with minimal overhead. TRUNCATE TABLE has no rollback segment and no commit. Emptying a table using the DELETE command results in higher overhead in rollback segments and redo log entries. Do not issue TRUNCATE TABLE unless you are sure you want all the table data gone.

Execution Plan
To execute a Data Manipulation Language statement, Oracle may have to perform many steps. Each of these steps either physically retrieves rows of data from the database or prepares them in some way for the user issuing the statement. The combination of the steps Oracle uses to execute a statement is called an execution plan. Oracle provides tools that reveal the way in which your SQL is processing and the resources expended. These tools are the EXPLAIN PLAN, SQL_TRACE facility and the TKPROF utility.

Explain Plan
Explain Plan is used to determine the execution plan Oracle7 follows to execute a specified SQL statement. This command inserts a row describing each step of the execution plan into a "plan table". If you are using cost-based optimization, this command also determines the cost of the executing the statement. If you examine the plan, you can see how the server executes the statement.

Example of Explain Plan follows:
SQL> EXPLAIN PLAN SET STATEMENT_ID = 'MyInitialsTodaysDate' for
SELECT * FROM …

The results of the explain plan go into the plan_table. To view the explain plan, use a variation of the following statement. The key to the statement below is the CONNECT BY and the STATEMENT_ID = 'MyInitialsTodaysDate'. The CONNECT BY gives you the format and the STATEMENT_ID isolates the data to your Explain Plan. This format is the key to interpreting the execution plan. The more indented the access path, the earlier it is executed. If two statements are indented at the same level, the uppermost statement is executed first.

COL OPERATION FORMAT A30
COL OPTIONS FORMAT A20
COL OBJECT FORMAT A20

SELECT LPAD(' ', 2*LEVEL) || OPERATION ||
DECODE (ID, 0, ' COST = ' || POSITION) "OPERATION",
OPTIONS "OPTIONS",
OBJECT_NAME "OBJECT"
FROM PLAN_TABLE WHERE STATEMENT_ID = 'MyInitialsTodaysDate'
CONNECT BY PRIOR ID = PARENT_ID
START WITH ID = 0
/

Because the procedure for running explain plan is awkward, many Oracle users write scripts to automate the process. In addition, there are a number of commercial and shareware products which can display explain plans and perform additional analysis of your SQL statement.

SQL Trace Facility
Oracle provides a facility for tracing SQL statement execution, which can provide all of the information provided by EXPLAIN PLAN together with details of CPU and I/O requirements and even the number of rows processed by each step in the execution plan. The ALTER SESSION SET SQL_TRACE TRUE statement allows SQL tracing to be initiated. The TKPROF command allows the trace files generated to be formatted in a meaningful way. The SQL trace facility generates the following statistics for each statement:
- Parse, execute and fetch counts
- CPU and elapse times
- Physical reads and logical reads
- Number of rows processed
- Missed on the library cache

To enable the SQL trace facility first set the following parameters:
SQL> ALTER SESSION SET SQL_TRACE = TRUE;

SQL>ALTER SESSION SET TIMED_STATISTICS=TRUE;
This parameter enables the collection of timed statistics. The default is false. Setting it to true will reduce performance very slightly.

SQL> MAX_DUMP_FILE_SIZE = N
This parameter specifies the maximum size of trace files. The default is 500.

SQL> USER_DUMP_FILE_SIZE = DIRECTORY_PATH
This parameter specifies the destination for the trace file. The default value for this parameter is the default destination for system dumps.

Developer Note: SQL Trace increases system overhead. You should enable it only when tuning your SQL statements, and disable it when you are finished.

Having enabled SQL_TRACE, your next challenge will be to find your trace file. Your trace file will be in the user_dump_file_size directory you specified. In UNIX and many other operating systems, the name will start with "ora' or "oracle_sid" and end with ".trc". In between will be the process identifier for the Oracle server process. The timestamp will help to identify your file. Once the trace file is found, TKPROF utility is used to create a usable file.

TKPROF TRACEFILE OUTPUTFILE [SORT=OPTION] PRINT=N]
[EXPLAIN=USERNAME/PASSWORD] [INSERT=FILENAME] [SYS=NO]
[RECORD=FILENAME] [TABLE=SCHEMA.TABLENAME]

TRACEFILE the name of the trace output file
OUTPUTFILE the name of the formatted file
SORT=OPTION the order in which to sort the statements
PRINT=N prints the first n statements
EXPLAIN=USER/PASSWORD runs explain plan in the specified username
INSERT=FILENAME generates INSERT statements
SYS=NO ignores recursive SQL statements run as user sys.
RECORED=FILENAME records statement found in the trace file
TABLE-SCHEMA.TABLENAME puts execution plan into specified table (default is plan_table)

Autotrace
SQL_TRACE and TKPROF are powerful tools, but they are not always easy to use. Each time you use SQL_TRACE you have to find the trace file then format and interpret the TKPROF output. Starting with SQL*Plus version 3.3 (Oracle 7.3) you have an alternative. AUTOTRACE can generate execution plans and execution statistics for SQL statements executed from SQL*Plus. The set autotrace on command will cause each query, after being executed, to display both its execution path and high-level trace information about the processing involved to resolving the query. Unlike the explain plan command, the statement is actually run, even if you choose not to see the output. When you use set autotrace on, records are inserted into plan_table to show the order of operations executed.

To enable AUTOTRACE set the following parameters:
SQL> SET AUTOTRACE ON

To skip seeing the output of the SQL statement use the following command:
SQL>SET AUTOTRACE TRACE

Summary
Tuning SQL can improve the response time, throughput and scalability of an application and can help avoid costly hardware upgrades. SQL tuning is a very cost-effective way of improving system performance and can be done at any stage of a system's life cycle. Ideally, SQL should be tuned as it is written. It is hoped that the information presented here will help you understand how Oracle processes SQL, how you can influence Oracle's processing, and how you can make the most of indexes.

Copyright 2001