An Oracle Tune-Up: Tuning Your SQL for Maximum Performance
Nothing can make a good application perform badly like poorly tuned SQL. Although many developers believe that tuning queries is the mysterious realm of database administrators, maximizing query performance is not difficult, and should be a goal that every developer strives to achieve. Here, you will explore how Oracle executes queries and learn how to get the most out of the database with some simple techniques.
I Demand an Explanation!
How can you decide whether a query is tuned properly? One way to do so would be simply to execute the query and see how long it takes to return the desired results. However, what if more data is inserted into the table? A query that once ran in a second or two might suddenly start to bog down after a few million rows of data are added. Luckily, there is another way.
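Wall-clock timing is still a useful sanity check alongside the technique described below. In SQL*Plus, the set timing command prints the elapsed time of each statement; just remember that the result reflects the current data volume and caching, not how the query will behave as the table grows.

```sql
-- In SQL*Plus: print elapsed time after each statement
set timing on

select count(*) from employees;

set timing off
```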
Suppose that you have a database that is used by the HR and payroll systems to keep track of employee information and payroll transactions. Consider the Employees table as shown below:
create table employees (
    employee_id   varchar2(9)   not null,
    first_name    varchar2(30)  not null,
    last_name     varchar2(30)  not null,
    hire_date     date          not null,
    department_id varchar2(10)  not null,
    salary        number(9,2)
);
Now, suppose you want to find out how a simple query that looks up an employee by employee ID executes. To do this, you need to explain the execution plan of the query you are executing. In SQL*Plus, the Oracle client that ships with all versions of Oracle, you could do this as follows:
set autotrace traceonly explain;

select first_name, last_name, hire_date
  from employees
 where employee_id = '123456789';

set autotrace off;
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=CHOOSE (Cost=18 Card=1 Bytes=29)
1        TABLE ACCESS (FULL) OF 'EMPLOYEES' (TABLE) (Cost=18 Card=1 Bytes=29)
This output shows the execution plan for the query.
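If autotrace is not available in your environment, the same plan can be obtained with the EXPLAIN PLAN statement and the DBMS_XPLAN package (available from Oracle 9i onward). This sketch assumes the default PLAN_TABLE exists in your schema:

```sql
-- Populate the plan table without executing the query
explain plan for
select first_name, last_name, hire_date
  from employees
 where employee_id = '123456789';

-- Format and display the most recently explained plan
select * from table(dbms_xplan.display);
```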
Line 0 shows important information about the entire query. First, note the portion that says Optimizer=CHOOSE. This indicates that the optimizer mode is set to CHOOSE, which gives Oracle the freedom to decide on the fly whether to use its cost-based optimizer (when statistics exist for the tables involved) or fall back to the older rule-based optimizer (when they do not). The cost-based optimizer can in turn be biased toward different workloads: an on-line transaction processing (OLTP) system, such as an airline reservation system, sees a high volume of small transactions and benefits from plans that return the first rows quickly, while a data warehouse, such as a business process management system that is updated only nightly, weekly, or monthly, benefits from plans that minimize total work. Here, the default configuration of Oracle lets the optimizer choose based on the characteristics of this particular database.
Next, you will notice that line 0 indicates Cost=18. This is the cost for the entire query, and it is a relative indication of how much processing must be done to execute it. In general, lower cost means better performance: an alternative plan with a cost of 16 would be expected to run slightly faster than this one, and one with a cost of 6000 significantly more slowly. However, cost is a relative measurement, not an absolute measurement of the computing resources needed to run the query. The actual execution time and resources depend on many different factors, so cost is best used to judge whether a change is helping or hindering the query you are tuning; it is meaningful mainly when comparing alternative plans for the same query, not as a gauge across unrelated queries.
Next on this line, you will see Card=1. This value indicates how many rows the optimizer expects to process in order to execute the query. This is Oracle's best guess, based on the statistics it has gathered about the data currently in the table.
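Because Card and the other estimates come from optimizer statistics rather than from a live count, stale statistics lead to bad guesses and bad plans. Statistics can be refreshed with the DBMS_STATS package; a minimal sketch for this table, run from the owning schema:

```sql
-- Gather fresh optimizer statistics for the EMPLOYEES table
begin
    dbms_stats.gather_table_stats(
        ownname => user,        -- current schema
        tabname => 'EMPLOYEES'
    );
end;
/
```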
Finally on line 0, we have Bytes=29. This is the optimizer's estimate of the volume of data, in bytes, that the step will produce: roughly the estimated row count multiplied by the expected width of the selected columns.
So, for purposes of query tuning, line 0 gives a good idea of the relative cost of running the query, how many rows are expected to be processed, and how much data the query is expected to produce. Now, on to the meat of the tune-up for this query.
Line 1 shows TABLE ACCESS (FULL) OF 'EMPLOYEES' as step 1 of the execution plan. This means that, to run this query, Oracle must read every row in the EMPLOYEES table until the right one is found. Rows in a heap table like this one are stored in no guaranteed order (roughly the order in which they were inserted), so finding employee ID 123456789 might require one read, 20 reads, or 2 million reads. For retrieving a single row, a full-table scan is generally the worst-performing access path.
So, how can the query be fixed? By creating a primary key on the table. A primary key is a unique database constraint that identifies some key attribute of the data kept in the table. In this case, assume that every employee has a unique employee ID number, and create the primary key as follows:
alter table employees add constraint employees_pk primary key (employee_id);
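Behind the scenes, Oracle backs a primary key constraint with a unique index, and that index is what the next explain plan will use. The same idea applies to frequently searched non-key columns, where an ordinary (non-unique) index serves the purpose. For example, if the application often looked up employees by last name, a hypothetical secondary index could be added:

```sql
-- Hypothetical secondary index to support lookups by last name
create index employees_last_name_idx on employees (last_name);
```

Unlike the primary key, this index allows duplicates, so the plan for a last-name lookup would show an INDEX (RANGE SCAN) rather than a unique scan.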
This might seem obvious to veteran database designers, but there are multi-million-dollar software packages out there with no primary keys on important tables that can contain huge amounts of data! With the primary key in place, explaining the query again gives the following execution plan:
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=29)
1    0   TABLE ACCESS (BY INDEX ROWID) OF 'EMPLOYEES' (TABLE) (Cost=1 Card=1 Bytes=29)
2    1     INDEX (UNIQUE SCAN) OF 'EMPLOYEES_PK' (INDEX (UNIQUE)) (Cost=0 Card=1)
So, what has been accomplished by adding the primary key? On line 2, notice that you have INDEX (UNIQUE SCAN) OF 'EMPLOYEES_PK'. Scanning an index is far quicker than scanning an entire table for data. In this case, a unique scan is used: because the index enforces uniqueness, Oracle can determine whether the row exists with a single lookup on the index. Also, line 1 shows TABLE ACCESS (BY INDEX ROWID) OF 'EMPLOYEES'. When Oracle looks up the employee ID in the primary key index, it finds the row ID of the needed row. The row ID is like a unique street address for that row; once it is known, it takes only one access of the table to retrieve the desired data.
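The row ID that the index stores can be seen directly, since every table exposes a ROWID pseudocolumn. Querying it makes the "street address" concrete (the value shown here is illustrative; actual row IDs vary by database):

```sql
-- Inspect the physical address Oracle uses to locate this row
select rowid, first_name, last_name
  from employees
 where employee_id = '123456789';

-- Fetching by row ID directly is the fastest possible single-row access;
-- substitute the ROWID value returned by the query above
select first_name, last_name
  from employees
 where rowid = 'AAAR5kAAEAAAACXAAA';
```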
Finally, looking at line 0, you can see that the cost has decreased from 18 down to 1. Even though this is a relative measurement, it is easy to see that adding the primary key has improved the query's performance.
Now, on to some more complicated examples.