Sample Chapter: Efficient PL/SQL, Page 7
Bind Variables and the Cost of Parsing
One of the reasons I am so passionate about PL/SQL is that it is naturally scalable to high degrees of concurrency and hence, large user populations. But as with any tool, if you abuse it, it will bite back!
Scalability is of course an integral part of PL/SQL efficiency, namely satisfying our second condition for efficiency—ensuring that a program does not impact other parts of the system. One of the single biggest scalability inhibitors when building Oracle-based applications is ignorance of the steps Oracle must take to execute a SQL statement. Many developers are familiar with tools such as EXPLAIN PLAN to ensure that the execution time of an SQL is optimal, but unbeknown to most developers, there is an additional cost that must be taken into consideration; you must check that a SQL statement is valid before it is executed. Notice that I used the general term "Oracle-based applications" here. This is not just an issue that applies to writing efficient PL/SQL. Any application that is built on Oracle (and intended to scale to large user populations)—be it a Java/JDBC application or a Visual Basic/ODBC application—that does not take into account the costs of processing a SQL statement before it is executed, will not scale.
To understand why parsing is so important (or more accurately, so expensive), imagine the following scenario. Your boss walks into your office and asks you to write a program that will be used by many other programs in the department. She tells you it's a simple little program—all it has to do is to decide if a given character string is valid SQL for an Oracle database. That basic task will take an enormous amount of complexity; you will need to know every possible allowable SQL keyword, the sequences and combinations in which they make sense, where the whitespace is allowed, what impact any comments may have, and you haven't even started checking the objects referenced in the query yet. That is a lot of work to do for every single SQL statement that gets passed to your program for checking. However, one simple enhancement could be to keep a log of SQL statements that have been processed at some stage in the past. If a SQL statement is presented for checking more than once, you need merely consult the log to see if it is valid.
Of course, if every SQL statement from each application program were different, this log would be of little benefit. In fact, over time, your centralized log of SQL statements is probably going to get quite large—searching through it to see if an SQL statement is already present could in itself take a reasonable amount of CPU. You would probably optimize this process even further by asking the other application developers in the department to code a similar logging facility within their own application code so that they only need to call your syntax-checking program once for each unique SQL their program needs.
Oracle works in the same way: the centralized log from the example is the shared pool. This process of checking all aspects of an SQL statement prior to its execution is known as parsing and it consumes a large amount of CPU. Every new SQL presented to the database must be parsed to ensure that it is valid before it can be executed. Hence, the fewer new SQL statements that are presented to the system, the better your systems will scale and perform. The impact of parsing can be especially severe when high levels of concurrency are required.
If we present a brand new SQL statement to Oracle, a "hard parse" will be performed, that is, the entire set of syntax and validity checks for the SQL is undertaken. If we present an SQL statement to Oracle and ask for it to be parsed, but it has already been hard parsed, a "soft parse" will be performed. But the ideal situation would be for applications to parse their own SQL statements only once, and remember that they do not need to be parsed again.
To minimize the number of different SQL statements that are parsed by the database, make intelligent use of bind variables. A bind variable is simply a mechanism through which you can create a placeholder for a literal value within an SQL statement. Obviously at the moment of execution, an appropriate value must be supplied for that placeholder (otherwise the SQL statement would not make sense), but a lot of work takes place before a SQL statement can be executed.
Where do the bind variables fit into the parsing picture? As we mentioned previously, an appropriate value for any placeholder must be present before a statement is executed, but Oracle can perform the parsing work on a SQL statement before those values are substituted. Thus, for any series of SQL statements that differ only by the value of the literal values within them, it should be possible to reduce the parsing overhead by replacing those literal values with bind variables.
Failure to Use Bind Variables
We can easily prove the impact of not using bind variables with the following tests. We'll use the dbms_sql built-in package for our examples because each call to dbms_sql describes each phase of SQL statement processing.1 We will perform queries to a table, people , in which a single row lookup is done via a primary key.
SQL> create table people(pid primary key ) 2 organization index 3 as select rownum from all_objects 4 where rownum <=10000; Table created.
First, to get a measure for the cost of parsing, let's consider the case where a string to get each person is simply built by concatenation. Each query will be along the lines of
select pid from people where pid =123; select pid from people where pid =124; ...etc....
Every SQL statement will be regarded as brand new by the SQL engine. We will run 10,000 different queries to the person table, each SQL simply getting the details for the particular person row.
SQL> create or replace 2 procedure literals is 3 c number; 4 p number; 5 x number; 6 x1 number; 7 begin 8 for i in 1 ..10000 loop
When using dbms_sql, we must first open a cursor using the OPEN_CURSOR function for the query we are about to execute. We then parse the text of the statement using the parse procedure. We then execute the statement and fetch the row from the cursor. When we are finished with each statement, we close its cursor with the close procedure.
9 c :=dbms_sql.open_cursor; 10 dbms_sql.parse(c, 11 'select pid from people '|| 12 'where pid ='||i,dbms_sql.native); 13 x :=dbms_sql.execute(c); 14 x1 :=dbms_sql.fetch_rows(c); 15 dbms_sql.close_cursor(c); 16 end loop; 17 end; 18 / Procedure created.
The SQL*Plus timing facility yields the total elapsed time, but we want to distinguish between the time spent executing each of the SQL statements in the procedure and the time spent parsing. For this, we need to use the sql_trace facility.
SQL> set timing on SQL> alter session set sql_trace =true; Session altered. SQL> exec literals; PL/SQL procedure successfully completed. Elapsed:00:00:29.68 SQL> alter session set sql_trace =false; Session altered.
Without any further investigation, it's quite possible that a developer would be satisfied with this result. After all, we ran 10,000 queries in 30 seconds. When we run Tkprof 2 on the raw trace data, the summary produces a startling result.
OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS call count cpu elapsed disk query current rows ------- ------ ------ ------- ----- ------ ------- ----- Parse 10081 25.72 26.96 0 310 0 0 Execute 10134 1.70 1.63 0 0 0 0 Fetch 10492 0.76 0.86 0 20488 0 10425 ------- ------ ------ ------- ----- ------ ------- ----- total 30707 28.19 29.45 0 20798 0 10425
We chose the total for the recursive statements because each of the queries to the people table will be a child of the main procedure, literals.
The 10,000 queries needed only 2.49 seconds (that is, 1.63 + 0.86) to execute and fetch the rows, but before that could be done, we needed 26.96 seconds just to do the preparatory work. Parsing the queries took 10 times the effort of actually running them! This is an extreme (but sadly common in the real-world) example of the cost of running distinct SQL for every statement, and dramatically demonstrates the cost of parsing. Notice also that virtually all this parsing time was heavy-duty CPU work (28.19 out of 29.45 seconds).