http://www.developer.com/

Back to article

Sample Chapter: Efficient PL/SQL


February 5, 2004

CHAPTER 1: Efficient PL/SQL

In this chapter, we discuss PL/SQL efficiency. We have deliberately avoided the term "performance" because efficiency is more than just performance. We'll define exactly what we mean by "efficient PL/SQL" and explain how to ensure that the PL/SQL you build meets that definition.

When used intelligently, PL/SQL can build applications that perform well, are resilient to change over time, and scale to large user populations. We contend that PL/SQL should be your language of choice for database-centric application development. Therefore, when you're deploying Oracle within your organization, you should encourage the use of PL/SQL as an integral component of the applications that you develop.

Why Use PL/SQL?

Before covering the important issues surrounding PL/SQL efficiency, let's address the question that we've often been asked by developers over the years: "Do I need to use PL/SQL at all?"

Ever since organizations began their obsession with getting their applications off the mainframe in the late 1980's—moving first to client-server applications and then to Web-based solutions—the application data (stored in an Oracle database) has been getting "further away" from the application code itself.

As a consequence, when stored procedures first appeared in the Oracle server back in version 7, they were generally marketed as a solution for improving client/server applications over slow networks. Rather than making lots of discrete database calls from a client application, we could bundle those calls inside a PL/SQL stored procedure that resided on the server, issue a single call from the client application, and thus reduce our dependency on the speed of the network.

This is a terrible underselling of the value of PL/SQL (and let's face it, underselling a feature is something we generally don't expect from Oracle). Worse, it led to the idea that if you didn't intend to run an application across a wide area network, you didn't need PL/SQL. But as this book will demonstrate time and time again, not using PL/SQL in an Oracle project is equivalent to coding with one hand tied behind your back. Nevertheless, the first battle you're likely to face on a project will not be ensuring that you use PL/SQL efficiently, but convincing management to use it at all!.

Often, the arguments against using PL/SQL during application development include:

  • It locks you into using Oracle.
  • "It doesn't do anything I could not do in <insert your favorite 3GL here>."

Launching into a tirade about the foolhardiness of building applications that purport to be truly database independent is outside the scope of the book, but avoiding PL/SQL in an Oracle application could well be a death knell for that project. And although it is true that the functionality of PL/SQL is reproducible in most 3GLs, the second argument demonstrates an ignorance of the two key benefits of using PL/SQL.

PL/SQL is Close to the Data

Very, very close, in fact. PL/SQL is inextricably part of the database. As you will see in Chapter 4, "Effective Data Handling," we can forge strong links between variable and data structures used in PL/SQL programs and the equivalent structures in the database.

Also, as will be demonstrated in Chapter 2, "Package It All Up," PL/SQL is aware of structural changes in the database and can provide insulation from those changes. You can track the dependencies between PL/SQL programs and other database objects but not need to perform any code maintenance when those database objects are changed.

The Simplest Option is Often the Best

If you compare the amount of code required in PL/SQL to process SQL with that required in any other language, PL/SQL usually wins. Any time you can use less code to achieve the same result is a good thing in terms of development time and ongoing maintenance. Consider the following examples, taken from the standard documentation, that show the difference between using PL/SQL and ProC to retrieve a BLOB from one table and insert it into another. First, the PL/SQL version (all eight lines of it!).

PROCEDURE insert_blob_proc IS
  Blob_loc BLOB;
BEGIN
  SELECT ad_photo INTO Blob_loc
  FROM Print_media
  WHERE product_id =3106 AND ad_id=13001;.INSERT INTO Print_media
                         VALUES (2056,12001,Blob_loc);
END;

And now the ProC version.

#include <oci.h>
#include <stdio.h>
#include <sqlca.h>

void Sample_Error()
{
  EXEC SQL WHENEVER SQLERROR CONTINUE;
  printf("%.*s \n",sqlca.sqlerrm.sqlerrml,sqlca.sqlerrm.sqlerrmc);
  EXEC SQL ROLLBACK WORK RELEASE;
  exit(1);
}

void insertBLOB_proc()
{
  OCIBlobLocator *Lob_loc;

  EXEC SQL WHENEVER SQLERROR DO Sample_Error();
  /*Initialize the BLOB Locator:*/
  EXEC SQL ALLOCATE :Lob_loc;

  EXEC SQL SELECT ad_photo INTO :Lob_loc
    FROM Print_media WHERE product_id =2268 AND ad_id =21001;

  /*Insert into the row where product_id =3106 and ad_id =13001:*/
  EXEC SQL INSERT INTO Print_media
    VALUES (3106,13001,:Lob_loc);

  /*Release resources held by the locator:*/
  EXEC SQL FREE :Lob_loc;
}

void main()
{
  char *samp ="pm/pm";
  EXEC SQL CONNECT :pm;
  insertBLOB_proc();
  EXEC SQL ROLLBACK WORK RELEASE;
}

The lower level OCI version of the code is even lengthier than the ProC one. Similarly, with many languages that can access a database, there are several mechanisms available on which SQL can be run and the results retrieved. More often than not, the mechanism that provides the quickest or simplest coding path is not the one that provides optimal efficiency. For example, in Java, the Statement and PreparedStatement classes are readily available for processing SQL statements. For the majority of applications, the most efficient way to write JDBC applications that access Oracle is to use the PreparedStatement.(You'll see the reason for this in the "Achieving Efficiency" section, later in this chapter.) However, this requires more code from the developer, and not surprisingly, developers often take the shortest distance between two points and code with the Statement method.

With PL/SQL, the quick and simple coding solution for the processing of SQL also turns out to be the optimal way. You have to go out of your way to force PL/SQL to process SQL in a way that is not optimal. We'll be exploring this point later in this chapter. But first, we need to define what we mean by efficiency.

What is Efficient PL/SQL?

If (or hopefully, after you have read this book, when) you are convinced of the benefits of PL/SQL, the next logical task is to ensure that any PL/SQL code you develop runs efficiently.

There are a myriad of books on the market that espouse best practices for efficient PL/SQL. However, many of these texts propose that efficiency comes from good standards and consistent coding styles, for example, a set of naming standards for such constructs as variables, cursor definitions, and so on. It certainly is true that adhering to such standards will improve the consistency and readability and thus reduce the maintenance effort required during the coding life-cycle. However, it takes a leap of faith to assume that this will magically result in PL/SQL code that is efficient. Having well-structured methodologies and standards gives you an efficient environment for coding PL/SQL, but this by no means guarantees that you will create efficient code. So let's state in advance that this chapter (indeed, this book) is not about having good naming standards or consistent coding practices within your development team. These are important components of a successful development project (for any language, not just PL/SQL), but building efficient PL/SQL is a related but different topic.

What do we mean by efficiency? Assuming that your code meets it functional requirement, our general philosophy is that PL/SQL code can be considered efficient if it satisfies three general conditions.

  • Performance: It runs within an acceptable elapsed time.
  • Impact: It does not break (or degrade in an unacceptable fashion) any other components in your system.
  • Demonstrability: You can prove (or demonstrate) that under all reasonable conditions, the code will not violate the Performance or Impact conditions. Put simply, you tried to break it and you couldn't.

The order here is important. You must prove that your code can meet the elapsed time requirement before you move on to the impact analysis. Let's take a look at these broad guidelines in a bit more detail.

Performance

Ask a PL/SQL developer what efficient code is, and the standard response will be "it runs fast." After all, this seems like common sense—if your code runs as fast as possible, what more could be asked of you? But how fast is fast enough?

The problem with this philosophy is that it is impossible to define an end goal. How do you know that the program can't run just a little bit faster? Given the vast number of features that appear in each new release of Oracle, for any given PL/SQL program, there is probably always some adjustment you could make to squeeze a little more performance out of it. A perfect example of this is the native compilation features in 9i. It is probably true that any PL/SQL program that contains some procedural logic (as opposed to being just a wrapper around some simple SQL) would run faster if it were natively compiled as opposed to the interpreted nature of PL/SQL (which is the default). But is it worth it? Natively compiled PL/SQL means that you have an external file system directory that needs to be backed up as part of your standard backup procedures. Also, there are overheads when mixing interpreted and natively compiled PL/SQL units, so you may need to look at natively compiling every PL/SQL unit in the database including those delivered in the standard database installation scripts (catproc.sql and so on). And after the effort of converting to a natively compiled database, did the gains justify the additional license costs of that C compiler you had to install? Can you quantify the gains you achieved anyway?

Trying to get PL/SQL to run as fast as it can is a waste of resources, namely your own! The question you have to ask yourself is not "Is it running as fast as possible?" but "Does it run within an acceptable elapsed time?" There is a subtle difference between the two. You will notice that we have not quantified what "acceptable" is because only you or your users can define the appropriate limits for each PL/SQL unit. If the code is designed for processing the nightly set of batch interfaces, it may not need to run sub-second; it simply needs to finish within the boundaries defined by the nightly batch window. However, if the code needs to (for example) return a description for a product item to a field on the user's data entry screen, a quick turnaround time is vital. Even more vital is that these limits are defined before the code is written.

Once the elapsed time criteria have been defined and accepted, you cannot consider your code to be performance efficient until that target has been reached. This is true even if the reason the target wasn't reached is not "your fault." Oracle professionals (both DBAs and developers) sometimes have a tendency to "pass the performance buck." For example, consider a PL/SQL program that has to complete in three minutes but part of its function is to query data from a remote Oracle database, which can only be accessed via very slow network. As a result, the PL/SQL program takes seven minutes to run. The typical response to this sort of performance problem is "The program is fine, it's a network issue." We argue that because this program fails the elapsed time requirement, it is therefore inefficient. It's unfortunate for the developer that there is a network problem, but that's just plain tough luck! Maybe the importance of the functionality of the program will yield a network upgrade, or alternatively, it might take a whole suite of other mechanisms (snapshots, queuing, materialized views, and so on) to yield a solution in which the program can run in its required three minutes. The fact remains that until it can be demonstrated that the proposed solution runs in three minutes, the solution is inefficient and should not be implemented. The definition of performance is not how much CPU a process uses or how many disk operations are required—it is a measure of acceptable response-time.

Impact

Here is where factors such as CPU, memory, and concurrency become a consideration. A PL/SQL program that runs at breakneck speed (or simply just satisfies the performance requirement) is not of much use if it disrupts another required system function in some way. A PL/SQL program could violate this requirement in a number of ways. For example:

  • It could lock all the rows in a table while it runs, thus halting end-users from doing their day-to-day tasks.
  • It could consume too many system resources (CPU, memory, and so on), thus causing response time problems in unrelated parts of the system, or crashing the system entirely.
  • It might fail to meet its concurrency requirements. If 100 users are expected to use this PL/SQL module concurrently, but it dies a slow death as soon as 10 users try to execute it at once, then as fast as it is, it is useless.

Demonstrability

To use an Australian colloquialism, I am a bit of a "rev head." In a nutshell, this means I like driving my car fast, although four years of living in central London nearly broke me of the habit. The optimal time for driving fast tends to be about four o'clock in the morning, when there is no one else on the road. All I need is one long empty lane on the local freeway and I'm in heaven. Of course, does this mean that having a one-lane freeway is a guarantee of being able to drive fast? Definitely not. In fact, if I drive on the same freeway between 7am and 9am during the working week, it would be quicker if I got out of the car and walked!

It is common sense that what you can do when you are the only car on the road is a far cry from what you can do when there are a lot of cars on the road. But amazingly, we tend to put that same common sense on the shelf when it comes to developing (and more importantly, testing) applications. Many systems have failed under the assumption that if a program runs fine in isolation, it will run fine on a live system.

This is the key aspect of demonstrability. It is easy to wave off discussions about efficiency as just plain common sense: ("Of course I know that you want my code to run fast and not cause problems. I'm not stupid."). But there is more to efficiency than that. Developer testing strategies tend to fall into one of three categories.

  • You hope the program will satisfy its performance and impact requirements.
  • You know the program satisfies its performance and impact requirements but only under the conditions you've tested. (This is the most common testing strategy used by most developers.)
  • You have busted your gut trying to come up with ways to break the thing and you can't!

The last of these is a coarse way of describing the true concept of demon-strability—having a definitive set of tests that prove both the functionality and efficiency of the program under all reasonable conditions. Tests that prove that the code will scale to the required number of users, tests that prove that performance goals are met under all conditions (during the rush hour, not just at 4AM on the open highway).

Consider the following (true) example taken from a site that was experiencing problems with one of its online systems. We will see shortly that the efficiency problems were actually caused by a design flaw, but this was only revealed once we attempted to demonstrate that the code met all its efficiency requirements.

One of the requirements for the online system was that all new customers needed to be able to track their own personal details once they had registered over the Internet. The initial specification was that customers were given a unique numeric ID and a pin number (effectively a numeric password) that would allow them to authenticate with the system. However, the requirement changed on the premise that two numeric IDs would be too much to remember Efficient PL/SQL.for any customer, so the customer pin number would actually be the unique ID for that customer. Therefore, we wanted a table that would uniquely record customers by using a random numeric number as the primary key.

The resulting solution was that a random number generator would produce a random ID, and then the customers table would be probed to see if that ID already existed. If the customer did exist, another random number would be generated until a unique customer record was inserted. Let's look at a possible solution for the stated requirements. First we have the table that stores the unique ID for each customer.

SQL> create table customers (
  2    pin_number number(6))

Table created.
SQL> alter table customers
  2    add constraint customers_pk
  3    primary key (pin_number);

Table altered.

The customers table contains a primary key, pin_number, that will hold the unique customer identifier. Next, we create a procedure that obtains a random number, generated by the built-in dbms_random package, and inserts it into the customers table.

SQL> create or replace
  2  procedure gen_customer is
  3    v_new_cid customers.pin_number%type;
  4  begin
  5    loop
  6       begin
  7         v_new_cid :=round(dbms_random.value(1000000,9999999));

The variable, v_new_cid, contains the random number to be assigned for this new customer record. We then attempt to insert this into the customers table.

  8         insert into customers
  9         values (v_new_cid);
 10         exit;

Of course, if that random number is already in use, a constraint violation will occur and the dup_val_on_index exception will be raised. We simply catch this and loop around to try another random number.

 11         exception when dup_val_on_index then
 12           null;
 13         end;
 14    end loop;
 15  end;
 16 /

Procedure created.

This solution looks fairly simple. Let's put it through its paces and add a customer. We'll use the standard set timing on facility in SQL*Plus to record some executions times.

SQL> set timing on
SQL> begin
  2    gen_customer;
  3    commit;
  4  end;
  5  /

PL/SQL procedure successfully completed.

Elapsed:00:00:00.02

No problems there; a customer record added in virtually no time at all. Let's scale up the test and simulate the creation of 100,000 customers.

SQL> set timing on
SQL> begin
  2    for i in 1 ..100000 loop
  3      gen_customer;
  4    end loop;
  5    commit;
  6  end;
  7  /

PL/SQL procedure successfully completed.

Elapsed:00:00:34.08

Voilà! We can insert 100,000 customers in 30 seconds, and because it's unlikely that this many people are all going to try to register at once, we can be pretty confident that the program is fine from a performance perspective. However, the linchpin of demonstrability is not trying to get code to work, but trying to break it. Of course, in this example, it doesn't take a rocket scientist to see where the problems are. Assuming that pin numbers are always positive, the column definition of pin_number tells us a maximum of 999,999 customers is expected. Table 1-1 shows what happens when we reach our maximum customer mark.

Table 1-1. Benchmark for Customer Registration


ROW INSERTION TEST            TIME TAKEN TO GENERATE
Rows 100,000 to 200,00038 seconds
Rows 300,000 to 400,00053 seconds
Rows 600,000 to 700,000123 seconds

This makes sense—as more and more customer rows are created, the likelihood of a randomly generated pin number already being in use becomes higher, so we need to iterate more and more times through our random number generator before finding a unique pin number. With each additional customer, the process will continue to consume more and more CPU per customer and get slower and slower. Consider the poor customer who happens to be the one-millionth person to register (or at least attempt to). They will wait forever trying to obtain a pin number. Randomness and uniqueness are mutually exclusive—if the occurrence of something (in this case a pin number) is genuinely random, we can never be sure of uniqueness. We did not even have to extend the test to ensure lots of concurrent customers can register—we demonstrated that the system starts to exhibit performance issues with just one customer trying to register (presuming many others have already done so).

This is what I like to call the "Let's hope it's not a success" approach to coding. That is, it's quite possible that the solution just implemented will work well as long as you don't attract too many customers. But of course, we implemented this system in the hope of doing precisely the opposite. It's a no-win situation. The more customers you get, the worse the system will treat them.

The solution was to compose a longer customer ID, the first six digits generated by a sequence (an object that is inherently designed to be associated with uniqueness), and the second six digits generated randomly.

First we need a sequence that contains six digits, so we will make it start from 100,000. Our table will also need to be able to house a larger pin number.

SQL> alter table CUSTOMERS modify PIN_NUMBER number(12);

Table altered.

SQL> create sequence cust_seq cache 1000
  2  start with 100000;

Sequence created.

Then we amend our procedure to insert a pin number that is comprised of the next value from our sequence, appended to a random six-digit number. The code is smaller as well because by using a sequence, uniqueness is guaranteed, thus obviating the need for the dup_val_on_index checking.

SQL> create or replace
  2  procedure gen_customer is
  3    v_new_cid customers.pin_number%type;
  4  begin
  5    insert into customers
  6     values (cust_seq.nextval*100000+
  7             round(dbms_random.value(100000,999999)));
  8  end;
  9  /

Procedure created.

Repeating our tests show that we sacrificed a little of the initial performance (due to the larger amount of data being inserted) for the sake of consistent and predictable results right up to the maximum number of expected customers, as shown in Table 1-2.

Table 1-2. Repeated Benchmark for Customer Registration


ROW INSERTION TEST            TIME TAKEN TO GENERATE
Rows 0 to 100,00055 seconds
Rows 100,000 to 200,00055 seconds
Rows 300,000 to 400,00059 seconds
Rows 600,000 to 700,00059 seconds
Rows 900,000 to 999,99961 seconds

With this simple change, we have definitively proved that one million customers can be handled, and that the millionth customer will be able to register just as efficiently as any other. We have thus demonstrated that the program works under the expected customer volume conditions. The next test ensures that if we have many customers concurrently trying to register, performance does not suffer. We can simulate concurrent users with the dbms_job scheduling facility. For example, consider the following anonymous PL/SQL block (which creates 10,000 customers):

begin
   for i in 1 ..10000 loop
      gen_customer;
   end loop;
   commit;
end;

If we want to execute this simultaneously across 20 concurrent sessions, we simply submit it 20 times using dbms_job. First, we ensure that at least 20 concurrent job processes are allowed.

SQL> alter system set job_queue_processes =20;

System altered.

Because we cannot directly view the output from database jobs using dbms_output, we will create a custlog table to hold the elapsed time for each of the job executions.

SQL> create table CUSTLOG (elapsed_centiseconds number);

Table created.

We then submit our PL/SQL job within the dbms_job framework. We have altered the anonymous block that is submitted to capture the elapsed time into the custlog table.

SQL> declare
  2    j number;
  3    job_string varchar2(1000):=
  4      'declare
  5        s number :=dbms_utility.get_time;
  6      begin
  7        for i in 1 ..10000 loop
  8          gen_customer;
  9        end loop;
 10        insert into custlog values (dbms_utility.get_time-s);
 11        commit;
 12      end;';
 13 begin
 14 for i in 1 ..20 loop
 15   dbms_job.submit(j,job_string);
 16 end loop;
17 end;
18 /

PL/SQL procedure successfully completed.

The jobs will not commence until a commit is issued. Once the jobs have completed, we can view the custlog table to check the elapsed time of each submitted job.

SQL> select *from custlog;

 ELAPSED_CENTISECONDS
---------------------
                10785
                10878
                11172
                11116
                11184
                11450
                11347
                11701
                11655
                11897
                11726
                12055
                11962
                12028
                12373
                11859
                11995
                11905
                12547
                11977

20 rows selected.

We averaged approximately 115 seconds for each of the jobs and the number of rows created in the customers table is 200,000. Given that these tests were performed on a single CPU laptop, these results are comparable with the 55-60 seconds per 100,000 rows results we obtained earlier.

Achieving Efficiency

For our programs to be efficient, they must fall within our elapsed time requirements and not cause critical damage to any other part of the system or its resources. We also must be able to demonstrate that this efficiency can be maintained under most or all anticipated conditions.

For the remainder of this chapter, we present some basic guidelines to achieving efficient PL/SQL along with code to demonstrate how this efficiency is achieved. We're not even going to try to cover everything here. After all, this whole book is basically about writing and using PL/SQL in an efficient manner. At this point, we're going to focus on three simple, high-level guidelines for achieving efficiency.

  • Minimize the amount of work the database has to do when parsing the SQL you want to execute. In a sense, this is not a PL/SQL issue because it is by no means limited to executing SQL from PL/SQL. It applies to execution of SQL from any language (Java, Visual Basic, and so on). However, it is possibly the number one cause of nonscalability in Oracle applications, so it must be discussed.
  • Understand PL/SQL's features and functionality and know how to correctly exploit them so you never have to reinvent the wheel.
  • Never use PL/SQL to do the job of SQL.

Bind Variables and the Cost of Parsing

One of the reasons I am so passionate about PL/SQL is that it is naturally scalable to high degrees of concurrency and hence, large user populations. But as with any tool, if you abuse it, it will bite back!

Scalability is of course an integral part of PL/SQL efficiency, namely satisfying our second condition for efficiency—ensuring that a program does not impact other parts of the system. One of the single biggest scalability inhibitors when building Oracle-based applications is ignorance of the steps Oracle must take to execute a SQL statement. Many developers are familiar with tools such as EXPLAIN PLAN to ensure that the execution time of an SQL is optimal, but unbeknown to most developers, there is an additional cost that must be taken into consideration; you must check that a SQL statement is valid before it is executed. Notice that I used the general term "Oracle-based applications" here. This is not just an issue that applies to writing efficient PL/SQL. Any application that is built on Oracle (and intended to scale to large user populations)—be it a Java/JDBC application or a Visual Basic/ODBC application—that does not take into account the costs of processing a SQL statement before it is executed, will not scale.

To understand why parsing is so important (or more accurately, so expensive), imagine the following scenario. Your boss walks into your office and asks you to write a program that will be used by many other programs in the department. She tells you it's a simple little program—all it has to do is to decide if a given character string is valid SQL for an Oracle database. That basic task will take an enormous amount of complexity; you will need to know every possible allowable SQL keyword, the sequences and combinations in which they make sense, where the whitespace is allowed, what impact any comments may have, and you haven't even started checking the objects referenced in the query yet. That is a lot of work to do for every single SQL statement that gets passed to your program for checking. However, one simple enhancement could be to keep a log of SQL statements that have been processed at some stage in the past. If a SQL statement is presented for checking more than once, you need merely consult the log to see if it is valid.

Of course, if every SQL statement from each application program were different, this log would be of little benefit. In fact, over time, your centralized log of SQL statements is probably going to get quite large—searching through it to see if an SQL statement is already present could in itself take a reasonable amount of CPU. You would probably optimize this process even further by asking the other application developers in the department to code a similar logging facility within their own application code so that they only need to call your syntax-checking program once for each unique SQL their program needs.

Oracle works in the same way: the centralized log from the example is the shared pool. This process of checking all aspects of an SQL statement prior to its execution is known as parsing and it consumes a large amount of CPU. Every new SQL presented to the database must be parsed to ensure that it is valid before it can be executed. Hence, the fewer new SQL statements that are presented to the system, the better your systems will scale and perform. The impact of parsing can be especially severe when high levels of concurrency are required.

If we present a brand new SQL statement to Oracle, a "hard parse" will be performed, that is, the entire set of syntax and validity checks for the SQL is undertaken. If we present an SQL statement to Oracle and ask for it to be parsed, but it has already been hard parsed, a "soft parse" will be performed. But the ideal situation would be for applications to parse their own SQL statements only once, and remember that they do not need to be parsed again.

To minimize the number of different SQL statements that are parsed by the database, make intelligent use of bind variables. A bind variable is simply a mechanism through which you can create a placeholder for a literal value within an SQL statement. Obviously at the moment of execution, an appropriate value must be supplied for that placeholder (otherwise the SQL statement would not make sense), but a lot of work takes place before a SQL statement can be executed.

Where do the bind variables fit into the parsing picture? As we mentioned previously, an appropriate value for any placeholder must be present before a statement is executed, but Oracle can perform the parsing work on a SQL statement before those values are substituted. Thus, for any series of SQL statements that differ only by the value of the literal values within them, it should be possible to reduce the parsing overhead by replacing those literal values with bind variables.

Failure to Use Bind Variables

We can easily prove the impact of not using bind variables with the following tests. We'll use the dbms_sql built-in package for our examples because each call to dbms_sql describes each phase of SQL statement processing.1 We will perform queries to a table, people , in which a single row lookup is done via a primary key.

SQL> create table people(pid primary key )
  2  organization index
  3  as select rownum from all_objects
  4  where rownum <=10000;

Table created.

First, to get a measure for the cost of parsing, let's consider the case where a string to get each person is simply built by concatenation. Each query will be along the lines of

select pid from people where pid =123;
select pid from people where pid =124;
...etc....

Every SQL statement will be regarded as brand new by the SQL engine. We will run 10,000 different queries to the person table, each SQL simply getting the details for the particular person row.

SQL> create or replace
  2  procedure literals is
  3    c number;
  4    p number;
  5    x number;
  6    x1 number;
  7  begin
  8    for i in 1 ..10000 loop

When using dbms_sql, we must first open a cursor using the OPEN_CURSOR function for the query we are about to execute. We then parse the text of the statement using the parse procedure. We then execute the statement and fetch the row from the cursor. When we are finished with each statement, we close its cursor with the close procedure.

  9    c :=dbms_sql.open_cursor;
 10    dbms_sql.parse(c,
 11      'select pid from people '||
 12      'where pid ='||i,dbms_sql.native);
 13    x :=dbms_sql.execute(c);
 14    x1 :=dbms_sql.fetch_rows(c);
 15    dbms_sql.close_cursor(c);
 16   end loop;
 17 end;
 18 /

Procedure created.

The SQL*Plus timing facility yields the total elapsed time, but we want to distinguish between the time spent executing each of the SQL statements in the procedure and the time spent parsing. For this, we need to use the sql_trace facility.

SQL> set timing on

SQL> alter session set sql_trace =true;

Session altered.

SQL> exec literals;

PL/SQL procedure successfully completed.

Elapsed:00:00:29.68

SQL> alter session set sql_trace =false;

Session altered.

Without any further investigation, it's quite possible that a developer would be satisfied with this result. After all, we ran 10,000 queries in 30 seconds. When we run Tkprof 2 on the raw trace data, the summary produces a startling result.

OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS
call     count      cpu  elapsed   disk   query  current   rows
-------  ------  ------  -------  -----  ------  -------  -----
Parse     10081   25.72    26.96      0     310        0      0
Execute   10134    1.70     1.63      0       0        0      0
Fetch     10492    0.76     0.86      0   20488        0  10425
-------  ------  ------  -------  -----  ------  -------  -----
total     30707   28.19    29.45      0   20798        0  10425

We chose the total for the recursive statements because each of the queries to the people table will be a child of the main procedure, literals.

The 10,000 queries needed only 2.49 seconds (that is, 1.63 + 0.86) to execute and fetch the rows, but before that could be done, we needed 26.96 seconds just to do the preparatory work. Parsing the queries took 10 times the effort of actually running them! This is an extreme (but sadly common in the real-world) example of the cost of running distinct SQL for every statement, and dramatically demonstrates the cost of parsing. Notice also that virtually all this parsing time was heavy-duty CPU work (28.19 out of 29.45 seconds).

Nevertheless, the CPU cost is not the main problem here. Let's look at what happens when we run four versions of the literals procedure in four concurrent sessions on a machine with four CPUs. Even though we've seen that the parsing is expensive in terms of CPU, with four CPUs, we should be able to run the four programs in approximately the same elapsed time as the single case on a single CPU.

The literals procedure was slightly altered to include a unique tag for each SQL, so that each procedure in the four concurrent sessions will be executing unique SQL.

SQL> create or replace
  2  procedure literals(tag number)is
  3    c number;
  4    p number;
  5    x number;
  6    x1 number;
  7  begin
  8    for i in 1 ..10000 loop
  9      c :=dbms_sql.open_cursor;
 10      dbms_sql.parse(c,
 11        'select pid t'||tag||'from people '||
 12        'where pid ='||i,dbms_sql.native);
 13      x :=dbms_sql.execute(c);
 14      x1 :=dbms_sql.fetch_rows(c);
 15      dbms_sql.close_cursor(c);
 16      end loop;
 17    end;
 18 /

We then ran "literals(1)" in the first session, "literals(2)" in the second, and so on. The elapsed times were as follows (remember these were run concurrently, not one after the other):

SQL> exec literals(1);
Elapsed: 00:00:31.70
SQL> exec literals(2);
Elapsed: 00:00:32.05
SQL> exec literals(3);
Elapsed: 00:00:31.43
SQL> exec literals(4);
Elapsed: 00:00:32.21

What happened here? Remember that when we ran just the single literals procedure, it took 29.68 seconds. Because we have 4 CPUs, theoretically, each CPU should have run flat out and returned control to the user after just under 30 seconds. Somewhere we lost a second or so. To see where time is lost for a session, we need to consult the V$SESSION_EVENT view to see the wait statistics.

Note: For a full description of the benefit of wait statistics, refer to the revolutionary Yet Another Performance Profiling Method by Anjo Kolk, Shari Yamaguchi, and Jim Viscusi at oraperf.veritas.com.

Whereas we would like either the CPU or disks to be active processing our requests, the wait statistics indicate the time where work could not be done. In each session we ran

SQL> select sid,event,time_waited
  2  from v$session_event
  3  where sid =...
  4  and event ='latch free';

Where the SID was the unique session ID for each of the 4 sessions that ran the test. When the results were tabled, the following was observed:

SID      EVENT        TIME_WAITED
-------  -----------  -----------
     43  latch free            79
     44  latch free            72
     45  latch free            69
     46  latch free            87

Oracle has to protect access to the important memory structures to ensure that while one session is parsing, no other session can modify any of those memory structures on which parsing is dependent. We can see that approximately 0.8 seconds per session (3 percent) got wasted on the "latch free" event; that is, the sessions spent time waiting for their turn to access the common resources required to parse a SQL statement.

NOTE If you are using Oracle 9.2, there is a bug with session-level events—the SID value is incorrect by 1. Thus the wait event statistics for a session with (for example) SID=42 as viewed from V$SESSION will be found as SID=41 in V$SESSION_EVENT. This problem is fixed in version 10 and 9.2.0.4

Parsing doesn't just cost you vital CPU time, it stops other sessions from getting access to resources that they need, namely, the library and dictionary cache. You cannot solve this with more hardware—parsing stops you from getting the return you should be getting on the hardware you already have.

Bind Variables to the Rescue

Let's return to our example scenario from earlier in this section. We want to maximize the chances of an SQL statement being found in our log of previously processed SQL statements. In order to achieve that, we need to use bind variables. With a small change to the literals procedure, we can create a new procedure, binding, which takes advantage of bind variables.

SQL> create or replace
  2  procedure binding is
  3    c number;
  4    p number;
  5    x number;
  6    x1 number;
  7  begin
  8  for i in 1 ..10000 loop
  9    c :=dbms_sql.open_cursor;
 10    dbms_sql.parse(c,
 11      'select pid from people '||
 12      'where pid =:b1 ',dbms_sql.native);

Notice that the SQL statement that we parse never actually changes. It is only after we have parsed the statement that we assign a value to the bind variable. This is the key difference here. From the point of view of hard parsing, in this procedure we are running 10,000 identical SQL statements. We'll see shortly how this makes a large difference to how Oracle will process the statement.

 13     dbms_sql.bind_variable(c,':b1',i);
 14     x :=dbms_sql.execute(c);
 15     x1 :=dbms_sql.fetch_rows(c);
 16     dbms_sql.close_cursor(c);
 17   end loop;
 18 end;
 19 /

 Procedure created.

We then run and trace the execution as per the previous test

SQL> alter session set sql_trace =true;

Session altered.

SQL> exec binding

PL/SQL procedure successfully completed.

SQL> alter session set sql_trace =false;

Session altered.

A look at the trace file seems to show that we are still parsing (the parse count is still 10,000), but somehow the performance is dramatically improved.

select pid
from
  people where pid =:b1
call     count      cpu  elapsed   disk   query  current   rows
-------  ------  ------  -------  -----  ------  -------  -----
Parse     10000    0.99     1.02      0       0        0      0
Execute   10000    1.17     1.26      0       0        0      0
Fetch     10000    0.77     0.56      0   20000        0   9999
-------  ------  ------  -------  -----  ------  -------  -----
total     30000    2.93     2.85      0   20000        0   9999

So the routine is still performing lots of parses—after all, we called the dbms_sql.parse routine 10,000 times, so the figures make sense. To explain where the performance benefit was obtained, we need to look at the session-level statistics. To do so, first we create a view that makes it easy to obtain session-level statistics.

SQL> create or replace
  2  view V$MYSTATS as
  3  select s.name,m.value
  4  from v$mystat m,v$statname s
  5  where s.statistic#=m.statistic#;

View created.

SQL> grant select on V$MYSTATS to public;

Grant succeeded.

SQL> create or replace public synonym V$MYSTATS for V$MYSTATS;

Synonym created.

Now we can look at the statistics that correspond to parsing on the system.

SQL> select *from v$mystats
  2  where name like 'parse%';

NAME                                 VALUE
------------------------------  ----------
parse time cpu                         107
parse time elapsed                     137
parse count (total)                  10019
parse count (hard)                       2
parse count (failures)                   0

The key statistic is "parse count (hard)." Although in our binding procedure we asked Oracle to parse a SQL statement 10,000 times, Oracle performed this very expensive task only twice—once for the execution of the binding procedure, and once for the first SQL statement parsed. The remaining 9,999 calls to parse did not require a full parse because the SQL statement is unchanged between executions. We reused the parsing information. This is soft parsing—an explicit parse call was issued but the SQL code found in the shared pool could be reused. That is the beauty of bind variables—the likelihood of an individual SQL statement being reused is far greater. We can do even better than that—as we saw in the binding procedure, the SQL statement we parsed did not change for each iteration of the loop. So if the SQL does not change between executions, we do not need to parse it again at all, and we can take the parse call out of the loop altogether, yielding the following revised solution.

SQL> create or replace
  2  procedure binding is
  3    c number;
  4    p number;
  5    x number;
  6    x1 number;
  7  begin
  8    c :=dbms_sql.open_cursor;
  9    dbms_sql.parse(c,
 10      'select pid from people '||
 11      'where pid =:b1',dbms_sql.native);

Notice the subtle difference here. The parse call is performed only once and is no longer within the loop.

 12    for i in 1 ..10000 loop
 13      dbms_sql.bind_variable(c,':b1',i);
 14      x :=dbms_sql.execute(c);
 15      x1 :=dbms_sql.fetch_rows(c);
 16    end loop;
 17    dbms_sql.close_cursor(c);
 18 end;
 19 /

Procedure created.

When we execute and trace the execution of this new version, the trace file shows even better performance.

select pid
from
  people where pid =:b1

call     count      cpu  elapsed   disk   query  current   rows
-------  ------  ------  -------  -----  ------  -------  -----
Parse    1         0.00     0.00      0       0        0      0
Execute  10000     0.65     0.39      0       0        0      0
Fetch    10000     0.27     0.26      0   20000        0   9999
-------  ------  ------  -------  -----  ------  -------  -----
total    20001     0.92     0.66      0   20000        0   9999

We only have a single parse call now. By simply reducing the amount of parsing, performance has been improved from an initial 30 seconds to less than 1 second. Many developers simply refuse to believe me when I tell them that Oracle can run 10,000 SQL queries in less than one second.

The examples just shown are carefully crafted to use the dbms_sql package to process SQL. One of the best things about PL/SQL is that it's easy to adopt the best practices of minimal parsing and bind variables. Let's recode the example using conventional static SQL.

SQL> create or replace
  2  procedure EASY_AS_THAT is
  3    x1 number;
  4  begin
  5    for i in 1 ..10000 loop
  6      select pid into x1
  7      from people
  8      where pid =i;
  9    end loop;
 10  end;
 11  /

Procedure created.

When we execute this, the trace file looks identical to the (optimal) results of the previous example. PL/SQL automatically uses bind variables for any PL/SQL variables, and minimizes parsing for static SQL. Oracle expects you to program using bind variables, and consequently PL/SQL makes it very easy for us to use them. As we said at the start, this is a strong argument for using PL/SQL in your applications. It naturally lends itself to building high-performance, scalable Oracle applications.

If the only language you use for working with Oracle is PL/SQL, you will probably be unaware that this is not generally the case with other 3GL languages. You will often have a lot more work to do to ensure appropriate usage of bind variables. Let's pick up again on the point made earlier in the chapter about Java, and the use of the JDBC Statement and PreparedStatement. In JDBC, the easiest way to process a SQL statement is to use a statement object. It is quite common to see code such as the following, which does not use bind variables:

Statement stmt =conn.createStatement();
for (int i =0;i <10000;i++){
Efficient PL/SQL
  ResultSet rs =stmt.executeQuery("select pid from people
                                    where pid ="+i );
  stmt.close();
}

As we proved, this code will not scale. It is much more effective to put in a little more coding effort and use PreparedStatement and bind variables, as follows:

  PreparedStatement ps;
  for (int i =0;i <10000;i++){
    pstmt =conn.prepareStatement("select pid from people
                                   where pid =?");
    pstmt.setInt(1,i);
    ResultSet rs =pstmt.executeQuery();
    pstmt.close();
  }
}

A few more lines of code are required to avoid the parsing cost. However, even this is not the whole story. The code achieves a result equivalent to the example that performs an unnecessary amount of soft parsing. Each and every time, we open pstmt, execute our SQL, and then close it again. To eliminate this, for each new SQL statement, we should parse only once and execute as many times as necessary. In Java, we do this using the singleton pattern, as follows:

static PreparedStatement pstmt;
...
  if (pstmt ==null){
    pstmt =conn.prepareStatement("select pid from people
                                  where pid =?");
  }
  for (int i =0;i <10000;i++){
    pstmt.setInt(1,i);
    pstmt.execute();
  }

Doing it the right way in JDBC takes quite a bit of thought and significantly more code.

We do not want to drift too far away from PL/SQL here, but it's hard to overstate the importance of this issue. If you were to act on every other tip we provide in this book about creating efficient PL/SQL except building your application to use bind variables, chances are your work will have been in vain and your application will not scale. We'll revisit the costs of parsing when we cover dynamic SQL in Chapter 5, "PL/SQL Optimization Techniques."

Use the PL/SQL Features Available

In this section, we would like to focus in on a particular aspect of the PL/SQL features, which is often expressed by the clichi "Don't reinvent the wheel." We preferto extend the cliché to be "Don't reinvent a wheel that is typically more complicated, slower, and sometimes just plain wrong."

The power of PL/SQL is of course not lost on Oracle, which delivers a lot of powerful functionality using PL/SQL, which you should know about and exploit within your own applications. We'll come back to this again in Chapter 2, "Package It All Up," when we look at some of the Oracle supplied packages, but it is useful to cover a few simple examples here, in order to illustrate the point.

Use the Provided Error Handling Facilities

Consider the following procedure, update_emp. It accepts as parameters an employee number and the factor by which we will be decreasing their salary. A simple SQL statement then effects the change.

SQL> create or replace
  2  procedure UPDATE_EMP(p_empno number,p_decrease number)is
  3  begin
  4    update EMP
  5    set SAL =SAL /p_decrease
  6    where empno =p_empno;
  7  end;
  8  /

Procedure created.

To decrease the salary of employee 7379 by a factor of 2, we would simply call the procedure as follows:

SQL> exec UPDATE_EMP(7369,2);

PL/SQL procedure successfully completed.

It doesn't take a university degree to see where this procedure could encounter some problems—we simply pass a value of 0 for the p_decrease para-meter.

SQL> exec UPDATE_EMP(7369,0);
BEGIN UPDATE_EMP(7369,0);END;
*
ERROR at line 1:
ORA-01476:divisor is equal to zero
ORA-06512:at "UPDATE_EMP",line 3
ORA-06512:at line 1

But it is at this point that many developers take an incorrect approach to enhancing the code to make it more robust. The processing of errors within PL/SQL is based around the exception handler model, namely, capturing errors as they occur and either taking remedial action or propagating the error back to the calling environment. However, developers regularly avoid using the exception-handling facilities by attempting to anticipate every possible error, and thus ensuring that a PL/SQL program always succeeds. For example, we can erroneously extend the update_emp procedure to return a Boolean variable indicating whether the call was successful. To set this variable, we check for the validity of the p_decrease parameter before performing the update.

SQL> create or replace
  2 procedure UPDATE_EMP(p_empno number,p_decrease number,
  3 p_success out boolean)is
  4 begin
  5 if p_decrease =0 then
  6 p_success :=false;
  7 else
  8 update EMP
  9 set SAL =SAL /p_decrease
  10 where empno =p_empno;
  11 p_success :=true;
  12 end if;
  13 end;
  14 /

Procedure created.

It is impossible to anticipate every error that could occur in a PL/SQL program. To attempt to do so merely adds complexity to the code, and may even cause issues with data integrity—something that we will revisit when we cover transaction management within PL/SQL in Chapter 4, "Effective Data Handling."

Binary Operations

To use the available PL/SQL features, first learn which features are available. (If you are thinking this is a polite way of saying: "Go reread those manuals," you're correct.) For example, if we need to perform the logical numeric operations, AND, OR, and XOR, with a little bit of arithmetic, the following PL/SQL function can be built for binary AND. We will not delve too deeply into the specifics of how it works because as we will see imminently, creating such a function is a wasted effort.

SQL> create or replace
  2  function binary_and(x number,y number)return number is
  3    max_bin number(22):=power(2,64);
  4    l_x number :=x;
  5    l_y number :=y;
  6    result number :=0;
  7 begin
  8   for i in reverse 0 ..64 loop
  9     if l_x >=max_bin and l_y >=max_bin then
 10       result :=result +max_bin;
 11     end if;
 12     if l_x >=max_bin then
 13       l_x :=l_x -max_bin;
 14     end if;
 15     if l_y >=max_bin then
 16       l_y :=l_y -max_bin;
 17     end if;
 18     max_bin :=max_bin/2;
 19   end loop;
 20   return result;
 21 end;
 22 /

Function created.

This is a wasted effort because such a function already exists, namely the BITAND function. This is perhaps a special case, and a developer could possibly be excused for building his own version because although the BITAND function existed as far back as version 7 (and probably before), Oracle didn't document its existence until version 8.1.7. Performance-wise, there is no comparison. Comparing 50,000 executions of the home-grown BITAND PL/SQL function to its native counterpart using the SQL*Plus timing facility shows the dramatic difference.

SQL> declare
  2    x number;
  3 begin
  4   for i in 1 ..50000 loop
  5     x:=binary_and(i,i+1);
  6   end loop;
  7 end;
  8 /

PL/SQL procedure successfully completed.

Elapsed:00:00:07.07
SQL> declare
  2    x number;
  3 begin
  4   for i in 1 ..50000 loop
  5     x:=bitand(i,i+1);
  6   end loop;
  7 end;
  8 /

PL/SQL procedure successfully completed.

Elapsed:00:00:00.01

Filling in the Blanks

It's difficult in SQL to generate result sets of data that don't necessarily exist in the database at all. For example, for the requirement to generate a list of database objects created in the last two weeks by day, every day must be listed even if no objects were created. The conventional workaround to this problem is a "filler" table—an example of which you'll see in the next section of this chapter. We can demonstrate this to produce the required list of recently created objects. First we create a table, src , which contains the digits 1 to 9999 as rows. (The name "src" was chosen because this table will serve as an arbitrary source of rows.)

SQL> create table SRC (x number )pctfree 0;

Table created.

SQL> insert into SRC

  2 select rownum
  3 from all_objects
  4 where rownum <10000;

9999 rows created.

Now we'll create a copy of the ALL_OBJECTS view in a table T1 with the created column truncated down to remove the time component from the date.

SQL> create table T1 as
  2 select trunc(created)created
  3 from all_objects;

Table created.

By joining to the src table, we can report the objects created in the last two weeks, even if no objects were created on a particular day. A simple outer-join serves the purpose.

SQL> select trunc(sysdate)-14+x created,count(created)no_of_obj
  2 from t1,src
  3 where trunc(sysdate)-14+x =t1.created(+)
  4 and x <=14
  5 group by trunc(sysdate)-14+x
  6 /

CREATED            NO_OF_OBJ
---------     --------------
30/MAY/03                  0
31/MAY/03                  0
01/JUN/03                  0
02/JUN/03                  0
03/JUN/03                  0
04/JUN/03                  0
05/JUN/03                  0
06/JUN/03                  0
07/JUN/03                  0
08/JUN/03                  0
09/JUN/03                 41
10/JUN/03                  4
11/JUN/03                  6
12/JUN/03                  8

Once again, understanding what facilities are available in PL/SQL could lead to a solution that does not require an additional table, which also resolves the issue of how many rows in src are enough. The pipeline function facility that is new to version 9 can also be used as a mechanism for artificially generating as many rows as we see fit. (See Chapter 5 for a full description of pipelined functions. 3) First we need to create a nested table collection type (because all pipelined functions must return such a datatype).

SQL> create or replace
  2 type date_list is table of date;
  3 /

Type created.

Next we create a pipelined function that pipes back rows ranging from a provided starting date (parameter p_start_date) up to any limit we specify (parameter p_limit).

SQL> create or replace
  2  function pipe_date(p_start date,p_limit number)
  3  return date_list pipelined is
  4  begin
  5    for i in 0 ..p_limit-1 loop
  6      pipe row (p_start +i);
  7    end loop;
  8    return;
  9  end;
 10  /

Function created.

Instead of using the src table, we can now call our pipeline function PIPE_DATE to artificially create as many rows as we desire. In this case, we only require 14 rows back from the function. Our query now looks like this.

SQL> select column_value,count(created)no_of_obj
  2 from t1,table(pipe_date(trunc(sysdate)-14,14))
  3 where column_value =t1.created(+)
  4 group by column_value
  5 /

COLUMN_VA         NO_OF_OBJ
---------    --------------
29/MAY/03                 0
30/MAY/03                 0
31/MAY/03                 0
01/JUN/03                 0
02/JUN/03                 0
03/JUN/03                 0
04/JUN/03                 0
05/JUN/03                 0
06/JUN/03                 0
07/JUN/03                 0
08/JUN/03                 0
09/JUN/03                41
10/JUN/03                 4
11/JUN/03                 6

We can also assist the optimizer by telling it how many rows we will be returning from our pipeline function. We can use the CARDINALITY hint to assist Oracle with its optimization. In the example just shown, we can tell the optimizer that there will be 14 rows returned as follows:

SQL> select /*+CARDINALITY(t 14)*/
       column_value,count(created)no_of_obj
  2  from t1,table(pipe_date(trunc(sysdate)-14,14))t
  3  where column_value =t1.created(+)
  4  group by column_value
  5  /

Don 't Use PL/SQL to Do the Job of SQL

PL/SQL is often used excessively. The very first sentence of the PL/SQL Users Guide and Reference book that comes as part of the Oracle 9.2 documentation is "PL/SQL, Oracle's procedural extension of SQL...." PL/SQL was designed as (and always has been) an extension to SQL, that is, a tool that can be used when SQL cannot do the job requested.

To demonstrate, let's look at a fictitious example that would appear to be tailor-made for PL/SQL. We will re-create the ubiquitous emp and dept tables for our example so that we can populate them with some larger sample data sizes.

SQL> drop table EMP;

Table dropped.

SQL>drop table DEPT;

Table dropped.

SQL> create table EMP (
  2   EMPNO       NUMBER(8),
  3   ENAME       VARCHAR2(20),
  4   HIREDATE    DATE,
  5   SAL         NUMBER(7,2),
  6   DEPTNO      NUMBER(6));

Table created.

SQL> create table DEPT (
  2   DEPTNO      NUMBER(6),
  3   DNAME       VARCHAR2(20));

Table created.

We add primary keys to the tables, which will in turn index the empno and deptno columns respectively in these tables.

SQL> alter table EMP add constraint EMP_PK
  2  primary key (EMPNO);

Table altered.

SQL> alter table DEPT add constraint DEPT_PK
  2  primary key (DEPTNO);

Table altered.

Before proceeding, we will re-create the src table from the previous section to hold 200,000 rows of nonsensical data. We'll use this table as a source of data to populate other tables in this example. We don't care what is in the table rows, just that there are at least 200,000 of them. If your system already has such a table, you can use that one in any of the following examples. Alternatively, if you are using version 9 or above, you can use the pipeline function solution presented in the previous section.

SQL> create table SRC (x varchar2(10));

Table created.

SQL>begin
  2  for i in 1 ..200000 loop
  3   insert into SRC values ('x');
  4  end loop;
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL>commit;

Commit complete.

Let's now populate the emp table with 500 employees by using the rownum pseudo-column to generate names and date of hiring, dbms_random to generate some random salaries, and ensure that the department (deptno) ranges between 1 and 10.

SQL> insert into EMP
  2  select rownum,
  3      'Name'||rownum,
  4      sysdate+rownum/100,
  5      dbms_random.value(7500,10000),
  6      dbms_random.value(1,10)
  7  from SRC
  8  where rownum <=500;

500 rows created.

We now do a similar exercise with the dept table, simply adding 10 rows with a deptno ranging from 1 to 10.

SQL> insert into DEPT
  2  select rownum,'Dept'||rownum
  3  from SRC
  4  where rownum <=10;

10 rows created.

Now there are 500 employee records, each assigned to one of 10 departments. The reason for creating this example is that the CEO is disgusted at the vast difference between the salaries of the highest income earners in each department versus the pittance that the low incomes earners currently receive. (As we said, it is a fictitious example!)

The CEO wants to balance the distribution of salaries within each department so that there is less of a discrepancy. After a detailed analysis, a module specification has been produced as follows:

Please code a module report_sal_adjustment that runs in less than three seconds and fetches each employee record from the emp tables and performs the following:

  1. Determine the average salary for this employees department.
  2. If the employee's salary differs by more than 20 percent of the average salary, add a row to the emp_sal_log table with the employee number, name, department name, hiring date, and the amount that the current salary differs from the average salary.
  3. If this employee's salary is the lowest in the department, flag this employee by setting the min_sal column in emp_sal_log to "Y," otherwise set this column to "N."

Based on the module specifications you have probably seen over the years, this would rate as a fairly good one, and strangely enough, herein lies the fundamental problem. Procedural specifications yield procedural solutions.

In an attempt to make the module specification easy and straightforward to follow, it has been written in a way that lends itself to a procedural solution. The specification outlines a number of discrete steps to be taken for each employee in the organization. Let's look at a solution that represents a direct mapping of the specification to our database. First we will need a table called emp_sal_log in which to record the results of our report.

SQL> create table EMP_SAL_LOG (
  2   ENAME       VARCHAR2(20),
  3   HIREDATE    DATE,
  4   SAL         NUMBER(7,2),
  5   DNAME       VARCHAR2(20),
  6   MIN_SAL     VARCHAR2(1));

Table created.

Now let's look at a PL/SQL program that satisfies the requirements of the specification. We'll break it into sections that align with the module specification that was used to build the program.

SQL> create or replace
  2  procedure report_sal_adjustment is

We will need some variables to store the average and minimum salaries for each department, as well as the department name for an employee.

  3  v_avg_dept_sal emp.sal%type;
  4  v_min_dept_sal emp.sal%type;
  5  v_dname        dept.dname%type;

We will have a cursor to allow us to loop through each employee in the EMP table.

  6  cursor c_emp_list is
  7    select empno,ename,deptno,sal,hiredate
  8    from emp;
  9  begin

Here is where the real work begins. We start fetching through our cursor, picking up each employee row in turn. Using this employee's department number, we can determine the average salary for this department (module specification component #1).

 10  for each_emp in c_emp_list loop
 11    select avg(sal)
 12    into   v_avg_dept_sal
 13    from   emp
 14    where  deptno =each_emp.deptno;

Now we compare the employee's salary with the average salary for his or her department. If it's more than 20 percent away from the average, we want to log a row in the emp_sal_log table (module specification component #2). Before we can do that, however, we need to look up the name of the employee's department, at which point we may as well pick up the lowest salary for the department as well because we will need that shortly to satisfy module specification component #3.

 15  if abs(each_emp.sal -v_avg_dept_sal )/v_avg_dept_sal >0.20 then
 16    select dept.dname,min(emp.sal)
 17    into   v_dname,v_min_dept_sal
 18    from   dept,emp
 19    where  dept.deptno =each_emp.deptno
 20    and    emp.deptno =dept.deptno
 21  group by dname;

Before we insert a record into the emp_sal_log table, we need to know if this employee has the lowest salary for this department. Comparing the minimum department salary we just picked up with the employee's salary allows us to set the min_sal flag in emp_sal_log.

 22        if v_min_dept_sal =each_emp.sal then
 23          insert into emp_sal_log
 24          values (each_emp.ename,each_emp.hiredate,
 25                  each_emp.sal,v_dname,'Y');
 26        else
 27          insert into emp_sal_log
 28          values (each_emp.ename,each_emp.hiredate,
 29                  each_emp.sal,v_dname,'Y');
 30        end if;
 31      end if;
 32    end loop;
 33  end;
 34  /

Procedure created.

A couple of executions (not shown) confirm that the procedure satisfies the functional requirements, so let's look at how efficient it is. When run against the sample data that was generated earlier (that is, 500 employees and 10 departments), all appears well.

SQL> exec report_sal_adjustment

PL/SQL procedure successfully completed.

Elapsed:00:00:00.03

Magic! It ran in less then three seconds (one of the requirements of our module specification) so our code is ready for deployment into production! And depending on the size of the company, this might be a perfectly adequate solution. But what happens when we scale this up to a larger number of employees. This company (or a company that we might be hoping to sell this solution to) might have thousands of employees. To test the efficiency of the solution under various scenarios (after all, we want to demonstrate that our code is efficient), a simple SQL*Plus script can be used to test the performance with varying employee and departmental populations. Follow the REM lines in the script to see how it works.

rem REPTEST.SQL
rem ......
rem Takes as input two parameters,the first being
rem the number of employees,the second the number
rem of departments.Employees are assigned at random
rem to departments,and salaries are randomised between
rem 7500 and 10000
rem
set termout off

rem Number of employees will be passed as the first parameter,
rem number of departments as the second.We want to assign these
rem values to two SQL Plus substitution variables called NUM_EMPS
rem and NUM_DEPTS respectively

col x new_value num_emps
col y new_value num_depts

rem Select the values from DUAL to complete the assigment

select &1 x,&2 y from dual;

rem Now we erase the contents of the EMP table and DEPT table
rem in preparation for population with sample data

truncate table EMP reuse storage;
truncate table DEPT reuse storage;

rem Now we load up the EMP table just like we did in the
rem previous example,using the SRC table to generate as
rem many rows as we require

insert into EMP
select rownum,
       'Name'||rownum,
       sysdate+rownum/100,
       dbms_random.value(7500,10000),
       dbms_random.value(1,&num_depts)
from SRC
where rownum <=&num_emps;
rem We do a similar exercise with the DEPT table to load
rem the required number of department rows

insert into DEPT
select rownum,'Dept'||rownum
from SRC
where rownum <=&num_depts;

rem And for good measure,we will calculate optimizer
rem statistics on the two tables,because no-one should
rem be using the rule based optimizer anymore

analyze table emp compute statistics;
analyze table dept compute statistics;

rem We truncate our results table in preparation for the test
truncate table EMP_SAL_LOG;


rem And now,here is the real testing portion of the script
rem We will run the procedure 3 times,and report an average
rem execution time across the 3 executions.To do this,we
rem use the DBMS_UTILITY.GET_TIME function to give start and
rem end timings for the test . the delta of these two times
rem is the total time elapsed (in centiseconds).Dividing
rem by 100 and then by 3,gives us the average execution
rem time in seconds.
set serverout on
set termout on
declare
  x number :=dbms_utility.get_time;
begin
  for i in 1 ..3 loop
    report_sal_adjustment;
  end loop;
  dbms_output.put_line('Average run time:'||
                       round((dbms_utility.get_time-x)/3/100,2));
end;
/

Now we can simply generate any size employee/department base for our test and get the average run time for the procedure based on three executions. Let's look at the output:

SQL> @c:\reptest 500 50
Average run time:.3

PL/SQL procedure successfully completed.

SQL> @c:\reptest 1000 100
Average run time:.93

PL/SQL procedure successfully completed.

SQL> @c:\reptest 1500 150
Average run time:1.87

PL/SQL procedure successfully completed.

SQL> @c:\reptest 2000 200
Average run time:3.3

PL/SQL procedure successfully completed.

SQL> @c:\reptest 2500 250
Average run time:4.96

PL/SQL procedure successfully completed.

The developer's apparently fast piece of code scales very poorly. In fact, running a large number of tests shows that the elapsed time increases exponentially with a linear increase in the employee numbers. Given that the program was supposed to run within three seconds, we can see that (on the hardware used for testing) we have a 1900 employee limit on the solution. After this, our solution no longer works because it takes longer than three seconds.

Luckily, we caught this flaw in our testing, so the poor developer is marched into the office and told to improve the performance of his code. Here is where the second fundamental problem occurs.

Improving a Procedural Solution Yields another Procedural Solution

If you want to make things run faster, you need to do less work. Our developer observes that most of the work appears to be queries to the emp table, so if he can avoid that, he can improve performance. As a first cut, he can merge the two queries that look up average and minimum salaries:

SQL> create or replace
  2  procedure report_sal_adjustment2 is
  3  v_avg_dept_sal emp.sal%type;
  4  v_min_dept_sal emp.sal%type;
  5  v_dname        dept.dname%type;
  6  cursor c_emp_list is
  7    select empno,ename,deptno,sal,hiredate
  8    from emp;
  9 begin
 10   for each_emp in c_emp_list loop

Here we can pick up the department name and minimum salary at the same time we get the average salary.

 11   select avg(emp.sal),min(emp.sal),dept.dname
 12   into v_avg_dept_sal,v_min_dept_sal,v_dname
 13   from  dept,emp
 14   where dept.deptno =each_emp.deptno
 15   and   emp.deptno =dept.deptno
 16   group by dname;

The rest of the code is unchanged.

 17     if abs(each_emp.sal -v_avg_dept_sal )/v_avg_dept_sal >0.20
           then
 18       if v_min_dept_sal =each_emp.sal then
 19         insert into emp_sal_log
 20         values (each_emp.ename,each_emp.hiredate,
 21                 each_emp.sal,v_dname,'Y');
 22       else
 23         insert into emp_sal_log
 24         values (each_emp.ename,each_emp.hiredate,
 25                 each_emp.sal,v_dname,'Y');
 26       end if;
 27     end if;
 28   end loop;
 29 end;
 30 /

Procedure created.

We adjust the REPTEST.SQL script to now call the new report_sal_adjustment2 procedure. Running this through for similar employee and department numbers yields the results shown in Table 1-3.

Table 1-3. Results of Changes to REPORT_SAL_ADJUSTMENT


EMPLOYEES     DEPARTMENTS     ELAPSED TIME
500500.27
10001000.86
15001501.68
20002003.03
25002504.64

We can see that performance has improved by approximately 10 percent, which raises the employee ceiling to around 2000, but that's hardly a massive gain. What if there are 50,000 employees in the organization?

Encouraged by the gains on the first bout of tuning, our developer can explore further alternatives. We have seen that running less SQL against the emp table seems to make things run faster. Perhaps an even more efficient solution will be to pre-fetch all the average department salaries into a memory-based lookup table so no additional SQL will be required. Being a savvy developer who always keeps up with the latest and greatest PL/SQL functionality, he knows that this can be achieved using a collection. With some additional code, we arrive at a new solution.

SQL> create or replace
  2  procedure report_sal_adjustment3 is

Now we need some type definitions to hold an array (or list) of departmental details, namely the average salary, the minimum salary, and the department name. This is done in two sections—a record to hold each row of details, and an array of those records.

  3    type dept_sal_details is record (
  4      avg_dept_sal emp.sal%type,
  5      min_dept_sal emp.sal%type,
  6      dname        dept.dname%type );
  7    type dept_sals is table of dept_sal_details
  8        index by binary_integer;
  9    v_dept_sal dept_sals;
 10    cursor c_emp_list is
 11      select empno,ename,deptno,sal,hiredate
 12      from emp;

We have a new cursor to retrieve the department summary details—this will be the source of the entries we will add to our PL/SQL table.

 13    cursor c_dept_salaries is
 14      select avg(sal)asal,min(sal)msal,dname,dept.deptno
 15      from dept,emp
 16      where emp.deptno =dept.deptno
 17      group by dname,dept.deptno;
 18 begin

As a preliminary step, we collect the departmental summary detail and add it to our PL/SQL table. Because the deptno is numeric, it can also serve as the index for our PL/SQL table.

 19   for i in c_dept_salaries loop
 20     v_dept_sal(i.deptno).avg_dept_sal :=i.asal;
 21     v_dept_sal(i.deptno).min_dept_sal :=i.msal;
 22     v_dept_sal(i.deptno).dname        :=i.dname;
 23   end loop;
 24   for each_emp in c_emp_list loop

In our main processing loop, we no longer need to do any more lookups to the dept table. We simply reference the appropriate information from the PL/SQL table (dept_sal).

 25      if abs(each_emp.sal - 
            v_dept_sal(each_emp.deptno).avg_dept_sal )/
 26         v_dept_sal(each_emp.deptno).avg_dept_sal >0.20 then
 27        if v_dept_sal(each_emp.deptno).min_dept_sal =
              each_emp.sal then
 28          insert into emp_sal_log
 29            values (each_emp.ename,each_emp.hiredate,
 30                    each_emp.sal,v_dept_sal
                       (each_emp.deptno).dname,'Y');
 31         else
 32           insert into emp_sal_log
 33            values (each_emp.ename,each_emp.hiredate,
                       each_emp.sal,
 34                    v_dept_sal(each_emp.deptno).dname,'Y');
 35         end if;
 36       end if;
 37     end loop;
 38   end;
 39 /

Procedure created.

At the cost of a little more code complexity, we have managed to reduce our code to a single pass through the dept table and a single pass through the emp table. Look at the impressive results we get through testing through the REPTEST.SQL script (after adjusting it to run report_sal_adjustment3).

Table 1-4. Further Refinements to REPORT_SAL_ADJUSTMENT


EMPLOYEES     DEPARTMENTS     ELAPSED TIME
500500.03
10001000.05
50005000.24
2500025001.24
5000050002.74

We have improved performance significantly. More importantly, the scalability now appears to be linear as opposed to exponential. However, with this solution, a new problem lurks under the covers. Storing the departmental information obviously consumes some memory. To observe just exactly how much memory is being chewed up, we need to look at the session statistics using the V$MYSTATS view we defined earlier. After running the test with 5,000 departments, let's look at how much memory was consumed by this session.

SQL> col value format 999,999,999
SQL> select *from v$mystats
  2  where name ='session pga memory max'
  3  /

NAME                                               VALUE
----------------------------------------    ------------
session pga memory max                         3,669,024

Our session used nearly 4 megabytes of memory. But we can now handle more than 50,000 employees within our time constraints. Problem solved...well, not really.

A solution exists that is even more efficient and has none of the associated memory overhead. In adding greater levels of complexity to the procedural solution, our developer has been getting further and further from the optimal result. We did not need all that complexity in our PL/SQL—we didn't need PL/SQL at all! The problem can be resolved with SQL using analytic functions. (Note that you will need to use Oracle 9 for this routine to compile as presented. If you have version 8, the insert statement will need to run as dynamic SQL; that is, wrapped within the EXECUTE IMMEDIATE command.)

SQL>create or replace
2 procedure report_sal_adjustment4 is
3 begin
4 insert into emp_sal_log
5 select e.empno,e.hiredate,e.sal,dept.dname,
6 case when sal >avg_sal then 'Y '
7 else 'N '
8 end case
9 from (
10 select empno,hiredate,sal,deptno,
11 avg(sal)over (partition by deptno )as avg_sal,
12 min(sal)over (partition by deptno )as min_sal
13 from emp )e,dept
14 where e.deptno =dept.deptno
15 and abs(e.sal -e.avg_sal)/e.avg_sal >0.20;
16 end;
17 /
Procedure created.

And that's it! Our procedure has been reduced to a single SQL statement. No lookup tables, no complicated code, just simple SQL. And when we benchmark it using the REPTEST.SQL script, we get the following astounding scalability results shown in Table 1-5.

Table 1-5. Optimal REPORT_SAL_ADJUSTMENT


EMPLOYEES     DEPARTMENTS     ELAPSED TIME
500500.01
50005000.08
5000050000.83
100000100001.71

Analytic Functions

If you are unfamiliar with the syntax shown in the report_sal_adjustment4 procedure, you have been missing out on one of Oracle's greatest database achievements. These analytic functions, which have been available since version 8.1.6, increase in functionality and power with each release. Unfortunately, they are not widely known because they were initially proposed primarily for data warehousing queries for such requirements as moving averages and regression statistics. As such, they are documented in the Data Warehousing guide (one of the standard manuals in the Oracle documentation set). Unfortunately, this manual isn't typically consulted except for those Oracle customers who are implementing warehouses. That is a great pity—analytic functions can achieve many remarkable things on any database.


Every major release of Oracle adds more diversity and more complexity to the SQL command set. As a consequence, there will (or should be) more tasks that can be achieved with standard SQL and not PL/SQL. One classic case of this that is worth a specific mention is the processing of results in a join (or view containing a join). We have lost count of the number of times we have seen a PL/SQL module built because the module specification ran along the lines of:

"A view V is based on tables, X, Y, and Z. Update the rows in table X for rows in the view V where (some criteria)."

A PL/SQL module is built because the general consensus among developers is that "you cannot update a join view." This was true way, way back in version 7.1, but for many years now, there is an entire class of views in Oracle that are allowed to be directly updated as though they were tables. Oracle even provides a data dictionary object (DBA_UPDATEABLE_COLUMS) that tells you exactly whether a view or part thereof is a candidate for DML.

Consider the following example module specification:

Increase the bonus by 10 percent for all employees who appear in the view YEARLY_BONUS.

Where this view is defined as

create or replace view YEARLY_BONUS as
select emp.empno,emp.ename,dept.dname,emp.bonus
from EMP,DEPT
where emp.hiredate <sysdate +1
and emp.deptno =dept.deptno

A PL/SQL solution that queries the view and then updates the underlying table is relatively easy to code.

create or replace
procedure XXX is
begin
for i in (select empno from yearly_bonus )loop
    update emp
  set bonus =bonus *1.1
  where empno =i.empno;
end;
/

But the best solution is even more unremarkable:

update yearly_bonus set bonus =bonus *1.1;

Which of course you could simply wrap within a PL/SQL procedure if required. That is all you need! In fact, even if there is not a view defined at all, you can still avoid any complex processing. You could simply code

update (select emp.ename,dept.dname,emp.bonus
        from EMP,DEPT
        where emp.hiredate <sysdate +1
        and emp.deptno =dept.deptno)
set bonus =bonus *1.1;

As long as you satisfy the requirements for updating a join view (See "Modifying a Join View" in the Application Developer fundamentals guide), you do not need, and you should not use PL/SQL.

Choose SQL Innovation Over PL/SQL

The hardest thing about using SQL instead of PL/SQL is convincing yourself that something can be coded in SQL when PL/SQL might seem the natural choice. Here are a few examples where SQL can be used but it does not appear obvious to do so. It is of course by no means the definitive set of ways in which SQL can be used instead of PL/SQL, and you will not find a manual anywhere which tells you when a SQL solution can be used instead of PL/SQL. The key is that you need to think outside the box when confronted with a problem that does not appear to be readily solvable with pure SQL.

Displaying a Calendar

PL/SQL combined with DBMS_OUTPUT.PUT_LINE seems to be the only way to display the calendar for the current month (or any month for that matter) because there isn't any obvious function within SQL to generate it. But with the common technique of using decode to pivot rows into columns, we can in fact generate a calendar purely with SQL. First, we will define a substitution variable mdate that holds the value of the date for which the calendar month will be generated.

SQL> col dte new_value mdate
SQL> select '23-JUN-03'dte from dual;

DTE
---------
23-JUN-03

And now a little SQL can be used to generate the month. We retrieve n rows from our src table where n is the number of days in the month, and then use a little date arithmetic to align the results into the appropriate day column.

SQL> select
  2     max(decode(dow,1,d,null))Sun,
  3     max(decode(dow,2,d,null))Mon,
  4     max(decode(dow,3,d,null))Tue,
  5     max(decode(dow,4,d,null))Wed,
  6     max(decode(dow,5,d,null))Thu,
  7     max(decode(dow,6,d,null))Fri,
  8     max(decode(dow,7,d,null))Sat
  9  from
 10  (select rownum d,
 11          rownum-2+to_number(
 12                 to_char(trunc(
 13                   to_date('&mdate'),'MM'),'D'))p,
 14          to_char(trunc(to_date('&mdate'),'MM')
 15             -1+rownum,'D')dow
 16   from SRC
 17   where rownum <=to_number(to_char(
 18         last_day(to_date('&mdate')),'DD')))
 19 group by trunc(p/7)
 20 /

SUN  MON   TUE    WED   THU   FRI   SAT
---  ----  ----   ----  ----  ----  -----
                                      1
  2    3     4      5     6     7     8
  9   10    11     12    13    14    15
 16   17    18     19    20    21    22
 23   24    25     26    27    28    29
 30

6 rows selected.

Such SQL could then easily be folded away within a view to provide a calendar for any desired month.

Calculating the Median Value

The median has always been a classical problem to solve within a relational database because historically, although databases are fine at returning a rowset in order, they have been poor at processing a rowset in order, which is a prerequisite for calculating the median. A typical algorithm for calculating the median might look like:

  • Determine the count of the rows in the rowset and call it n .
  • If n is odd, fetch rows in ascending order until you get to row [n/2+1], this is the median.
  • If n is even, fetch rows in ascending order until you get to row [n/2].
  • The mean of this row and the next are the median.

However, a quick scan of the data-warehousing guide reveals a far easier solution, once again using an analytic function (this one being new to version 9).

SQL> select percentile_cont(0.5)
  2  within group (order by sal desc )median_sal
  3  from   emp;

MEDIAN_SAL
----------
     1550

More importantly, we still can observe the performance benefits of using SQL instead of PL/SQL when we test against a table larger than the standard emp. We've loaded 1,000,000 rows into the emp table, and then created two PL/SQL procedures—one that implements the median using the conventional algorithm listed earlier in this section, and another with the median calculated using the PERCENTILE_CONT function.

SQL> set timing on
SQL> exec median_using_old_method
6044.5

PL/SQL procedure successfully completed.
Elapsed:00:00:05.05
SQL> exec median_using_percentile
6044.5

PL/SQL procedure successfully completed.
Elapsed:00:00:01.00

On this system, using the percentile function is five times faster when the size of the underlying table is large.

Conditional Table Lookup

A common (but from a design perspective, hideous) occurrence in databases is a single column representing several different attributes, each of which requires lookup to a different reference code table. Such systems are often the result of migration of a file-based database to a relational model. For example, a specification to print salary options might read:

List all persons in the staff table plus their salary calculated as follows:
  • A part-time worker salary is the FIXED_PRICE from the part_time_package table
  • A contractor salary is the HRS *HRLY_RATE from the contract_package table
  • A permanent salary is the ANNUAL_SAL +BONUS from the perm_package table

Within the staff table, the class_type column indicates whether a staff member is a part-time, contract, or permanent employee. Let's set up the required tables to investigate the options for a solution. First we will re-create the emp and dept tables with the sample data that comes with the database software distribution.

SQL> @$ORACLE_HOME/sqlplus/demo/demobld.sql
Building demonstration tables.Please wait.
Demonstration table build is complete.

Then create some additional tables required for the example.

SQL> create table STAFF (
  2  staffno number,
  3  name varchar2(30),
  4  class_type number,
  5  class_id number);

Table created.

SQL> create table part_time_package (
  2  id number,
  3  fixed_price number );

Table created.

SQL> create table contract_package (
  2  id number,
  3  hrly_rate number,
  4  hrs number );

Table created.

SQL> create table perm_package (
  2  id number,
  3  annual_sal number,
  4  bonus number );

Table created.

We will then seed these tables with sample data, once again using the src table to give us an arbitrary number of rows.

SQL> insert into part_time_package
  2  select rownum,rownum*130
  3  from SRC where rownum <6;

5 rows created.

SQL> insert into contract_package
  2  select rownum,rownum*10,rownum*30
  3  from SRC where rownum <6;

5 rows created.

SQL> insert into perm_package
  2  select rownum,rownum*10000,rownum*500
  3  from SRC where rownum <6;

5 rows created.

SQL> insert into staff
  2  select empno,ename,mod(rownum,3)+1,mod(rownum,5)+1 from emp;

14 rows created.

At first glance, a PL/SQL solution seems called for because the determination of which table to look up is driven by the data in another. A PL/SQL solution is presented as follows, the comments within it describe how we look up various tables dependent on the value of the staff member's CLASS_TYPE.

SQL> create or replace
  2  procedure SHOW_SAL is
  3    v_sal number;
  4  begin
  5    for i in (select *from staff )loop
  6    --
  7    --Part time employee so we need to look up
  8    --the PART_TIME_PACKAGE table
  9     --
 10       if i.class_type =1 then
 11         select fixed_price
 12         into   v_sal
 13         from   part_time_package
 14         where  id =i.class_id;
 15     --
 16     --For contractors,we need to look up
 17     --the CONTRACT_PACKAGE table
 18     --
 19        elsif i.class_type =2 then
 20          select hrs *hrly_rate
 21          into   v_sal
 22          from   contract_package
 23          where  id =i.class_id;
 24     --
 25     --For permanent employees,we need to look up
 26     --the PERM_PACKAGE table
 27     --
 28        elsif i.class_type =3 then --permanent
 29          select annual_sal +bonus
 30          into   v_sal
 31          from   perm_package
 32          where  id =i.class_id;
 33        end if;
 34      dbms_output.put_line(rpad(i.name,20)||lpad(v_sal,10));
 35     end loop;
 36   end;
 37 /

Procedure created.

Using this procedure, we can now display the salary options for each staff member, as outlined in the module specification.

SQL> exec show_sal;
ALLEN                    1200
WARD                    31500
MARTIN                    520
BLAKE                    7500
CLARK                   10500
SCOTT                     260
KING                     2700
TURNER                  42000
ADAMS                     650
JAMES                     300
FORD                    21000
MILLER                    390

PL/SQL procedure successfully completed.

A static SQL solution can be built even for this problem—in fact, it quite closely resembles the PL/SQL code and uses the CASE statement and the scalar subquery functionality available in 8i onward.

SQL> select s.name,
  2    case class_type
  3    when 1 then (
  4       select fixed_price
  5       from   part_time_package
  6       where  id =s.class_id )
  7    when 2 then (
  8       select hrs *hrly_rate
  9       from   contract_package
 10       where  id =s.class_id )
 11    when 3 then (
 12       select annual_sal +bonus
 13       from   perm_package
 14       where  id =s.class_id )
 15    end sal
 16 from staff s;

NAME                                  SAL
-------------------------------     -----
ALLEN                                1200
WARD                                31500
MARTIN                                520
BLAKE                                7500
CLARK                               10500
SCOTT                                 260
KING                                 2700
TURNER                              42000
ADAMS                                 650
JAMES                                 300
FORD                                21000
MILLER                                390
New Oracle 10g Features

The importance of staying up to date with the features available is no more apparent than in the many extensions to SQL that are available in version 10g. This raises the bar even higher for what can be achieved with pure SQL. Note in particular the new MODEL clause functionality that yields a new suite of facilities to probe and manipulate a SQL result set.

As a very basic example of what can be achieved with the new MODEL functionality, first we will create some sample data for a fictitious scenario in which we measure the probability that a lab animal produces a favorable result in a range of different experiments.

SQLSQL> create table SAMPLES (
  2     lab_id          varchar2(10),
  3     animal          varchar2(10),
  4     experiment_id   number,
  5     probability     number );

Table created.

SQLSQL> set feedback off
SQLSQL> insert into SAMPLES values ('OXFORD','RATS',1,0.993);
SQLSQL> insert into SAMPLES values ('OXFORD','RATS',2,0.93);
SQLSQL> insert into SAMPLES values ('OXFORD','RATS',3,0.91);
SQLSQL> insert into SAMPLES values ('OXFORD','MICE',1,0.91);
SQLSQL> insert into SAMPLES values ('OXFORD','MICE',2,0.99);
SQLSQL> insert into SAMPLES values ('OXFORD','MICE',3,0.90);
SQL> insert into SAMPLES values ('HARVARD','RATS',1,0.993);
SQL> insert into SAMPLES values ('HARVARD','RATS',2,0.93);
SQL> insert into SAMPLES values ('HARVARD','RATS',3,0.91);
SQL> insert into SAMPLES values ('HARVARD','MICE',1,0.91);
SQL> insert into SAMPLES values ('HARVARD','MICE',2,0.99);
SQL> insert into SAMPLES values ('HARVARD','MICE',3,0.90);
SQL> set feedback on

What is the probability that a given animal returns a favorable result across all tests? We need the product of each probability by lab and animal for experiments 1, 2, and 3. With Oracle 10g, this answer can be discovered with the MODEL clause.

SQL> SELECT lab_id,animal,experiment_id,s
  2  FROM samples
  3  SPREADSHEET PARTITION BY (lab_id)
  4    DIMENSION BY (animal,experiment_id)
  5    MEASURES (probability s)IGNORE nav
  6    (s ['MICE',-1 ]=s ['MICE',1 ]*s ['MICE',2 ]*s ['MICE',3 ],
  7     s ['RATS',-1 ]=s ['RATS',1 ]*s ['RATS',2 ]*s ['RATS',3 ]);

LAB_ID   ANIMAL     EXPERIMENT_ID   S
------   --------   -------------   ---------
OXFORD   RATS                   1       .993
OXFORD   RATS                   2        .93
OXFORD   RATS                   3        .91
OXFORD   MICE                   1        .91
OXFORD   MICE                   2        .99
OXFORD   MICE                   3         .9
HARVARD  RATS                   1       .993
HARVARD  RATS                   2        .93
HARVARD  RATS                   3        .91
HARVARD  MICE                   1        .91
HARVARD  MICE                   2        .99
HARVARD  MICE                   3         .9
OXFORD   RATS                  -1   .8403759
OXFORD   MICE                  -1     .81081
HARVARD  RATS                  -1   .8403759
HARVARD  MICE                  -1     .81081

16 rows selected.

The rows with an experiment ID of .1 show the result required. Although a full discussion of the MODEL facilities in Oracle 10g is outside the scope of this book (and the example demonstrates only a tiny fraction of the flexibility offered by this new functionality), it further reduces the number of areas where PL/SQL is adopted because a SQL solution cannot be found.

Is a PL/SQL Solution Ever Justified?

If a SQL solution exists, is there ever a reason for using a slower PL/SQL equivalent? There is possibly one circumstance when this may be the case. Although processing a large DML operation is going to be more efficient with SQL than its PL/SQL equivalent, it also is an all-or-nothing process. If that SQL operation fails due to an unforeseeable error, the time taken to roll back the change may be unacceptable. (The reason for the emphasis on "unforeseeable" will be revealed shortly). If the DML operation is restartable, that is, it can logically be broken into discrete chunks of work, using PL/SQL can provide some insulation by turning an all-or-nothing operation into a smaller set of piecemeal tasks. Even in these cases, more often than not, the underlying SQL can also be broken up into similar small units of work thus obviating the need for PL/SQL. Possibly what is springing to most readers minds at this point is the common practice of using PL/SQL to avoid consuming excessive undo by issuing regular commits within a processing loop, but this is most definitely not a justification for using PL/SQL over SQL. Failing due to insufficient undo (or temporary) space is not an unforeseeable error and is catered to by either using the resumable transaction features available in version 9, or even better, configuring the appropriate amount of undo/temporary space for the task.

To summarize, if you are responsible for developing PL/SQL programs, always take the time to check out the latest SQL features in your Oracle version before leaping into PL/SQL code. A SQL solution will typically be faster and simpler to understand and maintain than a PL/SQL alternative. Remember (just like the PL/SQL manual says) it is an extension to SQL, not an alternative.

Conclusion

What we've tried to stress throughout this chapter is that building PL/SQL applications that are efficient is not difficult to do. This is indeed one of the great beauties of PL/SQL—the coding actions that are typically simplest are usually also the optimal way to code your applications.

It is only when you try to fight against these native features that PL/SQL goes bad. The number of occasions that you need to take a convoluted or complex approach to delivering functionality within PL/SQL is far lower than you might think, especially if you stay informed and up to date on the PL/SQL features that become available with each new release of the database.

Whichever PL/SQL feature you are taking advantage of, it is vital that you test and re-test the efficiency of anything you build. Being able to prove the performance and scalability of a PL/SQL program is just as important as the performance and scalability itself. There are many programming environments around that make it very difficult to track where efficiency problems are, and developers are forced to build their own sophisticated instrumentation facilities. With the simple timing and tracing facilities we have taken advantage of within this chapter, however, and the more advanced debugging APIs that are used by various third party PL/SQL development tools, there is no excuse for not taking to time to ensure that your PL/SQL code is as efficient as possible.

Endnotes

1. I would like to acknowledge OakTable member, Jonathan Lewis, for providing the motivation and background for this example.

2. For details about how to use the tracing facilities and the trace formatter tool Tkprof, consult Chapter 10 of the Performance Tuning Guide and Reference in the standard Oracle documentation set.

3. We've retained the use of the src table in upcoming sections for those readers using version 8 (who will not be able to take advantage of pipeline functions).


This sample chapter is from:
Mastering Oracle PL/SQL: Practical Solutions
Authors: Christoper Beck, Joel Kallman, Chaim Katz, David C. Knox, and Connor McDonald
ISBN: 1-59059-217-4
Published by: Apress

Reprinted with permission

Sitemap | Contact Us

Thanks for your registration, follow us on our social networks to keep up-to-date