DatabaseProgramming the Perl DBI

Programming the Perl DBI

This is Chapter 4: Programming the Perl DBI from the book Perl DBI (ISBN:1-56592-699-4), written by Alligator Descartes & Tim Bunce, published by O’Reilly & Associates.

© Copyright O’Reilly & Associates. All rights reserved.


Chapter 4: Programming with DBI

In this chapter, we’ll discuss in detail the actual programming nterface defined by the DBI module. We’ll start with the very architecture of DBI, continue with explaining how to use the handles that DBI provides to interact with databases, then cover simple tasks such as connecting and disconnecting from databases. Finally, we’ll discuss the important topic of error handling and describe some of the DBI’s utility methods and functions. Future chapters will discuss how to manipulate data within your databases, as well as other advanced functionality.

DBI Architecture

The DBI architecture is split into two main groups of software: the DBI itself, and the drivers. The DBI defines the actual DBI programming interface, routes method calls to the appropriate drivers, and provides various support services to them. Specific drivers are implemented for each different type of database and actually perform the operations on the databases. Figure 4-1 illustrates this architecture.





Figure 4-1. The DBI architecture

Therefore, if you are authoring software using the DBI programming interface, the method you use is defined within the DBI module. From there, the DBI module works out which driver should handle the execution of the method and passes the method to the appropriate driver for actual execution. This is more obvious when you recognize that the DBI module does not perform any database work itself, nor does it even know about any types of databases whatsoever. Figure 4-2 shows the flow of data from a Perl script through to the database.




Figure 4-2. Data flow through DBI

Under this architecture, it is relatively straightforward to implement a driver for any type of database. All that is required is to implement the methods defined in the DBI specification,[1] as supported by the DBI module, in a way that is meaningful for that database. The data returned from this module is passed back into the DBI module, and from there it is returned to the Perl program. All the information that passes between the DBI and its drivers is standard Perl datatypes, thereby preserving the isolation of the DBI module from any knowledge of databases.

The separation of the drivers from the DBI itself makes the DBI a powerful programming interface that can be extended to support almost any database available today. Drivers currently exist for many popular databases including Oracle, Informix, mSQL, MySQL, Ingres, Sybase, DB2, Empress, SearchServer, and PostgreSQL. There are even drivers for XBase and CSV files.

These drivers can be used interchangeably with little modification to your programs. Couple this database-level portability with the portability of Perl scripts across multiple operating systems, and you truly have a rapid application development tool worthy of notice.

Drivers are also called database drivers, or DBDs, after the namespace in which they are declared. For example, Oracle uses DBD::Oracle, Informix uses DBD::Informix, and so on. A useful tip in remembering the DBI architecture is that DBI can stand for DataBase Independent and DBD can stand for DataBase Dependent.

Because DBI uses Perl’s object-orientation features, it is extremely simple to initialize DBI for use within your programs. This can be achieved by adding the line:

use DBI;

to the top of your programs. This line locates and loads the core DBI module. Individual database driver modules are loaded as required, and should generally not be explicitly loaded.

Handles

The DBI defines three main types of objects that you may use to interact with databases. These objects are known as handles. There are handles for drivers, which the DBI uses to create handles for database connections, which, in turn, can be used to create handles for individual database commands, known as statements. Figure 4-3 illustrates the overall structure of the way in which handles are related, and their meanings are described in the following sections.



Figure 4-3. DBI handles

Driver Handles

Driver handles represent loaded drivers and are created when the driver is loaded and initialized by the DBI. There is exactly one driver handle per loaded driver. Initially, the driver handle is the only contact the DBI has with the driver, and at this stage, no contact has been made with any database through that driver.

The only two significant methods available through the driver handle are data_sources(), to enumerate what can be connected to, and connect(), to actually make a connection. These methods are more commonly invoked as DBI class methods, however, which we will discuss in more detail later in this chapter.

Since a driver handle completely encapsulates a driver, there’s no reason why multiple drivers can’t be simultaneously loaded. This is part of what makes the DBI such a powerful interface.

For example, if a programmer is tasked with the job of transferring data from an Oracle database to an Informix database, it is possible to write a single DBI program that connects simultaneously to both databases and simply passes the data backwards and forwards as needed. In this case, two driver handles would be created, one for Oracle and one for Informix. No problems arise from this situation, since each driver handle is a completely separate Perl object.

Within the DBI specification, a driver handle is usually referred to as $drh.

Driver handles should not normally be referenced within your programs. The actual instantiation of driver handles happens “under the hood” of DBI, typically when DBI->connect() is called.

Database Handles

Database handles are the first step towards actually doing work with the database, in that they encapsulate a single connection to a particular database. Prior to executing SQL statements within a database, we must actually connect to the database. This is usually achieved through the DBI’s connect() method:

$dbh = DBI->connect( $data_source, … );

The majority of databases nowadays tend to operate in a multiuser mode, allowing many simultaneous connections, and database handles are designed accordingly. An example might be if you wanted to write a stock-monitoring program that simultaneously monitored data in tables within different user accounts in the database. A DBI script could make multiple connections to the database, one for each user account, and execute SQL statements on each. Database handles are completely encapsulated objects, meaning that transactions from one database handle cannot “cross-over” or “leak” into another.

Database handles are children of their corresponding driver handle, which supports the notion that we could also make multiple simultaneous connections to multiple database types, as well as multiple simultaneous connections to databases of the same type. For example, a more complicated DBI script could make two connections to each of an Oracle and an Informix database to perform the above-mentioned monitoring. Figure 4-3, shown earlier, illustrates the capability of having multiple database handles connecting through a driver handle to an Oracle database.

Keep in mind that had the monitoring program been written in C, two copies of code would be required, one for Oracle’s programming interface and one for Informix’s. DBI levels the playing field.

Within the DBI specification and sample code, database handles are usually referred to as $dbh.

Statement Handles

Statement handles are the final type of object that DBI defines for database interaction and manipulation. These handles actually encapsulate individual SQL statements to be executed within the database.

Statement handles are children of their corresponding database handle. Since statement handles are objects in their own right, data within one statement is protected from tampering or modification by other statement handles.

For a given database handle, there is no practical limit to the number of statement handles that can be created and executed.[2] Multiple statements can be created and executed within one script, and the data can be processed as it returns. A good example of this might be a data-mining robot that connects to a database, then executes a large number of queries that return all sorts of different types of information. Instead of attempting to write convoluted SQL to correlate the information within the database, the Perl script fetches all the data being returned from the many statements and performs analysis there, using the fully featured text and data manipulation routines that Perl has to offer.

Within the DBI specification and sample code, statement handles are generally referred to as $sth.

Data Source Names

When connecting to a database via the DBI, you need to tell the DBI where to find the database to connect to. For example, the database driver might require a database name, or a physical machine name upon which the database resides. This information is termed a data source name, and of all the aspects of DBI, this is possibly the most difficult to standardize due to the sheer number and diversity of connection syntaxes.

The DBI requires the data source name to start with the characters dbi:, much like a URL begins with http:, and then the name of the driver, followed by another colon—for example, dbi:Oracle:. Any text that follows is passed to the driver’s own connect() method to interpret as it sees fit. Most drivers expect either a simple database name or, more often, a set of one or more name/value pairs separated with semicolons. Some common examples are listed later in this section.

For example, mSQL requires the hostname, database name, and potentially, the TCP/IP port number for connecting to the database server. However, Oracle may require only a single word that is an alias to a more complicated connection identifier that is stored in separate Oracle configuration files.

DBI offers two useful methods for querying which data sources are available to you for each driver you have installed on your system.

Firstly, you can get a list of all the available drivers installed on your machine by using the DBI->available_drivers() method. This returns a list with each element containing the data source prefix of an installed driver,[3] such as dbi:Informix:.

Secondly, you can invoke the DBI->data_sources() method against one or more of the drivers returned by the DBI->available_drivers() method to enumerate which data sources are known to the driver.[4] Calling the data_sources() method will actually load the specified driver and validate that it is completely and correctly installed. Because DBI dies if it can’t load and initialize a driver, this method should be called inside an eval{} block if you need to catch that error.

The following script lists all the drivers and data sources for each driver on your system:


#!/usr/bin/perl -w
#
# ch04/listdsns: Enumerates all data sources and all
# installed drivers
#
use DBI;

### Probe DBI for the installed drivers
my @drivers = DBI->available_drivers();

die “No drivers found!n” unless @drivers; # should never happen
 
### Iterate through the drivers and list the data sources for
### each one
foreach my $driver ( @drivers ) {
print “Driver: $drivern”;
my @dataSources = DBI->data_sources( $driver );
foreach my $dataSource ( @dataSources ) {
print “tData Source is $dataSourcen”;
}
print “n”;
}

exit;

The output from this script on my machine looks like:


Driver: ADO

Driver: CSV
Data source is DBI:CSV:f_dir=megaliths
Data source is DBI:CSV:f_dir=pictish_stones

Driver: ExampleP
Data Source is dbi:ExampleP:dir=.

Driver: File
Data Source is DBI:File:f_dir=megaliths
Data Source is DBI:File:f_dir=pictish_stones

Driver: ODBC

Driver: Proxy

Driver: XBase
Data Source is dbi:XBase:.

which tells us that we have the standard drivers DBD::Proxy, DBD::ADO, DBD::File, and DBD::ExampleP installed, as well as DBD::ODBC, DBD::XBase, and DBD::CSV.

While this may be interesting in theory, in practice you rarely need to use these methods. Most applications are written to use one data source name, either hardcoded into the application or passed in as a parameter in some way.

When specifying a data source name for a database, the text following the driver prefix should be of the form that is appropriate for the particular database that you wish to connect to. This is very database-specific, but the following table shows some examples.[5]







Database

Example Connection Syntax

mSQL

dbi:mSQL:hostname:database:port_number

For example, to connect to a database called archaeo located on a machine called fowliswester.arcana.co.uk running on port number 1114, the following $data_source argument would be used:


dbi:mSQL:fowliswester.arcana.co.uk:archaeo:1114

Oracle

dbi:Oracle:connection_descriptor

Oracle has a slightly less cut-and-dried way of specifying connection identifiers due to the many different ways in which the Oracle database software can actually handle connections.


To break this nightmarish topic down into bite-sized chunks, Oracle may use two different types of connection. For local connections, Oracle uses a single item of information as the connection descriptor, either the name of the database or an alias to the database as specified in the Oracle configuration files. For a network-based connection, Oracle usually needs to know the alias of the connection descriptor as specified in the Oracle configuration files, or, if you are feeling suitably masochistic, you can specify the whole connection descriptor … but, believe me, it isn’t pretty.

For example, a simple Oracle $data_source value might be:


dbi:Oracle:archaeo

CSV

dbi:CSV:f_dir=/datafiles

The DBD::CSV module treats a group of comma-separated value files in a common directory as a database. The data source for this driver can contain a parameter f_dir that specifies the directory in which the files are located.

In the case of the $data_source argument, an empty or undefined value will result in the environment variable DBI_DSN being checked for a valid value. If this environment variable is not defined, or does not contain a valid value, the DBI will call die().

Connection and Disconnection

The main activity in database programming usually involves the execution of SQL statements within a database. However, to accomplish this task, a connection to a database must be established first. Furthermore, after all the work has been done, it is good manners to disconnect from the database to free up both your local machine resources and, more importantly, valuable database resources.

Connection

In the case of simple databases, such as flat-file or Berkeley DB files, “connecting” is usually as simple as opening the files for reading or using the tie mechanism. However, in larger database systems, connecting may be considerably more complicated.

A relatively simple RDBMS is mSQL, which has a simple method of connection: to connect, a program connects to a TCP/IP port on the computer running the database. This establishes a live connection within the database. However, more complex systems, such as Oracle, have a lot more internal security and housekeeping work that must be performed at connection time. They also have more data that needs to be specified by the program, such as the username and password that you wish to connect with.

By looking at a broad spectrum of database systems, the information required to connect can be boiled down to:


  1. The data source name, a string containing information specifying the driver to use, what database you wish to connect to, and possibly its whereabouts. This argument takes the format discussed in the previous section and is highly database-specific.


  2. The username that you wish to connect to the database as. To elaborate on the concept of usernames a little further, some databases partition the database into separate areas, called schemas, in which different users may create tables and manipulate data. Users cannot affect tables and data created by other users. This setup is similar to accounts on a multiuser computer system, in that users may create their own files, which can be manipulated by them, but not necessarily by other users. In fact, users may decide to disallow all access to their files, or tables, from all other users, or allow access to a select group or all users.[6]

    Most major database systems enforce a similar security policy, usually with an administrator having access to an account that allows them to read, modify, and delete any user’s tables and data. All other users must connect as themselves. On these systems, your database username may be the same as your system login username, but it doesn’t have to be.

    More minimal database systems may not have any concept of username-based authentication, but you still need to supply the username and password arguments, typically as empty strings.

  3. The password associated with the supplied username.

In light of these common arguments, the syntax for connecting to databases using DBI is to use the connect() call, defined as follows:

$dbh = DBI->connect( $data_source, $username, $password, %attr );

The final argument, %attr, is optional and may be omitted. %attr is a reference to a hash that contains handle attributes to be applied to this connection. One of the most important items of the information supplied in this hash is whether or not automatic error handling should be supplied by DBI. We will discuss this in further detail in the following section, but the two common attributes are called RaiseError and PrintError, which cause the DBI to die or print a warning automatically when a database error is detected.

This method, when invoked, returns a database handle if the connection has been successfully made to the database. Upon failure, the value undef is returned.

To illustrate the DBI->connect() method, assume that we have an Oracle database called archaeo. To connect to this database, we might use the following code:


#!/usr/bin/perl -w
#
# ch04/connect/ex1: Connects to an Oracle database.

use DBI; # Load the DBI module

### Perform the connection using the Oracle driver
my $dbh = DBI->connect( “dbi:Oracle:archaeo”, “username”,
“password” )
or die “Can’t connect to Oracle database: $DBI::errstrn”;

exit;

This simple example illustrates the use of the DBI->connect() method to make one connection to the database. We also perform error checking on the call to ensure that the connection occurs; upon failure, the error message will be printed along with the database-specific reason for the failure, which will be contained within the variable $DBI::errstr.[7]

A more complicated example might be to connect twice to the same database from within the one script:


#!/usr/bin/perl -w
#
# ch04/connect/ex2: Connects to two Oracle databases
# simultaneously with identical arguments.
# This is to illustrate that all database
# handles, even if identical argument-wise,
# are completely separate from one another.

use DBI; # Load the DBI module

### Perform the connection using the Oracle driver
my $dbh1 = DBI->connect( “dbi:Oracle:archaeo”, “username”,
“password” )
or die “Can’t make 1st database connect: $DBI::errstrn”;

my $dbh2 = DBI->connect( “dbi:Oracle:archaeo”, “username”,
“password” )
or die “Can’t make 2nd database connect: $DBI::errstrn”;

exit;

or to connect simultaneously to two different databases. For example:


#!/usr/bin/perl -w
#
# ch04/connect/ex3: Connects to two Oracle databases
# simultaneously.

use DBI; # Load the DBI module

### Perform the connection using the Oracle driver
my $dbh1 = DBI->connect( “dbi:Oracle:archaeo”, “username”,
“password” )
or die “Can’t connect to 1st Oracle database: $DBI::errstrn”;

my $dbh2 = DBI->connect( “dbi:Oracle:seconddb”, “username”,
“password” )
or die “Can’t connect to 2nd Oracle database: $DBI::errstrn”;

exit;

This former example is quite interesting, because even though we have used identical arguments to DBI->connect(), the two database handles created are completely separate and do not share any information.

A final example of using DBI->connect() is to connect to two different databases (one Oracle, one mSQL) within the same script. In this case, DBI’s automatic error reporting mechanism will be disabled in the mSQL database by passing an attribute hash to the connect() call, as shown here:


#!/usr/bin/perl -w
#
# ch04/connect/ex4: Connects to two database, one Oracle, one mSQL
# simultaneously. The mSQL database handle has
# auto-error-reporting disabled.

use DBI; # Load the DBI module

### Perform the connection using the Oracle driver
my $dbh1 = DBI->connect( “dbi:Oracle:archaeo”, “username”,
“password” )
or die “Can’t connect to Oracle database: $DBI::errstrn”;

my $dbh2 = DBI->connect( “dbi:mSQL:seconddb”, “username”,
“password” , {
PrintError => 0
} )
or die “Can’t connect to mSQL database: $DBI::errstrn”;

exit;

The $username and $password arguments should be specified but may be empty ( ) if not required. As discussed previously, the $data_source argument can also be undefined and the value of the environment variable DBI_DSN will be used instead, if it has been set.

Disconnection

Explicit disconnection from the database is not strictly necessary if you are exiting from your program after you have performed all the work, but it is a good idea. We strongly recommend that you get into the habit of disconnecting explicitly.

DBI provides a method through which programmers may disconnect a given database handle from its database. This is good practice, especially in programs in which you have performed multiple connections or will be carrying out multiple sequential connections.

The method for performing disconnections is:

$rc = $dbh->disconnect();

According to this definition, disconnect() is invoked against a specific database handle. This preserves the notion that database handles are completely discrete. With multiple database handles active at any given time, each one must explictly be disconnected.

An example of using disconnect() might look like:


#!/usr/bin/perl -w
#
# ch04/disconnect/ex1: Connects to an Oracle database
# with auto-error-reporting disabled
# then performs an explicit disconnection.

use DBI; # Load the DBI module

### Perform the connection using the Oracle driver
my $dbh = DBI->connect( “dbi:Oracle:archaeo”, “username”,
“password” , {
PrintError => 0
} )
or die “Can’t connect to Oracle database: $DBI::errstrn”;

### Now, disconnect from the database
$dbh->disconnect
or warn “Disconnection failed: $DBI::errstrn”;

exit;

Upon successful disconnection, the return value will be true. Otherwise, it will be false. In practice, failure to disconnect usually means that the connection has already been lost for some reason. After disconnecting the database handle can’t be used for anything worthwhile.

What happens if you don’t explicitly disconnect? Since DBI handles are references to Perl objects, Perl’s own garbage collector will move in and sweep up any object trash you leave lying around. It does that by calling the object’s DESTROY method when there are no longer any references to the object held by your script, or when Perl is exiting.

The DESTROY method for a database handle will call disconnect() for you, if you’ve left the handle connected, in order to disconnect cleanly from the database. But it will complain about having to do so by issuing a warning:

Database handle destroyed without explicit disconnect.

A major caveat with the disconnect() method regards its behavior towards automatically committing transactions at disconnection. For example, if a program has updated data but has not called commit() or rollback() before calling disconnect(), the action taken by different database systems varies. Oracle will automatically commit the modifications, whereas Informix may not. To deal with this, the DESTROY method has to call rollback() before disconnect() if AutoCommit is not enabled. In Chapter 6, we’ll discuss the effect of disconnect() and DESTROY on transactions in more detail.

Error Handling

The handling of errors within programs, or the lack thereof, is one of the more common causes of questions concerning programming with DBI. Someone will ask “Why doesn’t my program work?” and the answer generally runs along the lines of “Why aren’t you performing error checking?” Sure enough, nine out of ten times when error checking is added, the exact error message appears and the cause for error is obvious.

Automatic Versus Manual Error Checking

Early versions of the DBI required programmers to perform their own error checking, in a traditional way similar to the examples listed earlier for connecting to a database. Each method that returned some sort of status indicator as to its success or failure should have been followed by an error condition checking statement. This is an excellent, slightly C-esque way of programming, but it quickly gets to be tiresome, and the temptation to skip the error checking grows.

The DBI now has a far more straightforward error-handling capability in the style of exceptions. That is, when DBI internally detects that an error has occurred after a DBI method call, it can automatically either warn() or die() with an appropriate message. This shifts the onus of error checking away from the programmer and onto DBI itself, which does the job in the reliable and tireless way that you’d expect.

Manual error checking still has a place in some applications where failures are expected and common. For example, should a database connection attempt fail, your program can detect the error, sleep for five minutes, and automatically re-attempt a connection. With automatic error checking, your program will exit, telling you only that the connection attempt failed.

DBI allows mixing and matching of error-checking styles by allowing you to selectively enable and disable automatic error checking on a per-handle basis.

Manual error checking

Of course, the DBI still allows you to manually error check your programs and the execution of DBI methods. This form of error checking is more akin to classic C and Perl programming, where each important statement is checked to ensure that it has executed successfully, allowing the program to take evasive action upon failure.

DBI, by default, performs basic automatic error reporting for you by enabling the PrintError attribute. To disable this feature, simply set the value to 0 either via the handle itself after instantiation, or, in the case of database handles, via the attribute hash of the connect( ) method.

For example:


### Attributes to pass to DBI->connect( )
%attr = (
PrintError => 0,
RaiseError => 0
);

### Connect…
my $dbh = DBI->connect( “dbi:Oracle:archaeo”, “username”,
“password” , %attr );

### Re-enable warning-level automatic error reporting…
$dbh->{PrintError} = 1;

Most DBI methods will return a false status value, usually undef, when execution fails. This is easily tested by Perl in the following way:


### Try connecting to a database
my $dbh = DBI->connect( … )
or die “Can’t connect to database: $DBI::errstr!”;

The following program disables automatic error handling, with our own tests to check for errors. This example also moves the attributes into the connect( ) method call itself, a clean style that’s commonly used:


#!/usr/bin/perl -w
#
# ch04/error/ex1: Small example using manual error checking.

use DBI; # Load the DBI module

### Perform the connection using the Oracle driver
my $dbh = DBI->connect( undef, “stones”, “stones”, {
PrintError => 0,
RaiseError => 0
} ) or die “Can’t connect to the database: $DBI::errstrn”;

### Prepare a SQL statement for execution
my $sth = $dbh->prepare( “SELECT * FROM megaliths” )
or die “Can’t prepare SQL statement: $DBI::errstrn”;

### Execute the statement in the database
$sth->execute
or die “Can’t execute SQL statement: $DBI::errstrn”;

### Retrieve the returned rows of data
my @row;
while ( @row = $sth->fetchrow_array( ) ) {
print “Row: @rown”;
}
warn “Data fetching terminated early by error: $DBI::errstrn”
if $DBI::err;

### Disconnect from the database
$dbh->disconnect
or warn “Error disconnecting: $DBI::errstrn”;

exit;

As can be seen from the example, the code to check the errors that may have arisen in a DBI method is actually longer than the code to perform the operations themselves. Similarly, it is entirely possible that you may just genuinely forget to add a check after a statement, which may result in extremely bizarre program execution and error reporting, not to mention hours of wasted debugging time!

Automatic error checking

The automatic error checking capabilities of the DBI operates on two levels. The PrintError handle attribute tells DBI to call the Perl warn( ) function (which typically results in errors being printed to the screen when encountered) and the RaiseError handle attribute (which tells DBI to call the Perl die( ) function upon error, typically causing the script to immediately abort).

Because the standard Perl functions of warn( ) and die( ) are used, you can change the effects of PrintError and RaiseError with the $SIG{_ _WARN_ _} and $SIG{_ _DIE_ _} signal handlers. Similarly, a die( ) from RaiseError can be caught via eval { … }.

These different levels of automatic error checking can be turned on for any handle, although database handles are usually the most common and useful. To enable the style of automatic error checking you want, you may set the value of either of the following two attributes:


$h->{PrintError} = 1;
$h->{RaiseError} = 1;

Similarly, to disable automatic error checking, simply set the value of these attributes to 0.

If both RaiseError and PrintError are enabled, an error will cause warn( ) and die( ) to be executed sequentially. If no $SIG{_ _DIE_ _} handle has been defined, warn( ) is skipped to avoid the error message being printed twice.[8]

A more common way in which these attributes are used is to specify them in the optional attribute hash supplied to DBI->connect( ) when connecting to a database. Automatic error checking is the recommended style in which to write DBI code, so PrintError is enabled by default in DBI->connect( ). You can think of this as training wheels for novices and grease for quick-and-dirty script writers. Authors of more significant works usually either enable RaiseError or disable PrintError and do their own error checking.

The following short example illustrates the use of RaiseError instead of manual error checking is:


#!/usr/bin/perl -w
#
# ch04/error/ex2: Small example using automatic error handling
# with RaiseError, i.e., the program will
# abort upon detection of any errors.

use DBI; # Load the DBI module

my ($dbh, $sth, @row);

### Perform the connection using the Oracle driver
$dbh = DBI->connect( “dbi:Oracle:archaeo”, “username”,
“password” , {
PrintError => 0, ### Don’t report errors via warn( )
RaiseError => 1 ### Do report errors via die( )
} );
### Prepare a SQL statement for execution
$sth = $dbh->prepare( “SELECT * FROM megaliths” );

### Execute the statement in the database
$sth->execute( );

### Retrieve the returned rows of data
while ( @row = $sth->fetchrow_array( ) ) {
print “Row: @rown”;
}

### Disconnect from the database
$dbh->disconnect( );

exit;

This example is both shorter and more readable than the manual error checking shown in a following example. The actual program logic is clearer. The most obvious additional benefit is that we can forget to handle error checking manually after a DBI operation, since the DBI will check for errors for us.

Mixed error checking

You can mix error checking styles within a single program, since automatic error checking can be easily enabled and disabled on a per-handle basis. There are plenty of occasions where mixed error checking is useful. For example, you might have a program that runs continuously, such as one that polls a database for recently added stock market quotes every couple of minutes.

Disaster occurs! The database crashes! The ideal situation here is that the next time the program tries connecting to the database and fails, it’ll wait a few minutes before retrying rather than aborting the program altogether. Once we’ve connected to the database, the error checking should now simply warn when a statement fails and not die.

This mixed style of error checking can be broken down into two areas: manual error checking for the DBI->connect( ) call, and automatic error checking via PrintError for all other statements. This is illustrated in the following example program:


#!/usr/bin/perl -w
#
# ch04/error/mixed1: Example showing mixed error checking modes.

use DBI; # Load the DBI module

### Attributes to pass to DBI->connect( ) to disable
### automatic error checking
my %attr = (
PrintError => 0,
RaiseError => 0,
);

### The program runs forever and ever and ever and ever …
while ( 1 ) {
my $dbh;

### Attempt to connect to the database. If the connection
### fails, sleep and retry until it succeeds …
until (
$dbh = DBI->connect( “dbi:Oracle:archaeo”,
“username”, “password” , %attr )
) {
warn “Can’t connect: $DBI::errstr. Pausing before
retrying.n”;
sleep( 5 * 60 );
}

eval { ### Catch _any_ kind of failures from the
### code within

### Enable auto-error checking on the database handle
$dbh->{RaiseError} = 1;

### Prepare a SQL statement for execution
my $sth = $dbh->prepare(
“SELECT stock, value FROM current_values” );

while (1) {

### Execute the statement in the database
$sth->execute( );

### Retrieve the returned rows of data
while ( my @row = $sth->fetchrow_array( ) ) {
print “Row: @rown”;
}

### Pause for the stock market values to move
sleep 60;
}

};
warn “Monitoring aborted by error: $@n” if $@;

### Short sleep here to avoid thrashing the database
sleep 5;
}

exit;

This program demonstrates that with DBI, you can easily write explicit error checking and recovery code alongside automatic error checking.

Error Diagnostics

The ability to trap errors within the DBI is very useful, with either manual or automatic error checking, but this information is only marginally useful on its own. To be truly useful, it is necessary to discern exactly what the error was in order to track it down and debug it.

To this end, DBI defines several error diagnostic methods that can be invoked against any valid handle, driver, database, or statement. These methods will inform the programmer of the error code and report the verbose information from the last DBI method called. These are:


$rv = $h->err();
$str = $h->errstr();
$str = $h->state();

These various methods return the following items of information that can be used for more accurate debugging of errors:


  • $h->err() returns the error number that is associated with the current error flagged against the handle $h. The values returned will be completely dependent on the values produced by the underlying database system. Some systems may not support particularly meaningful information; for example, mSQL errors always have the error number of -1. Oracle is slightly more helpful: a connection failure may flag an ORA-12154 error message upon connection failure, which would return the value of 12154 by invoking $h->err(). Although this value is usually a number, you should not rely on that.

  • $h->errstr() is a slightly more useful method, in that it returns a string containing a description of the error, as provided by the underlying database. This string should correspond to the error number returned in $h->err().

    For example, mSQL returns -1 as the error number for all errors, which is not particularly useful. However, invoking $h->errstr() provides far more useful information. In the case of connection failure, the error:

    ERROR : Can’t connect to local MSQL server

    might be generated and returned by $h->errstr(). Under Oracle, a connection failure returning the error number of 12154 will return the following string as its descriptive error message:

    ORA-12154: TNS:could not resolve service name
    (DBD ERROR: OCIServerAttach)

  • $h->state() returns a string in the format of the standard SQLSTATE five-character error string. Many drivers do not fully support this method, and upon invoking it to discern the SQLSTATE code, the value:

    S1000

    will be returned. The specific general success code 00000 is translated to 0, so that if no error has been flagged, this method will return a false value.

The error information for a handle is reset by the DBI before most DBI method calls. Therefore, it’s important to check for errors from one method call before calling the next method on the same handle. If you need to refer to error information later you’ll need to save it somewhere else yourself.

A rewriting of the previous example to illustrate using the specific handle methods to report on errors can be seen in the following code:


#!/usr/bin/perl -w
#
# ch04/error/ex3: Small example using manual error checking which
# also uses handle-specific methods for reporting
# on the errors.

use DBI; # Load the DBI module

### Attributes to pass to DBI->connect() to disable automatic
### error checking
my %attr = (
PrintError => 0,
RaiseError => 0,
);

### Perform the connection using the Oracle driver
my $dbh = DBI->connect( “dbi:Oracle:archaeo”, “username”,
“password” , %attr )
or die “Can’t connect to database: “, $DBI::errstr, “n”;

### Prepare a SQL statement for execution
my $sth = $dbh->prepare( “SELECT * FROM megaliths” )
or die “Can’t prepare SQL statement: “,
$dbh->errstr(), “n”;

### Execute the statement in the database
$sth->execute
or die “Can’t execute SQL statement: “, $sth->errstr(),
“n”;

### Retrieve the returned rows of data
while ( my @row = $sth->fetchrow_array() ) {
print “Row: @rown”;
}
warn “Problem in fetchrow_array(): “, $sth->errstr(), “n”
if $sth->err();

### Disconnect from the database
$dbh->disconnect
or warn “Failed to disconnect: “, $dbh->errstr(), “n”;

exit;

As you can see, it’s even more long-winded than using the $DBI::errstr variable, which can at least be interpolated directly into the error messages.

In addition to these three methods, which allow finely grained error checking at a handle level, there are three corresponding variables that will contain the same information, but at a DBI class level:


$DBI::err
$DBI::errstr
$DBI::state

Use of these variables is essentially the same as that of $h->err() and friends, but the values referred to are for the last handle used within DBI. They are particularly handy for interpolating into strings for error messages.

Since these variables are associated with the last handle used within the DBI, they have an even shorter lifespan than the handle error methods, and should be used only immediately after the method call that failed. Otherwise, it is highly likely they will contain misleading error information.

The one case where the variables are very useful is for connection errors. When these errors occur, there’s no new handle returned in which to hold error information. Since scripts don’t use the internal driver handles, the $DBI::errstr variable provides a very simple and effective way to get the error message from a connect() failure.

In summary, for most applications, automatic error checking using RaiseError and/or PrintError is recommended. Otherwise, manual checking can be used and $DBI::errstr can easily be interpolated into messages. The handle methods are available for more complex applications.

Utility Methods and Functions

To round off our basic introduction to DBI, we’ll tell you about some useful utility methods and functions that will make your life that little bit easier. These include the very useful quote escaping method, DBI execution tracing, and various functions to tidy up your data.

Database-Specific Quote Handling

By far the most important utility method is quote(), which correctly quotes and escapes SQL statements in a way that is suitable for a given database engine. This feature is important if you have a Perl string that you wish to insert into a database, as the data will be required, in most cases, to have quotation marks around it.

To confuse matters, database engines tend to have a different format for specifying these surrounding quotation marks. DBI circumvents this problem by declaring the quote() method to be executed against a database handle, which ensures that the correct quotation rules are applied.

This method, when executed against a database handle, converts the string given as an argument according to defined rules, and returns the correctly escaped string for use against the database.

For example:


#!/usr/bin/perl -w
#
# ch04/util/quote1: Demonstrates the use of the
# $dbh->quote() method

use DBI;

### The string to quote
my $string = “Don’t view in monochrome (it looks ‘fuzzy’)!”;

### Connect to the database
my $dbh = DBI->connect( “dbi:Oracle:archaeo”, “username”,
“password” , {
RaiseError => 1
} );

### Escape the string quotes …
my $quotedString = $dbh->quote( $string );

### Use quoted string as a string literal in a SQL statement
my $sth = $dbh->prepare( ”
SELECT *
FROM media
WHERE description = $quotedString
” );
$sth->execute();

exit;

For example, if you quoted the Perl string of Do it! via an Oracle database handle, you would be returned the value of ‘Do it!’. However, the quote() method also takes care of cases such as Don’t do it! which needs to be translated to ‘Don”t do it!’ for most databases. The simplistic addition of surrounding quotes would have produced ‘Don’t do it!’ which is not a valid SQL string literal.

Some databases require a more complex quote() method, and some drivers (though not all) have a quote() method that can cope with multiline strings and even binary data.

As a special case, if the argument is undef, the quote() method returns the string NULL, without quotes. This corresponds to the DBI’s use of undef to represent NULL values, and to how NULL values are used in SQL.

Tracing DBI Execution

DBI sports an extremely useful ability to generate runtime tracing information of what it’s doing, which can be a huge time-saver when trying to track down strange problems in your DBI programs.

At the highest level, you can call the DBI->trace() method, which enables tracing on all DBI operations from that point onwards. There are several valid tracing levels:







0 Disables tracing.
1 Traces DBI method execution showing returned values and errors.
2 As for 1, but also includes method entry with parameters.
3 As for 2, but also includes more internal driver trace information.
4 Levels 4, and above can include more detail than is helpful.

The trace() method can be used with two argument forms, either specifying only the trace level or specifying both the trace level and a file to which the trace information is appended. The following example shows the use of DBI->trace():


#!/usr/bin/perl -w
#
# ch04/util/trace1: Demonstrates the use of DBI tracing.

use DBI;

### Remove any old trace files
unlink ‘dbitrace.log’ if -e ‘dbitrace.log’;

### Connect to a database
my $dbh = DBI->connect( “dbi:Oracle:archaeo”, “username”,
“password” );

### Set the tracing level to 1 and prepare()
DBI->trace( 1 );
doPrepare();

### Set trace output to a file at level 2 and prepare()
DBI->trace( 2, ‘dbitrace.log’ );
doPrepare();

### Set the trace output back to STDERR at level 2 and prepare()
DBI->trace( 2, undef );
doPrepare();

exit;

### prepare a statement (invalid to demonstrate tracing)
sub doPrepare {
print “Preparing and executing statementn”;
my $sth = $dbh->prepare( ”
SELECT * FROM megalith
” );
$sth->execute();
return;
}

exit;

This program generates quite a bit of trace information, of which we’ll show just a small fragment:


-> prepare for DBD::Oracle::db (DBI::db=HASH(0xcd45c)~0xcd4a4 ‘
SELECT * FROM megalith
‘) thr0
<- prepare= DBI::st=HASH(0xcd648) at trace1 line 30.
-> execute for DBD::Oracle::st
(DBI::st=HASH(0xcd648)~0x16afec) thr0
dbd_st_execute SELECT (out0, lob0)…
!! ERROR: 942 ‘ORA-00942: table or view does not exist
(DBD ERROR: OCIStmtExecute)’
<- execute= undef at trace1 line 33.
DBD::Oracle::st execute failed: ORA-00942: table or view does
not exist (DBD ERROR: OCIStmtExecute) at trace1 line 33.

This trace information was generated with a setting of level 2, and shows the operations that DBI undertook when trying to prepare and execute a statement. Lines prepended with > are written when the method is being entered, and lines prepended with < are written when the method is returning. These lines also show the information being returned from the method call. The DBI trace output is indented by four spaces to make it easier to distinguish the trace output from any other program output.

You can see the prepare() method being called along with its parameters: a database handle and the SQL statement to prepare.[9] The next line shows the prepare() returning a statement handle. It also shows the file and line number that prepare() was called from. Following that, we see execute() being called, a trace line from the driver itself, and the method returning after logging an error. Finally we see the warning generated by the DBI due to the PrintError attribute, which is on by default.

The trace information generated at level 1 is similar. The main difference is that the method entry lines (>) are not shown.

The one drawback to this form of tracing is that if your program uses a lot of handles, then the volume of tracing information could be quite vast. Similarly, you might have tracked your problem down to a specific database operation that you’d like to trace individually.

The trace() method is also available at a handle level, allowing you to individually trace any database and statement handle operations. Therefore, you could trace operations on a given database handle to level 1 and a single statement handle to level 2. For example:


### Connect to a database…
my $dbh = DBI->connect( “dbi:Oracle:archaeo”, “username”,
“password” );

### Trace the database handle to level 1 to the screen
$dbh->trace( 1 );

### Create a new statement
my $sth = …;

### Trace the statement to level 2 to the file ‘trace.lis’
$sth->trace( 2, ‘trace.lis’ );

Note that if a filename is specified when calling trace(), then currently, trace output from all handles is redirected to that file.

If your programs are exhibiting odd behavior or are generating errors on a regular basis, you should consider using the built-in tracing features of DBI to help you resolve your problems. This tool is extremely useful, as you will be able to see exactly what data is being passed to the database, allowing you to ensure that it’s in the correct format.

Finally, tracing can also be controlled via the use of an environment variable called DBI_TRACE, which acts in a similar manner to the DBI->trace() method. That is, it traces all handles used within the program. This environment variable can be used in three ways that are summarized in the following table.






DBI_TRACE Value

Effect on DBI

1

DBI->trace(1);

dbitrace.log

DBI->trace(2, ‘dbitrace.log’);

4=dbitrace.log

DBI->trace(4, ‘dbitrace.log’);

If the trace level isn’t specified in the DBI_TRACE environment variable, it will default to 2, as shown in the table above.

Neat and Tidy Formatting

The DBI features a couple of utility functions that can be used to tidy up strings into a form suitable for easy reading. These two functions are neat() and neat_list(), the former operating on a single scalar value, the latter operating on a list of scalar values.

For example, to use neat() to tidy up some strings, you could write:


#!/usr/bin/perl -w
#
# ch04/util/neat1: Tests out the DBI::neat() utility function.
#

use DBI;

### Declare some strings to neatify
my $str1 = “Alligator’s an extremely neat() and tidy person”;
my $str2 = “Oh nonhe’s not!”;

### Neatify this first string to a maxlen of 40
print “String: ” . DBI::neat( $str1, 40 ) . “n”;

### Neatify the second string to a default maxlen of 400
print “String: ” . DBI::neat( $str2 ) . “n”;

### Neatify a number
print “Number: ” . DBI::neat( 42 * 9 ) . “n”;

### Neatify an undef
print “Undef: ” . DBI::neat( undef ) . “n”;

exit;

which generates the output of:


String: ‘Alligator’s an extremely neat() and…’
String: ‘Oh no
he’s not!’
Number: 378
Undef: undef

demonstrating that string values are quoted,[10] whereas values known to be numeric are not. The first string has been truncated to the desired length with added. Undefined values are recognized and returned as the string undef without quotes.

While the neat() function is handy for single values, the neat_list() function is handy for lists. It simply calls neat() on each element of the referenced list before joining the list of values together with the desired separator string. For example:


#!/usr/bin/perl -w
#
# ch04/util/neat2: Tests out the DBI::neat_list() utility function

use DBI qw( neat_list );

### Declare some strings to neatify
my @list = ( ‘String-a-string-a-string-a-string-a-string’,
42, 0, ”, undef );

### Neatify the strings into an array
print neat_list( @list, 40, “, ” ), “n”;

exit;

which generates the output of:

‘String-a-string-a-string-a-string-a…’, 42, 0, ”, undef

This example also shows that the utility functions can be imported into your package so you can drop the DBI:: prefix.

DBI uses neat() and neat_list() internally to format the output generated by tracing. That’s important to know if you’re wondering why the trace output is truncating your huge SQL statements down to 400 characters.[11]

Numeric Testing

The final utility function supplied by DBI that we’ll look at is quite a curious one called looks_like_number(). This function quite simply tells you whether or not a value looks like a number or not.

looks_like_number() operates by taking a list of values as an argument and returns a new array signifying whether or not the corresponding value within the original array was a number, not a number, or undefined.

This may seem rather a curious thing to want to do, but in the case of handling large quantities of data, it’s useful for working out which values might need to have their quotes escaped via the quote() method.

The returned array will contain the same number of values as the original data array, with the elements containing one of three values signifying the following:


true The original value is a number.
false The original value is not a number.
undef The original value is empty or undefined.

The following example illustrates how this process works:


#!/usr/bin/perl -w
#
# ch04/util/lookslike1: Tests out the DBI::looks_like_number()
function.
#

use DBI;

### Declare a list of values
my @values = ( 333, ‘Choronzon’, ‘Tim’, undef, ‘Alligator’,
1234.34, ‘Linda’, 0x0F, ‘0x0F’, ‘Larry Wall’ );

### Check to see which are numbers!
my @areNumbers = DBI::looks_like_number( @values );

for (my $i = 0; $i < @values; ++$i ) {

my $value = (defined $values[$i]) ? $values[$i] : “undef”;

print “values[$i] -> $value “;

if ( defined $areNumbers[$i] ) {
if ( $areNumbers[$i] ) {
print “is a number!n”;
}
else {
print “is utterly unlike a number and should be
quoted!n”;
}
}
else {
print “is undefined!n”;
}
}

exit;

The results from this program illustrate how the values are treated and shows that hexadecimal values are not treated as numbers:


values[0] -> 333 is a number!
values[1] -> Choronzon is utterly unlike a number and should
be quoted!
values[2] -> Tim is utterly unlike a number and should
be quoted!
values[3] -> undef is undefined!
values[4] -> Alligator is utterly unlike a number and should
be quoted!
values[5] -> 1234.34 is a number!
values[6] -> Linda is utterly unlike a number and should
be quoted!
values[7] -> 15 is a number!
values[8] -> 0x0F is utterly unlike a number and should
be quoted!
values[9] -> Larry Wall is utterly unlike a number and should
be quoted!

The first 0x0F in the list of values is reported as looking like a number because Perl converted it into one (15) when the script was compiled. The second is not reported as looking like a number because the looks_like_number() function only looks for integers and floating-point numbers.

And that brings us to the end of the introduction to DBI and its architecture. We’ll be talking more on how to actually do stuff with DBI in the next chapter.


Endnotes

1. Few methods actually need to be implemented since the DBI provides suitable defaults for most of them. The DBI::DBD module contains documentation for any intrepid driver writers.

2. In reality, the number of concurrent statement handles is dependent on the underlying database. For information on how many concurrent statement handles your database can support, see Appendix B.

3. The actual definition of “installed driver” is a little loose. The DBI simply searches the directories in @INC looking for any DBD subdirectories that contain .pm files. Those are assumed to be drivers. It does not verify that the modules are completely and correctly installed. In practice, this process is fast and works well.

4. Note that not necessarily every data source that is reachable via the driver is returned. Similarly, the inclusion of a data source does not imply that it is actually currently available for connection.

5. An excellent example of an application that figures out data source names at runtime is dbish, discussed more fully in Chapter 8.

6. In general, this is true. However, some database systems, such as MySQL, support different users but only one schema.

7. Actually, the error message will be displayed twice for reasons that will be explained in the “Error Handling” section later in this chapter.

8. The exact behavior when both attributes are set may change in future versions. This is something to consider if the code is inside an eval.

9. If the Perl you are using was built with threading enabled, then each method entry line also shows the thread number, e.g., thr0. The DBI implements a per-driver mutex so that each DBD driver may only be entered by one thread at a time. Trace levels 4 and above show this in action.

10. Note that internal quotes are not escaped. That’s because neat() is designed to produce output for human readers, and to do so quickly since it’s used by the internal trace mechanisms. If you wish quote escaping to occur, you could use the quote() method instead.

11. 400 characters is the default value for the $DBI::neat_maxlen variable, which defines the default maximum length for the neat() function.

About the Authors

Alligator Descartes has been an itinerant fiddler with computers from a very early age, which was ruined only by obtaining a BSc in computer science from the University of Strathclyde, Glasgow. His computing credits include several years of Oracle DBA work, multi-user Virtual Reality servers, high-performance 3D graphics programming, and several Perl modules. He spends his spare time trudging around Scotland looking for stone circles and Pictish symbol stones to photograph. Alligator Descartes is not his real name.

Tim Bunce has been a perl5 porter since 1994, contributing to the development of the Perl language and many of its core modules. He is the author and maintainer of the DBI, DBD::Oracle,and Oracle::OCI modules, and author and co-maintainer of The Perl Module List. Tim is the founder and CTO of Data-Plan Services, a perl, database, and performance consultancy with an international client base. Prior to that we was Technical Director (CTO) of IG in the UK where he was awarded by British Telecom for his role in the rapid development of their Call Management Information service, a system implemented in Perl. He is co-author, along with Alligator Descartes, of Programming the Perl DBI, the definitive book on DBI, published by O’Reilly & Associates in February 2000.

Source of this material

This is Chapter 4: Programming the Perl DBI from the book Perl DBI (ISBN:1-56592-699-4) written by Alligator Descartes & Tim Bunce, published by O’Reilly & Associates.

To access the full Table of Contents for the book

Get the Free Newsletter!

Subscribe to Developer Insider for top news, trends & analysis

Latest Posts

Related Stories