Data access via the Web
May 25, 1999
by Piroz Mohseni
The move to bring databases to the Web was perhaps best exemplified (and publicized) by the Web version of the Federal Express package-tracking system. Any customer, anywhere, could get tracking information from the Fed Ex Web site. The push to bring databases to the Web is still going on. In fact, some corporate databases use the Web as their only user interface. We are, however, seeing another phenomenon emerge: applications that act as integrators for other applications.
Suppose a company has three major database systems. Call them A, B, and C. Each one has been Web-enabled via various technologies such as servlets, CGI, and application servers (e.g., Cold Fusion). It is now conceivable to write a Web application that uses systems A, B, and C as its data sources and provides a consistent user interface to all three. You no longer need direct access to a database via technologies like ODBC, JDBC, and CORBA, but can use the Web interface that was already created for the database to access it. Granted, this integration scheme will not work for all cases. For example, if the data or feature you need is not Web-enabled, then you need to find other ways to access the needed data.
Let's take an integrated package-tracking system as an example. You can ask the user to type the tracking number and select the shipper from a pull-down menu. Your program would then access the appropriate Web site and interact with it as if a user were visiting the site. It would retrieve the tracking information and present it to the user. The user sees only your user interface; the back-end processing is hidden. Fortunately, most programming languages support HTTP, which means you can simulate a browser programmatically.
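The routing step described above (pick a shipper, then go to that shipper's site) can be sketched in Java roughly as follows. This is only an illustration: the shipper names, the example.com addresses, and the number parameter are all made up for the sketch, and a real shipper's site defines its own URL and field names.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Sketch: maps the shipper chosen in the pull-down menu to a tracking URL.
class TrackingRouter {

    // Hypothetical tracking pages, keyed by the name shown in the menu.
    private static final Map<String, String> ENDPOINTS = Map.of(
            "Shipper A", "http://shipper-a.example.com/track",
            "Shipper B", "http://shipper-b.example.com/cgi-bin/track.cgi");

    // Returns the URL our program would request on the user's behalf.
    static String trackingUrl(String shipper, String trackingNumber) {
        String endpoint = ENDPOINTS.get(shipper);
        if (endpoint == null) {
            throw new IllegalArgumentException("Unknown shipper: " + shipper);
        }
        // "number" is an assumed parameter name, not one from a real site.
        return endpoint + "?number="
                + URLEncoder.encode(trackingNumber, StandardCharsets.UTF_8);
    }
}
```

Keeping the per-site details in one table means that adding a shipper is, at least in principle, a matter of adding one entry plus the extraction logic for that site's output.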
To do this, you need to familiarize yourself with the site you want to integrate into your application. You need to find out which program processes each HTML form and what output each program produces. In HTTP, form data is sent to a back-end program (e.g., CGI, servlet) via either the GET method or the POST method. Once you have this information, you are ready to begin your integration. For discussion purposes, we assume there is an HTML form that accepts a last name and an ID number and returns the person's telephone, fax, and e-mail. We will show code fragments for how such a simple interaction can be accomplished in Perl (CGI) and Java (servlet). Here is the HTML form, which is probably displayed in a page with advertisements and other promotional material surrounding it.
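Whichever method the form uses, a browser sends the field values as URL-encoded name=value pairs: appended to the URL for GET, or placed in the message body for POST. A minimal Java sketch of that encoding follows. The field names lastname and id are assumptions chosen for illustration; the real names must be read from the form's HTML source.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: encodes form fields the way a browser does before a GET or POST.
class FormEncoder {

    // Joins name=value pairs with '&', percent-encoding each name and value.
    static String encode(Map<String, String> fields) {
        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            joined.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }

    // For GET, the encoded fields travel in the URL after a '?'.
    static String getUrl(String action, Map<String, String> fields) {
        return action + "?" + encode(fields);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in the order they were added.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("lastname", "Smith Jones"); // assumed field name
        fields.put("id", "12345");             // assumed field name
        System.out.println(getUrl("getinfo.cgi", fields));
    }
}
```

For a POST, the same encoded string would be sent as the request body instead of being appended to the URL.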
Perl
It is very easy to simulate a browser in Perl. Almost all relevant HTTP-related functions are wrapped in a package called Library for WWW access in Perl (LWP). Specifically, for our case, we use LWP::UserAgent, which serves as a mini-browser for our purposes. Since you are accessing Web resources programmatically (not through a GUI-based browser), you have to become familiar with how HTTP works.
After the user fills out the HTML form and clicks the Submit button, the content of that form is wrapped in an HTTP Request message and sent to the server (specified by the action attribute). In this case, since we don't specify a server explicitly, it goes to the same server that hosted the HTML form. The server in turn passes the information to the CGI program called getinfo.cgi. The CGI program does some processing and generates output that the server sends back to the browser (in this case our program) in the form of an HTTP Response message.
Our program must then interpret that response and extract the information it needs. Note that the response contains HTML. Our data is mixed with HTML tags, so our code must do some data extraction. For example, we may get a response like this:
User Information
Telephone: 123-123-1234
The Perl program must know how to extract the relevant information from the above response. Hopefully, in the near future, responses will be in the form of XML (to represent the data) and XSL style sheets (to represent the formatting). This would make the process of data extraction much easier.
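The extraction step looks much the same in any language. As a rough sketch, written in Java so that it lines up with the Java section of this article, the following strips the tags and then looks for a label followed by a number. The sample markup in the usage below is invented, and the tag-stripping regular expression is deliberately naive; it is not a substitute for a real HTML parser.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: pulls a labeled value such as "Telephone: 123-123-1234" out of HTML.
class ResponseExtractor {

    // Returns the digits-and-dashes value that follows "<label>:", if present.
    static Optional<String> extractField(String html, String label) {
        // Replace every tag with a space so markup between the label
        // and its value does not get in the way of the match.
        String text = html.replaceAll("<[^>]*>", " ");
        Pattern pattern = Pattern.compile(
                Pattern.quote(label) + "\\s*:\\s*([0-9][0-9-]*)");
        Matcher matcher = pattern.matcher(text);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Invented sample response; a real site's markup will differ.
        String html = "<html><body><h1>User Information</h1>"
                + "<b>Telephone:</b> 123-123-1234<br></body></html>";
        System.out.println(extractField(html, "Telephone").orElse("not found"));
    }
}
```

Returning an empty Optional when the label is missing gives the calling code a chance to notice that the site's layout has changed, rather than silently presenting bad data.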
Here is the Perl code fragment for our example.
Java
The browser interaction can also be done programmatically in Java. There is probably a Java class somewhere that encapsulates HTTP messages. In the code linked here, we construct the message manually. We open a stream to send the request and read the response in another stream, line by line. The program must contain some string matching to extract the data it needs from the stream.
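Since the linked code is not reproduced here, the following is a minimal sketch of what constructing the message manually can look like; it is not the original listing. It builds an HTTP/1.0 POST request as text, writes it to a socket, and reads the response line by line. The host name, path, and form body used with it would be placeholders chosen by the caller.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch: hand-built HTTP/1.0 POST sent over a plain socket.
class RawHttpRequest {

    // Builds the request text: request line, headers, blank line, then body.
    static String buildPost(String host, String path, String body) {
        int length = body.getBytes(StandardCharsets.ISO_8859_1).length;
        return "POST " + path + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n" // optional in HTTP/1.0, commonly sent
                + "Content-Type: application/x-www-form-urlencoded\r\n"
                + "Content-Length: " + length + "\r\n"
                + "\r\n"
                + body;
    }

    // Sends the request and returns every response line (status line,
    // headers, and HTML). With HTTP/1.0 the server closes the connection
    // when it is done, which is what ends the read loop.
    static List<String> send(String host, int port, String request) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes(StandardCharsets.ISO_8859_1));
            out.flush();

            BufferedReader in = new BufferedReader(new InputStreamReader(
                    socket.getInputStream(), StandardCharsets.ISO_8859_1));
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
            return lines;
        }
    }
}
```

A call such as send("www.example.com", 80, buildPost("www.example.com", "/getinfo.cgi", "lastname=Smith&id=12345")) would return the raw response lines, which the string-matching code then has to search.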
Legal considerations
Note that what I have just described is strictly a technical proposal. There are legal ramifications when you integrate applications in this manner. Although the "integrated" sites still receive hits, the user no longer sees their advertisements. Certain sites make information available on their Web site for "casual" customers and require other customers to pay for a service that provides access to their database. Before you design your next integrated application, make sure the legal matters, if any, are resolved.
Conclusion
The ability to integrate several applications into one can be a powerful and useful addition to your applications. With the emergence of XML and related standards, we will see more technologies geared towards this market.
Currently, the biggest challenge with this type of integration is data extraction, because HTML data mixes presentation tags and the actual data. Furthermore, if the Web site changes its design, the data extraction portions of your code will most likely have to be rewritten. Despite the challenges, due to the nature of the Web, we will probably see more and more of this type of integration.
About the author
Piroz Mohseni is president of Bita Technologies, which focuses on business improvement through the effective use of technology. His areas of interest include enterprise Java, XML, and e-commerce applications.