
The automated Web site


There’s a saying in the Web business that a Web site is like a baby — everyone loves it until it needs changing. With the help of server-side includes (SSIs) and some simple Perl scripts, however, it’s possible to take most of the hassle out of updating your Web site.

For the purposes of this article, I’m going to create a hypothetical Web site called Tech Tips and Reviews. This site will contain product reviews and advice on a variety of topics, and I’m going to set it up so that it can easily be updated by myself or any number of colleagues in other locations.

(Note: Source code references in this article will all open in the same window, which the link above opens. Just so you know.)

Let’s take a look at the front page of this site. I’ve kept it deliberately simple — the user selects an area and a topic, and a simple JavaScript function (see page source for details) navigates to the page selected. The way to keep it easily maintainable is to use SSIs for everything that will need to be updated, including the page header (header.html), the page footer (footer.html) and the contents of the pull-down menus.

Different types of Web servers handle SSIs differently. On Apache and Netscape servers, you’ll need to edit the configuration files to let the server know to parse HTML files containing SSIs. For more information on server configuration, see Apache 1.3 User’s Guide or consult your local sysadmin.

The advantage of using SSIs for header and footer files is that I can call header.html and footer.html on every page in the site, and if I want to change them, I only have to change them once. The file header.html contains the <BODY> tag for every page, so I can control the page color, text color, and so forth from a central location and maintain a consistent look across the whole site.

In order to allow people in other locations to add content to the site and change SSIs, I’m going to create a Web interface and use HTTP authentication to password-protect this area. There are a number of ways to limit access to Web pages: I could write a Perl script to check the REMOTE_ADDR environment variable and only allow access to users with certain IP addresses, or I could use a JavaScript password program such as Gate Keeper, or I could use HTTP authentication. None of these is foolproof, but I’m not going to be storing any life-threatening secrets here, so I won’t worry about the industrial-grade hackers. For the purposes of this article, I’ll leave access control out entirely.
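A minimal sketch of the IP-based check, assuming the server passes the client address in the standard CGI variable REMOTE_ADDR. The addresses in the allow list and the helper name ip_allowed are hypothetical, and this is an illustration rather than the article's own script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical allow list; replace with your colleagues' real addresses.
my @allowed = ('192.0.2.10', '192.0.2.11');

# Returns 1 if the given address is on the allow list, 0 otherwise.
sub ip_allowed {
    my ($addr, @list) = @_;
    return 0 unless defined $addr;
    return (grep { $_ eq $addr } @list) ? 1 : 0;
}

# Only answer when running under a Web server, which sets REMOTE_ADDR.
if (defined $ENV{REMOTE_ADDR}) {
    if (ip_allowed($ENV{REMOTE_ADDR}, @allowed)) {
        print "Content-type: text/html\n\n<P>Welcome.</P>\n";
    } else {
        print "Status: 403 Forbidden\nContent-type: text/html\n\n<P>Access denied.</P>\n";
    }
}
```

Like the other approaches mentioned, this is easy to get around (addresses can be shared or spoofed), so treat it as a convenience rather than real security.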

The page for adding content to this site is called add_content.html. It uses a form to allow contributors to input tips or reviews in existing categories or to add new categories. A script called addcontent.cgi parses the form, creates a Web page for the new content (and a new directory if necessary), updates the index page for the pertinent section, and updates the pull-down menus on the front page if necessary. (addcontent.cgi should be called when the “Add Content” button is pressed, but for our purposes I’ve disabled it, because if I left it enabled, I’m sure one of you crafty hackers out there would figure out a way to wreak havoc with it.) Let’s take a closer look at addcontent.cgi.

The functions getFormData and saveFormData parse the form and save the information it contains in a temporary file. The function readFormData is where the heavy lifting is done. The first thing it does is get the various pieces of form information from the temporary data file and assign them to variables for easy reference. Then it checks to see if any new content types or topics have been created, by looking at the variables $new_type and $new_topic and testing whether they’re blank or not. If a new type or topic has been entered, the program uses regular expressions to modify the name so that it doesn’t include any white space or odd characters that would cause a Web browser to choke.


# Copy the name, then strip whitespace and other non-word characters.
$new_type_clean = $new_type;
$new_type_clean =~ s!\s!!g;
$new_type_clean =~ s!\W!!g;

For more info on regular expressions, see the Perl Regular Expression Tutorial.
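The name cleanup can be wrapped in a small reusable function. This is a sketch only; clean_name is a hypothetical helper name, not one taken from addcontent.cgi.

```perl
use strict;
use warnings;

# Strip whitespace and other non-word characters from a name so that
# it is safe to use as a directory name in a URL.
sub clean_name {
    my ($name) = @_;
    (my $clean = $name) =~ s/\W//g;   # \W also matches whitespace
    return $clean;
}

print clean_name("Regular Expressions"), "\n";   # prints "RegularExpressions"
```

Because \W matches anything that is not a letter, digit, or underscore, a single substitution covers both the whitespace and the odd characters.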

For example, if a contributor added a topic called “Regular Expressions,” it would be changed to “RegularExpressions.” To update the pull-down menus on the front page, the program opens the file containing the SSI that populates the menu and appends a new line containing the form tag <OPTION>, the browser-readable name of the directory, and the full name. As a safeguard against carelessness, the function then uses the -d filetest to check whether the directory for the new type or topic already exists, and creates it only if it doesn’t.


# Create the directory only if it doesn't already exist.
unless (-d $type) {
    mkdir($type, 0755) or die "Can't create $type: $!";
}
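The menu update described above might look something like the following sketch. The function name add_menu_option and the file name in the usage comment are hypothetical, and the exact markup of the real menu fragment is an assumption.

```perl
use strict;
use warnings;

# Append one <OPTION> line to the SSI fragment that fills a pull-down menu.
# $menu_file is a hypothetical path to that fragment.
sub add_menu_option {
    my ($menu_file, $dir_name, $full_name) = @_;
    open(my $fh, '>>', $menu_file) or die "Can't append to $menu_file: $!";
    print $fh qq{<OPTION VALUE="$dir_name">$full_name\n};
    close($fh) or die "Can't close $menu_file: $!";
}

# Example call (file name is made up):
# add_menu_option('topics.html', 'RegularExpressions', 'Regular Expressions');
```

Opening the file in append mode means existing menu entries are never rewritten, so a failed run can at worst add a duplicate line.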

Next the program creates an HTML page for the new content. In order to give the page a unique filename, I create an array called @filelist and populate it with the contents of the directory in which the content will live, using the ls command. Then I assign a new filename by using the $#filelist syntax to get the index of the last item in the array.


@filelist = `ls $type/$topic`;
$filename = $#filelist + 1;   # $#filelist is the index of the last item

This means that if there are six items in the directory, $#filelist is 5 and the new file will be called 6.html. I open 6.html with a filehandle and write in all the HTML it needs, including a call to the SSI header.html at the top and footer.html at the bottom, with the title of the content piece, the author’s name, the date, and the piece itself in the middle.
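Here is one way the filename selection and page writing could be sketched without shelling out to ls. The function names, the include paths, and the page markup are all assumptions for illustration; the include directive uses the Apache SSI syntax.

```perl
use strict;
use warnings;

# Pick the next numeric filename the same way as above (last index + 1),
# but read the directory with opendir instead of running ls.
sub next_filename {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't read $dir: $!";
    my @filelist = grep { !/^\./ } readdir($dh);   # skip ., .. and dotfiles, as ls does
    closedir($dh);
    return $#filelist + 1;
}

# Write a content page wrapped in the header and footer includes.
# The include paths and the layout are hypothetical.
sub write_page {
    my ($path, $title, $author, $date, $body) = @_;
    open(my $fh, '>', $path) or die "Can't write $path: $!";
    print $fh <<"END_HTML";
<!--#include virtual="/header.html" -->
<H2>$title</H2>
<P>By $author, $date</P>
$body
<!--#include virtual="/footer.html" -->
END_HTML
    close($fh) or die "Can't close $path: $!";
}
```

Note that this numbering scheme relies on files never being deleted from the directory. If one is removed, the count drops and the next filename could collide with an existing page.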

Now the new content file has been created, and the last step is to update the index page for the particular directory in which the new content lives. The archive function carries out this action.

In the archive function, I create an array called @files, which lists all the files in the directory (in reverse order, so the most recent will be at the top). Then I set up a foreach loop to cycle through all the files. This is the loop that will generate a link on the index page for each of the content pieces in the directory, so I add a next statement to skip the file if it’s a backup or an index.


next if ($f eq "index.html" || $f =~ /~$/);

I look at each file and find the lines containing the title, author name, and date, and print them all into the index file with the appropriate HTML, add the footer, and it’s done.
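The loop described above could be sketched as follows. This is not the article's archive function: the names index_links and file_number are hypothetical, only the title is extracted, and the sketch assumes each page carries its title in an <H2> element.

```perl
use strict;
use warnings;

# Leading number of a filename such as "7.html", or -1 if there is none.
sub file_number {
    my ($name) = @_;
    return $name =~ /^(\d+)/ ? $1 : -1;
}

# Build one index link per content page in $dir, newest first.
sub index_links {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't read $dir: $!";
    my @files = sort { file_number($b) <=> file_number($a) }
                grep { !/^\./ } readdir($dh);
    closedir($dh);

    my @links;
    foreach my $f (@files) {
        # Skip the index page itself and editor backup files.
        next if $f eq 'index.html' || $f =~ /~$/;

        my $title = $f;   # fall back to the filename if no title is found
        open(my $fh, '<', "$dir/$f") or die "Can't read $dir/$f: $!";
        while (my $line = <$fh>) {
            if ($line =~ m{<H2>(.*?)</H2>}i) {
                $title = $1;
                last;
            }
        }
        close($fh);
        push @links, qq{<LI><A HREF="$f">$title</A>};
    }
    return @links;
}
```

Sorting on the numeric part of the name keeps 10.html ahead of 9.html, which a plain string sort would get wrong.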

Now I can add content quickly and painlessly; but to make the site really flexible, I should allow other people to modify the header and footer files through a Web interface as well. To this end, I’ve set up a page called configure.cgi. This dynamically generated page displays the current contents of header.html and allows the user to modify the various elements. There are two CGI programs associated with this page — configure.cgi to draw the page and header.cgi to parse the form and make the changes that have been entered. Again, I’ve had to disable this page for public consumption.
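As a rough illustration of the kind of change header.cgi might make, the sketch below rewrites one attribute of a <BODY> tag held in a string. It assumes the elements being edited are <BODY> attributes with double-quoted values; set_body_attr is a hypothetical name, and the real script may work quite differently.

```perl
use strict;
use warnings;

# Set one attribute inside the <BODY> tag of $html, replacing the old
# value if the attribute is present and adding it if it is not.
sub set_body_attr {
    my ($html, $attr, $value) = @_;
    $html =~ s/(<BODY\b[^>]*?\s\Q$attr\E=")[^"]*/$1$value/i
        or $html =~ s/<BODY\b([^>]*)>/<BODY$1 $attr="$value">/i;
    return $html;
}

print set_body_attr('<BODY BGCOLOR="#FFFFFF" TEXT="#000000">', 'BGCOLOR', '#CCCCCC'), "\n";
# prints <BODY BGCOLOR="#CCCCCC" TEXT="#000000">
```

A real form handler should also validate the submitted value (for example, checking that a color looks like #RRGGBB) before writing it into a file that every page includes.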

The only missing piece remaining is a form to modify the footer as well, and I leave that as an exercise for you.

A few notes on troubleshooting: All the scripts in this article are written to work on Unix systems; if you’re working on a Windows, Macintosh or other system, you’ll need to make some minor modifications to the system calls. If you find yourself getting server errors, nine times out of ten it’ll be due to permissions problems. Make sure all your .cgi files are executable and make sure that all the files and directories which your programs write to are writeable. If you really get stuck, try the developer.com Perl discussion group. Happy coding!

Steve Renaker is part of the implementation team at Razorfish Inc. and has been fiddling with the Web since its earliest days. Previous publications include The Official Gamelan Java Directory and articles for Java Report. He can be reached at srenaker@razorfish.com.

