Oh man, do people want this! I get so many letters asking for a search engine that will search just the user’s site. Well, here ya go. This will search whatever directory you want, list the results, and even recognize if any results were found or not. The script is fairly robust, checking not only the title and file, but also the full file text. I’m giving you a fairly bare-bones results page. You can gussy it up all you want. I’m interested in the script more than its appearance here.
How about we have a look-see, huh? This script is set up to search the directory that contains all of the PERL primers. If you want to be sure to get some results, search “perl” or “Joe”. If you want to be sure to not get results, try searching “zorkbabble”. I don’t think that appears.
The HTML and PERL Code
The HTML could not be easier. It’s a simple form with one text box and a submit button.
Be sure to change out that ACTION statement so that your HTML attaches to your CGI.
I’ve done my best to explain as much of the script as possible in the script itself. I’ve commented out all of the explanation so you can go ahead and paste it as you see it. Be aware there are multiple points in the script where you have to put in your own paths. Don’t just copy, paste, upload and hope it’ll work. As has been the case in the last couple of primers, I’ll open the script in a new window so you can look back and forth.
After you alter the two files, upload them. Put the HTML document in with your other HTML documents. If you set permissions on your HTML, set them to 644. Upload the script to your cgi-bin and set its permissions to 755.
What It All Means
Hopefully you have the script open in a new window. If not, click here:
See the PERL Code
First off, there’s our old friend once again, the module that accepts and delineates the text from the form. It’s got an easy time of it on this one I should say. We’re only sending it one piece of text. No matter. We need it to delineate that one piece. Now let’s get on to the portion of the script that searches. That’s the real fun.
#Set the form input to a variable
$keyword=$FORM{keyword};
This is a quick line of code just to make the script easier for you to read. I’m simply taking the value returned from the form and assigning it to a scalar variable I called “$keyword”. That’s all.
#Start the HTML with some search engine type words
print "Content-type: text/html\n\n";
print "<h2>Here are the files we found</h2>\n\n";
As it says, we’re just starting the results page here. You can alter this to your heart’s content, make it as stunning as you’d like.
#Change the directory to the one you want to search - use absolute path
chdir("/this/must/be/an/absolute/path");
Here’s the first real concern. The command chdir means to change directories. If this line wasn’t in there, the script would search the directory it’s sitting in. That would be the cgi-bin and I doubt the files you want searched are in there.
Set this path to the directory you want searched. Again, this MUST BE an absolute path to the directory. I explained this a couple of primers back, but one more time…telnet into the directory you want to search. Type “pwd” (without the quotes) at the prompt and hit enter. The return will show for you the absolute path to that directory. Write it down for later. That’s another tip I learned the hard way.
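Here is a small sketch of that chdir step with a safety check bolted on. The check is my addition, not part of the script above: chdir returns false when the path is wrong, so testing it makes a bad path fail loudly instead of quietly searching the cgi-bin. The names $search_dir, $start_dir and $now_in are made up for the example, and a temporary directory stands in for your real absolute path.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $start_dir = getcwd();

# Placeholder for your real absolute path; a temp directory keeps the sketch runnable
my $search_dir = tempdir(CLEANUP => 1);

# chdir returns false on failure, and $! holds the reason
chdir($search_dir) or die "Cannot chdir to $search_dir: $!";

# getcwd reports the same thing that typing pwd at the prompt does
my $now_in = getcwd();
print "Now searching in: $now_in\n";

# Step back out so the temp directory can be cleaned up at exit
chdir($start_dir) or die "Cannot return to $start_dir: $!";
```

In a real CGI script the die message would most likely end up in the server’s error log, which is exactly where you want to find out the path was wrong.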
#Open it
opendir(DIR, ".");
The directory is opened by the script using opendir. We assign the directory handle “DIR” to the directory and then denote what directory is to be read. That single dot means to search the current directory. That current directory is the one we set in the previous chdir command.
Yes, I know it seems like we should just be able to set the directory right there. Without a ton of explanation, it causes trouble. The readdir command hands back bare file names with no path attached, and the open command later in the script looks for those names in the current directory. If the script hasn’t been moved there by chdir first, the files won’t be found.
#Start some loops that look for files with .html
while($file = readdir(DIR))
{
    next if ($file !~ /\.html/);
    open(FILE, $file);
    $foundone = 0;
    $title = "";
    while (<FILE>)
    {
This is new. This piece of code starts with a loop. A loop is a bit of code that runs again and again until some condition is met. This is a while loop, which means that the loop will continue while something is true or available. Notice that the while statement is followed by curly brackets. The commands inside the curly brackets are what will happen while the condition is true.
While readdir(DIR) can still hand a file name to the scalar variable $file, something will happen. Think of that line as, while there are still files in the directory, we’ll loop through again. That’s how the script goes about reading every file in the directory rather than shutting down after one.
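To see a readdir loop on its own, here is a minimal sketch. It assumes a throwaway directory holding three made-up files, and it uses a lexical handle ($dh) rather than the bareword DIR. The explicit defined test and the skipping of the two special entries are my additions. Notice that readdir returns bare file names only, with no path in front.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a throwaway directory with three files (names are made up for the demo)
my $dir = tempdir(CLEANUP => 1);
for my $name ('index.html', 'about.html', 'logo.gif') {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $name: $!";
    close($fh);
}

opendir(my $dh, $dir) or die "Cannot open $dir: $!";
my @seen;
# readdir returns one entry per pass and undef once the directory is exhausted
while (defined(my $file = readdir($dh))) {
    next if $file eq '.' or $file eq '..';   # skip the two special entries
    push @seen, $file;                       # bare name only, no path attached
}
closedir($dh);

print "$_\n" for sort @seen;
```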
We go on to a branching statement that tests the current file name ($file). That (!~) is known as a binding operator. It means “does not match”. There is also (=~), which means “matches”, as in the pattern appears somewhere in the text. We don’t want to search images, so the line says: skip ahead to the next file if this one does not have “.html” in its name. Anything that gets past that test has the .html extension, so…
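Here is a short sketch of how the two binding operators behave. The file names are invented for the example, and the backslash before the dot makes the pattern look for a literal dot rather than any character.

```perl
use strict;
use warnings;

my @files = ('index.html', 'logo.gif', 'notes.txt', 'primer10.html');

my @kept;
for my $file (@files) {
    # !~ is true when the pattern does NOT match, so non-HTML names are skipped
    next if $file !~ /\.html/;
    push @kept, $file;
}

# =~ is the positive form: true when the pattern matches somewhere in the string
my $is_html = ('index.html' =~ /\.html/) ? 1 : 0;

print "Kept: @kept\n";
print "index.html matches: $is_html\n";
```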
…Open it. See the format? It’s just like the format to open the directory except we’re opening a file.
Next, two variables are created and set to nothing because we’ll need them later. The variable “$foundone” will increase if the keyword is found and “$title” will get the value of the file’s title.
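Another loop kicks in now that we’re inside the file. See that? while (<FILE>) reads the file one line at a time, and each line lands in the default variable $_ until the file runs out.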
#If the keyword is found, set $foundone to one
if (/$keyword/i)
{
    $foundone = 1;
}
This is the search itself. The keyword is dropped into a pattern and tested against the current line of the file. That little i after the closing slash tells PERL to ignore case, so a search for “perl” will also find “PERL”. If the keyword turns up anywhere in the file, $foundone is set to 1.
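Here is a sketch of the keyword test from the script, run against two sample lines instead of a real file. The one change is the \Q...\E wrapper, which is my addition and not in the script above: it makes PERL treat the keyword as plain text. Without it, a search for something like “c++” would be read as a pattern and could break the script. The sample lines and the starting value of $keyword are invented.

```perl
use strict;
use warnings;

my $keyword = 'perl';   # stand-in for the value that arrives from the form
my @lines = (
    "Welcome to the PERL primers\n",
    "Nothing to see on this line\n",
);

my $foundone = 0;
for my $line (@lines) {
    # /i ignores case; \Q...\E treats the keyword as plain text, not as a pattern
    if ($line =~ /\Q$keyword\E/i) {
        $foundone = 1;
    }
}

print "foundone = $foundone\n";
```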
#If there is a title, chop it and take the text between the two flags
if(/<TITLE>/)
{
    chop;
    $title = $_;
    $title =~ s/<TITLE>//g;
    $title =~ s/<\/TITLE>//g;
}
Let’s look for a title. If we find one in the code (notice no little i this time, so only an uppercase <TITLE> flag will match), we chop it, meaning we knock off the last character, which here is the newline at the end of the line.
The line being chopped sits in the PERL default variable “$_”, and the next command copies it into $title. That variable is a real workhorse. Anytime a command needs something to work on and you don’t name a variable, this is what it uses. Think of it as a scratch pad.
The next two lines use that binding operator (=~). Basically the two lines are saying get rid of the two title flags. The s means substitute, here with nothing, and the g means do it everywhere in the line.
Now all we’re left with is the title text. Cool, huh?
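If you want to watch the title steps work, here is a self-contained sketch that runs them over three sample lines rather than a real file. One swap to be aware of: I use chomp where the script uses chop, because chomp only removes a trailing newline and leaves the text alone if there isn’t one. The sample lines are invented.

```perl
use strict;
use warnings;

my $title = "";
my @lines = ("<HTML>\n", "<TITLE>PERL Primer Ten</TITLE>\n", "<BODY>\n");

for (@lines) {                       # each line lands in $_, as it does in while (<FILE>)
    if (/<TITLE>/) {
        chomp;                       # like chop, but removes only a trailing newline
        $title = $_;                 # copy the line out of the default variable
        $title =~ s/<TITLE>//g;      # strip the opening flag
        $title =~ s/<\/TITLE>//g;    # strip the closing flag (note the escaped slash)
    }
}

print "Title: $title\n";
```

This only works when the whole title sits on one line with uppercase flags, which is the same assumption the script makes.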
#No title? Fine. Use the file name
}
if($title eq "")
{
    $title = $file;
}
To keep the script straight, the first curly bracket ends the second while loop.
If there is no title, the title will equal (eq) nothing (“”). If that’s the case, set $title to the name of the file for display purposes.
#Print the title and file name so it is a link back to the file, set $listed to 1
if($foundone)
{
    print "<A HREF=/sub/sub/sub/$file>$title</A><BR>";
    $listed=1;
}
close(FILE);
}
One more condition. If $foundone is equal to 1, then we must have found a file that contains the keyword. So let’s print it to the page.
Notice first off that we are only printing the file name, so you need to enter the additional path required to get to the file. The leading slash starts the link at the root of your domain (www.something.com), then you need to add whatever is needed after that. Do not use the absolute path here. Once again, do not use the absolute path here.
There’s a <BR> at the end of the printed line so that all of the returns print on new lines.
Finally I set a variable “$listed” to one. The purpose of this variable is simply to let me know if anything had been found. The only way this variable can even come into play is if something is printed. If the variable doesn’t exist, I know nothing has been found. I can use that information in a moment.
Close up the file, we’re done. The final curly bracket closes up the main while loop.
#Close the directory after looping through all the files
closedir(DIR);
Once every file has been gone through, close up the directory.
# Print one line if no results, another if results were found
if($listed ne 1)
{print "Sorry, nothing this time. <A HREF=/sub/sub/sub/searchengine.html>Try again</A>";}
else
{print "<P>That's all. Do you want a <A HREF=/sub/sub/sub/searchengine.html>new search? </A>";}
Are you quite familiar with these branching deals yet? This is a fairly simple black or white condition. Either “$listed” exists or it doesn’t. If it exists, we know it’ll be set to one. We’ll test if it doesn’t equal one (ne – not equal) first. If it doesn’t, then it must not exist, the script didn’t find any files, and we print “Sorry, nothing this time” and then offer a path back to the search engine HTML page.
Notice you have to change out the path here to get people back to your HTML search page. DO NOT use the absolute path here.
If the condition is not true, the script must have found some files. We go to the else statement and the line “That’s all. Do you want a new search?” pops up at the end of the list of returns. Again, change out the paths so that someone can click to go back to your search engine.
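The same either/or test can be sketched as a tiny helper so you can try both outcomes. The helper name closing_line is hypothetical, the path is the placeholder used above, and starting $listed at zero is my addition so the test never compares against a variable that doesn’t exist yet.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the closing line for a given $listed value
sub closing_line {
    my ($listed) = @_;
    my $back = '/sub/sub/sub/searchengine.html';   # placeholder path from the primer
    if ($listed != 1) {
        return "Sorry, nothing this time. <A HREF=$back>Try again</A>";
    }
    return "<P>That's all. Do you want a <A HREF=$back>new search?</A>";
}

my $listed = 0;                      # start at zero so the test always has a value to check
print closing_line($listed), "\n";   # no hits yet
$listed = 1;                         # what the script does when it prints a hit
print closing_line($listed), "\n";
```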
exit;
The end.
That’s All
Believe it or not. This is a great little script that will allow you to set up a basic search on your own site. Use it in good health.
Primer Ten Assignment
Some of you will find this unfair, but I’m going to do it anyway since all you have to do is click a link to get the answer.
Anyway, here’s a new piece of coding, (++). Those two plus signs act to increase a variable value by one. So, set up a variable, set it to zero. Then keep a running count of how many results you find. The format for getting the count to increase is to assign the increase to yet another variable.
So, if you choose $count as your starter, you increase that by something like $finalcount = $count++.
No – I did not give you the answer. The really, really hard part is the placement of those elements so you get a true count.
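Without giving away the placement, here is a small sketch of how (++) itself behaves, since it matters which side of the variable the plus signs sit on. The names $after_post and $pre are made up for the example.

```perl
use strict;
use warnings;

my $count = 0;
my $finalcount = $count++;   # postfix: $finalcount gets the old value, then $count goes up
my $after_post = $count;

my $pre = ++$count;          # prefix: $count goes up first, then the new value is handed back

print "finalcount=$finalcount count_after_postfix=$after_post pre=$pre\n";
```

Keep that difference in mind when you decide which variable to print as your total.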