How to Create a Code Search Engine with PHP and MySQL

I’m just a few days away from launching a comprehensive support website for my book, “Beginning PHP and MySQL 5, Second Edition”, and among other features, I have built a search engine for sifting through the more than 500 code snippets found throughout the book. This was an interesting exercise because it involved storing a fairly significant amount of text within a MySQL database, using MySQL’s full-text search facility, and devising an effective way to extract and display the code in the browser.

In this article I’ll offer a simplified version of this search engine, introducing you to some compelling PHP and MySQL features along the way. You might adopt what you learn towards building your own search engine, or towards other applications.

The Database Schema

Just a single table is required for the engine’s operation. The table, named code, serves as the code repository. Each example is stored along with a suitable title and the chapter number in which it appears. Because the search engine should retrieve examples based on keywords found in the example title or in the code itself, a FULLTEXT index has been added for these two columns. Because the table contents will rarely change beyond the occasional bug fix, it’s backed by the read-optimized MyISAM storage engine. The table follows:

CREATE TABLE code (
 id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
 title VARCHAR(50) NOT NULL,
 chapter TINYINT UNSIGNED NOT NULL,
 code TEXT NOT NULL,
 FULLTEXT (title,code)
) ENGINE = MYISAM;

Loading the Table

The downloadable zip file containing all of the book’s code should be easy to navigate so readers can quickly retrieve the desired example. To meet this requirement, the zip file contains a number of directories labeled according to chapter number (1, 2, 3, … 37), and each script is named with a lowercase title in which underscores separate the words, for example retrieving_array_keys.php. A script that automates loading the scripts into the database must therefore handle both conventions: reading the chapter number from the directory name and deriving a title from the file name.

You might recognize this task as one well suited for recursion, and indeed it is. The following script does the job nicely:

<?php

// Connect using mysqli; the older mysql_* functions were removed in PHP 7
$db = new mysqli("localhost", "gilmore", "secret", "beginningphpandmysqlcom");

// Use the path separator appropriate for Windows or Linux/Unix
$delimiter = DIRECTORY_SEPARATOR;

function parseCodeTree($path) {

  global $db, $delimiter;

  if ($dir = opendir($path)) {

    // Compare against false so a file named "0" does not end the loop early
    while (($item = readdir($dir)) !== false) {

      // If $item is a directory, recurse
      if (is_dir($path.$delimiter.$item) && $item != "." && $item != "..") {

        //printf("Directory: %s <br />", $item);
        parseCodeTree($path.$delimiter.$item);

      // $item is a file, so insert it into database
      } elseif ($item != "." && $item != "..") {

        // Retrieve the chapter number from the directory name
        $directory = (int) basename($path);

        //printf("File: %s <br />", $item);

        // Convert the file name to a readable title
        $scriptTitle = str_replace(".php", "", $item);
        $scriptTitle = str_replace("_", " ", $scriptTitle);

        // Retrieve the file contents
        $scriptContents = file_get_contents($path.$delimiter.$item);

        // Escape the strings, since the stored code is full of quotes
        $scriptTitle = $db->real_escape_string($scriptTitle);
        $scriptContents = $db->real_escape_string($scriptContents);

        // Insert the file information into database; NULL lets AUTO_INCREMENT assign the id
        $query = "INSERT INTO code VALUES(NULL, '$scriptTitle', $directory, '$scriptContents')";
        $result = $db->query($query);

      }
    }
    closedir($dir);
  }
  return 1;
}

parseCodeTree('code');

?>

I’ve purposefully left two commented-out printf() statements in the script; uncomment them to follow the script’s logic as it runs. Some sample output follows:

Directory: 4
File: array_key.php
File: is_array.php
Directory: 5
File: multidimensional_array.php
File: retrieving_array_keys.php
File: retrieving_array_values.php
File: slicing_an_array.php
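
The chapter and title logic inside parseCodeTree() can also be pulled out into a small helper, which makes it easy to check on its own. What follows is only a minimal sketch, not the loader used for the book’s website; the function name describeScript() and the sample path are hypothetical.

```php
<?php

// Hypothetical helper: derive the chapter number and a readable title
// from a path such as "code/5/retrieving_array_keys.php".
function describeScript($path) {

  // The name of the parent directory is the chapter number
  $chapter = (int) basename(dirname($path));

  // Strip the .php extension and turn underscores into spaces
  $title = str_replace("_", " ", basename($path, ".php"));

  return array("chapter" => $chapter, "title" => $title);
}

$info = describeScript("code/5/retrieving_array_keys.php");
printf("Chapter %d: %s\n", $info["chapter"], $info["title"]);
// Prints: Chapter 5: retrieving array keys
```

Keeping this step free of any database calls means the naming convention can be verified before a single row is inserted.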

Building the Search Engine

With the code and corresponding metadata inserted into the database, all that’s left to do is build the search engine. Believe it or not, this is perhaps the easiest part of the project, thanks to MySQL’s full-text search capabilities. Although I’ve used the symfony framework to abstract the database interaction on the support website, for the purposes of this article I’ve built the search engine with POPM (Plain Old PHP and MySQL). The search form is exceedingly simple, and looks like this:

<form method="POST" action="search.php">
Search the code repository:<br />
<input type="text" id="keyword" name="keyword" /><br />
<input type="submit" value="Search!" />
</form>

Provided you’ve used PHP to interact with MySQL before, the search script (search.php) shouldn’t hold any surprises, except perhaps for the query itself. This query uses MySQL’s full-text MATCH() and AGAINST() syntax to compare the keyword against the columns covered by the FULLTEXT index. Full-text searches can produce unexpected results if you haven’t done some advance reading. For instance, by default MyISAM ignores words shorter than four characters, and in a natural-language search it ignores any word that appears in half or more of the rows. Be sure to peruse the appropriate section of the MySQL documentation before building your own queries. The script follows:

<?php

  // Connect using mysqli; the older mysql_* functions were removed in PHP 7
  $db = new mysqli("localhost", "gilmore", "secret", "beginningphpandmysqlcom");

  // Escape the user-supplied keyword before placing it in the query
  $keyword = $db->real_escape_string(isset($_POST['keyword']) ? $_POST['keyword'] : '');

  // Perform the fulltext search
  $query = "SELECT id, title, chapter, code 
            FROM code WHERE MATCH(title, code) AGAINST ('$keyword')";

  $result = $db->query($query);

  // If results were found, output them
  if ($result && $result->num_rows > 0) {

    printf("Results: <br />");

    while ($row = $result->fetch_assoc()) {

      printf("Chapter %s: <a href='displaycode.php?id=%s'>%s</a>", 
	      $row['chapter'], $row['id'], ucfirst($row['title']));

    }

  } else {
    printf("No results found");
  }

?>

Using the search form to search for code containing the keyword “array” would produce output similar to this:

Results: <br />
Chapter 5: <a href='displaycode.php?id=65'>Retrieving array keys</a>
Chapter 5: <a href='displaycode.php?id=54'>Creating an array</a>
Chapter 9: <a href='displaycode.php?id=97'>Converting an array to a delimited string</a>

Finally, the displaycode.php script is used to display the script contents. It looks like this:

<?php

  // Connect using mysqli; the older mysql_* functions were removed in PHP 7
  $db = new mysqli("localhost", "gilmore", "secret", "beginningphpandmysqlcom");

  // Cast to an integer so only a numeric id ever reaches the query
  $id = isset($_GET['id']) ? (int) $_GET['id'] : 0;

  $query = "SELECT id, title, chapter, code FROM code WHERE id = $id";

  $result = $db->query($query);

  // If results were found, output them
  if ($result && $result->num_rows > 0) {

    $row = $result->fetch_assoc();

    printf("<h4>Chapter %s - %s</h4>", $row['chapter'], ucfirst($row['title']));

    // Convert the newline characters and HTML entities before displaying
    printf("%s", nl2br(htmlentities($row['code'])));

  } else {
    printf("No results found");
  }

?>

Clicking on the first result produces output similar to the following:

Chapter 5 - Retrieving array keys

<?php

$state["Delaware"] = "December 7, 1787";
$state["Pennsylvania"] = "December 12, 1787";
$state["New Jersey"] = "December 18, 1787";
$keys = array_keys($state);
print_r($keys);

?>
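
To see what the nl2br(htmlentities()) combination in displaycode.php actually does to the stored code, the short sketch below applies it to a two-line string. The renderCode() wrapper is a hypothetical name introduced only for this illustration.

```php
<?php

// Hypothetical wrapper around the two calls used in displaycode.php
function renderCode($code) {
  // htmlentities() escapes <, >, & and double quotes so the browser shows
  // the source instead of interpreting it; nl2br() then inserts <br />
  // before each newline so the line breaks survive in HTML.
  return nl2br(htmlentities($code));
}

echo renderCode("<?php\necho \"hi\";");
// Output:
// &lt;?php<br />
// echo &quot;hi&quot;;
```

The order matters: calling htmlentities() after nl2br() would escape the freshly added <br /> tags and print them literally.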

I hope this tutorial offers some insight into how you can use MySQL’s full-text search capabilities to perform powerful searches against your database, and also introduces some of PHP’s interesting text-related functions (nl2br(), htmlentities(), and ucfirst(), to name a few). Of course, you could easily extend what was demonstrated here to implement far more powerful search capabilities, such as boolean searches. Be sure to check out the MySQL manual for a complete accounting of what’s possible!
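
As one illustration of the boolean searches just mentioned, MySQL accepts an IN BOOLEAN MODE modifier in which a leading + marks a word as required. The sketch below shows one possible way to turn raw user input into such a search string. The function name toBooleanQuery() and the decision to require every word are assumptions made for this example, not part of the book’s search engine.

```php
<?php

// Hypothetical helper: require every word the user typed
function toBooleanQuery($input) {

  // Replace boolean-mode operator characters with spaces so that
  // user input cannot change the logic of the search
  $clean = preg_replace('/[+\-<>()~*"@]+/', ' ', $input);

  // Split on whitespace, discarding empty pieces
  $words = preg_split('/\s+/', trim($clean), -1, PREG_SPLIT_NO_EMPTY);

  // Prefix each word with + so that all of them must match
  return implode(' ', array_map(function ($word) {
    return '+' . $word;
  }, $words));
}

echo toBooleanQuery("array keys");
// Prints: +array +keys

// The result would take the place of the keyword in the search query:
//   SELECT id, title, chapter FROM code
//   WHERE MATCH(title, code) AGAINST ('+array +keys' IN BOOLEAN MODE)
```

The returned string would still need to be escaped, or bound as a query parameter, before it is sent to MySQL.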

About the Author

W. Jason Gilmore is Apress’ Open Source Editorial Director, and co-founder of IT Enlightenment. He’s the author of several books, including the bestselling “Beginning PHP and MySQL 5: Novice to Professional, Second Edition” (Apress, 2006. 913pp.). Jason loves receiving e-mail, so don’t hesitate to write him at wjATwjgilmore.com.
