DatabaseCreating a Custom RSS Feed Aggregator

Creating a Custom RSS Feed Aggregator

Developer.com content and product recommendations are editorially independent. We may make money when you click on links to our partners. Learn More.

Rich Site Summary feeds, better known as RSS, are a great example of how XML is making a major impact in regards to the way information is consumed. This XML dialect is a popular format for summarizing information, typically, but not exclusively news-oriented data consisting of common attributes like a title, summary, author, and publish date. Building and distributing RSS feeds has become an increasingly popular practice for several reasons:

  • Competition: As the number of Web sites vying for your attention continues to grow, alternative methods for making their information available to consumers are always under consideration.
  • Information Overload: Obviously, consumers are finding themselves increasingly dependent upon an ever-growing number of Web sites. That said, navigating from site to site in a linear fashion just isn’t cutting it anymore. Viewing site summaries sans often-distracting graphics, advertisements, and additional extraneous material is a great way to quickly sift through the required information with a minimum investment of time.
  • Ubiquity: XML-based formats offer a clean separation of data and presentation, resulting in easy transformation to suit the widely varying requirements of various media distribution outlets (standard Web browser, cell phone, pager, email, and so forth).

While the use of RSS feeds were once relegated to the dork-elite, they seem to be popping up everywhere these days. Yahoo!, the Christian Science Monitor, CNET News.com, and The BBC are just a few of the Web sites to recently make RSS feeds available to their readers.

Note. For those of you completely new to RSS, take a moment to load Yahoo’s Technology feed (http://rss.news.yahoo.com/rss/tech) into your browser. You’ll quickly recognize a well-structured data format that lends itself to presentational transformation, typical of XML dialects. For a complete dissertation on the matter, execute a quick search in your favorite search engine; you’ll find more tutorials than you can shake a keyboard at.

In this article, I’ll show you how to provide your Web site users with a customizable RSS feed service using PHP, the MySQL database server, and the Magpie RSS parser. Although I’ll expect you to have at least rudimentary experience working with both PHP and MySQL, the examples should be easy enough for a beginner to comprehend quite easily. While the majority of you are likely unfamiliar with the Magpie RSS Parser, I’d like to offer some additional information regarding this great tool.

Magpie RSS

The Magpie RSS Parser was created by Kellan Elliott-McCrea in late 2002 to satisfy what he perceived was a lack of practical PHP-based RSS aggregation solutions. The result was a wonderfully capable tool offering a bevy of valuable features, some of which include:

  • Object-oriented design: The object-oriented, modularized code allows you to easily integrate the aggregation features into pre-existing applications.
  • Highly configurable: Magpie’s aggregation and caching behavior is easily modified through a well-thought-out configuration strategy.
  • Feed caching: This very cool feature will cache RSS feeds locally (to the server), conserving bandwidth and increasing application performance.

Distributed under the GPL license, you’re free to use the software without charge and as you please, provided that you abide by the license terms and conditions. Its only requirement is a recent version of PHP (4.0+) compiled with XML (expat) support.

The RSS Feeds

Let’s start with the application content. What types of RSS feeds would you like to provide to your users? Finding RSS feeds is as easy as perusing your favorite search engine: Just enter “RSS” along with some other choice topic such as “technology,” “science,” or “sports.” For the exceedingly lazy (never a bad trait in the programming industry), browse through one of the many RSS aggregators popping up around the Web. Feedster (http://www.feedster.com/) is one of my personal favorites. For the purposes of this tutorial, I’ll use the following feeds:

You should keep in mind that some RSS publishers require permission prior to making use of their feeds for commercial purposes. Therefore, always take care to review any usage clauses prior to deployment.

Make note of the feed locations, recording a title, URL, and if you wish, a description. In the next section, we’ll create the database table that will house this information.

The MySQL Database

To implement our custom RSS service, just three database tables are required. In this section, I’ll introduce all three.

rssfeed

The first table, rssfeed, will store the RSS feed information. For sake of example, we’ll store just three items: a unique ID, title, and URL. In a more complex application, you might store other details, such as a description, the date the feed was added to the aggregator, the number of seconds to wait between subsequent retrievals of the feed, and other relevant information.

mysql>CREATE TABLE rssfeed (
     >rowID tinyint unsigned not null auto_increment,
     >title varchar(150) not null,
     >url varchar(150) not null,
     >primary key(rowID)
     >);

Table 1-1 shows this table’s contents once the chosen RSS feeds have been added:

Row ID Title URL
1 Yahoo! Top Stories http://rss.news.yahoo.com/rss/topstories
2 MSDN, Technical Articles http://msdn.microsoft.com/rss.xml
3 Infoworld Latest News http://www.infoworld.com/rss/news.rdf
4 PCWorld Latest News http://rss.pcworld.com/rss/latestnews.rss
5 eWeek Technology News http://rssnewsapps.ziffdavis.com/tech.xml

user

The second table, user, stores information about the users who will make use of the RSS aggregator. Each must be uniquely identifiable so that we can provide custom feeds; to do so, each user is identified by a simple integer value. The user will need to log in before he can both manage his preferred feeds, as well as view them, so his e-mail address and a password will also be stored. In the interests of security, the password is stored as an irreversible md5 hash consisting of 32 characters. Additionally, like the rss table, in a real-world application, this user table is likely to be substantially more complicated; however, the parts necessary for implementing our aggregation mechanism are available in our table.

mysql>CREATE TABLE user (
     >rowID smallint unsigned not null auto_increment,
     >email varchar(55) not null,
     >pswd varchar(32) not null,
     >primary key(rowID)
     >);

For purposes of this example, Table 1-2 displays the sample user information stored in this table.

Table 1-2: Sample user information

Row ID E-Mail Password
1 wj@wjgilmore.com 5ebe2294ecd0e0f08eab7690d2a6ee69
2 manny@wjgilmore.com 6f1ed002ab5595859014ebf0951522d9
3 moe@wjgilmore.com 7f33334d4c2f6dd6ffc701944cec2f1c

user_to_rss_feed

The third and final table, user_to_rss_feed, binds the users to their chosen RSS feeds. This table consists of just two columns: userid, which identifies the user; and rssid, which identifies the RSS feed.

mysql>CREATE TABLE user_to_rss_feed (
     >userid smallint unsigned not null auto_increment,
     >rssid tinyint unsigned not null
     >);

Table 1-3 offers a simulation of this table’s contents after a few of our users have selected their favorite feeds.

Table 1-3: User/RSS feed mappings

userid rssid
1 1
1 3
2 1
2 4
3 5

The PHP Script

The final task is to create the script that will display the user’s chosen RSS feeds. This script, titled myrss.php, is surprisingly simple, consisting of just two main components:

  • Login: The user will need to log in to view his customized feeds. This is accomplished by using a simple form and some logic that will confirm the supplied e-mail address and password.
  • RSS Aggregation and Delivery: Assuming a valid login (or valid session; see next paragraph), we’ll use the Magpie RSS Parser to display the user’s chosen RSS feeds.

To save users the hassle of having to repeatedly log in every time they need an RSS “fix,” I make use of PHP’s session-handling feature. Once successfully logged in, the user’s unique identifier is stored as a session variable for later retrieval. Keep in mind that the storage duration is entirely dependent upon how your PHP installation’s session functionality is configured.

Listings 1-1 and 1-2 offer the commented code for both components. Both components are assembled and available for viewing via a link posted at the conclusion of Listing 1-2.

Listing 1-1: The Login Component

// Start or continue a session

session_start();


// Has the user

if (! isset($_SESSION['userid']))
{

   if (! isset($_POST['email']))
   {

      echo "<form action='myrss.php' method='post'>";
      echo "Email:<br />";
      echo "<input type='text' name='email' size='20'
                   maxlength='55' value='' /><br />";
      echo "Password:<br />";
      echo "<input type='password' name='pswd' size='20'
                   maxlength='20' value='' /><br />";
      echo "<input type='submit' value='login'>";
      echo "</form>";

   } else {

      mysql_connect("localhost","aggregator","secret");
      mysql_select_db("rssfeeds");

      $email = $_POST['email'];
      $pswd = md5($_POST['pswd']);

      $query = "SELECT rowID, email, pswd FROM user 
                WHERE email='$email' AND '$pswd'";
      $result = mysql_query($query);

      if (mysql_numrows($result) != 1)
      {
         echo "<p>Could not login!</p>";
      } else {
         list($rowID, $email, $pswd) = mysql_fetch_row($result);
         $_SESSION['userid'] = $rowID;
      }

      mysql_close();

   } // end isset[email]

}

Listing 1-2. The RSS Aggregation and Delivery Component

if (isset($_SESSION['userid'])) {

   require_once("rss_fetch.inc");

   mysql_connect("localhost","aggregator","secret");
   mysql_select_db("rssfeeds");	

   $userID = $_SESSION['userid'];

   $query = "SELECT rss.title, rss.url
             FROM rssfeed AS rss, user_to_rss_feed as userrss
             WHERE userrss.userid='$userID' 
             AND rss.rowID=userrss.rssid";

   $result = mysql_query($query);

   while(list($title,$url) = mysql_fetch_row($result)) 
   {

      $url = "$url";
      $rss = fetch_rss( $url );

      echo "<strong>" . $rss->channel['title'] . "</strong><p>";
      echo "<ul>";
      foreach ($rss->items as $item) 
      {
         $href = $item['link'];
         $title = $item['title'];
         echo "<li><a href=$href>$title</a></li>";
      }

      echo "</ul>";

   }

   mysql_close();

}

About the Author

W. Jason Gilmore (http://www.wjgilmore.com/
) is an Internet application developer for the Fisher College of Business. He’s the author of the upcoming book, PHP 5 and MySQL: Novice to Pro, due out by Apress in 2004.
His work has been featured within many of the computing industry’s leading
publications, including Linux Magazine, O’Reillynet, Devshed, Zend.com, and
Webreview. Jason is also the author of A Programmer’s Introduction to PHP
4.0 (453pp., Apress). Along with colleague Jon Shoberg, he’s co-author of
“Out in the Open,” a monthly column published within Linux magazine.

# # #

Get the Free Newsletter!

Subscribe to Developer Insider for top news, trends & analysis

Latest Posts

Related Stories