Creating a Custom RSS Feed Aggregator
Rich Site Summary feeds, better known as RSS, are a great example of how XML is making a major impact on the way information is consumed. This XML dialect is a popular format for summarizing information, typically but not exclusively news-oriented data, consisting of common attributes like a title, summary, author, and publish date. Building and distributing RSS feeds has become an increasingly popular practice for several reasons:
- Competition: As the number of Web sites vying for your attention continues to grow, alternative methods for making their information available to consumers are always under consideration.
- Information Overload: Obviously, consumers are finding themselves increasingly dependent upon an ever-growing number of Web sites. That said, navigating from site to site in a linear fashion just isn't cutting it anymore. Viewing site summaries sans often-distracting graphics, advertisements, and additional extraneous material is a great way to quickly sift through the required information with a minimum investment of time.
- Ubiquity: XML-based formats offer a clean separation of data and presentation, resulting in easy transformation to suit the widely varying requirements of media distribution outlets (standard Web browser, cell phone, pager, email, and so forth).
While the use of RSS feeds was once relegated to the dork-elite, feeds seem to be popping up everywhere these days. Yahoo!, the Christian Science Monitor, CNET News.com, and the BBC are just a few of the Web sites to recently make RSS feeds available to their readers.
Note. For those of you completely new to RSS, take a moment to load Yahoo's Technology feed (http://rss.news.yahoo.com/rss/tech) into your browser. You'll quickly recognize a well-structured data format that lends itself to presentational transformation, typical of XML dialects. For a complete dissertation on the matter, execute a quick search in your favorite search engine; you'll find more tutorials than you can shake a keyboard at.
In this article, I'll show you how to provide your Web site users with a customizable RSS feed service using PHP, the MySQL database server, and the Magpie RSS parser. Although I'll expect you to have at least rudimentary experience working with both PHP and MySQL, the examples should be simple enough for a beginner to follow. Because the majority of you are likely unfamiliar with the Magpie RSS parser, I'd like to offer some additional information regarding this great tool.
The Magpie RSS Parser was created by Kellan Elliott-McCrea in late 2002 to satisfy what he perceived as a lack of practical PHP-based RSS aggregation solutions. The result was a wonderfully capable tool offering a bevy of valuable features, some of which include:
- Object-oriented design: The object-oriented, modularized code allows you to easily integrate the aggregation features into pre-existing applications.
- Highly configurable: Magpie's aggregation and caching behavior is easily modified through a well-thought-out configuration strategy.
- Feed caching: This very cool feature will cache RSS feeds locally (to the server), conserving bandwidth and increasing application performance.
Magpie is distributed under the GPL, so you're free to use the software without charge and as you please, provided that you abide by the license terms and conditions. Its only requirement is a recent version of PHP (4.0+) compiled with XML (expat) support.
The RSS Feeds
Let's start with the application content. What types of RSS feeds would you like to provide to your users? Finding RSS feeds is as easy as perusing your favorite search engine: Just enter "RSS" along with some other choice topic such as "technology," "science," or "sports." For the exceedingly lazy (never a bad trait in the programming industry), browse through one of the many RSS aggregators popping up around the Web. Feedster (http://www.feedster.com/) is one of my personal favorites. For the purposes of this tutorial, I'll use the following feeds:
- Yahoo! Top Stories (http://rss.news.yahoo.com/rss/topstories)
- MSDN, Recent Technical Articles (http://msdn.microsoft.com/rss.xml)
- Infoworld Latest News (http://www.infoworld.com/rss/news.rdf)
- PCWorld Latest News (http://rss.pcworld.com/rss/latestnews.rss)
- eWeek Technology News (http://rssnewsapps.ziffdavis.com/tech.xml)
You should keep in mind that some RSS publishers require permission prior to making use of their feeds for commercial purposes. Therefore, always take care to review any usage clauses prior to deployment.
Make note of the feed locations, recording a title, URL, and if you wish, a description. In the next section, we'll create the database table that will house this information.
The MySQL Database
To implement our custom RSS service, just three database tables are required. In this section, I'll introduce all three.
The first table, rssfeed, will store the RSS feed information. For the sake of example, we'll store just three items: a unique ID, a title, and a URL. In a more complex application, you might store other details, such as a description, the date the feed was added to the aggregator, the number of seconds to wait between subsequent retrievals of the feed, and other relevant information.
mysql> CREATE TABLE rssfeed (
    ->   rowID tinyint unsigned not null auto_increment,
    ->   title varchar(150) not null,
    ->   url varchar(150) not null,
    ->   primary key(rowID)
    -> );
Table 1-1 shows this table's contents once the chosen RSS feeds have been added:
| rowID | title | url |
|-------|-------|-----|
| 1 | Yahoo! Top Stories | http://rss.news.yahoo.com/rss/topstories |
| 2 | MSDN, Technical Articles | http://msdn.microsoft.com/rss.xml |
| 3 | Infoworld Latest News | http://www.infoworld.com/rss/news.rdf |
| 4 | PCWorld Latest News | http://rss.pcworld.com/rss/latestnews.rss |
| 5 | eWeek Technology News | http://rssnewsapps.ziffdavis.com/tech.xml |
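If you'd like to load these rows yourself, something like the following should do it. This is a minimal sketch that assumes the rssfeed table defined above has just been created and is still empty, so that auto_increment hands out the rowID values 1 through 5 in the order listed.

```sql
-- Assumes the rssfeed table defined above already exists and is empty.
-- rowID is left out so that auto_increment assigns it.
INSERT INTO rssfeed (title, url) VALUES
  ('Yahoo! Top Stories',       'http://rss.news.yahoo.com/rss/topstories'),
  ('MSDN, Technical Articles', 'http://msdn.microsoft.com/rss.xml'),
  ('Infoworld Latest News',    'http://www.infoworld.com/rss/news.rdf'),
  ('PCWorld Latest News',      'http://rss.pcworld.com/rss/latestnews.rss'),
  ('eWeek Technology News',    'http://rssnewsapps.ziffdavis.com/tech.xml');

-- Review what was stored.
SELECT rowID, title, url FROM rssfeed ORDER BY rowID;
```

A single multi-row INSERT keeps the feeds together in one statement; five separate INSERT statements would work just as well.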
The second table, user, stores information about the users who will make use of the RSS aggregator. Each must be uniquely identifiable so that we can provide custom feeds; to do so, each user is identified by a simple integer value. The user will need to log in before he can manage or view his preferred feeds, so his e-mail address and a password will also be stored. In the interests of security, the password is stored as a one-way MD5 hash, which is always 32 hexadecimal characters long. Like the rssfeed table, this user table would likely be substantially more complicated in a real-world application; however, it contains everything necessary for implementing our aggregation mechanism.
mysql> CREATE TABLE user (
    ->   rowID smallint unsigned not null auto_increment,
    ->   email varchar(55) not null,
    ->   pswd varchar(32) not null,
    ->   primary key(rowID)
    -> );
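To see how the hashed password fits into this table, here is a hedged sketch that uses MySQL's built-in MD5() function. The e-mail addresses and passwords are placeholders I've made up purely for illustration; they are not the article's sample data.

```sql
-- Assumes the user table defined above already exists.
-- Addresses and passwords are hypothetical placeholders.
INSERT INTO user (email, pswd) VALUES
  ('jason@example.com', MD5('secret')),
  ('maria@example.com', MD5('opensesame'));

-- MD5() returns a 32-character hexadecimal string, which is why
-- the pswd column is declared as varchar(32).
SELECT rowID, email, CHAR_LENGTH(pswd) AS hash_length FROM user;

-- Login check: hash the submitted password and compare it with
-- the stored hash. A row comes back only when both values match.
SELECT rowID
  FROM user
 WHERE email = 'jason@example.com'
   AND pswd = MD5('secret');
```

Hashing inside the query keeps the example short. An application could just as easily compute the hash in PHP with md5() before sending it to the database.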
For purposes of this example, Table 1-2 displays the sample user information stored in this table.
Table 1-2: Sample user information
The third and final table, user_to_rss_feed, binds the users to their chosen RSS feeds. This table consists of just two columns: userid, which identifies the user; and rssid, which identifies the RSS feed.
mysql> CREATE TABLE user_to_rss_feed (
    ->   userid smallint unsigned not null,
    ->   rssid tinyint unsigned not null
    -> );
Table 1-3 offers a simulation of this table's contents after a few of our users have selected their favorite feeds.
Table 1-3: User/RSS feed mappings
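With all three tables in place, retrieving a given user's feeds comes down to a join between the mapping table and rssfeed. The sketch below assumes the five feed rows from Table 1-1 are loaded and uses user/feed pairings that I've made up for illustration (they are not the contents of Table 1-3); the aliases m and f are likewise just my own shorthand.

```sql
-- Hypothetical selections: user 1 follows feeds 1, 3, and 5;
-- user 2 follows feed 2.
INSERT INTO user_to_rss_feed (userid, rssid) VALUES
  (1, 1),
  (1, 3),
  (1, 5),
  (2, 2);

-- Title and URL of every feed chosen by user 1.
SELECT f.rowID, f.title, f.url
  FROM user_to_rss_feed AS m
  JOIN rssfeed AS f ON f.rowID = m.rssid
 WHERE m.userid = 1
 ORDER BY f.title;
```

Keeping the pairings in their own table means a feed's title or URL is stored once in rssfeed, no matter how many users subscribe to it.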