
Consuming RSS Feeds with Ruby

  • March 11, 2008
  • By W. Jason Gilmore

Last year, noted blogger and technical evangelist Robert Scoble made headlines by explaining how he manages to read a staggering 622 RSS (Rich Site Summary) feeds each and every morning. Although attempting to digest this much information on a regular basis is probably overkill for most, it's a testament to the efficiency boost gained from subscribing to RSS feeds in lieu of navigating from one web site to the next.

But what exactly is an RSS feed, and how do you go about consuming one? In this tutorial, I'll show you how to use Ruby to retrieve and parse RSS feeds from your favorite web sites. You can use what you learn here to do something as simple as including a favorite RSS feed on your web site, or as the basis for building your own custom RSS aggregator!

RSS Internals

RSS was created almost a decade ago by Netscape for use on their My Netscape portal, which made it possible for users to customize their home pages with a variety of custom data (at the time a cutting-edge development). This XML-based format made it possible for content publishers to distribute information in a format-agnostic manner, allowing others to integrate this content into their web sites with relative ease. That is, easy if you understand RSS's XML dialect.

If you open any RSS feed within a text editor, you'll see it contains a bunch of slightly confusing tags that delimit data identified as titles, URLs, dates, creators, and descriptions, among others. For example, here's a snippet from my blog's RSS feed:

<item>
   <title>Adding Multiple Markers with YM4R</title>
   <link>http://www.wjgilmore.com/?p=38</link>
   <comments>
      http://www.wjgilmore.com/?p=38#comments
   </comments>
   <pubDate>Thu, 06 Mar 2008 14:39:29 +0000</pubDate>
   <dc:creator>wjgilmore</dc:creator>
   <guid isPermaLink="false">
      http://www.wjgilmore.com/?p=38
   </guid>
   <description><![CDATA[In this post you'll learn how to add
      multiple markers...]]></description>
   <content:encoded><![CDATA[In this post you'll learn how to
      add multiple markers...]]></content:encoded>
   <wfw:commentRss>
      http://www.wjgilmore.com/?feed=rss2&amp;p=38
   </wfw:commentRss>
</item>

Therefore, to parse and format a feed, you need to iterate over the tags found in the document, and understand the context of the content found within. Writing these sorts of capabilities from scratch can be a real chore; however, with Ruby much of the work has already been done for you!
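
To give a rough sense of what doing this by hand involves, here is a minimal sketch that walks the child tags of a single item using REXML, the XML parser distributed with Ruby. The ITEM_XML constant is a trimmed copy of the snippet above (the namespaced tags are left out so the fragment parses on its own), and extract_fields is a hypothetical helper name, not part of any library:

```ruby
require 'rexml/document'

# Trimmed copy of the <item> shown above, without namespaced tags
ITEM_XML = <<~XML
  <item>
    <title>Adding Multiple Markers with YM4R</title>
    <link>http://www.wjgilmore.com/?p=38</link>
    <pubDate>Thu, 06 Mar 2008 14:39:29 +0000</pubDate>
  </item>
XML

# Hypothetical helper: map each child tag name to its text content
def extract_fields(xml)
  doc = REXML::Document.new(xml)
  fields = {}
  doc.root.each_element do |element|
    fields[element.name] = element.text.to_s.strip
  end
  fields
end

extract_fields(ITEM_XML).each do |name, value|
  puts "#{name}: #{value}"
end
```

Even this small version knows nothing about dates, CDATA sections, namespaces, or the channel that surrounds the items, which is where most of the real effort would go.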

Using Ruby to Consume RSS Feeds

RSS parsing has become so commonplace that Ruby ships with an rss library (historically part of the standard library, and a bundled gem since Ruby 3.0). Once required in your script, the rss library takes care of all of the heavy lifting involved in parsing the feed, in the end providing you with an object from which you can access the various RSS elements. The following script does exactly this, retrieving my blog's RSS feed and outputting some information about it:

# Provides RSS parsing capabilities
require 'rss'

# Provides URI.open for reading remote files
require 'open-uri'

# What feed are we parsing?
rss_feed = "http://feeds.feedburner.com/WJasonGilmore"

# Variable for storing feed content
rss_content = ""

# Read the feed into rss_content
# (URI.open is required on Ruby 3.0+, where plain open no longer accepts URLs)
URI.open(rss_feed) do |f|
   rss_content = f.read
end

# Parse the feed into the rss object; false skips strict validation
rss = RSS::Parser.parse(rss_content, false)

# Output the feed title and website URL
puts "Title: #{rss.channel.title}"
puts "RSS URL: #{rss.channel.link}"
puts "Total entries: #{rss.items.size}"

Save this file as parserss.rb and execute it from the command line:

%>ruby parserss.rb
Title: W. Jason Gilmore
RSS URL: http://www.wjgilmore.com
Total entries: 10
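
If you would like to experiment without depending on a live feed, the parser also accepts a string of XML directly. The following is a minimal sketch under that assumption: SAMPLE_FEED is a made-up RSS 2.0 document and summarize_feed is a hypothetical helper, neither of which comes from my blog's actual feed:

```ruby
require 'rss'

# A made-up RSS 2.0 document, used only for illustration
SAMPLE_FEED = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0">
    <channel>
      <title>Example Feed</title>
      <link>http://www.example.com</link>
      <description>A hypothetical feed for testing</description>
      <item>
        <title>First post</title>
        <link>http://www.example.com/?p=1</link>
        <description>Hello, world</description>
      </item>
      <item>
        <title>Second post</title>
        <link>http://www.example.com/?p=2</link>
        <description>Hello again</description>
      </item>
    </channel>
  </rss>
XML

# Hypothetical helper: parse feed XML and collect the same three
# values that parserss.rb prints
def summarize_feed(xml)
  rss = RSS::Parser.parse(xml, false)
  {
    title: rss.channel.title,
    link: rss.channel.link,
    entries: rss.items.size
  }
end

summary = summarize_feed(SAMPLE_FEED)
puts "Title: #{summary[:title]}"
puts "RSS URL: #{summary[:link]}"
puts "Total entries: #{summary[:entries]}"
```

Keeping the parsing in a small method like this also makes it easy to point the same code at content fetched over the network later on.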

Retrieving and displaying the various posts is just as easy. To see this capability in action, add the following snippet to the end of the file:

rss.items.each do |item|
   puts "<a href='#{item.link}'>#{item.title}</a>"
   puts "Published on: #{item.date}"
   puts "#{item.description}"
end
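
Feeds are not guaranteed to list their entries in any particular order, so an aggregator will often want to sort them itself. The sketch below is one possible way to do that, assuming every entry you care about carries a pubDate; DATED_FEED is invented sample data and newest_first is a hypothetical helper name:

```ruby
require 'rss'

# Invented sample data: two dated entries, oldest listed first
DATED_FEED = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0">
    <channel>
      <title>Example Feed</title>
      <link>http://www.example.com</link>
      <description>A hypothetical feed for testing</description>
      <item>
        <title>Older post</title>
        <link>http://www.example.com/?p=1</link>
        <pubDate>Tue, 04 Mar 2008 09:00:00 +0000</pubDate>
        <description>Posted first</description>
      </item>
      <item>
        <title>Newer post</title>
        <link>http://www.example.com/?p=2</link>
        <pubDate>Thu, 06 Mar 2008 14:39:29 +0000</pubDate>
        <description>Posted second</description>
      </item>
    </channel>
  </rss>
XML

# Hypothetical helper: return the entries as plain hashes, newest first.
# Entries without a pubDate are skipped because they cannot be ordered.
def newest_first(xml)
  rss = RSS::Parser.parse(xml, false)
  dated = rss.items.reject { |item| item.pubDate.nil? }
  dated.sort_by { |item| item.pubDate }.reverse.map do |item|
    {
      title: item.title,
      link: item.link,
      published: item.pubDate.getutc.strftime("%Y-%m-%d")
    }
  end
end

newest_first(DATED_FEED).each do |entry|
  puts "<a href='#{entry[:link]}'>#{entry[:title]}</a>"
  puts "Published on: #{entry[:published]}"
end
```

Converting each entry to a hash is a design choice rather than a requirement; it simply keeps the display code independent of the parser's own item objects.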



