http://www.developer.com/


REXML: Processing XML in Ruby


April 18, 2007

In this day and age of software development, it is inevitable that you will need to process or produce XML within your application. If your language of choice is Ruby, or Rails for that matter, there is a simple and useful XML processing library for Ruby called REXML. REXML is a pure Ruby XML processor with an easy-to-use API. This article introduces REXML and shows you how to use it for some common XML processing tasks.

Ruby: The Scripting Language that is Taking the Computing World by Storm

It is hard to imagine anyone in the programming world these days who has not heard of Ruby. The ever-increasing popularity of the Ruby on Rails web framework is helping to make Ruby the language of choice for rapid application development and testing. Ruby is an interpreted scripting language that provides quick and easy object-oriented programming and contains some neat features such as closures, blocks, and mixins. Ruby is also highly portable, running on Unix/Linux, Windows, and Mac OS. Readers wanting a more thorough introduction to Ruby can read W. Jason Gilmore's article on Ruby.

REXML: XML Made Simple for Ruby

REXML is a pure Ruby XML processor inspired by the Electric XML library for Java. It features an easy-to-use API, small size, and speed, and it supports both tree and stream document parsing. Stream parsing is about 1.5 times faster than tree parsing, but it does not give you access to features such as XPath.
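
To make the stream parsing style concrete, here is a minimal sketch. It assumes the REXML::StreamListener callbacks (tag_start, text, tag_end) and REXML::Document.parse_stream; the ColorListener class and the small inline document are made up for illustration.

```ruby
require "rexml/document"
require "rexml/streamlistener"

# Hypothetical listener: collects the text of every <color> element
# as the parser streams through the document.
class ColorListener
  include REXML::StreamListener

  attr_reader :colors

  def initialize
    @colors = []
    @in_color = false
  end

  # Called for every opening tag
  def tag_start(name, _attrs)
    @in_color = true if name == "color"
  end

  # Called for text content; ignored unless we are inside <color>
  def text(data)
    @colors << data.strip if @in_color
  end

  # Called for every closing tag
  def tag_end(name)
    @in_color = false if name == "color"
  end
end

xml = <<EOF
<guitars>
  <model><color>Fiesta Red</color></model>
  <model><color>Olympic White</color></model>
</guitars>
EOF

listener = ColorListener.new
REXML::Document.parse_stream(xml, listener)
puts listener.colors
```

No document tree is built here, which is why there is nothing to run an XPath query against; the listener has to track its own position in the document.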

Getting Started with REXML

To begin working with REXML, you need to include it within your Ruby file:

require "rexml/document"
include REXML    # so that we don't have to prefix everything
                 # with REXML::...

This loads the REXML library and mixes the REXML module into the current scope, so you can refer to classes such as Document and Element without the 'REXML::' prefix.

Now, create and print a simple XML document with REXML. Enter the following Ruby code into a file named 'REXMLtest.rb' and save it:

require "rexml/document"
include REXML    # so that we don't have to prefix everything
                 # with REXML::...
string = <<EOF
   <xml>
      <element attribute="attr">My first REXML document</element>
   </xml>
EOF
doc = Document.new string

print doc

From the command line, run the script with 'ruby REXMLtest.rb'. It prints the XML document back out.




You created a string containing a simple XML document. You then created a REXML Document object, which was initialized with the string. Finally, you printed out the XML document.
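
Once the Document object exists, you can also inspect it rather than only print it. The following is a small sketch that reuses the same string; it uses the REXML:: prefix instead of 'include REXML' so that it stands alone, and the variable names are illustrative.

```ruby
require "rexml/document"

string = <<EOF
   <xml>
      <element attribute="attr">My first REXML document</element>
   </xml>
EOF
doc = REXML::Document.new(string)

root  = doc.root                     # the <xml> element
child = root.elements["element"]     # first child named "element"

puts root.name                       # xml
puts child.text                      # My first REXML document
puts child.attributes["attribute"]   # attr
```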

Tree Parsing and Accessing XML Elements

Now, parse an XML document and see how REXML provides access to the elements within an XML document. First, create an XML document, 'guitars.xml', as shown below:

<guitars title="My Guitars">
   <make name="Fender">
      <model sn="123456789" year="2006" country="japan">
         <name>62 Reissue Stratocaster</name>
         <price>750.00</price>
         <color>Fiesta Red</color>
      </model>
      <model sn="112233445" year="2006" country="mexico">
         <name>60s Reverse Headstock Stratocaster</name>
         <price>699.00</price>
         <color>Olympic White</color>
      </model>
   </make>
   <make name="Squier">
      <model sn="445322344" year="2003" country="China">
         <name>Standard Stratocaster</name>
         <price>179.99</price>
         <color>Cherry Sunburst</color>
      </model>
   </make>

</guitars>

Read in and print 'guitars.xml' using REXML. Create a Ruby file called 'REXMLFileTest.rb':

require "rexml/document"
include REXML    # so that we don't have to prefix everything
                 # with REXML::...

doc = Document.new File.new("guitars.xml")

print doc

When you run 'REXMLFileTest.rb', it prints the contents of 'guitars.xml'.




First off, print all the colors of the guitars in this document. You do this by accessing each 'guitars/make/model/color' element of the document and printing the text contained within this element:

require "rexml/document"
include REXML    # so that we don't have to prefix everything
                 # with REXML::...

doc = Document.new File.new("guitars.xml")

# The block must start on the same line as the method call
doc.elements.each("guitars/make/model/color") { |element| puts element.text }

When you run the Ruby script again, you see the guitar colors printed out.
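
The same elements.each call works on any element, not only on the document, so you can walk the tree level by level. The sketch below uses a trimmed inline copy of the guitar data (an assumption made so the example runs without the file) and prints each model together with its make and color.

```ruby
require "rexml/document"

# Trimmed inline copy of guitars.xml so the example is self-contained
xml = <<EOF
<guitars title="My Guitars">
   <make name="Fender">
      <model sn="123456789" year="2006" country="japan">
         <name>62 Reissue Stratocaster</name>
         <color>Fiesta Red</color>
      </model>
   </make>
   <make name="Squier">
      <model sn="445322344" year="2003" country="China">
         <name>Standard Stratocaster</name>
         <color>Cherry Sunburst</color>
      </model>
   </make>
</guitars>
EOF
doc = REXML::Document.new(xml)

lines = []
doc.elements.each("guitars/make") do |make|
  # Paths passed to a nested each are relative to the current element
  make.elements.each("model") do |model|
    name  = model.elements["name"].text
    color = model.elements["color"].text
    lines << "#{make.attributes['name']} #{name} (#{color})"
  end
end
puts lines
```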




Total up the cost of all these guitars. For each price element, add it to a total, and then print the total:

require "rexml/document"
include REXML    # so that we don't have to prefix everything with
                 # REXML::...

doc = Document.new File.new("guitars.xml")

# print doc

# doc.elements.each("guitars/make/model/color")
#                  { |element| puts element.text }

total = 0.0

doc.elements.each("guitars/make/model/price") { |element|
   # to_f keeps the cents; to_i would truncate "179.99" to 179
   total += element.text.to_f
}

puts "Total is $" + format("%.2f", total)

When you run this script, it prints the total price of all the guitars.
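
Because the prices are decimal strings, the conversion you choose matters: String#to_i keeps only the whole-dollar part, and Float arithmetic works in binary floating point. If exact cents are important, one option (an assumption, not something REXML requires) is Ruby's BigDecimal. The sketch below uses a trimmed inline copy of the price data.

```ruby
require "rexml/document"
require "bigdecimal"

# Trimmed inline copy of the price data from guitars.xml
xml = <<EOF
<guitars>
   <make name="Fender">
      <model><price>750.00</price></model>
      <model><price>699.00</price></model>
   </make>
   <make name="Squier">
      <model><price>179.99</price></model>
   </make>
</guitars>
EOF
doc = REXML::Document.new(xml)

# Collect every <price> element, convert its text to an exact decimal,
# and add the values up starting from zero.
prices = REXML::XPath.match(doc, "//price")
total  = prices.sum(BigDecimal("0")) { |element| BigDecimal(element.text) }

puts "Total is $" + total.to_s("F")
```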




XPath Expressions

REXML also supports XPath. XPath lets you address parts of an XML document with path expressions that resemble file system paths. As a first example, print the first 'model' in your list of guitars. To find the first 'model' element in the XML document using REXML's XPath API, do the following:

require "rexml/document"
include REXML    # so that we don't have to prefix everything with
                 # REXML::...

doc = Document.new File.new("guitars.xml")

# print doc

firstmodel = XPath.first( doc, "//model" )

print firstmodel

XPath.first returns the first node that matches an XPath expression. Here you pass it the 'doc' document and the expression "//model". The leading '//' selects matching elements at any depth below the document root, so the call returns the first 'model' element in document order. When you run this script, it prints that element: the Fender '62 Reissue Stratocaster' model with its name, price, and color.
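
XPath expressions can also filter on attribute values with a predicate in square brackets. The following sketch runs against a trimmed inline copy of the guitar data (inlined only so the example is self-contained) and uses XPath.first and XPath.match.

```ruby
require "rexml/document"

# Trimmed inline copy of guitars.xml
xml = <<EOF
<guitars title="My Guitars">
   <make name="Fender">
      <model sn="123456789" year="2006" country="japan">
         <name>62 Reissue Stratocaster</name>
      </model>
      <model sn="112233445" year="2006" country="mexico">
         <name>60s Reverse Headstock Stratocaster</name>
      </model>
   </make>
   <make name="Squier">
      <model sn="445322344" year="2003" country="China">
         <name>Standard Stratocaster</name>
      </model>
   </make>
</guitars>
EOF
doc = REXML::Document.new(xml)

# First <name> under a model whose country attribute is "China"
china_name = REXML::XPath.first(doc, "//model[@country='China']/name")
puts china_name.text

# All models under the make named "Fender", returned as an array
fender_models = REXML::XPath.match(doc, "//make[@name='Fender']/model")
puts fender_models.size
```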




Now, print out all the model years of the guitars. Your model years are stored as attributes named 'year' in your 'model' elements. You use the XPath.each method, passing in the XPath expression "//model/attribute::year".

require "rexml/document"
include REXML    # so that we don't have to prefix everything with
                 # REXML::...

doc = Document.new File.new("guitars.xml")

XPath.each( doc, "//model/attribute::year") { |attribute| puts attribute }

When you run this script, it prints the three model years: 2006, 2006, and 2003.
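
If you want the years as data instead of printed lines, XPath.match returns the matching attribute nodes as an array, and each REXML::Attribute exposes its string through value. This is a sketch over a trimmed inline copy of the guitar data; the variable names are illustrative.

```ruby
require "rexml/document"

# Trimmed inline copy of guitars.xml (attributes only)
xml = <<EOF
<guitars>
   <make name="Fender">
      <model sn="123456789" year="2006" country="japan"/>
      <model sn="112233445" year="2006" country="mexico"/>
   </make>
   <make name="Squier">
      <model sn="445322344" year="2003" country="China"/>
   </make>
</guitars>
EOF
doc = REXML::Document.new(xml)

# Same expression as above, but collected into an array of strings
years = REXML::XPath.match(doc, "//model/attribute::year").map(&:value)

# Distinct years, oldest first
distinct_years = years.uniq.sort

puts years.inspect
puts distinct_years.inspect
```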




Updating a Document with REXML

Now that you have learned how to access XML document elements with REXML, it's time to do some updates to the document using the REXML API. You will use the API to add a new guitar 'make', with one 'model' in it. Enter the following into a new file called 'REXMLUpdateTest.rb':

require "rexml/document"
include REXML    # so that we don't have to prefix everything with
                 # REXML::...

doc = Document.new File.new("guitars.xml")

root = doc.root

make = Element.new "make"
make.attributes["name"] = "Gibson"

model = Element.new "model"
model.attributes["sn"]      = "99999999"
model.attributes["year"]    = "2007"
model.attributes["country"] = "USA"

model.add_element "name"
model.elements["name"].text  = "SG"
model.add_element "price"
model.elements["price"].text = "1250.00"
model.add_element "color"
model.elements["color"].text = "Red"

make.add_element model

root.add_element make

print doc

You began by getting the root element of the document and storing it in a variable, 'root'. You then created your 'make' and 'model' elements. Note that you created both attributes and child elements within the 'model' element. You then added the 'model' as a child element of 'make', and added 'make' to the document root. Running the script prints the document with the new 'Gibson' make appended after the existing ones.




You have now added a new make to your guitar list, with a model in it.
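
The same update can be written more compactly, because add_element accepts a hash of attributes and returns the element it created. The sketch below starts from a minimal inline document instead of 'guitars.xml' (an assumption made to keep it self-contained) and writes the result into a string with two-space indentation instead of printing it.

```ruby
require "rexml/document"

# Minimal stand-in for guitars.xml
doc  = REXML::Document.new('<guitars title="My Guitars"/>')
root = doc.root

# add_element returns the new element, so calls can be chained
make  = root.add_element("make", { "name" => "Gibson" })
model = make.add_element("model", { "sn"      => "99999999",
                                    "year"    => "2007",
                                    "country" => "USA" })
model.add_element("name").text  = "SG"
model.add_element("price").text = "1250.00"
model.add_element("color").text = "Red"

# Serialize into a string; the second argument is the indent width
out = String.new
doc.write(out, 2)
puts out
```

Writing to a File object opened for writing instead of a string would save the updated document to disk.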

Conclusion

This article took a look at the REXML library and showed how it can be used to process XML within your Ruby or Rails application. Like most things in Ruby and Rails, getting up and running with REXML is both simple and intuitive. REXML makes adding XML support to your application a breeze, with a quick learning curve. There are many more functions provided by REXML, so give it a good look and see what it has to offer you.


About the Author

Dominic Da Silva (http://www.dominicdasilva.com/) is the President of SilvaSoft, Inc., a software consulting company specializing in Java, Ruby, and .NET-based web and web services development. He has worked with Java since the year 2000 and is a Linux user from the 1.0 days. He is also Sun Certified for the Java 2 platform. Born on the beautiful Caribbean island of Trinidad and Tobago, he now makes his home in sunny Orlando, Florida.

