http://www.developer.com/


Mining Amazon.com Catalog Data with Ruby


October 6, 2010

Amazon.com introduced one of the world's first online affiliate programs in 1996, a mere two years after the company's founding. The enormous popularity of the Amazon Associates Program is widely considered to have played a significant role in the company's early growth. In 2002 the company launched a catalog API intended for use in conjunction with the Associates Program. This API was called the Amazon E-Commerce Service and was later renamed the Product Advertising API.

Amazon's Product Advertising API gives developers an interface for building new services that mine Amazon's enormous product catalog. Typical product information such as the price and manufacturer is just the tip of the iceberg; it's also possible to retrieve information about sales volume (via the sales rank), product reviewers, product descriptions, related products, and much more.

The popularity of this API has prompted the development of libraries that facilitate application integration in many popular programming languages, among them PHP, Ruby, Perl and C#. Ruby offers a particularly powerful library known as Ruby/AWS. This library (or gem in Ruby parlance) provides an easy way to begin programmatically perusing and mining the Amazon catalog, a characteristic I recently came to fully appreciate while integrating Ruby/AWS into a new project.

In this tutorial I'll introduce you to Ruby/AWS, showing you how to use this great library to bend the Product Advertising API to your will.

Installing and Configuring Ruby/AWS

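As I mentioned, Ruby/AWS is packaged as a Ruby gem, meaning you can install it via the RubyGems package manager. Note that the gem is published under the name ruby-aaws; the similarly named ruby-aws gem is a different library. To install it, just open a terminal and execute the following command:

%>gem install ruby-aaws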

Once the gem is installed, you'll need to sign up for an Amazon Web Services account in order to obtain an API key. Creating an account is free and takes only a moment. Within your account profile you'll be able to retrieve your "Access Key ID" and "Secret Access Key", which serve as your account's username and password, respectively. Ruby/AWS will look for this information within a configuration file named .amazonrc in your home directory, so create this file and copy the following contents into it:



locale = 'us'
cache = false
key_id = 'PASTE_YOUR_ACCESS_KEY_HERE'
secret_key_id = 'PASTE_YOUR_SECRET_KEY_HERE'

If you hail from outside of the United States, you can change the locale setting, causing Ruby/AWS to consult the associated country-specific Amazon catalog. For instance, if you live in the UK, use the locale setting uk, which will cause Ruby/AWS to consult the Amazon.co.uk catalog.
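
Before running any of the scripts that follow, it can help to confirm that the file actually contains the settings shown above. The following is only a minimal sketch and is not part of Ruby/AWS: parse_amazonrc, missing_keys and REQUIRED_KEYS are hypothetical names, and the parser assumes the simple name = 'value' layout used in this article.

```ruby
# Hypothetical helper, not part of Ruby/AWS: parses simple "name = value"
# lines like the ones in the .amazonrc example above.
REQUIRED_KEYS = %w[key_id secret_key_id].freeze

def parse_amazonrc(text)
  settings = {}
  text.each_line do |line|
    name, value = line.strip.split('=', 2)
    next if value.nil? # skip blank lines and anything without an "="
    # Trim whitespace and one pair of surrounding quotes, if present.
    settings[name.strip] = value.strip.gsub(/\A['"]|['"]\z/, '')
  end
  settings
end

# Returns the required settings that are absent or empty.
def missing_keys(settings)
  REQUIRED_KEYS.reject { |key| settings.key?(key) && !settings[key].empty? }
end

# Sample input with the secret key deliberately left out.
sample = <<~RC
  locale = 'uk'
  cache = false
  key_id = 'PASTE_YOUR_ACCESS_KEY_HERE'
RC

settings = parse_amazonrc(sample)
puts "locale: #{settings['locale']}"
puts "missing: #{missing_keys(settings).join(', ')}"
```

To check your real file rather than the sample, you could pass File.read(File.expand_path('~/.amazonrc')) to parse_amazonrc instead.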

Performing a Product Lookup

The Product Advertising API exposes a number of methods useful for searching the catalog in a variety of ways. For instance, you can look up a specific product according to its ASIN (Amazon Standard Identification Number), search a particular category of products (Books, Music or Grocery, for instance) by product title, release date, or manufacturer, and even search for a product's related items in order to entice a prospective customer into purchasing more.

Further, to conserve bandwidth and improve performance, several lookup responses (known as response groups) can be returned with varying degrees of specificity. For instance, the Small response group returns key attributes such as the product title, ASIN, and URL. The Medium response group includes everything found in Small as well as product image URLs and the latest sales rank. Still other response groups can return information specific to bestselling items, solely images, and product reviews. (See the API documentation for a complete summary of available response groups.)

Let's work through a few examples involving Ruby/AWS, beginning with a simple item lookup based on its ASIN (incidentally, you can find a product's ASIN on its Amazon product page). Much of the following script is standard Ruby syntax, so you should pay particular attention to the ItemLookup, ResponseGroup and Request calls. Following the example I'll talk more about the role of these calls.



#!/usr/bin/ruby -w

require 'rubygems'
require 'amazon/aws/search'

include Amazon::AWS
include Amazon::AWS::Search

# Look up a single product by its ASIN, limited to items sold by Amazon
il = ItemLookup.new( 'ASIN', { 'ItemId' => '1430231149',
                               'MerchantId' => 'Amazon' } )

# Ask for the Medium response group
rg = ResponseGroup.new( 'Medium' )

# Execute the lookup
req = Request.new
resp = req.search( il, rg )

# The title and publication date live under item_attributes;
# the ASIN and sales rank are read directly from the item
item = resp.item_lookup_response.items.item
attribs = item.item_attributes
title = attribs.title
asin = item.asin
sales_rank = item.sales_rank
publication_date = attribs.publication_date

puts "#{title} was released on #{publication_date}"

Executing this script produces the following output:

Beginning PHP and MySQL: From Novice to Professional, Fourth Edition was released on 2010-09-30

The ItemLookup constructor determines what precisely we are looking for, in this case a specific ASIN. The ItemId parameter defines that ASIN, and the MerchantId parameter specifies that we're interested only in products sold by Amazon.com, rather than the array of other merchants selling (or reselling) products through the site. The ResponseGroup constructor defines the type of response group used for the lookup results. Finally, the search method of the Request object executes the search.

Following a successful search you'll be able to access the product attributes using the typical dot notation used when accessing Ruby objects. As you can see from the example, part of the challenge is figuring out which attributes are stored within item_attributes and which are directly accessible.
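
One way to take some of the guesswork out of this is a small helper that looks on the item first and then falls back to item_attributes. The sketch below is purely illustrative: fetch_attribute is a hypothetical name, and the two Struct classes are stand-ins that only mimic the shape of the objects used in the example above. Whether the real Ruby/AWS response objects answer respond_to? in the same way is an assumption you would need to verify against the library.

```ruby
# Stand-ins that mimic the shape of the response objects used above.
# The real objects are built by Ruby/AWS from Amazon's response.
Attributes = Struct.new(:title, :publication_date)
Item = Struct.new(:asin, :item_attributes)

# Hypothetical helper: look on the item itself first, then fall back
# to item_attributes. Returns nil when the name is found in neither.
def fetch_attribute(item, name)
  return item.public_send(name) if item.respond_to?(name)

  attribs = item.item_attributes
  return attribs.public_send(name) if attribs.respond_to?(name)

  nil
end

# Values taken from the lookup example in this article.
item = Item.new(
  '1430231149',
  Attributes.new(
    'Beginning PHP and MySQL: From Novice to Professional, Fourth Edition',
    '2010-09-30'
  )
)

puts fetch_attribute(item, :asin)             # found directly on the item
puts fetch_attribute(item, :publication_date) # found in item_attributes
```

With a helper like this the calling code no longer needs to remember where each attribute lives, at the cost of hiding a detail of the response layout.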

Searching a Product Group

Suppose you want to create a service that tracks the historical sales rankings of Ubuntu-related books sold through Amazon. The Ruby script used to retrieve the latest sales rankings executes on a daily basis, and it automatically adds any new Ubuntu-related books that appear in the results of a search for books having the term "Ubuntu" in the title. To perform this task you can use the Ruby/AWS ItemSearch class, searching the Books category using the Title attribute:



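#!/usr/bin/ruby -w

require 'rubygems'
require 'amazon/aws/search'

include Amazon::AWS
include Amazon::AWS::Search

# Search the Books category for titles containing "Ubuntu"
is = ItemSearch.new( 'Books', { 'Title' => 'Ubuntu' } )

# Only the sales rank is needed, so request the SalesRank response group
rg = ResponseGroup.new( 'SalesRank' )

# Execute the search
req = Request.new
resp = req.search( is, rg )

items = resp.item_search_response.items.item

# Print each book's ASIN alongside its current sales rank
items.each do |item|
  asin = item.asin
  sales_rank = item.sales_rank
  puts "#{asin}: #{sales_rank}"
end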

At the time of publication, this script produced the following results:



B003YL3OXM: 45853
0137081308: 49740
0137003889: 410902
B000SET66M: 64106
159327257X: 33498
0470604506: 180920
1430219998: 150251
0470485051: 82790
0307587886: 136259
0596527209: 140876

Of course, in a real world situation you would probably store the ASINs along with additional product information within a database, and then link to a ranking table that stores the historical sales ranks.
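
To make that idea a little more concrete, here is a minimal in-memory sketch of the two structures just described: a collection of products keyed by ASIN, and a list of dated ranking rows that refer back to them. The RankHistory class and its methods are hypothetical names invented for this illustration, and a real service would use database tables rather than Ruby hashes and arrays. The two sample rows reuse ASINs and ranks from the output above, dated with this article's publication date.

```ruby
require 'date'

# Hypothetical in-memory stand-in for two database tables:
# products (keyed by ASIN) and rankings (one row per product per day).
class RankHistory
  Ranking = Struct.new(:asin, :date, :sales_rank)

  def initialize
    @products = {} # asin => hash of additional product details
    @rankings = [] # Ranking rows, in the order they were recorded
  end

  # Adds the product the first time its ASIN is seen, then stores a
  # ranking row for the given date.
  def record(asin, sales_rank, date = Date.today, details = {})
    @products[asin] ||= details
    @rankings << Ranking.new(asin, date, Integer(sales_rank.to_s))
  end

  # True if this ASIN has already been added to the products collection.
  def known?(asin)
    @products.key?(asin)
  end

  # All ranking rows for one product, oldest first.
  def history(asin)
    @rankings.select { |row| row.asin == asin }.sort_by(&:date)
  end
end

tracker = RankHistory.new
tracker.record('159327257X', '33498', Date.new(2010, 10, 6))
tracker.record('B003YL3OXM', '45853', Date.new(2010, 10, 6))

tracker.history('159327257X').each do |row|
  puts "#{row.asin} ranked #{row.sales_rank} on #{row.date}"
end
```

In the daily script, each asin and sales_rank pair printed by the loop above would instead be passed to record, and known? could be used to detect books appearing in the results for the first time.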

Conclusion

Amazon's Product Advertising API can be a bit overwhelming due to the sheer breadth and depth of product offerings and other information it exposes. However, libraries such as Ruby/AWS give you the ability to focus upon efficiently mining that product data rather than getting lost in a sea of irrelevant search results. If you're currently using Ruby/AWS or another API library to create interesting services, tell us about it in the comments! And as always, ping me with your questions on Twitter at @wjgilmore!

About the Author

Jason Gilmore is founder of the publishing and consulting firm WJGilmore.com. He is the author of several popular books, including "Easy PHP Websites with the Zend Framework", "Easy PayPal with PHP", and "Beginning PHP and MySQL, Fourth Edition". Follow him on Twitter at @wjgilmore.
