CouchDB is an interesting implementation of a schema-less data store. It supports client applications through HTTP and a REST-style API. I don’t use CouchDB’s replication support; I use it simply to store structured data. While I sometimes run CouchDB locally during development, I like to keep CouchDB running on a low-cost VPS instance that I access interactively and from client applications. (I will refer to data instances as “documents” in this article.)
Once you have mastered using the Heroku platform to deploy and manage Rails web applications, you can choose CouchDB for the backend. Using a simple Rails app, Note Taker with Search (see the previous article in this series, “Deploying a Rails Application to Heroku”), I will demonstrate how to use CouchDB, based on my own use of this data storage and management tool. (The code download for this article contains all the examples in the directory note_taker_couchdb, and you should extract them and work along with me through every example.) I will use a combination of the APIs in the couchrest gem with direct REST-style calls using the simplehttp gem.
Because CouchDB is built so well on top of HTTP and REST, it often seems most natural to me to simply make direct REST calls to a CouchDB service and use the json gem to process the returned JSON data, working “close to the metal” instead of using the APIs provided by the couchrest gem.
A particularly interesting CouchDB attribute is its versioning system. CouchDB does not overwrite old versions when you add new data. Rather, it creates a new version of a document by reusing the document’s ID and assigning a new revision number (the _rev field). Old versions are left intact until the database is compacted. If you are concerned about wasted disk space, don’t be: CouchDB also uses a lot of disk storage for indexes, and disk space is inexpensive.
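The revision idea can be sketched with a toy in-memory store. This is only an illustration under stated assumptions, not CouchDB's implementation: the class name ToyRevisionStore is hypothetical, and the revision numbers are plain counters, whereas real CouchDB revisions look like "2-" followed by a hash. What it does mirror is that an update must carry the current _rev, and that a stale _rev is rejected (CouchDB answers with an HTTP 409 Conflict).

```ruby
# Toy model of CouchDB-style revisions (illustration only).
class ToyRevisionStore
  Conflict = Class.new(StandardError)

  def initialize
    # document ID => list of stored versions, oldest first
    @revisions = Hash.new { |hash, id| hash[id] = [] }
  end

  # Save a document. An update must carry the current '_rev'.
  def save(doc)
    id = doc.fetch('_id')
    versions = @revisions[id]
    current_rev = versions.empty? ? nil : versions.last['_rev']
    raise Conflict, "stale revision for #{id}" unless doc['_rev'] == current_rev
    stored = doc.merge('_rev' => (versions.size + 1).to_s)
    versions << stored
    stored
  end

  # Latest version of a document, or nil if it does not exist.
  def get(id)
    @revisions[id].last
  end

  # All stored versions, oldest first.
  def history(id)
    @revisions[id].dup
  end
end

store = ToyRevisionStore.new
first = store.save({ '_id' => 'note-1', 'title' => 'draft' })
second = store.save(first.merge('title' => 'final'))
```

After these two saves, store.get('note-1') returns the "final" version, store.history('note-1') still holds both versions, and saving from the stale first copy again raises ToyRevisionStore::Conflict.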
You create indexes on documents by writing map/reduce functions in JavaScript and adding them to databases. The map/reduce functions that you write define what data can be searched for efficiently. The general topic of writing CouchDB map/reduce functions is beyond the scope of this article, but I will walk you through the function I defined for the next example. The database for this example is notes. I have only one type of data document in the notes database, and the document type is also called notes. In all further discussions, whenever I refer to notes I mean documents.
I wrote map/reduce functions for two views on the notes documents:
- words: the words used in note titles and content
- users: the user IDs stored in notes, which specify who wrote each note
In this example, you are allowed to see only notes that have the same user ID as the one set in your session when you log in to this web application. CouchDB uses JSON to store data, so your notes documents will be stored internally as JSON. Map/reduce functions are also expressed as JSON with the JavaScript code in embedded strings. I don’t much like this notation, but it is only a minor annoyance. Document IDs are specified by the hash key _id, and documents containing map/reduce JavaScript functions for defining views have ID names starting with _design; for example:
{
  "_id": "_design/notes",
  "language": "javascript",
  "views": {
    "words": {
      "map": "function(doc) { var s = doc.title + ' ' + doc.content;
              var words = s.replace(/[0123456789!.,;]+/g,' ').toLowerCase().split(' ');
              for (var word in words) { emit(words[word], doc._id); } }"
    },
    "users": {
      "map": "function(doc) { if (doc.user_id) { emit(doc.user_id, null); }}"
    }
  }
}
Neither of these views required a reduce function. The function emit writes a key/value pair. It is fairly common to see null for either the key or value. In the view users, I need only the user IDs as keys, so I emit a null value for each key/value pair. Interestingly, the user IDs for the view are culled from the notes documents and there is no separate document type for users.
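To experiment with what the two map functions emit without a running CouchDB, here is a rough Ruby equivalent. It is a sketch under assumptions: the helper names words_view_rows and users_view_rows and the sample note are hypothetical, a space is placed between title and content so that adjacent words do not merge, and empty tokens are dropped, which the JavaScript split(' ') does not do.

```ruby
# Rough Ruby equivalent of the "words" map function: one row per word,
# with the word as the key and the document ID as the value.
def words_view_rows(doc)
  text = "#{doc['title']} #{doc['content']}"
  words = text.gsub(/[0-9!.,;]+/, ' ').downcase.split(' ')
  words.collect { |word| { 'key' => word, 'value' => doc['_id'] } }
end

# Rough Ruby equivalent of the "users" map function: one row per note
# that has a user ID, with a null (nil) value.
def users_view_rows(doc)
  doc['user_id'] ? [{ 'key' => doc['user_id'], 'value' => nil }] : []
end

# Hypothetical sample note
note = { '_id' => 'abc123', 'user_id' => '1',
         'title' => 'Java tips', 'content' => 'Use Java 6, not 5.' }
word_rows = words_view_rows(note)
user_rows = users_view_rows(note)
```

Note that a word appearing twice in a note produces two rows with the same key, which is what lets a search count occurrences.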
To help you understand the views created by these JavaScript functions, take a look at some examples of REST calls to access the two views I just created (note that %22 is the URL encoding of the " (double quotation mark) character):
- To get all words: http://localhost:5984/notes/_view/notes/words
- To search for documents containing a specific word: http://localhost:5984/notes/_view/notes/words/?key=%22java%22
- To list the first 11 docs (including views): http://localhost:5984/notes/_all_docs?limit=11
- To get note docs by user ID = “1”: http://localhost:5984/notes/_view/notes/users/?key=%221%22
The second and fourth examples are the most interesting, because they filter on specific key values. Also, notice in the third example that although the query would otherwise return every document in the database (including the design document that holds the views), I set a limit of 11 documents.
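View keys are JSON values, which is why a string key has to be wrapped in double quotes before it is URL-encoded. A minimal sketch of building such URIs in Ruby follows; the helper name view_uri is hypothetical, and the path layout simply copies the URLs listed above.

```ruby
require 'json'
require 'cgi'

# Hypothetical helper: build a view URI in the same style as the examples above.
# The key, if given, is serialized as JSON and then URL-encoded.
def view_uri(host, port, database, design, view, key = nil)
  uri = "http://#{host}:#{port}/#{database}/_view/#{design}/#{view}"
  return uri if key.nil?
  "#{uri}/?key=#{CGI.escape(key.to_json)}"
end

all_words  = view_uri('localhost', 5984, 'notes', 'notes', 'words')
java_notes = view_uri('localhost', 5984, 'notes', 'notes', 'words', 'java')
```

Here java_notes comes out as http://localhost:5984/notes/_view/notes/words/?key=%22java%22, matching the second example.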
Author’s Note: Using CouchDB seems natural to me because it is built with tools and concepts that I know, such as REST-style calls and JSON storage. I have been using CouchDB for almost a year, and I find that document-oriented data stores like CouchDB are a more natural fit for most of my work than simpler key/value stores like memcached, Tokyo Cabinet, and Redis (which does offer some structure like lists and sets). That said, I try to choose the best tools for each specific job and you obviously should too.
In all these examples, the returned data is in JSON format. CouchDB provides a web interface called Futon (see Figure 1 for a screenshot of me inspecting the document that defined the map/reduce functions for the two views I need in this example).
Figure 1. Using Futon to Inspect Two JavaScript Views: Here is a screenshot of me inspecting the document that defined the map/reduce functions for the two views.
At the bottom of the screenshot, I have nine versions of the implementations of these views. Futon makes it easy to go back and review changes in old versions. The screenshot in Figure 2 shows an edit view in Futon that allows you to modify a document and save it as a new version:
Figure 2. Using Futon to Edit One JavaScript View: Here is an edit view in Futon that allows you to modify a document and save it as a new version.
The screenshot in Figure 3 shows me using Futon to view a note. Notice that there are no data items for “words.” Those are defined in an index and show themselves only when the user performs a search.
Figure 3. Inspecting a Note Document: Here is a screenshot of me using Futon to view a note.
I seldom use Futon for editing or creating documents, although I did use Futon to define my views. I write almost all of my CouchDB client code in Ruby.
Now you can look at the changes you need to make to the MongoDB-based web application (from the previous article in this series) to use CouchDB instead.
Require three gems in your environment.rb file:
config.gem 'postgres'
config.gem 'couchrest'
config.gem 'simplehttp'
Also set two global constants at the end of your environment.rb file:
# setup for CouchDB
COUCHDB_HOST = ENV['COUCHDB_RUBY_DRIVER_HOST'] || 'localhost'
COUCHDB_PORT = ENV['COUCHDB_RUBY_DRIVER_PORT'] || 5984
Most of the code changes are in the Note model class. First, notice that this Note class is not derived from ActiveRecord:
class Note
  attr_accessor :user_id, :title, :content
  def to_s
    "note: #{title} content: #{content[0..20]}..."
  end
Using mostly low-level, REST-style calls to CouchDB, I will manually implement the behavior in the ActiveRecord version from the PostgreSQL-backed example (Part I) and the MongoRecord::Base version from the MongoDB-backed example (Part II).
The next method is used to create a new note document. This code is simpler than in the MongoDB article (where I had to create a document attribute that was a list of words in the document), but you pay for some of this simplicity by having to write the JavaScript view functions. Here, I use the higher-level save_doc API from the couchrest gem:
  def Note.make user_id, title, content
    @db ||= CouchRest.database("http://#{COUCHDB_HOST}:#{COUCHDB_PORT}/notes")
    @db.save_doc({'user_id' => user_id.to_s, 'title' => title, 'content' => content})['id']
  end
The next method implements a search function. I tokenize the search string and for each token make a REST-style call to get all of the document IDs that contain the word. These results are stored in the hash table score_hash (keys are the document IDs, and the values are counts of how many times a search token is found in the corresponding document). I sort the hash table by descending count and return the documents, as JSON-style hash tables, in that order:
  def Note.search query
    @db ||= CouchRest.database("http://#{COUCHDB_HOST}:#{COUCHDB_PORT}/notes")
    tokens = query.downcase.split
    score_hash = Hash.new(0)
    tokens.each {|token|
      uri = "http://#{COUCHDB_HOST}:#{COUCHDB_PORT}/notes/_view/notes/words/?key=%22#{token}%22"
      JSON.parse(SimpleHttp.get(uri))['rows'].each {|row| score_hash[row['value']] += 1}
    }
    # sort [document ID, count] pairs by descending count, then fetch each document
    ranked = score_hash.sort {|a,b| b[1] <=> a[1]}
    ranked.collect {|pair| @db.get(pair[0])}
  end
Note: This implementation of method search would be very inefficient for search strings with many words, because a REST call would be made for each search word. Compare this to the MongoDB version of method search, where a single call is made and the entire query is performed on the server (in fast C++ code).
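The scoring step in Note.search can be tried on its own, without CouchDB, by passing in the row lookup as a callable. This is a sketch under assumptions: rank_document_ids and fetch_rows are hypothetical names, the canned rows stand in for the per-token REST calls, and repeated query tokens are dropped with uniq so that each distinct word costs only one call (something the method above does not do).

```ruby
# Sketch: rank document IDs by how many view rows match the search tokens.
# fetch_rows is a hypothetical callable: token => array of view rows.
def rank_document_ids(query, fetch_rows)
  scores = Hash.new(0)
  query.downcase.split.uniq.each do |token|
    fetch_rows.call(token).each { |row| scores[row['value']] += 1 }
  end
  # highest score first; ties broken by document ID for a stable order
  scores.sort_by { |id, score| [-score, id] }.collect { |id, _score| id }
end

# Canned rows standing in for the REST calls
canned = {
  'java' => [{ 'value' => 'doc1' }, { 'value' => 'doc2' }, { 'value' => 'doc2' }],
  'ruby' => [{ 'value' => 'doc2' }, { 'value' => 'doc3' }]
}
fetch = lambda { |token| canned.fetch(token, []) }
ranked = rank_document_ids('Java Ruby java', fetch)
```

With these canned rows, doc2 matches three times and is ranked first, followed by doc1 and doc3 with one match each.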
The next method returns all notes in the data store with a given user ID. I build a GET request URI and then use the simplehttp and json gems to get the documents as an array of JSON-style hash tables:
  def Note.all user_id
    @db ||= CouchRest.database("http://#{COUCHDB_HOST}:#{COUCHDB_PORT}/notes")
    uri = "http://#{COUCHDB_HOST}:#{COUCHDB_PORT}/notes/_view/notes/users/?key=%22#{user_id}%22"
    JSON.parse(SimpleHttp.get(uri))['rows'].collect {|hash| @db.get(hash['id'])}
  end
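It helps to know the shape of the JSON that a view query returns: an object with total_rows, offset, and a rows array whose entries carry the id, key, and value of each emitted pair. The sketch below parses a hand-written sample response; the document IDs in it are made up, and document_ids_from_view is a hypothetical helper name.

```ruby
require 'json'

# Hand-written sample of a "users" view response (IDs are made up).
sample_response = <<-JSON_TEXT
{"total_rows": 2, "offset": 0, "rows": [
  {"id": "a1", "key": "1", "value": null},
  {"id": "b2", "key": "1", "value": null}
]}
JSON_TEXT

# Extract the ID of the document behind each view row.
def document_ids_from_view(json_text)
  JSON.parse(json_text)['rows'].collect { |row| row['id'] }
end

ids = document_ids_from_view(sample_response)
```

Each of these IDs is what Note.all passes to @db.get to load the full note document.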
The following method returns a note with a specific ID. In contrast to the last method, I use the higher-level get API from the couchrest gem instead of building a request URI and manually performing the REST call:
  def Note.find id
    puts "** Note.find id=#{id}"
    @db ||= CouchRest.database("http://#{COUCHDB_HOST}:#{COUCHDB_PORT}/notes")
    @db.get(id)
  end
end
The controller code is almost identical to that of the first two Rails examples in this series. Calling the search method you just saw performs the search:
notes = Note.search(params[:search])
All notes with a specific user ID are found and passed to the scaffold view:
@notes = Note.all(session['user_id'])
Two Choices: Run CouchDB on Your Own Server or Use a Hosted CouchDB Service
For development, I run an “unofficial” all-in-one CouchDB application on my MacBook. I also run CouchDB on two of my servers for both testing and deployed applications. See the Resources section for links for installing CouchDB and implementing simple HTTP authentication.
You may simply want to use a commercial CouchDB service like Cloudant. Using either MongoHQ (for MongoDB services) or Cloudant is a great fit for web applications hosted on Heroku: leave the management of your Rails application to Heroku and the management of your data to Cloudant or MongoHQ. I expect to see more commercial MongoDB and CouchDB service providers in the future so you can shop around for the best price.
Just as I did for the MongoDB example, I like to set access information for a local or remote CouchDB server using environment variables; for example:
export COUCHDB_RUBY_DRIVER_HOST=xxxxxxx.com
export COUCHDB_RUBY_DRIVER_PORT=5984
I change these values for whichever CouchDB server I am using. As with the MongoDB example, it is easy to pass CouchDB connection parameters to a deployed Heroku application:
$ heroku config:add COUCHDB_RUBY_DRIVER_HOST=xxxxxxx.com \
    COUCHDB_RUBY_DRIVER_PORT=5984 COUCHDB_PASS=password COUCHDB_USER=notesclient
Adding config vars:
COUCHDB_RUBY_DRIVER_HOST => xxxxxxx.com
COUCHDB_RUBY_DRIVER_PORT => 5984
COUCHDB_PASS => password
COUCHDB_USER => notesclient
Restarting app…done.
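The code shown in this article reads only the host and port settings; how COUCHDB_USER and COUCHDB_PASS are consumed is not shown. One plausible approach, offered as a sketch under assumptions, is to fold them into the base URL as HTTP basic-auth credentials. The helper name couchdb_base_url is hypothetical, and whether a given HTTP client honors credentials embedded in the URL is something to verify for the gems you use.

```ruby
require 'cgi'

# Hypothetical helper: build the CouchDB base URL from environment-style settings.
# Accepts ENV or any hash-like object that supports [].
def couchdb_base_url(env)
  host = env['COUCHDB_RUBY_DRIVER_HOST'] || 'localhost'
  port = env['COUCHDB_RUBY_DRIVER_PORT'] || 5984
  user = env['COUCHDB_USER']
  pass = env['COUCHDB_PASS']
  # Credentials are added only when both are set; they are escaped for URL safety.
  credentials = user && pass ? "#{CGI.escape(user)}:#{CGI.escape(pass)}@" : ''
  "http://#{credentials}#{host}:#{port}"
end

local_url  = couchdb_base_url({})
remote_url = couchdb_base_url({
  'COUCHDB_RUBY_DRIVER_HOST' => 'xxxxxxx.com',
  'COUCHDB_RUBY_DRIVER_PORT' => '5984',
  'COUCHDB_USER' => 'notesclient',
  'COUCHDB_PASS' => 'password'
})
```

In an application you would call couchdb_base_url(ENV) once and append the database name, for example "#{couchdb_base_url(ENV)}/notes".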
Wrapup
I enjoy doing “bare metal” deployments to leased servers or VPS solutions like Amazon EC2, RimuHosting, Slicehost, etc. That said, sometimes it simply does not make economic sense to create custom deployments and administer your own servers. Overall, you have three options:
- Use your own servers
- Use cloud deployment servers
- Use a hybrid of cloud services and your own servers
I hope this article provided you with the information necessary to make a wise deployment choice. You also have a few deployment and CouchDB tricks in your toolbox now.
Code Download
For Further Reading
- Heroku web site
- CouchDB
- “Deploying with Git” (from Heroku web site)
About the Author
Mark Watson is a consultant living in the mountains of Central Arizona with his wife Carol and a very feisty Meyer’s parrot. He specializes in web applications, text mining, and artificial intelligence development. He is the author of 16 books and writes both a technology blog and an artificial intelligence blog.