Indexes for Big Data – Map/Reduce Explained

Couchbase is a powerful key/value NoSQL database that allows nearly unlimited scaling and provides excellent support for the .NET platform. The previous article in this series, “Introducing Couchbase”, covered how to install Couchbase, create buckets, and persist objects as JSON in Couchbase. It also covered how to retrieve objects by their Document ID, which is the fastest and primary means of looking up objects. This article looks at how to use Map/Reduce to create Couchbase views so you can look up records by data other than the Document ID.

Views In Couchbase

A Couchbase view is generated by a few JavaScript functions that take the inconsistently structured data in the bucket and organize it into a searchable index used to locate the Document IDs of records. A view can also contain derived data such as summaries or calculations. Views also allow you to sort the data that exists inside your bucket.

To add a view, click the “Views” button next to your data bucket and, inside the development area, choose “Create Development View”. Couchbase will prompt you to name your view and then present you with a development console where you can create your map and reduce JavaScript functions. Each time you change your code, you can click “Save” and then “Show Results” to see a sample of the output of your view.

Once you have your view outputting the data you want, you will need to publish it by going back to the Views screen and clicking “Publish”. When you publish a new view on a large dataset it will take time for Couchbase to run the map and reduce functions across all of the records in your bucket, so your view might not be fully built out right away.

Writing Map Functions

Because Couchbase stores your objects as JSON, each document can have a different structure. Some might have properties that others do not have. Some might have complex objects as children, such as order records. In a larger application you might have Customer documents living in the same bucket as Book and Author documents.

With data this inconsistent, the view needs to be told how to reach into each document and pull out the properties you are interested in searching on. This is fundamentally all a map function does.

The map function takes two parameters. The doc parameter contains the JSON document, and the meta parameter contains metadata about the document, such as its document ID and when it is set to expire.

Your map function should call “emit” for each separate element of your document you want to expose to your view. Each call to emit creates another row in the view. A single document can produce multiple rows in your view, or none at all if you don’t want a particular document to show up.

For this example, the Couchbase document being processed is below.

{
   "BirthDay": "/Date(-62135578800000-0500)/",
   "CustomerName": "David Talbot",
   "OrderHistory": [
      {
         "OrderNumber": "d339b97a-bf7f-41b5-a972-8d1dea9f9773",
         "Total": 500
      },
      {
         "OrderNumber": "57764ce2-31b1-4f87-902d-4a7106ce1f5c",
         "Total": 221.25
      }
   ]
}

To create a view that allows a user to locate a customer by order number, you will need to create a view that emits a record for each OrderNumber. This is done by iterating through each OrderHistory record in the document and emitting the OrderNumber as the key. This will create a view you can use to look up Customer records by OrderNumber.

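function (doc, meta) {
   // Skip documents that have no order history, such as Book or Author documents.
   if (!doc.OrderHistory) {
      return;
   }
   for (var i = 0; i < doc.OrderHistory.length; i++) {
      // Key: the order number to search on. Value: the name to display.
      emit(doc.OrderHistory[i].OrderNumber, doc.CustomerName);
   }
}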
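One way to see how emit calls turn into view rows is to run a map function locally against a stand-in for emit. The sketch below is not how Couchbase itself is implemented; the emit stub, the row shape (id, key, value), the document IDs, and the Book document are all assumptions made for illustration. It emits a null value so that each row carries only the key and the ID of the document that produced it.

```javascript
// Local stand-in for the view engine. In Couchbase, emit() is provided for you;
// here it simply records each call as a row. All names are illustrative.
var rows = [];
var currentId = null;

function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// Emits one row per order and skips documents without an OrderHistory array.
function mapOrders(doc, meta) {
  if (!Array.isArray(doc.OrderHistory)) {
    return; // no emit call, so this document adds no rows to the view
  }
  for (var i = 0; i < doc.OrderHistory.length; i++) {
    emit(doc.OrderHistory[i].OrderNumber, null);
  }
}

// Hypothetical bucket contents: one customer with two orders, and one book.
var bucket = [
  {
    meta: { id: "customer::1" },
    doc: {
      CustomerName: "David Talbot",
      OrderHistory: [
        { OrderNumber: "d339b97a-bf7f-41b5-a972-8d1dea9f9773", Total: 500 },
        { OrderNumber: "57764ce2-31b1-4f87-902d-4a7106ce1f5c", Total: 221.25 }
      ]
    }
  },
  { meta: { id: "book::1" }, doc: { Title: "Applied ADO.NET" } }
];

bucket.forEach(function (entry) {
  currentId = entry.meta.id;
  mapOrders(entry.doc, entry.meta);
});

console.log(rows);
```

The customer document produces two rows and the book document produces none, which is the behavior described above for documents you do not want in a view.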

Most applications’ search functions need to show results beyond just the single value being searched. In this example, if the user is searching for Customers by OrderNumber, your UI will need to show the CustomerName of every matching record. It would be very inefficient to get 100 Document IDs back from your view and then re-query Couchbase for each Customer record just so you can display the CustomerName. To avoid that second round of lookups, emit takes a second parameter: the “value” of the view row.

This value can contain simple text you want to return to your user interface or it can contain key numbers or other data you want to pass to your reduce function for further processing.
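As a rough illustration of that idea, the sketch below emits an object as the value so that a search screen can be drawn from the view rows alone. The emit stub, the property names inside the value (name, total), and the helper toDisplayLines are hypothetical and exist only to make the example runnable outside Couchbase.

```javascript
// Stand-in for Couchbase's emit(); it collects rows so the result can be inspected.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Key: the order number to search on.
// Value: everything the results screen needs, so no second lookup is required.
function mapOrderDetails(doc, meta) {
  if (!Array.isArray(doc.OrderHistory)) {
    return;
  }
  doc.OrderHistory.forEach(function (order) {
    emit(order.OrderNumber, { name: doc.CustomerName, total: order.Total });
  });
}

mapOrderDetails({
  CustomerName: "David Talbot",
  OrderHistory: [
    { OrderNumber: "d339b97a-bf7f-41b5-a972-8d1dea9f9773", Total: 500 },
    { OrderNumber: "57764ce2-31b1-4f87-902d-4a7106ce1f5c", Total: 221.25 }
  ]
}, { id: "customer::1" });

// Build display text straight from the rows.
function toDisplayLines(viewRows) {
  return viewRows.map(function (row) {
    return row.value.name + ": order " + row.key + ", total " + row.value.total;
  });
}

var lines = toDisplayLines(rows);
console.log(lines.join("\n"));
```

The trade-off is index size: whatever you place in the value is stored with every row, so it is usually worth emitting only the fields the results screen actually displays.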

Reduce

The reduce function performs calculations on, or summarizes, the data in your view. The simplest reduce functions are built in: “_sum”, “_count” and “_stats”. For anything more complex, you will need to write a custom reduce function to process the values emitted by your map function. You can summarize your data in any way you can express in JavaScript.
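To get a feel for what the built-in sum, count and stats reducers compute, here is a rough local equivalent applied to an array of emitted values. The helper names are made up for this sketch, and the shape of the stats result (sum, count, min, max, sumsqr) is given as an approximation of the built-in output rather than a specification.

```javascript
// Approximate local equivalents of the built-in reducers.
// Couchbase runs its own versions; these only mirror the arithmetic.
function sumValues(values) {
  return values.reduce(function (acc, v) { return acc + v; }, 0);
}

function countValues(values) {
  return values.length;
}

function statsValues(values) {
  return {
    sum: sumValues(values),
    count: values.length,
    min: Math.min.apply(null, values),
    max: Math.max.apply(null, values),
    sumsqr: values.reduce(function (acc, v) { return acc + v * v; }, 0)
  };
}

// The two order totals from the sample customer document.
var totals = [500, 221.25];

console.log(sumValues(totals));   // 721.25
console.log(countValues(totals)); // 2
console.log(statsValues(totals));
```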

The reduce function receives a key, an array of values, and a rereduce Boolean. The key and values match the types emitted by your map function. If your key was a single string, that is what will be passed to your reduce function. If you emitted an array as the key, that is what will be passed to your reduce function. The rereduce Boolean indicates whether this is the first pass over values emitted by the map function or a re-reduction, in which case the values are the results of earlier reduce calls.

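function (key, values, rereduce) {
   // Assumes the map function emitted each order's Total as the value.
   var totalWithTax = 0.0;
   for (var i = 0; i < values.length; i++) {
      if (rereduce) {
         // Values are partial totals from earlier passes and already include tax.
         totalWithTax = totalWithTax + values[i];
      } else {
         totalWithTax = totalWithTax + values[i] + (values[i] * 0.0825);
      }
   }
   return totalWithTax;
}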
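Because the output of one reduce pass can be fed back in as the values of a later pass, a reduce that transforms its inputs (adding tax, for instance) should apply that transformation only when rereduce is false. The sketch below simulates this locally; the function name, the way the values are split into batches, and the key passed on the rereduce call are assumptions for illustration, since the view engine decides how and when to batch.

```javascript
var TAX_RATE = 0.0825;

// Adds tax on the first pass only; on a rereduce pass the values are
// partial totals that already include tax, so they are summed as-is.
function totalWithTax(key, values, rereduce) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += rereduce ? values[i] : values[i] * (1 + TAX_RATE);
  }
  return total;
}

// Simulate the engine reducing two batches separately...
var batchA = totalWithTax("David Talbot", [500], false);
var batchB = totalWithTax("David Talbot", [221.25], false);

// ...and then combining the partial results in a rereduce pass.
var combined = totalWithTax(null, [batchA, batchB], true);

// Reducing everything in a single pass should give the same answer.
var singlePass = totalWithTax("David Talbot", [500, 221.25], false);

console.log(combined.toFixed(2), singlePass.toFixed(2));
```

If the tax were applied on the rereduce pass as well, the combined figure would be taxed twice and would no longer match the single-pass result.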

Once you add a reduce function in Couchbase, the preview at the bottom will consistently return null as the key. This is by design: without grouping, the reduce collapses all rows into a single result. To see your results grouped by key, add group=true to a REST query or set the equivalent parameter on queries executed from .NET.
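For a REST query, the grouping option is passed as a query-string parameter on the view URL. The helper below is a sketch that only builds such a URL. The host, bucket, design document, and view names are placeholders, and the port (8092) and path layout reflect the usual Couchbase view endpoint, so check them against your own installation.

```javascript
// Builds a view query URL. buildViewUrl is a hypothetical helper, not part of
// any Couchbase SDK; the options object becomes the query string.
function buildViewUrl(host, bucket, designDoc, view, options) {
  var query = new URLSearchParams(options).toString();
  return "http://" + host + ":8092/" + encodeURIComponent(bucket) +
    "/_design/" + encodeURIComponent(designDoc) +
    "/_view/" + encodeURIComponent(view) +
    (query ? "?" + query : "");
}

// Placeholder names: a "customers" bucket with a published design document
// "orders" that contains a view called "total_with_tax".
var url = buildViewUrl("localhost", "customers", "orders", "total_with_tax", {
  group: true
});

console.log(url);
// http://localhost:8092/customers/_design/orders/_view/total_with_tax?group=true
```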

Conclusion

Map/Reduce is a simple and powerful way to make your NoSQL database searchable and to add powerful analytics to your solution.

About the Author:

David Talbot has over 14 years of experience in the software industry and specializes in building rich UI web applications. He is also the author of Applied ADO.NET and numerous articles on technology.
