December 2, 2016

Comparison Study of MongoDB, DocumentDB, & HDInsight

  • April 8, 2015
  • By Uma Narayanan

Introduction

With the increase in the number of mobile devices, the volume of unstructured data has also increased, and NoSQL seems to be a viable solution. For this kind of data, NoSQL databases can perform better than relational databases. They also support rapid development and can run even on low-end systems.

With so many NoSQL options available, in this article we compare three of them: MongoDB, DocumentDB, and HDInsight. We go through the comparison feature by feature and see how each product offers that feature.

Features

MongoDB and DocumentDB are for transactional processing, and HDInsight is basically for analytics. HDInsight is a Hadoop system hosted on the cloud. Let's take a look at the features in detail.

Data Model

Both MongoDB and DocumentDB use a flexible data structure. MongoDB uses a data format named 'BSON', which is a binary encoding of JSON-like data, and DocumentDB uses the JSON format to save data. In both DBs, the data is stored as documents. Both have a reserved identifier field that represents a unique record. MongoDB's reserved field is named '_id' and, if it is not supplied, is filled with an automatically generated ObjectId. DocumentDB's reserved field is named 'id' and, if it is not supplied, is filled with an automatically generated GUID.
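As a rough illustration of the document model described above, the following sketch builds one record as a plain Python dictionary and stamps it with each database's reserved identifier field. The field names '_id' and 'id' come from the text; the sample record, the helper name `with_reserved_id`, and the use of `uuid4` for both identifiers are assumptions made for the example (MongoDB, left to itself, generates its own ObjectId value).

```python
import json
import uuid


def with_reserved_id(record, id_field):
    """Return a copy of record with a GUID stored under the reserved id field.

    Hypothetical helper: it only mimics what a database would do on insert.
    """
    doc = dict(record)
    doc.setdefault(id_field, str(uuid.uuid4()))
    return doc


# Hypothetical sample record with a nested structure.
customer = {"name": "Asha", "devices": [{"type": "phone", "os": "Android"}]}

mongo_style = with_reserved_id(customer, "_id")      # MongoDB reserves '_id'
documentdb_style = with_reserved_id(customer, "id")  # DocumentDB reserves 'id'

# Both shapes serialize to JSON text; MongoDB would store a BSON (binary) form.
print(json.dumps(mongo_style, indent=2))
print(json.dumps(documentdb_style, indent=2))
```

The record itself has no fixed schema; only the identifier field name differs between the two shapes.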

Hosting Features

MongoDB is available in both on-premises and cloud-hosted options. DocumentDB and HDInsight are hosted on the Azure Cloud.

Scalability

MongoDB supports scalability by adding nodes, which can be done by using scripts. Once they are added, one of the servers is treated as the primary server, which supports read-write operations, and the secondary nodes can be used for read operations. An odd number of servers is generally configured in the farm; for example, one acts as the primary server, another acts as a secondary server, and the third acts as an arbiter. The arbiter holds no data; its vote is used to promote a secondary server to primary when the primary server goes down.
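The reason for configuring an odd number of servers can be sketched with a little arithmetic: a new primary can only be promoted when a strict majority of the voting servers can still reach each other. The function names below are hypothetical and the code is a simplified model of majority voting, not MongoDB's actual election protocol.

```python
def can_elect_primary(voting_members, reachable_members):
    """A new primary needs votes from a strict majority of voting members."""
    return reachable_members > voting_members // 2


def failures_tolerated(voting_members):
    """How many members can be lost while a majority still remains."""
    majority = voting_members // 2 + 1
    return voting_members - majority


# Primary, secondary, and arbiter: the primary fails, two voters remain.
print(can_elect_primary(voting_members=3, reachable_members=2))  # True

# Only two voters: losing either one leaves no majority.
print(can_elect_primary(voting_members=2, reachable_members=1))  # False

# Going from 3 to 4 members does not add any fault tolerance.
for size in (2, 3, 4, 5):
    print(size, "members tolerate", failures_tolerated(size), "failure(s)")
```

Under this model, an arbiter is a cheap way to turn a two-server setup into a three-voter one without storing a third copy of the data.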

DocumentDB is hosted on the Azure Cloud and all the servers support read-write operations. The cluster is managed by using the Azure hosting methods.

Both enable vertical scaling and horizontal scaling. Horizontal scaling is done by sharding; in other words, by partitioning the data across several servers. MongoDB supports sharding by using 'sharded clusters'.
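The idea behind sharding can be sketched in a few lines: a key taken from each document decides which server stores it. The sketch below assumes a hash-based strategy with a fixed number of shards; the function name `shard_for` and the sample keys are hypothetical, and neither database necessarily routes data exactly this way.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a shard key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


# Distribute ten hypothetical user ids across three shards.
shards = {index: [] for index in range(3)}
for user_id in range(1, 11):
    shards[shard_for(user_id, 3)].append(user_id)

print(shards)
```

Because the hash is stable, the same key always lands on the same shard, so a lookup by key only needs to contact one server.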

Query Types Supported

MongoDB provides built-in methods and operators for operations such as filtering and aggregation. Its 'find' method takes a query criterion and, optionally, the list of fields to return. Filters can use operators such as '$in', '$gt', and '$lt', and can search inside nested structures. The equivalent of a group-by with a having clause is built from aggregation operators such as '$group' and '$match', and '$sum' and '$avg' can be used within '$group'. MongoDB also supports operators such as '$geoNear' to take advantage of geospatial indexes.
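To make the filter operators concrete, the sketch below evaluates a MongoDB-style query document against in-memory Python dictionaries. The operators '$in', '$gt', and '$lt' and the dotted path for nested fields follow MongoDB's query syntax; the `matches` function and the sample orders are hypothetical, and this small emulation is not how the server itself evaluates queries.

```python
def matches(doc, query):
    """Return True if doc satisfies a small subset of MongoDB-style filters."""
    for field, condition in query.items():
        # A dotted path such as "ship.city" walks into nested documents.
        value = doc
        for part in field.split("."):
            value = value.get(part) if isinstance(value, dict) else None

        if not isinstance(condition, dict):  # plain equality match
            if value != condition:
                return False
            continue

        for op, operand in condition.items():
            if op == "$in":
                ok = value in operand
            elif op == "$gt":
                ok = value is not None and value > operand
            elif op == "$lt":
                ok = value is not None and value < operand
            else:
                raise ValueError(f"unsupported operator: {op}")
            if not ok:
                return False
    return True


orders = [
    {"item": "pen", "qty": 5, "ship": {"city": "Pune"}},
    {"item": "ink", "qty": 50, "ship": {"city": "Delhi"}},
    {"item": "pad", "qty": 20, "ship": {"city": "Pune"}},
]
query = {"qty": {"$gt": 10, "$lt": 40}, "ship.city": {"$in": ["Pune", "Mumbai"]}}

print([order["item"] for order in orders if matches(order, query)])  # ['pad']
```

The same query dictionary is the shape that would be passed as the criterion to a 'find' call in a MongoDB driver.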

DocumentDB supports the creation of stored procedures, triggers, and user-defined functions (UDFs) using JavaScript. It uses SQL-like statements to retrieve the data. It also supports joins within a document that has a nested structure, so that filters can be applied to the nested data. The major disadvantage is that, at the time of writing, it doesn't provide any group-by option or aggregate functions such as sum and average. Users have to write custom logic to achieve this.
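The kind of custom logic mentioned above can be sketched as a client-side group-by that computes a sum and an average over documents that have already been retrieved. The function name `group_sum_avg` and the sample data are hypothetical; the same result is what a '$group' stage with '$sum' and '$avg' would produce on the MongoDB side.

```python
from collections import defaultdict


def group_sum_avg(docs, key_field, value_field):
    """Group documents by key_field and compute the sum and average of value_field."""
    totals = defaultdict(lambda: [0, 0])  # key -> [running sum, count]
    for doc in docs:
        bucket = totals[doc[key_field]]
        bucket[0] += doc[value_field]
        bucket[1] += 1
    return {
        key: {"sum": total, "avg": total / count}
        for key, (total, count) in totals.items()
    }


sales = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 30},
    {"region": "east", "amount": 20},
]

print(group_sum_avg(sales, "region", "amount"))
# {'east': {'sum': 30, 'avg': 15.0}, 'west': {'sum': 30, 'avg': 30.0}}
```

Doing this on the client means every matching document has to be fetched first, which is the cost of not having the aggregation run inside the database.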

Consistency & Availability

MongoDB provides ACID properties at the document level: a document is updated safely, and if an error occurs the operation is rolled back. MongoDB also allows developers to specify a write concern; in other words, when there are multiple servers in a replica set, the write concern specifies how many of them must acknowledge a write before the update is confirmed to the user.
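The effect of a write concern can be sketched as a simple acknowledgement count. In this simplified model the count includes the primary, which is how MongoDB's numeric write concern counts acknowledgements; the function name `is_write_confirmed` and the list-of-booleans representation of a replica set are assumptions made for the example.

```python
def is_write_confirmed(member_is_up, w):
    """Report whether a write gathers at least w acknowledgements.

    member_is_up: list of booleans, where index 0 is the primary.
    A real deployment would wait (or time out) rather than answer
    immediately; this sketch only counts the reachable members.
    """
    if not member_is_up[0]:
        raise RuntimeError("primary is down; writes are not accepted")
    acknowledgements = sum(1 for up in member_is_up if up)
    return acknowledgements >= w


# Primary up, one secondary up, one secondary down.
replica_set = [True, True, False]

print(is_write_confirmed(replica_set, w=1))  # True: the primary alone is enough
print(is_write_confirmed(replica_set, w=2))  # True: primary plus one secondary
print(is_write_confirmed(replica_set, w=3))  # False: the third member is down
```

A higher value trades write latency for durability, because the client waits for more copies of the data to exist before it is told the update succeeded.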

DocumentDB also provides ACID properties at the document level. It defines several consistency levels (Strong, Bounded Staleness, Session, and Eventual) that determine how soon a read reflects a write operation that has just been performed.

MongoDB is designed to make a secondary server primary if a primary server goes down. This is automatically done and requires no manual intervention. DocumentDB uses an Azure feature to manage the availability of servers.

Management & Operations

Azure provides a web interface to manage and monitor a DocumentDB account. It provides usage charts for monitoring and also allows the displayed metrics to be modified based on needs.

In MongoDB, Ops Manager enables monitoring. It provides charts, dashboards, and customized alerts to monitor usage, and it also supports customized metrics.

So far, we have compared DocumentDB and MongoDB. These two support transactional data, whereas HDInsight is geared toward read-oriented analytical workloads. HDInsight is hosted on Azure, so it also uses Azure features for scalability, availability, management, and operations. HDInsight and MongoDB both support map reduce queries. On large volumes of data, HDInsight performs better than MongoDB.
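For readers new to the map reduce style of query mentioned above, the following single-process sketch shows its two phases: a map step that emits key-value pairs from each document, and a reduce step that combines all values emitted for the same key. The function names and the sample articles are hypothetical; Hadoop and MongoDB run these phases distributed across servers rather than in one loop.

```python
from collections import defaultdict


def map_phase(doc):
    """Emit one (tag, 1) pair for every tag on the document."""
    for tag in doc["tags"]:
        yield tag, 1


def reduce_phase(key, values):
    """Combine all the values emitted for one key into a single count."""
    return sum(values)


def map_reduce(docs):
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_phase(doc):
            grouped[key].append(value)  # shuffle step: collect values by key
    return {key: reduce_phase(key, values) for key, values in grouped.items()}


articles = [
    {"tags": ["nosql", "azure"]},
    {"tags": ["nosql", "hadoop"]},
    {"tags": ["azure"]},
]

print(map_reduce(articles))
# {'nosql': 2, 'azure': 2, 'hadoop': 1}
```

Because each map call looks at one document and each reduce call looks at one key, both phases can be spread over many machines, which is what makes the approach suit large data volumes.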

Summary

In this article, we compared MongoDB, DocumentDB, and HDInsight. MongoDB and DocumentDB support transactional data, whereas HDInsight is purely for analysis; it doesn't support transactions. HDInsight and MongoDB both support map reduce queries. HDInsight scores higher than MongoDB both in executing map reduce queries and in the volume of data it can handle.

MongoDB scores better than DocumentDB because it supports aggregate functions, whereas DocumentDB doesn't support them yet. Integrating MongoDB with Hadoop will open up more options, in which the benefits of both the transactional system and the analytics system can be leveraged.




