NoSQL Database Comparison: MongoDB, Apache Cassandra, and Couchbase

  • February 5, 2016
  • By Deepak Vohra

NoSQL databases are a relatively new genre of databases. NoSQL does not imply no SQL at all; the name stands for Not Only SQL, and many NoSQL databases support an SQL-like query language. NoSQL databases differ from relational databases (RDBMS) in that they are based on a flexible-schema (or schema-free) data model. A relational database such as Oracle Database or MySQL has a fixed table structure with a fixed number of columns and pre-specified column types, and data added to an RDBMS table must conform to the table definition. In contrast, a NoSQL data store can store data of variable structure: one row or document could have a different structure from another.

Different NoSQL data stores support different data models. The main data models are the document store, the key-value store, and the wide column store. Document stores are document-oriented database systems, most of which are based on the JSON document model. Key-value stores are based on a data model in which data is stored as key/value pairs. Wide column stores are somewhat similar to the table structure of a relational database in that data is stored in rows and columns, but the columns (column names and column types) are not fixed. We shall compare three commonly used NoSQL databases: MongoDB, Apache Cassandra, and Couchbase.
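As a rough illustration of the three data models, the following sketch expresses one catalog entry in plain JavaScript in three shapes. The field names are borrowed from the catalog examples used in this article; the "catalog:1:<field>" key scheme and the row key catalog1 are assumptions made for illustration, not conventions required by any of the three databases.

```javascript
// Document store: one self-describing, JSON-like document.
const asDocument = {
  journal: 'Oracle Magazine',
  publisher: 'Oracle Publishing',
  edition: 'January February 2010'
};

// Key-value store: values looked up by key.
// The "catalog:1:<field>" key scheme is a hypothetical convention.
const asKeyValue = new Map(
  Object.entries(asDocument).map(([field, value]) => [`catalog:1:${field}`, value])
);

// Wide column store: a row key that maps to a set of named columns.
// Another row in the same table could carry different columns.
const asWideColumn = new Map([
  ['catalog1', new Map(Object.entries(asDocument))]
]);

console.log(asKeyValue.get('catalog:1:journal'));
console.log(asWideColumn.get('catalog1').get('edition'));
```

The same three values are reachable in every shape; what changes is how much structure the store itself knows about.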

Data Collection

Just as a table is stored in a database in an RDBMS, NoSQL data stores provide a top-level namespace or container for storing data. In MongoDB, the equivalent of a relational database table is a "collection," which contains one or more documents. A MongoDB "database," which is the top-level container, consists of one or more collections. In Couchbase, the equivalent of an RDBMS database is called a "bucket." A bucket could have one or more documents. In Apache Cassandra, a "keyspace" defines a top-level namespace for tables.

Data Model

The equivalent of an RDBMS table is a MongoDB collection, and the equivalent of an RDBMS table row is a MongoDB document. MongoDB is based on the document store data model, in which a document is stored in BSON format, a binary-encoded form of JSON. A MongoDB document consists of fields and values, and each document could have the same fields as another document or different ones. For example, the following could be a MongoDB document stored in a collection.

{
   journal: 'Oracle Magazine',
   publisher: 'Oracle Publishing',
   edition: 'January February 2010'
}

A different document in the same collection could have different fields. For example, the following could be another document stored in the same collection as the first.

{
   journal: 'Oracle Magazine',
   edition: 'January February 2010',
   section: 'Oracle JDeveloper',
   title: 'Installing JDeveloper'
}

Although documents in the same collection could have completely dissimilar fields, similar documents are usually grouped together. A MongoDB document field value could be any of the BSON data types, such as Double, String, Object, Array, and Binary data.
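To make the "different fields in the same collection" point concrete, here is a minimal sketch in plain JavaScript. A plain array stands in for the collection (an assumption for illustration; no MongoDB server or driver is involved), and the two documents are the ones shown above.

```javascript
// Two documents that could live in the same collection;
// the array is only a stand-in for the collection.
const catalog = [
  { journal: 'Oracle Magazine', publisher: 'Oracle Publishing', edition: 'January February 2010' },
  { journal: 'Oracle Magazine', edition: 'January February 2010',
    section: 'Oracle JDeveloper', title: 'Installing JDeveloper' }
];

// Every field name that appears in at least one document.
const allFields = [...new Set(catalog.flatMap(doc => Object.keys(doc)))];

// Field names that appear in every document.
const sharedFields = allFields.filter(field => catalog.every(doc => field in doc));

console.log(allFields);
console.log(sharedFields);
```

Only journal and edition are common to both documents, which is the kind of overlap that makes it natural to group them in one collection.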

The Couchbase data model is based on the JSON document store. Couchbase data is stored as JSON documents in data buckets. As with MongoDB, Couchbase does not have a fixed schema: one JSON document could have different fields from another. Couchbase document data types are the JSON data types, such as strings, booleans, and arrays. What makes the JSON and BSON data models flexible is their support for nested hierarchical structures, including nested arrays and objects.
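The nesting mentioned above can be sketched with a hypothetical catalog document that holds an object and an array of objects. The field names are illustrative assumptions, and the round trip uses only the standard JSON functions rather than any Couchbase API.

```javascript
// A hypothetical catalog document with a nested object and a nested array.
const doc = {
  journal: 'Oracle Magazine',
  published: true,
  edition: { months: ['January', 'February'], year: 2010 },
  articles: [
    { section: 'Oracle JDeveloper', title: 'Installing JDeveloper' }
  ]
};

// Serialize to JSON text and parse it back; the nesting survives intact.
const json = JSON.stringify(doc);
const restored = JSON.parse(json);

console.log(restored.edition.months[1]);
console.log(restored.articles[0].title);
```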

Apache Cassandra's data model is a wide column model in which columns are grouped into a column family. Cassandra is not totally schema-free, in that metadata for the columns in a column family could be pre-specified. Two types of column families are feasible: static and dynamic. In a static column family, the column metadata (column names and types) is specified when the column family is created. In a dynamic column family, the column metadata is not pre-specified, and an application may define any columns. In either a static or a dynamic column family, each row could have different columns. The only schema requirement is that each row have a row key of a declared type; the row key is the equivalent of a primary key in an RDBMS table. In fact, a Cassandra table could consist only of row keys, with no columns in any row. An Apache Cassandra column family may also be called a table; both the CREATE TABLE and CREATE COLUMNFAMILY commands are available. The following is a comparison of an RDBMS table and an Apache Cassandra table (or column family).

Figure 1: A comparison of an RDBMS table and an Apache Cassandra table
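The difference between static and dynamic column families can be modeled in a few lines of JavaScript. This is a sketch of the idea only: createColumnFamily and its methods are hypothetical names, type checking is reduced to JavaScript's typeof, and nothing here reflects how Cassandra actually stores or validates data.

```javascript
// Model of a column family: row key -> Map of column name -> value.
// Passing columnTypes models a static column family;
// omitting it models a dynamic one.
function createColumnFamily(columnTypes = null) {
  const rows = new Map();
  return {
    put(rowKey, columns = {}) {
      // The row key is the only thing every row must have (a string here).
      if (typeof rowKey !== 'string') {
        throw new TypeError('row key must be a string');
      }
      for (const [name, value] of Object.entries(columns)) {
        // Static family: each column must match the declared metadata.
        if (columnTypes && typeof value !== columnTypes[name]) {
          throw new TypeError(`column ${name} does not match the declared metadata`);
        }
      }
      rows.set(rowKey, new Map(Object.entries(columns)));
    },
    columnsOf(rowKey) {
      return [...rows.get(rowKey).keys()];
    }
  };
}

const staticCf = createColumnFamily({ journal: 'string', edition: 'string', year: 'number' });
staticCf.put('catalog1', { journal: 'Oracle Magazine', edition: 'January February 2010' });
staticCf.put('catalog2', { journal: 'Oracle Magazine', year: 2010 }); // different columns
staticCf.put('catalog3'); // row key only, no columns

const dynamicCf = createColumnFamily(); // no column metadata
dynamicCf.put('catalog1', { anyColumn: 'any value' });

console.log(staticCf.columnsOf('catalog2'));
```

In both variants each row carries its own set of columns; the static variant merely checks them against metadata declared up front.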

CLI

Each of the three NoSQL databases provides a command line interface (CLI). MongoDB provides the mongo shell, which may be started with the mongo command after the MongoDB server has been started with the mongod command. The mongo shell provides several command helpers that list the artifacts of a database, such as show collections, show dbs, and show users; the use <db> command sets the database to use. Couchbase provides a more varied set of command line tools than the other two NoSQL databases. For example, the couchbase-cli tool performs operations on an entire cluster, the cbbackup tool creates a backup of data, the cbdocloader tool loads documents, the cbrestore tool restores data from a backup, and the cbtransfer tool transfers data. The Apache Cassandra CLI utility may be used to run DDL and DML operations on a database and may be started with the cassandra-cli command.

Query Language

Each of the three NoSQL databases provides a query language. MongoDB provides JavaScript methods for CRUD operations on a database. For example, the db.collection.insert() method adds a document to a collection. Replace "collection" in the method with the collection name; for example, db.catalog.insert() would be used to add a document to the catalog collection. Similarly, the db.collection.find() method finds one or more documents, the db.collection.update(query, update, options) method updates a document, and db.collection.remove() removes a document. Couchbase provides the N1QL query language, which is an SQL-like query language for JSON. The result of an N1QL statement is a JSON document. Although the N1QL engine had to be installed separately with earlier versions, Couchbase 4.0 has built-in support for N1QL. Apache Cassandra provides the Cassandra Query Language (CQL), which is an SQL-like language for Cassandra, but the "table," "row," and "column" of CQL differ from those of an RDBMS table.
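The shape of the MongoDB CRUD calls named above can be tried without a server by using a small in-memory stand-in. The makeCollection helper below is hypothetical and far simpler than MongoDB: it matches on field equality only, supports only a {$set: ...} update applied to the first matching document, and returns arrays where the real find() returns a cursor.

```javascript
// Hypothetical in-memory stand-in for a collection; not the MongoDB API.
function makeCollection() {
  let docs = [];
  // A document matches when every field in the query is strictly equal.
  const matches = (doc, query) =>
    Object.entries(query).every(([field, value]) => doc[field] === value);
  return {
    insert(doc) { docs.push({ ...doc }); },
    find(query = {}) { return docs.filter(d => matches(d, query)); },
    update(query, update) {
      const target = docs.find(d => matches(d, query)); // first match only
      if (target) Object.assign(target, update.$set);
    },
    remove(query) { docs = docs.filter(d => !matches(d, query)); }
  };
}

// Mirror the db.<collection>.<method>() call shape with a catalog collection.
const db = { catalog: makeCollection() };

db.catalog.insert({ journal: 'Oracle Magazine', edition: 'January February 2010' });
db.catalog.update({ journal: 'Oracle Magazine' }, { $set: { publisher: 'Oracle Publishing' } });
console.log(db.catalog.find({ journal: 'Oracle Magazine' }));
db.catalog.remove({ journal: 'Oracle Magazine' });
console.log(db.catalog.find().length);
```

In the mongo shell the same four calls would run against a real collection, with a much richer query and update syntax than this model accepts.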

Java Driver

All three NoSQL databases provide a Java driver, and all three drivers support synchronous and asynchronous operations. One concern with early versions of the MongoDB Java driver was that the default write concern was "Unacknowledged," which created the potential for dropping data because write operations did not wait for acknowledgement. Starting with MongoDB 3.0, the default write concern is "Acknowledged": the mongod confirms each write operation, so a failed write is reported to the client instead of being silently dropped.
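Why an unacknowledged write can lose data without anyone noticing is easiest to see in a toy model. The sketch below is plain JavaScript with hypothetical function names; the "server" is an array that rejects duplicate _id values, which is an assumption chosen to give writes a way to fail, and no MongoDB driver is involved.

```javascript
// Hypothetical server: rejects a write whose _id is already stored.
function serverWrite(store, doc) {
  if (store.some(d => d._id === doc._id)) {
    return { ok: false, error: 'duplicate _id' };
  }
  store.push(doc);
  return { ok: true };
}

// "Unacknowledged": send the write and never look at the outcome.
function writeUnacknowledged(store, doc) {
  serverWrite(store, doc);
}

// "Acknowledged": inspect the server's reply and surface failures.
function writeAcknowledged(store, doc) {
  const reply = serverWrite(store, doc);
  if (!reply.ok) throw new Error(`write failed: ${reply.error}`);
}

const store = [];
writeAcknowledged(store, { _id: 1, journal: 'Oracle Magazine' });

// Rejected by the server, but the caller never finds out.
writeUnacknowledged(store, { _id: 1, journal: 'Java Magazine' });

// The same rejected write, this time reported to the caller.
let reported = false;
try {
  writeAcknowledged(store, { _id: 1, journal: 'Java Magazine' });
} catch (err) {
  reported = true;
}

console.log(store.length, reported);
```

Both failed writes leave the store unchanged; the difference is only whether the application learns about the failure and can react to it.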

Spring Data and Java EE Support

Spring Data provides POJO-based templates and repository support for all three NoSQL databases. For example, either the org.springframework.data.mongodb.core.MongoTemplate class or the org.springframework.data.repository.CrudRepository interface could be used to perform CRUD operations on MongoDB. Kundera, an object-database mapping framework for NoSQL databases, supports MongoDB and Apache Cassandra, but not Couchbase.

Scripting Language APIs

All three NoSQL databases provide support for common scripting languages such as PHP, Ruby, and JavaScript (Node.js).

Admin Console

Only Couchbase is packaged with a GUI Admin Console. Third-party GUI interfaces for MongoDB and Apache Cassandra are available.

Import/Export Tools

MongoDB provides the mongoimport tool to import data from JSON, CSV, and TSV files and the mongoexport tool to export data to JSON or CSV files. Couchbase provides the cbtransfer tool to transfer data to and from files and between clusters. Apache Cassandra provides the sstable2json utility to export a table to a JSON document and the json2sstable utility to convert a JSON representation of a table to a format Cassandra can use.

Apache Hadoop Connector

A Couchbase Hadoop connector is available for connecting to a Couchbase cluster and transferring bucket data to HDFS or Hive. The Couchbase Hadoop connector is used with command line tools provided by Apache Sqoop. A MongoDB Connector for Hadoop may be used to transfer data between MongoDB and Hadoop. Apache Cassandra does not provide a Hadoop Connector tool.

Apache Flume Source or Sink

The Apache Flume distribution does not include a built-in source or sink for any of the three databases. However, third-party Flume sinks are available for each of the three.

Apache Hive Storage Handler

Apache Hive does not provide built-in storage handlers for any of the three NoSQL databases. However, third-party Hive storage handlers for Apache Cassandra and MongoDB are available.

About the Author

Deepak Vohra is a Web Developer and recently published a NoSQL book on each of the three NoSQL databases: MongoDB (http://www.apress.com/9781484214350?gtmf=s), Apache Cassandra (http://cengageptr.com/Topics/TitleDetail/1305576764), and Couchbase (http://www.apress.com/9781484215999?gtmf=s).





