Make MongoDB Replica Sets Work for Analytics Without Compromising Availability
By Chris Chang, Developer Advocate, mLab.
Developers who want high availability for their database deployments, and who want to run analytics at the same time, can get both by using MongoDB replica sets. Setting this up is not technically demanding, but using a replica set incorrectly can jeopardize the high availability it is meant to provide.
One common replica set configuration uses three member nodes: one arbiter node and a pair of data-bearing nodes. The two data-bearing, electable nodes provide redundancy, protecting against events such as maintenance and hardware failures that would cause downtime in a single-node deployment.
Given that this configuration has two data-bearing nodes, developers may be tempted to use that "extra," redundant capacity to serve their analytics needs by running queries on the secondary. Doing so on a regular basis is highly inadvisable: by relying on secondary reads with just two data-bearing nodes, you risk losing the high availability that the redundant node is there to ensure. It is generally safe to use the secondary to handle occasional non-critical, ad hoc queries. But if your application needs both the primary and the secondary to handle its database load adequately, a single node cannot carry that load alone, and your replica set will no longer be able to recover if one of the nodes goes down or becomes unavailable.
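To make the capacity argument concrete, here is a minimal sketch with made-up numbers. The function name canSurviveFailover and all load and capacity figures are hypothetical, and real capacity planning involves more than a single throughput number.

```javascript
// Hypothetical check: can the one remaining data-bearing node carry the whole
// workload if the other fails? All figures are illustrative units
// (for example, operations per second), not measurements.
function canSurviveFailover(primaryLoad, secondaryReadLoad, nodeCapacity) {
  // After a failure, one node must serve everything both nodes served before.
  const loadOnSurvivor = primaryLoad + secondaryReadLoad;
  return loadOnSurvivor <= nodeCapacity;
}

// Occasional ad hoc queries on the secondary: the survivor still has headroom.
const occasional = canSurviveFailover(600, 100, 1000); // true

// Heavy, regular analytics on the secondary: the survivor would be overloaded.
const regular = canSurviveFailover(600, 700, 1000); // false
```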
The better route for those who do need to run analytics queries on a regular basis is to use a replica set properly configured to handle them—one that utilizes hidden analytics nodes. To do this, add a node as a hidden, non-electable member of the replica set that will be used solely for analytics purposes.
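As a rough sketch of that step, a new node can be added as hidden and non-electable in one call by passing a member document to rs.add(). The helper name analyticsMember and the hostname below are placeholders chosen for illustration.

```javascript
// Hypothetical helper: builds the member document for a hidden analytics node.
// `host` is an assumed "hostname:port" string for the new node.
function analyticsMember(host) {
  return {
    host: host,
    priority: 0,  // non-electable: can never become primary
    hidden: true  // not advertised to application clients
  };
}

// In the mongo shell, connected to the primary (placeholder hostname):
//   rs.add(analyticsMember("analytics.example.net:27017"))
```

Setting priority to 0 alongside hidden matters here, since MongoDB requires hidden members to be non-electable.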
Your replica set configuration contains an array of replica set members. After you run the rs.conf() command, identify the index of the member that you would like to use as your analytics node. You then can use this index to set the member to be hidden and non-electable.
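// Run in the mongo shell while connected to the primary.
// node_index is the array index of the analytics member identified above.
cfg = rs.conf()
// A hidden member must also be non-electable, so set its priority to 0.
cfg.members[node_index].priority = 0
cfg.members[node_index].hidden = true
rs.reconfig(cfg)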
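Counting array positions by hand is easy to get wrong, so one possible variation is to look the member up by its host string. This is only a sketch: the helper name hideMember and the hostname are hypothetical, and it assumes the config object has the shape returned by rs.conf().

```javascript
// Hypothetical helper: returns a copy of a replica set config in which the
// member with the given host is hidden and non-electable.
// `host` is an assumed "hostname:port" string matching a members[n].host value.
function hideMember(cfg, host) {
  const index = cfg.members.findIndex((m) => m.host === host);
  if (index === -1) {
    throw new Error("No replica set member with host " + host);
  }
  // Copy only what changes; every other field is carried over untouched.
  const members = cfg.members.map((m, i) =>
    i === index ? Object.assign({}, m, { priority: 0, hidden: true }) : m
  );
  return Object.assign({}, cfg, { members: members });
}

// In the mongo shell, connected to the primary (placeholder hostname):
//   rs.reconfig(hideMember(rs.conf(), "analytics.example.net:27017"))
```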
Hidden replica set members are well suited to analytics for a few reasons. For starters, they maintain a copy of the primary node's data set, so apart from a small replication delay, queries against a hidden member return nearly the same results as queries against the primary. Hidden members are also invisible to your application and cannot become primary, which isolates analytics traffic from production application traffic.
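One way to confirm that a member really is invisible to clients is to check that it no longer appears in the member lists the replica set advertises. The sketch below assumes a response object shaped like the output of db.isMaster() (db.hello() in newer shells), with optional hosts, passives, and arbiters arrays. The helper name isAdvertised and the hostnames are hypothetical.

```javascript
// Hypothetical helper: reports whether `host` appears in any of the member
// lists that a replica set advertises to clients.
function isAdvertised(response, host) {
  const listed = []
    .concat(response.hosts || [])
    .concat(response.passives || [])
    .concat(response.arbiters || []);
  return listed.indexOf(host) !== -1;
}

// In the mongo shell, after the reconfiguration (placeholder hostname):
//   isAdvertised(db.isMaster(), "analytics.example.net:27017")
// A hidden member should not be listed, so the expected result is false.
```

Analytics tools would then connect to the hidden member directly by its own address instead of through the replica set connection the application uses.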
Moreover, a hidden analytics node can double as a mechanism for disaster recovery if you configure MongoDB's slaveDelay option (renamed secondaryDelaySecs in MongoDB 5.0). Enabling slaveDelay lets you configure a data replication delay, which is helpful for recovering from disasters such as an accidentally dropped collection or database. As an example: if you configure a one-hour delay on an analytics node and a developer mistakenly drops important data on the primary node, that change will not be applied on the analytics node for an hour. This means there is likely plenty of time to query the analytics node and recover that data; without slaveDelay, the change would replicate almost immediately and the data would be lost. The trade-off is that analytics queries on a delayed node see data that is as old as the delay.
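The one-hour example could be configured along the following lines. This is a sketch under stated assumptions: the helper name withDelay is hypothetical, the delay is given in seconds, and the optional fieldName parameter is there because newer MongoDB releases use secondaryDelaySecs in place of slaveDelay.

```javascript
// Hypothetical helper: returns a copy of a replica set config in which the
// member at nodeIndex is a hidden, non-electable, delayed member.
function withDelay(cfg, nodeIndex, delaySeconds, fieldName) {
  if (!cfg.members[nodeIndex]) {
    throw new Error("No replica set member at index " + nodeIndex);
  }
  // Defaults to the option name used in the article; pass
  // "secondaryDelaySecs" on MongoDB 5.0 and later.
  const field = fieldName || "slaveDelay";
  const changes = { priority: 0, hidden: true };
  changes[field] = delaySeconds;
  const members = cfg.members.map((m, i) =>
    i === nodeIndex ? Object.assign({}, m, changes) : m
  );
  return Object.assign({}, cfg, { members: members });
}

// In the mongo shell, connected to the primary, for a one-hour delay:
//   rs.reconfig(withDelay(rs.conf(), node_index, 3600))
```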
Properly configured, your MongoDB replica set will provide high availability, an analytics solution as robust as you require, and additional capabilities such as disaster recovery that make the replica set that much more valuable.
About the Author
Chris Chang is a developer advocate at mLab. At mLab, Chris has provided database support to thousands of developers. Previously, Chris held IT roles at VMware and NetApp. He is interested in developer tools, fitness, and house music. Chris holds a B.S. degree in Computer Science from the University of California, San Diego. Connect with him on LinkedIn and Twitter.
*** This article was contributed to Developer.com. All rights reserved. ***