NoSQL for the MSSQL Soul
The buzz around NoSQL solutions has reached stratospheric heights. This buzz has led many Microsoft SQL Server development shops to start looking at how and where a NoSQL solution can fit into their business.
NoSQL addresses some traditional struggles of relational databases to allow more financially viable storage of larger databases and making high scaling interactive applications possible.
Relational databases struggle with massive datasets. Once a relational database gets to multi-gigabyte size you have to start buying high end hardware. Once a database gets to terabyte size you need to buy exotic hardware that is carefully maintained by a team of DBAs who will constantly be re-indexing, tuning, backing up and monitoring your database to keep it running. Once you get beyond this size the cost of the hardware and human maintence skyrockets. The cost curve of maintaining a relational SQL database is a hockey-stick graph of exponential cost as the size goes up.
This is where the buzzword Big Data and NoSQL intersect. In a relational SQL world, decisions have to be made about what you will track and in what quantity in order to keep data sizes manageable. In a NoSQL world, where there is no practical upper limit to the amount of data you can store and track, completely new things become possible like tracking everything.
NoSQL databases are designed to be run on commodity hardware as highly parallelized cluster able systems that can scale linearly. This means if you need to store 10% more data, you can add 10% more hardware at almost any scale. Key/Value databases in particular are extremely popular among the highest scaling web sites on the internet because they can provide lightning quick speeds even as the number of concurrent users a web site is serving explodes in popularity.
So what is NoSQL?
NoSQL is a broad term for any kind of database that doesn’t use SQL or in the case of some products has been expanded out to refer to “Not only SQL”.
There are multiple types of NoSQL databases but they mainly fall into two camps; Key/Value stores and Graph databases. This is an oversimplification but it is a good starting point for further exploration.
A key/value database stores documents, objects or simple values looked up primarily using a single key called a Document ID in most databases of this type. Most of these types of databases utilize Map/Reduce to produce a view, which is functionally similar to a SQL index. There are variations on this theme including document, multi-value, tuple and more but these type of stores share the common thread that they are primarily optimized to allow for the efficient retrieval object-like structure by a single key.
Graph databases store data in terms of nodes and relationships. Graph databases require an entry point that is found through some sort of strong key relationship or index and is then explored relative to that entry point. Graph databases excel at friend-of-a-friend (FOAF) queries and data that is best described in terms of how it relates to other pieces of data such as social networking.
One of the core philosophies of NoSQL in general is to use the right kind of database for the right kind of problem. There is no one SQL/NoSQL database that can both support real time applications scaling to millions of concurrent users while providing real time shopping cart analysis of that same user base. You will need to devote some time to understanding each kind of SQL and NoSQL database and think about the kinds of problems your business needs to address in order to find the right mix of tools to solve your business needs.
Most NoSQL databases utilize Map/Reduce to locate data. This does not work the same as putting an index on a column in Microsoft SQL Server because the data stored in a NoSQL database does not have columns to index.
The map function is run on each record in your database to pull out a particular piece of data you want to be searchable. For example, if you had a record containing a customer and their related orders and you wanted to be able to search your system by OrderID, you would write a map function that for each record in your database would iterate over your orders collection and map the OrderID. Your application could then search by OrderID in order to get the document ID belonging to the customer.
The reduce function is run on each record in your NoSQL database to summarize the data contained in that record. On a customer/orders database this could be the sum totals of all of a particular customer’s orders.
NoSQL isn’t SQL
It says it right in the name. If your business is looking for something that works just like Microsoft SQL Server except that it allows you to scale up to enormous data sizes your expectations are completely wrong and you will be disappointed. Coming from a Microsoft SQL background, your first instinct is probably to try to look for a SQL Management Studio equivalent so you can “SELECT * FROM Customer” and go from there. It doesn’t exist because NoSQL isn’t SQL.
What about Reporting with NoSQL?
A typical business report problem is sales totals by week over the last 30 days that then allows the user to slice and dice that data many different ways by filtering on sales associate, product sold and other items. NoSQL databases are not very well suited to this type of problem.
Key/Value databases can return a particular group of documents that you could then analyze, slice and dice in your C# code but that approach breaks down on larger data sets. You could potentially write a Map/Reduce view that gets you the data you’re looking for pre-crunched but you would end up writing a lot of different views to support different reports.
A graph database can provide a number of exotic views on your data that are otherwise extremely difficult in the relational world such as providing a listing of people who know people who know people who like surfing but don’t surf themselves. A graph database is not an efficient solution to the sales totals by week problem.
It is all about using the right tool for the job. SQL Server Analysis Services is an excellent tool for providing business users the ability to slice and dice their sales data.
NoSQL is an exciting new technology that is a must have to support high scaling applications or to manage massive amounts of data. If you’re ready to learn more, the next article in this series is on Couchbase. Couchbase is a key/value database that can run on windows and has excellent .NET support so Microsoft SQL Server developers will appreciate the next article in this series, “Couchbase: NoSQL in Windows”