Choosing the "Right" DBMS Engine
We've had another flurry of developers on the newsgroups and forums asking which database engine they should use. It seems that some of the folks asking this question don't seem to have a very clear understanding of the available choices—and there are myriad ways to store data—it's no wonder they're confused. And no, not everyone needs a full-blown (or half-blown) DBMS engine to store their choir roster. Sometimes, a 3x5 card file works better than any electronic DBMS on the planet—and it stays "up" when the power has failed. Personally, I've resisted the temptation to manage my own choir roster on SQL Server—I use an Access/JET database. This article attempts to help developers understand how to choose the right data storage method for their specific set of circumstances and resources. It's extracted from my new book Hitchhiker's Guide to Visual Studio and SQL Server (7th Edition).
No, I'm not going to tell you which DBMS is best because you won't like the answer: "It depends." To keep this article short, I'm only going to focus on the various data management engines offered by Microsoft because those are the engines I've been using and writing about for over a decade.
These developers (or "paradevelopers" who manage data without formal computer science training) eventually come to the realization that their existing database or data storage system isn't working and are motivated to upgrade. Let me review some of the typical reasons (or rationalizations):
- Sometimes it's because their boss tells them that they need to make data more secure. Developers at the Veteran's Administration are probably doing that as we speak and so are lots of other shops big and small dealing with government-mandated security constraints or the results of a hacked data store.
- Perhaps the existing data storage system has exceeded its capacity or the capacity of the system to support it. This could mean the database can't store the volume of data, can't support the number of users trying to access it, or the company has run out of places to store the 3x5 card boxes. This might also mean that the application or database is so poorly designed that no amount of upgrading or tuning can cure its ills.
- Sometimes they need to choose a way to share data among co-workers or to publish data in a way that's not only secure but easily implemented with the existing infrastructure. Increasingly, companies are finding new requirements to share data or simply expose it to their customers or the public. Once a useful database is created, it can spread uncontrollably like a virus or a video on YouTube. That's the downside of data sharing.
- Perhaps the data source has changed or a new source of data has been introduced that does not work with the existing system. Data import and export issues are a perennial problem often introduced as companies and departments merge. Many developers spend much of their bandwidth writing import/export/transform utilities simply to move data from place to place.
- Sometimes the choice of a data storage system is a function of long-term study, skill, experience, or simply ignorance. All too often, you hear about people (and companies) that choose their DBMS because that's the engine they know—or it's the one chosen by the CEO's son-in-law. When choosing any tool or DBMS engine, developers, consultants, managers, and relatives all need to make an effort to understand the strengths and weaknesses of all the alternatives.
What Are the Choices?
SQL Server comes in many editions—at least one of which is not really SQL Server as you know it. At least three of the versions are free or very nearly so. Let mehighlight them before getting started on which makes the most sense for your requirements and constraints.
|SQL Server Edition||Price||Database Size||CPUs/RAM Supported||64-bit||Notes|
|Express||Free||4GB||1/1GB||WOW||Included with Visual Studio and Visual Basic|
|Express AS||Free||4GB||1/1GB||WOW||Reporting Services and other advanced services/tools|
|Developer||$50||No limit||NL/OS Limit||Yes||For development only|
|Workgroup||$3.9K||No limit||2/3GB||WOW||Merge Publisher (25)|
|Standard||$5.9K||No limit||4/OS Limit||Yes||Merge Publisher (N)|
|Enterprise||$25K||No limit||NL/OS Limit||Yes||Merge Publisher (N)|
Note that the SQL Server Compact Edition (AKA SQLCe) does not share the same binaries or functionality as the other SQL Server editions. For more information on this new DBMS engine, see www.hitchhikerguides.net. Many developers choose to buy the $50 ($35 on Amazon) Developer Edition that has all of the features of the Enterprise Edition but is license-restricted to development and testing. It comes with the entire gamut of SQL Server tools and support applications that can make development a lot easier.
If you're considering SQL Server Express, be sure to download the Advanced Services Edition that supports SQL Server Reporting Services as well as Full Text Search along with the Business Intelligence Workbench and tools used to create reports.
Asking the Right Questions
Sure, there are lots of good reasons to choose a new or another data management solution—just as there are many not-so-good reasons to do so. When we get asked which data storage technique is best, most of us turn around and ask a dozen or more questions to better understand how to meet the customer's needs—both short and long-term. These questions (which invariably lead to other questions) include:
- How much data do you plan to store? How many rows and how many megabytes (or terabytes) to you need to store? Consider that SQL Server Compact Edition and SQL Express can only store 4GB of data. That should be plenty for a small business—if you're careful about what you store and maintain in the database as archived data. That is, it's not efficient or cost-effective to store archival data along with current data in the same data store. Developers and DBAs need to fold in an archival strategy into the DBMS as well the applications that access the data.
- What kind of data to you plan to store? That is, do you plan to store customer records or read-only pictures and documents (BLOBs)? It makes sense to organize your data so you're storing data that needs managing separately from the data that simply needs to be retrieved. That is, data that's in constant flux, such as customers, orders, and items rows should be managed separately from documents, and (relatively) static inventory pictures and parts lists. SQL Server 2008 expands on this concept by supporting a new Filestream datatype to make it easy to manage externally stored data. It might also make sense to use SharePoint Services to store and help manage your BLOBs—that's what it's designed to do.
- How do you plan to manage changes to the data? Do you need to leverage a legacy application written in COBOL or Access or an application you can actually compile and change? Is this application already written or is it simply a spreadsheet or an Access front-end? How many of the business rules used to protect the data and referential integrity are built into the application and how many are hard-coded into the client front-end application or middle-tier business layers? All versions of SQL Server (except the Compact Edition) support server-side RI and data integrity constraint validation. These validation tests can be implemented with Unique Indexes, triggers, Rules, stored procedures, or Check Constraints. Not all of these features are implemented by the SQLCe engine and XML file or the JET engine.
- How often are these changes made? If the data changes frequently, what mechanism will make these changes—other applications, users, or imported data? Any of the SQL Server engines are capable of handling fairly high volumes of transactions per second. However, the higher-end versions can support more activity due to expanded access to RAM caches and additional CPUs. The free versions of SQL Server govern performance by limiting RAM access to 1GB and a single CPU. Sure, your system will benefit from dual or quad processors, but SQL Server Express won't use any more than one of the available CPUs to perform its operations.
- How many users do you plan to support? We've seen databases supporting thousands of connected SQL Server users running on ancient hardware. We've also seen SQL Server Express support hundreds of networked users. This scalability hinges as much on the design of the database and the applications that share its data. However, consider that typically, not all of these users were active at any one time and not all were making changes to the data—some were simply retrieving data. Knowing which SQL Server engine to choose means determining how much load you expect to place on the DBMS engine. All of the SQL Server engines except the Compact Edition can act as a network-accessible DBMS server. Again, the higher-end versions like the Workgroup, Standard, and Enterprise can support increasingly higher throughput rates. Another factor to consider when building a shared-service SQL Server server—what other processes is the server charged to support? That is, do you plan to load up the server as an IIS, domain or Exchange server, a print or file server, or one running SharePoint or Reporting Services? All of these additional services rob SQL Server of precious RAM and CPU resources and can hurt performance.
- Does the chosen DBMS engine scale? That is, can you split the database service functionality across several servers or can you simply add more RAM or hard-disk or CPUs to make it faster or can you upgrade to a faster, more feature-rich (smarter) DBMS engine? The upscale versions of SQL Server do support more sophisticated functionality that permits developers to build a server farm that can vastly expand the number of supported users—but at considerable expense. Consider that SQL Server gains much of its performance from intelligently cached query plans and cached data. Simply increasing RAM or improving the efficiency of the compiled plans by tuning the queries or indexes can often preclude the need to buy a more expensive SQL Server version. Understanding how SQL Server works is essential to your tuning efforts. Although you might be tinkering with indexes, simply adding RAM or eliminating competing applications (like Reporting Services or a print server) can go a long way to making the existing SQL Server work more efficiently.
- Does management or the system architect want to leverage the code base to run on upscale versions when success pushes up the demand for more capacity? Development of upwardly compatible code is a primary feature of the SQL Server family of DBMS engines (not including SQLCe). This means virtually all of the architectural choices and code you write to access or run on SQL Server Express or Workgroup or Standard editions will run on upscale versions—usually without change. Yes, there are some features like User Instance that are only supported on SQL Server Express. Again, this cannot be said for SQL Server Compact Edition.
- How do you plan to protect the data? Developers, architects, and managers need to think about data validation and data protection as well as ways to manage referential integrity. In addition, they need to consider how the data can be compromised and protected from those who should not have access. Only SQL Compact Edition (and SQL Server 2008) supports the ability to encrypt the entire database. All versions of SQL Server permit you to encrypt the data traffic being sent to and from the SQL Server service as well as encrypt specific columns, protect the database through a straightforward user/password scheme. This reduces the opportunities for those who would hack into the database.
- Physically, where is the data managed in relationship to the clients that use it? Is the data to be stored centrally or distributed on a number of remote client or secondary server systems? Is there a network available to link data consumers with the database management system? How do you plan to keep the distributed data systems in sync? Although SQL Server can be configured as a client-hosted service (that only the individual client or a specific application can see), all versions can be configured to be visible (to varying degrees) to the corporate network or even the Internet. As you broaden the visibility of the SQL Server service and the databases it manages and protects, you increase the attack surface area and make it easier for hackers to gain access. All versions of SQL Server, including SQLCe, support replication scenarios that leverage upscale versions of SQL Server to host a merge replication Publication. This coupled with ADO.NET Synchronization Services, distributed database architectures are easier than ever to create and manage.
- Is the network always connected? That is, do you depend on dial-up or occasionally connected networks with varying degrees of performance, reliability, and security? If you're building a networked data store, you'll want to choose one of the service-class DBMS systems such as SQL Server Express or better. For disconnected or occasionally architectures, perhaps SQL Express or SQL Server Compact Edition makes sense. Both of these data engines support the ability to replicate data and changes to a central data host. In some cases, extracting data from a web-hosted Service Oriented Architecture (SOA) XML Web Service can make sense—especially for occasionally connected architectures.
- What tools do you plan to use to build, manage and write applications for your DBMS? Are you familiar with Visual Studio/VB.NET/C#? These tools are becoming more and more aware of SQL Server and are often the best ways to leverage its functionality. Consider that where Visual Studio tools fall short, SQL Server Management Studio often takes up the slack. Developers should make themselves aware of the increasing number and sophistication of tools from both the Visual Studio and SQL Server teams. However, if your developers are not familiar with these tools, using SQL Server will be more of a challenge.
- Another important factor is the chosen application architecture. If you're planning to build an ASP.NET application for a web site, you'll need to coordinate your DBMS choice with your ISP and your DBA. Some might support SQL Server but all too often you'll find they prefer to use other DBMS engines. Don't depend on being able to host a SQL Server-based application if the ISP does not support SQL Server. In addition, don't plan to use JET or SQL Server Compact edition as a web site's main data store. These engines are not designed for the high-throughput needed on most web sites.
- Deployment is also a critical factor for any SQL Server DBMS implementation. Although SQLCe can be deployed simply by adding a number of DLLs to your project, using other versions of SQL Server always means running setup with administrator-rights on either the client's system or to a visible networked server. Once SQL Server is installed, you'll need to install the actual database (or use one of the auto-attach approaches I discuss in my book) and configure the rights and other server settings.
- And, most importantly, do you plan to upgrade your skills from DAO or ADO to ADO.NET? Using any version of SQL Server means you're going to want to leverage the most efficient languages and data access interfaces like ADO.NET and the SqlClient .NET Framework namespace. Using older (and less efficient) OLE DB or ODBC interfaces can be problematic at best.
There are (always) more questions, but these should give you a good idea about which DBMS engine is necessary. I hope this helps.
About the Author
William (Bill) Vaughn is an industry-recognized author, mentor, and subject-matter expert on Visual Studio, SQL Server, Reporting Services, and data access interfaces. He's worked in the computer industry for over thirty-five years—working with mainframe, minicomputer, and personal computer systems as a developer, manager, architect, trainer, marketer, support specialist, writer, and publisher. In 2000, after 14 years at Microsoft, Bill stepped away to work on his books, mentoring, and independent training seminars. He's written seven editions of the Hitchhiker's Guide to Visual Basic and SQL Server, and three editions of ADO.NET and ADO Examples and Best Practices for Visual Basic (and C#) Programmers. He and Peter Blackburn also wrote the critically acclaimed Hitchhiker's Guide to SQL Server 2000 Reporting Services.
Bill is a top-rated speaker and frequents conferences all over the world, including TechEd, Visual Studio/SQL Connections, DevTeach, and many others. He's also written a wealth of articles for magazines such as MSDN, SQL Server Magazine, Visual Basic Programmer's Journal, .NET Magazine, and many others as well as a regular editorial for Processor magazine. Bill spends considerable time answering questions on the public newsgroups and speaking at INETA user group meetings all over the country and at other speaking venues all over the world. He's available for consulting, mentoring, or custom training. See www.betav.com or www.betav.com/blog/billva for his current schedule and course catalog.