Importing Data into SQL Server
I can't count the number of times I've answered questions in the newsgroups, forums, and from the attendees at the tech conferences that imply that developers think ADO.NET (or any of the data access interfaces) are designed to handle "bulk" operations. The question often sounds like:
"Ah, I've got this 6 million rows of data that I need to get from X (an external data source of some kind), process at the client, and send up to my SQL Server production table. I can't seem to get it all in memory at once and if I cut down the row count, the performance is still killing me—it can take all day to process..."
Although some of the newsgroup responders address the symptoms by suggesting they buy more RAM or faster hard drives, use a data reader with live connections to the source and destination, or find another job, they miss the point. ADO.NET has only one method designed to help with this task: the SqlBulkCopy class exposed in ADO.NET 2.0.
The Existing Alternatives
The concept of bulk data transfers has existed since the earliest days of SQL Server as there has always been a need to import data from other data sources. The low-level TDS interface exposes a series of BCP calls and "events" (if you can call them that) that permit developers (and device-driver developers) access to this functionality on any version of SQL Server (except the Compact edition).
In the past, I've encouraged developers to leverage the (considerable) power and flexibility of the Bulk Copy (BCP) command-line utility or the easier to use Data Transformation Services (DTS) and SQL Server Integration Services (SSIS). These tools that most DBAs know fairly intimately have interactive front-ends that help DBAs, developers acting as DBAs (or moms with kids at their knees) point to virtually any data source and choose the data to import into specified SQL Server tables. The tools deal with a litany of issues, including helping you choose the right data source provider or driver and the correct target database and table(s). This means you can import data from text files, delimited files, raw data files, or intelligent data sources such as Oracle, DB2, or even JET/Access or Excel—as long as it's exposed by an ODBC, OLE DB, or .NET Framework data provider. When you're done describing the bulk copy task you, can save the settings as an SSIS job that can be run on a scheduled basis or as you watch. SSIS is built into SQL Server Management Studio so it's available to all of the SQL Server SKUs (except Express and Compact).
Using the SSIS tools means better performance—both developer performance and data throughput performance. That's because it's not only easy to use, you'll find that SSIS (and its predecessors clear back to TDS BulkCopy) use a special technique to transfer the data to the server. Some of the performance difference is due to the fact that these bulk operations are not logged. Of course, this means it does not make sense to import directly into a production table. If something goes wrong during the bulk-copy operation, you simply clear (truncate) the table and restart the operation. In addition, based on settable options, you can choose to update indexes (or not) or to fire triggers (or not) that index and validate the data. Without the server-side constraints, this means that the data is still "evil" when it arrives in the server. Remember my fourth law: "All data is evil until proven innocent." And this means that you'll need to run validation procedures on the data after it arrives and before it's posted to the production tables.
And yes, this functionality also can be implemented directly from SQL Server by using the TSQL BULK INSERT approach, but that's a story for another day.
ADO.NET 2.0's SqlBulkCopy Method
Yes, there are times when you need to write a custom application to move data into SQL Server—as when you're working with SQL Server Express edition. That's where the SqlBulkCopy class comes in. I can't take credit for the developer team at Microsoft adding a bulk copy class to ADO.NET 2.0, but it's something that I and a number of others have been asking for since... well, for a long time. I wrote about this functionality as exposed by SQL Server's native DBLibrary interface in the first edition of "Hitchhiker's Guide to VBSQL" published in the early 90's. Basically, this class exposes the properties and events implemented by the low-level DBLibrary BulkCopy (BCP) functionality.
Now, walk through the SqlBulkCopy class to examine the properties and settings:
There are several SqlBulkCopy constructors that let you preset the properties (and options). In any case, you'll be passing a SqlConnection object or ConnectionString that addresses the target database. You also can pass a SqlBulkCopyOptions value based on the enumerations, as shown in Table 1. Note that as you enable these additional options you might be adding work that must be done on the server and adversely affect performance. I've added a column in the table to indicate options that can hurt (or help) performance.
Table 1: The SqlBulkCopyOption enumerations
|CheckConstraints||No||Validate constraints as data is being imported.|
|Default||Use the default setting for all options (all are disabled by default).|
|FireTriggers||No||As rows are added, fire any applicable triggers.|
|KeepIdentity||Yes||As rows are added, do not assign new identity values.|
|KeepNulls||Yes||Preserve null values in the target table regardless of specified default values. Unless KeepNulls is specified, null values are replaced by default values defined for the column.|
|TableLock||Yes||Obtain a bulk update lock for the table. If not specified, row locks are applied.|
|UseInternalTransaction||No||Bulk operation runs in a transaction.|