http://www.developer.com/

Back to article

Using SQL Server 2008 Change Data


April 14, 2009

You've written a nice database application that allows the user to manipulate data in an SQL Server database. All is good, but the users would also like to see how the data in the database has changed over time. For instance, John wants to know how the customer records changed last week and Jane would like to know how those details were added into her latest sales order.

Technically speaking, keeping such a log inside your application is quite straight forward. If you have a properly architected layered application, you could add logging code into your data access layer (DAL), and then construct a user interface for this log. But, adding such capability to each and every operation and table is often time-consuming work. Wouldn't it be great if the database you use could help you in this common, but often times taxing work?

Enter SQL Server 2008 Change Data

If you are using Microsoft's latest SQL Server version, then the database can help. SQL Server 2008 Enterprise Edition (Developer Edition also works, as it's based on Enterprise) supports two features that are collectively called Change Data. More precisely, these features are named Change Tracking (CT) and Change Data Capture (CDC).

Both services have the same aim: to help developers easily collect information about changes in the database data. Of the two features, Change Tracking is a simpler one and is only able to record that something has changed in the database. It supports the familiar DML statements: INSERT, UPDATE and DELETE. Change Data Capture on the other hand is able to record that something changed just like Change Tracking, but also how the data changed.

These features are useful in creating a change log, but also when synchronizing data between database instances. For example, you might need to periodically synchronize data between a local database and a central database. The Change Data features can help you do that.

Of course, being able to use the change tracking features requires some work on your part. First, you need to enable the change data collection features in your database. Second, you need to write code to retrieve the change data and process it. Additionally, you also need to understand the consequences of enabling change data, as both performance and database storage considerations must be kept in mind. For small databases, these overheads might not cause you to spend sleepless nights, but with larger databases you need to be more careful.

Using Change Tracking

If you are happy to live with the inherent overhead, using change data collection features is a good way to save the trouble of implementing these by hand. The easier of the two related features is Change Tracking, which is enabled for each database and table you want to track. Change Tracking can tell which rows in which table have changed. But on the other hand, Change Tracking data won't tell you how the data changed: you cannot use it to retrieve the older data back. You just know for instance that column N changed while doing an UPDATE, but no more.

But even with these limitations, Change Tracking can be useful. To get started with the feature, you first need a suitable SQL Server database that has been created at least with the compatibility level of 90 (this means SQL Server 2005 or later database, SQL Server 2008 databases by default have the compatibility level of 100). For testing purposes, you might even wish to create a completely new database. The next step is to enable Change Tracking. You can either do this from the database's properties in SQL Server Management Studio (see Figure 1) or by running the following T-SQL statement:

Figure 1. Change Tracking can be enabled from
database's properties in SQL Server Management
Studio.
Click here for larger image

Figure 1. Change Tracking can be enabled from database's properties in SQL Server Management Studio.

ALTER DATABASE MyDatabase
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)

After enabling the feature on a database level, you need to enable change tracking for each table you wish to monitor. This is done with the following SQL statement:

ALTER TABLE dbo.MyTable
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = ON)

Just as with the database level setting, you can also enable change tracking from the SQL Server Management Studio. The setting can be found from the properties dialog box for each table. Remember that in addition to typing these scripts by hand, you can let SQL Server Management Studio help you. Its Template Explorer pane contains numerous ready-made scripts that you can use with little modifications. This applies to change tracking and change data capture functions as well (Figure 2).

Figure 2. Management Studio's Template Explorer has
ready-written scripts to help using change tracking
functions.
Figure 2. Management Studio's Template Explorer has ready-written scripts to help use change tracking functions.

When you enable change tracking and start adding, deleting or modifying records in the table for which you have enabled tracking, you can immediately use the built-in functions to read the logs. SQL Server 2008 provides several of these functions, most of which are provided as table- value functions, or TVFs. A table-value function is a regular function, but instead of returning a single value like a scalar function, these functions return a table's worth of data: zero or more rows.

Testing the Sample Application

The example application accompanying this article shows how you can access change tracking values from a WinForms application written with C# (Figure 3). The sample application is divided into two parts, one for the Change Tracking feature and another for Change Data Capture. Presently, let's focus on the Change Tracking part.

Figure 3. The sample application supports reading
change data logs.
Click here for larger image

Figure 3. The sample application supports reading change data logs.

This part contains two buttons labeled Get Tracking Numbers and Get Tracking Changes. Whenever Change Tracking is enabled for a table, each change increments a so-called tracking version number. When requesting changes thru the special SQL Server change tracking functions, you need to pass in this tracking version number. For instance, if you update column A on row number 8, the maximum tracking number will increment by one. Then, if you request information about the latest change, an ID value will indicate that row 8 was changed. Furthermore, a bit mask will indicate which columns changed.

To retrieve the latest tracking numbers, you can use an SQL statement like this:

SELECT OBJECT_ID(N'dbo.MyTable'),
CHANGE_TRACKING_MIN_VALID_VERSION(
  OBJECT_ID(N'dbo.MyTable')),
CHANGE_TRACKING_CURRENT_VERSION()

This query would return the minimum and maximum valid tracking version numbers. For example, the values might be 0 and 6. This would mean that there are six versions of the table data, i.e. the records have changed six times. You can then use the tracking functions to retrieve changes with version numbers 0..5. The idea is that the sixth version is the latest data, which you can retrieve directly from the original table.

In the sample application, the above SQL statement is executed and the resulting two version numbers (plus the object ID) are displayed to the user (Figure 4). Note how the table name must be changed to an object ID before it can be passed to the CHANGE_TRACKING_MIN_VALID_VERSION function. This can be done with the OBJECT_ID function. Note also that in the sample application, the user can type in a table name; this is concatenated at runtime to a full SQL statement.

Figure 4. Displaying the minimum and maximum tracking
numbers

Figure 4. Displaying the minimum and maximum tracking numbers.

The next step is to utilize these version numbers to retrieve the change rows. The sample application uses an SQL statement like the following to retrieve the change tracking information given a tracking version number:

SELECT CT.ID, CT.SYS_CHANGE_OPERATION,
CT.SYS_CHANGE_COLUMNS, CT.SYS_CHANGE_CONTEXT
FROM CHANGETABLE(CHANGES dbo.MyTable, 123) AS CT

Here, the table-value function called CHANGETABLE is used to retrieve the change records for the table dbo.MyTable with the tracking version number 123. For clarity, the table is aliased as "CT". The ID field returns the original ID value of the record that was changed. This means that the table which is to be monitored must have a primary key set. An identity field with an integer value is always a good candidate for a primary key.

The SYS_CHANGE_OPERATION column indicates the operation that took place: I for insert, U for update or D for delete. The SYS_CHANGE_COLUMNS value is on the other hand a binary field, which contains a bitmask of the changed columns. However, it's not fully documented, but its contents appear to be formed of DWORD values, where each value indicates the changed column starting from one. For example, if the bit values would be 0x00000003000000005, then the third and fifth columns would have changed. Note that for insert and delete operations, this bitmask is always NULL. The fourth field, SYS_CHANGE_CONTEXT reflects the table hint value specified using the WITH clause in an INSERT, DELETE or UPDATE statement. Otherwise the value is NULL.

The sample application converts the bitmask into column names based on a fixed array of strings. If you wanted, you could add a feature that reads the metadata of the table in question and then retrieves the column names based on that. This way, you could have a completely generic solution for converting the bitmask into column names.

Understanding Change Data Capture

Now that you understand the basics of Change Tracking, you can start learning about the more advanced feature: Change Data Capture or CDC. As you have learned, Change Data Capture is able to tell which records have changed in your tables, but unlike Change Tracking, it can also tell how these columns changed.

The get started with Change Data Capture, you first need to enable it for both your database and table(s). Enabling must be done by executing a stored procedure; you cannot enable Change Data Capture from SQL Server Management Studio's property dialog boxes. Instead, the command you need to execute is:

EXEC sys.sp_cdc_enable_db

Executing this command takes a little while, as SQL Server needs to create and prepare several system tables. The following is a list of the tables that are being created:

cdc.captured_columns
cdc.change_tables
cdc.ddl_history 
cdc.index_columns
cdc.lsn_time_mapping
dbo.systranschemas

As you can guess, you do have the possibility to query the contents of these tables directly, but since both the contents and the schemas are designed for internal use, the data in these system tables might change once the next service pack comes along. Thus, the better advice is to use the CDC functions to query these tables just like with Change Tracking data.

After enabling support for the database, the next step is to enable Change Data Capture for a particular table. This can be done by executing another SQL stored procedure, giving as a parameter name the table name for which you wish to enable monitoring. This could go like this for a table named MyTable in the schema dbo:

EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name   = N'MyTable',
@role_name     = N'AdminRoleOrUser',
@supports_net_changes = 1

Here, the sys.sp_cdc_enable_table stored procedure prepares a table for Change Data Capture. The source schema and name parameters indicate which schema and table are to be enabled, and the role name parameter specifies who can read the collected records as it could contain sensitive information. Finally, the "net change support" feature allows you to query combined changes up to certain point in time. The alternative is "all changes support" which means that you will get an entry each time a change has been made. This means that "all changes" is better suited for keeping a log.

So far, you have learned that Change Data Capture works on a similar basis than the simpler Change Tracking: you must enable the feature both on the database level and individual table level. However, Change Data Capture requires you to perform one additional step: you need to enable a capture job for your database. This capture job is a regular SQL Agent job, which you can enable and disable at will. To enable the job for your database, you would need to execute an SQL statement similar to the following:

EXEC sys.sp_cdc_start_job
     @job_type = N'capture'

When you execute this statement, you might see three or four informational messages that the SQL Server Agent is not running and cannot be contacted. Luckily, this is easy to fix: you just need to start the agent service. Note however that the Change Data Capture collects data even without the agent, but then you cannot use the ready-made functions. The best option then is to enable SQL Server Agent before enabling the job.

Understanding capture instances

When reading the SQL Server documentation about Change Data Capture, it often refers to the term capture instance. This refers to a particular instance of collected data, and since you can only enable Change Data Capture on table-basis, each capture instance corresponds to a table in your database.

It is also important to understand how capture instances are named. By default, the capture instance name is formed from the schema and the table name being monitored with an underscore. For example, if you have the table called dbo.MyTable, then the capture instance name will be dbo_MyTable. And just as with table and column names in SQL Server, capture instance names are not case sensitive.

When you enable Change Data Capture for a specific table, SQL Server creates a special table to store the changes made to the original table. This table contains the same columns as the source table and several housekeeping columns. In addition to this internal change table, SQL Server also creates a special function for it. This function is then able retrieve the changes given a start and end time. For example, if you would like to retrieve all changes made to the table, you would use a function beginning with the name cdc.fn_cdc_get_all_changes. Since this stored procedure (and its sister cdc.fn_cdc_get_net_changes, if the source table is enabled for net changes) is created specifically for a given table, the capture instance name is appended to the function name. For instance, the table MyTable would have the following function name:

cdc.fn_cdc_get_all_changes_dbo_mytable

The function expects three parameters, a from LSN number, a to LSN number, and a row filtering option. Here, LSN stands for Log Sequence Number which is a unique identifier for each transaction. It is defined as binary(10). You can use the functions sys.fn_cdc_get_min_lsn and sys.fn_cdc_get_max_lsn to retrieve the minimum and maximum LSN numbers, respectively.

Although these functions are fine, it is often more useful to be able to get changes up to a certain date and time. To allow this, SQL Server 2008 supports a function called sys.fn_cdc_map_time_to_lsn which you can use to convert a regular datetime value to an LSN number. For instance, you could use the following statements to retrieve changes up to the current date and time:

DECLARE @from_lsn binary(10),
  @to_lsn binary(10)
SET @from_lsn =
  sys.fn_cdc_get_min_lsn('dbo_MyTable')
SET @to_lsn = sys.fn_cdc_map_time_to_lsn(
  'largest less than', GETDATE());
SELECT
  sys.fn_cdc_map_lsn_to_time(
  __$start_lsn) as changetime,
  [__$operation], CAST(
  [__$update_mask] AS int)
  AS updatemask,
  id, name, value
FROM
  cdc.fn_cdc_get_all_changes_dbo_testtable(
@from_lsn, @to_lsn, N'all')
ORDER BY changetime

The results could then be processed as needed. For example, the sample application shows the rows in a simple DataGridView (see again Figure 3). Here is an example of a C# method that runs the above SQL statement:

private void listChangesButton_Click(
  object sender, EventArgs e)
{
  SqlConnection conn =
    DataAccess.GetSqlConnection();
  try
  {
    string sql = "DECLARE ...";
    SqlDataAdapter da =
      new SqlDataAdapter(sql, conn);
    DataTable changesTable =
      new DataTable();
    da.Fill(changesTable);
    dataGridView.DataSource =
      changesTable;
  }
  finally
  {
      conn.Dispose();
  }
}

If you wanted to make the code more generic, you could construct the SQL statements so that you can take the capture instance name as a parameter. And if you had the need to get changes from certain point of time to another, you could retrieve both LSN numbers with the sys.fn_cdc_map_lsn_to_time function.

Conclusion

In this article, you saw how you can use the change data features in SQL Server 2008 Enterprise Edition. This edition supports two useful, but slightly different techniques called Change Tracking (CT) and Change Data Capture (CDC). Both are able to identify the rows that have changed in selected tables, but CDC is also able to tell how the data has changed.

Although using either of these features requires some setup work from the SQL Server administrator, using them from application code is quite easy. Even if the sample application is written in C#, you can actually use these features from any application - Windows, web or mobile - and any language that can connect to an SQL Server.

With these new change tracking functions, you can speed up your application development and leave the grinding work to the SQL Server.

Good luck with your database applications!


About the Author: Jani Järvinen is a software development trainer and consultant in Finland. He is a Microsoft C# MVP and a frequent author and has published three books about software development. He is the group leader of a Finnish software development expert group at ITpro.fi. His blog can be found at http://www .saunalahti.fi/janij/. You can send him mail by clicking on his name at the top of the article.

Sitemap | Contact Us

Thanks for your registration, follow us on our social networks to keep up-to-date