Microsoft & .NET .NET Using SQL Server 2008 Change Data

Using SQL Server 2008 Change Data


You’ve written a nice database application that allows the
user to manipulate data in an SQL Server database. All is
good, but the users would also like to see how the data in
the database has changed over time. For instance, John wants
to know how the customer records changed last week and Jane
would like to know how those details were added into her
latest sales order.

Technically speaking, keeping such a log inside your
application is quite straight forward. If you have a
properly architected layered application, you could add
logging code into your data access layer (DAL), and then
construct a user interface for this log. But, adding such
capability to each and every operation and table is often
time-consuming work. Wouldn’t it be great if the database
you use could help you in this common, but often times
taxing work?


Enter SQL Server 2008 Change Data


If you are using Microsoft’s latest SQL Server version, then
the database can help. SQL Server 2008 Enterprise Edition
(Developer Edition also works, as it’s based on Enterprise)
supports two features that are collectively called Change
Data. More precisely, these features are named Change
Tracking (CT) and Change Data Capture (CDC).

Both services have the same aim: to help developers
easily collect information about changes in the database
data. Of the two features, Change Tracking is a simpler one
and is only able to record that something has changed in the
database. It supports the familiar DML statements: INSERT,
UPDATE and DELETE. Change Data Capture on the other hand is
able to record that something changed just like Change
Tracking, but also how the data changed.


These features are useful in creating a change log, but also
when synchronizing data between database instances. For
example, you might need to periodically synchronize data
between a local database and a central database. The Change
Data features can help you do that.


Of course, being able to use the change tracking features
requires some work on your part. First, you need to enable
the change data collection features in your database.
Second, you need to write code to retrieve the change data
and process it. Additionally, you also need to understand
the consequences of enabling change data, as both
performance and database storage considerations must be kept
in mind. For small databases, these overheads might not
cause you to spend sleepless nights, but with larger
databases you need to be more careful.


Using Change Tracking


If you are happy to live with the inherent overhead, using
change data collection features is a good way to save the
trouble of implementing these by hand. The easier of the two
related features is Change Tracking, which is enabled for
each database and table you want to track. Change Tracking
can tell which rows in which table have changed. But on the
other hand, Change Tracking data won’t tell you how the data
changed: you cannot use it to retrieve the older data back.
You just know for instance that column N changed while doing
an UPDATE, but no more.

But even with these limitations, Change Tracking can be
useful. To get started with the feature, you first need a
suitable SQL Server database that has been created at least
with the compatibility level of 90 (this means SQL Server
2005 or later database, SQL Server 2008 databases by default
have the compatibility level of 100). For testing purposes,
you might even wish to create a completely new database. The
next step is to enable Change Tracking. You can either do
this from the database’s properties in SQL Server Management
Studio (see Figure 1) or by running the following T-SQL
statement:


Figure 1. Change Tracking can be enabled from
database's properties in SQL Server Management
Studio.

Click here for larger image


Figure 1. Change Tracking can be enabled from
database’s properties in SQL Server Management
Studio.

ALTER DATABASE MyDatabase
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)

After enabling the feature on a database level, you need
to enable change tracking for each table you wish to
monitor. This is done with the following SQL statement:

ALTER TABLE dbo.MyTable
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = ON)

Just as with the database level setting, you can also
enable change tracking from the SQL Server Management
Studio. The setting can be found from the properties dialog
box for each table. Remember that in addition to typing
these scripts by hand, you can let SQL Server Management
Studio help you. Its Template Explorer pane contains
numerous ready-made scripts that you can use with little
modifications. This applies to change tracking and change
data capture functions as well (Figure 2).

Figure 2. Management Studio's Template Explorer has
ready-written scripts to help using change tracking
functions.

Figure 2. Management Studio’s Template Explorer has
ready-written scripts to help use change tracking
functions.


When you enable change tracking and start adding,
deleting or modifying records in the table for which you
have enabled tracking, you can immediately use the built-in
functions to read the logs. SQL Server 2008 provides several
of these functions, most of which are provided as table-
value functions, or TVFs. A table-value function is a
regular function, but instead of returning a single value
like a scalar function, these functions return a table’s
worth of data: zero or more rows.


Testing the Sample Application

The example application accompanying this article shows how you can access change tracking values from a WinForms application written with C# (Figure 3). The sample application is divided into two parts, one for the Change Tracking feature and another for Change Data Capture. Presently, let’s focus on the Change Tracking part.

Figure 3. The sample application supports reading
change data logs.

To retrieve the latest tracking numbers, you can use an SQL statement like this:


SELECT OBJECT_ID(N’dbo.MyTable’),
CHANGE_TRACKING_MIN_VALID_VERSION(
OBJECT_ID(N’dbo.MyTable’)),
CHANGE_TRACKING_CURRENT_VERSION()


This query would return the minimum and maximum valid tracking version numbers. For example, the values might be 0 and 6. This would mean that there are six versions of the table data, i.e. the records have changed six times. You can then use the tracking functions to retrieve changes with version numbers 0..5. The idea is that the sixth version is the latest data, which you can retrieve directly from the original table.


In the sample application, the above SQL statement is executed and the resulting two version numbers (plus the object ID) are displayed to the user (Figure 4). Note how the table name must be changed to an object ID before it can be passed to the CHANGE_TRACKING_MIN_VALID_VERSION function. This can be done with the OBJECT_ID function. Note also that in the sample application, the user can type in a table name; this is concatenated at runtime to a full SQL statement.


Figure 4. Displaying the minimum and maximum tracking
numbers

Figure 4. Displaying the minimum and maximum tracking numbers.


The next step is to utilize these version numbers to retrieve the change rows. The sample application uses an SQL statement like the following to retrieve the change tracking information given a tracking version number:


SELECT CT.ID, CT.SYS_CHANGE_OPERATION,
CT.SYS_CHANGE_COLUMNS, CT.SYS_CHANGE_CONTEXT
FROM CHANGETABLE(CHANGES dbo.MyTable, 123) AS CT


Here, the table-value function called CHANGETABLE is used to retrieve the change records for the table dbo.MyTable with the tracking version number 123. For clarity, the table is aliased as “CT”. The ID field returns the original ID value of the record that was changed. This means that the table which is to be monitored must have a primary key set. An identity field with an integer value is always a good candidate for a primary key.


The SYS_CHANGE_OPERATION column indicates the operation that took place: I for insert, U for update or D for delete. The SYS_CHANGE_COLUMNS value is on the other hand a binary field, which contains a bitmask of the changed columns. However, it’s not fully documented, but its contents appear to be formed of DWORD values, where each value indicates the changed column starting from one. For example, if the bit values would be 0x00000003000000005, then the third and fifth columns would have changed. Note that for insert and delete operations, this bitmask is always NULL. The fourth field, SYS_CHANGE_CONTEXT reflects the table hint value specified using the WITH clause in an INSERT, DELETE or UPDATE statement. Otherwise the value is NULL.


The sample application converts the bitmask into column names based on a fixed array of strings. If you wanted, you could add a feature that reads the metadata of the table in question and then retrieves the column names based on that. This way, you could have a completely generic solution for converting the bitmask into column names.

Understanding Change Data Capture

Now that you understand the basics of Change Tracking, you can start learning about the more advanced feature: Change Data Capture or CDC. As you have learned, Change Data Capture is able to tell which records have changed in your tables, but unlike Change Tracking, it can also tell how these columns changed.

The get started with Change Data Capture, you first need to enable it for both your database and table(s). Enabling must be done by executing a stored procedure; you cannot enable Change Data Capture from SQL Server Management Studio’s property dialog boxes. Instead, the command you need to execute is:


EXEC sys.sp_cdc_enable_db


Executing this command takes a little while, as SQL Server needs to create and prepare several system tables. The following is a list of the tables that are being created:


cdc.captured_columns
cdc.change_tables
cdc.ddl_history
cdc.index_columns
cdc.lsn_time_mapping
dbo.systranschemas

As you can guess, you do have the possibility to query the contents of these tables directly, but since both the contents and the schemas are designed for internal use, the data in these system tables might change once the next service pack comes along. Thus, the better advice is to use the CDC functions to query these tables just like with Change Tracking data.


After enabling support for the database, the next step is to enable Change Data Capture for a particular table. This can be done by executing another SQL stored procedure, giving as a parameter name the table name for which you wish to enable monitoring. This could go like this for a table named MyTable in the schema dbo:


EXEC sys.sp_cdc_enable_table
@source_schema = N’dbo’,
@source_name = N’MyTable’,
@role_name = N’AdminRoleOrUser’,
@supports_net_changes = 1


Here, the sys.sp_cdc_enable_table stored procedure prepares a table for Change Data Capture. The source schema and name parameters indicate which schema and table are to be enabled, and the role name parameter specifies who can read the collected records as it could contain sensitive information. Finally, the “net change support” feature allows you to query combined changes up to certain point in time. The alternative is “all changes support” which means that you will get an entry each time a change has been made. This means that “all changes” is better suited for keeping a log.


So far, you have learned that Change Data Capture works on a similar basis than the simpler Change Tracking: you must enable the feature both on the database level and individual table level. However, Change Data Capture requires you to perform one additional step: you need to enable a capture job for your database. This capture job is a regular SQL Agent job, which you can enable and disable at will. To enable the job for your database, you would need to execute an SQL statement similar to the following:


EXEC sys.sp_cdc_start_job
@job_type = N’capture’


When you execute this statement, you might see three or four informational messages that the SQL Server Agent is not running and cannot be contacted. Luckily, this is easy to fix: you just need to start the agent service. Note however that the Change Data Capture collects data even without the agent, but then you cannot use the ready-made functions. The best option then is to enable SQL Server Agent before enabling the job.


Understanding capture instances

When reading the SQL Server documentation about Change Data Capture, it often refers to the term capture instance. This refers to a particular instance of collected data, and since you can only enable Change Data Capture on table-basis, each capture instance corresponds to a table in your database.

It is also important to understand how capture instances are named. By default, the capture instance name is formed from the schema and the table name being monitored with an underscore. For example, if you have the table called dbo.MyTable, then the capture instance name will be dbo_MyTable. And just as with table and column names in SQL Server, capture instance names are not case sensitive.


When you enable Change Data Capture for a specific table, SQL Server creates a special table to store the changes made to the original table. This table contains the same columns as the source table and several housekeeping columns. In addition to this internal change table, SQL Server also creates a special function for it. This function is then able retrieve the changes given a start and end time. For example, if you would like to retrieve all changes made to the table, you would use a function beginning with the name cdc.fn_cdc_get_all_changes. Since this stored procedure (and its sister cdc.fn_cdc_get_net_changes, if the source table is enabled for net changes) is created specifically for a given table, the capture instance name is appended to the function name. For instance, the table MyTable would have the following function name:


cdc.fn_cdc_get_all_changes_dbo_mytable


The function expects three parameters, a from LSN number, a to LSN number, and a row filtering option. Here, LSN stands for Log Sequence Number which is a unique identifier for each transaction. It is defined as binary(10). You can use the functions sys.fn_cdc_get_min_lsn and sys.fn_cdc_get_max_lsn to retrieve the minimum and maximum LSN numbers, respectively.

Although these functions are fine, it is often more
useful to be able to get changes up to a certain date and
time. To allow this, SQL Server 2008 supports a function
called sys.fn_cdc_map_time_to_lsn which you can use to
convert a regular datetime value to an LSN number. For
instance, you could use the following statements to retrieve
changes up to the current date and time:

DECLARE @from_lsn binary(10),
  @to_lsn binary(10)
SET @from_lsn =
  sys.fn_cdc_get_min_lsn('dbo_MyTable')
SET @to_lsn = sys.fn_cdc_map_time_to_lsn(
  'largest less than', GETDATE());
SELECT
  sys.fn_cdc_map_lsn_to_time(
  __$start_lsn) as changetime,
  [__$operation], CAST(
  [__$update_mask] AS int)
  AS updatemask,
  id, name, value
FROM
  cdc.fn_cdc_get_all_changes_dbo_testtable(
@from_lsn, @to_lsn, N'all')
ORDER BY changetime

The results could then be processed as needed. For
example, the sample application shows the rows in a simple
DataGridView (see again Figure 3). Here is an example of a C#
method that runs the above SQL statement:

private void listChangesButton_Click(
  object sender, EventArgs e)
{
  SqlConnection conn =
    DataAccess.GetSqlConnection();
  try
  {
    string sql = "DECLARE ...";
    SqlDataAdapter da =
      new SqlDataAdapter(sql, conn);
    DataTable changesTable =
      new DataTable();
    da.Fill(changesTable);
    dataGridView.DataSource =
      changesTable;
  }
  finally
  {
      conn.Dispose();
  }
}

If you wanted to make the code more generic, you could
construct the SQL statements so that you can take the
capture instance name as a parameter. And if you had the
need to get changes from certain point of time to another,
you could retrieve both LSN numbers with the
sys.fn_cdc_map_lsn_to_time function.


Conclusion


In this article, you saw how you can use the change data
features in SQL Server 2008 Enterprise Edition. This edition
supports two useful, but slightly different techniques
called Change Tracking (CT) and Change Data Capture (CDC).
Both are able to identify the rows that have changed in
selected tables, but CDC is also able to tell how the data
has changed.

Although using either of these features requires some
setup work from the SQL Server administrator, using them
from application code is quite easy. Even if the sample
application is written in C#, you can actually use these
features from any application – Windows, web or mobile – and
any language that can connect to an SQL Server.


With these new change tracking functions, you can speed
up your application development and leave the grinding work
to the SQL Server.


Good luck with your database applications!


About the Author: Jani Järvinen is a software development
trainer and consultant in Finland. He is a Microsoft C# MVP
and a frequent author and has published three books about
software development. He is the group leader of a Finnish
software development expert group at ITpro.fi. His blog can
be found at http://www
.saunalahti.fi/janij/
. You can send him mail by clicking
on his name at the top of the article.

Latest Posts

Related Stories