A High-Performance Data Pump — and a Whole Lot More
Data Transformation Services in Microsoft SQL Server 2000 is a
high-performance data pump. It is a tool for copying, moving, consolidating,
cleansing, and validating data. The data pump loads each row from the data
source, manipulates the values in that row, and inserts the row into the data
DTS is also a Rapid Application Development tool for data-oriented
programming. It provides a comprehensive set of data manipulation tools
organized in a development environment that is both convenient and powerful.
Speed is important when working with data:
Organizations have to be able to move large amounts of data between
different systems within specified processing periods.
Complex data modifications, including row-by-row processing, are often
needed as the data is being moved. If you are manipulating a lot of data, the
time required to modify each record must be minimized.
The design, development, and maintenance of data cleansing applications
can be very time-consuming. Tools are needed to speed the development
Note – If you receive a 1GB file of clickstream
data at 8:00 a.m. for the previous day’s Web site activity, can you load
that data into a SQL Server data mart and process the OLAP cubes so that
they’re ready for a 10:00 a.m. management meeting?
The need for rapid data transformation is increasing along with the
increasing opportunities to gather and analyze data.
There are four basic strategies for developing data manipulation packages
Use the DTS Designer
Use the DTS Object Model in code
Use DTS Templates
Use the DTS Wizards
I encourage you to become familiar with all of these. Each one is best for a certain type of
situation. These strategies are not exclusive — you will often use them with
one another to achieve the fastest possible development speed.
Using the DTS Designer to Create Complex Data Transformations
The primary tool for working with Data Transformation Services is the DTS Designer,
as shown in Figure 1.1. The Designer is a
friendly graphical interface that you can use to create complex data transformations
with the tasks of DTS.
Figure 1.1. You can create and edit DTS packages in the DTS Designer.
You access the DTS Designer through SQL Server’s Enterprise Manager. Data
Transformation Services is in the console tree structure under any of the SQL
Servers that have been registered (see Figure
1.2). You can open an existing package that is listed under Local Packages
or Meta Data Services Packages. You can right-click on Data Transformation Services
and open a DTS package that has been saved as a file. Or, you can open the DTS
Designer with a blank Design Sheet to begin the creation of a new package.
Figure 1.2. You can open existing packages in the DTS Designer from the Enterprise
A DTS package contains one or more complex programmatic units called tasks.
These tasks can transform data, execute a SQL statement, copy a database, FTP
a file, send a message with the Message Queuing Services, or do any of a number
of other things. Each task is represented by an icon, which can be seen on both
the task palette and the Design Sheet. All the tasks that Microsoft provides
with SQL Server 2000 are shown in Figure 1.3.
Figure 1.3. DTS has a variety of tasks that you can use in a package.
DTS Connections are used to specify the source of the data and the destination
where it’s being moved. You can use DTS with a wide variety of data stores,
including relational databases such as Microsoft SQL Server and Oracle, multidimensional
databases such as Microsoft OLAP Services, text files, and spreadsheets. DTS
Connections use OLE DB providers but can also connect through ODBC drivers and
Data Link files. Some of the data sources that are normally installed with SQL
Server 2000 are shown in Figure 1.4.
Figure 1.4. You can connect to a variety of data sources with DTS.
Using the DTS Object Model for Programmatic
DTS is implemented with the Component Object Model (COM). You can manipulate
DTS packages and their components through the appropriate collections, objects,
properties, and methods that are exposed in the DTS object model.
The Package object is at the top of the DTS object hierarchy. All
the objects that do the work of a DTS package stem from this object.
The Package object contains four collections of objects:
Connections — Defined links to particular sources of data.
Tasks — The actions that take place in a package.
Steps — An object associated with a task that controls how
the task fits into the workflow of the package as a whole.
Global Variables — Variables that allow for sharing of
information between the various tasks. Values of global variables can also be
sent into a package.
Each task has an associated object called a custom task that contains the
properties specific to that particular kind of DTS task.
There are several other DTS objects that do not fit under the Package object.
These other objects primarily provide information:
Application — System properties and information.
TransformationInfo — DTS transformations that are registered.
ScriptingLanguageInfo — Scripting languages that are
SavedPackageInfo — Packages that are stored as files.
Using the DTS Object Model with the Dynamic
You can use the new Dynamic Properties task to change the properties of DTS
objects while a package is being executed. You create the workflow for this
task so that the values are updated at the correct point in the package execution,
as shown in Figure 1.13.
Figure 1.13. The Dynamic Properties task allows you to change properties of an object as a
package is executing.
SQL Server 2000 has added DTS templates to speed the development of DTS
packages. If you are creating a number of packages that are similar, you can
create the package once as a template and then use that template as the basis
for all your individual packages.
A DTS template is always saved as a file with a .dtt extension. You can open
a template in one of two ways:
Right-click on the Data Transformation Services node in the Enterprise
Manager and select All Tasks and Open Template from the popup menu.
Highlight the Data Transformation Services node and select All Tasks and
Open Template from the Action menu.
The template has a set of tasks and precedence constraints that have already
been created. You use these objects as the starting point for a new DTS
Using Wizards for Rapid Application Development
SQL Server 2000 provides two wizards that automate the process of creating a
DTS package. One of these wizards, the Import/Export Wizard, is almost unchanged
from SQL Server 7.0. The other one, the Copy Database Wizard, is new.
Both wizards create DTS packages. You can modify these packages with the DTS
Data Transformation Services is a very powerful, versatile tool for moving
and manipulating data. It was a great tool in SQL Server 7.0, and it’s
greatly improved in SQL Server 2000.
About the Author
Timothy Peterson is the chief consultant at SDG
Computing, Inc., a company that specializes in data warehousing awith
Microsfot’s SQL Server tools. He teaches the official Microsoft Data Warehousing