You’ve no doubt heard of SQL Server Integration Services (SSIS) by now,
and probably even know that it’s the SQL Server 2005 replacement for the older
Data Transformation Services. Microsoft calls SSIS “a platform for
building high performance data integration solutions, including
extraction, transformation, and load (ETL) packages for data
warehousing.” A simpler way to think of SSIS is that it’s the solution
for automating SQL Server. But it’s also one of the more complex parts of
the latest version of SQL Server, and it can be tough to get started with
it. In this article, I’m going to give you the whirlwind SSIS tour. This
won’t be enough to make you an expert, or even to leave you building your
own SSIS packages without referring to the documentation. But it should
help you get oriented in this complicated new world.
Packages and BIDS
The basic organizational concept of SSIS is the package. A
package is a collection of SSIS objects including:
- Connections to data sources.
- Data flows, which include the sources and destinations that
extract and load data, the transformations that modify and extend
data, and the paths that link sources, transformations, and
destinations. - Control flows, which include tasks and containers that execute
when the package runs. You can organize tasks in sequences and in
loops. - Event handlers, which are workflows that runs in response to the
events raised by a package, task, or container.
To work with SSIS packages you use BIDS – not the sort of bids you find
in an auction, but Business Intelligence Development Studio, a version of
Microsoft Visual Studio that’s customized for (and shipped with) SQL
Server 2005. You launch it from the SQL Server 2005 Program Manager group,
and if you do that and select File, New, Project you’ll find that you can
create an Integration Services Project using a template. Figure 1 shows a
brand new Integration Services Project just waiting for its objects.
Connection Managers
The first step in building a package is to add some connection
managers. Connection managers are used to integrate different data sources
into your package and there are a wide variety of them available: ADO.NET
Connection Manager, Excel Connection Manager, Flat File Connection
Manager, WMI Connection Manager, and so on, each connecting to a different
type of data.
To create a Connection Manager, you right-click anywhere in the
Connection Managers area of a package in BIDS and choose the appropriate
shortcut from the shortcut menu. Each Connection Manager has its own
custom configuration dialog box with specific options that you need to
fill out. Figure 2 shows what the Connection Managers area looks like with
connections to a SQL Server 2005 database and a flat file.
Building Control Flows
The Control Flow tab of the Package Designer is where you tell SSIS what
the package will do. You create your control flow by dragging and dropping
items from the toolbox to the surface, and then dragging and dropping
connections between the objects. The objects you can drop here break up
into four different groups:
- Tasks are things that SSIS can do, such as execute SQL statements
or transfer objects from one SQL Server to another. - Maintenance Plan tasks are a special group of tasks that handle
jobs such as checking database integrity and rebuilding
indexes. - The Data Flow Task is a general purpose task for ETL (extract,
transform, and load) operations on data. There’s a separate
design tab for building the details of a Data Flow Task. - Containers are objects that can hold a group of tasks.
The basic workflow on the Control Flow tab is pretty simple. You drag
tasks from the Toolbox to the tab, and use the Properties window to set
their properties. You drag connections between tasks to specify the order
that they execute in. If you used the DTS designer in SQL Server 2000,
this sort of thing should be pretty familiar. Figure 3 shows some Control
Flow tasks for a package.
Building Data Flows
The Data Flow tab of the Package Designer is where you specify the
details of any Data Flow tasks that you’ve added on the Control Flow tab.
As with Control Flows, you drag and drop things from the Toolbox to build
Data Flows. There are three groups of objects that make up Data Flows:
- Data Flow Sources are ways that data gets into the system.
These are things like DataReaders, flat files, and XML files - Data Flow Transformations let you alter and manipulate the data in
various ways. Here you can do lookups, joins, sorts, merges, and
so on. - Data Flow Destinations are where the data goes when you’re done
with it – DataReaders, Recordsets, SQL Server databases, and other
locations.
As with the Control Flows, you hook together Data Flows on the designer
surface. Figure 4 shows what the Data Flows might look like for a small
and simple package. Of course, in the real world you’re liable to have a
much more complex set of Data Flows than this.
But Wait, There’s More!
SSIS also supports a complete event-handling system. You can attach
event handlers to events of tasks or to the package itself. Event handlers
are defined on the Event Handlers tab of the Package Designer. When you
create an event handler, you handle the event by building an entire
secondary SSIS package.
When you work in BIDS, your SSIS package is saved as an XML file (with
the extension dtsx) directly in the normal Windows file system. But that’s
not the only option. Packages can also be saved in the msdb database in
SQL Server itself, or in a special area of the file system called the
Package Store.
Storing SSIS packages in the Package Store or the msdb database makes it
easier to access and manage them from SQL Server’s administrative and
command-line tools without needing to have any knowledge of the physical
layout of the server’s hard drive.
BIDS also provides complete facilities for running and debugging SSIS
packages, including single-stepping through them so you can see what’s
going on.
It’s a Whole New World
By now, you probably get the idea: SSIS is both more complicated and
more powerful than its ancestor DTS. Fortunately, BIDS is a much more
powerful tool than the DTS Designer, making it possible to deal with the
complexity of SSIS without getting lost in the details. The learning curve
can be steep at first, with a large number of tasks available to choose
from. But if you stick with it, you’ll find that SSIS is an extremely
useful tool for database automation tasks of all types.
About the Author
Mike Gunderloy is the Senior Technology Partner for Adaptive Strategy, a
Washington State consulting firm. You can read more of
Mike’s work at his Larkware Web site, or
contact him at MikeG1@larkfarm.com.