http://www.developer.com/

Back to article

Getting Started with SQL Server Integration Services


October 2, 2006

You've no doubt heard of SQL Server Integration Services (SSIS) by now, and probably even know that it's the SQL Server 2005 replacement for the older Data Transformation Services. Microsoft calls SSIS "a platform for building high performance data integration solutions, including extraction, transformation, and load (ETL) packages for data warehousing." A simpler way to think of SSIS is that it's the solution for automating SQL Server. But it's also one of the more complex parts of the latest version of SQL Server, and it can be tough to get started with it. In this article, I'm going to give you the whirlwind SSIS tour. This won't be enough to make you an expert, or even to leave you building your own SSIS packages without referring to the documentation. But it should help you get oriented in this complicated new world.

Packages and BIDS

The basic organizational concept of SSIS is the package. A package is a collection of SSIS objects including:

  • Connections to data sources.
  • Data flows, which include the sources and destinations that extract and load data, the transformations that modify and extend data, and the paths that link sources, transformations, and destinations.
  • Control flows, which include tasks and containers that execute when the package runs. You can organize tasks in sequences and in loops.
  • Event handlers, which are workflows that runs in response to the events raised by a package, task, or container.

To work with SSIS packages you use BIDS - not the sort of bids you find in an auction, but Business Intelligence Development Studio, a version of Microsoft Visual Studio that's customized for (and shipped with) SQL Server 2005. You launch it from the SQL Server 2005 Program Manager group, and if you do that and select File, New, Project you'll find that you can create an Integration Services Project using a template. Figure 1 shows a brand new Integration Services Project just waiting for its objects.



Click here for a larger image.

Connection Managers

The first step in building a package is to add some connection managers. Connection managers are used to integrate different data sources into your package and there are a wide variety of them available: ADO.NET Connection Manager, Excel Connection Manager, Flat File Connection Manager, WMI Connection Manager, and so on, each connecting to a different type of data.

To create a Connection Manager, you right-click anywhere in the Connection Managers area of a package in BIDS and choose the appropriate shortcut from the shortcut menu. Each Connection Manager has its own custom configuration dialog box with specific options that you need to fill out. Figure 2 shows what the Connection Managers area looks like with connections to a SQL Server 2005 database and a flat file.



Click here for a larger image.

Building Control Flows

The Control Flow tab of the Package Designer is where you tell SSIS what the package will do. You create your control flow by dragging and dropping items from the toolbox to the surface, and then dragging and dropping connections between the objects. The objects you can drop here break up into four different groups:

  • Tasks are things that SSIS can do, such as execute SQL statements or transfer objects from one SQL Server to another.
  • Maintenance Plan tasks are a special group of tasks that handle jobs such as checking database integrity and rebuilding indexes.
  • The Data Flow Task is a general purpose task for ETL (extract, transform, and load) operations on data. There's a separate design tab for building the details of a Data Flow Task.
  • Containers are objects that can hold a group of tasks.

The basic workflow on the Control Flow tab is pretty simple. You drag tasks from the Toolbox to the tab, and use the Properties window to set their properties. You drag connections between tasks to specify the order that they execute in. If you used the DTS designer in SQL Server 2000, this sort of thing should be pretty familiar. Figure 3 shows some Control Flow tasks for a package.

Control Flow tasks

Building Data Flows

The Data Flow tab of the Package Designer is where you specify the details of any Data Flow tasks that you've added on the Control Flow tab. As with Control Flows, you drag and drop things from the Toolbox to build Data Flows. There are three groups of objects that make up Data Flows:

  • Data Flow Sources are ways that data gets into the system. These are things like DataReaders, flat files, and XML files
  • Data Flow Transformations let you alter and manipulate the data in various ways. Here you can do lookups, joins, sorts, merges, and so on.
  • Data Flow Destinations are where the data goes when you're done with it - DataReaders, Recordsets, SQL Server databases, and other locations.

As with the Control Flows, you hook together Data Flows on the designer surface. Figure 4 shows what the Data Flows might look like for a small and simple package. Of course, in the real world you're liable to have a much more complex set of Data Flows than this.



Click here for a larger image.

But Wait, There's More!

SSIS also supports a complete event-handling system. You can attach event handlers to events of tasks or to the package itself. Event handlers are defined on the Event Handlers tab of the Package Designer. When you create an event handler, you handle the event by building an entire secondary SSIS package.

When you work in BIDS, your SSIS package is saved as an XML file (with the extension dtsx) directly in the normal Windows file system. But that's not the only option. Packages can also be saved in the msdb database in SQL Server itself, or in a special area of the file system called the Package Store.

Storing SSIS packages in the Package Store or the msdb database makes it easier to access and manage them from SQL Server's administrative and command-line tools without needing to have any knowledge of the physical layout of the server's hard drive.

BIDS also provides complete facilities for running and debugging SSIS packages, including single-stepping through them so you can see what's going on.

It's a Whole New World

By now, you probably get the idea: SSIS is both more complicated and more powerful than its ancestor DTS. Fortunately, BIDS is a much more powerful tool than the DTS Designer, making it possible to deal with the complexity of SSIS without getting lost in the details. The learning curve can be steep at first, with a large number of tasks available to choose from. But if you stick with it, you'll find that SSIS is an extremely useful tool for database automation tasks of all types.

About the Author

Mike Gunderloy is the Senior Technology Partner for Adaptive Strategy, a Washington State consulting firm. You can read more of Mike's work at his Larkware Web site, or contact him at MikeG1@larkfarm.com.

Sitemap | Contact Us

Thanks for your registration, follow us on our social networks to keep up-to-date