Getting Started with SQL Server Integration Services
You've no doubt heard of SQL Server Integration Services (SSIS) by now, and probably even know that it's the SQL Server 2005 replacement for the older Data Transformation Services. Microsoft calls SSIS "a platform for building high performance data integration solutions, including extraction, transformation, and load (ETL) packages for data warehousing." A simpler way to think of SSIS is that it's the solution for automating SQL Server. But it's also one of the more complex parts of the latest version of SQL Server, and it can be tough to get started with it. In this article, I'm going to give you the whirlwind SSIS tour. This won't be enough to make you an expert, or even to leave you building your own SSIS packages without referring to the documentation. But it should help you get oriented in this complicated new world.
Packages and BIDS
The basic organizational concept of SSIS is the package. A package is a collection of SSIS objects including:
- Connections to data sources.
- Data flows, which include the sources and destinations that extract and load data, the transformations that modify and extend data, and the paths that link sources, transformations, and destinations.
- Control flows, which include tasks and containers that execute when the package runs. You can organize tasks in sequences and in loops.
- Event handlers, which are workflows that runs in response to the events raised by a package, task, or container.
To work with SSIS packages you use BIDS - not the sort of bids you find in an auction, but Business Intelligence Development Studio, a version of Microsoft Visual Studio that's customized for (and shipped with) SQL Server 2005. You launch it from the SQL Server 2005 Program Manager group, and if you do that and select File, New, Project you'll find that you can create an Integration Services Project using a template. Figure 1 shows a brand new Integration Services Project just waiting for its objects.
The first step in building a package is to add some connection managers. Connection managers are used to integrate different data sources into your package and there are a wide variety of them available: ADO.NET Connection Manager, Excel Connection Manager, Flat File Connection Manager, WMI Connection Manager, and so on, each connecting to a different type of data.
To create a Connection Manager, you right-click anywhere in the Connection Managers area of a package in BIDS and choose the appropriate shortcut from the shortcut menu. Each Connection Manager has its own custom configuration dialog box with specific options that you need to fill out. Figure 2 shows what the Connection Managers area looks like with connections to a SQL Server 2005 database and a flat file.
Building Control Flows
The Control Flow tab of the Package Designer is where you tell SSIS what the package will do. You create your control flow by dragging and dropping items from the toolbox to the surface, and then dragging and dropping connections between the objects. The objects you can drop here break up into four different groups:
- Tasks are things that SSIS can do, such as execute SQL statements or transfer objects from one SQL Server to another.
- Maintenance Plan tasks are a special group of tasks that handle jobs such as checking database integrity and rebuilding indexes.
- The Data Flow Task is a general purpose task for ETL (extract, transform, and load) operations on data. There's a separate design tab for building the details of a Data Flow Task.
- Containers are objects that can hold a group of tasks.
The basic workflow on the Control Flow tab is pretty simple. You drag tasks from the Toolbox to the tab, and use the Properties window to set their properties. You drag connections between tasks to specify the order that they execute in. If you used the DTS designer in SQL Server 2000, this sort of thing should be pretty familiar. Figure 3 shows some Control Flow tasks for a package.