Microsoft SQL Server 2005 provides a completely new enterprise extraction, transformation, and loading (ETL) platform called SQL Server Integration Services (SSIS) that ships with the features, tools, and functionality to build both classic and innovative kinds of ETL-based applications. This article examines some of the exciting SSIS features that you can use to build ETL applications. Along the way, you will also learn how to build a simple package using the new Business Intelligence Development Studio, which is a key component of SSIS features.
A History Lesson
Back in the days when Microsoft SQL Server 6.5 was still a hot product, the bulk copy program was the only way to back up or export databases from SQL Server to other data sources, such as other database servers or text files. However, in subsequent SQL Server versions, Microsoft added a large number of features that were aimed at improving the productivity of SQL Server developers and DBAs. One such feature is DTS (Data Transformation Services). By creating DTS packages, you can combine several tasks into one process and use any programming language that supports automation (such as Visual Basic.NET, Visual C#, or Managed C++) to execute these packages and monitor their progress for errors.
With the upcoming SQL Server 2005 release, Microsoft has raised the bar by introducing SSIS, the brand new ETL tool. Similar to DTS, it provides functions for moving data from one place to another and manipulating that data at run time. However, SSIS has been completely redesigned from scratch to be an enterprise ETL platform. SSIS provides the breadth of features—and very high-level performance—necessary to build enterprise-class ETL applications.
SSIS is fully programmable, embeddable, and extensible, which makes it an ideal ETL platform. It has a great development environment hosted in a Visual Studio shell with cool capabilities for building workflows and pipelines through a rich set of pre-built or custom components. SQL Server 2005 ships with the following development and management environments for designing and managing packages:
- The SQL Server Management Studio is an environment for managing the storage and execution of deployed packages. It has special features for doing this, including integration with the DTS Service, and the ability to enumerate packages on remote servers. But it is not a design environment.
- The Business Intelligence Design Studio is an environment for designing packages, organizing them in Solutions and Projects, debugging them, and managing source and version control for multi-user projects.
These two environments offer powerful facilities for deploying, debugging, and monitoring deployed packages. In addition, SQL Server 2005 provides rich workflow capabilities that you can use for performing sophisticated data manipulations.
There are too many SSIS features to cover in a single article, so this piece highlights some of the important ones and then moves on to creating a simple package with Business Intelligence Development Studio. The following are the most notable features:
- You can use SSIS to transfer millions of rows of data to and from heterogeneous data sources, but SSIS functionality doesn’t stop there. The tool leverages the end-to-end BI suite by offering complete data integration, movement, and shaping, which means that SSIS provides data cleansing, extensibility, and interoperability.
- SSIS comes with a lot of pre-built data-cleansing functionality, including completely integrated functions, such as fuzzy matching and fuzzy grouping, that use algorithms to match or group disparate data to a configurable degree of accuracy.
- SSIS offers broad extensibility points for third-party component vendors, meaning that if functionality is not available out of the box, SSIS lets you build your own components or add a third-party component to solve your problem.
- SSIS can load data directly into Analysis Services cubes, and it also offers robust data-mining features for including scalable data-mining model creation, training, and predictions.
- SSIS is seamlessly integrated with SQL Server Reporting Services integration, which lets you treat an SSIS package as the data source for reporting.
- SSIS also has greatly improved performance and scalability that allow you to host complex, high-volume ETL applications on lightweight servers, enabling you to scale down.
- SSIS can also help reduce ETL data staging areas and help minimize performance costs associated with data staging (disk I/O and serial processing). This is made possible by the ability to perform complex data transformations, data cleansing, and high-volume lookups—all inline from source to destination.
- SSIS also provides a new feature, the Slowly Changing Dimension (SCD) wizard. Through the SCD interface, you can rapidly generate all the steps and required code to add unique handling of history to multiple attributes in a given dimension.
- The development environment for SSIS, known as Business Intelligence Development Studio, is hosted in Visual Studio, enabling scripting and other programming tasks that take advantage of that enterprise development environment.
- SSIS now fully supports the Microsoft .NET Framework, allowing developers to program SSIS in their choice of .NET-compliant languages, as well as native code.
- The Data Transformation run-time engine is exposed both as a native COM object model and as an entirely managed object model. Although the Data Transformation engine is written in native code, it is available though a signed Primary Interop Assembly (PIA) that enables full, managed access to it.
Now that you have a general understanding of the SSIS features, consider an example that shows the steps involved in constructing and executing a package using the Business Intelligence Development Studio environment.
Creating a Simple Package
Consider a simple example wherein you transfer data from one table to another table by executing a stored procedure. For each row in the source table, you will invoke the stored procedure to load the data into the destination table. (Click here to download the accompanying source code for the demo.)
Before looking at the package design, create the required tables and the stored procedure. You will create these database objects in the AdventureWorks database, which is one of the sample databases that ships with SQL Server 2005.
To start, create the source table using the following definition:
CREATE TABLE [dbo].[EmpName]( [FirstName] [varchar](50) NOT NULL, [LastName] [varchar](50) NOT NULL ) ON [PRIMARY]
The destination table definition is as follows:
CREATE TABLE [dbo].[Emp]( [EmpID] [int] IDENTITY(1,1) NOT NULL, [EmpName] [varchar](100) NOT NULL ) ON [PRIMARY]
The stored procedure to load the data into the destination table is as follows:
CREATE PROCEDURE [dbo].[InsertEmp] @FirstName [varchar](50), @LastName [varchar](50) WITH EXECUTE AS CALLER AS Insert into Emp(EmpName) Values (@LastName + ', ' + @FirstName) GO
Using the package, you will retrieve the FirstName and LastName columns from the EmpName table and load them into the Emp Table through the InsertEmp stored procedure. Inside the stored procedure, you will concatenate the FirstName and LastName columns and assign the concatenated value to the EmpName column of the Emp table. Now that you understand the package functionality, take a look at the design of the package, as shown in the next section.
Creating a Package Using Business Intelligence Studio
Before creating the package, you first need to create a project that can host the package. Open Business Intelligence Development Studio from the Start->Programs menu. Select File->New Project from the menu and you will see the dialog box in Figure 1.
Figure 1: New Project Dialog Box
As Figure 1 shows, select Integration Services Project as the project template and specify the name of the project as SSISDemo. Once the project is created, you will see the package designer window of the default package Package.dtsx.
For the requirements you’ve outlined, the package you are going to create is very simple and straightforward. It consists of one Data Flow task with two components inside the data flow: an OLE DB Source adapter and an OLE DB Command Transformation. To start, bring up the toolbox by selecting View->Toolbox from the menu. You will see the dialog box in Figure 2.
Figure 2: Toolbox Dialog Box
As you can see, the toolbox offers a number of options that you can leverage to construct your packages. Drag and drop a Data Flow Task from the toolbar onto the package designer. Next, double-click on the Data Flow Task and you will see the designer window in Figure 3, which you can use to build sophisticated data flows.
Figure 3: Data Flow Task Designer Window
Now, bring up the toolbox again by selecting View->Toolbox from the menu. From the toolbox, drag and drop an OLE DB Source onto the designer and double-click on the object to bring up the window in Figure 4. The window allows you to specify the data source as well as the name of the source table to use for transformation.
Figure 4: Designer View of OLE DB Source
Click on OK in the Figure 4 dialog box.
Now, drag and drop an OLE DB Command object onto the designer and connect the OLE DB Source object to the OLE DB Command object. Double-click on the OLE DB Command object to bring up its properties window. Modify the Connection Managers tab of the properties window to look like Figure 5.
Figure 5: Connection Managers Tab of the OLE DB Command Object’s Properties Window
Navigate to the Component Properties tab and specify the stored procedure details in the SqlCommand properties text box as Figure 6 shows.
Figure 6: Stored Procedure Details in the SqlCommand Properties Text Box
As you can see from Figure 6, the InsertEmp procedure is used to load the data into the Emp table and the parameters to that procedure will be supplied at run time. The mapping between the source table columns and the stored procedure parameters is what you will specify in the Column Mapping tab. Note that in the Component Properties tab; once you specify the SqlCommand property value and hit the Refresh button, the title of the Input Columns tab in the Figure 6 dialog box automatically changes to Column Mappings.
In the Column Mappings tab, you can either drag a column (between the input columns and destination columns) or you can use the dropdown list to select the corresponding input and output columns. Once you perform the mappings, a window like Figure 7 appears.
Figure 7: The Resulting Mappings from Column Mappings Tab
Click OK in the Column Mappings dialog box. That’s all there is to creating a package with the new package designer. Now, you are ready to execute the package. Just press F5, and you will see the output in Figure 8.
Figure 8: Output of a Successful Package Execution
If the package executes successfully, you will see an output that is somewhat similar to Figure 8, which shows the objects on a green background. The above screenshot also shows the number of rows transferred between the source and destination tables.
A Solid Foundation for Creating Packages
With the release of SQL Server Integration Services, Microsoft now has a powerful ETL tool that is not only enterprise class but can also go a long way in increasing the productivity of developers. Its feature set makes it extremely easy and seamless to build sophisticated, high-performance ETL applications. This article has only scratched the surface of what is possible with SSIS, but the simple example it gave should provide a solid foundation for creating packages with SSIS. For more information, check out the following Web sites:
Download the Code
To download the accompanying source code for the demo, click here.
About the Author
Thiru Thangarathinam has six years of experience in architecting, designing, developing, and implementing applications using object-oriented application development methodologies. He also possesses a thorough understanding of the software life cycle (design, development, and testing). He holds several certifications, including MCAD for .NET, MCSD, and MCP. Thiru is an expert with ASP.NET, .NET Framework, Visual C# .NET, Visual Basic .NET, ADO.NET, XML Web services, and .NET Remoting. Thiru also has authored numerous books and articles. Contact him at [email protected]