Databases and reports are the flesh and bones of every business intelligence (BI) solution. With internal company database and reporting standards, often the toughest platform decision is selecting a tool to populate the reporting database. Typically, developers use an Extract, Transact, and Load (ETL) tool like SQL Server Integration Services (SSIS) to populate data for many BI solutions. However, some situations are better suited for an integration tool such as BizTalk 2006.
This two-part article shows how to decide whether BizTalk should be part of your BI data-loading solution. The first part explains how my organization decided on BizTalk as our loading solution. The second part uses a sample application to demonstrate techniques you can use to architect a BizTalk BI data-loading solution.
The BI Project Profile
My organization’s BI project involves implementing an application service provider (ASP) training and certification management system. One of the implementation requirements is to build reports that aggregate training and certification data with other enterprise data using a data warehousing solution. To fulfill this requirement, we needed to feed training and certification data from the ASP and load that data into a datamart for reporting. We also needed to receive daily changes from the ASP. We anticipate the eventual need to receive changes multiple times in a day as well. Most important, we want to give business users the capability to handle a large portion of the exceptions.
Picking the BizTalk Platform
Exception handling is important for any development project, but exceptions can kill an integration project. Whereas single solutions often are confined to few servers and a single application, integration solutions span servers, networks, and applications. As a result, integration exceptions can percolate at any juncture in an integration solution. Oftentimes, exception resolution requires business knowledge as well as technical expertise. So, for example, although we worked with the vendor to define our database and a shared schema (more about this later), the vendor’s application can change and how we use the application can change. Such changes oftentimes result in exceptions.
This is largely why we chose BizTalk for the project. BizTalk integration with SharePoint/InfoPath and failed message routing provides the platform to resolve the exceptions. Other reasons for choosing BizTalk were the small amount of data and the BizTalk adapters’ capability to safely transact the data.
Once we selected BizTalk, we had to decide on a data exchange format most suitable to BizTalk.
A Single XML File
In our ASP data exchange solution, we considered using a Web service or even a set of flat files, but we opted to use secure FTP and an XML file instead. A single XML file made sense for the following reasons:
- Multiple files would have complicated the file acquisition portion of the loading process. For example, does a missing file mean something failed on the ASP side, or no updates were made to the ASP? XML was a better receptacle for changes to varied sections of the ASP application.
- The ASP allowed for user-defined fields. Thus, the data may include information we may initially ignore but use later. XML afforded the flexibility to make the change and filter the unused data.
- XML also allowed for some simple data type validation.
- XML can be viewed using tools such as InfoPath, making it easier for non-systems people to view data and administrate the application.
As we thought about how we would be using the reporting data, we realized we needed to make some assumptions.
Major Loading Assumptions
The ASP is based on a relational database design, thus the data is interdependent. For example, employee information may be dependent on organizational data, so loading to a staging area allowed us to do things like update the employee information after the organizational information was entered (more detail on this later in the article).
Working with a staging area required two things:
- A unique identifier for the XML document (DocID)
- Chronological information encapsulated in a document sequence number (DocSequence)
Lastly, we needed to handle deletes as well as updates and inserts. We captured this information in a Status field.
That covers how we evaluated a BizTalk loading solution. Using a sample application, I’ll explain how to architect the BizTalk solution.
Sample Application Overview
The sample application was built to validate a BizTalk loading architecture. As mentioned earlier, we realized that BizTalk would handle exceptions and workflow involved in the loading process. We wanted to verify that a BizTalk solution could be leveraged with minimal coding. So, the sample focuses on the data-loading portion of the solution.
Figure 1 shows what the flow of the sample might look like.
Figure 1. The Flow of the Sample Application
The sample is composed of two pieces:
- A BizTalk Orchestration that parses and transmits the organizational data one record at a time
- SQL Server 2005 stored procedures that update the organizational record in the database, first by writing the data to a staging area in the database and then by moving the data into the reporting section of the database
Now, look at the sample is more detail.
BizTalk SQL Adapter
Our target database platform is SQL Server, so we decided to build the sample around the capabilities of the SQL Adapter. Figure 2 shows the configuration information for a SQL Adapter Send Port.
Figure 2. Configuration for a SQL Adapter Send Port
As you can see, the port is configured for a particular schema. We wanted to construct something more generic and reusable. So, we added two additional layers of indirection:
- A .NET component that generates the TSQL statement to add the data to the database
- A stored procedure to execute the TSQL code passed to it by the SQL Adapter
The .NET component reads the configuration information and the XmlDocument containing the data passed to it. Then, it generates the appropriate stored procedure code:
string frmtValue = “”;
if (frmType == SPToXMLFormatType.AlphaNumeric)
frmtValue = “‘” + nodeValue + “‘”;
if (nodeValue == “”)
frmtValue = “NULL”;
frmtValue = nodeValue;
In the following code, the stored procedure invokes the TSQL statement passed to it:
DECLARE @exeStmt nvarchar(4000)
SET @exeStmt = ‘EXEC ‘ + @pName + ‘ ‘ + @Parms
exec sp_executesql @exeStmt
All the loading complexity has been pushed down to the stored procedures that write data from the staging area.
Loading from Staging
Loading from staging occurs in two stored procedures. The first procedure makes data part of staging, and the second moves the data from staging to the reporting database. The write to staging simply does an insert; the real complexity is in the moving from staging.
The following TSQL statements are key in the write from staging:
FROM Org_Stage Stage
Org.[OrgID] = Stage.[OrgID]
AND Stage.DocID = @DocID
(SELECT * FROM [Org] C WHERE Stage.[OrgID] = C.[OrgID])
FROM [Org] O INNER JOIN dbo.ModifyRecordMaster mst
ON O.ModifyRecordID = mst.ModifyRecordID
WHERE Org.[OrgID] = O.[OrgID]
) < @DocSequence
INSERT INTO [Org]
FROM [Org_Stage] Stage
WHERE Stage.DocID = @DocID
AND NOT EXISTS (SELECT * FROM [Org] C
WHERE Stage.[OrgID] = C.[OrgID])
We assume inserts and updates originate from a particular file. Therefore, instead of inserting and updating everything, we constrained the stored procedure to data originating in the file being loaded. This decision streamlines the performance and limits potential concurrency issues.
As you can see, if the data does not exist, it is inserted in a straightforward manner. Updating is more complicated. Updates must consider whether the data updated in the database is more recent than the data being loaded. Data is updated only if the loading data is more current.
That covers the sample application and explains design considerations. Now, how can you extend the sample to build your own solution?
Extending the Sample Application
Any non-trivial solution equivalent to the sample application seems to require a prodigious amount of TSQL coding. However, the developer need not do the coding because the stored procedures all follow a similar pattern. By leveraging a tool such as CodeSmith, you can generate the SQL code.
Using the .NET component to generate the stored procedure also lends itself well to a more declarative programming model. You can drive the orchestration from configuration information and therefore implement the loading using a simple orchestration and a couple of loops.
Be sure to use transactions in your stored procedure development. Also, consider concurrency issues. Although the BizTalk SQL Adapter will detect concurrency issues, consider adding short delays in the orchestration processing to allow SQL Server to free up resources.
Listen for BizTalk
Even though you may have company standards for reporting and other BI solutions, the loading tool is open to a lot of variation. When you build a loading solution for BI reporting and your exception handling requires human business knowledge along with technical expertise, consider BizTalk.
I used the following blogs to define the solutions above:
- “BizTalk 2004 -> Splitter Pattern Using a Map and Orchestration”
- “Looping Around Message Elements”
- “Using a BizTalk Orchestration with XPath and Envelope Debatching Lab”
Download the Code
Download the code for this article.
About the Author
Jeffrey Juday is a software developer with Crowe Chizek in South Bend, Indiana. He has been developing software with Microsoft tools for more than 12 years in a variety of industries. Jeff currently builds solutions using BizTalk 2004, ASP.NET, SharePoint, and SQL Server 2000. You can reach Jeff at [email protected].