The primary focus of my book is on the details of
DTS implementation — how to use Data Transformation Services.
This article discusses some of the opportunities
for data transformation provided by DTS.
Upgrading a Database from Microsoft Access to SQL
Server
Microsoft has created an upsizing tool for moving a Microsoft Access database
to SQL Server. This tool creates the links from the Access application to the
SQL database.
However, sometimes people
will move a database from Microsoft Access (or some other database) to
SQL Server, where it’s not important to maintain the user interface that
was being used previously. In that case, the DTS Import/Export Wizard is a
very convenient tool for creating the new tables in SQL Server and importing
the data.
Consolidating Data from Multiple Sources
Companies typically have data in a variety of different OLTP and business
analysis systems. DTS is a tool that can connect to a variety of different types
of data sources and load that information into one central location. In some
cases, companies will use this consolidated data to replace the data stored in
diverse places. Other companies will continue to use their data in separate
locations, but will use their consolidated information for enterprise-wide
business analysis.
Data consolidation often also includes merging data obtained by the company
from outside sources. This outside data can present significant processing
challenges because it might not be in the same format that is used internally by
the company.
The consolidation of data from multiple sources often involves much more than
just copying data. There can be
-
Homogenization of data that uses different codes.
-
Verification of data that is contradictory or incorrect.
-
The removal of duplicate data where there is a partial overlap between
data sets. -
Aggregation or summarization of data.
-
Recalculation of values that are calculated in different ways by
different source systems. -
Connecting data that is related, but in which the relationship is missing
in the source data.
The DTS transformation tasks, with individual row processing and the ability
to look up values, can accomplish all of these tasks.
Initial and Periodic Loading of a Data Mart or a
Data Warehouse
For some companies, consolidated data will be used to build a data warehouse
and/or one or more data marts for the purpose of business analysis. In addition
to all the issues involved in consolidating data, there are these additional
data transformation issues:
-
Creation and initial loading of fact and dimension tables.
-
Periodic loading of additional facts into the fact table.
-
Periodic loading of new records into the dimension tables.
-
Updating dimension table records.
-
Creating and maintaining meta data for the business analysts.
-
Moving data from the data warehouse to the data marts.
-
Processing data mining models.
-
Processing server cubes and local cube files.
DTS has two tasks for processing business analysis information —
the Analysis Services Processing task and the Data
Mining Prediction task.
Reporting on Data from Transaction Processing
Systems
Existing On Line Transaction Processing (OLTP) systems usually have some
reporting capability, but the flexibility of these reports can be quite
limited:
-
Some of the significant information might not be available in the
reports. -
Some of the information might be aggregated already so that the detailed
information is not available. -
The reports might not be tied together very well with information from
other parts of the OLTP system or outside the system. -
The reports might not provide information for all significant time
periods.
It is often very difficult to modify existing systems to obtain the desired
reports. Fortunately, it may be possible to extract the information from the
OLTP system and load it into another database, such as SQL Server or Microsoft
Access, for reporting purposes. You can use DTS to access the data in one of two
ways:
-
If there is an OLE DB provider or an ODBC driver to the OLTP database,
you can connect using the provider or the driver and use that connection as the
source for a Transform Data task. -
You can export the data from the OLTP system to a text file, and use the
Bulk Insert task or the Transform Data task to load the data into another
database.
Building an Interface to the Web for a Legacy
System
Existing OLTP systems may not have the capability to present their data in a
way that can be accessed by a Web server. The data can be moved to SQL Server or
another database that has a built-in Web interface.
Archiving a Database
DTS can be used for extracting data from a database and building an archive
with that data. An archive is needed for two fundamental reasons:
-
The store of data in the OLTP system is getting too large for efficient
processing. After the older data is copied to an archive, that data can be
deleted from the OLTP system. -
The OLTP system may automatically summarize older data and delete the
details underlying that data. If the detailed data is not archived periodically,
it will be unavailable for business analysis purposes.
Analyzing Internet Clickstream Data
One of the most important uses of data transformation today is the processing
of Internet clickstream data. Companies want to analyze the use of their Web
sites. They often want to know the following:
-
How are people getting to the site?
-
What pages are they looking at?
-
What ads are they seeing?
-
How long are they spending on each page and on the site as a
whole? -
What path do people follow when moving through the site?
-
How many people are viewing the site?
-
Who’s viewing the site?
-
What is the demographic information for the people viewing the
site?
Web servers create a log that records all the requests received for files.
These logs contain some or all of the following information:
-
The identity of the requesting computer.
-
The time the request was made.
-
The text of the requesting line.
-
The status code returned to the client.
-
The number of bytes sent to the client.
-
The URL of the referring server.
-
The name and version of the browser being used by the client.
-
The time taken to service the request.
-
The URL or the resource requested.
-
The cookie on the client machine.
Web hosting companies sometimes process these logs and give the client
company a summary of the significant data. At other times, the raw log is the
only format in which the information is available.
A very significant amount of data transformation is needed to extract
valuable information out of the clickstream data. You could use the DTS tasks in
the following ways:
-
The FTP task moves the logs to a local server.
-
The Execute Process task starts a process that could do one or more of
the following — unzip the files, convert the files from binary to textual
format, and/or pre-aggregate the data. -
The Bulk Insert task loads the clickstream data from the text files into
SQL Server. The logs are often very large (1GB or larger), and the Bulk Insert
task can achieve the fastest speed for loading. -
The Execute SQL task does set-oriented processing of the data.
-
The Transform Data task does row-level processing of the data.
-
The Analysis Services task processes OLAP cubes that are based on the
data. -
The Send Mail task reports on the results of the import when it is
completed. -
ActiveX Script tasks, Message Queue tasks, Dynamic Property tasks, and
Execute Package tasks manage the data transformation process.
Importing and Exporting with XML
The Extensible Markup Language (XML) is a data-oriented Internet language
that will be a key technology for the exchange of data in the future. There are
many new features in SQL Server 2000 for working with XML, including the
following:
-
The Transact-SQL OpenXML function allows you to decompose XML documents.
The process of XML decomposition allows you to move XML data into a relational
database. -
The FOR XML clause in the SELECT statement allows you to create XML
documents from the recordset created by the SELECT statement. -
You can query a SQL Server database from a web page using an XML query.
SQL Server will return an XML document with the data. -
You can use XPath queries to map SQL Server tables to XML
structures. -
There are some new DTS features that make it easier to work with
XML documents: -
The ReadFile and WriteFile transformations give you a way to import and
export XML files as a part of your transformations. -
The Parallel Data Pump task allows you
to process the hierarchical recordsets that are used in XML.
An XML OLE DB provider is not provided with SQL Server 2000. However, they
are available from third-party sources. With an OLE DB provider you can use an
XML document as the source for a transformation task.
Conclusion
Data Transformation Services is a very powerful, versatile tool for moving
and manipulating data. It was a great tool in SQL Server 7.0, and it’s
greatly improved in SQL Server 2000.
About the Author
Timothy Peterson is the chief consultant at SDG
Computing, Inc., a company that specializes in data warehousing awith
Microsfot’s SQL Server tools. He teaches the official Microsoft Data Warehousing
course.
This article is brought to you by Sams Publishing
publisher of Timothy Peterson’s Microsoft SQL Server 2000 DTS book.
) Copyright Pearson Education. All rights reserved.