The Transformation of DTS in SQL Server 2005

Increasing flexibility with a new look and feel

Data Transformation Services (DTS) is a set of graphical tools that you can use to extract, transform, and load (ETL) data from many different sources to one or more destinations. DTS was originally released with SQL Server 7.0 to provide a more manageable interface for doing ETL without resorting to batch file scripting. It gives the administrator or developer the ability to easily create complicated workflows for loading and processing data, transforming data for data mining, tracking lineage, and versioning the workflows, all with less code. DTS was incrementally improved in SQL Server 2000 with better logging, more save options, and a larger set of tasks for building workflows. These improvements were welcome, but manageability, reliability, and extensibility still required attention. Under SQL Server 2005, DTS underwent a radical transformation into a much more flexible ETL engine for updating SQL Server content - the most radical change being that the new DTS designer uses the Visual Studio .NET shell.

New Look and Feel for the DTS Designer
We will start off with the most obvious difference: the look and feel have changed dramatically. The DTS designer was built to work with the new suite of tools for developing and administering SQL Server 2005: SQL Workbench and Business Intelligence Workbench.

The new DTS designer can be hosted in SQL Workbench, the new administration tool, or in Business Intelligence (BI) Workbench, the tool aimed at developers. Both tools are based on the Visual Studio .NET environment and should be familiar to VS.NET developers. While both enable you to design, execute, and schedule a DTS package, the BI Workbench adds development-oriented features. It can create a deployment package for the DTS package, which is one of the new features. It also lets you work with the package in an offline mode, which is great for companies whose security policies don't allow direct access to the server, or when you need to make some tweaks to a package while traveling.

By providing a properties window like the one in Visual Studio .NET, the new DTS designer allows for easier property editing. A picture is worth a thousand words, so the best way to understand the updated designer is to use it to create a package that extracts data from the Northwind database, applies a transform that builds an employee code from the first five letters of the LastName field, converts the LastName field to uppercase, and writes the results out to a SQL Server table.
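Before we build it graphically, it may help to see what the package will accomplish. A minimal sketch in plain T-SQL, logically equivalent to the finished package (the EmpCodeTemp destination table is created later in the walkthrough):

```sql
-- Logical equivalent of the package we are about to build:
-- derive a five-character employee code from LastName and
-- upper-case LastName on the way into the destination table.
SELECT LEFT(LastName, 5) AS EmpCode,
       FirstName,
       UPPER(LastName)   AS LastName
FROM   Employees
```

Of course, the point of DTS is to express this kind of transformation as a reusable, schedulable workflow rather than a one-off query, so let's build the package.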

You can start the DTS designer by launching either SQL Workbench or BI Workbench. Because we are developers, we will be using the BI Workbench. Start the BI Workbench by navigating to Start > Programs > Microsoft SQL Server > Business Intelligence Workbench. Once the BI Workbench has started, you can create a new project by choosing File > New > Project from the main menu to launch the Project Types dialog box. Click on the Business Intelligence Projects folder in the left-hand pane of the dialog box to show the Data Transformation templates. Choose the Data Transformation Project icon and type DTSTest for the name of the package as shown in Figure 1.

You can also choose the location in which to store the package in the Location text box. Click the OK button and you will have a solution that can hold the various projects you develop to support the ETL process. The solution includes a project that we will use to host our DTS package. A project is a collection of DTS packages, data sources, and other files that you generate during package development. You can create a new package by right-clicking on the project folder DTSTest and selecting Add > New Item. Select New DTS Package in the New Item dialog box, enter DTSTestPackage.dtsx for the name of the package, and then click the Add button.

Now we need to specify the source of the data by right-clicking the Data Sources folder and selecting New Data Source to launch the Data Source Wizard. The data source can be shared among many packages within the same solution. Click Next to move to the "Select how to define the connection" page and click the New Connection... button to launch the Connection Manager dialog box. Enter localhost in the "Select or enter a server name" drop-down list. In the "Enter information to log on to the server" section, select "Use Windows NT Integrated security," and in the "Select the database on the server" drop-down list, select the Northwind database.

Enter NorthwindDS as the name and click the Finish button to complete the configuration. Now that you have a data source, you can add connections to the data sources or point to the data source directly.

Now we need to create a connection for the package by clicking on the Connections tab at the bottom, selecting New OLE DB Connection, and choosing the data source we created above. Switch to the Control Flow editor by clicking the Control Flow tab.

The DTS designer is broken into four editors: Control Flow editor, Data Flow editor, Event Handlers editor, and Tree View editor. This is different from previous versions where all of the editors were rolled into one.

The Control Flow editor is similar to the DTS 2000 designer. You lay out your components in the editor and drag the success or failure arrows to the next step in the flow of the package. Control Flow tasks are where you move data into or out of SQL Server. For example, you can move data to XML, use FTP, or do bulk inserts from this area. You also use this area for looping and branching logic. Most of the controls are familiar to users of DTS 2000, except that there are some new controls that act as containers for additional tasks: the For Loop, For Each Loop, Sequence, and Data Flow Task, as shown in Figure 2.

Now we will add an Execute SQL Task and a Data Flow Task to the Control Flow design surface. Open the Toolbox and click on Control Flow Items. Drag the Execute SQL Task and Data Flow Task to the design surface. Click and drag the Execute SQL Task's green arrow to the Data Flow Task to add the OnSuccess precedence constraint to the workflow. This is the path the server takes if the Execute SQL Task succeeds.

Note that the exclamation mark indicates a warning that the package won't execute. In this case, we need to create a connection for the Execute SQL Task object.

Double-click the Execute SQL Task object to open the Edit SQL Task dialog box. Choose the NorthwindDS from the Connection drop-down list. Click the ellipsis button next to the SQL Statement section. Type the following statement in the SQL editor dialog box:

IF EXISTS (SELECT * FROM sysobjects
           WHERE id = OBJECT_ID(N'EmpCodeTemp')
             AND OBJECTPROPERTY(id, N'IsUserTable') = 1)
    DROP TABLE EmpCodeTemp

CREATE TABLE dbo.EmpCodeTemp (
    EmpCode   char(5),
    FirstName varchar(30),
    LastName  varchar(30))

Click OK to accept the SQL statement and then click OK again to close down the Edit SQL Task dialog box. Now we will move onto the Data Flow Task object that we added earlier. Double-clicking on the Data Flow Task will open the Data Flow editor.

The Data Flow editor is where you build workflows that move data around SQL Server. This includes the source and destination tables as well as tasks like copying or merging data, much like the Transformation Properties dialog box in SQL 2000 DTS.

You should notice that the Toolbox now reflects the Data Flow Items, which are the tasks used to transform data for manipulating strings or for use in OLAP or data mining applications.

Drag the OLE DB Source, Derived Column Transform, and OLE DB Destination objects to the Data Flow design surface and link up the OnSuccess precedence constraint between the objects by dragging the green arrow from each object to the next, as shown in Figure 3.

Double-click the OLE DB Source object, set the Connection drop-down list to localhost.NorthwindDS, choose "Use results from the following SQL command," and type the following SQL command in the editor to select the source data for the query:

SELECT FirstName, LastName FROM Employees

Click the Parse Query button to make sure it was typed correctly and then click the OK button to close the Source Properties dialog box. We now need to transform some of the data from the source to produce the information needed for the EmpCode field and to make the LastName field uppercase.

Double-click the Derived Column Transform object and type the information in two rows of the Derived Column Properties dialog box, as displayed in Table 1.
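Table 1 is not reproduced here, but based on the package description, the two rows would look roughly like the following sketch. The output column names and the exact expressions are assumptions on my part; SUBSTRING and UPPER are functions in the DTS expression language:

```
Derived Column Name   Derived Column           Expression
-------------------   ----------------------   ------------------------
EmpCode               <add as new column>      SUBSTRING(LastName,1,5)
LastName              Replace 'LastName'       UPPER(LastName)
```

The first row adds a new five-character EmpCode column to the data flow; the second replaces the existing LastName column with its uppercased value.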

Now double-click on the OLE DB Destination object and verify that the NorthwindDS is selected in the Connection drop-down list and select [dbo].[EmpCodeTemp] in the "Name of table or view" drop-down list. Make sure the field mappings are correct by clicking on the Mappings tab. Click the OK button to close the dialog box and save the package by selecting File > Save All.

Now that we have saved the package, we can look at a new feature called Pan & Scan that will allow us to look at the entire project and zoom in on the elements we are interested in quickly. You can click on the cross hairs that are located in the space between the horizontal and vertical scroll bars as shown in Figure 4.

You can execute the package to verify that it runs correctly by right-clicking the package and choosing Execute Package. If you still have the Data Flow tab selected, you will notice that the objects change color as the package executes: gray objects are waiting to execute, yellow objects are currently executing, green objects indicate successful execution, and red objects indicate that the step failed.
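After a successful run, you can also sanity-check the results with a quick query against the destination table (a simple check, assuming the package ran against the Northwind database as described above):

```sql
-- Each row should show a five-character code and an upper-cased last name.
SELECT EmpCode, FirstName, LastName
FROM   dbo.EmpCodeTemp
```

If the Derived Column Transform was configured correctly, every EmpCode value will match the first five letters of the corresponding LastName.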

Another improvement is the ability to capture the output or supply input to an Execute Process Task. One thing that I did often in SQL Server 2000 DTS was execute processes, which allowed me to extend the system without writing custom tasks. The problem in the past was that I could only analyze the return code of the executable, which was often not granular enough for determining the next step to execute. The Execute Process Task in SQL Server 2005 allows us to redirect Standard Input, Standard Output, and Standard Error to a package or container variable as shown in Figure 5. This will allow you to capture more detailed information when you execute a process from DTS.

The Event Handlers editor lets you include event processing while the DTS runtime is processing your package. You can add tasks graphically to the design surface for events like OnError or OnPostExecute just like you do when building a package or creating a Data Flow task. The following is a list of the events found in SQL Server 2005 DTS.

  • OnError
  • OnPreExecute
  • OnPostExecute
  • OnPreValidate
  • OnPostValidate
  • OnProgress
  • OnQueryCancel
  • OnTaskFailed
  • OnVariableValueChange
  • OnWarning
  • OnCustomEvent
  • OnExecStatusChanged
The Tree View editor shows you all of the pieces of your package (see Figure 6) and includes all of the tasks. This is where you can manage your packages more easily than in SQL Server 2000 DTS. For example, you can click on any of the objects in the Tree View area and get a list of its properties through a property sheet, the same as in Visual Studio .NET.

Debugging of DTS Packages
Both SQL Workbench and BI Workbench support debugging DTS packages. You can insert break points, step through the package, and inspect values to aid in determining why the package will not execute. They use the same debugger that you have in Visual Studio .NET, so developers should be familiar with the interface. You can also debug packages through DTS logging.

The logging facility in DTS is much more detailed than in previous versions and can be applied with different settings to each package or task, instead of the same settings for all tasks. DTS logging can also capture information for any of the events listed above, which means you can log when a variable changes, in addition to failures and warnings. You can also choose to log information about when the event occurred, who was running the package, what message occurred, what data was in the variables, the elapsed time, and so on. Once you configure the logging settings, they can be saved to an XML file and loaded again later. Often you will want different log settings in development, testing, and production, and saving a file for each environment supports this scenario. All you have to do is right-click on the Control Flow editor and choose Logging... to configure logging as shown in Figure 7.

New Wizards in SQL Server 2005 DTS
Basically, you have three wizards in SQL Server 2005 DTS. The Import/Export Wizard is probably the most popular feature of SQL Server 2000 DTS and is still in the new version of DTS, but it has improved logging. The Package Installer Wizard aids you in building a deployment package for your DTS package that we will discuss later. This leaves us with the third and most interesting wizard, the DTS Configuration Wizard.

You can use this wizard to create configurations for DTS. A configuration gives you the ability to set package variables and properties at run time. This is similar to the dynamic properties available in SQL Server 2000 DTS, but the values can be stored in an XML file that is not part of the package. This means you can change values for properties or variables without having access to the DTS package itself, or even emit the XML file from another program prior to executing the package. You could use this to change the server names for development, testing, and production without touching the source DTS package. All of the values are assigned to the package variables and properties before the package runs. You can have more than one configuration assigned to a package. The list of configurations is applied from top to bottom, which means that a setting in a later configuration overrides the same setting in an earlier one.
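A configuration file is plain XML. A stripped-down sketch of one might look something like the following; the element names and the property path shown are illustrative assumptions based on the shipping product, not necessarily the exact beta schema:

```xml
<DTSConfiguration>
  <!-- Overrides the NorthwindDS connection's server name at run time. -->
  <Configuration ConfiguredType="Property"
                 Path="\Package.Connections[NorthwindDS].Properties[ServerName]"
                 ValueType="String">
    <ConfiguredValue>localhost</ConfiguredValue>
  </Configuration>
</DTSConfiguration>
```

Keeping one such file per environment (development, testing, production) lets you retarget the same package without editing it.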

New DTS Installer for Deploying Packages
The DTS Installer facilitates deployment of your packages once you have developed them. It takes care of including all the necessary files and creates an installer wizard that walks you (or more likely, your DBA) through the process of installing the DTS package. You create an install package by right-clicking on the project in the BI Workbench and choosing Properties. Set the CreateDeploymentUtility property to True and set the DeploymentOutputPath to the desired path for the creation of the necessary deployment files, as in Figure 8.

You then rebuild the project by selecting Tools > Build from the main menu. If you navigate to the DeploymentOutputPath directory, you should see the following files.

  • DTS package files: Contain the binary representation of your graphical workflow that will be executed by the DTS runtime
  • DTSInstall.exe: Used to start the installation wizard for a DTS package
  • DTSDeploymentUtility.msi: Contains the files needed to start the installation of the DTS utility
  • DTSDeploymentManifest.xml: Configuration information for the DTSDeployment msi
Double-click on the DTSInstall.exe file to start the deployment wizard. You will be asked to select the location in which to install the package - in a folder on the file system or into SQL Server 2005. You would typically install the package into the file system, unless you need to share the package between SQL Servers, which requires installing it in SQL Server.

Create a Custom Task Using Any .NET Language
Following the theme of complete .NET integration with SQL Server 2005, you can create custom tasks and transformations with any .NET language. This makes it easier to write a task that can be reused and debugged without having to resort to using the Active Scripting task as in previous versions or creating a COM object that was difficult to stabilize.

Reliability Features in DTS
Microsoft has introduced enhanced data and process reliability features in SQL Server 2005 DTS. I would like to focus on two features that allow you to recover data and process execution: Checkpoints and Multicast Save Points. Checkpoints are logical stopping points that save the state of the executing package. They allow you to reliably restart a package after a failure from the last place its state was saved, which means you do not have to start a complicated build from the beginning. For example, you can set a logical checkpoint in a large job so that a single error does not force you to rerun the whole job; instead, you restart the job at the point of failure or at the last point where it makes logical sense. The logical checkpoints act as recovery points.

A checkpoint can be enabled by setting properties on the package and in the Control Flow editor. At the package level, you set the SaveCheckpoints property to enable checkpoints for the package. You also need to set the CheckpointFileName to the XML file in which you want to persist the package state. You then make sure that the task's or container's FailPackageOnFailure or FailParentOnFailure property is set to True so that state information is persisted on failure. Finally, you set the CheckpointUsage property to tell the package to ignore the checkpoint file or to use it if it exists (the IfExists value) to restart the package execution. You can also store data after transformation and use it to recover a package without waiting for the transformation to recur by using Multicast Save Points.
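Putting those settings together, the relevant properties for a restartable package might be set as follows. This is a sketch using the property names described above; the file path is an example of my own:

```
-- Package-level properties
SaveCheckpoints      = True
CheckpointFileName   = C:\DTSTest\DTSTestPackage.checkpoint.xml
CheckpointUsage      = IfExists

-- On each task that should mark a restart point on failure
FailPackageOnFailure = True
```

With these in place, rerunning a failed package picks up from the last completed checkpoint instead of from the beginning.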

Multicast Save Points allow you to write transformed data to two locations at once without experiencing the performance issues of running two sets of tasks in parallel. You can choose to store the transformed data in a file or a SQL Server table. You can then use the data stored in this location to restart the execution of a DTS package that has failed without waiting for the whole package to execute again. This feature works well with the logical checkpoints.

Migrating Packages from SQL Server 2000 DTS
Before you worry about recreating your packages for SQL Server 2005, note that the SQL Server 2000 DTS engine is included in SQL Server 2005, so your packages should run just fine without upgrades. You can even deploy these "legacy" DTS packages onto SQL Server 2005 and run them by using the new Execute DTS 2000 Package task. You may be able to upgrade a package to the SQL Server 2005 format by using the DTS Package Migration Wizard. As with all migration tools, your results may not be desirable if you have a very complex package that includes a large number of scripts.

Summary
The new SQL Server 2005 DTS designer introduces many improvements that enable you to create reliable transforms that are easier to manage. It also adds many new transformations to make creating OLAP cubes and data mining easier, such as measuring the effectiveness of mining models or training mining models. The new features truly make DTS an enterprise class ETL tool.
