Thursday, December 29, 2011

Developing a Custom Data Flow Component in SSIS


Introduction

SSIS is a great ETL tool and has been there now for a long time. But not always we get all the desired component that we need for transformation. For that we go ahead with Script component. But the disadvantage being that it is application specific. If we need to do the same kind of operation in other SSIS packages, then script component won't help. Instead we can go ahead with our own custom transformation component and even that can be reusable. This article will focus on building, deploying and using a custom script transformation component in a step by step manner

SSIS transformation components helps us to transform the source data to some desirable format. Though SSIS provides many transformation components, but, sometime we may need to perform certain actions for which there is no components available. In such cases custom components comes very handy. In this article we will look into how we can make a custom SSIS transformation component. The custom component is a simple email validator that will read the emails from some source and will let us know if it is a valid email or not. We will use C# as the language of choice.

Steps for Implementation 

Step 1:

To begin with, let us create a Class Library project (say DFCEmailValidator).

Step 2:

Once the project is created, let us add the following dll references in our project.

Microsoft.SqlServer.Dts.Design

Microsoft.SqlServer.DTSPipelineWrap

Microsoft.SqlServer.DTSRuntimeWrap

Microsoft.SqlServer.ManagedDTS

Microsoft.SqlServer.PipelineHost

Step 3:

Let us create a class by the name CustomEmailValidator.cs .Inherit the class from the PipelineComponent class and decorate it with DtsPipelineComponent attribute



Step 4:

Build and deploy the component into the Gac

Now, we should build the class library, generate a Strong name and then deploy it to the gac by using the command

C:\Program Files\Microsoft Visual Studio 9.0\VC>gacutil  -i "D:\DFCEmailValidator\bin\Debug\DFCEmailValidator.dll"



N.B.~ The dll path will vary... So change it accordingly

If successful we will get the message

Assembly successfully added to the cache


We can even verify this browsing to C:\WINDOWS\assembly where we will find the assembly by the name DFCEmailValidator


Step 5: Copy the dll and put into the PipeLineComponents folder

Once the previous step is over, next we can copy the dll to the PipelineComponents folder which will be in C:\Program Files\Microsoft SQL Server\100\DTS\PipelineComponents.


Step 6:

Open BIDS. Go to the DataFlow editor’s toolbox. Right Click -> Choose Items.Visit the SSIS DataFlow Items tab and locate the CustomEmailValidator. Tick the checkbox option and click the OK button.


We can find that the custom transformation has been added to the toolbox


Step 7:

Drag and drop a Flat File Source and configure it. Let’s say we have a text file (Source.txt) whose content is as under

testmailwrong

testmail@test.com

invalid@invalid

valid@gmail.com

valid_123@yahoo.co.in

123@123@123@1123#456~xud.com

Next drag and drop our CustomEmailValidator and add the data flow path from Flat File Source to it. Double click on the CustomEmailValidator for opening the Editor.


The screen shows that it has three tabs viz Custom Properties, Input columns and Input Output Properties.

The first tab shows the Custom Properties that we set.

Visiting the Input Columns tab we find the available input columns and that needs to be selected. In this case it is Column0.


And in the InputOutput Properties, if we expand the Output columns , we can find that our Output column has been created automatically whose name is IsValidEmail_Column0.


Click Ok.Next drag and drop a Conditional Split and add the data flow path from the CustomEmailValidator to it. And Configure the Conditional Split component as under


Output Name
Condition
Success
[IsValidEmail_Column0] == TRUE
Failure
[IsValidEmail_Column0] == FALSE

Lastly add two flat file destination component one of whose source will be the success path of the Conditional path and the other being the Failure. The destination files are named respectively as ValidEmail.txt and InvalidEmail.txt.

The entire package is as under


Step 8:

Let us run the application and we will see the valid emails in ValidEmail.txt

testmail@test.com

valid@gmail.com

valid_123@yahoo.co.in

while the InvalidEmail.txt contains the content as

testmailwrong

invalid@invalid

123@123@123@1123#456~xud.com

No comments:

Post a Comment