Developing a Custom SSIS Source Component

SSIS was designed to be extensible. Although the components provided let you take data from a wide variety of sources, transform it in a number of ways, and write the results to a wide choice of destinations, there will always be occasions when you need to build your own custom SSIS component. Yes, it is time to hone your C# skills and cut some code, as Saurabh explains.

Overview

SQL Server Integration Services (SSIS) is the Microsoft platform for building enterprise-level data integration and data transformation solutions. Integration Services can extract and transform data from a wide variety of sources such as XML data files, flat files, and relational data sources, and then load the data into one or more destinations. Integration Services is so versatile because the ability to connect to disparate data sources is built into its object model. In this article, we will look at how to extend this object model by building a custom data source, using a working example.

What is an SSIS component?

SSIS components are one of the basic building blocks of the SSIS framework, and can be found in the SSIS toolbox in SQL Server Data Tools. In this article we will be looking specifically at data flow components. These can read data from external data sources, transform it, and write it back to other data destinations. SQL Server Data Tools has some examples of data flow components, including the OLE DB Source, the Lookup transformation and the Character Map transformation components.

Under the hood, an SSIS data flow component is a .NET class that inherits from the PipelineComponent class, which is part of the overall SSIS object model. It overrides the various methods of PipelineComponent, such as those for creating inputs/outputs, adding custom properties to the component, and so on. The compiled class library is placed in a subfolder in the SQL Server installation folder. SQL Server Data Tools automatically recognizes any class libraries in this folder that implement PipelineComponent, and makes the component available in the SSIS Toolbox.

Building the component class

We will start by first building a bare-bones data source component, and then deploying it so that you can use it in your Integration Services package. I’ve already mentioned that a data source component is used to read data from an external data source, but at this stage it won’t actually do anything. In the next stage, once we are satisfied that we can deploy it, we’ll get it doing something useful.

Before we get into the details of building the class, note that the following description assumes that you are using Visual Studio 2012 and SQL Server Data Tools 2012. If you are using an older version of either tool, you will have to adjust accordingly. I have tried to highlight differences between versions wherever possible.

With that in mind, start by launching Visual Studio, and create a new class library project, as shown in Figure 1. You can name the project CustomSSISComponent. Make sure that you select .NET Framework 4 as the target framework. The code I have presented here uses C#, but you can write it just as easily in Visual Basic.

[Figure 1: Creating a new class library project in Visual Studio]

Rename the Class1.cs file in the project to CustomSSISComponent.cs. You will also need to reference the following assemblies in your project:

 

  • Microsoft.SqlServer.PipelineHost – namespace Microsoft.SqlServer.Dts.Pipeline – found under the GAC_MSIL folder in the GAC
  • Microsoft.SqlServer.DTSPipelineWrap – namespace Microsoft.SqlServer.Dts.Pipeline.Wrapper – found under the GAC_MSIL folder in the GAC
  • Microsoft.SqlServer.ManagedDTS – namespace Microsoft.SqlServer.Dts.Runtime – found under the GAC_MSIL folder in the GAC
  • Microsoft.SqlServer.DTSRuntimeWrap – namespace Microsoft.SqlServer.Dts.Runtime.Wrapper – found under the GAC_32 folder in the GAC
  • System.ServiceModel – namespace System.ServiceModel – part of the .NET Framework

* Note that in .NET Framework 4.0 the GAC is at %windowsfolder%\Microsoft.Net\Assembly. Since we are building the component with .NET Framework 4.0, make sure you reference the assemblies from this location.

Modify the CustomSSISComponent.cs file to add the following class definition for a custom Integration Services component:
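The original code listing is not reproduced here, but a minimal sketch looks like the following. The display name "Custom RSS Source" and the exact set of stubbed overrides are assumptions based on the rest of the article:

```csharp
using Microsoft.SqlServer.Dts.Pipeline;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

namespace CustomSSISComponent
{
    // DisplayName controls what appears in the SSIS Toolbox;
    // ComponentType.SourceAdapter marks this as a source component.
    [DtsPipelineComponent(
        DisplayName = "Custom RSS Source",
        ComponentType = ComponentType.SourceAdapter)]
    public class CustomSSISComponent : PipelineComponent
    {
        // Design-time: called once when the component is added to the designer.
        public override void ProvideComponentProperties()
        {
            base.ProvideComponentProperties();
        }

        // Runtime: called once just before package execution.
        public override void PreExecute()
        {
            base.PreExecute();
        }

        // Runtime: sends rows to the output buffers.
        public override void PrimeOutput(int outputs, int[] outputIDs,
            PipelineBuffer[] buffers)
        {
            base.PrimeOutput(outputs, outputIDs, buffers);
        }
    }
}
```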

This class inherits from PipelineComponent, which is the base class for all Integration Services data flow components. The DTSPipelineComponent attribute provides the name of the component to be shown in the SSIS Toolbox in SQL Server Data Tools, and also specifies the type of the component. We will delve into the details of the overridden methods later in this article. Since at this stage we have not provided implementations for any of the overridden methods, the component will not actually do anything. However, we can still deploy it and add it to the SSIS Toolbox.

Special considerations for VS 2012

If you are using Visual Studio 2012 to launch your SSIS packages, then follow these additional steps:

  • In Solution Explorer, right-click on the project name and select 'Add' > 'New Item'. In the dialog box that comes up, select "Resources File", and click "Add". This will add a new resource file called Resource1.resx to your project, and launch the resource editor.
  • In the resource editor, click on “Add Resource” from the top menu, and select “Add New Icon” from the dropdown list.
  • Leave the name as Icon1 and click 'Add'. This will add a new icon file Icon1.ico under the "Resources" folder in your project.
  • In the class library file created earlier, update the DtsPipelineComponent attribute on the component class so that its IconResource property points to the new icon resource.

  • In Solution Explorer, right-click on your project name and select 'Properties'.
  • On the Application tab, click the "Icon and manifest" radio button, and select the Icon1.ico file from the drop-down list.
  • Save and rebuild the project.
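Putting the icon steps above together, the updated attribute might look like the sketch below. The display name and the CustomSSISComponent default namespace are assumptions; the IconResource value is the fully qualified name of the embedded resource:

```csharp
using Microsoft.SqlServer.Dts.Pipeline;

// The IconResource string is "<default namespace>.<icon file name>".
[DtsPipelineComponent(
    DisplayName = "Custom RSS Source",
    ComponentType = ComponentType.SourceAdapter,
    IconResource = "CustomSSISComponent.Icon1.ico")]
public class CustomSSISComponent : PipelineComponent
{
    // ... overrides as before ...
}
```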

Deploying the custom component

Follow these steps to deploy your custom SSIS component:-

  • Sign the assembly with a strong name. You can either do this after building the assembly, or at build time by providing the key pair file (.snk file) in the project properties. You can find details on signing an assembly on MSDN.
  • Copy the strongly-named assembly to {{SQL Server installation folder}}\110\DTS\PipelineComponents. On 32-bit machines, if your SQL Server installation is at the default path, this translates to C:\Program Files\Microsoft SQL Server\110\DTS\PipelineComponents. On 64-bit machines, the path is C:\Program Files (x86)\Microsoft SQL Server\110\DTS\PipelineComponents.

    If you are developing a component for SQL Server 2008, replace the folder 110 with 100 in the above path.

  • Next, launch the Visual Studio developer command prompt, and install the strongly-named assembly to the GAC with the gacutil.exe utility, for example: gacutil.exe /if CustomSSISComponent.dll

    The /if switch uninstalls any previous version of the assembly and reinstalls the newer version of the DLL.

    If you have multiple versions of the .NET Framework installed on your computer, it is important that you use the correct version of gacutil.exe to add the file to the GAC. Since we developed the component using .NET Framework 4.0, the gacutil version should also be from .NET Framework 4.0.

  • Launch a new instance of SQL Server Data Tools, and create a blank Integration Services package.
  • Open the package in the package designer, and click on the "Data Flow" tab.
  • You should be able to see your custom data source component in the SSIS Toolbox.

If everything was configured correctly, the custom component should automatically appear in the SSIS Toolbox. If you don't see it, try adding it manually, via 'Tools' > 'Choose Toolbox Items', then browse to your component DLL.

Reading from a data source

Now that we are confident that we can build and deploy a custom Integration Services component, we will create a sample custom source component that reads from an RSS data feed. Conceptually, it is similar to the other source components that ship with Integration Services, such as the OLE DB Source or ADO.NET Source component, but the data source in this case is an RSS data feed.

An RSS feed is basically data served in a well-formed XML feed over the internet. I will not get into the format, structure etc. of RSS since that is outside the scope of this article. But let us take a quick look at the .Net helper classes that we will be using to read from an RSS source.

  • SyndicationFeed (in the System.ServiceModel.Syndication namespace) – the .NET class that represents an RSS feed (e.g. a blog feed or newspaper feed). A SyndicationFeed consists of one or more SyndicationItem objects – examples of SyndicationItem are blog posts, news articles and so on.
  • SyndicationItem – represents an individual RSS item. We will be accessing the following properties on the SyndicationItem:

  • Title – the title of the feed item.
  • PublishDate – the publication date of the item.
  • URL – the URL of the item.

The component will be reading data from the SyndicationItem objects, and the properties mentioned above (Title, PublishDate, URL), will be available in the SSIS data flow.

With this basic information, add the following method to your CustomSSISComponent class. This method reads syndication items from an RSS endpoint, and returns all the items for that feed.
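A sketch of this method is shown below, assuming it returns a DataTable with Title, PublishDate and URL columns (the exact column names and the lack of error handling are assumptions). It needs using directives for System, System.Data, System.Xml and System.ServiceModel.Syndication:

```csharp
private DataTable GetRSSDataTable()
{
    // The "RSS Path" custom property (added later in the article) holds the
    // feed URL; hard-code a URL here if you are following along step by step.
    string rssPath = (string)ComponentMetaData
        .CustomPropertyCollection["RSS Path"].Value;

    DataTable table = new DataTable();
    table.Columns.Add("Title", typeof(string));
    table.Columns.Add("PublishDate", typeof(DateTime));
    table.Columns.Add("URL", typeof(string));

    using (XmlReader reader = XmlReader.Create(rssPath))
    {
        // Load the feed and copy each item into a row of the DataTable.
        SyndicationFeed feed = SyndicationFeed.Load(reader);
        foreach (SyndicationItem item in feed.Items)
        {
            table.Rows.Add(
                item.Title.Text,
                item.PublishDate.DateTime,
                item.Links.Count > 0 ? item.Links[0].Uri.ToString()
                                     : string.Empty);
        }
    }
    return table;
}
```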

Save the updated file – we will be adding more code in a little bit. The above method reads data from an RSS data source, and returns a datatable with all the data from the source.

Note: Although I have used RSS as a data source here, you can read data from a third party application, or even a cloud-based platform like Salesforce.com.

In the sections below, I will cover the individual pieces that have to be built when creating a custom SSIS component. At a very high level, it involves the following:

  • Creating a custom property on the component for the URL of the RSS feed.
  • Creating an output and adding output columns to it. Remember that we are creating a source component that reads from an external data source. Consequently, we need to explicitly create the output on which the component sends data to the next component in the SSIS pipeline.
  • Sending the data read from the external data source to the output.

Create component outputs and properties

All Integration Services components have 0–N inputs and outputs. When the output of one component is connected to the input of another component, it establishes the flow of data between them. We need to explicitly create any inputs/outputs and custom properties for our component. The best place to do this is the overridden ProvideComponentProperties method. This method is called only once, when the component is added to the SSIS package designer. In the CustomSSISComponent.cs file created above, modify the method definition as below:-
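A sketch of the implementation described here (the output name and the property description text are assumptions):

```csharp
public override void ProvideComponentProperties()
{
    // Start from a clean slate each time the component is added.
    base.RemoveAllInputsOutputsAndCustomProperties();

    // Custom property holding the URL of the RSS feed. Assigning an empty
    // string fixes the property's data type as string.
    IDTSCustomProperty100 rssPath =
        ComponentMetaData.CustomPropertyCollection.New();
    rssPath.Name = "RSS Path";
    rssPath.Description = "The URL of the RSS feed to read from";
    rssPath.Value = string.Empty;

    // A source component has no inputs; add a single output.
    IDTSOutput100 output = ComponentMetaData.OutputCollection.New();
    output.Name = "RSS Output";
}
```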

The method adds a custom property called RSS Path to store the URL of the RSS feed that the component will read from. Since we are developing a custom source component, it also adds one output to the component.

A couple of things to note here:-

  • Though we have added an output, we have not added any output columns yet. We will be adding these later in the article.
  • The data type of the RSS Path custom property is determined the first time you assign a value to the property – in this case since the initial value was an empty string, the property data type is string.

Compile and re-deploy the component using the steps listed in the "Deploying the custom component" section above. Close SQL Server Data Tools, re-open the empty Integration Services package that you created above, and add a new Data Flow task. You should now be able to drag your custom component onto the package designer, and also update the RSS Path custom property.

[Screenshot: the custom component on the design surface, showing the RSS Path custom property]

Building output columns

In the previous section, we saw how to add an output to our custom Integration Services source component. The next step is to add columns to the output that we added earlier. There are two rules to keep in mind when adding output columns:-

  • The number of output columns should match the number of columns being read from the data source. Since we will be reading three fields from the RSS feed (Title, Publish Date, and URL), we need three output columns in the output.
  • The data types of the columns should be compatible with those being read from the data source.

Replace the SetComponentProperty method with the code below:-
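A sketch of the two methods, following the step-by-step description given afterwards. The use of CreateDataReader().GetSchemaTable(), the 1000-character default length, and the 1252 code page are assumptions:

```csharp
public override IDTSCustomProperty100 SetComponentProperty(
    string propertyName, object propertyValue)
{
    IDTSCustomProperty100 property =
        base.SetComponentProperty(propertyName, propertyValue);

    // Rebuild the output columns whenever a non-empty RSS Path is supplied.
    if (propertyName == "RSS Path" &&
        !string.IsNullOrEmpty(propertyValue as string))
    {
        AddOutputColumns();
    }
    return property;
}

private void AddOutputColumns()
{
    IDTSOutput100 output = ComponentMetaData.OutputCollection[0];
    output.OutputColumnCollection.RemoveAll();

    // Read the feed once so we can derive the schema of the data.
    DataTable rssData = GetRSSDataTable();
    DataTable schemaTable = rssData.CreateDataReader().GetSchemaTable();

    foreach (DataRow schemaRow in schemaTable.Rows)
    {
        IDTSOutputColumn100 outputColumn =
            output.OutputColumnCollection.New();
        outputColumn.Name = (string)schemaRow["ColumnName"];

        // Two-step mapping from the managed type to an SSIS data type.
        DataType dtsType =
            DataRecordTypeToBufferType((Type)schemaRow["DataType"]);
        dtsType = ConvertBufferDataTypeToFitManaged(dtsType, true);

        // Default the metadata, then adjust per data type.
        int length = 0, precision = 0, scale = 0, codePage = 0;
        switch (dtsType)
        {
            case DataType.DT_WSTR:
                // Use the source length if available, else a default of 1000.
                int columnSize = schemaRow["ColumnSize"] is DBNull
                    ? -1 : Convert.ToInt32(schemaRow["ColumnSize"]);
                length = (columnSize > 0 && columnSize < 4000)
                    ? columnSize : 1000;
                break;
            case DataType.DT_STR:
                length = 1000;
                codePage = 1252;   // assumption: ANSI Latin-1 code page
                break;
            // Types such as DT_I4 and DT_DBTIMESTAMP need no extra metadata.
        }
        outputColumn.SetDataTypeProperties(
            dtsType, length, precision, scale, codePage);
    }
}
```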

The goal of the AddOutputColumns() method above is to identify the columns coming from the data source (via GetRSSDataTable()), and add corresponding output columns to the output. Here is a step-by-step description of what it does:

  • Read the data from the RSS data source using the GetRSSDataTable() method that we defined earlier, and extract the schema of the resulting data table – we use the GetSchemaTable() method to get the schema.
  • Each row in the schema describes one column in the datatable. For each such row, add an output column to the component output using ComponentMetaData.OutputCollection[0].OutputColumnCollection.New().
  • Next, we map the datatable column data type to a corresponding Integration Services data type. Integration Services has its own data type model, which is separate from the .NET data types. For example, a String data type in .NET can be a DT_STR or DT_WSTR type in Integration Services. Since output columns on Integration Services components use only Integration Services data types, we need to assign a compatible data type to the output column, depending on the type of data that will flow through it. We use the following two methods to map the column data type to a corresponding type in the Integration Services object model:
    • DataRecordTypeToBufferType – given a managed data type (of type System.Type), this method returns the corresponding Integration Services data type (of type Microsoft.SqlServer.Dts.Runtime.Wrapper.DataType).
    • ConvertBufferDataTypeToFitManaged – accepts an Integration Services data type as argument, and returns the Integration Services data type that is appropriate for managed code.
  • The two-step approach shown in the code above is required to get the correct Integration Services data type for a managed type. The reason we have two steps is that DataRecordTypeToBufferType does not always return a type that can be used directly – for example, for a Boolean data type, the DataRecordTypeToBufferType method returns the Integration Services type DT_BOOL, whereas the call to ConvertBufferDataTypeToFitManaged gets the correct type, which is DT_I4.
  • In addition to identifying the correct Integration Services data type, your code will also have to set the precision, scale, length and codepage properties on the output column. All these properties, except codepage, are available in the datatable schema object. However, certain properties (e.g. length) may not be available for some columns; in this case, make sure you assign a default value (here, I've used a default length of 1000).
  • Next, we use a switch statement to assign the correct values for length, precision, scale and codepage. The switch is required because different data types have different requirements for length, precision and scale, as shown below (a complete list of requirements for all applicable data types can be found in the SSIS documentation):

    • DT_DECIMAL – Length: 0; Scale: greater than 0 and less than or equal to 28; Precision: 0; CodePage: 0.
    • DT_CY – Length: 0; Scale: 0; Precision: 0; CodePage: 0.
    • DT_NUMERIC – Length: 0; Scale: greater than 0, less than or equal to 28, and less than Precision; Precision: greater than or equal to 1 and less than or equal to 38; CodePage: 0.
    • DT_BYTES – Length: greater than 0; Scale: 0; Precision: 0; CodePage: 0.
    • DT_STR – Length: greater than 0 and less than 8000; Scale: 0; Precision: 0; CodePage: not 0, and a valid code page.
    • DT_WSTR – Length: greater than 0 and less than 4000; Scale: 0; Precision: 0; CodePage: 0.

  • Finally, set the output column name to the column name from the data table, and call the SetDataTypeProperties method to set the precision, length, scale and codepage properties for the output column.

The AddOutputColumns() method is invoked from the SetComponentProperty() method because I want to add output columns only when the "RSS Path" property is set to a legitimate RSS path. Alternatively, since we already know the three columns we will be reading from the RSS feed, we could have added the output columns in the ProvideComponentProperties method itself.

At this stage, build and redeploy your custom component using the steps in the "Deploying the custom component" section above, and add it again to a test Integration Services package. You should be able to see the RSS Path custom property in the component properties. Set the RSS Path property to http://rss.cnn.com/rss/cnn_topstories.rss – this will trigger the call to AddOutputColumns().

Inspect your component properties, and you should be able to see the output columns on the Input and Output properties tab. Note how the data types for each column are Integration Services data types as a result of the type conversion that we discussed earlier.

[Screenshot: the output columns on the Input and Output Properties tab, showing Integration Services data types]

Output column mapping

During package execution, a data source component extracts data from an external source and writes it to one or more data buffers. The data buffers are supplied by the Integration Services runtime, and are tabular structures with rows and columns. However, no direct mapping exists between the component's output columns and the columns in the data buffers. To understand why, look at the sample package below:

[Figure: a sample package with an OLE DB Source, a Character Map transformation, and a Flat File Destination]

In this sample package, we have an OLE DB Source that reads a single column of data called OLEDB_Col. It is followed by a Character Map transformation that adds a new column, CMap_Col, containing the data from OLEDB_Col converted to lower case. Finally, the data is written to a file via the Flat File Destination component.

A single data buffer is created for all three components in the package – this way, the Integration Services runtime does not incur the overhead of adding new columns to the buffer for each component during package execution. The downside is that the buffer has to accommodate the columns used by all the components – in this example, the buffer for the OLE DB Source contains the column CMap_Col even though the OLE DB Source has nothing to do with it.

Because the columns in the buffer may be different than the output columns on the component, your component should always explicitly map its output columns to columns in the data buffer, so that it does not accidentally overwrite columns that are meant for other components. The best place to do this is in the PreExecute method, since it is called only once just prior to package execution. Modify the PreExecute method in the component code as below:-
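A sketch of this method (the mapOutputColsToBufferCols field name follows the description of the map used later in the article; declaring it as an int[] field is an assumption):

```csharp
// Maps each output column index to the index of its column in the buffer.
private int[] mapOutputColsToBufferCols;

public override void PreExecute()
{
    IDTSOutput100 output = ComponentMetaData.OutputCollection[0];
    int columnCount = output.OutputColumnCollection.Count;
    mapOutputColsToBufferCols = new int[columnCount];

    for (int i = 0; i < columnCount; i++)
    {
        IDTSOutputColumn100 outputColumn = output.OutputColumnCollection[i];
        // Find the buffer column that corresponds to this output column,
        // using the column's LineageID.
        mapOutputColsToBufferCols[i] = BufferManager.FindColumnByLineageID(
            output.Buffer, outputColumn.LineageID);
    }
}
```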

In this method, we iterate through the output columns of our source component, and record the index of the corresponding column in the data buffer in a map – an array in this case. The key method here is BufferManager.FindColumnByLineageID, which finds the matching column in the data buffer based on the output column's LineageID.

Sending rows to the output

The final step in building our component is to implement the runtime method PrimeOutput, which is responsible for sending rows to the output buffers during package execution. Note that PrimeOutput is a runtime method, and is called only at package execution time. The process of reading data from the external data source, and writing it to the data buffers, should be implemented in this method. Replace the blank PrimeOutput method in your component with the method definition below:-
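A sketch of the method, following the bullet-point description given afterwards (the variable names are assumptions):

```csharp
public override void PrimeOutput(int outputs, int[] outputIDs,
    PipelineBuffer[] buffers)
{
    // Our component has exactly one output, so there is a single buffer.
    IDTSOutput100 output =
        ComponentMetaData.OutputCollection.FindObjectByID(outputIDs[0]);
    PipelineBuffer buffer = buffers[0];

    // Read all rows from the RSS feed into a DataTable.
    DataTable rssData = GetRSSDataTable();

    foreach (DataRow row in rssData.Rows)
    {
        buffer.AddRow();
        for (int col = 0; col < rssData.Columns.Count; col++)
        {
            // Write each value into the buffer column mapped in PreExecute.
            buffer[mapOutputColsToBufferCols[col]] = row[col];
        }
    }

    // Signal that this output has no more rows.
    buffer.SetEndOfRowset();
}
```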

  • The PrimeOutput method takes as parameters the total number of outputs on the component, an array with the IDs of all outputs, and an array of buffers which are used by these outputs.
  • Since we have only one output on our component, we locate the only output with the FindObjectById method.
  • Read data from the RSS feed into a datatable, and iterate through the rows.
  • For each row from the datatable, create a new row in the buffer and write its contents. Locate the corresponding column in the buffer using the mapOutputColsToBufferCols array created during PreExecute.
  • When all the rows have been written to the buffer, call the SetEndOfRowset method to signal that the component is done writing rows to the output.

When SetEndOfRowset is called, the Integration Services runtime flushes out any remaining rows in the buffer and moves processing to the next component in the package.

Deploy the component using the steps detailed earlier – you should now be able to use it just like any other data source component in an SSIS package. For example, in the screenshot below, I'm using the component to read data from an RSS data source and load it into a SQL Server table.

[Screenshot: a data flow using the custom RSS source to load data into a SQL Server table]

Conclusion

In this article, we built a custom Integration Services source component using the Integration Services object model. We also looked at some of the common design techniques that apply to building custom Integration Services components, as well as the steps required to deploy a component. The most important aspect of custom components is that they give you fine-grained control over the behavior of your component. Custom components are most useful when you want to read data from unconventional data sources for which SQL Server Data Tools supplies no data source component, especially if you have the same data access logic repeated across many different SSIS packages. I would encourage you to dig deeper into custom components and explore ways they can be helpful on the projects you work on.

The code to go with this article can be found, zipped, in the link at the head of this article.