{"id":1712,"date":"2013-10-16T00:00:00","date_gmt":"2013-10-16T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/ssis-basics-using-the-merge-join-transformation\/"},"modified":"2021-06-03T16:45:24","modified_gmt":"2021-06-03T16:45:24","slug":"ssis-basics-using-the-merge-join-transformation","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/bi-sql-server\/ssis-basics-using-the-merge-join-transformation\/","title":{"rendered":"SSIS Basics: Using the Merge Join Transformation"},"content":{"rendered":"<div class=\"article-content\">\n<p class=\"start\"> \tIn previous articles in this SQL Server Integration Services (SSIS) series, we included several example SSIS packages that used the <code>Data Flow<\/code> task to retrieve data from a SQL Server databases and load that data into destination tables or files. Each data flow included at least one transformation component that somehow modified the data in order to prepare it for the destination. <\/p>\n<p> \tIn this article, we&#8217;ll look at another type of component: the <code>Merge Join<\/code> transformation. The <code>Merge Join<\/code> transformation lets us join data from more than one data source, such as relational databases or text files, into a single data flow that can then be inserted into a destination such as a SQL Server database table, Excel spreadsheet, text file, or other destination type. The <code>Merge Join<\/code> transformation is similar to performing a join in a Transact-SQL statement. However, by using SSIS, you can pull data from different source types. In addition, much of the work is performed in-memory, which can benefit performance under certain condition. <\/p>\n<p> \tIn this article, I&#8217;ll show you how to use the <code>Merge Join<\/code> transformation to join two tables from two databases into one data flow whose destination is a single table. Note, however, that although I retrieve data from the databases on a single instance of SQL Server, it&#8217;s certainly possible to retrieve data from different servers; simply adjust your connection settings as appropriate. <\/p>\n<p> \tYou can also use the <code>Merge Join<\/code> transformation to join data that you retrieve from Excel spreadsheets, text or comma-separated values (CSV) files, database tables, or other sources. However, each source that you join must include one or more columns that link the data in that source to the other source. For example, you might want to return product information from one source and manufacturer information from another source. To join this data, the product data will likely include an identifier, such as a manufacturer ID, that can be linked to a similar identifier in the manufacturer data, comparable to the way a foreign key relationship works between two tables. In this way, associated with each product is a manufacturer ID that maps to a manufacturer ID in the manufacturer data. Again, the <code>Merge Join<\/code> transformation is similar to performing a join in T-SQL, so keep that in mind when trying to understand the transformation. <\/p>\n<h2>Preparing the Source Data for the Data Flow<\/h2>\n<p> \tBefore we actually set up our SSIS package, we should ensure we have the source data we need for our data flow operation. To that end, we need to two databases: <code>Demo<\/code> and <code>Dummy<\/code>. Of course, you do not need to use the same data structure that we&#8217;ll be using for this exercise, but if you want to follow the exercise exactly as described, you should first prepare your source data. <\/p>\n<p> \tTo help with this demo, I created the databases on my local server. I then copied data from the <code>AdventureWorks2008<\/code> database into those databases. Listing 1 shows the T-SQL script I used to create the databases and their tables, as well as populate those tables with data. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">USE master; \nGO \n\nIF DB_ID('Demo') IS NOT NULL \nDROP DATABASE Demo; \nGO \n\nCREATE DATABASE Demo; \nGO \n\nIF DB_ID('Dummy') IS NOT NULL \nDROP DATABASE Dummy; \nGO \n\nCREATE DATABASE Dummy; \nGO \n\nIF OBJECT_ID('Demo.dbo.Customer') IS NOT NULL \nDROP TABLE Demo.dbo.Customer; \nGO \n\nSELECT TOP 500 \n   CustomerID, \n   StoreID, \n   AccountNumber, \n   TerritoryID \nINTO Demo.dbo.Customer\nFROM AdventureWorks2008.Sales.Customer; \n\nIF OBJECT_ID('Dummy.dbo.Territory') IS NOT NULL \nDROP TABLE Dummy.dbo.Territory; \nGO \n\nSELECT \n   TerritoryID, \n   Name AS TerritoryName, \n   CountryRegionCode AS CountryRegion, \n   [Group] AS SalesGroup \n   \nINTO Dummy.dbo.Territory \nFROM AdventureWorks2008.Sales.SalesTerritory;<\/pre>\n<p class=\"caption\"> \tListing 1: Creating the Demo and Dummy databases <\/p>\n<p> \tAs Listing 1 shows, I use a <code>SELECT...INTO<\/code> statement to create and populate the <code>Customer<\/code> table in the <code>Demo<\/code> database, using data from the <code>Customer<\/code> table in the <code>AdventureWorks2008<\/code> database. I then use a <code>SELECT...INTO<\/code> statement to create and populate the <code>Territory<\/code> table in the <code>Dummy<\/code> database, using data from the <code>SalesTerritory<\/code> table in the <code>AdventureWorks2008<\/code> database. <\/p>\n<h2>Creating Our Connection Managers<\/h2>\n<p> \tOnce we&#8217;ve set up our source data, we can move on to the SSIS package itself. Our first step, then, is to create an SSIS package. Once we&#8217;ve done that, we can add two <code>OLE DB<\/code> connection managers, one to the <code>Demo<\/code> database and one to the <code>Dummy<\/code> database. <\/p>\n<p> \tTo create the first connection manager to the <code>Demo<\/code> database, right-click the <code>Connection Manager<\/code> window, and then click <code>New OLE DB Connection<\/code>, as shown in Figure 1. <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1886-Figure1-062e2c1f-2c89-490d-aa38-13f300c53410.jpg\" alt=\"1886-Figure1-062e2c1f-2c89-490d-aa38-13f\" \/><\/p>\n<p class=\"caption\"> \tFigure 1: Creating an <code>OLE DB<\/code> connection manager <\/p>\n<p> \tWhen the <code>Configure OLE DB Connection Manager<\/code> dialog box appears, click <code>New<\/code>. This launches the <code>Connection Manager<\/code> dialog box, where you can configure the various options with the necessary server and database details, as shown in Figure 2. (For this exercise, I created the <code>Demo<\/code> and <code>Dummy<\/code> databases on the <code>ZOO-PC\\CAMELOT<\/code> SQL Server instance.) <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1886-Figure2-1ddc3b7d-f6a6-417b-b3d0-5b83616004f3.jpg\" alt=\"1886-Figure2-1ddc3b7d-f6a6-417b-b3d0-5b8\" \/><\/p>\n<p class=\"caption\"> \tFigure 2: Configuring a new connection manager <\/p>\n<p> \tAfter you&#8217;ve set up your connection manager, ensure that you&#8217;ve configured it correctly by clicking the <code>Test Connection<\/code> button. You should receive a message indicating that you have a successful connection. If not, check your settings. <\/p>\n<p> \tAssuming you have a successful connection, click <code>OK<\/code> to close the <code>Connection Manager<\/code> dialog box. You&#8217;ll be returned to the <code>Configure OLE DB Connection Manager<\/code> dialog box. Your newly created connection should now be listed in the <code>Data connections<\/code> window. <\/p>\n<p> \tNow create a connection manager for the <code>Dummy<\/code> database, following the same process that you used for the <code>Demo<\/code> database. <\/p>\n<p> \tThe next step, after adding the connection managers to our SSIS package, is to add a <code>Data Flow<\/code> task to the control flow. As you&#8217;ve seen in previous articles, the <code>Data Flow<\/code> task provides the structure necessary to add our components (sources, transformations, and destinations) to the data flow. <\/p>\n<p> \tTo add the <code>Data Flow<\/code> task to the control flow, drag the task from the <code>Control Flow Items<\/code> section of the <code>Toolbox<\/code> to the control flow design surface, as shown in Figure 3. <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1886-Figure3-d760b308-5a8b-41dc-a19b-5ee0e79fe972.jpg\" alt=\"1886-Figure3-d760b308-5a8b-41dc-a19b-5ee\" \/><\/p>\n<p class=\"caption\"> \tFigure 3: Adding a <code>Data Flow<\/code> task to the control flow <\/p>\n<p> \tThe <code>Data Flow<\/code> task serves as a container for other components. To access that container, double-click the task. This takes you to the design surface of the <code>Data Flow<\/code> tab. Here we can add our data flow components, starting with the data sources. <\/p>\n<h2>Adding Data Sources to the Data Flow<\/h2>\n<p> \tBecause we&#8217;re retrieving our test data from two SQL Server databases, we need to add two <code>OLE DB Source<\/code> components to our data flow. First, we&#8217;ll add a source component for the <code>Customer<\/code> table in the <code>Demo<\/code> database. Drag the component from the <code>Data Flow Source s<\/code> section of the <code>Toolbox<\/code> to the data flow design surface. <\/p>\n<p> \tNext, we need to configure the <code>OLE DB Source<\/code> component. To do so, double-click the component to open the <code>OLE DB Source Editor<\/code>, which by default, opens to the <code>Connection Manager<\/code> page. <\/p>\n<p> \tWe first need to select one of the connection managers we created earlier. From the <code>OLE DB connection manager<\/code> drop-down list, select the connection manager you created to the <code>Demo<\/code> database. On my system, the name of the connection manager is <code>ZOO-PC\\CAMELOT.Demo<\/code>. Next, select the <code>dbo.Customer<\/code> table from the <code>Name of the table or the view<\/code> drop-down list. The <code>OLE DB Source Editor<\/code> should now look similar to the one shown in Figure 4. <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1886-Figure4-cd57b5cb-5606-4b4c-ac91-106948da9812.jpg\" alt=\"1886-Figure4-cd57b5cb-5606-4b4c-ac91-106\" \/><\/p>\n<p class=\"caption\"> \tFigure 4: Configuring the <code>OLE DB Source Editor<\/code> <\/p>\n<p> \tNow we need to select the columns we want to retrieve. Go to the <code>Columns<\/code> page and verify that all the columns are selected, as shown in Figure 5. These are the columns that will be included in the component&#8217;s output data flow. Note, however, if there are columns you don&#8217;t want to include, you should de-select those columns from the <code>Available External Columns<\/code> list. Only selected columns are displayed in the <code>External Column<\/code> list in the bottom grid. For this exercise, we&#8217;re using all the columns, so they should all be displayed. <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1886-Figure5-69314c3f-fc7c-4253-9714-49b596e8bac0.jpg\" alt=\"1886-Figure5-69314c3f-fc7c-4253-9714-49b\" \/><\/p>\n<p class=\"caption\"> \tFigure 5: Selecting columns in the <code>OLE DB Source Editor<\/code> <\/p>\n<p> \tOnce you&#8217;ve verified that the correct columns have been selected, click <code>OK<\/code> to close the <code>OLE DB Source Editor<\/code>. <\/p>\n<p> \tNow we must add an <code>OLE DB Source<\/code> component for the <code>Territory<\/code> table in the <code>Dummy<\/code> database. To add the component, repeat the process we&#8217;ve just walked through, only make sure you point to the correct database and table. <\/p>\n<p> \tAfter we&#8217;ve added our source components, we can rename them to make it clear which one is which. In this case, I&#8217;ve renamed the first one <code>Demo<\/code> and the second one <code>Dummy<\/code>, as shown in Figure 6. <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1886-Figure6-e7ebb7ca-c536-467f-a9d7-eb6dd2d7f538.jpg\" alt=\"1886-Figure6-e7ebb7ca-c536-467f-a9d7-eb6\" \/><\/p>\n<p class=\"caption\"> \tFigure 6: Setting up the <code>OLE DB<\/code> source components in your data flow <\/p>\n<p> \tTo rename the <code>OLE DB Source<\/code> component, right-click the component, click <code>Rename<\/code>, and then type the new name directly in the component. <\/p>\n<h2>Adding the Merge Join Transformation to the Data Flow<\/h2>\n<p> \tNow that we have the data sources, we can add the <code>Merge Join<\/code> transformation by dragging it from the <code>Data Flow Transformations<\/code> section of the Toolbox to the data flow design surface, beneath the source components, as shown in Figure 7. <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1886-Figure10-a72481a4-9e5b-4792-b270-4458cc2a7ab4.jpg\" alt=\"1886-Figure10-a72481a4-9e5b-4792-b270-44\" \/><\/p>\n<p class=\"caption\"> \tFigure 7: Adding the <code>Merge Join<\/code> transformation to the data flow <\/p>\n<p> \tWe now need to connect the data flow paths from the <code>Demo<\/code> and <code>Dummy<\/code> source components to the <code>Merge Join<\/code> transformation. First, drag the data path (green arrow) from the <code>Demo<\/code> source component to the <code>Merge Join<\/code> transformation. When you attach the arrow to the transformation, the <code>Input Output Selection<\/code> dialog box appears, displaying two options: the <code>Output<\/code> drop-down list and the <code>Input<\/code> drop-down list. The <code>Output<\/code> drop-down list defaults to <code>OLE DB Source Output<\/code>, which is what we want. From the <code>Input<\/code> drop-down list, select <code>Merge Join Left Input<\/code>, as shown in Figure 8. We&#8217;ll use the other option, <code>Merge Join Right Input<\/code>, for the <code>Dummy<\/code> connection. <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1886-Figure9-75acf763-8222-4b65-b30b-19d4c225e816.jpg\" alt=\"1886-Figure9-75acf763-8222-4b65-b30b-19d\" \/><\/p>\n<p class=\"caption\"> \tFigure 8: Configuring the <code>Input Output Selection<\/code> dialog box <\/p>\n<p> \tNext, connect the data path from the <code>Dummy<\/code> data source to the <code>Merge Join<\/code> transformation. This time, the <code>Input Output Selection<\/code> dialog box does not appear. Instead, the <code>Input<\/code> drop-down list defaults to the only remaining option: <code>Merge Join Right Input<\/code>. Your data flow should now resemble the one shown in Figure 9. <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1886-Figure10-4d5a0be1-bc5a-48cf-b24d-d057f2f2ae69.jpg\" alt=\"1886-Figure10-4d5a0be1-bc5a-48cf-b24d-d0\" \/><\/p>\n<p class=\"caption\"> \tFigure 9: Connecting the source components to the <code>Merge Join<\/code> transformation <\/p>\n<p> \tYou may have noticed that a red circle with a white X is displayed on the <code>Merge Join<\/code> transformation, indicating that there is an error. If we were to run the package as it currently stands, we would receive the error message shown in Figure 10. <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1886-Figure11-18965a98-2c62-4734-a5f3-0a0075664ebd.jpg\" alt=\"1886-Figure11-18965a98-2c62-4734-a5f3-0a\" \/><\/p>\n<p class=\"caption\"> \tFigure 10: Receiving a package validation error on the <code>Merge Join<\/code> transformation <\/p>\n<p> \tThe reason for the error message is that the data being joined by a <code>Merge Join<\/code> transformation must first be sorted. There are two ways of achieving this: by sorting the data through the <code>OLE DB S ource<\/code> component or by adding a <code>Sort<\/code> transformation to the data flow. <\/p>\n<h2>Sorting Data Through the OLE DB Source<\/h2>\n<p> \tTo sort the data through the <code>OLE DB<\/code> source component, you must first modify the connection to use a query, rather than specifying a table name. Double-click the <code>Demo<\/code> source component to open the <code>OLE DB Source Editor<\/code>. From the <code>Data access mode<\/code> drop-down list, select <code>SQL command<\/code>. Then, in the <code>SQL command text<\/code> window, type the following <code>SELECT<\/code> statement: <\/p>\n<pre class=\"listing\">SELECT CustomerID, StoreID, AccountNumber, TerritoryID\nFROM dbo.Customer\nORDER BY TerritoryID<\/pre>\n<p> \tThe <code>Connection Manager<\/code> page of the <code>OLE DB Source Editor<\/code> should now look similar to the one shown in Figure 11. <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1886-Figure11-4708cd5e-94c5-46d3-9b61-9b424a37985e.jpg\" alt=\"1886-Figure11-4708cd5e-94c5-46d3-9b61-9b\" \/><\/p>\n<p class=\"caption\"> \tFigure 11: Defining a <code>SELECT<\/code> statement to retrieve data from the <code>Customer<\/code> table <\/p>\n<p> \tOnce you&#8217;ve set up your query, click <code>OK<\/code> to close the <code>OLE DB Source Editor<\/code>. You must then use the advanced editor of the <code>OLE DB Source<\/code> component to sort specific columns. To access the editor, right-click the component, and then click <code>Show Advanced Editor<\/code>. <\/p>\n<p> \tWhen the <code>Advanced Editor<\/code> dialog box appears, go to the <code>Input and Output Properties<\/code> tab. In the <code>Inputs and Outputs<\/code> window, select the <code>OLE DB Source Output<\/code> node. This will display the <code>Common Properties<\/code> window on the right-hand side. In that window, set the <code>IsSorted<\/code> option to <code>True<\/code>, as shown in Figure 12. <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1886-Figure12-8730fa00-b5f7-4d14-a6c6-704d5bb1f237.jpg\" alt=\"1886-Figure12-8730fa00-b5f7-4d14-a6c6-70\" \/><\/p>\n<p class=\"caption\"> \tFigure 12: Configuring the <code>IsSorted<\/code> property on the <code>Demo<\/code> data source <\/p>\n<p> \tWe have told the package that the source data will be sorted, but we must now specify the column or columns on which that sort is based. To do this, expand the <code>Output Columns<\/code> subnode (under the <code>OLE DB Source Output<\/code> node), and then select <code>TerritoryID<\/code> column. The <code>Common Properties<\/code> window should now display the properties for that column. Change the value assigned to the <code>SortKeyPosition<\/code> property from <code>0<\/code> to a <code>1<\/code>, as shown in Figure 13. The setting tells the other components in the data flow that the data is sorted based on the <code>TerritoryID<\/code> column. This setting must be consistent with how you&#8217;ve sorted your data in your query. If you want, you can add additional columns on which to base your sort, but for this exercise, the <code>TerritoryID<\/code> column is all we need. <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1886-Figure13-23a9b7bc-36d5-4a17-a4ad-48bd13d16399.jpg\" alt=\"1886-Figure13-23a9b7bc-36d5-4a17-a4ad-48\" \/><\/p>\n<p class=\"caption\"> \tFigure 13: Configuring the <code>SortKeyPosition<\/code> property on the <code>Territory ID<\/code> column <\/p>\n<p> \tOnce you&#8217;ve configured sorting on the <code>Demo<\/code> data source, click <code>OK<\/code> to close the <code>Advanced Editor<\/code> dialog box. <\/p>\n<h2>Adding a Sort Transformation to the Data Flow<\/h2>\n<p> \tAnother option for sorting data in the data flow is to use the <code>Sort<\/code> transformation. When working with an <code>OLE DB Source<\/code> component, you usually want to use the source component&#8217;s T-SQL query and advanced editor to sort the data. However, for other data sources, such as a text file, you won&#8217;t have this option. And that&#8217;s where the <code>Sort<\/code> transformation comes in. <\/p>\n<p> \tDelete the data path that connects the <code>Dummy<\/code> data source to the <code>Merge Join<\/code> transformation by right-clicking the data path and then clicking <code>Delete<\/code>. <\/p>\n<p> \tNext, drag the <code>Sort<\/code> transformation from the <code>Data Flow Transformations<\/code> section of the <code>Toolbox<\/code> to the data flow design surface, between the <code>Dummy<\/code> data source and the <code>Merge Join<\/code> transformation, as shown in Figure 14. <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1886-Figure15-a23b4c79-504b-487b-8bc8-f29b8a18a952.jpg\" alt=\"1886-Figure15-a23b4c79-504b-487b-8bc8-f2\" \/><\/p>\n<p class=\"caption\"> \tFigure 14: Adding the <code>Sort<\/code> transformation to the data flow <\/p>\n<p> \tDrag the data path from the <code>Dummy<\/code> data source to the <code>Sort<\/code> transformation. Then double-click the <code>Sort<\/code> transformation to open the <code>Sort Transformation Editor<\/code>, shown in Figure 15. <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1886-Figure15-7e20cd93-7304-4198-a0b2-613aa26ed53f.jpg\" alt=\"1886-Figure15-7e20cd93-7304-4198-a0b2-61\" \/><\/p>\n<p class=\"caption\"> \tFigure 15: Configuring the <code>Sort<\/code> transformation <\/p>\n<p> \tAs you can see in the figure, at the bottom of the <code>Sort Transformation Editor<\/code> there is a warning message indicating that we need to select at least one column for sorting. Select the checkbox to the left of the <code>TerritoryID<\/code> column in the <code>Available Input Columns<\/code> list. This adds the column to the bottom grid, which means the data will be sorted based on that column. By default, all other columns are treated as <em>pass-through<\/em>, which means that they&#8217;ll be passed down the data flow in their current state. <\/p>\n<p> \tWhen we select a column to be sorted, the sort order and sort type are automatically populated. These can obviously be changed, but for our purposes they&#8217;re fine. You can also rename columns in the <code>Output Alias<\/code> column by overwriting what&#8217;s in this column. <\/p>\n<p> \tOnce you&#8217;ve configured the sort order, click <code>OK<\/code> to close the <code>Sort Transformation Editor<\/code>. <\/p>\n<p> \tWe now need to connect the <code>Sort<\/code> transformation to the <code>Merge Join<\/code> transformation, so drag the data path from the <code>Sort<\/code> transformation to the <code>Merge Join<\/code> transformation, as shown in Figure 16. <\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1886-Figure18-9e724f7f-c616-41ac-8d88-15af268067d9.jpg\" alt=\"1886-Figure18-9e724f7f-c616-41ac-8d88-15\" \/><\/p>\n<p class=\"caption\"> \tFigure 16: Connecting the <code>Sort<\/code> transformation to the <code>Merge Join<\/code> transformation <\/p>\n<p> \tThat&#8217;s all we need to do to sort the data. For the customer data, we used the <code>Demo<\/code> source component. For the territory data, we used a <code>Sort<\/code> transformation. As far as the <code>Merge Join<\/code> transformation is concerned, either approach is fine, although, as mentioned earlier, if you&#8217;re working with an <code>OLE DB Source<\/code> component, using that component is usually the preferred method. <\/p>\n<h2>Configuring the Merge Join Transformation<\/h2>\n<p> \tNow that we have our source data sorted and the data sources connected to the <code>Merge Join<\/code> transformation, we must now configure a few more settings of the <code>Merge Join<\/code> transformation. <\/p>\n<p> \tDouble-click the transformation to launch the <code>Merge Join Transformation Editor<\/code>, as shown in Figure 17. <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1886-Figure17-5e57fd4d-f1a5-4b90-8b56-f149e5898c4e.jpg\" alt=\"1886-Figure17-5e57fd4d-f1a5-4b90-8b56-f1\" \/><\/p>\n<p class=\"caption\"> \tFigure 17: The default settings of the <code>Merge Join Transformation Editor<\/code> <\/p>\n<p> \tNotice that your first setting in the <code>Merge Join Transformation Editor<\/code> is the <code>Join type<\/code> drop-down list. From this list, you can select one of the following three join types: <\/p>\n<ul>\n<li><code>Left outer join<\/code>: Includes all rows from the left table, but only matching rows from the right table. You can use the <code>Swap Inputs<\/code> option to switch data source, effectively creating a right outer join.<\/li>\n<li><code>Full outer join<\/code>: Includes all rows from both tables.<\/li>\n<li><code>Inner join<\/code>: Includes rows only when the data matches between the two tables.<\/li>\n<\/ul>\n<p> \tFor our example, we want to include all rows from left table (<code>Customer<\/code>) but only rows from the right table (<code>Territory<\/code>) if there&#8217;s a match, so we&#8217;ll use the <code>Left outer join<\/code> option. <\/p>\n<p> \tThe next section in the <code>Merge Join Transformation Editor<\/code> contains the <code>Demo<\/code> grid and <code>Sort<\/code> Grid. The <code>Demo<\/code> grid displays the table from the <code>Demo<\/code> data source. The <code>Sort<\/code> grid displays the table from the <code>Dummy<\/code> data source. However, because the <code>Sort<\/code> transformation is used, the <code>Sort<\/code> transformation is considered the source of the data. Had we changed the output column names in the <code>Sort<\/code> transformation, those names would be used instead of the original ones. <\/p>\n<p> \tNotice that an arrow connects the <code>TerritoryID<\/code> column in the <code>Demo<\/code> grid to that column in the <code>Sort<\/code> grid. SSIS automatically matches columns based on how the data has been sorted in the data flow. In this case, our sorts are based on the <code>TerritoryID<\/code> column in both data sources, so those columns are matched and serve as the basis of our join. <\/p>\n<p> \tYou now need to select which columns you want to include in the data set that will be outputted by the <code>Merge Join<\/code> transformation. For this exercise, we&#8217;ll include all columns except the <code>AccountNumber<\/code> column in the <code>Customer<\/code> table and the <code>TerritoryID<\/code> column from the <code>Territory<\/code> table. To include a column in the final result set, simply select the check box next to the column name in either data source, as shown in Figure 18. <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1886-Figure18-3394f6dd-6e47-4739-8d0a-7314601f9ca1.jpg\" alt=\"1886-Figure18-3394f6dd-6e47-4739-8d0a-73\" \/><\/p>\n<p class=\"caption\"> \tFigure 18: Specifying the columns to include in the merged result set <\/p>\n<p> \tNotice that the columns you select are included in the lower windows. You can provide an output alias for each column if you want, but for this exercise, the default settings work fine. <\/p>\n<p> \tOnce you&#8217;ve configured the <code>Merge Join<\/code> transformation, click <code>OK<\/code> to close the dialog box. You&#8217;re now ready to add your data destination to the data flow. <\/p>\n<h2>Adding an OLE DB Destination to the Data Flow<\/h2>\n<p> \tOur final step in setting up the data flow is to add an <code>OLE DB Destination<\/code> component so that we can save our merged data to a new table. We&#8217;ll be using this component to add a table to the <code>Demo<\/code> database and populate the table with joined data. So drag the <code>OLE DB Destination<\/code> component from the <code>Data Flow Destinations<\/code> section of the <code>Toolbox<\/code> to the data flow design surface, beneath the <code>Merge Join<\/code> transformation, as shown in Figure 19. <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1886-Figure21-8b2b8dcc-9aaa-48d8-9557-4a53ffb5dd67.jpg\" alt=\"1886-Figure21-8b2b8dcc-9aaa-48d8-9557-4a\" \/><\/p>\n<p class=\"caption\"> \tFigure 19: Adding an <code>OLE DB Destination<\/code> component to the data flow <\/p>\n<p> \tWe now need to connect the <code>OLE DB Destination<\/code> component to the <code>Merge Join<\/code> transformation by dragging the data path from the transformation to the destination. <code>Next<\/code>, double-click the destination to open the <code>OLE DB Destination Editor<\/code>. <\/p>\n<p> \tOn the <code>Connection Manager<\/code> page of the <code>OLE DB Destination Editor<\/code>, specify connection manager for the <code>Demo<\/code> database. Then, from the <code>Data access mode<\/code> drop-down list, select <code>Table or view - fast load<\/code>, if it&#8217;s not already selected. <\/p>\n<p> \tNext, we&#8217;re going to create a target table to hold our result set. Click the <code>New<\/code> button next to the <code>Name of the table or the view<\/code> option and the <code>Create Table<\/code> dialog box opens. In it a table definition is automatically generated that reflects the columns passed down the data flow. You can rename the table (or any other element) by modifying the table definition. On my system, I renamed the table <code>Me rge_Output<\/code>, as shown in Figure 20, once you are happy with the table definition click on OK. <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1886-Figure20-3c582564-b28b-49b2-b565-801c1ad88fda.jpg\" alt=\"1886-Figure20-3c582564-b28b-49b2-b565-80\" \/><\/p>\n<p class=\"caption\"> \tFigure 20: Using an <code>OLE DB Destination<\/code> component to create a table <\/p>\n<p> \tClick on <code>OK<\/code> to close the <code>Create Table<\/code> dialog box. When you&#8217;re returned to the <code>OLE DB Destination Editor<\/code>, go to the <code>Mappings<\/code> page and ensure that your columns are all properly mapped. (Your results should include all columns passed down the pipeline, which should be the default settings.) Click <code>OK<\/code> to close the <code>OLE DB Destination Editor<\/code>. We have now set up a destination so all that&#8217;s left to do is to run it and see what we end up with. <\/p>\n<h2>Running Your SSIS Package<\/h2>\n<p> \tYour SSIS package should now be complete. The next step is to run it to make sure everything is working as you expect. To run the package, click the <code>Start Debugging<\/code> button (the green arrow) on the menu bar. <\/p>\n<p> \tIf you ran the script I provided to populate your tables, you should retrieve 500 rows from the <code>Demo<\/code> data source and 10 from the <code>Dummy<\/code> data source. Once the data is joined, you should end up with 500 rows in the <code>Merge_Output<\/code> table. Figure 21 shows the data flow after successfully running. <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1886-Figure21-2312577b-8572-45d5-b3bf-02459bca8639.jpg\" alt=\"1886-Figure21-2312577b-8572-45d5-b3bf-02\" \/><\/p>\n<p class=\"caption\"> \tFigure 21: Running the SSIS package to verify the results <\/p>\n<p> \tOne other check we can do is to look at the data that has been inserted into the <code>Merge_Output<\/code> table. In SQL Server Management Studio (SSMS), run a <code>SELECT<\/code> statement that retrieves all rows from the <code>Merge_Output<\/code> table in the <code>Demo<\/code> database. The first dozen rows of your results should resemble Figure 22. <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1886-Figure22-6b3384a3-5cc3-4404-8c04-97c0b9a26f15.jpg\" alt=\"1886-Figure22-6b3384a3-5cc3-4404-8c04-97\" \/><\/p>\n<p class=\"caption\"> \tFigure 22: Partial results from retrieving the data in the <code>Merge_Output<\/code> table <\/p>\n<h2>Summary<\/h2>\n<p> \tIn this article, we incorporated the <code>Merge Join <\/code>transformation into the data flow of an SSIS package in order to join data from two tables in different databases. We then created a third table based on the contents of the original two. In future articles, I hope to tackle error handling, deployment and many other tasks. <\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>SSIS is able to take sorted data from more than one OLE DB data source and merge them into one table which can then be sent to an OLE DB destination. This &#8216;Merge Join&#8217; transformation works in a similar way to a  SQL join by specifying a &#8216;join key&#8217; relationship. This transformation can save a great deal of processing on the destination.  Annette Allen, as usual, gives clear guidance on how to do it.&hellip;<\/p>\n","protected":false},"author":10747,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143528],"tags":[4242,5907,5908,4179,4150,4306],"coauthors":[],"class_list":["post-1712","post","type-post","status-publish","format-standard","hentry","category-bi-sql-server","tag-basics","tag-merge-join-transformation","tag-ole-db","tag-source-control","tag-sql","tag-ssis"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1712","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/10747"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=1712"}],"version-history":[{"count":4,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1712\/revisions"}],"predecessor-version":[{"id":91223,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1712\/revisions\/91223"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=1712"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=1712"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=1712"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=1712"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}