SSIS Basics: Setting Up Your Initial Package

When working with databases, the use of SQL Server Integration Services (SSIS) is a skill that often needs to be acquired quickly, from scratch. Up until now, it has been a curiously frustrating search to find out the basics, fast, in order to get up and running quickly. No longer, as Annette comes up with a simple introduction for the rest of us.

I started using SQL Server Integration Services (SSIS) when I had a job that required me to move and manipulate data between files and other data sources. I did a bit of research using the resources available-Twitter, Simple-Talk, SQL Server Central, etc.-and concluded that SSIS was the right way to go. I had to get my head around it quite rapidly, but I couldn’t find help at the level I required. For that reason, I noted the points where I had struggled so that, once I’d learned more, I could help others who might otherwise struggle as I did.

In this article, the first of the “SSIS Basics” series, I go through the basics required for anyone starting out with SSIS, before he or she can venture off into more exotic uses for the tool. In subsequent articles, we’ll cover such topics as variables, for-each loops, and XML. If you’re already a regular SSIS user, this series is not for you!

What can you use SSIS for?

Essentially, SSIS can be used for any task related to files or data. You can, for example,

  • Move and rename files
  • Delete, update or insert data
  • Execute SQL scripts or stored procedures
  • Import and export data between different file types, such as Access, Excel, or any data source that supports an ODBC connection

These are, of course, only a few of the tasks you can perform in SSIS. As we work through this series, you’ll get a better sense of how extensive SSIS actually is.

Getting Started

SSIS is available only in SQL Server 2005 onwards. You create and develop SSIS in SQL Server Business Intelligence Development Studio (BIDS), a visual development tool based on Microsoft Visual Studio. (BIDS has morphed into SQL Server Data Tools (SSDT) in SQL Server 2012.)

Before going further, there is some terminology that’s important to understand. SSIS files are organized into packages, projects and solutions. The package is at the bottom of the hierarchy and contains the tasks necessary to perform the actual extract, transform, and load (ETL) operations. Each package is saved as a .dtsx file and is part of a project. You can include one or more packages in a project. That project, in turn, is part of a solution, which is at the top of the hierarchy. You can include one or more projects within a solution.

When you first open BIDS, you’re presented with the interface shown in Figure 1.

1494-clip_image002.png

Figure 1: The SSIS interface in BIDS

To create an SSIS package, point to the File menu, point to New, and click Project. This launches the New Project dialog box, shown in Figure 2.

1494-clip_image004.png

Figure 2: The New Project dialog box in BIDS

In the New Project dialog box, select the Integration Services Project template. Then, provide a name for the project in the Name text box. Next, in the Location text box, specify the folder where your project files should be saved, and then provide a name for the solution in the Solution Name text box.

After you’ve entered the project and solution information, click OK. Your new package will open in the SSIS window, as shown in Figure 3.

1494-clip_image006.png

Figure 3: Creating a new SSIS package in BIDS

Notice that the SSIS interface is divided into the following five sections (windows):

  • Control Flow Items: The components necessary to control a package’s workflow. For example, the section includes components that let you move or copy data, run SQL statements, or send emails. (The components will be explained in more detail in this article and in articles that will follow.)
  • Connection Managers: The connections to your data sources (whether retrieving or loading data). Your data sources can include SQL Server databases, CSV files, Excel spreadsheets, and a variety of other sources.
  • Solution Explorer: A hierarchical view of the data sources, data source views, packages, and other components included within the current solution.
  • Properties: The properties and their values for the package or the selected component within that package.
  • SSIS Designer: The main working area for developing your SSIS package. SSIS Designer is broken into four tabs: Control Flow, Data Flow, Event Handlers, and Package Explorer. We’ll look at each of these in greater detail as we progress through this series.

Control Flow Items

In this article, I focus on setting up the SSIS package and defining the data connections. I do not cover all the components in the Control Flow Items window. In the next article, I will demonstrate using, what I think is, the most important of these components-the Data Flow Task-and cover other control flow tasks in subsequent articles.

Connection Managers

I will now explain how to create connection managers that connect to both Excel files and a SQL Server database.  However, it is important to note that any connection created through the Connection Manager window is available only to the package it is created in.

Connecting to an Excel File

One of the first steps you’ll often take when developing an SSIS package is to create the connection managers necessary to retrieve data from or load data into your data sources. You can also set up connections “on the fly,” so if you miss creating one here it can be done as part of other tasks. This approach is most commonly used when you wish to create a connection based on the source. For example, if you wish to copy data out of a SQL Server database and export it to an Excel spreadsheet, you can create the connection manager when you set up your data flow.

To add a connection manager, right-click the blank area in the Connection Manager window, where it says Right-click here to add a new connection manager to the SSIS package, as shown in Figure 4.

1494-clip_image008.png

Figure 4: Adding a connection manager to your SSIS package

This will launch a context menu that provides a number of options for creating various types of connections, as Figure 5 illustrates.

1494-clip_image010.png

Figure 5: Selecting which type of connection manager to create

Notice you can create connections for such sources as OLE DB, ADO.NET, Analysis Services, and different types of files. In this case, we want to create a connection to an Excel file, so click the New File Connection option. This will launch the File Connection Manager Editor dialog box, shown in Figure 6.

1494-clip_image012.png

Figure 6: The File Connection Manager Editor dialog box

For this example, we’ll be connecting to an Excel file I created for demonstration purposes. Figure 7 shows the worksheet I set up in this file.

1494-clip_image013.jpg

Figure 7: Excel worksheet used for demonstration purposes

I named the Excel file Employees.xlsx and saved it in the C:\Users\Annette\Documents folder.

In the Usage type drop-down list in the File Connection Manager Editor dialog box, select Existing file. Next, click the Browse button, navigate to the folder that contains the Excel file, and select the file. The dialog box should now look like the one shown in Figure 8.

1494-clip_image015.jpg

Figure 8: Configuring the File Connection Manager Editor dialog box

Once you’ve selected the file, click OK. The new connection manager will be added to the Connection Managers window and will be assigned the name of the file, as shown in Figure 9.

1494-clip_image016.jpg

Figure 9: Viewing the new connection manager in the Connection Managers window

It is very easy to rename the connection manager to something that may be more appropriate. To do so, right-click the new connection manager and select Rename from the context menu, as show in Figure 10

1494-clip_image018.jpg

Figure 10: Renaming a connection manager

The name then becomes updateable and you can rename it to whatever you like. In this case, I renamed the connection manager Employees (Excel), as shown in Figure 11.

1494-clip_image019.jpg

Figure 11: Viewing the new connection manager name

When you view a connection manager in the Connection Managers window, you’ll see that each connection type is associated with a different icon. If you created an Excel connection from here, it is displayed with the same icon used for any flat file connection. However if you create an Excel connection when adding a component to the Data Flow tab, the connection manager will display an Excel Icon.

Connecting to a SQL Server Table

Because our example will retrieve data from a SQL Server database, you’ll also need to create a connection manager for that database. To do so, you should again right-click the Connection Managers window to open the context menu, but this time, click the New OLE DB Connection option. The Configure OLE DB Connection Manager dialog box will appear, as shown in Figure 12.

1494-clip_image021.jpg

Figure 12: Creating an OLE DB connection manager

If any OLE DB connections have already been defined on the package, they will appear in Data connections list. You can use one of these, if it fits your needs, or you can create a new one. To create a new connection, click the New button to launch the Connection Manager dialog box, shown in Figure 13.

1494-clip_image023.png

Figure 13: Configuring an OLE DB connection manager

To configure the connection manager, select the SQL Server instance from the Server name drop-down list, and then select the authentication type. In this case, I selected the Use SQL Server Authentication option and provided a username and password. You might decide to select the Use Windows Authentication option, in which case your current Windows credentials will be used to establish the connection with SQL Server. In a later article, when we look at deploying the package, we will look at how the connections can be altered at run time and therefore how the login details can be changed then. For now, ensure that you set up the login the way you need it to run the package while you’re developing it.

From the Select or enter a database name drop-down list, select the name of the AdventureWorks database. Your Connection Manager dialog box should now look similar to the one shown in Figure 14.

1494-clip_image025.png

Figure 14: Configuring an OLE DB connection manager

Be sure to click the Test Connection button to verify that you can connect to the target database. The system will display a message similar to the one in Figure 15 to confirm whether you’ve successfully connected to the database.

1494-clip_image027.png

Figure 15: Testing your database connection

After you’ve confirmed your connection, click OK to close the message box, and then click OK to close the Connection Manager dialog box. You will be returned to the Configure OLE DB Connection Manager dialog box, shown in Figure 16.

1494-clip_image029.png

Figure 16: Finalizing your OLE DB connection manager

Notice that your new connection has been added to the Data connections list. Click OK to close the dialog box. The Connection Managers window will show your two connections. You’re now ready to start working with them.

Solution Explorer

Within Solution Explorer, you can view all projects, packages, data sources and data source views that make up the solution.   

Adding New Projects

If you wish to add an additional project to a package, point to File on the menu bar, point to Add, and click New Project, as shown in Figure 17.

1494-clip_image031.jpg

Figure  17: Adding a new project to a solution

The Add New Project window opens. Select Integration Services Project and in the Name box enter the name you wish to call the new project as shown in Figure 18.

1494-clip_image033.jpg

Figure 18: Add New Project Wizard

As you can see in Figure 19 a new project is added to the solution named “Dev” and will appear in Solution Explorer. The project will contain three empty folders named Data Sources, Data Source Views and Miscellaneous. The project will also contain a folder named SSIS Packages and within the folder a file named Package.dtsx, which is an empty SSIS package created automatically when the project is created. Figure 19 shows the new project and its folders in Solution Explorer.

1494-clip_image035.jpg

Figure 19: The folders and package created in a new SSIS project

Data Sources

Earlier I showed you how to create connections in the Connection Managers window. As I mentioned, if a connection is created in the Connection Managers window, it is available only to the package it was created in. However, you can also create connections known as data sources, which are available to all packages in a project.

To create a new data source, right-click Data Sources in Solution Explorer to open the Connection Manager dialog box (shown in Figure 20). Then fill in the options as you did when you created an OLE DB connection manager. Be sure to click Test Connection to confirm the connection has been created successfully.

1494-clip_image037.jpg

Figure 20: Creating a new data source connection

The Data Source Wizard will appear, with the new data connection highlighted, as shown in Figure 21. After you review the settings, click Next.

1494-clip_image039.jpg

Figure 21: The data connection in the Data Source Wizard

When the next page of the wizard appears, type in a name for the data source. As this is project wide, I would recommend you fully describe the source using the server and database name. I have renamed my data source RGTest_AdventureWorks2008, as shown in Figure 22. I try to set up and follow consistent naming conventions.

1494-clip_image041.jpg

Figure 22: Renaming the data source

After you’ve renamed the data source, click Finish. Your Data Source should now be listed under Data Sources in Solution Explorer, as shown in Figure 23. Notice that the data source is saved with the .ds file extension to indicate that it is indeed a data source.

1494-clip_image043.jpg

Figure 23: Creating a data source in Solution Explorer

Initially, the new data source is not listed in your package’s Connection Managers window; however, it is available to your package. Once you have made use of the data source in the package it will be visible in the Connection Managers window.Data Source Views

Data source views, like data sources, are available to all packages in a project. A data source view is used to define a subset of a data from a data source. The data source view can include only some of the tables or it can be used to define relationships or calculated columns.

Because a data source view is based on a data source, you would normally create the data source before starting to create the data source view. However, this is not compulsory because you can create the data source when you’re creating the data source view. To create a data source view, right-click the Data Source Views folder and click New Data Source View, as shown in Figure 24.

1494-clip_image045.jpg

Figure 24: Creating a data source view in Solution Explorer

When the Data Source View Wizard appears, click Next. The next page shows the data sources available to the project, as shown in Figure 25. (In this case, there’s only one.)

1494-clip_image047.jpg

Figure 25: Available data sources

As you can see, the page shows the name of the data source in the Relational data sources list. The properties for the selected data source appear to the right, in the Data source properties window. A data source must be selected before you can continue with the wizard. If you haven’t created the data source you need, you can create one now by clicking the New Data Source button.

Once you’ve selected the necessary data source, click Next. The new page provides a list of the tables and views available through the selected data source. If you wish to filter the list, type the filter criteria in the Filter text box below the Available Objects list, and then click the filter icon to the right of the text box. For example, I typed Emp in the Filter text box, which reduced the list of available objects to those that contain “Emp” in their name, as shown in Figure 26.

1494-clip_image049.jpg

Figure 26: Filtering tables and views in the data source

The next step is to determine which tables and views you want to include in your data source view. From the filtered list of tables and views in the Available Objects list, select the objects you want to include. You can select more than one object by clicking the first one, holding down the Ctrl key, and then clicking the additional objects. Once you’ve selected the objects, click the single right arrow button to move those objects to the Included Objects window. If you want to move all the listed objects, simply click the double right arrow button.

Once an object has been moved to the Included Objects list, the single left arrow button and double left arrow button become active. These work the same as the right arrows. The single left arrow moves a single selected object or multiple selected objects from the Included objects list back to the Available objects list. The double left arrow moves all objects in the Included objects list back to the Available objects list.

Figure 27 shows the full list of available objects (without filters), minus the two objects that have been moved to the Included objects list. Notice that two additional objects are selected in the Available objects window. As you would expect, you can move the files to the Included objects list by clicking the single right arrow button.

If you click the Add Related Tables button beneath the Included objects list, all tables related to the objects in the Included objects list will be automatically added.

1494-clip_image051.jpg

Figure 27: Adding tables and views to a data source view

Once all required objects have been selected, click Next. You can now see a preview of what you have selected, and you can rename the data source view to something more appropriate. If you have missed an object, click the Back button to return to the previous page.

For this example, I renamed my data source view AW2008-Employees. As you’re changing the name in the Name textbox, the name is also updated in the Preview window, as shown in Figure 28.

1494-clip_image053.jpg

Figure 28: Renaming the data source view

If you are happy with the configuration, click Finish. The data source view is saved with the .dsv file extension and is added to the Data Source Views folder in Solution Explorer. A new window appears in SSIS Designer and shows the data source view in design mode, as shown in Figure 29.

1494-clip_image055.jpg

Figure 29: Data source view in design mode

Amending a Data Source View

SSIS provides a number of options for modifying a data source view. Most of those options are at the table level. If you right-click the table name either on the design surface or in the Tables pane (on the left side of the screen), you can choose from the following options:

  • Adding a calculation
  • Adding a relationship
  • Replacing a table
  • Deleting a table
  • Reviewing data
Deleting an object

Suppose I added the Store table in error. I can delete from table from my data source view by right-clicking the table name and selecting the Delete table from DSV option, as shown in Figure 30.

1494-clip_image057.jpg

Figure 30: Deleting a table from a data source view

You’ll then be prompted to confirm your deletion. When the Delete Objects message box appears, click OK, as shown in Figure 31.

1494-clip_image059.jpg

Figure 31: Deleting objects from a data source view

When you click OK, the object is permanently removed from the data source view.

Adding a new column

To add a calculated column to a data source view, right-click the table name and select New Named Calculation to open the Create Named Calculation dialog box. Enter the new column name in the Column name text box, add an appropriate description in the Description text box, if required, and then create the calculation in the Expression text box. For this example, I’ve assigned the name Age to the column and added the description Current Age based on Birth Date. For the expression, I added the one shown in Figure 32. Note that, at this stage, there is no way to test whether your code is correct!

1494-Figure32.jpg

Figure 32: Creating a calculated column

Figure 33 shows us that the Age column has been added to our table. The icon next to the column shows that it is a calculated column.

1494-clip_image063.jpg

Figure 33: Verifying that the calculated column has been added

To view the data in the table and verify that the new column has been created correctly, right-click one of the columns and then click Explore Data, as shown in Figure 34.

1494-clip_image065.jpg

Figure 34: Viewing the data in the table

The Explore Employee Table window appears, as shown in Figure 35. We can now view all the data in the Employee table. Notice that the Age column has been added to the table (on the far right side) and displays the data returned by our expression.

1494-clip_image067.jpg

Figure 35: Viewing data in the Employee table

Once you have made all the necessary changes, save the data source view. It will then be available for you to use in any of your packages in the project.

Summary

In this article, I’ve shown you how to create an SSIS package and set up connection managers, data sources, and data source views. In the next article, I will show you how to set up a package that retrieves data from a SQL Server database and loads it into an Excel file. I will also show you how to add a derived column that calculates the data to be inserted into the file. In addition, I will demonstrate how to run the package.

In future articles, I plan to show you how to deploy the package so it can be run as part of a scheduled job or called in other ways. I also plan to cover how to use variables and how they can be passed between tasks. I also aim to cover more control flow tasks and data flow components, including those that address conditional flow logic and for-each looping logic. There is much much more that can be done using SSIS, and I hope over the course of this series to cover as much information as possible.