Uploading Data to Windows Azure SQL Database

One way of getting the advantages of the Cloud without having to migrate the entire database is to maintain a copy of the data that needs to be accessible to internet-based users in Windows Azure SQL Database. There are various ways of keeping the two in sync, and Feodor describes a solution based on using SSIS.

The problem at hand: In a large enterprise, it can be difficult to provide up-to-date public data to outside users, such as clients and vendors, from the company’s internal servers. Maintenance and reliability issues often impede or sometimes altogether prevent delivering the data. One way to solve this problem is to upload relevant data to Windows Azure SQL Database (formerly SQL Azure) and let outside users access the information they need from there.

NOTE: According to Microsoft, the official name for their cloud-based database service is now Windows Azure SQL Database. Since I’m in no way affiliated with Microsoft or obligated to follow their naming conventions, I refer to the service as SQL Azure throughout the rest of the article, since this is the name that most people in the field continue to use.

In this article, I’ll show you how to create a solution that uploads enterprise master data from an on-premises instance of SQL Server to SQL Azure. I’ll be using a local instance of SQL Server 2008 to demonstrate how to create the various components that go into the solution. To facilitate this process, we’ll use SQL Server Management Studio (SSMS) to interface with the database engine and use Business Intelligence Development Studio (BIDS) to create a SQL Server Integration Services (SSIS) package. SSIS will provide the mechanism necessary to upload the data from the SQL Server instance to SQL Azure and synchronize the uploaded data with existing data.

The reason we’ll use SSIS to upload the data to SQL Azure is simple: we can move data into a SQL Azure database without incurring additional expenses. It costs money to download and read data from a SQL Azure database, but it’s free to upload data.

There’s another advantage to using SSIS. It’s a mature product that comes free with SQL Server. At some point, SQL Data Sync might also prove a viable solution for uploading data to a SQL Azure database, but that service is still in preview and we have insufficient information about its pricing and limitations.

So for now, SSIS is a good way to go. It provides the flexibility and cost-savings we need and efficiently transfers data to a SQL Azure database, where it can be made available to outside users at the lowest cost possible. There is, of course, the cost of the SQL Azure service itself, but that would be a factor regardless of how we loaded the data into the service’s database.

The Solution in a Nutshell

The goal of our solution is to use SSIS to copy data from an on-premises SQL Server database to a SQL Azure database and to keep the data synchronized. We’ll create the solution in several steps:

  • Identifying the source data. This is the data stored on the on-premises SQL Server instance that will be uploaded to the SQL Azure database.
  • Creating the target tables in the SQL Azure database. These tables will contain the data that outside users will be able to access.
  • Creating the staging tables in the SQL Azure database. These tables will serve as an intermediary landing area for temporarily holding the data so it can be synchronized with the target tables.
  • Setting up an SSIS package to load and synchronize data. This step includes creating the package itself, creating the necessary connection managers, cleaning up our staging tables, bulk-loading data to the staging tables, and synchronizing the data with the target tables.

In our SSIS package, we’ll issue T-SQL MERGE statements against the SQL Azure database to synchronize the data between the staging tables and target tables. We could use SSIS to stage the data in memory and then synchronize it with the data in the SQL Azure database, but this approach would call for additional reads of the SQL Azure data, which would incur additional costs. By keeping all our synchronization processing within the SQL Azure database, we can avoid those extra reads and the costs associated with them.

Identifying the Source Tables

For the purposes of this article, we’ll be using two tables from the AdventureWorks2008 database as our source data: Sales.Currency and Sales.CurrencyRate. We’ll treat the data in these tables as non-confidential so we can make it available to outside users. That way, even if the production servers were to experience long maintenance periods, the outside data would still be available.

To make the data available to outside users, we’ll create two tables with the same names in the SQL Azure database. These will serve as our target tables. For the Currency source table, we’ll upload all data to the SQL Azure target. For the CurrencyRate source table, we’ll upload only the most recent 90 days of historical rates.

Figure 1 shows a diagram of the Currency and CurrencyRate tables, as they exist in the AdventureWorks2008 database. Notice that the diagram shows a foreign key relationship between the tables. Actually, two foreign key relationships exist on the CurrencyRate table, one defined on the FromCurrencyCode column and the other on the ToCurrencyCode column. Both foreign keys reference the CurrencyCode column in the Currency table.


Figure 1: Currency and CurrencyRate tables in the AdventureWorks2008 database

Creating the Target Tables in SQL Azure Database

Once we’ve identified our source tables, we can create the target tables in the SQL Azure database. In SSMS, connect to the SQL Azure database in which you want to create the target tables. Then run the following T-SQL script against that database:
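The script boils down to a schema definition plus two CREATE TABLE statements. A sketch based on the AdventureWorks2008 column definitions (simplified here; defaults, check constraints, and computed columns from the source database are omitted, so adjust to match your environment):

```sql
CREATE SCHEMA Sales;
GO

CREATE TABLE Sales.Currency
(
  CurrencyCode  nchar(3)     NOT NULL PRIMARY KEY,
  Name          nvarchar(50) NOT NULL,
  ModifiedDate  datetime     NOT NULL
);

CREATE TABLE Sales.CurrencyRate
(
  CurrencyRateID   int      IDENTITY(1,1) NOT NULL PRIMARY KEY,
  CurrencyRateDate datetime NOT NULL,
  -- Both currency-code columns reference the Currency table,
  -- preserving the relationships shown in Figure 1.
  FromCurrencyCode nchar(3) NOT NULL
    REFERENCES Sales.Currency (CurrencyCode),
  ToCurrencyCode   nchar(3) NOT NULL
    REFERENCES Sales.Currency (CurrencyCode),
  AverageRate      money    NOT NULL,
  EndOfDayRate     money    NOT NULL,
  ModifiedDate     datetime NOT NULL
);
```

The primary keys also give each table a clustered index, which SQL Azure requires before you can insert data.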

The T-SQL script creates the Sales schema and then creates the Currency and CurrencyRate tables within that schema. Notice that the script also maintains the foreign key relationships on the target tables, just to make sure we have consistent data.

Creating the Staging Tables in Our SQL Azure Database

Now that we have our target tables, we can create the two staging tables that will hold the data we bulk-load from our on-premises SQL Server instance to SQL Azure. The staging tables will serve as intermediary tables that let us compare the uploaded data with any existing data in our target tables. (As mentioned earlier, this is a great way to avoid extra costs: uploading data to SQL Azure is free, whereas if we wanted to read the data and compare it locally, we would have to pay extra for the data transfer.)

To create the staging tables, we’ll use the same T-SQL code as above, but change the table names to Currency_Temp and CurrencyRate_Temp, remove the schema-creation statement, and remove the IDENTITY property from the CurrencyRateID column definition (in the CurrencyRate_Temp table), as shown in the following script:
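The resulting staging script might look something like this (again a sketch; note that CurrencyRateID is now a plain int with no IDENTITY property, and the foreign keys point at the staging copy of the Currency table):

```sql
CREATE TABLE Sales.Currency_Temp
(
  CurrencyCode  nchar(3)     NOT NULL PRIMARY KEY,
  Name          nvarchar(50) NOT NULL,
  ModifiedDate  datetime     NOT NULL
);

CREATE TABLE Sales.CurrencyRate_Temp
(
  CurrencyRateID   int      NOT NULL PRIMARY KEY,  -- IDENTITY removed
  CurrencyRateDate datetime NOT NULL,
  FromCurrencyCode nchar(3) NOT NULL
    REFERENCES Sales.Currency_Temp (CurrencyCode),
  ToCurrencyCode   nchar(3) NOT NULL
    REFERENCES Sales.Currency_Temp (CurrencyCode),
  AverageRate      money    NOT NULL,
  EndOfDayRate     money    NOT NULL,
  ModifiedDate     datetime NOT NULL
);
```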

You should run this script against the same SQL Azure database where you created your target tables. Once you’ve created the staging tables, you now have everything you need in the SQL Azure database to provide data to your outside users, so let’s turn to the SSIS package.

Creating an SSIS Package to Upload Data to SQL Azure

With our source, target, and staging tables in place, we’re ready to set up our SSIS package. The first step, then, is to open BIDS and use the Integration Services Project template to create a new project, as shown in Figure 2.


Figure 2: Creating an SSIS project in BIDS

When we create this project, BIDS automatically creates the initial SSIS package and opens it to the Control Flow tab in SSIS Designer. For our example, this one package is all we need. Within that package, we’ll be adding the connection managers, control flow tasks, and data flow components necessary to support our solution.

NOTE: This article assumes that you have a basic understanding of the SSIS components and how to set up your control flow and data flow. If you’re unfamiliar with SSIS or these concepts, refer to SQL Server Books Online.

Just to give you an idea of where we’re heading in this exercise, our finished package should look similar to the one in Figure 3, which shows the control flow after we’ve added all the components.


Figure 3: The control flow tasks in the SSIS package

You can download the SSIS package from the Downloads section on the top of this article. In the meantime, Figure 3 will provide you with a guide that you can follow as we walk through the steps necessary to add each component to our package. The control flow tasks shown in the figure perform the following functions:

  • CleanAzuretempdata: An Execute SQL task that removes any data from the staging tables in the SQL Azure database.
  • CurrencyTempImport: A Data Flow task that loads data into the Currency_Temp staging table.
  • CurrencyRatesTempImport: A Data Flow task that loads data into the CurrencyRate_Temp staging table.
  • MergeCurrency: An Execute SQL task that runs the MERGE statement necessary to synchronize the Currency_Temp and Currency tables in the SQL Azure database.
  • MergeCurrencyRates: An Execute SQL task that runs the MERGE statement necessary to synchronize the CurrencyRate_Temp and CurrencyRate tables in the SQL Azure database.

We’ll go through each component in detail shortly. But notice that the package also includes two connection managers, so let’s look at those first.

Creating the Connection Managers

Once we have our SSIS package in place, we need to create two connection managers to communicate with the SQL Server and SQL Azure databases. To this end, we’ll add the following two connection managers:

  • An OLE DB connection manager that connects to the AdventureWorks2008 database on the local instance of SQL Server 2008. For this exercise, we’ll name the connection manager LocalHost.AdventureWorks.
  • An ADO.NET connection manager that connects to the SQL Azure target database. For this exercise, we’ll name the connection manager MyAzure.database.windows.net.

These are the only connection managers we’ll need. We’ll be using them in our various control flow and data flow components to read and write data as necessary.

Cleaning Up the Staging Tables

After we create our connection managers, the first control flow task we’ll add to our design surface is an Execute SQL task that deletes the data from the staging tables. This is a good step to take to ensure that we’re always starting with clean staging tables whenever we run the SSIS package. When you add the task, configure it with the connection manager that points to the SQL Azure database and include the following T-SQL statement to run against the staging tables:
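Because of the foreign keys between the staging tables, the child table has to be emptied first. Something along these lines will do the job:

```sql
-- Empty the child table first. It is not referenced by any
-- foreign key, so TRUNCATE is allowed here.
TRUNCATE TABLE Sales.CurrencyRate_Temp;

-- Currency_Temp is referenced by foreign keys, so it must be
-- emptied with DELETE rather than TRUNCATE.
DELETE FROM Sales.Currency_Temp;
```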

Notice that we use a DELETE statement for the Currency_Temp table rather than a TRUNCATE statement. This is because the table is referenced by a foreign key. Another way to do this would be to drop the foreign key, truncate the table, and then re-create the foreign key. For an in-house solution, we would probably take this approach in order to account for the transaction log’s size and performance. In SQL Azure, however, we’re not concerned with the transaction logs.

That’s all we need to do to perform the necessary clean-up. Now onto populating the staging tables.

Bulk-Loading Data into the Staging Tables

Our next step is to add two Data Flow tasks to our control flow. The tasks will bulk-insert the data from our on-premises SQL Server database to our SQL Azure database.

The first Data Flow task should include an OLE DB Source component that retrieves data from the Currency source table and an ADO NET Destination component that loads the data into the Currency_Temp table in our SQL Azure database. Be sure that you use the appropriate connection manager for each data flow component.

The second Data Flow task should also include an OLE DB Source component, but this one retrieves data from the CurrencyRate table. However, rather than retrieving all data from the table, the component should use the following SELECT statement to retrieve data only for the last 90 days:
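Assuming the 90-day filter is applied to the CurrencyRateDate column, the source query would look something like this:

```sql
SELECT CurrencyRateID,
       CurrencyRateDate,
       FromCurrencyCode,
       ToCurrencyCode,
       AverageRate,
       EndOfDayRate,
       ModifiedDate
FROM   Sales.CurrencyRate
WHERE  CurrencyRateDate >= DATEADD(DAY, -90, GETDATE());
```

Note that the sample data in AdventureWorks2008 is historical, so when testing against that database you may want to anchor the filter to the most recent date in the table rather than to GETDATE().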

In addition, the second Data Flow task should include an ADO NET Destination component that loads the data into the CurrencyRate_Temp table in the SQL Azure database. Again, be sure that you use the appropriate connection manager for each data flow component.

You might have noticed when we created the staging tables that we did not include the IDENTITY property when we defined the CurrencyRateID column in the CurrencyRate_Temp table, even though the CurrencyRateID column in the CurrencyRate source table is defined with the IDENTITY property. We take this approach to ensure that the IDs remain unchanged throughout the staging process so they’re identical in both the source and target databases. In the next step, we’ll use those IDs to merge the staged data into the target table.

Synchronizing the Data

Our package now includes the mechanisms necessary to populate the staging tables. Now it’s time to handle how the data is inserted into and updated in the target tables in our SQL Azure database. Our next step, then, is to add the logic necessary to synchronize the data between the staging tables and the target tables. For that, we’ll add two Execute SQL tasks to our control flow, one for the Currency and Currency_Temp tables and one for the CurrencyRate and CurrencyRate_Temp tables.

You should add the first Execute SQL task to your control flow after the Data Flow tasks, as indicated in Figure 3. Configure the task with the connection manager that points to the SQL Azure database and include the following T-SQL statement in the task:
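A MERGE along these lines, keyed on CurrencyCode, captures the logic (a sketch; column names are taken from the AdventureWorks2008 Currency table):

```sql
MERGE Sales.Currency AS target
USING Sales.Currency_Temp AS source
   ON target.CurrencyCode = source.CurrencyCode
WHEN MATCHED THEN
  -- The row already exists in the target, so refresh its values.
  UPDATE SET Name         = source.Name,
             ModifiedDate = source.ModifiedDate
WHEN NOT MATCHED BY TARGET THEN
  -- The row is new, so insert it into the target table.
  INSERT (CurrencyCode, Name, ModifiedDate)
  VALUES (source.CurrencyCode, source.Name, source.ModifiedDate);
```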

The T-SQL code uses a MERGE statement to check the Currency table in the SQL Azure database to determine whether any rows match those in the Currency_Temp table. Rows are considered matching if they share the same CurrencyCode values. If a row does match, the statement updates the row in the target table. If the target table does not contain a matching row, the statement inserts that row into the target table.

Now let’s add an Execute SQL task to do the same thing for the CurrencyRate and CurrencyRate_Temp tables in the SQL Azure database. Again, when you add the task, configure it to use the SQL Azure database connection manager and include the following T-SQL statement in the task:
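A sketch of the statement, keyed this time on CurrencyRateID (the SET IDENTITY_INSERT statements are needed only if the CurrencyRateID column in the target table is defined with the IDENTITY property):

```sql
-- Allow explicit values in the IDENTITY column so the IDs stay
-- identical to those in the on-premises source table.
SET IDENTITY_INSERT Sales.CurrencyRate ON;

MERGE Sales.CurrencyRate AS target
USING Sales.CurrencyRate_Temp AS source
   ON target.CurrencyRateID = source.CurrencyRateID
WHEN MATCHED THEN
  UPDATE SET CurrencyRateDate = source.CurrencyRateDate,
             FromCurrencyCode = source.FromCurrencyCode,
             ToCurrencyCode   = source.ToCurrencyCode,
             AverageRate      = source.AverageRate,
             EndOfDayRate     = source.EndOfDayRate,
             ModifiedDate     = source.ModifiedDate
WHEN NOT MATCHED BY TARGET THEN
  INSERT (CurrencyRateID, CurrencyRateDate, FromCurrencyCode,
          ToCurrencyCode, AverageRate, EndOfDayRate, ModifiedDate)
  VALUES (source.CurrencyRateID, source.CurrencyRateDate,
          source.FromCurrencyCode, source.ToCurrencyCode,
          source.AverageRate, source.EndOfDayRate,
          source.ModifiedDate);

SET IDENTITY_INSERT Sales.CurrencyRate OFF;
```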

The MERGE statement performs the same matching process that we saw in the previous MERGE statement, only in this case the matches are based on the CurrencyRateID column.

Now that we have the logic in place to sync the data between the staging and target tables, our SSIS package contains everything it needs to update the data used by our outside users. As long as we run the package regularly, we can carry out in-house maintenance while keeping the data available to our outside users.

Troubleshooting Performance Issues

SQL Azure terminates queries that run too long. This can occur when you try to load a large amount of data. In such cases, you might receive the following exception when running your SSIS package:

Error: 0xC020844B at Table1, ADO NET Destination [31]: An exception has occurred during data insertion, the message returned from the provider is: A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 – An existing connection was forcibly closed by the remote host.)

Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

One way to resolve this is to change the CommandTimeout property of the ADO NET Destination components in your data flow. The CommandTimeout property specifies the number of seconds allowed before a command times out. By default, the property is set to 30 seconds, which is often not enough time when loading data into SQL Azure. For example, if we were to run our SSIS package with the default settings, the CurrencyRatesTempImport task would fail, as shown in Figure 4.


Figure 4: A task in an SSIS package failing because it times out

We can fix the problem by modifying the CommandTimeout property value for that ADO NET Destination component, as shown in Figure 5.


Figure 5: Setting the CommandTimeout property on an ADO.NET Destination

If we were to change the CommandTimeout setting from 30 to 600, the data load operation would most likely not time out.


Until SQL Data Sync emerges as a trusted service, there’s a simple way to upload our data from an on-premises SQL Server instance to SQL Azure: an SSIS package and T-SQL MERGE statements that upload and synchronize the data. Keep in mind that uploading data to a SQL Azure database is free while downloading data costs extra; hence, our goal is to upload the data to Azure and merge it there rather than read the data from Azure and compare it in-house. And once you see how relatively easy this solution is to implement, you’ll be able to make better use of SQL Azure and better serve your outside users.