Questions About Using TSQL to Import Excel Data You Were Too Shy to Ask

“How do I write an ad hoc query to retrieve data from an Excel spreadsheet?”
“How do I retrieve data from an Excel spreadsheet that does not include a header row?”
“How do I import data from an Excel workbook directly into a new table?”
“When I try to retrieve data from an Excel worksheet, I receive an error saying that the Ad Hoc Distributed Queries component is turned off. How do I enable that component?”
“I’m importing data from an Excel spreadsheet. One of its columns contains mostly numeric data with a few alphanumeric values mixed in. When I run my query, SQL Server replaces the alphanumeric values with NULL. How do I preserve the alphanumeric values?”
“How do I import data from an Excel worksheet directly into an existing table in my SQL Server database?”
“Can I import Excel data directly into a temporary table?”
“How do I limit the data I retrieve from an Excel spreadsheet to specific rows and columns?”
“How do I use a linked server to import data from an Excel worksheet?”
“I’m using a linked server to import data from an Excel spreadsheet. Is there a way to limit the columns and rows I return when I query the data?”
“I’ve created a linked server that connects to an Excel workbook file and need to be able to view a list of spreadsheets within the workbook. Is there a way to retrieve this information?”
“How do I import Excel data through a linked server if the spreadsheet contains no header columns?”
“How do I join data in a SQL Server table with data from an Excel workbook?”
“When I try to import Excel data into SQL Server, I keep running into errors related to the OLE DB provider. I’ve received errors about not being able to initialize the provider, not being able to create an instance of the provider, and not being able to install a 64-bit version of the provider. Can you help?”

“How do I write an ad hoc TSQL query to retrieve data from an Excel spreadsheet?”

The simplest way to query Excel data is to use the OPENROWSET function to connect directly to the Excel file. The OPENROWSET function is a SQL Server built-in rowset function (nondeterministic) that lets you connect to an OLE DB data source. The function serves as a one-time alternative to creating a linked server.

The OPENROWSET function returns data in a tabular format, which means we can use the function in a query’s FROM clause as we would any table expression. For example, the following SELECT statement retrieves the data on the vEmployee spreadsheet in the EmployeeData1.xmls file:

SELECT *

FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',

'Excel 12.0 Xml;

Database=C:\DataFiles\EmployeeData1.xlsx',

[vEmployee$]);

When calling the OPENROWSET function to retrieve Excel data, we must pass several arguments into the function. The first is the name of the provider, in this case, Microsoft.ACE.OLEDB.12.0. We can choose from one of two providers:

Microsoft.Jet.OLEDB.4.0: Use on SQL Server 32-bit editions for Excel 2003 files (or earlier).
Microsoft.ACE.OLEDB.12.0: Use on SQL Server 32-bit editions for Excel 2007 files (or later) or on SQL Server 64-bit editions for any Excel files.

The second OPENROWSET argument defines the provider string and includes multiple parts, divided by semi-colons. The provider string in the example above is divided into two parts. The first one specifies the file type, in this case, Excel 12.0 Xml. Here’s the current breakdown on file types:

For Excel ’97-2003 (.xls) files, use Excel 8.0.
For Excel 2007-2010 (.xlsx) files, use Excel 12.0 Xml.
For Excel 2007-2010 macro-enabled (.xlsm) files, use Excel 12.0 Macro.
For Excel 2007-2010 non-XML binary (.xlsb) files, use Excel 12.0.

The second part of the argument is the Database option, which provides the file’s path and file name (C:\DataFiles\EmployeeData1.xlsx). This of course will vary depending on the file’s location. Be aware, however, that when accessing files on a remote server, the permissions issue can get tricky. You might find it easier to copy the file to the local system before importing the data into SQL Server.

The third argument we pass into the OPENROWSET function is the name of the spreadsheet we want to access. Notice that we need to add a dollar sign ($) to the end of the name and enclose the name in brackets, as in [vEmployee$]. (The spreadsheet received this name because the data comes from the vEmployee view in the AdventureWorks2014 sample database.)

That’s all there is to retrieving data from an Excel spreadsheet. The query above returns 290 rows of employee data. We can also use the OPENDATASOURCE function to achieve the same results. The OPENDATASOURCE function is another SQL Server rowset function that can be used in place of a linked server, as shown in the following example:

SELECT *

FROM OPENDATASOURCE('Microsoft.ACE.OLEDB.12.0',

'Data Source=C:\DataFiles\EmployeeData1.xlsx;

Extended Properties=Excel 12.0 Xml')...[vEmployee$];

As you can see, the arguments we pass into the OPENDATASOURCE function are structured a bit differently from those passed into the OPENROWSET function, but the results are the same. Although the OPENDATASOURCE and OPENROWSET functions serve different purposes, when used to retrieve Excel data in this way, they basically work the same way.

“How do I retrieve data from an Excel spreadsheet that does not include a header row?”

By default, when you retrieve data from an Excel file, SQL Server assumes that the first row in the spreadsheet is a header that contains column names. However, you can include an option in your provider string that determines whether or not the first row should be treated as a header. That option, HDR, can be set to YES or NO, with YES being the default. The following SELECT statement specifies the default value:

SELECT *

FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',

'Excel 12.0 Xml; HDR=YES;

Database=C:\DataFiles\EmployeeData1.xlsx',

[vEmployee$]);

The statement returns the same 290 rows that were returned in the previous examples. However, if we want to specify that no header row exists, we can change the setting to NO:

SELECT *

FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',

'Excel 12.0 Xml; HDR=NO;

Database=C:\DataFiles\EmployeeData1.xlsx',

[vEmployee$]);

When using the HDR option, you have to be careful that you don’t inadvertently assume there’s no header row when there actually is, or vice versa. In this case, we assumed incorrectly. The following table shows part of the results returned by the SELECT statement:

F1	F2	F3	F4	F5
NULL	FirstName	MiddleName	LastName	JobTitle
259	Ben	T	Miller	Buyer
278	Garrett	R	Vargas	Sales Representative
204	Gabe	B	Mares	Production Technician – WC40
78	Reuben	H	D’sa	Production Supervisor – WC40
255	Gordon	L	Hee	Buyer
66	Karan	R	Khanna	Production Technician – WC60
270	FranÃ§ois	P	Ajenstat	Database Administrator
22	Sariya	E	Harnpadoungsataya	Marketing Specialist
161	Kirk	J	Koenigsbauer	Production Technician – WC45
124	Kim	T	Ralls	Stocker
10	Michael	NULL	Raheem	Research and Development Manager
248	Mike	K	Seamans	Accountant
175	Reed	T	Koch	Production Technician – WC30
155	Fadi	K	Fakhouri	Production Technician – WC20
144	Paul	R	Singh	Production Technician – WC20
192	Brenda	M	Diaz	Production Supervisor – WC40

The problem, as you can see, is that the column names contained in the spreadsheet are treated like any other row of data and show up here as the first row. Notice, too, that SQL Server automatically uses the default Excel column names, F1, F2, F3, etc. Now let’s retrieve data from a spreadsheet that actually is header-less:

SELECT *

FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',

'Excel 12.0 Xml; HDR=NO;

Database=C:\DataFiles\EmployeeData2.xlsx',

[vEmployee$]);

This time around, we’re retrieving data from the EmployeeData2.xmlx file, which lists the employee data without the column names. The following table shows part of the results now returned by our statement:

F1	F2	F3	F4	F5
259	Ben	T	Miller	Buyer
278	Garrett	R	Vargas	Sales Representative
204	Gabe	B	Mares	Production Technician – WC40
78	Reuben	H	D’sa	Production Supervisor – WC40
255	Gordon	L	Hee	Buyer
66	Karan	R	Khanna	Production Technician – WC60
270	FranÃ§ois	P	Ajenstat	Database Administrator
22	Sariya	E	Harnpadoungsataya	Marketing Specialist
161	Kirk	J	Koenigsbauer	Production Technician – WC45
124	Kim	T	Ralls	Stocker
10	Michael	NULL	Raheem	Research and Development Manager
248	Mike	K	Seamans	Accountant
175	Reed	T	Koch	Production Technician – WC30
155	Fadi	K	Fakhouri	Production Technician – WC20
144	Paul	R	Singh	Production Technician – WC20
192	Brenda	M	Diaz	Production Supervisor – WC40
166	Jack	S	Richins	Production Supervisor – WC30

Now let’s retrieve data from the same file, only this time around we’ll set the HDR option to YES, just to mix things up a bit:

SELECT *

FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',

'Excel 12.0 Xml; HDR=YES;

Database=C:\DataFiles\EmployeeData2.xlsx',

[vEmployee$]);

Not surprisingly, this can skewer our results even more, with the first row of data treated as column names, as shown in the following table:

259	Ben	T	Miller	Buyer
278	Garrett	R	Vargas	Sales Representative
204	Gabe	B	Mares	Production Technician – WC40
78	Reuben	H	D’sa	Production Supervisor – WC40
255	Gordon	L	Hee	Buyer
66	Karan	R	Khanna	Production Technician – WC60
270	FranÃ§ois	P	Ajenstat	Database Administrator
22	Sariya	E	Harnpadoungsataya	Marketing Specialist
161	Kirk	J	Koenigsbauer	Production Technician – WC45
124	Kim	T	Ralls	Stocker
10	Michael	NULL	Raheem	Research and Development Manager
248	Mike	K	Seamans	Accountant
175	Reed	T	Koch	Production Technician – WC30
155	Fadi	K	Fakhouri	Production Technician – WC20
144	Paul	R	Singh	Production Technician – WC20
192	Brenda	M	Diaz	Production Supervisor – WC40
166	Jack	S	Richins	Production Supervisor – WC30
112	John	P	Evans	Production Technician – WC50

What all this points to is that, when retrieving data from an Excel spreadsheet, it helps to know what the data looks; otherwise, you can end up with all sorts of unusual results.

“How do I import data from an Excel workbook directly into a new table?”

Once you’ve gotten down how to use the OPENROWSET function to retrieve data from an Excel spreadsheet, it’s a relatively easy to slip that data into a new table. (Refer to the previous two questions if you have any questions about using the function.) The simplest way to create the table on the fly is to use a SELECT...INTO statement to create the table when you import the data, as shown in the following example:

SELECT * INTO EmployeeInfo1

FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',

'Excel 12.0 Xml; HDR=YES;

Database=C:\DataFiles\EmployeeData1.xlsx',

[vEmployee$]);

All we’ve done here is specify that the database engine should create the EmployeeInfo1 table when we run the query. The data from the Excel spreadsheet will then be inserted into the new table.

“When I try to retrieve data from an Excel worksheet, I receive an error saying that the Ad Hoc Distributed Queries component is turned off. How do I enable that component?”

When you first try to retrieve data from an Excel spreadsheet, there’s a good chance you’ll run into this error; however, the issue is easily addressed. Let’s look at the problem in action. Suppose we try to run the following SELECT statement:

SELECT *

FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',

'Excel 12.0 Xml;

Database=C:\DataFiles\EmployeeData1.xlsx',

[vEmployee$]);

Nothing new here. As we saw in the previous example, we’re simply trying to retrieve the data from the EmployeeData1.xlsx spreadsheet. However, instead of returning the actual data, we get the following error:

Msg 15281, Level 16, State 1, Line 181

SQL Server blocked access to STATEMENT 'OpenRowset/OpenDatasource' of component 'Ad Hoc Distributed Queries' because this component is turned off as part of the security configuration for this server. A system administrator can enable the use of 'Ad Hoc Distributed Queries' by using sp_configure. For more information about enabling 'Ad Hoc Distributed Queries', search for 'Ad Hoc Distributed Queries' in SQL Server Books Online.

Unlike a lot of SQL Server error messages, this one is actually useful. All we need to do is use the sp_configure system stored procedure to enable the specified option. If you don’t have the necessary permissions, then get an administrator to do it. The only trick is to first enable the Show Advanced Options setting and then set the Ad Hoc Distributed Queries option, as shown in the following T-SQL script:

EXEC sp_configure 'Show Advanced Options', 1;

RECONFIGURE;

EXEC sp_configure 'Ad Hoc Distributed Queries', 1;

RECONFIGURE;

In each case, we’re setting the option to 1 to enable that option. Notice too that we must run a RECONFIGURE statement each time we set an option. When the statements run, you should see results similar to the following:

1 2	Configuration option 'show advanced options' changed from 0 to 1. Run the RECONFIGURE statement to install. Configuration option 'Ad Hoc Distributed Queries' changed from 0 to 1. Run the RECONFIGURE statement to install.

Once the options have been enabled, we should be able to run our SELECT statement and retrieve the Excel data.

“I’m importing data from an Excel spreadsheet. One of its columns contains mostly numeric data with a few alphanumeric values mixed in. When I run my query, SQL Server replaces the alphanumeric values with NULL. How do I preserve the alphanumeric values?”

When you retrieve data from an Excel spreadsheet, the OLE DB provider guesses at a column’s data type by sampling rows and going with the majority. If that column contains mostly numeric data, chances are, you’ll end up with a numeric column, in which case, your alphanumeric stragglers are returned as NULL values

For example, suppose we start with the following SELECT...INTO statement:

SELECT * INTO EmployeeInfo2

FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',

'Excel 12.0 Xml; HDR=YES;

Database=C:\DataFiles\EmployeeData1.xlsx',

[vEmployee$]);

All we’re doing is importing the Excel data directly into the new table, EmployeeInfo2. Now let’s go into Table Designer in SQL Server Management Studio (SSMS) to look at the data types assigned to the columns. To get there, right click the table in Object Explorer and then click Design. What pops up should look similar to the screenshot shown in Figure 1.

Figure 1: Data types assigned to the columns in the EmployeeInfo2 table

First off, notice that all the columns have been configured with either the float or nvarchar(255) data type. It does not matter that the numeric values contain no decimals or that the character data is all much shorter than 255 characters. This is what the provider does.

Of particular note is the PostalCode column. In the Excel spreadsheet, several employees come from Canada and England, both of which use postal codes based on alphanumeric values, such as T2P 2G8. The other 287 employees live in places with numeric postal codes. Because of how the OLE DB provider works, when we import the data into the EmployeeInfo2 table, the alphanumeric postal codes are replaced with NULL values.

Of course, losing data is never a good sign, but we can get around this problem by adding the IMEX option to our provider string and setting its value to 1, as shown in the following SELECT...INTO statement:

SELECT * INTO EmployeeInfo3

FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',

'Excel 12.0 Xml; HDR=YES; IMEX=1;

Database=C:\DataFiles\EmployeeData1.xlsx',

[vEmployee$]);

When we include the IMEX option and set it to 1, the OLE DB provider will import our problem column as intermixed data, thus avoiding the NULL replacements. We can now return to Table Designer (shown in Figure 2), where we’ll see that the PostalCode column is configured with the nvarchar data type.

Figure 2: Data types assigned to the columns in the EmployeeInfo3 table

If we were to retrieve data from the EmployeeInfo3 table, we’d find that the alphanumeric postal codes have been preserved.

“How do I import data from an Excel worksheet directly into an existing table in my SQL Server database?”

To import Excel data directly into an existing table, you can use an INSERT...SELECT statement along with the OPENROWSET function, like we’ve seen it used in previous examples. To demonstrate how this works, let’s start by creating the following table:

IF OBJECT_ID('EmployeeInfo4', 'U') IS NOT NULL

DROP TABLE EmployeeInfo4;

CREATE TABLE EmployeeInfo4

(

BusinessEntityID INT NOT NULL,

FirstName NVARCHAR(50) NOT NULL,

MiddleName NVARCHAR(50) NULL,

LastName NVARCHAR(50) NOT NULL,

JobTitle NVARCHAR(50) NOT NULL,

PhoneNumber NVARCHAR(25) NULL,

PhoneNumberType NVARCHAR(50) NULL,

EmailAddress NVARCHAR(50) NULL,

EmailPromotion INT NOT NULL,

AddressLine1 NVARCHAR(60) NOT NULL,

City NVARCHAR(30) NOT NULL,

StateProvinceName NVARCHAR(50) NOT NULL,

PostalCode NVARCHAR(15) NOT NULL,

CountryRegionName NVARCHAR(50) NOT NULL

)

Once we have our table, we can run the following INSERT...SELECT statement to import the data:

INSERT INTO EmployeeInfo4

SELECT *

FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',

'Excel 12.0 Xml; HDR=YES; IMEX=1;

Database=C:\DataFiles\EmployeeData1.xlsx',

[vEmployee$]);

Assuming that we have a spreadsheet whose data is relatively compatible with the target columns, the data should be implicitly converted, where necessary, and inserted into the table with no problem. However, if the data is not compatible, we’ll run into problems. For example, suppose we had created the table with a JobTitle column configured as nvarchar(5). If we then ran the INSERT...SELECT statement, we’d receive the following error:

Msg 8152, Level 16, State 13, Line 274

String or binary data would be truncated.

The statement has been terminated.

Again, you must be mindful when importing Excel data. In some cases, you might have to work a bit to get the data into the format you need.

“Can I import Excel data directly into a temporary table?”

Yes, you can import the data into a temporary table just like you can a regular table. In fact, you might find that using a temporary table is a good way to handle a spreadsheet that cannot be easily imported into your existing table. In such cases, you can stage the data in a temporary table in order to transform the data before inserting it into its final destination.

Let’s look at a simple example of how this might work. Suppose you have a spreadsheet that looks something like the one shown in Figure 3.

Figure 3: A partial view of the vEmployee spreadsheet of the EmployeeData3.xlsx file

Notice that we have extra rows at the top, a spreadsheet title (AdventureWorks Employees), and three cells merged into one (FULL NAME). The actual headings don’t come until line 5 and only after that do we get to the actual data.

We can start by pulling the data into a temporary table, as shown in the following SELECT...INTO statement:

SELECT * INTO #EmployeeInfo5

FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',

'Excel 12.0 Xml; HDR=YES; IMEX=1;

Database=C:\DataFiles\EmployeeData3.xlsx',

[vEmployee$]);

At this point, if we were to retrieve the data in #EmployeeInfo5, we would see an unusual mix of headers, as shown in the following table:

AdventureWorks Employees	F2	F3	F4	F5
NULL	NULL	NULL	NULL	NULL
NULL	FULL NAME	NULL	NULL	NULL
BusinessEntityID	FirstName	MiddleName	LastName	JobTitle
259	Ben	T	Miller	Buyer
278	Garrett	R	Vargas	Sales Representative
204	Gabe	B	Mares	Production Technician – WC40
78	Reuben	H	D’sa	Production Supervisor – WC40
255	Gordon	L	Hee	Buyer
66	Karan	R	Khanna	Production Technician – WC60
270	FranÃ§ois	P	Ajenstat	Database Administrator

Although the table shows only part of the results, it’s enough to demonstrate that the data is a bit out of whack. The column names are a mix of the spreadsheet title and default column names, the first two data rows are mostly NULL values (except for FULL NAME), and the third row contains columns names, rather than actual data.

However, once we have the data into a temporary table, we can then convert and filter the data as necessary, as shown in the following SELECT...INTO statement:

SELECT

CAST([AdventureWorks Employees] AS INT) AS EmployeeID,

CAST((F2 + ' ' + F4) AS NVARCHAR(100)) AS FullName,

CAST(F5 AS NVARCHAR(50)) AS JobTitle

INTO EmployeeInfo5

FROM #EmployeeInfo5

WHERE ISNUMERIC([AdventureWorks Employees]) = 1;

Notice that we’ve had to work with the data on its own terms, using a mix of columns names, functions, and expressions to transform the data before inserting it into the EmployeeInfo5 table. If we were then to retrieve the data from the table, we’d find results more like what we’re looking for, as shown in the following table:

EmployeeID	FullName	JobTitle
259	Ben Miller	Buyer
278	Garrett Vargas	Sales Representative
204	Gabe Mares	Production Technician – WC40
78	Reuben D’sa	Production Supervisor – WC40
255	Gordon Hee	Buyer
66	Karan Khanna	Production Technician – WC60
270	FranÃ§ois Ajenstat	Database Administrator
22	Sariya Harnpadoungsataya	Marketing Specialist
161	Kirk Koenigsbauer	Production Technician – WC45
124	Kim Ralls	Stocker

Although the table shows only some of the 290 rows now stored in EmployeeInfo5, it’s enough to demonstrate the quality of the data and how you can use a temporary table to get there.

“How do I limit the data I retrieve from an Excel spreadsheet to specific rows and columns?”

In the previous question, we looked at how to use a temporary table to transform our data. This approach provides a way to limit the rows and columns we ultimately import into our tables. However, we can achieve the same results directly through the OPENROWSET function.

Most of examples we’ve looked at so far have used the OPENROWSET function to retrieve all data from a specific spreadsheet. We specified that spreadsheet as the function’s last argument. However, we can instead specify a SELECT statement as the third argument:

SELECT *

FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',

'Excel 12.0 Xml; HDR=YES; IMEX=1;

Database=C:\DataFiles\EmployeeData1.xlsx',

'SELECT * FROM [vEmployee$]');

In this case, we’re still returning all the data from the vEmployee spreadsheet, only this time around we’re using a more formal approach, which essentially means we’re making more work for ourselves. The results are the same either way. However, the advantage of being able to do it this way is that we can create a more complex SELECT statement in order to filter out columns and rows, as well as do some other transforming along with way:

SELECT *

FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',

'Excel 12.0 Xml; HDR=YES; IMEX=1;

Database=C:\DataFiles\EmployeeData1.xlsx',

'SELECT

BusinessEntityID AS EmployeeID,

FirstName + '' '' + LastName AS FullName,

JobTitle

FROM [vEmployee$]

WHERE CountryRegionName = ''Canada''

ORDER BY BusinessEntityID');

Notice that our SELECT statement is still passed in as the third argument to OPENROWSET. Also, because the argument must be enclosed in single quotes, we escape the single quotes within the text by adding another single quote. Now our outer SELECT statement returns the following results:

EmployeeID	FullName	JobTitle
278	Garrett Vargas	Sales Representative
282	José Saraiva	Sales Representative

We can achieve the same results by putting all the logic in the outer SELECT statement and leaving the provider string alone:

SELECT

BusinessEntityID AS EmployeeID,

FirstName + ' ' + LastName AS FullName,

JobTitle

FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',

'Excel 12.0 Xml; HDR=YES; IMEX=1;

Database=C:\DataFiles\EmployeeData1.xlsx',

[vEmployee$]

)

WHERE CountryRegionName = 'Canada'

ORDER BY BusinessEntityID;

If you’re working with large quantities of data, you might see performance gains using one approach over the other. However, for the amount of data that’s usually in an Excel spreadsheet, this shouldn’t be much of an issue.

“How do I use a linked server to import data from an Excel worksheet?”

There might be times when you want to use a linked server to persist your connection to the Excel file. That way, you need to supply the connection information only once, making the queries themselves much simpler.

The first step is to use the sp_addlinkedserver system stored procedure to add the linked server to your SQL Server instance, as shown in the following example:

EXEC sp_addlinkedserver

@server = 'ExcelLinkSrv1',

@srvproduct = 'Excel',

@provider = 'Microsoft.ACE.OLEDB.12.0',

@datasrc = 'C:\DataFiles\EmployeeData1.xlsx',

@provstr = 'Excel 12.0 Xml; HDR=YES; IMEX=1';

Most of the information here should look familiar to you if you’ve been using OPENROWSET to query Excel data. First, we need to name the linked server, in this case, ExcelLinkSrv1, and then specify the product (Excel), the provider, the path to the Excel file, and the provider string options.

When it comes time to retrieve the Excel data, we need only specify the name of the linked server and the spreadsheet, as shown in the following example:

1	SELECT * FROM ExcelLinkSrv1...[vEmployee$];

The statement returns all data from the spreadsheet, as we saw with OPENROWSET. We can also use a linked server to import Excel data directly into a table, whether existing or new, just like we would normally do when inserting data.

“I’m using a linked server to import data from an Excel spreadsheet. Is there a way to limit the columns and rows I return when I query the data?”

When we discussed using the OPENROWSET function to retrieve Excel data, we demonstrated how to qualify our queries to filter out rows and columns. We can do the same thing when using a linked server. One way to do it is to use the OPENQUERY function in conjunction with the linked server. OPENQUERY is yet another SQL Server rowset function that lets us specify a pass-through query on the linked server, which in our case, is the Excel spreadsheet.

Let’s demonstrate how this works. First, we create our linked server:

EXEC sp_addlinkedserver

@server = 'ExcelLinkSrv2',

@srvproduct = 'Excel',

@provider = 'Microsoft.ACE.OLEDB.12.0',

@datasrc = 'C:\DataFiles\EmployeeData1.xlsx',

@provstr = 'Excel 12.0 Xml; HDR=YES; IMEX=1';

Once we’ve set up the linked server, we’re good to go. In our SELECT statement, we include the OPENQUERY function in the FROM clause, just like OPENROWSET. When we call OPENQUERY, we pass in two arguments, the name of the linked server and our SELECT statement, as shown in the following example:

SELECT * FROM OPENQUERY(ExcelLinkSrv2,

'SELECT

BusinessEntityID AS EmployeeID,

FirstName + '' '' + LastName AS FullName,

JobTitle

FROM [vEmployee$]

WHERE CountryRegionName = ''Canada''

ORDER BY BusinessEntityID');

As expected, our first argument is ExcelLinkSrv2 and our second argument is a SELECT statement that limits the number of columns and rows and concatenates several values. The following table shows our results:

EmployeeID	FullName	JobTitle
278	Garrett Vargas	Sales Representative
282	José Saraiva	Sales Representative

We can return the same results by skipping the OPENQUERY function and including our logic in the outer SELECT statement:

SELECT

BusinessEntityID AS EmployeeID,

FirstName + ' ' + LastName AS FullName,

JobTitle

FROM ExcelLinkSrv2...[vEmployee$]

WHERE CountryRegionName = 'Canada'

ORDER BY BusinessEntityID;

As with OPENROWSET, you might want to take performance into consideration when deciding which strategy to use. For small files, however, it shouldn’t be much of an issue.

“I’ve created a linked server that connects to an Excel workbook file and need to be able to view a list of spreadsheets within the workbook. Is there a way to retrieve this information?”

Yes, SQL Server provides at least one method for retrieving the spreadsheet names-the sp_tables_ex system stored procedure. To demonstrate how this works, let’s start with the following linked server:

EXEC sp_addlinkedserver

@server = 'ExcelLinkSrv3',

@srvproduct = 'Excel',

@provider = 'Microsoft.ACE.OLEDB.12.0',

@datasrc = 'C:\DataFiles\EmployeeData1.xlsx',

@provstr = 'Excel 12.0 Xml; HDR=YES; IMEX=1';

We can now use the sp_tables_ex system stored procedure to retrieve information about the target data source defined in the ExcelLinkSrv3 linked server, which in this case is the EmployeeData1.xlsx file. To retrieve the data, we simply run the stored procedure and specify the linked server:

1	EXEC sp_tables_ex 'ExcelLinkSrv3'

In this case, the stored procedure returns the results shown in the following table:

TABLE_CAT	TABLE_SCHEM	TABLE_NAME	TABLE_TYPE	REMARKS
NULL	NULL	Emp2$	TABLE	NULL
NULL	NULL	Emp3$	TABLE	NULL
NULL	NULL	vEmployee$	TABLE	NULL

Because we’re dealing with an Excel file, the only column we’re interested in is TABLE_NAME, which lists the names of the spreadsheets in that workbook (three, in this case). If we want to get at the spreadsheet names only, we have to come up with a workaround, such as creating a table variable to hold the data and then extracting the column from the variable, as shown in the following example:

DECLARE @worksheets TABLE

(

TableCat nvarchar(50),

TableSchem nvarchar(50),

WorkSheet nvarchar(50),

TableType nvarchar(32),

Remarks nvarchar(254)

);

INSERT INTO @worksheets

EXEC sp_tables_ex 'ExcelLinkSrv3';

SELECT WorkSheet FROM @worksheets;

Pretty straightforward. We define the variable, insert data into the variable, and retrieve the column that contains the spreadsheet names. The SELECT statement returns the following results:

WorkSheet

Emp2$

Emp3$

vEmployee$

Another approach is to create a SELECT statement that uses the OPENQUERY function to run the stored procedure:

SELECT TABLE_NAME AS WorkSheet

FROM OPENQUERY([srvXYZ\sqlsrv2014],

'EXEC sp_tables_ex ''ExcelLinkSrv3''');

This approach gets around the limitation in SQL Server of not being able to run a procedure directly within a SELECT statement. All I’ve done here is to connect to a local instance of SQL Server 2014 and pass in the EXECUTE statement as the function’s second argument. The statement returns the same results as the SELECT statement in the preceding example.

To use this method, we might need to configure the SQL Server instance for data access. We can use the sp_serveroption system stored procedure to set the Data Access option to TRUE, as shown in the following example:

1	EXEC sp_serveroption 'srvXYZ\sqlsrv2014', 'Data Access', TRUE;

This approach makes it easier to retrieve only the spreadsheet names, as long as you’re okay with the option being enabled. According to Microsoft documentation, Data Access lets you enable and disable a linked server for distributed query access. However, you can also use it on a local instance of SQL Server, as we’ve done here, to let you use the OPENQUERY function in this way.

That said, one of the advantages of returning the list of columns in a table variable (or temporary table) is that you can create a WHILE loop to iterate through the list in order to retrieve data from multiple spreadsheets.

“How do I import Excel data through a linked server if the spreadsheet contains no header columns?”

If you know how to control the header setting when using the OPENROWSET function to perform ad hoc queries, you’ll have no problem controlling the setting if working with a linked server. When you create your linked server, simply set the HDR option to NO, as shown in the following example:

EXEC sp_addlinkedserver

@server = 'ExcelLinkSrv4',

@srvproduct = 'Excel',

@provider = 'Microsoft.ACE.OLEDB.12.0',

@datasrc = 'C:\DataFiles\EmployeeData2.xlsx',

@provstr = 'Excel 12.0 Xml; HDR=NO; IMEX=1';

Once we’ve created our linked server, we can query the spreadsheets in the Excel file. However, we must use the default column names when referencing the Excel fields:

SELECT * FROM OPENQUERY(ExcelLinkSrv4,

'SELECT

F1 AS EmployeeID,

F2 + '' '' + F4 AS FullName,

F5 AS JobTitle

FROM [vEmployee$]

WHERE F14 = ''Canada''

ORDER BY F1');

Notice that we’re using the F1, F2, F4, F5, and F14 column names, just as we would regular names. The SELECT statement returns the results in the following table:

EmployeeID	FullName	JobTitle
278	Garrett Vargas	Sales Representative
282	José Saraiva	Sales Representative

Of course, if the spreadsheet does include a header row at the top, we can simple set the HDR option to YES.

“How do I join data in a SQL Server table with data from an Excel workbook?”

When we use a SQL Server rowset function to retrieve data from an Excel spreadsheet, we can include the function as part of a join condition, as we would a regular table. For example, if we want to run an ad hoc distributed query, we can join the OPENROWSET function to a database table or view, as shown in the following SELECT statement:

SELECT

Emp1.BusinessEntityID AS EmpID,

Emp1.FirstName + ' ' + Emp1.LastName AS FullName,

Emp2.JobTitle,

Emp2.CountryRegionName AS CountryRegion

FROM

HumanResources.vEmployee AS Emp1 INNER JOIN

OPENROWSET('Microsoft.ACE.OLEDB.12.0',

'Excel 12.0 Xml; HDR=YES; IMEX=1;

Database=C:\DataFiles\EmployeeData1.xlsx',

[vEmployee$]) AS Emp2

ON Emp1.BusinessEntityID = Emp2.BusinessEntityID

WHERE Emp2.CountryRegionName <> 'United States'

ORDER BY Emp2.CountryRegionName;

The OPENROWSET function is basically returning a table that we’ve aliased as Emp2. We’re joining the Emp2 table to the vEmployee view in the AdventureWorks2014 database, which we’ve aliased as Emp1. The join is based on the BusinessEntityID column in each data source. The following table shows the SELECT statement’s results:

EmployeeID	FullName	JobTitle	CountryRegion
286	Lynn Tsoflias	Sales Representative	Australia
282	José Saraiva	Sales Representative	Canada
278	Garrett Vargas	Sales Representative	Canada
290	Ranjit Varkey Chudukatil	Sales Representative	France
288	Rachel Valdez	Sales Representative	Germany

Of course, this was an easy join to do because I created the spreadsheet by exporting the data through that view. However, the more the two data sources differ and the odder the spreadsheet’s structure, the more work you’ll have to do to join the tables. If it gets too bad, you might have to come up with a different approach, such as importing the Excel data into a temporary table and then joining that to the regular table.

Few issues cause as much consternation when trying to import Excel data as trying to get the provider right. For whatever reason, Microsoft does not make this an easy process, despite that fact that Microsoft builds SQL Server and Excel and an assortment of other data-driven technologies that they’re constantly hawking as the ultimate unified platform for integrated business intelligence. Given all the hype, you would think Microsoft would go through great pains to ensure that importing Excel data is as painless a process as possible. It does not.

In fact, SQL Server can be quite picky when it comes to importing Excel data. We’ve already discussed the issue of the provider itself, but let me reiterate. When importing data, you must use one of the following ADO providers:

Microsoft.Jet.OLEDB.4.0: Use on SQL Server 32-bit editions for Excel 2003 files.
Microsoft.ACE.OLEDB.12.0: Use on SQL Server 32-bit editions for Excel 2007 files or on SQL Server 64-bit editions for any Excel files.

There are other issues as well, too many to cover here, but I’ll cover what I did on my machine in order to import Excel data into SQL Server from an .xlsx file.

First, I tried to install the 64-bit version of the ACE OLE DB provider because I’m running the 64-bit version of SQL Server. I discovered that I could not do so because the 32-bit version of Office 2010 was installed on my system. You can’t install the 64-bit provider on a computer running 32-bit Office. So Office got axed.

After I uninstalled Office, I installed the driver. I then ran the following T-SQL statements:

EXEC sp_msset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', 'AllowInProcess', 1;

EXEC sp_msset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', 'DynamicParameters', 1;

The sp_msset_oledb_prop stored procedure lets you set properties associated with the ACE OLE DB provider. In this case, I enabled the AllowInProcess and DynamicParameters properties. You can also enable the properties through Object Explorer in SSMS by accessing the provider’s properties. The properties need to be turned on in order to run distributed queries.

Also, in case you haven’t already done so, you need to run the following statements to ensure that you can run ad hoc distributed queries (which we covered earlier):

EXEC sp_configure 'Show Advanced Options', 1;

RECONFIGURE;

EXEC sp_configure 'Ad Hoc Distributed Queries', 1;

RECONFIGURE;

In addition to all this, I had to restart my computer a time or two. I can’t recall for sure the order of events, but eventually I got everything working.

On the surface, this doesn’t seem so bad, but getting to this point was no easy task, in part because I was resistant to the idea that I had to uninstall Office 2010. (OpenOffice here I come.) I believe there are some other workarounds out there, but I finally grew weary of the process and opted to sacrifice Office.

So that’s my story. The problems you might bump into can very depending on whether you’re running a 32-bit version or 64-bit version of SQL Server and whether you’re retrieving Excel 2007 files or Excel 2003 files. In some cases, you might also have to adjust the permissions in the appropriate temp folder to get everything working. Which temp folder? It depends. It could be the one set up for the NetworkService account or the one set up for LocalService account or another temp folder altogether. Again, it depends on your particular configuration.

Here’s one place you might start looking for answers if your run into a roadblock: “Importing an Excel Spreadsheet into a SQL Server Database.” If that doesn’t help, there’s always Google.

Register for Simple Talk

Questions About Using TSQL to Import Excel Data You Were Too Shy to Ask

“How do I write an ad hoc TSQL query to retrieve data from an Excel spreadsheet?”

“How do I retrieve data from an Excel spreadsheet that does not include a header row?”

“How do I import data from an Excel workbook directly into a new table?”

“When I try to retrieve data from an Excel worksheet, I receive an error saying that the Ad Hoc Distributed Queries component is turned off. How do I enable that component?”

“I’m importing data from an Excel spreadsheet. One of its columns contains mostly numeric data with a few alphanumeric values mixed in. When I run my query, SQL Server replaces the alphanumeric values with NULL. How do I preserve the alphanumeric values?”

“How do I import data from an Excel worksheet directly into an existing table in my SQL Server database?”

“Can I import Excel data directly into a temporary table?”

“How do I limit the data I retrieve from an Excel spreadsheet to specific rows and columns?”

“How do I use a linked server to import data from an Excel worksheet?”

“I’m using a linked server to import data from an Excel spreadsheet. Is there a way to limit the columns and rows I return when I query the data?”

“I’ve created a linked server that connects to an Excel workbook file and need to be able to view a list of spreadsheets within the workbook. Is there a way to retrieve this information?”

“How do I import Excel data through a linked server if the spreadsheet contains no header columns?”

“How do I join data in a SQL Server table with data from an Excel workbook?”

Article tags

About the author

Robert Sheldon

Robert's contributions

Articles

Books

Top topics

Robert's latest contributions:

Getting started with Azure Database for PostgreSQL Flexible Server

Moving from SQL to MongoDB’s MQL

Working with the MongoDB Database Tools

“How do I write an ad hoc TSQL query to retrieve data from an Excel spreadsheet?”

“How do I retrieve data from an Excel spreadsheet that does not include a header row?”

“How do I import data from an Excel workbook directly into a new table?”

“When I try to retrieve data from an Excel worksheet, I receive an error saying that the Ad Hoc Distributed Queries component is turned off. How do I enable that component?”

“I’m importing data from an Excel spreadsheet. One of its columns contains mostly numeric data with a few alphanumeric values mixed in. When I run my query, SQL Server replaces the alphanumeric values with NULL. How do I preserve the alphanumeric values?”

“How do I import data from an Excel worksheet directly into an existing table in my SQL Server database?”

“Can I import Excel data directly into a temporary table?”

“How do I limit the data I retrieve from an Excel spreadsheet to specific rows and columns?”

“How do I use a linked server to import data from an Excel worksheet?”

“I’m using a linked server to import data from an Excel spreadsheet. Is there a way to limit the columns and rows I return when I query the data?”

“I’ve created a linked server that connects to an Excel workbook file and need to be able to view a list of spreadsheets within the workbook. Is there a way to retrieve this information?”

“How do I import Excel data through a linked server if the spreadsheet contains no header columns?”

“How do I join data in a SQL Server table with data from an Excel workbook?”

Article tags

Recommended

About the author

Robert Sheldon

Robert's contributions

Articles

Books

Top topics

Robert's latest contributions:

Getting started with Azure Database for PostgreSQL Flexible Server

Moving from SQL to MongoDB’s MQL

Working with the MongoDB Database Tools