- “How do I write an ad hoc query to retrieve data from an Excel spreadsheet?”
- “How do I retrieve data from an Excel spreadsheet that does not include a header row?”
- “How do I import data from an Excel workbook directly into a new table?”
- “When I try to retrieve data from an Excel worksheet, I receive an error saying that the Ad Hoc Distributed Queries component is turned off. How do I enable that component?”
- “I’m importing data from an Excel spreadsheet. One of its columns contains mostly numeric data with a few alphanumeric values mixed in. When I run my query, SQL Server replaces the alphanumeric values with NULL. How do I preserve the alphanumeric values?”
- “How do I import data from an Excel worksheet directly into an existing table in my SQL Server database?”
- “Can I import Excel data directly into a temporary table?”
- “How do I limit the data I retrieve from an Excel spreadsheet to specific rows and columns?”
- “How do I use a linked server to import data from an Excel worksheet?”
- “I’m using a linked server to import data from an Excel spreadsheet. Is there a way to limit the columns and rows I return when I query the data?”
- “I’ve created a linked server that connects to an Excel workbook file and need to be able to view a list of spreadsheets within the workbook. Is there a way to retrieve this information?”
- “How do I import Excel data through a linked server if the spreadsheet contains no header columns?”
- “How do I join data in a SQL Server table with data from an Excel workbook?”
- “When I try to import Excel data into SQL Server, I keep running into errors related to the OLE DB provider. I’ve received errors about not being able to initialize the provider, not being able to create an instance of the provider, and not being able to install a 64-bit version of the provider. Can you help?”
“How do I write an ad hoc TSQL query to retrieve data from an Excel spreadsheet?”
The simplest way to query Excel data is to use the OPENROWSET
function to connect directly to the Excel file. The OPENROWSET
function is a SQL Server built-in rowset function (nondeterministic) that lets you connect to an OLE DB data source. The function serves as a one-time alternative to creating a linked server.
The OPENROWSET
function returns data in a tabular format, which means we can use the function in a query’s FROM
clause as we would any table expression. For example, the following SELECT
statement retrieves the data on the vEmployee
spreadsheet in the EmployeeData1.xmls
file:
1 2 3 4 5 |
SELECT * FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0', 'Excel 12.0 Xml; Database=C:\DataFiles\EmployeeData1.xlsx', [vEmployee$]); |
When calling the OPENROWSET
function to retrieve Excel data, we must pass several arguments into the function. The first is the name of the provider, in this case, Microsoft.ACE.OLEDB.12.0
. We can choose from one of two providers:
Microsoft.Jet.OLEDB.4.0
: Use on SQL Server 32-bit editions for Excel 2003 files (or earlier).Microsoft.ACE.OLEDB.12.0
: Use on SQL Server 32-bit editions for Excel 2007 files (or later) or on SQL Server 64-bit editions for any Excel files.
The second OPENROWSET
argument defines the provider string and includes multiple parts, divided by semi-colons. The provider string in the example above is divided into two parts. The first one specifies the file type, in this case, Excel
12.0
Xml
. Here’s the current breakdown on file types:
- For Excel ’97-2003 (.xls) files, use
Excel
8.0
. - For Excel 2007-2010 (.xlsx) files, use
Excel
12.0
Xml
. - For Excel 2007-2010 macro-enabled (.xlsm) files, use
Excel
12.0
Macro
. - For Excel 2007-2010 non-XML binary (.xlsb) files, use
Excel
12.0
.
The second part of the argument is the Database
option, which provides the file’s path and file name (C:\DataFiles\EmployeeData1.xlsx
). This of course will vary depending on the file’s location. Be aware, however, that when accessing files on a remote server, the permissions issue can get tricky. You might find it easier to copy the file to the local system before importing the data into SQL Server.
The third argument we pass into the OPENROWSET
function is the name of the spreadsheet we want to access. Notice that we need to add a dollar sign ($
) to the end of the name and enclose the name in brackets, as in [vEmployee$]
. (The spreadsheet received this name because the data comes from the vEmployee
view in the AdventureWorks2014
sample database.)
That’s all there is to retrieving data from an Excel spreadsheet. The query above returns 290 rows of employee data. We can also use the OPENDATASOURCE
function to achieve the same results. The OPENDATASOURCE
function is another SQL Server rowset function that can be used in place of a linked server, as shown in the following example:
1 2 3 4 |
SELECT * FROM OPENDATASOURCE('Microsoft.ACE.OLEDB.12.0', 'Data Source=C:\DataFiles\EmployeeData1.xlsx; Extended Properties=Excel 12.0 Xml')...[vEmployee$]; |
As you can see, the arguments we pass into the OPENDATASOURCE
function are structured a bit differently from those passed into the OPENROWSET
function, but the results are the same. Although the OPENDATASOURCE
and OPENROWSET
functions serve different purposes, when used to retrieve Excel data in this way, they basically work the same way.
“How do I retrieve data from an Excel spreadsheet that does not include a header row?”
By default, when you retrieve data from an Excel file, SQL Server assumes that the first row in the spreadsheet is a header that contains column names. However, you can include an option in your provider string that determines whether or not the first row should be treated as a header. That option, HDR
, can be set to YES
or NO
, with YES
being the default. The following SELECT
statement specifies the default value:
1 2 3 4 5 6 |
SELECT * FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0', 'Excel 12.0 Xml; HDR=YES; Database=C:\DataFiles\EmployeeData1.xlsx', [vEmployee$]); |
The statement returns the same 290 rows that were returned in the previous examples. However, if we want to specify that no header row exists, we can change the setting to NO
:
1 2 3 4 5 6 |
SELECT * FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0', 'Excel 12.0 Xml; HDR=NO; Database=C:\DataFiles\EmployeeData1.xlsx', [vEmployee$]); |
When using the HDR
option, you have to be careful that you don’t inadvertently assume there’s no header row when there actually is, or vice versa. In this case, we assumed incorrectly. The following table shows part of the results returned by the SELECT
statement:
F1 |
F2 |
F3 |
F4 |
F5 |
NULL |
FirstName |
MiddleName |
LastName |
JobTitle |
259 |
Ben |
T |
Miller |
Buyer |
278 |
Garrett |
R |
Vargas |
Sales Representative |
204 |
Gabe |
B |
Mares |
Production Technician – WC40 |
78 |
Reuben |
H |
D’sa |
Production Supervisor – WC40 |
255 |
Gordon |
L |
Hee |
Buyer |
66 |
Karan |
R |
Khanna |
Production Technician – WC60 |
270 |
François |
P |
Ajenstat |
Database Administrator |
22 |
Sariya |
E |
Harnpadoungsataya |
Marketing Specialist |
161 |
Kirk |
J |
Koenigsbauer |
Production Technician – WC45 |
124 |
Kim |
T |
Ralls |
Stocker |
10 |
Michael |
NULL |
Raheem |
Research and Development Manager |
248 |
Mike |
K |
Seamans |
Accountant |
175 |
Reed |
T |
Koch |
Production Technician – WC30 |
155 |
Fadi |
K |
Fakhouri |
Production Technician – WC20 |
144 |
Paul |
R |
Singh |
Production Technician – WC20 |
192 |
Brenda |
M |
Diaz |
Production Supervisor – WC40 |
The problem, as you can see, is that the column names contained in the spreadsheet are treated like any other row of data and show up here as the first row. Notice, too, that SQL Server automatically uses the default Excel column names, F1
, F2
, F3
, etc. Now let’s retrieve data from a spreadsheet that actually is header-less:
1 2 3 4 5 |
SELECT * FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0', 'Excel 12.0 Xml; HDR=NO; Database=C:\DataFiles\EmployeeData2.xlsx', [vEmployee$]); |
This time around, we’re retrieving data from the EmployeeData2.xmlx
file, which lists the employee data without the column names. The following table shows part of the results now returned by our statement:
F1 |
F2 |
F3 |
F4 |
F5 |
259 |
Ben |
T |
Miller |
Buyer |
278 |
Garrett |
R |
Vargas |
Sales Representative |
204 |
Gabe |
B |
Mares |
Production Technician – WC40 |
78 |
Reuben |
H |
D’sa |
Production Supervisor – WC40 |
255 |
Gordon |
L |
Hee |
Buyer |
66 |
Karan |
R |
Khanna |
Production Technician – WC60 |
270 |
François |
P |
Ajenstat |
Database Administrator |
22 |
Sariya |
E |
Harnpadoungsataya |
Marketing Specialist |
161 |
Kirk |
J |
Koenigsbauer |
Production Technician – WC45 |
124 |
Kim |
T |
Ralls |
Stocker |
10 |
Michael |
NULL |
Raheem |
Research and Development Manager |
248 |
Mike |
K |
Seamans |
Accountant |
175 |
Reed |
T |
Koch |
Production Technician – WC30 |
155 |
Fadi |
K |
Fakhouri |
Production Technician – WC20 |
144 |
Paul |
R |
Singh |
Production Technician – WC20 |
192 |
Brenda |
M |
Diaz |
Production Supervisor – WC40 |
166 |
Jack |
S |
Richins |
Production Supervisor – WC30 |
Now let’s retrieve data from the same file, only this time around we’ll set the HDR
option to YES
, just to mix things up a bit:
1 2 3 4 5 |
SELECT * FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0', 'Excel 12.0 Xml; HDR=YES; Database=C:\DataFiles\EmployeeData2.xlsx', [vEmployee$]); |
Not surprisingly, this can skewer our results even more, with the first row of data treated as column names, as shown in the following table:
259 |
Ben |
T |
Miller |
Buyer |
278 |
Garrett |
R |
Vargas |
Sales Representative |
204 |
Gabe |
B |
Mares |
Production Technician – WC40 |
78 |
Reuben |
H |
D’sa |
Production Supervisor – WC40 |
255 |
Gordon |
L |
Hee |
Buyer |
66 |
Karan |
R |
Khanna |
Production Technician – WC60 |
270 |
François |
P |
Ajenstat |
Database Administrator |
22 |
Sariya |
E |
Harnpadoungsataya |
Marketing Specialist |
161 |
Kirk |
J |
Koenigsbauer |
Production Technician – WC45 |
124 |
Kim |
T |
Ralls |
Stocker |
10 |
Michael |
NULL |
Raheem |
Research and Development Manager |
248 |
Mike |
K |
Seamans |
Accountant |
175 |
Reed |
T |
Koch |
Production Technician – WC30 |
155 |
Fadi |
K |
Fakhouri |
Production Technician – WC20 |
144 |
Paul |
R |
Singh |
Production Technician – WC20 |
192 |
Brenda |
M |
Diaz |
Production Supervisor – WC40 |
166 |
Jack |
S |
Richins |
Production Supervisor – WC30 |
112 |
John |
P |
Evans |
Production Technician – WC50 |
What all this points to is that, when retrieving data from an Excel spreadsheet, it helps to know what the data looks; otherwise, you can end up with all sorts of unusual results.
“How do I import data from an Excel workbook directly into a new table?”
Once you’ve gotten down how to use the OPENROWSET
function to retrieve data from an Excel spreadsheet, it’s a relatively easy to slip that data into a new table. (Refer to the previous two questions if you have any questions about using the function.) The simplest way to create the table on the fly is to use a SELECT...INTO
statement to create the table when you import the data, as shown in the following example:
1 2 3 4 5 |
SELECT * INTO EmployeeInfo1 FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0', 'Excel 12.0 Xml; HDR=YES; Database=C:\DataFiles\EmployeeData1.xlsx', [vEmployee$]); |
All we’ve done here is specify that the database engine should create the EmployeeInfo1
table when we run the query. The data from the Excel spreadsheet will then be inserted into the new table.
“When I try to retrieve data from an Excel worksheet, I receive an error saying that the Ad Hoc Distributed Queries component is turned off. How do I enable that component?”
When you first try to retrieve data from an Excel spreadsheet, there’s a good chance you’ll run into this error; however, the issue is easily addressed. Let’s look at the problem in action. Suppose we try to run the following SELECT
statement:
1 2 3 4 5 |
SELECT * FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0', 'Excel 12.0 Xml; Database=C:\DataFiles\EmployeeData1.xlsx', [vEmployee$]); |
Nothing new here. As we saw in the previous example, we’re simply trying to retrieve the data from the EmployeeData1.xlsx
spreadsheet. However, instead of returning the actual data, we get the following error:
1 2 |
Msg 15281, Level 16, State 1, Line 181 SQL Server blocked access to STATEMENT 'OpenRowset/OpenDatasource' of component 'Ad Hoc Distributed Queries' because this component is turned off as part of the security configuration for this server. A system administrator can enable the use of 'Ad Hoc Distributed Queries' by using sp_configure. For more information about enabling 'Ad Hoc Distributed Queries', search for 'Ad Hoc Distributed Queries' in SQL Server Books Online. |
Unlike a lot of SQL Server error messages, this one is actually useful. All we need to do is use the sp_configure
system stored procedure to enable the specified option. If you don’t have the necessary permissions, then get an administrator to do it. The only trick is to first enable the Show
Advanced
Options
setting and then set the Ad
Hoc
Distributed
Queries
option, as shown in the following T-SQL script:
1 2 3 4 5 6 7 |
EXEC sp_configure 'Show Advanced Options', 1; RECONFIGURE; GO EXEC sp_configure 'Ad Hoc Distributed Queries', 1; RECONFIGURE; GO |
In each case, we’re setting the option to 1
to enable that option. Notice too that we must run a RECONFIGURE
statement each time we set an option. When the statements run, you should see results similar to the following:
1 2 |
Configuration option 'show advanced options' changed from 0 to 1. Run the RECONFIGURE statement to install. Configuration option 'Ad Hoc Distributed Queries' changed from 0 to 1. Run the RECONFIGURE statement to install. |
Once the options have been enabled, we should be able to run our SELECT
statement and retrieve the Excel data.
“I’m importing data from an Excel spreadsheet. One of its columns contains mostly numeric data with a few alphanumeric values mixed in. When I run my query, SQL Server replaces the alphanumeric values with NULL. How do I preserve the alphanumeric values?”
When you retrieve data from an Excel spreadsheet, the OLE DB provider guesses at a column’s data type by sampling rows and going with the majority. If that column contains mostly numeric data, chances are, you’ll end up with a numeric column, in which case, your alphanumeric stragglers are returned as NULL
values
For example, suppose we start with the following SELECT...INTO
statement:
1 2 3 4 5 |
SELECT * INTO EmployeeInfo2 FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0', 'Excel 12.0 Xml; HDR=YES; Database=C:\DataFiles\EmployeeData1.xlsx', [vEmployee$]); |
All we’re doing is importing the Excel data directly into the new table, EmployeeInfo2
. Now let’s go into Table Designer in SQL Server Management Studio (SSMS) to look at the data types assigned to the columns. To get there, right click the table in Object
Explorer
and then click Design
. What pops up should look similar to the screenshot shown in Figure 1.
First off, notice that all the columns have been configured with either the float
or nvarchar(255)
data type. It does not matter that the numeric values contain no decimals or that the character data is all much shorter than 255 characters. This is what the provider does.
Of particular note is the PostalCode
column. In the Excel spreadsheet, several employees come from Canada and England, both of which use postal codes based on alphanumeric values, such as T2P
2G8
. The other 287 employees live in places with numeric postal codes. Because of how the OLE DB provider works, when we import the data into the EmployeeInfo2
table, the alphanumeric postal codes are replaced with NULL
values.
Of course, losing data is never a good sign, but we can get around this problem by adding the IMEX
option to our provider string and setting its value to 1
, as shown in the following SELECT...INTO
statement:
1 2 3 4 5 |
SELECT * INTO EmployeeInfo3 FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0', 'Excel 12.0 Xml; HDR=YES; IMEX=1; Database=C:\DataFiles\EmployeeData1.xlsx', [vEmployee$]); |
When we include the IMEX
option and set it to 1
, the OLE DB provider will import our problem column as intermixed data, thus avoiding the NULL
replacements. We can now return to Table Designer (shown in Figure 2), where we’ll see that the PostalCode
column is configured with the nvarchar
data type.
If we were to retrieve data from the EmployeeInfo3
table, we’d find that the alphanumeric postal codes have been preserved.
“How do I import data from an Excel worksheet directly into an existing table in my SQL Server database?”
To import Excel data directly into an existing table, you can use an INSERT...SELECT
statement along with the OPENROWSET
function, like we’ve seen it used in previous examples. To demonstrate how this works, let’s start by creating the following table:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
IF OBJECT_ID('EmployeeInfo4', 'U') IS NOT NULL DROP TABLE EmployeeInfo4; GO CREATE TABLE EmployeeInfo4 ( BusinessEntityID INT NOT NULL, FirstName NVARCHAR(50) NOT NULL, MiddleName NVARCHAR(50) NULL, LastName NVARCHAR(50) NOT NULL, JobTitle NVARCHAR(50) NOT NULL, PhoneNumber NVARCHAR(25) NULL, PhoneNumberType NVARCHAR(50) NULL, EmailAddress NVARCHAR(50) NULL, EmailPromotion INT NOT NULL, AddressLine1 NVARCHAR(60) NOT NULL, City NVARCHAR(30) NOT NULL, StateProvinceName NVARCHAR(50) NOT NULL, PostalCode NVARCHAR(15) NOT NULL, CountryRegionName NVARCHAR(50) NOT NULL ) |
Once we have our table, we can run the following INSERT...SELECT
statement to import the data:
1 2 3 4 5 6 |
INSERT INTO EmployeeInfo4 SELECT * FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0', 'Excel 12.0 Xml; HDR=YES; IMEX=1; Database=C:\DataFiles\EmployeeData1.xlsx', [vEmployee$]); |
Assuming that we have a spreadsheet whose data is relatively compatible with the target columns, the data should be implicitly converted, where necessary, and inserted into the table with no problem. However, if the data is not compatible, we’ll run into problems. For example, suppose we had created the table with a JobTitle
column configured as nvarchar(5)
. If we then ran the INSERT...SELECT
statement, we’d receive the following error:
1 2 3 |
Msg 8152, Level 16, State 13, Line 274 String or binary data would be truncated. The statement has been terminated. |
Again, you must be mindful when importing Excel data. In some cases, you might have to work a bit to get the data into the format you need.
“Can I import Excel data directly into a temporary table?”
Yes, you can import the data into a temporary table just like you can a regular table. In fact, you might find that using a temporary table is a good way to handle a spreadsheet that cannot be easily imported into your existing table. In such cases, you can stage the data in a temporary table in order to transform the data before inserting it into its final destination.
Let’s look at a simple example of how this might work. Suppose you have a spreadsheet that looks something like the one shown in Figure 3.
Notice that we have extra rows at the top, a spreadsheet title (AdventureWorks
Employees
), and three cells merged into one (FULL
NAME
). The actual headings don’t come until line 5 and only after that do we get to the actual data.
We can start by pulling the data into a temporary table, as shown in the following SELECT...INTO
statement:
1 2 3 4 5 |
SELECT * INTO #EmployeeInfo5 FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0', 'Excel 12.0 Xml; HDR=YES; IMEX=1; Database=C:\DataFiles\EmployeeData3.xlsx', [vEmployee$]); |
At this point, if we were to retrieve the data in #EmployeeInfo5
, we would see an unusual mix of headers, as shown in the following table:
AdventureWorks Employees |
F2 |
F3 |
F4 |
F5 |
NULL |
NULL |
NULL |
NULL |
NULL |
NULL |
FULL NAME |
NULL |
NULL |
NULL |
BusinessEntityID |
FirstName |
MiddleName |
LastName |
JobTitle |
259 |
Ben |
T |
Miller |
Buyer |
278 |
Garrett |
R |
Vargas |
Sales Representative |
204 |
Gabe |
B |
Mares |
Production Technician – WC40 |
78 |
Reuben |
H |
D’sa |
Production Supervisor – WC40 |
255 |
Gordon |
L |
Hee |
Buyer |
66 |
Karan |
R |
Khanna |
Production Technician – WC60 |
270 |
François |
P |
Ajenstat |
Database Administrator |
Although the table shows only part of the results, it’s enough to demonstrate that the data is a bit out of whack. The column names are a mix of the spreadsheet title and default column names, the first two data rows are mostly NULL
values (except for FULL
NAME
), and the third row contains columns names, rather than actual data.
However, once we have the data into a temporary table, we can then convert and filter the data as necessary, as shown in the following SELECT...INTO
statement:
1 2 3 4 5 6 7 |
SELECT CAST([AdventureWorks Employees] AS INT) AS EmployeeID, CAST((F2 + ' ' + F4) AS NVARCHAR(100)) AS FullName, CAST(F5 AS NVARCHAR(50)) AS JobTitle INTO EmployeeInfo5 FROM #EmployeeInfo5 WHERE ISNUMERIC([AdventureWorks Employees]) = 1; |
Notice that we’ve had to work with the data on its own terms, using a mix of columns names, functions, and expressions to transform the data before inserting it into the EmployeeInfo5
table. If we were then to retrieve the data from the table, we’d find results more like what we’re looking for, as shown in the following table:
EmployeeID |
FullName |
JobTitle |
259 |
Ben Miller |
Buyer |
278 |
Garrett Vargas |
Sales Representative |
204 |
Gabe Mares |
Production Technician – WC40 |
78 |
Reuben D’sa |
Production Supervisor – WC40 |
255 |
Gordon Hee |
Buyer |
66 |
Karan Khanna |
Production Technician – WC60 |
270 |
François Ajenstat |
Database Administrator |
22 |
Sariya Harnpadoungsataya |
Marketing Specialist |
161 |
Kirk Koenigsbauer |
Production Technician – WC45 |
124 |
Kim Ralls |
Stocker |
Although the table shows only some of the 290 rows now stored in EmployeeInfo5
, it’s enough to demonstrate the quality of the data and how you can use a temporary table to get there.
“How do I limit the data I retrieve from an Excel spreadsheet to specific rows and columns?”
In the previous question, we looked at how to use a temporary table to transform our data. This approach provides a way to limit the rows and columns we ultimately import into our tables. However, we can achieve the same results directly through the OPENROWSET
function.
Most of examples we’ve looked at so far have used the OPENROWSET
function to retrieve all data from a specific spreadsheet. We specified that spreadsheet as the function’s last argument. However, we can instead specify a SELECT
statement as the third argument:
1 2 3 4 5 |
SELECT * FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0', 'Excel 12.0 Xml; HDR=YES; IMEX=1; Database=C:\DataFiles\EmployeeData1.xlsx', 'SELECT * FROM [vEmployee$]'); |
In this case, we’re still returning all the data from the vEmployee
spreadsheet, only this time around we’re using a more formal approach, which essentially means we’re making more work for ourselves. The results are the same either way. However, the advantage of being able to do it this way is that we can create a more complex SELECT
statement in order to filter out columns and rows, as well as do some other transforming along with way:
1 2 3 4 5 6 7 8 9 10 11 |
SELECT * FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0', 'Excel 12.0 Xml; HDR=YES; IMEX=1; Database=C:\DataFiles\EmployeeData1.xlsx', 'SELECT BusinessEntityID AS EmployeeID, FirstName + '' '' + LastName AS FullName, JobTitle FROM [vEmployee$] WHERE CountryRegionName = ''Canada'' ORDER BY BusinessEntityID'); |
Notice that our SELECT
statement is still passed in as the third argument to OPENROWSET
. Also, because the argument must be enclosed in single quotes, we escape the single quotes within the text by adding another single quote. Now our outer SELECT
statement returns the following results:
EmployeeID |
FullName |
JobTitle |
278 |
Garrett Vargas |
Sales Representative |
282 |
José Saraiva |
Sales Representative |
We can achieve the same results by putting all the logic in the outer SELECT
statement and leaving the provider string alone:
1 2 3 4 5 6 7 8 9 10 11 |
SELECT BusinessEntityID AS EmployeeID, FirstName + ' ' + LastName AS FullName, JobTitle FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0', 'Excel 12.0 Xml; HDR=YES; IMEX=1; Database=C:\DataFiles\EmployeeData1.xlsx', [vEmployee$] ) WHERE CountryRegionName = 'Canada' ORDER BY BusinessEntityID; |
If you’re working with large quantities of data, you might see performance gains using one approach over the other. However, for the amount of data that’s usually in an Excel spreadsheet, this shouldn’t be much of an issue.
“How do I use a linked server to import data from an Excel worksheet?”
There might be times when you want to use a linked server to persist your connection to the Excel file. That way, you need to supply the connection information only once, making the queries themselves much simpler.
The first step is to use the sp_addlinkedserver
system stored procedure to add the linked server to your SQL Server instance, as shown in the following example:
1 2 3 4 5 6 7 |
EXEC sp_addlinkedserver @server = 'ExcelLinkSrv1', @srvproduct = 'Excel', @provider = 'Microsoft.ACE.OLEDB.12.0', @datasrc = 'C:\DataFiles\EmployeeData1.xlsx', @provstr = 'Excel 12.0 Xml; HDR=YES; IMEX=1'; GO |
Most of the information here should look familiar to you if you’ve been using OPENROWSET
to query Excel data. First, we need to name the linked server, in this case, ExcelLinkSrv1
, and then specify the product (Excel
), the provider, the path to the Excel file, and the provider string options.
When it comes time to retrieve the Excel data, we need only specify the name of the linked server and the spreadsheet, as shown in the following example:
1 |
SELECT * FROM ExcelLinkSrv1...[vEmployee$]; |
The statement returns all data from the spreadsheet, as we saw with OPENROWSET
. We can also use a linked server to import Excel data directly into a table, whether existing or new, just like we would normally do when inserting data.
“I’m using a linked server to import data from an Excel spreadsheet. Is there a way to limit the columns and rows I return when I query the data?”
When we discussed using the OPENROWSET
function to retrieve Excel data, we demonstrated how to qualify our queries to filter out rows and columns. We can do the same thing when using a linked server. One way to do it is to use the OPENQUERY
function in conjunction with the linked server. OPENQUERY
is yet another SQL Server rowset function that lets us specify a pass-through query on the linked server, which in our case, is the Excel spreadsheet.
Let’s demonstrate how this works. First, we create our linked server:
1 2 3 4 5 6 7 |
EXEC sp_addlinkedserver @server = 'ExcelLinkSrv2', @srvproduct = 'Excel', @provider = 'Microsoft.ACE.OLEDB.12.0', @datasrc = 'C:\DataFiles\EmployeeData1.xlsx', @provstr = 'Excel 12.0 Xml; HDR=YES; IMEX=1'; GO |
Once we’ve set up the linked server, we’re good to go. In our SELECT
statement, we include the OPENQUERY
function in the FROM
clause, just like OPENROWSET
. When we call OPENQUERY
, we pass in two arguments, the name of the linked server and our SELECT
statement, as shown in the following example:
1 2 3 4 5 6 7 8 |
SELECT * FROM OPENQUERY(ExcelLinkSrv2, 'SELECT BusinessEntityID AS EmployeeID, FirstName + '' '' + LastName AS FullName, JobTitle FROM [vEmployee$] WHERE CountryRegionName = ''Canada'' ORDER BY BusinessEntityID'); |
As expected, our first argument is ExcelLinkSrv2
and our second argument is a SELECT
statement that limits the number of columns and rows and concatenates several values. The following table shows our results:
EmployeeID |
FullName |
JobTitle |
278 |
Garrett Vargas |
Sales Representative |
282 |
José Saraiva |
Sales Representative |
We can return the same results by skipping the OPENQUERY
function and including our logic in the outer SELECT
statement:
1 2 3 4 5 6 7 |
SELECT BusinessEntityID AS EmployeeID, FirstName + ' ' + LastName AS FullName, JobTitle FROM ExcelLinkSrv2...[vEmployee$] WHERE CountryRegionName = 'Canada' ORDER BY BusinessEntityID; |
As with OPENROWSET
, you might want to take performance into consideration when deciding which strategy to use. For small files, however, it shouldn’t be much of an issue.
“I’ve created a linked server that connects to an Excel workbook file and need to be able to view a list of spreadsheets within the workbook. Is there a way to retrieve this information?”
Yes, SQL Server provides at least one method for retrieving the spreadsheet names-the sp_tables_ex
system stored procedure. To demonstrate how this works, let’s start with the following linked server:
1 2 3 4 5 6 7 |
EXEC sp_addlinkedserver @server = 'ExcelLinkSrv3', @srvproduct = 'Excel', @provider = 'Microsoft.ACE.OLEDB.12.0', @datasrc = 'C:\DataFiles\EmployeeData1.xlsx', @provstr = 'Excel 12.0 Xml; HDR=YES; IMEX=1'; GO |
We can now use the sp_tables_ex
system stored procedure to retrieve information about the target data source defined in the ExcelLinkSrv3
linked server, which in this case is the EmployeeData1.xlsx
file. To retrieve the data, we simply run the stored procedure and specify the linked server:
1 |
EXEC sp_tables_ex 'ExcelLinkSrv3' |
In this case, the stored procedure returns the results shown in the following table:
TABLE_CAT |
TABLE_SCHEM |
TABLE_NAME |
TABLE_TYPE |
REMARKS |
NULL |
NULL |
Emp2$ |
TABLE |
NULL |
NULL |
NULL |
Emp3$ |
TABLE |
NULL |
NULL |
NULL |
vEmployee$ |
TABLE |
NULL |
Because we’re dealing with an Excel file, the only column we’re interested in is TABLE_NAME
, which lists the names of the spreadsheets in that workbook (three, in this case). If we want to get at the spreadsheet names only, we have to come up with a workaround, such as creating a table variable to hold the data and then extracting the column from the variable, as shown in the following example:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
DECLARE @worksheets TABLE ( TableCat nvarchar(50), TableSchem nvarchar(50), WorkSheet nvarchar(50), TableType nvarchar(32), Remarks nvarchar(254) ); INSERT INTO @worksheets EXEC sp_tables_ex 'ExcelLinkSrv3'; SELECT WorkSheet FROM @worksheets; |
Pretty straightforward. We define the variable, insert data into the variable, and retrieve the column that contains the spreadsheet names. The SELECT
statement returns the following results:
WorkSheet |
Emp2$ |
Emp3$ |
vEmployee$ |
Another approach is to create a SELECT
statement that uses the OPENQUERY
function to run the stored procedure:
1 2 3 |
SELECT TABLE_NAME AS WorkSheet FROM OPENQUERY([srvXYZ\sqlsrv2014], 'EXEC sp_tables_ex ''ExcelLinkSrv3'''); |
This approach gets around the limitation in SQL Server of not being able to run a procedure directly within a SELECT
statement. All I’ve done here is to connect to a local instance of SQL Server 2014 and pass in the EXECUTE
statement as the function’s second argument. The statement returns the same results as the SELECT
statement in the preceding example.
To use this method, we might need to configure the SQL Server instance for data access. We can use the sp_serveroption
system stored procedure to set the Data
Access
option to TRUE
, as shown in the following example:
1 |
EXEC sp_serveroption 'srvXYZ\sqlsrv2014', 'Data Access', TRUE; |
This approach makes it easier to retrieve only the spreadsheet names, as long as you’re okay with the option being enabled. According to Microsoft documentation, Data
Access
lets you enable and disable a linked server for distributed query access. However, you can also use it on a local instance of SQL Server, as we’ve done here, to let you use the OPENQUERY
function in this way.
That said, one of the advantages of returning the list of columns in a table variable (or temporary table) is that you can create a WHILE
loop to iterate through the list in order to retrieve data from multiple spreadsheets.
“How do I import Excel data through a linked server if the spreadsheet contains no header columns?”
If you know how to control the header setting when using the OPENROWSET
function to perform ad hoc queries, you’ll have no problem controlling the setting if working with a linked server. When you create your linked server, simply set the HDR
option to NO
, as shown in the following example:
1 2 3 4 5 6 7 |
EXEC sp_addlinkedserver @server = 'ExcelLinkSrv4', @srvproduct = 'Excel', @provider = 'Microsoft.ACE.OLEDB.12.0', @datasrc = 'C:\DataFiles\EmployeeData2.xlsx', @provstr = 'Excel 12.0 Xml; HDR=NO; IMEX=1'; GO |
Once we’ve created our linked server, we can query the spreadsheets in the Excel file. However, we must use the default column names when referencing the Excel fields:
1 2 3 4 5 6 7 8 |
SELECT * FROM OPENQUERY(ExcelLinkSrv4, 'SELECT F1 AS EmployeeID, F2 + '' '' + F4 AS FullName, F5 AS JobTitle FROM [vEmployee$] WHERE F14 = ''Canada'' ORDER BY F1'); |
Notice that we’re using the F1
, F2
, F4
, F5
, and F14
column names, just as we would regular names. The SELECT
statement returns the results in the following table:
EmployeeID |
FullName |
JobTitle |
278 |
Garrett Vargas |
Sales Representative |
282 |
José Saraiva |
Sales Representative |
Of course, if the spreadsheet does include a header row at the top, we can simple set the HDR
option to YES
.
“How do I join data in a SQL Server table with data from an Excel workbook?”
When we use a SQL Server rowset function to retrieve data from an Excel spreadsheet, we can include the function as part of a join condition, as we would a regular table. For example, if we want to run an ad hoc distributed query, we can join the OPENROWSET
function to a database table or view, as shown in the following SELECT
statement:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
SELECT Emp1.BusinessEntityID AS EmpID, Emp1.FirstName + ' ' + Emp1.LastName AS FullName, Emp2.JobTitle, Emp2.CountryRegionName AS CountryRegion FROM HumanResources.vEmployee AS Emp1 INNER JOIN OPENROWSET('Microsoft.ACE.OLEDB.12.0', 'Excel 12.0 Xml; HDR=YES; IMEX=1; Database=C:\DataFiles\EmployeeData1.xlsx', [vEmployee$]) AS Emp2 ON Emp1.BusinessEntityID = Emp2.BusinessEntityID WHERE Emp2.CountryRegionName <> 'United States' ORDER BY Emp2.CountryRegionName; |
The OPENROWSET
function is basically returning a table that we’ve aliased as Emp2
. We’re joining the Emp2
table to the vEmployee
view in the AdventureWorks2014
database, which we’ve aliased as Emp1
. The join is based on the BusinessEntityID
column in each data source. The following table shows the SELECT
statement’s results:
EmployeeID |
FullName |
JobTitle |
CountryRegion |
286 |
Lynn Tsoflias |
Sales Representative |
Australia |
282 |
José Saraiva |
Sales Representative |
Canada |
278 |
Garrett Vargas |
Sales Representative |
Canada |
290 |
Ranjit Varkey Chudukatil |
Sales Representative |
France |
288 |
Rachel Valdez |
Sales Representative |
Germany |
Of course, this was an easy join to do because I created the spreadsheet by exporting the data through that view. However, the more the two data sources differ and the odder the spreadsheet’s structure, the more work you’ll have to do to join the tables. If it gets too bad, you might have to come up with a different approach, such as importing the Excel data into a temporary table and then joining that to the regular table.
“When I try to import Excel data into SQL Server, I keep running into errors related to the OLE DB provider. I’ve received errors about not being able to initialize the provider, not being able to create an instance of the provider, and not being able to install a 64-bit version of the provider. Can you help?”
Few issues cause as much consternation when trying to import Excel data as trying to get the provider right. For whatever reason, Microsoft does not make this an easy process, despite that fact that Microsoft builds SQL Server and Excel and an assortment of other data-driven technologies that they’re constantly hawking as the ultimate unified platform for integrated business intelligence. Given all the hype, you would think Microsoft would go through great pains to ensure that importing Excel data is as painless a process as possible. It does not.
In fact, SQL Server can be quite picky when it comes to importing Excel data. We’ve already discussed the issue of the provider itself, but let me reiterate. When importing data, you must use one of the following ADO providers:
Microsoft.Jet.OLEDB.4.0
: Use on SQL Server 32-bit editions for Excel 2003 files.Microsoft.ACE.OLEDB.12.0
: Use on SQL Server 32-bit editions for Excel 2007 files or on SQL Server 64-bit editions for any Excel files.
There are other issues as well, too many to cover here, but I’ll cover what I did on my machine in order to import Excel data into SQL Server from an .xlsx file.
First, I tried to install the 64-bit version of the ACE OLE DB provider because I’m running the 64-bit version of SQL Server. I discovered that I could not do so because the 32-bit version of Office 2010 was installed on my system. You can’t install the 64-bit provider on a computer running 32-bit Office. So Office got axed.
After I uninstalled Office, I installed the driver. I then ran the following T-SQL statements:
1 2 3 4 5 |
EXEC sp_msset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', 'AllowInProcess', 1; GO EXEC sp_msset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', 'DynamicParameters', 1; GO |
The sp_msset_oledb_prop
stored procedure lets you set properties associated with the ACE OLE DB provider. In this case, I enabled the AllowInProcess
and DynamicParameters
properties. You can also enable the properties through Object
Explorer
in SSMS by accessing the provider’s properties. The properties need to be turned on in order to run distributed queries.
Also, in case you haven’t already done so, you need to run the following statements to ensure that you can run ad hoc distributed queries (which we covered earlier):
1 2 3 4 5 6 7 |
EXEC sp_configure 'Show Advanced Options', 1; RECONFIGURE; GO EXEC sp_configure 'Ad Hoc Distributed Queries', 1; RECONFIGURE; GO |
In addition to all this, I had to restart my computer a time or two. I can’t recall for sure the order of events, but eventually I got everything working.
On the surface, this doesn’t seem so bad, but getting to this point was no easy task, in part because I was resistant to the idea that I had to uninstall Office 2010. (OpenOffice here I come.) I believe there are some other workarounds out there, but I finally grew weary of the process and opted to sacrifice Office.
So that’s my story. The problems you might bump into can very depending on whether you’re running a 32-bit version or 64-bit version of SQL Server and whether you’re retrieving Excel 2007 files or Excel 2003 files. In some cases, you might also have to adjust the permissions in the appropriate temp folder to get everything working. Which temp folder? It depends. It could be the one set up for the NetworkService
account or the one set up for LocalService
account or another temp folder altogether. Again, it depends on your particular configuration.
Here’s one place you might start looking for answers if your run into a roadblock: “Importing an Excel Spreadsheet into a SQL Server Database.” If that doesn’t help, there’s always Google.
Load comments