Simple Talk is now part of the Redgate Community hub - find out why

Creating CSV Files Using BCP and Stored Procedures

Nigel Rivett demonstrates some core techniques for extracting SQL Server data into CSV files, focussing on a stored procedure approach that gives complete control of the format of the extracted data.

This article focuses on the use of the Bulk Copy Program (BCP) to create CSV files. Although it is possible to create a CSV file using DTS or SSIS, using BCP is often simpler and more efficient.

I use master..sysobjects as an example table to extract.

Create a simple CSV file

The simplest way to copy data from a database table to file is to use the basic BCP command:

The basic format for the BCP command for creating a CSV file is as follows:

BCP <table> out <filename> <switches>

The switches used here are:

  • -c Output in ASCII with the default field terminator (tab) and row terminator (crlf)
  • -t override the field terminator with “,”
  • -T use a trusted connection. Note that U -P may be used for username/password
  • -S connect to this server to execute the command

Note that, like DTS/SSIS, BCP is a client utility, hence you need to supply the connection information.

For transfer of data between SQL servers, in place of -c, use -n or -N for native data format (-N = Unicode). This is much faster and avoids data conversion problems. Please refer to the previous BOL link for the complete format of the BCP command.

As BCP is a command line utility it is executed from T-SQL using xp_cmdshell. Create a directory called BCP on your c: drive and execute:

Other field and row delimiters

Often, character data includes commas which will be interpreted in the file as a field terminator. A simple way to cater for this is to use a different field delimiter which does not appear in the data. Commonly used characters are pipe (|):

And caret (^):

The terminators are not limited to a single character, so if necessary you can use (|^):

Note that this will increase the size of the file and so slow down the import/export. Another way to cater for embedded commas is to “quote encapsulate” character data – more about that later.

It is unusual to need to change the row terminator from the default of crlf (carriage return, line feed) but occasionally you will need the cr or lf on its own.

To do this use the hex value cr = 13 = 0x0D, lf = 10 = 0x0A. If you can’t remember these values they are easily obtained, as follows:

To use these in the BCP command for the row terminator the -r switch is used:

When the resulting file is opened in notepad the row terminators will not end the line – the character should appear as a non-ASCII character (a little oblong). If opened or copied to query analyser (or management studio) these will be interpreted as line breaks and the file will be more readable.

The terminator characters can be encapsulated with double quotes in the command to allow for space. I like to do this for anything other than a single character delimiter. The following example gives a crlf row terminator and | ^ field terminator.

Formatting the extracted data

If you do not require all the columns/rows, or you wish to alter the data, a simple method is to create a view. This can be used in the BCP command in the same way as a table. The following view allows you to extract just the name, the formatted create date and time and then order the results according the date created (most recent first):

Now we can quote encapsulate the extracted data by including the formatting in the view:

Note:
Quote encapsulation can also be performed by use of a format file.

Complete control – stored procedures

If you examined the BCP syntax in BOL, you might have noticed that it is possible to extract from a query by using the queryout keyword in place of out.

So, for example, an equivalent but neater version of the previous code extract would place the ORDER BY clause in the BCP statement rather than the view:

Of course, strictly speaking, the view was not necessary at all as the query could have been included in the BCP statement but that can get difficult to maintain.

The queryout method allows you to BCP from the result of a stored procedure, which opens up a lot of possibilities and offers a lot of control over the file format. For anything other than a simple table extract I would tend to use this method rather than a view. I would also format each line within the stored procedure. This means that the formatting can be tested independently from the file creation.

Employing this technique, the above extract becomes:

More complex formatting

We can now change the format of the extracted data to anything we require. For example, to include column names at the top and a rowcount at the end:

This stored procedure will format the data as required, but when it is run via the BCP command it receives an error:

“Invalid object name ‘#a'”

Using the profiler you can see that when using queryout, the query is called twice. BCP tries to get the result set format by calling the stored procedure with fmtonly. This does not execute the stored procedure but returns the format of the resultset. Unfortunately, it does not work if the stored procedure creates and uses a temp table. This problem can be circumvented by including a set fmtonly off command in the execution:

Note that the column terminator is not used since the resultset is a single column.

Editor’s Note
This was the only piece of code that I failed to get working. The stored procedure executed fine outside the BCP, but when I tried to run the above command I received an error:

SQLState = HY010, NativeError = 0
Error = [Microsoft][SQL Native Client]Function sequence error
NULL

The author was unable to replicate this error. Is anyone aware of what might be causing this on my machine?
End Editor’s Note

Look at this execution using the profiler and you will see set fmtonly off followed by set fmtonly on. There is a potential problem with this though: it means that the stored procedure is executed twice, once for the format and once to extract the data – due to the fmtonly setting, both these calls will produce a resultset. This needs to be taken into account when considering how long the process will take and its impact on your system.

As the stored procedure is executed twice it should not contain any updates of permanent data. Particularly if data is flagged as exported then the flagging will be performed on the first call and no data will be extracted for the second call. All in all, this method should be used with caution and comments (warnings) placed in the stored procedure for the aid of future maintenance.

To get round all of these problems, use the stored procedure to insert into a table then, from the BCP, extract from that table. This gives you added features that can be valuable on systems that are performing a lot of extracts:

  • It gives a record of the data extracted – if the file is lost it can be recreated.
  • The data extracted can be presented to a user for viewing from the database.
  • If there are problems with the file, the support can see the data without needing access to the file.

For this process, the format stored procedure, s_bcpMasterSysobjects, will insert into the table using an identifier and the BCP will extract using the same identifier. This process can be controlled be (a control) stored procedure which would allocate the identifier and pass it to the format stored procedure which inserted the extracted data into a table with that identifier. It then calls another stored procedure or in-line code to create files (via BCP) for each identifier which is not marked as complete. The identifier is flagged as complete after the file creation. In this way the file creation becomes independent of the extract of the data.

This identifier can be passed to both processes from a controlling stored procedure or the format stored procedure can get the next available identifier and the extract can flag the identifier as extracted – so it extracts everything that has not been extracted, allowing the export to be independent of the extract.

Which technique you use will depend on your system. The following code demonstrates the use of a controlling stored procedure:

Now the data is extracted via:

You can view the data extracted via:

Alternatively, you can use the following, which will also give the date that the data was extracted:

Note:
For the extract I have given three sort columns Seq1, Seq2, Seq3 which are all order ascending. This means that the extracting stored procedure needs to place the values in ascending order (see the date manipulation). This could have ascending and descending sort columns or the use could be dependent on the extract type.

For a system which is performing a lot of small exports I will usually implement this process, and a similar process for imports.

Extracting all tables from a database

This is usually performed to transfer data between servers/databases and so should use the native data format. The easiest way is to create the BCP statements from a query then execute them:

The result of this will be a series of BCP statements of the form:

which will extract all tables in the database. To import just change the “out” to “in”.

Summary

We have seen how to use BCP to perform a simple extract of a table to a CSV file, how to perform slightly more complex extracts using a view, extracting using a stored procedure to format the data and a system which allows the data to be extracted and saved in a table and viewed/exported from there.

How you log in to Simple Talk has changed

We now use Redgate ID (RGID). If you already have an RGID, we’ll try to match it to your account. If not, we’ll create one for you and connect it.

This won’t sign you up to anything or add you to any mailing lists. You can see our full privacy policy here.

Continue

Simple Talk now uses Redgate ID

If you already have a Redgate ID (RGID), sign in using your existing RGID credentials. If not, you can create one on the next screen.

This won’t sign you up to anything or add you to any mailing lists. You can see our full privacy policy here.

Continue