SQL Server has never been short of ways to read from and write to files and it is always better to use the standard techniques provided by SQL Server where possible. However, most of them are really designed for reading and writing tabular data and aren’t always trouble-free when used with large strings or relatively unstructured data.
For reading tabular data from a file, whether character-delimited or binary, there is nothing that replaces the hoary old Bulk Copy Program (BCP), which underlies more esoteric methods such as Bulk Insert. It is possible to read text-based delimited files with ODBC, simple files can be read and written-to using xp_cmdshell, and you will find that OSQL is wonderful for writing results to file, but occasionally I’ve found I need to do more than this.
Thankfully, when armed with OLE Automation and the FileSystem Object (FSO), all sorts of things are possible. The FileSystem Object was introduced into Windows to provide a single common file-system COM interface for scripting languages. It provides a number of handy services that can be accessed from TSQL. In this article, I provide examples of stored procedures that use this interface to allow you to:
- Read lines of text from a file
- Read a file into a SQL Server data type
- Write out to a file
- Get file details
- Tidy up XML, XHTML or HTML code
I’ll provide a few details on the FSO along the way, but let’s start with examples of some of these procedures in action. You’ll need to enable OLE Automation on your test server in order to follow along.
Reading lines from a file
I have often pined for a simple function that will read information a line at a time, and to present to me a ‘fake table’ where each line of text is a row, with a primary key based on the line number. With such a function, one can then do one’s own parsing and checking of data.
Well, here it is. Create the uftReadFileAsTable stored procedure in your test database, and try it out with something like:
Select line from
where line not like '#%'
--where line doesnt begin with a hash
Just fill in an existing file name and path to the file you wish to read, instead of ‘MyPath‘ and ‘MyFileName‘, and away you go.
This is a method I use for reading web logs and gathering usage statistics. It is also useful where the data feed has to be validated before one can parse it into the final SQL data format.
Reading a file into a SQL Server data type
This is all very well, but how about something that reads a file in one gulp into a varchar or XML datatype? Perhaps you need to extract data from HTML, XHTML or some other format. Create the ufsReadfileAsString procedure and try something like…
Select dbo.ufsReadfileAsString ('MyPath','MyFileName')
Writing out a file
No problem – just create spWriteStringToFile and try:
execute spWriteStringToFile 'This article describes how to fully access the
local filesystem from SQL Server. It shows a
way of reading and writing data to file, and
accessing the details of the server's
filesystem using OLE Automation to access
the filesystem object'
The path you use instead of ‘MyPath’ will have to exist, in this example.
Getting file details
If you need to find out the attributes of a particular file, then try out spFileDetails:
Execute spFileDetails 'c:\autoexec.bat'
More on the FSO and OLE Automation
There are all sorts of things you can do with the FSO. You can copy files, move files, create folders, delete files, get the names of special directories, and so on. This may sound esoteric, but sometimes the simplest backup procedures require such operations. I pause here just to give some minimum necessary background details on FSO, and on OLE Automation.
The Filesystem Object, on which all the stored procedures in this article rely, is a component of the scripting runtime library. As it is a COM object, it is readily accessible from the set or stored procedures built-in to SQL Server called the OLE Automation Stored Procedures, These allow a connection, through T-SQL commands, to create and use COM-based objects.
Each OLE Automation stored procedure returns an integer code that is the HRESULT returned by the underlying OLE Automation operation. When an error happens, the HRESULT returns an error code rather than the 0 which signals success, which then has to be turned into a meaningful error message with the sp_OAGetErrorInfo procedure.
The full set of OLE automation procedures are:
The attached files have many examples on how to use these procedures and they are well covered on Books on Line, so I won’t bother to repeat the details here.
The FileSystem Object
The FileSystem Object is a COM object that is provided primarily for scripting languages such as Jscript, ASP, and VBscript. It provides everything that is necessary for file or folder operations. It also allows for most simple file reading and writing operations, and to gather information about drives, directories and files.
The FSO allows you to manipulate and shred path names, to copy, move, delete or create files or folders. You can check for their existence or get a number of objects and collections. These are…
- Drive Object
- Drives Collection
- File Object
- Files Collection
- Folder Object
- Folders Collection
- TextStream Object
This provides information about a logical drive (physical, or network) attached to the system, such as its share name, type, total space, and how much room is available
This will give a list of the physical or logical drives attached to the system and includes all drives, regardless of type. (Removable-media drives need not have media inserted for them to appear in this collection).
This will allow you to create, delete, or move a file, and to find out the file name, path, and other properties. It has a method to open a file as a text stream.
This provides a list of all files contained within a folder.
The Folder object has methods that allow you to create, delete, or move folders. Also there are several properties that return folder names, creation date, paths and so on.
This provides a list of all the folders within a Folder.
This object is extraordinarily useful as it provides the means to read and write to text files.
All of these methods, properties and collections are readily available from Books on Line and W3Schools. I also find Dino Esposito’s book ‘Windows Script Host’ (Wrox 1999) very useful
The HTML Tidier
We end up with a rather more esoteric procedure that we will throw into the pot just to show what can be achieved with FSO. This is a procedure that corrects and tidies up any XML, XHTML or HTML. It will take any file and format it prettily and check for any errors. It is an example which illustrates several techniques and shows how one can integrate a DOS application.
This stored procedure requires the latest version of HTMLTidy (March 2007) to be installed on your server. I put it in System32, as it is a simple MSDOS.EXE file.
HTML Tidy was written by Dave Raggett, who works on assignment to the World Wide Web Consortium where he is the W3C lead for Voice and Multimodal. It has become the standard way of checking XML and HTML files for syntax. It is a remarkable tool that combines a Lint and a prettifier. It has now been adopted by an enthusiastic band of people who work through W3C and SourceForge to maintain it. It also requires you to set the configuration that I have embedded in the source of the stored procedure, but which you can supply as a parameter to over-ride the defaults I have chosen. Before twiddling with these parameters you’d be best advised to check the documentation at http://tidy.sourceforge.net/docs/quickref.html
Once everything is in place, all you have to do is to supply the input as a string and the spHTMLtidy stored procedure will put the tidied version in the output string and fill another string with all the warning and error information.
Declare @ProcessedFile varchar(max)
Declare @Messages varchar(max)
Declare @UnprocessedFile varchar(max)
Select @UnprocessedFile = '<html>
<head <title>awful HTML</head>
<p>This is my first real HTML<br><div>
<table><tr><th>What a mess</tr>
<tr><td></td>this is supposed to be in the table</tr><td></td>
<tr> this is awful<td>and this is a line</td></tr>
Execute spHTMLtidy @UnprocessedFile, @processedFile output,
This will produce a tidied up XHTML like this…
<?xml version="1.0" encoding="us-ascii"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
This is my first real HTML
this is supposed to be in the table this is awful
What a mess
and this is a line
If you are using SQL Server 2000, then I’m sorry to say you will be restricted to source code of 8000 characters, but I guessed that this routine would be more useful in SQL Server 2005 which has a native XML data type for which this routine would be ideal.
SQL Server is not designed as a means of file and directory manipulation, and for most of the time it is not the appropriate place in an application to do it. However, now and then in the life of a DBA or database developer, it is nice to have the functionality to hand and, for this sort of work, the File System Object is ideal. I’ve used it mostly in combination with DMO for automating routine DBA chores, such as copying and updating server configurations. It is particularly handy for creating test cells, and test configurations. I also find it is ideal for the occasional one-off task, such as importing and extracting the data from a large number of XHTML files, or indexing a large text-based repository of data.
Phil returns to this theme in The TSQL of Text Files and also, with Robyn Page, in workbenches such as Importing Text-based data: Workbench. they show more advanced ways of processing text-based information in Getting HTML Data: Workbench and TSQL Regular Expression Workbench. Even Phil Factor codes faster with SQL Prompt…download a free trial version and check it out.