How to Read Raw Files in SSIS

If you’ve ever used SQL Server 2005 Integration Services (SSIS), you will have come across raw files. Microsoft introduced them with SSIS as a mechanism for storing data on the local file system. The design of raw files enables data to be written very quickly but, unfortunately, to achieve this performance Microsoft has used a proprietary binary storage format. This means that you can’t open these files and view their contents using a normal editor. What’s more frustrating is that Microsoft has yet to supply a tool for reading these files.

To get around this limitation, I developed my own custom “Raw File Reader” tool, which this article briefly describes.

Why raw files?

One of the big benefits of working with raw files is the ease with which you can add them to your packages. The raw file destination requires you to specify only the name of file to use and the columns you want to store. Conversely, if you use a text file destination, then you have to build the structure of the file and ensure it makes your data flow. If you use an OLEDB destination, then you have to have a database and also a table created to store the data.

Their ease of configuration makes raw files ideal for storing errors in your data flows. In the past I’ve been guilty of configuring certain components of my packages to ignore failures, on the assumption that it “can’t error”. Of course, this isn’t ideal and a much better approach is to capture all errors and handle them appropriately. This is where raw files are very useful. All you need to do is direct your error flows to a raw file destination, specify and filename and select the columns you want to store. If errors do occur, then they will be stored in the raw files.

So, you’re storing any errors in a raw file, and now you want to review them – but you can’t because of the proprietary format used. This is the exactly the srot fo scenario for which my raw file reader is designed.

Using the Raw File Reader

The Raw File Reader is designed to make it very simple to read the contents or raw files. A few of the key features are as follows ( a full list can be seen at the above link):

  • Read any SSIS Raw file
  • Results displayed in a grid
  • Columns automatically resized
  • Order of columns can be changed
  • Data can be copied from the results grid using CTRL+C
  • Column headers can be included or excluded from the copied data
  • Notification (and option to reload) when the currently loaded raw file has changed i.e. package is rerun

Opening a raw file

The simplest operation is to open a raw file (File | Open). This process will, by default, read the raw file configuration and then immediately read the data from the raw file. The configuration is displayed in a list at the top of the form and the data is displayed in a grid at the bottom of the form (you can change this default behaviour so that the data is not automatically loaded – see later).

387-image001.gif

Once the data from your raw file has been loaded you can sort the data and copy it into another application. Once a raw file has successfully been read, the filename will appear in the Recent Files section of the File menu. The last 10 files opened will be displayed here.

Sorting and copying data

Data on the grid can be sorted and copied using standard “excel-style” functionality. To sort data, simply click on the column heading of the column you wish to sort. Clicking on a column heading for the second time will reverse the order of the sorting.

You can select a cell from the grid (or multiple cells, by dragging the cursor and/or using the CTRL key) and copy to the clipboard using CRL+C, from where they can be pasted into other applications. When the data is copied to the clipboard the area copied will be the maximum area required to include all the selected cells, however cells not selected will be blank. In the example below, we copy only the object name and the object type:

When the copied data is pasted into Excel you can see that, by default, column headings are included but the data in the unselected cells are not included (you can change this default behaviour – see the Options section, later).

Opening the raw files associated with a package

Opening every raw file individually may be time consuming. Therefore, the tool provides the option of opening every raw file associated with a package. Simply, open the package configuration, identify the raw file destinations and open the files that are specified for these destinations. At present the filename for the raw file destinations must be configured directly, not via expressions or variables.

When you open the package, all of the raw files for the package will be opened, and can be organized using the options in the Window menu. As with opening raw files, recently opened packages are included in the Recent Projects list on the File menu.

Working with raw files whilst debugging packages

The Raw File Reader will automatically reload a raw file when it detects the file has changed. This is a configurable option and so can be turned off if required. If you turn this option off you can manually reload the file by using the Read File command in the Tools menu.

Reading large files

Raw files are often large. If you are using large files I recommend that you disable the option that enables the raw file data to be loaded as soon as the file is opened (see the Options section). Once the configuration has been read, you choose to load only the required columns, by checking and un-checking the columns in the Raw file columns pane at the top of the window. This can drastically reduce the time it takes to load a file. If you want to select/de-select more than one column at a time, select them using the SHIFT or CTRL keys and then click on one of the check boxes. All the selected items will then be changed to the state of the checkbox selected.

Once you have selected the columns you want, the raw file can be read by using the Read File command in the Tools menu.

Options

The configurable options can be accessed by the Options menu in the Tools menu. The copy options configures when headers are included with copied cells.

Option

Description

Always include headers

The column headers, for the cells copied, are always included, irrespective of the number or distribution of cells selected.

Never include headers

Headers are never included

Auto include headers

Column headers are only included if the cells selected are in more than one column. This enables you to copy data from a single column without column headers being added.

Other options are as follows:

Option

Description

Reload file when it changes

This enables the application to reload the data for a raw file automatically when the application detects the raw file has changed

Load file when opened

If selected the data from a raw file is read immediately that a file is opened. If deselected then only the raw file configuration is read when the file is opened.

Raw file extensions

There is no standard naming for raw file extensions, this allows you to configure the extension that the Open File dialog uses to determine a raw file. Multiple extensions can be specified by separating them by semi colons “;”

Future enhancements

I am looking to develop the application further and would appreciate comments on what features you would like to see in future versions. Some of those currently on my list are

  1. Changing the raw file columns pane to display the meta data for each column i.e. data types, sizes etc, this will allow sorting of columns
  2. Row numbers
  3. Extra column headers details
  4. Paging rows
  5. Monitoring a folder for raw files