Should you use ADO.NET DataReader or DataSet?

In the beginning, there was the recordset, a central feature of classic ADO. In ADO.NET it was gone.

In the beginning, there was the recordset, a central feature of classic ADO. Like a Swiss army knife, it wasn’t the perfect tool for every job, but given the correct configuration, it could do whatever you needed it to do.

The recordset wasn’t always pretty to watch, however, and it came into prominence at a time when client/server applications for which it was optimized were being replaced by web applications. Its dark side included an amazing level of complexity and the number of special cases when it was used with particular providers.

So while the recordset could do almost any task, most code did not use it correctly-in some cases using inappropriate default values, and in others explicitly setting parameters in ways that were not optimal for the application at hand.

Enter ADO.NET

When I was first introduced to ADO.NET, then called ADO+, I was a little concerned: Calling this new technology by the same name as what we now know as classic ADO would cause no end of confusion. I was also troubled by the lack of scrollable cursors and a number of other features not supported in ADO.NET.

My concerns were well founded. Many classic ADO developers new to ADO.NET were befuddled by the lack of anything called a recordset. They expected a 2.0 sort of upgrade to classic ADO. Instead, ADO.NET was a totally new set of components, and it was optimized for the Internet world.

Access through DataReader

When accessing data with ADO.NET, you can use either a DataReader or a DataSet. I’ll start with the DataReader.

There are a number of data providers, such as SqlClient, included with ADO.NET, and each has its own implementation of the DataReader. All of the DataReader classes implement the IDataReader interface. While having different classes for the DataReader in each data provider might seems troublesome, you can usually specify a parameter or return value as an IDataReader and ignore the specific type of the DataReader. Unfortunately, there are a number of occasions when this is not possible. I’ll go into this a bit later.

So, what is a DataReader? It is a one-way, forward-only method of reading data. A common use of the DataReader to read through a result set would look like the following:

When you get the results into the DataReader, it’s important to note that the “cursor” or logical pointer into the returned data is just before the first row. You need to call .Read() once to read any data. .Read() will return true as long as there is an additional row to be read.

This pattern results in better code than the standard patterns used for classic ADO recordsets. When working with recordsets, the logical pointer is initially placed at the first row of data. To read records you must create a loop that contains a call to .MoveNext at the bottom of the loop. I cannot tell you the number of times I forgot this when working with recordsets in classic ADO. Whenever my application stopped when it should be reading data, I knew I had forgotten or misplaced the .MoveNext call.

A common complaint on the newsgroups related to classic ADO recordsets was the lack of a reliable row count. The availability of a proper row count was dependent on the cursor location and a number of other factors. In the DataReader, even the pretense of a row count is gone. In fact, in the 1.0 implementation, there was no way to know if the DataReader had any data without reading it. And reading the DataReader actually consumed the row. Fortunately, version 1.1 implementations contain a .HasRows property.

Note: There is no IDataReader2 interface that includes the .HasRows property. If you want to use .HasRows, you need to cast it to the implementation class that supports it.

Access through DataSet

Another way to access data with ADO.NET is to use a DataSet. A DataSet is a database-independent, in-memory data store that enables the developer to directly access all rows and columns of one or many tables that a DataSet can contain. The DataSet has a Tables collection of DataTable objects, and the DataTable has a Rows collection that can be indexed to get to a particular row by number. The Rows collection has a .Count property that enables the developer to determine the number of rows in any of the tables in a DataSet.

One thing missing in the classic ADO recordset was the ability to control exactly how an update would take place. The DataSet enables you to update the in-memory copy of the data, and use a DataAdapter specific to a data provider to have those changes persisted back in the database.

If you run a shop where database updates are done using stored procedures, you can use a properly configured DataAdapter to ensure that changes to the database are made using stored procedures. The important thing to remember is that the DataSet is independent of any particular database; the only connection that exists is when the DataSet is used in conjunction with a DataAdapter.

Which to use?

Which of the data access classes should you use in your application?

If you need read-only access to the data, something often done inside an ASP.NET application, using a DataReader makes sense. Unless you are going to persist the data between postbacks, the data that you use to, say, populate a grid, will not be present when someone clicks on one of the rows of that grid.

The ASP.NET application model is that there will be a number of independent requests, and between requests, a client will not maintain a connection to the database. ASP.NET provides session state, a way of maintaining the appearance of a stateful application, but storing a client-specific DataSet in session state is not considered good form and may limit the scalability of your application.

When updates to the data are required, you can either directly execute UPDATE, INSERT or DELETE SQL statements, or you can call a stored procedure. While this seems like more work that just updating data in a DataSet, it gives you more control over exactly how the updates take place, and works very well in a web environment.

One possible disadvantage of using a DataReader is that the connection must be open while you are accessing the data. If you will be doing lengthy processing with each row of returned data, a DataSet might be a better idea. The DataSet can be filled (using a DataReader under the covers, by the way), and then you can use the data in the DataSet at your leisure without consuming a connection.

If you are building a Windows Forms application (or a “Smart Client”), DataSets offer some advantages. Since the application will maintain its state over the entire time it is running, and client context does not tie up server resources, using a DataSet can be very convenient for Windows Forms applications. The model for how it is used is very similar to a classic ADO recordset configured to use client cursor, except that rather than moving a logical cursor, the DataTable’s Rows collection is indexed to reach a particular row.

The dark side of using DataSets is that the data remains in memory for as long as you use it. If you are retrieving thousands or millions of rows, a DataSet is probably not an ideal solution.

Some exceptions…

My predisposition to using DataReaders for ASP.NET applications and DataSets for Windows Forms applications has many exceptions. One of the features of the DataSet that makes it flexible is that it can be easily serialized into XML. It is often convenient to use the .WriteXml method to write out the contents of the DataSet. You cannot write out a single DataTable in a DataSet in the 1.x version of ADO.NET, however.

In addition to writing the DataSet to an XML file, .NET Web Services accepts parameters of type DataSet and return values of type DataSet. While this can be powerful, it does limit your application to .NET consumers of your web service, since the DataSet is not accepted by web services created with any other tools I am aware of. Many people consider it evil to return a DataSet from a web service, but it can be a good idea, especially if you are certain that only .NET consumers will ever need to access your web service.

DataSets can also easily be passed between layers in an application, even if those layers are not in the same process space. For instance, an application server can get the data from a database and then send off the DataSet across process space, or even across a wire to a separate server doing presentation services.

There is one special case where DataSets are essential in an ASP.NET application. When you have data that is accessed by a number of users and can reasonably be cached for at least a short period of time, the cache system (in the System.Web namespace, but actually usable in Windows Forms applications) can be used on a DataSet. Caching can have a dramatic impact on the performance of a heavily used application.

If you are passing results between logical layers inside the same process address space, you can still use DataReaders. How then, you may wonder, do you ensure that the connection is closed? You can’t close it in the method for creating the DataReader, nor can you close it in the method for returning the DataReader since the DataReader will be closed as well.

When you create a DataReader, you call .ExecuteDataReader. This method accepts a parameter that can be CommandBehavior.CloseConnection. This parameter tells the DataReader that when it is closed, the underlying connection should be closed as well. This is an example function that shows how you can return a DataReader and ensure that it is closed by the calling method:

Then, you could call the method as follows:

Failure to use a pattern like this will lead to a great deal of difficulty with connections that are not closed in a timely fashion. I believe that Microsoft initially pushed DataReaders as the preferred way to do database access, at least for ASP.NET applications. As time went on, I think the company discovered that many developers misused DataReaders, not properly closing the connection. Recent Microsoft presentations have often emphasized DataSets even for ASP.NET applications.

Conclusion

You should now have a good understanding of the strengths and weaknesses of the two very different ways you can access data using ADO.NET. If you can reliably use a safe code pattern to ensure that the underlying connection will be closed, the DataReader can provide you access to data in a very efficient manner. If maintaining session state is not a problem (like when you are in a Windows Forms application), perhaps a DataSet will be more convenient. As with many developer decisions, there is no clear-cut choice between using a DataSet or a DataReader. Remembering the general rules, as well as the significant exceptions, can help ensure that your application performs and scales well.