Why should I rebuild a table after dropping or adding a column?

The amount of space used by table rows can be important, and there are good articles on the web that show ways to estimate the amount of space that a table row uses. However, the actual space used by a table row could be more than what we would have calculated. The problems come when we modify the schema of a table. Sometimes it is necessary to add a column to a table or to drop a column from a table. This operation is apparently straightforward: when you drop a column, it should free up the space, and a new column should use more space. However, when a column is dropped, SQL Server will not go and modify all the database pages for your table, free the space, move rows around, etc. By dropping a column, you may end up just updating the metadata, and your rows will still consume the same amount of space as before. This could have unpleasant consequence, such as in the following example:

We create a table that has three fixed length columns. One of these columns (columnWeWillDrop) will consume 7000 bytes (NCHAR uses UCS-2 encoding, which uses two bytes per character)
We insert a single row into this table like:

In this example I put the string of 3500 letter ‘a’ into the columnWeWillDrop column. This will help us to identify it easier on the database pages. Indeed, let’s look at the relevant database page:

In my case the result is 0x590100000100. Note that I’m using SQL Server 2008, for earlier version you will need to use different system tables/views. This is the page id of the first page for our table, the page number is 0x159 and the file number 1 (0x590100000100). Since we have not added a clustered index to the table, it will be a heap, and data pages for heaps are doublelinked, so from the first page we will be able to explore the whole table. The content of the first page is (0x159 equals to 345 decimal):

DBCC TRACEON (3604)
DBCC PAGE (2,1,345, 3)
DBCC TRACEOFF (3604)

The first parameter is the database id (in my case, since I worked in tempdb it is 2), the second parameter is the file id, the third is the page id, and the last parameter influences the details returned by the DBCC PAGE command. We can see that it contains the character ‘a’, the contents of our column ‘columnWeWillDrop’

Now let’s drop this column:

If we look at the database page, we will be surprised to see the large number of ‘a’s. While a select * from our table will not return the dropped column, it still uses physical space!

So what will happen if we add a few columns to our table? SQL Server can accommodate fixed length columns with up to 8000 bytes of total length. We currently should use 8 bytes (two times four for the two integers) plust a bit of overhead.

This works as expected. In theory we should have plenty of space left, as we are using about 608 bytes (well below the 8000 byte limit) plus some overhead. So let’s try to add another fixed length column:

We get an error message:

Creating or altering table ‘wastedSpace1’ failed because the minimum row size would be 8215, including 7 bytes of internal overhead. This exceeds the maximum allowable table row size of 8060 bytes.

Unfortunately the space used by our dropped column will remain to be wasted. We need to rebuild the table. In case of a heap, we could select into another table, or create a clustered index. In case of a table that has a clustered index, we could use the ALTER INDEX .. REBUILD

These operations will be expensive, because likely all the database rows will be modified. However, this is the price for the speed of the ALTER TABLE DROP COLUMN operation, which is just a meta data operation. It is nice to see however, that SQL Server 2005 and 2008 has improved since 2000 in respect to these schema operations, and many repeated column add/drop operations do not always result in wasted space.

With variable length columns the amount of wasted space is not that significant, but it still exists. It is worth while exploring the system_internals_partition_columns system view and see where fixed length columns start (these are the ones with a positive number in the leaf_offset column), and check if they form a contiguous row data area.

If you are interested to hear more about the hows and whys of the above problem, come to my session at SQL Bits 4 in Manchester on 28 March 2009.

Also, if you happen to be in Neuss, Germany 22 April 2009, do come to my session at the European PASS Conference where I’ll be talking about the various ways to pass parameters in SQL Server.