On the Trail of the Expanding Databases

It is sometimes difficult for other IT people to understand the constraints that DBAs have to work under. So often the ideal or obvious solution is frustratingly out of reach because of 'political' reasons that seem baffling to the spectator. Francis Hanlon describes a typical incident: It is essential reading for anyone wishing an insight of the problems that DBAs face

On the Trail of the Expanding Databases

I look after a server that contains three identical production databases, used by three different health care clinics. Growth has always been fairly rapid but recently, I noticed that these databases were growing at a rate that far exceeded the norm. They were part of a purchased package, and I knew the vendor was in the process of implementing an interface from these databases to another system. Was this abnormal growth a side-effect of something they’d done while implementing this new interface, or completely unrelated?

It was time to investigate.

Monitoring Database Growth

Now, you might ask, how did I know that the database growth was accelerating at a greater rate than it had in the past? There are several available methods of tracking database growth but the one I use most often is based on a script I found at SQLServerCentral:

 http://www.sqlservercentral.com/scripts/Maintenance+and+Management/30217/

I won’t go more into specifics except to say that it important not only to know how big is your database today but how big was it last month, and the month before, so that proper space planning is possible.

Of course, in this case, the database was experiencing abnormal growth so my careful planning was moot, and I was running out of free space.

Monitoring Table Space

Armed with the knowledge that my database was growing rapidly, I needed to gather more specific information. What was causing the rapid growth and did it seem likely to continue or was it just a one-time data load event?

I needed to check the size of all the tables in the database and find out if any new tables had been created, or if there were any old tables being more heavily used as a result of the vendor’s new interface. Was the database experiencing overall growth amongst all the tables, or just in some history tables, or were there specific interface tables that were rather large?

I spoke to the vendor and was told that they were making no schema changes. What they were doing was implementing a stored procedure that wrote to several existing tables that, in the absence of this new interface, would remain unpopulated.

On SQLTeam.com, I discovered a script that shows a breakdown of space used by table, including the amount of free space. This script works well in both SQL 2000 and SQL 2005, although I often change the script to print the output by the unused column, as opposed to the table_name column.

NOTE
If you are using SQL Server 2005 SP2,then an alternative to the above script is to make use of the included reports. Simply right-click on the database, choose reports, then standard reports and Disk Usage by Table. While this is an excellent report, it does not allow you to sort the output, which is an advantage of the script.

Using this script, I discovered that one of the tables affected by the vendor’s changes was the major culprit in the growth of the database. Although it contained only 194,000 rows, in terms of physical space it was hogging 9.2 GB, of which 8.5 GB was marked as unused. However, I also found other problem tables that were not affected by the vendor’s changes. The next worst offender, with 23,000 rows, took up 710 KB of storage and was marked as having 620 KB of unused space. Using this same script, I found out that two of my other databases that were completely unaffected by the vendor’s changes were also exhibiting, to a lesser degree, abnormal growth of unused space. It would seem the vendor’s actions had helped me to notice the growth of unused space but were not the only cause of it. Some urgent action was required.

Reclaiming Unused Space

I continued to research. Several searches on terms such as ‘reclaim space in SQL’ returned results that pointed toward the DBCC CLEANTABLE command, which is said to ‘reclaim space from dropped variable-length columns in tables’.

Well, given that I hadn’t dropped any columns, I guessed that this was unlikely to work. However, I couldn’t see what harm it could do either so I gave it a try on a restored version of our production database that I put on a test box. Having run the command I found that (surprise, surprise) virtually no space was reclaimed.

My next thought was to run a DBCC DBREINDEX on this table to fix the unused space. Again, this had little effect. The index fragmentation improved but it never was very bad to begin with since indexes get defragged everyday in this database. In any case, my objective was to reclaim space and the DBCC DBREINDEX reclaimed very little space so it was not what I was after.

Clustered versus Non-Clustered Indexes

During these investigations, I noticed that each of the offending tables did not have a clustered index. What if I made the primary key a clustered index then changed it back to a non-clustered index so the vendor couldn’t say I changed their schema?

On the 23K row table, I scripted out the primary key index, which was a unique non-clustered index, dropped it and re-added it as a clustered index, then dropped it again and returned it to its previous state:

The results were excellent – all unused space was reclaimed – and the schema was back the way it was before I did anything.

One table done; now for the big one! However, for the 194K-row table, the trick did not work as well as it did on the first table. The space used by the table shrank from 9 GB to 5.4 GB, of which 4.9G was still unused. I needed to reclaim more space.

Text versus Varchar versus Text in Row

I was intrigued to know why dropping then recreating the indexes on one of the problem tables reclaimed all of the unused space and on the other one only reclaimed half of it

One difference seemed to be that the 194K-row table had a column defined as type TEXT, whereas the other table was using varchars. Was there a problem with text columns and space?  I started searching and found a couple of forum postings that made reference to text and wasted space, and these led me to a Microsoft knowledge based article, called FIX: TEXT/IMAGE Data Storage Space Not Reclaimed Correctly. This article describes a space issue with tables having a text column and indicates the solution is to “Use a SELECT INTO statement to transfer the whole table to a new table in the same database”. Now, I did notice that the article described this as a problem with SQL Server 7 and I was running SQL Server 2000. However, when I looked at the database properties I discovered the compatibility level was set to SQL Server 70.

So, I created a script, shown below, to define the 194K row table, and indexes, exactly the same as the original. Before running this script, I renamed the original table (and indexes) to tableOLD. Next, I ran the script to recreate my new table and copied all the rows from the old, renamed table, into the new one.

Now, when I look at how much space the table was using, I found that there was no unused space at all.

Maybe if this column was defined as varchar instead of text, then I wouldn’t have these expanding-table issues in future. I wrote an email to the vendor describing the problems we were experiencing and the solutions I’d explored. I suggested that the primary key indexes for these tables be redefined as clustered indexes, and that the text fields be changed to varchar. The vendor wrote back saying that changing the indexes permanently to clustered indexes would be OK, but that we could not change the text column to varchar as this would mean application coding changes. The Vendor also mentioned the compatibility level must remain at 70.

An alternative to swapping the text column to varchar, would be to instead swap it to use SQL Server 2000’s Text in Row feature, a change that the vendor deemed acceptable. However, was this making things too complex? I’d seen articles both praising and condemning Text in Row. More worrying though, was the fact that Text in Row is to be removed from a future version of SQL Server. Microsoft’s advice was:

“Avoid using this option in new development work, and plan to modify applications that currently use text in row. We recommend that you store large data by using the varchar(max), nvarchar(max), or varbinary(max) data types. To control in-row and out-of-row behavior of these data types, use the large value types out of row option.”

This is all well and good if the database were running in SQL 2005, but my vendor still only certified SQL Server 2000, with the compatibility is set to 70, so using varchar(max) won’t work in my case.

Still, I was not convinced I really wanted to go in the Text in Row direction. Was there any other possible solution?

The Solution, Part 1: More Memory, Fewer Tables!

I continued my research and stumbled across another knowledge base article that described circumstances whereby a SQL Server with low memory and lots of tables might significantly increase the unused space for some tables.

This sounded promising because the three databases experiencing this issue were on a server with a total of 50 databases and over 14,000 tables, and having only 2 GB of memory. Could the solution be as simple as moving the problem databases to a new server with fewer databases (fewer tables) and more memory? I had another available server with 8 GB of memory and only 1 user database, with about 1200 tables.

I decided to try moving one of these databases before looking further into Text in Row.  A month later, none of the tables in this database exhibited a high level of unused space. So, here I had the simplest and most effective solution to all the problems: move the problem databases to a server with fewer tables and more available memory.

However, as most DBAs will recognize, the simplest solution isn’t always the one you can implement in all cases.

The Solution, Part 2: Implementing Text in Row

I still had two databases on the old server, both containing tables that continued to experience high levels of unused space. For political reasons, I was not allowed to move these databases to the less crowded server. For these databases, I was forced to explore the Text in Row option.

So, once again, I reclaimed the unused space using the previously described script, but this time, when I recreated the problem tables I used the Text In Row parameter:

Several more weeks passed and all was well in the databases; they had ceased to expand abnormally. So, though it was far from ideal due to deprecation of the Text in Row feature, I had a second workable solution to my problems.

Summary

In some ways, my experiences in solving this problem were typical of that of many DBAs. Often, the ideal solution is out of reach for “political” reasons or for other reasons outside the control of the DBA, such as use of vendor applications with specific restrictions, and so on.

My investigations had led me to the ideal solution: moving each of the problem databases to a server with fewer tables and more memory. However, this solution was only available to me in the case of one of the databases.

For the other two databases, I was left with a solution that “worked” but that may be rendered obsolete in future versions of SQL Server. In a bold move, I opted for the status quo, regardless. I continue to make judicious use of Text In Row and await the opportunity to either move to a new server or to upgrade to SQL 2005, and make use of varchar(max).

In any case, continued and close monitoring of the growth of these databases, both at a database level and at an individual table level, is definitely warranted.