SQL Server Partitioning without Enterprise Edition

Table partitioning is a valuable technique for managing very large database tables. It has always been possible with SQL Server, even if slightly cumbersome. SQL Server 7.0 introduced partitioning through partitioned views: These were combined with constraints to allow the query optimizer to remove irrelevant tables from the query plan and reduce the overall plan cost when a UNIONed view accessed multiple tables. As Barry King explains, this is still a valuable feature that works in every current version of SQL Server.

What is Partitioning and Why use it?

Table Partitioning is a way to segregate your data using a particular scheme or set of rules. The main reason for doing this is query performance; when you partition data and you have chosen a good partitioning scheme relevant to your data it means that as long as you stick to the partitioning scheme rules your query only “touches” the data in that partition. It was introduced in SQL 2005.

Imagine a 60 million row table, querying this depending on the complexity of the query; number of joins and criteria etc may take a considerable amount of time. In the world of partitioning , the number of rows queried is reduced according the granularity of the partitioning scheme.

There is of course some downsides to this. The first is that queries that cross the partition rules will take longer, arguably not longer than querying the non-partitioned table but this will be dependent on the number of partitions and again the granularity.

The second ,for small to medium size companies, most  important problem is that true partitioning requires SQL Server Enterprise Edition. The current big daddy of the SQL family. This can be a very hard financial pill to swallow. Some may argue, including Microsoft,  that if you are dealing with such a large volume of data that you must be a large, financially rich company and that this is not a barrier. The ability to partition/split data is a common request – especially as data grows and there are natural splits within your schema to take advantage of.

I am going to show a way to have table partitioning in SQL Server databases without the need for the Enterprise Edition of SQL Server that can easily to used in 2000, 2005 or 2008.

To achieve a version of partitioning without Enterprise Edition turns out to be relatively simple and offers nearly all of the benefits of true partitioning.

Lets say we have a ficticious company called NurseBank. NurseBank is operated in 3 Regions; England, Scotland and Wales. Every time a Nurse visits a Patient a Visit record is created and this is assigned to a Region who deals with the booking of Nurses for that Region.

In our original schema we would have had something like this for the Visit table:

Now, if you created that table; drop it. We are going to create 3 new tables to cater for the new partitions.

With these three tables created. We can create the Partitioned View that will allow us to select, update and insert data into the partitions.

You can now insert data into the Visit view as we would have done previously with the original table (with one exception that I will cover shortly).

The interesting and important aspect is what SQL Server does when insert happens and what query plan is used to select the data.

This will produce the following query plan:

886-PN3.JPG

As you can see it correctly uses the Visit_England partitioned table. If the Region is omitted from the WHERE clause, the query plan is vastly different and of course cannot determine the correct table to use so basically it UNION joins them all as shown below.

886-PN4.JPG

I don’t think I need to say but this is bad. However, the whole point of partitioning the data is when you have a need to do so (large volumes of data) and you have a viable partitioning scheme that will avoid the above type of plan as much as possible.

Lastly, I wanted to mention the exception I hinted at before.

For a partitioned view to work fully (as in allow inserts) , your partitioned table cant have an identity column.

What! I hear you say, whats the point of that? How I can use this?. Well, in most situations the partitioning of the data is done to improve the read performance of queries. If this is the case then the perfect scenario for this type of partitioning is to partition the data on a replicated separate database. The way this works is that you have your inserts performed on your main database or cluster and when the data is replicated to the other database (lets call this the readonly but it isn’t a physical readonly – its just we wont be doing writes on it) the replication procedure is modified to insert into the Partitioned View and the original table thats replicated.

In this way you don’t need an identity column on the partitioned tables as this would be known and passed into the table via replication.

I hope you find this example useful, this is by no means the only way to improve performance and you really must consider all available options but I thought this was a useful and relatively painless method to try and one we currently use in our business.

If you are interested in using true Table Partitioning, I’d recommend the article Partitioning in SQL Server 2008 on SQL Server Central by Muhammad Shujaat Siddiqi.