Oracle to SQL Server, Crossing the Great Divide, Part 3

We soon learn, in SQL Server, that heaps are a bad thing, without necessarily understanding how or why. Jonathan Lewis is an Oracle expert who doesn't like to take such strictures for granted, especially when they don't apply to Oracle. Jonathan discovers much about how SQL Server places data, and concludes from his experiments that heaps perform badly in SQL Server because you cannot specify a fill factor for them.

At the start of the previous installment of this series, I noted that good, scalable database performance is largely dependent on putting the data in the right place, being able to access it efficiently, and avoiding unnecessary overheads. Having looked at ways of generating data of various types and patterns at reasonable volumes, the time has come to investigate how SQL Server handles the placing of data.

Pages and Extents

In Oracle, we talk about data segments. A simple object corresponds to a single data segment, each partition of a partitioned object (or sub-partition of a composite partitioned object) is a separate data segment; segments, in turn, comprise a collection of extents; and an extent is a contiguous set of blocks (pages) in a file.

I started my research into similar data structures in SQL Server. There is no such thing as a “segment” in SQL Server, but a search in Books Online for the word “Extents” proved to be a good choice. The first hit was an article on Understanding Pages and Extents, which was an excellent starting point for learning how SQL Server allocates space and stores data.

I won’t repeat the details of my journey through the manuals, but instead will summarize some of the highlights (with comments about Oracle in italics):

  • A page is the smallest unit of storage and is fixed at 8KB
    Oracle can use blocks of 2KB, 4KB, 8KB, 16KB or even, on some platforms, 32KB. A single database can use multiple block sizes. In most cases the default, and best choice, for a platform is 8KB, and mixing block sizes is rarely a good idea.
  • An extent is a collection of eight consecutive pages
    Oracle allows for variable extent sizes, and has various strategies for specifying the extent sizes that should be used for an object or collection of objects.
  • Certain “mapping” pages are used to track the allocation status of other pages, providing information such as which extents have been allocated, the objects to which they have been allocated, which pages in which extents are used, and which pages are free. This leads to references to pages such as Global Allocation Map (GAM) pages, Shared Global Allocation Map (SGAM) pages and Index Allocation Map (IAM) pages.
    Oracle has two dramatically different ways of dealing with extent allocation (“dictionary managed” and “locally managed”), and two ways of dealing with block usage (“freelist managed” and “bitmap managed”), but I won’t go into details.
  • A single extent may hold pages from several objects. This is a mechanism to avoid wasting space but is only used when an object is very small.
    In Oracle, all the blocks in an extent belong to a single object, but a single extent can be as small as two blocks, and in the most recent version of Oracle an object doesn’t allocate an extent until you insert some data into it.
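Those two fixed sizes make the basic space arithmetic easy to sketch. Here is a minimal back-of-envelope illustration in Python; the figure of 64 rows per page is the density observed in the page dump later in this article, and the ceiling divisions deliberately ignore IAM and other mapping pages:

```python
# A page is fixed at 8KB and an extent is eight consecutive pages,
# so an extent is always 64KB.
PAGE_SIZE = 8 * 1024
PAGES_PER_EXTENT = 8
EXTENT_SIZE = PAGE_SIZE * PAGES_PER_EXTENT          # 65,536 bytes

def ceil_div(a, b):
    return -(-a // b)

# Back-of-envelope: 5,000 rows at 64 rows per page (the density
# observed in the page dump later in the article).
pages = ceil_div(5000, 64)
extents = ceil_div(pages, PAGES_PER_EXTENT)
print(pages, extents)   # 79 10
```

Reassuringly, the 79 data pages predicted by this crude estimate matches the number of used pages reported for the heap later in the article.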

A couple of thoughts that follow on from the above observations: there is an “allocation unit” for space at the operating system level in Windows, which defaults to 4KB on my little laptop. Clearly it would make sense to match the size of the allocation unit to the size of the database page. In fact, since I also noticed that SQL Server does a read-ahead for an entire extent when possible, there is a case for creating an allocation unit of 16KB, 32KB, or 64KB (if these are possible) and making sure you align the database page boundaries with the O/S allocation unit boundaries.  (On the down side, perhaps a larger allocation unit would result in a “read / fill / write” operation when SQL Server wanted to write a single 8KB page into a 32KB allocation unit.)

Data Storage in Heaps and B-Trees

Moving on from extents to objects, just two clicks away at Table and Index Organization, we find that:

  • A table is always partitioned, although it may (and probably usually does) have only one partition.
  • Each partition consists of many logical components which may be Heaps or (clustered) B-trees – the “HoBT”. In other words, someone using SQL Server is likely to think of a “table” as something which includes its indexes.
    In Oracle, the separation of tables and indexes is more clear-cut, to the extent that a partitioned table may have local or global indexes. In other words, a “single partition” of an Oracle index may cover every partition of a partitioned table, or be partitioned using a different strategy from the table partitioning.
  • Each HoBT may hold three types of pages – “row data”, “LOB” or “row overflow”
    In Oracle, overflow data may be “chained”, holding rows too long for a single block, or “migrated”, holding rows that have been updated and have to move because there is not enough space in the current block for the new length of the row. Nevertheless, “overflow” is not treated differently from any other row data. The LOB columns for a table are, like indexes, given their own dedicated segments.

With a little background, then, it’s time for some technical investigation. How well can I track where the data goes? In the previous article, I made a few comments about the differences in the amount of space used by Oracle and SQL Server for storing the same data; for the same million rows of data, a heap in Oracle used about 8% more pages, while the non-clustered primary key index used about 7% less. Since data location is key to performance, I’m going to try to find out what they do differently.

We’ll start with the data creation script shown in Listing 1 (see the previous article for the reasons behind the structure of this script). The script is based on a template that I use for generating all sorts of volumes and patterns of data, but in this case creates only 5,000 rows in a heap table with a non-clustered non-unique index:

Listing 1: Creating test_table and populating it with data

During my research into space allocation in SQL Server, I found a script that reported the space used by a table (and its indexes), by querying the sys.allocation_units view. I adapted this query for my own requirements, as shown in Listing 2 (the STR commands are there simply to make the output tidy in SQLCMD).

Listing 2: Reporting on space allocation in test_table

After creating a new database (called testdata), and creating the one (heap) table and index shown in Listing 1, the output of the query was as follows:

The index_id is zero for the heap table and 2 for the index. If I had created the index as a clustered index then there would be just one line in this report, with an index_id of 1.

We have 81 pages allocated to the heap table, of which 79 are used for data, and 25 pages allocated to its index, of which 17 are used for data.

In the previous article, I hinted that there appears to be no equivalent in heap tables for the “fill factor” that is an option for indexes (in Oracle, the PCTFREE parameter, which applies to tables and indexes, is effectively the same as “100 – fill factor”). Unfortunately, no-one took the hint, so it’s time to find out the hard way how well-filled the table blocks are, which means dumping the actual pages…

Investigating Table Blocks using DBCC IND and DBCC PAGE

I had to find a way of listing the pages allocated to the table, and then dump them. Tricky, but I had noticed that whenever you want to do anything subtle in SQL Server you need to use the DBCC command, so I tried a Google search for “DBCC PAGE”, and soon found a collection of interesting articles written by Paul Randal. The first of these gave me the syntax of the DBCC PAGE command, and from there I soon found the “DBCC IND” command.

  • DBCC IND( database, table, index_id )
  • DBCC PAGE ( database, file_id, block_id, level)

I started by investigating the heap, with a call to DBCC IND, shown in Listing 3.

Listing 3: Calling DBCC IND

Despite what may be suggested by the “ind”, this produces a list of the blocks in the table:

This tells us that file 1, block 153 is the first data block in the table. So let’s dump it, as shown in Listing 4, and see what’s in it. Note that in SSMS, you will first need to turn on a trace flag (3604), using DBCC TRACEON (3604).

Listing 4: Dumping page 153

The level 3 dump gives us a full symbolic dump, from which I’ve extracted a few lines:

The block held 64 rows (slots 0 to 63) and each one was 124 bytes, for a total of 7,936. Add a couple of bytes for each pointer in the table of “Offsets”, and the total gets to 8,064. Add the block header (which I believe is 96 bytes) and there are only 32 bytes of free space in the block.
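As a sanity check, that arithmetic can be sketched in a few lines of Python (the 96-byte header size is, as noted, what I believe it to be rather than a documented figure):

```python
# Free-space arithmetic for the dumped 8KB page.
PAGE_SIZE = 8192
PAGE_HEADER = 96        # fixed page header (believed, as noted above)
ROWS = 64               # slots 0 to 63
ROW_SIZE = 124          # bytes per row, from the dump
SLOT_PTR = 2            # 2-byte offset per row in the "Offsets" table

used = PAGE_HEADER + ROWS * ROW_SIZE + ROWS * SLOT_PTR
free = PAGE_SIZE - used
print(used, free)       # 8160 32
```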

This is rather bad news if you’re hoping to update any of the data in the table. Unless there’s a “fill factor” for heap tables, this default 100% fill gives anyone using SQL Server a fairly compelling reason for using nothing but clustered indexes, unless there are some tables that aren’t going to need any updates.

To demonstrate the problem of updates, let’s perform the following simple update of test_table, as shown in Listing 5.

Listing 5: Updating the test_table table

This is what slot 3 (amongst others) looks like after I’ve executed the update and then re-run the level 3 dump command.

Slots 0, 1, and 2 show vc_small = xxxxxxxxxx, but between them the three rows have taken up 30 of the 32 bytes that I had estimated as free space, so there’s no room in the block for the update to slot 3 and the row has been copied to another block, leaving nothing but a pointer behind. The space freed up by this “row migration” allowed several more updates to take place in the block before it was full again, and I ended up with a total of 5 forwarding stubs in the block.
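The sequence of events can be modeled with a toy simulation. This is an assumption-laden sketch, not SQL Server’s real accounting: each update lengthens a 124-byte row by 10 bytes, a row that no longer fits moves to another page, and the 9-byte forwarding-stub size is a guess chosen so that the arithmetic matches what the dump showed:

```python
# Toy model of updates against a 100%-full heap page with 32 free bytes.
ROW_SIZE, GROWTH, STUB_SIZE = 124, 10, 9   # stub size is a guess

free, stubs = 32, 0
for slot in range(64):            # update every row, in slot order
    if GROWTH <= free:
        free -= GROWTH            # the longer row still fits: in place
    else:
        stubs += 1                # row migrates to another page,
        free += ROW_SIZE - STUB_SIZE   # freeing its space minus the stub

print(stubs, free)                # 5 17
```

Under these (guessed) constants the model reproduces the behaviour seen in the dump: slots 0 to 2 update in place, slot 3 is the first to migrate, and the page ends up holding five forwarding stubs.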

To an Oracle database designer, a heap table is the obvious structure for a table unless there is a specific need to use a clustered index to group the data in a pattern that differs from its natural order of arrival. However, when you implement a heap table in SQL Server, any updates to the data will have a negative impact on performance because of the way that rows will move to different blocks. This, presumably, is one reason why there seems to be such enthusiasm for creating a clustered index on an identity column, as it keeps the data collected in order of arrival but allows you to leave some space in each block for updates by specifying an appropriate fill factor.

What about non-clustered indexes on heap tables? How do they cope with row movement? Here are the first few lines of output from using the DBCC IND command on our index:

You’ll notice from the PagePID that the index and the table start off sharing the same extent (the table uses blocks 153, 154, 157, 158 and 159 and the index is using blocks 155 and 156).

Block 156 is the IAM page (PageType = 10) for the index, and block 172 is the root block (IndexLevel = 1, so it’s not a leaf block). Let’s dump block 155, the first leaf block, and take a look at the index entries:

I’ve shown the start, the end, and a patch in the middle of the list of leaf entries. You’ll notice that each entry consists of a key value (my random_data column) and a HEAP RID (row identifier), which seems to be the block address for the key. Since the HEAP RID is also labeled as “key”, I assume that it has been appended to the real key value as an aid to ordering duplicates; certainly the RIDs appear to be sorted within each key value. Interestingly, the byte-ordering of the HEAP RID means that it could, in principle, introduce a greater degree of random I/O than necessary – it looks as if rows with the same key in adjacent blocks may appear at non-adjacent positions in the index.
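That byte-ordering point can be demonstrated in a few lines of Python. Assuming the HEAP RID really is stored with a little-endian page id (an inference from the dumps, not documented behaviour), RIDs compared as raw bytes do not sort in page-number order:

```python
import struct

def rid_bytes(page_id, file_id=1, slot=0):
    # Hypothetical RID layout, inferred from the dumps:
    # 4-byte page id, 2-byte file id, 2-byte slot, each little-endian.
    return struct.pack('<IHH', page_id, file_id, slot)

rids = [rid_bytes(p) for p in (153, 154, 256)]
page_order = [struct.unpack('<IHH', r)[0] for r in sorted(rids)]
print(page_order)   # [256, 153, 154]
```

Page 256 sorts before pages 153 and 154 when the RIDs are compared byte by byte, so duplicates of a key could be listed in an order quite different from the physical page order.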

I’ve printed a section from the index which shows a few occurrences of the random_data value of 3 because it’s reporting one of the rows in the table block I dumped: the HEAP RID = 0x990000000100 matches file 1, block 153 (it looks as if the first four bytes are the block id and the last two are the file id).

There’s a problem though: slot 47 of the table block is the slot that holds the value 3, and I can’t see the reference to slot 47 anywhere in the block dump of the leaf block. Does this mean that index look-ups are only accurate to the table block, leaving SQL Server to search the whole table block for the matching key? It seems unlikely, especially when you think of the possible consequences this would have for finding migrated rows.

Fortunately, by switching the dump level to 2 (raw dump row details) we can find the following in the file:

Note the “2f”; converted to decimal, that’s the number 47 that we needed for the slot identifier. You might also note that this version of the dump doesn’t include anything that looks like the KeyHashValue. Just as in Oracle, some of the dump files report information that is derived at the time of the dump and some of the real information is missing from the dump.
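Putting the pieces together, the whole 8-byte RID can be decoded with a short Python helper. The layout – a little-endian 4-byte page id, 2-byte file id and 2-byte slot – is my inference from the two dumps, not documented behaviour:

```python
import struct

def decode_rid(rid: bytes):
    """Hypothetical decode of an 8-byte heap RID as seen in a DBCC PAGE
    dump: little-endian 4-byte page id, 2-byte file id, 2-byte slot."""
    page_id, file_id, slot = struct.unpack('<IHH', rid)
    return file_id, page_id, slot

# 0x990000000100 from the level 3 dump, plus the 0x2f00 slot bytes
# that only appear in the level 2 dump.
print(decode_rid(bytes.fromhex('990000000100' + '2f00')))   # (1, 153, 47)
```

That gives back file 1, block 153, slot 47 – exactly the row we were looking for.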

Having tracked down the link from an index leaf to the table row, it takes just a few minutes to check that when a row is migrated from the heap table, the index entry is not updated; it points to the forwarding stub, rather than pointing to the new location of the row (Oracle adopts the same strategy).

Tackling one structure at a time is probably enough. I shall take a look at clustered indexes (unique and non-unique) in the next article.

Conclusions

Unless I’ve missed something obvious, the main conclusion that I’ve reached about SQL Server and heap tables is that you cannot specify the equivalent of a fill factor for a heap table. This means that updates to heap tables will probably result in an unreasonable amount of row movement, leading to a significant performance overhead when you query the data, because you will have to follow a forwarding link to find each migrated row. The absence of a fill factor is (by itself) enough of a threat to make heap tables in SQL Server fairly undesirable.