Rebuilding indexes is an activity that you shouldn’t need to perform often. There are always a few special cases – such as when you’ve moved or recreated a table, or done a massive delete on a table – when a rebuild is probably a reasonable idea but, in general, there are very few cases where there’s any great benefit to be had from any sort of routine rebuild – especially if the index in question is very big.
If you suspect that there will be some benefit to doing a rebuild it’s worth knowing how to take a look at the “state of health” of an index so that you can confirm your suspicion, and this article describes a couple of mechanisms that allow you to do this relatively cheaply.
The big picture
Running at steady state, a B-tree index will tend to run at an average of 70% space utilisation – 30% free space in each leaf block. Given the number of rows indexed and the average column length of each column in the index, it is possible to estimate roughly how many leaf blocks would be a fairly typical size for the index.
Inevitably there are several reasons why any attempt to estimate the size is only a rough approximation: it’s hard to allow correctly for index compression; the average column length (user_tab_cols.avg_col_len) reported by Oracle is subject to rounding errors; the distribution of nulls among the index entries can have a significant impact on how accurately you can allow for column and row overheads; and index columns holding more than 127 bytes need 2 length bytes rather than 1. Nevertheless, except for fairly extreme cases, the calculation can give you some idea of whether the size of your index is in the right ballpark.
The calculation is quite simple:
- Sum (user_tab_cols.avg_col_len + 1) for each column in the index.
- Add 6 for the rowid that is part of the index entry (8 if it’s a globally partitioned index on a partitioned table)
- Add 1 if the index is non-unique
- Add 2 for the row overhead
- Add 2 for the row directory entry
This gives you the typical row length for an index entry.
Now take the block size and subtract 192 for the block overhead (call it roughly 200 – if you’re using an 8KB block size the answer is close enough to 8,000), then multiply by (1 – pctfree/100) for the setting you’re planning to rebuild to – pctfree = 30 is a good idea, but many people rebuild at the default pctfree = 10. This gives you the number of bytes available per leaf block for the rebuild.
Now simply divide the bytes per leaf block by the typical row length and truncate to get the number of index entries per leaf block. Divide this into the total number of rows (user_indexes.num_rows – or user_ind_partitions or user_ind_subpartitions as appropriate) in the index to get the number of leaf blocks needed.
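The arithmetic above is easy to wrap in a few lines of code. Here’s a minimal sketch in Python – the function name and the sample inputs are my own invention; in practice you would feed it values read from user_tab_cols and user_indexes:

```python
# Rough estimate of leaf blocks for an uncompressed B-tree index,
# following the step-by-step arithmetic in the article. Inputs are
# hypothetical - read the real values from user_tab_cols/user_indexes.

def estimate_leaf_blocks(avg_col_lens, num_rows, block_size=8192,
                         pctfree=30, unique=False, global_partitioned=False):
    entry = sum(l + 1 for l in avg_col_lens)    # avg_col_len + 1 per column
    entry += 8 if global_partitioned else 6     # rowid in the index entry
    if not unique:
        entry += 1                              # extra byte for non-unique
    entry += 2                                  # row overhead
    entry += 2                                  # row directory entry

    # usable bytes per leaf block: block overhead is roughly 192 bytes
    usable = (block_size - 192) * (1 - pctfree / 100)
    rows_per_block = int(usable // entry)       # truncate
    # round up - a partial block still needs a whole block
    return -(-num_rows // rows_per_block)

# e.g. a non-unique two-column index (avg_col_len 5 and 11),
# 1,000,000 rows, 8KB blocks, rebuilt at the default pctfree = 10:
print(estimate_leaf_blocks([5, 11], 1_000_000, pctfree=10))
```

For those sample inputs the entry size works out to 29 bytes and the estimate comes to a little over 4,000 leaf blocks; changing the pctfree parameter shows how much a rebuild target of 30 instead of 10 inflates the estimate.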
For indexes which aren’t compressed and which don’t have a very small number of rows compared to the table this should be within 1 or 2 percent of the size of the index after a rebuild.
If you don’t want to go through this by hand there’s a script on my blog which will do the work for all the indexes in a schema assuming you’ve recently collected object stats on the schema. All you have to do is specify what you want to set as a pctfree, and how far above the estimated size an index should currently be before you report it as a candidate for rebuilding.
When all the entries in an index block have been deleted the leaf block becomes free for re-use. But until it is reused by new data it stays in place in the index tree structure and still contributes to the workload of using that index – an index range scan from ‘A’ to ‘C’, for example, would still have to pass through an empty ‘B’ block if you once had a lot of ‘B’ values that have been deleted.
It is a little unusual to lose a lot of space to empty blocks but it can happen – an index that has been used to model a FIFO (first in first out) processing queue is the standard example – and it’s nice to have a cheap way of confirming your suspicion. The dbms_space package contains two related procedures, the free_space() procedure that you can use to check the number of blocks on the freelists of indexes if you still have any indexes using freelist management, and the space_usage() procedure for indexes using automatic segment space management (ASSM).
I’ve published a generic script to use this package on my blog. A key detail to note with the ASSM report is that index leaf blocks are only ever reported as full or half-empty – the (apparently) half-empty ones are the ones that are in place but empty and available for re-use. Here’s a sample of the output for an index where I’ve done a big delete on the underlying table:
====  ASSM  ====
Unformatted                   :      0 /          0
Freespace 1 (  0 -  25% free) :      0 /          0
Freespace 2 ( 25 -  50% free) :    909 /  7,446,528
Freespace 3 ( 50 -  75% free) :      0 /          0
Freespace 4 ( 75 - 100% free) :      0 /          0
Full                          :  4,864 / 39,845,888

=======  Generic  =======
Enter value for m_partition_name:
Segment Total blocks:  5,888
Object Unused blocks:     21
As you can see I have a segment with 5,888 blocks allocated (that’s 46 extents of 1MB/128 blocks) of which 21 blocks have not yet been formatted. There are 4,864 blocks holding some index entries, and 909 blocks which are in the index structure but holding no data. The numbers don’t quite add up: 4864 + 909 + 21 = 5794 which is 94 blocks short. This difference can be accounted for by the space management blocks: two L1 bitmap blocks per extent plus one L2 (in the first extent) plus the segment header block.
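That accounting is easy to verify with a couple of lines of arithmetic, using only the figures reported above:

```python
# Sanity check of the space accounting: data blocks plus empty blocks
# plus unformatted blocks plus space-management blocks should account
# for every block in the segment.

extents     = 46            # 1MB extents of 128 blocks each
full        = 4_864         # leaf blocks holding index entries ("Full")
empty       = 909           # in the structure but empty ("Freespace 2")
unformatted = 21            # never formatted

l1_bitmaps  = 2 * extents   # two L1 bitmap blocks per extent
l2_bitmap   = 1             # one L2 bitmap block in the first extent
segment_hdr = 1             # the segment header block

total = full + empty + unformatted + l1_bitmaps + l2_bitmap + segment_hdr
print(total)                # 5,888 = 46 extents * 128 blocks
```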
Even if an index seems to be about the right size it’s possible that part of the index is well packed and operates very efficiently while another part has a lot of nearly empty blocks that introduce an undesirable level of inefficiency. If the efficient part is the only part you’re using, the wasted space may not be a problem; but if the part of the index that you make most use of is the part with the large amount of free space then there may be a performance gain to be had from a rebuild. It’s nice, therefore, to be able to check the pattern of distribution of rows across the index, in index order. You can do this with the treedump event:
alter session set events 'immediate trace name treedump level 123456';
The level in the above line should be the object_id of the index (or index partition, or subpartition) that you want to dump.
select  object_id, object_name
from    user_objects
where   object_type = 'INDEX'
and     object_name = 'FCT_PK'
;
Oracle will dump a summary of your index into the process trace file – and if you’re running 11g and have the appropriate permissions you can find out the name of the trace file with the following query:
select  tracefile
from    v$process
where   addr = (
            select  paddr
            from    v$session
            where   sid = (
                        select  sid
                        from    v$mystat
                        where   rownum = 1
                    )
        )
;
A treedump consists of one line for each block in the index – and Oracle has to read every block in the index to produce it so be careful about timing your treedumps, and remember to check that the parameter max_dump_file_size won’t result in the trace being terminated before it’s complete. (The default value in modern versions of Oracle is “unlimited”, in 9i and 10g any numeric limit was the number of O/S blocks, in 11g and 12c it’s the number of bytes).
There are some older versions of Oracle where Oracle did a symbolic dump of every leaf block as it did the treedump – I haven’t seen this happen for a long time but do test a small index before you try this on production. Here’s a sample treedump (with a couple of hundred lines removed) from a model of a “FIFO” index. I’ve deleted lots of “old” rows (with a certain lack of success) from the low values end while inserting new rows with high values.
----- begin tree dump
branch: 0x1800204 25166340 (0: nrow: 307, level: 1)
   leaf: 0x1800205 25166341 (-1: nrow: 326 rrow: 3)
   leaf: 0x1800206 25166342 (0: nrow: 326 rrow: 3)
   leaf: 0x1800207 25166343 (1: nrow: 326 rrow: 3)
   leaf: 0x1800208 25166344 (2: nrow: 326 rrow: 4)
   leaf: 0x1800209 25166345 (3: nrow: 326 rrow: 3)
   leaf: 0x180020a 25166346 (4: nrow: 326 rrow: 3)
...
   leaf: 0x1800316 25166614 (268: nrow: 326 rrow: 28)
   leaf: 0x1800317 25166615 (269: nrow: 326 rrow: 27)
   leaf: 0x1800318 25166616 (270: nrow: 326 rrow: 27)
   leaf: 0x1800319 25166617 (271: nrow: 326 rrow: 27)
   leaf: 0x180031a 25166618 (272: nrow: 326 rrow: 27)
...
   leaf: 0x1800330 25166640 (294: nrow: 326 rrow: 54)
   leaf: 0x1800331 25166641 (295: nrow: 326 rrow: 55)
   leaf: 0x1800332 25166642 (296: nrow: 326 rrow: 281)
   leaf: 0x1800333 25166643 (297: nrow: 326 rrow: 326)
   leaf: 0x1800334 25166644 (298: nrow: 326 rrow: 326)
   leaf: 0x1800335 25166645 (299: nrow: 326 rrow: 326)
   leaf: 0x1800336 25166646 (300: nrow: 326 rrow: 326)
   leaf: 0x1800337 25166647 (301: nrow: 326 rrow: 326)
   leaf: 0x1800338 25166648 (302: nrow: 326 rrow: 326)
   leaf: 0x1800339 25166649 (303: nrow: 326 rrow: 326)
   leaf: 0x180033a 25166650 (304: nrow: 326 rrow: 326)
   leaf: 0x180033b 25166651 (305: nrow: 244 rrow: 244)
----- end tree dump
The first line of the treedump is always about the root block – which is just a special case of a branch block (although, for an index of just one block, the root block is a leaf block). The two numbers following the word ‘branch’ are the block address of the block, first in hex then in decimal. The information in brackets then tells us that this is the zeroth block at this level, that it holds 307 pointers to the next level down, and that it is at branch level 1 – i.e. one level above the leaf blocks. This level will match the value reported by user_indexes.blevel (the root block is the only block at the top level, of course, so it will always be the zeroth block).
This is only a small index, but for a larger index we could use a tool like grep to extract all the lines about branch levels, e.g:
[oracle@linux11 trace]$ grep branch test_ora_19818.trc
branch: 0x1800404 25166852 (0: nrow: 14, level: 2)
   branch: 0x180050c 25167116 (-1: nrow: 258, level: 1)
   branch: 0x1800614 25167380 (0: nrow: 259, level: 1)
   branch: 0x180071c 25167644 (1: nrow: 259, level: 1)
   branch: 0x1800824 25167908 (2: nrow: 259, level: 1)
   branch: 0x180092c 25168172 (3: nrow: 259, level: 1)
   branch: 0x1800a34 25168436 (4: nrow: 259, level: 1)
   branch: 0x1800b3c 25168700 (5: nrow: 259, level: 1)
   branch: 0x1800c44 25168964 (6: nrow: 259, level: 1)
   branch: 0x1800d4c 25169228 (7: nrow: 259, level: 1)
   branch: 0x1800e54 25169492 (8: nrow: 259, level: 1)
   branch: 0x1800f5c 25169756 (9: nrow: 259, level: 1)
   branch: 0x1801064 25170020 (10: nrow: 259, level: 1)
   branch: 0x180116c 25170284 (11: nrow: 259, level: 1)
   branch: 0x18011c2 25170370 (12: nrow: 84, level: 1)
In this example the index has a blevel of 2 and the root block points to 14 branch blocks at level 1. The first branch block (labelled as the “-1th”) points to 258 leaf blocks and the last branch block points to 84 leaf blocks; every branch block in between points to the same number of leaf blocks (259), so there are no indications here of a severely disrupted index. If we think there may be problems with this index we will have to dig down further.
Go back to the earlier tree dump and take a look at the leaf entries. If you check the block addresses you’ll see that the leaf blocks are basically in order. It’s a new index in a clean tablespace and the data has been inserted in index order so we’re always adding new leaf blocks at the high end of the index, and that tends to mean (approximately, given the impact of ASSM) that the next block in the segment becomes the next block in the index. In most indexes the incoming data is likely to arrive in a more randomised order so blocks in the middle of the index will get full and split, introducing blocks from “the end” of the segment into the middle of the listing – the leaf blocks will appear to be badly-ordered. Here’s a small section of a treedump where I’ve created an index on an empty table then inserted 60,000 rows of randomly generated values:
leaf: 0x180010d 25166093 (44: nrow: 227 rrow: 227)
leaf: 0x18001ce 25166286 (45: nrow: 221 rrow: 221)
leaf: 0x1800171 25166193 (46: nrow: 141 rrow: 141)
leaf: 0x180029e 25166494 (47: nrow: 134 rrow: 134)
leaf: 0x1800195 25166229 (48: nrow: 240 rrow: 240)
leaf: 0x1800122 25166114 (49: nrow: 183 rrow: 183)
leaf: 0x18002ef 25166575 (50: nrow: 169 rrow: 169)
leaf: 0x18001df 25166303 (51: nrow: 172 rrow: 172)
leaf: 0x18002d8 25166552 (52: nrow: 186 rrow: 186)
leaf: 0x180015b 25166171 (53: nrow: 221 rrow: 221)
leaf: 0x1800185 25166213 (54: nrow: 202 rrow: 202)
leaf: 0x1800143 25166147 (55: nrow: 208 rrow: 208)
leaf: 0x1800188 25166216 (56: nrow: 194 rrow: 194)
leaf: 0x180011d 25166109 (57: nrow: 178 rrow: 178)
As you can see there appears to be no order in the addresses of the index leaf blocks. Looking at the numbers in brackets you can see that there is no “level” (you could say that all leaf blocks are implicitly “level 0”). You can also see that the numbering system applies to leaf blocks as well; this fragment is leaf blocks 44 to 57 (or 46 to 59 depending on how you consider the -1th and zeroth blocks) of one of the level 1 branch blocks. Blocks are numbered relative to the branch they belong to, not by an absolute position in the index.
The final, and perhaps most important, feature of the leaf block description is the pair of numbers given by nrow and rrow. The former is the current size of the leaf block’s row directory; the latter is the number of entries in the row directory that would be in use after all transactions touching that leaf block had committed. Loosely speaking rrow is a measure of how many entries there are at present in the leaf block while nrow tells you about the maximum number that have been there in the recent past, and the difference tells you something about the number of recent deletions.
When you look at these numbers you tend to look for extreme low values for rrow, but it’s also worth considering what nrow would be if you rebuilt the index at pctfree = 0, pctfree = 10 (the default) and pctfree=30 (which is where a typical b-tree gets to when the data is randomly arriving). In the case of the sample above if I rebuilt the index at its original pctfree 10 it would report 224 entries per leaf block, and as you can see a couple of leaf blocks hold more entries than that; if I rebuilt it at pctfree 0 it would report 249 entries per leaf block, and at pctfree 30 it would report 173 entries per leaf block. It’s not a complete coincidence that when I rebuilt this index at 30% it produced an index of 367 leaf blocks while the index had expanded to 369 leaf blocks before the rebuild.
If you’re wondering how I calculate the sizes after rebuilds at different settings of pctfree, I didn’t; I ran the test three times and did the actual rebuild after each run.
A detail we infer from the 249 rows per block we got from the rebuild at pctfree 0 is that a 50/50 block split is likely to leave some blocks holding about 125 rows before they start refilling – and if we look at the sample we see that some blocks have as few as 134 rows although none of them get close to the anticipated low; given a larger volume of data we would eventually see a few leaf blocks with 125 rows (or even a few less) and a few with 249. It’s worth remembering that when you’re looking at an index at finer levels of detail you should expect some of the blocks to be half empty.
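If you want to scan a large treedump for this sort of pattern without reading it line by line, it’s easy to script. Here’s a rough sketch – my own helper, not part of any Oracle package – that pulls the nrow/rrow figures from the leaf lines of a trace file and counts the nearly empty blocks:

```python
# Summarise the leaf lines of a treedump trace file: how many leaf
# blocks there are, how many live entries (rrow) in total, and how many
# blocks are nearly empty (rrow well below nrow).

import re

leaf_re = re.compile(r"leaf:.*nrow:\s*(\d+)\s+rrow:\s*(\d+)")

def summarise_treedump(trace_text, empty_threshold=0.1):
    """Return (leaf_count, total_rrow, nearly_empty_count)."""
    leaves = [(int(n), int(r)) for n, r in leaf_re.findall(trace_text)]
    nearly_empty = sum(1 for n, r in leaves if n and r / n < empty_threshold)
    return len(leaves), sum(r for _, r in leaves), nearly_empty

# Two lines lifted from the FIFO treedump above - in practice you would
# read the whole trace file into trace_text.
sample = """
leaf: 0x1800205 25166341 (-1: nrow: 326 rrow: 3)
leaf: 0x1800333 25166643 (297: nrow: 326 rrow: 326)
"""
print(summarise_treedump(sample))   # (2, 329, 1)
```

A histogram of rrow values (rather than a single count) would make a long run of nearly empty blocks in the middle of the index stand out even more clearly.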
We can dig deeper, of course. If we see something unusual in the pattern of entries per block – whether it’s leaf blocks per branch block or table rowids per leaf block – we can always dump a couple of blocks to see if they give us any clues. There’s nothing interesting about any of the examples I’ve shown so far, but you might (for example) find a patch in your treedump that shows a long sequence of nearly empty blocks somewhere in the middle of the index and want to know what values are associated with those anomalous blocks. Since the treedump gives you block addresses you can do a block dump.
In the dump from our FIFO index above we saw that block 25166642 was on the boundary between nearly empty and nearly full blocks – so let’s dump it, just in case there’s something interesting in it:
select
        dbms_utility.data_block_address_file('25166642')  rel_file,
        dbms_utility.data_block_address_block('25166642') block
from    dual
;

  REL_FILE      BLOCK
---------- ----------
         6        818

1 row selected.

alter system dump datafile 6 block 818;
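If you’re curious what those dbms_utility calls are actually doing: a data block address is a 32-bit value holding the relative file number in the top 10 bits and the block number in the bottom 22 bits, so the split is just bit-shifting. A quick sketch in Python:

```python
# Decode an Oracle data block address (DBA): relative file number in the
# top 10 bits, block number in the bottom 22 bits.

def decode_dba(dba):
    return dba >> 22, dba & ((1 << 22) - 1)   # (rfile#, block#)

print(decode_dba(25166642))   # (6, 818) - matching the output above
```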
I’m not going to go through the details of reading the symbolic dump of an index block – there are several resources on the Internet that will give you that information (Richard Foote’s blog is the obvious first place to look); the only point I want to make is that starting from an overview of the index tree you might have some clues to follow about WHY an index might be less efficient than you expect and a mechanism that allows you to examine the critical data very closely.
One warning – I’ve cheated a little on my calls to dbms_utility, it reports the tablespace-relative file number, not the absolute file number that the dump command needs. This is fine if you have a “small” number of files in your system (less than 1023) because the relative file number will (probably) match the absolute file number, but if you have a very large number of files you will have to know which tablespace you’re looking at and work out the absolute file number – the following query demonstrates the method:
select  df.file#, df.rfile#
from    v$datafile df, v$tablespace ts
where   ts.name = 'TEST_8K_ASSM'
and     df.ts#  = ts.ts#
;
B-tree indexes on randomly arriving data tend to operate at an average of 70% space-efficiency (30% free space). If you have reason to think that some of your indexes are behaving unreasonably, and suspect that a rebuild will help performance, there are some tools that will allow you to examine an index at a useful level of detail without introducing an overwhelming load on the system. It’s always possible to estimate a reasonable size for an index and, if the index doesn’t match expectations, the treedump command is a good next step and may give you some important clues about why your index isn’t following the typical pattern. The dbms_space package tells you about empty blocks, and the dump command can show you the content of any potentially interesting blocks.