In the first three parts (Part 1: Basic Table Compression, Part 2: Read-Only Data, Part 3: OLTP Compression) of this series we examined table compression – both basic and the separately licensed OLTP compression. In this article we move on to index compression which, as we shall see, uses the same “deduplication” techniques as table compression but includes a couple of refinements to cater for the particular structural requirements of B-tree indexes. Most significantly, you decide which columns should be deduplicated, the deduplication mechanism is only applied to a specified leading subset of columns for every single index entry, and the deduplication is maintained in real time all the time.
In this article we’ll be looking at the structure of a compressed index and how Oracle stores the information it needs to use that structure. In the final article of the series we’ll describe the mechanics of selects, inserts and deletes, and analyse the costs and benefits of index compression.
Before looking at compressed B-tree indexes, we need to review a few points about how Oracle stores data in index leaf blocks; there are four key details I’d like to highlight as particularly relevant when we start to compare compressed and non-compressed B-tree indexes (and from this point on wards I’m not going to keep repeating “B-tree”).
- An index entry includes the rowid of the corresponding table row. In a non-unique the rowid is treated as if it an extra column (of variable length) appended to the index which effectively makes it unique, but in a unique index the rowid is “carried” as an extra data item of fixed length.
- Just like table blocks, the bulk of an index block is made up of a row heap that holds the actual entries, and a row directory that is a list of pointers into the heap; however the symbolic block dump for an index doesn’t list the row directory in the way that it does for tables.
- The entries in the row heap an of index block are not sorted as they are inserted into the block, the appearance of order is created by inserting the row pointers into the correct place in the row directory so that a walk from the top to the bottom of the row directory will report the entries in the heap as if they were sorted.
- There are occasions when blocks are restructured in real-time, and the row heap is sorted so that it is (temporarily) in sync with the row directory. Index rebuilds, leaf block splits, and the equivalent of a “heap block compress” are reasons for this to happen.
We can confirm the first three points very simply with a short SQL script followed by a block dump. The last point takes a little more time and care to demonstrate, but again simple SQL with block dumps will make the point. These tests are left as an exercise for the user, although it may be quicker to reference the blog of Richard Foote who specializes in writing about indexes.
Compressing an index
You can create an index with compression enabled, or rebuild it to change its level of compression. For example, here’s a table creation statement followed by four possible sets of indexing commands:
create table t1 ( v1 varchar(4), n1 number(4) ); create index t1_i1 on t1(v1) compress; create index t1_i1 on t1(v1); alter index t1_i1 rebuild compress; create index t1_i1 on t1(v1, n1) compress; alter index t1_i1 rebuild compress 1; create unique index t1_i1 on t1(v1, n1) compress; alter index t1_i1 rebuild compress 2; select compression, prefix_length from user_indexes;
If you try running the above (with suitable “drop index” commands interspersed) you’ll find that the last attempt to rebuild the index will result in an Oracle error: “ORA-25194: invalid COMPRESS prefix length value”.
The compress option tells Oracle to deduplicate the leading column(s) of an index; by default this means all the columns of a non-unique index, and all but the last column of a unique index. When you remember that the rowid is appended to a non-unique index as if it were an extra column making the index unique, you can see that this apparent difference between compressing unique and non-unique indexes is actually self-consistent. This is why my sample code produces an error when we try to compress the two-column unique index on both its columns. This also helps to explain why an index is not automatically compressed if you create it in a tablespace with “default compression” – Oracle doesn’t want to have to make a decision about how many columns to compress.
To see what happens internally when we compress an index, let’s run the SQL to create the table and last (valid) indexing option from above – i.e. compress 1 column on the unique index – and insert the following data into the table:
(‘AAAA’,1), (‘AAAA’,2), (‘AAAA’,3), (‘AAAA’,4), (‘BBBB’,1), (‘BBBB’,2), (‘BBBB’,3), (‘BBBB’,4), (‘CCCC’,1), (‘CCCC’,2), (‘CCCC’,3), (‘CCCC’,4),
In a non-compressed index leaf block for this data we would find a row directory with 12 pointers to the row heap, and in the row heap we would find our 12 key values, each carrying a rowid. But we’ve told Oracle to “compress 1” which means eliminate duplicates – on a block by block basis – for the first column. Just as we create a list of tokens in each block for basic and OLTP table compression, we create a list of tokens (called prefixes) in each leaf block of the index – in our case we can see three repetitive values for the first column, so our row heap will hold 15 items of which three are prefixes and 12 “suffixes” and we will have, in effect, two row directories, one for the prefixes and one for the suffixes, and somehow Oracle will encode information that allows it to (a) rebuild complete index entries by combining prefixes with suffixes, and (b) walk the index entries in the right order as it works through the directories. In effect we are going to store something like the following:
(T1 = ‘AAAA’) (T2 = ‘BBBB’) (T3 = ‘CCCC’) (T1 1)(T1 2)(T1 3)(T1 4)(T2 1)(T2 2)(T2 3)(T2 4) (T3 1)(T3 2)(T3 3)(T3 4)
There’s a big difference, though, between table compression and index compression and their use of tokens. For table compression a given token could be used anywhere in any given row, its use isn’t limited to a specific column of a row; but for index compression the token has to represent the leading column(s) of the index, and every token represents exactly the same number of columns – so we don’t need to put anything into the final index entry to represent the token, we can infer its presence. So let’s see how the index entries are presented in a symbolic dump:
prefix row#0 flag: -P----, lock: 0, len=7 col 0; len 4; (4): 41 41 41 41 prc 4 prefix row#1 flag: -P----, lock: 0, len=7 col 0; len 4; (4): 42 42 42 42 prc 4 prefix row#2 flag: -P----, lock: 0, len=7 col 0; len 4; (4): 43 43 43 43 prc 4 row#0 flag: ------, lock: 2, len=11, data:(6): 01 40 01 81 00 00 col 0; len 2; (2): c1 02 psno 0 row#1 flag: ------, lock: 2, len=11, data:(6): 01 40 01 81 00 09 col 0; len 2; (2): c1 03 psno 0 row#2 flag: ------, lock: 2, len=11, data:(6): 01 40 01 81 00 03 col 0; len 2; (2): c1 04 psno 0 row#3 flag: ------, lock: 2, len=11, data:(6): 01 40 01 81 00 02 col 0; len 2; (2): c1 05 psno 0 row#4 flag: ------, lock: 2, len=11, data:(6): 01 40 01 81 00 05 col 0; len 2; (2): c1 02 psno 1 row#5 flag: ------, lock: 2, len=11, data:(6): 01 40 01 81 00 01 col 0; len 2; (2): c1 03 psno 1 etc ...
I’ve cut this dump short, there were another six “row#” entries, but showing them wouldn’t have given you any further information. As you can see, the first three items are the prefixes – each takes 7 bytes (len = 7) consisting of the flag byte (flag:) that shows it’s a prefix entry, a lock byte (that never seems to get used) and, for each column (and there is only one), a length byte and the four bytes that make up the prefix. You’ll also notice that Oracle reports something called the “prc” (possibly “prefix usage count”) reporting the number of suffixes associated with this prefix. This value is not actually stored in exactly the way it’s presented – but we’ll see in a moment how Oracle derives this value as it dumps the block.
The remaining items are the various suffixes, and each takes 11 bytes – the flag byte, lock byte, six bytes for the “fixed size data” that is the rowid for the entry (by the way, although Oracle reports (6) for every entry, the value is derived rather than being stored as a “length byte”) and, for each column, a length byte and the two bytes that make up the column value. Again we see an extra entry, this time the “psno” (possibly “prefix sequence number”), that tells us which prefix the suffix belongs to; again this item is derived, not something stored with the row.
You will note that in the dump the prefixes appear in alphabetical order, and the suffixes appear in numeric order within prefix. This is an artifact of the way the dump works, and the way that Oracle uses index blocks to find key values; it has nothing to do with the order I inserted the data (which I randomized). There’s one more detail that’s worth examining (in any index block dump) – it’s the number in square brackets that appears on the first line of each index entry after the “row#N”, this is the byte address within the block of the actual entry. It’s these addresses that get recorded in the row directories and, by adjusting the row directory as necessary, Oracle can ensure that a simple walk through the row directories (jumping to the row heap from each entry) will return the index entries in the right order.
In my example the prefixes’ data values and prefix addresses happen to show that the prefixes are stacked (from the bottom of the block upwards) in the right (i.e. reverse) order – that was a coincidence; but when you look at the addresses for the ‘AAAA’ rows (row#0 to row#3) it is possible to work out that the values arrived in the order 1, 4, 3, 2 by looking at the addresses in descending order, viz: 8025, 7989, 7978, 7883.
So here’s the raw dump of the row directories that Oracle actually stored for these 12 rows / 15 entries:
1ECB1F59 1F351F2A 1F471ED6 1EEC1EE1 1F021EF7 1F1F1F0D 00041F52 00081F40 000C1F18
With a little cosmetic re-arrangement (which includes messing around with swapping word order), this turns into:
1F59 1ECB 1F2A 1F35 1ED6 1F47 1EE1 1EEC 1EF7 1F02 1F0D 1F1F 1F52 0004 1F40 0008 1F18 000C
Then, for convenience, turning the hexadecimal number into decimal we get:
8025 7883 7978 7989 7894 8007 7905 7916 7927 7938 7949 7967 8018 4 8000 8 7960 12
You should recognize the first six numbers in the first line, they are the six addresses of the suffix entries I printed further up the page, in the correct order to report the values in sorted order – the last six numbers in the first line are the addresses of the six entries I didn’t print. This list is the main row directory for the leaf block – and if this index hadn’t been compressed that would be all you needed to know about an index row directory.
But this is a compressed index, so we also have a row directory for the prefixes, which you can see in the second line. The four-digit numbers are the addresses of the prefixes, in the correct order to report the values in sorted order, but for the prefix directory there’s a number after each address which is an index into the main row directory. This identifies the last entry in the main directory to which each prefix applies. (It may seem a little odd, you might have expected Oracle to store 1, 5, 8 to identify the first entry for each prefix, but there’s probably a clever bit of code that makes it easier to work from the last).
Obviously there are a couple of other bits of information in the data header information that let Oracle get to the row directories and use them properly (most significantly the total number of entries in main directory and total number of prefixes), but once you have a picture of the two directories you can see how Oracle can derive the prc (prefix usage count) and psno (prefix serial number) that it reports as it walks through the prefix directory in order, stepping through the related suffix directory entries in order. You can also begin to think about the specific workload that’s involved with using and maintaining a compressed index.
I’ll finish with one little detail about how Oracle can fool you when you try to investigate compressed indexes. If you do the wrong tests it would be very easy to decide that Oracle kept the entire row heap in a very tidy sorted order rather than letting the heap turn into a mess while carefully maintaining the row directories. If you rebuild an index, or when a leaf block splits, or when the index equivalent of a “heap block compress” (the tidying up that occurs when there is usable space in the block that it isn’t in the free space gap) takes place, then Oracle rebuilds the heap into a neatly sorted order that (from the bottom upwards) reads:
Prefix1, suffix11, suffix12, suffix13, suffix14, Prefix2, suffix21, suffix222, suffix23, suffix24, Prefix3, suffix31, suffix32, suffix33, suffix34, Prefix4, suffix41, suffix42, …
If you haven’t watched a compressed index evolving as data is inserted, updated, and deleted, you could get completely the wrong picture of what Oracle is doing.
We’ve seen that compression when applied to indexes is similar to basic table compression – there is a token table, and a “row” table, and duplicate values are copied once into the token table. However the deduplication applied to indexes is only about a pre-specified number of columns at the start of the index entry, it is applied to every row unconditionally in exactly the same way, and the data is maintained in real time.
In table compression we saw that there was a single row directory split into two pieces that allowed us to identify tokens and the main rows separately. Under index compression there are two row directories of different structures – although there is still only one row heap – which allow us to identify prefixes in order, how many suffixes there are with each prefix, and how to find the first suffix so that we can walk the suffixes in sorted order.
In the final article in this series we’ll see how much work Oracle has to do to maintain and use the two row directories.
There is a catalogue of all five items in this series (and a few others) at this URL