{"id":73262,"date":"2013-01-24T11:22:14","date_gmt":"2013-01-24T11:22:14","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/uncategorized\/compression-in-oracle-part-2-read-only-data\/"},"modified":"2021-07-14T13:07:46","modified_gmt":"2021-07-14T13:07:46","slug":"compression-in-oracle-part-2-read-only-data","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/oracle-databases\/compression-in-oracle-part-2-read-only-data\/","title":{"rendered":"Compression in Oracle \u2013 Part 2: Read-Only Data"},"content":{"rendered":"<p>In <a title=\"Compression in Oracle \u2013 Part 1: Basic Table Compression\" href=\"https:\/\/allthingsoracle.com\/compression-oracle-basic-table-compression\/\">part one<\/a> of this series we saw that the mechanics of basic table compression came in to play only for direct path inserts, CTAS, and <em>\u201calter table move\u201d<\/em>. We also saw that Oracle would, by default, set the free space percentage (<em><strong>pctfree<\/strong><\/em>) for a table to zero if the table were defined as compressed \u2013 giving us a hint that basic compression should be viewed as a read-only strategy.<\/p>\n<p>When we examined a block dump we found that Oracle didn\u2019t \u201ccompress\u201d data, it \u201cdeduplicated\u201d the data on a block by block basis, creating a list of the duplicated values within each block, replacing duplicated items with tokens representing those items. Moreover Oracle could rearrange the column order for each block to improve its chances of being able to use a single token to represent multiple adjacent column values. This told us that Oracle didn\u2019t need to \u201cdecompress\u201d blocks when reading them, it merely needed to reconstruct rows by following pointers which, of course, is a CPU intensive task.<\/p>\n<p>In this article we look at what happens if you don\u2019t stick to the \u201cread-only\u201d guideline, and from this we can move on in the third article to the (separately licensed) \u201cOLTP\u201d compression option. As before, all examples come from an instance of Oracle 11.2.0.3.<\/p>\n<h3>Deduplication and Deletes<\/h3>\n<p>You will recall that in the last article I picked a row from a data block dump that consisted entirely of a single token, then found that Oracle had recursively applied deduplication to reduce that token to a combination of two tokens and two extra column values. Here is the row we examined:<\/p>\n<pre>tab 1, row 0, @0x1b28\r\ntl: 5 fb: --H-FL-- lb: 0x0 cc: 4\r\ncol 0: [ 4] 41 41 41 41\r\ncol 1: [10] 41 41 41 41 41 41 41 41 41 41\r\ncol 2: [ 2] c1 02\r\ncol 3: [10] 20 20 20 20 20 20 20 20 20 31\r\nbindmp: 2c 00 01 04 31<\/pre>\n<p>And here is what we found when we looked up the value of the single token referenced::<\/p>\n<pre>Tab 0, row 49, @0x1ed0\r\ntl: 19 fb: --H-FL-- lb: 0x0 cc: 4\r\ncol 0: [ 4] 41 41 41 41\r\ncol 1: [10] 41 41 41 41 41 41 41 41 41 41\r\ncol 2: [ 2] c1 02\r\ncol 3: [10] 20 20 20 20 20 20 20 20 20 31\r\nbindmp: 00 08 04 36 40 ca c1 02 d2 20 20 20 20 20 20 20 20 20 31<\/pre>\n<p>The first five bytes of the <strong>bindmp <\/strong>tells use that the token is used 8 times in the block (00 08), is made up of 4 columns, and requires us to look up tokens 54 (0x36) and 64 (0x40), which are as follows:<\/p>\n<pre>tab 0, row 54, @0x1f74\r\ntl: 7 fb: --H-FL-- lb: 0x0 cc: 1\r\ncol 0: [ 4] 41 41 41 41\r\nbindmp: 00 0a cc 41 41 41 41\r\n\r\ntab 0, row 64, @0x1f7b\r\ntl: 13 fb: --H-FL-- lb: 0x0 cc: 1\r\ncol 0: [10] 41 41 41 41 41 41 41 41 41 41\r\nbindmp: 00 05 d2 41 41 41 41 41 41 41 41 41 41<\/pre>\n<p>With this picture in front of us, we can now recognize the extra work that has to be introduced if we delete the original row. Two things have to happen \u2013 the row has to be marked as deleted (in the normal way), but the \u201cusage count\u201d on token 49 also has to decrease by 1. Here\u2019s a tiny extract from the block immediately after deleting our row, first the row entry itself:<\/p>\n<pre>tab 1, row 0, @0x1b28\r\ntl: 2 fb: --HDFL-- lb: 0x2\r\nbindmp: 3c 02<\/pre>\n<p>And here\u2019s the binary dump from token 49 \u2013 note, particularly, the second byte:<\/p>\n<pre>bindmp: 00 07 04 36 40 ca c1 02 d2 20 20 20 20 20 20 20 20 20 31<\/pre>\n<p>So one thing we can see is that the work done to maintain the block increases \u2013 even on a simple delete. But this token is used in seven other rows in the block, so you what happens if I delete those rows as well? The answer depends on how many concurrent sessions do the deletes. If I use a single process to delete all eight rows Oracle deletes the token as I delete the eighth row \u2013 at which point tokens 63 and 64 have to be updated to show that they have one less dependency. If I repeat the test using multiple sessions to delete the rows and don\u2019t commit after each delete, I can get to a situation where a token shows zero usage but doesn\u2019t disappear. (It\u2019s possible that some later block tidying operation that I haven\u2019t yet observed will get rid of tokens in this state.)<\/p>\n<p>Until I mentioned the concurrency test I hadn\u2019t said anything about <strong>commits<\/strong> (or <strong>rollbacks<\/strong>); the changes to the tokens take place on the delete, and don\u2019t wait for a commit \u2013 so what happens if I commit, or rollback? On a commit the usual commit cleanout may take place, updating the transaction\u2019s ITL slot with the commit SCN (in other words, nothing new or interesting happens). On a rollback the data is restored \u2013 the row is recreated from the undo information, any tokens that have been deleted will also be recreated, and any relevant token usage counts are incremented.<\/p>\n<p>An important point to note is that on the rollback, compression is preserved. There will be a difference between the original block and the block after the rollback, though \u2013 the rows (and tokens) that are restored will be written back into the block\u2019s free space \u2013 which may require the block to go through a process that coalesces the free space. So if you repeat a block dump after the rollback you may see that the contents of the block have moved. In my case (after deleting the 8 rows that referenced token 49 and rolling back) I saw the following difference:<\/p>\n<pre>tab 0, row 49, @0x1ed0 -- original position of token 0\r\ntab 0, row 49, @0x134a -- position of token 0 after rollback\r\n\r\ntab 1, row 0, @0x1b28 -- original position of row 0\r\ntab 1, row 0, @0x1322 -- position of row 0 after rollback<\/pre>\n<h3>Compression and Free space<\/h3>\n<p>The business of rows \u201cmoving\u201d if you delete and roll back introduces an interesting point about free space. When you declare a table with basic compression, it defaults to <em><strong>pctfree 0<\/strong><\/em> \u2013 no free space, and yet there was free space in my block for rows to move into when a rollback took place.<\/p>\n<p>There is a minor complexity that can confuse the issue when using small rows (and \u201ccompressed\u201d rows can be very small) but even allowing for this, I found that Oracle seemed to leave a little free space (a few tens of bytes, enough for roughly two full rows in my small test case). This may simply be an error in the estimates that Oracle used for space requirements while using the direct path mechanism to build blocks &#8211; it may have been deliberate.<\/p>\n<p>This small amount of space allowed Oracle to restore the deleted rows, pushing down and tidying up the block to reclaim the little holes left from the prior deletes as it did so. In some cases you may find that this free space could even allow you to survive a tiny number of updates \u2013 so let\u2019s look at a few numbers.<\/p>\n<p>I\u2019m going to change my starting data set slightly so that a typical row looks like this: (1000001, \u2018AAAA\u2019, \u2019AAAAAAAAAA\u2019,\u2019&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;1\u2019), the first column is a sequence number, the second column cycles from As to Es (5 distinct values), the third column from As to Js (10 distinct values), and the last column, space padded to 10 characters, cycles from 1 to 50; and I\u2019ve generated 800 rows of data. Because of the way I\u2019ve created the data, there happen to be eleven rows in the first block where the 2nd and 3rd columns are all A\u2019s, so I\u2019ve going to run the following SQL statement and dump the first block of the table to see what has happened.<\/p>\n<pre>update t1\r\nset\r\n        vc_rep = 'BBBB'\r\nwhere\r\n        vc_rep = 'AAAA'\r\nand     vc_cycle = 'AAAAAAAAAA'\r\nand     rownum &lt;= 4\r\n ;<\/pre>\n<p>It turned out that there was enough space in the block for two rows to be updated and still fit in the block, but two of my rows had to be migrated. Here are the before and after images of one of the rows that stayed in the block; take a close look at the binary dumps:<\/p>\n<pre>tab 1, row 0, @0x1bb8 -- before\r\ntl: 11 fb: --H-FL-- lb: 0x0 cc: 4\r\ncol 0: [ 4] 41 41 41 41\r\ncol 1: [10] 41 41 41 41 41 41 41 41 41 41\r\ncol 2: [10] 20 20 20 20 20 20 20 20 20 31\r\ncol 3: [ 5] c4 02 01 01 02\r\nbindmp: 2c 00 02 03 1b cd c4 02 01 01 02\r\n\r\ntab 1, row 0, @0x4f3 -- after\r\ntl: 37 fb: --H-FL-- lb: 0x2 cc: 4\r\ncol 0: [ 4] 42 42 42 42\r\ncol 1: [10] 41 41 41 41 41 41 41 41 41 41\r\ncol 2: [10] 20 20 20 20 20 20 20 20 20 31\r\ncol 3: [ 5] c4 02 01 01 02\r\nbindmp: 2c 02 04 00 cc 42 42 42 42 d2 41 41 41 41 41 41 41 41 41 41 d2 20 20 20 20 20 20 20 20 20 31 cd c4 02 01 01 02<\/pre>\n<p>On the update, Oracle has expanded the row to the full four columns before modifying it and writing it back to the block in the free space (note the change of length (<strong>tl:<\/strong> ) and row address). You\u2019ll have to take my word for it that there were two tokens in the token table that could have been used to substitute for the first two columns of this updated row, but Oracle didn\u2019t try to find them and use them. \u2013 so at first sight it looks as if updates to compressed data are likely to cause total chaos, a highly compressed row could end up being expanded to a huge row, that won\u2019t fit in the tiny amount of free space and therefore has to be migrated.<\/p>\n<p>In passing, although we now have a bit of a mess, with a couple of expanded rows and a couple of migrated rows, when I issued a rollback, Oracle cleaned up all the mess and (apart from the physical rearrangement of rows in the block) left all the rows in their original compressed, un-migrated state.<\/p>\n<p>So how bad can updates get? Before getting too alarmed it\u2019s worth looking more closely at what I did with my update. I modified a column that was part of a token which covered most of the row. What if my update had changed a column which hadn\u2019t been tokenized \u2013 would Oracle still expand the row to deal with the update? The answer is no. The last column in the block dumps above are reporting the <strong>id<\/strong> column, here are the \u201cbefore and after\u201d dumps of a row when I repeated my test but updated the just the <strong>id<\/strong> column<\/p>\n<pre>tab 1, row 0, @0x1bb8 -- before\r\ntl: 11 fb: --H-FL-- lb: 0x0 cc: 4\r\ncol 0: [ 4] 41 41 41 41\r\ncol 1: [10] 41 41 41 41 41 41 41 41 41 41\r\ncol 2: [10] 20 20 20 20 20 20 20 20 20 31\r\ncol 3: [ 5] c4 02 01 01 02\r\nbindmp: 2c 00 02 03 1b cd c4 02 01 01 02\r\n\r\ntab 1, row 0, @0x1bb8 -- after\r\ntl: 10 fb: --H-FL-- lb: 0x2 cc: 4\r\ncol 0: [ 4] 41 41 41 41\r\ncol 1: [10] 41 41 41 41 41 41 41 41 41 41\r\ncol 2: [10] 20 20 20 20 20 20 20 20 20 31\r\ncol 3: [ 4] c3 64 64 64\r\nbindmp: 2c 02 02 03 1b cc c3 64 64 64<\/pre>\n<p>The update took place in situ (and the total row length dropped from 11 bytes to 10 bytes) because the new value was actually smaller than the old value \u2013 but notice how the binary dumps show us that the row consists, in both cases, of token 1b (fifth byte) which represents the first three columns (fourth byte) reported, and one actual stored value. Row expansion has NOT taken place on this update.<\/p>\n<p>It gets even better; my initial test rows happened to be rows where virtually the whole row was covered by a single token \u2013 what happens if I have a row that is represented by several tokens and my update affects only one of the columns that is tokenized ? Whatever you may have heard about compressed data and expansion, Oracle doesn\u2019t expand the whole row \u2013 it only expands the tokens affected by the update. Again, here\u2019s a \u201cbefore and after\u201d pair of dumps to make the point:<\/p>\n<pre>tab 1, row 18, @0x1ac2\r\ntl: 13 fb: --H-FL-- lb: 0x0 cc: 4\r\ncol 0: [ 4] 44 44 44 44\r\ncol 1: [10] 58 58 58 58 58 58 58 58 58 58\r\ncol 2: [10] 20 20 20 20 20 20 20 20 33 34\r\ncol 3: [ 5] c4 02 01 01 14\r\nbindmp: 2c 00 04 03 32 37 45 cd c4 02 01 01 14\r\n\r\ntab 1, row 18, @0x1ab8\r\ntl: 23 fb: --H-FL-- lb: 0x2 cc: 4\r\ncol 0: [ 4] 44 44 44 44\r\ncol 1: [10] 59 59 59 59 59 59 59 59 59 59\r\ncol 2: [10] 20 20 20 20 20 20 20 20 33 34\r\ncol 3: [ 5] c4 02 01 01 14\r\nbindmp: 2c 02 04 00 32 d2 59 59 59 59 59 59 59 59 59 59 45 cd c4 02 01 01 14<\/pre>\n<p>At the start of this test, the binary dump shows that this row consisted of three separate tokens (0x32, 0x37 and 0x45) followed by a stored value. I updated \u201ccol 1\u201d which held the value \u2018XXXXXXXXXX\u2019 to \u2018YYYYYYYYYY\u2019 and, as you can see, the final binary dump shows that this row still includes tokens 0x32 and 0x45, but token 0x37 has been replaced by the actual value. You can also see that the total length of the row has increased by 10 bytes, from 13 to 23, which means Oracle had to move it into the (small amount of) free space and, during a subsequent repacking of the rows in the block, the final address of the row has changed.<\/p>\n<p>So, when you update data that has been subject to basic compression, Oracle may have to do some expansion of tokens into column values, but it keeps this expansion to a minimum (and if the update is a \u201cno-change\u201d update, expansion simply doesn\u2019t take place). Since there seems to be a tiny amount of free space left in a block after compression \u2013 even when <strong>pctfree<\/strong> has defaulted to 0 \u2013 you may be able to do a tiny number of updates to a compressed table without causing dramatic expansion and row migration, but you may find the effects impossible to predict.<\/p>\n<p>If you do need to do a little data maintenance on compressed data, you really need to do some careful testing with real data to see if you can find a suitable setting for <strong>pctfree<\/strong> that will keep row migrations to an acceptable level.<\/p>\n<h3>Summary<\/h3>\n<p>When you delete rows from a table defined with <strong>basic compression<\/strong>, there may be a little extra CPU usage as Oracle maintains the token table to reduce the reference counts on related tokens, and deletes tokens when their count reaches zero; but apart from this, and apart from a little space wastage when tokens get to a zero usage but don\u2019t get deleted, the overheads on deletion don\u2019t seem to present a threat.<\/p>\n<p>When you update rows in a table defined with basic compression, you have to remember that Oracle will have set the <strong>pctfree<\/strong> to zero by default when you first defined the table as compressed (and will keep resetting it every time you <strong>move<\/strong> the table) so there will be very little space for rows to grow unless you explicitly set <strong>pctfree<\/strong> to something larger.<\/p>\n<p>To make things more difficult, if your update modifies columns that have been tokenized, Oracle will work on a copy of the row with those modified tokens expanded to their full value \u2013 and it will not attempt to recompress the modified row afterwards even if there are suitable tokens already in existence in the block. As a side effect, you are likely to find that updates to compressed tables result in significant increases in row length and excessive row migration. Ironically, the smarter Oracle has been in compressing your data (resulting, typically, in multiple consecutive columns turning into a single token) the worse the explosion is likely to be when you start to update the data.<\/p>\n<p>As a basic guideline \u2013 unless you are extremely well-informed about your data and its usage then <strong>basic compression<\/strong> should be used only for read-only data. In the next article we\u2019ll be looking at \u201cOLTP\u201d compression and see what Oracle does (if anything) to improve this situation.<\/p>\n<p>There is a catalogue of all five items in this series (and a few others) at <em><strong><a href=\"https:\/\/jonathanlewis.wordpress.com\/2015\/08\/05\/compression\/\">this URL<\/a><\/strong><\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In part one of this series we saw that the mechanics of basic table compression came in to play only for direct path inserts, CTAS, and \u201calter table move\u201d. We also saw that Oracle would, by default, set the free space percentage (pctfree) for a table to zero if the table were defined as compressed \u2013 giving us a hint&hellip;<\/p>\n","protected":false},"author":101205,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143533],"tags":[],"coauthors":[],"class_list":["post-73262","post","type-post","status-publish","format-standard","hentry","category-oracle-databases"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/73262","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/101205"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=73262"}],"version-history":[{"count":1,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/73262\/revisions"}],"predecessor-version":[{"id":91733,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/73262\/revisions\/91733"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=73262"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=73262"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=73262"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=73262"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}