{"id":73264,"date":"2013-01-15T16:53:30","date_gmt":"2013-01-15T16:53:30","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/uncategorized\/compression-oracle-basic-table-compression\/"},"modified":"2021-07-14T13:07:46","modified_gmt":"2021-07-14T13:07:46","slug":"compression-oracle-basic-table-compression","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/oracle-databases\/compression-oracle-basic-table-compression\/","title":{"rendered":"Compression in Oracle &#8211; Part 1: Basic Table Compression"},"content":{"rendered":"<p>In this short series on compression in Oracle, we will be looking at the various types of compression used by traditional Oracle systems, this means: (i) basic table compression, (ii) OLTP table compression, and (iii) index compression. I won&#8217;t be discussing the hybrid columnar compression (HCC) that appeared with Exadata (though I may write up a mini-series on that topic some time in the future).<\/p>\n<p>Of the three mechanisms, index compression and basic table compression are part of the core product, while OLTP compression is part of the separately licensed \u201cAdvanced Compression Option (ACO)\u201d. In this first article we will restrict ourselves to creating data using basic table compression, leaving the question of updating and deleting compressed data to the second article, which will lead us into \u201cOLTP\u201d compression in the third article. 
We&#8217;ll leave index compression to stand alone in the fourth and fifth articles.<\/p>\n<p>Our aim in this first article is to answer a few of the more common questions that people ask about compression by examining the mechanisms that Oracle uses for basic table compression.<\/p>\n<h3>When does (basic) compression work?<\/h3>\n<p>The usual questions people ask about compression are: <em>\u201chow do I create compressed data?\u201d<\/em>, <em>\u201chow does Oracle decompress the blocks?\u201d<\/em>, <em>\u201cwhat impact does compression have on performance?\u201d<\/em> and the question you should ask before using any feature: <em>\u201care there any side effects I need to worry about?\u201d<\/em><\/p>\n<p>The easiest way to answer the first question is through a demonstration showing some of the possibilities. Here are 5 pieces of SQL that will create and populate a table with a copy of the first 50,000 rows of the view <b><i>all_objects<\/i><\/b> on a database running 11.2.0.3; after running each one I collected stats on the table and ran a query to report the number of blocks in the table, and a few other details about the table.<\/p>\n<pre>--\t1. Baseline CTAS\r\ncreate table t1\r\nas\r\nselect * from all_objects where rownum &lt;= 50000;\r\n\r\n--\t2. CTAS with basic compression enabled\r\ncreate table t1 compress basic\r\nas\r\nselect * from all_objects where rownum &lt;= 50000;\r\n\r\n--\t3. Normal insert into empty table defined as compressed\r\ncreate table t1 compress basic\r\nas\r\nselect * from all_objects where rownum = 0;\r\n\r\ninsert into t1 select * from all_objects where rownum &lt;= 50000;\r\n\r\n--\t4. Direct path insert into empty table defined as compressed\r\ncreate table t1 compress basic\r\nas\r\nselect * from all_objects where rownum = 0;\r\n\r\ninsert \/*+ append *\/ into t1 select * from all_objects where rownum &lt;= 50000;\r\n\r\n--\t5. 
CTAS without compression, then change to compressed\r\ncreate table t1\r\nas\r\nselect * from all_objects where rownum &lt;= 50000;\r\n\r\nalter table t1 compress basic; \r\n\r\nalter table t1 move<\/pre>\n<p>The query I ran after each test looked like this:<\/p>\n<pre>select  blocks, pct_free , compression, compress_for\r\nfrom    user_tables\r\nwhere   table_name = 'T1';<\/pre>\n<p>There are other possibilities, of course; we could define a tablespace so that any table created in that tablespace was, by default, compressed; we can arrange for a single partition or subpartition of a partitioned table to be compressed; we can even\u00a0change the default compression on partitioned tables so that all new partitions or subpartitions are created with compression.<\/p>\n<p>Sticking with our five samples, though,\u00a0I&#8217;ve summarised the results of these code fragments in the following table \u2013 although you\u2019ll notice that I\u2019ve reported two results from test 5, one from before the move, one after:<\/p>\n<table style=\"margin-bottom: 20px\">\n<tbody>\n<tr>\n<td valign=\"top\" width=\"149\">Test<\/td>\n<td valign=\"top\" width=\"76\">BLOCKS<\/td>\n<td valign=\"top\" width=\"94\">PCT_FREE<\/td>\n<td valign=\"top\" width=\"104\">COMPRESSION<\/td>\n<td valign=\"top\" width=\"113\">COMPRESS_FOR<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\" width=\"149\">1 (CTAS)<\/td>\n<td valign=\"top\" width=\"76\">714<\/td>\n<td valign=\"top\" width=\"94\">10<\/td>\n<td valign=\"top\" width=\"104\">DISABLED<\/td>\n<td valign=\"top\" width=\"113\"><\/td>\n<\/tr>\n<tr>\n<td valign=\"top\" width=\"149\">2 (CTAS compress)<\/td>\n<td valign=\"top\" width=\"76\">189<\/td>\n<td valign=\"top\" width=\"94\">0<\/td>\n<td valign=\"top\" width=\"104\">ENABLED<\/td>\n<td valign=\"top\" width=\"113\">BASIC<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\" width=\"149\">3 Insert<\/td>\n<td valign=\"top\" width=\"76\">644<\/td>\n<td valign=\"top\" width=\"94\">0<\/td>\n<td valign=\"top\" 
width=\"104\">ENABLED<\/td>\n<td valign=\"top\" width=\"113\">BASIC<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\" width=\"149\">4 Insert append<\/td>\n<td valign=\"top\" width=\"76\">189<\/td>\n<td valign=\"top\" width=\"94\">0<\/td>\n<td valign=\"top\" width=\"104\">ENABLED<\/td>\n<td valign=\"top\" width=\"113\">BASIC<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\" width=\"149\">5a Compress<\/td>\n<td valign=\"top\" width=\"76\">714<\/td>\n<td valign=\"top\" width=\"94\">10<\/td>\n<td valign=\"top\" width=\"104\">ENABLED<\/td>\n<td valign=\"top\" width=\"113\">BASIC<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\" width=\"149\">5b Move<\/td>\n<td valign=\"top\" width=\"76\">189<\/td>\n<td valign=\"top\" width=\"94\">0<\/td>\n<td valign=\"top\" width=\"104\">ENABLED<\/td>\n<td valign=\"top\" width=\"113\">BASIC<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The first test is just a baseline to give you an idea of how much space the data needs by default \u2013 as you can see I let the table used the default value of <strong>pctfree<\/strong> (reported in view <strong>user_table<\/strong>s as <strong>pct_free<\/strong>) and needed 714 blocks to hold the data.<\/p>\n<p>When I included the compress option with the CTAS, Oracle automatically set the pctfree to zero \u2013 and in this case managed to squeeze the data down to just 189 blocks. The zero pctfree is a hint that Oracle thinks this table is going to be read-only; but it is possible to set a non-zero pctfree and, as we shall see in the next article in the series, there may be cases where you want to take advantage of that option. Notice that in this, and all subsequent tests, the two columns reporting compression show that basic compression is enabled.<\/p>\n<p>In the third and fourth tests I created an empty table with the compress option (which is why the pct_free is 0), and then inserted the data. 
As you can see, the data is only compressed when the insert is a direct path insert (you may recall that <em>&#8220;compress basic&#8221;<\/em> was originally <em>&#8220;compress for direct_load operations&#8221;<\/em>); the normal insert, even though it\u2019s an array insert, doesn\u2019t compress the data. (The drop from 714 blocks to 644 blocks in test 3 is just the side effect of pctfree dropping from 10 to zero \u2013 allowing for small variations, 90% of 714 is very close to 644).<\/p>\n<p>The final test then warns us that simply changing a table from uncompressed to compressed does nothing to the data. If you want to convert uncompressed data to compressed data you first need to change the table definition, and then you need to move the table to compress its contents. If you do this, of course, you\u2019d also have to rebuild any indexes on the table immediately afterwards.<\/p>\n<h3>It\u2019s compression, Jim, but not as we know it.<\/h3>\n<p>The rest of the questions I\u2019ve asked above can best be addressed by seeing how Oracle does compression, and the answer is that (for basic and OLTP compression) Oracle doesn\u2019t really do compression. What it does is \u201cde-duplication\u201d at the block level. Imagine you had three rows in a block containing the following data:<\/p>\n<pre>(\u2018XXXX\u2019, \u2018abcdef\u2019, 254.32, \u2018CLOSED\u2019)\r\n(\u2018XXXX\u2019, \u2018pqrstu\u2019, 17.12,  \u2018CLOSED\u2019)\r\n(\u2018AAAA\u2019, \u2018abcdef\u2019, 99.99,  \u2018CLOSED\u2019)<\/pre>\n<p>Oracle could notice that the value \u2018XXXX\u2019 appears twice, that the value \u2018abcdef\u2019 appears twice, and that the value \u2018CLOSED\u2019 appears three times. 
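Before looking at how Oracle exploits that repetition, here is a minimal Python sketch of the de-duplication idea itself, using the three rows above. The function name and the ('T', n) token representation are invented for illustration; this is a model of the concept, not Oracle's real block format.

```python
from collections import Counter

def deduplicate_block(rows):
    """Toy model: values that repeat within the block go into a token
    table, and rows store small token references instead of the values."""
    counts = Counter(v for row in rows for v in row)
    tokens = [v for v, n in counts.items() if n > 1]   # the block's token table
    index = {v: i for i, v in enumerate(tokens)}
    packed = [[('T', index[v]) if v in index else v for v in row]
              for row in rows]
    return tokens, packed

rows = [('XXXX', 'abcdef', 254.32, 'CLOSED'),
        ('XXXX', 'pqrstu', 17.12,  'CLOSED'),
        ('AAAA', 'abcdef', 99.99,  'CLOSED')]

tokens, packed = deduplicate_block(rows)
print(tokens)     # ['XXXX', 'abcdef', 'CLOSED']
print(packed[0])  # [('T', 0), ('T', 1), 254.32, ('T', 2)]
```

Each ('T', n) pair stands for "look up entry n in this block's token table" - exactly the substitution described next.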
So it can create a table of repeated values in the block, and insert tokens into the rows to make them shorter, so our block might start to look like this:<\/p>\n<pre>T1 (\u2018XXXX\u2019)\r\nT2 (\u2018abcdef\u2019)\r\nT3 (\u2018CLOSED\u2019)\r\n(T1, T2, 254.32, T3)\r\n(T1, \u2018pqrstu\u2019, 17.12, T3)\r\n(\u2018AAAA\u2019, T2, 99.99, T3)<\/pre>\n<p>In fact, Oracle can get smarter than that, because it can rearrange the column order for each individual block to maximize the possibility of multiple columns turning into a single token. Notice, in my example, that token T1 and token T3 both appear in all three rows. Oracle can rearrange the order that the columns are stored in this block to put those tokens side by side, and create a new token that represents the combination of the two individual tokens, so our block becomes:<\/p>\n<pre>T1 (\u2018XXXX\u2019, T2)\t\t-- a token made from a value and a token\r\nT2 (\u2018CLOSED\u2019)\r\nT3 (\u2018abcdef\u2019)\r\n(T1, T3, 254.32)\t-- notice how this row is now only 3 \u201ccolumns\u201d\r\n(T1, \u2018pqrstu\u2019, 17.12)\t-- ditto\r\n(\u2018AAAA\u2019, T2, T3, 99.99)<\/pre>\n<p>Let\u2019s take a closer look at this by examining a few lines from the symbolic block dump of a block.\u00a0 Here\u2019s the first fragment I want to highlight \u2013 it appears only for compressed table blocks:<\/p>\n<pre>perm_9ir2[4]={ 2 0 1 3 }<\/pre>\n<p>This is a table with 4 columns, but <strong><em>for this block<\/em><\/strong> Oracle has rearranged (permuted) the order the columns are stored so that the thing stored at column 2 in this block is column 0 in the table definition, column 0 is really column 1, column 1 is column 2, and column 3 is (still) column 3.<\/p>\n<pre>0x24:pti[0]\u00a0\u00a0\u00a0\u00a0 nrow=65\u00a0\u00a0\u00a0 offs=0\r\n0x28:pti[1]\u00a0\u00a0\u00a0\u00a0 nrow=400\u00a0\u00a0 offs=65<\/pre>\n<p>There are two \u201ctables\u201d in this block: the first is the token table, which holds the 65 tokens which will be used
in the block, starting at offset zero in the block\u2019s row directory; the second holds 400 \u201cproper\u201d rows, starting at offset 65 in the block\u2019s row directory. (This means, by the way, that the block\u2019s row directory currently has 465 entries \u2013 there are some interesting side effects that can appear with large row directories.)<\/p>\n<p>If we hunt through the block dump to find the first row in table 1 (i.e. the first \u201creal\u201d row) we find the following, which looks pretty much like an ordinary row dump from a typical heap table block dump for a row with 4 columns. But there are a few special points to notice:<\/p>\n<pre>tab 1, row 0, @0x1b28\r\ntl: 5 fb: --H-FL-- lb: 0x0\u00a0 cc: 4\r\ncol\u00a0 0: [ 4]\u00a0 41 41 41 41\r\ncol\u00a0 1: [10]\u00a0 41 41 41 41 41 41 41 41 41 41\r\ncol\u00a0 2: [ 2]\u00a0 c1 02\r\ncol\u00a0 3: [10]\u00a0 20 20 20 20 20 20 20 20 20 31\r\nbindmp: 2c 00 01 04 31<\/pre>\n<p>According to the column lengths (the number in square brackets) the row length should be 26 bytes, plus the 4 bytes for the 4 column lengths, plus one byte each for the flag byte (fb:), lock byte (lb:) and column count (cc:) \u2013 but the total length (tl:) is only 5 bytes. And the last line of this extract shows us the actual values (bind map <em>[ed: see note below from Flado, &#8220;binary dump&#8221; is a much better interpretation of bindmp]<\/em>) of those five bytes. These five bytes are the flag byte (0x2c = &#8216;&#8211;H-FL&#8217;), the lock byte, the &#8220;stored&#8221; column count &#8211; i.e. the number of columns stored at this location, which is just one &#8211; and the next two bytes tell us that that one &#8220;column&#8221; is a token representing 4 consecutive columns, and we need to look at token 0x31 of the token table (interestingly, different versions of Oracle managed to order the token table differently even though the environment seemed to be the same &#8211; my test results are all from 11.2.0.3). 
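The five bytes, and the token look-up they imply, can be modelled with a short Python sketch. Everything here is illustrative and heavily simplified - the field names are my own, the toy token table merely anticipates the token entries we are about to examine, and the real bindmp format has many more cases - but it follows the byte-by-byte reading just given.

```python
def decode_short_row(bindmp):
    """Toy decoder for the 5-byte compressed row above: flag byte, lock
    byte, stored column count, columns covered, token table index."""
    flag, lock, stored_cols, covered, token = bindmp
    return {'flag': hex(flag),             # 0x2c -> the --H-FL flag bits
            'lock': lock,                  # lock byte (0x0 here)
            'stored_columns': stored_cols, # only one "column" stored...
            'columns_covered': covered,    # ...a token standing for 4 columns
            'token_index': token}          # 0x31 -> row 49 of the token table

def expand(entry, token_table):
    """Rebuild a row by chasing ('T', n) references, which may themselves
    point at further tokens in the block's token table."""
    out = []
    for col in entry:
        if isinstance(col, tuple) and col[0] == 'T':
            out.extend(expand(token_table[col[1]], token_table))
        else:
            out.append(col)                # literal column value
    return out

row = decode_short_row(bytes.fromhex('2c00010431'))
print(row['token_index'], row['columns_covered'])    # 49 4

# Toy token table: one multi-column token embedding two single-column
# tokens plus two literal values (the shape, not the real encoding).
token_table = {
    54: ['AAAA'],
    64: ['AAAAAAAAAA'],
    49: [('T', 54), ('T', 64), 1, '         1'],
}
print(expand([('T', row['token_index'])], token_table))
```

The point of the sketch is the shape of the work: nothing is "decompressed" at the block level, the row is simply reassembled by following references.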
Let\u2019s look at row 49 (0x31) in table 0:<\/p>\n<pre>tab 0, row 49, @0x1ed0\r\ntl: 19 fb: --H-FL-- lb: 0x0  cc: 4\r\ncol  0: [ 4]  41 41 41 41\r\ncol  1: [10]  41 41 41 41 41 41 41 41 41 41\r\ncol  2: [ 2]  c1 02\r\ncol  3: [10]  20 20 20 20 20 20 20 20 20 31\r\nbindmp: 00 08 04 36 40 ca c1 02 d2 20 20 20 20 20 20 20 20 20 31<\/pre>\n<p>The token looks almost identical to the row \u2013 but the total length of the token is 19 bytes. So let\u2019s look at the binary dump for the token. The first two bytes of the dump tell us that this token is used 8 times in the block. The next byte tells us that there are 4 columns in the token and, through some cunning encoding, the next two bytes tell us that the first two columns of this token are actually tokens 0x36 (decimal 54) and 0x40 (64). You can then see the actual values, preceded by &#8220;200 + column length&#8221;, for the last two columns of the token.<\/p>\n<p>So let\u2019s look at tokens 54 and 64 \u2013 which allow us to see that they are, indeed, single column tokens, with values that match the values we saw in the token at row 49 and in the actual row. (You might note that token 54 is used in a total of 10 places in this block \u2013 the 0x0a in the second byte of its binary dump \u2013 and token 64 is used in 5 places.)<\/p>\n<pre>tab 0, row 54, @0x1f74\r\ntl: 7 fb: --H-FL-- lb: 0x0  cc: 1\r\ncol  0: [ 4]  41 41 41 41\r\nbindmp: 00 0a cc 41 41 41 41\r\n\r\ntab 0, row 64, @0x1f7b\r\ntl: 13 fb: --H-FL-- lb: 0x0  cc: 1\r\ncol  0: [10]  41 41 41 41 41 41 41 41 41 41\r\nbindmp: 00 05 d2 41 41 41 41 41 41 41 41 41 41<\/pre>\n<p>So, by tracking our way from row directory to row, to token (by way of the row directory), to two more tokens (by way of the row directory), we see that we can expand an entry that started out as 5 bytes into a full row of 4 columns with 26 bytes of data.<\/p>\n<p>There are several lessons to be learned from the work we did tracking through the block dump. 
The first is that Oracle doesn\u2019t decompress the table blocks, it simply re-constructs the row you need by hopping back and forth between the row directory and the row pieces (the code may even avoid visiting some tokens \u2013 the single column ones &#8211; if the column values aren\u2019t needed for the SQL statement). The second is that it\u2019s possible to spend a significant amount of extra CPU time bouncing around a block to reconstruct a row \u2013 this could have a noticeable impact on CPU usage if you are doing lots of tablescans. As a side effect, because Oracle will have to hold (pin) the block for some time to reconstruct rows, you may find that your code will do fewer \u201cconsistent gets \u2013 examination\u201d which means more activity on the \u201ccache buffers chains\u201d latch. Of course, we hope that the extra CPU time will be offset by the smaller number of physical reads we may have to do because we&#8217;ve packed our rows into a smaller number of blocks, which may allow us to keep more data cached for longer.<\/p>\n<h3>Summary<\/h3>\n<p>There\u2019s still a lot to say about the side effects of compression, in particular what happens when you delete or update rows, and this will lead us on to the implementation of compression for OLTP (or <em>\u2018for all operations\u2019<\/em> as it used to be) \u2013 but those are topics for future articles.<\/p>\n<p>What we have seen from this first article is that basic compression is used only with direct path inserts, not with ordinary DML, and Oracle will by default set pctfree to zero on the table, which is a good indication that you are not supposed to modify the data once you\u2019ve created it \u2013 basic compression is really only appropriate for data that is virtually read-only.<\/p>\n<p>We have also seen that basic compression is just de-duplication of repeated values \u2013 but Oracle can get very clever about minimizing the amount of space used. 
In particular the stored column order can change from block to block, allowing Oracle to maximize the options for creating tokens that represent multiple adjacent columns. This deduplication mechanism means that Oracle doesn\u2019t have to decompress blocks, it keeps the blocks in the buffer cache just like any other block, but reconstructs rows (in the PGA) by following pointers to tokens \u2013 and chasing pointers is a CPU intensive process; the better your compression the more CPU intensive your queries (particularly your tablescans) are likely to be.<\/p>\n<p>There is a catalogue of all five items in this series (and a few others) at <em><strong><a href=\"https:\/\/jonathanlewis.wordpress.com\/2015\/08\/05\/compression\/\">this URL<\/a><\/strong><\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this short series on compression in Oracle, we will be looking at the various types of compression used by traditional Oracle systems, this means: (i) basic table compression, (ii) OLTP table compression, and (iii) index compression. 
I won&#8217;t be discussing the hybrid columnar compression (HCC) that appeared with Exadata (though I may write up a mini-series on that topic&hellip;<\/p>\n","protected":false},"author":101205,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143533],"tags":[],"coauthors":[],"class_list":["post-73264","post","type-post","status-publish","format-standard","hentry","category-oracle-databases"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/73264","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/101205"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=73264"}],"version-history":[{"count":1,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/73264\/revisions"}],"predecessor-version":[{"id":91735,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/73264\/revisions\/91735"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=73264"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=73264"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=73264"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=73264"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}