{"id":89181,"date":"2020-11-30T16:24:02","date_gmt":"2020-11-30T16:24:02","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=89181"},"modified":"2021-07-14T13:06:50","modified_gmt":"2021-07-14T13:06:50","slug":"are-your-oracle-archived-log-files-much-too-small","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/oracle-databases\/are-your-oracle-archived-log-files-much-too-small\/","title":{"rendered":"Are your Oracle archived log files much too small?"},"content":{"rendered":"<p>Have you ever noticed that your archived redo logs can be much smaller than the online redo logs and may display a wild variation in size? This difference is most likely to happen in systems with a very large SGA (system global area), and the underlying mechanisms may result in Oracle writing data blocks to disc far more aggressively than is necessary. This article is a high-level description of how Oracle uses its online redo logs and why it can end up generating archived redo logs that are much smaller than you might expect.<\/p>\n<h2>Background<\/h2>\n<p>I\u2019ll start with some details about default values for various instance parameters, but before I say any more, I need to present a warning. Throughout this article, I really ought to keep repeating <strong><em>\u201cseems to be\u201d<\/em><\/strong> when making statements about what Oracle is (seems to be) doing. Because I don\u2019t have access to any specification documents or source code, all I have is a system to experiment on and observe and the same access to the documentation and MOS as everyone else. However, it would be a very tedious document if I had to keep reminding you that I\u2019m just guessing (and checking). I will say this only once: remember that any comments I make in this note are <strong><em>probably close<\/em><\/strong> to correct but <strong><em>aren\u2019t guaranteed<\/em><\/strong> to be absolutely correct. 
With that caveat in mind, start with the following (for 19.3.0.0):<\/p>\n<p><strong><em>Processes<\/em><\/strong> defaults to <strong><em>cpu_count<\/em><\/strong> * 80 + 40<\/p>\n<p><strong><em>Sessions<\/em><\/strong> defaults to <strong><em>processes<\/em><\/strong> * 1.5 + 24 (the manuals say + 22, but that seems to be wrong)<\/p>\n<p><strong><em>Transactions<\/em><\/strong> defaults to <strong><em>sessions<\/em><\/strong> * 1.1<\/p>\n<p>The parameter I\u2019ll be using is <code>transactions<\/code>. Since 10g, Oracle has allowed multiple <em>\u201cpublic\u201d<\/em> and <em>\u201cprivate\u201d<\/em> redo log buffers, generally referred to as <em>strands<\/em>; the number of private strands is <code>trunc(transactions\/10)<\/code>, and the number of public strands is <code>trunc(cpu_count\/16)<\/code> with a minimum of 2. In both cases, these are maximum values, and Oracle won\u2019t necessarily use all the strands (and may not even allocate memory for all the private strands) unless there is contention due to the volume of concurrent transactions.<\/p>\n<p>For 64-bit systems, the private strands are approximately 128KB and, corresponding to the private redo strands, there is a set of matching <strong><em>in-memory undo<\/em><\/strong> structures which are also approximately 128KB each. 
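For reference, the defaults above can be strung together in a quick Python sketch (the function and variable names are mine, and the formulas are the observed defaults, not documented guarantees):

```python
# Sketch of the 19.3 parameter defaults described above (observed
# behaviour, not documented guarantees); all names here are my own.

def derived_defaults(cpu_count):
    processes = cpu_count * 80 + 40
    sessions = int(processes * 1.5) + 24      # manuals say + 22, but + 24 matches observation
    transactions = int(sessions * 1.1)
    private_strands = transactions // 10      # trunc(transactions/10)
    public_strands = max(2, cpu_count // 16)  # trunc(cpu_count/16), minimum 2
    return processes, sessions, transactions, private_strands, public_strands

print(derived_defaults(48))   # (3880, 5844, 6428, 642, 3)
```

With a cpu_count of 48 the public strand count comes out at 3, matching the latch counts below; the private strand figure of 642 assumes everything derives purely from cpu_count, and the 168 potential private strands I report below tell you that transactions on the test system was evidently much lower than that pure default.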
There are <code>x$<\/code> objects you can query to check the maximum number of private strands, but it&#8217;s easier to check <code>v$latch_children<\/code> as there&#8217;s a latch protecting each in-memory undo structure and each redo strand:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">select  name, count(*) \r\nfrom    v$latch_children\r\nwhere   name in (\r\n                'In memory undo latch',\r\n                'redo allocation'\r\n        )\r\ngroup by name\r\n\/<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"609\" height=\"118\" class=\"wp-image-89182\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/11\/word-image-97.png\" \/><\/p>\n<p>The difference between the number of redo allocation latches and in-memory undo latches tells you that I have three public redo strands on this system as well as the 168 (potential) private strands.<\/p>\n<p>The next important piece of the puzzle is the general strategy for memory allocation. To allow Oracle to move chunks of memory between the shared pool and buffer cache and other large memory areas, the memory you allocate for the SGA target is handled in \u201cgranules\u201d so, for example:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">select  component, current_size, granule_size \r\nfrom    v$sga_dynamic_components\r\nwhere   current_size != 0\r\n\/<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"638\" height=\"126\" class=\"wp-image-89183\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/11\/word-image-98.png\" \/><\/p>\n<p>This is a basic setup with a 10GB SGA target, with everything left to default. As you can see from the final column, Oracle has decided to handle SGA memory in granules of 32MB. 
The granule size is dependent on the SGA target size according to the following table:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"368\" height=\"321\" class=\"wp-image-89184\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/11\/word-image-99.png\" \/><\/p>\n<p>The last piece of information you need is the memory required for the \u201cfixed SGA\u201d which is reported as the \u201cFixed size\u201d as the instance starts up or in response to the command <code>show sga<\/code> (or <code>select * from v$sga<\/code>), e.g.:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SQL&gt; show sga<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"450\" height=\"115\" class=\"wp-image-89185\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/11\/word-image-100.png\" \/><\/p>\n<p>Once you know the granule size, the fixed SGA size, and the number of CPUs (technically the value of the parameter <code>cpu_count<\/code> rather than the actual number of CPUs) you can work out how Oracle calculates the size of the individual public redo strands \u2013 plus or minus a few KB.<\/p>\n<p>By default, Oracle allocates a single granule to hold both the fixed SGA and the public redo strands, and the number of public redo strands is <code>trunc(cpu_count\/16)<\/code>. With my <code>cpu_count<\/code> of 48, a fixed SGA size of 12,686,456 bytes (as above), and a granule size of 33,554,432 bytes (32MB), I should expect to see three public redo strands sized at roughly:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">(33,554,432 \u2013 12,686,456) \/ trunc((48 \/ 16)) = 20,867,976 \/ 3 = 6,955,992<\/pre>\n<p>In fact, you\u2019ll notice if you start chasing the arithmetic, that the numbers won\u2019t quite match my predictions. 
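If you want to check the ballpark arithmetic yourself, here is a small Python sketch of the calculation (the constants are the ones reported above; the function name is mine):

```python
# Estimate the public redo strand size: by default one granule holds the
# fixed SGA plus all the public strands (observed behaviour, so this is
# an approximation -- the granule also holds headers I can't account for).

def public_strand_size(granule_size, fixed_sga, cpu_count):
    n_strands = max(2, cpu_count // 16)
    return (granule_size - fixed_sga) // n_strands

# Figures from this system: 32MB granule, 12,686,456 bytes fixed SGA, 48 CPUs
print(public_strand_size(33_554_432, 12_686_456, 48))   # 6955992
```

As noted, this is the right ballpark rather than an exact figure.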
Some of the variations are due to the granule map, and other headers, pointers and other things I don\u2019t know about in the granule that holds the fixed SGA and public redo buffers. In this case, the actual reported size of the public strands was 6,709,248 bytes \u2013 so the calculation is in the right ballpark but not extremely accurate.<\/p>\n<h2>Log buffer, meet log file<\/h2>\n<p>It is possible to set the <code>log_buffer<\/code> parameter in the <em>startup<\/em> file to dictate the space to be allocated to the public strands. It\u2019s going to be convenient for me to do so to demonstrate why the archived redo logs can vary so much in size for \u201cno apparent reason\u201d. Therefore, I\u2019m going to restart this instance with the setting <code>log_buffer = 60M<\/code>. This will (should) give me 3 public strands of 20MB. But 60MB is more than one granule, and if you add the 12MB fixed size that\u2019s 72MB+. This result is more than two granules, so Oracle will have to allocate three granules (96MB) to hold everything. This is what I see as the startup SGA:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"443\" height=\"105\" class=\"wp-image-89186\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/11\/word-image-101.png\" \/><\/p>\n<p>Checking the arithmetic 87973888 + 12685456 = 100659344 = 95.996 MB i.e. three granules.<\/p>\n<p>The reason why I wanted a nice round 20MB for the three strands is that I created my online redo log files at 75MB (which is equivalent to 3 strands worth plus 15MB), and then I decided to drop them to 65MB (three strands plus 5MB).<\/p>\n<p>My first test consisted of nothing but switching log files then getting one process to execute a pl\/sql loop that did several thousand single-row updates and commits. 
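The granule arithmetic for that restart can be sketched as follows (a Python illustration using the figures above):

```python
import math

# How many granules are needed for the fixed SGA plus the public log buffer?
# (Observed behaviour: Oracle rounds the combined size up to whole granules.)

MB = 1024 * 1024

def granules_needed(log_buffer, fixed_sga, granule_size):
    return math.ceil((log_buffer + fixed_sga) / granule_size)

# log_buffer = 60M plus the ~12MB fixed SGA is 72MB+, needing three 32MB granules
print(granules_needed(60 * MB, 12_685_456, 32 * MB))   # 3
```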
After running the test, this is what the last few entries of my log archive history looked like:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">select  sequence#, blocks, blocks * block_size \/ 1048576 \r\nfrom    v$archived_log;<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"514\" height=\"151\" class=\"wp-image-89187\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/11\/word-image-102.png\" \/><\/p>\n<p>My test generated roughly 72MB of redo, but Oracle switched log files (twice) after a little less than 35MB. That\u2019s quite interesting, because 34.36 MB is roughly 20MB (one public strand) + 15MB and that \u201ccoincidence\u201d prompted me to reduce my online logs to 65MB and re-run the test. This is what the next set of archived logs looked like:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"515\" height=\"150\" class=\"wp-image-89188\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/11\/word-image-103.png\" \/><\/p>\n<p>This time I have archived logs which are about 25MB \u2013 which is equivalent to one strand plus 5MB &#8211; and that\u2019s a very big hint. Oracle is behaving as if each public strand allocates space for itself in the online redo log file in chunks equal to the size of the strand. Then, when the strand has filled that chunk, Oracle tries to allocate the next chunk \u2013 if the space is available. If there\u2019s not enough space left for a whole strand, Oracle allocates the rest of the file. If there\u2019s no space left at all, Oracle triggers a log file switch, and the strand allocates its space in the next online redo log file. The sequence of events in my case was that each strand allocated 20MB, but only one of my strands was busy. When it had filled its first allocation, it allocated the last 5 (or 15) MB in the file. 
When that was full, I saw a log file switch.<\/p>\n<p>To test this hypothesis, I could predict that if I ran two copies of my update loop (addressing 2 different tables to avoid contention), I would see the archived logs coming out at something closer to 45MB (i.e. two full allocations plus 5MB). Here\u2019s what I got when I tried the experiment:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"506\" height=\"143\" class=\"wp-image-89189\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/11\/word-image-104.png\" \/><\/p>\n<p>If you\u2019re keeping an eye on the <strong><em>sequence#<\/em><\/strong> column, you may wonder why there\u2019s been a rather large jump from the previous test. It\u2019s because my hypothesis was right, but my initial test wasn\u2019t quite good enough to demonstrate it. With all my single-row updates and commits and the machine I was using, Oracle needed to take advantage of the 2<sup>nd<\/sup> public redo strand some of the time but not for a \u201cfair share\u201d of the updates. Therefore, the log file switch was biased towards the first public redo strand reaching its limits, and this produced archived redo logs of a fairly regular 35MB (which would be 25MB for public strand #1 and 10MB for public strand #2). It took me a few tests to realize what had happened. The results above come from a modified test where I changed the loop to run a small number of times updating 1,000 rows each time. 
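A toy Python model of the inferred allocation strategy reproduces these switch points; this is purely my sketch of the hypothesis, not a description of Oracle's actual code:

```python
# Toy model of the hypothesis: each public strand pre-allocates a chunk of
# its own size in the online log file; a busy strand that fills its chunk
# claims another (or whatever space is left), and the log switch happens
# when no space remains.  A sketch of the inferred behaviour only.

MB = 1024 * 1024

def redo_written_at_switch(file_size, strand_size, busy_strands, total_strands):
    free = file_size - total_strands * strand_size   # left after pre-allocation
    written = busy_strands * strand_size             # busy strands fill their chunks
    while free > 0:                                  # busy strands keep claiming space
        chunk = min(strand_size, free)
        free -= chunk
        written += chunk
    return written                                   # redo in the file at switch time

# 75MB and 65MB files, 3 x 20MB strands, one busy session: switch at 35MB / 25MB
print(redo_written_at_switch(75 * MB, 20 * MB, 1, 3) / MB)   # 35.0
print(redo_written_at_switch(65 * MB, 20 * MB, 1, 3) / MB)   # 25.0
# two busy sessions: switch at 45MB
print(redo_written_at_switch(65 * MB, 20 * MB, 2, 3) / MB)   # 45.0
```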
I checked that I still saw switches at 25MB with a single run \u2013 though, as you may have noticed, the array update generated a lot less redo than the single-row updates to update the same number of rows.<\/p>\n<p>Finally, since I had three redo strands, what\u2019s going to happen if I run three copies of the loop (addressing three different tables)?<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"512\" height=\"141\" class=\"wp-image-89190\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/11\/word-image-105.png\" \/><\/p>\n<p>With sufficient pressure on the redo log buffers (and probably with just a little bit of luck in the timing), I managed to get redo log files that were just about full when the switch took place.<\/p>\n<h2>Consequences<\/h2>\n<p>Reminder \u2013 my description here is based on tests and inference; it\u2019s not guaranteed to be totally correct.<\/p>\n<p>When you have multiple public redo strands, each strand allocates space in the current online redo log file equivalent to the size of the strand, and the memory for the strand is pre-formatted to include the redo log block header and block number. When Oracle has worked its way through the whole strand, it tries to allocate space in the log file for the next cycle through the strand. If there isn\u2019t enough space for the whole strand, it allocates all the available space. If there\u2019s no space available at all, it initiates a log file switch and allocates the space from the next online redo log file.<\/p>\n<p>As a consequence, if you have N public redo strands and aren\u2019t keeping the system busy enough to be using all of them constantly, then you could find that you have N-1 strand allocations that are virtually empty when the log file switch takes place. If you are not aware of this, you may see log file switches taking place much more frequently than you expect, producing archived log files that are much smaller than you expect. 
(On the damage limitation front, Oracle doesn\u2019t copy the empty parts of an online redo log file when it archives it.)<\/p>\n<p>In my case, because I had created redo log files of 65MB then set the <code>log_buffer<\/code> parameter to 60MB when the <code>cpu_count<\/code> demanded three public redo strands, I was doomed to see log file switches every 25MB. I wasn\u2019t working the system hard enough to keep all three strands busy. If I had left the log buffer at its default (which was a little under 7MB), then in the worst-case scenario I would have left roughly 14MB of online redo log unused and would have been producing archived redo logs of 51MB each.<\/p>\n<p>Think about what this might mean on a very large system. Say you have 16 cores which are pretending to be 8 CPUs each (fooling Oracle into thinking you have 128 CPUs), and you\u2019ve set the SGA target to be 100GB. Because the system is a little busy, you\u2019ve set the redo log file size to be a generous-sounding 256MB. Oracle will allocate 8 public redo strands (<code>cpu_count\/16<\/code>), and \u2013 checking the table above \u2013 a <strong><em>granule size<\/em><\/strong> of 256MB. Can you spot the threat when the granule size matches the file size, and you\u2019ve got a relatively large number of public redo strands?<\/p>\n<p>Assuming the fixed SGA size is still about 12M (I don\u2019t have a machine I can test on), you\u2019ve got:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">strand size = 244MB \/ 8 = 30.5 MB<\/pre>\n<p>When you switch into a new online redo log file, you pre-allocate 8 chunks of 30.5MB for a total of 244M, with only 12MB left for the first strand to allocate once it\u2019s filled its initial allocation. The worst-case scenario is that you could get a log file switch after only 30.5 MB + 12 MB = 42.5 MB. 
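Putting those worst-case numbers into a short Python sketch (all figures are the assumptions above, not measurements):

```python
# Worst-case switch point for the hypothetical 128-CPU system above:
# 256MB log files and granules, ~12MB fixed SGA, 8 public strands.
# All figures are assumptions from the text, not measurements.

MB = 1024 * 1024

def worst_case_switch(file_size, granule_size, fixed_sga, n_strands):
    strand = (granule_size - fixed_sga) // n_strands   # ~30.5MB per strand
    leftover = file_size - n_strands * strand          # ~12MB left after pre-allocation
    return strand + leftover                           # one busy strand plus the remainder

used = worst_case_switch(256 * MB, 256 * MB, 12 * MB, 8)
print(used / MB)                   # 42.5
print(round(256 * MB / used, 1))   # 6.0 -- switching ~6x as often as expected
```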
You could be switching log files about 6 times as often as expected when you set up the 256MB log file size.<\/p>\n<h2>Conclusion<\/h2>\n<p>There may be various restrictions or limits that make some difference to the arithmetic I\u2019ve discussed. For example, I have seen a system using 512MB granules that seemed to set the public strand size to 32 MB when I was expecting the default to get closer to 64MB. There are a number of details I\u2019ve avoided commenting on in this article, but the indications from the tests that I can do suggest that if you\u2019re seeing archived redo log files that are small compared to the online redo log files, then the space you\u2019re losing is because of space reserved for inactive or \u201clow-use\u201d strands. If this is the case, there\u2019s not much (legal) that you can do other than increase the online redo log files so that the \u201clost\u201d space (which won\u2019t change) becomes a smaller <strong><em>fraction<\/em><\/strong> of the total size of the online redo log. As a guideline, setting the online redo log file size to <strong><em>at least twice the granule size<\/em><\/strong> seems like a good idea, and four times the size may be even better.<\/p>\n<p>There are other options, of course, that involve tweaking the parameter file. Most of these changes would be for hidden parameters, but you could supply a suitable setting for the <code>log_buffer<\/code> parameter that makes the individual public strands much smaller \u2013 provided this didn\u2019t introduce excessive waits for log buffer space. Remember that the value you set would be shared between the public redo threads, and don\u2019t forget that the number of public redo threads might change if you change the number of CPUs in the system.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Oracle sometimes generates smaller archived redo logs than you might expect based on the size of the online redo logs. 
In this article, Jonathan Lewis explains how Oracle uses online redo logs and why there might be a discrepancy in the archived log size.&hellip;<\/p>\n","protected":false},"author":101205,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[53,143533],"tags":[124952],"coauthors":[39048],"class_list":["post-89181","post","type-post","status-publish","format-standard","hentry","category-featured","category-oracle-databases","tag-redgate-deploy"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/89181","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/101205"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=89181"}],"version-history":[{"count":4,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/89181\/revisions"}],"predecessor-version":[{"id":89194,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/89181\/revisions\/89194"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=89181"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=89181"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=89181"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=89181"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}