{"id":73209,"date":"2014-03-18T15:03:15","date_gmt":"2014-03-18T15:03:15","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/uncategorized\/parallel-execution-skew-introduction\/"},"modified":"2021-07-14T13:07:34","modified_gmt":"2021-07-14T13:07:34","slug":"parallel-execution-skew-introduction","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/oracle-databases\/parallel-execution-skew-introduction\/","title":{"rendered":"Parallel Execution Skew &#8211; Introduction"},"content":{"rendered":"<p>This is the first part of a mini-series about Parallel Execution Skew and how to address it.<\/p>\n<h3>Introduction<\/h3>\n<p>So what does &#8220;Parallel Execution Skew&#8221; mean? With Oracle Enterprise Edition you can have Oracle automatically spread the processing of a single SQL statement across several worker processes in the hope to complete the processing in less elapsed wall-clock time.<\/p>\n<blockquote>\n<p><em><strong>Note:<\/strong> The &#8220;Oracle Parallel Execution&#8221; feature of Enterprise Edition covers more than just the parallel processing of SQL statements, Oracle can actually do parallel processing of other tasks, too, like crash recovery, backup\/recovery, index builds, export etc. but here I specifically refer to the parallel processing of SQL execution<\/em><\/p>\n<\/blockquote>\n<p>The important point to understand with Parallel Execution is that in order to perform parallel processing, a given task &#8211; in this case here the execution of a SQL statement &#8211; has to be divided and distributed across multiple worker processes and it only can speed up the execution if the processing is spread (ideally) evenly across the involved processes. In case there is an <strong>uneven distribution<\/strong> of work, the overall execution time depends on the processes that have to perform the <strong>majority of work<\/strong>, and in consequence other involved processes that perform the minority of work will be <strong>idle<\/strong> during that time.<\/p>\n<p>Since Parallel Execution comes with significant <strong>overhead<\/strong> (start-up\/coordination\/communication\/implementation overhead, refer to below links for more background information), which means that with Parallel Execution the database actually has to perform <strong>more work<\/strong> than with a comparable serial execution, in case of significant Parallel Execution Skew it is not uncommon to end up with an execution that isn&#8217;t much faster than serial execution, or in worst case even slower. Or in other words, as long as the work distribution is not even, there is room for improvement in terms of execution time.<\/p>\n<p>Therefore, if you want to make the most of Parallel Execution, it is very important to check for Parallel Execution Skew as it can significantly influence the performance of Parallel Execution no matter how powerful the underlying system is and how much resources you try to throw at your SQL statement execution.<\/p>\n<blockquote>\n<p><em><strong>Note:<\/strong> I won&#8217;t go into too much details here how to determine if a SQL execution has a problem with Parallel Execution Skew &#8211; if you are interested in more background information, please refer to below mentioned resources. This series here focuses on how to address Parallel Execution Skew if an SQL execution is actually affected by it<\/em><\/p>\n<p><em>&#8211; <a title=\"Understanding Parallel Execution\" href=\"http:\/\/www.oracle.com\/technetwork\/articles\/database-performance\/geist-parallel-execution-1-1872400.html\" target=\"_blank\">OTN article mini-series &#8220;Understanding Parallel Execution&#8221;<\/a><\/em><\/p>\n<p><em>&#8211; <a title=\"Analyzing and Troubleshooting Oracle Parallel Execution\" href=\"http:\/\/www.red-gate.com\/products\/oracle-development\/deployment-suite-for-oracle\/education\/webinars\/webinar-oracle-parallel-execution-randolf-geist\" target=\"_blank\">AllThingsOracle.com webinar &#8220;Analyzing and Troubleshooting Oracle Parallel Execution&#8221;<\/a><\/em><\/p>\n<p><em>&#8211; <a title=\"Analysing Parallel Execution Skew\" href=\"http:\/\/oracle-randolf.blogspot.com\/2014\/03\/analysing-parallel-execution-skew-video.html\" target=\"_blank\">Multi-part video tutorial on my blog on detecting and analysing Parallel Execution Skew<\/a><\/em><\/p>\n<\/blockquote>\n<p>So why does it sometimes happen that Oracle can&#8217;t automatically ensure an even work distribution? The Oracle <strong>query optimizer<\/strong> automatically generates the <strong>rules<\/strong> for the work<strong> distribution<\/strong> in case of Parallel SQL Execution and therefore follows certain generic strategies. For more complex operations like aggregations, sorts and joins Oracle follows certain rules to ensure in principle an efficient and correct processing.<\/p>\n<p>For example, in order to process the join of two row sources of similar size using Parallel Execution, Oracle uses the <strong>join key(s)<\/strong> as criteria to <strong>distribute<\/strong> the work rather than distribute the data randomly. Why? Because Oracle has to ensure that data with the <strong>same join keys<\/strong> of the two row sources will be processed by the <strong>same worker<\/strong>, otherwise the join wouldn&#8217;t produce correct results &#8211; consider a random distribution where join key value &#8220;a&#8221; of row source 1 will be processed by worker no. 3, but join key value &#8220;a&#8221; of row source 2 will be processed by a different worker and therefore not finding a match in this worker&#8217;s processed slice of rowsource 1. The join would discard this data as it doesn&#8217;t find a match, clearly an incorrect result.<\/p>\n<h3>Baseline<\/h3>\n<p>Before we look at an example of Parallel Execution Skew, in order to have a baseline where we can see a good work distribution, here is our test case set-up and a simple two table join that can be run with serial and parallel execution to see how parallel execution in an ideal case can speed up the execution.<\/p>\n<p>Test case table set-up:<\/p>\n<pre>create table t_1\r\ncompress\r\nas\r\nselect  \/*+ use_nl(a b) *\/\r\n        rownum as id\r\n      , rpad('x', 100) as filler\r\nfrom\r\n        (select \/*+ cardinality(1e5) *\/ * from dual\r\nconnect by\r\n        level &lt;= 1e5) a, (select \/*+ cardinality(20) *\/ * from dual connect by level &lt;= 20) b\r\n;\r\n\r\nexec dbms_stats.gather_table_stats(null, 't_1', method_opt=&gt;'for all columns size 1')\r\n\r\ncreate table t_2\r\ncompress\r\nas\r\nselect\r\n        rownum as id\r\n      , case when rownum &lt;= 5e5 then mod(rownum, 2e6) + 1 else 1 end as fk_id_skew\r\n      , rownum as fk_id_uniform\r\n      , case when rownum &lt;= 5e5 then mod(rownum, 2e6) + 1 else null end as fk_id_sparse\r\n      , rpad('x', 100) as filler\r\nfrom\r\n        (select \/*+ cardinality(1e5) *\/ * from dual\r\nconnect by\r\n        level &lt;= 1e5) a, (select \/*+ cardinality(20) *\/ * from dual connect by level &lt;= 20) b\r\n;\r\n\r\nexec dbms_stats.gather_table_stats(null, 't_2', method_opt=&gt;'for all columns size 1')\r\n\r\nalter table t_1 parallel 4 cache;\r\n\r\nalter table t_2 parallel 4 cache;<\/pre>\n<blockquote>\n<p><em>Note: If you want to reproduce the example and you&#8217;re using Oracle 12c it is important to stick to the same <strong>METHOD_OPT<\/strong> parameter used above &#8211; we&#8217;ll later see that Oracle 12c introduces a new feature that can detect and deal with Parallel Execution Skew automatically (at present in a limited number of scenarios) &#8211; and this feature requires a histogram to be triggered. Using <strong>FOR ALL COLUMNS SIZE 1<\/strong> prevents the histogram creation and therefore allows reproducing the skew problem in 12c, too.<\/em><\/p>\n<\/blockquote>\n<p>The table structure is straightforward for those two tables, the only noteworthy being the different variations of the T_2 FK column &#8211; one having a <strong>uniform value<\/strong> distribution (FK_ID_UNIFORM, actually representing a 1:1 relationship), one having <strong>one very popular value<\/strong>\u00a0(FK_ID_SKEW), \u00a0and the third one being another variation of &#8220;popular&#8221; having a lot of NULLs (FK_ID_SPARSE).<\/p>\n<blockquote>\n<p><em><strong>Note:<\/strong> Setting the tables to <strong>CACHE<\/strong> enables from 11.2 on the new <strong>in-memory Parallel Execution<\/strong> feature. Prior to this Parallel Execution Servers always <strong>bypassed<\/strong> the <strong>buffer cache<\/strong> and used <strong>direct path reads<\/strong> &#8211; a strategy optimized for reading large segment via full segment scans since direct path reads don&#8217;t have the overhead of maintaining the buffer cache as part of the physical read operation. With the in-memory Parallel Execution feature Parallel Execution Servers can make (in RAC even optimized) use of the buffer cache. For more information see <a href=\"http:\/\/blog.tanelpoder.com\/2013\/11\/27\/when-do-oracle-parallel-execution-slaves-issue-buffered-physical-reads-part-1\/\" target=\"_blank\">this post<\/a> by Tanel Poder for example<\/em><\/p>\n<\/blockquote>\n<p>And here is the simple query that can be run once serially, then using Parallel Execution:<\/p>\n<p><strong>Serial execution:<\/strong><\/p>\n<pre>select count(t_2_filler) from (\r\nselect  \/*+ monitor\r\n            no_parallel\r\n            leading(t_1 t_2)\r\n            use_hash(t_2)\r\n            no_swap_join_inputs(t_2)\r\n        *\/\r\n        t_1.id as t_1_id\r\n      , t_1.filler as t_1_filler\r\n      , t_2.id as t_2_id\r\n      , t_2.filler as t_2_filler\r\nfrom    t_1\r\n      , t_2\r\nwhere\r\n        t_2.fk_id_uniform = t_1.id\r\nand     regexp_replace(t_2.filler, '^\\s+([[:alnum:]]+)\\s+$', lpad('\\1', 10), 1, 1, 'c') &gt;= regexp_replace(t_1.filler, '^\\s+([[:alnum:]]+)\\s+$', lpad('\\1', 10), 1, 1, 'c')\r\n);<\/pre>\n<blockquote>\n<p><em><strong>Note:<\/strong> The query makes use of the new statement level hint syntax of the <strong>NO_PARALLEL<\/strong> hint &#8211; this is officially supported only from 11.2 on. In previous releases it requires the object level hint per object, like <strong>NO_PARALLEL(T_1) NO_PARALLEL(T_2)<\/strong>. Instead you could also simply disable parallel query in the session: <strong>ALTER SESSION DISABLE PARALLEL QUERY<\/strong> to achieve the same on session level.<\/em><\/p>\n<p><em>The remaining hints used (except the MONITOR hint to request Real-Time SQL Monitoring of the execution) dictate a specific join order (<strong>LEADING<\/strong>), join method (<strong>USE_HASH<\/strong>) and hash join operation order (<strong>NO_SWAP_JOIN_INPUTS<\/strong>)<\/em><\/p>\n<\/blockquote>\n<p><strong>Parallel Execution:<\/strong><\/p>\n<pre>select count(t_2_filler) from (\r\nselect  \/*+ monitor\r\n            leading(t_1 t_2)\r\n            use_hash(t_2)\r\n            no_swap_join_inputs(t_2)\r\n            pq_distribute(t_2 hash hash)\r\n        *\/\r\n        t_1.id as t_1_id\r\n      , t_1.filler as t_1_filler\r\n      , t_2.id as t_2_id\r\n      , t_2.filler as t_2_filler\r\nfrom    t_1\r\n      , t_2\r\nwhere\r\n        t_2.fk_id_uniform = t_1.id\r\nand     regexp_replace(t_2.filler, '^\\s+([[:alnum:]]+)\\s+$', lpad('\\1', 10), 1, 1, 'c') &gt;= regexp_replace(t_1.filler, '^\\s+([[:alnum:]]+)\\s+$', lpad('\\1', 10), 1, 1, 'c')\r\n);<\/pre>\n<blockquote>\n<p><em><strong>Note:<\/strong> There is an additional hint that is specific to Parallel Execution: <strong>PQ_DISTRIBUTE<\/strong>. For joins this hint tells the optimizer what kind of Parallel Distribution method to use for the join between T_1 and T_2 &#8211; in this case a hash distribution based on the join key. For more details regarding distribution methods and the PQ_DISTRIBUTE hint, please refer to my OTN Mini-Series &#8220;Understanding Parallel Execution&#8221; (see link above)<\/em><\/p>\n<\/blockquote>\n<p>Note that I used Oracle 12c for all my tests.<\/p>\n<p>On my test system the <strong>serial execution<\/strong> takes approx. <strong>57 seconds<\/strong>, whereas at a <strong>DOP (Degree Of Parallelism) of 4<\/strong> the <strong>Parallel Execution<\/strong> takes approx. <strong>15 seconds<\/strong> (I have 12 cores \/ 24 CPUs in my test system so plenty of CPU resources available). So the Parallel Execution in this particular case really scales very well, thanks to the simplicity of the query and the fact that it is completely CPU bound and not constrained by any I\/O. In real-life it is more common to see Parallel Execution not to scale that well, for several reasons, two of them being the overhead of Parallel Execution for more complex execution plans and I\/O constraints.<\/p>\n<p>Obviously the processing has been split very evenly among the Parallel Execution Servers, otherwise the Parallel Execution couldn&#8217;t be <strong>four times faster<\/strong> at a <strong>DOP of 4<\/strong>.<\/p>\n<p>We can see that confirmed for example by looking at the <strong>&#8220;Activity&#8221; tab<\/strong> of a <strong>Real-Time SQL Monitoring<\/strong> report (only available if you happen to have an additional <strong>Diagnostics + Tuning Pack license<\/strong>):<\/p>\n<p><a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/oracle\/2014\/03\/sql_mon_Parallel_Execution_Skew_example2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-4813\" alt=\"sql_mon_Parallel_Execution_Skew_example2\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/oracle\/2014\/03\/sql_mon_Parallel_Execution_Skew_example2-1024x793.png\" width=\"585\" height=\"453\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/oracle\/2014\/03\/sql_mon_Parallel_Execution_Skew_example2-1024x793.png 1024w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/oracle\/2014\/03\/sql_mon_Parallel_Execution_Skew_example2-300x232.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/oracle\/2014\/03\/sql_mon_Parallel_Execution_Skew_example2.png 1280w\" sizes=\"auto, (max-width: 585px) 100vw, 585px\" \/><\/a><\/p>\n<p>The &#8220;Activity&#8221; tab tells us that apart from a short duration at the beginning (the start-up overhead of Parallel Execution) four sessions were active all the time, so obviously the work was very evenly distributed among the Parallel Execution Servers.<\/p>\n<p>Another interesting information from the Real-Time SQL Monitoring report is the <strong>&#8220;Database Time&#8221;<\/strong> in the <strong>&#8220;Time &amp; Wait Statistics&#8221;<\/strong> section of the <strong>overview<\/strong>: It shows us a <strong>duration<\/strong> of <strong>15 seconds<\/strong>, but a <strong>&#8220;Database Time&#8221;<\/strong> of <strong>58 seconds<\/strong> &#8211; which is very similar to the duration \/ database time of the serial execution. So in total the database performed a similar amount of work, but it completed faster because the processing was spread among four worker processes.<\/p>\n<blockquote>\n<p><em><strong>Note:<\/strong> This is a <strong>simplified<\/strong> example that processes a <strong>very small amount of data<\/strong> and in principle is<strong> CPU bound<\/strong> by evaluating a silly regular expression. Therefore it <strong>minimizes<\/strong> the Parallel Execution <strong>overhead<\/strong> and scales pretty well provided you have sufficient CPU resources available. Typically with <strong>real-life examples<\/strong> that process <strong>larger amounts of data<\/strong> and use <strong>more complex execution plans<\/strong> you won&#8217;t achieve the almost perfect scaling as you see with this simple example. If you want to see the effect of the overhead more clearly\u00a0<a title=\"Parallel Execution Overhead\" href=\"http:\/\/oracle-randolf.blogspot.com\/2014\/03\/parallel-execution-overhead.html\" target=\"_blank\">see this post<\/a> on my blog where I demonstrate a three table join based on the same test data setup.<\/em><\/p>\n<\/blockquote>\n<p>Now that we&#8217;ve seen Parallel Execution scaling well we&#8217;ll have a look in the next part of the series\u00a0at a quite similar case where the work distribution limits the speedup achieved.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is the first part of a mini-series about Parallel Execution Skew and how to address it. Introduction So what does &#8220;Parallel Execution Skew&#8221; mean? With Oracle Enterprise Edition you can have Oracle automatically spread the processing of a single SQL statement across several worker processes in the hope to complete the processing in less elapsed wall-clock time. Note: T&hellip;<\/p>\n","protected":false},"author":316197,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143533],"tags":[48468],"coauthors":[],"class_list":["post-73209","post","type-post","status-publish","format-standard","hentry","category-oracle-databases","tag-parallel-execution"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/73209","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/316197"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=73209"}],"version-history":[{"count":1,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/73209\/revisions"}],"predecessor-version":[{"id":91694,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/73209\/revisions\/91694"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=73209"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=73209"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=73209"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=73209"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}