{"id":94088,"date":"2022-04-27T20:59:37","date_gmt":"2022-04-27T20:59:37","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=94088"},"modified":"2022-04-27T20:59:37","modified_gmt":"2022-04-27T20:59:37","slug":"oracle-optimizer-removing-or-coalescing-subqueries","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/oracle-databases\/oracle-optimizer-removing-or-coalescing-subqueries\/","title":{"rendered":"Oracle optimizer removing or coalescing subqueries"},"content":{"rendered":"<p><strong>Jonathan Lewis' continuing series on the Oracle optimizer and how it transforms queries into execution plans:<\/strong><\/p>\n<ol>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/oracle-databases\/transformations-by-the-oracle-optimizer\/\">Transformations by the Oracle Optimizer<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/oracle-databases\/the-effects-of-null-with-not-in-on-oracle-transformations\/\">The effects of NULL with NOT IN on Oracle transformations<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/oracle-databases\/oracle-subquery-caching-and-subquery-pushing\/\">Oracle subquery caching and subquery pushing<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/oracle-databases\/oracle-optimizer-removing-or-coalescing-subqueries\/\">Oracle optimizer removing or coalescing subqueries<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/oracle-databases\/oracle-optimizer-or-expansion-transformations\/\">Oracle optimizer Or Expansion Transformations<\/a><\/li>\n<\/ol>\n\n<p>So far, this series has examined the shape of some of the execution plans that Oracle\u2019s optimizer can produce when there are subqueries in the <code>WHERE<\/code> clause and noted that the optimizer will often <em>unnest <\/em>a subquery to produce a join instead of using a <em>filter<\/em>\u00a0operation that repeatedly runs the subquery. You\u2019ve also seen that it\u2019s possible to dictate the optimizer\u2019s choice and have some control over the point in the execution plan where the filter operation takes place.<\/p>\n<p>This installment moves on to a couple of the more sophisticated transformations that the optimizer can apply to reduce the number of subqueries that end up in the execution plan.<\/p>\n<h2>Removing aggregate subqueries<\/h2>\n<p>If you browse any of the public Oracle forums you\u2019ve probably seen suggestions that certain patterns of queries would be more efficient if they were rewritten to use analytic functions rather than using a strategy involving self-referencing subqueries. It\u2019s not commonly known that you don\u2019t necessarily have to rewrite such queries; you may simply be able to tell the optimizer to do an internal rewrite for you.<\/p>\n<p>To demonstrate this, I\u2019ll use the <code>emp<\/code> and <code>dept<\/code> tables from the <code>scott<\/code> schema <em>($ORACLE_HOME\/rdbms\/admin\/utlsampl.sql<\/em>) as I did in the previous installment, running the demonstration from SQL*Plus on Oracle 19c (19.11.0.0). I\u2019m going to both extend and simplify the query I wrote in that installment for \u2018<em>employess with a salary greater than the departmental average<\/em>.\u2019 The extension is that I\u2019m going to include the department name in the output; the simplification is that I\u2019m going to remove (temporarily) the reference to <code>nvl(comm,0)<\/code>:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">select\r\n        \/*+\r\n                qb_name(main)\r\n        *\/\r\n        e1.*, d.dname\r\nfrom    emp e1, dept d\r\nwhere   d.deptno = e1.deptno\r\nand     e1.sal &gt; (\r\n                select  \/*+ qb_name(subq) *\/\r\n                        avg(e2.sal)\r\n                from    emp e2\r\n                where   e2.deptno = d.deptno\r\n        )\r\norder by\r\n        e1.deptno, e1.empno\r\n\/<\/pre>\n<p>You might not expect the addition of the department name or the elimination of the <code>nvl()<\/code> expression to make a significant difference to the optimizer, but the resulting plan is dramatically different. Thanks to referential integrity, the <code>dept<\/code> table behaves as a sort of \u2018focal point,\u2019 allowing the optimizer to connect the two appearances of the <code>emp<\/code> table and use a mechanism to \u2018<em>remove aggregate subquery.\u2019 <\/em>Here\u2019s the execution plan for the modified query (generated by <code>autotrace<\/code>):<\/p>\n<pre class=\"theme:powershell-output font-size:12 lang:ps decode:true \">----------------------------------------------------------------------------------\r\n| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |\r\n----------------------------------------------------------------------------------\r\n|   0 | SELECT STATEMENT      |          |    14 |  1694 |     6  (34)| 00:00:01 |\r\n|   1 |  SORT ORDER BY        |          |    14 |  1694 |     6  (34)| 00:00:01 |\r\n|*  2 |   VIEW                | VW_WIF_1 |    14 |  1694 |     5  (20)| 00:00:01 |\r\n|   3 |    WINDOW SORT        |          |    14 |   756 |     5  (20)| 00:00:01 |\r\n|*  4 |     HASH JOIN         |          |    14 |   756 |     4   (0)| 00:00:01 |\r\n|   5 |      TABLE ACCESS FULL| DEPT     |     4 |    52 |     2   (0)| 00:00:01 |\r\n|   6 |      TABLE ACCESS FULL| EMP      |    14 |   574 |     2   (0)| 00:00:01 |\r\n----------------------------------------------------------------------------------\r\nPredicate Information (identified by operation id):\r\n---------------------------------------------------\r\n   2 - filter(\"VW_COL_11\" IS NOT NULL)\r\n   4 - access(\"D\".\"DEPTNO\"=\"E1\".\"DEPTNO\")<\/pre>\n<p>The optimizer has eliminated the aggregate subquery, reducing the query from three tables and two blocks to a simple two-table join with an analytic aggregate \u2013 at least that\u2019s what the <strong><em>window sort<\/em><\/strong> operation and the generated view name <code>vw_wif_1<\/code> suggest.<\/p>\n<p>It\u2019s a reasonable guess that the internal rewrite uses the <code>avg() over<\/code> approach, but this is a case where you probably ought to check the 10053 (CBO) trace file if you want to be confident about the details. Here\u2019s the <strong><em>\u2018unparsed\u2019<\/em><\/strong> query (extracted from the trace file but with a <strong>lot<\/strong> of cosmetic editing) that the optimizer finally produced for costing this plan:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">select  \/*+ qb_name (main) *\/ \r\n   vw_wif_1.item_1 empno, vw_wif_1.item_2 ename, vw_wif_1.item_3 job,\r\n   vw_wif_1.item_4 mgr, vw_wif_1.item_5 hiredate, vw_wif_1.item_6 sal,\r\n   vw_wif_1.item_7 comm, vw_wif_1.item_8 deptno, vw_wif_1.item_9 dname \r\nfrom    (\r\n   select \r\n        e1.empno item_1, e1.ename item_2, e1.job item_3, \r\n        e1.mgr item_4, e1.hiredate item_5, e1.sal item_6,\r\n        e1.comm item_7, e1.deptno item_8, d.dname item_9,\r\n        case\r\n             when e1.sal &gt; avg(e1.sal) over (partition by e1.deptno) \r\n             then e1.rowid \r\n             end  vw_col_10 \r\n   from \r\n        test_user.dept d,\r\n        test_user.emp e1 \r\n   where\r\n        d.deptno = e1.deptno\r\n        ) vw_wif_1 \r\nwhere \r\n        vw_wif_1.vw_col_10 is not null \r\norder by \r\n        vw_wif_1.item_8,\r\n        vw_wif_1.item_1\r\n\/<\/pre>\n<p>It\u2019s interesting to note that the optimizer has introduced a <code>CASE<\/code> expression in the inline view (<code>vw_wif_1<\/code>) to generate a column that can be tested for nullity in the main query block. The human version for this strategy would probably have been to generate the <em>\u2018average over department\u2019<\/em> as a column in the inline view that could be compared with <code>sal<\/code> in the outer query, e.g.:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">select\r\n        e2.* \r\nfrom\r\n        (\r\n        select \r\n                e1.*, d.dname,\r\n                avg(e1.sal) over(partition by e1.deptno) avg_sal\r\n        from    emp e1, dept d\r\n        where   d.deptno = e1.deptno\r\n        )       e2\r\nwhere\r\n        e2.sal &gt; e2.avg_sal\r\norder by\r\n        e2.deptno,\r\n        e2.empno\r\n\/<\/pre>\n<p>I said I\u2019d simplified the query by removing the reference to <code>nvl(comm,0)<\/code>; I had to do this initially because the transformation would otherwise not be used. Eventually, I did get the transformation to appear with the expression <code>sal + nvl(comm,0)<\/code><strong><em>,<\/em><\/strong> but I had to add a virtual column to the table matching the expression and rewrite the query using the virtual column:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">alter table emp add nvl_earnings -- invisible\r\n        generated always as (sal + nvl(comm,0))\r\n\/<\/pre>\n<p>This anomaly is probably one introduced in Oracle 19.9 by a fix for bug 30593046 which produced wrong results in some cases involving <em>\u2018scalar subquery unnesting\u2019<\/em> in 12.2.0.1. Unfortunately, the fix blocked too many cases where unnesting was safe, so a replacement bug fix (33325981) appeared in 19.13 to restore some of the blocked cases. However, my example (where unnesting is a necessary precursor to subquery removal) still isn\u2019t allowed past the block. When I ran the test case on 12.2.0.1, the optimizer did unnest the subquery with the original expression but didn\u2019t carry on to remove the subquery.<\/p>\n<p>In previous installments, I\u2019ve discussed using hints to control the optimizer\u2019s strategy when you think you know more about the data than the optimizer does. In this case, the transformation relies on sorting (i.e., the <strong><em>Window Sort<\/em><\/strong> at operation 3), so there are likely to be occasions when a bad choice of plan does a lot more work than the optimizer\u2019s arithmetic suggests, and you might want to block the transformation. Conversely there may be cases where the transformation doesn\u2019t appear when it would be a really good idea. Unfortunately, even though the transformation has been available since 10gR2, there is no custom hint to force it or block it. The best you can do if you want to stop the transformation from taking place is to disable the feature using an <code>alter session<\/code> command or adding the <code>opt_param()<\/code> hint to the query: <code>opt_param(\u2018_remove_aggregate_subquery\u2019,\u2019false\u2019)<\/code>. However, if you think the optimizer isn\u2019t using the transformation when it should be, there\u2019s not a lot you can do about it. It\u2019s possible that you may find cases where a hint to <strong><em>unnest<\/em><\/strong> the critical subquery will result in the optimizer deciding to go one step further and remove the subquery \u2013 but that\u2019s just a conjecture, I haven\u2019t yet produced an example to demonstrate that this is actually possible.<\/p>\n<h2>Coalescing subqueries<\/h2>\n<p>Consider the following query which will be repeated in its full context a little later:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">select  \/*+ qb_name(main) *\/\r\n        *\r\nfrom    test_t1 t1\r\nwhere \r\n        t1.is_deleted='N' \r\nand     (\r\n            t1.id2 in (select \/*+ qb_name(id2) *\/ t2.id2 \r\n                       from test_t2 t2 \r\n                       where t2.is_deleted='N' and t2.id=:p_id\r\n                       )\r\n         or t1.id3 in (select \/*+ qb_name(id3) *\/ t2.id2 \r\n                       from test_t2 t2 \r\n                       where t2.is_deleted='N' and t2.id=:p_id\r\n                       )\r\n        );<\/pre>\n<p>When you examine the two subqueries, you\u2019ll notice that they are identical, and both are driven by the same bind variable <code>:p_id.<\/code> The same subquery appears twice because the result is compared with two different columns from table <code>test_t1<\/code><strong><em>,<\/em><\/strong> so it\u2019s tempting to ask the question: <em>\u201ccan we get each row just once then compare it twice?\u201d<\/em><\/p>\n<p>Since the optimizer often converts IN subqueries into existence subqueries you could consider transforming both these subqueries into existence subqueries then merging them into a single subquery that pushes the <code>OR<\/code> condition inside the subquery, doing something like:<\/p>\n<p>Step 1:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">and     (\r\n    exists (select \/*+ qb_name(id2) *\/ null \r\n           from test_t2 t2 \r\n           where t2.is_deleted='N' and t2.id=:p_id \r\n           and t2.id2 = t1.id2\r\n          )\r\n    or exists (select \/*+ qb_name(id3) *\/ null \r\n               from test_t2 t2 \r\n               where t2.is_deleted='N' and t2.id=:p_id \r\n                   and t2.id2 = t1.id3\r\n                   )\r\n        );<\/pre>\n<p>Step 2:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk \">and     (\r\n            exists (select \/*+ qb_name(SEL$????????) *\/ null \r\n                    from test_t2 t2 \r\n                    where t2.is_deleted='N' and t2.id=:p_id \r\n                    and (t2.id2 = t1.id2 or t2.id2 = t1.id3)\r\n                   )\r\n        );<\/pre>\n<p>The mechanism displayed in step 2 is known as <em>\u2018subquery coalescing<\/em>,\u2019 and it has been available to the optimizer and enabled since 11.2.0.1, and comes complete with the pair of hints <code>\/*+ [no_]coalesce_sq() *\/.<\/code><\/p>\n<p>In this example, it\u2019s, fairly obvious that if you had an index on nothing but <code>t2(is_deleted, id) <\/code>and if (despite the clue in the second column name) there were lots of rows matching the predicates <code>t2.is_deleted=\u2019N\u2019<\/code><em> and <\/em><code>t2.id = :p_id<\/code> then it would make sense to combine the two subqueries so that you would only have to visit those rows once each, and could stop at the first occurrence of either the <code>t1.id2<\/code> value or the <code>t1.id3<\/code> value. On the other hand, if you had an index like <code>t2(id, id2)<\/code> there may be no benefit gained from the transformation, so it\u2019s nice that the mechanism can be blocked with a simple hint.<\/p>\n<p>In fact, someone raised a question about this specific example in one of the public Oracle forums because the transformation had had a most undesirable side effect and produced a bad plan that took far longer to run than the best possible plan. Here\u2019s some code (supplied on the forum, but with a few enhancements) to produce a model of what the data looked like:<\/p>\n<pre class=\"theme:powershell-output font-size:12 lang:ps decode:true\">rem\r\nrem     Script:         coalesce_sq_2.sql\r\nrem     Author:         Forum Member, edited: Jonathan Lewis\r\nrem     Dated:          March 2022\r\nrem\r\ncreate table test_t1 as\r\nselect\r\n        level id, level id2, level id3, 'N' as is_deleted\r\nfrom    dual\r\nconnect by\r\n        level &lt; 1e5\r\n;\r\nalter table test_t1 add constraint test_t1_pk primary key (id);\r\ncreate index test_t1_idx2 on test_t1(id2);\r\ncreate index test_t1_idx3 on test_t1(id3);\r\ncreate table test_t2 as select * from test_t1;\r\nalter table test_t2 add constraint test_t2_pk primary key (id);\r\nvariable p_id number\r\nexec :p_id := 5000\r\nalter session set statistics_level = all;\r\nset serveroutput off\r\nselect  \/*+ qb_name(main) *\/\r\n        *\r\nfrom    test_t1 t1\r\nwhere \r\n        t1.is_deleted='N' \r\nand     (\r\n            t1.id2 in (select \/*+ qb_name(id2) *\/ t2.id2 \r\n                       from test_t2 t2 \r\n                       where t2.is_deleted='N' and t2.id=:p_id\r\n                       )\r\n         or t1.id3 in (select \/*+ qb_name(id3) *\/ t2.id2 \r\n                       from test_t2 t2 \r\n                       where t2.is_deleted='N' and t2.id=:p_id\r\n                       )\r\n        );\r\nselect * \r\nfrom table(dbms_xplan.display_cursor(format=&gt;' allstats last -rows'));<\/pre>\n<p>Here\u2019s the resulting plan:<\/p>\n<pre class=\"theme:powershell-output font-size:12 lang:ps decode:true\">--------------------------------------------------------------------------------------------\r\n| Id  | Operation                    | Name       | Starts | A-Rows |   A-Time   | Buffers |\r\n--------------------------------------------------------------------------------------------\r\n|   0 | SELECT STATEMENT             |            |      1 |      1 |00:00:00.24 |     200K|\r\n|*  1 |  FILTER                      |            |      1 |      1 |00:00:00.24 |     200K|\r\n|*  2 |   TABLE ACCESS FULL          | TEST_T1    |      1 |  99999 |00:00:00.01 |     305 |\r\n|*  3 |   TABLE ACCESS BY INDEX ROWID| TEST_T2    |  99999 |      1 |00:00:00.17 |     200K|\r\n|*  4 |    INDEX UNIQUE SCAN         | TEST_T2_PK |  99999 |  99999 |00:00:00.09 |     100K|\r\n--------------------------------------------------------------------------------------------\r\nPredicate Information (identified by operation id):\r\n---------------------------------------------------\r\n   1 - filter( IS NOT NULL)\r\n   2 - filter(\"T1\".\"IS_DELETED\"='N')\r\n   3 - filter(((\"T2\".\"ID2\"=:B1 OR \"T2\".\"ID2\"=:B2) AND \"T2\".\"IS_DELETED\"='N'))\r\n   4 - access(\"T2\".\"ID\"=:P_ID)<\/pre>\n<p>As usual, when pulling a live execution plan from memory the text of the subquery used for the FILTER at operation 1 has disappeared from the Predicate Information, but you can infer from the predicates reported at operation 3 that subquery coalescing has taken place. For confirmation, you could use <code>explain plan<\/code> and <code>dbms_xplan.display()<\/code> to get a report that shows the missing predicate information (being careful to remember that <code>explain plan<\/code> knows nothing about the bind variable \u2013 not even its type \u2013 which is why you see <code>to_number(:p_id)<\/code> in the following:<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">   1 - filter( EXISTS (SELECT \/*+ QB_NAME (\"ID2\") *\/ 0 \r\n                       FROM \"TEST_T2\" \"T2\" WHERE\r\n              \"T2\".\"ID\"=TO_NUMBER(:P_ID) \r\n              AND (\"T2\".\"ID2\"=:B1 OR \"T2\".\"ID2\"=:B2) AND\r\n              \"T2\".\"IS_DELETED\"='N'))<\/pre>\n<p>In fact, if you had reported the outline information in the call to <code>dbms_xplan.display_cursor()<\/code>, you would also have seen that it included the following two directives:<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">Outline Data\r\n-------------\r\n      COALESCE_SQ(@\"ID3\")\r\n      COALESCE_SQ(@\"ID2\")<\/pre>\n<p>The key thing to note, however, is the amount of work that Oracle has had to do. For every row in the table, it has executed that subquery. Looking at the definition of <code>table test_t2<\/code> (and especially its primary key) it\u2019s easy to see that there can be at most one row where <code>t2.id = :p_id<\/code>, so why didn\u2019t the optimizer unnest the subquery and use it to drive into <code>test_t1<\/code><strong><em>? <\/em><\/strong>The answer is that while you would use only one row and one value of <code>t2.id2<\/code> to drive a join into <code>test_t1<\/code><strong><em>,<\/em><\/strong> there are two different columns in <code>test_t1<\/code> in the query, and you\u2019d have to do something complicated to unravel the pieces and join to<code> test_t1<\/code> twice per row from <code>test_t2<\/code> \u2013 so the optimizer doesn\u2019t (yet) try it. There are further comments about this complication in the following blog article by Mohamed Houri: <a href=\"https:\/\/hourim.wordpress.com\/2017\/08\/12\/unnesting-of-coalesced-subqueries\/\">https:\/\/hourim.wordpress.com\/2017\/08\/12\/unnesting-of-coalesced-subqueries\/<\/a><\/p>\n<p>Since subquery coalescing has blocked a strategy that seems sensible to the human eye it\u2019s worth telling the optimizer not to use the feature just to see if something interesting happens. Add the <code>no_coalesce_sq()<\/code> hint for the two named subquery blocks to the main query:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">select  \/*+\r\n                qb_name(main)\r\n                no_coalesce_sq(@id2)\r\n                no_coalesce_sq(@id3)\r\n--              or_expand(@main (1) (2))\r\n        *\/\r\n        *\r\nfrom    test_t1 t1\r\nwhere   \r\n               t1.is_deleted='N' \r\nand     (\r\n            t1.id2 in (select \/*+ qb_name(id2) *\/ t2.id2 \r\n                       from test_t2 t2 \r\n                       where t2.is_deleted='N' and t2.id=:p_id\r\n                       )\r\n         or t1.id3 in (select \/*+ qb_name(id3) *\/ t2.id2 \r\n                       from test_t2 t2 \r\n                       where t2.is_deleted='N' and t2.id=:p_id\r\n                       )\r\n        );<\/pre>\n<p>The <code>or_expand(@main (1) (2))<\/code> hint &#8211; that I\u2019ve commented out here &#8211; was something I added to force 12.2.0.1 to produce the plan that I got from 19.11.0.0. Here\u2019s the more complicated, but far more efficient, execution plan that appeared:<\/p>\n<pre class=\"theme:powershell-output font-size:12 lang:ps decode:true\">-----------------------------------------------------------------------------------------------------------\r\n| Id | Operation                               | Name            | Starts | A-Rows |   A-Time   | Buffers |\r\n-----------------------------------------------------------------------------------------------------------\r\n|  0 | SELECT STATEMENT                        |                 |      1 |      1 |00:00:00.01 |      16 |\r\n|  1 |  VIEW                                   | VW_ORE_7F40D524 |      1 |      1 |00:00:00.01 |      16 |\r\n|  2 |   UNION-ALL                             |                 |      1 |      1 |00:00:00.01 |      16 |\r\n|  3 |    NESTED LOOPS                         |                 |      1 |      1 |00:00:00.01 |       7 |\r\n|* 4 |     TABLE ACCESS BY INDEX ROWID         | TEST_T2         |      1 |      1 |00:00:00.01 |       3 |\r\n|* 5 |      INDEX UNIQUE SCAN                  | TEST_T2_PK      |      1 |      1 |00:00:00.01 |       2 |\r\n|* 6 |     TABLE ACCESS BY INDEX ROWID BATCHED | TEST_T1         |      1 |      1 |00:00:00.01 |       4 |\r\n|* 7 |      INDEX RANGE SCAN                   | TEST_T1_IDX2    |      1 |      1 |00:00:00.01 |       3 |\r\n|* 8 |    FILTER                               |                 |      1 |      0 |00:00:00.01 |       9 |\r\n|  9 |     NESTED LOOPS                        |                 |      1 |      1 |00:00:00.01 |       6 |\r\n|*10 |      TABLE ACCESS BY INDEX ROWID        | TEST_T2         |      1 |      1 |00:00:00.01 |       3 |\r\n|*11 |       INDEX UNIQUE SCAN                 | TEST_T2_PK      |      1 |      1 |00:00:00.01 |       2 |\r\n|*12 |      TABLE ACCESS BY INDEX ROWID BATCHED| TEST_T1         |      1 |      1 |00:00:00.01 |       3 |\r\n|*13 |       INDEX RANGE SCAN                  | TEST_T1_IDX3    |      1 |      1 |00:00:00.01 |       2 |\r\n|*14 |     TABLE ACCESS BY INDEX ROWID         | TEST_T2         |      1 |      1 |00:00:00.01 |       3 |\r\n|*15 |      INDEX UNIQUE SCAN                  | TEST_T2_PK      |      1 |      1 |00:00:00.01 |       2 |\r\n-----------------------------------------------------------------------------------------------------------\r\nPredicate Information (identified by operation id):\r\n---------------------------------------------------\r\n   4 - filter(\"T2\".\"IS_DELETED\"='N')\r\n   5 - access(\"T2\".\"ID\"=:P_ID)\r\n   6 - filter(\"T1\".\"IS_DELETED\"='N')\r\n   7 - access(\"T1\".\"ID2\"=\"T2\".\"ID2\")\r\n   8 - filter(LNNVL( IS NOT NULL))\r\n  10 - filter(\"T2\".\"IS_DELETED\"='N')\r\n  11 - access(\"T2\".\"ID\"=:P_ID)\r\n  12 - filter(\"T1\".\"IS_DELETED\"='N')\r\n  13 - access(\"T1\".\"ID3\"=\"T2\".\"ID2\")\r\n  14 - filter((\"T2\".\"ID2\"=:B1 AND \"T2\".\"IS_DELETED\"='N'))\r\n  15 - access(\"T2\".\"ID\"=:P_ID)<\/pre>\n<p>A key feature that you can spot very easily is that this plan accesses a total of only 16 buffers, handling only a few rows, rather than accessing 200,000 buffers and handling 10,000 rows. It\u2019s clearly a significant reduction in the workload, but how does this plan actually work?<\/p>\n<p>Effectively the optimizer has turned the <code>OR<\/code>\u2019ed pair of subqueries into a <code>UNION ALL<\/code>, then unnested the union all, then pushed the join to <code>test_t1<\/code> inside the <code>UNION ALL<\/code> (in a step rather like the reverse of <strong><em>join factorization<\/em><\/strong> \u2013 which will be reviewed in a future installment). To eliminate rows that have already appeared in the first branch of the <code>UNION ALL<\/code> the optimizer has then added a filter subquery to the second branch.<\/p>\n<p>You might note that if I had not had a unique (primary key) index on <code>test_t2<\/code> there would have been <code>SORT<\/code> <code>UNIQUE<\/code> operations applied to the rows selected from <code>test_t2<\/code> in both branches before the joins into <code>test_t1<\/code>.<\/p>\n<p>In effect the query has been transformed into:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk \">select  {columns}\r\nfrom    t2, t1\r\nwhere   <strong><em>t1.id2<\/em><\/strong> = t2.id2 and etc...\r\nunion all\r\nselect  {columns}\r\nfrom    t2, t1\r\nwhere   <strong><em>t1.id3<\/em><\/strong> = t2.id2 and etc...\r\nand     not exists (\r\n                select {columns}\r\n                from    t2, t1\r\n                where  <strong><em>t1.id2<\/em><\/strong> = t2.id2\r\n        )<\/pre>\n<p>I won\u2019t extract and reformat the \u2018unparsed\u2019 query from the CBO trace file since it is a little long and messy and adds little value to the SQL sketch above. Notice, however, from the Predicate Information provided by the call to <strong><em>explain plan<\/em><\/strong> that where I\u2019ve used <code>not exists(subquery)<\/code> in the simplified framework, the optimizer actually uses <code>lnnvl( exists( subquery ) )<\/code> at operation 8 of the plan. Oracle is just applying its generic <em>\u2018does not evaluate to true\u2019<\/em> function to allow for possible nulls rather than using a simple \u2018<em>not\u2019<\/em>.<\/p>\n<p>Although the resulting plan looks a little complex it is just a combination of a small number of simple transformations done in the right order, so it\u2019s a little surprising that the optimizer doesn\u2019t find it automatically as it searches its space of transformations. This omission has now been noted as (unpublished) bug 33957043: \u201c<em>Subquery Coalesce prevents Or Expansion\/transformation\u201d<\/em>.<\/p>\n<p>As a reminder about how difficult it is to use hints to control the fine detail of the optimizer, when I checked the Outline Information for this execution plan, it held 35 hints that would be needed in an SQL Plan Baseline to reproduce the plan. Though the list included an <code>or_expand()<\/code> hint, the <code>no_coalesce_sq()<\/code> hint didn\u2019t make an appearance.<\/p>\n<h2>Summary<\/h2>\n<p>Converting an <strong><em>aggregate subquery<\/em><\/strong> strategy to an <strong><em>analytic function<\/em><\/strong> strategy is a rewrite that many people now do by hand when they spot the option and the numbers look right. The optimizer can do this as an internal transformation, though it doesn\u2019t seem to take the option as often as it could. Unfortunately, there\u2019s no explicit hint to force the transformation (though an <code>unnest()<\/code> hint may have a lucky side effect), and the only possible hint to block it is <code>the opt_param()<\/code> hint which may be needed in cases where the transformation introduces a large sorting overhead. In some cases, you may need to help the optimizer to pick the path by simplifying the SQL through the use of virtual columns, but that requirement may become redundant in a future release.<\/p>\n<p>If you have multiple subqueries which look sufficiently similar and are used in a similar fashion, the optimizer can do the equivalent of <em>\u2018find the least common multiple\u2019<\/em> so that it has to run a smaller number of subqueries, potentially using each call for multiple tests. This transformation has a related pair of hints, so it can be blocked or (if legal) forced. The demonstration of the transformation in this article highlighted a case where the optimizer missed an opportunity for using <strong><em>or expansion<\/em><\/strong> after doing <strong><em>subquery coalescing,<\/em><\/strong> so it\u2019s worth knowing that the mechanism can be blocked.<\/p>\n<p>A future installment will examine Or Expansion and Join Factorisation.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Oracle optimizer often changes the query to get better performance. In this article, Jonathan Lewis explains two more optimizations involving subqueries.&hellip;<\/p>\n","protected":false},"author":101205,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[53,143533],"tags":[95506],"coauthors":[39048],"class_list":["post-94088","post","type-post","status-publish","format-standard","hentry","category-featured","category-oracle-databases","tag-automate"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/94088","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/101205"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=94088"}],"version-history":[{"count":12,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/94088\/revisions"}],"predecessor-version":[{"id":94100,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/94088\/revisions\/94100"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=94088"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=94088"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=94088"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=94088"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}