{"id":73177,"date":"2015-01-13T14:47:06","date_gmt":"2015-01-13T14:47:06","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/uncategorized\/execution-plans-part-12-cardinality-feedback\/"},"modified":"2021-07-14T13:07:27","modified_gmt":"2021-07-14T13:07:27","slug":"execution-plans-part-12-cardinality-feedback","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/oracle-databases\/execution-plans-part-12-cardinality-feedback\/","title":{"rendered":"Execution Plans Part 12: Cardinality Feedback"},"content":{"rendered":"<p><strong><a href=\"https:\/\/allthingsoracle.com\/execution-plans-part-11-actuals\/\">In the previous instalment <\/a><\/strong>of this series I introduced three ways of accessing the run-time statistics for a query and described, for one of the methods, the basics of the information we can get and how we can use it. In this article I want to expand on the use of one method to show it can help use identify and resolve performance problems. The method is known (after Wolfgang Breitling&#8217;s exposition of it some 10 or more years ago) as <strong><a href=\"https:\/\/blogs.oracle.com\/optimizer\/entry\/cardinality_feedback\" target=\"_blank\">Cardinality Feedback<\/a><\/strong>.<\/p>\n<h2>Skewed Data<\/h2>\n<p>I have the following query \u2013 which I may not have analyzed properly before putting it into production:<\/p>\n<pre>select\r\n\tt1.id,\r\n\tt2.small_vc\r\nfrom\r\n\tt1,\r\n\tt2\r\nwhere\r\n\tt1.date_ord &gt;= trunc(sysdate) - 14\r\nand\tt1.supp_id = 1\r\nand\tt2.id = t1.id\r\norder by\r\n\tt1.id\r\n;<\/pre>\n<p>When I enable rowsource execution statistics by setting <em><strong>statistics_level<\/strong><\/em> to <em><strong>all<\/strong><\/em> (which increased the run time from 0.02 seconds to 0.25 seconds) I got the following information from the in-memory execution plan:<\/p>\n<pre>select * from table(dbms_xplan.display_cursor(null,null,'iostats last +cost'));\r\n\r\n-----------------------------------------------------------------------------------------------------------------\r\n| Id  | Operation                      | Name    | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |\r\n-----------------------------------------------------------------------------------------------------------------\r\n|   0 | SELECT STATEMENT               |         |      1 |        |   206 (100)|   2800 |00:00:00.23 |     799 |\r\n|   1 |  SORT ORDER BY                 |         |      1 |      1 |   206   (2)|   2800 |00:00:00.23 |     799 |\r\n|   2 |   NESTED LOOPS                 |         |      1 |      1 |   205   (1)|   2800 |00:00:00.21 |     799 |\r\n|   3 |    NESTED LOOPS                |         |      1 |      1 |   205   (1)|   2800 |00:00:00.15 |     560 |\r\n|*  4 |     TABLE ACCESS BY INDEX ROWID| T1      |      1 |      1 |   202   (1)|   2800 |00:00:00.09 |     200 |\r\n|*  5 |      INDEX RANGE SCAN          | T1_DATE |      1 |  14013 |    41   (3)|  14000 |00:00:00.04 |      40 |\r\n|*  6 |     INDEX RANGE SCAN           | T2_I1   |   2800 |      1 |     2   (0)|   2800 |00:00:00.03 |     360 |\r\n|   7 |    TABLE ACCESS BY INDEX ROWID | T2      |   2800 |      1 |     3   (0)|   2800 |00:00:00.02 |     239 |\r\n-----------------------------------------------------------------------------------------------------------------\r\nPredicate Information (identified by operation id):\r\n---------------------------------------------------\r\n   4 - filter(\"T1\".\"SUPP_ID\"=1)\r\n   5 - access(\"T1\".\"DATE_ORD\"&gt;=TRUNC(SYSDATE@!)-14)\r\n   6 - access(\"T2\".\"ID\"=\"T1\".\"ID\")<\/pre>\n<p>Note that I\u2019ve included <em>\u201c+cost\u201d<\/em> as an extra format parameter in the call to <em><strong>display_cursor()<\/strong><\/em>; conveniently this shows me where the optimizer thinks the resource costs are going to be \u2013 that little bit of information is very useful as an adjunct to the <em><strong>E-Rows<\/strong><\/em> information. I\u2019ve also used the <em>\u201ciostats\u201d<\/em> option rather than the <em>\u201callstats\u201d<\/em> option \u2013 I wouldn\u2019t normally do this but it\u2019s a way to eliminate the memory statistics which made the report much wider to show just a little extra information that the <em>\u201csort order by\u201d<\/em> at operation 1 needed (and got) about 100KB of memory to complete as an optimal workarea operation.<\/p>\n<p>Before we even begin to do a comparison of estimates and actuals, we can note something that stands out as an efficiency threat in this example: the <em><strong>E-Rows<\/strong><\/em> in lines 5 and 4 tell us that Oracle expects to acquire 14,000 rowids from index <em><strong>t1_date<\/strong><\/em>\u00a0and discard all but one of the rows after using those rowids to visit table <em><strong>t1<\/strong><\/em> \u2013 and most of the predicted cost of the query (viz: 202 \u2013 41) comes from the work done visiting that table. The optimizer seems to think that it has to pick a very poor quality index as the best way of addressing this query. Looking at the <em><strong>access()<\/strong><\/em> predicate on line 5 and the <em><strong>filter()<\/strong><\/em> predicate on line 4, we might decide that (a) we need to extend the index that seems to exist on just\u00a0<em><strong>(date_ord)<\/strong><\/em> so that it becomes <em><strong>(date_ord, supp_id)<\/strong><\/em>, or (b) if there\u2019s an index already on <em><strong>(supp_id)<\/strong><\/em> we need to extend it to <em><strong>(supp_id, date_ord)<\/strong><\/em>, or maybe (c) create an index on <em><strong>(supp_id)<\/strong><\/em> \u2013 with or without <em><strong>date_ord<\/strong><\/em> \u2013 because it\u2019s missing and ought to exist.<\/p>\n<p>Having noted that we may not have the right indexes anyway, we can still go through the process of comparing actuals and estimates as this may help us towards an informed decision on the infrastructure anyway. (Sometimes you may decide that there\u2019s no point in chasing the execution statistics further until the indexing is corrected, of course.)<\/p>\n<p>The order of operation (i.e. the order in which rowsources are produced) for the plan is: 5, 4, 6, 3, 7, 2, 1 \u2013 so we\u2019ll consider the plan lines in that order, remembering that the <em><strong>E-Rows<\/strong><\/em> figure must be multiplied by the <em><strong>Starts<\/strong><\/em> figure before you compare it to the\u00a0<em><strong>A-Rows<\/strong><\/em> figure:<\/p>\n<p style=\"padding-left: 30px\">5 \u2013 index range scan \u2013 estimated rows 14,013, starts 1; actual rows 14,000. Good prediction.<br \/>\n4 \u2013 table access by rowid \u2013 estimated rows 1, starts 1; actual rows 2,800. Bad prediction.<\/p>\n<p style=\"padding-left: 30px\">There are two reasons why we have to worry about line 4: first, why is the prediction so bad? secondly will the prediction for this line result in a bad choice of execution plan (e.g. wrong type of joins from this point onwards or is the whole join order wrong because of this extreme low volume\/high cost prediction)?<\/p>\n<p style=\"padding-left: 30px\">6 \u2013 index range scan \u2013 estimated rows 1, starts 2,800; actual rows 2,800. Good prediction!<\/p>\n<p>How did we get back to a good prediction after the terrible prediction for operation 4? The answer to that question is another question \u2013 <em>\u201cwhy not?\u201d<\/em> Although we have to remember to multiply <em><strong>E-Rows<\/strong><\/em> by <em><strong>Starts<\/strong><\/em> before comparing it to <em><strong>A-Rows<\/strong><\/em>, it\u2019s not necessarily the case that a good match is a good thing. The line-by-line comparison between actuals and estimates is a <strong>local<\/strong> comparison, and a good <strong>local<\/strong> prediction isn\u2019t automatically indicative of a good\u00a0<strong>global<\/strong> strategy.<\/p>\n<p>In this case if we access table <em><strong>t2<\/strong><\/em> by the index on <em><strong>id<\/strong><\/em> we really will get one rowid and one one row per access with very few buffer visits \u2013 and that\u2019s what the prediction says, so it really is a good prediction. But the fact that we will have to execute (start) that operation 2,800 times is (probably) a bad thing.<\/p>\n<p>We don\u2019t have a column called <em><strong>\u201cE-Starts\u201d<\/strong><\/em> in the output, but if we did it would hold the value 1 (the value of <em><strong>E-Rows<\/strong><\/em> from the driving table of the nested loop join) and that would help us recognize where the important error in the prediction took place. When we look at the <em><strong>A-Rows<\/strong><\/em> column of a plan we need to know whether the volume of data comes from the number of rows per execution, or the number of executions.<\/p>\n<p style=\"padding-left: 30px\">3 \u2013 nested loop \u2013 estimated rows 1, starts 1; actual rows 2,800. Bad prediction.<br \/>\n7 \u2013 table access by rowid \u2013 estimated rows 1, starts 2,800; actual rows 2,800. Good (local) prediction again.<br \/>\n2 \u2013 nested loop \u2013 estimated rows 1, starts 1; actual rows 2,800. Bad prediction.<br \/>\n1 \u2013 Sort order by \u2013 estimated rows 1, starts 1; actual rows 2,800. Bad prediction.<\/p>\n<p>If we summarize the state of this query plan, then: the <em><strong>E-Rows<\/strong><\/em> for lines 5 and 4 suggest that the optimizer knew that it didn\u2019t have a high-precision option for\u00a0accessing table <em><strong>t1<\/strong><\/em>. However when we consider the <em><strong>E-Rows<\/strong><\/em> and\u00a0<em><strong>A-Rows<\/strong><\/em> for line 4 we see that the optimizer had a <strong>bad<\/strong> idea of what the data in table <em><strong>t1<\/strong><\/em> looked like, and the entire plan from that point onwards was (probably) inappropriate because of that initial estimate.\u00a0Ignoring the initial error, every other step (join order and access method) that the optimizer dictated from that point on behaved as the optimizer expected \u2013 each of the later nested loop joins used a high precision index to identify and return a small volume of data per execution \u2013 but by that point we were already locked into something that was potentially a bad plan.\u00a0This leads us to two general points about using execution plans with rowsource execution statistics as a tool for highlighting root causes of performance problems.<\/p>\n<ul>\n<li><span style=\"font-size: 13px\">At any point in a plan, significant differences between actuals and estimates may point to flaws in our infrastructure, such as poor choice of index definitions or problems with defining or collecting statistics.<\/span><\/li>\n<li><span style=\"font-size: 13px\">The earliest point in an execution plan where the estimated row count differs significantly from the actual row count is probably the point at which the optimizer\u2019s choice of execution plan has gone wrong, and we should investigate the cause of that error first.<\/span><\/li>\n<\/ul>\n<p>The problem in this particular case was that there was a significant skew in the data. There were several hundred supplier IDs in table <em><strong>t1<\/strong><\/em>, but a large fraction of the rows were supplied by a very small subset of the suppliers. Since I had not created a histogram on column <em><strong>supp_id<\/strong><\/em> the optimizer had effectively estimated the number of rows in table <em><strong>t1<\/strong><\/em> as <em><strong>(num_rows \/ supp_id.num_distinct)<\/strong><\/em>, but supplier 1 was one of my high-volume suppliers, with 2,800 items supplied in the last two weeks.<\/p>\n<p>There is no trivial, generic, solution to this class of problem; the best strategy is a compromise that depends on too many variables, so I won\u2019t go into details of the ways you could address it. As a side-note to the main topic, though, I will just mention that since I wasn\u2019t using bind variables in this query, and since there were two predicates on table <em><strong>t1<\/strong><\/em>, and since I was using Oracle 11.2.0.4, the optimizer marked the initial cursor as <em>is_sharable = \u2018N\u2019<\/em>, <em>use_feedback_stats = \u2019Y\u2019<\/em> and re-optimized the statement the second time I ran it, with an internal set of <em><strong>opt_estimate()<\/strong><\/em> hints reflecting the actual run-time statistics produced by the first execution. This produced a different execution plan which included a hash join into <em><strong>t2<\/strong><\/em> (still using the same date index into <em><strong>t1<\/strong><\/em>).<\/p>\n<h2>Scalar Subquery Caching<\/h2>\n<p>Another class of problem we can run into where rowsource execution statistics can be helpful is in the area of <em><strong>scalar subqueries<\/strong><\/em>, whether in the select list or in the <em>where<\/em> clause. In <a href=\"https:\/\/allthingsoracle.com\/execution-plans-part-10-guesswork\/\"><em><strong>part 10 (Guesswork)<\/strong><\/em><\/a> of this series I introduce a variant of the following query and its execution plan:<\/p>\n<pre>select\r\n        count(*)\r\nfrom    (\r\n        select  \/*+ no_merge *\/\r\n                outer.*\r\n        from\r\n                emp outer\r\n        where\r\n                outer.sal &gt; (\r\n                        select  \/*+ no_unnest *\/\r\n                                avg(inner.sal)\r\n                        from\r\n                                emp inner\r\n                        where\r\n                                inner.dept_no = outer.dept_no\r\n                )\r\n        )\r\n;\r\n\r\n--------------------------------------------------------------\r\n| Id  | Operation             | Name | Rows  | Bytes | Cost  |\r\n--------------------------------------------------------------\r\n|   0 | SELECT STATEMENT      |      |     1 |       |   245 |\r\n|   1 |  SORT AGGREGATE       |      |     1 |       |       |\r\n|   2 |   VIEW                |      |   167 |       |   245 |\r\n|*  3 |    FILTER             |      |       |       |       |\r\n|   4 |     TABLE ACCESS FULL | EMP  | 20000 |   156K|    35 |\r\n|   5 |     SORT AGGREGATE    |      |     1 |     8 |       |\r\n|*  6 |      TABLE ACCESS FULL| EMP  |  3333 | 26664 |    35 |\r\n--------------------------------------------------------------\r\n\r\nPredicate Information (identified by operation id):\r\n---------------------------------------------------\r\n   3 - filter(\"OUTER\".\"SAL\"&gt; (SELECT \/*+ NO_UNNEST *\/\r\n              AVG(\"INNER\".\"SAL\") FROM \"EMP\" \"INNER\" WHERE \"INNER\".\"DEPT_NO\"=:B1))\r\n   6 - filter(\"INNER\".\"DEPT_NO\"=:B1)<\/pre>\n<p>The <em><strong>emp<\/strong> <\/em>table lists 20,000 employees spread over 6 departments. The variant of the query I used in <a href=\"https:\/\/allthingsoracle.com\/execution-plans-part-10-guesswork\/\"><em><strong>part 10<\/strong><\/em><\/a> didn\u2019t reduce the result set to an inline <em><strong>no_merge()<\/strong><\/em> view with a \u00a0simple <em><strong>count(*)<\/strong><\/em> as this code does, but the key features of the query are unchanged. The question, of course, was how many times does the correlated (scalar) subquery run? The answer, in principle, was <em>&#8220;any number you like between 6 (number of departments) and 20,000 (number of employees)&#8221;<\/em>. If you check the <em><strong>Cost<\/strong><\/em> figures you will see that Oracle has based its arithmetic on a model that expects to execute the subquery 6 times: \u00a0245 = 35 (tablescan at operation 4) + 6 * 35 (tablescan at operation 6).<\/p>\n<p>So when you add a couple of new departments to the company, fire a few people, hire a few people, and move a few employees into the new departments what happens? For a very small change in data volume the performance of the query could change dramatically. Initially this query completed in 0.35 seconds; after changing the <em><strong>dept_no<\/strong><\/em> (to carefully selected values) for a couple of (carefully selected) employees the run time jumped to a little over 3 minutes \u2013 all CPU. Here\u2019s the execution plan, with rowsource execution statistics after the change:<\/p>\n<pre>----------------------------------------------------------------------------------------\r\n| Id  | Operation             | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |\r\n----------------------------------------------------------------------------------------\r\n|   0 | SELECT STATEMENT      |      |      1 |        |      1 |00:03:09.75 |    1467K|\r\n|   1 |  SORT AGGREGATE       |      |      1 |      1 |      1 |00:03:09.75 |    1467K|\r\n|   2 |   VIEW                |      |      1 |    167 |   9997 |00:03:09.71 |    1467K|\r\n|*  3 |    FILTER             |      |      1 |        |   9997 |00:03:09.63 |    1467K|\r\n|   4 |     TABLE ACCESS FULL | EMP  |      1 |  20000 |  20000 |00:00:00.08 |     220 |\r\n|   5 |     SORT AGGREGATE    |      |   6671 |      1 |   6671 |00:03:09.31 |    1467K|\r\n|*  6 |      TABLE ACCESS FULL| EMP  |   6671 |   3333 |     22M|00:01:38.37 |    1467K|\r\n----------------------------------------------------------------------------------------\r\n\r\nPredicate Information (identified by operation id):\r\n---------------------------------------------------\r\n   3 - filter(\"OUTER\".\"SAL\"&gt;)\r\n   6 - filter(\"INNER\".\"DEPT_NO\"=:B1)<\/pre>\n<p>As with the nested loop join in the first example we really could do with two <em>\u201cStarts\u201d<\/em> columns, one for <em><strong>A-Starts<\/strong><\/em> (actuals) and one for <em><strong>E-Starts<\/strong><\/em> (estimated), but all we have is the actuals. Despite the fact that the optimizer modelled for 6 starts of the subquery, it actually ran it 6,671 times because the run-time engine was unable to keep all the required subquery results cached.<\/p>\n<p>In this case, the query was artificially constructed to demonstrate a particular class of the scalar subquery problem \u2013 the correct solution would be to stop blocking the subquery unnesting \u2013 but in real-life this type of behaviour does occur and it\u2019s nice to know that we can determine exactly where the problem appears as it tells us something about how we need to rewrite (or get Oracle to rewrite) the query.<\/p>\n<p>The same type of issue (perhaps more prevalent nowadays) appears with scalar subqueries in the select list. If you have a query with multiple scalar subqueries in the select list how do you find out where the time is going when the query takes too long to complete? It&#8217;s likely that you can get a lot of help from rowsource execution statistics. Based on the same table, here&#8217;s a query, with its execution plan and run-time stats demonstrating the point:<\/p>\n<pre>select\r\n        dept_no,\r\n        emp_no,\r\n        (select sum(sal) from emp e2 where e2.dept_no = e1.dept_no) sal_tot\r\nfrom\r\n        emp e1\r\norder by\r\n        dept_no, emp_no\r\n;\r\n\r\n----------------------------------------------------------------------------------------------------------------\r\n| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |\r\n----------------------------------------------------------------------------------------------------------------\r\n|   0 | SELECT STATEMENT   |      |      1 |        |  20000 |00:00:04.71 |     731K|       |       |          |\r\n|   1 |  SORT AGGREGATE    |      |   3338 |      1 |   3338 |00:00:04.59 |     731K|       |       |          |\r\n|*  2 |   TABLE ACCESS FULL| EMP  |   3338 |   3333 |     11M|00:01:06.77 |     731K|       |       |          |\r\n|   3 |  SORT ORDER BY     |      |      1 |  20000 |  20000 |00:00:04.71 |     731K|   832K|   511K|  739K (0)|\r\n|   4 |   TABLE ACCESS FULL| EMP  |      1 |  20000 |  20000 |00:00:00.08 |     219 |       |       |          |\r\n----------------------------------------------------------------------------------------------------------------<\/pre>\n<p>With one unlucky combination of department and employee the inline scalar subquery ran 3,338 times instead of once per department &#8211; which is a clue that this query should probably have used a non-mergable aggregate view (or materialized factored subquery) with a hash join.<\/p>\n<h2>Summary<\/h2>\n<p>We have seen that rowsource execution statistics can tell you how many times an operation is run, how much work it actually does in total, and whether the optimizer\u2019s estimate of how much data it will return is realistic or not \u2013 remembering particularly that we should be comparing <em><strong>A-Rows<\/strong><\/em> with <em><strong>E-rows<\/strong><\/em> * <em><strong>Starts<\/strong><\/em>. We have also noted that this final bit of information is <em><strong>\u201clocal\u201d<\/strong><\/em> and that an accurate local estimate isn\u2019t necessarily indicative of a good \u201cglobal\u201d strategy.<\/p>\n<p>There are a couple of places where we can apply global thinking to the rowsource execution stats, though. The first is in looking for the earliest place in an execution plan where the actual number of rows in a driving rowsource (typically the first rowsource of a join) differs from the estimated number by a significant factor \u2013 this is likely to lead to subsequent poor choices of join method, or even be the source of a bad initial join order. The second is finding places where a large fraction of the work of the query comes from scalar subqueries being executed far more frequently than we might hope or expect \u2013 and without some form of the rowsource execution stats it isn\u2019t generally possible to know how many times each scalar subquery will have to be executed.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the previous instalment of this series I introduced three ways of accessing the run-time statistics for a query and described, for one of the methods, the basics of the information we can get and how we can use it. In this article I want to expand on the use of one method to show it can help use identify&hellip;<\/p>\n","protected":false},"author":101205,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143533],"tags":[48351,48371,48405,4783,48494,48502],"coauthors":[],"class_list":["post-73177","post","type-post","status-publish","format-standard","hentry","category-oracle-databases","tag-actuals","tag-cardinality","tag-estimated","tag-execution-plans","tag-scalar-subqueries","tag-skewed-data"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/73177","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/101205"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=73177"}],"version-history":[{"count":1,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/73177\/revisions"}],"predecessor-version":[{"id":91667,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/73177\/revisions\/91667"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=73177"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=73177"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=73177"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=73177"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}