{"id":73137,"date":"2016-01-19T15:59:22","date_gmt":"2016-01-19T15:59:22","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/uncategorized\/12c-histogram-top-frequency\/"},"modified":"2021-07-14T13:07:18","modified_gmt":"2021-07-14T13:07:18","slug":"12c-histogram-top-frequency","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/oracle-databases\/12c-histogram-top-frequency\/","title":{"rendered":"12c Histogram: TOP-Frequency"},"content":{"rendered":"<p>As of Oracle version 12.1, if a number of buckets <strong>N<\/strong> in a histogram is less than the number of distinct values in the data set, Oracle will <strong>not<\/strong> generate the instable and imprecise legacy Height Balanced histogram. Instead, two new types of histogram are possible: TOP-Frequency and Hybrid. In order to select between the later and the former type of histogram, Oracle first proceeds to a determination step. It examines how representative the TOP-N distinct values are relative to all values in the data set. If the TOP-N values accounts for more than a certain threshold, a TOP-Frequency is used. When the TOP-N values fail to satisfy the threshold condition, a Hybrid histogram is calculated. This first article investigates the TOP-Frequency histogram. It shows how Oracle determines the TOP-Frequency threshold, examines how the low and high values of the data set influence this threshold value and outlines the arithmetic used by Oracle to work out the cardinality estimates of both popular and non-popular TOP-Frequency histogram.<\/p>\n<h2>TOP-Frequency: pre-requisites<\/h2>\n<p>Here&#8217;s below the simple model we are going to use in order to illustrate the TOP-Frequency histogram:<\/p>\n<pre>create table T_TopFreq as\r\nselect\r\n    rownum n1\r\n    , case when mod(rownum, 100000)   = 0 then   90\r\n           when mod(rownum, 10000)    = 0 then   180\r\n          when mod(rownum, 1000)= 0 then   84\r\n              when mod(rownum, 100)      = 0 then   125\r\n              when mod(rownum,50)        = 2 then   7\r\n              when mod(rownum-1,80)      = 2 then   22\r\n              when mod(rownum, 10)       = 0 then   19\r\n              when mod(rownum-1,10)      = 5  then   15\r\n              when mod(rownum-1,5)       = 1  then   11\r\n              when trunc((rownum -1\/3)) &lt; 5  then   25\r\n              when trunc((rownum -1\/5)) &lt; 20  then   33\r\n      else 42\r\n        end n2   \r\nfrom dual\r\nconnect by level &lt;= 2e2;<\/pre>\n<p>The above data set contains 200 rows, of which the n2 column has 9 distinct values as shown below:<\/p>\n<pre>SQL&gt; compute sum label 'Total rows' of cnt on report\r\nSQL&gt; break   on report\r\nSQL&gt; select *\r\n    from\r\n    (select n2, count(1) cnt\r\n     from t_topfreq\r\n     group by n2\r\n     order by n2);\r\n\r\n        N2        CNT\r\n---------- ----------\r\n         7          4\r\n        11         36\r\n        15         20\r\n        19         18\r\n        22          3\r\n        25          3\r\n        33          8\r\n        42        106\r\n       125          2\r\n           ----------\r\nTotal rows        200\r\n\r\n9 rows selected.\r\n<\/pre>\n<p>In order to qualify for a TOP-Frequency histogram, the following three conditions must be satisfied:<\/p>\n<ul>\n<li>The number of buckets used to gather the histogram must be less than the number of distinct values in the data set<\/li>\n<li>The sampling percentage is the default one : <code><strong>AUTO_SAMPLE_SIZE<\/strong><\/code><\/li>\n<li>The top most frequent distinct values must exceed a certain<strong> threshold<\/strong> (explained later in this article)<\/li>\n<\/ul>\n<p>The two first conditions are quite obvious, so let&#8217;s dig into some details for the third prerequisite.<\/p>\n<h2>Top Frequency: Inflexion point<\/h2>\n<p>In the following example, we are going to ask Oracle to gather statistics for the dataset above using fewer buckets (8) than the available number of distinct values (9). The <code><strong>AUTO_SAMPLE_SIZE<\/strong><\/code> default sampling will be used to ensure we don&#8217;t generate a Height Balance histogram. In order to check what Oracle is doing behind the scenes, we are going to trace the call to the dbms_stats package (under <span style=\"color: red;\">12.1.0.2<\/span>) as shown in the following:<\/p>\n<pre>SQL&gt; exec dbms_stats.set_global_prefs ('TRACE', to_char (1+16)); \r\nSQL&gt; BEGIN\r\n     dbms_stats.gather_table_stats (user\r\n                                  ,'T_TOPFREQ'\r\n                                  , method_opt=&gt; 'for columns n2 size 8');\r\n   END;\r\n\/\r\nSQL&gt; exec dbms_stats.set_global_prefs('TRACE', null);<\/pre>\n<p>The &#8216;TRACE&#8217; parameter in the first call above will trace any call to dbms_stats package by sending the corresponding trace file directly to the dbms_outpout current session. Here&#8217;s a snippet of this trace file:<\/p>\n<pre>DBMS_STATS:  NNV  NDV  AVG  MMX  HST  EP   RP   NNNP IND  CNDV HSTN HSTR  COLNAME\r\nDBMS_STATS:   Y         Y                                                 N1\r\nDBMS_STATS:   Y    Y    Y    Y    Y    Y    Y                        Y    N2\r\nDBMS_STATS: Approximate NDV Options\r\nDBMS_STATS: ACL,TOPN,NIL,NIL,RWID,U,U<span style=\"color: red;\">8U<\/span>\r\nDBMS_STATS: <span style=\"color: red;\">start processing top n values for column \"N2\"<\/span>\r\nDBMS_STATS: topn sql (len: 589):\r\nDBMS_STATS: +++ select \/*+  no_parallel(t) no_parallel_index(t) \r\n                           dbms_stats cursor_sharing_exact use_weak_name_resl\r\n                           dynamic_sampling(0) no_monitoring \r\n\t\t\t\t xmlindex_sel_idx_tbl no_substrb_pad  \r\n                        *\/ \r\n\t          substrb(dump(\"N2\",16,0,64),1,240) val\r\n                ,rowidtochar(rowid) rwid \r\n\t       from \"XXXX\".\"T_TOPFREQ\" t \r\n\t\twhere rowid in  \r\n               (chartorowid('AAAJmWAAEAACN1zAAA'),chartorowid('AAAJmWAAEAACN1zAAB')\r\n\t         ,chartorowid('AAAJmWAAEAACN1zAAC'),chartorowid('AAAJmWAAEAACN1zAAF'\t\t  ,chartorowid('AAAJmWAAEAACN1zAAG'),chartorowid('AAAJmWAAEAACN1zAAH')\r\n\t        ,chartorowid('AAAJmWAAEAACN1zAAJ'),chartorowid('AAAJmWAAEAACN1zAAU')) \r\n\t   order by \"N2\"\r\nDBMS_STATS: <span style=\"color: red; background-color: yellow;\">remove last bucket<\/span>: Typ=2 Len=2: c1,2b add: Typ=2 Len=3: c2,2,1a\r\nDBMS_STATS: <span style=\"color: red; background-color: yellow;\">removal_count: 1<\/span> total_nonnull_rows: 200 mnb:  8\r\nDBMS_STATS:  adjusted coverage: .98\r\nDBMS_STATS: hist_type in exec_get_topn: 2048 ndv:9 mnb:8\r\nDBMS_STATS: Evaluating frequency histogram for col: \"N2\"\r\nDBMS_STATS:  number of values = 8, max # of buckects = 8, pct = 100, ssize = 200\r\nDBMS_STATS:   csr.hreq: 1 Histogram gathering flags: 2055\r\nDBMS_STATS: done_hist in process_topn: TRUE csr.ccnt: 1\r\nDBMS_STATS: <span style=\"color: red;\">Mark column \"N2\" as top N computed<\/span>\r\nDBMS_STATS:          Need Actual Values (DSC_EAVS)\r\nDBMS_STATS:          Histogram Type: <code>TOP-FREQUENCY<\/code> Data Type: 2\r\nDBMS_STATS:          Histogram Flags: 8196 Histogram Gathering Flags: 2310\r\nDBMS_STATS:          Incremental: FALSE Fix Control 13583722: 1<\/pre>\n<p>The above trace file is clearly showing (<span style=\"color: red;\"><code>start processing top n values for column \"N2\"<\/code><\/span>) that, before attempting to collect a Hybrid histogram, Oracle did internal arithmetic in order to process the TOP-8 distinct values for column N2 to see how representative they are when compared to the rest of the remaining values. That is, Oracle has tried to work out the proportion of unpopular values that are statistically insignificant when compared to very few distinct popular ones. If popular values dominate the dataset then Oracle will generate a TOP-Frequency histogram for the TOP-8 values and ignore the remaining insignificant values. The above trace file shows that Oracle has indeed succeeded to pass the threshold prerequisite (<span style=\"color: red;\"><code>Mark column \"N2\" as top N computed<\/code><\/span>) and produced a TOP-Frequency histogram as shown below:<\/p>\n<pre>SQL&gt; select\r\n        column_name\r\n       ,num_distinct\r\n       ,num_buckets\r\n       ,sample_size\r\n       ,histogram\r\n     from\r\n        user_tab_col_statistics\r\n     where table_name = 'T_TOPFREQ'\r\n     and column_name  = 'N2';\r\n\r\nCOLUMN_NAME  NUM_DISTINCT NUM_BUCKETS SAMPLE_SIZE HISTOGRAM\r\n------------ ------------ ----------- ----------- ---------------\r\nN2                      9           8        200 <span style=\"color: red;\">TOP-FREQUENCY<\/span><\/pre>\n<p>In fact, I have been playing with the above call to dbms_stats package changing the number of buckets several times until I succeeded in producing a <code>TOP-FREQUENCY<\/code> histogram with 8 buckets. Nevertheless, as specified in <a href=\"http:\/\/www.aioug.org\/sangam14\/images\/Sangam14\/Presentations\/201446_garg_ppt.ppt.pdf\">Anju Garg&#8217;s<\/a> article, it is possible to derive the number of buckets upon which Oracle will generate a <code>TOP-FREQUENCY<\/code> histogram based on the following inflexion point (or threshold in Oracle words) where N represents the number of specified buckets:<\/p>\n<pre>               InfPoint = <span style=\"color: red;\">round<\/span> ((N-1)\/N * table (num_rows))\r\n              InfPoint = <span style=\"color: red;\">round<\/span> ((8-1)\/8 * 200) = 175<\/pre>\n<p>In other words, to qualify for a <code>TOP-FREQUENCY<\/code> histogram, the <code>TOP-8 N2<\/code> distinct values should occupy more than 175 rows in the data set i.e.<\/p>\n<pre>                     <code>TOP-8 N2<\/code> rows &gt;= 175 rows<\/pre>\n<p>And, as shown in <a href=\"http:\/\/www.aioug.org\/sangam14\/images\/Sangam14\/Presentations\/201446_garg_ppt.ppt.pdf\">Anju Garg<\/a>&#8216;s article, the <code>TOP-8 N2<\/code> can be computed using the following SQL:<\/p>\n<pre>SQL&gt; select\r\n         sum (cnt) TopNRows\r\n      from (select\r\n               n2\r\n              ,count(*) cnt\r\n            from t_topfreq\r\n            group by n2\r\n            order by count(*) desc\r\n            )\r\n     where rownum &lt;= 8;\r\n\r\n  TOPNROWS\r\n----------\r\n       198\r\n<\/pre>\n<p>The <code>TOP-8 N2<\/code> values occupy <code>198<\/code> rows which is more than the inflexion row count threshold of 175 rows. This is why the current configuration qualifies for a <code>TOP-FREQUENCY<\/code>.<\/p>\n<h2>Top Frequency: Inflexion point and low-high values<\/h2>\n<p>Whatever the type of histogram Oracle has to generate, the low and high values must be in the histogram information. When these two extreme values are naturally present in the <code>TOP-N<\/code> distinct values then the above threshold formula remains correct. However, when they are insignificant and negligible they will be forced into the histogram information. This value forcing is possible to the detriment of excluding a victim value from the actual <code>TOP-N<\/code> distinct values. The victim values are those with the less popular distinct value.<\/p>\n<p>Let&#8217;s see whether the low and high values are among the <code>TOP-8 N2<\/code> distinct values:<\/p>\n<pre>SQL&gt; select min(n2), max(n2) from t_topfreq;\r\n\r\n   MIN(N2)    MAX(N2)\r\n---------- ----------\r\n        7        125<\/pre>\n<pre>SQL&gt; SQL&gt; select\r\n            n2, cnt\r\n         from (select\r\n                 n2\r\n                ,count(*) cnt\r\n              from t_topfreq\r\n              group by n2\r\n              order by count(*) desc, n2\r\n               )\r\n        where rownum &lt;= 8;\r\n\r\n\r\n        N2        CNT\r\n---------- ----------\r\n        42        106\r\n        11         36\r\n        15         20\r\n        19         18\r\n        33          8\r\n         7          4 \u2192 low value is present in the TOP-8\r\n        22          3 \u2192 22 is the first least popular value\r\n        25          3\r\n\r\n8 rows selected.<\/pre>\n<p>In the previous example, the highest value (<code>125<\/code>) is not among the <code>TOP-8 N2<\/code> distinct values. However, as explained above, this extreme value has been forced in the histogram information as shown in the following:<\/p>\n<pre>SQL&gt; select\r\n         endpoint_actual_value value,\r\n         endpoint_number,\r\n         endpoint_number - (lag(endpoint_number,1,0)\r\n                           over(order by endpoint_number)) value_count\r\n     from\r\n        user_tab_histograms\r\n     where\r\n        table_name    = 'T_TOPFREQ'\r\n     and column_name  = 'N2'\r\n     order by\r\n     endpoint_number\r\n     ;\r\n\r\nVALUE      ENDPOINT_NUMBER VALUE_COUNT\r\n---------- --------------- -----------\r\n7                        4           4\r\n11                      40          36\r\n15                      60          20\r\n19                      78          18\r\n25                      81           3\r\n33                      89           8\r\n42                     195         106\r\n125                    196           1 \u2192 forced high value with frequency 1\r\n\r\n8 rows selected.<\/pre>\n<p>As you can see the insignificant high value <code>125<\/code> has been forced into the histogram table with a hardcoded frequency of 1. This forcing has been possible at the cost of a value exclusion from the TOP-8 naturally dominant values. And, as mentioned above, the victim value in this case is 22 representing the first value with the least popular frequency (3).<\/p>\n<p>Notice that the above dbms_stats trace file also indicates that Oracle has proceeded to a remove-forcing operation via the following line:<\/p>\n<pre>DBMS_STATS: <span style=\"color: red;\">remove last bucket<\/span>: Typ=2 Len=2: c1,2b add: Typ=2 Len=3: c2,2,1a\r\nDBMS_STATS: removal_count: 1 total_nonnull_rows: 200 mnb: 8<\/pre>\n<p>Although the trace file is not very clear, it nevertheless indicates that something is removed to let a room for the high value 125 (<code>Typ=2 Len=3: c2,2,1a<\/code>):<\/p>\n<pre>SQL&gt; select dump(125) high_value from dual;\r\n\r\nHIGH_VALUE\r\n---------------------\r\nTyp=2 Len=3: 194,2,26<\/pre>\n<p>Whenever Oracle operates such an exclusion-forcing value into the histogram table, the TOP-N count of rows should be adjusted before to be compared with the threshold as shown in the following:<\/p>\n<pre>SQL&gt; select\r\n         sum (cnt) TopNRows\r\n      from (select\r\n               n2\r\n              ,count(*) cnt\r\n            from t_topfreq\r\n            group by n2\r\n            order by count(*) desc\r\n            )\r\n     where rownum &lt;= 8;\r\n\r\n  TOPNROWS\r\n----------\r\n       198\r\n\r\nAdjustedTopNRows = TopNRows - count (least popular TOP-8) + 1 (forced frequency)\r\nAdjustedTopNRows = 198 - 3 + 1 = 196<\/pre>\n<p>It&#8217;s worth noting that, in this particular case, both <code>TopNRows<\/code> and <code>AdjustedTopNRows<\/code> are greater than the threshold of 175 rows. However, whenever Oracle generates a Hybrid histogram where you think that your TOP-N setup qualify for the a <code>TOP-FREQUENCY<\/code>, you must ensure that you are comparing the correct TopNRows with the threshold number of rows.<\/p>\n<h2>2. Top Frequency: cardinality estimation<\/h2>\n<h3>2.1 Popular values<\/h3>\n<p>The select statement exposed below shows the resulting metadata produced for captured popular N2 column values when it qualifies for <code>TOP-FREQUENCY<\/code> histogram due to the appropriate number of buckets (8):<\/p>\n<pre>SQL&gt; select\r\n         endpoint_actual_value value,\r\n         endpoint_number - (lag(endpoint_number,1,0)\r\n                           over(order by endpoint_number)) value_count\r\n     from\r\n        user_tab_histograms\r\n     where\r\n        table_name   = 'T_TOPFREQ'\r\n     and column_name = 'N2'\r\n     order by\r\n        endpoint_number;\r\n\r\nVALUE      VALUE_COUNT\r\n---------- -----------\r\n7                    4\r\n11                  36\r\n15                  20\r\n19                  18\r\n25                   3\r\n33                   8\r\n42                 106 ***\r\n125                  1\r\n\r\n8 rows selected.<\/pre>\n<p>For the above captured popular values, the Optimizer uses the following formula to calculate its cardinality estimates:<\/p>\n<pre>E-Rows = value_count * num_rows\/sample_size<\/pre>\n<p>Applied to N2 = <code><span style=\"color: red;\">42<\/span><\/code>, the above estimation formula gives this:<\/p>\n<pre>E-Rows = 106 * 200\/200 = 106<\/pre>\n<p>This is of course backed by the following execution plan where we can see that Oracle has come up with exactly the same cardinality estimation of <code>106<\/code>:<\/p>\n<pre>SQL&gt; select count(1) from t_topfreq where n2 = <span style=\"color: red;\">42<\/span>;\r\n\r\n  COUNT(1)\r\n----------\r\n         106\r\n\r\nSQL&gt; select * from table(dbms_xplan.display_cursor);\r\n------------------------------------------------\r\n| Id  | Operation          | Name      | Rows  |\r\n------------------------------------------------\r\n|   0 | SELECT STATEMENT   |           |       |\r\n|   1 |  SORT AGGREGATE    |           |     1 |\r\n|*  2 |   TABLE ACCESS FULL| T_TOPFREQ |   <span style=\"color: red;\">106<\/span> |\r\n------------------------------------------------\r\nPredicate Information (identified by operation id):\r\n---------------------------------------------------\r\n   2 - filter(\"N2\"=42)<\/pre>\n<p>However, the exclusion-forcing value operation done by Oracle in order to insert the high value into the histogram table makes both the forced value (125) and the excluded value (22) not following the same cardinality estimation formula. The next section discusses this particular case.<\/p>\n<h3>2.2 Non-Popular values<\/h3>\n<p>Since Oracle concentrated on capturing the most representative 8 N2 values over the available 9 distinct ones, then the following least occurring distinct values have not been captured and have been consequently considered to be non-popular:<\/p>\n<pre>SQL&gt; with popular_val as\r\n        (select\r\n           endpoint_actual_value value\r\n        from\r\n           user_tab_histograms\r\n        where\r\n            table_name  = 'T_TOPFREQ'\r\n        and column_name = 'N2')\r\n       select distinct n2\r\n      from t_topFreq all_val\r\n      where not exists (select null\r\n                        from\r\n                           popular_val pop_val\r\n                        where\r\n                        all_val.n2 = pop_val.value);\r\n\r\n        N2\r\n----------\r\n        22<\/pre>\n<p>Let&#8217;s now see what cardinality estimate Oracle will come up with for this non-popular non-captured value:<\/p>\n<pre>SQL&gt; select count(1) from t_topfreq where n2 = 22;\r\n\r\n  COUNT(1)\r\n----------\r\n         3\r\n\r\nSQL&gt; select * from table(dbms_xplan.display_cursor);\r\n------------------------------------------------\r\n| Id  | Operation          | Name      | Rows  |\r\n------------------------------------------------\r\n|   0 | SELECT STATEMENT   |           |       |\r\n|   1 |  SORT AGGREGATE    |           |     1 |\r\n|*  2 |   TABLE ACCESS FULL| T_TOPFREQ |     4 |\r\n------------------------------------------------\r\nPredicate Information (identified by operation id):\r\n---------------------------------------------------\r\n   2 - filter(\"N2\"=22)<\/pre>\n<p>I have to restrain the enthusiasm of those using the <em>\u2018\u2018half least popular value&#8221;<\/em> rule to get the selectivity of a non-popular frequency histogram before they extend it to the current non-popular top-frequency using the following formula:<\/p>\n<pre>E-Rows = min (value_count of popular values)\/<span style=\"color: red;\">2<\/span>\r\nE-Rows = 1\/2 = 0.5 rounded to <span style=\"color: red;\">1<\/span><\/pre>\n<p>The best way to reverse engineer the path used by Oracle to get its cardinality estimates is to use the corresponding 10053 trace file, of which a snippet is shown below:<\/p>\n<pre>SINGLE TABLE ACCESS PATH \r\n  Single Table Cardinality Estimation for T_TOPFREQ[T_TOPFREQ] \r\n  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE\r\n  Column (#2): \r\n<span style=\"color: red;\">NewDensity: 0.020000<\/span>, OldDensity: 0.002500 <span style=\"color: red;\">BktCnt: 196<\/span> PopBktCnt:195, PopValCnt:7, NDV:9\r\n  Column (#2): N2(NUMBER)\r\n    AvgLen: 4 NDV: 9 Nulls: 0 Density: 0.020000 Min: 7.000000 Max: 125.000000\r\n    Histogram: <span style=\"color: red;\">Top-Freq<\/span>  #Bkts: 196  UncompBkts: 196  EndPtVals: 8  ActualVal: yes\r\n  Table: TOPFREQ3 Alias: TOPFREQ3\r\n    Card: Original: 200.000  Rounded: 4  Computed: 4.000000  Non Adjusted: 4.000000<\/pre>\n<p>This tends to suggest that Oracle is simply using the following formula to compute the selectivity of non-popular (insignificant) TOP-Frequency values:<\/p>\n<pre>E-Rows = num_rows * NewDensity<\/pre>\n<p>The NewDensity, when there is no extreme value forcing, is calculated using the following formula:<\/p>\n<pre>NewDensity = (sample_size-TopNRows)\/(num_distinct-num_buckets)\/sample_size<\/pre>\n<p>And the formula exposed below when Oracle has forced an extreme value into the histogram information:<\/p>\n<pre>NewDensity = (sample_size- AdjustedTopNRows)\/(num_distinct-num_buckets)\/sample_size\r\nNewDensity = (200 \u2013 196) \/(9-8)\/200 = 0.02<\/pre>\n<p>Finally, using this computed <em>NewDensity<\/em> and applying it in the above mentioned cardinality estimation formula for a non-popular TOP-Frequency gives the following:<\/p>\n<pre>E-Rows = num_rows * NewDensity\r\nE-Rows = 200 * .02 = 4<\/pre>\n<p>There is however an odd situation happening here. Oracle is considering the forced high value 125 to be non-popular and is applying the NewDensity to get its selectivity as shown below:<\/p>\n<pre>SQL&gt; select count(1) from t_topfreq where n2 = 125;\r\n\r\n  COUNT(1)\r\n----------\r\n         2\r\nSQL&gt; select * from table(dbms_xplan.display_cursor);\r\n------------------------------------------------\r\n| Id  | Operation          | Name      | Rows  |\r\n------------------------------------------------\r\n|   0 | SELECT STATEMENT   |           |       |\r\n|   1 |  SORT AGGREGATE    |           |     1 |\r\n|*  2 |   TABLE ACCESS FULL| T_TOPFREQ |     4 |\r\n------------------------------------------------\r\n\r\nPredicate Information (identified by operation id):\r\n---------------------------------------------------\r\n   2 - filter(\"N2\"=125)<\/pre>\n<p>Oracle is certainly flagging somewhere in its internal code that the high value present in the histogram has been forced, which allows the Optimizer to recognize it and use the &#8221;appropriate&#8221; cardinality formula.<\/p>\n<h2>Conclusion<\/h2>\n<p>Prior to Oracle 12c, when histograms are collected for a data set using fewer buckets than the number of distinct values in the data set, Oracle will systematically generate an instable and imprecise Height Balanced histogram where at most 127 (254\/2) values can be captured. With 12c under default value of <strong>AUTO_SAMPLE_SIZE<\/strong>, Oracle will not only cease to generate Height Balanced histogram, but will also carry out a preliminary determination step to see how dominant the TOP-N (N is the number of buckets) values are with regards to the total number of rows in the data set. If these TOP-N values are beyond the inflexion point then a <code>TOP-FREQUENCY<\/code> histogram is calculated for the dominant values while the neglected ones are considered to be hybrid non-popular values. This new type of histogram really helps the optimizer to come up with highly-accurate cardinality estimation, a fundamental prerequisite for optimal execution plans. In the next article we will examine the 12c Hybird histogram.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As of Oracle version 12.1, if a number of buckets N in a histogram is less than the number of distinct values in the data set, Oracle will not generate the instable and imprecise legacy Height Balanced histogram. Instead, two new types of histogram are possible: TOP-Frequency and Hybrid. In order to select between the later and the former ty&hellip;<\/p>\n","protected":false},"author":305852,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143533],"tags":[48347,48417,48527],"coauthors":[],"class_list":["post-73137","post","type-post","status-publish","format-standard","hentry","category-oracle-databases","tag-12c","tag-histogram","tag-top-frequency"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/73137","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/305852"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=73137"}],"version-history":[{"count":1,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/73137\/revisions"}],"predecessor-version":[{"id":91633,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/73137\/revisions\/91633"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=73137"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=73137"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=73137"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=73137"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}