{"id":93390,"date":"2022-02-22T19:29:59","date_gmt":"2022-02-22T19:29:59","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=93390"},"modified":"2022-03-18T21:21:59","modified_gmt":"2022-03-18T21:21:59","slug":"the-effects-of-null-with-not-in-on-oracle-transformations","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/oracle-databases\/the-effects-of-null-with-not-in-on-oracle-transformations\/","title":{"rendered":"The effects of NULL with NOT IN on Oracle transformations"},"content":{"rendered":"<p><strong>Jonathan Lewis' continuing series on the Oracle optimizer and how it transforms queries into execution plans:<\/strong><\/p>\n<ol>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/oracle-databases\/transformations-by-the-oracle-optimizer\/\">Transformations by the Oracle Optimizer<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/oracle-databases\/the-effects-of-null-with-not-in-on-oracle-transformations\/\">The effects of NULL with NOT IN on Oracle transformations<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/oracle-databases\/oracle-subquery-caching-and-subquery-pushing\/\">Oracle subquery caching and subquery pushing<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/oracle-databases\/oracle-optimizer-removing-or-coalescing-subqueries\/\">Oracle optimizer removing or coalescing subqueries<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/oracle-databases\/oracle-optimizer-or-expansion-transformations\/\">Oracle optimizer Or Expansion Transformations<\/a><\/li>\n<\/ol>\n\n<p>In my <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/oracle-databases\/transformations-by-the-oracle-optimizer\/\">first article on transformations<\/a>, I described how Oracle liked to minimize the number of query blocks that the optimizer had to handle. I presented the example of an <code>IN<\/code><em> subquery<\/em> in the <code>where<\/code> clause that Oracle would rewrite as the equivalent <code>EXISTS<\/code><em> subquery<\/em> before unnesting the subquery to produce a join. The join might be a <em>semi-join<\/em> or <em>partial join<\/em> but might include an <em>in-line aggregate view<\/em>, which might be further transformed by <em>complex view merging<\/em>. This article considers effects of NULL with NOT IN on Oracle transformations.<\/p>\n<p>There are many more transformations that the optimizer can apply to subqueries, but before moving on to any of the more subtle and complicated transformations, it\u2019s important to look at the effects of <strong><em>nulls<\/em><\/strong> and the impact they have when <code>IN<\/code> is changed to <code>NOT<\/code> <code>IN<\/code>.<\/p>\n<h2>To be or not to be<\/h2>\n<p>Or, to relate Hamlet\u2019s question to subqueries and the optimizer: <code>IN<\/code> or <code>NOT<\/code> <code>IN<\/code>. Before pursuing the topic of subqueries any further, it\u2019s important to keep in mind that there is an essential difference between IN subqueries and <code>NOT<\/code> <code>IN<\/code> subqueries. While <code>NOT<\/code> <code>EXISTS<\/code> is the opposite of <code>EXISTS<\/code>, NOT IN is <strong>not<\/strong> the exact opposite of <code>IN<\/code> and if you forget this you may end up rewriting a query in a way that produces the wrong results.<\/p>\n<p>The problem is best explained with a basic definition followed by a simple example:<\/p>\n<ul>\n<li><code>IN()<\/code> is equivalent to equals <code>ANY()<\/code><\/li>\n<li><code>NOT IN()<\/code> is equivalent to not equal to <code>ALL()<\/code><\/li>\n<\/ul>\n<p>Think about what happens if the result set of your subquery includes a <code>NULL<\/code> \u2013 to keep it simple you can start with a list of values rather than a subquery then consider the two expressions:<\/p>\n<ul>\n<li>3 IN (1, 3, 5, null)<\/li>\n<li>4 NOT IN (1, 3, 5, null)<\/li>\n<\/ul>\n<p>The first expression expands to (3 = 1) <strong>or<\/strong> (3 = 3) <strong>or<\/strong> (3 = 5) <strong>or<\/strong> (3 = null).<\/p>\n<p>The second expression expands to (4 != 1) <strong>and<\/strong> (4 != 3) <strong>and<\/strong> (4 != 5) <strong>and<\/strong> (4 != null)<\/p>\n<p>Because the first expression is a list of disjuncts (<code>OR\u2019s<\/code>) it evaluates to <code>TRUE<\/code> if <strong><em>any<\/em><\/strong> of the atomic expressions evaluates to <code>TRUE<\/code>; in fact, at runtime Oracle will stop checking as soon as it finds the first expression that evaluates to <code>TRUE<\/code> (i.e. <em>3 = 3<\/em>).<\/p>\n<p>Because the second expression is a list of conjuncts (<code>ANDs<\/code>), it evaluates to <code>TRUE<\/code> <strong><em>only if every<\/em><\/strong> atomic expression evaluates to <code>TRUE<\/code>. The first three expressions evaluate to <code>TRUE<\/code>, but the last expression in the list is <code>4 != null<\/code>, which evaluates to <code>UNKNOWN<\/code>, so the compound expression evaluates to <code>FALSE<\/code>.<\/p>\n<p>To the human eye, the value 4 is clearly <strong>not<\/strong> <em>in the list<\/em>, but the Oracle database (or any other relational database) doesn\u2019t interpret <code>NOT<\/code> <code>IN<\/code> the way the human eye does.<\/p>\n<p>This script will build the last example into a simple Oracle model:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">create table t1 (n1 number);\r\ncreate table t2 (n2 number);\r\ninsert into t1 values(4);\r\ninsert into t2 values(1);\r\ninsert into t2 values(3);\r\ninsert into t2 values(5);\r\ninsert into t2 values(null);\r\ncommit;\r\nexecute dbms_stats.gather_table_stats(user,'t1')\r\nexecute dbms_stats.gather_table_stats(user,'t2')<\/pre>\n<p>My table <strong><em>t1<\/em><\/strong> holds a row with the value 4, and my table <strong><em>t2 <\/em><\/strong>does not. I\u2019m going to start by querying <strong><em>t1<\/em><\/strong> for all the values that are IN table <strong><em>t2<\/em><\/strong>, then for all the values that are <code>NOT<\/code> <code>IN<\/code> table <strong><em>t2<\/em><\/strong> \u2013 here are the queries with results cut and pasted from an SQL*Plus session running 19.11:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SQL&gt; select * from t1 where n1 in (select n2 from t2);\r\nno rows selected\r\nSQL&gt; \r\nSQL&gt; select * from t1 where n1 not in (select n2 from t2);\r\nno rows selected\r\nSQL&gt;<\/pre>\n<p>Whether you ask for <code>IN<\/code> or <code>NOT<\/code> <code>IN<\/code>, you get no rows \u2013 <code>NOT<\/code> <code>IN<\/code> is <strong><em>not<\/em><\/strong> the opposite of <code>IN<\/code> if the subquery can return nulls. You can confirm this point by executing two more statements:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SQL&gt; delete from t2 where n2 is null;\r\n1 row deleted.\r\nSQL&gt; \r\nSQL&gt; select * from t1 where n1 not in (select n2 from t2);\r\n        N1\r\n----------\r\n         4\r\n1 row selected.<\/pre>\n<p>When deleting the <em>null<\/em> row from <strong><em>t2,<\/em><\/strong> the <em>not in<\/em> version of our query returns the 4 row.<\/p>\n<h2>Execution plans<\/h2>\n<p>It\u2019s very easy to forget about the impact of nulls and the benefits of declaring mandatory columns as <strong><em>not null<\/em><\/strong>. Nulls introduce all sorts of problems (to people and to the optimizer). Over the years, the optimizer has been enhanced to work around some of the performance problems they used to produce.<\/p>\n<p>Side note: Oracle handles <strong>not null<\/strong> column declarations differently from \u201c(column is not null)\u201d <strong>check constraints<\/strong>, and a <strong>not null<\/strong> declaration may allow the optimizer to consider extra execution paths that it would not otherwise be allowed to use. You may find that a simple <strong>not null<\/strong> declaration makes a huge difference to performance.<\/p>\n<p>When looking at the execution plan from the first example with the <code>NOT<\/code> <code>IN<\/code> subquery, you can see Oracle giving a little warning about the presence of nulls and the impact it might have on our processing:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"639\" height=\"196\" class=\"wp-image-93391\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/02\/word-image-19.png\" \/><\/p>\n<p>The optimizer has chosen to use a <strong><em>hash anti-join<\/em><\/strong>. The significance of the <strong><em>ANTI<\/em><\/strong> is that the test fails if the search succeeds &#8211; the predicate for operation 1 must fail for every row in <strong><em>t2<\/em><\/strong> before the <strong><em>t1<\/em><\/strong> row can be reported. But there\u2019s a further label to the hash join operation: <strong><em>NA<\/em><\/strong>, for null-aware, which shows that Oracle is taking some sort of defensive action during the hash join to avoid making mistakes with nulls.<\/p>\n<p>This suggests another way of understanding what\u2019s going on. You\u2019ve seen that <code>IN<\/code> and <code>NOT<\/code> <code>IN<\/code> aren\u2019t exactly opposites, and you saw in the previous article that non-correlated IN subqueries could be turned into correlated <code>EXISTS<\/code> subqueries. Add the null row back into <strong><em>t2<\/em><\/strong> and see what the execution plan looks like if you assume (incorrectly) that you can rewrite our non-correlated <code>NOT<\/code> <code>IN<\/code> subquery as a correlated <code>NOT<\/code> <code>EXISTS<\/code> subquery:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SQL&gt; select * from t1 where not exists (select null from t2 where t2.n2 = t1.n1);\r\n        N1\r\n----------\r\n         4\r\n1 row selected.<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"671\" height=\"274\" class=\"wp-image-93392\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/02\/word-image-20.png\" \/><\/p>\n<p>The result from <code>NOT<\/code> <code>EXISTS<\/code> disagrees with the result from NOT IN, and the plan doesn\u2019t quite match the <code>NOT<\/code> IN plan. Operation 1 (hash join anti) is not labeled as <em>Null-aware<\/em> (NA), and there\u2019s a predicate at operation 3 that stops any null(s) from <strong><em>t2<\/em><\/strong> being passed up to the hash join operation. To sum up:<\/p>\n<ul>\n<li><code>NOT<\/code> <code>EXISTS<\/code> is the opposite of <code>EXISTS<\/code>,<\/li>\n<li>You can transform <code>IN<\/code> to <code>EXISTS<\/code><\/li>\n<li>You cannot (in the presence of nulls) transform <code>NOT<\/code> IN to <code>NOT<\/code> <code>EXISTS<\/code><\/li>\n<li><code>NOT<\/code> IN is not (in the presence of nulls) the opposite of <code>IN<\/code><\/li>\n<\/ul>\n<p>You could get some idea of what is hidden behind the <em>Null-Aware<\/em> tag by keeping an eye open for any surprises in plans, predicates, or execution statistics.<\/p>\n<p>For example, change the two tables to increase the volume of data.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">create table t1 (n1, v1) as\r\nselect \r\n        case when rownum &lt;= 200 then rownum end n1, \r\n        lpad(rownum,100) v1\r\nfrom \r\n        all_objects\r\nwhere\r\n        rownum &lt;= 50000\r\n;\r\ncreate table t2 as select n1 n2, v1 v2 from t1;\r\nset serveroutput off\r\nselect \/*+ gather_plan_statistics leading(t1 t2) *\/ \r\n        * \r\nfrom    t1 \r\nwhere   n1 not in (select n2 from t2)\r\n;\r\nselect  * from table(dbms_xplan.display_cursor(format=&gt;'rowstats last'));<\/pre>\n<p>I\u2019ve defined the two tables so that they both hold 50,000 rows, with only 200 rows with non-null values in each table. Here\u2019s the execution plan I get from my query \u2013 reported from memory after executing it with rowsource execution stats enabled:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"534\" height=\"219\" class=\"wp-image-93393\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/02\/word-image-21.png\" \/><\/p>\n<p>A key point to notice here is that the tablescan of <strong><em>t2<\/em><\/strong> stopped after only 201 rows \u2013 and a couple more tests setting other rows to null show that the scan stops as soon as Oracle finds a null in <strong><em>t2<\/em><\/strong>. This makes sense of course, because as soon as Oracle finds a null in <strong><em>t2, <\/em><\/strong>the query can\u2019t return any rows, so there\u2019s no point in continuing.<\/p>\n<p>In passing, I was a little surprised that the actual row count (<em>A-rows<\/em>) from <strong><em>t1<\/em><\/strong> was 50,000: I had assumed before I did the test that Oracle would only build the hash table from the rows where <strong><em>t1.n1<\/em><\/strong> was not null.<\/p>\n<h2>Alternative plans<\/h2>\n<p>After understanding Oracle\u2019s handling of nulls and NOT IN, it\u2019s worth taking a quick note of a couple of variations in the patterns you might see in execution plans. These variations exist because of the <em>null-aware<\/em> mechanisms as there are three possible variations with HASH JOIN ANTI operation. First, if the relevant column in the <strong><em>t2<\/em><\/strong> (i.e. subquery) table has been defined as not null (or if you add a column is not null predicate to the subquery, you get a single null aware SNA) join:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">select \/*+ gather_plan_statistics leading(t1 t2) *\/  * from t1 where n1\r\nnot in (select n2 from t2 where n2 is not null)<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"548\" height=\"221\" class=\"wp-image-93394\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/02\/word-image-22.png\" \/><\/p>\n<p>Then, if you have not null declarations or predicates for the relevant columns from both tables the join becomes a normal anti join:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">select \/*+ gather_plan_statistics leading(t1 t2) *\/  * from t1 where n1\r\nis not null and n1 not in (select n2 from t2 where n2 is not null)<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"520\" height=\"213\" class=\"wp-image-93395\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/02\/word-image-23.png\" \/><\/p>\n<p>Notice how the <strong><em>A-Rows<\/em><\/strong> supplied to the hash join by the <strong><em>t1<\/em><\/strong> tablescan is just the 200 where <strong><em>n1<\/em><\/strong> is not null in this case.<\/p>\n<p>Another case which looks quite counter-intuitive appears if the statistics, indexing, and <strong><em>not null<\/em><\/strong> conditions are right:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">create index t2_i2 on t2(n2);\r\nselect \/*+ gather_plan_statistics *\/  * from t1 where  n1 is not null\r\nand n1 not in (select n2 from t2)<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"582\" height=\"276\" class=\"wp-image-93396\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/02\/word-image-24.png\" \/><\/p>\n<p>I\u2019ve added a predicate to eliminate the null rows from <strong><em>t1<\/em><\/strong> and created an index on the <strong><em>n2<\/em><\/strong> column of <strong><em>t2<\/em><\/strong>, and that\u2019s had the effect of making the optimizer think that a nested loop join would be appropriate \u2013 though it is an anti join, of course, and also single null aware. However, if you look at the shape of the plan, you can see that the optimizer has started with a <strong><em>filter<\/em><\/strong> at operation 1, for which the first child is the nested loop at operation 2, and the second child is a full tablescan of <strong><em>t2<\/em><\/strong> at operation 5. Moreover, if you look at the <strong><em>Starts<\/em><\/strong> column in the plan, you\u2019ll see that operation 5 has started once, but operation 2 has never been started. This is an example of something I\u2019ve labeled the <em>fixed subquery<\/em>, and it\u2019s an example of the rare case where the last child of a filter operation can be called first to decide whether or not the first child should execute at all.<\/p>\n<p>In this case, operation 1 does convert the <code>NOT<\/code> <code>IN<\/code> subquery to a <code>NOT<\/code> <code>EXISTS<\/code> subquery (I\u2019ve expanded the predicate at operation 1 by passing the query through <strong><em>explain plan<\/em><\/strong> to show this), but this will only give the right answer if there are no rows with nulls for <strong><em>n2<\/em><\/strong> in table <strong><em>t2<\/em><\/strong>. So the run-time activity starts with the tablescan of <strong><em>t2<\/em><\/strong> at operation 5 to check whether or not there are any rows with <strong><em>n2 = null<\/em><\/strong>, and as you can see Oracle has found the first null, stopped the tablescan, and then not executed the nested loop.<\/p>\n<p>Finally, I\u2019ll just point out that <em>null-awareness<\/em> is not restricted to anti-joins \u2013 you may also see (S)NA on semi-joins, though the term Oracle uses, in this case, is <em>null accepting<\/em> rather than null aware. (The feature appeared in 12.1, though I have published a note about a case that produced the wrong results until12.2). The SQL has to be a little more complex (and I\u2019ve hinted the following example fairly heavily to produce the plan I wanted from the 50,000 row tables of the previous examples):<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">select  \/*+ \r\n                gather_plan_statistics \r\n                leading(@sel$5da710d3 t1@sel$1 t2@sel$2)\r\n                use_nl(@sel$5da710d3 t2@sel$2)\r\n                index(@sel$5da710d3 t2@sel$2)\r\n        *\/\r\n        * \r\nfrom    t1 \r\nwhere\r\n        n1 is null\r\nor      exists (select null from t2 where n2 = n1)\r\n;<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"551\" height=\"214\" class=\"wp-image-93397\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2022\/02\/word-image-25.png\" \/><\/p>\n<h1>The effects of NULL on Oracle optimizer transformations<\/h1>\n<p>You have seen IN subqueries converted to <code>EXISTS<\/code> subqueries and learned that it is not automatically safe for Oracle (or the human programmer) to convert <code>NOT<\/code> <code>IN<\/code> subqueries to <code>NOT<\/code> <code>EXISTS<\/code> subqueries thanks to problems caused by nulls in the data. In recent versions of Oracle, the optimizer has been coded to work around some of the problems, but the side effects may produce plans that have to be less efficient than you might expect and plans that might have extra operations that seem to be redundant.<\/p>\n<p>Join operations with the suffix <strong><em>NA<\/em><\/strong> (<em>null-aware\/accepting<\/em>) or <strong><em>SNA<\/em><\/strong> (<em>single null-aware\/accepting<\/em>) are clues that the plan has been adjusted to allow for columns that may hold nulls. Therefore, it could be worth checking whether it would be legal to add <em>is not null<\/em> predicates to specific queries, or even to add <code>NOT<\/code> <code>NULL<\/code> declarations to relevant columns.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>IN and NOT IN subqueries are frequently used in Oracle. Jonathan Lewis explains how the optimizer handles NULL with IN and NOT IN.&hellip;<\/p>\n","protected":false},"author":101205,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[53,143533],"tags":[95506,4459],"coauthors":[39048],"class_list":["post-93390","post","type-post","status-publish","format-standard","hentry","category-featured","category-oracle-databases","tag-automate","tag-oracle"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/93390","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/101205"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=93390"}],"version-history":[{"count":4,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/93390\/revisions"}],"predecessor-version":[{"id":93571,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/93390\/revisions\/93571"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=93390"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=93390"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=93390"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=93390"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}