{"id":717,"date":"2009-10-22T00:00:00","date_gmt":"2009-10-22T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/query-optimizer-and-cartesian-products\/"},"modified":"2021-09-29T16:22:03","modified_gmt":"2021-09-29T16:22:03","slug":"query-optimizer-and-cartesian-products","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/query-optimizer-and-cartesian-products\/","title":{"rendered":"Query Optimizer and Cartesian Products"},"content":{"rendered":"<p class=\"START\">If you use the Query Optimizer (QO) enough, you will eventually run into this interesting observation: There are some situations where the QO just decides to remove a join instruction from a query, and that behavior, in turn, creates a Cartesian product (in other words, the same thing as a join but without the structured relationship &#8211; essentially a Cross Join). For instance, take the following example:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">USE tempdb\r\nGO\r\nDECLARE @tab1 TABLE(a INT)\r\nINSERT INTO @tab1\r\n\u00a0\u00a0SELECT TOP 1000 1\r\n\u00a0\u00a0FROM sysobjects b, sysobjects a\r\nSET STATISTICS io ON\r\nSET STATISTICS time ON\r\nSELECT * \r\n\u00a0\u00a0FROM @tab1 a\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0INNER JOIN @tab1 b\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0ON a.a = b.a\r\nSELECT * \r\n\u00a0\u00a0FROM @tab1 a\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0INNER JOIN @tab1 b\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0ON a.a = b.a\r\n\u00a0\u00a0WHERE a.a = 1\r\nSET STATISTICS io OFF\r\nSET STATISTICS time OFF <\/pre>\n<p>Both of the queries above return the same rows from the @tab1 table, but there is obviously a difference between the first instruction and the second. As we can easily see, the difference is the WHERE condition present in the second query (&#8220;<em>a.a<\/em>&#8221; must be equal to 1), and <em>that<\/em> is the key. The QO <em>knows<\/em> that there is a redundancy of filters &#8211; if &#8220;<em>a.a<\/em>&#8221; is equal to 1 and &#8220;<em>a.a<\/em>&#8221; is <em>also<\/em> equal to &#8220;<em>b.a<\/em>&#8221; (the JOIN condition), then obviously &#8220;<em>b.a<\/em>&#8221; will also be equal to 1. In that case, the QO removes the join condition and applies just the one &#8220;a = 1&#8221; filter to both &#8220;<em>a<\/em>&#8221; and &#8220;<em>b<\/em>&#8221; tables.<\/p>\n<p>This behavior can be clearly seen when we look at the execution plan:<\/p>\n<table>\n<tbody>\n<tr>\n<td>\n<p><strong>StmtText <\/strong><\/p>\n<\/td>\n<td>\n<p><strong>Argument <\/strong><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>select * from @tab1 a\u00a0\u00a0 inner join @tab1 b\u00a0\u00a0\u00a0\u00a0\u00a0 on a.a = b.a\u00a0\u00a0 where a.a = 1<\/p>\n<\/td>\n<td>\n<p>NULL<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>\u00a0 |&#8211;Nested Loops(Inner Join)<\/p>\n<\/td>\n<td>\n<p>NULL<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |&#8211;Table Scan(OBJECT:(@tab1 AS [b]), WHERE:(@tab1.[a] as<br \/>\n \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 [b].[a]=(1)))<\/p>\n<\/td>\n<td>\n<p>OBJECT:(@tab1 AS [b]), WHERE:(@tab1.[a] as [b].[a]=(1))<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |&#8211;Table Scan(OBJECT:(@tab1 AS [a]), WHERE:(@tab1.[a] as<br \/>\n \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 [a].[a]=(1)))<\/p>\n<\/td>\n<td>\n<p>OBJECT:(@tab1 AS [a]), WHERE:(@tab1.[a] as [a].[a]=(1))<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>\u00a0 <br \/>\n Take notice of the argument column; the QO has generated just one argument to read both tables, and there is no join condition in the Nested Loops. That characterizes a Cartesian product, in that for each row of the table &#8216;a&#8217;, the SQL will link with all rows of the table &#8216;b&#8217;.<\/p>\n<p>Ok, but is that a bad thing? In this case, yes. To have that particular query use a hash algorithm to make the join will certainly be better than a Loop Join. If you&#8217;re not sure about that, just look at the first query, remembering that they are both semantically equivalent. If you then look at the plan for that first query (below), you can see the QO has generated a Hash Join to link the table:<\/p>\n<table>\n<tbody>\n<tr>\n<td valign=\"bottom\">\n<p><strong>StmtText<\/strong><\/p>\n<\/td>\n<td valign=\"bottom\">\n<p><strong>Argument<\/strong><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"bottom\">\n<p>select * from @tab1 a\u00a0\u00a0 inner join @tab1 b\u00a0\u00a0\u00a0\u00a0\u00a0 on a.a = b.a<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>NULL<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"bottom\">\n<p>\u00a0 |&#8211;Hash Match(Inner Join, HASH:([b].[a])=([a].[a]),<br \/>\n \u00a0 RESIDUAL:(@tab1.[a] as [b].[a]=@tab1.[a] as [a].[a]))<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>HASH:([b].[a])=([a].[a]), RESIDUAL:(@tab1.[a] as<br \/>\n [b].[a]=@tab1.[a] as [a].[a])<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"bottom\">\n<p>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |&#8211;Table Scan(OBJECT:(@tab1 AS [b]))<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>OBJECT:(@tab1 AS [b])<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"bottom\">\n<p>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |&#8211;Table Scan(OBJECT:(@tab1 AS [a]))<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>OBJECT:(@tab1 AS [a])<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>If we then analyze the results of the IO and time statistics, we will see that the first query used less time and read fewer pages to return the data:<\/p>\n<p><strong>First query (Hash Join):<\/strong> <br \/>\n <code>Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. <br \/>\nTable '#2E75B1C0'. Scan count 2, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. <br \/>\n\u00a0 <br \/>\nSQL Server Execution Times: <br \/>\n\u00a0\u00a0 CPU time = 328 ms,\u00a0 elapsed time = 19976 ms. <\/code><br \/>\n \u00a0\u00a0 <br \/>\n <strong>Second query (Loop Join):<\/strong> <br \/>\n <code>Table '#2E75B1C0'. Scan count 2, logical reads 2002, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. <br \/>\n\u00a0 <br \/>\nSQL Server Execution Times: <br \/>\n\u00a0\u00a0 CPU time = 407 ms,\u00a0 elapsed time = 20377 ms.<\/code><\/p>\n<p>Now, if you&#8217;re clever, you&#8217;re probably thinking, &#8220;<em>Hmmm, in that case we could just put a Hint in to force a Hash Join, yes?<\/em>&#8221; Wrong (at least at this stage&#8230;). If you try this, you will receive the follow error message:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT * \r\n\u00a0\u00a0FROM @tab1 a\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0INNER JOIN @tab1 b\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0ON a.a = b.a\r\n\u00a0\u00a0WHERE a.a = 1\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0OPTION(hash JOIN)\r\n\r\nMsg 8622, Level 16, State 1, Line 20 \r\nQuery processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN.<\/pre>\n<p>When SQL removes the join condition, it <em>needs<\/em> to make a Cartesian product, and the QO can&#8217;t do this with a Hash or Merge Join, only with a Loop join. So in this case, the logic of the QO clashes with the hint.<\/p>\n<p>What you can do to fix that? Wait.<\/p>\n<p>Yes, that&#8217;s all you can do. Wait for the next version or point release of SQL Server, and trust the development team to imitate Oracle. To &#8216;fix&#8217; this problem, Oracle has a handy parameter, which you can look <a href=\"http:\/\/jonathanlewis.wordpress.com\/2006\/12\/13\/cartesian-merge-join\/\">here<\/a> to learn more about. There <em>is<\/em> a <a href=\"https:\/\/connect.microsoft.com\/\">Connect<\/a> item talking about the problem, so we should keep watching this space.<\/p>\n<p>That&#8217;s all folks.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In his continuing quest to bring a deeper understanding of Query Optimizer to the world at large, Fabiano takes a moment to point out a potential pitfall you may encounter. A light read, but one worth perusing.&hellip;<\/p>\n","protected":false},"author":65554,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143531],"tags":[4178,5076,4150,4252],"coauthors":[6809],"class_list":["post-717","post","type-post","status-publish","format-standard","hentry","category-t-sql-programming-sql-server","tag-bi","tag-query-optimizer","tag-sql","tag-t-sql-programming"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/717","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/65554"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=717"}],"version-history":[{"count":5,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/717\/revisions"}],"predecessor-version":[{"id":75328,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/717\/revisions\/75328"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=717"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=717"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=717"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=717"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}