Query Optimizer and Cartesian Products

In his continuing quest to bring a deeper understanding of Query Optimizer to the world at large, Fabiano takes a moment to point out a potential pitfall you may encounter. A light read, but one worth perusing.

If you use the Query Optimizer (QO) enough, you will eventually run into this interesting observation: There are some situations where the QO just decides to remove a join instruction from a query, and that behavior, in turn, creates a Cartesian product (in other words, the same thing as a join but without the structured relationship – essentially a Cross Join). For instance, take the following example:

Both of the queries above return the same rows from the @tab1 table, but there is obviously a difference between the first instruction and the second. As we can easily see, the difference is the WHERE condition present in the second query (“a.a” must be equal to 1), and that is the key. The QO knows that there is a redundancy of filters – if “a.a” is equal to 1 and “a.a” is also equal to “b.a” (the JOIN condition), then obviously “b.a” will also be equal to 1. In that case, the QO removes the join condition and applies just the one “a = 1” filter to both “a” and “b” tables.

This behavior can be clearly seen when we look at the execution plan:

StmtText

Argument

select * from @tab1 a   inner join @tab1 b      on a.a = b.a   where a.a = 1

NULL

  |–Nested Loops(Inner Join)

NULL

       |–Table Scan(OBJECT:(@tab1 AS [b]), WHERE:(@tab1.[a] as
       [b].[a]=(1)))

OBJECT:(@tab1 AS [b]), WHERE:(@tab1.[a] as [b].[a]=(1))

       |–Table Scan(OBJECT:(@tab1 AS [a]), WHERE:(@tab1.[a] as
       [a].[a]=(1)))

OBJECT:(@tab1 AS [a]), WHERE:(@tab1.[a] as [a].[a]=(1))

 
Take notice of the argument column; the QO has generated just one argument to read both tables, and there is no join condition in the Nested Loops. That characterizes a Cartesian product, in that for each row of the table ‘a’, the SQL will link with all rows of the table ‘b’.

Ok, but is that a bad thing? In this case, yes. To have that particular query use a hash algorithm to make the join will certainly be better than a Loop Join. If you’re not sure about that, just look at the first query, remembering that they are both semantically equivalent. If you then look at the plan for that first query (below), you can see the QO has generated a Hash Join to link the table:

StmtText

Argument

select * from @tab1 a   inner join @tab1 b      on a.a = b.a

NULL

  |–Hash Match(Inner Join, HASH:([b].[a])=([a].[a]),
  RESIDUAL:(@tab1.[a] as [b].[a]=@tab1.[a] as [a].[a]))

HASH:([b].[a])=([a].[a]), RESIDUAL:(@tab1.[a] as
[b].[a]=@tab1.[a] as [a].[a])

       |–Table Scan(OBJECT:(@tab1 AS [b]))

OBJECT:(@tab1 AS [b])

       |–Table Scan(OBJECT:(@tab1 AS [a]))

OBJECT:(@tab1 AS [a])

If we then analyze the results of the IO and time statistics, we will see that the first query used less time and read fewer pages to return the data:

First query (Hash Join):
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#2E75B1C0'. Scan count 2, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 
SQL Server Execution Times:
   CPU time = 328 ms,  elapsed time = 19976 ms.

  
Second query (Loop Join):
Table '#2E75B1C0'. Scan count 2, logical reads 2002, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 
SQL Server Execution Times:
   CPU time = 407 ms,  elapsed time = 20377 ms.

Now, if you’re clever, you’re probably thinking, “Hmmm, in that case we could just put a Hint in to force a Hash Join, yes?” Wrong (at least at this stage…). If you try this, you will receive the follow error message:

When SQL removes the join condition, it needs to make a Cartesian product, and the QO can’t do this with a Hash or Merge Join, only with a Loop join. So in this case, the logic of the QO clashes with the hint.

What you can do to fix that? Wait.

Yes, that’s all you can do. Wait for the next version or point release of SQL Server, and trust the development team to imitate Oracle. To ‘fix’ this problem, Oracle has a handy parameter, which you can look here to learn more about. There is a Connect item talking about the problem, so we should keep watching this space.

That’s all folks.