As I am starting to prepare for an upcoming blog series on database designs and associated query patterns (stay tuned!) I was thinking about what query keywords I don’t really know well enough. ANY and ALL were two keywords that immediately popped into my mind that I had never really used (and I probably should have.)
ANY and ALL are used as options on scalar comparison operators like =, !=, >, etc to allow them to be used with multi-row sub-queries.
You know if you do something like the following:
select *
from Sales.SalesOrderHeader
where SalesOrderId = ( select SalesOrderId
from Sales.SalesOrderDetail
where SalesOrderId in (43659, 43660, 43661))
You will get the following error:
Msg 512, Level 16, State 1, Line 11
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
However, you can change this to be:
select *
from Sales.SalesOrderHeader
where SalesOrderId = ANY ( select SalesOrderId
from Sales.SalesOrderDetail
where SalesOrderId in (43659, 43660, 43661))
And now each value in the left input will be checked against the set on the right.
Admittedly, this isn’t that useful, as = ANY is basically equivalent to IN, as in the following statement. Heck, I did use IN in the example to get the 3 rows for the test for a reason:
select *
from Sales.SalesOrderHeader
where SalesOrderId in (43659, 43660, 43661)
One big difference between IN and = ANY, is that it only works with subqueries so this will not work
select *
from Sales.SalesOrderHeader
where SalesOrderId = ANY (43659, 43660, 43661)
Will result in:
Msg 102, Level 15, State 1, Line 3
Incorrect syntax near ‘43659’.
The next thing you might think (I did) is that if = ANY is the same as IN, then != ANY must be the same as NOT in, right? No, because the != ANY is the operator, and you have to think about what “Does not equal ANY” means. Actually the following query will return every row in the table:
select *
from Sales.SalesOrderHeader
where SalesOrderId != ANY (select SalesOrderId
from Sales.SalesOrderDetail
where SalesOrderId in 43659, 43660, 43661))
Why? What the “!=ANY” operator says is to return a row that the row values doesn’t match any subquery row value, so if even one row doesn’t match, then it will be successful. So take 43659. It does match 43659, but it doesn’t match the other 2 rows. Or as this page in Technet puts it, NOT IN says != value1 and != value2 and != value3, where != ANY is != value1 or != value2 or != value3.
In this case, you could use the ALL keyword, which says to check the value against every value returned, and the value returned must match all values (Not tremendously interesting with an equal operator, but essential to understanding the operators), but if any row matches, it doesn’t match all of them. When no rows are returned by the subquery, it returns true. So the following (with the subquery negated with the 1=2 will return all rows in the table:
select *
from Sales.SalesOrderHeader
where SalesOrderId = ALL (select SalesOrderId
from Sales.SalesOrderDetail
where SalesOrderId in (43659, 43660, 43661)
and 1=2)
But, if this is the case, then != ALL should return = rows, right? Wrong.
select *
from Sales.SalesOrderHeader
where SalesOrderId != ALL (select SalesOrderId
from Sales.SalesOrderDetail
where SalesOrderId in (43659, 43660, 43661)
and /* again */ 1=2)
Both return the exact same set of rows. The problem (and why I could wrap my head around these operators) is that the operator is truly = ALL and != ALL. So = ALL says that you must match ALL values, but != ALL means that you must be different than all individual values. So in the following, it will return all rows in the table except the three values from the subquery:
select *
from Sales.SalesOrderHeader
where SalesOrderId != ALL (select SalesOrderId
from Sales.SalesOrderDetail
where SalesOrderId in (43659, 43660, 43661))
What I particularly like about the ANY and ALL keywords, is that they are named in a very straightforward manner, once you get the idea of what they do, but as noted, it wasn’t necessarily intuitive to start with for me. On a practical side, what if we want to compare one value to all of the values in a related set. In my example (using ye olde AdventureWorks database), I want to see what orders do not have any sales order items where the UnitPrice is less than 500. A classic way of making this comparison would be to use an aggregate on the salesOrderItems to get the max UnitPrice for all items and use it as a filter:
select *
from Sales.SalesOrderHeader
where SalesOrderId in ( select SalesOrderId
from Sales.SalesOrderDetail
group by SalesOrderId
having max(UnitPrice) <= 500)
It is a technique I have used many times over. But, really, what would be easier would be to check each item against the scalar value, rather than doing an aggregate. Using the >= ALL operator, we can do this in a direct method. Each row is compared to the value on the left side of the operator.
select *
from Sales.SalesOrderHeader
where 500 >= ALL ( select UnitPrice
from Sales.SalesOrderDetail
where SalesOrderHeader.SalesOrderId = SalesOrderDetail.SalesOrderId)
Using AdventureWorks2014 as it is, freshly downloaded, the performance for both queries is very much evenly matched, based on the plan:
And Statistics IO and Time are very much similar as well. In my tries, the IN version took a few milliseconds more than than the >= ALL version. But note that the >= ALL query suggested an index. This is the index (with a slightly better name!):
CREATE NONCLUSTERED INDEX unitPrice_include_salesOrderId
ON [Sales].[SalesOrderDetail] ([UnitPrice]) INCLUDE ([SalesOrderID])
From there, you will see a fairly large (though not necessarily, amazing) improvement:
The CPU and reads were quite a bit better, with the >= ALL version needing:
Table ‘Workfile’. Scan count 0, logical reads 0, physical reads 0, read-ahead
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahea
Table ‘SalesOrderHeader’. Scan count 1, logical reads 689, physical reads 0,
Table ‘SalesOrderDetail’. Scan count 1, logical reads 96, physical reads 0, r
SQL Server Execution Times:
CPU time = 78 ms, elapsed time = 318 ms.
While the IN version needed:
Table ‘SalesOrderHeader’. Scan count 1, logical reads 689, physical reads 0,
Table ‘SalesOrderDetail’. Scan count 1, logical reads 1246, physical reads 0,
SQL Server Execution Times:
CPU time = 125 ms, elapsed time = 361 ms.
The elapsed times were similar, but there was an improvement over pretty much equal as well.
To test to make sure you have the correct answer we can use the following query to look at the values that are returned. It takes the basic query and looks at the data for each of the salesOrderDetail rows for a salesOrder:
select SalesOrderId, max(UnitPrice) as MaxUnitPrice, min(UnitPrice) as MinUnitPrice
from sales.SalesOrderDetail
where salesOrderId = ANY ( select SalesOrderId
from Sales.SalesOrderHeader
where 500 >= ALL (select UnitPrice
from Sales.SalesOrderDetail
where SalesOrderHeader.SalesOrderId =
SalesOrderDetail.SalesOrderId)
)
group by salesOrderId
order by MaxUnitPrice desc, MinUnitPrice desc
Which will return a result set like:
SalesOrderId MaxUnitPrice MinUnitPrice
———— ——————— ———————
46641 469.794 469.794
46934 469.794 469.794
46963 469.794 469.794
47022 469.794 469.794
47048 469.794 469.794
…
73273 2.29 2.29
73040 2.29 2.29
51782 1.374 1.374
53564 1.374 1.374
65214 1.374 1.374
Which you can see, all of the max values are < 500 (So 500 is greater than all of the values).
I haven’t done a tremendous amount of performance testing (as you can tell), but it is clear to me that ANY and ALL have the propensity to be of some value in queries on occasion (and not just to get a blog entry posted so I don’t vanish from the SQLBlog role again this month.
Load comments