{"id":98310,"date":"2023-10-16T00:28:42","date_gmt":"2023-10-16T00:28:42","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=98310"},"modified":"2026-04-17T13:09:34","modified_gmt":"2026-04-17T13:09:34","slug":"dont-use-distinct-as-a-join-fixer","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/dont-use-distinct-as-a-join-fixer\/","title":{"rendered":"Don&#8217;t Use DISTINCT as a SQL Join Fixer &#8211; Use EXISTS Instead"},"content":{"rendered":"\n<p><strong>Using <code>DISTINCT<\/code> to suppress duplicate rows caused by a SQL JOIN is a common antipattern. It produces correct results but forces SQL Server to generate every duplicate row first, then sort them away &#8211; adding an unnecessary sort operation to the execution plan that scales with data volume. The better approach is <code>EXISTS<\/code>: it tests whether a matching row exists without ever generating the duplicates that <code>DISTINCT<\/code> then discards, producing the same result with a leaner execution plan.<\/strong><\/p>\n\n\n\n<p>I&#8217;ve quietly resolved performance issues by re-writing slow queries to avoid <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">DISTINCT<\/code>. Often, the <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">DISTINCT<\/code> is there only to serve as a &#8220;join-fixer,&#8221; and I can explain what that means using an example.<\/p>\n\n\n\n<p>Let&#8217;s say we have the following grossly simplified schema, representing customers, products, and product categories:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"> CREATE TABLE dbo.Customers\n (\n   CustomerID int NOT NULL,\n   Name       nvarchar(255) NOT NULL,\n   CONSTRAINT PK_Customers PRIMARY KEY (CustomerID)\n );\n\n CREATE TABLE dbo.Categories\n (\n   CategoryID int NOT NULL,\n   Name       nvarchar(255) NOT NULL,\n   CONSTRAINT PK_Categories PRIMARY KEY (CategoryID),\n   CONSTRAINT UQ_Categories UNIQUE (Name)\n );\n\n CREATE TABLE dbo.Products\n (\n   ProductID  int NOT NULL,\n   CategoryID int NOT NULL,\n   Name       nvarchar(255) NOT NULL,\n   CONSTRAINT PK_Products PRIMARY KEY (ProductID)\n );<\/pre>\n\n\n\n<p>And then we have tables for orders and order details:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"> CREATE TABLE dbo.Orders\n (\n   OrderID    int NOT NULL,\n   CustomerID int NOT NULL,\n   OrderDate  date,\n   OrderTotal decimal(12,2),\n   CONSTRAINT PK_Orders PRIMARY KEY (OrderID)\n );\n\n CREATE TABLE dbo.OrderDetails\n (\n   OrderID    int NOT NULL,\n   LineItemID int NOT NULL,\n   ProductID  int NOT NULL,\n   Quantity   int NOT NULL,\n   CONSTRAINT PK_OrderDetails PRIMARY KEY  (OrderID, LineItemID),\n   INDEX IX_OrderDetails_OrderID_ProductID (OrderID, ProductID)\n );\n<\/pre>\n\n\n\n<p>And some sample data:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"> INSERT dbo.Customers (CustomerID, Name)\n   VALUES (1,N'Aaron'), (2,N'Bob');\n\n INSERT dbo.Categories (CategoryID, Name)\n   VALUES(1,N'Beauty'), (2,N'Grocery');\n\n INSERT dbo.Products (ProductID, CategoryID, Name)\n   VALUES (1,1,N'Lipstick'), (2,1,N'Mascara'), (3,2,N'Strawberries'); \n\n INSERT dbo.Orders (OrderID, CustomerID, OrderDate, OrderTotal)\n   VALUES (1,1,getdate(),32.50), (2,2,getdate(),47.05);\n\n INSERT dbo.OrderDetails (OrderID, LineItemID, ProductID, Quantity) \n   VALUES (1,1,1,5), (2,1,3,10); <\/pre>\n\n\n\n<p>Marketing says we want to send an e-mail or give a discount code to all the customers who have ordered a product from the beauty category. The initial attempt at a query for this might be something like this:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"> SELECT c.CustomerID, c.Name\n FROM dbo.Customers AS c\n INNER JOIN dbo.Orders AS o\n   ON c.CustomerID = o.CustomerID\n INNER JOIN dbo.OrderDetails AS od\n   ON o.OrderID = od.OrderID\n INNER JOIN dbo.Products AS p\n   ON od.ProductID = p.ProductID\n INNER JOIN dbo.Categories AS cat\n   ON p.CategoryID = cat.CategoryID\n   WHERE cat.Name = N'Beauty';<\/pre>\n\n\n\n<p>The plan doesn&#8217;t look so bad (yet):<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/09\/joinfix-p1.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/09\/joinfix-p1.png\" alt=\"A simple plan\"\/><\/a><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>And in local or test data, the output might look right, since we may have inserted a single row into <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">OrderDetails<\/code> to match our criteria (and to make our tests pass). But what if I have ordered <em>two<\/em> products from the beauty category (in the same order, or across multiple orders)?<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"> INSERT dbo.OrderDetails (OrderID, LineItemID, ProductID, Quantity)\n   VALUES(1,2,2,1);<\/pre>\n\n\n\n<p>Now the query returns that customer twice! We certainly don&#8217;t want to send them two e-mails, or issue multiple discount codes to the same customer. And the plan, on its own, can&#8217;t really provide any obvious clues that there are duplicate rows:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/09\/joinfix-p2.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/09\/joinfix-p2.png\" alt=\"Hidden duplicates\"\/><\/a><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>But you sure will notice if you inspect the results, or an end user will notice if you unleash this in production. The quick fix tends to be: slap a big ol&#8217; <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">DISTINCT<\/code> on there which, indeed, fixes the symptom by eliminating duplicates:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"> SELECT DISTINCT c.CustomerID, c.Name\n -------^^^^^^^^\n FROM dbo.Customers AS c\n INNER JOIN dbo.Orders AS o\n   ON c.CustomerID = o.CustomerID\n INNER JOIN dbo.OrderDetails AS od\n   ON o.OrderID = od.OrderID\n INNER JOIN dbo.Products AS p\n   ON od.ProductID = p.ProductID\n INNER JOIN dbo.Categories AS cat\n   ON p.CategoryID = cat.CategoryID\n   WHERE cat.Name = N'Beauty';<\/pre>\n\n\n\n<p>But at what cost? A <a href=\"https:\/\/sqlserverfast.com\/epr\/sort\/\" target=\"_blank\" rel=\"noopener\">distinct sort<\/a>, that&#8217;s what!<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/09\/joinfix-p3.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/09\/joinfix-p3.png\" alt=\"Pain caused by that DISTINCT\"\/><\/a><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>If I&#8217;m testing changes to this query in my local environment, and maybe just testing the output and that it returned the data quickly, I might miss clues in the plan and be pretty satisfied that adding <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">DISTINCT<\/code> fixed the issue without impacting performance.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-this-will-only-get-worse-with-more-data\">This will only get worse with more data.<\/h3>\n\n\n\n<p>And while we could spend a lot of time tuning indexes on all the involved tables to make that sort hurt less, this multi-table join is always going to produce rows you never ultimately need. Think about SQL Server&#8217;s job: yes, it needs to return correct results, but it also should do that <em>in the most efficient way possible<\/em>. Reading all the data (and then sorting it), only to throw away some or most of it, is very wasteful.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-can-we-express-the-query-without-distinct\">Can we express the query without DISTINCT?<\/h3>\n\n\n\n<p>When I know I need to &#8220;join&#8221; to tables but only care about existence of rows and not any of the output from those tables, I turn to <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">EXISTS<\/code>. <\/p>\n\n\n\n<div id=\"callout-block_1ceabbc2adee0215679848d3ac1c4764\" class=\"callout alignnone\">\n    <div class=\"child-last:mb-0 child-first:mt-0 bg-gray-50 dark:bg-gray-950 p-4xl my-3xl\">\n\n<p><strong>You may also be interested in<\/strong>: <br><a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/the-poor-neglected-full-outer-join\/\" target=\"_blank\" rel=\"noreferrer noopener\">FULL OUTER JOIN for cases where you genuinely need all rows from both tables<\/a><\/p>\n\n<\/div>\n<\/div> \n\n\n<p>I also try to eliminate looking up values that I know are going to be the same on every row. In this case, I don&#8217;t need to join to <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">Categories<\/code> every time if <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">CategoryID<\/code> is effectively a constant.<\/p>\n\n\n\n<p>One way to express this same query, ensuring no duplicate customers and, hopefully, reducing the cost of sorting:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"> DECLARE @CategoryID int;\n\n SELECT @CategoryID = CategoryID\n   FROM dbo.Categories WHERE Name = N'Beauty';\n   \n SELECT c.CustomerID, c.Name\n   FROM dbo.Customers AS c\n   WHERE EXISTS\n   (\n     SELECT 1 FROM dbo.OrderDetails AS od\n       INNER JOIN dbo.Orders AS o\n         ON od.OrderID = o.OrderID\n       INNER JOIN dbo.Products AS p\n         ON od.ProductID = p.ProductID\n       WHERE o.CustomerID = c.CustomerID\n         AND p.CategoryID = @CategoryID\n   );<\/pre>\n\n\n\n<p>There&#8217;s a simple, additional index seek against <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">Categories<\/code>, of course, but the plan for the overall query has been made drastically more efficient (we&#8217;re down to 2 scans and 2 seeks)<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/09\/joinfix-p4.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/09\/joinfix-p4.png\" alt=\"A plan using EXISTS\"\/><\/a><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Another way to express the same query is to force <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">Orders<\/code> to be scanned later:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"> DECLARE @CategoryID int;\n\n SELECT @CategoryID = CategoryID\n   FROM dbo.Categories WHERE Name = N'Beauty';\n\n SELECT c.CustomerID, c.Name\n   FROM dbo.Customers AS c\n   WHERE EXISTS\n   (\n     SELECT 1 \n       FROM dbo.Orders AS o\n       WHERE o.CustomerID = c.CustomerID\n       AND EXISTS \n       (\n         SELECT 1 FROM dbo.OrderDetails AS od\n           INNER JOIN dbo.Products AS p\n            ON od.ProductID = p.ProductID\n           WHERE od.OrderID = o.OrderID\n           AND p.CategoryID = @CategoryID\n       )\n   );<\/pre>\n\n\n\n<p>This can be beneficial if you have more <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">Orders<\/code> than <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">Customers<\/code> (I certainly hope that&#8217;s the case). Notice in the plan that <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">Orders<\/code> is scanned later, hopefully after many irrelevant orders have been filtered out.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/10\/distinct-last-plan-fix.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/10\/distinct-last-plan-fix.png\" alt=\"A slightly different EXISTS plan\"\/><\/a><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-conclusion\">Conclusion<\/h3>\n\n\n\n<p><code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">DISTINCT<\/code> is often hiding flaws in the underlying logic, and it can really pay off to explore other ways to write your queries without it. There was another interesting use case <a href=\"https:\/\/sqlperformance.com\/2017\/01\/t-sql-queries\/surprises-assumptions-group-by-distinct\" target=\"_blank\" rel=\"noopener\">I wrote about a few years ago<\/a> that showed how changing <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">DISTINCT<\/code> to <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">GROUP BY<\/code> &#8211; even though it carries the same semantics and produces the same results &#8211; can help SQL Server filter out duplicates earlier and have a serious impact on performance.<\/p>\n\n\n\n<div id=\"callout-block_1ceabbc2adee0215679848d3ac1c4764\" class=\"callout alignnone\">\n    <div class=\"child-last:mb-0 child-first:mt-0 bg-gray-50 dark:bg-gray-950 p-4xl my-3xl\">\n\n<p><strong>You may also be interested in:<\/strong><br><a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/parameter-sniffing\/\" target=\"_blank\" rel=\"noreferrer noopener\">How parameter sniffing amplifies the cost of soft-heavy queries<\/a><\/p>\n\n<\/div>\n<\/div> \n\n\n<section id=\"faq\" class=\"faq-block my-5xl\">\n    <h2>FAQs: Don&#039;t use DISTINCT as a &quot;join-fixer&quot;<\/h2>\n\n                        <h3 class=\"mt-4xl\">1. Why is using DISTINCT to fix duplicate rows from a JOIN a bad practice?<\/h3>\n            <div class=\"faq-answer\">\n                <p>When DISTINCT is used on a multi-table JOIN that produces duplicates, SQL Server must generate every duplicate row before it can eliminate them. This adds a sort or hash aggregate operation to the execution plan. The work grows with data volume &#8211; every extra row in the joined tables that creates a duplicate is work that could be entirely avoided by rewriting the join logic.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">2. How do I rewrite a query that uses DISTINCT as a join fixer?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Use EXISTS instead of JOIN when you only need to test whether a related row exists, not retrieve values from it. Replace &#8216;SELECT DISTINCT a.col FROM TableA a JOIN TableB b ON a.id = b.id&#8217; with &#8216;SELECT a.col FROM TableA a WHERE EXISTS (SELECT 1 FROM TableB b WHERE b.id = a.id)&#8217;. The EXISTS version stops scanning as soon as it finds the first match &#8211; the JOIN version must find and return all of them.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">3. When is DISTINCT actually appropriate in SQL Server?<\/h3>\n            <div class=\"faq-answer\">\n                <p>DISTINCT is appropriate when duplicate rows in your result set are genuinely caused by the data itself &#8211; not by how you have written the join. For example, if two different source rows have identical values across all selected columns, DISTINCT is the right tool. The antipattern is using DISTINCT to compensate for a JOIN that is joining to a table in a way that was never intended to produce multiple rows per parent record.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">4. Does EXISTS perform better than DISTINCT in SQL Server?<\/h3>\n            <div class=\"faq-answer\">\n                <p>In most cases, yes &#8211; especially as table size grows. EXISTS uses a semi-join operator in the execution plan, which short-circuits after finding the first match. DISTINCT with a JOIN produces all matching rows and then eliminates duplicates in a separate operation. The difference is most pronounced on large tables with many matching rows per join key.<\/p>\n            <\/div>\n            <\/section>\n","protected":false},"excerpt":{"rendered":"<p>Using DISTINCT to fix duplicate rows from a SQL join is a common antipattern. It generates all the duplicates first, then sorts them away &#8211; hurting performance. Learn how to rewrite these queries using EXISTS for cleaner, faster SQL Server results.&hellip;<\/p>\n","protected":false},"author":341115,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[53,143531],"tags":[4183],"coauthors":[158980],"class_list":["post-98310","post","type-post","status-publish","format-standard","hentry","category-featured","category-t-sql-programming-sql-server","tag-t-sql"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/98310","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/341115"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=98310"}],"version-history":[{"count":34,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/98310\/revisions"}],"predecessor-version":[{"id":110014,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/98310\/revisions\/110014"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=98310"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=98310"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=98310"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=98310"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}