{"id":1032,"date":"2010-11-22T00:00:00","date_gmt":"2010-11-22T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/showplan-operator-of-the-week-merge-join\/"},"modified":"2021-08-16T15:02:11","modified_gmt":"2021-08-16T15:02:11","slug":"showplan-operator-of-the-week-merge-join","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/learn\/showplan-operator-of-the-week-merge-join\/","title":{"rendered":"ShowPlan Operator of the Week &#8211; Merge Join"},"content":{"rendered":"<div id=\"pretty\">\n<p class=\"start\">Hello dear readers: We&#8217;re once more going to talk about ShowPlan operators. Over the past few weeks and months, we&#8217;ve featured the ShowPlan operators that are used by SQL Server to build the query plan. If you&#8217;re just getting started with my Showplan series, you can find a list of all my articles so far <a href=\"http:\/\/www.simple-talk.com\/author\/fabiano-amorim\/\">here<\/a>.<\/p>\n<p>For quite a while, I&#8217;ve wanted to talk about the<b> join<\/b> operators (loop, merge and hash), but I always wondered whether, if I wrote on this subject, you&#8217;d find it interesting. Why should I have thought that? It is because there are already a lot of blog posts and articles on this subject. Despite that, I just couldn&#8217;t miss having those operators in my series, so I really hope that you like my approach on this topic. I&#8217;ll start by featuring the Merge Join operator. I don&#8217;t have to follow the usual order, Loop, Merge and Hash, so I&#8217;ll start with the Merge.<\/p>\n<h1>Introduction<\/h1>\n<p>Once when I was presenting a lecture on this subject, I asked the audience if there was anyone in the room that had a database with only one table. To my surprise, one person raised his hand: I was stunned into silence, to the amusement of the audience.<\/p>\n<p>Simply put, a join is an operation that links one table to another, and SQL Server can use three algorithms to perform this operation, the Loop, Merge and Hash.<\/p>\n<p>The merge join performs an inner join, an outer join or in some cases even a union operation. The merge join is very fast because it requires that both inputs are already sorted by the respective key columns. To do an analogy with the joins I&#8217;ll use the same example that I like to show in my presentations.<\/p>\n<p>To illustrate the Merge Join behavior, I&#8217;ll start by creating two tables, one called &#8220;Cursos&#8221; (which means &#8216;Courses&#8217; in Portuguese) and one table called &#8220;Alunos&#8221; (which means &#8220;Students&#8221; in Portuguese). The following script will create the tables and populate them with some garbage data:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">USE tempdb  GO\r\nIF OBJECT_ID('Alunos') IS NOT NULL\r\nBEGIN\r\n\u00a0 DROP TABLE Alunos\r\n\u00a0 DROP TABLE Cursos\r\nEND\r\nGO\r\nCREATE TABLE Cursos (ID_Cursos INT PRIMARY KEY, Nome_Curso VARCHAR(80))\r\nCREATE TABLE Alunos (ID_Alunos INT PRIMARY KEY, Nome_Aluno VARCHAR(80), ID_Cursos INT)\r\nY, Nome_Curso VARCHAR(80))\r\nGO\r\n\u00a0\r\nINSERT INTO Cursos (ID_Cursos, Nome_Curso)\u00a0 VALUES (1, 'Medicina')\r\nINSERT INTO Cursos (ID_Cursos, Nome_Curso)\u00a0 VALUES (2, 'Educa\u00c3\u00a7\u00c3\u00a3o F\u00c3\u00adsica')\r\nINSERT INTO Cursos (ID_Cursos, Nome_Curso)\u00a0 VALUES (3, 'Sistemas de Informa\u00c3\u00a7\u00c3\u00a3o')\r\nINSERT INTO Cursos (ID_Cursos, Nome_Curso)\u00a0 VALUES (4, 'Engenharia')\r\nINSERT INTO Cursos (ID_Cursos, Nome_Curso)\u00a0 VALUES (5, 'F\u00c3\u00adsica Quantica')\r\nINSERT INTO Cursos (ID_Cursos, Nome_Curso)\u00a0 VALUES (6, 'Paisagismo')\r\nGO\r\n\u00a0\r\nINSERT INTO Alunos (ID_Alunos, Nome_Aluno, ID_Cursos)\u00a0 VALUES (1, 'Fabiano Amorim', 2)\r\nINSERT INTO Alunos (ID_Alunos, Nome_Aluno, ID_Cursos)\u00a0 VALUES (2, 'Laerte Junior', 6)\r\nINSERT INTO Alunos (ID_Alunos, Nome_Aluno, ID_Cursos)\u00a0 VALUES (3, 'Fabricio Catae', 5)\r\nINSERT INTO Alunos (ID_Alunos, Nome_Aluno, ID_Cursos)\u00a0 VALUES (4, 'Thiago Zavaschi', 3)\r\nINSERT INTO Alunos (ID_Alunos, Nome_Aluno, ID_Cursos)\u00a0 VALUES (5, 'Diego Nogare', 4)\r\nINSERT INTO Alunos (ID_Alunos, Nome_Aluno, ID_Cursos)\u00a0 VALUES (6, 'Rodolfo Roim', 3)\r\nINSERT INTO Alunos (ID_Alunos, Nome_Aluno, ID_Cursos)\u00a0 VALUES (7, 'Rodrigo Fernandes', 5)\r\nINSERT INTO Alunos (ID_Alunos, Nome_Aluno, ID_Cursos)\u00a0 VALUES (8, 'Nilton Pinheiro', 4)\r\n<\/pre>\n<p>This is what the data looks like:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT * FROM Alunos  SELECT * FROM Cursos\r\n<\/pre>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1182-Fabiano1.jpg\" alt=\"1182-Fabiano1.jpg\" \/><\/p>\n<p>As you can see, in the table <b>Alunos <\/b>I&#8217;ve a column <b>ID_Cursos<\/b> that links to the Course that the Student is signed up for.<\/p>\n<p>Now you know about the tables and the data. Before we go on with the script stuff, I&#8217;d like to show you a picture with the same tables but using something that we are more used to seeing in our real life.<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1182-Fabiano2.jpg\" alt=\"1182-Fabiano2.jpg\" \/><\/p>\n<p>I&#8217;m sure you already saw one of this card indexes in a movie rental store, a library, your dental clinic or even in your college. Before databases became commonplace, this kind of material was very popular and very useful, but in the past few years we have evolved IT systems (thanks to God for that) and now we don&#8217;t need to store our customers&#8217; info into this kind of table since we have SQL Server to do that job.<\/p>\n<p>You&#8217;ll notice that the card index <b>Cursos<\/b> is sorted by <b>ID_Curso<\/b> and the card index<b> Alunos<\/b> is ordered by <b>Name<\/b>.<\/p>\n<p>If you are using the card index and I ask you &#8220;Please, could you tell me what is the Fabiano&#8217;s course?&#8221; , what is the first thing you have to do? First of all you have to scan the Aluno&#8217;s card index to find record with the info about the student Fabiano, then you have to look at the field <b>ID_Cursos <\/b>to know the number of the course, finally you have to search at the card index<b> Cursos<\/b> to find the course relative to the ID you found into the Aluno&#8217;s card.<\/p>\n<p>You just did a join!<\/p>\n<p>Now what if I change my question to this? &#8216;Please, can you give me the details about all students and all their courses? &#8216;. What would you have to do then?<\/p>\n<p>One alternative would entail starting to scan the card index of courses and, for each course you read, you scan the card index for the students that relate to the course, and you would repeat until all the courses are read. That means that, for each course, you would have to scan the whole card index with the students&#8217; info.<\/p>\n<p>Now, can you tell me what will happen if, before we start the process of reading the students card index, I manually re-order the student card index from the Student Name to <b>ID_Cursos<\/b> order?<\/p>\n<p>If I do this, I can just read one Course (which is already sorted by <b>ID_Cursos<\/b>) and read just a small part of the Students card index, because now the <b>ID_Cursos<\/b> = 1 will be the first occurrence I&#8217;ll read in the Course card, and the <b>ID_Course<\/b> = 1 will also be the first card in the Students card index. Did you understand that you don&#8217;t need to scan the Students card index anymore?<\/p>\n<p>This is exactly what SQL Server does with the Merge Join algorithm, it takes advantage of the column orders and performs a very fast join. It will read the rows and, if one of inputs get to the end, it just stops the read, because that means that all possible rows have been joined.<\/p>\n<h1>Sort Merge Join<\/h1>\n<p>As you can see, both sides of the join must be sorted by the key columns before you can use the Merge Join. Sometimes the Query Optimizer can explicitly force a <a href=\"http:\/\/www.simple-talk.com\/sql\/learn-sql-server\/showplan-operator-of-the-week---sort\/\">sort operator<\/a> to precede the merge join operator, or it can read the rows from an index that is already sorted in the expected order.<\/p>\n<p>The following query is using a hint to force the use of the merge join operator.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT Alunos.Nome_Aluno, Cursos.Nome_Curso  FROM Alunos\r\nINNER JOIN Cursos\r\nON Alunos.ID_Cursos = Cursos.ID_Cursos\r\nOPTION (MERGE JOIN)\r\n<\/pre>\n<p>For the query above we&#8217;ve the following execution plan:<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1182-Fabiano3.jpg\" alt=\"1182-Fabiano3.jpg\" \/><\/p>\n<p>You can see in the execution plan that the Query Optimiser explicitly sorted the table<b> Alunos<\/b> by the <b>ID_Cursos<\/b> column so as to use the Merge Join , and the highest cost of the query is the <a href=\"http:\/\/www.simple-talk.com\/sql\/learn-sql-server\/showplan-operator-of-the-week---sort\/\">Sort operator<\/a>.<\/p>\n<p>We could avoid this step just by creating an index into the table<b> Alunos<\/b> by the <b>ID_Cursos<\/b> column. The following creates the index:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">CREATE NONCLUSTERED INDEX ix_ID_Curso ON Alunos(ID_Cursos)<\/pre>\n<p>Now let&#8217;s look at the execution plan again.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT Alunos.Nome_Aluno, Cursos.Nome_Curso  FROM Alunos\r\nINNER JOIN Cursos\r\nON Alunos.ID_Cursos = Cursos.ID_Cursos\r\nOPTION (MERGE JOIN)\r\n<\/pre>\n<p>For the query above we&#8217;ve the following execution plan:<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1182-Fabiano4.jpg\" alt=\"1182-Fabiano4.jpg\" \/><\/p>\n<p>Wow no! Now the plan is using the index and we don&#8217;t need the sort anymore, but we got a plan using a <a href=\"http:\/\/www.simple-talk.com\/sql\/learn-sql-server\/showplan-operator-of-the-week---bookmarkkey-lookup\/\">Key Lookup Operator<\/a>. Yes, that&#8217;s happened because we are asking for the <b>Cursos.Nome_Curso<\/b> column, and we didn&#8217;t include the column into the index, so let&#8217;s try creating a new index including the column <b>Nome_Curso<\/b>.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">CREATE NONCLUSTERED INDEX ix_ID_Curso_Nome_Aluno ON Alunos(ID_Cursos) INCLUDE (Nome_Aluno)<\/pre>\n<p>Let&#8217;s look at the execution plan again.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT Alunos.Nome_Aluno, Cursos.Nome_Curso  FROM Alunos\r\nINNER JOIN Cursos\r\nON Alunos.ID_Cursos = Cursos.ID_Cursos\r\nOPTION (MERGE JOIN)\r\n<\/pre>\n<p>For the query above we&#8217;ve the following execution plan:<\/p>\n<p>Now, everything is perfect.<\/p>\n<h1>Residual Predicate<\/h1>\n<p>Another thing that the Query Optimiser can do, using the Merge Join operator, is to filter a new predicate into the Merge that is not part of the join of the tables, for instance:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT Alunos.Nome_Aluno, Cursos.Nome_Curso  FROM Alunos\r\nLEFT OUTER JOIN Cursos\r\nON Alunos.ID_Cursos = Cursos.ID_Cursos\r\nAND Alunos.Nome_Aluno LIKE 'F%'\r\nOPTION (MERGE JOIN)\r\n<\/pre>\n<p>After checking whether rows from <b>Alunos<\/b> join with <b>Cursos<\/b>, SQL Server can also check whether the <b>Nome_Aluno<\/b> starts with the letter F. To see the residual predicate you can click with the right button on the Merge join Operator and then select &#8216;Properties&#8217;.<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1182-Fabiano6.jpg\" alt=\"1182-Fabiano6.jpg\" \/><\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1182-Fabiano7.jpg\" alt=\"1182-Fabiano7.jpg\" \/><\/p>\n<h1>One to Many and Many to Many Merge Join<\/h1>\n<p>To show you this concept of One to Many, I would like to present a pseudo code of the merge join algorithm. All credits to Craig Freedman that wrote this code into his book.<\/p>\n<p>The code is the following:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">Get first row Cursos from input 1  Get first row Alunos from input 2\r\nWhile not at the end of either input\r\nBegin\r\n\u00a0 If Cursos joins with Alunos\r\n\u00a0 Begin\r\n\u00a0\u00a0\u00a0 Output(Cursos, Alunos)\r\n\u00a0\u00a0\u00a0 Get next row Alunos from input 2\r\n\u00a0 end\r\n\u00a0 else if Cursos &lt; Alunos\r\n\u00a0\u00a0\u00a0 get next row Cursos from input 1\r\n\u00a0 else\r\n\u00a0\u00a0\u00a0 get next row Alunos form input 2\r\nend\r\n<\/pre>\n<p>This is a very interesting code and this code works just for the One to Many joins (read more about it <a href=\"http:\/\/blogs.msdn.com\/b\/craigfr\/archive\/2006\/08\/03\/687584.aspx\">here<\/a>).<\/p>\n<p>When the Merge Join operator was executed using the Many to One algorithm you can see what happened when the property &#8220;Many to Many&#8221; of the operator is false. To see the &#8220;many to many&#8221; property you can click with the right button into the Merge join Operator and then select &#8216;Properties&#8217;.<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1182-Fabiano8.jpg\" alt=\"1182-Fabiano8.jpg\" \/><\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1182-Fabiano9.jpg\" alt=\"1182-Fabiano9.jpg\" \/><\/p>\n<h1>Finally<\/h1>\n<p>I would like to thank you in advance, and ask you to write a comment giving your thoughts on this article, in my last article I received very good comments, and I really appreciate all your feedback, so please keep then coming.<\/p>\n<p>That&#8217;s all folks, I hope you&#8217;ve enjoyed learning about the Merge Join operator, and I&#8217;ll see you soon with more &#8220;Showplan Operators&#8221;.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Did you ever wonder how and why your indexes affect the performance of joins? Once you&#8217;ve read Fabiano&#8217;s unforgettable explanation, you&#8217;ll learn to love the MERGE operator, and plan your indexes so as to allow the Query Optimiser to use it.  <\/p>\n<p>&hellip;<\/p>\n","protected":false},"author":65554,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143525],"tags":[4178,5177,4149,5084,5162,5310,4150],"coauthors":[6809],"class_list":["post-1032","post","type-post","status-publish","format-standard","hentry","category-learn","tag-bi","tag-fabiano-amorim","tag-learn-sql-server","tag-optimiser","tag-showplan","tag-showplan-operator-of-the-week","tag-sql"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1032","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/65554"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=1032"}],"version-history":[{"count":4,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1032\/revisions"}],"predecessor-version":[{"id":74794,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1032\/revisions\/74794"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=1032"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=1032"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=1032"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=1032"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}