Unlocking the Power of FULL OUTER JOIN in SQL: Performance, Use Cases & Examples

A diagram showing the various JOIN types in SQL Server
Comments 0

Share to social media

The JOIN statement is one of the most common operations SQL developers perform. Yet in a world ruled by Inner and Left Joins, the poor Full Outer Join is like Cinderella before the ball – allowed out on only the rarest of occasions. In an (unscientific) survey I once performed of SQL developers at one Fortune 100 firm, half of the developers had never used a Full Outer Join in production code, and the rest had used it in a tiny handful of occasions.

The FULL OUTER JOIN (hereafter ‘FOJ’) is generally considered as a tool for rare corner cases such as comparing or merging different table versions, but the objective of this article is to show that its uses go beyond this, even as far as optimizing query performance in cases that most people would never consider. The FOJ is definitely an underappreciated tool. Cinderella has some tricks up her sleeve.

Understanding SQL Join Types

Let’s start with a quick review of the various join types … and how better to do that than with Venn diagrams? The image below illustrates a basic key value join between two tables:

A basic key value join between two tables.

An INNER JOIN on two tables returns all values common between them. (The ‘values’ in this case are those specified in the join condition).

A LEFT JOIN adds non-matching row from Table A, whereas a RIGHT JOIN yields the mirror image of that. Finally, the FULL OUTER JOIN returns everything: all rows from both tables are included, whether or not they match.

When to Use FULL OUTER JOIN

The most common use of the FOJ is in merging tables that contain some degree of overlap. Let’s consider a simple example: a school system database holds parents in one table and teachers in another. We want to combine these into an email list containing both — but some parents are also teachers, and some teachers are also parents. Because some rows exist in both tables, we can’t simply UNION the two result sets together. A UNION DISTINCT might work, but not only is this expensive performance-wise for larger tables, it requires that all retrieved columns be identical, which may not be the case.

With a FOJ, though, this becomes easy. Our (unrealistically simple) example tables look like this:

We’ll assume two contacts are identical if their emails match, so our join condition is on email. We also want a contact phone number, which for teachers should be their work number, and for parents, their cell number:

(If you wish to experiment with this yourself, Appendix I contains a script that creates these tables and loads sample data).

Structuring the query this way works, but gives a lot of null values — any row that doesn’t exist in both tables will contain nulls on one side:

Any row that doesn't exist in both tables will contain NULLs on one side.

If this is a problem, we can eliminate it by using COALESCE to “fold together” the left and right halves of the result set:

This yields a much more compact result set:

Using COALESCE to 'fold together' the left and right halves of the result set, yields a much more compact results set.

Fast, simple and concise: how can you not appreciate the FULL OUTER JOIN?

Real-World Examples of FULL OUTER JOIN in Action

Now let’s look at a problem for which you might not consider a FOJ. Imagine a table of vehicles held in an extended-stay parking lot. Once per day an attendant goes through the lot and inventories all the vehicles, recording the date and tag number. Our table looks like this:

The task is to write a query that returns all the vehicles that either entered or exited the lot today.  In other words, we want two sets of rows:

  • All vehicles present yesterday who are not there today
  • All vehicles present today who were not there yesterday

The problem, of course, is that we lack vehicle enter and exit dates – they must be calculated from changes in the data. A simple problem … but how best do we solve it?

Being tidy, structured SQL programmers, we begin the query with two Common Table Expressions (CTEs), one defining the “present yesterday” and the other the “present today” list:

Now what? We want a join that gives us all the rows from both tables except where they overlap. In Venn Diagram terms, we want this:

In Venn Diagram terms, we want this.

Using the canonical set operator EXCEPT, we would complete the query as:

This works, but is very slow, as it must make two passes through the table then combine the results. (Note: since the two EXCEPTs are mutually exclusive, we can use UNION ALL rather than the slower UNION DISTINCT).

Also, with this syntax we can directly select only the vehicle tag number — to get the day column, we need to wrap the EXCEPTs in a join, making the query not only cumbersome, but hard to read as well.

Many developers would use an alternate syntax: maintaining the union, but replacing the EXCEPTs with a NOT EXISTS subquery:

Again, it works, but is no faster than our first try — most SQL dialects will produce the same query plan for this as the one above. If our tables are large, a two-pass solution can be unacceptably slow.

FULL OUTER JOIN to the rescue! Look again at our Venn diagram above. What we want is a query that returns Section A and B only, excluding the Section C overlap. We can do this in a single pass using a FULL OUTER JOIN, with a WHERE clause to exclude the overlapping rows:

As we saw above, a simple SELECT * returns two sets of columns: one from each table, and one of which will always be NULL. If that’s inconvenient, we can use the same “folding” technique to eliminate the null values:

Using the random data from the script in Appendix I, this executes nearly twice as fast in SQL Server as the UNION approaches above — a significant improvement. The execution plans for each query reveal why: the FOJ query performs a single hash join on the table to itself, whereas the UNION query must perform this hash join twice (scanning our base table four times in total), then merge the results:

Query 1 (Union of Excepts) versus Query 2 (Full Outer Join).

The amount of data retrieved at each step, along with the presence of indexes, etc, will impact the actual performance difference – but there should be no case where the FOJ is slower than a UNION. Simpler is always better.

Exception Reporting

Another case where the FOJ is useful is in identifying broken or missing links in data. For our last example, consider a pair of tables in an auto enthusiasts’ database, to match owners to the vehicles they possess:

To be valid, each owner must have at least one registered vehicle, and each auto must also have an owner. Normally, referential integrity like this would be enforced via a database constraint, but let’s assume this didn’t happen for some reason, such as the data being imported from an external source. Our task is to write a query that returns both error types: owners without vehicles, and vehicles without owners.

This is another case where your first instinct likely is to be to UNION together the results of two queries: one for each error type. But again, a FOJ is faster and simpler. We’ll use a WHERE clause to include only the error cases, and, as we did above, use the COALESCE function to eliminate nulls. The actual query is very simple:

And on our small batch of test data, it yields the following output:

We've used the COALESCE function to eliminate nulls and, on our small batch of test data, it yields the output shown in the image.

Note: this is the first time in this article we’ve had a 1:M (one to many) relationship between joined tables. That’s fine: FOJ works perfectly well in this scenario. If that’s hard to visualize, think of a FOJ as a LEFT and RIGHT join combined: it returns all rows meeting the JOIN condition, as well as non-matched rows from both tables participating in the join.

Conclusion: Give FULL OUTER JOIN a Chance

FULL OUTER JOIN isn’t simply some historical anomaly of SQL, included only for completeness. Cinderella’s shoe may have only fit one foot, but the FOJ has many use cases for which it is indisputably the best tool for the job. It deserves more respect — and use — than it gets.

To tease Part II of this series, note that our Venn diagrams above apply only for joins where the join condition contains an equal sign. When we join on a different operator — the so-called “non-equi join” — the staid, conservative JOIN statement breaks out the tequila and gets wild.

Appendix I – Sample Script (SQL Server Format)


Article tags

Load comments

About the author

Lee is a 20-year industry veteran who has worked with IBM, AT&T, and many other firms. He holds a degree in physics, and more than 30 software-related patents. He currently works as a project consultant, specializing in database design and optimization. He can be reached at leeasher@myyahoo.com.