Anatomy Of A SELECT Statement – Part 3: Recursive Subquery Factoring

We need a baritone voiceover man. You know how those huge TV shows – 24, Prison Break, Jane the Virgin – always start with a recap sequence to bring you up-to-date in case you’d missed the last episode. And it’s always a deep-voiced male narrator: “Previously on The Walking Dead,” he’ll say. We need that guy. Previously on Anatomy of a… Continue Reading →

We need a baritone voiceover man. You know how those huge TV shows – 24, Prison Break, Jane the Virgin – always start with a recap sequence to bring you up-to-date in case you’d missed the last episode. And it’s always a deep-voiced male narrator: “Previously on The Walking Dead,” he’ll say.

We need that guy.

Previously on Anatomy of a SELECT Statement, we’ve discussed the subquery factoring clause (aka the With Clause), and we’ve discussed the hierarchical query clause (aka connect by). In this episode we’ll be marrying these two topics and talking about recursive subquery factoring.

Recursive subquery factoring. Ah, they love their scary, long names, don’t they? Basically, it is the ability to carry out hierarchical queries using with clauses (which is why it’s sometimes called the recursive with clause). And there’s a good reason for it too. Cos as simple and useful as the connect by syntax is, it is an Oracle-specific solution. However, recursive subquery factoring is ANSI SQL, which means it should pretty much work in any database that speaks the SQL lingua franca. It’s been part of Oracle since 11gR2 and, while it might take a little getting used to, it provides all the same functionality as connect by.

Let’s take a look.

The Basics

Let’s revisit the EMP table and run a query that’ll return the management structure.

SELECT ename "Employee", empno "Employee ID", mgr "Manager ID"
FROM emp
CONNECT BY PRIOR empno = mgr
START WITH mgr IS NULL;

Using the recursive with clause syntax we can obtain the same result. I’ll show you how, and afterwards we’ll study the query in a little detail.

WITH m (empno, "Employee", mgr) AS
 (SELECT empno, ename, mgr
  FROM emp
  WHERE mgr IS NULL
  UNION ALL
  SELECT e.empno, e.ename, m.empno
  FROM emp e, m
  WHERE e.mgr = m.empno)
SELECT * FROM m;

Okay, so it’s a little more verbose, which might seem like a bad thing if you’re used to terse connect by queries. But let’s put that to one side for now, and point out a few things  that we need to know and note.

The with clause must be followed by a list of column or column aliases (in our example it is empno, “Employee”, mgr). This isn’t a requirement for ordinary with clauses, but you’ll get an ORA-32039 error if you omit it when running a recursive query.

The first block is called the anchor member. This is the part that roots your query; it is the equivalent of your START WITH. However, unlike your START WITH, the anchor member is not optional. You must have one.

The second block, which must be cemented to the first by a UNION ALL, is the recursive member. What makes it particularly interesting is that it is joined not to the anchor member, but to the query alias, m in our example. It must always do this (reference the query alias); whereas the anchor member must never reference the query alias. Additionally, the recursive member must reference the query alias no more than once.

There are a few other restrictions, many of them are obvious. The anchor member must always precede the recursive member; they must always have the same number of columns; the recursive member cannot contain a DISTINCT or a GROUP BY.

The Search Clause

The connect by syntax allows for a specialist order by clause, ORDER SIBLINGS BY. What this does, if you don’t remember, is sort returned rows of the same level. The recursive subquery factoring equivalent for this is the search clause. It comes in two flavours: SEARCH DEPTH FIRST and SEARCH BREADTH FIRST. Search Depth First will return the children before the siblings; Search Breadth First does the reverse.

This is how you use it.

WITH m (empno, Employee, mgr) AS
  (SELECT empno, ename, mgr
   FROM emp
   WHERE mgr IS NULL
   UNION ALL
   SELECT e.empno, e.ename, m.empno
   FROM emp e, m
   WHERE e.mgr = m.empno)
SEARCH DEPTH FIRST by empno ASC SET ordseq
SELECT Employee, empno "Employee ID", mgr "Manager ID"
FROM m
ORDER BY ordseq;

In this example, ordseq is a pseudocolumn defined, in this case, as empno ASC. It is later used to order our resultset.

Running this query, as opposed to the earlier unsorted one, will show the results in a more organised version. To make that fact clearer, it would be useful if we had the equivalent of the LEVEL pseudocolumn; fortunately, we can construct one.

WITH m (empno, Employee, mgr, lvl) AS
 (SELECT empno, ename, mgr, 1 lvl
  FROM emp
  WHERE mgr IS NULL
  UNION ALL
  SELECT e.empno, e.ename, m.empno, lvl + 1
  FROM emp e, m
  WHERE e.mgr = m.empno)
SEARCH DEPTH FIRST by empno ASC SET ordseq
SELECT Employee, empno "Employee ID", mgr "Manager ID", lvl "Level"
FROM m
ORDER BY ordseq;
CYCLE PATH

If we have circular data (King is Scott’s boss while Scott is simultaneously King’s boss), we can save our connect by query from endlessly chasing its tail like an overexcited puppy by including the NOCYCLE keyword. (The truth is a little more prosaic; if we have circular data, Oracle would spit out the ORA-01436 error.)

With recursive subquery factoring, things are a little different. To start off, the error we’d get is different: ORA-32044: cycle detected while executing recursive with query. The solution, unsurprisingly, is a little different too. Using the CYCLE keyword we can untangle this knot and, at the same time, create an equivalent for connect by’s connect_by_iscycle.

WITH m (empno, Employee, mgr, lvl) AS
  (SELECT empno, ename, mgr, 1 lvl
   FROM emp
   WHERE mgr IS NULL
   UNION ALL
   SELECT e.empno, e.ename, m.empno, lvl + 1
   FROM emp e, m
   WHERE e.mgr = m.empno)
SEARCH DEPTH FIRST by empno ASC SET ordseq
CYCLE empno SET is_cycle TO 1 DEFAULT 0
SELECT Employee, empno "Employee ID", mgr "Manager ID", lvl "Level", is_cycle
FROM m
ORDER BY ordseq;

And since we’re on a roll, coming up with equivalents for connect by features, we might as well construct one for connect_by_path which, if your memory needs refreshing, maps the path from the root to the current record.

WITH m (empno, Employee, mgr, path) AS
  (SELECT empno, ename, mgr, ename as path
   FROM emp
   WHERE mgr IS NULL
   UNION ALL
   SELECT e.empno, e.ename, m.empno, m.path||' / '||e.ename as path
   FROM emp e, m
   WHERE e.mgr = m.empno)
SEARCH DEPTH FIRST by empno ASC SET ordseq
CYCLE empno SET is_cycle TO 1 DEFAULT 0
SELECT Employee, empno "Employee ID", mgr "Manager ID", is_cycle, path
FROM m
ORDER BY ordseq;
Conclusion

The Pacific island nation of Vanuatu has the most beautiful Pidgin English. It is wittily verbose in places where English is dry and terse. So instead of “son”, they say  boe blong mi (boy that belongs to me); to say “daughter” they simply substitute the ‘boe‘ for ‘gel‘. That’s the beauty of verbosity; it gives you the opportunity to make minor tweaks that may have major effects.

Whenever I need to write a quick, simple query, I still use connect by. (Chances of it being deprecated and phased out any time soon are, I think, slim.) However, when the requirement is more complex, I rely on the more verbose recursive with clause. Understanding how to use both of them will give you the opportunity to do the same.