My wife and I just welcomed our first child to the world. No, there’s no need to congratulate me; as Chris Rock once said, it’s no big deal, even cockroaches have babies. However, it does mean that I’ve often been up at 3 a.m. rocking my son, mumbling nursery rhymes to him. Which, of course, always has me thinking of hierarchical queries. Of course.
But first, a definition. A hierarchical query is a type of recursive query that returns a resultset that displays the natural relationship of data. Or, if you like pretty mental pictures, hierarchical queries return datasets in which each record is (potentially) holding hands with the record before it.
Using a vanilla select query, you can find out from the EMP table all the staff for whom King is their manager. However, you will then need to run a second select statement to find out all the staff who report to them. Using a hierarchical query, we can find out, in one move, who reports to King, who reports to them and who reports to them. King is Jones’s manager; Jones is Scott’s manager; Scott is Adams’ manager…
Which brings us back to 3 a.m with me singing blearily to my baby: the toe bone’s connected to foot bone; the foot bone’s connected to the ankle bone; the ankle bone’s connected to the shin bone…
Hierarchical data. It’s everywhere, even in nursery rhymes.
Syntax and Keywords
To query hierarchical data, you will need to refer to a family of keywords and build a hierarchical query clause. The hierarchical query clause comes, in a SELECT statement, right after the where clause.
As I said, there are a few keywords we’ll need to get acquainted with in order to understand hierarchical querying. The best way to find out what they are is to zoom in on the hierarchical query clause and study its syntax.
CONNECT BY tells Oracle what the relationship between the parent row and the child row should be. To do this we use a simple condition of the type which we might typically use in a WHERE clause; the only difference here is that we must also use the PRIOR keyword to indicate the parent.
In our earlier example from the EMP table, because we wanted to show managers followed by their employees, the manager row would be the parent, with the employee being the child. This means that whatever was the employee id in the previous – prior - row would be the manager id in the next.
CONNECT BY PRIOR empno = mgr
START WITH tells Oracle what is the root record of our hierarchy. It does this by also employing a condition.
In our earlier example, we wanted to start off with the big kahuna, the man with no manager.
START WITH mgr IS NULL
So if we throw it all together, we end up with:
SELECT ename "Employee", empno "Employee ID", mgr "Manager ID" FROM emp CONNECT BY PRIOR empno = mgr START WITH mgr IS NULL;
We’ve asked our query to start with the big boss, and that’s quite clear. However, from the third record things become a little less clear, as it is not immediately obvious if Scott reports to Jones or directly to King. The LEVEL pseudocolumn returns a number indicating what level of the hierarchy the record is.
Another way to show a node’s position in the hierarchy is to use the SYS_CONNECT_BY_PATH function. This function returns the full path from the root node to the current node, and has the following syntax:
SYS_CONNECT_BY_PATH (column , delimiter)
SELECT ename "Employee", LEVEL, SYS_CONNECT_BY_PATH (ename, '/') "Path" FROM emp CONNECT BY PRIOR empno = mgr START WITH mgr IS NULL;
Blake’s a busy boy. He has five direct reports – Allen, Ward, Martin, Turner and James. If we wanted to sort that list alphabetically, we would not be able to use the normal ORDER BY clause, as that would override the hierarchical order created by the CONNECT BY and we’ll end up with Adams on top and King somewhere in the middle. (GROUP BY overrides the hierarchy too.) To sort within the bounds of our CONNECT BY we must use a different clause:
ORDER SIBLINGS BY
So far we’ve been working with nice, tame data. Our lines are straight, and our hierarchical tree is uncomplicated. However, the real world can be messy like a plate of spaghetti. Imagine if King, instead of being the overall boss, had to report to a board headed by Jones. That would mean that while King is Jones’ boss, Jones is also King’s boss. If we update the EMP table to reflect this, and rerun our query, we get the following error:
ORA-01436: CONNECT BY loop in user data
To tell Oracle to proceed in spite of loops in our data, we need to modify our CONNECT BY statement to CONNECT BY NOCYCLE.
And, if we wish to identify records where any such loops have been resolved, we would need to include the CONNECT_BY_ISCYCLE pseudocolumn in our query. CONNECT_BY_ISCYCLE returns 1 where it identifies a loop; otherwise it returns 0.
SELECT ename "Employee", empno, mgr, CONNECT_BY_ISCYCLE, SYS_CONNECT_BY_PATH (ename, '/') "Path" FROM emp CONNECT BY NOCYCLE PRIOR empno = mgr START WITH ename = 'KING' ORDER SIBLINGS BY ename;
So that’s it.
Except that it’s not. Since 11gR2 we have been able to query hierarchical data using recursive subquery factoring clauses or, in other words, the WITH clause. This brings the ANSI standard to Oracle.
I’ll tell you how it all works in the next part of this article.