Unpivot

One of the sessions at the recent annual UKOUG technical conference (Tech16) was a panel session on optimisation for which I had posted a request for a few advance questions, and one of the questions that appeared in my inbox was as follows: I have a report which is a union all of 27 versions of the statement: select ‘ColumnX_name’

One of the sessions at the recent annual UKOUG technical conference (Tech16) was a panel session on optimisation for which I had posted a request for a few advance questions, and one of the questions that appeared in my inbox was as follows:

I have a report which is a union all of 27 versions of the statement:

The different versions of the statement are identical in structure but each query is for a different column. This does 27 tablescans – is there a way to get the result using a single tablescan of this table, which consists of roughly 6 million rows.

When I read it my first thought was “columns into rows”, that sounds like a task for the relatively new unpivot clause. It happened to be the first time I’d come across a “real” requirement for the clause, so I couldn’t resist running up a quick test to satisfy myself that it would work as required. Here’s how I created some starting data:

This wasn’t from the first example I ran – initially I created a dataset where all the columns were numeric but after passing that example to the originator of the question I changed what had been col3 column in the original to the date3 column you see above by adding trunc(sysdate) into the defining expression. Here, then, is the code I wrote to demonstrate the principle – first the simple version that will fail then a working version with its results:

The unpivot clause has given us two new columns a column called col_name which can take the values COL1′, COL2′, or COL3′; and a column called col_value which takes the value of the column called col1 when col_name = ‘COL1’, the value of the column called col2 when col_name = ‘COL2’, and the value of the column called col3 when col_name = ‘COL3’. (If you’ve used quoting to create a table with mixed case column names you’ll have to use the double-quote mark to delimit the names in this list (viz: “MixedCaseColumnName”, “AnotherMixedCaseColumnName”). Just as with UNION ALL queries, there’s a requirement to match datatypes all the way down. The simple query worked when all three columns were numeric but failed when I changed the 3rd column to a date datatype (the OP discovered the error for me because they had a mixture of character and numeric data). The error was:

With a little adjustment the solution was reasonably successful: the OP turned his original query into a UNION ALL of two aggregated subqueries with unpivot clauses because the original data consisted of two different data types (one of the queries included the necessary conversion functions after aggregating so that the UNION ALL didn’t get the same ORA-01790). I’ve lost the email with the figures, but I think the run-time dropped from 1 minute 50 seconds to 40 seconds as the UNION ALL dropped from 27 subqueries to 2 subqueries. The OP’s response to the ORA-01790 was very interesting, of course, and a good demonstration of the need to consider different possibilities. His workaround involved a rewrite where one part of the UNION ALL was an unpivot on the numeric columns and the other an unpivot on the character columns. Adding a second date column (date4) to my data set the resulting query would be shaped something like this:

Remember that there were 6 million rows in the table. Using my original version of the code with explicit coercion for some of the columns means calling (in my case) the to_number(to_char()) functions once per row for every date column – that’s a lot of CPU to do a lot of conversions. By using a UNION ALL and converting after aggregating we have a trade-off: we either do a second tablescan minimising the CPU usage, or we do a single tablescan but spend a lot more CPU doing it. If there’s any doubt about whether the optimizer tried to get too clever with the conversion and run it early we can show that the conversion does run late by examining the “Projection” section of the execution plan. Here’s the plan I pulled from memory after enabling rowsource execution stats (which added a surprising amount of time to the execution) and running the following query:

Notice particularly the column projection information at operation 8 (corresponding to the second hash group by), it’s showing “COL_VALUE[DATE,7]” – we’re aggregating dates; while operation 4 (the first hash group by) shows “COL_VALUE[NUMBER,22]”. It’s not until operation 3 (the parent of operation 8) that we can see that we have three columns of 5, 22, and 22 bytes as the date finally becomes a number.

Conclusion

This note has picked up three points. The primary intent, of course, was to introduce the unpivot clause – a very convenient mechanism for producing certain types of summary output. At the same time we’ve noted a critical restriction on how it can be used and shown two ways of dealing with that restriction – inline coercion, or late coercion after aggregation – and highlighted the cost/benefit analysis that should go into picking the most suitable strategy. Finally we’ve seen an example of how the (rarely used, and rarely needed) “Projection” section of an execution plan can assure us that the coercion must be happening where we hope.