Unpivot

One of the sessions at the recent annual UKOUG technical conference (Tech16) was a panel session on optimisation for which I had posted a request for a few advance questions, and one of the questions that appeared in my inbox was as follows: I have a report which is a union all of 27 versions of the statement: select ‘ColumnX_name’… Continue Reading →

One of the sessions at the recent annual UKOUG technical conference (Tech16) was a panel session on optimisation for which I had posted a request for a few advance questions, and one of the questions that appeared in my inbox was as follows:

I have a report which is a union all of 27 versions of the statement:

select
        'ColumnX_name' attr_name, colX attr_val, count(colX) attr_count 
from    tab1 
group by colX

The different versions of the statement are identical in structure but each query is for a different column. This does 27 tablescans – is there a way to get the result using a single tablescan of this table, which consists of roughly 6 million rows.

When I read it my first thought was “columns into rows”, that sounds like a task for the relatively new unpivot clause (the link should take you to the correct spot in the 11.2 reference manual – but Oracle keeps moving the manuals around). It happened to be the first time I’d come across a “real” requirement for the clause, so I couldn’t resist running up a quick test to satisfy myself that it would work as required. Here’s how I created some starting data:

rem
rem     Script:         unpivot_xxxx.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2016
rem

execute dbms_random.seed(0)

create table t1
as
select
        trunc(dbms_random.value(0,10))                          col1,
        trunc(dbms_random.value(20,30))                         col2,
        trunc(sysdate) + trunc(dbms_random.value(50,60))        date3
from
        all_objects
where
        rownum <= 10000
;

-- gather stats, no histograms.

This wasn’t from the first example I ran – initially I created a dataset where all the columns were numeric but after passing that example to the originator of the question I changed what had been col3 column in the original to the date3 column you see above by adding trunc(sysdate) into the defining expression. Here, then, is the code I wrote to demonstrate the principle – first the simple version that will fail then a working version with its results:

prompt  =====================================
prompt  Version that will fail with ORA-01790
prompt  =====================================

select
        col_name, col_value, count(col_value)
from
        t1
unpivot (
        col_value for col_name in (col1, col2, date3)
)
group by
        col_name, col_value
order by
        col_name, col_value
;

prompt  ==================
prompt  Version that works
prompt  ==================

select
        col_name, col_value, count(col_value)
from
        (select col1, col2, to_number(to_char(date3,'yyyymmdd')) col3 from t1)
unpivot (
        col_value for col_name in (col1, col2, col3)
)
group by
        col_name, col_value
order by
        col_name, col_value
;

COL_  COL_VALUE COUNT(COL_VALUE)
---- ---------- ----------------
COL1          0              965
COL1          1             1055
COL1          2              986
COL1          3              981
COL1          4             1008
COL1          5              988
COL1          6              977
COL1          7             1064
COL1          8              950
COL1          9             1026
COL2         20             1032
COL2         21              985
COL2         22              997
COL2         23             1004
COL2         24             1014
COL2         25             1027
COL2         26              968
COL2         27              975
COL2         28             1017
COL2         29              981
COL3   20170123             1034
COL3   20170124              984
COL3   20170125             1001
COL3   20170126              991
COL3   20170127              970
COL3   20170128              995
COL3   20170129              991
COL3   20170130             1044
COL3   20170131              990
COL3   20170201             1000

The unpivot clause has given us two new columns a column called col_name which can take the values COL1′, COL2′, or COL3′; and a column called col_value which takes the value of the column called col1 when col_name = ‘COL1’, the value of the column called col2 when col_name = ‘COL2’, and the value of the column called col3 when col_name = ‘COL3’. (If you’ve used quoting to create a table with mixed case column names you’ll have to use the double-quote mark to delimit the names in this list (viz: “MixedCaseColumnName”, “AnotherMixedCaseColumnName”). Just as with UNION ALL queries, there’s a requirement to match datatypes all the way down. The simple query worked when all three columns were numeric but failed when I changed the 3rd column to a date datatype (the OP discovered the error for me because they had a mixture of character and numeric data). The error was:

    ORA-01790: expression must have same datatype as corresponding expression

With a little adjustment the solution was reasonably successful: the OP turned his original query into a UNION ALL of two aggregated subqueries with unpivot clauses because the original data consisted of two different data types (one of the queries included the necessary conversion functions after aggregating so that the UNION ALL didn’t get the same ORA-01790). I’ve lost the email with the figures, but I think the run-time dropped from 1 minute 50 seconds to 40 seconds as the UNION ALL dropped from 27 subqueries to 2 subqueries. The OP’s response to the ORA-01790 was very interesting, of course, and a good demonstration of the need to consider different possibilities. His workaround involved a rewrite where one part of the UNION ALL was an unpivot on the numeric columns and the other an unpivot on the character columns. Adding a second date column (date4) to my data set the resulting query would be shaped something like this:

select
        *
from    (
        select
                col_name, col_value, count(col_value)
        from
                t1
        unpivot (
                col_value for col_name in (col1, col2)
        )
        group by
                col_name, col_value
        union all
        select
                col_name, to_number(to_char(col_value,'yyyymmdd')), count(col_value)
        from
                t1
        unpivot (
                col_value for col_name in (date3, date4)
        )
        group by
                col_name, col_value
        )
        )
order by
        1,2
;

Remember that there were 6 million rows in the table. Using my original version of the code with explicit coercion for some of the columns means calling (in my case) the to_number(to_char()) functions once per row for every date column – that’s a lot of CPU to do a lot of conversions. By using a UNION ALL and converting after aggregating we have a trade-off: we either do a second tablescan minimising the CPU usage, or we do a single tablescan but spend a lot more CPU doing it. If there’s any doubt about whether the optimizer tried to get too clever with the conversion and run it early we can show that the conversion does run late by examining the “Projection” section of the execution plan. Here’s the plan I pulled from memory after enabling rowsource execution stats (which added a surprising amount of time to the execution) and running the following query:

select * from table(dbms_xplan.display_cursor(null,null,'allstats last projection'));

---------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |      1 |        |     40 |00:00:01.39 |      78 |       |       |          |
|   1 |  SORT ORDER BY          |      |      1 |     27 |     40 |00:00:01.39 |      78 |  2048 |  2048 | 2048  (0)|
|   2 |   VIEW                  |      |      1 |     27 |     40 |00:00:01.39 |      78 |       |       |          |
|   3 |    UNION-ALL            |      |      1 |        |     40 |00:00:01.39 |      78 |       |       |          |
|   4 |     HASH GROUP BY       |      |      1 |     17 |     20 |00:00:00.63 |      39 |  2206K|  1858K| 1345K (0)|
|*  5 |      VIEW               |      |      1 |  20000 |  20000 |00:00:00.50 |      39 |       |       |          |
|   6 |       UNPIVOT           |      |      1 |        |  20000 |00:00:00.29 |      39 |       |       |          |
|   7 |        TABLE ACCESS FULL| T1   |      1 |  10000 |  10000 |00:00:00.09 |      39 |       |       |          |
|   8 |     HASH GROUP BY       |      |      1 |     10 |     20 |00:00:00.76 |      39 |  2209K|  1684K| 1347K (0)|
|*  9 |      VIEW               |      |      1 |  20000 |  20000 |00:00:00.67 |      39 |       |       |          |
|  10 |       UNPIVOT           |      |      1 |        |  20000 |00:00:00.26 |      39 |       |       |          |
|  11 |        TABLE ACCESS FULL| T1   |      1 |  10000 |  10000 |00:00:00.14 |      39 |       |       |          |
---------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - filter("unpivot_view_008"."COL_VALUE" IS NOT NULL)
   9 - filter("unpivot_view_011"."COL_VALUE" IS NOT NULL)

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=2) "from$_subquery$_001"."COL_NAME"[VARCHAR2,5], "from$_subquery$_001"."COL_VALUE"[NUMBER,22],
       "from$_subquery$_001"."COUNT(COL_VALUE)"[NUMBER,22]
   2 - "from$_subquery$_001"."COL_NAME"[VARCHAR2,5], "from$_subquery$_001"."COL_VALUE"[NUMBER,22],
       "from$_subquery$_001"."COUNT(COL_VALUE)"[NUMBER,22]
   3 - STRDEF[5], STRDEF[22], STRDEF[22]
   4 - "unpivot_view_008"."COL_NAME"[CHARACTER,4], "unpivot_view_008"."COL_VALUE"[NUMBER,22],
       COUNT("unpivot_view_008"."COL_VALUE")[22]
   5 - "unpivot_view_008"."COL_NAME"[CHARACTER,4], "unpivot_view_008"."COL_VALUE"[NUMBER,22]
   6 - STRDEF[4], STRDEF[22]
   7 - "col1"[NUMBER,22], "T1"."COL2"[NUMBER,22]
   8 - "unpivot_view_011"."COL_NAME"[CHARACTER,5], "unpivot_view_011"."COL_VALUE"[DATE,7],
       COUNT("unpivot_view_011"."COL_VALUE")[22]
   9 - "unpivot_view_011"."COL_NAME"[CHARACTER,5], "unpivot_view_011"."COL_VALUE"[DATE,7]
  10 - STRDEF[5], STRDEF[7]
  11 - "DATE3"[DATE,7], "T1"."DATE4"[DATE,7]

Notice particularly the column projection information at operation 8 (corresponding to the second hash group by), it’s showing “COL_VALUE[DATE,7]” – we’re aggregating dates; while operation 4 (the first hash group by) shows “COL_VALUE[NUMBER,22]”. It’s not until operation 3 (the parent of operation 8) that we can see that we have three columns of 5, 22, and 22 bytes as the date finally becomes a number.

Conclusion

This note has picked up three points. The primary intent, of course, was to introduce the unpivot clause – a very convenient mechanism for producing certain types of summary output. At the same time we’ve noted a critical restriction on how it can be used and shown two ways of dealing with that restriction – inline coercion, or late coercion after aggregation – and highlighted the cost/benefit analysis that should go into picking the most suitable strategy. Finally we’ve seen an example of how the (rarely used, and rarely needed) “Projection” section of an execution plan can assure us that the coercion must be happening where we hope.