One of the sessions at the recent annual UKOUG technical conference (Tech16) was a panel session on optimisation for which I had posted a request for a few advance questions, and one of the questions that appeared in my inbox was as follows:
I have a report which is a union all of 27 versions of the statement:
<em>select 'ColumnX_name' attr_name, colX attr_val, count(colX) attr_count
from   tab1
group by colX</em>

The different versions of the statement are identical in structure but each query is for a different column. This does 27 tablescans – is there a way to get the result using a single tablescan of this table, which consists of roughly 6 million rows?
When I read it my first thought was “columns into rows”, that sounds like a task for the relatively new unpivot clause. It happened to be the first time I’d come across a “real” requirement for the clause, so I couldn’t resist running up a quick test to satisfy myself that it would work as required. Here’s how I created some starting data:
rem
rem     Script:         unpivot_xxxx.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2016
rem

execute dbms_random.seed(0)

create table t1
as
select
        trunc(dbms_random.value(0,10))                    col1,
        trunc(dbms_random.value(20,30))                   col2,
        trunc(sysdate) + trunc(dbms_random.value(50,60))  date3
from
        all_objects
where
        rownum <= 10000
;

-- gather stats, no histograms.
This wasn’t from the first example I ran – initially I created a dataset where all the columns were numeric but after passing that example to the originator of the question I changed what had been col3 column in the original to the date3 column you see above by adding trunc(sysdate) into the defining expression. Here, then, is the code I wrote to demonstrate the principle – first the simple version that will fail then a working version with its results:
prompt =====================================
prompt Version that will fail with ORA-01790
prompt =====================================

select
        col_name, col_value, count(col_value)
from
        t1
unpivot (
        col_value for col_name in (col1, col2, date3)
)
group by col_name, col_value
order by col_name, col_value
;

prompt ==================
prompt Version that works
prompt ==================

select
        col_name, col_value, count(col_value)
from
        (select col1, col2, to_number(to_char(date3,'yyyymmdd')) col3 from t1)
unpivot (
        col_value for col_name in (col1, col2, col3)
)
group by col_name, col_value
order by col_name, col_value
;

COL_  COL_VALUE COUNT(COL_VALUE)
---- ---------- ----------------
COL1          0              965
COL1          1             1055
COL1          2              986
COL1          3              981
COL1          4             1008
COL1          5              988
COL1          6              977
COL1          7             1064
COL1          8              950
COL1          9             1026
COL2         20             1032
COL2         21              985
COL2         22              997
COL2         23             1004
COL2         24             1014
COL2         25             1027
COL2         26              968
COL2         27              975
COL2         28             1017
COL2         29              981
COL3   20170123             1034
COL3   20170124              984
COL3   20170125             1001
COL3   20170126              991
COL3   20170127              970
COL3   20170128              995
COL3   20170129              991
COL3   20170130             1044
COL3   20170131              990
COL3   20170201             1000
The unpivot clause has given us two new columns: a column called col_name, which can take the values 'COL1', 'COL2', or 'COL3'; and a column called col_value, which takes the value of the column called col1 when col_name = 'COL1', the value of the column called col2 when col_name = 'COL2', and the value of the column called col3 when col_name = 'COL3'. (If you've used quoting to create a table with mixed-case column names you'll have to use the double-quote mark to delimit the names in this list, viz: "MixedCaseColumnName", "AnotherMixedCaseColumnName".) Just as with UNION ALL queries, there's a requirement to match datatypes all the way down the list. The simple query worked when all three columns were numeric but failed when I changed the third column to a date datatype (the OP discovered the error for me because they had a mixture of character and numeric data). The error was:
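If you haven't met the clause before, it may help to see that the unpivot is logically equivalent to the style of UNION ALL query the OP started from. A sketch against my t1 test table, using only the two numeric columns to keep the datatypes consistent:

```sql
-- Hand-written equivalent of unpivoting col1 and col2: each branch
-- scans t1 once, which is exactly the multiple-tablescan pattern
-- that the unpivot version replaces with a single scan.
select 'COL1' col_name, col1 col_value, count(col1) from t1 group by col1
union all
select 'COL2', col2, count(col2) from t1 group by col2
order by 1, 2
;
```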
ORA-01790: expression must have same datatype as corresponding expression
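The same error appears for any datatype mismatch in the unpivot list. One general way to avoid it, as the "version that works" above did for dates, is to coerce everything to a single datatype in an inline view before unpivoting. A sketch only, with colA and colB as hypothetical stand-ins for the OP's mixed numeric and character columns:

```sql
-- Hypothetical: colA is numeric, colB is character. Converting the
-- numeric column to a string in the inline view gives the unpivot
-- list a single consistent datatype.
select
        col_name, col_value, count(col_value)
from
        (select to_char(colA) colA, colB from tab1)
unpivot (
        col_value for col_name in (colA, colB)
)
group by col_name, col_value
;
```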
The OP's response to the ORA-01790 was very interesting, of course, and a good demonstration of the need to consider different possibilities. Because the original data consisted of two different datatypes, the workaround was to turn the original query into a UNION ALL of just two aggregated subqueries with unpivot clauses: one part unpivoting the numeric columns, the other unpivoting the character columns, with the necessary conversion functions applied after aggregating so that the UNION ALL itself didn't hit the same ORA-01790. The solution was reasonably successful: I've lost the email with the figures, but I think the run-time dropped from 1 minute 50 seconds to about 40 seconds as the UNION ALL dropped from 27 subqueries to 2. Adding a second date column (date4) to my data set, the resulting query would be shaped something like this:
select  *
from    (
        select
                col_name, col_value, count(col_value)
        from    t1
        unpivot (
                col_value for col_name in (col1, col2)
        )
        group by col_name, col_value
        union all
        select
                col_name, to_number(to_char(col_value,'yyyymmdd')), count(col_value)
        from    t1
        unpivot (
                col_value for col_name in (date3, date4)
        )
        group by col_name, col_value
        )
order by 1,2
;
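For contrast, the single-tablescan alternative for this four-column data set would coerce both date columns inline before the unpivot, as my earlier working example did for one date column. A sketch based on my test table:

```sql
-- Single-scan version: both dates converted to numbers row by row
-- in the inline view, so there is one tablescan but the
-- to_number(to_char()) pair runs once per row per date column.
select
        col_name, col_value, count(col_value)
from    (
        select
                col1, col2,
                to_number(to_char(date3,'yyyymmdd')) date3,
                to_number(to_char(date4,'yyyymmdd')) date4
        from    t1
        )
unpivot (
        col_value for col_name in (col1, col2, date3, date4)
)
group by col_name, col_value
order by col_name, col_value
;
```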
Remember that there were 6 million rows in the table. Using my original style of code, with explicit coercion in the inline view, means calling (in my case) the to_number(to_char()) pair once per row for every date column; that's a lot of CPU spent on conversions. The UNION ALL that converts after aggregating gives us a trade-off: either we do a second tablescan and minimise the CPU spent on conversion, or we do a single tablescan but spend a lot more CPU doing it. If there's any doubt about whether the optimizer might get too clever and run the conversion early, we can confirm that it really does run late by examining the "Projection" section of the execution plan. Here's the plan I pulled from memory after enabling rowsource execution stats (which added a surprising amount of time to the execution) and running the following query:
select * from table(dbms_xplan.display_cursor(null,null,'allstats last projection'));

---------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |      1 |        |     40 |00:00:01.39 |      78 |       |       |          |
|   1 |  SORT ORDER BY           |      |      1 |     27 |     40 |00:00:01.39 |      78 |  2048 |  2048 | 2048  (0)|
|   2 |   VIEW                   |      |      1 |     27 |     40 |00:00:01.39 |      78 |       |       |          |
|   3 |    UNION-ALL             |      |      1 |        |     40 |00:00:01.39 |      78 |       |       |          |
|   4 |     HASH GROUP BY        |      |      1 |     17 |     20 |00:00:00.63 |      39 |  2206K|  1858K| 1345K (0)|
|*  5 |      VIEW                |      |      1 |  20000 |  20000 |00:00:00.50 |      39 |       |       |          |
|   6 |       UNPIVOT            |      |      1 |        |  20000 |00:00:00.29 |      39 |       |       |          |
|   7 |        TABLE ACCESS FULL | T1   |      1 |  10000 |  10000 |00:00:00.09 |      39 |       |       |          |
|   8 |     HASH GROUP BY        |      |      1 |     10 |     20 |00:00:00.76 |      39 |  2209K|  1684K| 1347K (0)|
|*  9 |      VIEW                |      |      1 |  20000 |  20000 |00:00:00.67 |      39 |       |       |          |
|  10 |       UNPIVOT            |      |      1 |        |  20000 |00:00:00.26 |      39 |       |       |          |
|  11 |        TABLE ACCESS FULL | T1   |      1 |  10000 |  10000 |00:00:00.14 |      39 |       |       |          |
---------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - filter("unpivot_view_008"."COL_VALUE" IS NOT NULL)
   9 - filter("unpivot_view_011"."COL_VALUE" IS NOT NULL)

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=2) "from$_subquery$_001"."COL_NAME"[VARCHAR2,5],
       "from$_subquery$_001"."COL_VALUE"[NUMBER,22],
       "from$_subquery$_001"."COUNT(COL_VALUE)"[NUMBER,22]
   2 - "from$_subquery$_001"."COL_NAME"[VARCHAR2,5],
       "from$_subquery$_001"."COL_VALUE"[NUMBER,22],
       "from$_subquery$_001"."COUNT(COL_VALUE)"[NUMBER,22]
   3 - STRDEF[5], STRDEF[22], STRDEF[22]
   4 - "unpivot_view_008"."COL_NAME"[CHARACTER,4],
       "unpivot_view_008"."COL_VALUE"[NUMBER,22],
       COUNT("unpivot_view_008"."COL_VALUE")[22]
   5 - "unpivot_view_008"."COL_NAME"[CHARACTER,4], "unpivot_view_008"."COL_VALUE"[NUMBER,22]
   6 - STRDEF[4], STRDEF[22]
   7 - "col1"[NUMBER,22], "T1"."COL2"[NUMBER,22]
   8 - "unpivot_view_011"."COL_NAME"[CHARACTER,5],
       "unpivot_view_011"."COL_VALUE"[DATE,7],
       COUNT("unpivot_view_011"."COL_VALUE")[22]
   9 - "unpivot_view_011"."COL_NAME"[CHARACTER,5], "unpivot_view_011"."COL_VALUE"[DATE,7]
  10 - STRDEF[5], STRDEF[7]
  11 - "DATE3"[DATE,7], "T1"."DATE4"[DATE,7]
Notice particularly the column projection information at operation 8 (corresponding to the second hash group by): it shows "COL_VALUE"[DATE,7], so we're aggregating dates, while operation 4 (the first hash group by) shows "COL_VALUE"[NUMBER,22]. It's not until operation 3 (the parent of operation 8) that we see three columns of 5, 22, and 22 bytes as the date finally becomes a number.
Conclusion
This note has picked up three points. The primary intent, of course, was to introduce the unpivot clause – a very convenient mechanism for producing certain types of summary output. At the same time we’ve noted a critical restriction on how it can be used and shown two ways of dealing with that restriction – inline coercion, or late coercion after aggregation – and highlighted the cost/benefit analysis that should go into picking the most suitable strategy. Finally we’ve seen an example of how the (rarely used, and rarely needed) “Projection” section of an execution plan can assure us that the coercion must be happening where we hope.