In this part of the series we will have a look at different approaches to how Parallel Execution Skew can be addressed via manual re-writes of an affected query. Let’s have a look at a couple of options.
Approach 1: Mimic the new 12c skew aware feature
Looking at the new 12c special skew aware distribution feature that I’ve introduced in the previous instalment of this series, we can see that broadcasting the few skewed values and using a hash distribution for the remaining values is a viable solution to the problem, so we could come up with something like the following:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 |
select count(t_2_filler) from ( -- The first part of the UNION ALL covers the non-popular values -- The optimizer should come up a hash distribution for that part automatically select /*+ monitor */ t_1.id as t_1_id , t_1.filler as t_1_filler , t_2.id as t_2_id , t_2.filler as t_2_filler from t_1 , t_2 where t_2.fk_id_skew = t_1.id and <span style="color: #ff0000;"><strong>t_2.fk_id_skew != 1</strong></span> and <span style="color: #ff0000;"><strong>t_1.id != 1</strong></span> and regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t_1.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') --------- --------- <span style="color: #ff0000;"><strong>union all</strong></span> --------- --------- -- This part covers the popular values -- Assuming that the popular values are just a few out of T_1 this will result in a BROADCAST distribution of that very small row source -- By using the BROADCAST distribution for T_1 the skew of the popular value in T_2.FK_ID_SKEW doesn't matter as we don't base the distribution on the join key -- Optimizer prefers MERGE JOIN due to the "non-sort" operation of the MERGE JOIN -- since the join key is known at parse time as a constant in this particular case -- Therefore it might make sense to force a HASH JOIN here to avoid SORT operations select /*+ use_hash(t_1 t_2) */ t_1.id as t_1_id , t_1.filler as t_1_filler , t_2.id as t_2_id , t_2.filler as t_2_filler from t_1 , t_2 where t_2.fk_id_skew = t_1.id and <span style="color: #ff0000;"><strong>t_2.fk_id_skew = 1</strong></span> and <span style="color: #ff0000;"><strong>t_1.id = 1</strong></span> and regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t_1.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') ); |
I’ve split the original query into two parts: The first part of the new query covers the non-popular values that should distribute well using a hash distribution – it explicitly filters the popular values.
The second part covers the popular values only. Based on the assumption that there are only a few popular values out of many, restricting T_1 on those popular values should actually result in a very small row source in comparison to T_2 – which is filtered on the popular values, too.
So this very small row source should be distributed by BROADCAST automatically by the optimizer, otherwise this could be hinted using a corresponding PQ_DISTRIBUTE hint. By using the BROADCAST distribution for T_1, the skewed value distribution in T_2.FK_ID_SKEW doesn’t matter as this information is not used to distribute the data – in this case it is simply a random distribution based on the PX BLOCK ITERATOR operator of the parallel full table scan of T_2.
Running this query on my test system I’m indeed back to my optimal 15 seconds runtime. However this construct has several disadvantages:
– You need to know your data and explicitly code into the query the popular values. It might be possible to automate this by dynamically generating the SQL based on a previous query that looks for popular values in the join columns, similar to what 12c does under the cover
– It requires accessing the row sources twice, instead of once – certainly a drawback for large row sources and more complex queries where the row source is not a simple table but itself already the result of a join
– It is a SQL construct that might be difficult to implement for tools that generate SQL automatically based on metadata like reporting tools
Here you can see the beauty of the new 12c skew aware distribution, as it does all above automatically and dynamically, but unfortunately applies only to a limited number of scenarios at present, as demonstrated in the previous part of this series.
Are there other ways we can achieve good distribution that don’t have above disadvantages?
Approach 2: Re-map popular values using a remapping table
Another possibility is a re-mapping of the popular values, basically spreading the popular values among many other values and by doing so avoiding the skewed distribution.
Assuming that the join keys are positive integers we could make use of the unused range of negative integer values to re-map the popular values, thus making sure that the remapped values don’t overlap with the original values in the columns. Depending on the actual case different solutions might be required for finding suitable values that can be used for re-mapping.
For that purpose let’s create a table that we can use to re-map our original popular values:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
define distrib_size = 1e4 create table t_redistrib as select a.id as org_id , (a.id * -&distrib_size) + (rownum - offset * &distrib_size) * -1 as new_id from -- Row generator (select /*+ cardinality(&distrib_size) */ null from dual connect by level <= &distrib_size) m , (select id , rownum - 1 as offset from ( -- Put here the skewed values to distribute select 1 as id from dual /* Here could come more values to re-map union all select 3 from dual union all select 7 from dual */ ) ) a ; |
There are probably many ways how this table could be set up and used – above code assumes that there are several positive integer values that are popular in some tables and for each of those values it will generate &distrib_size entries of negative integer values in a non-overlapping way.
You can already see at this stage that this approach still requires you to know the data – as the table needs to be pre-populated, so it doesn’t address point number one above.
Now we can re-write our sample query in the following way to make use of the new table:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
select count(t_2_filler) from ( select /*+ monitor */ t_1_r.new_id as t_1_id , t_1_r.filler as t_1_filler , t_2.id as t_2_id , t_2.filler as t_2_filler from t_2 -- This view distributes the skewed value in the "other" table on as many values as there are in the T_REDISTRIB table for that value -- The COALESCE expression ensures that Oracle will have to join T_1 to T_REDISTRIB before joining T_1 based on the skewed join key to anything else , ( select coalesce(x.new_id, t_1.id) as new_id , t_1.filler from t_1 , (select * from t_redistrib where org_id in (1/* ,3, 7*/)) x where t_1.id = x.org_id (+) ) t_1_r where -- This join maps the skewed value in table with the skewed distribution randomly to a value that maps to the values from the T_REDISTRIB table -- The data will be distributed on that value instead of the original skewed value t_1_r.new_id = case when t_2.fk_id_skew in (1/*, 3, 7*/) then (t_2.fk_id_skew * -&distrib_size) - mod(t_2.id, &distrib_size) - 1 else t_2.fk_id_skew end and regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t_1_r.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') ); |
The approach here is to use a view for the “other” row source holding the popular values as single entries / primary keys that spreads the popular values across &distrib_size other values that don’t overlap with any other existing values in the tables, and doesn’t touch any other value in that table that isn’t popular, hence the outer join between T_1 and T_REDISTRIB. I’ve filtered T_REDISTRIB on the popular values to re-map in that particular usage of the table, assuming that T_REDISTRIB could be used for other joins with other tables / values, too, otherwise this filter is unnecessary.
Then this view is joined to the row source with the popular value. Assuming there is a suitable value in that table (obviously most suitable in this case would be a primary key using positive integers) we can make use of that value to re-map the popular values to the same range of values as used to populate the T_REDISTRIB table and that come out of the view T_1_R for the popular values.
In the example above the popular value 1 according to T_REDISTRIB will be remapped to 10000 values ranging from -10001 to -20000, so instead of one row with value ID = 1 the view T_1_R will return 10000 rows in the negative range mentioned. The join expression “case when t_2.fk_id_skew in (1/*, 3, 7*/) then (t_2.fk_id_skew * -&distrib_size) – mod(t_2.id, &distrib_size) – 1 else t_2.fk_id_skew end” will do the same re-mapping for T_2.FK_ID_SKEW based on the primary key value T_2.ID, so whenever the original value is 1 it will be remapped to some random value in the range between -10001 and -20000.
The net effect of that re-mapping on both sides of the join is that the popular value will be replaced by &distrib_size other values and hence the hash distribution will be based on those replacement values rather than the original popular values without altering the result of the query.
If I run this query on my test system I’m again back to the optimal 15 seconds, so this approach seems to work fine. The good thing here is that we don’t need to access the row sources twice, so we address at least one important disadvantage of the other solution above. However, this construct is still hard to achieve with a generic SQL generator, and when looking at the execution plan the optimizer is clearly confused by the complex join expression and therefore the estimated join cardinality is way off – a potential problem for more complex queries where this join result would be to be joined to other row sources:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | TQ |IN-OUT| PQ Distrib | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 226 | | | | | 1 | SORT AGGREGATE | | 1 | 226 | | | | | 2 | PX COORDINATOR | | | | | | | | 3 | PX SEND QC (RANDOM) | :TQ10003 | 1 | 226 | Q1,03 | P->S | QC (RAND) | | 4 | SORT AGGREGATE | | 1 | 226 | Q1,03 | PCWP | | |* 5 | HASH JOIN | | <span style="color: #ff0000;"><strong>2010M| 423G</strong></span>| Q1,03 | PCWP | | | 6 | PX RECEIVE | | 2000K| 211M| Q1,03 | PCWP | | | 7 | PX SEND HYBRID HASH | :TQ10001 | 2000K| 211M| Q1,01 | P->P | HYBRID HASH| | 8 | STATISTICS COLLECTOR | | | | Q1,01 | PCWC | | | 9 | PX BLOCK ITERATOR | | 2000K| 211M| Q1,01 | PCWC | | | 10 | TABLE ACCESS FULL | T_2 | 2000K| 211M| Q1,01 | PCWP | | | 11 | PX RECEIVE | | 2010K| 220M| Q1,03 | PCWP | | | 12 | PX SEND HYBRID HASH | :TQ10002 | 2010K| 220M| Q1,02 | P->P | HYBRID HASH| |* 13 | HASH JOIN RIGHT OUTER| | 2010K| 220M| Q1,02 | PCWP | | | 14 | PX RECEIVE | | 10000 | 90000 | Q1,02 | PCWP | | | 15 | PX SEND BROADCAST | :TQ10000 | 10000 | 90000 | Q1,00 | S->P | BROADCAST | | 16 | PX SELECTOR | | | | Q1,00 | SCWC | | |* 17 | TABLE ACCESS FULL| T_REDISTRIB | 10000 | 90000 | Q1,00 | SCWP | | | 18 | PX BLOCK ITERATOR | | 2000K| 202M| Q1,02 | PCWC | | | 19 | TABLE ACCESS FULL | T_1 | 2000K| 202M| Q1,02 | PCWP | | ------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 5 - access(COALESCE("T_REDISTRIB"."NEW_ID","T_1"."ID")=CASE "T_2"."FK_ID_SKEW" WHEN 1 THEN "T_2"."FK_ID_SKEW"*(-10000)-MOD("T_2"."ID",1e4)-1 ELSE "T_2"."FK_ID_SKEW" END ) filter( REGEXP_REPLACE ("T_2"."FILLER",'^\s+([[:alnum:]]+)\s+$',' \1',1,1,'c')>= REGEXP_REPLACE ("T_1"."FILLER",'^\s+([[:alnum:]]+)\s+$',' \1',1,1,'c')) 13 - access("T_1"."ID"="T_REDISTRIB"."ORG_ID"(+)) 17 - filter("ORG_ID"(+)=1) |
An Optimized Approach 2
So can we do better? Yes, we can. We can do two things. First, we can put the inline view T_1_R into a stored view, so that it could be used like a regular table/view:
1 2 3 4 5 6 7 8 9 10 11 |
create or replace view t_1_r as select coalesce(x.new_id, t_1.id) as new_id , t_1.filler from t_1 , (select * from t_redistrib where org_id in (1/* ,3, 7*/)) x where t_1.id = x.org_id (+) ; |
Second, at least from Oracle 11g on we can make use of a virtual column for the complex join expression (and for pre-11g versions we could at least create a function-based index on the expression to collect statistics for the optimizer to come up with improved cardinality estimates):
1 2 3 |
alter table t_2 add new_fk_id_skew as (case when t_2.fk_id_skew in (1/*, 3, 7*/) then (t_2.fk_id_skew * -&distrib_size) - mod(t_2.id, &distrib_size) - 1 else t_2.fk_id_skew end); exec dbms_stats.gather_table_stats(null, 't_2', method_opt=>'for all columns size 1 for columns fk_id_skew size 254, new_fk_id_skew size 254', no_invalidate=>false) |
The effect of the virtual column is two-fold: First, we can now simply use the new virtual column in our join expression. Second, when gathering statistics for the virtual column, the optimizer can use these column statistics to come up with much improved join cardinality estimates. Our query would now look like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
select count(t_2_filler) from ( select /*+ monitor */ t_1_r.new_id as t_1_id , t_1_r.filler as t_1_filler , t_2.id as t_2_id , t_2.filler as t_2_filler from t_2 -- This view distributes the skewed value in the "other" table on as many values as there are in the T_REDISTRIB table for that value , t_1_r where -- This join maps the skewed value in table with the skewed distribution randomly to a value that maps to the values from the T_REDISTRIB table -- The data will be distributed on that value instead of the original skewed value t_1_r.new_id = t_2.new_fk_id_skew and regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t_1_r.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') ); |
That looks pretty similar to our original query and should be something that could be generated by reporting tools or similar. The join cardinality estimate looks much better now as well if you check the resulting execution plan.
So this approach at least addresses two out of the three disadvantages mentioned above, which is not too bad. You still need to know the popular values and code that into the view and join expression, but apart from that this approach looks pretty good.
Load comments