It’s time to move away from the shape of an execution plan and turn our attention to some of the numerical information we get from the plan. In this article we’re going to look only at the predictions that the optimizer makes (explain plan), postponing any investigation of actual run-time figures (v$sql_plan_statistics_all) for future instalments.
As a reference point we’re going to start with a very simple query and plan (from an instance of 220.127.116.11):
explain plan for select t1.id, t2.id from t2, t1 where t2.n1 = 15 and t1.n1 = t2.n2 ; select * from table(dbms_xplan.display(null, null, 'projection')); -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 225 | 4500 | 35 (6)| 00:00:01 | |* 1 | HASH JOIN | | 225 | 4500 | 35 (6)| 00:00:01 | | 2 | TABLE ACCESS BY INDEX ROWID| T2 | 15 | 180 | 16 (0)| 00:00:01 | |* 3 | INDEX RANGE SCAN | T2_I1 | 15 | | 1 (0)| 00:00:01 | | 4 | TABLE ACCESS FULL | T1 | 3000 | 24000 | 18 (6)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("T1"."N1"="T2"."N2") 3 - access("T2"."N1"=15) Column Projection Information (identified by operation id): ----------------------------------------------------------- 1 - (#keys=1) "T2"."ID"[NUMBER,22], "T1"."ID"[NUMBER,22] 2 - "T2"."ID"[NUMBER,22], "T2"."N2"[NUMBER,22] 3 - "T2".ROWID[ROWID,10] 4 - "T1"."ID"[NUMBER,22], "T1"."N1"[NUMBER,22]
If you’ve worked through the previous instalments in this series you’ve probably worked out very quickly that the run-time order of operation for this plan is 3, 2, 4, 1, 0: we do an index range scan (line 3) to get data from table t2 (line 2) to build an in-memory hash table, then do a full scan of table t1 (line 4) to probe the hash table for matching data (line 1) passing the result on to the end user (line 0). What we’re going to spend time on in this article is interpreting the numeric data – the columns headed Rows, Bytes, Cost (%CPU), and Time.
A key point to remember when looking at the predictions made by explain plan is that the numbers on a line of a plan describe one execution of that line (the same is not true when you look at the row source execution statistics from the view v$sql_plan_statistics_all); a second, equally important, point is that the Cost and Time figures accumulate upwards from child operation to parent operation.
A final, less helpful, point is that there are plenty of defects that make it hard to produce an accurate interpretation of the numbers – rounding blurs the issue, some figures are missing, some numbers appear on the wrong line, some of the numbers in the plan are not consistent with the numbers you can find in the corresponding 10053 traces. For various reasons you’re likely to find cases (and I will be showing you some very simple cases in later instalments) where the description I give seems to be broken around the edges.
Note: In this example I’ve included the “projection” information from the plan table in the output; that’s not something I usually do but I’ll be talking about it later on in the worked example.
Here’s a quick summary of the columns:
|Rows||The predicted number of rows supplied to its parent by one execution of this operation|
|Bytes||The predicted number of bytes (rows x expected row size) supplied to its parent by one execution of this operation|
|Cost||The prediction of resources required for one execution of this operation – including the resources required for every execution of every descendent of this operation that would be needed for this operation to complete once. The figure in parentheses (%CPU) is the fraction of the resource usage that can be attributed to CPU activity. As we will see below, there are two ways of interpreting the cost.|
|Time||The predicted elapsed time (hours:minutes:seconds) required to execute this operation just once. As with cost, the time for a single execution of the operation includes the time spent in all the executions of all the child operations needed to complete one execution of this operation.|
I’ve been a bit heavy-handed with my use of the word “predicted” and the phrase “one execution” but it’s important to remember that predictions can be (often are) wrong; and that part of the prediction the optimizer makes – and it’s a part that is sometime hidden – is “how many times will this operation execute”, so the figures for one execution of one operation become an important aspect of understanding the totals for the whole plan.
I made the point that there are two ways of interpreting the “cost” column. I’ve argued for many years that the optimizer’s cost figure represents the predicted time to run: (see, for example, “Cost is time”). If you take the cost figure from the execution plan output, multiply it by the single block read time (sreadtim) from the system statistics table (aux_stats$) then – allowing for rounding – the answer you produce will be the time reported in the execution plan. Alternatively, if you don’t want to believe that cost represents time, you could interpret the cost figure, informally, by saying “the resource impact of this query is equivalent to this many (real) single block random reads”.
With that background, let’s apply what we’ve learned to the example at the start of the article. It’s a particularly easy example to start with because each line of the plan executes exactly once. As we saw earlier, the order of operation is: line 3, line 2, line 4, line 1, line 0
- Line 3: index range scan – predicts 15 rows returned, but no bytes! The “projection” information that I requested from dbms_xplan.display() tells us something about the columns produced and forwarded by each line of the plan. In line 3 we can see that the only “column” returned by this operation is described as “T2”.ROWID[ROWID,10]. I don’t know why this didn’t reveal itself in the bytes column with a value of 150 (15 * 10), but perhaps it’s because the rowid is a pseudo-column and doesn’t have an entry in dba_tab_columns to give the optimizer a value for avg_col_len (average column length). The cost for this range scan is 1 – it’s a very small index (root block and 7 leaf blocks), the optimizer has assumed the root block will be cached and that it will do one physical read to find a single leaf block. The predicted time is 1 * sreadtim – which happens to be 6 milliseconds in my case – rounded up to one second.
- Line 2: table access by rowid – the optimizer thinks the 15 rowids we get from line 3 will allow line 2 to acquire 15 table rows and, since there are no filter operations, forward all of them. According to the bytes column the data we acquire from those 15 rows will have an average row length of 12 bytes (12 * 15 = 180). Checking the projection information we see that the columns projected will be: “T2”.”ID”[NUMBER,22], “T2”.”N2″[NUMBER,22], and checking the data dictionary for these columns we see that both have an avg_row_len of 4 bytes, which leaves me wondering why the bytes column didn’t report 120 rather than 180 (perhaps the optimizer is erroneously including in its calculation the t2.n1 column – also with an avg_row_len of 4 – which we used in line 3). The cost of this operation is 16, which is 1 carried forward from line 3 (cost accumulates upwards from child to parent) plus 15 because the optimizer thinks the 15 rows will be scattered across 15 table blocks which will be subject to physical read requests. The predicted time is 16 * sreadtim which is 96 milliseconds, which again rounds up to one second.
- Line 4: table access full of t1 – there are 3,000 rows in the table and the operation doesn’t filter out any of them. The bytes column is 24,000 which is an average of 8 bytes per row; this is good because when we check the projection information we see that the operation returns: “T1”.”ID”[NUMBER,22], “T1”.”N1″[NUMBER,22] and a check of the data dictionary shows that both columns have an avg_row_len of 4 – in this case the arithmetic is consistent with our expectation. The cost is 18 which, by one definition, means the resource requirement is equivalent to 18 single block random reads – even though we know that Oracle will be using multiblock (probably non-random) reads to scan the table – but the optimizer has some built-in arithmetic to adjust between single block and multiblock reads (See, for example, this article I wrote for Oracle Magazine 10 years ago). In this case, though we see that 6% of the cost has been attributed to CPU usage – that’s an interesting reminder that tablescans don’t just consume disc resources, examining every row in a table can use up a lot of CPU. If we want to look more closely at the detail we could query the plan_table directly to check the columns cost, io_cost, and cpu_cost which, in my case, held: (18, 17, 1108201). That article explains how CPU time is factored into the final cost, so I won’t repeat the details here. Again, the result of the conversion from cost to time evaluates to 1 second.
- Line 1: the hash join – you’ll have to take my word for it that the optimizer has actually predicted the row count correctly, I was expecting 225 rows from the join (I happen to know that each row in t2 corresponds to 15 rows in t1). Again, though, there’s an anomaly in the bytes calculation – the prediction gives us an average of 20 bytes per row when the projection tells us it’s just two columns which the data dictionary says each have an avg_col_len of 4; and again it looks as if the optimizer has factored in every column (all 5 of them) used in the plan up to that point. The cost figure is a little interesting – it should be accumulating up the plan from child to parent so, at first glance, you might expect it to be the sum of the costs of lines 2 and 4 (which happen once each for each execution of line 1), but 16 + 18 = 34, not 35. The explanation for the difference comes from the work inherent in doing the hash join itself. If we look at the cost, io_cost, and cpu_cost for those three lines we see the following:
ID OBJECT_NAME COST IO_COST CPU_COST -- --------------- ------ --------- --------- 2 T2 16 16 120643 4 T1 18 17 1108201 1 35 33 2131094
- The I/O costs show a simple accumulation – the hash join itself doesn’t (in this case) introduce any further I/O – but the sum of the CPU costs of lines 2 and 4 comes to just 1,228,844 which doesn’t match the reported CPU cost for line 1 (2131094); the CPU cost inherent in the hash join is a further 902,250. With scaling and rounding, this adds 2 to the total I/O cost to get us to the final cost. The extra CPU for the hash join also explains why the CPU component of cost is still 6%: (35 – 33)/35 = 0.057.
If you’re wondering why I’ve been making so much fuss about the oddities in the bytes column I can give you two reasons – firstly it’s just a convenience that in this very simple sample I can use the column to show that the information that’s supposed to be in a plan can be wrong; secondly, and more importantly, the optimizer’s decision on whether or not to use a hash join, and which input to use as the build and which the probe is (in part) dependent on the data volume i.e. the bytes prediction. The fuss I’ve been making about the detail of the bytes column might explain some of the cases where you’ve seen Oracle picking the wrong execution plan.
We’ve taken a very simple query, and walked through each line of the plan thinking about where the numbers come from and what they mean. In this simple example we’ve seen how cost and time accumulate up the plan and described the relationship between these two numbers; we’ve also seen how the cost is made up of an I/O cost and a CPU cost and noted that the CPU cost can (quite reasonably) be a significant fraction of the total cost.
Our example was so simple that we haven’t really had an opportunity to examine the consequences of the fact that the figures reported for an operation represent a single execution of that operation, and that’s something we will address in the next instalment. Even in this very simple example, though, we have managed to highlight a couple anomalies with the numbers (fortunately only in the bytes column, so far) which may be missing, or might be misleading. As we examine further examples we’ll see increasing numbers of anomalies that can confuse the issue if we stick too rigidly to the basic rules for interpreting the plan – a little flexibility and forgiveness will be needed.