Index Costing Threat


In the previous post I described a problem with the strategy that the optimizer uses for costing an index fast full scan, and the alternative strategy that Oracle had supplied at some point in the 10g timeline to allow a more rational cost to be calculated in special cases. In an earlier post still I had described the problem with the way Oracle derived the clustering_factor of an index, which resulted in index range scans (and full scans) being given a cost that was too high.

No matter how many things you find out about inconsistencies in the costing algorithms there always seem to be just a few more traps waiting for you on your next assignment – and some of them even turn out to be oddities that you had noticed several years ago but had forgotten about because you “just knew” they couldn’t possibly turn into a production problem. Here’s one such oddity that I discovered about 5 years ago, and have just rediscovered as a (known) bug – fortunately a bug with at least two workarounds.

Let’s start by creating the demo data that I’ll be accessing with a simple piece of code that I’m going to hint in two different ways:
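
Here’s a minimal sketch of the sort of script I mean – the object names (t1, t1_i1, v10) and the exact expressions are illustrative only, chosen to match the description below rather than copied from my original test:

    rem  Illustrative only - any indexed varchar2 column holding exactly 10 characters per row will do

    create table t1
    nologging
    as
    with generator as (
            select  rownum  id
            from    dual
            connect by level <= 1e4
    )
    select
            lpad(rownum,10,'0')     v10,            -- exactly 10 characters in every row
            rpad('x',100)           padding         -- a little extra width per row
    from
            generator       g1,
            generator       g2
    where
            rownum <= 1e6
    ;

    create index t1_i1 on t1(v10) nologging;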

In 12c, of course, basic statistics for both the table and the index would be created as the code creates the objects, whereas only the index statistics would be created in 11g and I’d have to create both table and index stats manually for 10g – which is what I’ll do next before querying the data. You’ll note that the index is on a column of type varchar2, populated with a string of exactly 10 characters in every row. With 1 million rows in the table this will give us an index of roughly 1e6 * (11 + 7 + 4) * 100/90 = 25MB … which means a little over 3,000 leaf blocks.
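
For completeness, the statistics call and the two hinted queries would be along the following lines – again a sketch rather than a verbatim copy, with one query hinted to do an index full scan and the other an index fast full scan of the same index (each run twice so that the second execution gives representative figures):

    begin
            dbms_stats.gather_table_stats(
                    ownname          => user,
                    tabname          => 'T1',
                    method_opt       => 'for all columns size 1',
                    cascade          => true        -- gather the index stats at the same time
            );
    end;
    /

    select  /*+ index(t1 t1_i1) */
            count(*)
    from    t1
    where   v10 is not null
    ;

    select  /*+ index_ffs(t1 t1_i1) */
            count(*)
    from    t1
    where   v10 is not null
    ;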

Note that I’ve invoked both queries twice. The important test here is what happens to the costs of the two queries when we set the parameter optimizer_index_cost_adj to a very small value. Now, for a long time (since 9i, basically) this parameter has been viewed as highly undesirable; to a large extent it was a fix to work around some of the problems of over-costing that should have disappeared when system statistics were introduced. Unfortunately various 3rd party applications still insist that this parameter should be set to a ridiculously small value (and often insist that optimizer_index_caching should be set to a ridiculously large value), and a fair number of home-grown applications still use a non-default value (albeit a more realistic one, perhaps) for the parameter.

Here are the plans when we leave the parameter at its default value:
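
(Showing only the operation and cost columns here, and rounding the costs to the figures quoted in the discussion below, rather than reproducing the full dbms_xplan output.)

    ------------------------------------------
    | Id  | Operation         | Name  | Cost |
    ------------------------------------------
    |   0 | SELECT STATEMENT  |       | 3070 |
    |   1 |  SORT AGGREGATE   |       |      |
    |   2 |   INDEX FULL SCAN | T1_I1 | 3070 |
    ------------------------------------------

    -----------------------------------------------
    | Id  | Operation              | Name  | Cost |
    -----------------------------------------------
    |   0 | SELECT STATEMENT       |       |  467 |
    |   1 |  SORT AGGREGATE        |       |      |
    |   2 |   INDEX FAST FULL SCAN | T1_I1 |  467 |
    -----------------------------------------------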

The cost of the index fast full scan is much lower than the cost of the index full scan – and that seems reasonable in this case. Given these figures for the unadjusted costs we might expect that setting the optimizer_index_cost_adj to 1 would reduce the two costs by a factor of 100 – in other words to 31 and 5 respectively. This is what I actually got:
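
(Again operation and cost columns only, with the same rounding.)

    ------------------------------------------
    | Id  | Operation         | Name  | Cost |
    ------------------------------------------
    |   0 | SELECT STATEMENT  |       |   31 |
    |   1 |  SORT AGGREGATE   |       |      |
    |   2 |   INDEX FULL SCAN | T1_I1 |   31 |
    ------------------------------------------

    -----------------------------------------------
    | Id  | Operation              | Name  | Cost |
    -----------------------------------------------
    |   0 | SELECT STATEMENT       |       |  467 |
    |   1 |  SORT AGGREGATE        |       |      |
    |   2 |   INDEX FAST FULL SCAN | T1_I1 |  467 |
    -----------------------------------------------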

The cost of the index full scan is adjusted as expected, but the cost of the index fast full scan doesn’t change – and in this case it means the choice of execution plan would change as we change the optimizer_index_cost_adj. In fact we don’t have to go to the extreme value of 1; looking at the costs above we can infer that (to a first approximation) if we set the parameter to floor(100 * 467/3070) = 15 then the cost of the full scan would become a little less than the cost of the fast full scan and the default plan would change.
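
As a quick sanity check on that threshold (remembering that rounding means the exact crossover might be out by one): 3,070 * 15/100 ≈ 461, which is just below the unchanged fast full scan cost of 467, while 3,070 * 16/100 ≈ 491 is just above it.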

An interesting (and important) point for my example is that this probably wouldn’t make much difference to performance because the index has just been created; the FULL scan would probably do a large number of extremely efficient “db file parallel read” calls because all the required index leaf blocks would be nicely arranged with lots of “logically adjacent” leaf blocks also being physically adjacent. For an index that had grown slowly over time, though, with lots of index leaf blocks split as the index grew, we would probably see a lot of single block “db file sequential read” waits because logically adjacent leaf blocks would be physically randomly scattered throughout the index segment.

So What ?

It’s relatively easy to find “theoretical” issues with the optimizer – such as this one where the arithmetic looks inappropriate – but the first question you have to ask when you discover such a problem is how much it’s likely to matter and whether there’s a viable workaround.

In this case I suspect my initial response must have been to ignore the problem because:

  • No-one should be messing around with that parameter
  • An index fast full scan is rarely a useful execution path anyway
  • If you really need it you can hint it

Of course I had overlooked the fact that various 3rd party applications (like Siebel and SAP) have installation directives that insist on all sorts of silly values for optimizer parameters – including setting optimizer_index_cost_adj to 1; even so the chances of Oracle picking a full scan over a fast full scan seemed fairly remote except for the type of scenario where people end up looking closely at long-running SQL anyway to see how they can optimize it in detail.

Inevitably, of course, once you start combining enough Oracle features you can always find a way to introduce a problem that has been around for years that no-one has noticed before. And then you discover that the problem was fixed years ago, but no-one really got around to mentioning it. There is a workaround to this optimizer anomaly – and I don’t know why it hasn’t been made the default behaviour and published with a “Notable change in behaviour” warning.

In Oracle 10.2.0.5 (and possibly earlier versions of 10g, though I can’t test that) event 38085 was coded to bypass this issue. Set the event to level 1 and the optimizer applies the optimizer_index_cost_adj to the cost of index fast full scans:
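
The usual syntax for a numeric event would be something like the following (shown at session level here, though it could equally go into the spfile or an alter system call):

    alter session set events '38085 trace name context forever, level 1';

With the event set and optimizer_index_cost_adj = 1 the cost of the index fast full scan in my example drops from 467 to 5 – in line with the factor of 100 scaling we expected above.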

So we can fix the problem, and we can fix it in a global fashion rather than having to mess around with hints. Will we ever need to fix it ? Possibly – for example this event is listed in a configuration document I’ve seen for SAP, perhaps as a necessary consequence of their requirement that optimizer_index_cost_adj be set to 1, so there may be cases where the problem shows up in “typical” processing.

For a specific example of the problem having an impact that you probably wouldn’t have predicted and might not recognise there’s a quirky little bug report (Bug 14690310) on MoS which complains about the performance of a dynamic sampling query against a partitioned index-organized table in 11.2.0.3 because the optimizer does an “index full scan” instead of an “index (sample) fast full scan”. The analysis of the problem given in the bug report looks incomplete, but there can be little doubt that the failure to apply the optimizer_index_cost_adj to the calculated cost of the index fast full scan was a contributing factor, and I have constructed a test case that demonstrates the point and shows that the 38085 event addresses the problem.

Summary

It’s always possible to find anomalies and inconsistencies in the optimizer’s calculations, particularly when you try to tweak rarely used parameters. The optimizer_index_cost_adj has been recognised as a source of undesirable side effects for a long time but can still be found in recent installations. One of the “destabilising” side effects of setting the parameter (especially to very low values) is that the optimizer may start to do single block “index full scans” when it would be better to do multi-block “index fast full scans”. There are two global options you can use to counteract this defect: either increase the setting for the optimizer_index_cost_adj (ideally back to the default, of course) or set event 38085 to level 1.

 
