{"id":73172,"date":"2015-03-11T04:25:57","date_gmt":"2015-03-11T04:25:57","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/uncategorized\/execution-plans-part-13-sql-trace\/"},"modified":"2021-07-14T13:07:26","modified_gmt":"2021-07-14T13:07:26","slug":"execution-plans-part-13-sql-trace","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/oracle-databases\/execution-plans-part-13-sql-trace\/","title":{"rendered":"Execution Plans Part 13: SQL Trace"},"content":{"rendered":"<p>In parts <a title=\"Execution Plans Part 11: Actuals\" href=\"https:\/\/allthingsoracle.com\/execution-plans-part-11-actuals\/\">11<\/a> and <a title=\"Execution Plans Part 12: Cardinality Feedback\" href=\"https:\/\/allthingsoracle.com\/execution-plans-part-12-cardinality-feedback\/\">12<\/a> of this series I described the &#8220;<em>rowsource execution statistics<\/em>&#8221; that we can collect as Oracle runs a query, then described a strategy for generating and accessing these statistics in a way that was particularly convenient if you could use your own session to run the SQL you wanted to analyze. In this article we\u2019re going to look at a strategy for generating and accessing the statistics when it\u2019s not your session that\u2019s running the SQL.<\/p>\n<h2>Enabling Tracing<\/h2>\n<p>Although you can enable rowsource execution statistics at the system level rather than the session level, it\u2019s not really desirable to do so for two reasons. 
First, the basic overheads \u2013 every single execution of every single statement run by every single session would be doing work to collect the statistics; secondly, aggregation would hide interesting detail \u2013 each child cursor would accumulate the statistics for every execution so you wouldn\u2019t be able to see if there were any special cases that might matter.<\/p>\n<p>To get detail and precision we need to target something smaller, like a single statement, or a single session, and log every step of every execution; and we can do this with <em>&#8220;extended SQL trace\u201d<\/em> in one of its many forms.<\/p>\n<p>There are two popular strategies for enabling extended tracing \u2013 by session or by SQL_ID, the latter becoming possible only in the more recent versions of Oracle. There are numerous articles around the internet describing how to enable session tracing so I\u2019m not going to go into detail about all the possibilities, I\u2019ll just summarize one method which is probably the most approved method; I\u2019ll also restrict myself to one simple example of how to enable statement tracing. 
For one of the most up-to-date notes on what SQL trace can do, you should bookmark Christian Antognini\u2019s <a href="http:\/\/antognini.ch\/2012\/08\/event-10046-full-list-of-levels\/">note on the full list of levels<\/a> of the 10046 trace, and Tanel Poder\u2019s <a href="http:\/\/blog.tanelpoder.com\/2009\/03\/03\/the-full-power-of-oracles-diagnostic-events-part-1-syntax-for-ksd-debug-event-handling\/">note on many more options<\/a> that you can use with tracing.<\/p>\n<h2>Session Tracing<\/h2>\n<p>Historically you could either start tracing your own session with a call to <em>\u201calter session\u201d<\/em>, or you could start tracing in another session with a call to <em><strong>oradebug<\/strong><\/em>; then Oracle introduced some tracing calls in the packages <em><strong>dbms_system<\/strong><\/em> and <em><strong>dbms_session<\/strong><\/em>; but the \u201cofficially approved\u201d mechanism is now through the <em><strong>dbms_monitor<\/strong><\/em> package (although, as you will see in Christian\u2019s blog note, the package still doesn\u2019t quite cover the full range of possible 10046 levels).<\/p>\n<pre>begin\r\n\tdbms_monitor.session_trace_enable(\r\n\t\tsession_id\t=&gt; 99,\r\n\t\tserial_num\t=&gt; 23154,\r\n\t\twaits\t\t=&gt; true,\r\n\t\tbinds\t\t=&gt; true,\r\n\t\tplan_stat\t=&gt; 'all_executions'\r\n\t);\r\nend;\r\n\/\r\n\r\n-- allow some time to pass\r\n\r\nbegin\r\n\tdbms_monitor.session_trace_disable(\r\n\t\tsession_id\t=&gt; 99,\r\n\t\tserial_num\t=&gt; 23154\r\n\t);\r\nend;\r\n\/<\/pre>\n<p>The values for <em><strong>session_id<\/strong><\/em> and <em><strong>serial_num<\/strong><\/em> come from <em><strong>v$session.sid<\/strong><\/em> and <em><strong>v$session.serial#<\/strong><\/em> for an active session. Between the calls to enable and disable tracing, all statements executed by the session will be traced. 
In my sample code above I\u2019ve asked for all wait states and values of bind variables to be dumped into the trace file as well as the execution plan with rowsource execution stats; and I\u2019ve asked for the plan and stats for every single execution of every statement (by default the plan is dumped only on the first execution after tracing starts). I\u2019ll postpone talking about the content of the trace file until after the example of enabling tracing for a statement.<\/p>\n<h2>Statement Tracing<\/h2>\n<p>Although session-level tracing can be very helpful \u2013 especially when testing \u2013 what do you do when you can see a frequently executed statement in a production AWR report that seems to be using a lot of resources, but always seems to be very efficient whenever you test it from your own session? Somewhere, somehow, sometimes, it must be doing something expensive, but you can never see it happening. You could try enabling SQL trace on a session that you know executes this statement \u2013 but (a) you could still be unlucky and find that that session doesn\u2019t have any problems with the statement and (b) you could be tracing a lot of stuff you\u2019re not interested in at the same time.<\/p>\n<p>The solution is in the new 11g syntax for enabling tracing. (For more detail on the power and range of the 11g options, a good starting point is Tanel Poder\u2019s note about <a href=\"http:\/\/tech.e2sn.com\/oracle\/troubleshooting\/oradebug-doc\"><em><strong>oradebug<\/strong><\/em><\/a>.) 
Here\u2019s an example of a call I used recently on a client system demonstrating one use of the method:<\/p>\n<pre>alter system  set events\r\n\t'sql_trace[SQL:1104btkdvqx5k] plan_stat=all_executions,wait=true,bind=true'\r\n;\r\n\r\n-- waited 60 seconds\r\n\r\nalter system  set events 'sql_trace[SQL:1104btkdvqx5k] off';<\/pre>\n<p>In the short wait time between enabling and disabling tracing every execution of the statement with SQL_ID = \u20181104btkdvqx5k\u2019 made the session that was executing it dump the bind variables, plan, and wait states for that execution to disc in the session\u2019s trace file.<\/p>\n<p>The reason I had done this on the client site \u2013 a production system \u2013 was that the performance of various jobs run by the users showed extreme variation in run-time, particularly when the system was under heavy load. This SQL_ID identified a critical, frequently executed, statement that was one of the most important in the system, and by enabling tracing for this brief interval I got lucky and ended up with trace files from 9 separate sessions which, after processing through <em><strong>tkprof<\/strong><\/em>, gave us the results from 16 executions of the statement.<\/p>\n<p>The statement was an anonymous PL\/SQL block, and the effect of enabling tracing on the block was that every SQL statement called by the block was also traced. 
This turned out to be tremendously helpful as it showed us two things \u2013 first that a large fraction of the performance problem related to excessive calls (a couple of thousand) to some security predicates being used for row-level security (a problem that we had already identified by other means and were able to address); secondly that most of the remaining time (and variation) was caused by literally hundreds of calls to the following SQL statement:<\/p>\n<pre>SELECT TO_CHAR(TO_DATE(:B3 ,:B2 ,'NLS_CALENDAR=''GREGORIAN'''),:B1 )FROM DUAL<\/pre>\n<p>Since I had enabled bind variable tracing I went back to the raw trace files and found that the query was always called with one of two possible date values (either 01-01-1000 or 31-12-4712), and checking the output from tkprof I found an SQL statement with the following text in its where clause:<\/p>\n<pre>DATE_COL BETWEEN TO_DATE(UTILS.NORMALIZE_DATE('01-01-1000', 'DD-MM-YYYY'), UTILS.GET_NLS_DATE_FORMAT))\r\n         AND     TO_DATE(UTILS.NORMALIZE_DATE('31-12-4712', 'DD-MM-YYYY'), UTILS.GET_NLS_DATE_FORMAT))<\/pre>\n<p>Predicates using PL\/SQL functions that call SQL are dangerous things \u2013 but it\u2019s often easy to minimize the number of times the SQL is called, typically by declaring the function to be deterministic, or by embedding the call into a scalar subquery viz: (select function_call() from dual). 
A couple of minutes of highly targeted tracing made it very easy to pinpoint two highly relevant issues that could be fixed with relatively little effort.<\/p>\n<h2>Examining the output<\/h2>\n<p>Whether you enable session tracing or statement tracing, the way you then use the resulting trace files is the same \u2013 first you use <em><strong>tkprof<\/strong><\/em> to summarize the content of the trace files, possibly after using the <em><strong>trcsess<\/strong> <\/em>utility to extract and combine data from several trace files; then you look for clues about where the most time was spent or most work was done or the greatest variability appeared; then (optionally) you go back to the trace files to extract and examine the detail of the issue you&#8217;ve identified.<\/p>\n<p>Here\u2019s an example of the <em><strong>tkprof<\/strong> <\/em>output that you might get from a single session after enabling the system-level tracing of a single statement in 11.2.0.4:<\/p>\n<pre>select id\r\nfrom\r\n t1 where access_code = :b1 and filter_code = :b2\r\n\r\ncall     count       cpu    elapsed       disk      query    current        rows\r\n------- ------  -------- ---------- ---------- ---------- ----------  ----------\r\nParse        0      0.00       0.00          0          0          0           0\r\nExecute      6      0.00       0.00          0          0          0           0\r\nFetch       10      0.01       0.01         48       1710          0           6\r\n------- ------  -------- ---------- ---------- ---------- ----------  ----------\r\ntotal       16      0.01       0.02         48       1710          0           6\r\n\r\nMisses in library cache during parse: 0 \r\nMisses in library cache during execute: 1\r\nOptimizer mode: ALL_ROWS\r\nParsing user id: 62  \r\nNumber of plan statistics captured: 6\r\n\r\nRows (1st) Rows (avg) Rows (max)  Row Source Operation\r\n---------- ---------- ----------  ---------------------------------------------------\r\n         2          1          2  TABLE ACCESS BY INDEX ROWID T1 (cr=285 pr=8 pw=0 time=3246 us cost=6 size=12 card=1)\r\n         5       3336      10000   INDEX RANGE SCAN T1_I1 (cr=9 pr=8 pw=0 time=9056 us cost=1 size=0 card=5)(object id 188624)\r\n\r\nElapsed times include waiting on following events:\r\n  Event waited on                             Times   Max. Wait  Total Waited\r\n  ----------------------------------------   Waited  ----------  ------------\r\n  SQL*Net message to client                       8        0.00          0.00\r\n  db file sequential read                         2        0.00          0.00\r\n  SQL*Net message from client                     4        0.00          0.00\r\n  db file scattered read                          3        0.00          0.00<\/pre>\n<p>One of the nice things that 11g introduced was the <em>\u201cthree column\u201d<\/em> summary of rows supplied by each rowsource \u2013 from the 1st execution, the average of all executions captured, and the maximum across all executions. (Note that each value in the <em>&#8220;Max&#8221;<\/em> column could have come from a different set of stats.)<\/p>\n<p>In this case we can pick up three details very easily:<\/p>\n<ol>\n<li>From the summary table we had 6 executions<\/li>\n<li>From the text notes that follow we can see that 6 sets of plan statistics were captured<\/li>\n<li>From the \u201cRows\u201d details in the one plan shown we see a massive variation in rows produced<\/li>\n<\/ol>\n<p>Closer inspection then shows us that the rowsource execution stats in the plan (particularly the <em><strong>cr=<\/strong><\/em> and <em><strong>pr=<\/strong><\/em> figures) are the averages across the 6 executions \u2013 a fact we deduce by dividing the <em><strong>disk<\/strong><\/em> and <em><strong>query<\/strong><\/em> columns in the <em><strong>Fetch<\/strong><\/em> line of the tabular data by 6. 
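As a quick sanity check on that deduction (all figures taken from the tkprof output above), dividing the disk and query totals on the Fetch line by the 6 executions reproduces the pr= and cr= values reported in the TABLE ACCESS line of the plan:

```shell
# Figures from the tkprof summary above:
#   Fetch line:  disk = 48, query = 1710, over 6 executions
#   Plan line:   TABLE ACCESS BY INDEX ROWID T1 (cr=285 pr=8 ...)
echo "cr per execution: $(( 1710 / 6 ))"   # 285 - matches cr=285
echo "pr per execution: $(( 48 / 6 ))"     # 8   - matches pr=8
```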
(It\u2019s unfortunate \u2013 though irrelevant in this case &#8211; that the rowsource lines don\u2019t show us the current (cu) block gets; this would be a useful reminder of the cost of updating indexes when we insert, update, or delete rows in a table).<\/p>\n<p>Given the wide variation in stats, we should be prompted to look at the trace file itself. First we look for the statement so that we can find the cursor number that we need to track (#140131987865592 in this case). Be aware that the cursor could close allowing the cursor number to be re-used later in the trace file.<\/p>\n<pre>PARSING IN CURSOR #140131987865592 len=64 dep=0 uid=62 oct=3 lid=62 tim=1424880601505379 hv=3683358447 ad='74fef4d0' sqlid='a5u6333dsr5rg'\r\nselect id from t1 where access_code = :b1 and filter_code = :b2\r\nEND OF STMT<\/pre>\n<p>Then we can look for the <em>\u201cBind values\u201d<\/em> and <em>\u201cExecution stats\u201d<\/em> for each execution. Here\u2019s the first one (with some lines deleted for simplicity):<\/p>\n<pre>BINDS #140131987865592:\r\n Bind#0\r\n  value=1216\r\n Bind#1\r\n  value=5\r\n...\r\nSTAT #140131987865592 id=1 cnt=2 pid=0 pos=1 obj=188623 op='TABLE ACCESS BY INDEX ROWID T1 (cr=8 pr=1 pw=0 time=405 us cost=6 size=12 card=1)'\r\nSTAT #140131987865592 id=2 cnt=5 pid=1 pos=1 obj=188624 op='INDEX RANGE SCAN T1_I1 (cr=3 pr=1 pw=0 time=350 us cost=1 size=0 card=5)'<\/pre>\n<p>Although it\u2019s cosmetically inelegant we can see that the first execution was fairly precise and efficient. 
Here\u2019s a slightly cleaned version of the STAT lines \u2013 notice that the trace file had <em>&#8220;id=\u201d<\/em> and <em>\u201cpid=\u201d<\/em> entries on each line to give us the parent\/child indentation that we normally expect to see:<\/p>\n<pre>cnt=2  TABLE ACCESS BY INDEX ROWID T1 (cr=8 pr=1 pw=0 time=405 us cost=6 size=12 card=1)  obj=188623\r\ncnt=5    INDEX RANGE SCAN T1_I1       (cr=3 pr=1 pw=0 time=350 us cost=1 size=0  card=5)  obj=188624<\/pre>\n<p>We can now track through the rest of the file checking for the other 5 sets of bind variables and rowsource execution stats to see if we can spot any reason why some of the executions were very expensive. My data set actually had an extreme value which introduced a problem \u2013 here are the details from one of the (two) bad executions:<\/p>\n<pre>BINDS #140131988046400:\r\n Bind#0\r\n  value=-1\r\n Bind#1\r\n  value=-1\r\n\u2026\r\nSTAT #140131988046400 id=1 cnt=0 pid=0 pos=1 obj=188623 op='TABLE ACCESS BY INDEX ROWID T1 (cr=841 pr=30 pw=0 time=9617 us cost=6 size=12 card=1)'\r\nSTAT #140131988046400 id=2 cnt=10000 pid=1 pos=1 obj=188624 op='INDEX RANGE SCAN T1_I1 (cr=22 pr=30 pw=0 time=27520 us cost=1 size=0 card=5)'<\/pre>\n<p>Picking your way through the details in the STAT lines you can see that the index range scan took 22 consistent gets (and 30 physical reads!) to find 10,000 rowids; then the table access took a further 841 consistent gets (and, presumably, a lot of \u201cbuffer is pinned count\u201d) to find no matching rows.<\/p>\n<p>My data set has an <em><strong>access_code<\/strong><\/em> of -1 that identifies a lot of rows, but a <em><strong>filter_code<\/strong><\/em> of -1 that discards them all. It looks as if I need to take some defensive action (e.g. adding the filter_code to the index on access_code) to minimize the impact of that extreme case. 
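Scanning a large trace file for the bind values and STAT lines of one cursor is easy to script. Here's a minimal sketch using grep and awk; the file name sample.trc is illustrative, and its content is built from the trace lines quoted above rather than from a full trace:

```shell
# Build a tiny sample trace extract from the lines shown above
# (a real trace file would of course be much larger)
cat > sample.trc <<'EOF'
BINDS #140131988046400:
 Bind#0
  value=-1
 Bind#1
  value=-1
STAT #140131988046400 id=1 cnt=0 pid=0 pos=1 obj=188623 op='TABLE ACCESS BY INDEX ROWID T1 (cr=841 pr=30 pw=0 time=9617 us cost=6 size=12 card=1)'
STAT #140131988046400 id=2 cnt=10000 pid=1 pos=1 obj=188624 op='INDEX RANGE SCAN T1_I1 (cr=22 pr=30 pw=0 time=27520 us cost=1 size=0 card=5)'
EOF

cursor='#140131988046400'

# rowsource execution stat lines for this cursor
grep "^STAT $cursor " sample.trc

# bind values: print the value= lines that follow the BINDS line for this cursor
awk -v c="BINDS $cursor:" '
    $0 == c                 { capture = 1; next }
    /^(BINDS|STAT|PARSING)/ { capture = 0 }
    capture && /value=/     { print $1 }
' sample.trc
```

In a real trace file you would first search for the PARSING IN CURSOR line of the sqlid you care about to pick up the cursor number, bearing in mind that the number can be re-used once the cursor closes.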
(In passing, this example is a model of a client issue, and I used exactly this method to identify the problem \u2013 I wasn\u2019t allowed to look at the data, but I was allowed to see some trace files I\u2019d asked for.)<\/p>\n<h2>Summary<\/h2>\n<p>From 11g (particularly) it is very easy to target sessions and statements for precise tracing; and the <em><strong>tkprof<\/strong> <\/em>output from a trace file can easily point you in the direction of the most interesting SQL statements \u2013 the 1st\/avg\/max columns for rows returned give you a very nice indicator of data skew.<\/p>\n<p>Although the trace file itself can be a little difficult to read, if you\u2019ve got a specific statement that the <em><strong>tkprof<\/strong><\/em> output has highlighted then it\u2019s fairly straightforward to pick out the interesting lines from the trace file. I typically end up using <em><strong>vi<\/strong><\/em> and searching for the cursor number, but if you\u2019re proficient with Unix utilities it\u2019s easy to use <em><strong>awk<\/strong><\/em> to get the bind values and <em><strong>grep<\/strong><\/em> to get the rowsource execution stat lines.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In parts 11 and 12 of this series I described the &#8220;rowsource execution statistics&#8221; that we can collect as Oracle runs a query, then described a strategy for generating and accessing these statistics in a way that was particularly convenient if you could use your own session to run the SQL you wanted to analyze. 
In this article we\u2019re going&hellip;<\/p>\n","protected":false},"author":101205,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143533],"tags":[4783,48509,48515,48528],"coauthors":[],"class_list":["post-73172","post","type-post","status-publish","format-standard","hentry","category-oracle-databases","tag-execution-plans","tag-sql-trace","tag-statement-tracing","tag-tracing"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/73172","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/101205"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=73172"}],"version-history":[{"count":1,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/73172\/revisions"}],"predecessor-version":[{"id":91662,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/73172\/revisions\/91662"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=73172"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=73172"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=73172"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=73172"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}