Taking the Guesswork out of SQL Server Performance Profiling Part 2

SQL 2005 wait event statistics are a good step forward, but are only useful for isolated testing, and to get a global impression on how the SQL Server is spending its time. Mario Broodbakker demonstrates why it would be much better if SQL Server collected those statistics per session and per subtask.

Dude, there’s my time…

In the previous article, Dude, where’s my time, I described a method, based on use of SQL Server wait events, for measuring response time and pinpointing the main bottlenecks in your SQL Server applications.

Wait Event tracing can be a very powerful diagnostic technique. It can show you, for example, exactly who is taking this exclusive latch (LATCH_EX) and why. Even just seeing the sequence of wait events can be interesting and revealing. In this article I demonstrate some more powerful, and slightly more complex, examples of where time can go, and where to look for it. We will examine SQL and Windows OS Scheduling latency, and concurrency for scheduler resources, using both regular and parallel queries.

NOTE:
If you need some further reference and background on wait event-based performance profiling please refer to my previous article and the references at the end of that article. In particular, two ex-members of Microsoft’s SQL Server Customer Advisory team, Tom Davidson and Gert Drapers, have been presenting and write on this subject since 2003. For research purposes I still maintain my own external stored procedures on www.sqlinternals.com, I encourage you to experiment with them, and learn how SQL statements are actually executed, how underlying mechanisms work.

The SQL Server Scheduler

As mentioned briefly in the previous article, the user mode scheduler (UMS) takes care of scheduling ‘work’ or a ‘task’ on a worker thread. This can be a real OS thread (sometimes called a system thread) or, when running in fiber mode (lightweight pooling enabled), a fiber or user mode thread.

I want to make it clear that ‘the scheduler’ is not a separately working entity or thread controlling everything; it’s just a code path executed by the task that’s currently running in the context of that scheduler. So ‘the scheduler’ is a logical concept, not a physical thing.

Normally there is one scheduler per CPU (or core), and the scheduler attempts to spread the workload between worker threads. These worker threads can run some code, start waiting when a waitable event occurs, resume work, or start another task running on another thread on the same scheduler and go to sleep until the resource wait is finished.

By setting the ‘affinity mask’ option, a scheduler can be pinned to it’s own CPUs Setting an affinity mask can be beneficial for cache reuse, TLB ((page) address translation look aside buffers) and other CPU or cache-related resource reuse. On my dual core test box it’s interesting to observe the behavior of the Windows OS scheduler decides when it runs a SQL Server thread. If I don’t set the affinity mask, then the Windows OS scheduler evenly spreads the ‘CPU workload’ across the two cores. If I set the affinity mask to use two CPU’s (cores in this case) then Windows will pin the workload to one CPU. For a short running query I cannot measure a noticeable difference, but for high-end workloads I know that it can be beneficial.

NOTE:
The BOL has a further explanation of this concept: http://msdn2.microsoft.com/en-us/library/ms187104.aspx.

Investigating a “killCPU” query using SQL 2005 wait events

To demonstrate the impact of SQL Server and OS scheduler decisions, I want to show some query examples and statistics, and try to explain how to interpret the data.

In the first example a 100% CPU bound query, called killCPU, is run against a 10000 row table (t1). The scripts required to run this query are included with the code download for this article (link):

  • begin_waitstats and end_waitstats take snapshot copies of sys.dm_os_wait_stats and sys.dm_exec_sessions at the beginning and end of a test workload, respectively, and display the difference between the two snapshots
  • beg_schedstats and end_schedstats take snapshots of the sys.dm_os_schedulers view and at the beginning and end of a test workload display the difference between the two.

Here is the killCPU query:

use mytest go exec begin_schedstats go exec begin_waitstats go SELECT COUNT(*)    FROM t1 a    INNER LOOP JOIN t1 b       ON b.c1 = a.c1 option (maxdop 1) go exec end_waitstats go exec end_schedstats go For the sake of the tests, Maxdop is set to 1 to force the query not to use a parallel query plan. I measured the results on my test box, an AMD Athlon64 X2 (dual core) running at 2000Ghz, with SQL Server 2005 SP2. This is the output:

Warning: The join order has been enforced because a local join hint is used.

———–
10000

(1 row(s) affected)

wait_type                      waits            wait_time    signal_wait_time
——————————- ———————————— ——–

session_id CPU_time            tot_sched_time elapsed_time PIO     writes LIO
———- ——————- ————– ———— ——- —— ——
56         31281                31548          31574        0       20     29174

sched_id   preemptive_switches context_switches     idle_switches  yield_count
———- ——————- ——————– ————————–
0          1                    22                   59              157
1          0                    98                   15             7861

The first set of columns of the above output (starting with the wait_type column) displays the differences between the beginning and end snapshots from sys.dm_os_wait_stats. As you can see, the columns are empty, indicating that no wait events were detected in this case.

The second set of columns, starting with session_id, displays the difference between the beginning and end snapshots from sys.dm_exec_sessions. From this output we can see:

  • 291744 LIO’s (logical I/O’s) – the query performs a recursive loop through all rows in t1. This results, according to sys.dm_exec_sessions, in almost 300000 logical reads from cache. Logical I/O is purely CPU and memory bound, which corresponds to the 100% CPU observation when running this query
  • 31.3 seconds were spent on the CPU
  • 31.5 seconds was the total scheduled time, which according to the BOL represents the time the requests for this session were scheduled for execution
  • 31.6 seconds was the total elapsed time (this time includes any wait time)

This means that while this query is running it consumes one full CPU (or core) for the length of the query: in this case 31 seconds.

NOTE:
Interestingly, this same query runs in about 20 seconds on SQL Server 2000 SP4, for the same number of logical I/Os. This seems to indicate that doing the same amount of work for this query is more expensive on SQL Server 2005. Since we are comparing CPU time with CPU, and no waits are involved, wait event analysis can’t help us here. It is possible to investigate this issue further using CPU profiling tools, but that is beyond the scope of this article.

The final set of columns, starting with sched_id, show the differences in the start and end snapshots on sys.dm_os_schedulers. It displays a row of data per regular scheduler, each row shows the preemptive switch, context switches and idle switch count and the count of scheduler yields.

On my dual core machine, we see two rows, indicating two schedulers (one scheduler per CPU core).The query was on Scheduler ‘1’, since it apparently did 7861 scheduler yields. This yielding enables other tasks that are bound to this scheduler to get work done too.

All over the SQL Server engine code are yield calls; this yielding is necessary to prevent CPU bound processes from monopolizing the CPU. For example, if the killCPU query was allowed to run on the CPU without yielding and checking for other SQL Server work, it would run for 30 seconds without allowing any other work within the SQL Server instance. This is a side effect of the user mode scheduler of SQL Server, since the operating system itself doesn’t care about SQL Server’s scheduling; it does its own scheduling and it can suspend a SQL Server worker thread at any time. By taking a quick look at the SQL Server 2005 disassembly with a debugger, I found at least 325 locations where scheduler yield was called. SQL Server has a built-in 4 millisecond time quantum, which a task can use up before yielding.

When a task yields the scheduler, the scheduler code will check to see if there are other runable tasks on the runable queue, and it will check for eventual I/O completions, and timer expirations. The scheduler can switch to another task or return to the original task. This behavior can be seen in the context_switches column. In this case, since only one query was running on the system, it hardly happened and all yields quickly resulted in resuming execution of the killCPU query. You can tell this from the fact that there are no scheduler wait events registered. If there was other work for this scheduler, you would start to see waits for the scheduler (we will see this effect later).

NOTE:
For further information on the workings of the scheduler, I can recommend Sameer Tejani’s Scheduling chapter in ‘SQL Server 2005 Practical Troubleshooting’.

This covers the case of a single killCPU query running on an otherwise idle machine. However, what happens when we have two of these killer queries running at the same time? Two things can happen:

  1. Each query gets it’s own scheduler and thus CPU
  2. The two killCPU queries can end up on the same scheduler

The former is preferable, to use an understatement. Fortunately the SQL Server 2005 Scheduler is pretty smart nowadays. It maintains load average statistics per scheduler, and it decides on a task-by-task basis where each task will run. However, once a task is on a scheduler, it will not move, even if it turned out to be a bad decision to run it on the chosen scheduler.

With SQL Server 2000 the situation is worse: each connection is bound to a specific scheduler (the scheduler being chosen on a round robin basis). If you happen to come into the office and start your work, and your application connection is assigned to the same scheduler as a “bulk load from hell” that your co-worker started just before you…well, you’d better get some coffee (or restart your application, and hope it connects to another scheduler).

Two Killer queries, two schedulers

Let’s start with an example of the ‘good’ case: two killCPU queries, one per scheduler, one per CPU.

Following are the results from the _exec_sessions and _os_schedulers views. For brevity and clarity I only present the statistics for one of the two queries:

session_id CPU_time             tot_sched_time elapsed_time PIO  writes  LIO ———- ——————– ————– ———— ———— ——- 56         31390                31564          31572        0    19      291735 sched_id    preemptive_switches  context_switches  idle_switches  yield_count ———– ——————-  —————-  ————-  ———— 0           0                   95                 16             7545 1           0                   164                14             7921 I observed the following, when running these queries:

  • The elapsed and CPU times are still about 31 seconds.
  • Both tasks do roughly the same number of scheduler yields, as expected
  • You could see wait events for sos_scheduler_yield, but hardly ever above 0 ms
  • Perfmon and task manager report that both the CPUs were 100% busy.

Two killer queries, one scheduler

Now the bad case, two killCPU queries on one scheduler and on one CPU. As I noted earlier, SQL 2005 is pretty smart and it’s pretty hard to get this to happen; just start up the queries in sequence, and wait and see where they are scheduled. On SQL Server 2000 it’s much easier to achieve this. Just connect one session after the other and the sessions get bound to scheduler 0 and scheduler 1 right after each other. Interestingly enough, what you will see in task manager depends on the CPU affinity setting in sp_configure. If you leave the affinity mask option at the default setting of ‘0’, then the Windows scheduler will evenly divide the workload across two CPU’s, and task manager will show each CPU 50% busy. However, if you use an affinity mask of ‘3’ (in this 2 CPU / core case) then each scheduler with its threads will be bound to one CPU. This results in a task manager picture where you will see one idle CPU and one CPU 100% busy. In any event, the end result of the test doesn’t depend on the affinity; in both cases the killCPU queries were bound to the same SQL Server scheduler and they had to context switch between each other. Here are the results from the os_wait_stats, exec_sessions and os_schedulers views for one of the two killCPU sessions:

wait_type                 waits  wait_time            signal_wait_time 
————————- —— ——————– —————-
SOS_SCHEDULER_YIELD       15078  58766                58766         
WRITELOG                  4      15                   15         

session_id CPU_time             tot_sched_time elapsed_time PIO  writes   LIO
———- ——————– ————– ———— —- ——– ——
56         31219                31463          60915        6    27       291897

sched_id    preemptive_switches  context_switches     idle_switches  yield_count
———– ——————– ——————– ————-  ————
0           1                    45                   93             260    
1           0                    14631                16             15264  

From these results we can see that:

  • We waited 58.8 seconds for sos_scheduler_yields
  • The elapsed time is now almost doubled to 60.9 seconds
  • CPU time for the query is still 31 seconds.
  • Both killCPU queries were on scheduler ‘1’.

It is important to reiterate that SQL Server doesn’t maintain wait event statistics on a per session basis. So, the fact that we measured almost 60 seconds of wait does not mean that this session waited 60 seconds. In reality, we cannot tell which session was waiting for how long – just that there were 60 seconds of wait time overall.

Of course, in this simple experiment there were only these two queries running on the server so we can safely deduce that each session waited about the same time: 30 seconds. Every time a session yielded the CPU, it resumed the running of the other query. So, we can conclude that the elapsed time of this query is made up from the 31 seconds CPU time and the 30 seconds wait time,

Although this is not a pretty sight, at least you can measure what is happening! The phenomenon observed in these results is called SOS scheduler pressure. It wasn’t possible to measure this in SQL Server 2000, at all. By using DBCC sqlperf(umsstats) you could see a runnable queue being there, but you could not tell how bad it was or how much of your valuable response time your were losing waiting for busy schedulers (or better worker threads on that scheduler).

Getting task- and session-level statistics

Although SQL 2005 wait event statistics are useful, SQL Server maintains these statistics per scheduler and exposes them accumulated for the whole server. Therefore they lose their value if you need to look at the work done by an individual session (unless you are doing isolated testing), and they also cannot be used to examine individual tasks.

As a result, I developed a set of tools to collect per session statistics from SQL Server. The tools were originally developed for SQL Server 2000 but I have since made a version for SQL Server 2005 SP2 (build 3042). You can download the tools for the various versions of SQL Server from the Tools page of http://www.sqlinternals.com/.

WARNING!
These tools are implemented using an external stored procedure (a dll) and are invasive: they modify ‘in memory’ SQL Server code. This is totally unsupported in a production environment and neither the author nor his employer can be held responsible for any damage or support. Having said this, the tools have been tested in a developer environment, and I’ve used the 2000 tools in heavily loaded benchmark environments without damaging any data (to my knowledge), or causing significant slowdown of the application. Check the readme.txt on how to install and use the tools.

In the following section I will demonstrate how the granular data that my tools provide can be used to illustrate the mechanics of time consumption in SQL Server. This sort of data is especially useful when running multiple queries, or queries that utilize sub tasks: i.e. parallel queries.

I use the xp_sqlstats ‘L’ to load the external stored procedure, and I use either readwait from the command prompt, or the xp_waitstats stored procedure to show per session (and per execution context (EC)) wait statistics.

Here’s what xp_waitstats shows for our single killCPU query:

Spid  EC   ResourceDescription   Time(ms)    Count  SignalTime AvgTime Percentage —– —- ——————— ———– —— ———- ——- ——– 58    0    Elapsed time          31186       0       0          0       n/a   58    0    CPU                   31063       5       0          6212.6  100 % 58    0    SOS_SCHEDULER_YIELD   0           7726    0          0       0 %  58    0    Unaccounted for       123         0       0          0       0 % And here are the stats for one session running two killCPU queries concurrently on one scheduler:

Spid  EC   ResourceDescription   Time(ms)    Count   SignalTime AvgTime Percentage
—– —- ——————— ———– ——- ———- ——- ———-
57    0    Elapsed time          60678       0        0          0       n/a  
57    0    CPU                   31390       5        0          6278    52 %
57    0    SOS_SCHEDULER_YIELD   29171       7787     29171      3.74611 48 %
57    0    Unaccounted for       117         0        0          0        0 %

Here we can see directly why our session takes 60 seconds, instead of deducing it from server wide statistics: next to spending 31 seconds on the CPU, we lose 30 seconds waiting on the scheduler (i.e. due to SQLOS scheduler pressure).

KillCPU plus external CPU pressure

Now let’s introduce a new ‘challenge’: external CPU pressure. To introduce this ‘external pressure’, I use a Windows ResourceKit tool called CPUstress. With CPUstress, you can simulate CPU load, and give the threads applying the load different scheduler priorities or different ‘busy’ loads.

For the next example, I set CPUstress to use two threads, on normal priority (just like SQL Server) one per CPU, each thread using maximum CPU cycles. However, both CPU’s are 100% busy, so now the SQL Server threads need to compete with threads external to SQL Server. Since it’s external to SQL Server, it cannot be measured by SQLServer.

This is what xp_waitstats now shows for our single killCPU query:

Spid  EC   ResourceDescription   Time(ms)    Count   SignalTime AvgTime Percentage —– —- ——————— ———– ——- ———- ——- ———- 51    0    Elapsed time          66216       0        0          0       n/a   51    0    CPU                   33656       1        0          33656   51 %  51    0    SOS_SCHEDULER_YIELD   0           8364     0          0        0 %  51    0    Unaccounted for       32560       0        0          0       49 % The total elapsed time is 66 seconds – 33 seconds on the CPU, and another 33 seconds “Unaccounted for”. I didn’t see any waits happening: ‘my’ CPU cycles and response time are lost to an external consumer, and this is shown as ‘Unaccounted for’ time. This is also reflected in the sys.dm_exec_sessions statistics. The total_sched_time is now equal to elapsed_time, but the CPU_time is still about 33 seconds:

session_id CPU_time             tot_sched_time elapsed_time PIO  writes LIO
———- ——————– ————– ———— —- —— ———
52         32500                62914          62927        0    29     291840

SQL Server doesn’t know better: it thinks we are on the scheduler, and we were in fact: our thread was waiting on the Windows OS run queue, waiting to be scheduled by Windows. The CPU time is still the 33 seconds because the CPU time shown by SQL Server is directly acquired from Windows (through the API call GetThreadTime). Unfortunately the Windows OS doesn’t maintain a counter for OS scheduling waits, unlike many UNIXes out there. So you can only deduce it from statistics such as those shown in the example. In real life, I would keep an eye on the run queue through perfmon, to see if the run queue stays high for prolonged periods.

Another way of losing measurable time can be memory pressure. If your system is running out of memory and starts paging, you cannot see this from the SQL Server statistics, it’s simply out of the view of SQL Server, and the waitstats tools also reports this as ‘unaccounted for’ time.

Increasing Signal wait time

Another way to deuce that you are experiencing SQL Server scheduler pressure is the observation of increasing signal wait times. The signal wait time is the time a task waited between the completion of the wait event and the time it took to get on the scheduler again to continue its work. The busier a scheduler gets, the higher the signal wait times get. Unfortunately, during my tests I also observed that the busier the system the higher all wait times get. This is (very likely) due to the fact that the maintenance of start, signal and stop times is done in the SQL Server code itself, so if a system is very busy it can take longer to process the end of a wait event.

This means that the same physical I/O timed at 10 ms at the OS level, can take 16ms in a PAGEIOLATCH_SH for instance, because it takes time to get scheduled on the CPU, and you only hope that your thread is not pushed off of the CPU, preemptively, by a higher priority thread that is ready to run.

This is something to be aware of: on very CPU bound systems, sometimes the waits are not what they seem.

Let’s take a look at an example. The scripts required to follow through are provided in the code download:

  • create_t1000_t10m_1: creates the necessary tables
  • ranread_1: reads from 2 different 10 million row tables, each table in a separate filegroup on different physical disks on a dual core box. This script also does some CPU intensive work at random intervals.
  • ranwait2: combines intensive CPU work, with random length sleeps

Here are the statistics collected using xp_waitstats, while the two scripts were running:

Spid  EC  ResourceDescription  Time(ms)    Count    SignalTime AvgTime Percent —– — ——————– ———– ——- ———- ——– ——- 54    0   Elapsed time         101622      0        0          0        n/a  54    0   CPU                  40531       4        0          10132.75 40 %  54    0   PAGEIOLATCH_SH       60390       8314     15         7.263652 59 %  54    0   SOS_SCHEDULER_YIELD  0           9785     0          0         0 %  54    0   ASYNC_NETWORK_IO     46          1        0          46        0 %  54    0   Unaccounted for      655         0        0          0         1 % Spid  EC  ResourceDescription  Time(ms)    Count    SignalTime AvgTime Percent —– — ——————– ———– ——– ———- ——- ——- 55    0   Elapsed time         80853       0         0          0       n/a   55    0   CPU                  6219        3         0          2073     8 % 55    0   SOS_SCHEDULER_YIELD  0           1594      0          0        0 % 55    0   WAITFOR              74453       19        0          3918.57 92 % 55    0   Unaccounted for      181         0         0          0        0 % As you can see the session with spid=54 spends 40% of it’s time on the CPU and (almost) 60% waiting for I/O’s. The other session, spid=55, spends 8% of it’s time on the CPU and 92% sleeping, waiting for a WAITFOR timeout to occur. You can also see yielding going on but since both the measurements were separately taken on an idle system, the yielding doesn’t end up in waits but in continuing with the session itself. You see that there is no (or hardly any) signal time. Now, if I start a whole bunch of those scripts, 12 ‘waitfor’ scripts, and 6 ‘random i/o’ scripts, you will see that signal times for the PAGEIOLATCH waits and the wait time for SOS_SCHEDULER_YIELD waits suddenly are way up (I only show 4 sessions for clarity reasons):

Spid  EC   ResourceDescription  Time(ms)    Count    SignalTime AvgTime  Percent
—– —- ——————– ———– ——– ———- ——– ——-
58    0    Elapsed time         107880      0         0          0        n/a 
58    0    CPU                  7234        3         0          2411.33   7 %
58    0    SOS_SCHEDULER_YIELD  14640       1977      14640      7.40515  14 % 
58    0    WAITFOR              84828       19        46         4464.63  79 %
58    0    Unaccounted for      1178        0         0          0         1 %
 
59    0    Elapsed time         104190      0         0          0        n/a 
59    0    CPU                  6953        3         0          2317.66   7 %
59    0    SOS_SCHEDULER_YIELD  14218       1894      14218      7.50686  14 %
59    0    WAITFOR              82078       19        46         4319.89  79 %
59    0    Unaccounted for      941         0         0          0         1 %
 
65    0    Elapsed time         274599      0         0          0        n/a 
65    0    CPU                  39563       4         0          9890.75  14 %
65    0    SOS_SCHEDULER_YIELD  27515       12037     27484      2.28586  10 %
65    0    PAGEIOLATCH_SH       184750      8760      2296       21.0901  67 %
65    0    ASYNC_NETWORK_IO     156         5         31         31.2      0 %
65    0    Unaccounted for      22615       0         0          0         8 %
 
66    0    Elapsed time         271361      0         0          0        n/a 
66    0    CPU                  46656       4         0          11664    17 %
66    0    SOS_SCHEDULER_YIELD  29125       14472     29093      2.01250  11 %
66    0    PAGEIOLATCH_SH       168968      8732      2390       19.3504  62 %
66    0    ASYNC_NETWORK_IO     31          1         0          31        0 %
66    0    Unaccounted for      26581       0         0          0        10 %
 
69    0    Elapsed time         266655      0         0          0        n/a 
69    0    CPU                  39110       4         0          9777.5   15 %
69    0    PAGEIOLATCH_SH       181031      8688      2171       20.8369  68 %
69    0    ASYNC_NETWORK_IO     156         2         78         78        0 %
69    0    SOS_SCHEDULER_YIELD  24031       11886     23968      2.02179   9 %
69    0    Unaccounted for      22327       0         0          0         8 %

From this we learn that a serious workload can increase the overall signal wait times. So increasing wait times can be a sign of scheduler pressure: more or faster CPU’s might help, or a better balanced load, or timing, of heavy CPU bound queries.

Scheduler usage and parallel queries

A different subject, but also related to scheduler usage, is parallel query. When you have multiple CPU’s and thus multiple schedulers, it can happen that queries are parallelized by the optimizer. The next example shows a ‘create clustered index’ on SQL Server 2005 SP2, as seen from xp_waitstats:

Spid  EC  ResourceDescription  Time(ms)    Count    SignalTime AvgTime Percent
—– — ——————– ———– ——– ———- ——- ——-
56    0   Elapsed time         160535      0         0          0        n/a 
56    0   CPU                  516         4         0          129      0 %
56    0   SOS_SCHEDULER_YIELD  0           88        0          0        0 % 
56    0   PAGEIOLATCH_SH       6781        1569      0          4.32186  4 %
56    0   CXPACKET             152984      3         0          50994.6  95 %
56    0   PAGEIOLATCH_EX       31          8         0          3.875    0 %
56    0   PAGELATCH_SH         0           1         0          0        0 %
56    0   WRITELOG             0           1         0          0        0 %
56    0   Unaccounted for      223         0         0          0        0 %
 
56    1   Elapsed time         153008      0         0          0        n/a 
56    1   CPU                  15282       5         0          3056.4   10 %
56    1   SOS_SCHEDULER_YIELD  15          2025      15         0.00740  0 %
56    1   PAGEIOLATCH_SH       20968       962       140        21.7962  14 %
56    1   PAGELATCH_UP         250         52        0          4.80769  0 %
56    1   IO_COMPLETION        112843      11889     15         9.49137  74 %
56    1   SLEEP_TASK           15          1         0          15       0 %
56    1   PAGEIOLATCH_UP       2984        28        0          106.571  2 %
56    1   LATCH_EX             312         5         0          62.4     0 %
56    1   CMEMTHREAD           0           3         0          0        0 %
56    1   CXPACKET             0           1         0          0        0 %
56    1   LATCH_SH             234         1         0          234      0 %
56    1   Unaccounted for      105         0         0          0        0 %
 
56    2   Elapsed time         151231      0         0          0        n/a 
56    2   CPU                  14422       5         0          2884.4   10 %
56    2   LATCH_EX             734         11        0          66.7272  0 %
56    2   SOS_SCHEDULER_YIELD  31          2027      31         0.01529  0 %
56    2   PAGEIOLATCH_SH       19890       897       125        22.1739  13 %
56    2   IO_COMPLETION        111265      11830     46         9.40532  74 %
56    2   PAGELATCH_UP         109         42        0          2.59523  0 %
56    2   SLEEP_TASK           562         36        0          15.6111  0 %
56    2   PAGEIOLATCH_UP       3671        31        0          118.419  2 %
56    2   QUERY_EXECUTION_INDE 15          1         0          15       0 %
56    2   CMEMTHREAD           0           2         0          0        0 %
56    2   CXPACKET             0           1         0          0        0 %
56    2   Unaccounted for      532         0         0          0        0 %

(35 row(s) affected)

For one session id different ‘EC’ columns appear. This shows the different execution context id’s for this one session, which means the query was parallelized. In this case ECID 1 ran on a different scheduler than ECID 2. ECID 0 spend most of it’s time waiting for a CXPACKET wait: waiting for the sub tasks to finish their work. You can nicely see the distribution of wait time per sub task: 10% CPU time, around 14% waiting for physical reads and 74% of the time waiting for IO_COMPLETION. By some tracing I found that this is related to sort/merge traffic related to the build of the clustered index.

Summary

Hopefully, the examples presented here demonstrate why I feel that, while SQL 2005 wait event statistics are useful and a good step forward, it would be much better if SQL Server collected those statistics per session and per subtask.

The way these statistics are represented now means that they are only useful for isolated testing, and to get a global impression on how the SQL Server is spending its time.