How much should you worry about CXPACKET?

Several experts already wrote about CXPACKET before, there are plenty of articles about this subject on the internet, so, why one more?

I think I can add my two cents to this subject with an example to emphasize an important sentence that several authors wrote, with small differences:

“CXPACKET is not a problem by itself, it’s a symptom of another problem”

What’s CXPACKET ?

Since several authors already wrote about this, I will summarize and give you some links.

A parallelized query is divided and distributed to several threads. If one of these threads run slower than the others, all threads but the slower will register CXPACKET waits, a complaint about waiting for the slower one.

The problem is: why one of the threads is slower than the others? This is the question that will lead you to the root cause of the problem, which can be something even unrelated to parallelism.

Demo

I will use the AdventureWorks2016 sample database (you can download it here) and the Adam Machanic’s script to create big tables inside AdventureWorks (you can download it here).

Let’s illustrate how CXPACKET is caused by other problems. First, I will block a single record inside a transaction:

begin transaction
SELECT *
FROM bigTransactionHistory AS th (updlock)
WHERE th.ProductID =1900

 

Let’s open a different connection to continue. We will need the SPID from this new connection:

select @@SPID
 
SPID

Using this new connection, we will execute the following query:

SELECT TOP (1000) *
FROM bigTransactionHistory AS th (updlock)
WHERE th.ProductID BETWEEN 1001 AND 2001
ORDER BY ActualCost DESC
option (recompile)

You may notice that this query is trying to read the blocked record, so it will be blocked until the former transaction is finished.

Let’s check all the workers being used by this session and see what they are waiting for.

select w.worker_address,last_wait_type
from sys.dm_os_workers w,sys.dm_os_tasks t
where w.task_address=t.task_address
and t.session_id=66
 
workers and CXPACKET

As you may notice in the image below, two of the workers are registering CXPACKET waits, only one worker is registering an LCK_M_U wait.

It’s easy to notice the problem with these scripts: The query is being blocked by a lock. However, it’s registering a lot of CXPACKET waits and only one LCK_M_U. This illustrates very well how CXPACKET easily increases in volume, leaving the root cause of a problem hidden. LCK_M_U amount may be way below the amount of CXPACKET waits and duration, but if you solve this problem, the CXPACKET will disappear, or at least be reduced.

Doesn’t CXPACKET means a problem with parallelism?

A configuration problem with parallelism can cause CXPACKET waits, however, usually will be the cause of only part of the amount of the CXPACKET waits not all of them.

You need to take care of two important configurations:

  • If your query uses too many threads, the overhead of managing the threads can affect the performance, instead of improving it. You can control the number of threads configuring the ‘Max degree of parallelism’
  • If small queries run in parallel, they will consume more than they would with a serialized execution. We can control this by tunning ‘Cost Threshold for parallelism’, setting the minimal cost for a query to be executed in parallel. The ‘Cost’ is a custom SQL Server measure, it’s not measured in time as many mistakenly think.

You can read much more details about CXPACKET and the parallelism configuration in this excellent article.