{"id":101758,"date":"2024-05-15T22:57:07","date_gmt":"2024-05-15T22:57:07","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=101758"},"modified":"2026-05-08T10:24:59","modified_gmt":"2026-05-08T10:24:59","slug":"snake-draft-sorting-in-sql-server-part-3","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/snake-draft-sorting-in-sql-server-part-3\/","title":{"rendered":"Snake Draft Sorting in SQL Server Part 3: Balancing Work Across Tables Within One Database"},"content":{"rendered":"<p><b>This is part 3 of a four-part series on snake-draft (serpentine) sorting to balance parallel work in SQL Server. Parts 1-2 established the technique at the database level: when running parallel jobs against many uneven databases, sort by size descending and assign to workers in a serpentine pattern (1\u21922\u21923\u21923\u21922\u21921\u21921\u21922\u21923&#8230;) to get markedly better load balance than round-robin. <\/b><\/p>\n<p><b>Part 3 extends the technique down one level &#8211; balancing work across 7 unevenly-sized tables within a single database. The same serpentine logic applies, but the work-unit granularity changes from &#8216;database&#8217; to &#8216;table&#8217;, and the worker coordination layer becomes more relevant (parts 1-2 could assume each worker owned its database exclusively; part 3 has workers operating against shared-database tables). <\/b><\/p>\n<p><b>The article walks through the adjustment, measures the runtime reduction on a real workload, and sets up part 4&#8217;s thread-completion coordination problem.<\/b><\/p>\n<div style=\"color: black; padding: 4px 8px; margin: 4px 0 12px 0; border: 1px solid #e00; background: #f7f7f7; font-size: 0.9rem;\">Part of a series: [\u00a0 <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/snake-draft-sorting-in-sql-server-part-1\" rel=\"noopener\">Part 1<\/a> \u00a0|\u00a0 <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/snake-draft-sorting-in-sql-server-part-2\" rel=\"noopener\">Part 2<\/a> \u00a0|\u00a0 <b style=\"color: #999;\">Part 3<\/b> \u00a0|\u00a0 <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/snake-draft-sorting-in-sql-server-part-4\/\" rel=\"noopener\">Part 4<\/a> \u00a0]<\/div>\n<p>In <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/snake-draft-sorting-in-sql-server-part-2\" target=\"_blank\" rel=\"noopener\">part 2 of this series<\/a>, I showed an example implementation of distributing a long-running workload in parallel, in order to finish faster. In reality, though, this involves more than just restoring databases. And I have significant skew to deal with: one database that is many times larger than all the rest and has a higher growth rate. So, even though I had spread out my 9-hour job with 400 databases to run faster by having four threads with 100 databases each, one of the threads still took 5 hours, while the others all finished within 1.5 hours.<\/p>\n<p><img decoding=\"async\" style=\"width: 90%; margin: 0 10px; min-width: 320px;\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/03\/snake-3-graph-1.png\" \/><\/p>\n<p>If we put little slices in there to represent the databases in each thread, it looks something like this:<\/p>\n<p><img decoding=\"async\" style=\"width: 45%; margin: 0 10px; min-width: 320px;\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/03\/snake-3-graph-2.png\" \/><\/p>\n<p><em style=\"color: #bbb;\">(Note: None of these graphs are precise.)<\/em><\/p>\n<p>I did what I suggested previously: I put this behemoth on its own thread. I simply ran my script again with a ThreadCount of 3, without the behemoth database, and just assigned it thread 4 manually.<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> DECLARE @ThreadCount tinyint = 3;\n \n SELECT name, size, r = RowNum, GroupNo, Odd, \n   ThreadID = RANK() OVER \n   (\n     PARTITION BY GroupNo \n     ORDER BY CASE Odd \n       WHEN 1 THEN -RowNum ELSE RowNum END, name\n   )\n FROM\n (\n   SELECT name, size, RowNum, \n          GroupNo = (RowNum-1)\/@ThreadCount,\n          Odd     = (RowNum-1)\/@ThreadCount % 2\n   FROM dbo.DatabasesBySize\n   WHERE name &lt;&gt; N'Behemoth'\n ) AS sq\n ORDER BY r;<\/pre>\n<p>This made the other three threads absorb the databases that used to be in thread 4, which made them take a little bit longer, but this made the collective finish times a little more balanced. Now we were down to 4 hours for the biggest database:<\/p>\n<p><img decoding=\"async\" style=\"width: 45%; margin: 0 10px; min-width: 320px;\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/03\/snake-3-graph-3.png\" \/><\/p>\n<h3>But I want more<\/h3>\n<p>The underlying task here processes data from 7 different tables. Like the databases, these tables aren&#8217;t evenly distributed, and this is most prominently the case in the behemoth database. It is the fastest growing table and many times larger than the others. When I analyzed the runtime for these processes, the thread this database was on still spent a majority of its time processing this one table, while the other tables had to wait:<\/p>\n<p><img decoding=\"async\" style=\"width: 45%; margin: 0 10px; min-width: 320px;\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/03\/snake-3-graph-4.png\" \/><\/p>\n<p>I determined that I could further parallelize the work by putting this one table on its own thread, and the other 6 tables could be processed in the meantime. And we&#8217;re down to a small enough number of objects that I no longer have to do this through code &#8211; I know which table is largest and I know the names of the other 6 tables. I don&#8217;t even have to use my toes! I made two sub-threads here, one with the biggest table, and one with the rest:<\/p>\n<p><img decoding=\"async\" style=\"width: 45%; margin: 0 10px; min-width: 320px;\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/03\/snake-3-graph-5.png\" \/><\/p>\n<p>This is great progress! We&#8217;ve taken a job that used to take 9 hours and shrunk it down to just over 3 hours, with mostly negligible performance impact on the server (aside from slightly elevated CPU).<\/p>\n<h3>But I still want more<\/h3>\n<p>The work in this database is still unevenly distributed, and I knew that I could reduce the runtime even further by parallelizing this table on its own, using pagination. If I can have 4 threads processing this one table simultaneously then, at least in theory, I should be able to have an even bigger impact on runtime.<\/p>\n<p>The table currently has about 161 million relatively wide rows, and a primary key on a column called <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">ElephantID<\/code>. I used the system metadata to tell me how to distribute those rows across 4 threads, but I also only wanted any one thread to process 250,000 rows at a time. This query told me everything I needed to know:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> DECLARE @TableName   sysname = N'dbo.Elephant',\n         @ThreadCount int     = 4,\n         @RowsPerPage int     = 250000;\n\n ;WITH threads AS \n (\n   SELECT ThreadID = value \n   FROM GENERATE_SERIES(1, @ThreadCount)\n )\n SELECT \n     [RowCount],\n     PageCount,\n     RowsPerThread,\n     ThreadID,\n     [FirstRow] = RowsPerThread * (ThreadID - 1) + 1, \n     [LastRow]  = RowsPerThread *  ThreadID, \n     FirstPage  = ((ThreadID-1) * [PageCount] \/ @ThreadCount) + 1,\n     LastPage   = ((ThreadID)   * [PageCount] \/ @ThreadCount)\n  FROM threads CROSS JOIN\n  (\n    SELECT \n\t  [Rowcount], \n\t  RowsPerThread = [Rowcount] \/ @ThreadCount, \n\t  [PageCount]   = [Rowcount] \/ @RowsPerPage\n    FROM\n    (\n      SELECT [Rowcount] = SUM([rows])\n        FROM sys.partitions \n        WHERE [object_id] = OBJECT_ID(@TableName) \n          AND index_id = 1\n    ) AS rowcounts\n  ) AS breakdown;<\/pre>\n<p>Output:<\/p>\n<p><img decoding=\"async\" style=\"width: 90%; margin: 0 10px; min-width: 320px; border: 1px solid #333;\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/03\/snake-3-page-parameters.png\" \/><\/p>\n<p>The jobs or scripts would run that query and pull the information only for their thread, calling the following procedure in a loop for each page in their bucket:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> CREATE PROCEDURE dbo.PagingTheElephant\n   @PageNumber int,\n   @PageSize   int\n AS\n BEGIN\n   SET NOCOUNT ON;\n\n   ;WITH ThisPage AS \n   (\n     SELECT ElephantId\n       FROM dbo.Elephant \n       ORDER BY Id \n       OFFSET @PageSize*(@PageNumber-1) ROWS\n       FETCH NEXT @PageSize ROWS ONLY\n   )\n   SELECT e.ElephantId \/* , e.OtherCols *\/ \n     FROM dbo.Elephant AS e \n     INNER JOIN ThisPage AS tp\n        ON tp.ElephantId = e.ElephantID\n     ORDER BY e.ElephantId;\n END<\/pre>\n<p>Now we&#8217;re really getting somewhere! With four threads processing chunks of that table together, I cut the execution time almost in half:<\/p>\n<p><img decoding=\"async\" style=\"width: 45%; min-width: 320px;\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2024\/03\/snake-3-graph-6.png\" \/><\/p>\n<p>As I mentioned before, I could experiment with adding even more threads to process some of these simultaneously, and possibly bring the runtime down even further. For now, though, I am extremely happy with these improvements.<\/p>\n<h3>Next time<\/h3>\n<p>In my next post &#8211; my last in this series, I promise &#8211; I&#8217;ll show how I coordinate all of the threads finishing with starting the next task, avoiding conflicts or unnecessary delays.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Part 3 of the snake-draft sorting series &#8211; extending the technique from balancing work across databases (parts 1-2) to balancing work across 7 unevenly-sized tables within a single database. Shows how to apply serpentine assignment at the intra-database level and measures the runtime reduction versus naive round-robin.&hellip;<\/p>\n","protected":false},"author":341115,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143527,53,143524,143531],"tags":[4151],"coauthors":[158980],"class_list":["post-101758","post","type-post","status-publish","format-standard","hentry","category-database-administration-sql-server","category-featured","category-sql-server","category-t-sql-programming-sql-server","tag-sql-server"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/101758","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/341115"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=101758"}],"version-history":[{"count":20,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/101758\/revisions"}],"predecessor-version":[{"id":110412,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/101758\/revisions\/110412"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=101758"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=101758"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=101758"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=101758"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}