{"id":105060,"date":"2025-01-08T22:17:27","date_gmt":"2025-01-08T22:17:27","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=105060"},"modified":"2025-01-09T11:07:45","modified_gmt":"2025-01-09T11:07:45","slug":"high-concurrency-data-pipelines-in-fabric","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/data-analytics\/microsoft-fabric\/high-concurrency-data-pipelines-in-fabric\/","title":{"rendered":"High Concurrency Data Pipelines in Fabric"},"content":{"rendered":"<p>Data Pipelines can orchestrate many activities, creating a flow for data ingestion. One of these activities is the notebook execution activity.<\/p>\n<p>However, every time a Data Pipeline executes a notebook, it creates a completely new session and Spark pool.<\/p>\n<p>This makes the Data Pipeline very slow and expensive.<\/p>\n<h2>How Bad It Can Be<\/h2>\n<p>Imagine your pipeline runs a notebook inside a loop. The loop executes the notebook many times.<\/p>\n<p>Each execution means a completely new Spark pool. This is expensive.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1330\" height=\"649\" class=\"wp-image-105061\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/01\/a-screenshot-of-a-computer-description-automatica.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>Besides being expensive, the default configurations for a Spark session and a capacity will not support running these executions in parallel. 
You will need to limit the number of parallel notebook executions in the ForEach activity, as shown in the image below:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"522\" height=\"190\" class=\"wp-image-105062\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/01\/a-screenshot-of-a-computer-description-automatica-1.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<h2>High Concurrency to the Rescue<\/h2>\n<p>The solution is to enable High Concurrency for Data Pipelines running notebooks. This can be done in two steps:<\/p>\n<ul>\n<li>Enable this configuration in the workspace settings<\/li>\n<li>Configure the session tag in the notebook activity<\/li>\n<\/ul>\n<p>In the workspace settings, you will find this option under Spark Settings, as shown in the image below:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"807\" height=\"582\" class=\"wp-image-105063\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/01\/a-screenshot-of-a-computer-description-automatica-2.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<p>After that, the Session Tag configuration defines which notebook activities use this feature. You can create groups of notebook activities, with each group running in a different session. 
You can use any string as the &#8220;Session Tag&#8221;.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"918\" height=\"560\" class=\"wp-image-105064\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/01\/a-screenshot-of-a-computer-description-automatica-3.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<h2>The High Concurrency Results<\/h2>\n<p>The image below compares the execution without high concurrency and with high concurrency.<\/p>\n<p>The execution time dropped from almost 13 minutes to less than 3.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"938\" height=\"366\" class=\"wp-image-105065\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/01\/a-screenshot-of-a-computer-description-automatica-4.png\" alt=\"A screenshot of a computer\n\nDescription automatically generated\" \/><\/p>\n<h2>References<\/h2>\n<p><a href=\"https:\/\/www.youtube.com\/watch?v=oc2b1Xcu1ts\" target=\"_blank\" rel=\"noopener\">Fabric Monday 55: Pipelines High Concurrency to Save Your Time and Money<\/a><\/p>\n<h2>Summary<\/h2>\n<p>If you plan to orchestrate notebooks using Data Pipelines, the High Concurrency configuration is essential.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Data Pipelines can orchestrate many activities, creating a flow for data ingestion. One of these activities is the notebook execution activity. However, every time a Data Pipeline executes a notebook, it creates a completely new session and Spark pool. This makes the Data Pipeline very slow and expensive. 
How bad it can be Imagine your&hellip;<\/p>\n","protected":false},"author":50808,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[159160,159164,2],"tags":[145486,159236,158997,159035],"coauthors":[6810],"class_list":["post-105060","post","type-post","status-publish","format-standard","hentry","category-data-analytics","category-microsoft-fabric","category-other","tag-data-factory","tag-data-pipelines","tag-microsoft-fabric","tag-notebook"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/105060","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/50808"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=105060"}],"version-history":[{"count":1,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/105060\/revisions"}],"predecessor-version":[{"id":105066,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/105060\/revisions\/105066"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=105060"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=105060"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=105060"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=105060"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}