{"id":109133,"date":"2026-04-14T14:04:00","date_gmt":"2026-04-14T14:04:00","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=109133"},"modified":"2026-04-14T07:56:05","modified_gmt":"2026-04-14T07:56:05","slug":"data-manipulation-techniques-in-esproc-spl-a-complete-guide","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/development\/python\/data-manipulation-techniques-in-esproc-spl-a-complete-guide\/","title":{"rendered":"Data manipulation techniques in esProc SPL: a complete guide"},"content":{"rendered":"\n<p><strong>In this article, part of the <a href=\"https:\/\/www.red-gate.com\/simple-talk\/collections\/moving-from-python-to-esproc-spl\/\" target=\"_blank\" rel=\"noreferrer noopener\">&#8220;Moving from Python to esProc&#8221;<\/a> series, we&#8217;ll look at <a href=\"https:\/\/www.esproc.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">esProc SPL&#8217;s<\/a> data manipulation capabilities compared to <a href=\"https:\/\/www.red-gate.com\/simple-talk\/tag\/python\/\" target=\"_blank\" rel=\"noreferrer noopener\">Python<\/a>.<\/strong><\/p>\n\n\n\n<p>As you may already know, real-world data rarely arrives in a clean, analysis-ready format. You&#8217;ll frequently need to clean <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/theory-and-design\/bad-data-and-dirty-databases\/\" target=\"_blank\" rel=\"noreferrer noopener\">messy data<\/a>, reshape it to suit your analysis needs, merge multiple datasets, and perform complex filtering and calculations before extracting meaningful insights. Luckily, <a href=\"https:\/\/www.esproc.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">esProc SPL<\/a> handles these common data manipulation tasks with remarkable elegance and efficiency.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-to-clean-messy-data-with-esproc-spl\">How to clean &#8216;messy&#8217; data with esProc SPL<\/h2>\n\n\n\n<p>Datasets are rarely perfect. <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/missing-data-2\/\" target=\"_blank\" rel=\"noreferrer noopener\">Missing values<\/a>, inconsistent formats, and outliers can significantly impact analysis quality. Let&#8217;s look at how esProc SPL handles these <a href=\"https:\/\/support.microsoft.com\/en-us\/office\/top-ten-ways-to-clean-your-data-2844b620-677c-47a7-ac3e-c2e157d1db19\" target=\"_blank\" rel=\"noreferrer noopener\">data cleaning<\/a> challenges &#8211; often some of the most time-consuming aspects of data analysis.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-how-to-identify-and-handle-missing-values-in-esproc-spl\">How to identify and handle missing values in esProc SPL<\/h3>\n\n\n\n<p>Missing values occur when data is incomplete. You need to detect and handle them to ensure accurate results. This flowchart illustrates the systematic approach to dealing with missing data in a dataset:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"736\" height=\"513\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/03\/image-2.png\" alt=\"A flowchart illustrating the systematic approach to dealing with missing data in a dataset.\" class=\"wp-image-109135\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/03\/image-2.png 736w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/03\/image-2-300x209.png 300w\" sizes=\"auto, (max-width: 736px) 100vw, 736px\" \/><\/figure>\n\n\n\n<p>It begins with loading raw data, then moves to identifying missing values through counting and visualization techniques. Once missing values are detected, the process branches into three strategic options: removing records with missing values (which is simple but can lose valuable information), replacing missing values with defaults or calculated substitutes (like means or medians), or computing new values derived from other fields in the dataset. <\/p>\n\n\n\n<p>All three strategies ultimately converge to produce a clean dataset ready for analysis. This flowchart provides a structured decision framework that helps data analysts choose the most appropriate method for handling missing values based on their specific data context and analysis requirements.<\/p>\n\n\n\n<p>The first step in cleaning messy data is identifying missing values. In esProc SPL, missing values are represented as `null`. Let&#8217;s start by creating a sample dataset with some missing values:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;<\/td><td>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; A<\/td><td>&nbsp;<\/td><\/tr><tr><td>&nbsp;&nbsp;&nbsp;&nbsp; 1<\/td><td>=file(&#8220;document\/sales.csv&#8221;).import@ct()&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>Sales data with 100 rows &nbsp;<\/td><\/tr><tr><td>&nbsp;&nbsp;&nbsp;&nbsp; 2<\/td><td>=A1.run(AMOUNT=if(rand()&lt;0.1,null,AMOUNT))<\/td><td>Randomly set ~10% of AMOUNT values to null &nbsp;<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Now, let&#8217;s identify rows with missing values:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp; 3<\/td><td>=A2.select(AMOUNT==null)&nbsp;&nbsp;&nbsp;<\/td><td>Rows with missing AMOUNT &nbsp;<\/td><\/tr><tr><td>&nbsp;&nbsp; 4<\/td><td>=A3.len()&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>&nbsp;<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The output of A4 will show that we have 10 rows with missing AMOUNT values, which is approximately 10% of our dataset, as expected. Now, let&#8217;s count missing values by region:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp;&nbsp; 5<\/td><td>=A2.groups(REGION; count(AMOUNT==null):MISSING_COUNT, count(AMOUNT):TOTAL_COUNT)&nbsp;&nbsp;&nbsp;<\/td><td>Count missing values by region &nbsp;<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This shows the distribution of missing values across regions. Now, let&#8217;s handle these missing values using different techniques:<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-how-to-remove-rows-with-missing-values\">How to remove rows with missing values<\/h4>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp;&nbsp; 6<\/td><td>=A2.select(AMOUNT!=null)&nbsp;&nbsp;<\/td><td>Remove rows with missing AMOUNT &nbsp;<\/td><\/tr><tr><td>&nbsp;&nbsp;&nbsp; 7<\/td><td>=A6.len()&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>&nbsp;<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The output of A7 shows the number of rows we have after removing the rows with missing AMOUNT values.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-how-to-fill-missing-values-with-a-constant\">How to fill missing values with a constant<\/h4>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 8<\/td><td>=A2.derive(ifn(AMOUNT, 0):AMOUNT_FILLED)&nbsp;&nbsp;&nbsp;<\/td><td>Replace nulls with 0 &nbsp;<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-how-to-fill-missing-values-with-statistical-measures\">How to fill missing values with statistical measures<\/h4>\n\n\n\n<p>You can first add the AMOUNT_FILLED field to A2, then fill values directly after grouping. This avoids using new and join, improving performance. This is an SPL-exclusive technique using retained grouped subsets.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp;&nbsp; 9<\/td><td>=A2.derive(:AMOUNT_FILLED) =A2.derive(:AMOUNT_FILLED) &nbsp;<\/td><\/tr><tr><td>&nbsp;&nbsp; 10<\/td><td>=A9.group(REGION) &nbsp;<\/td><\/tr><tr><td>&nbsp;&nbsp; 11<\/td><td>=A10.run(a=~.avg(AMOUNT),~.run(AMOUNT_FILLED=ifn(AMOUNT,a))) &nbsp;<\/td><\/tr><tr><td>&nbsp; 12<\/td><td>=A9.to(5) &nbsp;<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>A9: Add an empty AMOUNT_FILLED field to A2.<\/p>\n\n\n\n<p>A10: Group A9 by REGION and retain original record sets within each group. Updating field values in the grouped records will modify the corresponding values in the original table A9.<\/p>\n\n\n\n<p>A11: Iterate through A10. For each group, calculate the average of AMOUNT and store it in a temporary variable a. Then, iterate through the group\u2019s records and assign values to AMOUNT_FILLED. ifn(AMOUNT,a) means: return AMOUNT if it is not null; otherwise, return a.<\/p>\n\n\n\n<section id=\"my-first-block-block_a78aebcfd52b4c609b5589de77b46b29\" class=\"my-first-block alignwide\">\n    <div class=\"bg-brand-600 text-base-white py-5xl px-4xl rounded-sm bg-gradient-to-r from-brand-600 to-brand-500 red\">\n        <div class=\"gap-4xl items-start md:items-center flex flex-col md:flex-row justify-between\">\n            <div class=\"flex-1 col-span-10 lg:col-span-7\">\n                <h3 class=\"mt-0 font-display mb-2 text-display-sm\">Simple Talk is brought to you by Redgate Software<\/h3>\n                <div class=\"child:last-of-type:mb-0\">\n                                            Take control of your databases with the trusted Database DevOps solutions provider. Automate with confidence, scale securely, and unlock growth through AI.                                    <\/div>\n            <\/div>\n                            <a href=\"https:\/\/www.red-gate.com\/solutions\/overview\/\" class=\"btn btn--secondary btn--lg\">Discover how Redgate can help you<\/a>\n                    <\/div>\n    <\/div>\n<\/section>\n\n\n<h3 class=\"wp-block-heading\" id=\"h-how-to-detect-and-handle-outliers-in-esproc-spl\">How to detect and handle outliers in esProc SPL<\/h3>\n\n\n\n<p><a href=\"https:\/\/statisticsbyjim.com\/basics\/outliers\/\" target=\"_blank\" rel=\"noreferrer noopener\">Outliers<\/a> can significantly skew your analysis and lead to incorrect conclusions. Let&#8217;s explore how to detect and handle outliers in esProc SPL.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-how-to-identify-outliers-using-z-score\">How to identify outliers using Z-score<\/h4>\n\n\n\n<p>The Z-score measures how many standard deviations a data point is from the mean. Values with a Z-score greater than 3 or less than -3 are often considered outliers. Since A14 only contains one row, there\u2019s no need to join with A6. You can directly add the Z_SCORE field to A6 and return the final result. The code is as follows:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>14<\/td><td>=A6.group(;~.avg(AMOUNT):AVG, sqrt(var@r(~.(AMOUNT))):STD) &nbsp;&nbsp;<\/td><td>Calculate mean and standard deviation<\/td><\/tr><tr><td>15<\/td><td>=A6.derive((AMOUNT-A14.AVG)\/A14.STD: Z_SCORE) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>Join with statistics<\/td><\/tr><tr><td>16<\/td><td>=A15.select(abs(Z_SCORE)&gt;2) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>Calculate Z-score<\/td><\/tr><tr><td>17<\/td><td>=A16.top(-5, Z_SCORE) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>Select outliers (Z-score &gt; 2) &nbsp;<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This shows the top outliers based on Z-score. The first two rows have Z-scores greater than 2, indicating they are potential outliers.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-how-to-identify-outliers-using-iqr-interquartile-range\">How to identify outliers using IQR (interquartile range)<\/h4>\n\n\n\n<p>Another common method for detecting outliers is the <a href=\"https:\/\/www.scribbr.co.uk\/stats\/interquartile-range-meaning\/\" target=\"_blank\" rel=\"noreferrer noopener\">IQR<\/a> method, which identifies values below Q1 &#8211; 1.5*IQR or above Q3 + 1.5*IQR as outliers:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp;&nbsp;&nbsp; 18<\/td><td>=A6.groups(;median(1:4,AMOUNT):Q1,median(3:4,AMOUNT):Q3)&nbsp;&nbsp;<\/td><td>Calculate Q1 and Q3<\/td><\/tr><tr><td>&nbsp;&nbsp; 20<\/td><td>A6.select(AMOUNT&lt;A21.LOWER_BOUND || AMOUNT&gt;A21.UPPER_BOUND)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>Select outliers<\/td><\/tr><tr><td>&nbsp; 21<\/td><td>=A20.top(-5,AMOUNT)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>First 5 outliers<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-how-to-handle-outliers\">How to handle outliers<\/h4>\n\n\n\n<p>Once you&#8217;ve identified outliers, you can handle them in several ways, as shown in this flowchart:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"975\" height=\"402\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/03\/image-3.png\" alt=\"An image showing the several ways of handling outliers in esProc SPL.\" class=\"wp-image-109136\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/03\/image-3.png 975w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/03\/image-3-300x124.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/03\/image-3-768x317.png 768w\" sizes=\"auto, (max-width: 975px) 100vw, 975px\" \/><\/figure>\n\n\n\n<p>Let&#8217;s implement the capping strategy:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp; 22<\/td><td>=A18.derive(AMOUNT_CAPPED:if(AMOUNT&gt;UPPER_BOUND,UPPER_BOUND,if(AMOUNT&lt;LOWER_BOUND,LOWER_BOUND,AMOUNT)))&nbsp;&nbsp;&nbsp; Cap outliers Replace the nested if with:<br>if(AMOUNT&gt;UPPER_BOUND:UPPER_BOUND, AMOUNT&lt;LOWER_BOUND:LOWER_BOUND; AMOUNT)<br>The full line should be:<br>=A22.derive(if(AMOUNT&gt;UPPER_BOUND:UPPER_BOUND, AMOUNT&lt;LOWER_BOUND:LOWER_BOUND; AMOUNT):AMOUNT_CAPPED:) &nbsp;<\/td><\/tr><tr><td>&nbsp; 23<\/td><td>=A22.select(DATE,REGION,PRODUCT,AMOUNT,AMOUNT_CAPPED)&nbsp;&nbsp;&nbsp; Select relevant columns &nbsp;<\/td><\/tr><tr><td>&nbsp; 24<\/td><td>=A23.to(5)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Since our dataset doesn&#8217;t have extreme outliers based on the IQR criterion, the capped values are the same as the original values.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-how-to-remove-duplicates-in-esproc-spl\">How to remove duplicates in esProc SPL<\/h3>\n\n\n\n<p><a href=\"https:\/\/www.red-gate.com\/simple-talk\/devops\/database-devops\/but-the-database-worked-in-development-checking-for-duplicates\/\" target=\"_blank\" rel=\"noreferrer noopener\">Duplicate records<\/a> can skew your analysis and lead to incorrect conclusions. Let&#8217;s explore how to identify and remove duplicates in esProc SPL:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp; 29<\/td><td>=A1.groups(REGION+&#8221;-&#8220;+PRODUCT:DUPLICATE_KEY;count():COUNT)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>Count occurrences of each key<\/td><\/tr><tr><td>&nbsp; 30<\/td><td>=A30.select(COUNT&gt;1)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>Select keys with multiple occurrences &nbsp;<\/td><\/tr><tr><td>&nbsp; 31<\/td><td>=A31.to(5)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>First 5 duplicate keys<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This example simply counts the number of records with the same REGION and PRODUCT. Now, let&#8217;s remove duplicates based on a specific key:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp;&nbsp;33<\/td><td>=A29.groups(REGION,PRODUCT:max(AMOUNT):MAX_AMOUNT)&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>Keep only the record with the maximum AMOUNT for each key &nbsp;<\/td><\/tr><tr><td>&nbsp;&nbsp; 34<\/td><td>=A33.to(5)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>First 5 deduplicated records &nbsp;<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This shows the deduplicated records, keeping only the record with the maximum AMOUNT for each REGION-PRODUCT combination.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-to-reshape-data-in-esproc-spl-wide-to-long-and-vice-versa\">How to reshape data in esProc SPL: wide to long and vice-versa<\/h2>\n\n\n\n<p>Data reshaping is a common operation in data analysis, especially when preparing data for visualization or specific analytical techniques. Let&#8217;s explore how esProc SPL handles data reshaping operations.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-the-wide-to-long-format-unpivoting\">The wide to long format (unpivoting)<\/h3>\n\n\n\n<p>Let&#8217;s start by creating a wide-format dataset with quarterly sales by region:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp; 35<\/td><td>=A34.groups(REGION,year(DATE):YEAR,ceil(month(DATE)\/3):QUARTER;sum(AMOUNT):TOTAL_SALES)&nbsp;&nbsp;<\/td><td>Calculate total sales by region, year, and quarter<\/td><\/tr><tr><td>&nbsp;&nbsp; 36<\/td><td>=A35.pivot(REGION; QUARTER,TOTAL_SALES; 1:&#8221;Q1&#8243;,2:&#8221;Q2&#8243;,3:&#8221;Q3&#8243;, 4:&#8221;Q4&#8243;) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>Pivot to create quarterly columns<\/td><\/tr><tr><td>&nbsp;&nbsp; 37<\/td><td>=A36.to(5)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>&nbsp;First 5 rows<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This shows the total sales by region and quarter in a wide format, with each quarter as a separate column. Now, let&#8217;s convert this wide format back to a long format:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp;&nbsp;38<\/td><td>=A36.pivot@r(REGION; QUARTER,TOTAL_SALES; Q1:1,Q2:2,Q3:3,Q4:4) &nbsp;&nbsp;&nbsp;<\/td><td>&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Unpivot quarterly columns<\/td><\/tr><tr><td>&nbsp;&nbsp; 39<\/td><td>=A38.select(SALES)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>Remove rows with null sales<\/td><\/tr><tr><td>&nbsp;&nbsp; 40<\/td><td>=A39.to(10)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>First 10 rows<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>After pivoting, the data order remains the same as the original, so no additional sorting is required. The data is displayed in a long format, with each row representing the sales for a specific region, year, and quarter. This format is often more suitable for visualization and certain types of analysis.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-the-long-to-wide-format-pivoting\">The long to wide format (pivoting)<\/h3>\n\n\n\n<p>Now, let&#8217;s create a different long-format dataset and convert it to a wide format:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp;41<\/td><td>=A38.groups(PRODUCT,month(DATE):MONTH;sum(AMOUNT):TOTAL_SALES)&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>Calculate total sales by product and month &nbsp;<\/td><\/tr><tr><td>&nbsp;&nbsp;42<\/td><td>=A41.to(10)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>First 10 rows<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>When grouping, the data is automatically sorted by the grouping fields, so no additional sorting is needed. This shows the total sales by product and month in a long format. Now, let&#8217;s convert this to a wide format:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp;&nbsp; 43<\/td><td>=A38.pivot(PRODUCT;MONTH,TOTAL_SALES; 1:&#8221;JAN&#8221;,2:&#8221;FEB&#8221;,3:&#8221;MAR&#8221;,4:&#8221;APR&#8221;,5:&#8221;MAY&#8221;,6:&#8221;JUN&#8221;,7:&#8221;JUL&#8221;,8:&#8221;AUG&#8221;,9:&#8221;SEP&#8221;,10:&#8221;OCT&#8221;,11:&#8221;NOV&#8221;,12:&#8221;DEC&#8221;) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>Pivot to create monthly columns<\/td><\/tr><tr><td>&nbsp;&nbsp; 44<\/td><td>=A43.to(5)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>First 5 rows<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This shows the total sales by product and month in a wide format, with each month as a separate column. This format is often more suitable for reporting and certain types of analysis.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-to-merge-and-join-datasets-in-esproc-spl\">How to merge and join datasets in esProc SPL<\/h2>\n\n\n\n<p>Joining datasets is a fundamental operation in data analysis, allowing you to combine information from multiple sources. SPL provides several methods for joining datasets, similar to SQL joins.<\/p>\n\n\n\n<p>Let&#8217;s create a second dataset with customer information. Use the &#8216;create&#8217; function to define the table structure and record to populate data, as shown below:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp;&nbsp; A45<\/td><td>=create(CUSTOMER,INDUSTRY,SIZE,COUNTRY).record( &nbsp; [&#8220;TechCorp&#8221;,&#8221;Technology&#8221;,&#8221;Large&#8221;,&#8221;USA&#8221;, &nbsp; &#8220;HomeOffice&#8221;,&#8221;Retail&#8221;,&#8221;Small&#8221;,&#8221;Canada&#8221;, &nbsp; &#8220;DataSystems&#8221;,&#8221;Technology&#8221;,&#8221;Medium&#8221;,&#8221;USA&#8221;, &nbsp; &#8220;EduCenter&#8221;,&#8221;Education&#8221;,&#8221;Medium&#8221;,&#8221;UK&#8221;, &nbsp; &#8220;CloudNet&#8221;,&#8221;Technology&#8221;,&#8221;Large&#8221;,&#8221;Germany&#8221;, &nbsp; &#8220;OfficeMax&#8221;,&#8221;Retail&#8221;,&#8221;Large&#8221;,&#8221;USA&#8221;, &nbsp; &#8220;HealthCare&#8221;,&#8221;Healthcare&#8221;,&#8221;Medium&#8221;,&#8221;Canada&#8221;, &nbsp; &#8220;FinServices&#8221;,&#8221;Finance&#8221;,&#8221;Large&#8221;,&#8221;UK&#8221;, &nbsp; &#8220;GovAgency&#8221;,&#8221;Government&#8221;,&#8221;Large&#8221;,&#8221;USA&#8221;, &nbsp; &#8220;SmallBiz&#8221;,&#8221;Retail&#8221;,&#8221;Small&#8221;,&#8221;Germany&#8221;] ) &nbsp; &nbsp;<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Now, let&#8217;s look at the different types of joins:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-the-inner-join\">The inner join<\/h3>\n\n\n\n<p>An inner join returns only the rows that have matching values in both datasets:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp;46<\/td><td>=join(A1:o,CUSTOMER; A51:c,CUSTOMER) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>Inner join with customer data<\/td><\/tr><tr><td>&nbsp; 47<\/td><td>=A46.to(5)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>First 5 rows &nbsp;<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This shows the sales data enriched with customer information including industry, size, and country.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-the-left-join\">The left join<\/h3>\n\n\n\n<p>A left join returns all rows from the left dataset and the matching rows from the right dataset. Compared to the previous example, just add the @1 option to indicate a left join. Note: it\u2019s the number \u201c1\u201d, not the letter \u201cl\u201d:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp; 48<\/td><td>=join@1(A1:o,CUSTOMER; A51:c,CUSTOMER) &nbsp;<\/td><\/tr><tr><td>&nbsp;&nbsp; 49<\/td><td>=A48.new(o.DATE, o.CUSTOMER, o.PRODUCT, o.AMOUNT, c.INDUSTRY, c.SIZE, c.COUNTRY) &nbsp;<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>In this case, the output is the same as the inner join because all customers in the sales data have matching records in the customer data.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-the-full-join\">The full join<\/h3>\n\n\n\n<p>A full join returns all rows when there is a match in either the left or right dataset:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp; 50<\/td><td>=join@f(A1:o,CUSTOMER; A51:c,CUSTOMER) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>Full join with customer data<\/td><\/tr><tr><td>&nbsp; 51<\/td><td>=A50.to(15)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>First 15 rows<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The output might include rows from both datasets, even if there&#8217;s no match.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-joining-on-multiple-columns\">Joining on multiple columns<\/h3>\n\n\n\n<p>You can also join datasets on multiple columns:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp; 52<\/td><td>=A1.groups(CUSTOMER,month(DATE):MONTH;sum(AMOUNT):TOTAL_SALES)&nbsp;&nbsp;&nbsp;<\/td><td>Calculate total sales by customer and month<\/td><\/tr><tr><td>&nbsp;&nbsp; 53<\/td><td>=A52.to(10)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>First 10 rows<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Now, let&#8217;s create another dataset with monthly targets:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp; A54<\/td><td>=create(CUSTOMER,MONTH,TARGET).record( &nbsp; [&#8220;TechCorp&#8221;,4,1500, &nbsp; &#8220;TechCorp&#8221;,5,2000, &nbsp; &#8220;TechCorp&#8221;,6,1800, &nbsp; &#8220;TechCorp&#8221;,7,1500, &nbsp; &#8220;HomeOffice&#8221;,4,500, &nbsp; &#8220;HomeOffice&#8221;,5,600, &nbsp; &#8220;HomeOffice&#8221;,6,550, &nbsp; &#8220;HomeOffice&#8221;,7,450, &nbsp; &#8220;DataSystems&#8221;,4,2000, &nbsp; &#8220;DataSystems&#8221;,5,1800, &nbsp; &#8220;DataSystems&#8221;,6,1600, &nbsp; &#8220;DataSystems&#8221;,7,1400] ) &nbsp; &nbsp;<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Now, let&#8217;s join the sales data with the targets data on both CUSTOMER and MONTH:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp; 55<\/td><td>=join(A54:o, CUSTOMER, MONTH; A65:c, CUSTOMER, MONTH) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>Join on CUSTOMER and MONTH<\/td><\/tr><tr><td>&nbsp; 56<\/td><td>=A55.to(10)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>First 10 rows<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This shows the sales performance against targets for each customer and month.<\/p>\n\n\n\n<section id=\"my-first-block-block_21230ae76077b62e9f278dab8dea3fda\" class=\"my-first-block alignwide\">\n    <div class=\"bg-brand-600 text-base-white py-5xl px-4xl rounded-sm bg-gradient-to-r from-brand-600 to-brand-500 red\">\n        <div class=\"gap-4xl items-start md:items-center flex flex-col md:flex-row justify-between\">\n            <div class=\"flex-1 col-span-10 lg:col-span-7\">\n                <h3 class=\"mt-0 font-display mb-2 text-display-sm\">Enjoying this article? Subscribe to the Simple Talk newsletter<\/h3>\n                <div class=\"child:last-of-type:mb-0\">\n                                            Get selected articles, event information, podcasts and other industry content delivered straight to your inbox.                                    <\/div>\n            <\/div>\n                            <a href=\"https:\/\/www.red-gate.com\/simple-talk\/subscribe\/\" class=\"btn btn--secondary btn--lg\">Subscribe now<\/a>\n                    <\/div>\n    <\/div>\n<\/section>\n\n\n<h2 class=\"wp-block-heading\" id=\"h-more-esproc-spl-filtering-techniques-and-conditions\">More esProc SPL filtering techniques and conditions<\/h2>\n\n\n\n<p>esProc SPL offers filtering capabilities that allow you to select rows based on conditions. Let&#8217;s explore some more filtering techniques.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-how-to-filter-with-multiple-conditions-in-esproc-spl\">How to filter with multiple conditions in esProc SPL<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp; 57<\/td><td>=A1.select(REGION==&#8221;East&#8221; &amp;&amp; (PRODUCT==&#8221;Laptop&#8221; || PRODUCT==&#8221;Server&#8221;) &amp;&amp; AMOUNT&gt;1000)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; filtering<\/td><\/tr><tr><td>&nbsp;&nbsp;&nbsp;58<\/td><td>=A57.to(5)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; First 5 rows<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This shows high-value sales (over $1,000) of laptops or servers in the East region.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-how-to-filter-with-regular-expressions-in-esproc-spl\">How to filter with regular expressions in esProc SPL<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp;&nbsp; 59<\/td><td>=A1.select(like(CUSTOMER,&#8221;*Corp*&#8221;))&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Filter customers with &#8220;Corp&#8221; in the name<\/td><\/tr><tr><td>&nbsp;&nbsp;&nbsp; 60<\/td><td>=A60.to(5)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; First 5 rows<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This shows sales to customers with &#8220;Corp&#8221; in their name.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-how-to-filter-with-date-ranges-in-esproc-spl\">How to filter with date ranges in esProc SPL<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp;&nbsp;&nbsp; 61<\/td><td>=A1.select(DATE&gt;=date(&#8220;2023-05-01&#8221;) &amp;&amp; DATE&lt;=date(&#8220;2023-05-31&#8221;))&nbsp;&nbsp;&nbsp; Filter for May 2023<\/td><\/tr><tr><td>&nbsp;&nbsp;&nbsp;&nbsp; 62<\/td><td>=A61.to(2)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =First 2 rows<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The output of A62 shows sales in May 2023.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-how-to-filter-with-subqueries-in-esproc-spl\">How to filter with subqueries in esProc SPL<\/h3>\n\n\n\n<p>esProc SPL allows you to use the results of one query to filter another query, similar to subqueries in SQL:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp;63<\/td><td>=A1.groups(CUSTOMER;sum(AMOUNT):TOTAL_PURCHASES)&nbsp;&nbsp; Calculate total purchases by customer<\/td><\/tr><tr><td>&nbsp; 64<\/td><td>=A63.select(TOTAL_PURCHASES&gt;5000)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Select high-value customers<\/td><\/tr><tr><td>&nbsp;&nbsp;65<\/td><td>=A64.to(2)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Now, let&#8217;s use these high-value customers to filter the original sales data:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp; 66<\/td><td>=A1.select(A65.(CUSTOMER).contain(CUSTOMER)) &nbsp; Filter for high-value customers<\/td><\/tr><tr><td>&nbsp; 67<\/td><td>=A66.sort(CUSTOMER,DATE)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Sort by customer and date<\/td><\/tr><tr><td>&nbsp; 68<\/td><td>=A67.to(5)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This shows sales to high-value customers (those with total purchases over $5,000), sorted by customer and date.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-how-to-filter-with-window-functions-in-esproc-spl\">How to filter with window functions in esProc SPL<\/h3>\n\n\n\n<p>Window functions allow you to perform calculations across a set of rows related to the current row. Let&#8217;s use window functions to filter for sales that are above the average for their region:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp; 69<\/td><td>=A1 .group(REGION; (AVG_REGION_AMOUNT =~.avg(AMOUNT), ~.select(AMOUNT &gt;AVG_REGION_AMOUNT)):ABOVE).conj(ABOVE) &nbsp;<\/td><\/tr><tr><td>&nbsp;&nbsp; 70<\/td><td>=A69.sort(REGION,AMOUNT:-1)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Sort by region and amount<\/td><\/tr><tr><td>&nbsp;&nbsp; 71<\/td><td>=A70.to(10)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This shows sales that are above the average for their region, sorted by region and amount.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-to-create-calculated-fields-and-derived-columns-in-esproc-spl\">How to create calculated fields and derived columns in esProc SPL<\/h2>\n\n\n\n<p>Calculated fields and derived columns allow you to create new data based on existing columns. SPL&#8217;s `derive` method provides a way to create these columns.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-basic-calculations\">Basic calculations<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp; 72<\/td><td>=A1.derive(TAX:AMOUNT*0.1,TOTAL:AMOUNT+AMOUNT*0.1)&nbsp; Add tax and total columns<\/td><\/tr><tr><td>&nbsp;&nbsp; 73<\/td><td>=A72.to(5)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This shows the original sales data with added tax (10% of the amount) and total (amount + tax) columns.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-conditional-calculations\">Conditional calculations<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp;&nbsp;74<\/td><td>=A1.derive(DISCOUNT:if(AMOUNT&gt;1000,AMOUNT*0.05,0),NET_AMOUNT:AMOUNT-if(AMOUNT&gt;1000,AMOUNT*0.05,0))&nbsp; Add discount and net amount columns<\/td><\/tr><tr><td>&nbsp;&nbsp; 75<\/td><td>=A74.to(5)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This shows the original sales data with added discount (5% for amounts over $1,000) and net amount (amount &#8211; discount) columns.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-date-calculations\">Date calculations<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp; 76<\/td><td>=A1.derive(interval(DATE,now()):DAYS_SINCE_ORDER, DATE+7:DELIVERY_DATE) &nbsp;&nbsp; Add days since order and delivery date columns<\/td><\/tr><tr><td>&nbsp; 77<\/td><td>=A76.to(5)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This shows the original sales data with added columns for days since order (assuming today is March 16, 2024) and delivery date (7 days after the order date).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-string-manipulations\">String manipulations<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp; 78<\/td><td>=A1.derive(upper(CUSTOMER):CUSTOMER_UPPER,if(like(PRODUCT,&#8221;Lap*&#8221;):&#8221;Computer&#8221;,like(PRODUCT,&#8221;Ser.*&#8221;:&#8221;Server&#8221;,like(PRODUCT,&#8221;Mon.*&#8221;:&#8221;Peripheral&#8221;:&#8221;Other&#8221;):PRODUCT_CATEGORY)&nbsp;&nbsp; Add uppercase customer and product category columns<\/td><\/tr><tr><td>&nbsp;&nbsp; 79<\/td><td>=A78.to(5)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This shows the original sales data with added columns for uppercase customer names and product categories based on the product name.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-calculated-columns-with-window-functions\">Calculated columns with window functions<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;&nbsp;&nbsp; 80<\/td><td>=A1.sort(REGION,AMOUNT:-1).derive(rank(AMOUNT;REGION):REGION_RANK) &nbsp;<\/td><\/tr><tr><td>&nbsp;&nbsp; 81<\/td><td>=A80.sort(REGION,DATE).derive(cum(AMOUNT):RUNNING_TOTAL) &nbsp;<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This shows the original sales data with added columns for running total (cumulative sum of AMOUNT by REGION, sorted by DATE) and rank (rank of AMOUNT within REGION, with higher amounts having lower ranks).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-window-functions-and-rolling-calculations-work-in-esproc-spl\">How window functions and rolling calculations work in esProc SPL<\/h2>\n\n\n\n<p>Window functions allow you to perform calculations across a set of rows related to the current row. SPL provides powerful window functions for various analytical tasks.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-ranking-functions\">Ranking functions<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp; 82<\/td><td>=A1.sort(AMOUNT:-1).derive(rank(AMOUNT):RANK, ranki(AMOUNT):DENSE_RANK, #:ROW_NUMBER)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This shows the sales data with added ranking columns:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li>RANK: Standard competition ranking (with gaps for ties)<\/li>\n\n\n\n<li>DENSE_RANK: Dense ranking (without gaps for ties)<\/li>\n\n\n\n<li>ROW_NUMBER: Unique row number<\/li>\n<\/ul>\n<\/div>\n\n\n<h3 class=\"wp-block-heading\" id=\"h-aggregate-window-functions\">Aggregate window functions<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;83<\/td><td>= A1.groups(1; avg(AMOUNT): AVG_AMOUNT ) &nbsp;<\/td><\/tr><tr><td>&nbsp;84<\/td><td>=A1.groups(REGION:avg(AMOUNT): AVG_REGION_AMOUNT) &nbsp;<\/td><\/tr><tr><td>&nbsp;85<\/td><td>=A1.join( 1,A100:#1, AVG_AMOUNT; REGION, A101:REGION, AVG_REGION_AMOUNT, AMOUNT- AVG_AMOUNT: DIFF_FROM_AVG, AMOUNT- AVG_REGION_AMOUNT: DIFF_FROM_REGION_AVG) .sort(REGION,AMOUNT:-1) &nbsp;<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>If you want results similar to SQL window functions, the above syntax works. However, SPL doesn\u2019t require using, join in such cases. You can first calculate AVG_AMOUNT = A1.avg(AMOUNT) and then use this variable directly &#8211; no join is needed. SQL requires joins to reference results within the same table, but SPL allows referencing independent variables freely.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-moving-averages-and-cumulative-sums\">Moving averages and cumulative sums<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;86<\/td><td>=A1.groups(month@y(DATE):YEAR_MONTH;sum(AMOUNT):MONTHLY_SALES)&nbsp;&nbsp; Calculate monthly sales<\/td><\/tr><tr><td>&nbsp;87<\/td><td>=A86.derive(MONTHLY_SALES[-1:1].avg():MA3, MONTHLY_SALES[:0].sum():CUMSUM)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>MONTHLY_SALES[-1:1] represents a set of MONTHLY_SALES values from the previous row to the next row. -n means \u201cn rows above\u201d, and n means \u201cn rows below\u201d.<\/p>\n\n\n\n<p>MONTHLY_SALES[:0] represents a set from the beginning up to the current row, where 0 refers to the current row.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-percentiles-and-quartiles\">Percentiles and quartiles<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp; 88<\/td><td>=A1.groups(REGION;median(1:4,AMOUNT):Q1,median(2:4,AMOUNT):MEDIAN,median(3:4,AMOUNT):Q3,min(AMOUNT):MIN,max(AMOUNT):MAX)&nbsp;&nbsp; Calculate quartiles by region<\/td><\/tr><tr><td>&nbsp; 89<\/td><td>=A88.to(5)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; First 5 rows<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This shows quartiles and extremes for AMOUNT by region.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-to-create-pivot-tables-and-cross-tabulations-in-esproc-spl\">How to create pivot tables and cross-tabulations in esProc SPL<\/h2>\n\n\n\n<p><a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/crosstab-pivot-table-workbench\/\" target=\"_blank\" rel=\"noreferrer noopener\">Pivot tables and cross-tabulations<\/a> are powerful tools for summarizing and analyzing data. SPL provides several methods for creating these types of summaries.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-basic-pivot-tables\">Basic pivot tables<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;90<\/td><td>=A1.groups(REGION,ceil(month(DATE)\/3):QUARTER;sum(AMOUNT):TOTAL_SALES)&nbsp;&nbsp; Calculate total sales by region and quarter<\/td><\/tr><tr><td>&nbsp;91<\/td><td>=A90.pivot(REGION; QUARTER,TOTAL_SALES; 1:&#8221;Q1&#8243;,2:&#8221;Q2&#8243;,3:&#8221;Q3&#8243;,4:&#8221;Q4&#8243;) &nbsp;&nbsp; Pivot to create quarterly columns<\/td><\/tr><tr><td>&nbsp;92<\/td><td>=A91.sort(Q1:-1)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Sort by Q1 sales (descending)<\/td><\/tr><tr><td>&nbsp;93<\/td><td>=A92.to(5)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; First 5 rows<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This shows total sales by region and quarter, with each quarter as a separate column.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-multi-level-pivot-tables\">Multi-level pivot tables<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;94<\/td><td>=A1.derive(QUARTER:ceil(month(DATE)\/3),PRODUCT_CATEGORY:if(like(PRODUCT,&#8221;Lap*&#8221;):&#8221;Computer&#8221;,like(PRODUCT,&#8221;Ser*&#8221;):&#8221;Server&#8221;,like(PRODUCT,&#8221;Mon*&#8221;):&#8221;Peripheral&#8221;:&#8221;Other&#8221;))&nbsp;&nbsp; Add QUARTER and PRODUCT_CATEGORY columns<\/td><\/tr><tr><td>95<\/td><td>A94.groups(REGION,PRODUCT_CATEGORY,QUARTER;sum(AMOUNT):TOTAL_SALES)&nbsp; Calculate total sales by region, product category, and quarter<\/td><\/tr><tr><td>96<\/td><td>=A95.pivot(REGION,PRODUCT_CATEGORY:QUARTERTOTAL_SALES;1:&#8221;Q1&#8243;,2:&#8221;Q2&#8243;,3:&#8221;Q3&#8243;,4:&#8221;Q4&#8243;)&nbsp;&nbsp; Create multi-level pivot table<\/td><\/tr><tr><td>97<\/td><td>=A96.to(10)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; First 10 rows<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This shows total sales by region, product category, and quarter.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-cross-tabulations-with-percentages\">Cross-tabulations with percentages<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp; 98<\/td><td>=A1.derive(QUARTER:ceil(month(DATE)\/3),PRODUCT_CATEGORY:if(like(PRODUCT,&#8221;Lap*&#8221;):&#8221;Computer&#8221;, like(PRODUCT,&#8221;Ser*&#8221;):&#8221;Server&#8221;, like(PRODUCT,&#8221;Mon*&#8221;):&#8221;Peripheral&#8221;;&#8221;Other&#8221;)) &nbsp;<\/td><\/tr><tr><td>&nbsp;99<\/td><td>=A98.groups(REGION,PRODUCT_CATEGORY;count():COUNT)&nbsp;&nbsp; Count sales by region and product category<\/td><\/tr><tr><td>&nbsp;100<\/td><td>=A99.pivot(REGION:PRODUCT_CATEGORYCOUNT; &#8220;Computer&#8221;, &#8220;Server&#8221;, &#8220;Peripheral&#8221;, &#8220;Other&#8221;)&nbsp;&nbsp;&nbsp;&nbsp; Create cross-tabulation<\/td><\/tr><tr><td>&nbsp;101<\/td><td>=A100.derive(TOTAL:Computer+Other+Peripheral,COMPUTER_PCT:Computer\/TOTAL,OTHER_PCT:Other\/TOTAL,PERIPHERAL_PCT:Peripheral\/TOTAL)&nbsp;&nbsp; Add total and percentage columns<\/td><\/tr><tr><td>&nbsp;102<\/td><td>=A101.to(5)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; First 5 rows<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This shows the count and percentage of sales by region and product category.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-summary-and-next-steps\">Summary and next steps<\/h2>\n\n\n\n<p>In this article, we&#8217;ve looked at data manipulation techniques in esProc SPL. We&#8217;ve covered cleaning messy data, reshaping data between wide and long formats, merging datasets with various join operations, applying advanced filtering techniques, creating calculated fields, using window functions for analysis, and building pivot tables for summarizing data.<\/p>\n\n\n\n<p>These techniques provide a solid foundation for tackling data challenges. By combining these approaches, you can transform raw, messy data into clean, structured insights that drive informed decision-making.<\/p>\n\n\n\n<p>As you continue your journey with esProc SPL, remember that data manipulation is both an art and a science. The techniques presented here are powerful tools, but their effective application requires understanding your data and the questions you&#8217;re trying to answer. Practice these techniques with your own datasets to develop intuition for which approaches work best in different scenarios.<\/p>\n\n\n\n<p><strong><a href=\"https:\/\/www.red-gate.com\/simple-talk\/collections\/moving-from-python-to-esproc-spl\/\" target=\"_blank\" rel=\"noreferrer noopener\">Click here for the full &#8220;Moving from Python to esProc SPL&#8221; series.<\/a><\/strong><\/p>\n\n\n\n<section id=\"faq\" class=\"faq-block my-5xl\">\n    <h2>FAQs: Data manipulation techniques in esProc SPL<\/h2>\n\n                        <h3 class=\"mt-4xl\">1. How does esProc SPL&#039;s performance compare to pandas for large datasets?<\/h3>\n            <div class=\"faq-answer\">\n                <p>esProc SPL is designed for high-performance data processing and often outperforms pandas for large datasets, especially when operations can be parallelized. esProc SPL&#8217;s memory management through cursor operations also allows it to handle datasets larger than available RAM.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">2. Can I combine esProc SPL with Python libraries for specialized tasks?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Yes, esProc SPL provides integration with Python through its Python plugin. This allows you to call Python functions from within SPL scripts, combining the strengths of both languages.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">3. Is it possible to perform incremental updates to datasets in esProc SPL?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Yes, SPL supports incremental updates through its cursor operations and append functions. This allows for efficient processing of new data without recomputing entire datasets.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">4. How does esProc SPL handle data type conversions during transformations? <\/h3>\n            <div class=\"faq-answer\">\n                <p>SPL automatically handles many type conversions, but also provides explicit conversion functions like int(), float(), string(), and date() for more control over the process.<\/p>\n            <\/div>\n            <\/section>\n","protected":false},"excerpt":{"rendered":"<p>Understand data manipulation in esProc SPL. Learn how to clean, filter, merge, and reshape data efficiently compared to Python.&hellip;<\/p>\n","protected":false},"author":342096,"featured_media":105002,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143526,146042],"tags":[159117,159374,5021],"coauthors":[159002],"class_list":["post-109133","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-development","category-python","tag-esproc-spl","tag-moving-from-python-to-esproc-spl","tag-python"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/109133","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/342096"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=109133"}],"version-history":[{"count":4,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/109133\/revisions"}],"predecessor-version":[{"id":109145,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/109133\/revisions\/109145"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media\/105002"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=109133"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=109133"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=109133"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=109133"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}