{"id":110048,"date":"2026-06-25T12:00:00","date_gmt":"2026-06-25T12:00:00","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=110048"},"modified":"2026-06-03T09:42:43","modified_gmt":"2026-06-03T09:42:43","slug":"analytics-in-esproc-spl-a-complete-guide","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/development\/analytics-in-esproc-spl-a-complete-guide\/","title":{"rendered":"Analytics in esProc SPL: a complete guide"},"content":{"rendered":"\n<p><strong>Understand analytics in esProc SPL with this comprehensive guide covering time series analysis, text processing, statistical methods, custom functions, machine learning integration, and interactive dashboards.<\/strong> <strong>Part of the &#8220;<a href=\"https:\/\/www.red-gate.com\/simple-talk\/collections\/moving-from-python-to-esproc-spl\/\" target=\"_blank\" rel=\"noreferrer noopener\">Moving from Python to esProc SPL<\/a>&#8221; series.<\/strong><\/p>\n\n\n\n<p><a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/bi-sql-server\/making-data-analytics-simpler-sql-server-and-r\/\" target=\"_blank\" rel=\"noreferrer noopener\">Data analytics<\/a> has evolved from simple reporting to complex <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/bi-sql-server\/predictive-analysis-basics\/\" target=\"_blank\" rel=\"noreferrer noopener\">predictive modeling<\/a>, and your choice of tools significantly impacts your analytical capabilities. Whether you&#8217;re analyzing sales trends, processing customer feedback, or building predictive models, <a href=\"https:\/\/www.esproc.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">esProc SPL<\/a> provides the tools you need to transform raw data into actionable insights.<\/p>\n\n\n\n<p>What makes esProc SPL stand out is its <em>focus<\/em> on data analysis. Unlike general-purpose languages where you might write dozens of lines of code for basic <a href=\"https:\/\/www.red-gate.com\/simple-talk\/development\/python\/data-manipulation-techniques-in-esproc-spl-a-complete-guide\/\" target=\"_blank\" rel=\"noreferrer noopener\">data manipulation<\/a>, esProc SPL offers specialized syntax designed specifically for analytical tasks. This means you can accomplish complex analyses with concise, readable code.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-time-series-analysis-works-in-esproc-spl\">How time-series analysis works in esProc SPL<\/h2>\n\n\n\n<p><a href=\"https:\/\/aws.amazon.com\/blogs\/database\/understanding-time-series-data-and-why-it-matters\/\" target=\"_blank\" rel=\"noreferrer noopener\">Time series data<\/a> is everywhere, from stock prices and sales figures to website traffic and sensor readings. Analyzing this data effectively requires specialized techniques, and that&#8217;s exactly where esProc SPL comes in.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-how-to-resample-time-series-data-in-esproc-spl\">How to resample time-series data in esProc SPL<\/h3>\n\n\n\n<p>Let&#8217;s start with a common challenge: you have daily sales data but need to analyze monthly or quarterly trends. This process of changing the frequency of your time series data is called resampling. Here&#8217;s how SPL handles it:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>1<\/td><td>=file(&#8220;document\/sales.csv&#8221;).import@tc(OrderDate,Amount)&nbsp;&nbsp;&nbsp;<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>First, I&#8217;m importing our sales data from a CSV file, selecting just the order dates and amounts, and sorting everything chronologically. This gives us a clean dataset to work with. The sorting step is crucial for time series analysis because it ensures that our data points are in the correct temporal sequence.<\/p>\n\n\n\n<p>Now, let&#8217;s transform this daily data into monthly aggregates:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>2<\/td><td>=A1.groups(month@y(OrderDate):month;sum(Amount): sum_amount)&nbsp;<\/td><td>Group by month and calculate the sum of amounts, it aggregates daily data into monthly totals.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>In the first line, I&#8217;m creating a new column called &#8220;month&#8221; by extracting and combining the year and month from each order date. The `year()` function extracts the year (like &#8220;2021&#8221;) and the `month()` function extracts the month number (like &#8220;1&#8221; for January). By combining them with a hyphen, we get identifiers like &#8220;2021-1&#8221; for January 2021.<\/p>\n\n\n\n<p>Then, I&#8217;m grouping all records by this new month column and calculating the sum of all amounts within each month. The `groups()` function is powerful because it automatically handles the <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/postgresql\/eager-aggregation-in-sql-queries\/\" target=\"_blank\" rel=\"noreferrer noopener\">aggregation<\/a> process, similar to a <code>GROUP BY<\/code> in <a href=\"https:\/\/www.red-gate.com\/simple-talk\/blogs\/what-is-sql\/\" target=\"_blank\" rel=\"noreferrer noopener\">SQL<\/a> but with more flexibility.<\/p>\n\n\n\n<p>Now we can clearly see the total sales for each month &#8211; but what if we want to look at quarterly data instead? Well, we can do that too:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>3<\/td><td>=A1.groups(year(OrderDate)\/&#8221;-Q&#8221;\/ceil(month(OrderDate)\/3):quarter;sum(Amount): sum_amount,avg(Amount): avg_amount:)&nbsp;<\/td><td>Group by quarter and calculate multiple aggregates<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-how-to-concatenate-different-data-types-such-as-integers-and-strings\">How to concatenate different data types (such as integers and strings)<\/h3>\n\n\n\n<p>When concatenating different <a href=\"https:\/\/www.red-gate.com\/simple-talk\/blogs\/guidelines-for-choosing-data-types\/\" target=\"_blank\" rel=\"noreferrer noopener\">data types<\/a> such as integers and strings, use the \/ operator. The + operator can only be used for string-to-string concatenation. Here, I&#8217;m creating a quarter label by taking the year and adding &#8220;Q&#8221; plus the quarter number. The quarter number is calculated by dividing the month by 3 and rounding up using the `ceiling()` function. For example:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li>January (month 1): 1\/3 = 0.33, ceil(0.33) = 1, so it&#8217;s Q1<\/li>\n\n\n\n<li>April (month 4): 4\/3 = 1.33, ceil(1.33) = 2, so it&#8217;s Q2<\/li>\n\n\n\n<li>December (month 12): 12\/3 = 4, ceil(4) = 4, so it&#8217;s Q4<\/li>\n<\/ul>\n<\/div>\n\n\n<p>I&#8217;m then grouping by quarter and calculating both the <em>sum<\/em> and <em>average<\/em> of amounts. This gives us two different perspectives: the total sales volume and the typical order size for each quarter.<\/p>\n\n\n\n<p>This quarterly view might reveal seasonal patterns that weren&#8217;t obvious in the monthly data. It might indicate a seasonal peak in your business, perhaps due to holidays or industry-specific cycles.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-moving-averages-and-trend-analysis-explained\">Moving averages and trend analysis, explained<\/h3>\n\n\n\n<p>Raw time series data often contains <a href=\"https:\/\/www.techtarget.com\/searchbusinessanalytics\/definition\/noisy-data\" target=\"_blank\" rel=\"noreferrer noopener\">noise<\/a> and short-term fluctuations that can obscure the underlying trend. Moving averages help smooth out these fluctuations. Let&#8217;s calculate a 3-month moving average:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>4<\/td><td>=A3.derive(sum_amount[-1:1].avg():moving_avg_3m) &nbsp;<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The result of the grouping in A4 is already automatically sorted by the grouping field, so no need to sort again. sum_amount[-1:1] means a collection of three values: the previous, current, and next row\u2019s sum_amount. Likewise, to calculate a 5-month moving average, you can use sum_amount[-2:2].avg(). <\/p>\n\n\n\n<p><em>In esProc SPL, you don\u2019t need a special function for moving average. Just use <a href=\"https:\/\/www.red-gate.com\/simple-talk\/featured\/using-regex-in-sql-server-2025-complete-guide\/\" target=\"_blank\" rel=\"noreferrer noopener\">expressions<\/a> for full flexibility.<\/em><\/p>\n\n\n\n<p>For example, to compute a moving sum instead, use sum_amount[-1:1].sum(). Since the number of built-in functions is limited, writing custom expressions allows for much greater flexibility to meet various needs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-another-trend-analysis-technique-exponential-smoothing\">Another trend analysis technique: exponential smoothing<\/h3>\n\n\n\n<p>For more trend analysis, let&#8217;s implement <a href=\"https:\/\/www.influxdata.com\/blog\/exponential-smoothing-beginners-guide\/\" target=\"_blank\" rel=\"noreferrer noopener\">exponential smoothing<\/a>, which gives more weight to recent observations:<\/p>\n\n\n\n<p><em>In SPL, you can use colname[-1] to get the previous row\u2019s value directly, so there\u2019s no need for a for loop in this case &#8211; just compute the exponential smoothing directly within derive.<\/em><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;<\/td><td>A<\/td><\/tr><tr><td>5<\/td><td>=A4.derive(if(#==1, sum_amount, exp_smooth[-1]*0.7+sum_amount*0.3):exp_smooth)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This code implements single exponential smoothing using a smoothing factor of 0.3 to smooth time series data. It begins by creating a new column, &#8216;exp_smooth`, initialized with zeros. A cursor is then used to iterate through each row of the dataset. The first row&#8217;s actual value is assigned as the initial smoothed value, serving as a baseline. <\/p>\n\n\n\n<p>As the loop progresses through the remaining rows, each new smoothed value is computed by taking 70% of the previous smoothed value and adding 30% of the current <em>actual<\/em> value. This iterative process ensures that each row is updated with its corresponding smoothed value, gradually adjusting the trend while reducing short-term fluctuations.<\/p>\n\n\n\n<p>The output will show how exponential smoothing responds more quickly to recent changes than a simple moving average. The advantage of exponential smoothing is that it gives more weight to recent observations while still considering the entire history of the data. This makes it more responsive to recent changes in the trend compared to a simple moving average.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-seasonal-decomposition-and-forecasting-explained\">Seasonal decomposition and forecasting, explained<\/h3>\n\n\n\n<p>Many time series have seasonal patterns: retail sales spike during holidays, ice cream sales increase in summer, etc. To identify these patterns, we can decompose a time series into trend, seasonal, and residual components:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"766\" height=\"502\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/04\/image-11.png\" alt=\"\" class=\"wp-image-110049\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/04\/image-11.png 766w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/04\/image-11-300x197.png 300w\" sizes=\"auto, (max-width: 766px) 100vw, 766px\" \/><\/figure>\n\n\n\n<p>Let&#8217;s implement a simple seasonal decomposition in esProc SPL:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;<\/td><td>A<\/td><\/tr><tr><td>6<\/td><td>=A4.avg(sum_amount)<\/td><\/tr><tr><td>7<\/td><td>=A4.groups(month%100:season;avg(sum_amount)\/A21:seasonal_factor).keys@i(season)<\/td><\/tr><tr><td>8<\/td><td>=A4.derive((month\\100-2021)*12+month%100:period_num, sum_amount\/A22.find(month%100).seasonal_factor:deseasonalized)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><strong>A6<\/strong>: First, calculate the overall average.<\/p>\n\n\n\n<p><strong>A7<\/strong>: In <strong>A4<\/strong>, the month field is an integer in yyyyMM format. The % symbol is the modulo operator, so month % 100extracts the <strong>month<\/strong> component. Group by month, and compute each month\u2019s <strong>seasonal average \/ overall average<\/strong> to obtain the <strong>seasonal factor<\/strong>.<\/p>\n\n\n\n<p><strong>A8<\/strong>: Divide each month\u2019s total sales amount in <strong>A4<\/strong> by the corresponding seasonal factor to get the <strong>seasonally adjusted sales amount<\/strong>.<\/p>\n\n\n\n<p>Now we can use the deseasonalized data to forecast future values using a simple linear regression:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>9<\/td><td>=linefit(A6.(period_num),A6.(deseasonalized)) &nbsp;<\/td><td>Fit a linear regression model, this finds the line of best fit through our deseasonalized data.<\/td><\/tr><tr><td>10<\/td><td>=A10*13 &nbsp;&nbsp;&nbsp;&nbsp;<\/td><td>Predict the deseasonalized value for period 13 &nbsp;<\/td><\/tr><tr><td>11<\/td><td>=A11*A7.find(13%12).seasonal_factor &nbsp;&nbsp;<\/td><td>Re-seasonalize the prediction &nbsp;<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This forecasting approach begins by setting up a regression model, where the period number serves as the independent variable (x), and the deseasonalized amount as the dependent variable (y).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-a-guide-to-text-analysis-and-natural-language-processing-capabilities-in-esproc-spl\">A guide to text analysis and natural language processing capabilities in esProc SPL<\/h2>\n\n\n\n<p>Numbers tell only part of the story. Text data (such as customer reviews, support tickets, and social media posts), contain valuable insights that can drive business decisions. esProc SPL then provides powerful tools for extracting meaning from text.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-text-preprocessing-in-esproc-spl\">Text preprocessing in esProc SPL<\/h3>\n\n\n\n<p>Let&#8217;s assume we have customer feedback data in our sales dataset:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>1<\/td><td>=file(&#8220;document\/sales.csv&#8221;).import@ct(CustomerID,Feedback)&nbsp;<\/td><td>Import the dataset and select relevant columns<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-word-frequency-analysis-in-esproc-spl\">Word frequency analysis in esProc SPL<\/h3>\n\n\n\n<p>One of the simplest yet most informative text analyses is counting word frequencies. Let&#8217;s see which words appear most often in our customer feedback:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>2<\/td><td>=A1.derive(Feedback.words():words) &nbsp;<\/td><td>&nbsp;<\/td><\/tr><tr><td>3<\/td><td>=A2.conj(words).groups(~:value;count(1): count)&nbsp;<\/td><td>Group by word value and count occurrences<\/td><\/tr><tr><td>4<\/td><td>=A3.sort(count:-1)&nbsp;<\/td><td>Sort by frequency in descending order<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The output will show the most frequent words. You might notice that common words like &#8220;the&#8221; dominate the results. These &#8220;stop words&#8221; don&#8217;t add much meaning, so let&#8217;s filter them out:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>5<\/td><td>=[&#8220;the&#8221;,&#8221;and&#8221;,&#8221;is&#8221;,&#8221;to&#8221;,&#8221;a&#8221;,&#8221;of&#8221;,&#8221;for&#8221;,&#8221;in&#8221;,&#8221;that&#8221;,&#8221;this&#8221;]&nbsp;&nbsp;<\/td><td>Define a list of common stop words to filter out. &nbsp;<\/td><\/tr><tr><td>6<\/td><td>=A3.select(!A5.contain(value))&nbsp;<\/td><td>Filter out stop words from our frequency analysis<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The `!A5.contain(value)` condition selects only words that are NOT in our stop words list. The tilde ! is the logical NOT operator in esProc SPL. Our word frequency analysis now focuses on meaningful content words, giving us better insights into what customers are talking about. <\/p>\n\n\n\n<p>This cleaner view immediately highlights the topics that matter to customers &#8211; product, quality, service, and delivery. You can use this information to identify what aspects of your business customers mention most frequently.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-sentiment-analysis-in-esproc-spl\">Sentiment analysis in esProc SPL<\/h3>\n\n\n\n<p>Are customers happy or unhappy with your product? Sentiment analysis can help answer this question. Let&#8217;s implement a simple lexicon-based approach:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>1<\/td><td>=file(&#8220;document\/sentiment_lexicon.csv&#8221;).import@ct()&nbsp;<\/td><td>&nbsp;<\/td><\/tr><tr><td>2<\/td><td>=A1.groups(word;avg(score):score)&nbsp;&nbsp;<\/td><td>Match words in each feedback to words in the lexicon<\/td><\/tr><tr><td>3<\/td><td>=A1.derive(A2.align(words:~,word).avg(score):sentiment_score)<\/td><td>&nbsp;<\/td><\/tr><tr><td>4<\/td><td>=A3.derive(if(sentiment_score&gt;0.2:&#8221;positive&#8221;,&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; sentiment_score&lt;-0.2:&#8221;negative&#8221;; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &#8220;neutral&#8221;): sentiment) &nbsp;<\/td><td>&nbsp;<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The sentiment analysis process begins with importing a sentiment lexicon that assigns numerical scores to words, where positive words receive positive values (e.g., &#8220;excellent&#8221; might be +0.8), and negative words receive negative values (e.g., &#8220;terrible&#8221; might be -0.9). Since some words may appear multiple times with different scores, their values are aggregated by averaging them. <\/p>\n\n\n\n<p>Next, the `align()` function is used to match words in each feedback entry with those in the lexicon, creating a table of sentiment words for each piece of feedback. Once matched, the average sentiment score is calculated by averaging the scores of all matched words. <\/p>\n\n\n\n<p>Finally, the feedback is classified based on predefined thresholds: entries with an average score above 0.2 are categorized as positive, those below -0.2 as negative, and those between -0.2 and 0.2 as neutral.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-why-is-sentiment-analysis-important\">Why is sentiment analysis important?<\/h4>\n\n\n\n<p>This analysis gives you a quick overview of customer sentiment. For example, you might discover that:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li>Certain products have more negative feedback than others<\/li>\n\n\n\n<li>Feedback from specific regions<\/li>\n\n\n\n<li>Sentiment has been improving over time<\/li>\n<\/ul>\n<\/div>\n\n\n<p>These insights can help you identify areas for improvement and track the impact of changes to your products or services.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-topic-modeling-in-esproc-spl\">Topic modeling in esProc SPL<\/h3>\n\n\n\n<p>Beyond sentiment, you might want to know what topics customers are discussing. Let&#8217;s implement a simple keyword-based topic classification:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>5<\/td><td>=[[&#8220;price&#8221;,&#8221;cost&#8221;,&#8221;expensive&#8221;,&#8221;cheap&#8221;,&#8221;affordable&#8221;], &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [&#8220;quality&#8221;,&#8221;durable&#8221;,&#8221;broke&#8221;,&#8221;lasting&#8221;,&#8221;sturdy&#8221;], &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [&#8220;service&#8221;,&#8221;support&#8221;,&#8221;help&#8221;,&#8221;response&#8221;,&#8221;staff&#8221;]]&nbsp;&nbsp;<\/td><td>Define keyword lists for different topics<\/td><\/tr><tr><td>6<\/td><td>=[&#8220;price&#8221;,&#8221;quality&#8221;,&#8221;service&#8221;]&nbsp;&nbsp;<\/td><td>Associate each keyword list with a topic name &nbsp;<\/td><\/tr><tr><td>7<\/td><td>=A5.conj(~.new(~:words, A6(A5.#):topics)) &nbsp;<\/td><td>&nbsp;<\/td><\/tr><tr><td>8<\/td><td>=A1.derive(A7.align(words:~,words):topic_matches) &nbsp;<\/td><td>&nbsp;<\/td><\/tr><tr><td>9<\/td><td>=A8.derive(topic_matches.groups(topics;count(1):c).maxp@a(c).(topics):main_topic) &nbsp;<\/td><td>&nbsp;<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This topic modeling approach begins with defining topics by creating lists of keywords for different aspects of the business, such as price, quality, and service. Each inner array contains words relevant to a specific category. To map these keywords efficiently, I use `conj()` to flatten the nested arrays and `align()` to associate each keyword with its corresponding topic. <\/p>\n\n\n\n<p>Next, I group keywords by topic to form a lookup table for easy reference. When analyzing feedback, I match words in each entry against the topic keywords using `align()`, ensuring all relevant matches are captured rather than just the first occurrence. Finally, I determine the main topic for each feedback entry by identifying the most frequently mentioned topic using the `A8.derive()` function. If no keywords match, the feedback remains unclassified with a null value.<br><br>The results classify feedback by main topic, helping you understand which aspects of your business customers are commenting on. For example, you might discover that 40% of feedback relates to product quality, 30% to price, and another 30% to service. Analyzing sentiment could show that price-related feedback is generally more negative than quality-related feedback, while customers who mention service are more likely to be repeat buyers. <\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-why-are-these-insights-important\">Why are these insights important?<\/h4>\n\n\n\n<p>These insights allow you to focus on the areas that matter most to customers &#8211; helping you to guide product improvements, refine customer service strategies, and tailor marketing messages.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-basic-statistical-analysis-and-hypothesis-testing-in-esproc-spl\">Basic statistical analysis and hypothesis testing in esProc SPL<\/h2>\n\n\n\n<p>Statistics help us understand patterns and relationships in data. esProc SPL provides comprehensive tools for both <a href=\"https:\/\/statistics.laerd.com\/statistical-guides\/descriptive-inferential-statistics.php\" target=\"_blank\" rel=\"noreferrer noopener\">descriptive and inferential statistics<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-how-to-work-with-descriptive-statistics-in-esproc-spl\">How to work with descriptive statistics in esProc SPL<\/h3>\n\n\n\n<p>Let&#8217;s calculate basic descriptive statistics for our sales data:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>10<\/td><td>=file(&#8220;document\/sales.csv&#8221;).import@ct(ProductID,Amount)<\/td><\/tr><tr><td>11<\/td><td>=A10.groups(ProductID; &nbsp;&nbsp;&nbsp; count(Amount): count,&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Number of sales &nbsp;&nbsp;&nbsp; sum(Amount): sum,&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Total revenue &nbsp;&nbsp;&nbsp; avg(Amount): avg,&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Average sale amount &nbsp;&nbsp;&nbsp; median(Amount): median,&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Median sale amount &nbsp;&nbsp;&nbsp; var@r(Amount): std,&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Standard deviation (variability) &nbsp;&nbsp;&nbsp; min(Amount): min,&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Minimum sale amount &nbsp;&nbsp;&nbsp; max(Amount): max,&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Maximum sale amount &nbsp;&nbsp;&nbsp; max(Amount)-min(Amount): range)&nbsp;&nbsp;&nbsp;&nbsp; Range of sale amounts<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This single grouping operation calculates eight different statistics for each product:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li>Count: The number of sales transactions for each product<br><br><\/li>\n\n\n\n<li>Sum: The total revenue generated by each product<br><br><\/li>\n\n\n\n<li>Average: The mean sale amount (calculated as sum\/count)<br><br><\/li>\n\n\n\n<li>Median: The middle value when all amounts are sorted<br><br><\/li>\n\n\n\n<li>Standard Deviation: A measure of how spread out the amounts are<br><br><\/li>\n\n\n\n<li>Minimum: The smallest sale amount<br><br><\/li>\n\n\n\n<li>Maximum: The largest sale amount<br><br><\/li>\n\n\n\n<li>Range: The difference between max and min, showing the spread<\/li>\n<\/ul>\n<\/div>\n\n\n<p><br>These statistics provide valuable insights into your data. The count reveals which products are sold most frequently (indicating their popularity), while the sum highlights the highest revenue-generating items, even if they don\u2019t sell often. The average and median help you understand typical order sizes; a significant difference between them suggests the presence of outliers. <\/p>\n\n\n\n<p>The standard deviation measures variability, with higher values indicating a wider range of transaction amounts (possibly due to discounts or multiple product variations). Meanwhile, the min, max, and range showcase the spread of sales, where a large range may signal special promotions or bulk discounts.<\/p>\n\n\n\n<section id=\"my-first-block-block_5c3e93954de50c60db2a96defabf3f60\" class=\"my-first-block alignwide\">\n    <div class=\"bg-brand-600 text-base-white py-5xl px-4xl rounded-sm bg-gradient-to-r from-brand-600 to-brand-500 red\">\n        <div class=\"gap-4xl items-start md:items-center flex flex-col md:flex-row justify-between\">\n            <div class=\"flex-1 col-span-10 lg:col-span-7\">\n                <h3 class=\"mt-0 font-display mb-2 text-display-sm\">Enjoying this article? Subscribe to the Simple Talk newsletter<\/h3>\n                <div class=\"child:last-of-type:mb-0\">\n                                            Get selected articles, event information, podcasts and other industry content delivered straight to your inbox.                                    <\/div>\n            <\/div>\n                                            <a href=\"https:\/\/www.red-gate.com\/simple-talk\/subscribe\/\" class=\"btn btn--secondary btn--lg\" aria-label=\"Subscribe now: Enjoying this article? Subscribe to the Simple Talk newsletter\">Subscribe now<\/a>\n                    <\/div>\n    <\/div>\n<\/section>\n\n\n<h3 class=\"wp-block-heading\" id=\"h-correlation-analysis-in-esproc-spl\">Correlation analysis in esProc SPL<\/h3>\n\n\n\n<p>Are sales quantities related to discounts? Do larger orders get bigger discounts? <a href=\"https:\/\/www.red-gate.com\/simple-talk\/blogs\/statistics-sql-pearsons-correlation\/\" target=\"_blank\" rel=\"noreferrer noopener\">Correlation analysis<\/a> can answer these questions:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>12<\/td><td>=file(&#8220;document\/sales.csv&#8221;).import@ct(Amount,Quantity,Discount) &nbsp;<\/td><\/tr><tr><td>13<\/td><td>=new(pearson(A12.(Amount),A12.(Quantity)):Amount_Quantity, pearson (A12.(Discount),A12.(Quantity)):Discount_Quantity, pearson (A12.(Amount),A12.(Discount)):Amount_Discount) &nbsp;<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>To compute correlation coefficients in esProc SPL, use pearson() and spearman() functions respectively. The correlation analysis reveals key relationships between sales variables. These insights can help refine pricing strategies, while the current discounting approach appears to boost sales, businesses might consider optimizing volume discounts to encourage larger purchases without eroding profitability.<\/p>\n\n\n\n<p>Remember that correlation doesn&#8217;t imply causation. A strong correlation between two variables doesn&#8217;t necessarily mean that one causes the other. There might be other factors influencing both variables.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-hypothesis-testing-in-esproc-spl\">Hypothesis testing in esProc SPL<\/h3>\n\n\n\n<p>Do sales amounts differ significantly between regions? Let&#8217;s use a t-test to find out:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>14<\/td><td>=file(&#8220;document\/sales.csv&#8221;).import@ct(Region,Amount) &nbsp;<\/td><\/tr><tr><td>15<\/td><td>=A14.select(Region==&#8221;west&#8221; || Region==&#8221;east&#8221;) &nbsp;<\/td><\/tr><tr><td>16<\/td><td>=ttest_p(A15.(if(Region==&#8221;east&#8221;:0;1)),A15.(Amount)) &nbsp;<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>In esProc SPL, use the ttest_p function for t-tests<strong>. <\/strong>First, convert values like &#8220;east&#8221; and &#8220;west&#8221; into 0 and 1.<br>Then, pass the combined dataset into ttest_p directly &#8211; the function will automatically separate it into two groups based on the binary variable. <em>Note: ttest_p is in an external library, so make sure to load the library before use<\/em>.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"950\" height=\"681\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/04\/image-12.png\" alt=\"Image showing the Environment page in esProc SPL's settings.\" class=\"wp-image-110051\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/04\/image-12.png 950w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/04\/image-12-300x215.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/04\/image-12-768x551.png 768w\" sizes=\"auto, (max-width: 950px) 100vw, 950px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"931\" height=\"850\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/04\/image-13.png\" alt=\"Image showing the 'Select external libraries' page in esProc SPL's settings.\" class=\"wp-image-110052\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/04\/image-13.png 931w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/04\/image-13-300x274.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/04\/image-13-768x701.png 768w\" sizes=\"auto, (max-width: 931px) 100vw, 931px\" \/><\/figure>\n\n\n\n<p>What if we want to compare all regions at once? For that, we use <a href=\"https:\/\/www.investopedia.com\/terms\/a\/anova.asp\" target=\"_blank\" rel=\"noreferrer noopener\">ANOVA (Analysis of Variance)<\/a>:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>1<\/td><td>=file(&#8220;document\/sales.csv&#8221;).import@ct(Region,Amount) &nbsp;<\/td><\/tr><tr><td>2<\/td><td>=fisher_p(A1.(Region),A1.(Amount)) &nbsp;<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>For multiple-group <a href=\"https:\/\/www.scribbr.com\/statistics\/t-test\/\" target=\"_blank\" rel=\"noreferrer noopener\">t-tests<\/a>, use the fisher_p function in esProc SPL<strong>. <\/strong>You don\u2019t need to manually split the dataset &#8211; simply pass the grouping field sequence and the value sequence to fisher_p, and it&#8217;ll automatically compare across groups. These statistical tests are powerful tools for data-driven decision making. For example, if you find significant differences between regions, you might:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li>Investigate what successful regions are doing differently<br><br><\/li>\n\n\n\n<li>Allocate more resources to underperforming regions<br><br><\/li>\n\n\n\n<li>Adjust regional marketing strategies<br><br><\/li>\n\n\n\n<li>Set different sales targets for different regions based on their potential<\/li>\n<\/ul>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-to-create-custom-functions-for-calculations-in-esproc-spl\">How to create custom functions for calculations in esProc SPL<\/h2>\n\n\n\n<p>As your analyses become complex, you&#8217;ll want to create reusable functions for complex calculations. SPL&#8217;s programming capabilities make this easy.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-how-to-define-reusable-functions-in-esproc-spl\">How to define reusable functions in esProc SPL<\/h3>\n\n\n\n<p>Let&#8217;s create a function to calculate the <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/bi-sql-server\/getting-to-know-your-customers-better-cohort-analysis-and-rfm-segmentation-in-r\/\" target=\"_blank\" rel=\"noreferrer noopener\">Recency, Frequency, Monetary (RFM)<\/a> score for customer segmentation:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;<\/td><td>A<\/td><td>B<\/td><\/tr><tr><td>1<\/td><td colspan=\"2\">=file(&#8220;document\/sales.csv&#8221;).import@ct(CustomerID,OrderDate,Amount)<\/td><\/tr><tr><td>2<\/td><td colspan=\"2\">func rfm_score(data,current_date,r_weight,f_weight,m_weight)<\/td><\/tr><tr><td>3<\/td><td>&nbsp;<\/td><td>=data.groups(CustomerID; interval(max(OrderDate),current_date):recency, count(1):frequency, sum(Amount):monetary)<\/td><\/tr><tr><td>4<\/td><td>&nbsp;<\/td><td>=B3.len()<\/td><\/tr><tr><td>5<\/td><td>&nbsp;<\/td><td>=B3.derive(:r_score,: f_score,:m_score,:rfm_score)<\/td><\/tr><tr><td>6<\/td><td>&nbsp;<\/td><td>=B5.sort(recency:-1).run(r_score=rank(recency)\/B4*5)<\/td><\/tr><tr><td>7<\/td><td>&nbsp;<\/td><td>=B5.sort(frequency).run(f_score=rank(frequency)\/B4*5)<\/td><\/tr><tr><td>8<\/td><td>&nbsp;<\/td><td>=B5.sort(monetary).run(m_score=rank(monetary)\/B4*5)<\/td><\/tr><tr><td>9<\/td><td>&nbsp;<\/td><td>=B5.derive((r_score*r_weight + f_score*f_weight + m_score*m_weight)\/(r_weight+f_weight+m_weight):rfm_score)<\/td><\/tr><tr><td>10<\/td><td>&nbsp;<\/td><td>return B9<\/td><\/tr><tr><td>11<\/td><td colspan=\"2\">=rfm_score(A1,now(),1,2,3)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This function processes customer transaction data to generate an RFM analysis by evaluating customer purchasing behavior. It takes three inputs: the transaction data, a reference date, and the weights assigned to each RFM component. Recency is calculated as the number of days since a customer\u2019s last purchase, with lower values being better. Frequency represents the total number of purchases, where higher values indicate more engagement. Monetary value is the total amount spent, with higher spending signifying a more valuable customer. <\/p>\n\n\n\n<p>To standardize these metrics, each component is ranked and converted into a score between 0 and 5. Recency is ranked from lowest to highest since recent transactions are preferable, while frequency and monetary value are ranked from highest to lowest, as more purchases and higher spending are better.<\/p>\n\n\n\n<p>The transformation formula `rank()\/count()*5` ensures that scores are proportionally distributed, maintaining consistency across different dataset sizes. Finally, the function calculates a weighted average RFM score based on the assigned importance of each component. <\/p>\n\n\n\n<p>For instance, if monetary value has the highest weight (3), frequency is moderately weighted (2), and recency has the least weight (1), customers with higher spending will be prioritized over those who shop frequently but spend less. This method allows businesses to segment customers effectively, identifying their most valuable customers and optimizing marketing strategies accordingly.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-rfm-analysis-in-summary\">RFM analysis in summary<\/h3>\n\n\n\n<p>RFM analysis is a common technique for customer segmentation, helping you understand purchasing behavior and tailor marketing strategies accordingly. By evaluating recency, frequency, and monetary value, you can identify your best customers, those who buy frequently, spend significantly, and have made recent purchases, allowing you to focus on retention and special treatment. <\/p>\n\n\n\n<p>Similarly, recent customers with high recency but low frequency may have the potential to become loyal, making them ideal candidates for onboarding efforts and second-purchase incentives. At-risk customers (those who once bought frequently but haven&#8217;t returned in a while) require targeted win-back campaigns to re-engage them. Meanwhile, loyal but low-value customers who purchase often but spend little, could benefit from upselling and cross-selling strategies to increase their overall value.<\/p>\n\n\n\n<p>By creating a reusable function in SPL, you can automate RFM analysis and adjust parameters to fit different business models or seasonal trends. This flexibility allows you to refine segmentation over time, ensuring your marketing efforts remain relevant and data-driven.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-how-to-implement-algorithms-in-esproc-spl\">How to implement algorithms in esProc SPL<\/h3>\n\n\n\n<p>Let&#8217;s implement a k-means clustering algorithm for customer segmentation:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;<\/td><td>A<\/td><td>B<\/td><td>C<\/td><td>D<\/td><td>E<\/td><\/tr><tr><td>1<\/td><td colspan=\"5\">func myKmeans(data, k, max_iterations)<\/td><\/tr><tr><td>2<\/td><td>&nbsp;<\/td><td colspan=\"4\">=data.group@b(rand(k);~:d,~.avg():avgV)<\/td><\/tr><tr><td>3<\/td><td>&nbsp;<\/td><td colspan=\"4\">=0<\/td><\/tr><tr><td>4<\/td><td>&nbsp;<\/td><td colspan=\"4\">for max_iterations<\/td><\/tr><tr><td>5<\/td><td>&nbsp;<\/td><td>&nbsp;<\/td><td>for B2<\/td><td>for C5.d<\/td><td>=B2.pmin(power(D5-avgV,2))<\/td><\/tr><tr><td>6<\/td><td>&nbsp;<\/td><td>&nbsp;<\/td><td>&nbsp;<\/td><td>&nbsp;<\/td><td>&gt;B2(#C5).d=B2(#C5).d\\D5<\/td><\/tr><tr><td>7<\/td><td>&nbsp;<\/td><td>&nbsp;<\/td><td>&nbsp;<\/td><td>&nbsp;<\/td><td>&gt;B2(E5).d=B2(E5).d|D5<\/td><\/tr><tr><td>8<\/td><td>&nbsp;<\/td><td>&nbsp;<\/td><td>&nbsp;<\/td><td>&nbsp;<\/td><td>&gt;B2(#C5).avgV=B2(#C5).d.avg()<\/td><\/tr><tr><td>9<\/td><td>&nbsp;<\/td><td>&nbsp;<\/td><td>&nbsp;<\/td><td>&nbsp;<\/td><td>&gt;B2(E5).avgV=B2(E5).d.avg()<\/td><\/tr><tr><td>10<\/td><td>&nbsp;<\/td><td>&nbsp;<\/td><td colspan=\"3\">=B2.sum(d.sum(power(d.~-avgV,2)))<\/td><\/tr><tr><td>11<\/td><td>&nbsp;<\/td><td>&nbsp;<\/td><td colspan=\"3\">if(abs(C10-B6)&lt;0.000001)<\/td><\/tr><tr><td>12<\/td><td>&nbsp;<\/td><td>&nbsp;<\/td><td>&nbsp;<\/td><td>break<\/td><td>&nbsp;<\/td><\/tr><tr><td>13<\/td><td>&nbsp;<\/td><td>&nbsp;<\/td><td>else<\/td><td colspan=\"2\">&gt;B6=C10<\/td><\/tr><tr><td>14<\/td><td>&nbsp;<\/td><td colspan=\"4\">return B2<\/td><\/tr><tr><td>15<\/td><td colspan=\"5\">=myKmeans(A2.(r_score)|A2.(f_score)|A2.(m_score),100,1000)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>A1: Define the k-means clustering function.<\/p>\n\n\n\n<p>B2: Randomly split the data into k clusters. d holds the data for each cluster, and avgV stores the average (centroid) of each cluster.<\/p>\n\n\n\n<p>B3: Initialize the total distance between all data points and their respective cluster centers to 0.<\/p>\n\n\n\n<p>B4: Loop through a maximum number of iterations.<\/p>\n\n\n\n<p>C5\u2013D5: Iterate over each data point:<\/p>\n\n\n\n<p>E5: Calculate the distance between the current data point and all cluster centers, and find the index of the cluster with the minimum distance.<\/p>\n\n\n\n<p>E6\u2013E9: Move the current data point to the cluster with the minimum distance and recalculate the averages (centroids) for both the original and the new cluster.<\/p>\n\n\n\n<p>C10: Recalculate the total distance between all data points and their updated cluster centers.<\/p>\n\n\n\n<p>C11: If the difference between the new total distance and the previous one is less than 0.000001, it indicates convergence, so exit the loop.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-a-note-on-the-euclidean-distance\">A note on the Euclidean distance<\/h4>\n\n\n\n<p>In the above code, the <a href=\"https:\/\/www.geeksforgeeks.org\/maths\/euclidean-distance\/\" target=\"_blank\" rel=\"noreferrer noopener\">Euclidean distance<\/a> is calculated without applying the square root. This is intentional, as only the difference compared to the previous value is needed, and an extra square root seems unnecessary.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-a-note-on-esproc-spl-s-kmeans-function\">A note on esProc SPL&#8217;s kmeans function<\/h4>\n\n\n\n<p><em>In an earlier example, the esProc SPL math external library was loaded, which already contains a built-in function named kmeans<\/em>. <em>So, if you define your own function named kmeans, be sure to rename it to avoid conflicts.<\/em><\/p>\n\n\n\n<p>These clusters represent natural groupings in your customer base. For example:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li>Cluster 1 might be &#8220;high-value loyal customers&#8221; with high scores across all dimensions<br><br><\/li>\n\n\n\n<li>Cluster 2 might be &#8220;recent high-potential customers&#8221; with high recency but moderate frequency and monetary values<br><br><\/li>\n\n\n\n<li>Cluster 3 might be &#8220;occasional shoppers&#8221; with lower scores overall<\/li>\n<\/ul>\n<\/div>\n\n\n<p>The power of clustering is that it finds patterns that might not be obvious from manual inspection. Unlike the predefined segments in RFM analysis, k-means discovers natural groupings based on the data itself.<\/p>\n\n\n\n<p>You can use these clusters to:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li>Develop targeted marketing campaigns for each segment<br><br><\/li>\n\n\n\n<li>Customize product recommendations based on cluster behavior<br><br><\/li>\n\n\n\n<li>Set different service levels for different customer groups<br><br><\/li>\n\n\n\n<li>Track customer movement between clusters over time<\/li>\n<\/ul>\n<\/div>\n\n\n<p>By implementing these algorithms as reusable functions, you create a toolkit for analytics that you can apply to various business problems.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-to-use-esproc-spl-as-a-data-preprocessing-tool\">How to use esProc SPL as a data preprocessing tool<\/h2>\n\n\n\n<p>While esProc SPL isn&#8217;t primarily a <a href=\"https:\/\/www.red-gate.com\/simple-talk\/business-intelligence\/data-science\/building-machine-learning-models-to-solve-practical-problems\/\" target=\"_blank\" rel=\"noreferrer noopener\">machine learning<\/a> platform, it excels at <a href=\"https:\/\/blog.quantinsti.com\/data-preprocessing\/\" target=\"_blank\" rel=\"noreferrer noopener\">data preprocessing<\/a> \u2013 often the most time-consuming part of the ML workflow.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-how-to-prepare-data-for-machine-learning\">How to prepare data for machine learning<\/h3>\n\n\n\n<p>Let&#8217;s prepare a dataset for a sales prediction model:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;<\/td><td>A<\/td><\/tr><tr><td>1<\/td><td>=file(&#8220;document\/sales.csv&#8221;).import@ct(OrderDate,ProductID,Region,CustomerID,Quantity,Discount,Amount)<\/td><\/tr><tr><td>2<\/td><td>=A1.groups(;min(Quantity):min_Quantity,max(Quantity):max_Quantity)<\/td><\/tr><tr><td>3<\/td><td>=A2.derive(max_Quantity-min_Quantity:range_Quantity)<\/td><\/tr><tr><td>4<\/td><td>=A1.derive((Quantity-A3.min_Quantity)\/A3.range_Quantity:Quantity_norm).sort(rand())<\/td><\/tr><tr><td>5<\/td><td>=A4.to(int(A4.len()*0.8))<\/td><\/tr><tr><td>6<\/td><td>=A4\\A5<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><strong>A1<\/strong>: Specify the fields to read when importing the data.<\/p>\n\n\n\n<p><strong>A2<\/strong>: Calculate the <strong>maximum<\/strong> and <strong>minimum<\/strong> values of the Quantity field.<\/p>\n\n\n\n<p><strong>A3<\/strong>: Compute the <strong>value range<\/strong> of Quantity.<\/p>\n\n\n\n<p><strong>A4<\/strong>: <strong>Normalize<\/strong> the Quantity values and <strong>shuffle<\/strong> the data order.<\/p>\n\n\n\n<p><strong>A5<\/strong>: Take the first <strong>80%<\/strong> of the data as the <strong>training set<\/strong>.<\/p>\n\n\n\n<p><strong>A6<\/strong>: The remaining <strong>20%<\/strong> is used as the <strong>test set<\/strong>.<\/p>\n\n\n\n<p>Effective <a href=\"https:\/\/www.red-gate.com\/products\/test-data-manager\/\" target=\"_blank\" rel=\"noreferrer noopener\">data preparation<\/a> is essential for building accurate machine learning models. It involves several key steps:<\/p>\n\n\n\n<p><a href=\"https:\/\/www.tredence.com\/blog\/feature-engineering-in-machine-learning\" target=\"_blank\" rel=\"noreferrer noopener\">Feature engineering<\/a> helps extract meaningful patterns from date-related data by breaking it down into components like year, month, day, day of the week, and a weekend flag. <\/p>\n\n\n\n<p><a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/database-normalization-basics\/\" target=\"_blank\" rel=\"noreferrer noopener\">Data normalization<\/a> ensures that all features operate on the same scale, preventing features with larger ranges from disproportionately influencing the model. For example, if Quantity ranges from 1 to 100 while Discount only varies from 0 to 0.5, normalization scales them to a uniform range, typically between 0 and 1, using the formula: <em>(value &#8211; min) \/ (max &#8211; min)<\/em>. <\/p>\n\n\n\n<p>Lastly, splitting the dataset into training and testing sets (commonly 80% for training and 20% for testing) ensures that the model is evaluated on data it hasn\u2019t seen before, preventing overfitting and allowing for a more reliable assessment of performance. <\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-why-is-data-preprocessing-important\">Why is data preprocessing important?<\/h4>\n\n\n\n<p>Without these preprocessing steps, even the most sophisticated algorithms may misinterpret data. This leads to poor results, such as assuming ordinal relationships between categorical values or allowing large-scale features to dominate the model. Proper preprocessing lays the foundation for an effective and unbiased machine learning pipeline.<\/p>\n\n\n\n<section id=\"my-first-block-block_a35ce236b6546584bad6fced06f6f394\" class=\"my-first-block alignwide\">\n    <div class=\"bg-brand-600 text-base-white py-5xl px-4xl rounded-sm bg-gradient-to-r from-brand-600 to-brand-500 red\">\n        <div class=\"gap-4xl items-start md:items-center flex flex-col md:flex-row justify-between\">\n            <div class=\"flex-1 col-span-10 lg:col-span-7\">\n                <h3 class=\"mt-0 font-display mb-2 text-display-sm\">Compliance without Compromise<\/h3>\n                <div class=\"child:last-of-type:mb-0\">\n                                            Test data shouldn&#8217;t force you to choose between security, quality, and speed. Redgate Test Data Manager provides audit-ready test data in minutes.                                    <\/div>\n            <\/div>\n                                            <a href=\"https:\/\/www.red-gate.com\/products\/test-data-manager\/a\/\" class=\"btn btn--secondary btn--lg\" aria-label=\"Learn more &amp; try for free: Compliance without Compromise\">Learn more &amp; try for free<\/a>\n                    <\/div>\n    <\/div>\n<\/section>\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-to-export-data-for-external-machine-learning-tools\">How to export data for external machine learning tools<\/h2>\n\n\n\n<p>esProc SPL can export prepared data to formats compatible with machine learning frameworks:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;<\/td><td>A<\/td><\/tr><tr><td>7<\/td><td>=file(&#8220;training_data.csv&#8221;).export@tc(A4)<\/td><\/tr><tr><td>8<\/td><td>=file(&#8220;testing_data.csv&#8221;).export@tc(A5)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>These simple export commands allow you to save your prepared datasets as CSV files, making them compatible with specialized machine learning tools like <a href=\"https:\/\/scikit-learn.org\/stable\/\" target=\"_blank\" rel=\"noreferrer noopener\">scikit-learn<\/a>, <a href=\"https:\/\/www.tensorflow.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">TensorFlow<\/a>, or other ML frameworks. The typical workflow involves using esProc SPL for data cleaning, feature engineering and preprocessing, before exporting the refined data. Once saved in a structured format, you can import it into an ML framework to build, train, and evaluate models. <\/p>\n\n\n\n<p>After selecting the best-performing model, you can deploy it for real-world predictions. This hybrid approach takes advantage of esProc SPL\u2019s efficient data manipulation capabilities while leveraging the advanced algorithms provided by dedicated ML frameworks.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-how-to-implement-simple-machine-learning-algorithms-in-esproc-spl\">How to implement simple machine learning algorithms in esProc SPL<\/h3>\n\n\n\n<p>esProc SPL can implement basic machine learning algorithms directly. Let&#8217;s create a <a href=\"https:\/\/www.red-gate.com\/simple-talk\/blogs\/statistics-sql-simple-linear-regressions\/\" target=\"_blank\" rel=\"noreferrer noopener\">simple linear regression model<\/a>:\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;<\/td><td>A<\/td><td>B<\/td><\/tr><tr><td>9<\/td><td colspan=\"2\">=linefit@1(transpose([A5.(Quantity_norm),A5.(1)]),A5.(Amount))<\/td><\/tr><tr><td>10<\/td><td colspan=\"2\">=A6.derive(A9(1)*Quantity_norm+A9(2):predicted,abs(Amount-predicted):error)<\/td><\/tr><tr><td>11<\/td><td colspan=\"2\">=A10.avg(error)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><strong>A9<\/strong>: Call the linear regression function to compute the regression model using the <strong>training set<\/strong>.<\/p>\n\n\n\n<p><strong>A10<\/strong>: Use the regression result to <strong>calculate predicted values<\/strong> for the <strong>test set<\/strong>, and compute the <strong>difference between predicted and actual values<\/strong>.<\/p>\n\n\n\n<p><strong>A11<\/strong>: Calculate the <strong>average of the differences<\/strong>.<\/p>\n\n\n\n<p><strong>linefit function explanation: <\/strong>The first argument consists of two series: the first is Quantity_norm, the second is 1.<br>The function returns two values: the first is the <strong>slope<\/strong>, the second is the <strong>intercept<\/strong>.<br>Using the @1 option, the return value is a sequence, so use A9(1) to get the slope and A9(2) to get the intercept.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-a-guide-to-interactive-dashboards-with-esproc-spl-s-visualization-capabilities\">A guide to interactive dashboards with esProc SPL&#8217;s visualization capabilities<\/h2>\n\n\n\n<p>Data analysis isn&#8217;t complete until you communicate your findings effectively. esProc SPL can prepare data for visualization tools, enabling you to create interactive dashboards.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-how-to-create-basic-visualizations-in-esproc-spl\">How to create basic visualizations in esProc SPL<\/h3>\n\n\n\n<p>esProc SPL can generate data for various chart types:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;<\/td><td>A<\/td><\/tr><tr><td>1<\/td><td>=file(&#8220;document\/sales.csv&#8221;).import@ct(OrderDate,Region,Amount)<\/td><\/tr><tr><td>2<\/td><td>=A108.groups(month@y(OrderDate):month,Region;sum(Amount):total)<\/td><\/tr><tr><td>3<\/td><td>=A109.pivot(month;Region,total)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This data preparation extracts month information from order dates, groups data by month and region (summing the amounts in the process), and then pivots the data to create a table with months as rows and regions as columns. This format makes it easy to generate a line chart that visualizes sales trends by region over time, allowing you to quickly identify performance patterns. <\/p>\n\n\n\n<p>By structuring the data in a pivoted format, it becomes readily compatible with visualization tools like Excel, <a href=\"https:\/\/www.tableau.com\/en-gb\" target=\"_blank\" rel=\"noreferrer noopener\">Tableau<\/a> and <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/bi-sql-server\/power-bi-part-1-introduction\/\" target=\"_blank\" rel=\"noreferrer noopener\">Power BI<\/a>. Most of these platforms prefer a &#8220;wide&#8221; format for multi-series charts, making it easier to compare trends across regions and gain actionable insights.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-how-to-build-a-sales-dashboard-in-esproc-spl\">How to build a sales dashboard in esProc SPL<\/h3>\n\n\n\n<p>Let&#8217;s create a comprehensive sales dashboard with multiple components:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"773\" height=\"422\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/04\/image-14.png\" alt=\"\" class=\"wp-image-110055\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/04\/image-14.png 773w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/04\/image-14-300x164.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/04\/image-14-768x419.png 768w\" sizes=\"auto, (max-width: 773px) 100vw, 773px\" \/><\/figure>\n\n\n\n<p>First, let&#8217;s prepare the data for each dashboard component:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>&nbsp;<\/td><td>A<\/td><\/tr><tr><td>1<\/td><td>=file(&#8220;document\/sales.csv&#8221;).import@ct(OrderDate,Region,ProductID,CustomerID,Amount)<\/td><\/tr><tr><td>2<\/td><td>=A1.sum(Amount)<\/td><\/tr><tr><td>3<\/td><td>=A1.groups(month@y(OrderDate):month;sum(Amount):sales)<\/td><\/tr><tr><td>4<\/td><td>=A3.sort(month)<\/td><\/tr><tr><td>5<\/td><td>=A1.groups(Region;sum(Amount):sales)<\/td><\/tr><tr><td>6<\/td><td>=A5.derive(sales\/A113*100:percentage)<\/td><\/tr><tr><td>7<\/td><td>=A1.groups(ProductID;sum(Amount):sales,count(OrderDate):orders, avg(Amount):avg_order)<\/td><\/tr><tr><td>8<\/td><td>=A7.sort(sales:-1)<\/td><\/tr><tr><td>9<\/td><td>=A1.groups(CustomerID;sum(Amount):total_spent,count(OrderDate):order_count, avg(Amount):avg_order)<\/td><\/tr><tr><td>10<\/td><td>=A9.derive(if(total_spent&gt;=10000:&#8221;High Value&#8221;, total_spent&gt;=5000:&#8221;Medium Value&#8221;; &#8220;Low Value&#8221;):segment)<\/td><\/tr><tr><td>11<\/td><td>=A10.groups(segment;count(CustomerID):customers,sum(total_spent):revenue,avg(total_spent):avg_spent)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The queries behind each dashboard component ensure that critical business insights are easily accessible. The time series component tracks monthly sales trends, helping you identify growth patterns and seasonal fluctuations, with chronological sorting ensuring accurate trend visualization. The regional breakdown provides a clear view of sales distribution across different regions, showing each area&#8217;s contribution as a percentage of total revenue for proportional comparison. The product performance analysis ranks items based on sales volume, order count, and average order value, allowing you to pinpoint best-selling products and those that may need a strategic push. <\/p>\n\n\n\n<p>Meanwhile, the customer segmentation component categorizes buyers based on their spending habits, providing a summary of each group&#8217;s size and contribution to revenue. By leveraging these insights, your dashboard offers a comprehensive view of business performance, supporting data-driven decision-making.<\/p>\n\n\n\n<p>For instance, your analysis might reveal that sales have grown by 15% year-over-year, indicating a strong upward trajectory. These insights empower businesses to make informed decisions on resource allocation, marketing focus, product development, and customer relationship management, ensuring that strategies are backed by data rather than intuition.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-in-summary-data-analysis-in-esproc-spl\">In summary: data analysis in esProc SPL<\/h2>\n\n\n\n<p>Throughout this journey into analytics with esProc SPL, you&#8217;ve explored how this versatile language enables you to transform raw data into meaningful insights. <\/p>\n\n\n\n<p>From handling time series data to performing text analysis, statistical testing, and even integrating with machine learning models, esProc SPL provides a flexible environment for data processing. Its intuitive approach simplifies complex analytical tasks, allowing you to focus on uncovering patterns and making data-driven decisions.<\/p>\n\n\n\n<p>You\u2019ve also seen how esProc SPL can handle time series analysis efficiently &#8211; resampling data, calculating moving averages, and forecasting trends to reveal valuable insights from <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/temporal-data-techniques-in-sql\/\" target=\"_blank\" rel=\"noreferrer noopener\">temporal<\/a> datasets. Text analysis techniques such as preprocessing, sentiment analysis, and topic modeling have additionally demonstrated how esProc SPL can extract meaning from <a href=\"https:\/\/www.ibm.com\/think\/topics\/structured-vs-unstructured-data\" target=\"_blank\" rel=\"noreferrer noopener\">unstructured data<\/a>. <\/p>\n\n\n\n<p>Meanwhile, statistical analysis tools &#8211; including correlation analysis and hypothesis testing &#8211; allow you to explore relationships within datasets and validate findings.<\/p>\n\n\n\n<p>While esProc SPL is <em>not<\/em> a machine learning framework, you\u2019ve learned how to use it for preprocessing and simple model implementations. Additionally, esProc SPL&#8217;s visualization capabilities provide a way to present insights clearly through interactive dashboards.<\/p>\n\n\n\n<section id=\"my-first-block-block_3b7239a9577a4a7c03f22052238295e6\" class=\"my-first-block alignwide\">\n    <div class=\"bg-brand-600 text-base-white py-5xl px-4xl rounded-sm bg-gradient-to-r from-brand-600 to-brand-500 red\">\n        <div class=\"gap-4xl items-start md:items-center flex flex-col md:flex-row justify-between\">\n            <div class=\"flex-1 col-span-10 lg:col-span-7\">\n                <h3 class=\"mt-0 font-display mb-2 text-display-sm\">Simple Talk is brought to you by Redgate Software<\/h3>\n                <div class=\"child:last-of-type:mb-0\">\n                                            Take control of your databases with the trusted Database DevOps solutions provider. Automate with confidence, scale securely, and unlock growth through AI.                                    <\/div>\n            <\/div>\n                                            <a href=\"https:\/\/www.red-gate.com\/solutions\/overview\/\" class=\"btn btn--secondary btn--lg\" aria-label=\"Discover how Redgate can help you: Simple Talk is brought to you by Redgate Software\">Discover how Redgate can help you<\/a>\n                    <\/div>\n    <\/div>\n<\/section>\n\n\n<section id=\"faq\" class=\"faq-block my-5xl\">\n    <h2>FAQs: Data analytics with esProc SPL<\/h2>\n\n                        <h3 class=\"mt-4xl\">1. What is esProc SPL used for?<\/h3>\n            <div class=\"faq-answer\">\n                <p data-start=\"136\" data-end=\"342\">esProc SPL is a data analysis tool designed for efficient data processing, time-series analysis, statistical modeling, and text analytics with concise, readable syntax.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">2. How is esProc SPL different from other data analysis tools?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Unlike general-purpose programming languages, esProc SPL uses specialized syntax for analytics, reducing code complexity and speeding up data manipulation tasks.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">3. What is time-series analysis in esProc SPL?<\/h3>\n            <div class=\"faq-answer\">\n                <p data-start=\"576\" data-end=\"801\">Time-series analysis in esProc SPL involves analyzing data points over time, such as sales or traffic trends, using tools like resampling, moving averages, and forecasting.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">4. How do you resample time-series data in esProc SPL?<\/h3>\n            <div class=\"faq-answer\">\n                <p>You can resample data by grouping it into different time intervals (e.g., monthly or quarterly) using the <code data-start=\"970\" data-end=\"980\">groups()<\/code> function to aggregate values like sums or averages.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">5. Can esProc SPL calculate moving averages?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Yes, esProc SPL supports moving averages using simple expressions, allowing flexible calculations like 3-month or 5-month averages without dedicated functions.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">6. Does esProc SPL support forecasting?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Yes, it supports forecasting through techniques like exponential smoothing and linear regression on deseasonalized data.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">7. What text analysis features are available in esProc SPL?<\/h3>\n            <div class=\"faq-answer\">\n                <p>esProc SPL includes word frequency analysis, sentiment analysis, and topic modeling to extract insights from customer feedback and text data.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">8. How does esProc SPL handle sentiment analysis?<\/h3>\n            <div class=\"faq-answer\">\n                <p>It uses a lexicon-based approach to assign sentiment scores to words and classify feedback as positive, negative, or neutral.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">9. Can esProc SPL perform statistical analysis?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Yes, it supports descriptive statistics, correlation analysis, and hypothesis testing (e.g., t-tests and ANOVA).<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">10. Is esProc SPL suitable for machine learning?<\/h3>\n            <div class=\"faq-answer\">\n                <p>While not a full ML platform, esProc SPL excels in data preprocessing, feature engineering, and exporting datasets for tools like TensorFlow or scikit-learn.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">11. What is RFM analysis in esProc SPL?<\/h3>\n            <div class=\"faq-answer\">\n                <p>RFM (Recency, Frequency, Monetary) analysis helps segment customers based on purchasing behavior using custom functions in SPL.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">12. Can you build dashboards with esProc SPL?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Yes, esProc SPL prepares structured data for visualization tools like Excel, Tableau, and Power BI, enabling interactive dashboards.<\/p>\n            <\/div>\n            <\/section>\n","protected":false},"excerpt":{"rendered":"<p>Understand analytics in esProc SPL with this comprehensive guide covering time series analysis, text processing, statistical methods, custom functions, machine learning integration, and interactive dashboards.&hellip;<\/p>\n","protected":false},"author":342096,"featured_media":104709,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143526,146042],"tags":[5992,159117,159374,5021],"coauthors":[159002],"class_list":["post-110048","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-development","category-python","tag-data-analysis","tag-esproc-spl","tag-moving-from-python-to-esproc-spl","tag-python"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/110048","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/342096"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=110048"}],"version-history":[{"count":9,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/110048\/revisions"}],"predecessor-version":[{"id":110103,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/110048\/revisions\/110103"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media\/104709"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=110048"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=110048"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=110048"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=110048"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}