{"id":2050,"date":"2015-06-12T00:00:00","date_gmt":"2015-06-12T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/getting-to-know-your-customers-better-cohort-analysis-and-rfm-segmentation-in-r\/"},"modified":"2021-05-17T18:33:14","modified_gmt":"2021-05-17T18:33:14","slug":"getting-to-know-your-customers-better-cohort-analysis-and-rfm-segmentation-in-r","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/bi-sql-server\/getting-to-know-your-customers-better-cohort-analysis-and-rfm-segmentation-in-r\/","title":{"rendered":"Getting to know your customers better &#8211; cohort analysis and RFM segmentation in R"},"content":{"rendered":"<div id=\"pretty\">\n<p class=\"start\">In this article I  \twill demonstrate  a couple of  \thands-on ways to do a data analysis using&#160; \ta combination of  SQL  \tServer and R &#160;in order to get some  \tuseful knowledge about customers that &#160;can  \tbe almost immediately applied to informing business decisions. I will  \tdescribe a cohort analysis as a way of analyzing the value of a customer to  \ta business, and then dive into a slightly more complex segmentation called  \tRFM (Recency, Frequency and Monetary). <\/p>\n<p>RFM has been a  \tvery popular marketing tool for at least 10 years. It provides categories or  \tratings that are based on  how  \tlong ago that a customer made a purchase, how often they purchase, and the  \tamount that they spend.<\/p>\n<p>Keep in mind that  \tthese analyses are more widely useful than just for direct retail marketing,  \tin that the broad principles can be applied to almost any dataset of  \tcustomer behavior that require reports that are simple for business people  \tto understand.<\/p>\n<h2>Setting up the data<\/h2>\n<p>For the purpose  \tof this article, and to allow the reader to easily try the examples out, we  \twill use MS SQL Server as a data storage engine, the <strong>AdventureWorksDW2012<\/strong>  \tdatabase and the<strong> [dbo].[FactInternetSales] t<\/strong>able. The rest of the analysis  \tand the graphical representation will be done in R.&#160; \t<\/p>\n<p>We will create  \tthe following view:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">\tUSE [AdventureWorksDW2012];\n\tGO\n\t&#160;\n\tDROP VIEW [dbo].[vCustomer_Data]; \n\tGO\n\t&#160;\n\tCREATE VIEW [dbo].[vCustomer_Data]\n\tAS\n\t&#160;&#160;&#160; WITH&#160;&#160;&#160; CTE\n\t&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; AS ( SELECT&#160;&#160; fis.[CustomerKey] ,\n\t&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; SUM(CONVERT(INT, [SalesAmount])) AS OrderValue ,\n\t&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; CONVERT(DATE, [OrderDate]) AS OrderDate\n\t&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; FROM&#160;&#160;&#160;&#160; [AdventureWorksDW2012].[dbo].[FactInternetSales] fis\n\t&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; JOIN [dbo].[DimCustomer] dc ON fis.[CustomerKey] = dc.[CustomerKey]\n\t&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; WHERE&#160;&#160;&#160; [CurrencyKey] = 100\n\t&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; GROUP BY fis.CustomerKey ,\n\t&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; OrderDate ,\n\t&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; dc.DateFirstPurchase\n\t&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; )\n\t&#160;&#160;&#160; SELECT&#160; *\n\t&#160;&#160;&#160; FROM&#160;&#160;&#160; CTE\n\t&#160;&#160;&#160; WHERE&#160;&#160; CTE.OrderDate BETWEEN '2007-01-01'\n\t&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; AND&#160;&#160;&#160;&#160; '2008-03-31';\n\t&#160;\n\tGO\n<\/pre>\n<div class=\"note\">\n<p class=\"note\">Note: for  \tlearning purposes I have restricted the data to 15 months. In reality, you&#8217;d  \tprobably need to follow up longer timespans depending on the purpose of the  \tanalysis. <\/p>\n<\/div>\n<p>The data looks  \tsimilar to this:<\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2242-imgB.jpg\" alt=\"2242-imgB.jpg\" \/><\/p>\n<p>This is a very  \tcommon data set, which can be found in almost any transactional system. Here  \twe have the following features:<\/p>\n<ul>\n<li>&#160;Some uniquely-identified entity (whether it is an individual  \t\tcustomer, event type or other), in this case <strong>CustomerKey<\/strong><\/li>\n<li>&#160;Some indicator of when the event  \t\toccurred (it could be a timestamp, but not always), in this case <strong> \t\tOrderDate<\/strong><\/li>\n<li>&#160;Some value related to the unique  \t\tentity and the particular event occurrence, in this case<strong>  \t\tOrderValue <\/strong><\/li>\n<\/ul>\n<p> \tIn this case our  \tdataset contains a fourth column &#8211; <b> \tDateFirstPurchase<\/b>, however in this case it is not really a feature of  \tthe dataset, but it is merely a derived value. <\/p>\n<p>Now that we have  \tthe data ready, let&#8217;s try and extract some valuable knowledge from it. Let&#8217;s  \tdo a preliminary exploration of the data by doing a cohort analysis, to see  \tif there are any patterns in our customers&#8217; behavior. <\/p>\n<h2>Cohort analysis<\/h2>\n<p>Cohort analysis  \tis a broad topic, and it has many variations. Let&#8217;s first give some basic  \tdefinitions:<\/p>\n<ul>\n<li><strong>Cohort<\/strong> &#8211; this is  \ta group of people or events who share a common characteristic over time \t\t<\/li>\n<li><strong>Cohort analysis<\/strong> &#8211;  \tthis is the study of activity \/ behavior of a particular cohort or a group  \tof them over time (or other iteration).<\/li>\n<\/ul>\n<p>In metacode, a  \tcohort analysis is similar to this: &#8216;take a dataset which has uniquely  \tidentified entity (customer in this case), define unique cohorts by which  \tthe items can be grouped (in this case we will use the year and month of  \tfirst purchase), and follow the behavior of the cohorts over time (in this  \tcase the sum of the OrderValue for each cohort per timeslice)&#8217;.<\/p>\n<p>It may seem a bit  \tcomplex at first sight: However it is straightforward in practice. In this  \tcase we are asking the questions:<\/p>\n<ul>\n<li>&#8216;How do our users behave as they get further away from their  \t\tfirst purchase date?&#8217; <\/li>\n<li>&#8216;Do they keep buying equally a lot as  \t\tthey did in the first month of their activity, or do they phase out as  \t\ttime passes?&#8217; <\/li>\n<li>&#8216;Is there a pattern when we compare the  \t\tfirst purchase months of different cohorts?&#8217;<\/li>\n<\/ul>\n<p>&#160;Let&#8217;s load the  \tlibraries first:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">\tlibrary(dplyr)\/span&gt;\n\tlibrary(reshape2)\n\tlibrary(ggplot2)\n\tlibrary(RODBC)\n\t&#160;\n\t<\/pre>\n<p>Then we will  \tconnect to SQL Server and load the data from the view we created into  \tmemory:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">\tcn &lt;- odbcDriverConnect(connection=\"Driver={SQL Server};server=localhost;database=AdventureWorksDW2012;trusted_connection=yes;\")\n\t&#160;\n\torders &lt;- sqlFetch(cn, 'vCustomer_Data', colnames=FALSE,\n\t&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; rows_at_time=1000)\n\t<\/pre>\n<p>Then we have to  \tconvert and format the <b>OrderDate<\/b>  \tvariable:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">&#160;orders$OrderDate &lt;- as.Date(orders$OrderDate, format='%Y-%m-%d')<\/pre>\n<p>&#160;After this, we  \twill create a new dataset in memory. This dataset will have additional  \tfeatures that we&#8217;ll need for our graphical representation later on:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">\tcohort &lt;- orders %&gt;%\n\t&#160; arrange(CustomerKey, OrderDate) %&gt;%\n\t&#160; group_by(CustomerKey) %&gt;%\n\t&#160; mutate(FPD = min(OrderDate), \n\t&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; CohortMonthly = paste0('Cohort-', format(FPD, format='%Y-%m')),&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; \n\t&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; LifetimeMonth = ifelse(OrderDate==FPD, 1, ceiling(as.numeric(OrderDate-FPD)\/30))) %&gt;% \n\t&#160; ungroup()\n<\/pre>\n<p>Here is what the  \tcode does: <\/p>\n<ol>\n<li>Takes the orders dataset and arranges it  \t\tby <strong>CustomerKey<\/strong> and <strong>OrderDate (<\/strong>this is  \t\tto make sure the data is in chronological order per user)<\/li>\n<li>Groups it by <strong>Customerkey,<\/strong>  \t\tsince we need data on cohort level<\/li>\n<li>Creates three new variables (columns) in  \t\tour dataset: <strong>FPD<\/strong> (first purchase date), <strong> \t\tCohortMonthly<\/strong>  \t\tand <strong>LifetimeMonth<\/strong>.<\/li>\n<\/ol>\n<p>&#160;Next we need to  \tformat the <b>LifetimeMonth <\/b>variable  \ta bit more:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">\tcohort$LifetimeMonth &lt;- formatC(cohort$LifetimeMonth, width=2, format='d', flag='0')\n\tcohort$LifetimeMonth &lt;- paste('M', cohort$LifetimeMonth, sep='')\n\t&#160;\n<\/pre>\n<p>The new dataset  \tlooks like this:<\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2242-img17.gif\" alt=\"2242-img17.gif\" \/><\/p>\n<p>The next step is  \tto combine and summarize the cohorts:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">\tcomb.cohorts &lt;- cohort %&gt;%\n\t&#160;&#160;&#160; group_by(CohortMonthly, LifetimeMonth) %&gt;%\n\t&#160;&#160;&#160; summarise(Revenue = sum(OrderValue),\n\t&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; NumberOfCustomers = n_distinct(CustomerKey))\n\t&#160;\n\t<\/pre>\n<p>In this step we  \talready lose the individual data per user and instead summarize it by cohort  \tand timeframe. <\/p>\n<p>Now our data is  \tready for exploration and for creating a visualization:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">\tggplot(comb.cohorts, aes(x=LifetimeMonth, y=NumberOfCustomers, group=CohortMonthly, color=CohortMonthly)) + geom_line()<\/pre>\n<p>Based on our data  \tthe graph looks like this:<\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2242-img22.jpg\" alt=\"2242-img22.jpg\" \/><\/p>\n<p>Here is the  \tanalysis we can do by looking at this graph: on the Y axis we have the  \tNumber of Customers per month and per cohort, and on the X axis we have the  \tcohort months. This means that in this case all first purchase months are  \taligned at the point M01, and then they are followed up further by M02 to  \tM15.<\/p>\n<p>Something very  \tobvious is immediately noticeable: there are a few cohorts with a  \tsignificantly stronger first month with over 750 customers making their  \tfirst purchase in a specific month. By looking at the data, we establish  \tthat the periods between 2007-08 and 2007-12 are extremely strong, but they  \tdrop very fast by their second month. Then we have the other group of  \tcohorts, which have about 100 customers purchasing the first month, but they  \tpurchase relatively a lot in their second and third months, before they  \tphase out. Then we can see another fact which is quite interesting to  \texplore further: there is a group of cohorts which get a slight increase in  \tthe purchases in their 8th and 9th months.  \t<\/p>\n<p>All this leads to  \tintriguing &#160;business development  \tquestions: why are certain months significantly stronger than others and  \twhat can we do to increase the customer retention? Why do some customers  \treturn in their 8th and 9th months? Did we have a  \tsuccessful marketing campaign then, or was it just the holiday shopping  \tseason? <\/p>\n<p>This type of  \tanalysis is very effective from a business development perspective. There  \tare several reasons for this:<\/p>\n<ul>\n<li>The data knowledge extraction can go really fast, once the  \t\tdata is brought into memory<\/li>\n<li>There is no need for heavy duty  \t\trelational database modelling. In conventional SSAS and SSRS this step  \t\ttakes a very long time, partly due to the fact that the data is  \t\tpersisted on disks and partially because the physical data model has to  \t\tbe rebuilt quite often to match the questions asked.<\/li>\n<li>The iterations between business questions  \t\tand answers are very short, and thus a great amount of work can be  \t\tperformed in a very short time, since the dataset is in memory and all  \t\tit takes is a skilled data analyst and a savvy business development  \t\tspecialist to get together and iterate through all questions the data  \t\traises<\/li>\n<\/ul>\n<p>This analysis,  \thowever, is a bit limited because it shows only one side of the customers&#8217;  \tbehavior; either the number of customers who purchased and their retention,  \tor the revenue per cohort. Even though this is very useful and gives great  \tinsights, this type of analysis is still one-dimensional. In other words, we  \tcannot clearly explore the dependency between number of customers and their  \tpurchase value. After all, we might have 100 customers purchasing for &#194;&#163;1  \teach, vs 10 customers purchasing for &#194;&#163;100 each. We would certainly be  \tinterested to target these two groups differently. <\/p>\n<p>This is why there  \tis a different approach: RFM segmentation. Let&#8217;s look into this next.  \t<\/p>\n<h2>&#160;RFM segmentation  \t<\/h2>\n<p>&#160;RFM segmentation  \tis a technique used to get to know the customers better and to be able to  \tdivide them into groups which will make marketing targeting more effective  \tand cost-efficient.<\/p>\n<p>RFM stands for  \tRecency, Frequency and Monetary and it does exactly this &#8211; gives these three  \tdimensions to our customer base based on their transactions (how recently  \tthey purchased, how frequently they purchased and how much money they  \tspent). <\/p>\n<p>The premise is  \tthat customers who have bought more recently, purchased more often and have  \tspent more money are more likely to respond to a marketing offering, than  \tcustomers who have purchased less recently, not so often and for very little  \tmoney. RFM analysis can be used to segment and label the customers (active,  \tchurning, churned, never activated) and thus the proper marketing campaign  \tcan be directed to the right group of customers. <\/p>\n<p>&#160;<b>Defining the terms<\/b><\/p>\n<p>The RFM  \tsegmentation is very specific and is dictated by the business model, however  \tit is easily customizable. In general terms, the definitions look like this:<\/p>\n<ul>\n<li><b> \t\tRecency <\/b> \t\t&#8211; represents the  \t\t&#8220;freshness&#8221; of customer activity. Naturally, we would like to identify \t\t<i>active and<\/i> <i>inactive<\/i> customers. The basic definition for  \t\tthis attribute is the number of days since their last order or  \t\tinteraction.<\/li>\n<li><b> \t\tFrequency<\/b>&#160;- how often the  \t\tcustomer buys. By default this could be the total number of orders in  \t\tthe last year or during the whole of customer&#8217;s lifetime.<\/li>\n<li><b> \t\tMonetary&#160;<\/b>value indicates  \t\thow much the customer is spending. In many cases this is just the sum of  \t\tall the order values.<\/li>\n<\/ul>\n<p> \tAbove is the general definition, however,  \tand depending on the business model the definitions have to be fine-tuned.  \tFor example, the model will look slightly different for a company which  \tsells insurances, since the sales are usually based on half or yearly basis.  \tIn the online gaming industry, for example, the recency and frequency might  \tbe measured in minutes or even seconds and heavy aggregations might be  \tneeded, and averages might be used instead of sum values.  \t<\/p>\n<h3><b> \tSample data:<\/b><\/h3>\n<p>For the purpose  \tof this segmentation we will be using the following view:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">\tUSE [AdventureWorksDW2012];\n\tGO\n\t&#160;\n\tDROP VIEW [dbo].[vRFM_Data];\n\tGO\n\t&#160;\n\tCREATE VIEW [dbo].[vRFM_Data]\n\tAS\n\t&#160;&#160;&#160; WITH&#160;&#160;&#160; CTE\n\t&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; AS ( SELECT&#160;&#160; [CustomerKey] ,\n\t&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; CONVERT(INT,[SalesAmount]) AS OrderValue ,\n\t&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; CONVERT(DATE, [OrderDate]) AS OrderDate ,\n\t&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; SalesOrderNumber ,\n\t&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; ProductKey\n\t&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; FROM&#160;&#160;&#160;&#160; [AdventureWorksDW2012].[dbo].[FactInternetSales]\n\t&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; WHERE&#160;&#160;&#160; [CurrencyKey] = 100\n\t&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; )\n\t&#160;&#160;&#160; SELECT&#160; *\n\t&#160;&#160;&#160; FROM&#160;&#160;&#160; CTE\n\t&#160;&#160;&#160; WHERE&#160;&#160; CTE.OrderDate BETWEEN '2007-01-01'\n\t&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; AND&#160;&#160;&#160;&#160; '2008-03-31';\n\t&#160;\n\tGO\n<\/pre>\n<p>The data looks  \tsimilar to this:<\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2242-img24.gif\" alt=\"2242-img24.gif\" \/><\/p>\n<p>We can load the  \tdata into R with the following code:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">\tlibrary(lubridate)\n\tlibrary(data.table)\n\tlibrary(RODBC)\n\tcn &lt;- odbcDriverConnect(connection=\"Driver={SQL Server};server=localhost;database=AdventureWorksDW2012;trusted_connection=yes;\")\n\traw_data &lt;- sqlFetch(cn, 'vRFM_Data', colnames=FALSE, rows_at_time=1000)\n\t&#160;\n<\/pre>\n<p>Then we have to  \tformat the <b>OrderDate<\/b>:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">\traw_data$OrderDate &lt;- as.Date(raw_data$OrderDate, format='%Y-%m-%d')<\/pre>\n<p>&#160;&#8230; and then we  \tneed a variable which represents the point of view we would like to have on  \tthe data:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">\tNOW &lt;- as.Date(\"2008-04-05\", \"%Y-%m-%d\")<\/pre>\n<div class=\"note\">\n<p class=\"note\">Note: The point  \tof view is important, since we can use it as a reference point and perform  \tseveral segmentations at different points of view and compare the results.  \tFor example, this can be very useful if we wanted to determine the delta of  \tcustomer behavior before and after a marketing promotion. In this case we  \twould have one segmentation with a point of view before the promotion and  \tone with a point of view well after the promotion and then the results can  \tbe compared and the customer scores can be outlined to determine which  \tcustomers have changed their behavior in a positive way (i.e. the campaign  \thas been successful) and which customers need more attention (additional  \ttargeting).<\/p>\n<\/div>\n<p>Then we calculate  \tthe Recency, the Frequency and the Monetary values for each customer:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">\tR_table &lt;- aggregate(OrderDate ~ CustomerKey, raw_data, FUN=max) R_table$R &lt;- as.numeric(NOW - R_table$OrderDate)\n\tF_table &lt;- aggregate(SalesOrderNumber ~ CustomerKey, raw_data, FUN=length) \n\tM_table &lt;- aggregate(OrderValue ~ CustomerKey, raw_data, FUN=sum)\n<\/pre>\n<p>Then we merge the  \tdatasets, remove the unnecessary column and rename the columns:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">\tRFM_table &lt;- merge(R_table,F_table,by.x=\"CustomerKey\", by.y=\"CustomerKey\") \n\tRFM_table &lt;- merge(RFM_table,M_table, by.x=\"CustomerKey\", by.y=\"CustomerKey\") \n\tRFM_table$OrderDate &lt;- NULL \n\tnames(RFM_table) &lt;- c(\"CustomerKey\", \"R\", \"F\", \"M\") \n<\/pre>\n<p>Finally, we need  \tto assign each customer to the proper segment by using quantiles:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">\tRFM_table$Rsegment &lt;- findInterval(RFM_table$R, quantile(RFM_table$R, c(0.0, 0.25, 0.50, 0.75, 1.0)))\n\tRFM_table$Fsegment &lt;- findInterval(RFM_table$F, quantile(RFM_table$F, c(0.0, 0.25, 0.50, 0.75, 1.0)))\n\tRFM_table$Msegment &lt;- findInterval(RFM_table$M, quantile(RFM_table$M, c(0.0, 0.25, 0.50, 0.75, 1.0)))\n\t&#160;\n<\/pre>\n<p>The result looks  \tlike this:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2242-img33.gif\" alt=\"2242-img33.gif\" \/><\/p>\n<p>By looking at the  \tresult we can see the customer behavior and how they differ: customer 11024,  \tfor example, has bought something 69 days ago, 6 items in total for the  \tcustomer lifetime and the total purchase amount is $56. This customer is  \trecent, even though not a big spender. Customer 11036, however, has bought  \tonly 2 items for a large sum of $2355, but very long ago: 254 days.  \t<\/p>\n<p>This suggests  \tthat the first user needs to be targeted in a way that they stay as a  \tcustomer and spend larger amounts (maybe some discount code or specific  \tpromotion might interest them) and the second user needs to be encouraged to  \tcome back and spend more. <\/p>\n<p>Even though this  \tis a simplified example of a RFM segmentation, it shows that it is very  \tpowerful tool in following up on customer behavior and&#160;channelling&#160;it in the  \tright way. Furthermore, the RFM segmentation can be used to represent the  \tcustomer base in a three dimensional space from a specific point of view and  \tto give an insight of how the customers behave in certain situations:  \tpromotions, holidays, peak and downtime. <\/p>\n<h1>Conclusions<\/h1>\n<p>In this article  \twe have seen the power of analytics and how easy it is to do massive  \tanalysis of customer behavior with the help of SQL Server as a storage  \tengine and R as an analytical processing engine. There is one thing I hinted  \tearlier, which is very important: The quicker you can answer business  \tquestions reliably once the data has been extracted, the more&#160; value  \tyou get out of data. You can get a  \tgreat competitive advantage by quickly iterating a solution with your business&#160;colleagues&#160; rather than by  \ttaking a long time developing a slick  \t&#160;data&#160;modelling&#160;solution that provides an answer to the&#160;wrong  \tquestion. By rapid iterations with a tool such as R, you can quickly make  \tsure that you are answering the question that the business really meant to  \task.&#160;  \t<\/p>\n<p>As an extra read  \ton the topic of Quick &#8216;trial and error&#8217; iterations I would recommend the  \tstory of Paul McCready and how he approached the problem of creating a  \tman-powered airplane when people had been failing for 10 years before him.  \tAlthough the problem seemed to be &#8216;how can a man power&#160; a flight&#8217;, the  \treal problem that blocked progress was that there was no process for quickly  \ttesting ideas out, and rejecting them. Once that problem was fixed, a man  \twas soon able to fly, by pedal power alone, across the English Channel.  \tSometimes, in IT, we are often too eager to take problems at face value.&#160;(<a href=\"http:\/\/www.azarask.in\/blog\/post\/the-wrong-problem\/\">You Are Solving The Wrong Problem<\/a>)<\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>It often pays to use a tool like R, in conjunction with a relational database, to quickly perform a range of analyses, and graphs, in order to ensure that you&#8217;re answering the right question, to explore alternative hypotheses, or to provide insight into a problem. Feodor demonstrates how to quickly, and interactively,  explore the ways that customers purchase goods and services using cohort analysis.&hellip;<\/p>\n","protected":false},"author":221902,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143528],"tags":[],"coauthors":[],"class_list":["post-2050","post","type-post","status-publish","format-standard","hentry","category-bi-sql-server"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/2050","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/221902"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=2050"}],"version-history":[{"count":4,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/2050\/revisions"}],"predecessor-version":[{"id":91008,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/2050\/revisions\/91008"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=2050"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=2050"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=2050"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=2050"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}