{"id":1988,"date":"2015-04-28T00:00:00","date_gmt":"2015-04-28T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/making-data-analytics-simpler-sql-server-and-r\/"},"modified":"2021-05-17T18:33:15","modified_gmt":"2021-05-17T18:33:15","slug":"making-data-analytics-simpler-sql-server-and-r","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/bi-sql-server\/making-data-analytics-simpler-sql-server-and-r\/","title":{"rendered":"Making Data Analytics Simpler: SQL Server and R"},"content":{"rendered":"<div class=\"article-content\">\n<p class=\"first\"> \tIn this article I will describe a way to couple SQL Server together with R, and show how we can get a good set of data mining possibilities out of this fusion. First I will introduce R as a statistical \/ analytical language, then I will show how to get data from and to SQL Server, and lastly I will give a simple example of a data analysis with R. <\/p>\n<h2> \tWhat is R and what noticeable features does it have <\/h2>\n<p> \tR is an open source software environment which is used for statistical data analysis. All operations are performed in memory, which means that it is very fast and flexible as long as there is enough memory available. <\/p>\n<p> \tR does not have a storage engine of its own other than the file system, however it uses libraries of drivers to get data from, and send data to, different databases. <\/p>\n<p> \tIt is very modular in that there are many libraries which can be downloaded and used for different purposes. Also, there is a rapidly growing community of developers and data scientists which contribute to the library development and to the methods for exploring data and getting value from it. <\/p>\n<p> \tAnother great feature is that it has built-in graphical capabilities. 
With R it takes a couple of lines of code to import data from a data source and only one line of code to display a plot of the data distribution. An example of this graphical representation will be given shortly. Of course, aside from the built-in graphics, there are libraries which are more advanced in data presentation (ggplot2, for example) and there are even libraries which enable interactive data exploration. <\/p>\n<p> \tFor more details on R features and on how to install it, refer to the <a href=\"https:\/\/www.simple-talk.com\/dotnet\/software-tools\/r-basics\/\">R Basics<\/a> article, which was recently published on Simple-talk. <\/p>\n<h2> \tConnecting to SQL Server from R <\/h2>\n<p> \tThis part assumes that the reader has already gained some familiarity with the R environment and has R and RStudio installed. <\/p>\n<p> \tAs mentioned, R does not have its own storage engine, but it relies on other systems to store the analyzed data. In this section we will go through some simple examples of how to couple R with SQL Server&#8217;s storage engine and thereby read data from, and write data to, SQL Server. <\/p>\n<p> \tThere are several options for connecting to SQL Server from R and several libraries we can use: RODBC, RJDBC and rsqlserver, for example. For the purpose of this article, however, we will just use the RODBC package. <\/p>\n<p> \tLet&#8217;s get busy and set up our R environment. <\/p>\n<p> \tIn order to get the connectivity to SQL Server working, first we need to install the packages for the connection method and then we need to load the libraries. 
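<\/p>\n<p> \tThese two steps can also be wrapped in a guard, so that the script installs the package only when it is missing and can be re-run safely (a common R idiom rather than anything RODBC-specific): <\/p>\n<pre class=\"listing language-r\"># Install RODBC only if it is not already available, then load it.\nif (!requireNamespace(\"RODBC\", quietly = TRUE)) {\n  install.packages(\"RODBC\")\n}\nlibrary(RODBC)\n<\/pre>\n<p>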
<\/p>\n<p> \tTo install and load the RODBC package, do the following: <\/p>\n<ul>\n<li>Open the RStudio console (make sure the R version is at least 3.1.3: if it isn&#8217;t, then use the updateR() function from the installr package)<\/li>\n<li> \tRun the following command: <strong>install.packages(\"RODBC\")<\/strong> <\/li>\n<li>Run the following command: <strong>library(RODBC)<\/strong><\/li>\n<\/ul>\n<p> \tNote: R packages are usually available from the CRAN site but, depending on the server setup, they may not be directly accessible from the R environment; in that case they may need to be downloaded and installed manually. Here is the link to the package page: RODBC: <a href=\"http:\/\/cran.r-project.org\/web\/packages\/RODBC\/index.html\">http:\/\/cran.r-project.org\/web\/packages\/RODBC\/index.html<\/a> <\/p>\n<h2> \tExploring the functions in a package <\/h2>\n<p> \tR provides useful ways of exploring the functions in its packages. If, for example, we wanted to list all the functions in a specific package, we could define a helper function similar to this: <\/p>\n<pre class=\"listing language-r\">lsp &lt;- function(package, all.names = FALSE, pattern) \n{\n  package &lt;- deparse(substitute(package))\n  ls(\n      pos = paste(\"package\", package, sep = \":\"), \n      all.names = all.names, \n      pattern = pattern\n  )\n}\n<\/pre>\n<p> \tAnd then we would call it like this: <\/p>\n<pre class=\"listing prettyprint language-r\">\t lsp(RODBC)\n<\/pre>\n<p> \tTyping <strong>??RODBC<\/strong> at the command prompt will bring up some help topics about the RODBC package. <\/p>\n<p> \tFurther, typing<strong> ? <\/strong>before any of the functions will bring up the help information about that function. 
For example, <\/p>\n<pre class=\"listing prettyprint language-r\">\t?sqlFetch\n<\/pre>\n<h2> \tGetting connected <\/h2>\n<p> \tFor the purpose of this exercise, we will be using the AdventureWorksDW database (it can be downloaded from <a href=\"http:\/\/msftdbprodsamples.codeplex.com\/releases\/view\/55330\">here<\/a>). <\/p>\n<p> \tLet&#8217;s say we are interested in calculating the correlation coefficient between the annual sales and the actual reseller sales for each reseller. First we will create the following view: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">CREATE VIEW [dbo].[vResellerSalesAmountEUR]\n    AS\n    SELECT  fact.ResellerKey ,\n            SUM(fact.SalesAmount) \/ 1000 AS SalesAmountK ,\n            dimR.AnnualSales \/ 1000 AS AnnualSalesK\n    FROM    [dbo].[FactResellerSales] fact\n            INNER JOIN dbo.[DimReseller] dimR ON fact.ResellerKey = dimR.ResellerKey\n    WHERE   fact.CurrencyKey = 36 -- Euro\n    GROUP BY fact.ResellerKey ,\n            dimR.AnnualSales<\/pre>\n<p> \tI have divided the Sales and the Annual Sales amounts by 1000, so it is easier to work with the numbers later on. We will go into the details of the statistical analysis later on; let&#8217;s start by getting connected. <\/p>\n<p> \tFirst we need to open a connection with our connection string and store the returned handle in a variable (assuming we have already loaded the library by running <strong>library(RODBC)<\/strong> ): <\/p>\n<pre class=\"listing language-r\">   cn &lt;- odbcDriverConnect(connection=\"Driver={SQL Server Native Client 11.0};server=localhost;database=AdventureWorksDW2012;trusted_connection=yes;\")\n<\/pre>\n<h2>RODBC connection<\/h2>\n<p> \tAnd now we can use this variable as a parameter in the different calls to our database. 
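<\/p>\n<p> \tBefore fetching any data, a quick sanity check on the handle is to list what the connection can see (note that odbcDriverConnect returns -1 on failure rather than a valid handle). The sketch below assumes the connection above succeeded: <\/p>\n<pre class=\"listing language-r\"># List the tables and views in the dbo schema to confirm the connection works.\ntables &lt;- sqlTables(cn, schema = \"dbo\")\nhead(tables$TABLE_NAME)\n<\/pre>\n<p>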
In the RODBC package there are two main ways to read data from SQL Server: the <strong><em>sqlFetch<\/em><\/strong> and <strong><em>sqlQuery<\/em><\/strong> functions. <\/p>\n<p> \tHere is how we use sqlFetch: <\/p>\n<pre class=\"listing language-r\">\tdataFetchEUR &lt;- sqlFetch(cn, 'vResellerSalesAmountEUR', colnames=FALSE,\n   rows_at_time=1000)\n<\/pre>\n<p> \tand then we check the contents of the data frame with the following command: <\/p>\n<pre class=\"listing language-r\">\tView(dataFetchEUR)\n<\/pre>\n<p> \tThe data looks like this: <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2180-1-ba575827-c5c9-41ca-82d9-00373e3323a7.png\" alt=\"2180-1-ba575827-c5c9-41ca-82d9-00373e332\" \/><\/p>\n<p> \tAnother way to get the data is to use sqlQuery like this: <\/p>\n<pre class=\"listing language-r\">\tdataSQLQueryEUR &lt;- sqlQuery(cn, \"select * from vResellerSalesAmountEUR\")<\/pre>\n<p> \tAs you may have guessed, this is quite flexible if we want to get a subset of the data by using a <strong>WHERE<\/strong> clause. <\/p>\n<h2> \tBenchmarking of RODBC <\/h2>\n<p> \tTo benchmark the performance of the RODBC library, I have written a script which reads data from SQL Server into R. 
<\/p>\n<p> \tHere is the script which will be used: <\/p>\n<pre class=\"listing language-r\">library(RODBC)\nstartTime1 &lt;- Sys.time()\ncn &lt;- odbcDriverConnect(connection=\"Driver={SQL Server Native Client 11.0};server=localhost;database=TestData;trusted_connection=yes;\")\nreadData &lt;- 0\nreadData &lt;- sqlFetch(cn,'NarrowTable_10k')\n#readData &lt;- sqlFetch(cn,'NarrowTable_100k')\n#readData &lt;- sqlFetch(cn,'NarrowTable_1M')\n#readData &lt;- sqlFetch(cn,'NarrowTable_10M')\n#readData &lt;- sqlFetch(cn,'NarrowTable_30M')\n#readData &lt;- sqlFetch(cn,'WideTable_10k')\n#readData &lt;- sqlFetch(cn,'WideTable_100k')\n#readData &lt;- sqlFetch(cn,'WideTable_1M')\n#readData &lt;- sqlFetch(cn,'WideTable_10M')\n#readData &lt;- sqlFetch(cn,'WideTable_30M')\nendTime1 &lt;- Sys.time()\nodbcClose(cn)\ntimeRun &lt;- difftime(endTime1,startTime1,units=\"secs\")\nprint(timeRun)\n<\/pre>\n<p> \tThe script above creates a connection to a database called TestData. The database has two types of tables &#8211; a narrow table with 6 columns and a wide table with 31 columns. 
<\/p>\n<pre class=\"lang:tsql theme:ssms2012\">\tCREATE TABLE [dbo].[NarrowTable_X](\n\t[ID] [int] NOT NULL,\n\t[Number1] [int] NOT NULL,\n\t[Number2] [int] NOT NULL,\n\t[Number3] [int] NOT NULL,\n\t[Number4] [int] NOT NULL,\n\t[Number5] [int] NOT NULL,\nPRIMARY KEY CLUSTERED \n(\n\t[ID] ASC\n))\n<\/pre>\n<p> \tand <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">\tCREATE TABLE [dbo].[WideTable_X](\n\t[ID] [int] NOT NULL,\n\t[Number1] [int] NOT NULL,\n\t[Number2] [int] NOT NULL,\n\t[Number3] [int] NOT NULL,\n\t[Number4] [int] NOT NULL,\n\t[Number5] [int] NOT NULL,\n\t[Number6] [int] NOT NULL,\n\t[Number7] [int] NOT NULL,\n\t[Number8] [int] NOT NULL,\n\t[Number9] [int] NOT NULL,\n\t[Number10] [int] NOT NULL,\n\t[Number11] [int] NOT NULL,\n\t[Number12] [int] NOT NULL,\n\t[Number13] [int] NOT NULL,\n\t[Number14] [int] NOT NULL,\n\t[Number15] [int] NOT NULL,\n\t[Number16] [int] NOT NULL,\n\t[Number17] [int] NOT NULL,\n\t[Number18] [int] NOT NULL,\n\t[Number19] [int] NOT NULL,\n\t[Number20] [int] NOT NULL,\n\t[Number21] [int] NOT NULL,\n\t[Number22] [int] NOT NULL,\n\t[Number23] [int] NOT NULL,\n\t[Number24] [int] NOT NULL,\n\t[Number25] [int] NOT NULL,\n\t[Number26] [int] NOT NULL,\n\t[Number27] [int] NOT NULL,\n\t[Number28] [int] NOT NULL,\n\t[Number29] [int] NOT NULL,\n\t[Number30] [int] NOT NULL,\nPRIMARY KEY CLUSTERED \n(\n\t[ID] ASC\n))\n<\/pre>\n<p> \tThe X in the table name represents the amount of data in each table. 
All tables in the database are as follows: <\/p>\n<pre class=\"listing language-r\">NarrowTable_10k\t       10k rows\nNarrowTable_100k      100k rows\nNarrowTable_1M\t       1 million rows\nNarrowTable_10M\t       10 million rows\nNarrowTable_30M\t       30 million rows\nWideTable_10k\t       10k rows\nWideTable_100k\t       100k rows\nWideTable_1M\t       1 million rows\nWideTable_10M\t       10 million rows\nWideTable_30M\t       30 million rows\n<\/pre>\n<p> \tThe tables are populated with Redgate&#8217;s SQL Data Generator, and then the R script above is used to get all data from the respective table and measure the time it took in seconds. <\/p>\n<p> \tHere is how long it took (in seconds) to read the 10k, 100k, 1M, 10M and 30M rows: <\/p>\n<pre class=\"listing language-r\">                    RODBC\t\nNarrowTable_10k\t    0.05099988\t \nNarrowTable_100k    0.1120012\t \nNarrowTable_1M\t    0.6030009\t  \nNarrowTable_10M\t    6.29302\t \nNarrowTable_30M\t    17.48804\t \nWideTable_10k       0.07200003\t \nWideTable_100k\t    0.322001\t \nWideTable_1M\t    2.764018\t \nWideTable_10M\t    28.01206\t \nWideTable_30M\t    80.84013\t\n<\/pre>\n<h2> \tExploring the data <\/h2>\n<p> \tNow that we have loaded the data into memory, it is time to explore it. First, let&#8217;s see the density and the data distribution. R has great facilities for visualizing data almost effortlessly. As we will see shortly, we can produce a plot with just a couple of commands. <\/p>\n<p> \tFirst, let&#8217;s set the properties of our environment to display two plots in one row. In R this is easily done by using the <strong>par()<\/strong> function like this: <\/p>\n<pre class=\"listing language-r\">\tpar(mfrow=c(1,2))\n<\/pre>\n<p> \tThis will create a plot matrix with two columns and one row, which in this case will be filled in by rows. 
Now, let&#8217;s run the two commands which will fill in the graphs in the plot matrix: <\/p>\n<pre class=\"listing language-r\">\thist(dataFetchEUR$SalesAmountK)\n\thist(dataFetchEUR$AnnualSalesK)\n<\/pre>\n<p> \tThis will give us the following graph (found in the Plots section in the RStudio environment): <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2180-55c5835e-701d-4cb0-9b4a-1c2d90a51c39.png\" alt=\"2180-55c5835e-701d-4cb0-9b4a-1c2d90a51c3\" \/><\/p>\n<p> \tFrom here we can already extract some valuable knowledge &#8211; and we did this with just a couple of commands! We can see that: <\/p>\n<ul>\n<li>there are generally two segments of resellers by AnnualSales &#8211; ones that peak at around 100K, and others that are above 2.5M<\/li>\n<li>there are three different segments of resellers by SalesAmount &#8211; under 50k, around 150k and around 300k<\/li>\n<\/ul>\n<p> \tRemember that this was the data we loaded for the sales in Euros. Let&#8217;s load the data for USD sales, and compare the histograms. 
<\/p>\n<p> \tWe will be using the following view in our database: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">CREATE VIEW [dbo].[vResellerSalesAmountUSD]\nAS\n    SELECT  fact.ResellerKey ,\n            SUM(fact.SalesAmount) \/ 1000 AS SalesAmountK ,\n            dimR.AnnualSales \/ 1000 AS AnnualSalesK\n    FROM    [dbo].[FactResellerSales] fact\n            INNER JOIN dbo.[DimReseller] dimR ON fact.ResellerKey = dimR.ResellerKey\n    WHERE   fact.CurrencyKey = 100 --USD\n    GROUP BY fact.ResellerKey ,\n            dimR.AnnualSales\n<\/pre>\n<p> \tAnd then we will use similar commands to load the data in R: <\/p>\n<pre class=\"listing language-r\">\tdataFetchUSD &lt;- sqlFetch(cn, 'vResellerSalesAmountUSD', colnames=FALSE, \n   rows_at_time=1000)<\/pre>\n<p> \tAnd then we get the histograms like this: <\/p>\n<pre class=\"listing language-r\">\t hist(dataFetchUSD$AnnualSalesK)\n\t hist(dataFetchUSD$SalesAmountK)\n<\/pre>\n<p> \tThe histograms look like this: <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2180-b406d805-fba6-44b3-8cf0-3179391ccd23.png\" alt=\"2180-b406d805-fba6-44b3-8cf0-3179391ccd2\" \/><\/p>\n<p> \tThis gives us some more insight into how the resellers perform in the USD market. <\/p>\n<p> \tLet&#8217;s go a bit further and get a summary of our datasets. Very conveniently, there is a built-in function in R which does exactly that: it summarizes our dataset (called a data frame in R). If we simply type: <\/p>\n<pre class=\"listing language-r\">\t summary(dataFetchEUR)<\/pre>\n<p> \tWe will get the following summary info: <\/p>\n<pre class=\"listing language-r\">  ResellerKey     SalesAmountK       AnnualSalesK \n Min.   : 14.0   Min.   :  0.0315   Min.   
: 300  \n 1st Qu.:211.2   1st Qu.:  3.4665   1st Qu.: 800  \n Median :365.0   Median : 12.5177   Median :1500  \n Mean   :373.3   Mean   : 65.1193   Mean   :1718  \n 3rd Qu.:567.5   3rd Qu.:117.1346   3rd Qu.:3000  \n Max.   :687.0   Max.   :337.7567   Max.   :3000\n<\/pre>\n<p> \tAnd respectively for the USD resellers we will type <\/p>\n<pre class=\"listing language-r\">\t summary(dataFetchUSD)\n<\/pre>\n<p> \tAnd we will get the summary: <\/p>\n<pre class=\"listing language-r\">  ResellerKey     SalesAmountK       AnnualSalesK \n Min.   :  1.0   Min.   :  0.0013   Min.   : 300  \n 1st Qu.:169.0   1st Qu.:  8.8864   1st Qu.: 800  \n Median :345.0   Median : 54.8905   Median :1500  \n Mean   :348.5   Mean   :137.1135   Mean   :1593  \n 3rd Qu.:529.0   3rd Qu.:187.1142   3rd Qu.:3000  \n Max.   :700.0   Max.   :877.1071   Max.   :3000 \n<\/pre>\n<p> \tNow let&#8217;s do some clustering to dig a bit deeper into our data. <\/p>\n<h2> \tCluster Analysis <\/h2>\n<p> \tThere are countless ways to do cluster analysis, and R provides many libraries for exactly that. There is no single best solution; it all depends on the purpose of the clustering. <\/p>\n<p> \tLet&#8217;s suppose that we want to help our resellers do better in the Euro region, and we have decided to provide them with different marketing tools, based on their Annual Sales amount. So for this example, the marketing department needs to have the resellers grouped into three groups based on their <strong>AnnualSales<\/strong>, with the <strong>SalesAmount<\/strong> displayed for each reseller. <\/p>\n<p> \tIn the example below, we will take the <strong>dataFetchEUR<\/strong> dataframe and divide the resellers into three groups, based on their <strong>AnnualSales<\/strong>. Then we will write the data back into our SQL Server Data Warehouse, from where the marketing team will get a report. 
<\/p>\n<p> \tBy looking at the data summary for the <strong>AnnualSalesK<\/strong> column, we can decide to slice the data by the boundaries of 1,000K and 1,600K. These are imaginary boundaries, which in reality will be set by the data scientist after discussions with the team who will be using the data analysis. <\/p>\n<p> \tJust to verify, we can count the resellers per their <strong>AnnualSalesK<\/strong> value: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">SELECT  COUNT(*) AS CountResellers ,\n        AnnualSalesK\nFROM    [vResellerSalesAmountEUR]\nGROUP BY AnnualSalesK\nORDER BY AnnualSalesK DESC\n<\/pre>\n<p> \tWe get the following result: <\/p>\n<div>\n<table>\n<tbody>\n<tr>\n<td>CountResellers                 <\/td>\n<td> AnnualSalesK                 <\/td>\n<\/tr>\n<tr>\n<td>13                                      <\/td>\n<td>3000.00                 <\/td>\n<\/tr>\n<tr>\n<td>10                 <\/td>\n<td>1500.00                 <\/td>\n<\/tr>\n<tr>\n<td>4                 <\/td>\n<td>1000.00                 <\/td>\n<\/tr>\n<tr>\n<td>8                 <\/td>\n<td>800.00                 <\/td>\n<\/tr>\n<tr>\n<td>3                 <\/td>\n<td>300.00                 <\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p> \tSo, this seems right. Our three segments will be: <\/p>\n<p> \tSegment 1 will be &lt; 1,000,000<br \/>     Segment 2 will be &gt;= 1,000,000 and &lt; 1,600,000<br \/> \tSegment 3 will be &gt;= 1,600,000 <\/p>\n<p> \tFor this we can use a simple<strong> ifelse<\/strong> function to define the clusters: <\/p>\n<pre class=\"listing language-r\">AnnualSalesHigh.cluster &lt;- ifelse(dataFetchEUR$AnnualSalesK &gt;= 1600, 3, 0)\nAnnualSalesMedium.cluster &lt;- ifelse(dataFetchEUR$AnnualSalesK &gt;= 1000 &amp; dataFetchEUR$AnnualSalesK &lt; 1600, 2, 0)\nAnnualSalesLow.cluster &lt;- ifelse(dataFetchEUR$AnnualSalesK &lt; 1000, 1, 0)\n<\/pre>\n<p> \tNow we have three different vectors, and we need to combine them into one dataset (dataframe). 
We can use the <strong>cbind<\/strong> function like this: <\/p>\n<pre class=\"listing language-r\">\t  segmentCombined &lt;- cbind(AnnualSalesLow.cluster,\n      AnnualSalesMedium.cluster, AnnualSalesHigh.cluster)\n<\/pre>\n<p> \tNow we can look at the segments with the View function: <\/p>\n<pre class=\"listing language-r\">\t  View(segmentCombined)\n<\/pre>\n<p> \tThe data looks like this: <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2180-1-809cb6bb-956b-4129-813c-1a9d3c2010e6.png\" alt=\"2180-1-809cb6bb-956b-4129-813c-1a9d3c201\" \/><\/p>\n<p> \tNow we need to sum up the values in each row to produce a single column of cluster keys. We can do this with the following expression: <\/p>\n<pre class=\"listing language-r\">\t AnnualSalesKey &lt;- AnnualSalesHigh.cluster + AnnualSalesMedium.cluster\n + AnnualSalesLow.cluster\n<\/pre>\n<p> \tAnd finally, we need to bind together the original dataframe and the categorization per reseller: <\/p>\n<pre class=\"listing language-r\">\t  dataFetchEUR &lt;- cbind(dataFetchEUR,AnnualSalesKey)\n<\/pre>\n<p> \tThe data looks like this: <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2180-1-a94eac08-8c8c-48e8-ad64-b87cc40296fb.png\" alt=\"2180-1-a94eac08-8c8c-48e8-ad64-b87cc4029\" \/><\/p>\n<p> \tAnd finally, we will save this dataset into our data warehouse by using the <strong>sqlSave<\/strong> function in the RODBC package: <\/p>\n<pre class=\"listing language-r\">   sqlSave(cn, dataFetchEUR, rownames=FALSE, colnames=FALSE,\n       tablename=\"Marketing_EURAnnualSalesCluster\", append=FALSE)\n   odbcClose(cn)\n<\/pre>\n<p> \tAnd now we have a table called <code>[dbo]<\/code>.<code>[Marketing_EURAnnualSalesCluster]<\/code> in our <code>[AdventureWorksDW2012]<\/code> database, which is ready for use by the marketing department. 
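<\/p>\n<p> \tBefore leaving the data, recall that the vResellerSalesAmountEUR view was created with the correlation between annual and actual sales in mind. With the data frame still in memory, Pearson&#8217;s correlation coefficient is a single call to the built-in cor function: <\/p>\n<pre class=\"listing language-r\"># Correlation between actual sales and declared annual sales per reseller,\n# using the two columns fetched from the vResellerSalesAmountEUR view.\ncor(dataFetchEUR$SalesAmountK, dataFetchEUR$AnnualSalesK)\n<\/pre>\n<p>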
<\/p>\n<p> \tThis process can be automated: the R scripts &#8211; the entire flow from getting the data to writing it back to the data warehouse &#8211; can be packaged into a batch file and scheduled to run regularly. <\/p>\n<h2> \tConclusion <\/h2>\n<p> \tIn this article we have seen how easy it is to connect to SQL Server from R and do an exploratory data analysis. This brings great value to the business, especially because of the time saved on data modelling and visualization, which can be very time-consuming with other technologies. <\/p>\n<p> \tWe have explored a way to connect to SQL Server (by using the RODBC library) and we have created a simple cluster analysis and segmentation, which provides immediate value to the end-users of the data. <\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>R and SQL Server are a match made in heaven. You don&#8217;t need anything special to get started beyond the basic instructions. Once you have jumped the hurdle of reliably and quickly transferring data between R and SQL Server you are ready to discover the power of a relational database when combined with statistical computing and 
graphics.&hellip;<\/p>\n","protected":false},"author":221902,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143528],"tags":[4178,6012,6013,4168,5982,5460,4379,4150,4151,5296],"coauthors":[],"class_list":["post-1988","post","type-post","status-publish","format-standard","hentry","category-bi-sql-server","tag-bi","tag-data-modeling","tag-data-plotting","tag-database","tag-graphs","tag-r","tag-reporting-services","tag-sql","tag-sql-server","tag-statistics"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1988","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/221902"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=1988"}],"version-history":[{"count":4,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1988\/revisions"}],"predecessor-version":[{"id":91010,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1988\/revisions\/91010"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=1988"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=1988"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=1988"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=1988"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}