{"id":75456,"date":"2017-11-03T10:03:46","date_gmt":"2017-11-03T10:03:46","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=75456"},"modified":"2021-09-15T13:22:17","modified_gmt":"2021-09-15T13:22:17","slug":"sql-server-r-services-working-multiple-data-sets","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/bi-sql-server\/sql-server-r-services-working-multiple-data-sets\/","title":{"rendered":"SQL Server R Services: Working with Multiple Data Sets"},"content":{"rendered":"<h4>The series so far:<\/h4>\n<ol>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-r-services-basics\/\">SQL Server R Services: The Basics<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-r-services-digging-r-language\/\">SQL Server R Services: Digging into the R Language<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-r-services-working-ggplot2-statistical-graphics\/\">SQL Server R Services: Working with ggplot2 Statistical Graphics<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-r-services-working-data-frames\/\">SQL Server R Services: Working with Data Frames<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-r-services-generating-sparklines-types-spark-graphs\/\">SQL Server R Services: Generating Sparklines and Other Types of Spark Graphs<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-r-services-working-multiple-data-sets\/\">SQL Server R Services: Working with Multiple Data Sets<\/a><\/li>\n<\/ol>\n\n<p>Throughout this series, we\u2019ve looked at several examples of how to use SQL Server R Services to create R scripts that incorporate SQL Server data. The key to using SQL Server data is to pass in a T-SQL query as an argument when calling the <strong>sp_execute_external_script<\/strong> stored procedure. The R script can then incorporate the data returned by the query, while taking advantage of the many elements available to the R language for analyzing data.<\/p>\n<p>Despite the ease with which you can run an R script, the <strong>sp_execute_external_script<\/strong> stored procedure has an important limitation. You can specify only one T-SQL query when calling the procedure. Of course, you can create a query that joins multiple tables, but this approach might not always work in your circumstances or might not be appropriate for the analytics you\u2019re trying to perform. Fortunately, you can retrieve additional data directly within the R script.<\/p>\n<p>In this article, we look at how to import data from a SQL Server table and from a .csv file. We also cover how to save data to a .csv file as well as insert that data into a SQL Server table. Being able to incorporate additional data sets or save data in different formats provides us with a great deal of flexibility when working with R Services and allows us to take even greater advantage of the many elements available to the R language for data analytics.<\/p>\n<h2>Importing data from SQL Server<\/h2>\n<p>When you call the <strong>sp_execute_external_script<\/strong> stored procedure, you can pass in a single T-SQL query to the <strong>@input_data_1<\/strong> parameter. This allows you to incorporate the data returned by the query directly within your R script. In some cases, however, you might want to bring in additional data, either from SQL Server or from another source.<\/p>\n<p>For example, the following T-SQL code includes an R script that retrieves data separately from the <strong>SalesOrderHeader<\/strong> and <strong>SalesTerritory<\/strong> tables in the <strong>AdventureWorks2014<\/strong> database:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">USE AdventureWorks2014;\r\n  G\r\n  DECLARE @rscript NVARCHAR(MAX);\r\n  SET @rscript = N'\r\n  \u00a0 # retrieve sales data from InputDataSet\r\n  \u00a0 sales &lt;- InputDataSet\r\n  \u00a0 # retrieve territory data from SalesTerritory table\r\n  \u00a0 conn &lt;- \"Driver=SQL Server;\r\n  \u00a0 \u00a0 Server=localhost\\\\sqlsrv16b;\r\n  \u00a0 \u00a0 Database=AdventureWorks2014;\r\n  \u00a0 \u00a0 Uid=user1;\r\n  \u00a0 \u00a0 Pwd=password1\"\r\n  \u00a0 query &lt;- \"SELECT TerritoryID, Name FROM Sales.SalesTerritory;\"\r\n  \u00a0 territories &lt;- RxSqlServerData(connectionString = conn, sqlQuery = query)\r\n  \u00a0 territories &lt;- rxDataStep(territories)\r\n  \u00a0 # merge data sets based on territory IDs\r\n  \u00a0 df1 &lt;- merge(territories, sales, by = \"TerritoryID\")\r\n  \u00a0 names(df1) &lt;- c(\"TerritoryID\", \"Territory\", \"Sales\")\r\n  \u00a0 # build data frame with aggregated data\r\n  \u00a0 c1 &lt;- levels(as.factor(df1$Territory))\r\n  \u00a0 c2 &lt;- round(tapply(df1$Sales, df1$Territory, mean))\r\n  \u00a0 df2 &lt;- data.frame(c1, c2)\r\n  \u00a0 OutputDataSet &lt;- df2';\r\n  DECLARE @sqlscript NVARCHAR(MAX);\r\n  SET @sqlscript = N'\r\n  \u00a0 SELECT TerritoryID, Subtotal FROM Sales.SalesOrderHeader;';\r\n  EXEC sp_execute_external_script\r\n  \u00a0 @language = N'R',\r\n  \u00a0 @script = @rscript,\r\n  \u00a0 @input_data_1 = @sqlscript;\u00a0\r\n  GO<\/pre>\n<p>If you reviewed the previous articles in the series, you should be familiar with how to call the <strong>sp_execute_external_script<\/strong> stored procedure and run an R script. In this article, we focus primarily on the components within the R script. If you\u2019re uncertain about the other language elements, be sure to refer to the previous articles.<\/p>\n<p>The first element in the R script assigns the data retrieved from the <strong>SalesOrderHeader<\/strong> table to the <strong>sales<\/strong> variable, using the <strong>InputDataSet<\/strong> variable to access the SQL Server data:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">sales &lt;- InputDataSet<\/pre>\n<p>This gives us our first data set, which includes the sale amounts associated with the individual territory IDs. Now suppose we want to use territory names rather than IDs. We can do this in our original T-SQL query by joining the two tables, or we can call the tables individually. In this case, the tables are in the same database, so creating one query would be the most expedient approach, but data is not always accessible through a join, or sometimes an R script requires a different approach to working the data, such as needing to join data after aggregating one of the data sets.<\/p>\n<p>For this example, our primary purpose is to demonstrate how to bring in a second data set from SQL Server, so we\u2019ll retrieve the territory data separately. To do so, we must first define a connection to the SQL Server instance that contains the data:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">conn &lt;- \"Driver=SQL Server;\r\n  \u00a0 Server=localhost\\\\sqlsrv16b;\r\n  \u00a0 Database=AdventureWorks2014;\r\n  \u00a0 Uid=user1;\r\n  \u00a0 Pwd=password1\"<\/pre>\n<p>To define the connection, we create a single string that specifies the driver, instance, database, user ID, and password, separated by semi-colons. Note that you should not put spaces around the equal signs within the connection definition. If you do, you will receive an error.<\/p>\n<p>You can also define a connection that uses Windows Authentication, rather than having to provide a user name and password. If you take this approach, your connection definition would look similar to the following:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">conn &lt;- \"Driver=SQL Server;\r\n    Server=localhost\\\\sqlsrv16b;\r\n    Database=AdventureWorks2014;\r\n    Trusted_Connection=True\"<\/pre>\n<p>To use Windows Authentication, you might have to set up implied authentication on the target SQL Server instance, as described in the Microsoft article <a href=\"https:\/\/docs.microsoft.com\/en-us\/sql\/advanced-analytics\/r\/modify-the-user-account-pool-for-sql-server-r-services\">Modify the User Account Pool for SQL Server R Services<\/a>.<\/p>\n<p>After we set up the connection, we can define a <strong>SELECT<\/strong> statement to retrieve the data we need from the <strong>SalesTerritory<\/strong> table:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">query &lt;- \"SELECT TerritoryID, Name FROM Sales.SalesTerritory;\"<\/pre>\n<p>With our connection and query in place, we can use the <strong>RxSqlServerData<\/strong> function to generate a SQL Server data source object:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">territories &lt;- RxSqlServerData(connectionString = conn, sqlQuery = query)<\/pre>\n<p>The <strong>RxSqlServerData<\/strong> function is part of the <strong>RevoScaleR<\/strong> library included with R Services. The library includes a collection of functions for importing, transforming, and analyzing data at scale. A subset of those functions, including <strong>RxSqlServerData<\/strong>, is specific to SQL Server.<\/p>\n<p>Using the <strong>RxSqlServerData<\/strong> function in this way gives us our second data set, which we assign it to the <strong>territories<\/strong> variable.<\/p>\n<p>Another SQL Server function in the <strong>RevoScaleR<\/strong> library is <strong>rxDataStep<\/strong>. We can use the function in an R script to transform the data returned by the <strong>RxSqlServerData<\/strong> function into a more workable format. Our next step, then, is to run the function against the <strong>territories<\/strong> data set, assigning the results back to the same variable:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">territories &lt;- rxDataStep(territories)<\/pre>\n<p>Not surprisingly, there\u2019s a lot more to the <strong>RevoScaleR<\/strong> library and SQL Server functions than what we\u2019ve covered here, so be sure to check out the rest when you have time. A good place to start is with the Microsoft topic <a href=\"https:\/\/docs.microsoft.com\/en-us\/machine-learning-server\/r-reference\/revoscaler\/revoscaler\">RevoScaleR package<\/a>.<\/p>\n<p>With our two data sets in place, we can use the <strong>merge<\/strong> function to create a single data frame, joining the data sets based on the <strong>TerritoryID<\/strong> column in each one:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">df1 &lt;- merge(territories, sales, by = \"TerritoryID\")<\/pre>\n<p>We can then use the <strong>names<\/strong> function to assign names to the data frame columns, making it easier to work with the data frame later in the script:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">names(df1) &lt;- c(\"TerritoryID\", \"Territory\", \"Sales\")<\/pre>\n<p>We now have a single data frame, <strong>df1<\/strong>, that contains the data we need to move forward with our (simple) analytics. The next step is to create a second data frame that aggregates the data based on the sales amounts. The goal is to come up with the mean sales per territory. To do this, we construct the new data frame one column at a time, starting with a column that lists the individual territory names:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">c1 &lt;- levels(as.factor(df1$Territory))<\/pre>\n<p>We begin by using the <strong>as.factor<\/strong> function to convert the values in the <strong>Territory<\/strong> column to a factor, which provides a way for us to work with categorical data. We can then use the <strong>levels<\/strong> function to retrieve the distinct territory names, assigning them to the <strong>c1<\/strong> variable.<\/p>\n<p>The next step is to define the column that will contain the aggregated sales data:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">c2 &lt;- round(tapply(df1$Sales, df1$Territory, mean))<\/pre>\n<p>We start by using the <strong>tapply<\/strong> function to aggregate the sales totals for each category. The first argument, <strong>df1$Sales<\/strong>, indicates that the <strong>Sales<\/strong> values are the ones to be aggregated. The second argument, <strong>df1$Territory<\/strong>, provides the basis for the aggregation, in this case, the territories. The territory names also serve as an index for the <strong>c2<\/strong> column. The third argument, <strong>mean<\/strong>, indicates that the mean sales value should be calculated for each territory. We then use the <strong>round<\/strong> function to round the aggregated totals to integers.<\/p>\n<p>With our two column definitions in place, we can use the <strong>data.frame<\/strong> function to create a data frame and assign it to the <strong>df2<\/strong> variable:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">df2 &lt;- data.frame(c1, c2)<\/pre>\n<p>At this point, we can take any number of other steps as part of our analytics, but we\u2019ll stop here. The only remaining task is to assign the <strong>df2<\/strong> data frame to the <strong>OutputDataSet<\/strong> variable:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">OutputDataSet &lt;- df2<\/pre>\n<p>Assigning the data frame to the <strong>OutputDataSet<\/strong> variable makes it possible for the R script to return the data to the calling application (in this case, SQL Server Management Studio). The following figure shows the results returned by the script when we run the <strong>sp_execute_external_script<\/strong> stored procedure.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"297\" height=\"251\" class=\"wp-image-75823\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/10\/20figures6-stsqlsrvrsvcs6_fig01-png-1.png\" alt=\"%20figures6\/stSqlSrvRSvcs6_fig01.png\" \/><\/p>\n<p>That\u2019s all there is to incorporating additional SQL Server data into an R script, when a single T-SQL query not be enough. Although this is only a very basic example, it points to the potential for creating far more complex analytics that can include whatever data is necessary, as long as the target SQL Server instance is accessible from the instance running the script.<\/p>\n<h2>Adding conditional logic to an R script<\/h2>\n<p>Before we move on to importing data from a .csv file, I want to demonstrate another SQL Server function in the <strong>RevoScaleR<\/strong> library: <strong>rxSqlServerTableExists<\/strong>. The function checks whether a specified SQL Server table exists, information we can use in our R script to implement conditional logic. For example, the following script uses the function to verify whether the <strong>SalesTerritory<\/strong> table exists before trying to retrieve data from that table:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">DECLARE @rscript NVARCHAR(MAX);\r\n  SET @rscript = N'\r\n  \u00a0 # retrieve sales data from InputDataSet\r\n  \u00a0 sales &lt;- InputDataSet\r\n  \u00a0 # define connection to SQL Server\r\n  \u00a0 conn &lt;- \"Driver=SQL Server;\r\n  \u00a0 \u00a0 Server=localhost\\\\sqlsrv16b;\r\n  \u00a0 \u00a0 Database=AdventureWorks2014;\r\n  \u00a0 \u00a0 Uid=user1;\r\n  \u00a0 \u00a0 Pwd=password1\"\r\n  \u00a0 # define conditional logic\r\n  \u00a0 table = \"Sales.SalesTerritory\"\r\n  \u00a0 if (rxSqlServerTableExists(table, connectionString = conn)) {\r\n  \u00a0 \u00a0 # retrieve territory data from SalesTerritory table\r\n  \u00a0 \u00a0 query &lt;- \"SELECT TerritoryID, Name FROM Sales.SalesTerritory;\"\r\n  \u00a0 \u00a0 territories &lt;- RxSqlServerData(connectionString = conn, sqlQuery = query)\r\n  \u00a0 \u00a0 territories &lt;- rxDataStep(territories)\r\n  \u00a0 \u00a0 # merge data sets based on territory IDs\r\n  \u00a0 \u00a0 df1 &lt;- merge(territories, sales, by = \"TerritoryID\")\r\n  \u00a0 \u00a0 names(df1) &lt;- c(\"TerritoryID\", \"Territory\", \"Sales\")\r\n  \u00a0 \u00a0 # build data frame with aggregated data\r\n  \u00a0 \u00a0 c1 &lt;- levels(as.factor(df1$Territory))\r\n  \u00a0 \u00a0 c2 &lt;- round(tapply(df1$Sales, df1$Territory, mean))\r\n  \u00a0 \u00a0 df2 &lt;- data.frame(c1, c2)\r\n  \u00a0 } else {\r\n  \u00a0 \u00a0 # build data frame with aggregated data\r\n  \u00a0 \u00a0 c1 &lt;- levels(as.factor(sales$TerritoryID))\r\n  \u00a0 \u00a0 c2 &lt;- round(tapply(sales$Subtotal, sales$TerritoryID, mean))\r\n  \u00a0 \u00a0 df2 &lt;- data.frame(c1, c2)\r\n  \u00a0 }\r\n  \u00a0 OutputDataSet &lt;- df2';<\/pre>\n<p>Most of the script elements should look familiar, but notice that after setting up the connection, we create an <strong>if\u2026else<\/strong> structure to define the conditional logic, starting with the following statements:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">table = \"SalesTerritory\"\r\n  if (rxSqlServerTableExists(table, connectionString = conn))<\/pre>\n<p>We begin by identifying the target table and assigning it to the <strong>table<\/strong> variable. We then use the <strong>rxSqlServerTableExists<\/strong> function to check if the table exists. If it does, the function returns <strong>TRUE<\/strong>, and the <strong>if<\/strong> block runs, creating the <strong>df2<\/strong> data frame, just like in the preceding example.<\/p>\n<p>If the function returns <strong>FALSE<\/strong>, the <strong>else<\/strong> block runs instead, creating the data frame based on territory IDs, rather than names.<\/p>\n<p>In this case, we\u2019re using the <strong>rxSqlServerTableExists<\/strong> function to implement conditional logic; however, you might also find it useful for verifying a table\u2019s existence before trying to delete it or performing other operations in the R script.<\/p>\n<p>However, even if you don\u2019t use the function, the example points to something more important: the ability to implement conditional logic within R scripts, making it possible to work with SQL Server data in ways that can get quite difficult with T-SQL alone, especially when trying to perform sophisticated analytics.<\/p>\n<h2>Importing data from a .csv file<\/h2>\n<p>Rather than bringing additional data in from SQL Server, we might want to create an R script that uses data from another source, in addition to the data provided by the primary T-SQL query. For example, suppose the territory names we retrieved in the preceding examples are in a .csv file, as shown in the following figure.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"281\" height=\"254\" class=\"wp-image-75824\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/10\/20figures6-stsqlsrvrsvcs6_fig02-png-1.png\" alt=\"%20figures6\/stSqlSrvRSvcs6_fig02.png\" \/><\/p>\n<p>This is the same data we had retrieved from the <strong>SalesTerritory<\/strong> table in the preceding examples. We can update that script to instead point to that file:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">DECLARE @rscript NVARCHAR(MAX);\r\n  SET @rscript = N'\r\n  \u00a0 # retrieve sales data from InputDataSet\r\n  \u00a0 sales &lt;- InputDataSet\r\n  \u00a0 # retrieve territory data from territories.csv file\r\n  \u00a0 inputcsv &lt;- \"C:\\\\DataFiles\\\\territories.csv\"\r\n  \u00a0 territories &lt;- read.csv(file = inputcsv, header = TRUE, sep = \",\")\r\n  \u00a0 # merge data sets based on territory IDs\r\n  \u00a0 df1 &lt;- merge(territories, sales, by = \"TerritoryID\")\r\n  \u00a0 names(df1) &lt;- c(\"TerritoryID\", \"Territory\", \"Sales\")\r\n  \u00a0 # build data frame with aggregated data\r\n  \u00a0 c1 &lt;- levels(as.factor(df1$Territory))\r\n  \u00a0 c2 &lt;- round(tapply(df1$Sales, df1$Territory, mean))\r\n  \u00a0 df2 &lt;- data.frame(c1, c2)\r\n  \u00a0 OutputDataSet &lt;- df2';<\/pre>\n<p>Rather than defining a connection like we do for SQL Server, we can read the data directly from the .csv file. To do so, we first specify the file location and save it to the <strong>inputcsv<\/strong> variable:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">inputcsv &lt;- \"C:\\\\DataFiles\\\\territories.csv\"<\/pre>\n<p>Note the use of the double backslashes to escape the single backslash. We can then use the <strong>read.csv<\/strong> function to retrieve the data, passing in the <strong>inputcsv<\/strong> variable for the <strong>file<\/strong> argument:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">territories &lt;- read.csv(file = inputcsv, header = TRUE, sep = \",\")<\/pre>\n<p>In addition to setting the <strong>file<\/strong> argument, we set the <strong>header<\/strong> argument to <strong>TRUE<\/strong> because the column names are included in the file, and we specify a comma (within quotes) for the <strong>sep<\/strong> argument, which indicates that a comma is used to separate the values within the file. That\u2019s all there is to it. We can then create the <strong>df1<\/strong> and <strong>df2<\/strong> data frames just like we did in the preceding examples. The R script will return the same results as before.<\/p>\n<p>Being able to import data from a .csv file can be useful whenever we need data other than what is available in SQL Server. This approach also allows us to get data from a legacy system, in which direct access is not possible.<\/p>\n<p>And we\u2019re not limited to .csv files. The R language lets us import data from a wide range of sources, including JSON and XML files, relational databases, non-relational data stores, web-based resources, and many more. That said, I have not tested all these source types from within R Services. The only way to know for sure what you can or cannot do in your particular circumstances is to try it out yourself.<\/p>\n<h2>Exporting data to a .csv file<\/h2>\n<p>There might be times when you want to export the results of your R script to a file, rather than (or in addition to) returning the data to the calling application. To send the data to a .csv file, for example, we can use the <strong>write.csv<\/strong> function, as shown in the following example:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">DECLARE @rscript NVARCHAR(MAX);\r\n  SET @rscript = N'\r\n  \u00a0 # retrieve sales data from InputDataSet\r\n  \u00a0 sales &lt;- InputDataSet\r\n  \u00a0 # retrieve territory data from territories.csv file\r\n  \u00a0 inputcsv &lt;- \"C:\\\\DataFiles\\\\territories.csv\"\r\n  \u00a0 territories &lt;- read.csv(file = inputcsv, header = TRUE, sep = \",\")\r\n  \u00a0 # merge data sets based on territory IDs\r\n  \u00a0 df1 &lt;- merge(territories, sales, by = \"TerritoryID\")\r\n  \u00a0 names(df1) &lt;- c(\"TerritoryID\", \"Territory\", \"Sales\")\r\n  \u00a0 # build data frame with aggregated data\r\n  \u00a0 c1 &lt;- levels(as.factor(df1$Territory))\r\n  \u00a0 c2 &lt;- round(tapply(df1$Sales, df1$Territory, mean))\r\n  \u00a0 df2 &lt;- data.frame(c1, c2)\r\n  \u00a0 # output data to territorysales.csv file\r\n  \u00a0 outputcsv &lt;- \"C:\\\\DataFiles\\\\territorysales.csv\"\r\n  \u00a0 write.csv(df2, file = outputcsv)';<\/pre>\n<p>Sending data to a .csv file is just as easy as retrieving it. In this case, we specify the path and target file (territorysales.csv), assigning them to the <strong>outputcsv<\/strong> parameter. We then use the <strong>write.csv<\/strong> function to write the data in the <strong>df2<\/strong> data frame to the file. The following figure shows the file with the inserted data.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"508\" height=\"286\" class=\"wp-image-75825\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/10\/20figures6-stsqlsrvrsvcs6_fig03-png-1.png\" alt=\"%20figures6\/stSqlSrvRSvcs6_fig03.png\" \/><\/p>\n<p>Notice that the first row includes an empty string to represent the first column and that each subsequent row includes two instances of the territory name. In this case, the R engine is also outputting the names used to index the <strong>c2<\/strong> column. Plus, quotes are used to enclose the character data, which we might not want to include. We can get around these issues by modifying the <strong>write.csv<\/strong> function call:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">write.csv(df2, file = outputcsv, quote = FALSE, row.names = FALSE)<\/pre>\n<p>This time, when calling the function, we add two arguments. The first is <strong>quote<\/strong>, which we set to <strong>FALSE<\/strong> to specify that character data should not be enclosed in quotation marks. (You can omit this argument or set it to <strong>TRUE<\/strong> if you want to include the quotes.)<\/p>\n<p>The second argument is <strong>row.names<\/strong>, which we set to <strong>FALSE<\/strong> so the extra territory names are not included, giving us the results shown in the following figure.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"311\" height=\"279\" class=\"wp-image-75826\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/10\/20figures6-stsqlsrvrsvcs6_fig04-png-1.png\" alt=\"%20figures6\/stSqlSrvRSvcs6_fig04.png\" \/><\/p>\n<p>It should also be noted that we don\u2019t have to choose between sending data to a file or using the <strong>OutputDataSet<\/strong> variable to return data to the calling application. We can do both:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">DECLARE @rscript NVARCHAR(MAX);\r\n  SET @rscript = N'\r\n  \u00a0 # retrieve sales data from InputDataSet\r\n  \u00a0 sales &lt;- InputDataSet\r\n  \u00a0 # retrieve territory data from territories.csv file\r\n  \u00a0 inputcsv &lt;- \"C:\\\\DataFiles\\\\territories.csv\"\r\n  \u00a0 territories &lt;- read.csv(file = inputcsv, header = TRUE, sep = \",\")\r\n  \u00a0 # merge data sets based on territory IDs\r\n  \u00a0 df1 &lt;- merge(territories, sales, by = \"TerritoryID\")\r\n  \u00a0 names(df1) &lt;- c(\"TerritoryID\", \"Territory\", \"Sales\")\r\n  \u00a0 # build data frame with aggregated data\r\n  \u00a0 c1 &lt;- levels(as.factor(df1$Territory))\r\n  \u00a0 c2 &lt;- round(tapply(df1$Sales, df1$Territory, mean))\r\n  \u00a0 df2 &lt;- data.frame(c1, c2)\r\n  \u00a0 # output data to territorysales.csv file\r\n  \u00a0 outputcsv &lt;- \"C:\\\\DataFiles\\\\territorysales.csv\"\r\n  \u00a0 write.csv(df2, file = outputcsv, quote = FALSE, row.names = FALSE)\r\n  \u00a0 OutputDataSet &lt;- df2';<\/pre>\n<p>We can also return different data to the file and calling application, depending how we set up our script. The point is, R is extremely flexible when it comes to retrieving and returning data. The better you understand the R language, the better you can utilize these features.<\/p>\n<h2>Outputting data to a SQL Server table<\/h2>\n<p>SQL Server also makes it possible to insert data from the R script into a SQL Server table. The easiest way to do this is to call the <strong>sp_execute_external_script<\/strong> stored procedure as part of an <strong>INSERT<\/strong> statement:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">DECLARE @rscript NVARCHAR(MAX);\r\n  SET @rscript = N'\r\n  \u00a0 # retrieve sales data from InputDataSet\r\n  \u00a0 sales &lt;- InputDataSet\r\n  \u00a0 # retrieve territory data from territories.csv file\r\n  \u00a0 inputcsv &lt;- \"C:\\\\DataFiles\\\\territories.csv\"\r\n  \u00a0 territories &lt;- read.csv(file = inputcsv, header = TRUE, sep = \",\")\r\n  \u00a0 # merge data sets based on territory IDs\r\n  \u00a0 df1 &lt;- merge(territories, sales, by = \"TerritoryID\")\r\n  \u00a0 names(df1) &lt;- c(\"TerritoryID\", \"Territory\", \"Sales\")\r\n  \u00a0 # build data frame with aggregated data\r\n  \u00a0 c1 &lt;- levels(as.factor(df1$Territory))\r\n  \u00a0 c2 &lt;- round(tapply(df1$Sales, df1$Territory, mean))\r\n  \u00a0 df2 &lt;- data.frame(c1, c2)\r\n  \u00a0 # output data to territorysales.csv file\r\n  \u00a0 outputcsv &lt;- \"C:\\\\DataFiles\\\\territorysales.csv\"\r\n  \u00a0 write.csv(df2, file = outputcsv, quote = FALSE, row.names = FALSE)\r\n  \u00a0 OutputDataSet &lt;- df2';\r\n  DECLARE @sqlscript NVARCHAR(MAX);\r\n  SET @sqlscript = N'\r\n  \u00a0 SELECT TerritoryID, Subtotal FROM Sales.SalesOrderHeader;';\r\n  DROP TABLE IF EXISTS #TerritorySales;\r\n  CREATE TABLE #TerritorySales\r\n  (Territories NVARCHAR(50), Sales MONEY);\r\n  INSERT INTO #TerritorySales\r\n  EXEC sp_execute_external_script\r\n  \u00a0 @language = N'R',\r\n  \u00a0 @script = @rscript,\r\n  \u00a0 @input_data_1 = @sqlscript;\u00a0\r\n  GO\r\n  SELECT * FROM #TerritorySales;<\/pre>\n<p>In this case, we\u2019re creating the <strong>#TerritorySales<\/strong> temporary table to demonstrate the insert operation. We then run the <strong>INSERT<\/strong> statement, calling the <strong>sp_execute_external_script<\/strong> stored procedure, which provides us with the data we need to add to the table. We can then run a <strong>SELECT<\/strong> statement to verify our results, which are shown in the following figure.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"292\" height=\"254\" class=\"wp-image-75827\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/10\/20figures6-stsqlsrvrsvcs6_fig05-png-1.png\" alt=\"%20figures6\/stSqlSrvRSvcs6_fig05.png\" \/><\/p>\n<p>Being able to insert data returned from an R script into a temporary table or table variable can be a useful tool for incorporating analytics into more complex T-SQL scripts. We can also return the results to a permanent table when we want to persist the data. Again, it\u2019s all about flexibility.<\/p>\n<h2>Much more to the R language<\/h2>\n<p>In this series, we\u2019ve covered many of the basics of how to use SQL Server R Services to run R scripts that incorporate data from SQL Server and other sources. Once you understand how to use the <strong>sp_execute_external_script<\/strong> stored procedure and construct R scripts, you can take advantage of the broad range of R language elements available for transforming and analyzing data.<\/p>\n<p>The examples we\u2019ve looked at throughout this series provide only a glimpse of the many operations you can perform with R. The language is widely implemented and has a large user community behind it and deserves far more attention than what we\u2019ve covered here. Fortunately, there are plenty of sources out there that describe the various R language elements and how to put them together. Once you understand how to run an R script in SQL Server, you can take advantage of these elements to perform complex analytics and generate various types of visualizations.<\/p>\n<p>With the release of SQL Server 2017, Microsoft changed the name of R Services to Machine Learning Services and added support for the Python language. In future articles, we\u2019ll dig into the Python side of things, taking the same approach we covered in this series, that is, using the <strong>sp_execute_external_script<\/strong> stored procedure to run our external scripts. Python is another widely implemented language with a large user community and could prove to be an important addition to SQL Server, so stay tuned as we get ready to dive into the world of SQL Server Machine Learning.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Although it is easy to use SQL Server R Services to create R scripts that incorporate SQL Server data by passing in a T-SQL query as an argument when calling the sp_execute_external_script stored procedure, you are limited to that one query, unless you pass additional data directly between R and SQL Server via CSV files. It is simple to do, and opens up many additional opportunities for data analysis. Robert Sheldon explains how.&hellip;<\/p>\n","protected":false},"author":221841,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143528],"tags":[5134],"coauthors":[6779],"class_list":["post-75456","post","type-post","status-publish","format-standard","hentry","category-bi-sql-server","tag-sql-prompt"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/75456","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/221841"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=75456"}],"version-history":[{"count":7,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/75456\/revisions"}],"predecessor-version":[{"id":75835,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/75456\/revisions\/75835"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=75456"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=75456"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=75456"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=75456"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}