{"id":72120,"date":"2017-08-11T16:42:29","date_gmt":"2017-08-11T16:42:29","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=72120"},"modified":"2021-09-15T13:22:18","modified_gmt":"2021-09-15T13:22:18","slug":"sql-server-r-services-working-ggplot2-statistical-graphics","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/bi-sql-server\/sql-server-r-services-working-ggplot2-statistical-graphics\/","title":{"rendered":"SQL Server R Services: Working with ggplot2 Statistical Graphics"},"content":{"rendered":"<h4>The series so far:<\/h4>\n<ol>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-r-services-basics\/\">SQL Server R Services: The Basics<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-r-services-digging-r-language\/\">SQL Server R Services: Digging into the R Language<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-r-services-working-ggplot2-statistical-graphics\/\">SQL Server R Services: Working with ggplot2 Statistical Graphics<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-r-services-working-data-frames\/\">SQL Server R Services: Working with Data Frames<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-r-services-generating-sparklines-types-spark-graphs\/\">SQL Server R Services: Generating Sparklines and Other Types of Spark Graphs<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-r-services-working-multiple-data-sets\/\">SQL Server R Services: Working with Multiple Data Sets<\/a><\/li>\n<\/ol>\n\n<p>SQL Server 2016 ships with SQL Server R Services, a new component that incorporates the analytic and visualization capabilities of the R language into the SQL Server environment. An important feature of the R language is the ability to import R packages into a script, making a wide range of functions available for working with SQL Server data.<\/p>\n<p>One of these packages is <strong>ggplot2<\/strong>, which I introduced in the second article in this series. The <strong>ggplot2<\/strong> package includes an extensive set of functions for generating statistical visualizations. The package\u2019s functions let you create layered graphics using data from a data frame. This layered approach is an important concept in <strong>ggplot2<\/strong> because that is how you build a graphic\u2014one layer at a time. We used the package in conjunction with SQL Server R Services to generate a bar chart based on SQL Server data.<\/p>\n<p>Given that the <strong>ggplot2<\/strong> package is so widely used to visualize data, I thought it worthwhile to spend more time looking at various <strong>ggplot2<\/strong> concepts. To that end, this article walks through the process of building a scatter plot graphic, using a series of R scripts to add each layer. I follow these with a few examples of other graphic types, based on the same core data.<\/p>\n<p>To run an R script in R Services, you must call the <strong>sp_execute_external_script<\/strong> stored procedure, passing in the necessary parameter values, one of which is the R script itself. To use the <strong>ggplot2<\/strong> package in a script, you must first install the package into the SQL Server library and then import it into the script. The first two articles in this series describe how to carry out these steps. In this article, we focus primarily on the graphic definitions themselves.<\/p>\n<h2>Creating the initial scatter plot<\/h2>\n<p>We\u2019ll start by creating the initial scatter plot structure, based on data from the <strong>AdventureWorks2014<\/strong> database. The following example defines an R script and T-SQL script, assigns them to variables, and uses those variables when calling the <strong>sp_execute_external_script<\/strong> stored procedure:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">USE AdventureWorks2014;\r\n  GO\r\n  DECLARE @rscript NVARCHAR(MAX);\r\n  SET @rscript = N'\r\n  # import scales package\r\n  library(ggplot2)\r\n  # set up report file for chart\r\n  chartfile &lt;- \"C:\\\\DataFiles\\\\Chart01.png\"\r\n  png(filename=chartfile, width=1000, height=600)\r\n  # construct data frame\r\n  df &lt;- InputDataSet\r\n  # generate chart\r\n  chart &lt;- ggplot(data=df, aes(x=Subcategory, y=AvgListPrice)) +\r\n  \u00a0 geom_point()\r\n  print(chart)\r\n  dev.off()';\r\n  DECLARE @sqlscript NVARCHAR(MAX);\r\n  SET @sqlscript = N'\r\n  SELECT s.Name Subcategory, c.Name Category,\u00a0\r\n  \u00a0 COUNT(p.ProductID) NumberProducts, AVG(p.ListPrice) AvgListPrice\r\n  FROM Production.Product p INNER JOIN Production.ProductSubcategory s\r\n  \u00a0 ON p.ProductSubcategoryID = s.ProductSubcategoryID\r\n  \u00a0 \u00a0 INNER JOIN Production.ProductCategory c\r\n  \u00a0 ON s.ProductCategoryID = c.ProductCategoryID\r\n  WHERE c.Name &lt;&gt; ''Clothing''\r\n  GROUP BY s.Name, c.Name;';\r\n  EXEC sp_execute_external_script\r\n  @language = N'R',\r\n  @script = @rscript,\r\n  @input_data_1 = @sqlscript\r\n  WITH RESULT SETS NONE;\u00a0\r\n  GO<\/pre>\n<p>The T-SQL script retrieves data from the <strong>Product<\/strong> table and <strong>ProductSubcategory<\/strong> table, giving us the following data set:<\/p>\n<table class=\"table--tight\">\n<thead>\n<tr>\n<td>\n<p><strong>Subcategory<\/strong><\/p>\n<\/td>\n<td>\n<p><strong>Category<\/strong><\/p>\n<\/td>\n<td>\n<p><strong>NumberProducts<\/strong><\/p>\n<\/td>\n<td>\n<p><strong>AvgListPrice<\/strong><\/p>\n<\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>\n<p>Bike Racks<\/p>\n<\/td>\n<td>\n<p>Accessories<\/p>\n<\/td>\n<td>\n<p>1<\/p>\n<\/td>\n<td>\n<p>120.00<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Bike Stands<\/p>\n<\/td>\n<td>\n<p>Accessories<\/p>\n<\/td>\n<td>\n<p>1<\/p>\n<\/td>\n<td>\n<p>159.00<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Bottles and Cages<\/p>\n<\/td>\n<td>\n<p>Accessories<\/p>\n<\/td>\n<td>\n<p>3<\/p>\n<\/td>\n<td>\n<p>7.99<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Cleaners<\/p>\n<\/td>\n<td>\n<p>Accessories<\/p>\n<\/td>\n<td>\n<p>1<\/p>\n<\/td>\n<td>\n<p>7.95<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Fenders<\/p>\n<\/td>\n<td>\n<p>Accessories<\/p>\n<\/td>\n<td>\n<p>1<\/p>\n<\/td>\n<td>\n<p>21.98<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Helmets<\/p>\n<\/td>\n<td>\n<p>Accessories<\/p>\n<\/td>\n<td>\n<p>3<\/p>\n<\/td>\n<td>\n<p>34.99<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Hydration Packs<\/p>\n<\/td>\n<td>\n<p>Accessories<\/p>\n<\/td>\n<td>\n<p>1<\/p>\n<\/td>\n<td>\n<p>54.99<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Lights<\/p>\n<\/td>\n<td>\n<p>Accessories<\/p>\n<\/td>\n<td>\n<p>3<\/p>\n<\/td>\n<td>\n<p>31.3233<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Locks<\/p>\n<\/td>\n<td>\n<p>Accessories<\/p>\n<\/td>\n<td>\n<p>1<\/p>\n<\/td>\n<td>\n<p>25.00<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Panniers<\/p>\n<\/td>\n<td>\n<p>Accessories<\/p>\n<\/td>\n<td>\n<p>1<\/p>\n<\/td>\n<td>\n<p>125.00<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Pumps<\/p>\n<\/td>\n<td>\n<p>Accessories<\/p>\n<\/td>\n<td>\n<p>2<\/p>\n<\/td>\n<td>\n<p>22.49<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Tires and Tubes<\/p>\n<\/td>\n<td>\n<p>Accessories<\/p>\n<\/td>\n<td>\n<p>11<\/p>\n<\/td>\n<td>\n<p>19.4827<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Mountain Bikes<\/p>\n<\/td>\n<td>\n<p>Bikes<\/p>\n<\/td>\n<td>\n<p>32<\/p>\n<\/td>\n<td>\n<p>1683.365<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Road Bikes<\/p>\n<\/td>\n<td>\n<p>Bikes<\/p>\n<\/td>\n<td>\n<p>43<\/p>\n<\/td>\n<td>\n<p>1597.45<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Touring Bikes<\/p>\n<\/td>\n<td>\n<p>Bikes<\/p>\n<\/td>\n<td>\n<p>22<\/p>\n<\/td>\n<td>\n<p>1425.2481<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Bottom Brackets<\/p>\n<\/td>\n<td>\n<p>Components<\/p>\n<\/td>\n<td>\n<p>3<\/p>\n<\/td>\n<td>\n<p>92.24<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Brakes<\/p>\n<\/td>\n<td>\n<p>Components<\/p>\n<\/td>\n<td>\n<p>2<\/p>\n<\/td>\n<td>\n<p>106.50<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Chains<\/p>\n<\/td>\n<td>\n<p>Components<\/p>\n<\/td>\n<td>\n<p>1<\/p>\n<\/td>\n<td>\n<p>20.24<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Cranksets<\/p>\n<\/td>\n<td>\n<p>Components<\/p>\n<\/td>\n<td>\n<p>3<\/p>\n<\/td>\n<td>\n<p>278.99<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Derailleurs<\/p>\n<\/td>\n<td>\n<p>Components<\/p>\n<\/td>\n<td>\n<p>2<\/p>\n<\/td>\n<td>\n<p>106.475<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Forks<\/p>\n<\/td>\n<td>\n<p>Components<\/p>\n<\/td>\n<td>\n<p>3<\/p>\n<\/td>\n<td>\n<p>184.40<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Handlebars<\/p>\n<\/td>\n<td>\n<p>Components<\/p>\n<\/td>\n<td>\n<p>8<\/p>\n<\/td>\n<td>\n<p>73.89<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Headsets<\/p>\n<\/td>\n<td>\n<p>Components<\/p>\n<\/td>\n<td>\n<p>3<\/p>\n<\/td>\n<td>\n<p>87.0733<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Mountain Frames<\/p>\n<\/td>\n<td>\n<p>Components<\/p>\n<\/td>\n<td>\n<p>28<\/p>\n<\/td>\n<td>\n<p>678.2535<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Pedals<\/p>\n<\/td>\n<td>\n<p>Components<\/p>\n<\/td>\n<td>\n<p>7<\/p>\n<\/td>\n<td>\n<p>64.0185<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Road Frames<\/p>\n<\/td>\n<td>\n<p>Components<\/p>\n<\/td>\n<td>\n<p>33<\/p>\n<\/td>\n<td>\n<p>780.0436<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Saddles<\/p>\n<\/td>\n<td>\n<p>Components<\/p>\n<\/td>\n<td>\n<p>9<\/p>\n<\/td>\n<td>\n<p>39.6333<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Touring Frames<\/p>\n<\/td>\n<td>\n<p>Components<\/p>\n<\/td>\n<td>\n<p>18<\/p>\n<\/td>\n<td>\n<p>631.4155<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Wheels<\/p>\n<\/td>\n<td>\n<p>Components<\/p>\n<\/td>\n<td>\n<p>14<\/p>\n<\/td>\n<td>\n<p>220.9292<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The R script imports the <strong>ggplot2<\/strong> package, creates the <strong>Chart01.png<\/strong> graphic file for the visualization, constructs the <strong>df<\/strong> data frame based on the above data set, and generates the initial scatter plot. We assign the R script to the <strong>@rscript<\/strong> variable and the T-SQL script to the <strong>@sqlscript<\/strong> variable, and then use those variables when calling the <strong>sp_execute_external_script<\/strong> stored procedure.<\/p>\n<p>Most of what we\u2019re doing here is covered in the preceding articles, so refer to them as necessary. Our primary concern for this article is the part of the R script that defines the graphic (the section that starts with the <em>generate chart<\/em> comment):<\/p>\n<pre class=\"theme:github lang:r decode:true\"># generate chart\r\n  chart &lt;- ggplot(data=df, aes(x=Subcategory, y=AvgListPrice)) +\r\n  &amp;nbsp; geom_point()\r\n  print(chart)\r\n  dev.off()';<\/pre>\n<p>For all the examples in this article, we\u2019ll use the same SQL Server data and <strong>df<\/strong> data frame (or a derivative of the data frame), changing only the chart definition as we progress through the article.<\/p>\n<p>To create the first layer of a <strong>ggplot2<\/strong> visualization, we use the <strong>ggplot<\/strong> function, which establishes the initial graphic structure. When calling the function, we must provide the source data frame and specify the visualization\u2019s initial <em>aesthetics,<\/em> the properties that define the graphic\u2019s visual aspects.<\/p>\n<p>Often when defining the aesthetics, we use the <strong>aes<\/strong> function, which maps variables to the aesthetics. Variables, in this case, can include the columns in the source data frame. (The R language essentially treats a column as a variable.) In this case, we\u2019re assigning the <strong>Subcategory<\/strong> column in the <strong>df<\/strong> data frame to the <strong>x<\/strong> property (X-axis), and the <strong>AvgListPrice<\/strong> column to the <strong>y<\/strong> property (Y-axis). Any aesthetics we define in the initial layer apply to all subsequent layers, unless we specifically override them in another layer.<\/p>\n<p>After calling the <strong>ggplot<\/strong> function, we can then define the other layers in the graphic, adding a plus sign for each layer we want to include. Normally, a good place to start is by specifying the graphic type, or <em>geom<\/em>. To define a scatter plot geom, we use the <strong>geom_point<\/strong> function, in this case, without any arguments, but as you\u2019ll see later in the article, we can add arguments that further define the graphic.<\/p>\n<p>The <strong>geom_point<\/strong> function represents the second layer in our graphic. Notice that we assign the graphic definition to the <strong>chart<\/strong> variable. We can then use the <strong>print<\/strong> function to call the variable. This will print the graphic to the <strong>Chart01.png<\/strong> file we defined earlier.<\/p>\n<p>The last step we take in this part of the R script is to call the <strong>dev.off<\/strong> function in order to close the <strong>png<\/strong> device. If we now run the <strong>sp_execute_external_script<\/strong> stored procedure, the R engine would generate a graphic similar to the one in the following figure.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1000\" height=\"600\" class=\"wp-image-72125\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/08\/chart01-png.png\" alt=\"Chart01.png\" \/><\/p>\n<p>As you can see, the chart plots the average list price for each product subcategory. Unfortunately, the information is not very readable, especially the subcategory names, but we can define additional layers to fix those.<\/p>\n<h2>Flipping the chart\u2019s axis points<\/h2>\n<p>Each layer in the graphic definition defines a specific element of that graphic. Together, the layers create the entire picture. In the previous example, we defined two layers, the initial <strong>ggplot<\/strong> layer and the second <strong>geom_point<\/strong> layer. We\u2019ll now add two more layers that switch the X-axis and Y-axis and make the chart more readable:<\/p>\n<pre class=\"theme:github lang:r decode:true\">chart &lt;- ggplot(data=df, aes(x=Subcategory, y=AvgListPrice)) +\r\n  \u00a0 geom_point() +\r\n  \u00a0 coord_flip() +\u00a0\r\n  \u00a0 xlim(rev(levels(df$Subcategory)))\r\n  print(chart)\r\n  dev.off()';<\/pre>\n<p>If you reviewed the last article in this series, you might recognize what we\u2019re doing here. First, we use the <strong>coord_flip<\/strong> function to switch the X-axis and Y-axis. Next, we use the <strong>xlim<\/strong> function to modify the order of the subcategory names so they\u2019re listed alphabetically, starting at the top.<\/p>\n<p>The <strong>xlim<\/strong> function defines our data range, based on the values in the <strong>Subcategory<\/strong> column. When referencing that column, the function calls two additional functions. The <strong>levels<\/strong> function retrieves the list of values (levels) from the <strong>Subcategory<\/strong> column, and the <strong>rev<\/strong> function reverses the order of those levels. We need to reverse the order because flipping the chart causes the subcategories to be listed from bottom to top.<\/p>\n<p>That\u2019s all there is to flipping the chart and listing the subcategory names in the correct order. The R script now produces the results shown in the following figure.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1000\" height=\"600\" class=\"wp-image-72126\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/08\/chart02-png.png\" alt=\"Chart02.png\" \/><\/p>\n<p>Although the subcategory labels are now easier to read, there is still much more we can do with our graphic to make it more presentable, so let\u2019s forge ahead.<\/p>\n<h2>Configuring the plot points<\/h2>\n<p>Our next step is to change the color and size of the individual plot points so they\u2019re easier to pick out in the chart. In this case, however, we\u2019re not going to add a layer, but will instead update the <strong>geom_point<\/strong> layer by specifying values for the <strong>color<\/strong> and <strong>size<\/strong> properties:<\/p>\n<pre class=\"theme:github lang:r decode:true\">chart &lt;- ggplot(data=df, aes(x=Subcategory, y=AvgListPrice)) +\r\n  \u00a0 geom_point(color=\"slateblue2\", size=5) +\r\n  \u00a0 coord_flip() +\u00a0\r\n  \u00a0 xlim(rev(levels(df$Subcategory)))\r\n  print(chart)\r\n  dev.off()';<\/pre>\n<p>The geom functions support numerous properties (aesthetics) that let you better define your graphics. In this case, we\u2019re assigning the value <strong>slateblue2<\/strong> to the <strong>color<\/strong> property and the value <strong>5<\/strong> (for points) to the <strong>size<\/strong> property. You can find lists of colors in various resources online, such as the one on the <a href=\"http:\/\/www.stat.columbia.edu\/~tzheng\/files\/Rcolor.pdf\">Columbia University website<\/a>.<\/p>\n<p>With these small changes, our R script now produces the following graphic.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/08\/rs1.png\"><\/p>\n<p>The new settings make it easier to pick out the plot points than the original black pinpoints, although you might prefer different <strong>color<\/strong> and <strong>size<\/strong> values. Your choices will depend in part on the nature and amount of the data you\u2019re displaying. Feel free to play around with various settings until you come up with the combination that works best for your circumstances.<\/p>\n<h2>Mapping the plot points to categories<\/h2>\n<p>When defining a graphic\u2019s aesthetics, we can use other data in the data frame to control how those aesthetics are applied. For example, suppose we want to color-code the plot points based on product categories. We can achieve this by assigning the <strong>Category<\/strong> column to the <strong>color<\/strong> property, rather than an actual color:<\/p>\n<pre class=\"theme:github lang:r decode:true\">chart &lt;- ggplot(data=df, aes(x=Subcategory, y=AvgListPrice)) +\r\n  \u00a0 geom_point(aes(color=Category), size=5) +\r\n  \u00a0 coord_flip() +\u00a0\r\n  \u00a0 xlim(rev(levels(df$Subcategory)))\r\n  print(chart)\r\n  dev.off()';<\/pre>\n<p>Notice that we must use the <strong>aes<\/strong> function when setting the <strong>color<\/strong> property. The R engine can then use the values in the <strong>Category<\/strong> column as the basis for applying colors to the plot points. In this situation, the column includes only three values, giving us three different colors, as shown in the following figure.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1000\" height=\"600\" class=\"wp-image-72129\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/08\/chart04-png.png\" alt=\"Chart04.png\" \/><\/p>\n<p>Notice that the R engine also adds the <strong>Category<\/strong> legend to the graphic to make it easier to identify the categories and their associated colors. The <strong>ggplot2<\/strong> package also provides additional methods for controlling the colors assigned to the categories, but we\u2019ll leave it at this for now.<\/p>\n<h2>Configuring the chart\u2019s labels<\/h2>\n<p>Now that we\u2019ve gotten the chart itself in order, it\u2019s time to focus on the labels, which are currently based on default values. To display more useful labels, we\u2019ll add several layers to our graphic definition:<\/p>\n<pre class=\"theme:github lang:r decode:true\">chart &lt;- ggplot(data=df, aes(x=Subcategory, y=AvgListPrice)) +\r\n  \u00a0 geom_point(aes(color=Category), size=5) +\r\n  \u00a0 coord_flip() +\u00a0\r\n  \u00a0 xlim(rev(levels(df$Subcategory))) +\r\n  \u00a0 ggtitle(\"Average List Price per Subcategory\") +\r\n  \u00a0 labs(x=\"Product Subcategories\", y=\"List Prices (US dollars)\") +\r\n  \u00a0 guides(color=guide_legend(\"Categories\"))\r\n  print(chart)\r\n  dev.off()';<\/pre>\n<p>We start by using the <strong>ggtitle<\/strong> function to define the graphic\u2019s title\u2014<em>Average List Price per Subcategory<\/em>\u2014passing in the title as an argument. We then follow with the <strong>labs<\/strong> function to define the X-axis and Y-axis labels, <em>Product Subcategories<\/em> and <em>List Prices (US dollars),<\/em> respectively. We could have used the <strong>labs<\/strong> function to define the title as well, but I wanted to demonstrate that there are different ways to do things in R, and using <strong>ggtitle<\/strong> makes it easier to pick the title out of the script.<\/p>\n<p>The next step is to change the title for the <strong>Category<\/strong> legend so it is plural. For that, we use the <strong>guides<\/strong> function, which lets you configure or remove a legend. In this case, we\u2019re setting the <strong>color<\/strong> property to <strong>Categories<\/strong>, using the <strong>guide_legend<\/strong> function to specify the new name. The <strong>guide_legend<\/strong> function lets us control a number of settings associated with the legend, such as color, size, shape, and transparency. Our graphic should now look like the one shown in the following figure.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1000\" height=\"600\" class=\"wp-image-72131\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/08\/chart05-png.png\" alt=\"Chart05.png\" \/><\/p>\n<p>There are, of course, a wide range of changes we can make to how the labels and other elements are displayed, but this should give you a sense of types of steps we can take to control the various elements.<\/p>\n<h2>Applying a pre-built theme<\/h2>\n<p>The basic implementation of the R language in SQL Server supports several predefined themes, which specify the graphic\u2019s overall property settings, such as borders, margins, grid lines, and background colors. A theme is essentially a function that defines the various settings, allowing you to apply those settings by calling the function. For example, we can apply the <strong>theme_bw<\/strong> theme to our graphic by calling it in a new layer:<\/p>\n<pre class=\"theme:github lang:r decode:true\">chart &lt;- ggplot(data=df, aes(x=Subcategory, y=AvgListPrice)) +\r\n  \u00a0 geom_point(aes(color=Category), size=5) +\r\n  \u00a0 coord_flip() +\u00a0\r\n  \u00a0 xlim(rev(levels(df$Subcategory))) +\r\n  \u00a0 ggtitle(\"Average List Price per Subcategory\") +\r\n  \u00a0 labs(x=\"Product Subcategories\", y=\"List Prices (US dollars)\") +\r\n  \u00a0 guides(color=guide_legend(\"Categories\")) +\r\n  \u00a0 theme_bw()\r\n  print(chart)\r\n  dev.off()';<\/pre>\n<p>All we\u2019re doing is adding a layer that calls the <strong>theme_bw<\/strong> function, which changes the look of our scatter plot to the following chart:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/08\/rs2.png\"><\/p>\n<p>The most obvious difference is that there is no longer a gray background, but other elements have changed as well, such as the weight of the grid lines.<\/p>\n<p>The basic R language includes only a few pre-built themes, but you can find plenty of other themes out there. You just need only to install the applicable R package. In some cases, however, you might want to update individual theme-related properties, rather than all settings. For that, we need to take a different approach.<\/p>\n<h2>Configuring theme elements<\/h2>\n<p>The <strong>ggplot2<\/strong> package supports a wide range of theme-related graphic properties that you can update by using the <strong>theme<\/strong> function. For example, we can change the size and color of the various titles:<\/p>\n<pre class=\"theme:github lang:r decode:true\">chart &lt;- ggplot(data=df, aes(x=Subcategory, y=AvgListPrice)) +\r\n  \u00a0 geom_point(aes(color=Category), size=5) +\r\n  \u00a0 coord_flip() +\u00a0\r\n  \u00a0 xlim(rev(levels(df$Subcategory))) +\r\n  \u00a0 ggtitle(\"Average List Price per Subcategory\") +\r\n  \u00a0 labs(x=\"Product Subcategories\", y=\"List Prices (US dollars)\") +\r\n  \u00a0 guides(color=guide_legend(\"Categories\")) +\r\n  \u00a0 theme_bw() +\r\n  \u00a0 theme(title=element_text(size=16, color=\"blue3\"))\r\n  print(chart)\r\n  dev.off()';<\/pre>\n<p>In this case, when calling the <strong>theme<\/strong> function, we\u2019re specifying a value for the <strong>title<\/strong> property, using the <strong>element_text<\/strong> function to set the <strong>size<\/strong> value to <strong>16<\/strong> points and the <strong>color<\/strong> value to <strong>blue3<\/strong>, giving us the results shown in the following figure.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/08\/rs3.png\"><\/p>\n<p>By using the <strong>theme<\/strong> function, we can control dozens of settings, ranging from line sizes to legend margins to background colors. We can even control individual axis labels. For information about the type of settings you can control, see the RDocumentation topic <a href=\"https:\/\/www.rdocumentation.org\/packages\/ggplot2\/versions\/2.2.1\/topics\/theme\">theme<\/a>.<\/p>\n<h2>Faceting the chart based on categories<\/h2>\n<p>Another fun thing we can do with <strong>ggplot2<\/strong> is to split a graphic into multiple small graphics by faceting the data based on values in one of the data frame\u2019s columns. For this, we call the <strong>facet_wrap<\/strong> function and specify the column on which the faceting is based:<\/p>\n<pre class=\"theme:github lang:r decode:true\">chart &lt;- ggplot(data=df, aes(x=Subcategory, y=AvgListPrice)) +\r\n  \u00a0 geom_point(aes(color=Category), size=5) +\r\n  \u00a0 coord_flip() +\u00a0\r\n  \u00a0 xlim(rev(levels(df$Subcategory))) +\r\n  \u00a0 ggtitle(\"Average List Price per Subcategory\") +\r\n  \u00a0 labs(x=\"Product Subcategories\", y=\"List Prices (US dollars)\") +\r\n  \u00a0 guides(color=guide_legend(\"Categories\")) +\r\n  \u00a0 theme_bw() +\r\n  \u00a0 theme(title=element_text(size=16, color=\"blue3\")) +\r\n  \u00a0 facet_wrap(~Category)\r\n  print(chart)\r\n  dev.off()';<\/pre>\n<p>To implement faceting, we add the <strong>facet_wrap<\/strong> function as a new layer and include the <strong>Category<\/strong> column as an argument. Notice that we must precede the column name with a tilde (<strong>~<\/strong>) operator. This is because the function expects a formula as an argument, and the R language uses the tilde operator to define formulas. In this case, however, we\u2019re providing only the right side of the formula, which is the column name. (The topic of R formulas and the use of the tilde argument is beyond the scope of this article. For now, just know that you need to precede the column name with the operator so the R engine doesn\u2019t baulk.)<\/p>\n<p>When we add the <strong>facet_wrap<\/strong> function, specifying the <strong>Category<\/strong> column, the R script returns the graphic shown in the following figure.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1000\" height=\"600\" class=\"wp-image-72136\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/08\/chart08-png.png\" alt=\"Chart08.png\" \/><\/p>\n<p>We now have one chart for each product category. Although this is a relatively simple example, it shows how easy it is to create individual charts based on a specific set of values. For complex data, this can provide a valuable resource to help display data more clearly.<\/p>\n<h2>Creating a bar chart<\/h2>\n<p>Our focus to this point has been on adding layers to our scatter plot graphic, but the same principles also apply to other geom types. For example, the following graphic definition defines a bar chart:<\/p>\n<pre class=\"theme:github lang:r decode:true\">chart &lt;- ggplot(data=df, aes(x=Subcategory, y=AvgListPrice)) +\r\n  \u00a0 geom_bar(stat=\"identity\", aes(fill=Category)) +\r\n  \u00a0 coord_flip() +\u00a0\r\n  \u00a0 xlim(rev(levels(df$Subcategory))) +\r\n  \u00a0 ggtitle(\"Average List Price per Subcategory\") +\r\n  \u00a0 labs(x=\"Product Subcategories\", y=\"List Prices (US dollars)\", fill=\"Categories\") +\r\n  \u00a0 theme_bw() +\r\n  \u00a0 theme(title=element_text(size=16, color=\"blue3\"))\r\n  print(chart)\r\n  dev.off()';<\/pre>\n<p>Notice that many of the layers are similar the scatter plot. We flip the X-axis and Y-axis, reverse the list of subcategories, format the labels, and apply theme settings. However, there are also several differences.<\/p>\n<p>To begin with, we specify the <strong>geom_bar<\/strong> function, rather than the <strong>geom_point<\/strong> function, and pass in the <strong>stat<\/strong> and <strong>fill<\/strong> arguments. The <strong>stat<\/strong> argument and its value, <strong>identity<\/strong>, ensure that the data values map correctly to the plot points. The <strong>fill<\/strong> argument specifies the bar fill colors. Because we want to color-code the bars based on product categories, we specify the <strong>Category<\/strong> column, wrapping the argument and value in the <strong>aes<\/strong> function.<\/p>\n<p>Another difference from previous examples is that we do not use the <strong>guides<\/strong> function to set the legend title. Instead, we include the <strong>fill<\/strong> property when calling the <strong>labs<\/strong> function, but we still set the value to <strong>Categories<\/strong>.<\/p>\n<p>You\u2019ll find many small differences like this from one geom to the next, but the general concept of adding layers remains the same. The graphic definition now generates the bar chart shown in the following figure.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/08\/rs4.png\"><\/p>\n<p>That\u2019s all there is to creating a bar chart. If you reviewed the second article in this series, most of the script components should look familiar to you. Even if you have not, the approach to creating a bar chart is the same as creating a scatter plot or any other geom.<\/p>\n<h2>Adding labels to the chart\u2019s bars<\/h2>\n<p>We can also add labels to the graphic itself. As a quick example, let\u2019s add the number of products per subcategory by defining an additional layer:<\/p>\n<pre class=\"theme:github lang:r decode:true\">chart &lt;- ggplot(data=df, aes(x=Subcategory, y=AvgListPrice)) +\r\n  \u00a0 geom_bar(stat=\"identity\", aes(fill=Category)) +\r\n  \u00a0 coord_flip() +\u00a0\r\n  \u00a0 xlim(rev(levels(df$Subcategory))) +\r\n  \u00a0 ggtitle(\"Average List Price per Subcategory\") +\r\n  \u00a0 labs(x=\"Product Subcategories\", y=\"List Prices (US dollars)\", fill=\"Categories\") +\r\n  \u00a0 theme_bw() +\r\n  \u00a0 theme(title=element_text(size=16, color=\"blue3\")) +\r\n  \u00a0 geom_text(aes(label=NumberProducts), size=4, hjust=-0.2)\r\n  print(chart)\r\n  dev.off()';<\/pre>\n<p>In this case, we\u2019re using the <strong>geom_text<\/strong> function to specify that the values in the <strong>NumberProducts<\/strong> column be used as bar labels. We do this by assigning the column to the <strong>label<\/strong> property. Notice that we must use the <strong>aes<\/strong> function to set the property value because we\u2019re specifying a column (variable), rather than a literal value.<\/p>\n<p>Next, we set the <strong>size<\/strong> value to <strong>4<\/strong> points and the <strong>hjust<\/strong> value to <strong>-0.2<\/strong> points to provide a bit of space between the bar and the label, as shown in the following figure.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/08\/rs5.png\"><\/p>\n<p>This is just a small addition to our graphic, but it shows how we can keep adding layers to provide us with exactly the information we need, based on the available data. In fact, we can even incorporate different geoms, treating each one as an additional layer.<\/p>\n<h2>Creating a stacked bar chart<\/h2>\n<p>The <strong>ggplot2<\/strong> package also allows us to create a stacked bar chart, in which each bar is broken into segments, based on a set of data values. For example, we can create a stacked bar chart broken down by product category and then by subcategory, as shown in the following graphic definition:<\/p>\n<pre class=\"theme:github lang:r decode:true\">df2 &lt;- df[df$NumberProducts &gt; 3, ]\r\n  chart &lt;- ggplot(data=df2, aes(x=Category, fill=Subcategory)) +\r\n  \u00a0 geom_bar() +\r\n  \u00a0 ggtitle(\"Categories and Subcategories\") +\r\n  \u00a0 labs(x=\"Product Categories\", y=\"Number of Subcategories per Category\", fill=\"Subcategories\") +\r\n  \u00a0 theme_bw() +\r\n  \u00a0 theme(title=element_text(size=16, color=\"blue3\"))\r\n  print(chart)\r\n  dev.off()';<\/pre>\n<p>You\u2019ve seen all these elements before, but notice that that the <strong>ggplot<\/strong> function now specifies the <strong>Category<\/strong> column as the X-axis and includes the <strong>fill<\/strong> property, with the <strong>Subcategory<\/strong> column as its value.<\/p>\n<p>Also note that, before defining the graphic, we filter the <strong>df<\/strong> data frame so it includes only subcategories with more than three products. We then assign the filtered data frame to the <strong>df2<\/strong> variable, which we then use in the <strong>ggplot<\/strong> function. The R script now generates the graphic shown in the following figure.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/08\/rs6.png\"><\/p>\n<p>That\u2019s all there is to creating a simple stacked bar chart. We need only assign the column name to the <strong>fill<\/strong> argument, using the <strong>aes<\/strong> function to wrap the argument and value. You might have also noticed here and elsewhere that we are able to include multiple arguments in a single <strong>aes<\/strong> call, in this case, <strong>x<\/strong> and <strong>fill<\/strong>. You can specify multiple aesthetics whenever calling the <strong>aes<\/strong> function.<\/p>\n<h2>Creating a density plot<\/h2>\n<p>Let\u2019s look at one more example that demonstrates a few additional concepts. This time we\u2019ll create a density plot, which shows how data values are distributed across a specified interval. For this example, we\u2019ll plot the <strong>AvgListPrice<\/strong> values for each product subcategory, as they\u2019re distributed across the three product categories:<\/p>\n<pre class=\"theme:github lang:r decode:true\">chart &lt;- ggplot(data=df, aes(x=AvgListPrice, fill=Category)) +\r\n  \u00a0 geom_density(alpha=0.4) +\r\n  \u00a0 ggtitle(\"Product Density Plot Chart\") +\r\n  \u00a0 labs(x=\"Average List Prices\", y=\"Density\", fill=\"Categories\") +\r\n  \u00a0 theme_bw() +\r\n  \u00a0 theme(title=element_text(size=16, color=\"blue3\"))\r\n  print(chart)\r\n  dev.off()';<\/pre>\n<p>To create a density plot, we start by defining the aesthetics when calling the <strong>ggplot<\/strong> function, setting the <strong>x<\/strong> property to the <strong>AvgListPrice<\/strong> column and the <strong>fill<\/strong> property to the <strong>Category<\/strong> column. Next, we call the <strong>geom_density<\/strong> function, passing in the <strong>alpha<\/strong> argument, which defines the percentage of transparency for the charted data (40% in this case).<\/p>\n<p>All other layers are similar to what you\u2019ve seen in earlier examples. This time, however, the R script returns the following graphic.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/08\/rs7.png\"><\/p>\n<p>The chart shows the value densities across the categories, with the highest rate in the <strong>Accessories<\/strong> category. Because we set the <strong>alpha<\/strong> property to 40%, the colors are transparent, which makes it easier to view overlapping categories.<\/p>\n<h2>Working with ggplot2 R package<\/h2>\n<p>There is a great deal more we can do with the <strong>ggplot2<\/strong> package when it comes to visualizing data, but what we\u2019ve covered here should give you a good idea of the package\u2019s capabilities. As we move on to the analytics side of the R language, keep the <strong>ggplot2<\/strong> package in mind for a quick way to render the analyzed data as graphics. The key is to get the data you need into a data frame and then use that data frame to visualize the data.<\/p>\n<p>In the meantime, play around with the <strong>ggplot2<\/strong> functions against various types of data so you can get a sense of the many options you have for visualizing data. Even if you rely solely on SQL Server data for now, as we have done here, you can still benefit from the extensive visualization capabilities at your fingertips (and impress the powers-that-be along the way). Then, as you dig deeper into other aspects of the R language, you\u2019ll be ready to turn those analytics into meaningful visualizations, using the large arsenal of tools already at your disposal.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It is when you use R in SQL Server with one of the huge range of packages that comes with it that you can begin to appreciate the power of the system.  With a package such as ggplot there are many &#8216;knobs one can twiddle&#8217; in order to get spectacular and informative visualisations. Rob Sheldon continues his beginners series for R in SQL Server by  showing how to refine the output to get it as you need it.&hellip;<\/p>\n","protected":false},"author":221841,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143528],"tags":[],"coauthors":[6779],"class_list":["post-72120","post","type-post","status-publish","format-standard","hentry","category-bi-sql-server"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/72120","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/221841"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=72120"}],"version-history":[{"count":7,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/72120\/revisions"}],"predecessor-version":[{"id":72279,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/72120\/revisions\/72279"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=72120"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=72120"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=72120"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=72120"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}