{"id":76519,"date":"2018-01-02T22:24:37","date_gmt":"2018-01-02T22:24:37","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=76519"},"modified":"2022-04-27T21:23:21","modified_gmt":"2022-04-27T21:23:21","slug":"sql-server-machine-learning-services-part-3-plotting-data-python","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/bi-sql-server\/sql-server-machine-learning-services-part-3-plotting-data-python\/","title":{"rendered":"SQL Server Machine Learning Services \u2013 Part 3: Plotting Data with Python"},"content":{"rendered":"<h4>The series so far:<\/h4>\n<ol>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/machine-learning-services-part-1-python\/\">SQL Server Machine Learning Services \u2013 Part 1: Python Basics<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-machine-learning-services-part-2-python-data-frames\/\">SQL Server Machine Learning Services \u2013 Part 2: Python Data Frames<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-machine-learning-services-part-3-plotting-data-python\/\">SQL Server Machine Learning Services \u2013 Part 3: Plotting Data with Python<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-machine-learning-services-part-4-finding-data-anomalies-python\/\">SQL Server Machine Learning Services \u2013 Part 4: Finding Data Anomalies with Python<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-machine-learning-services-part-5-generating-multiple-plots-python\/\">SQL Server Machine Learning Services \u2013 Part 5: Generating multiple plots in Python<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-machine-learning-services-part-6-merging-data-frames-python\/\">SQL Server Machine Learning Services \u2013 Part 6: Merging Data Frames in Python<\/a><\/li>\n<\/ol>\n\n<p>With the release of SQL Server 2017, Microsoft changed the name of R Services to Machine Learning Services (MLS) and added support for Python. As with the R language, you can use Python to perform advanced analytics against SQL Server data within the context of a SQL Server database. Also like R, if the basic Python language elements are not enough to carry out specific operations, you can import modules into your script that offer additional tools for working with data.<\/p>\n<p>One of the most useful modules is the <strong>matplotlib<\/strong> library, which provides an extensive codebase for plotting data and creating rich, customized visualizations. You can use <strong>matplotlib<\/strong> components to generate a wide range of graphics, including bar charts, pie charts, scatter plots, histograms, and many others. For example, you can generate a series of line charts that aggregate inventory or sales data in your SQL Server database and then save those charts to .png or .pdf files.<\/p>\n<p>This article includes several examples that demonstrate how to create <strong>matplotlib<\/strong> visualizations and save them to .pdf files, using data from the <strong>AdventureWorks2017<\/strong> sample database. The article assumes that you know how to use the <strong>sp_execute_external_script<\/strong> stored procedure to run Python scripts in SQL Server. If you\u2019re not familiar with the stored procedure, you should review the first two articles in this series before continuing with this one.<\/p>\n<h2>Generating a basic bar chart<\/h2>\n<p>The <strong>matplotlib<\/strong> module provides an extensive set of tools for creating visualizations and saving them to files. Unfortunately, the <strong>matplotlib<\/strong> documentation does not achieve the same level of quality as its code. The information can be confusing and inadequate in places, making it difficult to understand certain aspects of plotting data and how to create the various graphics, especially when you\u2019re just getting started.<\/p>\n<p>Because of this, you might have better luck learning about <strong>matplotlib<\/strong> by seeing it in action. For example, the following T-SQL code generates a horizontal bar chart based on aggregated data from the <strong>AdventureWorks2017<\/strong> database. Be sure to modify the file path in the script for your environment or create the <strong>c:\\datafiles<\/strong> folder before running the code:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">Use AdventureWorks2017;\r\nGO\r\n\r\nDECLARE @pscript NVARCHAR(MAX);\r\nSET @pscript = N'\r\n# import matplotlib modules\r\nimport matplotlib\r\nmatplotlib.use(\"PDF\")\r\nimport matplotlib.pyplot as plt\r\n\r\n# define df data frame\r\ndf = InputDataSet.groupby(\"Territories\").sum()\r\n\r\n# create bar chart object\r\npt = df.plot.barh()\r\n\r\n# save bar chart to PDF file\r\nplt.savefig(\"c:\\\\datafiles\\\\TerritorySales01.pdf\", \r\n  bbox_inches=\"tight\", pad_inches=.5)';\r\n\r\nDECLARE @sqlscript NVARCHAR(MAX);\r\nSET @sqlscript = N'\r\n  SELECT t.Name AS Territories, CAST(h.Subtotal AS FLOAT) AS Sales\r\n  FROM Sales.SalesOrderHeader h INNER JOIN Sales.SalesTerritory t\r\n    ON h.TerritoryID = t.TerritoryID\r\n  WHERE YEAR(h.OrderDate) = 2013;';\r\n\r\nEXEC sp_execute_external_script\r\n  @language = N'Python',\r\n  @script = @pscript,\r\n  @input_data_1 = @sqlscript;\r\nGO\r\n<\/pre>\n<p>If you\u2019ve come this far in the series, most of these elements should look very familiar to you. The example starts by defining a Python script and assigning it the <strong>@pscript<\/strong> variable. Next, the code defines a T-SQL query and saves that to the <strong>@sqlscript<\/strong> variable. The two variables are then passed to the <strong>sp_execute_external_script<\/strong> stored procedure so the data returned by the T-SQL query can be used within the Python script. If you have any questions about the general statement structure or about running the stored procedure, refer to the first article in this series.<\/p>\n<p>With that in mind, let\u2019s dig into the Python script itself, starting with the first <strong>import<\/strong> statement and its related <strong>use<\/strong> function:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">import matplotlib\r\nmatplotlib.use(\"PDF\")\r\n<\/pre>\n<p>The <strong>import<\/strong> statement calls the <strong>matplotlib<\/strong> module. The second statement then uses the module to call the <strong>use<\/strong> function, which sets the backend to <strong>PDF<\/strong>. A backend provides the structure necessary to support a specific output type. Because <strong>PDF<\/strong> is specified here, you can output your charts to .pdf files.<\/p>\n<p>The next step is to import the <strong>pyplot<\/strong> plotting framework that is included in the <strong>matplotlib<\/strong> package:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">  import matplotlib.pyplot as plt\r\n<\/pre>\n<p>Because the <strong>import<\/strong> statement assigns the <strong>plt<\/strong> alias to the framework, you can use that alias to reference <strong>pyplot<\/strong> elements in your Python script. Note that you should run this <strong>import<\/strong> statement <em>after<\/em> you set up the <strong>PDF<\/strong> backend. Otherwise, the <strong>use<\/strong> function call will have no effect because a backend is already selected.<\/p>\n<p>With the modules in place, you can then set up your data frame. For this example, you need only use pandas <strong>groupby<\/strong> function to aggregate the SQL Server data available through the <strong>InputDataSet<\/strong> variable:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">  df = InputDataSet.groupby(\"Territories\").sum()\r\n<\/pre>\n<p>The <strong>groupby<\/strong> function is used here just like it is in the first two articles in this series, except that it does not set the <strong>as_index<\/strong> parameter to <strong>False<\/strong>. As a result, the <strong>Territories<\/strong> column will be added to the outputted data frame as an index, rather than a numeric index being auto-generated. I\u2019ve taken this approach to demonstrate how indexes can be used when plotting data, as you\u2019ll see later in the article.<\/p>\n<p>After you\u2019ve set up the <strong>df<\/strong> dataframe, you can create a bar chart by calling the <strong>plot.barh<\/strong> function on the data frame:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">  pt = df.plot.barh()\r\n<\/pre>\n<p>The <strong>plot.barh<\/strong> function generates a figure that contains a horizontal bar chart, which is then saved to the <strong>pt<\/strong> variable. (As we progress through the examples in this article, you\u2019ll see how the <strong>pt<\/strong> variable can be used to format different components of the chart.) If you want to create a visualization other than a horizontal bar chart, you would call a different function, specifying the data frame and <strong>plot<\/strong> root. For example, to generate a histogram, you would use <strong>df.plot.hist()<\/strong>. The only exception to this convention is the basic line chart, which requires only <strong>df.plot()<\/strong>.<\/p>\n<p>Once you\u2019ve created the bar chart, you can use the <strong>savefig<\/strong> function available to the <strong>pyplot<\/strong> framework to save the chart to a .pdf file:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">plt.savefig(\"c:\\\\datafiles\\\\TerritorySales01.pdf\", \r\n  bbox_inches=\"tight\", pad_inches=.5)';\r\n<\/pre>\n<p>When calling the function, you start by passing in the path to the target file. The example here targets the <strong>datafiles<\/strong> folder on the local <strong>C:<\/strong> drive, but you can specify any location. (Be sure to update the code accordingly if you run these examples.)<\/p>\n<p>The function\u2019s second parameter (<strong>bbox_inches<\/strong>) and third parameter (<strong>pad_inches<\/strong>) work together to control the amount of white space (bounding box) around the image. The <strong>bbox_inches<\/strong> parameter can be set to <strong>tight<\/strong> or <strong>None<\/strong> (the default), or it can be set to the actual bounding box dimensions in inches. For this example, you can use the <strong>tight<\/strong> setting, which removes the extra white space from around the graphic. The <strong>pad_inches<\/strong> parameter then adds a specified amount of white space around the image (in this case, .5 inches). The <strong>pad_inches<\/strong> parameter applies only when the <strong>bbox_inches<\/strong> parameter is set to <strong>tight<\/strong>. Using these two parameters together provides a simple way to define the bounding box, without having to calculate exact measurements.<\/p>\n<p>The Python script generates the chart shown in the following figure.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-76521\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/01\/20screenshots3-stpython3_fig01-png.png\" alt=\"%20screenshots3\/stPython3_fig01.png\" width=\"1004\" height=\"693\" \/><\/p>\n<p>The figure shows a horizontal chart that uses the default formatting. The territories are listed on the Y-axis, and the sales amounts are listed on the X-axis (in scientific notation). The bars indicate the total amount of sales per territory.<\/p>\n<h2>Formatting the chart\u2019s components<\/h2>\n<p>The <strong>matplotlib<\/strong> module includes a wide range of options for adjusting how a chart is rendered. You can add labels, change their format, control how the legend is displayed, set the colors of the text and chart elements, and configure other components. For example, you can modify the chart above by changing the size and color of the bars, adding a title and grid, and removing the legend, as shown in the following Python script:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">DECLARE @pscript NVARCHAR(MAX);\r\nSET @pscript = N'\r\n# import matplotlib modules\r\nimport matplotlib\r\nmatplotlib.use(\"PDF\")\r\nimport matplotlib.pyplot as plt\r\n\r\n# define df data frame\r\ndf = InputDataSet.groupby(\"Territories\").sum()\r\n\r\n# create bar chart object\r\npt = df.plot.barh(color=\"blue\", alpha=.8, width=.4)\r\n\r\n# configure title, legend, and grid\r\npt.set_title(label=\"Sales by Territory\", y=1.04, \r\n  family=\"fantasy\", fontsize=14, weight=500, color=\"navy\")\r\npt.legend().set_visible(False)\r\npt.grid(color=\"slategray\", alpha=.5, linestyle=\"dotted\", linewidth=.5)\r\n\r\n# save bar chart to PDF file\r\nplt.savefig(\"c:\\\\datafiles\\\\TerritorySales02.pdf\", \r\n  bbox_inches=\"tight\", pad_inches=.5)';\r\n\r\nDECLARE @sqlscript NVARCHAR(MAX);\r\nSET @sqlscript = N'\r\n  SELECT t.Name AS Territories, CAST(h.Subtotal AS FLOAT) AS Sales\r\n  FROM Sales.SalesOrderHeader h INNER JOIN Sales.SalesTerritory t\r\n    ON h.TerritoryID = t.TerritoryID\r\n  WHERE YEAR(h.OrderDate) = 2013;';\r\n\r\nEXEC sp_execute_external_script\r\n  @language = N'Python',\r\n  @script = @pscript,\r\n  @input_data_1 = @sqlscript;     \r\nGO\r\n<\/pre>\n<p>The Python script builds on the preceding example by first modifying the <strong>plot.barh<\/strong> function to include several parameters that change the color and width of the bars:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">  pt = df.plot.barh(color=\"blue\", alpha=.8, width=.4)\r\n<\/pre>\n<p>The <strong>color<\/strong> parameter should be obvious. You need only specify one of the many supported colors. You can access color samples at <a href=\"https:\/\/matplotlib.org\/examples\/index.html\">Matplotlib Examples<\/a>, where you can also access examples of other language elements.<\/p>\n<p>The <strong>alpha<\/strong> parameter sets the bar transparency. The parameter takes any float value between <strong>0.0<\/strong> (transparent) and <strong>1.0<\/strong> (opaque). The final parameter, <strong>width<\/strong>, sets the bar width in inches.<\/p>\n<p>When creating a chart, you have multiple options for formatting its components. As you\u2019ve just seen with <strong>plot.barh<\/strong>, you can provide parameter values when calling the chart function. Each function supports a number of parameters, so be sure to refer to the <strong>matplotlib<\/strong> documentation as necessary.<\/p>\n<p>You can also use other functions available in the <strong>matplotlib<\/strong> module for formatting graphic components. For example, the next statement in the Python script calls the <strong>set_title<\/strong> function on the <strong>pt<\/strong> variable to create a chart title:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">pt.set_title(label=\"Sales by Territory\", y=1.04, \r\n  family=\"fantasy\", fontsize=14, weight=500, color=\"navy\")\r\n<\/pre>\n<p>The <strong>label<\/strong> parameter provides the text for the title. The <strong>y<\/strong> parameter refers to the title\u2019s Y coordinate, relative to the chart itself. A value of <strong>1<\/strong> means the label will sit directly on top of the chart. For this example, I\u2019ve bumped it up a bit (<strong>1.04<\/strong>) so the title sits a little higher. You can play with this and pick the setting that you think is best.<\/p>\n<p>The next parameter in the <strong>set_title<\/strong> function is <strong>family<\/strong>, which specifies the font family to use for the title. I picked <strong>fantasy<\/strong> just for fun, but you can specify any supported font you want, such as <strong>Times<\/strong> <strong>New<\/strong> <strong>Roman<\/strong>. The <strong>fontsize<\/strong> parameter provides the font\u2019s point size, and the <strong>weight<\/strong> parameter determines how light or bold the font is. The parameter takes a numeric value between <strong>0<\/strong> (lightest) to <strong>1000<\/strong> (boldest). It also supports such keywords as <strong>light<\/strong>, <strong>normal<\/strong>, <strong>semibold<\/strong>, or <strong>bold<\/strong>. The <strong>color<\/strong> parameter then sets the font\u2019s color to navy blue.<\/p>\n<p>If you refer back to the previous figure, you\u2019ll see that the chart includes a legend. Because there\u2019s only one category of data, <strong>Sales<\/strong>, the legend is not necessary. You can remove the legend by including the following statement in your script:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">  pt.legend().set_visible(False)\r\n<\/pre>\n<p>As you can see, you need only call the <strong>pt<\/strong> variable, followed by the <strong>legend<\/strong> function, and then by the <strong>set-visible<\/strong> function, passing in a value of <strong>False<\/strong>.<\/p>\n<p>The next statement in the Python script uses the <strong>grid<\/strong> function to add a light-gray grid to the chart to make it easier to track totals:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">  pt.grid(color=\"slategray\", alpha=.5, linestyle=\"dotted\", linewidth=.5)\r\n<\/pre>\n<p>The function\u2019s parameters and values should be self-evident. The function adds a grid that is slate-gray with 50% transparency. In addition, the grid lines are dotted and have a width of only a half point. The Python script generates the chart shown in the following figure.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-76522\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/01\/20screenshots3-stpython3_fig02-png.png\" alt=\"%20screenshots3\/stPython3_fig02.png\" width=\"1004\" height=\"737\" \/><\/p>\n<p>The chart now includes a title and grid, but no legend. In addition, the graph bars are narrower than before and a darker blue color. For your charts, you can use whatever fonts and colors you want. The important point, of course, is to make the data itself as clear and easy to understand as possible.<\/p>\n<h2>Formatting the Y-axis labels<\/h2>\n<p>In the previous chart, the Y-axis labels are listed in alphabetical order from the bottom up, but you can easily reverse the order. You can also format the axis and tick labels, as shown in the following example:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">DECLARE @pscript NVARCHAR(MAX);\r\nSET @pscript = N'\r\n# import matplotlib modules\r\nimport matplotlib\r\nmatplotlib.use(\"PDF\")\r\nimport matplotlib.pyplot as plt\r\n\r\n# define df data frame\r\ndf = InputDataSet.groupby(\"Territories\").sum()\r\n\r\n# create bar chart object\r\npt = df.plot.barh(color=\"blue\", alpha=.8, width=.4)\r\n\r\n# configure title, legend, and grid\r\npt.set_title(label=\"Sales by Territory\", y=1.04, \r\n  family=\"fantasy\", fontsize=14, weight=500, color=\"navy\")\r\npt.legend().set_visible(False)\r\npt.grid(color=\"slategray\", alpha=.5, linestyle=\"dotted\", linewidth=.5)\r\n\r\n# format Y axis\r\npt.invert_yaxis()\r\npt.set_ylabel(\"Regional Territories\", fontsize=12, color=\"navy\")\r\npt.set_yticklabels(labels=df.index, fontsize=9, color=\"navy\")\r\n\r\n# save bar chart to PDF file\r\nplt.savefig(\"c:\\\\datafiles\\\\TerritorySales03.pdf\", \r\n  bbox_inches=\"tight\", pad_inches=.5)';\r\n\r\nDECLARE @sqlscript NVARCHAR(MAX);\r\nSET @sqlscript = N'\r\n  SELECT t.Name AS Territories, CAST(h.Subtotal AS FLOAT) AS Sales\r\n  FROM Sales.SalesOrderHeader h INNER JOIN Sales.SalesTerritory t\r\n    ON h.TerritoryID = t.TerritoryID\r\n  WHERE YEAR(h.OrderDate) = 2013;';\r\n\r\nEXEC sp_execute_external_script\r\n  @language = N'Python',\r\n  @script = @pscript,\r\n  @input_data_1 = @sqlscript;     \r\nGO\r\n<\/pre>\n<p>The Python script includes a new section of code that contains several statements specific to the Y-axis. The first of these statements calls the <strong>invert_yaxis<\/strong> function to reverse the order of the <strong>Territories<\/strong> values:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">  pt.invert_yaxis()\r\n<\/pre>\n<p>You don\u2019t need to pass in any parameters when calling the function. You need only specify the <strong>pt<\/strong> variable to ensure that the changes are applied to that specific chart.<\/p>\n<p>The next statement in the Python script uses the <strong>set_ylabel<\/strong> function to rename the Y-axis label and specify a font size and color:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">  pt.set_ylabel(\"Regional Territories\", fontsize=12, color=\"navy\")\r\n<\/pre>\n<p>You can also adjust the tick labels by using the <strong>set_yticklabels<\/strong> function:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">  pt.set_yticklabels(labels=df.index, fontsize=9, color=\"navy\")\r\n<\/pre>\n<p>Notice that the <strong>labels<\/strong> parameter takes a value of <strong>df.index<\/strong>. You must use the data frame\u2019s index to get the <strong>Territories<\/strong> names because the <strong>groupby<\/strong> function does not set the <strong>as_index<\/strong> parameter to <strong>False<\/strong>. As a result, the unique <strong>Territories<\/strong> values are used as the index, so you must reference that index when you specify values for the tick labels. The following figure shows the changes to the chart\u2019s Y-axis.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-76523\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/01\/20screenshots3-stpython3_fig03-png.png\" alt=\"%20screenshots3\/stPython3_fig03.png\" width=\"1004\" height=\"749\" \/><\/p>\n<p>Now the territories are listed in alphabetical order from the top down and use a smaller font and different color. The axis label has also been renamed and reformatted with a different font size and color.<\/p>\n<h2>Formatting the X-axis labels<\/h2>\n<p>The final step for this chart is to format the labels associated with the X-axis. For that, you can add a section that is specific to that axis:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">DECLARE @pscript NVARCHAR(MAX);\r\nSET @pscript = N'\r\n# import matplotlib modules\r\nimport matplotlib\r\nmatplotlib.use(\"PDF\")\r\nimport matplotlib.pyplot as plt\r\n\r\n# define df data frame\r\ndf = InputDataSet.groupby(\"Territories\").sum()\r\n\r\n# create bar chart object\r\npt = df.plot.barh(color=\"blue\", alpha=.8, width=.4)\r\n\r\n# configure title, legend, and grid\r\npt.set_title(label=\"Sales by Territory\", y=1.04, \r\n  family=\"fantasy\", fontsize=14, weight=500, color=\"navy\")\r\npt.legend().set_visible(False)\r\npt.grid(color=\"slategray\", alpha=.5, linestyle=\"dotted\", linewidth=.5)\r\n\r\n# format Y axis\r\npt.invert_yaxis()\r\npt.set_ylabel(\"Regional Territories\", fontsize=12, color=\"navy\")\r\npt.set_yticklabels(labels=df.index, fontsize=9, color=\"navy\")\r\n\r\n# format X axis\r\npt.set_xlabel(\"Total Sales\", labelpad=10, fontsize=12, color=\"navy\")\r\npt.set_xticklabels(labels=df[\"Sales\"], fontsize=9, color=\"navy\")\r\npt.get_xaxis().set_major_formatter(matplotlib.ticker.FuncFormatter\r\n  (lambda x, p: format(int(x), \",\")))\r\n\r\n# save bar chart to PDF file\r\nplt.savefig(\"c:\\\\datafiles\\\\TerritorySales04.pdf\", \r\n  bbox_inches=\"tight\", pad_inches=.5)';\r\n\r\nDECLARE @sqlscript NVARCHAR(MAX);\r\nSET @sqlscript = N'\r\n  SELECT t.Name AS Territories, CAST(h.Subtotal AS FLOAT) AS Sales\r\n  FROM Sales.SalesOrderHeader h INNER JOIN Sales.SalesTerritory t\r\n    ON h.TerritoryID = t.TerritoryID\r\n  WHERE YEAR(h.OrderDate) = 2013;';\r\n\r\nEXEC sp_execute_external_script\r\n  @language = N'Python',\r\n  @script = @pscript,\r\n  @input_data_1 = @sqlscript;     \r\nGO\r\n<\/pre>\n<p>The first statement in the X-axis section uses the <strong>set_xlabel<\/strong> function to format the axis label and add more space between it and the bottom of the chart:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">  pt.set_xlabel(\"Total Sales\", labelpad=10, fontsize=12, color=\"navy\")\r\n<\/pre>\n<p>Most of the function\u2019s parameters should look familiar to you. The function sets the label text to <em>Total Sales<\/em>, the font size to 12 points, and the font color to navy blue. What\u2019s new here is the <strong>labelpad<\/strong> parameter, which specifies the number of points to put between the label and the X-axis. You can adjust the number of points as necessary, choosing whatever is best for your circumstances.<\/p>\n<p>After formatting the axis label, you can use the <strong>set_xticklabel<\/strong> function to format the tick labels:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">  pt.set_xticklabels(labels=df[\"Sales\"], fontsize=9, color=\"navy\")\r\n<\/pre>\n<p>When you formatted the Y-axis labels, you used the data frame\u2019s index for the tick values. For the X-axis tick labels, you use the <strong>Sales<\/strong> column, whose values provide the bases for the tick labels. The actual labels depend on the type of data and how the ticks are spaced. In this case, the ticks will be spaced in increments of 2 million.<\/p>\n<p>The final X-axis statement calls the <strong>get_xaxis<\/strong> function, followed by the <strong>set_major_formatter<\/strong> function:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">pt.get_xaxis().set_major_formatter(matplotlib.ticker.FuncFormatter\r\n  (lambda x, p: format(int(x), \",\")))\r\n<\/pre>\n<p>The goal here is to add comma separators to tick labels. The <strong>get_xaxis<\/strong> function returns an <strong>XAxis<\/strong> instance, which is used to call the <strong>set_major_formatter<\/strong> function. The <strong>set_major_formatter<\/strong> function is used to format the major X-axis labels (as opposed to the minor labels). In this case, the function takes the <strong>matplotlib.ticker.FuncFormatter<\/strong> function as an argument, making it possible to run a lambda function against the tick values.<\/p>\n<p>As you\u2019ll recall from the second article in this series, a lambda function is simply an anonymous, unnamed function. In this case, the lambda function runs the <strong>format<\/strong> function against each tick value, adding comma separators as necessary. The following figure shows the chart generated by the updated Python script.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-76524\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/01\/20screenshots3-stpython3_fig04-png.png\" alt=\"%20screenshots3\/stPython3_fig04.png\" width=\"1005\" height=\"765\" \/><\/p>\n<p>The X-axis now includes the <em>Total Sales<\/em> label, just below the bottom of the chart. In addition, the X-axis tick labels are incremented at every 2 million and formatted to be consistent with the Y-axis labels. Notice that the tick values no longer appear as scientific notation. The <strong>set_xticklabels<\/strong> function takes care of that.<\/p>\n<h2>Generating a line chart<\/h2>\n<p>Now that you\u2019ve gotten a taste of how to use the <strong>matplotlib<\/strong> language elements to create a chart, let\u2019s mix things up a bit. This time around, we\u2019ll create a basic line chart, using the same SQL Server data as in the previous examples:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">DECLARE @pscript NVARCHAR(MAX);\r\nSET @pscript = N'\r\n# import matplotlib modules\r\nimport matplotlib\r\nmatplotlib.use(\"PDF\")\r\nimport matplotlib.pyplot as plt\r\n\r\n# define df data frame\r\ndf = InputDataSet.groupby(\"Territories\").sum()\r\n\r\n# create line chart object\r\npt = df.plot(color=\"blue\", alpha=.8, linewidth=2)\r\n\r\n# configure title, legend, and grid\r\npt.set_title(label=\"Sales by Territory\", y=1.04, \r\n  family=\"fantasy\", fontsize=14, weight=500, color=\"navy\")\r\npt.legend().set_visible(False)\r\npt.grid(color=\"slategray\", alpha=.5, linestyle=\"dotted\", linewidth=.5)\r\n\r\n# format Y axis\r\npt.set_ylabel(\"Total Sales\", labelpad=20, fontsize=12, color=\"navy\")\r\npt.set_yticklabels(labels=df[\"Sales\"], fontsize=9, color=\"navy\")\r\npt.get_yaxis().set_major_formatter(matplotlib.ticker.FuncFormatter\r\n  (lambda x, p: format(int(x), \",\")))\r\n\r\n# format X axis\r\npt.set_xlabel(\"Regional Territories\", labelpad=20, fontsize=12, color=\"navy\")\r\npt.set_xticklabels(labels=df.index, fontsize=9, color=\"navy\", \r\n  rotation=45, horizontalalignment=\"right\")\r\n\r\n# save line chart to PDF file\r\nplt.savefig(\"c:\\\\datafiles\\\\TerritorySales05.pdf\", \r\n  bbox_inches=\"tight\", pad_inches=.8)';\r\n\r\nDECLARE @sqlscript NVARCHAR(MAX);\r\nSET @sqlscript = N'\r\n  SELECT t.Name AS Territories, CAST(h.Subtotal AS FLOAT) AS Sales\r\n  FROM Sales.SalesOrderHeader h INNER JOIN Sales.SalesTerritory t\r\n    ON h.TerritoryID = t.TerritoryID\r\n  WHERE YEAR(h.OrderDate) = 2013;';\r\n\r\nEXEC sp_execute_external_script\r\n  @language = N'Python',\r\n  @script = @pscript,\r\n  @input_data_1 = @sqlscript;     \r\nGO\r\n<\/pre>\n<p>The first statement worth noting in the Python script is the plot definition:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">  pt = df.plot(color=\"blue\", alpha=.8, linewidth=2)\r\n<\/pre>\n<p>Because you\u2019re creating a line chart, you need to specify only the <strong>plot<\/strong> function, without the additional attribute (e.g., <strong>plot.barh<\/strong>). For this example, the function takes three parameters, which specify the color of the plot lines, their transparency, and their width.<\/p>\n<p>The next part of the Python script defines the title and grid and removes the legend, just like the previous examples, so we don\u2019t need to go into that again. This is followed by the statements that format the Y-axis labels:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">pt.set_ylabel(\"Total Sales\", labelpad=20, fontsize=12, color=\"navy\")\r\npt.set_yticklabels(labels=df[\"Sales\"], fontsize=9, color=\"navy\")\r\npt.get_yaxis().set_major_formatter(matplotlib.ticker.FuncFormatter\r\n  (lambda x, p: format(int(x), \",\")))\r\n<\/pre>\n<p>You\u2019ve seen all these settings before, but rearranged to format the Y-axis labels and add comma separators to the sales totals.<\/p>\n<p>Most of the X-axis settings are also similar to what you saw in the earlier examples:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">pt.set_xlabel(\"Regional Territories\", labelpad=20, fontsize=12, color=\"navy\")\r\npt.set_xticklabels(labels=df.index, fontsize=9, color=\"navy\", \r\n  rotation=45, horizontalalignment=\"right\")\r\n<\/pre>\n<p>The only new elements are the last two parameters passed into the <strong>set_xticklabels<\/strong> function. The <strong>rotation<\/strong> parameter specifies that the tick labels should be rotated 45 degrees, and the <strong>horizontalalignment<\/strong> parameter ensures that the labels line up with the tick marks correctly. The Python script now generates the chart shown in the following figure.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-76525\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/01\/20screenshots3-stpython3_fig05-png.png\" alt=\"%20screenshots3\/stPython3_fig05.png\" width=\"1006\" height=\"864\" \/><\/p>\n<p>Here we have the same data as before, but presented in a much different way, with the labels and lines formatted specific to our needs. When choosing a chart type and determining how to format it, you should base your decisions on how to present the data in the most effective way possible.<\/p>\n<h2>Adding multiple subplots to a line chart<\/h2>\n<p>In the previous examples, the data was based on two types of data: territories and sales. There might be times, however, when you want to visualize more complex relationships, such as being able to view the amount of sales per territory per month. One way you can do this is to add multiple subplots (layers) to a chart in order to compare different groups of data. For example, the following Python script creates a series of line subplots that are merged into one chart:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">DECLARE @pscript NVARCHAR(MAX);\r\nSET @pscript = N'\r\n# import matplotlib, calendar, numpy modules\r\nimport matplotlib\r\nmatplotlib.use(\"PDF\")\r\nimport matplotlib.pyplot as plt\r\nimport calendar as cl\r\nimport numpy as np\r\n\r\n# define the df data frame\r\ndf = InputDataSet.groupby([\"Territories\", \"OrderMonths\"], as_index=False).sum()\r\ndf.columns = [\"Territories\", \"OrderMonths\", \"TotalSales\"]\r\ndf[\"OrderMonths\"] = df[\"OrderMonths\"].apply(lambda x: cl.month_abbr[x])\r\n\r\n# create territory-based line chart\r\nax1 = plt.subplot()\r\nts = list(df[\"Territories\"].unique())\r\npt = None\r\nif len(ts) &gt; 0:\r\n\tfor t in ts:\r\n\t\tpt = df.loc[df[\"Territories\"] == t].plot(x=\"OrderMonths\", y=\"TotalSales\", \r\n\t\t  ax=ax1, label=t, alpha=.8, linewidth=2)\r\nelse:\r\n\tprint(\"The list of territories is empty.\")\r\n\tquit()\r\n\r\n# configure title, legend, and grid\r\npt.set_title(label=\"Sales by Territory\", y=1.04, \r\n  family=\"fantasy\", fontsize=14, weight=500, color=\"navy\")\r\npt.legend(loc=7, bbox_to_anchor=(1.4, .5))\r\npt.grid(color=\"slategray\", alpha=.5, linestyle=\"dotted\", linewidth=.5)\r\n\r\n# format Y axis\r\npt.set_ylabel(\"Total Sales\", labelpad=20, fontsize=12, color=\"navy\")\r\npt.set_yticklabels(labels=df[\"TotalSales\"], fontsize=9, color=\"navy\")\r\npt.get_yaxis().set_major_formatter(matplotlib.ticker.FuncFormatter\r\n  (lambda x, p: format(int(x), \",\")))\r\n\r\n# format X axis\r\npt.set_xlabel(\"Sales per Month\", labelpad=20, fontsize=12, color=\"navy\")\r\npt.set_xticks(np.arange(12))\r\npt.set_xticklabels(labels=df[\"OrderMonths\"], fontsize=9, color=\"navy\")\r\n\r\n# save line chart to PDF file\r\nplt.savefig(\"c:\\\\datafiles\\\\TerritorySales06.pdf\", \r\n  bbox_inches=\"tight\", pad_inches=.8)';\r\n\r\nDECLARE @sqlscript NVARCHAR(MAX);\r\nSET @sqlscript = N'\r\n  SELECT t.Name AS Territories, MONTH(h.OrderDate) AS OrderMonths,\r\n    CAST(h.Subtotal AS FLOAT) AS Sales\r\n  FROM Sales.SalesOrderHeader h INNER JOIN Sales.SalesTerritory t\r\n    ON h.TerritoryID = t.TerritoryID\r\n  WHERE YEAR(h.OrderDate) = 2013;';\r\n\r\nEXEC sp_execute_external_script\r\n  @language = N'Python',\r\n  @script = @pscript,\r\n  @input_data_1 = @sqlscript;     \r\nGO\r\n<\/pre>\n<p>This first item worth noting is that the T-SQL script now returns the numeric month values for the dates in the <strong>OrderDate<\/strong> column. As for the Python script, it starts just like the previous examples, by importing the <strong>matplotlib<\/strong> modules, but then adds two more modules: <strong>calendar<\/strong> and <strong>numpy<\/strong>:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">import calendar as cl\r\nimport numpy as np\r\n<\/pre>\n<p>The <strong>calendar<\/strong> module provides tools for working with date values, and the <strong>numpy<\/strong> module provides tools for scientific computing. The Python script includes two statements that leverage these modules. The first is when defining the <strong>df<\/strong> data frame:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">df = InputDataSet.groupby([\"Territories\", \"OrderMonths\"], as_index=False).sum()\r\ndf.columns = [\"Territories\", \"OrderMonths\", \"TotalSales\"]\r\ndf[\"OrderMonths\"] = df[\"OrderMonths\"].apply(lambda x: cl.month_abbr[x])\r\n<\/pre>\n<p>Unlike the previous examples, the initial statement uses the <strong>groupby<\/strong> function to group the data based on both the <strong>Territories<\/strong> and <strong>OrderMonths<\/strong> columns. In addition, the function call also includes the <strong>as_index<\/strong> parameter and sets its value to <strong>False<\/strong>. As a result, <strong>Territories<\/strong> and <strong>OrderMonths<\/strong> will be added to the data frame as columns, rather than being part of the index.<\/p>\n<p>The second statement specifies a name for each column in the data frame, and the third statement changes the <strong>OrderMonths<\/strong> integer values to their month abbreviations, using the <strong>month_abbr<\/strong> function from the <strong>calendar<\/strong> module. (For more information about the elements within these three statements, refer back to the second article in this series.)<\/p>\n<p>With the data frame in place, you\u2019re ready to define your chart. The goal here is to create a chart that includes a plot line for each territory, which requires that you define a subplot for each territory. To do so, start by using the <strong>subplot<\/strong> function available to the <strong>pyplot<\/strong> framework to create an <strong>AxesSubplot<\/strong> object:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">  ax1 = plt.subplot()\r\n<\/pre>\n<p>You will use the <strong>ax1<\/strong> object in a <strong>for<\/strong> loop to create the subplots. But first, you must create a list of unique values from the <strong>Territories<\/strong> column and assign the list to the <strong>ts<\/strong> variable:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">  ts = list(df[\"Territories\"].unique())\r\n<\/pre>\n<p>The next step is to define the <strong>pt<\/strong> variable with a value of <strong>None<\/strong> so we can pass the variable into and out of the <strong>for<\/strong> loop:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">  pt = None\r\n<\/pre>\n<p>With these three variables defined, you can create an <strong>if\u2026else<\/strong> statement block that includes a <strong>for<\/strong> loop for iterating through the list of territories:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">if len(ts) &gt; 0:\r\n\tfor t in ts:\r\n\t\tpt = df.loc[df[\"Territories\"] == t].plot(x=\"OrderMonths\", y=\"TotalSales\", \r\n\t\t  ax=ax1, label=t, alpha=.8, linewidth=2)\r\nelse:\r\n\tprint(\"The list of territories is empty.\")\r\n\tquit()<\/pre>\n<p>The <strong>if\u2026else<\/strong> statement block verifies whether the <strong>ts<\/strong> variable contains values. If it does, the <strong>for<\/strong> loop iterates through each territory (<strong>t in ts<\/strong>). For each iteration, the statement filters the data frame so it includes only the sales data for that territory (<strong>df.loc[df[&#8220;Territories&#8221;] == t]<\/strong>).<\/p>\n<p>The <strong>for<\/strong> loop statement then calls the <strong>plot<\/strong> function to build the territory\u2019s subplot. The first two parameters passed into the <strong>plot<\/strong> function specify that the <strong>OrderMonths<\/strong> values be used for the X-axis and the <strong>TotalSales<\/strong> values be used for the Y-axis.<\/p>\n<p>The next parameter, <strong>ax<\/strong>, is set to the <strong>ax1<\/strong> variable, which tells the Python engine to use the <strong>ax1<\/strong> object to create the subplot. The <strong>label<\/strong> property then assigns the current territory name (<strong>t<\/strong>) to the subplot, which will be used for the legend entry associated with this territory. The <strong>alpha<\/strong> and <strong>linewidth<\/strong> properties set the transparency and line width, as you\u2019ve seen before.<\/p>\n<p>If the <strong>if\u2026else<\/strong> condition determines that the <strong>ts<\/strong> variable does not contain values, the statements in the <strong>else<\/strong> section run, in which case a message is returned to the console and the script quits executing.<\/p>\n<p>Note that Python uses indentation to control the logic flow within statement blocks, rather than brackets or other elements, so be sure not to add tabs arbitrarily to the beginning of your statements.<\/p>\n<p>The next section in the Python script formats the title, legend, and grid:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">pt.set_title(label=\"Sales by Territory\", y=1.04, \r\n  family=\"fantasy\", fontsize=14, weight=500, color=\"navy\")\r\npt.legend(loc=7, bbox_to_anchor=(1.4, .5))\r\npt.grid(color=\"slategray\", alpha=.5, linestyle=\"dotted\", linewidth=.5)\r\n<\/pre>\n<p>The main difference from previous examples is that this time you don\u2019t remove the legend. Instead, you use the <strong>legend<\/strong> function to move the legend off to the right side. The function takes two parameters. The <strong>loc<\/strong> parameter provides the legend\u2019s general position. The supported values range from <strong>0<\/strong> to <strong>10<\/strong>, with <strong>7<\/strong> locating the legend to the right side of the chart, in the center. To move the legend off the chart completely, you must also include the <strong>bbox_to_anchor<\/strong> parameter, which defines the location of the legend\u2019s bounding box, relative to the <strong>loc<\/strong> setting. You\u2019ll likely have to play with these settings from one chart to the next to get the position just right.<\/p>\n<p>The rest of the Python script is similar to the previous examples, except for one statement related to the X-axis:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">  pt.set_xticks(np.arange(12))\r\n<\/pre>\n<p>The statement calls the <strong>set_xticks<\/strong> function, which defines where the ticks appear along the X-axis. In this case, the function takes as its argument the <strong>arange<\/strong> function available through the <strong>numpy<\/strong> module. The <strong>arange<\/strong> function returns evenly spaced values within a given interval. Because the X-axis is based on sales months, a value of <strong>12<\/strong> is passed into the function to indicate that a tick should be included for each month. Otherwise, the axis displays only six ticks, one for every other month.<\/p>\n<p>The Python script generates the chart shown in the following figure.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-76527\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/01\/20screenshots3-stpython3_fig06-png.png\" alt=\"%20screenshots3\/stPython3_fig06.png\" width=\"1004\" height=\"665\" \/><\/p>\n<p>The chart now includes a plot line for each territory. The legend, which is to the right of the chart, maps back to the plot lines, based on the color automatically assigned to each territory.<\/p>\n<h2>Visualizing SQL Server data<\/h2>\n<p>The <strong>matplotlib<\/strong> module supports many more types of visualizations than what we covered here, and for each visualization, you have a high degree of control over how those visualizations are formatted. Of course, Microsoft also offers such tools as Power BI and SQL Server Reporting Services for creating visualizations, but MLS and Python make it simple to generate graphics as part of your analytics, without needing to add another layer of complexity to your projects. Once you understand the basics of how to plot data in MLS and Python, you have a powerful set of tools for both analyzing and visualizing SQL Server data.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of the advantages of running Python from SQL Server is the ability to create graphics to assist in analysis of data. Robert Sheldon demonstrates matplotlib, a 2D plotting library, widely used with Python to create quality charts.&hellip;<\/p>\n","protected":false},"author":221841,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143528,146042],"tags":[5134],"coauthors":[6779],"class_list":["post-76519","post","type-post","status-publish","format-standard","hentry","category-bi-sql-server","category-python","tag-sql-prompt"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/76519","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/221841"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=76519"}],"version-history":[{"count":10,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/76519\/revisions"}],"predecessor-version":[{"id":76540,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/76519\/revisions\/76540"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=76519"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=76519"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=76519"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=76519"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}