{"id":77149,"date":"2018-02-07T14:20:49","date_gmt":"2018-02-07T14:20:49","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=77149"},"modified":"2022-04-27T21:22:37","modified_gmt":"2022-04-27T21:22:37","slug":"sql-server-machine-learning-services-part-5-generating-multiple-plots-python","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/bi-sql-server\/sql-server-machine-learning-services-part-5-generating-multiple-plots-python\/","title":{"rendered":"SQL Server Machine Learning Services \u2013 Part 5: Generating Multiple Plots in Python"},"content":{"rendered":"<h4>The series so far:<\/h4>\n<ol>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/machine-learning-services-part-1-python\/\">SQL Server Machine Learning Services \u2013 Part 1: Python Basics<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-machine-learning-services-part-2-python-data-frames\/\">SQL Server Machine Learning Services \u2013 Part 2: Python Data Frames<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-machine-learning-services-part-3-plotting-data-python\/\">SQL Server Machine Learning Services \u2013 Part 3: Plotting Data with Python<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-machine-learning-services-part-4-finding-data-anomalies-python\/\">SQL Server Machine Learning Services \u2013 Part 4: Finding Data Anomalies with Python<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-machine-learning-services-part-5-generating-multiple-plots-python\/\">SQL Server Machine Learning Services \u2013 Part 5: Generating multiple plots in Python<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-machine-learning-services-part-6-merging-data-frames-python\/\">SQL Server Machine Learning Services \u2013 Part 6: Merging Data Frames in Python<\/a><\/li>\n<\/ol>\n\n<p>SQL Server Machine Learning Services (MLS) offers a wide range of options for working with the Python language within the context of a SQL Server database. This series has covered a number of those options, with the goal of familiarizing you with some of Python\u2019s many capabilities. One of the most important of these capabilities, along with analyzing data, is being able to visualize data. You were introduced to many of these concepts in the third article, which focused on how to generate individual charts. This article takes that discussion a step further by demonstrating how to generate multiple charts that together provide a complete picture of the underlying data.<\/p>\n<p>The examples in this article are based on data from the Titanic dataset, available as a .csv file from the site <a href=\"https:\/\/vincentarelbundock.github.io\/Rdatasets\/datasets.html\">https:\/\/vincentarelbundock.github.io\/Rdatasets\/datasets.html<\/a>. There you can find an assortment of sample datasets, available in both .csv and .doc formats. This is a great resource to keep in mind as you\u2019re trying out various MLS features, whether using the R language or Python.<\/p>\n<p>The Titanic dataset includes a passenger list from the Titanic\u2019s infamous voyage. For each passenger, the data includes the individual\u2019s name, age, gender, service class and whether the passenger survived. The following figure shows a sample of the data.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1014\" height=\"682\" class=\"wp-image-77150\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/02\/20screenshots5-stpython5_fig01-png.png\" alt=\"%20screenshots5\/stPython5_fig01.png\" \/><\/p>\n<p>A value of <strong>1<\/strong> in the <strong>Survived<\/strong> column indicates that the passenger survived the voyage. A value of <strong>0<\/strong> indicates that the passenger did not. The <strong>SexCode<\/strong> column is merely a binary duplicate of the <strong>Sex<\/strong> column. A value of <strong>0<\/strong> indicates that the passenger was male, and a value of <strong>1<\/strong> indicates that the passenger was female.<\/p>\n<p>I created the examples for this article in SQL Server Management Studio (SSMS). Each example retrieves data from a locally saved file named <strong>titanic.csv<\/strong>, which contains the Titanic dataset. In most cases, if you\u2019re running a Python script within the context of the SQL Server databases, you\u2019ll likely want to use data from that database, either instead of or in addition to a .csv data, but the principles covered in this article apply regardless of where the data originates, and using a .csv file helps to keep things simple, while demonstrating several important concepts, including how to retrieve data from that file.<\/p>\n<h2>Retrieving data from a .csv file<\/h2>\n<p>Python, with the help of the <strong>pandas<\/strong> module, makes it easy to retrieve data from a .csv file and save the data to a <strong>DataFrame<\/strong> object. The <strong>pandas<\/strong> module contains a variety of tools for creating and working with data frames, including the <strong>read_csv<\/strong> function, which does exactly what the name suggests, reads data from a .csv file.<\/p>\n<p>The Python script in following example uses the function to first retrieve data from the <strong>titanic.csv<\/strong> file and then to update the resulting dataset in preparation for grouping the data for the visualizations:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">DECLARE @pscript NVARCHAR(MAX);\r\nSET @pscript = N'\r\n# import pandas module\r\nimport pandas as pd\r\n# define the titanic data frame\r\ntitanic = pd.read_csv(\"c:\\\\datafiles\\\\titanic.csv\", index_col=0)\r\ntitanic = titanic[titanic[\"Age\"].notnull()]\r\ntitanic[\"Survived\"] = titanic[\"Survived\"].map({0:\"deceased\", 1:\"survived\"})\r\nfor i, row in titanic.iterrows():\r\n\tif (row[\"Age\"].is_integer() == False):\r\n\t\ttitanic.drop(i, inplace=True)\r\nprint(titanic)';\r\nEXEC sp_execute_external_script\r\n\u00a0 @language = N'Python',\r\n\u00a0 @script = @pscript;\u00a0 \u00a0 \u00a0\r\nGO<\/pre>\n<p>The <strong>sp_execute_external_script<\/strong> stored procedure, which is used to run Python and R scripts, should be well familiar to you by now. If you have any questions about the procedure works, refer to the first article in this series. In this article, we focus exclusively on the Python script.<\/p>\n<p>The first statement in the script imports the <strong>pandas<\/strong> module and assigns the <strong>pd<\/strong> alias:<\/p>\n<pre class=\"lang:python theme:vs2012 decode:true\"> import pandas as pd<\/pre>\n<p>You can then use the <strong>pd<\/strong> alias to call the <strong>read_csv<\/strong> function, passing in the full path to the <strong>titanic.csv<\/strong> file as the first parameter value:<\/p>\n<pre class=\"lang:python theme:vs2012 decode:true\"> titanic = pd.read_csv(\"c:\\\\datafiles\\\\titanic.csv\", index_col=0)<\/pre>\n<p>On my system, the <strong>titanic.csv<\/strong> file is saved to the <strong>c:\\datafiles\\<\/strong> folder. Be sure to update the path as necessary if you plan to try out this example or those that follow.<\/p>\n<p>The second function parameter is <strong>index_col<\/strong>, which tells the function to use the specified column as the index for the outputted data frame. The value <strong>0<\/strong> refers to the first column, which is unnamed in the Titanic dataset. Without this parameter, the data frame\u2019s index would be auto-generated.<\/p>\n<p>When the <strong>read_csv<\/strong> function runs, it retrieves the data from the file and saves it to a <strong>DataFrame<\/strong> object, which, in this case, is assigned to the <strong>titanic<\/strong> variable. You can then use this variable to modify the data frame or access its data. For example, the original dataset includes a number of passengers whose ages were unknown, listed as <strong>NA<\/strong> in the .csv file. You might choose to remove those rows from the dataset, which you can do by using the <strong>notnull<\/strong> function:<\/p>\n<pre class=\"lang:python theme:vs2012 decode:true\"> titanic = titanic[titanic[\"Age\"].notnull()]<\/pre>\n<p>To use the <strong>notnull<\/strong> function, you merely append it to the column name. Notice, however, that you must first call the dataset and then, within brackets, reference the dataset again, along with the column name in its own set of brackets.<\/p>\n<p>Perhaps the bigger issue here is whether it\u2019s even necessary remove these rows. It turns out that over 500 rows have unknown ages, a significant portion of the dataset. (The original dataset includes over 1300 rows.) The examples later in article will use the <strong>groupby<\/strong> function to group the data and find the average ages in each group, along with the number of passengers. When aggregating grouped data, the <strong>groupby<\/strong> function automatically excludes null values, so in this case, the aggregated results will be the same whether or not you include the preceding statement.<\/p>\n<p>I\u2019ve included the statement here primarily to demonstrate how to remove rows that contain null values. (The subsequent examples also include this statement so they\u2019re consistent as you build on them from one example to the next.) One advantage to removing the rows with null values is that the resulting dataset is smaller, which can be beneficial when working with large datasets. Keep in mind, however, that modifying datasets can sometimes impact the analytics or visualizations based on those datasets. Always be wary of how you manipulate data.<\/p>\n<p>The next statement in the example replaces the numeric values in the <strong>Survived<\/strong> column with the values <strong>deceased<\/strong> and <strong>survived<\/strong>:<\/p>\n<pre class=\"lang:python theme:vs2012 decode:true\"> titanic[\"Survived\"] = titanic[\"Survived\"].map({0:\"deceased\", 1:\"survived\"})<\/pre>\n<p>To change values in this way, use the <strong>map<\/strong> function to specify the old and new values. You must enclose the assignments in curly braces, separating each set with a comma. For the assignments themselves, you separate the old value from the new value with a colon.<\/p>\n<p>The next part of the Python script includes a <strong>for<\/strong> statement block that removes rows containing <strong>Age<\/strong> values that are not whole numbers:<\/p>\n<pre class=\"lang:python theme:vs2012 decode:true\">for i, row in titanic.iterrows():\r\nif (row[\"Age\"].is_integer() == False):\r\n\ttitanic.drop(i, inplace=True) <\/pre>\n<p>The Titanic dataset uses fractions for infant ages, and you might decide not to include these rows in your data. As with the <strong>notnull<\/strong> function, I\u2019ve included this logic primarily to demonstrate a useful concept, in this case, how you can loop through the rows in a dataset to take an action on each row.<\/p>\n<p>The initial <strong>for<\/strong> statement specifies the <strong>i<\/strong> and <strong>row<\/strong> variables, which represent the current index value and dataset row in each of the loop\u2019s iterations. The statement also calls the <strong>iterrows<\/strong> function on the <strong>titanic<\/strong> data frame for iterating through the rows.<\/p>\n<p>For each iteration, the <strong>for<\/strong> statement block runs an <strong>if<\/strong> statement, which evaluates whether the <strong>Age<\/strong> value for the current row is an integer, using the <strong>is_integer<\/strong> function. If the function evaluates to <strong>False<\/strong>, the value is not an integer, in which case the <strong>drop<\/strong> function is used to remove that row identified by the current <strong>i<\/strong> value. When using the <strong>drop<\/strong> function in this way, be sure to include the <strong>inplace<\/strong> parameter, set to <strong>True<\/strong>, to ensure that the data frame is updated in place, rather than copied.<\/p>\n<p>Once again, you should determine whether you do, in fact, want to remove these rows. In this case, the aggregated passenger totals will be impacted by their removal because there will be eight fewer rows, so be sure to tread carefully when making these sorts of changes.<\/p>\n<p>The final statement in the Python script uses the <strong>print<\/strong> function to return the <strong>titanic<\/strong> data to the SSMS console:<\/p>\n<pre class=\"lang:python theme:vs2012 decode:true\">print(titanic)<\/pre>\n<p>The following figure shows a portion of the results. Note that, when returning a dataset that is considered wide, Python wraps the columns, as indicated by the backslash to the right of the column names. As a result, only the index and the first four columns are included in this particular view.<\/p>\n<p><img decoding=\"async\" class=\"wp-image-77151\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/02\/20screenshots5-stpython5_fig02-png.png\" alt=\"%20screenshots5\/stPython5_fig02.png\" width=\"600\" \/><\/p>\n<p>If you scroll down the results, you\u2019ll find the remaining columns, similar to those shown in the following figure.<\/p>\n<p><img decoding=\"async\" class=\"wp-image-77152\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/02\/20screenshots5-stpython5_fig03-png.png\" alt=\"%20screenshots5\/stPython5_fig03.png\" width=\"200\" \/><\/p>\n<p>Notice that the index value for the first row is <strong>1<\/strong>. This is the first value from the first column from the original dataset. If the index for the <strong>titanic<\/strong> data frame had been auto-generated, the first value would be <strong>0<\/strong>.<\/p>\n<h2>Grouping and aggregating the data<\/h2>\n<p>Once you get the <strong>titanic<\/strong> data frame where you want it, you can use the <strong>groupby<\/strong> function to group and aggregate the data. The following example uses the function to group the data first by the <strong>PClass<\/strong> column, then the <strong>Sex<\/strong> column, and finally the <strong>Survived<\/strong> column:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">DECLARE @pscript NVARCHAR(MAX);\r\nSET @pscript = N'\r\n# import pandas module\r\nimport pandas as pd\r\n# define the titanic data frame\r\ntitanic = pd.read_csv(\"c:\\\\datafiles\\\\titanic.csv\", index_col=0)\r\ntitanic = titanic[titanic[\"Age\"].notnull()]\r\ntitanic[\"Survived\"] = titanic[\"Survived\"].map({0:\"deceased\", 1:\"survived\"})\r\nfor i, row in titanic.iterrows():\r\n\tif (row[\"Age\"].is_integer() == False):\r\n\t\ttitanic.drop(i, inplace=True)\r\n# define the df data frame based on titanic data\r\ndf = titanic.groupby([\"PClass\", \"Sex\", \"Survived\"],\u00a0\r\n\u00a0 as_index=False).agg({\"Age\": [\"mean\", \"count\"]})\r\ndf[\"Age\"] = df[\"Age\"].round(2)\r\ndf.columns = [\"class\", \"gender\", \"status\", \"age\", \"count\"]\r\nprint(df)';\r\nEXEC sp_execute_external_script\r\n\u00a0 @language = N'Python',\r\n\u00a0 @script = @pscript;\u00a0 \u00a0 \u00a0\r\nGO<\/pre>\n<p>You call the <strong>groupby<\/strong> function by tagging it onto the <strong>titanic<\/strong> data frame, specifying the three columns on which to base the grouping (in the order that grouping should occur):<\/p>\n<pre class=\"lang:python theme:vs2012 decode:true\">df = titanic.groupby([\"PClass\", \"Sex\", \"Survived\"],\u00a0\r\nas_index=False).agg({\"Age\": [\"mean\", \"count\"]})<\/pre>\n<p>After specifying the grouped columns and <strong>as_index<\/strong> parameter, you add the <strong>agg<\/strong> function, which allows you to find the mean and count of the <strong>Age<\/strong> column values. (The <strong>groupby<\/strong> function is covered extensively in the first three articles of this series, so refer to them for more details about the function\u2019s use.)<\/p>\n<p>The <strong>groupby<\/strong> function results are assigned to the <strong>df<\/strong> variable. From there, you can use the variable to call the <strong>Age<\/strong> column in order to round the column\u2019s values to two decimal points:<\/p>\n<pre class=\"lang:python theme:vs2012 decode:true\"> df[\"Age\"] = df[\"Age\"].round(2)<\/pre>\n<p>As you can see, you need only call the <strong>round<\/strong> function and pass in <strong>2<\/strong> as the parameter argument. You can then assign names to the columns in the <strong>df<\/strong> data frame:<\/p>\n<pre class=\"lang:python theme:vs2012 decode:true\"> df.columns = [\"class\", \"gender\", \"status\", \"age\", \"count\"]<\/pre>\n<p>That\u2019s all you need to do to prepare the data for generating visualizations. The last statement in the example again uses the <strong>print<\/strong> function to return the data from the <strong>df<\/strong> data frame to the SSMS console. The following figure shows the results.<\/p>\n<p><img decoding=\"async\" class=\"wp-image-77153\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/02\/20screenshots5-stpython5_fig04-png.png\" alt=\"%20screenshots5\/stPython5_fig04.png\" width=\"400\" \/><\/p>\n<p>You now have a dataset that is grouped first by class, then by gender, and finally by the survival status. If the rows with fractional <strong>Age<\/strong> values had not been removed, the passenger counts would be slightly different.<\/p>\n<h2>Generating a bar chart<\/h2>\n<p>From the <strong>df<\/strong> data frame you can create multiple charts that provide different insights into the data, but before we dive into those, let\u2019s start with a single chart. The following example creates a bar chart specific to the first-class passengers, showing the average age of the males and females who survived and who did not:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk \">DECLARE @pscript NVARCHAR(MAX);\r\nSET @pscript = N'\r\n# import Python modules\r\nimport matplotlib\r\nmatplotlib.use(\"PDF\")\r\nimport matplotlib.pyplot as plt\r\nimport matplotlib.patches as ptch\r\nimport pandas as pd\r\n# define the titanic data frame\r\ntitanic = pd.read_csv(\"c:\\\\datafiles\\\\titanic.csv\", index_col=0)\r\ntitanic = titanic[titanic[\"Age\"].notnull()]\r\ntitanic[\"Survived\"] = titanic[\"Survived\"].map({0:\"deceased\", 1:\"survived\"})\r\nfor i, row in titanic.iterrows():\r\n\tif (row[\"Age\"].is_integer() == False):\r\n\t\ttitanic.drop(i, inplace=True)\r\n# define the df data frame based on titanic data\r\ndf = titanic.groupby([\"PClass\", \"Sex\", \"Survived\"],\u00a0\r\n\u00a0 as_index=False).agg({\"Age\": [\"mean\", \"count\"]})\r\ndf[\"Age\"] = df[\"Age\"].round(2)\r\ndf.columns = [\"class\", \"gender\", \"status\", \"age\", \"count\"]\r\n# define the subplot data frame based on df data\r\ndf1 = df.loc[(df[\"class\"] == \"1st\"), [\"gender\", \"status\", \"age\"]]\r\n# define variables for the legend\r\ndc = ptch.Patch(color=\"navy\", label=\"deceased\")\r\nsv = ptch.Patch(color=\"darkgreen\", label=\"survived\")\r\n# define the ax1 bar chart\r\nax1 = df1.plot.bar(x=\"gender\", y=\"age\", alpha=.8,\u00a0\r\n\u00a0 color=df1.status.map({\"deceased\":\"navy\", \"survived\":\"darkgreen\"}))\r\nax1.set_title(label=\"First-class passengers\", y=1.02)\r\nax1.set_xticklabels(labels=df1[\"gender\"], fontsize=9, rotation=0)\r\nax1.xaxis.label.set_visible(False)\r\nax1.set_ylabel(\"Average ages\", labelpad=10, fontsize=12)\r\nax1.grid(color=\"slategray\", alpha=.5, linestyle=\"dotted\", linewidth=.5)\r\nax1.legend(handles=[dc, sv], loc=\"best\")\r\n# save the bar chart to PDF file\r\nplt.savefig(\"c:\\\\datafiles\\\\titanic1.pdf\",\u00a0\r\n\u00a0 bbox_inches=\"tight\", pad_inches=.5)';\r\nEXEC sp_execute_external_script\r\n\u00a0 @language = N'Python',\r\n\u00a0 @script = @pscript;\u00a0 \u00a0 \u00a0\r\nGO<\/pre>\n<p>The Python script starts with several new <strong>import<\/strong> statements related to the <strong>matplotlib<\/strong> module, along with the <strong>use<\/strong> function, which sets <strong>PDF<\/strong> as the backend:<\/p>\n<pre class=\"lang:python theme:vs2012 decode:true\">import matplotlib\r\nmatplotlib.use(\"PDF\")\r\nimport matplotlib.pyplot as plt\r\nimport matplotlib.patches as ptch<\/pre>\n<p>The first three statements should be familiar to you. If they\u2019re not, refer back to the third article. The fourth statement is new. It imports the <strong>patches<\/strong> package that is included with the <strong>matplotlib<\/strong> module. The <strong>patches<\/strong> package contains tools for refining the various charts. In this case, you\u2019ll be using it to customize the chart\u2019s legend.<\/p>\n<p>The next step is to create a data subset based on the <strong>df<\/strong> data frame:<\/p>\n<pre class=\"lang:python theme:vs2012 decode:true\">df1 = df.loc[(df[\"class\"] == \"1st\"), [\"gender\", \"status\", \"age\"]]<\/pre>\n<p>The <strong>df1<\/strong> data frame filters out all rows except those specific to first-class passengers. It also filters out all but the <strong>gender<\/strong>, <strong>status<\/strong>, and <strong>age<\/strong> columns. Filtering out the data in advance helps to simplify the code used to create the actual plot.<\/p>\n<p>The next step is to define the property values for the legend. For this, you use the <strong>Patch<\/strong> function that is part of the <strong>patches<\/strong> package:<\/p>\n<pre class=\"lang:python theme:vs2012 decode:true\">dc = ptch.Patch(color=\"navy\", label=\"deceased\")\r\nsv = ptch.Patch(color=\"darkgreen\", label=\"survived\")<\/pre>\n<p>The goal is to provide a legend that is consistent with the colors of the chart\u2019s bars, which will be color-coded to reflect whether a group represents passengers that survived or passengers that did not. You\u2019ll use the <strong>dc<\/strong> and <strong>sv<\/strong> variables when you define your chart\u2019s properties.<\/p>\n<p>Once you have all the pieces in place, you can create your bar chart, using the <strong>plot.bar<\/strong> function available to the <strong>df1<\/strong> data frame:<\/p>\n<pre class=\"lang:python theme:vs2012 decode:true\">ax1 = df1.plot.bar(x=\"gender\", y=\"age\", alpha=.8,\u00a0\r\ncolor=df1.status.map({\"deceased\":\"navy\", \"survived\":\"darkgreen\"}))<\/pre>\n<p>You\u2019ve seen most of these components before. The chart uses the <strong>gender<\/strong> column for the X-axis and the <strong>age<\/strong> column for the Y-axis. The <strong>alpha<\/strong> parameter sets the transparency to 80%. What\u2019s different here is the <strong>color<\/strong> parameter. The value uses the <strong>map<\/strong> function, called from the <strong>status<\/strong> column, to set the bar\u2019s color to navy if the value is <strong>deceased<\/strong> and to dark green if the value is <strong>survived<\/strong>.<\/p>\n<p>From here, you can configure the other chart\u2019s properties, using the <strong>ax1<\/strong> variable to reference the chart object:<\/p>\n<pre class=\"lang:python theme:vs2012 decode:true\">ax1.set_title(label=\"First-class passengers\", y=1.02)\r\nax1.set_xticklabels(labels=df1[\"gender\"], fontsize=9, rotation=0)\r\nax1.xaxis.label.set_visible(False)\r\nax1.set_ylabel(\"Average ages\", labelpad=10, fontsize=12)\r\nax1.grid(color=\"slategray\", alpha=.5, linestyle=\"dotted\", linewidth=.5)<\/pre>\n<p>Most of this should be familiar to you as well. The only new statement is the third one, which uses the <strong>set_visible<\/strong> function to set the visibility of the X-axis label to <strong>False<\/strong>. This prevents the label from being displayed.<\/p>\n<p>The next step is to configure the legend. For this, you call the <strong>legend<\/strong> function, passing in the <strong>dc<\/strong> and <strong>sv<\/strong> variables as values to the <strong>handles<\/strong> parameter:<\/p>\n<pre class=\"lang:python theme:vs2012 decode:true\"> ax1.legend(handles=[dc, sv], loc=\"best\")<\/pre>\n<p>Notice that the <strong>loc<\/strong> parameter is set to <strong>best<\/strong>. This setting lets Python determine the best location for the legend, based on how the bars are rendered.<\/p>\n<p>The final step is to use the <strong>savefig<\/strong> function to save the chart to a .pdf file:<\/p>\n<pre class=\"lang:python theme:vs2012 decode:true\">plt.savefig(\"c:\\\\datafiles\\\\titanic1.pdf\",\u00a0\r\nbbox_inches=\"tight\", pad_inches=.5)';<\/pre>\n<p>When you run the Python script, it should generate a chart similar to the one shown in the following figure.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1240\" height=\"976\" class=\"wp-image-77154\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/02\/20screenshots5-stpython5_fig05-png.png\" alt=\"%20screenshots5\/stPython5_fig05.png\" \/><\/p>\n<p>The chart includes a bar for each first-class group, sized according to the average age of that group. In addition, each bar is color-coded to match the group\u2019s status, with a legend that provides a key for how the colors are used.<\/p>\n<h2>Generating multiple bar charts<\/h2>\n<p>The bar chart above provides a straightforward, easy-to-understand visualization of the underlying data. Restricting the results to the first-class group makes it clear how the data is distributed. We could have incorporated all the data from the <strong>df<\/strong> dataset into the chart, but that might have made the results less clear, although in this case, one chart might have been okay, given that the <strong>df<\/strong> dataset includes only 12 rows.<\/p>\n<p>However, another approach that can be extremely effective when visualizing data is to generate multiple related charts, as shown in the following example:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">DECLARE @pscript NVARCHAR(MAX);\r\nSET @pscript = N'\r\n# import Python modules\r\nimport matplotlib\r\nmatplotlib.use(\"PDF\")\r\nimport matplotlib.pyplot as plt\r\nimport matplotlib.patches as ptch\r\nimport pandas as pd\r\n# define the titanic data frame\r\ntitanic = pd.read_csv(\"c:\\\\datafiles\\\\titanic.csv\", index_col=0)\r\ntitanic = titanic[titanic[\"Age\"].notnull()]\r\ntitanic[\"Survived\"] = titanic[\"Survived\"].map({0:\"deceased\", 1:\"survived\"})\r\nfor i, row in titanic.iterrows():\r\n\tif (row[\"Age\"].is_integer() == False):\r\n\t\ttitanic.drop(i, inplace=True)\r\n# define the df data frame based on titanic data\r\ndf = titanic.groupby([\"PClass\", \"Sex\", \"Survived\"],\u00a0\r\n\u00a0 as_index=False).agg({\"Age\": [\"mean\", \"count\"]})\r\ndf[\"Age\"] = df[\"Age\"].round(2)\r\ndf.columns = [\"class\", \"gender\", \"status\", \"age\", \"count\"]\r\n# define the subplot data frames based on df data\r\ndf1 = df.loc[(df[\"class\"] == \"1st\"), [\"gender\", \"status\", \"age\"]]\r\ndf2 = df.loc[(df[\"class\"] == \"2nd\"), [\"gender\", \"status\", \"age\"]]\r\ndf3 = df.loc[(df[\"class\"] == \"3rd\"), [\"gender\", \"status\", \"age\"]]\r\n# define variables for the legend\r\ndc = ptch.Patch(color=\"navy\", label=\"deceased\")\r\nsv = ptch.Patch(color=\"darkgreen\", label=\"survived\")\r\n# set up the subplot structure\r\nfig, (ax1, ax2, ax3) = plt.subplots(nrows=1,\u00a0\r\n\u00a0 ncols=3, sharey=True, figsize=(12,6))\r\n# define the ax1 subplot\r\nax1 = df1.plot.bar(x=\"gender\", y=\"age\", alpha=.8, ax=ax1,\r\n\u00a0 color=df1.status.map({\"deceased\":\"navy\", \"survived\":\"darkgreen\"}))\r\nax1.set_title(label=\"First-class passengers\", y=1.02)\r\nax1.set_xticklabels(labels=df1[\"gender\"], fontsize=9, rotation=0)\r\nax1.xaxis.label.set_visible(False)\r\nax1.set_ylabel(\"Average ages\", labelpad=10, fontsize=12)\r\nax1.grid(color=\"slategray\", alpha=.5, linestyle=\"dotted\", linewidth=.5)\r\nax1.legend(handles=[dc, sv], loc=\"best\")\r\n# define the ax2 subplot\r\nax2 = df2.plot.bar(x=\"gender\", y=\"age\", alpha=.8, ax=ax2,\r\n\u00a0 color=df2.status.map({\"deceased\":\"navy\", \"survived\":\"darkgreen\"}))\r\nax2.set_title(label=\"Second-class passengers\", y=1.02)\r\nax2.set_xticklabels(labels=df1[\"gender\"], fontsize=9, rotation=0)\r\nax2.xaxis.label.set_visible(False)\r\nax2.grid(color=\"slategray\", alpha=.5, linestyle=\"dotted\", linewidth=.5)\r\nax2.legend(handles=[dc, sv], loc=\"best\")\r\n# define the ax3 subplot\r\nax3 = df3.plot.bar(x=\"gender\", y=\"age\", alpha=.8, ax=ax3,\r\n\u00a0 color=df3.status.map({\"deceased\":\"navy\", \"survived\":\"darkgreen\"}))\r\nax3.set_title(label=\"Third-class passengers\", y=1.02)\r\nax3.set_xticklabels(labels=df1[\"gender\"], fontsize=9, rotation=0)\r\nax3.xaxis.label.set_visible(False)\r\nax3.grid(color=\"slategray\", alpha=.5, linestyle=\"dotted\", linewidth=.5)\r\nax3.legend(handles=[dc, sv], loc=\"best\")\r\n# save the bar chart to PDF file\r\nplt.savefig(\"c:\\\\datafiles\\\\titanic2.pdf\",\u00a0\r\n\u00a0 bbox_inches=\"tight\", pad_inches=.5)';\r\nEXEC sp_execute_external_script\r\n\u00a0 @language = N'Python',\r\n\u00a0 @script = @pscript;\u00a0 \u00a0 \u00a0\r\nGO<\/pre>\n<p>Much of the code in the Python script is the same as the preceding example, with a few notable exceptions. The first is that the script now defines three subsets of data, one for each service class:<\/p>\n<pre class=\"lang:python theme:vs2012 decode:true\">df1 = df.loc[(df[\"class\"] == \"1st\"), [\"gender\", \"status\", \"age\"]]\r\ndf2 = df.loc[(df[\"class\"] == \"2nd\"), [\"gender\", \"status\", \"age\"]]\r\ndf3 = df.loc[(df[\"class\"] == \"3rd\"), [\"gender\", \"status\", \"age\"]]<\/pre>\n<p>The next addition to the script is a statement that creates the structure necessary for rendering multiple charts:<\/p>\n<pre class=\"lang:python theme:vs2012 decode:true\">fig, (ax1, ax2, ax3) = plt.subplots(nrows=1,\u00a0\r\nncols=3, sharey=True, figsize=(12,6))<\/pre>\n<p>The statement uses the <strong>subplots<\/strong> function that is part of the <strong>matplotlib.pyplot<\/strong> package to define a structure that includes three charts, positioned in one row with three columns. The <strong>nrows<\/strong> parameter determines the number of rows, and the <strong>ncols<\/strong> parameter determines the number of columns, resulting in a total of three charts. This will cause the function to generate three <strong>AxesSubplot<\/strong> objects, which are saved to the <strong>ax1<\/strong>, <strong>ax2<\/strong>, and <strong>ax3<\/strong> variables. The function also generates a <strong>Figure<\/strong> object, which is saved to the <strong>fig<\/strong> variable, although you don\u2019t need to do anything specific with that variable going forward.<\/p>\n<p>The third parameter in the <strong>subplots<\/strong> function is <strong>sharey<\/strong>, which is set to <strong>True<\/strong>. This will cause the Y-axis labels associated with the first chart to be shared across the row with all charts, rather than displaying the same labels for each chart. This is followed with the <strong>figsize<\/strong> parameter, which sets this figure\u2019s width and height in inches.<\/p>\n<p>The next step is to define each of three charts. The first definition is similar to the preceding example, with one important difference in the <strong>plot.bar<\/strong> function:<\/p>\n<pre class=\"lang:python theme:vs2012 decode:true\">ax1 = df1.plot.bar(x=\"gender\", y=\"age\", alpha=.8, ax=ax1,\r\ncolor=df1.status.map({\"deceased\":\"navy\", \"survived\":\"darkgreen\"}))<\/pre>\n<p>The function now includes the <strong>ax<\/strong> parameter, which instructs Python to assign this plot to the <strong>ax1<\/strong> <strong>AxesSubplot<\/strong> object. This ensures that the plot definition is incorporated into the single <strong>fig<\/strong> object.<\/p>\n<p>The remaining chart definitions take the same approach, except that <strong>ax1<\/strong> is swapped out for <strong>ax2<\/strong> and <strong>ax3<\/strong>, respectively, and the <strong>df1<\/strong> data frame is swapped out for <strong>df2<\/strong> and <strong>df3<\/strong>. The two remaining chart definitions also do not define a Y-axis label. The following figure shows the resulting charts.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1712\" height=\"952\" class=\"wp-image-77155\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/02\/20screenshots5-stpython5_fig06-png.png\" alt=\"%20screenshots5\/stPython5_fig06.png\" \/><\/p>\n<p>Notice that only the first chart includes the Y-axis labels and that each chart is specific to the class of service. In addition, the legend is positioned differently within the charts, according to how the bars are rendered.<\/p>\n<p>Breaking data into smaller charts can make the groups easier to grasp and to compare. You can do the same thing with the passenger counts, as shown in the following example:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">DECLARE @pscript NVARCHAR(MAX);\r\nSET @pscript = N'\r\n# import Python modules\r\nimport matplotlib\r\nmatplotlib.use(\"PDF\")\r\nimport matplotlib.pyplot as plt\r\nimport matplotlib.patches as ptch\r\nimport pandas as pd\r\n# define the titanic data frame\r\ntitanic = pd.read_csv(\"c:\\\\datafiles\\\\titanic.csv\", index_col=0)\r\ntitanic = titanic[titanic[\"Age\"].notnull()]\r\ntitanic[\"Survived\"] = titanic[\"Survived\"].map({0:\"deceased\", 1:\"survived\"})\r\nfor i, row in titanic.iterrows():\r\n\tif (row[\"Age\"].is_integer() == False):\r\n\t\ttitanic.drop(i, inplace=True)\r\n# define the df data frame based on titanic data\r\ndf = titanic.groupby([\"PClass\", \"Sex\", \"Survived\"],\u00a0\r\n\u00a0 as_index=False).agg({\"Age\": [\"mean\", \"count\"]})\r\ndf[\"Age\"] = df[\"Age\"].round(2)\r\ndf.columns = [\"class\", \"gender\", \"status\", \"age\", \"count\"]\r\n# define the subplot data frames based on df data\r\ndf1 = df.loc[(df[\"class\"] == \"1st\"), [\"gender\", \"status\", \"count\"]]\r\ndf2 = df.loc[(df[\"class\"] == \"2nd\"), [\"gender\", \"status\", \"count\"]]\r\ndf3 = df.loc[(df[\"class\"] == \"3rd\"), [\"gender\", \"status\", \"count\"]]\r\n# define variables for the legend\r\ndc = ptch.Patch(color=\"navy\", label=\"deceased\")\r\nsv = ptch.Patch(color=\"darkgreen\", label=\"survived\")\r\n# set up the subplot structure\r\nfig, (ax1, ax2, ax3) = plt.subplots(nrows=1,\u00a0\r\n\u00a0 ncols=3, sharey=True, figsize=(12,6))\r\n# define the ax1 subplot\r\nax1 = df1.plot.bar(x=\"gender\", y=\"count\", alpha=.8, ax=ax1,\r\n\u00a0 color=df1.status.map({\"deceased\":\"navy\", \"survived\":\"darkgreen\"}))\r\nax1.set_title(label=\"First-class passengers\", y=1.02)\r\nax1.set_xticklabels(labels=df1[\"gender\"], fontsize=9, rotation=0)\r\nax1.xaxis.label.set_visible(False)\r\nax1.set_ylabel(\"Total passenger count\", labelpad=10, fontsize=12)\r\nax1.grid(color=\"slategray\", alpha=.5, linestyle=\"dotted\", linewidth=.5)\r\nax1.legend(handles=[dc, sv], loc=\"best\")\r\n# define the ax2 subplot\r\nax2 = df2.plot.bar(x=\"gender\", y=\"count\", alpha=.8, ax=ax2,\r\n\u00a0 color=df2.status.map({\"deceased\":\"navy\", \"survived\":\"darkgreen\"}))\r\nax2.set_title(label=\"Second-class passengers\", y=1.02)\r\nax2.set_xticklabels(labels=df1[\"gender\"], fontsize=9, rotation=0)\r\nax2.xaxis.label.set_visible(False)\r\nax2.grid(color=\"slategray\", alpha=.5, linestyle=\"dotted\", linewidth=.5)\r\nax2.legend(handles=[dc, sv], loc=\"best\")\r\n# define the ax3 subplot\r\nax3 = df3.plot.bar(x=\"gender\", y=\"count\", alpha=.8, ax=ax3,\r\n\u00a0 color=df3.status.map({\"deceased\":\"navy\", \"survived\":\"darkgreen\"}))\r\nax3.set_title(label=\"Third-class passengers\", y=1.02)\r\nax3.set_xticklabels(labels=df1[\"gender\"], fontsize=9, rotation=0)\r\nax3.xaxis.label.set_visible(False)\r\nax3.grid(color=\"slategray\", alpha=.5, linestyle=\"dotted\", linewidth=.5)\r\nax3.legend(handles=[dc, sv], loc=\"best\")\r\n# save the bar chart to PDF file\r\nplt.savefig(\"c:\\\\datafiles\\\\titanic3.pdf\",\u00a0\r\n\u00a0 bbox_inches=\"tight\", pad_inches=.5)';\r\nEXEC sp_execute_external_script\r\n\u00a0 @language = N'Python',\r\n\u00a0 @script = @pscript;\u00a0 \u00a0 \u00a0\r\nGO<\/pre>\n<p>This example is very similar to the preceding one, except that the data subsets have been updated to return the <strong>count<\/strong> column, rather than the <strong>age<\/strong> column:<\/p>\n<pre class=\"lang:python theme:vs2012 decode:true\">df1 = df.loc[(df[\"class\"] == \"1st\"), [\"gender\", \"status\", \"count\"]]\r\ndf2 = df.loc[(df[\"class\"] == \"2nd\"), [\"gender\", \"status\", \"count\"]]\r\ndf3 = df.loc[(df[\"class\"] == \"3rd\"), [\"gender\", \"status\", \"count\"]]<\/pre>\n<p>You must also update the <strong>plot.bar<\/strong> function calls that defines the subplots to reflect that the <strong>count<\/strong> column should be used for Y-axis:<\/p>\n<pre class=\"lang:python theme:vs2012 decode:true\">ax1 = df1.plot.bar(x=\"gender\", y=\"count\", alpha=.8, ax=ax1,\r\ncolor=df1.status.map({\"deceased\":\"navy\", \"survived\":\"darkgreen\"}))<\/pre>\n<p>Also, be sure to update the Y-axis labels:<\/p>\n<pre class=\"lang:python theme:vs2012 decode:true\">ax1.set_ylabel(\"Total passenger count\", labelpad=10, fontsize=12)<\/pre>\n<p>Other than these changes, the rest of the script is the same. Your charts should now look like those shown in the following figure:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1714\" height=\"946\" class=\"wp-image-77156\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/02\/20screenshots5-stpython5_fig07-png.png\" alt=\"%20screenshots5\/stPython5_fig07.png\" \/><\/p>\n<p>Although the charts are similar to those in the preceding example, the data is much different. However, you can still easily find meaning in that data.<\/p>\n<h2>Generating multiple types of bar charts<\/h2>\n<p>You can also mix things up, providing charts that reflect both the average ages and passenger counts within one figure. You need only do some rearranging and make some additions, as shown in the following example:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">DECLARE @pscript NVARCHAR(MAX);\r\nSET @pscript = N'\r\n# import Python modules\r\nimport matplotlib\r\nmatplotlib.use(\"PDF\")\r\nimport matplotlib.pyplot as plt\r\nimport matplotlib.patches as ptch\r\nimport pandas as pd\r\n# define the titanic data frame\r\ntitanic = pd.read_csv(\"c:\\\\datafiles\\\\titanic.csv\", index_col=0) # 1313 rows\r\ntitanic = titanic[titanic[\"Age\"].notnull()] # 756 rows\r\ntitanic[\"Survived\"] = titanic[\"Survived\"].map({0:\"deceased\", 1:\"survived\"})\r\nfor i, row in titanic.iterrows():\r\n\tif (row[\"Age\"].is_integer() == False):\r\n\t\ttitanic.drop(i, inplace=True) # 748 rows\r\n# define the df data frame based on titanic data\r\ndf = titanic.groupby([\"PClass\", \"Sex\", \"Survived\"],\u00a0\r\n\u00a0 as_index=False).agg({\"Age\": [\"mean\", \"count\"]})\r\ndf[\"Age\"] = df[\"Age\"].round(2)\r\ndf.columns = [\"class\", \"gender\", \"status\", \"age\", \"count\"]\r\n# define the subplot data frames based on df data\r\ndf1 = df.loc[(df[\"class\"] == \"1st\"), [\"gender\", \"status\", \"age\"]]\r\ndf2 = df.loc[(df[\"class\"] == \"1st\"), [\"gender\", \"status\", \"count\"]]\r\ndf3 = df.loc[(df[\"class\"] == \"2nd\"), [\"gender\", \"status\", \"age\"]]\r\ndf4 = df.loc[(df[\"class\"] == \"2nd\"), [\"gender\", \"status\", \"count\"]]\r\ndf5 = df.loc[(df[\"class\"] == \"3rd\"), [\"gender\", \"status\", \"age\"]]\r\ndf6 = df.loc[(df[\"class\"] == \"3rd\"), [\"gender\", \"status\", \"count\"]]\r\n# define variables for the legend\r\ndc = ptch.Patch(color=\"navy\", label=\"deceased\")\r\nsv = ptch.Patch(color=\"darkgreen\", label=\"survived\")\r\n# set up the subplot structure\r\nfig, ((ax1, ax2), (ax3, ax4), (ax5, ax6)) = plt.subplots(nrows=3,\u00a0\r\n\u00a0 ncols=2, sharex=True, figsize=(14,12))\r\n# define the ax1 subplot\r\nax1 = df1.plot.bar(x=\"gender\", y=\"age\", alpha=.8, ax=ax1,\r\n\u00a0 color=df1.status.map({\"deceased\":\"navy\", \"survived\":\"darkgreen\"}))\r\nax1.set_title(label=\"First-class passengers\", y=1.02)\r\nax1.set_xticklabels(labels=df1[\"gender\"], fontsize=9, rotation=0)\r\nax1.xaxis.label.set_visible(False)\r\nax1.set_ylabel(\"Average ages\", labelpad=5, fontsize=12)\r\nax1.grid(color=\"slategray\", alpha=.5, linestyle=\"dotted\", linewidth=.5)\r\nax1.legend(handles=[dc, sv], loc=\"best\")\r\n# define the ax2 subplot\r\nax2 = df2.plot.bar(x=\"gender\", y=\"count\", alpha=.8, ax=ax2,\r\n\u00a0 color=df2.status.map({\"deceased\":\"navy\", \"survived\":\"darkgreen\"}))\r\nax2.set_title(label=\"First-class passengers\", y=1.02)\r\nax2.set_xticklabels(labels=df1[\"gender\"], fontsize=9, rotation=0)\r\nax2.xaxis.label.set_visible(False)\r\nax2.set_ylabel(\"Total passenger count\", labelpad=5, fontsize=12)\r\nax2.grid(color=\"slategray\", alpha=.5, linestyle=\"dotted\", linewidth=.5)\r\nax2.legend(handles=[dc, sv], loc=\"best\")\r\n# define the ax3 subplot\r\nax3 = df3.plot.bar(x=\"gender\", y=\"age\", alpha=.8, ax=ax3,\r\n\u00a0 color=df3.status.map({\"deceased\":\"navy\", \"survived\":\"darkgreen\"}))\r\nax3.set_title(label=\"Second-class passengers\", y=1.02)\r\nax3.set_xticklabels(labels=df1[\"gender\"], fontsize=9, rotation=0)\r\nax3.xaxis.label.set_visible(False)\r\nax3.set_ylabel(\"Average ages\", labelpad=5, fontsize=12)\r\nax3.grid(color=\"slategray\", alpha=.5, linestyle=\"dotted\", linewidth=.5)\r\nax3.legend(handles=[dc, sv], loc=\"best\")\r\n# define the ax4 subplot\r\nax4 = df4.plot.bar(x=\"gender\", y=\"count\", alpha=.8, ax=ax4,\r\n\u00a0 color=df4.status.map({\"deceased\":\"navy\", \"survived\":\"darkgreen\"}))\r\nax4.set_title(label=\"Second-class passengers\", y=1.02)\r\nax4.set_xticklabels(labels=df1[\"gender\"], fontsize=9, rotation=0)\r\nax4.xaxis.label.set_visible(False)\r\nax4.set_ylabel(\"Total passenger count\", labelpad=5, fontsize=12)\r\nax4.grid(color=\"slategray\", alpha=.5, linestyle=\"dotted\", linewidth=.5)\r\nax4.legend(handles=[dc, sv], loc=\"best\")\r\n# define the ax5 subplot\r\nax5 = df5.plot.bar(x=\"gender\", y=\"age\", alpha=.8, ax=ax5,\r\n\u00a0 color=df5.status.map({\"deceased\":\"navy\", \"survived\":\"darkgreen\"}))\r\nax5.set_title(label=\"Third-class passengers\", y=1.02)\r\nax5.set_xticklabels(labels=df1[\"gender\"], fontsize=9, rotation=0)\r\nax5.xaxis.label.set_visible(False)\r\nax5.set_ylabel(\"Average ages\", labelpad=5, fontsize=12)\r\nax5.grid(color=\"slategray\", alpha=.5, linestyle=\"dotted\", linewidth=.5)\r\nax5.legend(handles=[dc, sv], loc=\"best\")\r\n# define the ax6 subplot\r\nax6 = df6.plot.bar(x=\"gender\", y=\"count\", alpha=.8, ax=ax6,\r\n\u00a0 color=df6.status.map({\"deceased\":\"navy\", \"survived\":\"darkgreen\"}))\r\nax6.set_title(label=\"Third-class passengers\", y=1.02)\r\nax6.set_xticklabels(labels=df1[\"gender\"], fontsize=9, rotation=0)\r\nax6.xaxis.label.set_visible(False)\r\nax6.set_ylabel(\"Total passenger count\", labelpad=5, fontsize=12)\r\nax6.grid(color=\"slategray\", alpha=.5, linestyle=\"dotted\", linewidth=.5)\r\nax6.legend(handles=[dc, sv], loc=\"best\")\r\n# save the bar chart to PDF file\r\nplt.savefig(\"c:\\\\datafiles\\\\titanic4.pdf\",\u00a0\r\n\u00a0 bbox_inches=\"tight\", pad_inches=.5)';\r\nEXEC sp_execute_external_script\r\n\u00a0 @language = N'Python',\r\n\u00a0 @script = @pscript;\u00a0 \u00a0 \u00a0\r\nGO<\/pre>\n<p>This time around, the script includes six subsets of data, three specific to age and three specific to passenger count:<\/p>\n<pre class=\"lang:python theme:vs2012 decode:true\">df1 = df.loc[(df[\"class\"] == \"1st\"), [\"gender\", \"status\", \"age\"]]\r\ndf2 = df.loc[(df[\"class\"] == \"1st\"), [\"gender\", \"status\", \"count\"]]\r\ndf3 = df.loc[(df[\"class\"] == \"2nd\"), [\"gender\", \"status\", \"age\"]]\r\ndf4 = df.loc[(df[\"class\"] == \"2nd\"), [\"gender\", \"status\", \"count\"]]\r\ndf5 = df.loc[(df[\"class\"] == \"3rd\"), [\"gender\", \"status\", \"age\"]]\r\ndf6 = df.loc[(df[\"class\"] == \"3rd\"), [\"gender\", \"status\", \"count\"]]<\/pre>\n<p>The idea here is to include each class of service in its own row, giving us a figure with three rows and two columns. To achieve this, you must update the <strong>subplots<\/strong> function call to reflect the new structure:<\/p>\n<pre class=\"lang:python theme:vs2012 decode:true\">fig, ((ax1, ax2), (ax3, ax4), (ax5, ax6)) = plt.subplots(nrows=3,\u00a0\r\nncols=2, sharex=True, figsize=(14,12))<\/pre>\n<p>In addition to the updated <strong>nrows<\/strong> and <strong>ncols<\/strong> parameter values, the statement also includes six subplot variables, rather than three, with the variables grouped together by row. In addition, the <strong>sharey<\/strong> parameter has been replaced with the <strong>sharex<\/strong> parameter and the figure size increased.<\/p>\n<p>The next step is to define the six subplots, following the same structure as before, but providing the correct data subsets, subplot variables, column names and labels. The result is a single .pdf file with six charts, as shown in the following figure.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1396\" height=\"1212\" class=\"wp-image-77157\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/02\/20screenshots5-stpython5_fig08-png.png\" alt=\"%20screenshots5\/stPython5_fig08.png\" \/><\/p>\n<p>The challenge with this approach is that each chart scales differently. Mixing charts in this way can make it difficult to achieve matching scales. On the plus side, you\u2019re able to visualize a great number of details within a collected set of charts, while still making it possible to quickly compare and understand the data.<\/p>\n<p>The approach taken in this example is not the only way to create these charts. For instance, you might be able to set up a looping structure or use variables for repeated parameter values to simplify the subplot creation, but this approach allows you to see the logic behind each step in the chart-creation process. Like any other aspect of Python, there are often several ways to achieve the same results.<\/p>\n<h2>Python\u2019s expansive world<\/h2>\n<p>As datasets become larger and more complex, the need is greater than ever for tools that can present the data to various stakeholders in ways that are both meaningful and easy to understand. SQL Server\u2019s support for Python and the R language, along with the analytic and visualization capabilities they bring, could prove an invaluable addition to your arsenal for working with SQL Server data.<\/p>\n<p>When it comes to visualizations in particular, Python has a great deal to offer, and what we\u2019ve covered so far barely scratches the surface. Of course, we could say the same thing about most aspects of Python. It is a surprisingly extensive language that is both intuitive and flexible, and can be useful in a wide range of circumstances. Being able to work with Python within the context of a SQL Server database makes it all the better.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Visualization is often the first step in analyzing data. Python makes visualization easy. In this article, Robert Sheldon demonstrates how to generate multiple charts from one dataset using Python with SQL Server Machine Learning Services.&hellip;<\/p>\n","protected":false},"author":221841,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143528,146042],"tags":[5134],"coauthors":[6779],"class_list":["post-77149","post","type-post","status-publish","format-standard","hentry","category-bi-sql-server","category-python","tag-sql-prompt"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/77149","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/221841"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=77149"}],"version-history":[{"count":12,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/77149\/revisions"}],"predecessor-version":[{"id":77644,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/77149\/revisions\/77644"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=77149"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=77149"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=77149"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=77149"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}