{"id":76167,"date":"2017-11-30T15:04:19","date_gmt":"2017-11-30T15:04:19","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=76167"},"modified":"2026-04-15T19:08:44","modified_gmt":"2026-04-15T19:08:44","slug":"machine-learning-services-part-1-python","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/bi-sql-server\/machine-learning-services-part-1-python\/","title":{"rendered":"SQL Server Machine Learning Services: Python with sp_execute_external_script"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\" id=\"h-executive-summary\">Executive Summary<\/h2>\n\n\n\n<p><strong>SQL Server Machine Learning Services allows you to run Python (and R) scripts directly within SQL Server using the sp_execute_external_script stored procedure. This eliminates the need to export data from SQL Server before running Python analysis &#8211; the Python script runs in the SQL Server process context, with your SQL query results passed in as the input dataset and the Python output returned as a SQL result set. Part 1 covers the fundamentals: enabling Machine Learning Services, calling sp_execute_external_script, naming the input and output dataset variables, manipulating the pandas data frame, and wrapping Python execution in a user-defined stored procedure for reuse. Requires SQL Server 2016 or later with Machine Learning Services installed.<\/strong><\/p>\n\n\n<h4>The series so far:<\/h4>\n<ol>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/machine-learning-services-part-1-python\/\">SQL Server Machine Learning Services \u2013 Part 1: Python Basics<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-machine-learning-services-part-2-python-data-frames\/\">SQL Server Machine Learning Services \u2013 Part 2: Python Data Frames<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-machine-learning-services-part-3-plotting-data-python\/\">SQL Server Machine Learning Services \u2013 Part 3: Plotting Data with Python<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-machine-learning-services-part-4-finding-data-anomalies-python\/\">SQL Server Machine Learning Services \u2013 Part 4: Finding Data Anomalies with Python<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-machine-learning-services-part-5-generating-multiple-plots-python\/\">SQL Server Machine Learning Services \u2013 Part 5: Generating multiple plots in Python<\/a><\/li>\n<li><a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/bi\/sql-server-machine-learning-services-part-6-merging-data-frames-python\/\">SQL Server Machine Learning Services \u2013 Part 6: Merging Data Frames in Python<\/a><\/li>\n<\/ol>\n\n\n\n\n<p>With the release of SQL Server 2017, Microsoft changed the name of R Services to Machine Learning Services (MLS) and added support for Python, a widely implemented programming language known for its straightforward syntax and code readability. As with the R language, you can use Python to transform and analyze data within the context of a SQL Server database and then return the modified data to a calling application.<\/p>\n\n\n\n<p>In this article, I explain how to run Python scripts within SQL Server. I start by introducing the <strong>sp_execute_external_script<\/strong> stored procedure and then dive into specifics about working with Python scripts. If you\u2019re familiar with running R scripts in SQL Server, you already have the foundation you need to get started with Python. You use the same stored procedure and its parameters, merely swapping out one programming language for the other. The differences between the two approaches are primarily within the languages themselves.<\/p>\n\n\n\n<p>This article is the first in a series about MLS and Python. In it, we\u2019ll review several examples that demonstrate how to run Python scripts, using data from the <strong>AdventureWorks2017<\/strong> database. The examples are very basic and meant only to introduce you to the fundamentals of working with Python in SQL Server. In subsequent articles, we\u2019ll dig into Python\u2019s capabilities in more depth, relying on this article as a foundation for the ones to follow.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-getting-started-with-python-scripting\">Getting started with Python scripting<\/h2>\n\n\n\n<p>To run a Python script in SQL Server, use the <strong>sp_execute_external_script<\/strong> stored procedure. First you must ensure that the database engine\u2019s MLS and Python components are installed, and the external scripting feature is enabled. For details on how to get all this set up, see the Microsoft document <a href=\"https:\/\/docs.microsoft.com\/en-us\/sql\/advanced-analytics\/python\/setup-python-machine-learning-services\">Set up Python Machine Learning Services (In-Database)<\/a>.<\/p>\n\n\n\n<p>After MLS has been installed and enabled, you can use the <strong>sp_execute_external_script<\/strong> stored procedure to run Python scripts (or R scripts, for that matter). The following syntax shows the procedure\u2019s basic elements:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">sp_execute_external_script\n    @language = N'language',\n    @script = N'script' [,]\n    [ @input_data_1 = 'SelectStatement' [,] ]\n    [ @input_data_1_name =  N'InputDataVariable' [,] ]\n    [ @output_data_1_name = N'OutputDataVariable' [,] ]\n    [ @params =  N'@ParamName DataType [ OUT | OUTPUT ] [ ,...n ]' [,] ]\n    [ @ParamName = 'value' [ OUT | OUTPUT ] [ ,...n ] ]\n    [ WITH RESULT SETS\n      { UNDEFINED | NONE | (( OutputColumnName [,...n ] )) } ] [;]<\/pre>\n\n\n\n<p>You\u2019ll learn the specifics of the syntax as you work through the examples in this article. Note, however, that the SQL Server document <a href=\"https:\/\/docs.microsoft.com\/en-us\/sql\/relational-databases\/system-stored-procedures\/sp-execute-external-script-transact-sql\">sp_execute_external_script (Transact-SQL)<\/a> no longer includes the <strong>WITH<\/strong> <strong>RESULTS<\/strong> <strong>SETS<\/strong> clause as part of the syntax as it has in the past. Instead, the document points to the <a href=\"https:\/\/docs.microsoft.com\/en-us\/sql\/t-sql\/language-elements\/execute-transact-sql\">EXECUTE (Transact-SQL)<\/a> topic for details on how to use this clause. I\u2019ve included it here for completeness, but it\u2019s actually part of the <strong>EXECUTE<\/strong> command.<\/p>\n\n\n\n<p>To demonstrate how to run the <strong>sp_execute_external_script<\/strong> stored procedure, start with a simple example that retrieves sales data from the <strong>AdventureWorks2017<\/strong> database, passes the data into a Python script, and then returns the data without modifying it:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Use AdventureWorks2017;\nGO\n\n-- define Python script\nDECLARE @pscript NVARCHAR(MAX);\nSET @pscript = N'\n# assign SQL Server dataset to df variable\ndf = InputDataSet\n\n# return df dataset\nOutputDataSet = df';\n\n-- define T-SQL query\nDECLARE @sqlscript NVARCHAR(MAX);\nSET @sqlscript = N'\n  SELECT t.Name AS Territories, CAST(h.Subtotal AS FLOAT) AS Sales\n  FROM Sales.SalesOrderHeader h INNER JOIN Sales.SalesTerritory t\n      ON h.TerritoryID = t.TerritoryID;';\n\n-- run procedure, using Python script and T-SQL query\nEXEC sp_execute_external_script\n  @language = N'Python',\n  @script = @pscript,\n  @input_data_1 = @sqlscript;    \nGO\n<\/pre>\n\n\n\n<p>The example can be broken into three distinct steps:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li>Declare the <strong>@pscript<\/strong> variable, define the Python script, and assign the script to the variable.<\/li>\n\n\n\n<li>Declare the <strong>@sqlscript<\/strong> variable, define the T-SQL query, and assign the query to the variable.<\/li>\n\n\n\n<li>Run the <strong>sp_execute_external_script<\/strong> stored procedure, passing in the <strong>@pscript<\/strong> and <strong>@sqlscript<\/strong> variables. You don\u2019t have to pass the Python script and T-SQL query as variables, but it makes the code more readable.<\/li>\n<\/ul>\n<\/div>\n\n\n<p>Begin with the first step. After declaring the <strong>@pscript<\/strong> variable, you create the Python script, which starts by assigning the output from the T-SQL query to the <strong>df<\/strong> variable:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">  df = InputDataSet\n<\/pre>\n\n\n\n<p>By default, the data returned by the T-SQL query can be accessed within the Python script by calling the <strong>InputDataSet<\/strong> variable. In this way, you can assign the SQL Server dataset to the <strong>df<\/strong> variable and then work with that dataset as necessary. As you\u2019ll see later in the article, you can also specify another name to replace the <strong>InputDataSet<\/strong> variable, but for now, stick with the default.<\/p>\n\n\n\n<p>Next, assign the <strong>df<\/strong> dataset to the <strong>OutputDataSet<\/strong> variable:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">  OutputDataSet = df\n<\/pre>\n\n\n\n<p>The <strong>OutputDataSet<\/strong> variable is the default variable used to return the script\u2019s data to the calling application. As with the <strong>InputDataSet<\/strong> variable, you can change the name, but for now, use the default.<\/p>\n\n\n\n<p>In this case, the script does nothing more than return the same data that you\u2019re retrieving from SQL Server, something you could have done without using a Python script. In fact, you do not even have to include the <strong>df<\/strong> variable and can instead simply assign the input variable to the output variable:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">  OutputDataSet = InputDataSet\n<\/pre>\n\n\n\n<p>I\u2019ve included the <strong>df<\/strong> variable here to help demonstrate how to begin working with SQL Server data within a Python script. More often than not, you\u2019ll want to manipulate and analyze the SQL Server data. Assigning the source data to a separate variable allows you to modify the source data, while still maintaining the original dataset. It\u2019s up to you to decide the best way to move data from one object to the next, based on your specific circumstances.<\/p>\n\n\n\n<p>Next, define the T-SQL query and assign it to the <strong>@sqlscript<\/strong> variable. In this case, the query is just a simple join, but you can make it as complex as it needs to be. One item worth noting, however, is that the code converts the <strong>Sales<\/strong> column to the <strong>FLOAT<\/strong> data type. This is because Python supports a limited number of types when compared to SQL Server, and the Python engine can implicitly convert only some of the SQL Server types.<\/p>\n\n\n\n<p>For example, the <strong>Territories<\/strong> column is configured with the <strong>NVARCHAR<\/strong> data type. When the Python script runs, the Python engine implicitly converts the column to the Python <strong>str<\/strong> type. However, the <strong>Sales<\/strong> column is configured with the <strong>MONEY<\/strong> type. The Python engine cannot implicitly convert this type, so you must explicitly convert it to one that the engine can handle. In this case, you\u2019re converting the <strong>Sales<\/strong> column to <strong>FLOAT<\/strong>, which the Python engine can then convert to the Python <strong>float64<\/strong> type. (For more information on the SQL Server types that Python supports, see the Microsoft document <a href=\"https:\/\/docs.microsoft.com\/en-us\/sql\/advanced-analytics\/python\/python-libraries-and-data-types\">Python Libraries and Data Types<\/a>.)<\/p>\n\n\n\n<p>After defining the Python script and T-SQL query, you can call the <strong>sp_execute_external_script<\/strong> stored procedure, passing in the necessary parameter values. For the <strong>@language<\/strong> parameter, specify <strong>Python<\/strong>. If you were working with an R script, you would pass in <strong>R<\/strong> as the value. Currently, those are the only two values that the parameter supports.<\/p>\n\n\n\n<p>For the <strong>@script<\/strong> parameter, specify the <strong>@pscript<\/strong> variable, and for the <strong>@input_data_1<\/strong> parameter, specify the <strong>@sqlscript<\/strong> variable. The <strong>@input_data_1<\/strong> parameter is optional. You do not have to include SQL Server data in the Python script, but if you\u2019re going to be running Python within the SQL Server environment, it seems likely you\u2019ll want to use SQL Server data.<\/p>\n\n\n\n<p>With the parameters in place, you can call the stored procedure. When you do, the Python engine runs the Python script, using the SQL Server data, and returns the dataset to the calling application. The following figure shows the first chunk of rows returned by the script.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"255\" height=\"243\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/11\/20screenshots1-stpython1_fig01-png-1.png\" alt=\"%20screenshots1\/stPython1_fig01.png\" class=\"wp-image-76189\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>As you can see, the Python script returns a basic result set with the data organized into columns and rows, just like the source data. You might have noticed, however, that the returned data includes no column names. For that, you need to specifically assign names to the returned data.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-assigning-names-to-python-script-objects\">Assigning names to Python script objects<\/h2>\n\n\n\n<p>When using the <strong>sp_execute_external_script<\/strong> stored procedure to run Python scripts, you can specify names for the default variables assigned to the input and output datasets. You can also specify column names for the outputted dataset. The following procedure call assigns the name <strong>RawData<\/strong> to the input variable, the name <strong>SumData<\/strong> to the output variable, and the names <strong>Territories<\/strong> and <strong>Sales<\/strong> to the output columns:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">DECLARE @pscript NVARCHAR(MAX);\nSET @pscript = N'\n# assign SQL Server dataset to df\ndf = RawData\n\n# return df dataset\nSumData = df';\n\nDECLARE @sqlscript NVARCHAR(MAX);\nSET @sqlscript = N'\n  SELECT t.Name AS Territories, CAST(h.Subtotal AS FLOAT) AS Sales\n  FROM Sales.SalesOrderHeader h INNER JOIN Sales.SalesTerritory t\n      ON h.TerritoryID = t.TerritoryID;';\n\nEXEC sp_execute_external_script\n  @language = N'Python',\n  @script = @pscript,\n  @input_data_1 = @sqlscript,\n  @input_data_1_name = N'RawData', --input variable name \n  @output_data_1_name = N'SumData' --output variable name\n  WITH RESULT SETS(\n    (Territories NVARCHAR(50), Sales MONEY)); --output column names \nGO\n<\/pre>\n\n\n\n<p>To rename the input variable, you must include the <strong>@input_data_1_name<\/strong> parameter when calling the <strong>sp_execute_external_script<\/strong> stored procedure, assigning the new name to that parameter. You must then use the name within the Python script when referencing the input dataset:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">  df = RawData\n<\/pre>\n\n\n\n<p>To rename the output variable, you must include the <strong>@output_data_1_name<\/strong> parameter when calling the stored procedure, assigning the new name to that parameter. You must then use that name within the Python script when referencing the output dataset:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">  SumData = df\n<\/pre>\n\n\n\n<p>The next step is to assign names to the columns. To do this, include the <strong>WITH<\/strong> <strong>RESULT<\/strong> <strong>SETS<\/strong> clause when calling the stored procedure, specifying the column names and their data types. The list of column names must be enclosed in double parentheses with the names separated by commas. SQL Server will then assign those names to the output dataset in the order they\u2019re specified, as shown in the following results.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"197\" height=\"246\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/11\/20screenshots1-stpython1_fig02-png-1.png\" alt=\"%20screenshots1\/stPython1_fig02.png\" class=\"wp-image-76190\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>When assigning column names, keep in mind that the database engine must be able to implicitly convert the returned data to that type. For example, in the <strong>WITH<\/strong> <strong>RESULT<\/strong> <strong>SETS<\/strong> clause above, you assign the <strong>Territories<\/strong> name to the first column, specifying the <strong>NVARCHAR(50)<\/strong> data type. If you had instead specified <strong>CHAR(5)<\/strong>, many of the column values would have been truncated. If you had specified <strong>INT <\/strong>for<strong> Sales<\/strong>, the database engine would have returned an error.<\/p>\n\n\n\n<p>With regard to the <strong>Sales<\/strong> column, SQL Server easily converts the Python <strong>float64<\/strong> type to the SQL Server <strong>MONEY<\/strong> type without changing the data. You can get away with this because the values are already limited to four decimal places when you import the data into the Python script. If there had been a greater number of decimal places in the data returned by the Python script, the values would have been rounded when they were converted. Often this will not be a problem, depending on the level of precision that\u2019s required, but it points to the fact that, whenever data is implicitly converted, you need to be aware of its potential impact.<\/p>\n\n\n\n<p>When it comes to the output dataset, it\u2019s up to you whether to specify column names. You\u2019ll likely want to decide on a case-by-case basis, depending on your application needs.<\/p>\n\n\n\n<p>The same goes for renaming the input and output variables. There might be times when renaming them can be useful for clarity, especially if you\u2019re importing additional datasets. Often, however, it\u2019s just a matter of preference. Personally, I prefer not to rename the input and output variables if there\u2019s no overriding reason to do so. I find it easier to review code that uses the default names.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-manipulating-the-python-data-frame\">Manipulating the Python data frame<\/h2>\n\n\n\n<p>If you\u2019ve worked with the R language in SQL Server, you\u2019re no doubt familiar with the concept of the data frame, a two-dimensional data structure similar to a SQL Server table. Data frames make it easier to manipulate and analyze sets of data within your scripts. Python data frames operate much the same way as they do in the R language and are essential to working with the data within the context of SQL Server.<\/p>\n\n\n\n<p>The <strong>InputDataSet<\/strong> and <strong>OutputDataSet<\/strong> variables both store data as data frames. This is true even if you rename the variables. The variables are both <strong>DataFrame<\/strong> objects, based on the <strong>DataFrame<\/strong> class. The SQL Server data imported into a Python script is converted to a <strong>DataFrame<\/strong> object, and the data returned by a Python script is passed to the output variable as a <strong>DataFrame<\/strong> object. And much of what you do with that data in between often involves <strong>DataFrame<\/strong> objects.<\/p>\n\n\n\n<p>A simple way to confirm that you\u2019re working with data frames is to modify the first example to return only the data type of the <strong>df<\/strong> variable:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">DECLARE @pscript NVARCHAR(MAX);\nSET @pscript = N'\n# assign SQL Server dataset to df\ndf = InputDataSet\n\n# retrieve type of df object\nprint(type(df))';\n\nDECLARE @sqlscript NVARCHAR(MAX);\nSET @sqlscript = N'\n  SELECT t.Name AS Territories, CAST(h.Subtotal AS FLOAT) AS Sales\n  FROM Sales.SalesOrderHeader h INNER JOIN Sales.SalesTerritory t\n    ON h.TerritoryID = t.TerritoryID;';\n\nEXEC sp_execute_external_script\n  @language = N'Python',\n  @script = @pscript,\n  @input_data_1 = @sqlscript;    \nGO\n<\/pre>\n\n\n\n<p>To get the data type, you call the <strong>type<\/strong> function, specifying the <strong>df<\/strong> variable, and then call the <strong>print<\/strong> function to return the data type to the application. When I ran this script in SQL Server Management Studio, I received the following message:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">STDOUT message(s) from external script: \n&lt;class 'pandas.core.frame.DataFrame'&gt;\n<\/pre>\n\n\n\n<p>As you can see, the <strong>df<\/strong> variable is a <strong>DataFrame<\/strong> object, which it inherited from the <strong>InputDataSet<\/strong> variable. Notice, however, that the namespace is actually <strong>pandas.core.frame.DataFrame<\/strong>. This is because the <strong>DataFrame<\/strong> class is part of the Pandas library which is included in the MLS Python installation. We\u2019ll be covering the Pandas library in more detail in subsequent articles. Until then, you can learn more about Pandas at <a href=\"https:\/\/pandas.pydata.org\/\">https:\/\/pandas.pydata.org\/<\/a>.<\/p>\n\n\n\n<p>Now try to do something with the data you\u2019re importing from SQL Server. You\u2019ll start by aggregating the <strong>Sales<\/strong> values to determine the total amount of sales per territory:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">DECLARE @pscript NVARCHAR(MAX);\nSET @pscript = N'\n# calculate sales totals for each territory\ndf1 = InputDataSet\ndf2 = df1.groupby(\"Territories\", as_index=False).sum()\n\n# return aggregated dataset\nOutputDataSet = df2';\n\nDECLARE @sqlscript NVARCHAR(MAX);\nSET @sqlscript = N'\n  SELECT t.Name AS Territories, CAST(h.Subtotal AS FLOAT) AS Sales\n  FROM Sales.SalesOrderHeader h INNER JOIN Sales.SalesTerritory t\n    ON h.TerritoryID = t.TerritoryID;';\n\nEXEC sp_execute_external_script\n  @language = N'Python',\n  @script = @pscript,\n  @input_data_1 = @sqlscript\n  WITH RESULT SETS(\n    (Territories NVARCHAR(50), TotalSales MONEY));    \nGO\n<\/pre>\n\n\n\n<p>To aggregate the data in the <strong>df1<\/strong> data frame, call the object\u2019s <strong>groupby<\/strong> function which is a member of the Pandas <strong>DataFrame<\/strong> class:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">  df2 = df1.groupby(\"Territories\", as_index=False).sum()\n<\/pre>\n\n\n\n<p>When calling the <strong>groupby<\/strong> function, specify the <strong>df1<\/strong> variable, followed by a period and then the function name. Although the function can take several arguments, this example includes only two. The first is the <strong>Territories<\/strong> column which indicates that the data will be grouped based on values in that column. The second argument is the <strong>as_index<\/strong> option which is set to <strong>False<\/strong>. This tells the Python engine to return the <strong>Territories<\/strong> values, ensuring that the unique <strong>Territories<\/strong> names are included in the results along with the aggregated sales data.<\/p>\n\n\n\n<p>After specifying the <strong>groupby<\/strong> arguments, tag on the <strong>sum<\/strong> function to indicate that the <strong>Sales<\/strong> values should be added together for each group. You can use any of the supported aggregate functions, such as <strong>mean<\/strong>, <strong>max<\/strong>, <strong>min<\/strong>, <strong>first<\/strong>, or <strong>last<\/strong>.<\/p>\n\n\n\n<p>Because the <strong>Sales<\/strong> column is the only column in the dataset other than <strong>Territories<\/strong>, you do not need to do anything else with the function itself. Had the dataset contained additional columns that you did not want to include, you would have had to specify the <strong>Sales<\/strong> column when calling the <strong>groupby<\/strong> function.<\/p>\n\n\n\n<p>Next, assign the aggregated results to the <strong>df2<\/strong> variable and then assign that variable to the <strong>OutputDataSet<\/strong> variable, giving the following results.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"241\" height=\"240\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/11\/20screenshots1-stpython1_fig04-png-1.png\" alt=\"%20screenshots1\/stPython1_fig04.png\" class=\"wp-image-76191\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>The next step is to add a calculated column to the <strong>df2<\/strong> data frame that rates a territory\u2019s performance based on the aggregated <strong>Sales<\/strong> values. To do so, modify the data frame by defining the range of each predefined value:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">DECLARE @pscript NVARCHAR(MAX);\nSET @pscript = N'\n# calculate sales totals for each territory\ndf1 = InputDataSet\ndf2 = df1.groupby(\"Territories\", as_index=False).sum()\n\n# add Performance column\ndf2[\"Performance\"] = \"Acceptable\"\ndf2[\"Performance\"][df2[\"Sales\"] &lt; 7000000] = \"Poor\"\ndf2[\"Performance\"][df2[\"Sales\"] &gt; 10000000] = \"Excellent\"\n\n# return aggregated dataset\nOutputDataSet = df2';\n\nDECLARE @sqlscript NVARCHAR(MAX);\nSET @sqlscript = N'\n  SELECT t.Name AS Territories, CAST(h.Subtotal AS FLOAT) AS Sales\n  FROM Sales.SalesOrderHeader h INNER JOIN Sales.SalesTerritory t\n    ON h.TerritoryID = t.TerritoryID;';\n\nEXEC sp_execute_external_script\n  @language = N'Python',\n  @script = @pscript,\n  @input_data_1 = @sqlscript\n  WITH RESULT SETS(\n    (Territories NVARCHAR(50), TotalSales MONEY, Performance NVARCHAR(20)));    \nGO\n<\/pre>\n\n\n\n<p>Start by defining a new column named <strong>Performance<\/strong> and assigning the default value <strong>Acceptable<\/strong> to the column:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">  df2[\"Performance\"] = \"Acceptable\"\n<\/pre>\n\n\n\n<p>To add the column, you need only specify the <strong>df2<\/strong> variable, followed by brackets that include the name of the new column in quotes. You then use an equal operator (<strong>=<\/strong>) to set the column\u2019s default value to <strong>Acceptable<\/strong>.<\/p>\n\n\n\n<p>Next, modify the column definition to include a second possible value, using a conditional expression to define a data range for when to apply that value:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">  df2[\"Performance\"][df2[\"Sales\"] &lt; 7000000] = \"Poor\"\n<\/pre>\n\n\n\n<p>Again, start by specifying the <strong>df2<\/strong> variable and the name of the column in brackets. Next, define the conditional expression, enclosed in a second set of brackets. The expression states that if the <strong>Sales<\/strong> value is less than 7 million, the <strong>Performance<\/strong> value should be set to <strong>Poor<\/strong>. You then repeat this process to add a third value, only this time the conditional expression indicates that, if the <strong>Sales<\/strong> value exceeds 10 million, the <strong>Performance<\/strong> value should be set to <strong>Excellent<\/strong>:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">  df2[\"Performance\"][df2[\"Sales\"] &gt; 10000000] = \"Excellent\"\n<\/pre>\n\n\n\n<p>That\u2019s all there is to adding a calculated column to the data frame. The Python script now returns the results shown in the following figure.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"309\" height=\"247\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/11\/20screenshots1-stpython1_fig05-png-1.png\" alt=\"%20screenshots1\/stPython1_fig05.png\" class=\"wp-image-76192\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Of course, you could have achieved the same results with T-SQL alone, but even such a simple example helps demonstrate some of the ways that you can manipulate data to achieve various results. In practice, your Python scripts will likely be much more complex if you are to perform meaningful analytics.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-running-a-python-script-within-a-stored-procedure\">Running a Python script within a stored procedure<\/h2>\n\n\n\n<p>Often when you\u2019re working with Python, you\u2019ll want to call the <strong>sp_execute_external_script<\/strong> stored procedure from within a user-defined stored procedure that can then be evoked from a calling application. To do so, simply embed the T-SQL and Python code and <strong>sp_execute_external_script<\/strong> procedure call in the procedure definition, as shown in the following example:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">DROP PROCEDURE IF EXISTS dbo.GetSalesTotals;\nGO\n\nCREATE PROCEDURE dbo.GetSalesTotals\nAS\n  SET NOCOUNT ON;\n\n  DECLARE @pscript NVARCHAR(MAX);\n  SET @pscript = N'\n# calculate sales totals for each territory\ndf1 = InputDataSet\ndf2 = df1.groupby(\"Territories\", as_index=False).sum()\n\n# add Performance column\ndf2[\"Performance\"] = \"Acceptable\"\ndf2[\"Performance\"][df2[\"Sales\"] &lt; 7000000] = \"Poor\"\ndf2[\"Performance\"][df2[\"Sales\"] &gt; 10000000] = \"Excellent\"\n\n# return aggregated dataset\nOutputDataSet = df2';\n\n  DECLARE @sqlscript NVARCHAR(MAX);\n  SET @sqlscript = N'\n    SELECT t.Name AS Territories, CAST(h.Subtotal AS FLOAT) AS Sales\n    FROM Sales.SalesOrderHeader h INNER JOIN Sales.SalesTerritory t\n      ON h.TerritoryID = t.TerritoryID;';\n\n  EXEC sp_execute_external_script\n    @language = N'Python',\n    @script = @pscript,\n    @input_data_1 = @sqlscript\n    WITH RESULT SETS(\n      (Territories NVARCHAR(50), TotalSales MONEY, Performance NVARCHAR(20)));  \nGO\n<\/pre>\n\n\n\n<p>There\u2019s no magic here, just a basic definition that creates the <strong>GetSalesTotals<\/strong> procedure which calls the <strong>sp_execute_external_script<\/strong> procedure and passes in the Python script and T-SQL query as we\u2019ve seen in previous examples. You can then use an <strong>EXECUTE<\/strong> statement to run the procedure:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">  EXEC dbo.GetSalesTotals;\n<\/pre>\n\n\n\n<p>The procedure will return the same results as the previous example, only now you\u2019re able to persist the logic. You can also define a stored procedure to include input parameters whose values can then be consumed by the Python script:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">DROP PROCEDURE IF EXISTS dbo.GetSalesTotals;\nGO\n\nCREATE PROCEDURE dbo.GetSalesTotals\n  (@low INT, @high INT) --add procedure parameters\nAS\n  SET NOCOUNT ON;\n\n  DECLARE @pscript NVARCHAR(MAX);\n  SET @pscript = N'\n# calculate sales totals for each territory\ndf1 = InputDataSet\ndf2 = df1.groupby(\"Territories\", as_index=False).sum()\n\n# add Performance column\ndf2[\"Performance \"] = \"Acceptable\"\ndf2[\"Performance \"][df2[\"Sales\"] &lt; LowSales] = \"Poor\"\ndf2[\"Performance \"][df2[\"Sales\"] &gt; HighSales] = \"Excellent\"\n\n# return aggregated dataset\nOutputDataSet = df2';\n\n  DECLARE @sqlscript NVARCHAR(MAX);\n  SET @sqlscript = N'\n    SELECT t.Name AS Territories, CAST(h.Subtotal AS FLOAT) AS Sales\n    FROM Sales.SalesOrderHeader h INNER JOIN Sales.SalesTerritory t\n      ON h.TerritoryID = t.TerritoryID;';\n\n  EXEC sp_execute_external_script\n    @language = N'Python',\n    @script = @pscript,\n    @input_data_1 = @sqlscript,\n    @params = N'@LowSales FLOAT, @HighSales FLOAT', --define Python variables\n    @LowSales = @low, --assign procedure parameter to Python variable\n    @HighSales = @high --assign procedure parameter to Python variable\n    WITH RESULT SETS(\n      (Territories NVARCHAR(50), TotalSales MONEY, Ratings NVARCHAR(20)));  \nGO\n<\/pre>\n\n\n\n<p>This time, when defining the stored procedure, include two input parameters, <strong>@low<\/strong> and <strong>@high<\/strong>, both defined with the <strong>INT<\/strong> data type. The first parameter will set the low range for the calculated column, and the second parameter will set the high range.<\/p>\n\n\n\n<p>You then need to tie these parameters to the Python script. To do so, include three additional parameters when calling the <strong>sp_execute_external_script<\/strong> stored procedure. The first is the <strong>@params<\/strong> parameter, which specifies the <strong>@LowSales<\/strong> and <strong>@HighSales<\/strong> parameters. These are used to pass values into the Python script. Both parameters are defined with the <strong>FLOAT<\/strong> data type.<\/p>\n\n\n\n<p>You must then assign the <strong>@low<\/strong> and <strong>@high<\/strong> parameters to the <strong>@LowSales<\/strong> and <strong>@HighSales<\/strong> parameters, respectively:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">@LowSales = @low,\n@HighSales = @high\n<\/pre>\n\n\n\n<p>You can then reference the <strong>@LowSales<\/strong> and <strong>@HighSales<\/strong> parameters inside the Python script using the names <strong>LowSales<\/strong> and <strong>HighSales<\/strong>:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">df2[\"Ratings\"][df2[\"Sales\"] &lt; LowSales] = \"Poor\"\ndf2[\"Ratings\"][df2[\"Sales\"] &gt; HighSales] = \"Excellent\"\n<\/pre>\n\n\n\n<p>Now when you call the stored procedure, you can pass in the low and high values used to define the ranges in the <strong>Performance<\/strong> column:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">  EXEC dbo.GetSalesTotals @low = 7000000, @high = 10000000;\n<\/pre>\n\n\n\n<p>In this case, you\u2019re specifying the same values that you used in the previous example, so you will receive that same results. However, you can also specify different values:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">  EXEC dbo.GetSalesTotals @low = 5000000, @high = 20000000;\n<\/pre>\n\n\n\n<p>The stored procedure now returns the results shown in the following figure which indicate that only one territory is performing poorly and only one above average.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"308\" height=\"244\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/11\/20screenshots1-stpython1_fig06-png-1.png\" alt=\"%20screenshots1\/stPython1_fig06.png\" class=\"wp-image-76193\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Although this is a fairly trivial example, it demonstrates how you can dynamically interact with the Python script by passing in values when calling stored procedures in order to better control the results.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-getting-to-know-python-and-machine-learning-services\">Getting to know Python and Machine Learning Services<\/h2>\n\n\n\n<p>Clearly, we\u2019ve barely skimmed the surface of all we can do with Python in SQL Server. Python is a flexible and extensive language that provides a wide range of options for manipulating and analyzing data. Best of all, being able to run Python within SQL Server makes it possible to analyze a database\u2019s data directly within the context of that database.<\/p>\n\n\n\n<p>As we progress through this series, we\u2019ll dig more into the Python language and the various ways we can use it to manipulate data and perform analytics. In the meantime, this article should provide you with a good foundation for getting started with Python in SQL Server. We\u2019ll be building on this foundation going forward. I also encourage you to dig into Python yourself and play with it in the context of SQL Server. The better you understand that language, the more powerful a tool you\u2019ll have for working with SQL Server data.<\/p>\n\n\n\n<section id=\"faq\" class=\"faq-block my-5xl\">\n    <h2>FAQs: SQL Server Machine Learning Services \u2013 Part 1: Python Basics<\/h2>\n\n                        <h3 class=\"mt-4xl\">1. What is SQL Server Machine Learning Services?<\/h3>\n            <div class=\"faq-answer\">\n                <p>SQL Server Machine Learning Services is a feature that runs Python and R scripts in-process within SQL Server, using the sp_execute_external_script stored procedure. It allows data scientists and developers to run analytics, machine learning models, and data transformation scripts directly against SQL Server data without exporting the data first. Available from SQL Server 2016 (R only) and SQL Server 2017+ (Python and R). Requires the Machine Learning Services feature to be selected during SQL Server installation and enabled via sp_configure.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">2. How do I enable Python in SQL Server Machine Learning Services?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Run sp_configure &#8216;external scripts enabled&#8217;, 1; RECONFIGURE WITH OVERRIDE; on the SQL Server instance. Then restart the SQL Server service. Verify with: EXEC sp_execute_external_script @language = N&#8217;Python&#8217;, @script = N&#8217;print(1)&#8217;; &#8211; if it returns without error, Python is active. If the feature was not installed during SQL Server setup, run the SQL Server installer and add Machine Learning Services (In-Database) with the Python option selected.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">3. How do I pass SQL Server data to a Python script?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Use the @input_data_1 parameter of sp_execute_external_script with a SELECT query: EXEC sp_execute_external_script @language = N&#8217;Python&#8217;, @script = N&#8217;print(InputDataSet.describe())&#8217;, @input_data_1 = N&#8217;SELECT TOP 10 * FROM Sales.Orders&#8217;; By default, the input dataset is available as InputDataSet (a pandas DataFrame) in the Python script. The output dataset is OutputDataSet. Rename these with the @input_data_1_name and @output_data_1_name parameters to make the code self-documenting.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">4. Can I use pandas DataFrames in SQL Server Python scripts?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Yes. SQL Server Machine Learning Services includes pandas by default. The input dataset (@input_data_1) is automatically provided as a pandas DataFrame to your script. You can manipulate it with any pandas operation &#8211; filtering, grouping, merging, adding columns &#8211; and then assign the result to OutputDataSet (a pandas DataFrame) which is returned as a SQL result set. For packages not included by default, use sqlmlutils to install additional Python packages to the SQL Server Python environment.<\/p>\n            <\/div>\n            <\/section>\n","protected":false},"excerpt":{"rendered":"<p>Run Python scripts in SQL Server using sp_execute_external_script in Machine Learning Services. Part 1 covers enabling MLS, naming input\/output variables, manipulating pandas data frames, and calling Python from stored procedures.&hellip;<\/p>\n","protected":false},"author":221841,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":true,"footnotes":""},"categories":[143528,146042],"tags":[5134],"coauthors":[6779],"class_list":["post-76167","post","type-post","status-publish","format-standard","hentry","category-bi-sql-server","category-python","tag-sql-prompt"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/76167","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/221841"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=76167"}],"version-history":[{"count":10,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/76167\/revisions"}],"predecessor-version":[{"id":109858,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/76167\/revisions\/109858"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=76167"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=76167"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=76167"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=76167"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}