The importance of bringing Python to SQL Server

Guest post

This is a guest post from Coeo. Europe's most trusted analytics and data management expert, Coeo is the number one provider of database strategy in the Retail, Financial services and Gaming industries, and delivers technology strategy and support for businesses who need to get the most from their data.

The Coeo team hold more Microsoft certifications than any other data platform specialist in Europe and are passionate about sharing their knowledge and expertise to help customers become industry leaders.

Python and SQL ServerWhen SQL Server 2017 launched, Microsoft proudly announced that it was the first relational database engine to ship with built-in artificial intelligence (AI) capabilities.

In the future, database servers running its software won’t just be able to store, query, and analyse data. Instead, they’ll also be able to perform machine learning, natural language, and neural network processing – the foundations of modern AI. To make this happen, Microsoft has built support for the Python programming language into its database engine. But why Python and SQL Server? What is Python? And how does this compare to the R integration in SQL Server 2016?

What is Python?

Python is a general-purpose programming language, named after Monty Python. It favors code readability over formality, can be extended to do almost anything using libraries of additional functionality, and is popular with a very broad range of audiences. In case it risks sounding too lightweight to be powerful, history says otherwise – the Reddit.com web site is written using it and Google created a Python library that scales data science calculations to concurrently use thousands of processors. What makes it stand out from the crowd right now is the quantity and quality of AI related libraries for it. Very complex Artificial Intelligence functionality, such as machine learning, is available in relatively easy to consume add-ons, such as those listed here.

Why integrate Python and SQL Server?

One of the most likely reasons why Microsoft has now integrated Python into SQL Server is because it’s increasingly an application developer’s preferred language for performing AI operations. It’s broad language capabilities make it much easier for them to embed data analytics, machine learning and AI functionality into an existing application or database’s code – and then do something with the output, such as reformat it for further use.

This is a significant difference compared to other analytical languages, such as R, which are primarily used by data scientists to discover and analyse data in their own time, rather than perform near-real-time operations within an application. That’s not to say they couldn’t do that because they can, but it doesn’t change the fact that they’re more suited to being used for offline analysis rather than playing a part in application activity.

If anything sums the difference up best, then it’s a saying that the more you use the R language, the more of a PhD data scientist you are, while the more Python you use, the more of an engineer or developer you are. Similar capabilities yet different audiences, which is why Microsoft is now offering access to both.

How has Microsoft integrated it?

Microsoft has used the same integration approach for Python with SQL Server as it did for R. In a nutshell, that means the Python computation processes run outside of the database engine, but are connected to it using a very fast, secure and proprietary in-memory communications channel. This channel is used to send data from the database to the Python processes and return the results back to SQL Server. The Python code users want to execute, just like R code, is passed as a parameter to a T-SQL stored procedure. This means any application that can execute T-SQL can now also execute Python.

Finally, just like it did with R, Microsoft has provided a series of RevoScalePy packages which allow Python operations to use parallel processing and data sets without size limits.

Bringing the compute to the data

Despite historic reservations, bringing the compute to the data is the new model for enterprise-scale data analytics, machine learning and AI. Processor power in most database servers is now plentiful compared to just a few years ago, while the security and performance implications of sending data in and out of a database server are now far more problematic.

Another benefit is that this model allows all of the data in a database to be used as an AI model’s training data, whereas in data sent to the remote compute model, network bandwidth and remote compute resources often limited the amount, and therefore the accuracy, of the training data analyzed.

How soon?

Bringing Python capabilities to SQL Server is almost certainly a long term investment for Microsoft, just like integrating R was. It’ll take time for Python developers to consider how they can use a SQL Server platform while it’ll take time for SQL Server developers to consider how they can use Python. But neither are going away and innovation trends show us AI is being used in more and more everyday situations. It’s worth then remembering the capability is there.

Further reading

If you’re new to Python, this informative blog post from Letzgro is a great introduction to why and how to use Python for AI.

If you’d like to compare the capabilities of Python and R and appreciate how they’re intended to be used, EliteDataScience has a perfect summary of both approaches.

Coeo is a Microsoft Gold Partner for Data Platform, Data Analytics and Cloud Platform.
We work collaboratively to improve project outcomes and develop deeper knowledge
and appreciation of Microsoft technologies, improving our customers’ experience.
Find out more about us.