{"id":657,"date":"2009-08-27T00:00:00","date_gmt":"2009-08-27T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/python-for-the-sql-server-dba\/"},"modified":"2022-04-27T21:24:52","modified_gmt":"2022-04-27T21:24:52","slug":"python-for-the-sql-server-dba","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/database-administration-sql-server\/python-for-the-sql-server-dba\/","title":{"rendered":"Python for the SQL Server DBA"},"content":{"rendered":"<div id=\"PRETTY\">\n<h2>What is Python?<\/h2>\n<p>Python is a dynamic, interpreted language which was originally designed by Guido van Rossum with code readability as one of its primary goals.\u00a0 As part of that, one its most unusual features is the use of meaningful white space. Python uses code blocks that are shown and enforced through indentation.\u00a0 Although some of its extensions and libraries are specific to one operating system, the core language is available for all major operating systems and most Python scripts will run unaltered from one system to another.<\/p>\n<h2>Why would a SQL Server DBA be interested in Python?<\/h2>\n<p>The Microsoft T-SQL programming language that is built into SQL Server along with the capabilities of included tools such as SSIS and SSRS form a remarkably versatile platform without any third party tools whatsoever.\u00a0 In fact, T-SQL itself is Turing complete.<\/p>\n<p>However, they are not the best tool for all tasks.\u00a0 For instance, SQL Server by itself does not have full support for regular expressions. While it is fully capable of interacting with the file system there are generally better tools for any complex manipulation of the operating system.\u00a0 Pure SQL is also not the strongest option for complex mathematics or complicated file munging.<\/p>\n<p>Python is one tool capable of handling those tasks.\u00a0 It is certainly not the only option, with Microsoft C# and PowerShell also being excellent options for some of these tasks.\u00a0 Perl and Lisp are amongst many other options, each with their own strong points.\u00a0\u00a0<\/p>\n<p>Python has several strong points which make it worthy of attention.\u00a0 It is free and open source software, but far more significantly it is easy to learn and easy to read even for people who are not Python programmers.\u00a0 It is capable of being used to write full scale applications and server software, but for most DBAs it is more significant that it is excellent for jotting off quick utility scripts to solve a specific problem.\u00a0 For instance, it is excellent for writing scripts to reorganize files into a new directory structure and\u00a0 automatically verify that all the automated backup files actually are where they are supposed to be each morning.\u00a0<\/p>\n<h2>How Python can be used with SQL Server<\/h2>\n<p>\u00a0Python can be used to parse data that is not in a standard format so it can be imported into SQL Server.<\/p>\n<p class=\"footnote\">This is one of my primary uses for it for in connection with SQL. \u00a0I have received numerous files of data that was either structured in a nonstandard format or completely unstructured but with data such as e-mail addresses that need to be pulled out en mass.\u00a0 While it is possible to do these in SSIS or even pull it into a SQL Temp Table line by line and then parse it using pure T-SQL it is generally easier and faster to use Python or occasionally C# depending on the file type.\u00a0<\/p>\n<p>For instance, a file that is structured like this:<\/p>\n<pre>a, a,\r\nb, b, b,\r\nc, c, c, c\r\nd, d, d\r\ne\r\n<\/pre>\n<p>&#8230; does not work easily with tools such as bulk insert or SSIS. [2] However Python can be used to easily determine the needed number of commas and add them to all lines, such as:<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">#Standardizes the number of commas per line prior to being read by \r\n#programs such as BCP and required for compliance with RFC4180 for CSV files\r\n#initialize the variables, change FileNames as appropriate\r\nInFileName = r'C:\\test\\testimport.csv'\r\nOutFileName = r'C:\\test\\testimport2.csv'\r\nNumCommas = 0 \r\n\r\nFile = open(InFileName)\r\n\r\nfor line in File:\r\n&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if line.count(',') &gt; NumCommas:\r\n&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;NumCommas = line.count(',')\r\n\r\n#return to the start of the file\r\nFile.seek(0)\r\n\r\nOutFile = open(OutFileName, 'w')\r\nfor line in File:\r\n&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;OutFile.write(line.rstrip() + ',' * (NumCommas - line.count(',')) + '\\n')\r\n\r\nOutFile.close()\r\nFile.close()\r\n\r\n<\/pre>\n<p>Afterwards the output file is quite amenable to Bulk Insert.<\/p>\n<p>There are several choices to actually connect with SQL Server within python.\u00a0 The options include the default odbc which comes as a standard library, the win32com client tools, <a href=\"http:\/\/www.egenix.com\/products\/python\/mxODBC\/\">mxODBC<\/a> (commercial product) and <a href=\"https:\/\/github.com\/mkleehammer\/pyodbc\">pyODBC<\/a>. While the odbc module will handle almost any requirement, the other options add additional features which can simplify development.<\/p>\n<p>Pyodbc, for example, permits access of values by name and makes cursors iterable.\u00a0 These changes from the default odbc package make it much easier to use.\u00a0 As an example,\u00a0 pyodbc can be used to connect to a SQL Server and then extract all e-mail addresses from a free form memo field.<\/p>\n<pre class=\"theme:vs2012 lang:python decode:true\">import pyodbc as p\r\nimport re #RegEx library\r\n\r\nserver = 'TWISEMAN-PC'\r\ndatabase = 'test'\r\ntable = 'salesData'\r\nfield = 'memo'\r\n\r\nconnStr = ( r'DRIVER={SQL Server};SERVER=' +\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0server + ';DATABASE=' + database + ';' +\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0'Trusted_Connection=yes'\u00a0\u00a0\u00a0\u00a0)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\r\n#e-mail regex examples available at\r\n#http:\/\/www.regular-expressions.info\/email.html\r\npattern = r'\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b'\r\nemailMatcher = re.compile(pattern, re.I)\r\nemails = [] #create a list\r\n\r\nconn = p.connect(connStr)\r\ndbCursor = conn.cursor()\r\nsql = ('select ' + field + ' as memo ' \r\n\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0' from '+ table+\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0' where ' + field + ' like \\'%@%\\'') \r\ndbCursor = conn.cursor()\r\ndbCursor.execute(sql)\r\nfor row in dbCursor:\r\n\u00a0\u00a0\u00a0\u00a0newEmails = emailMatcher.finditer(row.memo)\r\n\u00a0\u00a0\u00a0\u00a0if newEmails is not None:\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0emails.extend([addr.group() for addr in newEmails])\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\r\nconn.close()\r\nprint(\u00a0emails)<\/pre>\n<p>A similar script can scan an unstructured text document to extract e-mail addresses and insert them into a SQL table.\u00a0 Of course, it is often more useful to be able to call stored procedures.\u00a0 This can be done with pyodbc by sending the exec command for the procedure with all parameters through the execute method.\u00a0 Many of the libraries automatically begin transactions, so those most be committed or rolled back in the code, though some such as pyodbc permit a connection to be opened in autocommit mode.<\/p>\n<h2>How to acquire Python<\/h2>\n<aside>\n<p><b>Python.Org (not Python.Com)<\/b><\/p>\n<p>Python.Com has no affiliation with the PSF\u00a0<\/p>\n<\/aside>\n<p>Python is free and open source software maintained by the Python Software Foundation (PSF).\u00a0 The latest version can be downloaded directly from the PSF at <a href=\"http:\/\/www.python.org\/\">Python.ORG Python Programming Language &#8212; Official Website<\/a><\/p>\n<p>For many people that is not the best option though.\u00a0 For one thing, depending on what libraries and compatibility with other older code is needed, it may be wise to use a slightly more established version than the\u00a0 latest release.\u00a0\u00a0\u00a0 Currently, both the 2.7.x and 3.x lines are being maintained.\u00a0 The 3.x line is the more modern version with many enhancements, but not all libraries have been brought up to date and it is not entirely backwards compatible with the 2.7 line.\u00a0Python 3 has many, but since it breaks backwards compatibility, a need to work with an older code base or with libraries that have not been updated may make 2.7 the better choice in some cases.<\/p>\n<aside>\n<p><b>Python(x,y), ActivePython or Enthought<\/b><\/p>\n<p>I personally use WinPython. It is entirely a matter of taste though. WinPython is entirely free and every component in it can be downloaded separately. ActivePython and Enthought come in both a free community and Enterprise versions. Sage is a mathematically focused package including python that runs under VirtualBox on Windows.<\/p>\n<\/aside>\n<p>Also, it may be easier to download a pre-packaged <a href=\"https:\/\/timothyawiseman.wordpress.com\/2016\/07\/01\/python-distributions\/\">distribution<\/a> such as <a href=\"http:\/\/winpython.github.io\/\">WinPython<\/a>, <a href=\"http:\/\/www.activestate.com\/activepython\/\">ActivePython<\/a>, or the commercially supported <a href=\"http:\/\/www.enthought.com\/\">Enthought Canopy<\/a>. They come with many packages and libraries that are not part of the core Python Distribution as well as pre-configured development environments which are more sophisticated than IDLE. Getting one of the distributions can save a lot of time downloading and installing individual components later.<\/p>\n<p>There are many options for Python IDE&#8217;s and editors.\u00a0 Personally, I tend to use <a href=\"https:\/\/github.com\/spyder-ide\/spyder\">Spyder<\/a> when I want to work with Python interactively or write short, simple scripts.\u00a0 It is lightweight and fast, but has some basic editing and introspection features that are convenient.\u00a0 When I am writing a longer or more complicated script\/program I tend to use Eclipse with <a href=\"http:\/\/www.pydev.org\/\">Pydev.<\/a>\u00a0 It is loads slowly but has very sophisticated introspection and project management abilities.\u00a0\u00a0 It is largely a matter of taste.<\/p>\n<h2>How to learn and get help with Python<\/h2>\n<p>There are many excellent resources on learning python and getting help.\u00a0 Once the absolute basics have been mastered, one of the best sources of information is the Doc strings for the entity being called.\u00a0 Many IDE&#8217;s provide user friendly access to the doc strings, but they can always be called by typing:<\/p>\n<pre>print([object_name].__doc__)\"\r\n<\/pre>\n<p>at an interactive prompt.<\/p>\n<p>A few of the more useful resources for learning Python are:<\/p>\n<ul>\n<li><a href=\"http:\/\/www.diveintopython.net\/\">Dive into Python<\/a><\/li>\n<li><a href=\"http:\/\/openbookproject.net\/thinkcs\/python\/english2e\/\">How to Think Like a Computer Scientist<\/a><\/li>\n<li><a href=\"http:\/\/www.greenteapress.com\/thinkstats\/\">Think Stats: Probability and Statistics for Programmers<\/a><\/li>\n<\/ul>\n<p>Also, there are some more generalized programming resources with excellent Python sections, such as <a href=\"http:\/\/stackoverflow.com\/\">Stack Overflow<\/a>.<\/p>\n<h2>Further Reading:<\/h2>\n<ul class=\"reference-list\">\n<li><a href=\"http:\/\/www.simple-talk.com\/sql\/database-administration\/comparing-python-and-powershell-dba-scripting-\/\">Comparing Python and PowerShell DBA Scripting<\/a> &#8211; An excellent article by Ron Dameron comparing Python and Powershell with examples in each.<\/li>\n<li><a href=\"http:\/\/vdevraj.blogspot.com\/2006\/11\/dba-script-exodus-to-python-or-ruby.html\">DBA Script Exodus to Python or Ruby Expected in the Near Term<\/a> &#8211; A broad overview of the value of scripting languages to the DBA.<\/li>\n<\/ul>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Python is increasingly used by DBAs as a general-purpose scripting language, despite the pressure to adopt Microsoft&#8217;s PowerShell.  They find it easy to learn, powerful, and reasonably intuitive. Here Timothy Wiseman, a working DBA, explains the attraction of Python and gives a general introduction to the language, suitable for anyone who has been wondering whether to give it a try.<br \/>\n&hellip;<\/p>\n","protected":false},"author":199288,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143527,146042],"tags":[4170,4635,5021,4150,4151,5022,5020],"coauthors":[6791],"class_list":["post-657","post","type-post","status-publish","format-standard","hentry","category-database-administration-sql-server","category-python","tag-database-administration","tag-powershell","tag-python","tag-sql","tag-sql-server","tag-sql-server-dba","tag-timothy-wiseman"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/657","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/199288"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=657"}],"version-history":[{"count":11,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/657\/revisions"}],"predecessor-version":[{"id":92275,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/657\/revisions\/92275"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=657"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=657"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=657"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=657"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}