Generating Plots Automatically From PowerShell and SQL Server Using Gnuplot

Comments 0

Share to social media

There are many excellent ways of plotting data from SQL Server databases. There are many R libraries for example, there is Reporting Services, Plotly or PowerBI. I’ve even used Excel and JpGraph. If I had to learn just one though, I’d use GnuPlot because it is so versatile. If you haven’t heard of Gnuplot, it is the standard way of plotting scientific data which works via the command-line and uses a domain-specific scripting language (DSL) , and there are two books, along with countless samples, tutorials and articles about using it.

So what is Gnuplot?

GnuPlot, which runs under a free software license, has its origins in the nineteen eighties. You’d have thought that these were primitive times for graphs, but no, they weren’t. If anything, the standard of graphs have deteriorated since then as the tools have become more commoditised. GnuPlot has been cherished and developed mainly for scientific graphs, but it is very versatile. It covers a vast range of graph types and there are a large number of samples and tutorials around. It is a tool for the professional, and isn’t really suitable for fancy graphs in PowerPoint presentations.

GnuPlot is one of the last of the applications that can use a classic vector-based pen plotter as easily as a laser printer. It can draw plots onscreen, even in ASCII. Most of us nowadays want graphs as bitmapped graphics to display within a webpage or embed in a document. It does that as well, but also in vector graphics or as a Pdf. The list is endless. It can even do animated graphs. So many impoverished scientists, with antique output devices, have been accommodated over the years and have blessed Gnuplot’s ability to produce graphs on just about any printer. Python, Java, Ruby, Haskell Fortran 95 and Smalltalk all have libraries for accessing Gnuplot.

Another great advantage of GnuPlot is that allows the separation of script from data. GnuPlot runs a script, and generally the data is read from a data file specified by the script in the plot command. It likes columnar or tab-separated data, which suits SQL Server.

Although GnuPlot is best known for plotting scientific data and mathematical expressions, it is excellent in business, particularly for financial data. Here is an example

The existing teaching material on GnuPlot is so good that it would be silly to attempt to add to it. Instead, I’ll start by showing a basic PowerShell-integration.

Hello Plot

Gnuplot can be installed via Chocolatey, or done the old-fashioned way via Sourceforge.

There is a windows GUI version of Gnuplot, which is handy for creating scripts. With a script you can use the many ‘set’ commands to specify how the image is to be formatted and presented, including such things as the type of output, size of the graph, the fonts and colors to be used, the scale, x,y coordinates, and the output image name. The ‘plot’ command is then used to actually generate the image. The process has sensible defaults. There are a whole host of other commands, of course. Although you can execute commands interactively via the GUI or the command interface, it is usually best to end up with a script file that can be rendered reusable by means of variables in the script file. If the script contains any errors, these will be displayed during this process, highlighting where the error is.

Once you have the design roughed out, you can run it in PowerShell, or whatever other scripting language you choose.

I use Gnuplot mainly for SQL Server performance testing, because it can be done entirely automated as part of a rapid integration. Without twitching a muscle, if all goes well, you end up with a cute website stocked with all the graphs you’d possibly need for checking that your processes scale properly. However, as I can’t really show that for a general introduction, I’ll get some Open Data and do a bit of analysis on that.

As an example, I’ve taken the data for the annual consumption of Petroleum since 1980. I’ve imported it into SQL Server, created a normalised version and started generating reports. Although it is trivial for Gnuplot to get data from a tab delimited or column delimited file, we’ll, for the first two examples, just use SSMS to get the data. In SSMS, I’ve set the query to text. (Query -> Results to…->Results to Text) and set the query options (query->Query Options->Results-Text) to Output Format:tab delimited and ‘Include Column headers in the result set’ to ‘no’. With these settings we can simply cut and paste the results from SSMS into GnuPlot via PowerShell. Firstly we just sum the petroleum consumption for the whole world aggregated per year.

This seems a startling rise when we are all trying to prevent global pollution and climate change. Is this consistent, or are there some nations guzzling petroleum? We therefore go back to our SQL and produce a column for each continent which we can then plot as separate lines to see what is going on. We’ll change to a line graph as well.

Basically, the plots seem to illustrate that, while Europe and North America has got its petroleum consumption under control, Asia hasn’t. Our figures aren’t particularly accurate because asia is a very broad grouping, and Russia is considered to be in Europe in the data. (the Urals actually separate the Asian and European sides). As a side-note, petroleum consumption is also considered a sign of industrial economic activity.

At this point, we’d want to investigate the details of the main petroleum consumers to home in on the countries that are increasing their usage fastest. For this, we really need a way of finding the countries who are affecting the overall figurs the most.

If we assume that all countries are increasing, or decreasing their consumption in a linear fashion, then we can filter out the countries with the greatest rise as well as the highest consumers, simply by doing a linear regression in SQL Server.

Our data table is very simple.

..and our first plot data was created by executing ….

We can get the intercept, slope, and predicted petroleum consumption for a particular year from a slightly more complicated query that calculates the linear regression. Here we list the predicted top twenty consumers from the data (which is only as far as 2014)

Now that we know who is increasing their consumption most rapidly, and who the most prolific consumers are, we simply generate plots for the most significant fifty petrol consumers. This would be somewhat a trial by the dreaded ‘point-and-click’ With GnuPlot it is the work of a moment.

We can now inspect these graphs either by referencing them from a webpage or via a file-viewing application such as Directory Opus. You can also embed them in Word files.

Here is an example of the many plots generated by this routine

Conclusion

I’ve been as guilty as anyone in the past of clinging to Excel as the only way of doing any sort of graph plotting. It is good for one-off work but if you are doing lots of graphs, and automating their production and update, then the old classic, Gnuplot, is definitely worth considering. I know of nothing else that is so sympathetic with the way that SQL Server exports data, and which runs on the command line, generating bitmap or vector graphics directly, as well as most other formats that you’d want. I recommend it for those occasions such as data-science, performance testing and monitoring, where you need information more than you need fancy presentation. It is valuable for reporting on any automation tasks, and very good for integration tests and other aspects of continuous integration and deployment.

Further Reading

  1. Gnuplot in Action by Philipp K. Janert (2009). ISBN 978-1933988399.
  2. gnuplot Cookbook by Lee Phillips (2012). ISBN 978-1849517249.
  3. Gnuplot 5.0. by Thomas Williams and Colin Kelley (2015). ISBN 978-988-14436-4-9.
  4. Gnuplot on Wikipedia
  5. Gnuplot introduction on Wikimedia
  6. Gnuplot examples on Wikimedia commons
  7. Gnuplot Reference Card
  8. Images of plots with Gnuplot source code
  9. Official Gnuplot Documentation

About the author

Phil Factor

See Profile

Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 40 years of experience with database-intensive applications. Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career. See also :

Phil Factor's contributions