Simple Talk is now part of the Redgate Community hub - find out why

Statistics in SQL: Simple Linear Regressions

Although linear regressions can get complicated, most jobs involving the plotting of a trendline are easy. Simple Linear Regression is handy for the SQL Programmer in making a prediction of a linear trend and giving a figure for the level probability for the prediction, and what is more, they are easy to do with the aggregation that is built into SQL.

The series so far:

  1. Statistics in SQL: Pearson’s Correlation
  2. Statistics in SQL: Kendall’s Tau Rank Correlation
  3. Statistics in SQL: Simple Linear Regressions
  4. Statistics in SQL: The Kruskal–Wallis Test
  5. Statistics in SQL: The Mann–Whitney U Test

Let’s imagine that we have two variables, X and Y which we then plot using scatter graph.

C:\Users\andrew.clarke\SkyDrive\Documents\scatterPlot.gif

It looks a bit like someone has fired a shotgun at a wall but is there a relationship between the two variables? If so, what is it? There seems to be a weak positive linear relationship between the two variables here so we can be fairly confident of plotting a trendline.

Here is the data, and we will proceed to calculate the slope and intercept. We will also calculate the correlation.

It gives the result…

Alpha was calculated by this expression, …

… Beta was calculated thus.

… and Rho by this

The Pearson’s Product Moment correlation can be used to calculate the probability of this being a significant correlation via the Fisher Transformation.

The slope (beta) and intercept (alpha) can be used to calculate any point on the trendline.

We can, for example calculate the value of Y when X is 100 by the equation
y’ = a + bx

Is, in our example

We can calculate the trendline X,Y points by slightly altering the SQL

We can then plot a trend line

C:\Users\andrew.clarke\SkyDrive\Documents\scatterPlot.gif

We are, of course, assuming a linear relationship and parametric data. With Pearson’s Rho and the number of X-Y pairs, we can go on to estimate the fisher transformation (t= 1.912) and probability of 0.061825.

The PowerShell to do the Gnuplot graphs is here

 

See also the previous in this series

How you log in to Simple Talk has changed

We now use Redgate ID (RGID). If you already have an RGID, we’ll try to match it to your account. If not, we’ll create one for you and connect it.

This won’t sign you up to anything or add you to any mailing lists. You can see our full privacy policy here.

Continue

Simple Talk now uses Redgate ID

If you already have a Redgate ID (RGID), sign in using your existing RGID credentials. If not, you can create one on the next screen.

This won’t sign you up to anything or add you to any mailing lists. You can see our full privacy policy here.

Continue