# Statistics in SQL: Simple Linear Regressions

### Share to social media

#### The series so far:

Let’s imagine that we have two variables, X and Y which we then plot using scatter graph.

It looks a bit like someone has fired a shotgun at a wall but is there a relationship between the two variables? If so, what is it? There seems to be a weak positive linear relationship between the two variables here so we can be fairly confident of plotting a trendline.

Here is the data, and we will proceed to calculate the slope and intercept. We will also calculate the correlation.

It gives the result…

Alpha was calculated by this expression, …

… Beta was calculated thus.

… and Rho by this

The Pearson’s Product Moment correlation can be used to calculate the probability of this being a significant correlation via the Fisher Transformation.

The slope (beta) and intercept (alpha) can be used to calculate any point on the trendline.

We can, for example calculate the value of Y when X is 100 by the equation
y’ = a + bx

Is, in our example

We can calculate the trendline X,Y points by slightly altering the SQL

We can then plot a trend line

We are, of course, assuming a linear relationship and parametric data. With Pearson’s Rho and the number of X-Y pairs, we can go on to estimate the fisher transformation (t= 1.912) and probability of 0.061825.

The PowerShell to do the Gnuplot graphs is here

### Phil Factor

See Profile

Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 40 years of experience with database-intensive applications. Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career. See also :

343

2