Simple Talk is now part of the Redgate Community hub - find out why

Statistics in SQL: Student’s t-test

Many undergraduates have misunderstood the name 'Students' in the t-test to imply that it was designed as a simple test suitable for students. In fact it was William Sealy Gosset, an Englishman publishing under the pseudonym Student, who developed the t-test and t distribution in 1908, as a way of making confident predictions from small sample sizes of normally-distributed variables. As Gosset's employer was Guinness, the brewer, Phil Factor takes a sober view of calculating it in SQL.… Read more

Pseudonymization and the Inference Attack

It is surprising that so much can be identified by deduction from data. You may assume that you can safely distribute partially masked data for reporting, development or testing when the original data contains personal information. Without this sort of information, much medical or scientific research would be vastly more difficult. However, the more useful the data is, the easier it is to mount an inference attack on it to identify personal information. Phil Factor explains.… Read more

Is It Time To Stop Using IsNumeric()?

The old system function IsNumeric() often causes exasperation to a developer who is unfamiliar with the quirks of Transact SQL. It seems to think a comma or a number with a 'D' in the midde of it is a number. Phil Factor explains that though IsNumeric has its bugs, it real vice is that it doesn't tell you which of the numeric datatypes the string parameter can be coerced into, and because it doesn't check for overflow. Phil comes to the rescue with a couple of useful alternatives, one of which works whatever version of SQL Server you have, and which tell you what datatype the string can be converted to.… Read more

How to Automatically Create and Refresh Development and Test Databases using SQL Clone and SQL Toolbelt

In order to be able to deliver database changes more quickly, there are several tasks that must be automated. It can be a daunting job to ensure that the whole team has the latest database build when there is a proliferation of copies, and the database is big. Phil illustrates a solution by taking a set of Redgate tools to show how they can be used together, via PowerShell, to build a database from object-level source, stock it with data, document it, and then provision any number of test and development servers with the database build, taking care to save any DDL changes to the existing copies of the database.… Read more

Statistics in SQL: The Kruskal–Wallis Test

Before you report your conclusions about your data, have you checked whether your 'actionable' figures occurred by chance? The Kruskal-Wallis test is a safe way of determining whether samples come from the same population, because it is simple and doesn't rely on a normal distribution in the population. This allows you a measure of confidence that your results are 'significant'. Phil Factor explains how to do it.… Read more

SQL Server User-Defined Functions

User-Defined Functions (UDFs) are an essential part of the database developers' armoury. They are extraordinarily versatile, but just because you can even use scalar UDFs in WHERE clauses, computed columns and check constraints doesn't mean that you should. Multi-statement UDFs come at a cost and it is good to understand all the restrictions and potential drawbacks. Phil Factor gives an overview of User-defined functions: their virtues, vices and their syntax.… Read more

Visual Checks on How Data is Distributed in SQL Server

There are many reasons for wanting to know how data is distributed. Sometimes you just want a rough idea of the way that data is distributed in a column. You may think, wouldn't it be nice to have a SQL function that just showed you roughly what the distribution was, graphically, in the results pane. Phil Factor thought that was well and turned the vague wish into reality.… Read more

Statistics in SQL: Simple Linear Regressions

Although linear regressions can get complicated, most jobs involving the plotting of a trendline are easy. Simple Linear Regression is handy for the SQL Programmer in making a prediction of a linear trend and giving a figure for the level probability for the prediction, and what is more, they are easy to do with the aggregation that is built into SQL.… Read more

Statistics in SQL: Kendall’s Tau Rank Correlation

Statistical calculations in SQL are often perfectly easy to do. SQL was designed to be a natural fit for calculating correlation, regression and variance on large quantities of data. It just isn't always immediately obvious how. In the second of a series of articles, Phil factor shows how calculating a non-parametric correlation via Kendall's Tau or Spearman's Rho can be stress-free.… Read more

Generating Plots Automatically From PowerShell and SQL Server Using Gnuplot

When you are automating a number of tasks, or performing a batch of tests, you want a way of automating the production of your plots and graphs. Nothing beats a good graphical plot for giving the indications of how the process went. If you are using PowerShell and maybe also SQL Server, it pays to use a command-line plotting tool such as Gnuplot to do all the hard work. It turns out to be handy for a range of data jobs, turning PowerShell into a handy data science tool.… Read more

Doing Fuzzy Searches in SQL Server

A series of arguments with developers who insist that fuzzy searches or spell-checking be done within the application rather then a relational database inspired Phil Factor to show how it is done. When the database must find relevant material from search terms entered by users, the database must learn to expect, and deal with, both expected and unexpected … Read more

String Comparisons in SQL: The Metaphone Algorithm

When exploring the use of the Metaphone algorithm for fuzzy search, Phil couldn't find a SQL version of the algorithm so he wrote one. The Metaphone algorithm is built in to PHP, and is widely used for string searches where you aren't always likely to get exact matches, such as ancestral research and historical documents. It is particularly useful when comparing strings word-by-word. With a SQL version, it is easy to experiment on large quantities of data!… Read more

Lists With, or Without, Ranges in both T-SQL and PowerShell

Whether you are working in a procedural language like PowerShell or in T-SQL, there is something slightly bothersome about having to deal with parameters that are lists, or worse with ranges amongst the values. In fact, once you have a way of dealing with them, they can be convenient, especially when bridging the gulf between application and the database. Phil Factor shows how to deal with them.… Read more

Using SQLite with PowerShell and SQL Server

When you combine PowerShell and SQLite, you can perform powerful magic. Phil Factor is in awe of SQLite and gives a brief demonstration of how easy it is to use. Just to encourage anyone who is unfamiliar with the database, he includes a giant-sized SQLite version of the old PUBS database that the first generation of RDBMS developers cut their teeth on. … Read more

Reading, Writing, and Creating SQL Server Extended Properties

There is a great gulf between wanting to document your database properly with extended properties and actually doing it. Extended Properties have many uses but they aren't easy to use. Phil Factor is on a mission to make it easier for ordinary mortals to use extended properties as intended, to aid the database development process.… Read more

Representing Hierarchical Data for Mere Mortals

Why is it that we use XML, but with so little enthusiasm when it does so much, and is so feature-rich? Phil Factor argues that there are better ways of doing it, more complete than JSON, but easier to read than XML. To try to convince you, he gives a set of flying demos, using PowerShell and his PSYaml module, to illustrate how YAML can let you work faster, and more accurately.… Read more

PSYaml: PowerShell does YAML

PSYaml is a simple PowerShell module that I’ve written that allows you to serialize PowerShell objects to “YAML Ain’t Markup Language” (YAML) documents and deserialize YAML documents to PowerShell objects. It uses Antoine Aubry’s excellent YamlDotNet library To start, you can simply load the PowerShell file and the manifest from its home on GitHub PSYaml … Read more

How you log in to Simple Talk has changed

We now use Redgate ID (RGID). If you already have an RGID, we’ll try to match it to your account. If not, we’ll create one for you and connect it.

This won’t sign you up to anything or add you to any mailing lists. You can see our full privacy policy here.


Simple Talk now uses Redgate ID

If you already have a Redgate ID (RGID), sign in using your existing RGID credentials. If not, you can create one on the next screen.

This won’t sign you up to anything or add you to any mailing lists. You can see our full privacy policy here.