An Easier Way of Pivoting Data in SQL Server

Comments 0

Share to social media

The Pivot or Transpose

Many developers just want to ’rotate’, pivot or transpose a result as if it were a matrix. This involves turning columns into rows by, in effect, transposing the two coordinates. There is, of course, a lot of difference between a pivot and a transpose because columns in a SQL Table represent the discrete values of entities, not values in a range. However, we are all faced with data in a table format such as an Excel PivotTable or a tabular report that may be a table but not a relational table. Often, we want to pivot it/do a transpose. Usually, neither a pivot or a tabular report are, strictly, relational tables, but if we are in the midst of delivering meaningful reports, who cares? A lot of systems need reports in a particular format and we have to provide that.

Imagine we get something like this, the world’s oil consumption figures…

And we need to provide it in this format…

There is a problem here because someone is going to add some columns to represent recent data. What about the two subsequent years at least 2016 and 2017? I’ve recently described some of the classic ways of doing this, and a good brute-force JSON technique to achieve the same, but they have the problem that the code to do the rotation needs to change when those extra rows are added. What about a technique that goes further by transposing/pivoting data of any size? Well, sure, as long as it is a JSON table, but there is just a little essential task at the end to provide the column names and datatypes of the SQL Server result.

A Simple Demonstration

Let’s just demonstrate what I mean as simply as possible. We start with a simple result produced by this test code.

We now transpose it by executing this code using a function I’ll introduce and explain later. (I’ve embedded the values of the original result)

… which produces this…

That line at the end is important because it tells the system function OpenJSON() what columns you want from the JSON document and what SQL Datatype they should be. This task is essential because there is a loss of metadata when a table is converted into JSON. We don’t know much about numbers when they are stored in JSON, only that something is a number.

OK. Now we supply a result of a different size

Despite the change in the number of rows and columns, we can use the same routine, unaltered. We just need, when we use the function, to define the extra two columns that result from there being two extra rows in the original.

… transposed to…

You will have noticed a problem. What about the column headings (One to Five). Those have gone. Shouldn’t we have made them into row headings?

In this particular example of the matrix we don’t have to, but we can. We can reference what were the column headings as another column ‘’$.Col0’’, to transpose what were the row headings into column headings.

The Generic TransposedJSONMatrix Function

Here is the code of the procedure used in the preceding examples. All we are really doing is to take a matrix of one consistent data type and to transpose it. We can’t add a column of strings, because that would equate to a row of strings in a table of integers which makes no sense in SQL Server. JSON can cope with it without any problem at all, but only because it is not, by default, a way of representing tabular data. All we are really doing is to use the OpenJSON() function to unwrap the JSON into a relational table that records each cell and its value, together with its column and row. We then create a new JSON document with transposed rows and columns. Because we pass JSON in and out of the function we avoid the problems that we’d have with doing the same thing with tables.

Trying it out

Now we can put our generic function through its paces to test it out. Here is the data before the transpose.

— now we test it out

After the transpose, the data looks like this:

Now the first example we gave at the start of the article is generated simply by following the same pattern. Remember that this code would be very small were it not for the fact that we are generating the data table as a table-source, using several sets of row value expressions to be constructed into a table via the VALUE table-value constructor. This is useful when running a function through a series of unit tests because it keeps everything in one place.

What about doing other datatypes besides integers?

So far, we’ve stuck resolutely to integers in our matrix. As long as we are consistent, we can use whatever we like. Obviously, if we are pivoting a matrix, the cells will all need to have the same datatype in SQL Server.

We can to other datatypes. How about numeric?

… transposed to …

We can do strings too

… transposed to …

BITs are possible too …

… transposed to …

What About the row and column aggregations?

Yes, that first example, the oil consumption totals, needed row and column totals, didn’t it? We wanted to see how the petrol consumption increased over the years.

With a bit of a tweak that I’ll show you in a moment, we can do this (I’ve highlighted the aggregation column and row).

…which, this being my lucky day, checks out if we do the totals using the SUM() function within Excel.

How could we do this so easily? You see, we had to shred the JSON into its normalised form anyway in order to do the transpose, so doing the aggregation within the function was the natural thing to do. Unfortunately, we can’t do every type of aggregation within the one function, and we can’t aggregate every datatype. You will have to build very slightly different functions for each.

There isn’t that much code needed to perform this magic.

Inside the function, after we insert our data into the @TheNormalisedVersionOfTheMatrix table, we have a couple of extra insertions to do to get the extra column and row in.

I chose a label ‘Total’ for the column totals, and I specified a BIGINT, but you could choose all manner of datatypes, and even do a different aggregation such as average or minimum. Here is the code for the new function:

Here is the code that produced the Oil Consumption figures with the added  sum column and total row .

We can see the aggregation column just by specifying Colx as the column in the JSON path where x is one more than the number of data columns. (We used Col0 in the JSON path for the rotated column headings). The Total row just appears!

The code for both functions can be downloaded from the link at the bottom of the page.

Conclusions

I couldn’t do any of this sort of trickery until the JSON support came into SQL Server 2016. I suspect that there is a way to use OpenXML to work the same magic, but I leave that for the keen reader, if any are still with me at this point. JSON is just so handy, and seems fast. It is a very relaxed way of getting data in and out of functions. It is also easy to debug. It was relatively easy to bring it to bear on a problem that has previously proved resistant to easy solution, the generic function to pivot data regardless of the number of rows and columns.

Load comments

About the author

Phil Factor

See Profile

Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 40 years of experience with database-intensive applications. Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career. See also :

Phil Factor's contributions