Using JSON for matrices in SQL Server.

From SQL Server 2017, it becomes more practical to use JSON arrays for representing and processing a matrix. SQL Server can read them, and update values in them but can’t create them. To do this, you have to create the JSON as a string. The great advantage of them is that you can pass them between procedures and functions as easily as any other string, and quickly turn them into tables.

For the SQL Server developer, matrices are probably most valuable for solving more complex string-searching problems, using Dynamic Programming. Once you get into the mindset of this sort of technique, a number of seemingly-intractable problems become easier. Here are fifty common data structure problems that can be solved using Dynamic programming. Until SQL Server 2017, these were hard to do in SQL because of the lack of support for this style of programming. Memoization, one of the principles behind the technique is easy to do in SQL but it is very tricky to convert existing procedural algorithms to use table variables. It is usually easier and quicker to use strings as pseudo-variables as I did with Edit Distance and the Levenshtein algorithm, the longest common subsequence, and the Longest Common Substring. The problem with doing this is that the code to fetch the array values can be very difficult to decypher or debug. JSON can do it very easily with path array references.

Would it be possible to use JSON arrays to solve one of these problems? If so, is it much slower? I thought it might be interesting to convert the Lowest Common Subsequence problem into a json-based form and run over a few tests back-to-back. The conclusion was, for those with the TLDR habit, was that it took twice to three times as long to run, but produced code that was easier to write, understand and debug. I suspect there are ways and means to make it faster.

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

IF Object_Id(N'LCS') IS NOT NULL DROP FUNCTION LCS;

CREATE FUNCTION LCS

/**

summary: >

The longest common subsequence (LCS) problem is the problem of finding the

longest subsequence common to all sequences in two sequences. It differs

from problems of finding common substrings: unlike substrings, subsequences

are not required to occupy consecutive positions within the original

sequences. For example, the sequences "1234" and "1224533324" have an LCS

of "1234":

Author: Phil Factor

Revision: 1.0

date: 05 April 2019

example:

code: |

Select dbo.lcs ('1234', '1224533324')

Select dbo.lcs ('thisisatest', 'testing123testing')

Select dbo.lcs ( 'XMJYAUZ', 'MZJAWXU')

Select dbo.lcs ( 'beginning-middle-ending',

'beginning-diddle-dum-ending')

returns: >

the longest common subsequence as a string

**/

(@xString VARCHAR(MAX), @yString VARCHAR(MAX))

RETURNS VARCHAR(MAX)

BEGIN

DECLARE @ii INT = 1; --inner index

DECLARE @jj INT = 1; --next loop index

DECLARE @West INT; --array reference number to left

DECLARE @NorthWest INT; --array reference previous left

DECLARE @North INT; --array reference previous

DECLARE @Max INT; --holds the maximum of two values

DECLARE @Current INT; --current number of matches

DECLARE @Matrix NVARCHAR(MAX);

DECLARE @PreviousRow NVARCHAR(2000); -- the previous matrix row

DECLARE @JSON NVARCHAR(4000); --json work variable

DECLARE @Numbers TABLE (jj INT);

-- SQL Prompt formatting off

INSERT INTO @numbers(jj) --this is designed for words of max 40 characters

VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),

(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),

(29),(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),(40)

-- SQL Prompt formatting on

--the to start with, the first row is all zeros.

SELECT @PreviousRow =

N'[' + Replicate('0,', Len(@xString) + 1) + N'"'

+ Substring(@yString, 1, 1) + N'"]';

SELECT @Matrix = @PreviousRow;--add this to the matrix

/* we now build the matrix in bottom up fashion. */

WHILE (@ii <= Len(@yString))

BEGIN

SELECT @West = 0, @JSON = NULL;

--now create a row in just one query

SELECT @NorthWest =

Json_Value(@PreviousRow, '$[' + Cast(jj - 1 AS VARCHAR(5)) + ']'),

@North =

Json_Value(@PreviousRow, '$[' + Cast(jj AS VARCHAR(5)) + ']'),

@Max = CASE WHEN @West > @North THEN @West ELSE @North END,

@Current =

CASE WHEN Substring(@xString, jj, 1) = Substring(@yString, @ii, 1) THEN

@NorthWest + 1 ELSE @Max END,

@JSON =

Coalesce(@JSON + ',', '[0,')

+ Coalesce(Cast(@Current AS VARCHAR(5)), 'null'), @West = @Current

FROM @Numbers AS f

WHERE f.jj <= Len(@xString);

--and store the result as the previous row

SELECT @PreviousRow =

@JSON + N',"' + Substring(@yString, @ii, 1) + N'"]';

--and add the reow to the matrix

SELECT @Matrix = Coalesce(@Matrix + ',

', '') + @PreviousRow, @ii = @ii + 1;

END;

--we add the boundong brackets.

SELECT @Matrix = N'[' + @Matrix + N']';

SELECT @ii = Len(@yString), @jj = Len(@xString);

DECLARE @previousColScore INT, @PreviousRowScore INT, @Ychar NCHAR;

DECLARE @Subsequence NVARCHAR(4000) = '';

WHILE (@Current > 0)

BEGIN

SELECT @Ychar = Substring(@yString, @ii, 1);

IF (@Ychar = Substring(@xString, @jj, 1))

-- If current character in X[] and Y[] are same, then it is part of LCS

SELECT @ii = @ii - 1, @jj = @jj - 1,

@Subsequence = @Ychar + @Subsequence, @Current = @Current - 1;

ELSE

--If not same, then find the larger of two and traverse in that direction

BEGIN

--find out the two scores, one to the north and one to the west

SELECT @PreviousRowScore =

Json_Value(

@Matrix,

'strict $[' + Convert(VARCHAR(5), @ii - 1) + ']['

+ Convert(VARCHAR(5), @jj) + ']'

@previousColScore =

Json_Value(

@Matrix,

'strict $[' + Convert(VARCHAR(5), @ii) + ']['

+ Convert(VARCHAR(5), @jj - 1) + ']'

);

--either go north or west

IF @PreviousRowScore < @previousColScore SELECT @jj = @jj - 1;

ELSE SELECT @ii = @ii - 1;

END;

RETURN @Subsequence;

END;

-- Now we do a quick test and timing with the old version

DECLARE @timing DATETIME;

SELECT @timing = GetDate();

IF dbo.LongestCommonSubsequence('1234', '1224533324') <> '1234'

RAISERROR('test 1 failed', 16, 1);

IF dbo.LongestCommonSubsequence('thisisatest', 'testing123testing') <> 'tsitest'

RAISERROR('test 2 failed', 16, 1);

IF dbo.LongestCommonSubsequence('Patient', 'Complaint') <> 'Paint'

RAISERROR('test 3 failed', 16, 1);

IF dbo.LongestCommonSubsequence('XMJYAUZ', 'MZJAWXU') <> 'MJAU'

RAISERROR('test 4 failed', 16, 1);

IF dbo.LongestCommonSubsequence('yab', 'xabyrbyab') <> 'yab' RAISERROR(

'test 5 failed', 16, 1

);

IF dbo.LongestCommonSubsequence(

'beginning-middle-ending', 'beginning-diddle-dum-ending'

) <> 'beginning-iddle-ending'

RAISERROR('test 6 failed', 16, 1);

SELECT DateDiff(MILLISECOND, @timing, GetDate()) AS [ms FOR traditional way];

--now do the same test run with the current function

SELECT @timing = GetDate();

IF dbo.LCS('1234', '1224533324') <> '1234' RAISERROR('test 1 failed', 16, 1);

IF dbo.LCS('thisisatest', 'testing123testing') <> 'tsitest' RAISERROR(

'test 2 failed', 16, 1

);

IF dbo.LCS('Patient', 'Complaint') <> 'Paint'

RAISERROR('test 3 failed', 16, 1);

IF dbo.LCS('XMJYAUZ', 'MZJAWXU') <> 'MJAU' RAISERROR('test 4 failed', 16, 1);

IF dbo.LCS('yab', 'xabyrbyab') <> 'yab' RAISERROR('test 5 failed', 16, 1);

IF dbo.LCS('beginning-middle-ending', 'beginning-diddle-dum-ending') <> 'beginning-iddle-ending'

RAISERROR('test 6 failed', 16, 1);

SELECT DateDiff(MILLISECOND, @timing, GetDate()) AS [ms FOR JSON-based] ;

This returns …

ms FOR traditional way

----------------------

(1 row affected)

ms FOR JSON-based

-----------------

(1 row affected)

These tests form part of the build script for the procedure to try to make sure that as few as possible mistakes are left in!

Obviously, I’d like a bit more speed in the JSON querying but it is acceptable unless one is doing a lot of this sort of querying. I’m happy to us JSON for doing this because it is quicker to get things up-and-running. In my article ‘Doing Fuzzy Searches in SQL Server‘, I show how it is possible to cut down on the amount of searches by filtering the likely candidates with conventional SQL Commands first.

Register for Simple Talk

Using JSON for matrices in SQL Server.

About the author

Phil Factor

Phil Factor's contributions

Articles

Books

Top topics

Phil Factor's latest contributions:

Accessibility, and the Need for Ingenious Simplicity in Computer Scripts, Languages and Programs

Data Documents and Common Sense

AI and Databases

Recommended

About the author

Phil Factor

Phil Factor's contributions

Articles

Books

Top topics

Phil Factor's latest contributions:

Accessibility, and the Need for Ingenious Simplicity in Computer Scripts, Languages and Programs

Data Documents and Common Sense

AI and Databases