Problems Caused by Use of the SQL_VARIANT Datatype
Phil Factor illustrates the 'quirks' of the SQL_VARIANT datatype and why it's best to investigate when SQL Prompt alerts you to its use. It is only safe to store data as a SQL_VARIANT, if you explicitly convert it to its true type before you use it.
The sql_variant
datatype stores values from several different data types and is used internally by SQL Server. It is not part of the SQL standard and has limited uses in a relational database. It needs to be handled with care because its misuse will lead to performance problems and bugs that are hard to track down. A sql_variant
cannot be passed directly to some SQL operators and functions such as LIKE
, SUM()
or AVG()
, and can give misleading results when used in comparisons or expressions. It cannot be returned to an application via ODBC except as binary data.
SQL Server is a strongly-type language, and this is done to ensure data integrity, efficient storage and effective retrieval. For this reason, it is a bit of an oddity to use sql_variant
, and so it is not surprising that one can unwittingly cause problems by using it injudiciously. For these reasons, SQL Prompt enforces a ‘best practice’ code analysis rule (BP024) that will alert you to uses of the sql_variant
datatype.
As with many ‘best practice’ rules, the advice sometimes begins to sound like telling people not to run while holding scissors. In this case, it is only safe to store data as a sql_variant
, if you explicitly convert it to its true type before you use it.
Why have a sql_variant?
The sql_variant
datatype was first introduced when Microsoft developed SQL Server from Sybase. They needed to be able to import databases into SQL Server from Microsoft Access, Microsoft’s first entry into the database market, which supported a variant datatype. It is still used internally in SQL Server in parameters for system stored procedures, and for data such as extended properties.
The sql_variant
tends to be used as a catch-all datatype for columns, variables, parameters, or values returned by user-defined functions. It can hold up to 8,000 bytes and can store the basic datatypes such as integers, decimals, strings and dates. It can’t store some other datatypes, such as the (MAX) datatypes, the CLR datatypes or XML .
Occasionally, sql_variant
can be a useful tool, such as when working with inconsistent or unspecified data types, which usually arise because the database supports applications that allow user-defined data.
It stores the base datatype of the value it contains, so enforces all the rules of conversion between datatypes, when it is used as an intermediary. You can retrieve this base datatype using a data type function, sql_variant_property()
:
1 2 |
DECLARE @MyVariant SQL_VARIANT = '2.3657' SELECT SQL_VARIANT_PROPERTY(@MyVariant,'BaseType') |
Which in this case returns varchar
. There are some other useful properties here: Precision
, Scale
, TotalBytes
, Collation
and MaxLength
. The TotalBytes
parameter makes the function useful as a preliminary check, if you want to make a PRIMARY
KEY
from a sql_variant
, because of the 900-byte limit on the total size of a PRIMARY
KEY
(or index).
You can, incidentally, use this function on any datatype. For example:
1 |
SELECT SQL_VARIANT_PROPERTY(N'Béoáed mac Ocláin','collation') |
Aggregations
Let’s see what happens if we try to aggregate a sql_variant
column. We’ll do it from a derived table just to keep things simple.
1 2 3 4 5 6 7 8 9 |
SELECT Sum(ValueAsVariant) FROM ( VALUES (Convert(SQL_VARIANT, 'one'), 1, Convert(SQL_VARIANT, 1)), ('two', 2, 2), ('three', 3, 3), ('four', 4, 4), ('five', 5, 5) ) AS f (ValueAsString, ValueAsInt, ValueAsVariant); |
We see an error:
Msg 8117, Level 16, State 1, Line 3 Operand data type sql_variant is invalid for sum operator.
Whereas if we fist do explicit conversion to a number (int
, numeric
and so on), it works fine.
1 2 3 4 5 6 7 8 9 10 |
SELECT Sum(Convert(NUMERIC(9,4), ValueAsVariant)) -- try sum, avg, stdev, stdevp, var, varp, or string_agg FROM ( VALUES (Convert(SQL_VARIANT, 'one'), 1, Convert(SQL_VARIANT, 1)), ('two', 2, 2), ('three', 3, 3), ('four', 4, 4), ('five', 5, 5) ) AS f (ValueAsString, ValueAsInt, ValueAsVariant); |
The max()
and min()
aggregate functions seem to work fine with the sql_variant
datatype, so it cannot be a technical problem that prevents the others from being made to work.
Comparisons
You can’t use LIKE
to filter a sql_variant
column because LIKE
doesn’t support sql_variant
parameters.
1 2 3 4 5 6 7 8 9 10 |
SELECT f.ValueAsVariant, f.ValueAsInt, f.ValueAsString FROM ( VALUES (Convert(SQL_VARIANT,'one'), 1, Convert(VARCHAR(5),1)), ('two', 2, 2), ('three', 3, 3), ('four', 4, 4), ('five', 5, 5) ) AS f(ValueAsVariant, ValueAsInt, ValueAsString) WHERE ValueAsVariant like 't%' |
This is the error:
Msg 8116, Level 16, State 1, Line 4 Argument data type sql_variant is invalid for argument 1 of like function.
In fact, none of the string functions accept a sql_variant
and will not attempt to do an implicit conversion to a string. Instead, they simply reject the parameter. If, instead, we declare what sort of datatype it really is, it works:
1 2 3 4 5 6 7 8 9 10 |
SELECT f.ValueAsVariant, f.ValueAsInt, f.ValueAsString FROM ( VALUES (Convert(SQL_VARIANT,'one'), 1, Convert(NVARCHAR(5),1)), ('two', 2, 2), ('three', 3, 3), ('four', 4, 4), ('five', 5, 5) ) AS f(ValueAsVariant, ValueAsInt, ValueAsString) WHERE Convert(VARCHAR(20),ValueAsVariant) like 't%' |
Unless you explicitly convert a sql_variant
datatype in a WHERE
clause, you may get an incorrect result, the cause of which is hard to detect, when hidden inside a long and tortuous procedure. For example, this returns rows 4 and 5 only, which is what you’d expect:
1 2 3 4 5 6 7 8 9 10 11 12 |
DECLARE @ParameterAsINT INT SELECT @ParameterAsINT = 3 SELECT f.ValueAsString, f.ValueAsInt, f.ValueAsVariant FROM ( VALUES ('one', 1, Convert(SQL_VARIANT, 1)), ('two', 2, 2), ('three', 3, 3), ('four', 4, 4), ('five', 5, 5) ) AS f (ValueAsString, ValueAsInt, ValueAsVariant) WHERE ValueAsVariant > @ParameterAsInt |
However, what happens if we were to change the parameter to a sql_variant
and supply a character string value for it?
1 2 3 4 5 6 7 8 9 10 11 12 |
DECLARE @ParameterAsVariant sql_variant SELECT @ParameterAsVariant ='3' SELECT f.ValueAsString, f.ValueAsInt, f.ValueAsVariant FROM ( VALUES ('one', 1, Convert(SQL_VARIANT, 1)), ('two', 2, 2), ('three', 3, 3), ('four', 4, 4), ('five', 5, 5) ) AS f (ValueAsString, ValueAsInt, ValueAsVariant) WHERE ValueAsVariant > @ParameterAsVariant |
Now it returns all rows, which you probably would not expect. The problem here is that, in order to evaluate the expression, SQL Server checks the base type, or type family, of the sql_variant
and compares it to our variable type. The base type family for the sql_variant can be Unicode, exact numeric, approximate numeric, date and time, binary or uniqueidentifier, and our ValueAsVariant
column contains an exact numeric.
In the first example, which returned only rows 4 and 5, our parameter type is in the same family as the type of our ValueAsVariant
column. SQL Server performs an implicit conversion and the code works. However, in the second example, where we’re using a sql_variant
parameter with a string value, the @ParameterAsVariant
contains Unicode. Rather than do an implicit conversion of the Unicode type to the exact numeric, which is the ‘senior’ datatype, SQL Server judges that the senior datatype is ‘bigger’ and therefore our search condition evaluates to true
for every row.
This is a definite quirk of sql_variant
. If we’re comparing two sql_variant
values of the exact same base datatype, it will ‘work’. If we compare a sql_variant
to another data type in the same family, the implicit conversion will allow it to work. Beyond that, all bets are off.
ODBC support
ODBC does not fully support sql_variant
. You will notice this when using a connection to a table containing sql_variant
types, because data in sql_variant
columns will be returned as binary data (e.g. 0x32303931) when you use Microsoft OLE DB Provider for ODBC (MSDASQL).
Limits on the use of sql_variant in an index
You can include a sql_variant
column in an index only if the total length of the index is less than the 900-byte maximum. This means that an INSERT
operation on an indexed sql_variant
column will fail if the length of the value exceeds 900 bytes. If we create the table, or table variable:
1 |
DECLARE @MyTableVariable TABLE (MyProperty sql_Variant PRIMARY KEY) |
We get a warning:
Warning! The maximum key length for a clustered index is 900 bytes. The index 'PK__#B2961DC__8E45D1198BEEA325' has maximum length of 8016 bytes. For some combination of large values, the insert/update operation will fail.
If we ignore the warning …
1 2 3 4 5 6 7 |
DECLARE @MyTableVariable TABLE (MyProperty sql_Variant PRIMARY KEY) INSERT INTO @MyTableVariable (MyProperty) VALUES (N'Abbán moccu Corbmaic'), (N'Abel of Reims'), (N'Buíte [Boetius] mac Brónaig'), (N'Buriana'), (Replicate(N'Caillín [Caillén] mac Niataig Crom mac Feradaig, Comgall mac Sétnai, Comgán mac Dá Cherda, Commán mac Fáelchon, Mo ChommócCrónán of Balla, see Mo Chua mac Bécáin',3)) |
We get the error …
Msg 1946, Level 16, State 3, Line 45 Operation failed. The index entry of length 980 bytes for the index 'PK__#B72883F__8E45D1191C112AAE' exceeds the maximum length of 900 bytes for clustered indexes.
Conclusions
The use of sql_variant
datatypes in a user table is a code smell because it introduces an untyped datatype into a strongly typed language, and required investigation, just as you investigate if you smell burning in your home. It is probably just the barbeque, but it might be something more alarming.
sql_variant
has legitimate uses, but there is always a risk that, although you may know exactly how to use them, other people who must maintain or debug your code might not, and are most likely to cause problems if you do anything beyond using them purely for storage.
You must never rely on implicit conversion of a sql_variant
, because it often fails, either because it isn’t implemented or because it does it quirkily. Instead, explicitly convert them to a SQL datatype before doing comparisons, expressions or aggregations. If you are not entirely sure that you understand that last sentence, then it is probably best never to use sql_variant
.