Product articles SQL Prompt SQL Code Analysis
Problems Caused by Use of the…

26 June 2019

2 Comments

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

26 June 2019

2 Comments

Problems Caused by Use of the SQL_VARIANT Datatype

Phil Factor illustrates the 'quirks' of the SQL_VARIANT datatype and why it's best to investigate when SQL Prompt alerts you to its use. It is only safe to store data as a SQL_VARIANT, if you explicitly convert it to its true type before you use it.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

The sql_variant datatype stores values from several different data types and is used internally by SQL Server. It is not part of the SQL standard and has limited uses in a relational database. It needs to be handled with care because its misuse will lead to performance problems and bugs that are hard to track down. A sql_variant cannot be passed directly to some SQL operators and functions such as LIKE, SUM() or AVG(), and can give misleading results when used in comparisons or expressions. It cannot be returned to an application via ODBC except as binary data.

SQL Server is a strongly-type language, and this is done to ensure data integrity, efficient storage and effective retrieval. For this reason, it is a bit of an oddity to use sql_variant, and so it is not surprising that one can unwittingly cause problems by using it injudiciously. For these reasons, SQL Prompt enforces a ‘best practice’ code analysis rule (BP024) that will alert you to uses of the sql_variant datatype.

As with many ‘best practice’ rules, the advice sometimes begins to sound like telling people not to run while holding scissors. In this case, it is only safe to store data as a sql_variant, if you explicitly convert it to its true type before you use it.

Why have a sql_variant?

The sql_variant datatype was first introduced when Microsoft developed SQL Server from Sybase. They needed to be able to import databases into SQL Server from Microsoft Access, Microsoft’s first entry into the database market, which supported a variant datatype. It is still used internally in SQL Server in parameters for system stored procedures, and for data such as extended properties.

The sql_variant tends to be used as a catch-all datatype for columns, variables, parameters, or values returned by user-defined functions. It can hold up to 8,000 bytes and can store the basic datatypes such as integers, decimals, strings and dates. It can’t store some other datatypes, such as the (MAX) datatypes, the CLR datatypes or XML .

Occasionally, sql_variant can be a useful tool, such as when working with inconsistent or unspecified data types, which usually arise because the database supports applications that allow user-defined data.

It stores the base datatype of the value it contains, so enforces all the rules of conversion between datatypes, when it is used as an intermediary. You can retrieve this base datatype using a data type function, sql_variant_property():

Which in this case returns varchar. There are some other useful properties here: Precision, Scale, TotalBytes, Collation and MaxLength. The TotalBytes parameter makes the function useful as a preliminary check, if you want to make a PRIMARY KEY from a sql_variant, because of the 900-byte limit on the total size of a PRIMARY KEY (or index).

You can, incidentally, use this function on any datatype. For example:

Aggregations

Let’s see what happens if we try to aggregate a sql_variant column. We’ll do it from a derived table just to keep things simple.

We see an error:

Msg 8117, Level 16, State 1, Line 3 Operand data type sql_variant is invalid for sum operator.

Whereas if we fist do explicit conversion to a number (int, numeric and so on), it works fine.

The max() and min() aggregate functions seem to work fine with the sql_variant datatype, so it cannot be a technical problem that prevents the others from being made to work.

Comparisons

You can’t use LIKE to filter a sql_variant column because LIKE doesn’t support sql_variant parameters.

This is the error:

Msg 8116, Level 16, State 1, Line 4
  Argument data type sql_variant is invalid for argument 1 of like function.

In fact, none of the string functions accept a sql_variant and will not attempt to do an implicit conversion to a string. Instead, they simply reject the parameter. If, instead, we declare what sort of datatype it really is, it works:

Unless you explicitly convert a sql_variant datatype in a WHERE clause, you may get an incorrect result, the cause of which is hard to detect, when hidden inside a long and tortuous procedure. For example, this returns rows 4 and 5 only, which is what you’d expect:

However, what happens if we were to change the parameter to a sql_variant and supply a character string value for it?

Now it returns all rows, which you probably would not expect. The problem here is that, in order to evaluate the expression, SQL Server checks the base type, or type family, of the sql_variant and compares it to our variable type. The base type family for the sql_variant can be Unicode, exact numeric, approximate numeric, date and time, binary or uniqueidentifier, and our ValueAsVariant column contains an exact numeric.

In the first example, which returned only rows 4 and 5, our parameter type is in the same family as the type of our ValueAsVariant column. SQL Server performs an implicit conversion and the code works. However, in the second example, where we’re using a sql_variant parameter with a string value, the @ParameterAsVariant contains Unicode. Rather than do an implicit conversion of the Unicode type to the exact numeric, which is the ‘senior’ datatype, SQL Server judges that the senior datatype is ‘bigger’ and therefore our search condition evaluates to true for every row.

This is a definite quirk of sql_variant. If we’re comparing two sql_variant values of the exact same base datatype, it will ‘work’. If we compare a sql_variant to another data type in the same family, the implicit conversion will allow it to work. Beyond that, all bets are off.

ODBC support

ODBC does not fully support sql_variant. You will notice this when using a connection to a table containing sql_variant types, because data in sql_variant columns will be returned as binary data (e.g. 0x32303931) when you use Microsoft OLE DB Provider for ODBC (MSDASQL).

Limits on the use of sql_variant in an index

You can include a sql_variant column in an index only if the total length of the index is less than the 900-byte maximum. This means that an INSERT operation on an indexed sql_variant column will fail if the length of the value exceeds 900 bytes. If we create the table, or table variable:

We get a warning:

Warning! The maximum key length for a clustered index is 900 bytes. The index 'PK__#B2961DC__8E45D1198BEEA325' has maximum length of 8016 bytes. For some combination of large values, the insert/update operation will fail.

If we ignore the warning …

We get the error …

Msg 1946, Level 16, State 3, Line 45
  Operation failed. The index entry of length 980 bytes for the index 'PK__#B72883F__8E45D1191C112AAE' exceeds the maximum length of 900 bytes for clustered indexes.

Conclusions

The use of sql_variant datatypes in a user table is a code smell because it introduces an untyped datatype into a strongly typed language, and required investigation, just as you investigate if you smell burning in your home. It is probably just the barbeque, but it might be something more alarming.

sql_variant has legitimate uses, but there is always a risk that, although you may know exactly how to use them, other people who must maintain or debug your code might not, and are most likely to cause problems if you do anything beyond using them purely for storage.

You must never rely on implicit conversion of a sql_variant, because it often fails, either because it isn’t implemented or because it does it quirkily. Instead, explicitly convert them to a SQL datatype before doing comparisons, expressions or aggregations. If you are not entirely sure that you understand that last sentence, then it is probably best never to use sql_variant.

Share this post.

Share on FacebookShare on Google+Share on LinkedInTweet about this on Twitter
  • BC

    > They needed to be able to import databases into SQL Server from Microsoft Access, Microsoft’s first entry into the database market, which supported a variant datatype.

    I don’t think that’s correct. Access databases never had such data types. [1] [2] You might be thinking about VBA which does such data type.

    • Phil Factor

      Ah right. That’s interesting, thanks for the info.
      I think I got that info from an old Byte magazine so I didn’t check it. Certainly, the variant datatype in VBA would have explained it. Interestingly, Visual Basic started out as the Embedded Basic engine designed for Microsoft’s abandoned “Omega” database system which was planned to provide a front-end to SQL Server. It got rolled into Project Cirrus, which then became Access. Could it be that the Variant data type of VB/VBA was in the Access version of VBA? I can’t find any reference to a variant database in Sybase, so it would seem to be a Microsoft addition done at about that time. Although I’d never like to admit to ever having done any MS Access programming I seem to have a strange false memory of (mis)using the Variant datatype.
      I’ll do some more research.

You may also like