Product articles SQL Prompt SQL Code Analysis
Misuse of the scalar user-defined…

15 February 2018

4 Comments

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

15 February 2018

4 Comments

Misuse of the scalar user-defined function as a constant (PE017)

The incorrect use of a scalar UDF as a global database constant is a major performance problem and should be investigated whenever SQL Prompt spots this in any production code. Unless you need to use these global constants in computed columns or constraints, it is generally safer and more convenient to store the value in an inline table-valued function, or to use a view.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

Developers tend to expect to be able to set global values in a database to provide constants such as the value of Pi, or variables such as the tax rate, language, file URNs or URLs. User-defined scalar functions return a single value, and so seem to provide the ideal way of doing so. This is fine for functions that are executed infrequently, and process relatively small data sets, but in other cases it can cause dramatic query performance problems. The problem arises because SQL Server doesn’t trust non-schema verified scalar functions as being precise and deterministic, and so chooses the safest, though slowest, option when executing them.

There is an overhead in calling any SQL Server function that has a BEGIN…END block because, unless we allow SQL Server to verify its output, by creating the function using schema binding, it will re-execute the function for every row, before data can be filtered, even though it is obvious to you that it will return the same value every time. It’s a slightly insidious problem because it doesn’t really show its full significance in the execution plan, though an Extended Events session will reveal what is really going on.

In short, do not use a scalar user-defined function (UDF) in a JOIN condition, WHERE search condition, or in a SELECT list, unless the function is schema-bound. SQL Prompt implements static code analysis rule, PE017, designed precisely to help you detect and rectify this problem. Unless you are confident with schema binding, and its consequences when making database changes, it is better to use either transfer the value to a variable, or use a module such as a view or inline Table-value function.

Tackling the Problem

If SQL Prompt detects that your code flouts PE017, what should you do?

We’ll set up all of the possible options, run some performance tests, and make some recommendations.

Schema-qualifying UDFs

A scalar function can be used correctly by adding schema binding to ensure that it is system-verified. Listing 1 creates two versions of the same, simple Wordcount function, first without and then with schema binding, both of which simple return a constant. In each case, we check the IsDeterministic, IsPrecise and IsSystemVerified property values for each object.

Finally, it creates a third version that simply returns its parameter value, just to check to see if this is a factor in SQL Server’s verification process.

IF Object_Id('dbo.Wordcount') IS NOT NULL DROP FUNCTION dbo.Wordcount
GO

CREATE FUNCTION dbo.Wordcount()
/**
Summary: >
A simple scalar multi-statement function
without schemabinding that returns a constant
Author: PhilFactor
Date: 01/02/2018
Returns: >
  the integer value 5
**/
RETURNS INT
AS
  BEGIN
    RETURN 5
  END
GO

/* we now test it to see whether SQL Server trusts it */
SELECT ObjectPropertyEx(
         Object_Id('dbo.Wordcount'), N'IsDeterministic') AS deterministic;

--Is a precise number returned? Whenever floating point operations are used in resolving 
--expressions, the results are not precise, by the very nature of the way that the datatype is stored. 
SELECT ObjectPropertyEx(Object_Id('dbo.Wordcount'), N'IsPrecise') AS precise;

--Can SQL Server verify that the function is precise and deterministic?
SELECT ObjectPropertyEx(
         Object_Id('dbo.Wordcount'), N'IsSystemVerified') AS verified;
GO

IF Object_Id('dbo.WordcountSchemaBound') IS NOT NULL DROP FUNCTION dbo.WordcountSchemaBound
GO

CREATE FUNCTION dbo.WordcountSchemaBound()
/**
Summary: >
A second version of a simple scalar multi-statement function
with schemabinding that returns a constant
Author: PhilFactor
Date: 01/02/2018
Returns: >
  the integer value 5
**/
RETURNS INT
WITH SCHEMABINDING
AS
  BEGIN
    RETURN 5
  END
GO

/* Repeat tests to see if SQL Server trusts dbo.WordcountSchemaBound*/
SELECT ObjectPropertyEx(
         Object_Id('dbo.WordcountSchemaBound'), N'IsDeterministic') AS deterministic;

SELECT ObjectPropertyEx(
         Object_Id('dbo.WordcountSchemaBound'), N'IsPrecise') AS precise;

SELECT ObjectPropertyEx(
         Object_Id('dbo.WordcountSchemaBound'), N'IsSystemVerified') AS verified;
GO

IF Object_Id('dbo.Wordcounter') IS NOT NULL DROP FUNCTION dbo.Wordcounter
GO

CREATE FUNCTION dbo.Wordcounter
  /**
Summary: >
A third version of a simple scalar multi-statement function
without schemabinding that merely returns its parameter
To test whether the absense of any parameter is a determining
Factor -- whether adding a parameter here gets round the problem
Author: PhilFactor
Returns: >
  the integer value passed to it
**/
  (@howMany INT)
RETURNS INT
AS
  BEGIN
    RETURN @howMany
  END
GO

/* Repeat tests to see if SQL Server trusts dbo.Wordcounter*/
SELECT ObjectPropertyEx(
         Object_Id('dbo.Wordcounter'), N'IsDeterministic') AS deterministic;

SELECT ObjectPropertyEx(Object_Id('dbo.Wordcounter'), N'IsPrecise') AS precise;

SELECT ObjectPropertyEx(
         Object_Id('dbo.Wordcounter'), N'IsSystemVerified') AS verified;
GO

Listing 1

If you run Listing 1, you’ll see that only the second version of the function, WordCountSchemaBound, returns true for the three properties. We’ll see a little later just how much this affects the performance of any queries that call these functions.

Although schema-binding has many advantages, it means, in this case, that you’ll be explicitly prevented from treating the constant as a variable, which is no bad thing. If you alter the ‘constant’ function, which is being you’ve used in constraints or computed columns within tables, it will prove to be complicated. Also, if you try to change the constant when the database is working, the plans that use the function that are being executed will place schema stability locks on the function and this will prevent you changing the value of the constant, as they require schema modification lock.

Alternatives to scalar UDFs

Listing 2 shows a couple of alternatives to scalar UDFs to hold database-wide values, in cases where you can’t or don’t want to schema-bind them; first a view, and then a table-valued function.

IF Object_Id('dbo.WordCountView') IS NOT NULL DROP VIEW dbo.WordCountView
GO
CREATE VIEW dbo.WordCountView
AS
/**
Summary: >
A very simple view that returns a single row with one column
Author: PhilFactor
Date: 01/02/2018
Returns: >
  a single row with a column called 'wordcount'
**/
SELECT 5 AS wordcount
GO
IF Object_Id('dbo.WordCountTVF') IS NOT NULL DROP FUNCTION dbo.WordCountTVF
GO
CREATE FUNCTION dbo.WordCountTVF()
/**
Summary: >
A table valued function that returns a single 
row with a column called 'wordcount' 
Author: PhilFactor
Date: 01/02/2018
Returns: >
  a single row with a column called 'wordcount'
**/
RETURNS TABLE
AS
RETURN
  (SELECT 5 AS wordcount)
GO

Listing 2

The objects referred to in the view definition can’t be altered in a way that would make the view definition illegal, or force SQL Server to recreate the index on the view.

I’ve left out the alternative of using a table for holding your constants despite the extra protection of CHECK constraints. Tables just aren’t designed to be immutable! As a spoiler, I’ll tell you that they perform as well as a view.

The Performance tests

Having lined up all the candidate solutions, let’s see how they perform. I’m going to test how quickly each of the options can find out how many five-letter words there are in common usage in the English language. The tests require us to create a simple Commonwords table consisting of a single column (primary key) of all the common words. To populate it, you’ll need to download the commonwords file, then run Listing 3, with the correct path to the file.

DECLARE @AllCommonWords XML =
          (SELECT * FROM OpenRowset(BULK 'C:\MyPath\commonwords.XML',
                           SINGLE_BLOB) AS x );

IF Object_Id('commonwords', 'U') IS NOT NULL DROP TABLE commonwords;

CREATE TABLE commonwords (word VARCHAR(40) NOT NULL PRIMARY KEY);

INSERT INTO commonwords(word)
  SELECT word = word.value('@el', 'varchar(40)')
    FROM @AllCommonWords.nodes('/commonwords/row') AS CommonWords(word);

Listing 3

For the timings, I’m going to use the simple test harness described in my article, How to record T-SQL execution times using a SQL Prompt snippet.

-- create a temporary table variable to hold timings 
DECLARE @log TABLE
  (
  TheOrder INT IDENTITY(1, 1),
  WhatHappened VARCHAR(200),
  WhenItDid DATETIME2 DEFAULT GetDate()
  )

----start of timing
INSERT INTO @log(WhatHappened) SELECT 'Starting the test run'--place at the start

--first we see how fast it is with a literal number, as a benchmark
 SELECT Count(*) FROM commonWords WHERE Len(word)=5
INSERT INTO @log(WhatHappened) SELECT 'simple Query with literal number'--place at the start

--then see how long it takes if you transfer the 'constant' to a local variable
 DECLARE @NumberOfLetters INT=dbo.wordcount()
 SELECT Count(*) FROM commonWords WHERE Len(word)=@NumberOfLetters
INSERT INTO @log(WhatHappened) SELECT 'Same query but with value transferred from UDF to variable '

--Now we see how long with the scalar UDF that just passes the value back
 SELECT Count(*) FROM commonWords WHERE Len(word)=dbo.wordcounter(5)
iNSERT INTO @log(WhatHappened) SELECT 'Same but using a scalar function with parameter  '

-- And now with the use of a scalar UDF function as a global constant
 SELECT Count(*) FROM commonWords WHERE Len(word)=dbo.wordcount()
iNSERT INTO @log(WhatHappened) SELECT 'Same with a ''constant'' UD scalar function '

-- And now with the use of a schema-bound scalar UDF as a global constant
 SELECT Count(*) FROM commonWords WHERE Len(word)=dbo.wordcountSchemaBound()
iNSERT INTO @log(WhatHappened) SELECT 'Same with schema-bound ''constant'' UD scalar function'
--We'll now use a view to do the same thing
 SELECT Count(*) FROM commonWords
 INNER JOIN  dbo.WordCountView
 ON Len(word)=wordcount
iNSERT INTO @log(WhatHappened) SELECT 'Using a view containing a constant with inner join '

--We'll now use a view with a cross join to do the same thing
 SELECT Count(*) FROM commonWords
 CROSS JOIN  dbo.WordCountView
 where Len(word)=wordcount
iNSERT INTO @log(WhatHappened) SELECT 'Using a ''constant'' view with a cross join '

--and now with an inline table-valued function. Some functions are OK!
 SELECT Count(*) FROM commonWords
 INNER JOIN  dbo.WordCountTVF()
 ON Len(word)=wordcount
iNSERT INTO @log(WhatHappened) SELECT 'Using an inline TVF to provide a constant'

--we see if a different syntax makes a difference
 SELECT Count(*) FROM commonWords
 cross JOIN  dbo.WordCountTVF()
 WHERE Len(word)=wordcount
iNSERT INTO @log(WhatHappened) SELECT 'Using an inline TVF and cross join to provide a constant'
SELECT ending.whathappened AS test, DateDiff(ms, starting.whenItDid,ending.WhenItDid) [Time in ms] FROM @log starting
INNER JOIN @log ending ON ending.theorder=starting.TheOrder+1
--list out all the timings
GO
/* this is the end of the test section */

Listing 4

When we run this, we verify that all the forms of using a constant in a query produce the same results. The times show very clearly what the problem is, and the dramatic extent of it

A graph is scarcely necessary to emphasize the horror of PE017 – Incorrect usage of const UDF. The way SQL Server executes a scalar UDF that isn’t schema-bound, and therefore unverified, is so cautious (for every row, it asks “is it still returning 5?“) that it is fifty times slower.

Apart from avoiding use of non-schema bound scalar UDFs, the test showed that on average there is no real difference in performance between any of the other ways of getting a constant value to a query. The query execution plan is the same in each case.

Now, we end by using a tear-down section to leave everything tidy in our test database.

IF Object_Id('dbo.Wordcount') IS NOT NULL
   DROP function dbo.Wordcount
GO
IF Object_Id('dbo.WordcountSchemaBound') IS NOT NULL
   DROP function dbo.WordcountSchemaBound
GO
IF Object_Id('dbo.Wordcounter') IS NOT NULL
   DROP function dbo.Wordcounter
GO
IF Object_Id('dbo.WordCountView') IS NOT NULL
   DROP view dbo.WordCountView
GO
IF Object_Id('dbo.WordCountTVF') IS NOT NULL
   DROP function dbo.WordCountTVF
GO

Listing 5

Recommendations

If you use an unverified scalar function, then the query will be grindingly slow because you will be executing it on every row whether it has a parameter or not.

If you are faced with a vast tangle of inherited code that is using scalar UDFs as global constants, then redo them with schema binding. However, if these are global variables, which change infrequently in a live system, then I wouldn’t think of that alternative because you can’t alter a schema-bound function without temporarily altering every table that uses it in a constraint or computed column, to remove them, alter the function, and then replace the constraints and computed column.

Views or TVFs are more versatile, and so I’d me more inclined to use them to hold ‘global’ values. If these are changed, the change is logged because they require a DDL change. The only problem is that only scalar functions can be used in constraints or computed columns. If you use a table, then fine, but remember that changing a constant isn’t a DDL change and so you must set up access permissions to deny anyone the right to change, for example, the tax rate!

Share this post.

  • david wright

    For whatever reason (don’t want to get into that here) we use functions to return constants. For obvious reasons, those functions are all schema bound, but version 9.1.0.4138 of SQL Prompt’s code analysis insists on flagging their use as a performance issue. Is this a feature?

    CREATE FUNCTION dbo.f () RETURNS int
    WITH SCHEMABINDING
    AS
    BEGIN
    RETURN 5;
    END;
    GO

    CREATE PROCEDURE p AS
    BEGIN
    SELECT answer = dbo.f() FROM sys.all_objects;
    END;
    GO

    https://uploads.disquscdn.com/images/0e641eab2d6d7bc547a1ae2a6fa7c98f66188e89686854b66c3385b608172db3.png

    • Phil Factor

      This will have no impact on performance, as far as I can see. You can opt to turn this particular rule off in SQL Prompt as you are aware of the issues. There is no shame attached to using schemabound functions as global constants as long as nobody ever subsequently thinks of turning them into global variables!
      I am confident that this rule PE017 should be amended slightly to check for schema-binding. If no schemabinding, then the rule is valid, if the function is in the metadata as being precise and deterministic, then it isn’t. It just needs to check the ObjectProperty ‘IsSystemVerified’

  • Pingback: SQL Code Smells - Simple Talk()

  • Rune Bivrin

    For some reason, SQL Prompt is flagging FORMAT in the select list as a PE-017 violation. That doesn’t really make sense…

    • Phil Factor

      It should never mistake a built-in function for a user function. I can’t reproduce what you describe with error-free code, but it is true that Format() is mistaken for a constant UDF, which is wrong. I’ll pass this on to the Prompt team. Let me know if It is also a problem with error-free code.

You may also like