Product articles SQL Prompt SQL Code Analysis
Never Create Columns with ANSI_PADDING…

24 February 2020

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

24 February 2020

Never Create Columns with ANSI_PADDING set to OFF

There is no good reason for having ANSI_PADDING set to OFF when you create tables in SQL Server. It was provided purely for legacy databases that had code that assumed the old CHAR behavior for dealing with padding, and its use has now been deprecated.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

It is a bit weird to find a connection setting that changes permanently the way that some datatypes in a table are retrieved, but this is what happens if you use SET ANSI_PADDING OFF when creating a temporary or permanent table. A moment of silliness with settings, or an accidental use of an old application with a DBLib connection, when creating a table, leads to a table with some columns that handle the trailing spaces of some string or binary datatypes oddly, or inconsistently, from then on.

This option is deprecated and at some point, it will be removed (it will always be “on”). SQL Prompt has a deprecated syntax rule, DEP013, which will warn you of use of this and other deprecated SET options.

What and why is ANSI padding?

In the early years of SQL, the question of how to handle strings was a source of controversy. The CHAR datatype, where the length of string was defined and fixed, was designed to make data retrieval simple and efficient. Strings were stored in the CHAR datatype up to the designated length. With strings that were shorter, any remaining character positions in the datatype were padded with spaces (CHAR) or zeros (BINARY). Are these padded spaces part of the string? If not, how do you distinguish values that are deliberately filled with spaces from those that are automatically padded?

The early implementations of SQL generally trimmed out all trailing spaces in CHARs, when retrieving data, unless the column was NOT NULL. However, it became apparent that to conform to the ANSI SQL standard, this had to change. The NIST test suite checked that CHAR datatypes were always padded, and that trailing spaces entered by the user were never truncated, neither for the CHAR or the VARCHAR datatypes. The decision was made by SQL Server that, to comply, any trailing spaces, whether deliberate or added automatically as padding, would be retrieved with the data, and the same goes for trailing zeros in binary datatypes. However, because so much code was written under the old regime, a setting was introduced into Transact-SQL, called ANSI_PADDING. When it was switched off, it allowed this legacy code to work as it always had. Everyone seemed happy.

Once the controversies over ANSI-ISO standard for CHAR datatypes died down, new types of data and new types of table were introduced. The ANSI_PADDING controversy only affected types that existed then, with a length that the user defined for the purpose. Strings can now be stored as a defined size of NVARCHAR, VARCHAR, NCHAR or CHAR. Binary data can be stored as a defined size of BINARY or VARBINARY. For the older datatypes of CHAR(n), BINARY(n), VARCHAR(n) or VARBINARY(n), the setting of the ANSI_PADDING option at the time that the table was created affects how SQL Server subsequently handles these strings.

However, the same isn’t true of the later datatypes of NCHAR, NVARCHAR, NTEXT, TEXT or IMAGE. The types where the length isn’t defined, VARBINARY(MAX), VARCHAR(MAX), and NVARCHAR(MAX) aren’t affected either.

The only use that the old database developers really had for switching off ANSI padding was to do string concatenation without having to use the RTRIM() function. It might seem a good idea to avoid having to use the RTRIM() function but the padding rule isn’t consistent in its behavior with NULLable columns. Also, as new types of table were introduced, nobody bothered to make them backward compatible ands so the rules that apply, generally, when ANSI_PADDING is turned off, don’t work at all for table variables. Also, you are likely to get caught out if you try to create or change indexes on computed columns or indexed views. This simply isn’t allowed if you have ANSI_PADDING set to OFF.

So, what are the rules?

The simple, ANSI-standard behavior is that data inserted into fixed-width types is always padded with trailing spaces or zeroes, to the specified length, and subsequently any trailing spaces or zeroes are considered part of the data, for all datatypes, and so are never trimmed when SQL Server retrieves the data into memory.

The behavior gets a lot more complicated if you switch ANSI_PADDING to OFF when creating the tables and columns. Fortunately, the presence or absence of trailing spaces does not affect string comparisons in WHERE clauses because these always ignore them whatever the setting. It also does not greatly affect LIKE comparisons. The main effects of turning off ANSI_PADDING are as follows:

  • CHAR NOT NULL and BINARY NOT NULL columns are padded on inserting data and not subsequently trimmed (same behavior as ANSI standard)
  • Nullable CHAR and BINARY columns are trimmed on retrieval (and so, of course, are no longer padded on insertion). You lose any trailing spaces or zeroes added deliberately
  • VARBINARY and VARCHAR columns are trimmed on retrieval, so you lose any trailing spaces or zeroes added deliberately

We can demonstrate all this, if you need convincing.

You’ll see the following messages:

Creating a temporary table with ANSI_PADDING ON
Now creating identical temp table with ANSI_PADDING OFF
Now creating identical table variable with ANSI_PADDING OFF'
Switching ANSI_PADDING back on
inserting into both tables
Selecting from first table, created with ANSI padding ON (<> shows extent of string)
Selecting from second table, created with ANSI padding OFF: Same query
Selecting from table variable, created with ANSI padding OFF: Same query

And the results look like this:

Right. With ANSI_PADDING set to ON, as usual, when the table was created, the first row, to which we deliberately added trailing spaces or zeros was never trimmed. The second row that had no trailing spaces had them added, consistently, for the CHAR and BINARY datatype, whether it allowed NULLs or not.

The second result is from the table that was created while ANSI_PADDING was switched OFF. The NULLable first CHAR column was trimmed. The CHAR column with the NOT NULL constraint was padded with spaces. VARBINARY columns were all trimmed of trailing zeros whether nullable or not. VARCHAR columns were trimmed of trailing spaces.

The third result, from a table variable that was also created with ANSI_PADDING switched OFF, shows a complete innocence to the setting. Regardless of the setting, it behaves in an ANSI-compatible way.

If you are struggling to take in all the rules and exceptions, then you are not alone.

Finding rogue columns created with ANSI_PADDING off

Query behavior is consistent whatever setting you have for ANSI_PADDING, at the time when you access the tables. The setting is held for the table columns, and the connection settings are ignored. This allows a ‘legacy’ database to behave consistently, whatever connection settings are used to access it. We can check to see what has happened to the temporary tables by querying the metadata.

The sys.columns view has a column that is 1 if ANSI_PADDING is on and 0 if it’s off. This query will tell you very quickly if your database has any rogue columns that are set ANSI_PADDING off (just nick out the first line ‘USE tempdb‘).

The bottom line

Until the danger of anyone intentionally or unintentionally setting ANSI_PADDING off finally subsides, when the feature is finally removed from SQL Server, always set your connection to the ANSI behavior using SET ANSI_PADDING ON before executing a table CREATE statement, but nowhere else should you use the setting, because the setting and support for it is scheduled for deprecation, at which point you won’t be able to switch off ANSI compatibility.

 

You may also like

  • Article

    Refactoring Databases with SQL Prompt

    Louis Davidson demonstrates how SQL Prompt can significantly lessen the pain involved in occasional, 'heavyweight' database refactoring processes, such as renaming modules, tables and columns (Smart Rename), or splitting tables (Split Table).

  • Article

    The '= NULL' Mistake and other SQL NULL Heresies

    The SQL Prompt Best Practice rule checks whether a comparison or expression includes a NULL literal ('NULL'), which in SQL Server, rather than result in an error, will simply always produce a NULL result. Phil Factor explains how to avoid this, and other SQL NULL-related calamities.

  • Community event

    PASS Deutschland eV - Berlin chapter

    Redgate will be remotely sponsoring the October meeting of the PASS Deutschland – Berlin chapter. PASS Deutschland eV is the German Microsoft Data Platform Community, ie topics from this environment are treated within the framework of the association. Not only do we rely on the SQL Server and its many services, tools and components, we also deal with

  • Article

    SQL Prompt Code Analysis: A Hint is Used (PE004-7)

    Phil Factor suggests a philosophy of "the SQL query optimizer knows best" when it comes to choosing the right execution plan. Use hints as a last resort, and evaluate them carefully whenever SQL Prompt warns you of their presence in your SQL code.

  • Forums

    SQL Prompt Forum

    Write, format, and refactor SQL effortlessly