Product articles SQL Prompt SQL Code Analysis
Why you Should Always Specify Whether a…

Why you Should Always Specify Whether a Column Accepts Nulls

Phil Factor explains the factors that determine whether a column will allow null values, if you don't specify it explicitly in the column definition. If you rely on the default behavior established by your connection settings, you could be in for some nasty surprises.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

In most circumstances, you need to specify whether a column should or shouldn’t allow null entries. It isn’t a good idea to rely on defaults, and assume that, if don’t specify the nullability of a column explicitly, using NULL or NOT NULL, then the column should be nullable. The rules that govern what happens if you leave out that option for a given datatype are quite difficult to explain, and even if you understand them, will your team or successors be as willing to do so?

In SQL, the NOT NULL clause is a “logical constraint”, used to ensure that a column never gets a null value assigned to it. Conversely, the NULL clause makes it clear that you want the column to accept null values. If your tables specifications don’t include these clauses, then the presence or otherwise of nulls is determined from the database properties, connection settings, or the default for the datatype. Basically, you don’t always know, and you can easily end up with NOT NULL columns that you expected to allow nulls.

SQL Prompt’s Best Practice code analysis rule, BP014, will warn you (one violation at a time, in each expression) if you fail to specify a column’s nullability when creating or altering a table, or declaring a table variable.

Why care if a column is or isn’t nullable?

The reason you need to specify this is because a relational database is designed to make it efficient to prevent bad data getting in. Constraints are the means of doing this. You must therefore use NOT NULL for all columns that cannot legitimately contain nulls. If you specify that a column is NOT NULL, you are defining a constraint that ensures that it can never hold or accept NULL, so you can’t accidentally leave the value out. By allowing null values in a column you also makes aggregation trickier and can make WHERE clauses give unexpected results unless you use functions such as ISNULL(), IFNULL(), and NULLIF() for handling nulls.

Don’t assume that the column will allow nulls if you don’t specify it

This article is aimed at justifying why you should always specify whether a column that you are defining in any table allows null values. It cannot go too far into the issue of whether NOT NULL constraints are a good thing, but just explains why you need to state your preference.

You might think that if you don’t include the NOT NULL constraint in the column’s definition, then the column will be nullable. No, wrong. It may be nullable, but it may not. It depends on the datatype, the database settings and the settings of the connection. Unless you can memorize all the rules and guarantee the type of connection used for your DDL scripts, it is much simpler just to accept the single rule that you always specify whether your columns are NULL or NOT NULL.

We’ll now look at the various factors that could determine whether a column gets a NOT NULL constraint, if you don’t specify it.

Your database specifies the default

If the programmer does not specify the nullability of a column, when creating or altering a table, then the database settings, specifically the ANSI_NULL_DEFAULT option, determine whether that column is NULL or NOT NULL, unless you have a SQL Server connection that overrides it, or other factors, which I’ll explain, override it.

The ANSI_NULL_DEFAULT option is one of the sql_option settings and the syntax is:

If set to ON, it allows null values for all user-defined data types or columns that are not explicitly defined as NOT NULL, when issuing a CREATE TABLE or ALTER TABLE statement.

You can see the setting for the current database with:

Which returns either 1 or 0.

Your connection specifies the default

The settings for your connection from an application can be used to override the database setting. All the commonly used settings will do this. The connection that you use will usually specify that the default should be NULL. Doing this may not be that clever, but it is the ANSI standard, as indicated by the fact that if you set the more general ANSI_DEFAULTS setting to ON, you’ll find that it also sets ANSI_NULL_DFLT_ON to ON.

SSMS allows you to specify the defaults you use for your connection to SQL Server, overriding the ANSI standard, if you wish. There exists both a SET ANSI_NULL_DFLT_OFF and SET ANSI_NULL_DFLT_ON though they cannot be set ON at the same time. You can choose to switch both OFF, in which case you just opt to inherit the database defaults, or you can insist that the default is NULL by turning ANSI_NULL_DFLT_ON to ON. If you feel so inclined, you can override the database setting to make the default NOT NULL by turning ANSI_NULL_DFLT_OFF to ON.

There are slight variations in SQLCMD, BCP and SSMS, but in general they are consistent. The SQL Server Native Client ODBC driver and SQL Server Native Client OLE DB Provider for SQL Server automatically set ANSI_NULL_DFLT_ON to ON when connecting. However, the default for SET ANSI_NULL_DFLT_ON is OFF for connections from the legacy DB-Library applications.

So, if we want to see which settings are enabled for our connection, we can run…

Which returns:

The datatype is defined as NOT NULL

A couple of datatypes provided by Microsoft, timestamp and sysname, are NOT NULL unless you specify otherwise. You can specify your own alias datatypes, based on the system data types, and specify whether they default to nullable. To see which of your datatypes are NOT NULL you can use a query like this: …

To make them nullable or not nullable you use this syntax

You can see how handy this can be for handling data that has a specific dimension, meaning or use. You can refer to a surname datatype and know that it can’t be NULL (though it can be ‘null’) and that it has a maximum value that can be changed in just one place in your database, if you suddenly find it insufficient in length.

It is also very handy for numeric data that has a base type of numeric or decimal. It means that it is far more difficult to get it wrong and accidentally truncate a value. As well as precision and scale, you can specify its default nullability.

This means that a user alias type is one of the datatypes where it is generally better not to attempt to override the nullability specification, if it exists, by specifying NULL explicitly when creating the column based on this type. Someone has decided that the type must be NULL or NOT NULL and there is probably a good reason for this.

The column participates in a Primary Key

If you assign a column to a primary key it will be NOT NULL. Let’s demonstrate this:

So, let’s see if all those columns for which we didn’t specify nullability will allow null values:

None of those columns are nullable. We can prove this by getting SSMS to reverse-engineer the table with a generated build script

Look there! NOT NULL constraints on all the columns, even though your connection settings specified that allowing them was the default.

Conclusions

I’m not going to get into the debate about whether it is a good idea to allow null values in the columns of tables. However, it is safe to say that you should generally specify explicitly whether a column should allow them. I’d say ‘generally’, because there is an argument for leaving that out if you are using a user-defined alias type, so that its use will be consistent within the database. For this to be safe, you will need to make sure that it has been specified in the code that creates the alias type!

You may understand the CREATE TABLE code when you write it, and the state of your connection at the point at which the DDL code was executed, but is it repeatable? Will the poor souls that inherit your code or your team-members who must read it be as clued in? Would they even want to be as clued-in?