Why You Should Always Specify Whether a Column Accepts NULLs
Phil Factor explains the factors that determine whether a column will allow null values, if you don't specify it explicitly in the column definition. If you rely on the default behavior established by your connection settings, you could be in for some nasty surprises.
In SQL, you should, in most circumstances, specify explicitly whether a column should or shouldn’t allow NULL values. It isn’t a good idea to rely on defaults and assume that, if you don’t specify the nullability of a column explicitly, using NULL or NOT NULL, the column will be nullable. The rules that govern what happens if you leave out that option for a given datatype are quite difficult to explain, and even if you understand them, will your team or successors be as willing to do so?
SQL’s NOT NULL clause is a “logical constraint”, used to ensure that a column never gets a null value assigned to it. Conversely, the NULL clause makes it clear that you want the column to accept null values. If your table specifications don’t include these clauses, then the presence or otherwise of nulls is determined by the database properties, the connection settings, or the default for the datatype. Basically, you don’t always know, and you can easily end up with NOT NULL columns that you expected to allow nulls.
SQL Prompt’s code analysis feature is a SQL syntax checker and includes a best practice rule, BP014, that will warn you (one violation at a time, in each expression) if you fail to specify a column’s nullability when creating or altering a table, or declaring a table variable.
Why care if a column is or isn’t nullable?
The reason you need to specify this is that a relational database is designed to make it efficient to prevent bad data getting in, and constraints are the means of doing this. You must therefore use NOT NULL for all columns that cannot legitimately contain nulls. If you specify that a column is NOT NULL, you are defining a constraint that ensures that the column can never hold or accept NULL, so you can’t accidentally leave the value out. Allowing null values in a column also makes aggregation trickier and can make WHERE clauses give unexpected results, unless you use functions such as ISNULL(), COALESCE(), and NULLIF() for handling nulls.
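As a quick sketch of the sort of surprise this causes (the table variable and its values are made up for illustration):

```sql
-- Hypothetical data: one row has a missing (NULL) discount.
DECLARE @Orders TABLE (OrderID INT NOT NULL, Discount DECIMAL(5, 2) NULL);
INSERT INTO @Orders (OrderID, Discount)
VALUES (1, 0.10), (2, NULL), (3, 0.00);

-- This silently excludes order 2, because NULL <> 0.00 evaluates to UNKNOWN.
SELECT OrderID FROM @Orders WHERE Discount <> 0.00;

-- ISNULL() lets you decide how to treat the missing value explicitly.
SELECT OrderID FROM @Orders WHERE ISNULL(Discount, 0.00) <> 0.00;

-- Aggregates ignore nulls, so these two averages differ (0.05 vs ~0.0333).
SELECT AVG(Discount)            AS AvgIgnoringNulls,
       AVG(ISNULL(Discount, 0)) AS AvgNullAsZero
FROM   @Orders;
```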
Don’t assume that the column will allow nulls if you don’t specify it. This article will explain why you should always specify whether a column that you are defining in any table allows null values. It can’t go too far into the issue of whether NOT NULL constraints are a good thing; it just explains why you need to state your preference.
You might think that if you don’t include the NOT NULL constraint in the column’s definition, then the column will be nullable. No, wrong. It may be nullable, but it may not. It depends on the datatype, the database settings and the settings of the connection. Unless you can memorize all the rules and guarantee the type of connection used for your DDL scripts, it is much simpler just to accept the single rule that you always specify whether your columns are NULL or NOT NULL.
We’ll now look at the various factors that could determine whether a column gets a NOT NULL constraint, if you don’t specify it.
The ANSI_NULL_DEFAULT database setting
If the programmer does not specify the nullability of a column when creating or altering a table, then the database settings, specifically the ANSI_NULL_DEFAULT option, determine whether that column is NULL or NOT NULL, unless a SQL Server connection setting, or one of the other factors I’ll explain, overrides it.
The ANSI_NULL_DEFAULT option is one of the SET options of ALTER DATABASE, and the syntax is:
ALTER DATABASE { database_name | CURRENT } SET ANSI_NULL_DEFAULT { ON | OFF };
If set to ON, it allows null values for all user-defined datatypes or columns that are not explicitly defined as NOT NULL when issuing a CREATE TABLE or ALTER TABLE statement.
You can see the setting for the current database with:
SELECT DATABASEPROPERTYEX(Db_Name(), 'IsAnsiNullDefault');
This returns either 1 or 0.
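To sketch how the database setting takes effect, here is a small demonstration to run in a scratch database where you have ALTER permission. The table name is made up, and note that the database default only applies when both of the connection-level overrides (described in the next section) are OFF:

```sql
-- Switch off both connection-level overrides so the database default decides.
SET ANSI_NULL_DFLT_ON OFF;
SET ANSI_NULL_DFLT_OFF OFF;

-- Make the database default NOT NULL.
ALTER DATABASE CURRENT SET ANSI_NULL_DEFAULT OFF;

-- Create a table without specifying nullability for the column.
CREATE TABLE dbo.NullDefaultDemo (SomeValue INT);

-- is_nullable should be 0: the column came out as NOT NULL.
SELECT is_nullable
FROM   sys.columns
WHERE  object_id = Object_Id('dbo.NullDefaultDemo')
  AND  name = 'SomeValue';

DROP TABLE dbo.NullDefaultDemo;
```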
Connection settings that specify the default ‘nullability’
The settings for your connection from an application can override the database setting, and all the commonly used clients will do this. The connection that you use will usually specify that the default should be NULL. Doing this may not be that clever, but it is the ANSI standard, as indicated by the fact that if you set the more general ANSI_DEFAULTS setting to ON, you’ll find that it also sets ANSI_NULL_DFLT_ON to ON.
SSMS allows you to specify the defaults you use for your connection to SQL Server, overriding the ANSI standard if you wish. There is both a SET ANSI_NULL_DFLT_OFF and a SET ANSI_NULL_DFLT_ON, though they cannot both be ON at the same time. You can choose to switch both OFF, in which case you just opt to inherit the database default, or you can insist that the default is NULL by turning ANSI_NULL_DFLT_ON to ON. If you feel so inclined, you can override the database setting to make the default NOT NULL by turning ANSI_NULL_DFLT_OFF to ON.
SET ANSI_NULL_DFLT_ON { ON | OFF };
SET ANSI_NULL_DFLT_OFF { ON | OFF };
There are slight variations between SQLCMD, BCP and SSMS, but in general they are consistent. The SQL Server Native Client ODBC driver and the SQL Server Native Client OLE DB Provider for SQL Server automatically set ANSI_NULL_DFLT_ON to ON when connecting. However, the default for SET ANSI_NULL_DFLT_ON is OFF for connections from legacy DB-Library applications.
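To see the connection-level override beating the database setting, here is a sketch you can run in a scratch database (the table names are invented; setting one of the pair to ON automatically switches the other to OFF):

```sql
-- Force the connection default to NOT NULL, overriding the database setting.
SET ANSI_NULL_DFLT_OFF ON;
CREATE TABLE dbo.DfltOffDemo (SomeValue INT);  -- nullability unspecified

-- Now force the connection default back to NULL.
SET ANSI_NULL_DFLT_ON ON;  -- implicitly sets ANSI_NULL_DFLT_OFF to OFF
CREATE TABLE dbo.DfltOnDemo (SomeValue INT);   -- same definition as before

-- The identical column definition produced opposite nullability.
SELECT Object_Name(object_id) AS TableName, is_nullable
FROM   sys.columns
WHERE  object_id IN (Object_Id('dbo.DfltOffDemo'), Object_Id('dbo.DfltOnDemo'));

DROP TABLE dbo.DfltOffDemo;
DROP TABLE dbo.DfltOnDemo;
```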
So, if we want to see which settings are enabled for our connection, we can run…
SELECT Setting
FROM
  (VALUES
    (1, 'DISABLE_DEF_CNST_CHK'),
    (2, 'IMPLICIT_TRANSACTIONS'),
    (4, 'CURSOR_CLOSE_ON_COMMIT'),
    (8, 'ANSI_WARNINGS'),
    (16, 'ANSI_PADDING'),
    (32, 'ANSI_NULLS'),
    (64, 'ARITHABORT'),
    (128, 'ARITHIGNORE'),
    (256, 'QUOTED_IDENTIFIER'),
    (512, 'NOCOUNT'),
    (1024, 'ANSI_NULL_DFLT_ON'),
    (2048, 'ANSI_NULL_DFLT_OFF'),
    (4096, 'CONCAT_NULL_YIELDS_NULL'),
    (8192, 'NUMERIC_ROUNDABORT'),
    (16384, 'XACT_ABORT')
  ) f (Bit, Setting)
WHERE Bit & @@Options = Bit;
This returns one row for each setting that is enabled for the current connection.
Datatypes that are NOT NULL by default
A couple of datatypes provided by Microsoft, timestamp and sysname, are NOT NULL unless you specify otherwise. You can specify your own alias datatypes, based on the system datatypes, and specify whether they default to nullable. To see which of your datatypes are NOT NULL, you can use a query like this:
SELECT name FROM sys.types WHERE is_nullable = 0;
To make them nullable or not nullable, you use this syntax:
CREATE TYPE [ schema_name. ] type_name
    FROM base_type [ ( precision [ , scale ] ) ]
    [ NULL | NOT NULL ];
You can see how handy this can be for handling data that has a specific dimension, meaning or use. You can refer to a surname datatype and know that it can’t be NULL (though it can be the string ‘null’) and that it has a maximum length that can be changed in just one place in your database, if you suddenly find it insufficient.
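A minimal sketch of such an alias type follows; the Surname name, the NVARCHAR(40) length, and the Person table are all illustrative choices, not anything prescribed:

```sql
-- Hypothetical alias type: every surname column shares one definition.
CREATE TYPE dbo.Surname FROM NVARCHAR(40) NOT NULL;

-- A column based on the type inherits NOT NULL when nullability is left out.
CREATE TABLE dbo.Person
  (
  PersonID INT IDENTITY CONSTRAINT PersonPK PRIMARY KEY,
  LastName dbo.Surname  -- comes out as NOT NULL, from the type definition
  );
```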
It is also very handy for numeric data that has a base type of numeric
or decimal
. It means that it is far more difficult to get it wrong and accidentally truncate a value. As well as precision and scale, you can specify its default nullability.
This means that a user alias type is one of the datatypes where it is generally better not to attempt to override the nullability specification, if it exists, by specifying NULL explicitly when creating the column based on this type. Someone has decided that the type must be NULL or NOT NULL, and there is probably a good reason for this.
A column that is part of a Primary Key will be NOT NULL
If you assign a column to a primary key, it will be NOT NULL. Let’s demonstrate this:
CREATE TABLE TestOutNullability
  (
  MyTimestamp TIMESTAMP,
  ObjectName sysname,
  MyInt INT,
  MyCode NVARCHAR(120),
  CONSTRAINT myPK PRIMARY KEY (MyInt, MyCode)
  );
So, let’s see if all those columns for which we didn’t specify nullability will allow null values:
SELECT C.name, C.is_nullable
FROM sys.tables AS T
  INNER JOIN sys.columns AS C
    ON C.object_id = T.object_id
WHERE T.name = 'TestOutNullability'
ORDER BY C.column_id;
None of those columns is nullable. We can prove this by getting SSMS to reverse-engineer the table with a generated build script:
/****** Object: Table [dbo].[TestOutNullability] Script Date: 07/02/2020 19:39:12 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[TestOutNullability](
	[MyTimestamp] [timestamp] NOT NULL,
	[ObjectName] [sysname] NOT NULL,
	[MyInt] [int] NOT NULL,
	[MyCode] [nvarchar](120) NOT NULL,
 CONSTRAINT [myPK] PRIMARY KEY CLUSTERED
(
	[MyInt] ASC,
	[MyCode] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Look there! NOT NULL constraints on all the columns, even though your connection settings specified that allowing nulls was the default.
Conclusions
I’m not going to get into the debate about whether it is a good idea to allow null values in the columns of tables. However, it is safe to say that you should generally specify explicitly whether a column should allow them. I’d say ‘generally’, because there is an argument for leaving that out if you are using a user-defined alias type, so that its use will be consistent within the database. For this to be safe, you will need to make sure that it has been specified in the code that creates the alias type!
You may understand the CREATE TABLE code when you write it, and the state of your connection at the point at which the DDL code was executed, but is it repeatable? Will the poor souls who inherit your code, or the team-members who must read it, be as clued in? Would they even want to be?