The Questions
- “I often come across T-SQL script in which the names of tables, columns and other objects are enclosed in brackets. Even data types are often treated this way. Do I have to enclose object names in brackets?”
- “I’m trying to create a table that contains a foreign key, but I keep getting an error message saying that there are no primary or candidate keys in the referenced table that match the referencing column list. What’s going on?”
- “Why are there so many different ways to create a primary key?”
- “I’m developing several stored procedures whose logic requires temporary structures for storing data. Which are better, table variables or temporary tables?”
- “I keep reading that I’m supposed to avoid null values in my databases. Should I define default constraints rather than permit nullable columns?”
- “When I try to run a CREATE TABLE AS SELECT (CTAS) statement in SQL Server, I receive an error. Does SQL Server support the CTAS statement?”
- “When creating objects in a SQL Server database, should I assign them to the default (dbo) schema or to a custom schema?”
- “When I try to incorporate a subquery in my CHECK constraints, SQL Server generates an error. Is there a way to use subqueries in CHECK constraints?”
- “I see that many SQL Server developers add a prefix to object names, such as ‘tbl’ for a table or ‘sp’ for a stored procedure. Is it considered best practice to include prefixes?”
- “Why bother using the FILESTREAM features in SQL Server for BLOB data? Why not just store the files directly on the file system?”
- “When I try to persist a computed column that contains the GETDATE function, I receive an error saying that I can’t persist the column because it’s non-deterministic. What difference does it make?”
“I often come across T-SQL script in which the names of tables, columns and other objects are enclosed in brackets. Even data types are often treated this way. Do I have to enclose object names in brackets?”
No, you don’t, except in certain situations. Part of the confusion comes from the auto-generated scripts in SQL Server Management Studio (SSMS), which enclose all identifiers in brackets, whether they need it or not. For example, I added a table to the AdventureWorks2012 database and then used SSMS to generate the following T-SQL script:
USE [AdventureWorks2012]
GO

/****** Object: Table [dbo].[Region & Group] Script Date: 6/9/2014 7:09:18 PM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[Region & Group](
  [Region ID] [int] IDENTITY(1,1) NOT NULL,
  [Sales Region-Group] [nvarchar](30) NOT NULL,
PRIMARY KEY CLUSTERED
(
  [Region ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
GO
Although this is only a simple example, you can see that the database, schema, table, column, data type and filegroup names are all enclosed in brackets. However, the brackets are actually required only for the table and column names, because they contain spaces, special characters (& and -), and the reserved keyword GROUP.
Many teams (most?) plan their naming strategies to avoid spaces, special characters, and reserved keywords. In some cases, however, a word used as an identifier in an earlier SQL Server version might become a reserved keyword in a later version. If you port your database to the newer version under such circumstances, you’ll need to delimit the identifier with brackets or, as you’ll see shortly, with double quotes.
It might surprise you to see data types bracketed along with the other identifiers, but data types are named objects, not simply reserved keywords. As a result, they must conform to the same naming rules as other objects. That means you can name user-defined types just like you do other objects, which opens up the possibility of spaces, special characters, and reserved keywords.
To avoid any possible confusion, SSMS takes the safest route possible when generating T-SQL script by enclosing all identifiers in brackets, without distinguishing between those that need them and those that do not. That said, many developers find this to be an annoying and cumbersome practice that makes the code difficult to read. For that reason, they might delimit identifiers only when necessary and skip the rest.
Let’s return to our sample table and look at its definition:
USE AdventureWorks2012;
GO

IF OBJECT_ID('Region & Group', 'U') IS NOT NULL
DROP TABLE Region & Group;
GO

CREATE TABLE Region & Group
(
  Region ID INT NOT NULL IDENTITY PRIMARY KEY,
  Sales Region-Group NVARCHAR(30) NOT NULL
);
GO
If we were to try to run this script as is, we’d receive syntax errors, which is not surprising given the hodgepodge of spaces and special characters. When the database engine tries to parse the code, it cannot distinguish the spaces in the names from other spaces. It also sees GROUP as a reserved keyword and the ampersand (&) as a bitwise operator. In other words, the parser doesn’t have a clue what’s going on.
If we insist on continuing with this naming strategy, we can easily rectify the situation by enclosing the questionable table and column names in brackets, as in the following example:
USE AdventureWorks2012;
GO

IF OBJECT_ID('Region & Group', 'U') IS NOT NULL
DROP TABLE [Region & Group];
GO

CREATE TABLE [Region & Group]
(
  [Region ID] INT NOT NULL IDENTITY PRIMARY KEY,
  [Sales Region-Group] NVARCHAR(30) NOT NULL
);
GO
Notice we need to delimit only the offending identifiers. Now the database engine will create the table with no problem. As an alternative to this approach, we can use double quotes instead of brackets, as in the following T-SQL:
USE AdventureWorks2012;
GO

IF OBJECT_ID('Region & Group', 'U') IS NOT NULL
DROP TABLE "Region & Group";
GO

CREATE TABLE "Region & Group"
(
  "Region ID" INT NOT NULL IDENTITY PRIMARY KEY,
  "Sales Region-Group" NVARCHAR(30) NOT NULL
);
GO
Be aware, however, that we can use double quotes only if the database’s QUOTED_IDENTIFIER option is set to ON. This setting conforms to ISO standards and is the default when creating a database in SQL Server. If the option is set to OFF, names enclosed in double quotes are treated as literal string values, the same as with single quotes. Even if the option is set to OFF, however, we can still use brackets:
USE AdventureWorks2012;
GO

SET QUOTED_IDENTIFIER OFF;
GO

IF OBJECT_ID('Region & Group', 'U') IS NOT NULL
DROP TABLE [Region & Group];
GO

CREATE TABLE [Region & Group]
(
  [Region ID] INT NOT NULL IDENTITY PRIMARY KEY,
  [Sales Region-Group] NVARCHAR(30) NOT NULL
);
GO
Notice that the script includes a SET statement that sets the QUOTED_IDENTIFIER option to OFF, but the table will still be created with no problem. That’s not the case with double quotes. The following example will generate syntax errors when you try to run it:
USE AdventureWorks2012;
GO

SET QUOTED_IDENTIFIER OFF;
GO

IF OBJECT_ID('Region & Group', 'U') IS NOT NULL
DROP TABLE "Region & Group";
GO

CREATE TABLE "Region & Group"
(
  "Region ID" INT NOT NULL IDENTITY PRIMARY KEY,
  "Sales Region-Group" NVARCHAR(30) NOT NULL
);
GO
But simply changing the QUOTED_IDENTIFIER option to ON fixes the problem:
USE AdventureWorks2012;
GO

SET QUOTED_IDENTIFIER ON;
GO

IF OBJECT_ID('Region & Group', 'U') IS NOT NULL
DROP TABLE "Region & Group";
GO

CREATE TABLE "Region & Group"
(
  "Region ID" INT NOT NULL IDENTITY PRIMARY KEY,
  "Sales Region-Group" NVARCHAR(30) NOT NULL
);
GO
The database engine now creates the table without a hitch. Note, however, that even with the QUOTED_IDENTIFIER option set to ON, if we use single quotes for identifiers, as in the following example, the script will generate more syntax errors:
USE AdventureWorks2012;
GO

SET QUOTED_IDENTIFIER ON;
GO

IF OBJECT_ID('Region & Group', 'U') IS NOT NULL
DROP TABLE 'Region & Group';
GO

CREATE TABLE 'Region & Group'
(
  'Region ID' INT NOT NULL IDENTITY PRIMARY KEY,
  'Sales Region-Group' NVARCHAR(30) NOT NULL
);
GO
Just to be safe, some SQL Server developers might choose to enclose all their identifiers in brackets or double quotes, even if they avoid spaces, special characters, and reserved keywords. Others might do everything they can to avoid having to delimit their identifiers. It’s mostly a matter of preference.
“I’m trying to create a table that contains a foreign key, but I keep getting an error message saying that there are no primary or candidate keys in the referenced table that match the referencing column list. What’s going on?”
Chances are, you’re receiving the error because the column referenced in your FOREIGN KEY constraint is not configured with a PRIMARY KEY or UNIQUE constraint or is not part of a unique index. In other words, the referenced column or columns must be unique. A FOREIGN KEY constraint controls the data that can be stored in a table and enforces data integrity. For this to be possible, the values in the referenced column must be unique so you can clearly establish a link between the rows in the two tables.
If we try to create a FOREIGN KEY constraint that references a non-unique column, we’ll receive an error. For example, suppose we add the SalesRegion and RegionalSales tables to our database:
USE AdventureWorks2012;
GO

IF OBJECT_ID('RegionalSales', 'U') IS NOT NULL
DROP TABLE RegionalSales;
GO

IF OBJECT_ID('SalesRegion', 'U') IS NOT NULL
DROP TABLE SalesRegion;
GO

CREATE TABLE SalesRegion
(
  RegionID INT NOT NULL IDENTITY,
  RegionName NVARCHAR(30) NOT NULL
);
GO

CREATE TABLE RegionalSales
(
  SalesID INT NOT NULL IDENTITY PRIMARY KEY,
  SalesAmount MONEY NOT NULL,
  RegionID INT NOT NULL REFERENCES SalesRegion(RegionID)
);
GO
The RegionalSales table includes a FOREIGN KEY constraint that references the RegionID column in the SalesRegion table. However, that column has not been defined with a PRIMARY KEY or UNIQUE constraint, nor is it part of a unique index. As a result, the script generates the following error:
Msg 1776, Level 16, State 0, Line 1
There are no primary or candidate keys in the referenced table 'SalesRegion' that match the referencing column list in the foreign key 'FK__RegionalS__Regio__4D9F7493'.
Msg 1750, Level 16, State 0, Line 1
Could not create constraint. See previous errors.
This, of course, is easily fixed by defining a primary key on the RegionID column:
USE AdventureWorks2012;
GO

IF OBJECT_ID('RegionalSales', 'U') IS NOT NULL
DROP TABLE RegionalSales;
GO

IF OBJECT_ID('SalesRegion', 'U') IS NOT NULL
DROP TABLE SalesRegion;
GO

CREATE TABLE SalesRegion
(
  RegionID INT NOT NULL IDENTITY PRIMARY KEY,
  RegionName NVARCHAR(30) NOT NULL
);
GO

CREATE TABLE RegionalSales
(
  SalesID INT NOT NULL IDENTITY PRIMARY KEY,
  SalesAmount MONEY NOT NULL,
  RegionID INT NOT NULL REFERENCES SalesRegion(RegionID)
);
GO
Now when we run the script, both tables are created without a hitch. If we insert data into the RegionalSales table that does not violate the foreign key, we should have no problem. But if the data violates the foreign key, we’ll receive an error.
“Why are there so many different ways to create a primary key?”
Why? Because circumstances can vary from one table to the next, depending on whether you’re creating a primary key on one column or on multiple columns and on whether you’re creating the primary key on an existing table or as part of the definition of a new table. That said, SQL Server still provides multiple ways to achieve the same results.
Take, for example, the following CREATE TABLE statement, which defines the primary key as a constraint on the SalesID column:
USE AdventureWorks2012;
GO

IF OBJECT_ID('RegionalSales', 'U') IS NOT NULL
DROP TABLE RegionalSales;
GO

CREATE TABLE RegionalSales
(
  SalesID INT PRIMARY KEY,
  SalesAmount MONEY NOT NULL,
  OrderDate DATETIME NOT NULL DEFAULT GETDATE()
);
GO
This is the easiest way to create a primary key: as part of a column definition in a CREATE TABLE statement. You need only include the PRIMARY KEY keywords in the column definition, and the database engine does the rest. It sets the column’s nullability to NOT NULL, if the column is not already configured that way, and creates a unique clustered index on the column, if a clustered index is not defined elsewhere; otherwise, it creates a unique nonclustered index.
When defining a column PRIMARY KEY constraint, you can get even more specific by providing the constraint name and specifying the type of index, as shown in the following example:
USE AdventureWorks2012;
GO

IF OBJECT_ID('RegionalSales', 'U') IS NOT NULL
DROP TABLE RegionalSales;
GO

CREATE TABLE RegionalSales
(
  SalesID INT CONSTRAINT pk_RegionalSales PRIMARY KEY CLUSTERED,
  SalesAmount MONEY NOT NULL,
  OrderDate DATETIME NOT NULL DEFAULT GETDATE()
);
GO
Notice that we’ve included the constraint name (pk_RegionalSales) and index type (CLUSTERED). If we don’t assign a name to the primary key, the database engine generates something messy such as PK__Regional__C952FB12B0CC5751. Luckily, we can avoid this by including the information as part of a column constraint, as we’ve done here, or by defining a table constraint, as shown in the following example:
USE AdventureWorks2012;
GO

IF OBJECT_ID('RegionalSales', 'U') IS NOT NULL
DROP TABLE RegionalSales;
GO

CREATE TABLE RegionalSales
(
  SalesID INT NOT NULL,
  SalesAmount MONEY NOT NULL,
  OrderDate DATETIME NOT NULL DEFAULT GETDATE(),
  CONSTRAINT pk_RegionalSales PRIMARY KEY CLUSTERED (SalesID ASC)
);
GO
This time around, the SalesID column is set up as a regular INT column, with the constraint definition added separately. Again, we’re able to name the constraint (pk_RegionalSales) and specify that the index be clustered.
It’s up to you whether you use a column constraint or a table constraint when defining a primary key on a single column. However, to create a composite primary key, that is, one that contains more than one column, you must use a table constraint. For example, the following CREATE TABLE statement creates a primary key on the SalesID and DetailID columns:
USE AdventureWorks2012;
GO

IF OBJECT_ID('RegionalSales', 'U') IS NOT NULL
DROP TABLE RegionalSales;
GO

CREATE TABLE RegionalSales
(
  SalesID INT NOT NULL,
  DetailID INT NOT NULL,
  SalesAmount MONEY NOT NULL,
  OrderDate DATETIME NOT NULL DEFAULT GETDATE(),
  CONSTRAINT pk_RegionalSales PRIMARY KEY CLUSTERED (SalesID, DetailID ASC)
);
GO
Notice our constraint definition now specifies the two columns, in the order they should be indexed. Also, as before, we’ve named the constraint and specified the index type.
Now let’s look at another approach to defining a primary key, this time on an existing table. In the following example, we first create our table and then run an ALTER TABLE statement to create the primary key:
USE AdventureWorks2012;
GO

IF OBJECT_ID('RegionalSales', 'U') IS NOT NULL
DROP TABLE RegionalSales;
GO

CREATE TABLE RegionalSales
(
  SalesID INT NOT NULL,
  SalesAmount MONEY NOT NULL,
  OrderDate DATETIME NOT NULL DEFAULT GETDATE()
);
GO

ALTER TABLE RegionalSales
ADD CONSTRAINT pk_RegionalSales PRIMARY KEY CLUSTERED (SalesID ASC)
GO
We use the ALTER TABLE statement simply to add the constraint. Like the earlier examples, the constraint is based on only the SalesID column. However, we can also use this approach to create a composite primary key on an existing table:
USE AdventureWorks2012;
GO

IF OBJECT_ID('RegionalSales', 'U') IS NOT NULL
DROP TABLE RegionalSales;
GO

CREATE TABLE RegionalSales
(
  SalesID INT NOT NULL,
  DetailID INT NOT NULL,
  SalesAmount MONEY NOT NULL,
  OrderDate DATETIME NOT NULL DEFAULT GETDATE()
);
GO

ALTER TABLE RegionalSales
ADD CONSTRAINT pk_RegionalSales PRIMARY KEY CLUSTERED (SalesID, DetailID ASC)
GO
That’s all there is to creating a primary key; however, there are a couple of other issues worth noting. If you define a table constraint in your CREATE TABLE statement or add the constraint using an ALTER TABLE statement, the primary key column must already be configured as NOT NULL. In addition, if you specify that a clustered index be created when defining the primary key, a clustered index cannot already exist on the table.
“I’m developing several stored procedures whose logic requires temporary structures for storing data. Which are better, table variables or temporary tables?”
On the surface, temporary tables and table variables are very similar. They both store table data in the tempdb database and are limited in duration. And creating either type is a simple enough process. For example, the following T-SQL script creates the #sales temporary table, populates the table, and retrieves the data:
CREATE TABLE #sales
(
  SalesID INT NOT NULL IDENTITY PRIMARY KEY,
  SalesAmount MONEY NOT NULL,
  SalesDate DATETIME NOT NULL DEFAULT GETDATE()
);

INSERT INTO #sales (SalesAmount) VALUES (123.44);
INSERT INTO #sales (SalesAmount) VALUES (499.22);
INSERT INTO #sales (SalesAmount) VALUES (846.29);

SELECT * FROM #sales;
As you can see, we create a temp table similar to how we create a regular table, except that we precede the table name with the hash mark (#). This instructs the database engine to create a local temporary table visible only to the current session. If we want to create a global temporary table available to all sessions, we use double hash marks (##).
In either case, we can insert data into and retrieve data from the temporary table, as we’ve done with #sales in the example above. The following table shows the results returned by the SELECT statement:
SalesID  SalesAmount  SalesDate
1        123.44       2014-06-10 09:17:35.813
2        499.22       2014-06-10 09:17:35.817
3        846.29       2014-06-10 09:17:35.817
Now let’s try to do the same thing with a table variable. In the following T-SQL script, we declare a table variable, insert data into the variable, and then retrieve that data:
DECLARE @sales TABLE
(
  SalesID INT NOT NULL IDENTITY PRIMARY KEY,
  SalesAmount MONEY NOT NULL,
  SalesDate DATETIME NOT NULL DEFAULT GETDATE()
);

INSERT INTO @sales (SalesAmount) VALUES (123.44);
INSERT INTO @sales (SalesAmount) VALUES (499.22);
INSERT INTO @sales (SalesAmount) VALUES (846.29);

SELECT * FROM @sales;
Notice that the declaration includes the TABLE data type, which gives the variable its table structure. At the same time, we define the columns, their data types, and their nullability. Notice also that, as in the temporary table example, we define an IDENTITY column, a PRIMARY KEY constraint, and a DEFAULT constraint. Next, we insert the same data we inserted into the temporary table and then retrieve that data, as shown in the following table:
SalesID  SalesAmount  SalesDate
1        123.44       2014-06-10 09:25:38.843
2        499.22       2014-06-10 09:25:38.843
3        846.29       2014-06-10 09:25:38.843
As you can see, the temporary table and the table variable achieve the same results, at least based on what we’ve done here. We need to look a little deeper to understand the differences between them.
In general, temporary tables are more robust than table variables. For example, you can run TRUNCATE and ALTER TABLE statements against a temp table, but not against a table variable. A temp table can also be used in a SELECT INTO statement and participate in a transaction, and you can explicitly create clustered and nonclustered indexes on it. A table variable supports none of these. A table variable is also limited in scope to the procedure or batch in which it is declared, and it cannot be referenced by other procedures in the same session. Plus, SQL Server maintains column statistics for temp tables, but not for table variables, which can be important to query performance.
As a result, it would seem that temporary tables are the way to go, given all their advantages. So why use table variables at all? To begin with, you must use them when defining table-valued functions, which do not permit temp tables. Also, creating a table variable lets you skip the statistics overhead that goes with a temp table, if the optimal query plan is unlikely to change. Table variables also require fewer locking resources and are not affected by transactional rollbacks.
So for smaller workloads with a limited scope, a table variable might be all you need. However, when you require the advantages offered by temp tables, use those. Phil Factor offers a great discussion on temporary tables and table variables in his article “Temporary Tables in SQL Server.”
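As an illustration of the table-valued function case mentioned above, here is a minimal sketch. The function name, parameter, and logic are invented for the example, and it assumes the RegionalSales table from the earlier primary key examples:

```sql
-- Hypothetical multi-statement table-valued function: the RETURNS
-- clause declares a table variable (@results). A temp table would
-- not be permitted in this context.
CREATE FUNCTION dbo.GetSalesOverAmount (@minAmount MONEY)
RETURNS @results TABLE
(
  SalesID INT NOT NULL,
  SalesAmount MONEY NOT NULL
)
AS
BEGIN
  -- Populate the table variable with the qualifying rows.
  INSERT INTO @results (SalesID, SalesAmount)
  SELECT SalesID, SalesAmount
  FROM RegionalSales
  WHERE SalesAmount > @minAmount;
  RETURN;
END;
GO

-- Usage: query the function like a table.
SELECT * FROM dbo.GetSalesOverAmount(100.00);
```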
“I keep reading that I’m supposed to avoid null values in my databases. Should I define default constraints rather than permit nullable columns?”
The debate over whether to avoid null values has been raging for years in the database community. For some developers, the avoidance of NULL has taken on almost religious significance. Even Microsoft recommends that you avoid null values because they incur more complexity when querying and updating data. In addition, other column options, such as PRIMARY KEY constraints, cannot be used with nullable columns. Yet if you search through the sample Adventure Works databases, you’ll find they’re riddled with null values.
Regardless of where one might fall in the debate, there’s no denying that NULL is a different type of value that requires special handling in SQL Server. A null value indicates the absence of an actual data value. It is not the same as an empty string or a 0, and it provides no mechanism for understanding why a value might be missing. A null value is sometimes interpreted to mean “unknown” or “not applicable” or perhaps something else, but the only fact you can state with certainty is that the value is missing and NULL is being used in its place.
Not surprisingly, a missing value can complicate even the simplest of operations. Consider the standard Boolean logic used in a WHERE clause. A typical predicate expression might compare a column value with a literal value. If the values match, a True is returned and the row is included. Otherwise, a False is returned and the row is not included. Even if you use logical operators to create compound expressions, you’re still relying on that true/false logic.
However, when you add null values to the mix, you’re introducing a three-valued logic that must incorporate the missing data. Consequently, you might discover subtle differences in your query results that are not always apparent or readily understood. Comparing known data to missing data is never a straightforward process. Even a basic arithmetic operation can be affected if NULL shows up on the scene. Imagine trying to multiply 10 by NULL: 10 times something missing is still something missing. A full discussion of the impact of NULL is beyond the scope of this answer, but suffice it to say that these reasons are enough for many SQL Server developers to want to avoid null values altogether.
In some cases, you can get around null values by modifying your schema, essentially taking an ultra-normalization approach by putting the offending column in a separate table, but that can lead to extraordinarily complex joins even to perform the simplest queries.
Another approach often touted is to use default values in place of null values. For example, suppose we start with the following table, which contains several nullable columns:
USE AdventureWorks2012;
GO

IF OBJECT_ID('RegionalSales', 'U') IS NOT NULL
DROP TABLE RegionalSales;
GO

CREATE TABLE RegionalSales
(
  SalesID INT NOT NULL IDENTITY PRIMARY KEY,
  SalesAmount MONEY NOT NULL,
  Freight MONEY NULL,
  OrderDate DATETIME NOT NULL DEFAULT GETDATE(),
  ShipDate DATETIME NULL,
  Comments NVARCHAR(MAX) NULL
);
GO

INSERT INTO RegionalSales (SalesAmount)
VALUES(476.36);

SELECT * FROM RegionalSales;
After we create the table, we insert data, leaving a couple columns with missing values. When we then retrieve that data, we receive results similar to the following:
SalesID  SalesAmount  Freight  OrderDate                ShipDate  Comments
1        476.36       NULL     2014-06-10 10:00:18.250  NULL      NULL
Not surprisingly, if we were to insert lots of rows into this table, we could potentially have a table teeming with null values. To get around this, we can define defaults on the nullable columns and configure them as NOT NULL:
USE AdventureWorks2012;
GO

IF OBJECT_ID('RegionalSales', 'U') IS NOT NULL
DROP TABLE RegionalSales;
GO

CREATE TABLE RegionalSales
(
  SalesID INT NOT NULL IDENTITY PRIMARY KEY,
  SalesAmount MONEY NOT NULL,
  Freight MONEY NOT NULL DEFAULT 0,
  OrderDate DATETIME NOT NULL DEFAULT GETDATE(),
  ShipDate DATETIME NOT NULL DEFAULT '',
  Comments NVARCHAR(MAX) NOT NULL DEFAULT ''
);
GO

INSERT INTO RegionalSales (SalesAmount)
VALUES(476.36);

SELECT * FROM RegionalSales;
For the ShipDate and Comments columns, we define each default as an empty string, and for the Freight column, we define the default as 0. We then insert a row, once again specifying a value only for the SalesAmount column. The default values take care of the rest, as shown in the following results:
SalesID  SalesAmount  Freight  OrderDate                ShipDate                 Comments
1        476.36       0.00     2014-06-10 10:03:17.350  1900-01-01 00:00:00.000
For the Comments column, the default approach might work okay in certain circumstances. The default is an empty string, so the column simply appears empty. Keep in mind, however, that using an empty string as a default can impact operations that rely on expressions such as COALESCE, operators such as IS NULL, aggregate functions such as MAX, concatenation operations, and sorting.
Using a default gets even more complicated for other column types. For example, the ShipDate column also uses an empty string as a default. However, because this is a DATETIME column, the stored value must be a specific date and time (which is actually stored as two integers). When a value is not included in a DATETIME column, the database engine uses the base date, 1900-01-01 00:00:00.000.
Then we have the Freight column, whose default value is 0. Ask your garden-variety accountants about the use of 0, and they will undoubtedly tell you that 0 and NULL are absolutely not one and the same and should never be treated as such. Assuming a 0 where a value is not supplied can have a serious impact on calculations. For example, if you were to run the AVG aggregate function against the Freight column, the function would ignore null values but include all the zeroes in the calculations.
So the default route is not all it’s cracked up to be, yet adding tables doesn’t necessarily cut it either. Depending on the data, you might be able to use binary, XML or other types of storage to accommodate particular types of data that might have a lot of missing values, such as the properties on a customer record. On the other hand, you might find the way you’re supporting null values presents no problems for your applications. In general, however, it’s a good idea to minimize null values. And in those cases when using them makes the most sense, be aware of how the data is being queried and what those null values mean to your result sets.
“When I try to run a CREATE TABLE AS SELECT (CTAS) statement in SQL Server, I receive an error. Does SQL Server support the CTAS statement?”
Yes, it does, sort of, in a somewhat different form. But first, let’s clarify what we’re talking about here. The CTAS statement is part of the SQL standard and looks something like the following:
USE AdventureWorks2012;
GO

IF OBJECT_ID('Employees', 'U') IS NOT NULL
DROP TABLE Employees;
GO

CREATE TABLE Employees AS
(SELECT * FROM vEmployees);
GO
Essentially, we’re using a SELECT statement to create the Employees table. Unfortunately, despite the fact that this form of the CREATE TABLE statement is standard SQL and implemented by such database systems as Oracle and MySQL, SQL Server doesn’t support it, so if you try to run the statement you’ll receive a syntax error. The good news is that SQL Server provides another method for achieving the same results, the SELECT...INTO statement, as shown in the following example:
USE AdventureWorks2012;
GO

IF OBJECT_ID('Employees', 'U') IS NOT NULL
DROP TABLE Employees;
GO

SELECT * INTO Employees
FROM vEmployees;
GO
Basically, we’ve added an INTO clause to a SELECT statement to create a table based on our result set. Consequently, the new table will have the same columns as the original and be populated with the returned data. However, this approach doesn’t copy over any constraints or indexes, leaving you with a basic heap. In addition, the SELECT...INTO statement works only if the new table does not already exist. If it does exist, you must use an INSERT statement containing a SELECT statement that retrieves the values to be inserted into the target table.
“When creating objects in a SQL Server database, should I assign them to the default (dbo) schema or to a custom schema?”
Microsoft recommends that you use custom schemas to control access in SQL Server databases, rather than the default dbo schema. Custom schemas provide distinct namespaces for individual groups of objects and their data. For example, you might create a schema for Human Resources, one for Marketing, and another for Production. Each schema would contain the tables, views, stored procedures, functions, and other database objects specific to that area of focus.
Prior to SQL Server 2005, owners and database objects were tightly coupled, with users and object owners being treated as one and the same. In fact, a user could not be deleted without first removing the database objects or changing their owner. As of SQL Server 2005, however, that dependency has been removed, providing greater flexibility and control while reducing administrative overhead.
Schemas provide secure containers for protecting database objects and their data. They also provide a way to logically group those objects and data. For example, objects and data related to production would reside in the Production schema, objects and data related to Human Resources would reside in the HR schema, and so on. Schemas make it easier to manage access across functional areas without increasing administrative complexity or jeopardizing data.
In addition, because the dbo schema is the default, it could provide an easier path for hackers to exploit a database, because the schema is well known and therefore a more likely target. It’s also not a particularly descriptive name, unlike a schema such as Sales.
That said, there’s no technical reason why you should or should not use dbo. Depending on your project, creating a lot of schemas might be overkill. For many teams, it comes down to personal preference. You won’t break anything by using dbo, but you might appreciate the advantages that come from creating your own schemas.
“When I try to incorporate a subquery in my CHECK constraints, SQL Server generates an error. Is there a way to use subqueries in CHECK constraints?”
You cannot use subqueries in CHECK constraints. Here’s proof. The following T-SQL script attempts to create the RegionalSales table, which includes a CHECK constraint on the ProdID column:
USE AdventureWorks2012;
GO

IF OBJECT_ID('RegionalSales', 'U') IS NOT NULL
DROP TABLE RegionalSales;
GO

CREATE TABLE RegionalSales
(
  SalesID INT NOT NULL IDENTITY PRIMARY KEY,
  SalesAmount MONEY NOT NULL,
  OrderDate DATETIME NOT NULL DEFAULT GETDATE(),
  ProdID INT NOT NULL CHECK(
    (SELECT SUM(i.Quantity)
     FROM Production.ProductInventory i
     WHERE i.ProductID = ProdID) > 0)
);
GO
The CHECK constraint uses a subquery to aggregate data in the ProductInventory table in order to ensure that the item is in stock before entering the product ID. Unfortunately, if you were to try to run the CREATE TABLE statement, you would receive the following error:
```
Msg 1046, Level 15, State 1, Line 8
Subqueries are not allowed in this context. Only scalar expressions are allowed.
```
See? Just like I told you. No subqueries. Only scalar expressions.
One way around this dilemma is to create a scalar function that incorporates the subquery's logic. The following function definition retrieves the number of products on hand, based on the product ID passed in:
```sql
USE AdventureWorks2012;
GO
IF OBJECT_ID(N'dbo.CheckQuantity', N'FN') IS NOT NULL
DROP FUNCTION dbo.CheckQuantity;
GO
CREATE FUNCTION dbo.CheckQuantity (@ProdID INT)
RETURNS INT
WITH SCHEMABINDING
AS
BEGIN
  DECLARE @quantity INT;
  SET @quantity =
    (SELECT SUM(i.Quantity)
     FROM Production.ProductInventory i
     WHERE i.ProductID = @ProdID);
  RETURN @quantity;
END;
GO
```
Now we can use the function in our CHECK constraint, as shown in the following CREATE TABLE statement:
```sql
IF OBJECT_ID('RegionalSales', 'U') IS NOT NULL
DROP TABLE RegionalSales;
GO
CREATE TABLE RegionalSales
(
  SalesID INT NOT NULL IDENTITY PRIMARY KEY,
  SalesAmount MONEY NOT NULL,
  OrderDate DATETIME NOT NULL DEFAULT GETDATE(),
  ProdID INT NOT NULL
    CHECK(dbo.CheckQuantity(ProdID) > 0
      AND dbo.CheckQuantity(ProdID) IS NOT NULL)
);
GO
```
Notice that when we call the function, we pass in the ProdID column. We've also included AND dbo.CheckQuantity(ProdID) IS NOT NULL in the CHECK constraint to handle the possibility of the function returning a null value because the product doesn't exist.
Once we’ve created our table, we can try out our function by attempting to insert data into the table:
```sql
INSERT INTO RegionalSales (SalesAmount, ProdID)
VALUES(438.22, 316);

SELECT * FROM RegionalSales;
```
Because 316 is a valid product ID and is in stock, we're able to insert the data. The SELECT statement confirms this by returning the following results:
| SalesID | SalesAmount | OrderDate               | ProdID |
|---------|-------------|-------------------------|--------|
| 1       | 438.22      | 2014-06-10 12:18:28.437 | 316    |
Now let’s try to insert a product ID for which no products are in stock:
```sql
INSERT INTO RegionalSales (SalesAmount, ProdID)
VALUES(438.22, 853);
```
As expected, we receive a message saying that we have a conflict with the CHECK constraint:
```
Msg 547, Level 16, State 0, Line 1
The INSERT statement conflicted with the CHECK constraint "CK__RegionalS__ProdI__02133CD2". The conflict occurred in database "AdventureWorks2012", table "dbo.RegionalSales", column 'ProdID'.
The statement has been terminated.
```
We’ll receive the same message if we try to pass in a fake ID:
```sql
INSERT INTO RegionalSales (SalesAmount, ProdID)
VALUES(438.22, 1111);
```
So a scalar function can provide a reasonable workaround to the subquery restriction when creating a CHECK constraint. Be aware, however, that scalar functions are executed on a row-by-row basis. If you're inserting or updating lots of rows, there will be lots of processing, especially if your function is performing complex aggregations or other operations. Be aware, too, that some developers have found that implementing a user-defined function in a CHECK constraint can produce unexpected results, so be sure to test your solution thoroughly. At times, you might find you can achieve the results you need by using other mechanisms, such as foreign keys or triggers. You'll have to take it on a case-by-case basis.
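If you do decide to go the trigger route, the logic might look something like the following sketch, which replaces the function-based CHECK constraint with an AFTER trigger. The trigger name is hypothetical, and the inventory check mirrors the one used above:

```sql
-- A hypothetical AFTER trigger as an alternative to the function-based CHECK constraint.
CREATE TRIGGER dbo.trg_RegionalSales_CheckStock
ON dbo.RegionalSales
AFTER INSERT, UPDATE
AS
BEGIN
  -- Reject the statement if any inserted row references a product
  -- that is out of stock or does not exist.
  IF EXISTS
  (
    SELECT 1
    FROM inserted i
    WHERE ISNULL(
      (SELECT SUM(p.Quantity)
       FROM Production.ProductInventory p
       WHERE p.ProductID = i.ProdID), 0) <= 0
  )
  BEGIN
    RAISERROR('Product is out of stock or does not exist.', 16, 1);
    ROLLBACK TRANSACTION;
  END;
END;
GO
```

Unlike the scalar function, the trigger evaluates the inserted rows as a set, which can be gentler on multi-row inserts, though it fires only after the data modification rather than preventing it up front.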
“I see that many SQL Server developers add a prefix to object names, such as ‘tbl’ for a table or ‘sp’ for a stored procedure. Is it considered best practices to include prefixes?”
Few issues in the world of database development continue to be as hotly contested as whether to add prefixes to database object names. Many teams avoid them altogether. Others embrace them at all levels, using conventions such as tbl for tables, vw for views, and sp for stored procedures, ending up with names similar to tblRegionalSales, vwQuarterlySales, or spUpdateSalesAmounts. Some teams use them for only certain types of objects, such as views or stored procedures.
Proponents of such an approach argue that adding prefixes provides a quick and easy method for identifying the type of object you're working with, so you know instantly whether you're dealing with a table, a view, or some other type of entity. Prefixes generally have little impact on performance or security (one exception: the sp_ prefix is reserved for system stored procedures, so SQL Server first checks the master database when resolving such names), and they can help expedite searching for particular object types.
Even so, prefixing database object names has fallen out of favor with many database developers, who believe, at least in part, that a name is no place for metadata. And they might be right. Consider this scenario. Your database includes the tblRegionalSales table. Because of changing business needs, you must replace that table with tblRegionalSales2. However, a number of other database objects and application components include references to tblRegionalSales, so you create a view named tblRegionalSales to avoid a major meltdown (even if only as a temporary fix). Suddenly you have a view with a tbl prefix, throwing your entire naming system out of whack.
A name and the object it represents are not one and the same, and you cannot assume that a name will continue to reference the same type of object forever. That said, breaking the naming conventions in this way is probably not the end of civilization as we know it, but you do want to be aware of the potential for confusion at some point down the line if you choose this route.
Perhaps more important than whether or not your team uses prefixes is to be certain that whatever naming conventions you agree upon, you apply them consistently to all database objects. If one developer uses prefixes and another does not, you’ll likely cause even more confusion than choosing one approach over the other.
“Why bother using the FILESTREAM features in SQL Server for BLOB data? Why not just store the files directly on the file system?”
In some cases, you might not want to use the FILESTREAM capabilities in SQL Server. Microsoft recommends using FILESTREAM storage only when your files are larger than 1 MB, you require fast read access, and your application's middle-tier code needs access to those files.
One of the main advantages of using FILESTREAM is that you can store and retrieve your BLOB files together with your relational data, all within the same transaction, thus avoiding an elaborate workaround to sync stored files with data in SQL Server. The files are also included in your database backup and restore operations, and you can access BLOB data from within your .NET code, which is typical of a middle-tier operation. At the same time, you can take advantage of the ability of the NTFS file system to save and retrieve the files faster than if they were stored directly in SQL Server.
To implement FILESTREAM storage, you must first ensure that it is enabled on your SQL Server instance, which you can do through the server properties in SSMS. From there, you must add a FILESTREAM filegroup to your database and then add a file to the filegroup, as shown in the following example:
```sql
USE AdventureWorks2012;
GO
ALTER DATABASE AdventureWorks2012
ADD FILEGROUP AW_images CONTAINS FILESTREAM;
GO
ALTER DATABASE AdventureWorks2012
ADD FILE
(
  NAME = 'ProductImages',
  FILENAME = 'C:\Datafiles\Filestream\ProductImages.ndf'
)
TO FILEGROUP AW_images;
GO
```
In this case, we've created a filegroup named AW_images and added the file ProductImages.ndf. We can now define a table that includes a FILESTREAM column:
```sql
IF OBJECT_ID('RegionalSales', 'U') IS NOT NULL
DROP TABLE RegionalSales;
GO
CREATE TABLE RegionalSales
(
  SalesID INT NOT NULL IDENTITY PRIMARY KEY,
  SalesAmount MONEY NOT NULL,
  OrderDate DATETIME NOT NULL DEFAULT GETDATE(),
  ProdID INT NOT NULL,
  ProdImage VARBINARY(MAX) FILESTREAM NULL,
  RowID UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID()
);
GO
```
Notice that the ProdImage column includes the FILESTREAM option to indicate that FILESTREAM storage should be used. Also notice that, in order to support FILESTREAM storage, we must include a unique ROWGUIDCOL column, either as a separate column (as we've done here) or as part of the primary key.
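With the table in place, one way to load a file into the FILESTREAM column is with OPENROWSET in BULK mode. The image path used here is hypothetical:

```sql
-- Hypothetical example: read a file from disk and insert it into the FILESTREAM column.
INSERT INTO RegionalSales (SalesAmount, ProdID, ProdImage)
SELECT 438.22, 316, img.BulkColumn
FROM OPENROWSET(BULK 'C:\Images\Product316.jpg', SINGLE_BLOB) AS img;
```

SINGLE_BLOB returns the entire file as a single varbinary(max) value, which SQL Server then stores in the FILESTREAM container rather than in the data file.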
As you can see, implementing FILESTREAM storage is a fairly straightforward process that lets you easily integrate your BLOB and relational data and facilitate application access to both. If you don't need these capabilities, there's no sense incurring the additional overhead, but with relatively little effort, you can have an integrated system that benefits your applications as well as their users.
“When I try to persist a computed column that contains the GETDATE function, I receive an error saying that I can’t persist the column because it’s non-deterministic. What difference does it make?”
SQL Server doesn’t like surprises. If we want to persist a computed column, it must be deterministic; that is, it must always produce the same results when given the same input. Most of the time, this issue won’t come up. But if we introduce a non-deterministic function such as GETDATE into the mix, we can no longer predict those results. Each time we run GETDATE, it returns a different value (unless we’re very, very fast). The database engine refuses to persist such ambiguity.
SQL Server’s resistance to the idea of a non-deterministic computed column is easy enough to demonstrate. The following T-SQL code creates the RegionalSales table, which includes the DeliverDate computed column:
```sql
USE AdventureWorks2012;
GO
IF OBJECT_ID('RegionalSales', 'U') IS NOT NULL
DROP TABLE RegionalSales;
GO
CREATE TABLE RegionalSales
(
  SalesID INT NOT NULL IDENTITY PRIMARY KEY,
  SalesAmount MONEY NOT NULL,
  OrderDate DATETIME NOT NULL DEFAULT GETDATE(),
  DeliverDate AS (DATEADD(dd, 14, GETDATE()))
);
GO

INSERT INTO RegionalSales (SalesAmount)
VALUES (583.22);

SELECT * FROM RegionalSales;
```
Notice the GETDATE function in the computed column definition. Because we have not persisted the column, we can create the table with no problem, after which we can insert data and retrieve the results, shown in the following table:
| SalesID | SalesAmount | OrderDate               | DeliverDate             |
|---------|-------------|-------------------------|-------------------------|
| 1       | 583.22      | 2014-06-10 13:08:59.723 | 2014-06-24 13:09:02.783 |
Now let’s change our table definition slightly by adding the PERSISTED keyword to our computed column:
```sql
USE AdventureWorks2012;
GO
IF OBJECT_ID('RegionalSales', 'U') IS NOT NULL
DROP TABLE RegionalSales;
GO
CREATE TABLE RegionalSales
(
  SalesID INT NOT NULL IDENTITY PRIMARY KEY,
  SalesAmount MONEY NOT NULL,
  OrderDate DATETIME NOT NULL DEFAULT GETDATE(),
  DeliverDate AS (DATEADD(dd, 14, GETDATE())) PERSISTED
);
GO
```
Because we’ve added the PERSISTED keyword, SQL Server should physically store the computed values in the table, updating them whenever the columns they depend on change. Unfortunately, when we now try to create the table, we receive the following error:
```
Msg 4936, Level 16, State 1, Line 2
Computed column 'DeliverDate' in table 'RegionalSales' cannot be persisted because the column is non-deterministic.
```
Because we’ve included the GETDATE function in our computed column, we cannot persist that column. However, in this case, we can get around the limitation by instead referencing the OrderDate column, which itself uses the GETDATE function in its default:
```sql
USE AdventureWorks2012;
GO
IF OBJECT_ID('RegionalSales', 'U') IS NOT NULL
DROP TABLE RegionalSales;
GO
CREATE TABLE RegionalSales
(
  SalesID INT NOT NULL IDENTITY PRIMARY KEY,
  SalesAmount MONEY NOT NULL,
  OrderDate DATETIME NOT NULL DEFAULT GETDATE(),
  DeliverDate AS (DATEADD(dd, 14, OrderDate)) PERSISTED
);
GO
```
Now we can create the table with no problem. The GETDATE function used in the OrderDate column’s default is essentially invisible to the DeliverDate column. You might not always be able to come up with this sort of workaround, but it does help give you a better sense of the deterministic nature of persisted computed columns.
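If you want to confirm how SQL Server classifies a computed column, you can query the IsDeterministic column property, which returns 1 for deterministic columns and 0 otherwise:

```sql
-- Check whether the DeliverDate computed column is considered deterministic.
SELECT COLUMNPROPERTY(OBJECT_ID('dbo.RegionalSales'),
                      'DeliverDate', 'IsDeterministic') AS IsDeterministic;
```

With the OrderDate-based definition above, the property returns 1; with the original GETDATE-based definition, it would return 0.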