SQL Server Tables – 11 Questions You Were Too Shy to Ask

There are some aspects of tables in SQL Server that a lot of people get wrong, purely because they seem so obvious that one feels embarrassed about asking questions. Robert Sheldon reckons that no questions about SQL Server tables are off-limits, and that all of them deserve frank answers.

The Questions

  1. I often come across T-SQL script in which the names of tables, columns and other objects are enclosed in brackets. Even data types are often treated this way. Do I have to enclose object names in brackets?
  2. I’m trying to create a table that contains a foreign key, but I keep getting an error message saying that there are no primary or candidate keys in the referenced table that match the referencing column list. What’s going on?
  3. Why are there so many different ways to create a primary key?
  4. I’m developing several stored procedures whose logic requires temporary structures for storing data. Which are better, table variables or temporary tables?
  5. I keep reading that I’m supposed to avoid null values in my databases. Should I define default constraints rather than permit nullable columns?
  6. When I try to run a CREATE TABLE AS SELECT (CTAS) statement in SQL Server, I receive an error. Does SQL Server support the CTAS statement?
  7. When creating objects in a SQL Server database, should I assign them to the default (dbo) schema or to a custom schema?
  8. When I try to incorporate a subquery in my CHECK constraints, SQL Server generates an error. Is there a way to use subqueries in CHECK constraints?
  9. I see that many SQL Server developers add a prefix to object names, such as ‘tbl’ for a table or ‘sp’ for a stored procedure. Is it considered best practice to include prefixes?
  10. Why bother using the FILESTREAM features in SQL Server for BLOB data? Why not just store the files directly on the file system?
  11. When I try to persist a computed column that contains the GETDATE function, I receive an error saying that I can’t persist the column because it’s non-deterministic. What difference does it make?

“I often come across T-SQL script in which the names of tables, columns and other objects are enclosed in brackets. Even data types are often treated this way. Do I have to enclose object names in brackets?”

No, you don’t, except in certain situations. Part of the confusion comes from the auto-generated scripts in SQL Server Management Studio (SSMS), which enclose all identifiers in brackets, whether they need it or not. For example, I added a table to the AdventureWorks2012 database and then used SSMS to generate the following T-SQL script:
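
The generated script might look something like this, assuming an illustrative table named Sales Totals & Averages with Group and Sales-Amount columns (the names are contrived to include spaces, special characters and a reserved keyword):

USE [AdventureWorks2012]
GO

CREATE TABLE [dbo].[Sales Totals & Averages](
  [Group] [nvarchar](50) NOT NULL,
  [Sales-Amount] [money] NULL
) ON [PRIMARY]
GO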

Although this is only a simple example, you can see that the database, schema, table, column, data type and filegroup names are all enclosed in brackets. However, the brackets are actually required only for the table and column names, because they contain spaces, special characters (& and -), and the reserved keyword Group.

Many teams (most?) plan their naming strategies to avoid spaces, special characters, and reserved keywords. In some cases, however, a word used as an identifier in an earlier SQL Server version might become a reserved keyword in a later version. If you port your database to the newer version under such circumstances, you’ll need to delimit the identifier in brackets or, as you’ll see shortly, in double quotes.

It might surprise you to see data types bracketed along with the other identifiers, but data types are named objects, not simply reserved keywords. As a result, they must conform to the same naming rules as other objects. That means you can name user-defined types just like you do other objects, which opens up the possibility of spaces, special characters, and reserved keywords.

To avoid any possible confusion, SSMS takes the safest route possible when generating T-SQL script by enclosing all identifiers in brackets, without distinguishing between those that need them and those that do not. That said, many developers find this to be an annoying and cumbersome practice that makes the code difficult to read. For that reason, they might delimit identifiers only when necessary and skip the rest.

Let’s return to our sample table and look at its definition:
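
Using the same illustrative names, the undelimited definition would look like this:

CREATE TABLE dbo.Sales Totals & Averages(
  Group nvarchar(50) NOT NULL,
  Sales-Amount money NULL
);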

If we were to try to run this script as is, we’d receive syntax errors, which is not surprising given the hodgepodge of spaces and special characters. When the database engine tries to parse the code, it cannot distinguish the spaces in the names from other spaces. It also sees Group as a reserved keyword and the ampersand (&) as a bitwise operator. In other words, the parser doesn’t have a clue what’s going on.

If we insist on continuing with this naming strategy, we can easily rectify the situation by enclosing the questionable table and column names in brackets, as in the following example:
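
Continuing with our illustrative table:

CREATE TABLE dbo.[Sales Totals & Averages](
  [Group] nvarchar(50) NOT NULL,
  [Sales-Amount] money NULL
);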

Notice we need to delimit only the offending identifiers. Now the database engine will create the table with no problem. As an alternative to this approach, we can use double quotes instead of brackets, as in the following T-SQL:
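
Again using the illustrative names:

CREATE TABLE dbo."Sales Totals & Averages"(
  "Group" nvarchar(50) NOT NULL,
  "Sales-Amount" money NULL
);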

Be aware, however, that we can use double quotes only if the database’s QUOTED_IDENTIFIER option is set to ON. This setting conforms to ISO standards and is the default when creating a database in SQL Server. If the option is set to OFF, names enclosed in double quotes are treated as literal string values, the same as with single quotes. Even with the option set to OFF, however, we can still use brackets:
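
A sketch of that scenario:

SET QUOTED_IDENTIFIER OFF;
GO

CREATE TABLE dbo.[Sales Totals & Averages](
  [Group] nvarchar(50) NOT NULL,
  [Sales-Amount] money NULL
);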

Notice that the script includes a SET statement that sets the QUOTED_IDENTIFIER option to OFF, but the table will still be created with no problem. That’s not the case with double quotes. The following example will generate syntax errors when you try to run it:
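
Something like the following:

SET QUOTED_IDENTIFIER OFF;
GO

CREATE TABLE dbo."Sales Totals & Averages"(
  "Group" nvarchar(50) NOT NULL,
  "Sales-Amount" money NULL
);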

But simply changing the QUOTED_IDENTIFIER option to ON fixes the problem:
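
Like this:

SET QUOTED_IDENTIFIER ON;
GO

CREATE TABLE dbo."Sales Totals & Averages"(
  "Group" nvarchar(50) NOT NULL,
  "Sales-Amount" money NULL
);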

The database engine now creates the table without a hitch. Note, however, that even with the QUOTED_IDENTIFIER option set to ON, if we use single quotes for identifiers, as in the following example, the script will generate syntax errors:
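
For example:

SET QUOTED_IDENTIFIER ON;
GO

CREATE TABLE dbo.'Sales Totals & Averages'(
  'Group' nvarchar(50) NOT NULL,
  'Sales-Amount' money NULL
);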

Just to be safe, some SQL Server developers might choose to enclose all their identifiers in brackets or double quotes, even if they avoid spaces, special characters, and reserved keywords. Others might do everything they can to avoid having to delimit their identifiers. It’s mostly a matter of preference.

“I’m trying to create a table that contains a foreign key, but I keep getting an error message saying that there are no primary or candidate keys in the referenced table that match the referencing column list. What’s going on?”

Chances are, you’re receiving the error because the column referenced in your FOREIGN KEY constraint is not configured with a PRIMARY KEY or UNIQUE constraint or is not part of a unique index. In other words, the referenced column or columns must be unique. A FOREIGN KEY constraint controls the data that can be stored in a table and enforces data integrity. For this to be possible, the values in the referenced column must be unique so you can clearly establish a link between the rows in the two tables.

If we try to create a FOREIGN KEY constraint that references a non-unique column we’ll receive an error. For example, suppose we add the SalesRegion and RegionalSales tables to our database:
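
The scripts that follow are sketches (the column definitions are illustrative), but they capture the scenario:

CREATE TABLE dbo.SalesRegion(
  RegionID int NOT NULL,           -- no PRIMARY KEY or UNIQUE constraint
  RegionName nvarchar(50) NOT NULL
);

CREATE TABLE dbo.RegionalSales(
  SalesID int IDENTITY NOT NULL PRIMARY KEY,
  RegionID int NOT NULL
    CONSTRAINT fk_RegionalSales_SalesRegion
      FOREIGN KEY REFERENCES dbo.SalesRegion(RegionID),
  SalesAmount money NOT NULL
);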

The RegionalSales table includes a FOREIGN KEY constraint that references the RegionID column in the SalesRegion table. However, that column has not been defined with a PRIMARY KEY or UNIQUE constraint, nor is it part of a unique index. As a result, the script generates an error stating that there are no primary or candidate keys in the referenced table that match the referencing column list.

This, of course is easily fixed by defining a primary key on the RegionID column:
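
One possible fix:

CREATE TABLE dbo.SalesRegion(
  RegionID int NOT NULL PRIMARY KEY,
  RegionName nvarchar(50) NOT NULL
);

CREATE TABLE dbo.RegionalSales(
  SalesID int IDENTITY NOT NULL PRIMARY KEY,
  RegionID int NOT NULL
    CONSTRAINT fk_RegionalSales_SalesRegion
      FOREIGN KEY REFERENCES dbo.SalesRegion(RegionID),
  SalesAmount money NOT NULL
);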

Now when we run the script, both tables are created without a hitch. If we insert data into the RegionalSales table that does not violate the foreign key, we should have no problem. But if the data violates the foreign key, we’ll receive an error.

“Why are there so many different ways to create a primary key?”

Why? Because circumstances can vary from one table to the next, depending on whether you’re creating a primary key on one column or on multiple columns and on whether you’re creating the primary key on an existing table or as part of the definition of a new table. That said, SQL Server still provides multiple ways to achieve the same results.

Take, for example, the following CREATE TABLE statement, which defines the primary key as a constraint on the SalesID column:
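
A minimal sketch, assuming a RegionalSales table with illustrative columns (and assuming any earlier version of the table has been dropped):

CREATE TABLE dbo.RegionalSales(
  SalesID int IDENTITY NOT NULL PRIMARY KEY,
  SalesAmount money NOT NULL,
  OrderDate datetime NOT NULL DEFAULT GETDATE()
);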

This is the easiest way to create a primary key: as part of a column definition in a CREATE TABLE statement. You need only include the PRIMARY KEY keywords in the column definition, and the database engine does the rest. It sets the column’s nullability to NOT NULL, if the column is not already configured that way, and creates a unique clustered index on the column, if a clustered index is not defined elsewhere; otherwise, it creates a unique nonclustered index.

When defining a column PRIMARY KEY constraint, you can get even more specific by providing the constraint name and specifying the type of index, as shown in the following example:
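
For instance:

CREATE TABLE dbo.RegionalSales(
  SalesID int IDENTITY NOT NULL
    CONSTRAINT pk_RegionalSales PRIMARY KEY CLUSTERED,
  SalesAmount money NOT NULL,
  OrderDate datetime NOT NULL DEFAULT GETDATE()
);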

Notice that we’ve included the constraint name (pk_RegionalSales) and index type (CLUSTERED). If we don’t assign a name to the primary key, the database engine generates something messy such as PK__Regional__C952FB12B0CC5751. Luckily, we can avoid it by including the information as part of a column constraint, as we’ve done here, or by defining a table constraint, as shown in the following example:
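
Such as:

CREATE TABLE dbo.RegionalSales(
  SalesID int IDENTITY NOT NULL,
  SalesAmount money NOT NULL,
  OrderDate datetime NOT NULL DEFAULT GETDATE(),
  CONSTRAINT pk_RegionalSales PRIMARY KEY CLUSTERED (SalesID)
);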

This time around, the SalesID column is set up as a regular INT column, with the constraint definition added separately. Again, we’re able to name the constraint (pk_RegionalSales) and specify that the index be clustered.

It’s up to you whether you use a column constraint or table constraint when defining a primary key on a single column. However, to create a composite primary key, that is, one that contains more than one column, you must use a table constraint. For example, the following CREATE TABLE statement creates a primary key on the SalesID and DetailID columns:
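
Something like this:

CREATE TABLE dbo.RegionalSales(
  SalesID int NOT NULL,
  DetailID int NOT NULL,
  SalesAmount money NOT NULL,
  CONSTRAINT pk_RegionalSales PRIMARY KEY CLUSTERED (SalesID, DetailID)
);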

Notice our constraint definition now specifies the two columns, in the order they should be indexed. Also, as before, we’ve named the constraint and specified the index type.

Now let’s look at another approach to defining a primary key, this time on an existing table. In the following example, we first create our table and then run an ALTER TABLE statement to create the primary key:
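
A sketch of that approach:

CREATE TABLE dbo.RegionalSales(
  SalesID int IDENTITY NOT NULL,
  SalesAmount money NOT NULL,
  OrderDate datetime NOT NULL DEFAULT GETDATE()
);

ALTER TABLE dbo.RegionalSales
  ADD CONSTRAINT pk_RegionalSales PRIMARY KEY CLUSTERED (SalesID);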

We use the ALTER TABLE statement simply to add the constraint. Like the earlier examples, the constraint is based on only the SalesID column. However, we can also use this approach to create a composite primary key on an existing table:
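
For example:

CREATE TABLE dbo.RegionalSales(
  SalesID int NOT NULL,
  DetailID int NOT NULL,
  SalesAmount money NOT NULL
);

ALTER TABLE dbo.RegionalSales
  ADD CONSTRAINT pk_RegionalSales PRIMARY KEY CLUSTERED (SalesID, DetailID);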

That’s all there is to creating a primary key; however, there are a couple of other issues worth noting. If you define a table constraint in your CREATE TABLE statement or add the constraint using an ALTER TABLE statement, the primary key column must already be configured as NOT NULL. In addition, if you specify that a clustered index be created when defining the primary key, a clustered index cannot already exist on the table.

“I’m developing several stored procedures whose logic requires temporary structures for storing data. Which are better, table variables or temporary tables?”

On the surface, temporary tables and table variables are very similar. They both store table data in the tempdb database and are limited in duration. And creating either type is a simple enough process. For example, the following T-SQL script creates the #sales temporary table, populates the table, and retrieves the data:
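
A sketch of such a script (the column definitions are inferred from the results shown below):

CREATE TABLE #sales(
  SalesID int IDENTITY NOT NULL PRIMARY KEY,
  SalesAmount money NOT NULL,
  SalesDate datetime NOT NULL DEFAULT GETDATE()
);

INSERT INTO #sales (SalesAmount) VALUES (123.44);
INSERT INTO #sales (SalesAmount) VALUES (499.22);
INSERT INTO #sales (SalesAmount) VALUES (846.29);

SELECT * FROM #sales;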

As you can see, we create a temp table similar to how we create a regular table, except that we precede the table name with the hash mark (#). This instructs the database engine to create a local temporary table visible only to the current session. If we want to create a global temporary table available to all sessions, we use double hash marks (##).

In either case, we can insert data into and retrieve data from the temporary table, as we’ve done with #sales in the example above. The following table shows the results returned by the SELECT statement:

SalesID  SalesAmount  SalesDate
-------  -----------  -----------------------
1        123.44       2014-06-10 09:17:35.813
2        499.22       2014-06-10 09:17:35.817
3        846.29       2014-06-10 09:17:35.817

Now let’s try to do the same thing with a table variable. In the following T-SQL script, we declare a table variable, insert data into the variable, and then retrieve that data:
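
Along these lines:

DECLARE @sales TABLE(
  SalesID int IDENTITY NOT NULL PRIMARY KEY,
  SalesAmount money NOT NULL,
  SalesDate datetime NOT NULL DEFAULT GETDATE()
);

INSERT INTO @sales (SalesAmount) VALUES (123.44);
INSERT INTO @sales (SalesAmount) VALUES (499.22);
INSERT INTO @sales (SalesAmount) VALUES (846.29);

SELECT * FROM @sales;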

Notice that the declaration includes the TABLE data type, which gives the variable its table structure. At the same time, we define the columns, their data types, and their nullability. Notice also that, like the temporary table example, we also define an IDENTITY column, a PRIMARY KEY constraint, and a DEFAULT constraint. Next, we insert the same data we inserted into the temporary table and then retrieve that data, as shown in the following table:

SalesID  SalesAmount  SalesDate
-------  -----------  -----------------------
1        123.44       2014-06-10 09:25:38.843
2        499.22       2014-06-10 09:25:38.843
3        846.29       2014-06-10 09:25:38.843

As you can see, the temporary table and the table variable achieve the same results, at least based on what we’ve done here. We need to look a little deeper to understand the differences between them.

In general, temporary tables are more robust than table variables. For example, you can run TRUNCATE TABLE and ALTER TABLE statements against a temp table, but not against a table variable. A temp table can also be used in a SELECT INTO statement and can participate in a transaction, and you can explicitly create clustered and nonclustered indexes on it. A table variable supports none of these. A table variable is also limited in scope to the procedure or batch in which it is declared, and it cannot be referenced by other procedures in the same session. Plus, SQL Server maintains column statistics for temp tables, but not for table variables, which can be important to query performance.

As a result, it would seem that temporary tables are the way to go, given all their advantages. So why use table variables at all? To begin with, you must use them when defining table-valued functions, which do not permit temp tables. Also, creating a table variable lets you skip the statistics overhead that goes with a temp table, if the optimal query plan is unlikely to change. Table variables also require fewer locking resources and are not affected by transactional rollbacks.

So for smaller workloads with a limited scope, a table variable might be all you need. However, when you require the advantages offered by temp tables, use those. Phil Factor offers a great discussion on temporary tables and table variables in his article “Temporary Tables in SQL Server.”

“I keep reading that I’m supposed to avoid null values in my databases. Should I define default constraints rather than permit nullable columns?”

The debate over whether or not to avoid null values has been raging for years in the database community. For some developers, the avoidance of NULL has taken on almost religious significance. Even Microsoft recommends that you avoid null values because they incur more complexity when querying and updating data. In addition, other column options, such as PRIMARY KEY constraints, cannot be used with nullable columns. Yet if you search through the sample AdventureWorks databases, you’ll find they’re riddled with null values.

Regardless of where one might fall on the debate, there’s no denying that NULL is a different type of value that requires special handling in SQL Server. A null value indicates the absence of an actual data value. It is not the same as an empty string or a 0, and it provides no mechanism for understanding why a value might be missing. A null value is sometimes interpreted to mean “unknown” or “not applicable” or perhaps something else, but the only fact you can state with certainty is that the value is missing and NULL is being used in its place.

Not surprisingly, a missing value can complicate even the simplest of operations. Consider the standard Boolean logic used in a WHERE clause. A typical predicate expression might compare a column value with a literal value. If the values match, a True is returned and the row is included. Otherwise, a False is returned and the row is not included. Even if you use logical operators to create compound expressions, you’re still relying on that true/false logic.

However, when you add null values to the mix, you’re introducing a three-valued logic that must incorporate the missing data. Consequently, you might discover subtle differences in your query results that are not always apparent or readily understood. Comparing known data to missing data is never a straightforward process. Even a basic arithmetic operation can be affected if NULL shows up on the scene. Imagine trying to multiply 10 by NULL: 10 times something missing is still something missing. A full discussion of the impact of NULL is beyond the scope of this answer, but suffice it to say that these reasons are enough for many SQL Server developers to want to avoid null values altogether.

In some cases, you can get around null values by modifying your schema, essentially taking an ultra-normalization approach by putting the offending column in a separate table, but that can lead to extraordinarily complex joins even to perform the simplest queries.

Another approach often touted is to use default values in place of null values. For example, suppose we start with the following table, which contains several nullable columns:
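
For example, a sketch with illustrative column definitions:

CREATE TABLE dbo.RegionalSales(
  SalesID int IDENTITY NOT NULL PRIMARY KEY,
  SalesAmount money NOT NULL,
  Freight money NULL,
  OrderDate datetime NOT NULL DEFAULT GETDATE(),
  ShipDate datetime NULL,
  Comments nvarchar(200) NULL
);

INSERT INTO dbo.RegionalSales (SalesAmount) VALUES (476.36);

SELECT * FROM dbo.RegionalSales;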

After we create the table, we insert data, leaving a few columns with missing values. When we then retrieve that data, we receive results similar to the following:

SalesID  SalesAmount  Freight  OrderDate                ShipDate  Comments
-------  -----------  -------  -----------------------  --------  --------
1        476.36       NULL     2014-06-10 10:00:18.250  NULL      NULL

Not surprisingly, if we were to insert lots of rows in this table, we could potentially have a table teeming with null values. To get around this, we can define defaults on the nullable columns and configure them as NOT NULL:
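
A revised sketch:

CREATE TABLE dbo.RegionalSales(
  SalesID int IDENTITY NOT NULL PRIMARY KEY,
  SalesAmount money NOT NULL,
  Freight money NOT NULL DEFAULT 0,
  OrderDate datetime NOT NULL DEFAULT GETDATE(),
  ShipDate datetime NOT NULL DEFAULT '',   -- '' converts to the datetime base date
  Comments nvarchar(200) NOT NULL DEFAULT ''
);

INSERT INTO dbo.RegionalSales (SalesAmount) VALUES (476.36);

SELECT * FROM dbo.RegionalSales;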

For the ShipDate and Comments columns, we define each default as an empty string, and for the Freight column, we define the default as 0. We then insert a row, once again specifying a value only for the SalesAmount column. The default values take care of the rest, as shown in the following results:

SalesID  SalesAmount  Freight  OrderDate                ShipDate                 Comments
-------  -----------  -------  -----------------------  -----------------------  --------
1        476.36       0.00     2014-06-10 10:03:17.350  1900-01-01 00:00:00.000

For the Comments column, the default approach might work okay in certain circumstances. The default is an empty string, so the column simply appears empty. Keep in mind, however, that using an empty string as a default can impact operations that rely on expressions such as COALESCE, operators such as IS NULL, aggregate functions such as MAX, concatenation operations and sorting.

Using a default gets even more complicated for other column types. For example, the ShipDate column also uses an empty string as a default. However, because this is a DATETIME column, the stored value must be a specific date and time (which is actually stored as two integers). When the empty string is converted to a DATETIME value, the database engine substitutes the base date, 1900-01-01 00:00:00.000.

Then we have the Freight column, whose default value is 0. Ask your garden-variety accountant about the use of 0, and they will undoubtedly tell you that 0 and NULL are absolutely not one and the same and should never be treated as such. Assuming a 0 where a value is not supplied can have a serious impact on calculations. For example, if you were to run the AVG aggregate function against the Freight column, the function would ignore null values but include all the zeroes in the calculations.

So the default route is not all it’s cracked up to be, yet adding tables doesn’t necessarily cut it either. Depending on the data, you might be able to use binary, XML or other types of storage to accommodate data with a lot of missing values, such as the properties on a customer record. On the other hand, you might find the way you’re supporting null values presents no problems for your applications. In general, however, it’s a good idea to minimize null values. And in those cases when using them makes the most sense, be aware of how the data is being queried and what those null values mean to your result sets.

“When I try to run a CREATE TABLE AS SELECT (CTAS) statement in SQL Server, I receive an error. Does SQL Server support the CTAS statement?”

Yes, it does, sort of, in a somewhat different form. But first, let’s clarify what we’re talking about here. The CTAS statement is part of the SQL standard and looks something like the following:
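
Its general form looks something like this (the source table and columns here are borrowed from AdventureWorks for illustration):

CREATE TABLE Employees AS
SELECT BusinessEntityID, JobTitle, HireDate
FROM HumanResources.Employee;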

Essentially, we’re using a SELECT statement to create the Employees table. Unfortunately, despite the fact that this form of the CREATE TABLE statement is standard SQL and implemented by such database systems as Oracle and MySQL, SQL Server doesn’t support it, so if you try to run the statement you’ll receive a syntax error. The good news is that SQL Server provides another method for achieving the same results, the SELECT...INTO statement, as shown in the following example:
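
The equivalent using SELECT...INTO:

SELECT BusinessEntityID, JobTitle, HireDate
INTO Employees
FROM HumanResources.Employee;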

Basically, we’ve added an INTO clause to a SELECT statement to create a table based on our result set. Consequently, the new table will have the same columns as the original columns and be populated with the returned data. However, this approach doesn’t copy over any constraints or indexes, leaving you with a basic heap. In addition, the SELECT...INTO statement works only if the new table does not already exist. If it does exist, then you must use an INSERT statement containing a SELECT statement that retrieves the values to be inserted into the target table.

“When creating objects in a SQL Server database, should I assign them to the default (dbo) schema or to a custom schema?”

Microsoft recommends that you use custom schemas to control access in SQL Server databases, rather than the default dbo schema. Custom schemas provide distinct namespaces for individual groups of objects and their data. For example, you might create a schema for Human Resources, one for Marketing, and another for Production. Each schema would contain the tables, views, stored procedures, functions, and other database objects specific to that area of focus.
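
For example, a hypothetical HR schema and a table assigned to it might be created like this:

CREATE SCHEMA HR AUTHORIZATION dbo;
GO

CREATE TABLE HR.EmployeeReviews(
  ReviewID int IDENTITY NOT NULL PRIMARY KEY,
  EmployeeID int NOT NULL,
  ReviewDate datetime NOT NULL DEFAULT GETDATE()
);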

Prior to SQL Server 2005, owners and database objects were tightly coupled, with users and object owners being treated as one and the same. In fact, a user could not be deleted without first removing the database objects or changing their owner. As of SQL Server 2005, however, that dependency has been removed, providing greater flexibility and control, while reducing administrative overhead.

Schemas provide secure containers for protecting database objects and their data. They also provide a way to logically group those objects and data. For example, objects and data related to production would reside in the Production schema, objects and data related to Human Resources would reside in the HR schema, and so on. Schemas make it easier to manage access across functional areas without increasing administrative complexities or jeopardizing data.

In addition, because the dbo schema is the default, it could provide an easier path for hackers to exploit a database, because the schema is well known and therefore a more likely target. It’s also not a particularly descriptive name, unlike a schema such as Sales.

That said, there’s no technical reason why you should or should not use dbo. Depending on your project, creating a lot of schemas might be overkill. For many teams, it comes down to personal preference. You won’t break anything by using dbo, but you might appreciate the advantages that come from creating your own schemas.

“When I try to incorporate a subquery in my CHECK constraints, SQL Server generates an error. Is there a way to use subqueries in CHECK constraints?”

You cannot use subqueries in CHECK constraints. Here’s proof. The following T-SQL script attempts to create the RegionalSales table, which includes a CHECK constraint on the ProdID column:
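
A sketch of such a table definition (the constraint and column names are illustrative; ProductInventory refers to the AdventureWorks Production.ProductInventory table):

CREATE TABLE dbo.RegionalSales(
  SalesID int IDENTITY NOT NULL PRIMARY KEY,
  SalesAmount money NOT NULL,
  OrderDate datetime NOT NULL DEFAULT GETDATE(),
  ProdID int NOT NULL
    CONSTRAINT ck_RegionalSales_ProdID CHECK (
      (SELECT SUM(Quantity)
       FROM Production.ProductInventory
       WHERE ProductID = ProdID) > 0
    )
);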

The CHECK constraint uses a subquery to aggregate data in the ProductInventory table in order to ensure that the item is in stock before entering the product ID. Unfortunately, if you were to try to run the CREATE TABLE statement, you would receive an error stating that subqueries are not allowed in this context and that only scalar expressions are allowed.

See? Just like I told you. No subqueries. Only scalar expressions.

One way around this dilemma is to create a scalar function that incorporates the subquery’s logic. The following function definition retrieves the number of products on hand, based on the product ID passed in:
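
A sketch of the function:

CREATE FUNCTION dbo.CheckQuantity (@ProdID int)
RETURNS int
AS
BEGIN
  DECLARE @qty int;

  -- SUM over zero rows returns NULL, so a non-existent product yields NULL
  SELECT @qty = SUM(Quantity)
  FROM Production.ProductInventory
  WHERE ProductID = @ProdID;

  RETURN @qty;
END;
GO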

Now we can use the function in our CHECK constraint, as shown in the following CREATE TABLE statement:
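
For example:

CREATE TABLE dbo.RegionalSales(
  SalesID int IDENTITY NOT NULL PRIMARY KEY,
  SalesAmount money NOT NULL,
  OrderDate datetime NOT NULL DEFAULT GETDATE(),
  ProdID int NOT NULL
    CONSTRAINT ck_RegionalSales_ProdID CHECK (
      dbo.CheckQuantity(ProdID) > 0
      AND dbo.CheckQuantity(ProdID) IS NOT NULL
    )
);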

Notice that when we call the function, we pass in the ProdID column. We’ve also included AND dbo.CheckQuantity(ProdID) IS NOT NULL in the CHECK constraint to handle the possibility of the function returning a null value because the product doesn’t exist.

Once we’ve created our table, we can try out our function by attempting to insert data into the table:
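
For example:

INSERT INTO dbo.RegionalSales (SalesAmount, ProdID)
VALUES (438.22, 316);

SELECT * FROM dbo.RegionalSales;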

Because 316 is a valid product ID and is in stock, we’re able to insert the data. The SELECT statement confirms this by returning the following results:

SalesID  SalesAmount  OrderDate                ProdID
-------  -----------  -----------------------  ------
1        438.22       2014-06-10 12:18:28.437  316

Now let’s try to insert a product ID for which no products are in stock:
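
Something like the following, where the product ID is a placeholder for an item with no on-hand inventory:

-- the product ID here is illustrative; assume it has zero quantity in Production.ProductInventory
INSERT INTO dbo.RegionalSales (SalesAmount, ProdID)
VALUES (171.39, 937);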

As expected, we receive a message saying that the INSERT statement conflicted with the CHECK constraint.

We’ll receive the same message if we try to pass in a fake ID:
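
For instance:

-- 99999 is an arbitrary ID that does not exist in the Production.Product table
INSERT INTO dbo.RegionalSales (SalesAmount, ProdID)
VALUES (171.39, 99999);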

So a scalar function can provide a reasonable workaround to the subquery restriction when creating a CHECK constraint. Be aware, however, that scalar functions are executed on a row-by-row basis. If you’re inserting or updating lots of rows, there will be lots of processing, especially if your function is performing complex aggregations or other operations. Be aware, too, that some developers have found that implementing a user-defined function in a CHECK constraint can lead to unexpected results, so be sure to test your solution thoroughly. At times, you might find you can achieve the results you need by using other mechanisms, such as foreign keys or triggers. You’ll have to take it on a case-by-case basis.

“I see that many SQL Server developers add a prefix to object names, such as ‘tbl’ for a table or ‘sp’ for a stored procedure. Is it considered best practice to include prefixes?”

Few issues in the world of database development continue to be as hotly contested as whether to add prefixes to database object names. Many teams avoid them altogether. Others embrace them at all levels, using conventions such as tbl for tables, vw for views, and sp for stored procedures, ending up with names similar to tblRegionalSales, vwQuarterlySales, or spUpdateSalesAmounts. Some teams use them for only certain types of objects, such as views or stored procedures.

Proponents of such an approach argue that adding prefixes provides a quick and easy method for identifying the type of object you’re working with so you know instantly whether you’re dealing with a table or a view or some other type of entity. At the same time, prefixes have no impact on performance or security, but can help expedite searching for particular object types.

Even so, prefixing database object names has fallen out of favor with many database developers, in part because of the belief that a name is no place for metadata. And they might be right. Consider this scenario. Your database includes the tblRegionalSales table. Because of changing business needs, you must replace that table with tblRegionalSales2. However, a number of other database objects and application components include references to tblRegionalSales, so you create a view named tblRegionalSales to avoid a major meltdown (even if only as a temporary fix). Suddenly you have a view with a tbl prefix, throwing your entire naming system out of whack.

A name and the object it represents are not one and the same, and you cannot assume that a name will continue to reference the same type of object forever. That said, breaking the naming conventions in this way is probably not the end of civilization as we know it, but you do want to be aware of the potential for confusion at some point down the line if you choose this route.

Perhaps more important than whether or not your team uses prefixes is to be certain that whatever naming conventions you agree upon, you apply them consistently to all database objects. If one developer uses prefixes and another does not, you’ll likely cause even more confusion than choosing one approach over the other.

“Why bother using the FILESTREAM features in SQL Server for BLOB data? Why not just store the files directly on the file system?”

In some cases, you might not want to use the FILESTREAM capabilities in SQL Server. Microsoft recommends using FILESTREAM storage only when your files are larger than 1 MB, you require fast read access, and your application’s middle-tier code needs access to those files.

One of the main advantages of using FILESTREAM is that you can store and retrieve your BLOB files together with your relational data, all within the same transaction, thus avoiding an elaborate workaround to sync stored files with data in SQL Server. The files are also included in your database backup and restore operations, and you can access BLOB data from within your .NET code, which is typical of a middle-tier operation. At the same time, you can take advantage of the ability of the NTFS file system to save and retrieve the files faster than if they were stored directly in SQL Server.

To implement FILESTREAM storage, you must first ensure that it is enabled on your SQL Server instance, which you can do through the server properties in SSMS. From there, you must add a FILESTREAM filegroup to your database and then add a file to the filegroup, as shown in the following example:
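
The two steps might look something like this (the file path is illustrative):

ALTER DATABASE AdventureWorks2012
  ADD FILEGROUP AW_images CONTAINS FILESTREAM;
GO

ALTER DATABASE AdventureWorks2012
  ADD FILE (
    NAME = 'ProductImages',
    FILENAME = 'C:\Data\ProductImages.ndf'  -- FILESTREAM container path; adjust for your environment
  )
  TO FILEGROUP AW_images;
GO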

In this case, we’ve created a filegroup named AW_images and added the file ProductImages.ndf. We can now define a table that includes a FILESTREAM column:
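
A sketch of such a table (the table and column names are illustrative):

CREATE TABLE dbo.ProductImages(
  ImageID int IDENTITY NOT NULL PRIMARY KEY,
  ImageGuid uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
  ProdImage varbinary(max) FILESTREAM NULL
);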

Notice that the ProdImage column includes the FILESTREAM option to indicate that FILESTREAM storage should be used. Also notice that, in order to support FILESTREAM storage, we must include a unique ROWGUIDCOL column, either as a separate column (as we’ve done here) or as part of the primary key.

As you can see, implementing FILESTREAM storage is a fairly straightforward process that lets you easily integrate your BLOB and relational data to facilitate application access to both sets of data. If you don’t need these capabilities, there’s no sense incurring the additional overhead, although with relatively little effort, you can have an integrated system that could benefit your applications as well as their users.

“When I try to persist a computed column that contains the GETDATE function, I receive an error saying that I can’t persist the column because it’s non-deterministic. What difference does it make?”

SQL Server doesn’t like surprises. If we want to persist a computed column, it must be deterministic, that is, it must always produce the same results when given the same input. Most of the time, this issue won’t come up. But if we introduce a non-deterministic function such as GETDATE into the mix, we can no longer predict those results. Each time we run GETDATE it returns a different value (unless we’re very, very fast). The database engine refuses to persist such ambiguity.

SQL Server’s resistance to the idea of a non-deterministic computed column is easy enough to demonstrate. The following T-SQL code creates the RegionalSales table, which includes the DeliverDate computed column:
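
A sketch of such a table, assuming a delivery date computed as 14 days from the current date:

CREATE TABLE dbo.RegionalSales(
  SalesID int IDENTITY NOT NULL PRIMARY KEY,
  SalesAmount money NOT NULL,
  OrderDate datetime NOT NULL DEFAULT GETDATE(),
  DeliverDate AS DATEADD(dd, 14, GETDATE())
);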

Notice the GETDATE function in the computed column definition. Because we have not persisted the column, we can create the table with no problem, after which we can insert data and retrieve the results, shown in the following table:

SalesID  SalesAmount  OrderDate                DeliverDate
-------  -----------  -----------------------  -----------------------
1        583.22       2014-06-10 13:08:59.723  2014-06-24 13:09:02.783

Now let’s change our table definition slightly by adding the PERSISTED keyword to our computed column:
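
Like this:

CREATE TABLE dbo.RegionalSales(
  SalesID int IDENTITY NOT NULL PRIMARY KEY,
  SalesAmount money NOT NULL,
  OrderDate datetime NOT NULL DEFAULT GETDATE(),
  DeliverDate AS DATEADD(dd, 14, GETDATE()) PERSISTED
);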

Because we’ve added the PERSISTED keyword, SQL Server should physically store the computed values in the DeliverDate column rather than calculate them each time the column is queried. Unfortunately, when we now try to create the table, we receive an error telling us that the column cannot be persisted because it is non-deterministic.

Because we’ve included the GETDATE function in our computed column, we cannot persist that column. However, in this case, we can get around the limitation by instead specifying the OrderDate column, which itself uses the GETDATE function:
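
A sketch of the revised definition:

CREATE TABLE dbo.RegionalSales(
  SalesID int IDENTITY NOT NULL PRIMARY KEY,
  SalesAmount money NOT NULL,
  OrderDate datetime NOT NULL DEFAULT GETDATE(),
  DeliverDate AS DATEADD(dd, 14, OrderDate) PERSISTED
);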

Now we can create the table with no problem. The GETDATE function used in the OrderDate column is essentially invisible to the DeliverDate column. You might not always be able to come up with this sort of workaround, but it does help give you a better sense of the deterministic nature of persisted computed columns.