Edit: At the suggestion of a much knowledgable commenter who shall remain named Aaron, I changed from using schema_name() function to using sys.tables. When writing code that is expected to have reuse, it can be safer to use the tables rather than functions because the tables will work in the context of the database that is in the from clause, so if you changed the code to database1.sys.tables because you wanted the tables from database1, and you were executing the code in database2, the columns of the table would give you the answer you expected, but the functions would be context of database2.
I have been doing a presentation on sequences of late (last planned version of that presentation was last week, but should be able to get the gist of things from the slides and the code posted here on my presentation page), and as part of that process, I started writing some queries to interrogate the structure of tables. I started with tables using an identity column for some purpose because they are considerably easier to do than sequences, specifically because the limitations of identity columns make determining how they are used easier.
In the future (which will probably be after PASS, since I have a lot of prep and 3 more presentations to do before PASS), I will start trying to discern the different cases where you might want to use a sequence and writing queries to make sure the table structures are as I desire. The queries presented here are really the first step in this direction, as in most cases I foresee a mixture of identity and sequence based surrogate keys even once people get to SQL Server 2012 as a typical set up. The queries I am presenting here will look for tables that meet certain conditions, including:
- Tables with no primary key – Very common scenario, no idea about uniqueness, or sometimes that identity property alone makes the table an adequate table.
- Tables with no identity column – Abolutely nothing wrong with this scenario, as the pattern of using an identity based primary key is just a choice\preference. However, if you you expect all of your tables to have identity columns, running this query can show you where you are wrong. I usually use this sort of query as part of a release, making sure that the tables I expected to have a surrogate actually do.
- Tables with identity column and PK, identity column in AK – This query is interesting for looking at other people’s databases sometimes. Not everyone uses the identity value as a surrogate primary key, and finding cases where it is in a non-key usage can help you find “interesting” cases.
- Tables with an identity based column in the primary key along with other columns – In this case, the key columns are illogical. The identity value should always be unique and be a sufficient surrogate key on it’s own. By putting other columns in the key, you end up with a false sense of uniqueness. Ideally, you want your tables to have at least one key where all of the values are created outside of SQL Server. Sometimes people with use this for an invoice line item and make the pk the invoiceId and an identity value like invoiceLineItemId.
I can’t say that this is “wrong” but if the only key includes a system generated value, it means that you can have duplicated data along with the system generated value. So you need to monitor the data more carefully.
- Tables with a single column identity based primary key but no alternate key. – This is the classic ‘bad’ use of surrogate key abuse. Just drop a surrogate key on the table and viola!, uniqueness. If you can’t see why this wouldn’t be the desirable case, it is like the previous case, except the only uniqueness criteria is a monotonically increasing value.
You can download the code directly from here or you can see all my downloadable queries on my downloadable package page: DownloadablePackages.
The queries:
–Tables with no primary key
SELECT schemas.name + ‘.’ + tables.name AS tableName
FROM sys.tables
JOIN sys.schemas
ON tables.schema_id = schemas.schema_id
WHERE tables.type_desc = ‘USER_TABLE’
–no PK key constraint exists
AND NOT EXISTS ( SELECT *
FROM sys.key_constraints
WHERE key_constraints.type = ‘PK’
AND key_constraints.parent_object_id = tables.object_id )
–Tables with no identity column
SELECT schemas.name + ‘.’ + tables.name AS tableName
FROM sys.tables
JOIN sys.schemas
ON tables.schema_id = schemas.schema_id
WHERE tables.type_desc = ‘USER_TABLE’
–no column in the table has the identity property
AND NOT EXISTS ( SELECT *
FROM sys.columns
WHERE tables.object_id = columns.object_id
AND is_identity = 1 )
–Tables with identity column and PK, identity column in AK
SELECT schemas.name + ‘.’ + tables.name AS tableName
FROM sys.tables
JOIN sys.schemas
ON tables.schema_id = schemas.schema_id
WHERE tables.type_desc = ‘USER_TABLE’
— table does have identity column
AND EXISTS ( SELECT *
FROM sys.columns
WHERE tables.object_id = columns.object_id
AND is_identity = 1 )
— table does have primary key
AND EXISTS ( SELECT *
FROM sys.key_constraints
WHERE key_constraints.type = ‘PK’
AND key_constraints.parent_object_id = tables.object_id )
— but it is not the PK
AND EXISTS ( SELECT *
FROM sys.key_constraints
JOIN sys.index_columns
ON index_columns.object_id = key_constraints.parent_object_id
AND index_columns.index_id = key_constraints.unique_index_id
JOIN sys.columns
ON columns.object_id = index_columns.object_id
AND columns.column_id = index_columns.column_id
WHERE key_constraints.type = ‘UQ’
AND key_constraints.parent_object_id = tables.object_id
AND columns.is_identity = 1 )
–Tables with an identity based column in the primary key along with other columns
SELECT schemas.name + ‘.’ + tables.name AS tableName
FROM sys.tables
JOIN sys.schemas
ON tables.schema_id = schemas.schema_id
WHERE tables.type_desc = ‘USER_TABLE’
— table does have identity column
AND EXISTS ( SELECT *
FROM sys.columns
WHERE tables.object_id = columns.object_id
AND is_identity = 1 )
–any PK has identity column
AND EXISTS( SELECT *
FROM sys.key_constraints
JOIN sys.index_columns
ON index_columns.object_id = key_constraints.parent_object_id
AND index_columns.index_id = key_constraints.unique_index_id
JOIN sys.columns
ON columns.object_id = index_columns.object_id
AND columns.column_id = index_columns.column_id
WHERE key_constraints.type = ‘PK’
AND key_constraints.parent_object_id = tables.object_id
AND columns.is_identity = 1 )
–and there are > 1 columns in the PK constraint
AND ( SELECT COUNT(*)
FROM sys.key_constraints
JOIN sys.index_columns
ON index_columns.object_id = key_constraints.parent_object_id
AND index_columns.index_id = key_constraints.unique_index_id
WHERE key_constraints.type = ‘PK’
AND key_constraints.parent_object_id = tables.object_id
) > 1
–Tables with a single column identity based primary key but no alternate key
SELECT schemas.name + ‘.’ + tables.name AS tableName
FROM sys.tables
JOIN sys.schemas
ON tables.schema_id = schemas.schema_id
WHERE tables.type_desc = ‘USER_TABLE’
–a PK key constraint exists
AND EXISTS ( SELECT *
FROM sys.key_constraints
WHERE key_constraints.type = ‘PK’
AND key_constraints.parent_object_id = tables.object_id )
–any PK only has identity column
AND ( SELECT COUNT(*)
FROM sys.key_constraints
JOIN sys.index_columns
ON index_columns.object_id = key_constraints.parent_object_id
AND index_columns.index_id = key_constraints.unique_index_id
JOIN sys.columns
ON columns.object_id = index_columns.object_id
AND columns.column_id = index_columns.column_id
WHERE key_constraints.type = ‘PK’
AND key_constraints.parent_object_id = tables.object_id
AND columns.is_identity = 0
) = 0 –must have > 0 columns in pkey, can only have 1 identity column
–but no Unique Constraint Exists
AND NOT EXISTS ( SELECT *
FROM sys.key_constraints
WHERE key_constraints.type = ‘UQ’
AND key_constraints.parent_object_id = tables.object_id )
–Test Cases
–The following are some sample tables that can be built to test these queries. If you have other ideas
–for cases (or find errors, email louis@drsql.org)
IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id(‘dbo.NoPrimaryKey’))
DROP TABLE dbo.NoPrimaryKey;
IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id(‘dbo.NoIdentityColumn’))
DROP TABLE dbo.NoIdentityColumn;
IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id(‘dbo.IdentityButNotInPkey’))
DROP TABLE dbo.IdentityButNotInPkey;
IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id(‘dbo.TooManyColumnsInPkey’))
DROP TABLE dbo.TooManyColumnsInPkey;
IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id(‘dbo.MultipleColumnsInPkeyOk’))
DROP TABLE dbo.MultipleColumnsInPkeyOk;
IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id(‘dbo.NoAlternateKey’))
DROP TABLE dbo.NoAlternateKey;
IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id(‘dbo.IdentityInAlternateKey’))
DROP TABLE dbo.IdentityInAlternateKey;
–very common scenario, assuming identity makes the table great
CREATE TABLE NoPrimaryKey
(
NoPrimaryKeyId int not null identity,
AnotherColumnId int not null
)
go
–absolutely nothing wrong with this scenario, unless you expect all of your
–tables to have identity columns, of course…
CREATE TABLE NoIdentityColumn
(
NoIdentityColumnId int primary key,
AnotherColumnId int not null
)
go
–absolutely nothing wrong with this scenario either, as this could be desired.
–usually it is some form of mistake in a database using surrogate keys though
CREATE TABLE IdentityButNotInPkey
(
IdentityButNotInPkeyId int primary key,
AnotherColumnId int identity not null
)
go
–absolutely nothing wrong with this scenario either, as this could be desired.
–usually it is some form of mistake in a database using surrogate keys though
CREATE TABLE IdentityInAlternateKey
(
IdentityInAlternateKeyId int primary key,
AnotherColumnId int identity not null unique
)
go
–In this case, the key columns are illogical. The identity value should always be unique and
–be a sufficient primary surrogate key. I definitely want to know why this is built this
–way. Sometimes people with use this for an invoice line item and make the pk the
–invoiceId and an identity value like invoiceLineItemId. I generally prefer the surrogate key
–to stand alone and have the multi-part key to be something that makes sense for the user
CREATE TABLE TooManyColumnsInSurrogatePkey
(
TooManyColumnsInPkeyId int identity,
AnotherColumnId int,
primary key (TooManyColumnsInPkeyId,AnotherColumnId)
)
go
CREATE TABLE MultipleColumnsInPkeyOk
(
TooManyColumnsInPkeyId int not null,
AnotherColumnId int not null,
primary key (TooManyColumnsInPkeyId,AnotherColumnId)
)
go
–this is my pet peeve, and something that should be avoided. You could end up having
–duplicate rows that are not logical.
CREATE TABLE NoAlternateKey
(
NoAlternateKeyId int not null identity primary key,
AnotherColumnThatShouldBeUnique int not null
)
go
Load comments