Utility Queries–Structure of Tables with Identity Column

Comments 0

Share to social media

Edit: At the suggestion of a much knowledgable commenter who shall remain named Aaron, I changed from using schema_name() function to using sys.tables. When writing code that is expected to have reuse, it can be safer to use the tables rather than functions because the tables will work in the context of the database that is in the from clause, so if you changed the code to database1.sys.tables because you wanted the tables from database1, and you were executing the code in database2, the columns of the table would give you the answer you expected, but the functions would be context of database2.

I have been doing a presentation on sequences of late (last planned version of that presentation was last week, but should be able to get the gist of things from the slides and the code posted here on my presentation page), and as part of that process, I started writing some queries to interrogate the structure of tables. I started with tables using an identity column for some purpose because they are considerably easier to do than sequences, specifically because the limitations of identity columns make determining how they are used easier.

In the future (which will probably be after PASS, since I have a lot of prep and 3 more presentations to do before PASS), I will start trying to discern the different cases where you might want to use a sequence and writing queries to make sure the table structures are as I desire. The queries presented here are really the first step in this direction, as in most cases I foresee a mixture of identity and sequence based surrogate keys even once people get to SQL Server 2012 as a typical set up. The queries I am presenting here will look for tables that meet certain conditions, including:

  • Tables with no primary key – Very common scenario, no idea about uniqueness, or sometimes that identity property alone makes the table an adequate table.
  • Tables with no identity column – Abolutely nothing wrong with this scenario, as the pattern of using an identity based primary key is just a choice\preference.  However, if you you expect all of your tables to have identity columns, running this query can show you where you are wrong.  I usually use this sort of query as part of a release, making sure that the tables I expected to have a surrogate actually do.
  • Tables with identity column and PK, identity column in AK – This query is interesting for looking at other people’s databases sometimes.  Not everyone uses the identity value as a surrogate primary key, and finding cases where it is in a non-key usage can help you find “interesting” cases.
  • Tables with an identity based column in the primary key along with other columns – In this case, the key columns are illogical. The identity value should always be unique and be a sufficient surrogate key on it’s own.  By putting other columns in the key, you end up with a false sense of uniqueness. Ideally, you want your tables to have at least one key where all of the values are created outside of SQL Server. Sometimes people with use this for an invoice line item and make the pk the invoiceId and an identity value like invoiceLineItemId.

    I can’t say that this is “wrong” but if the only key includes a system generated value, it means that you can have duplicated data along with the system generated value. So you need to monitor the data more carefully.

  • Tables with a single column identity based primary key but no alternate key. – This is the classic ‘bad’ use of surrogate key abuse. Just drop a surrogate key on the table and viola!, uniqueness. If you can’t see why this wouldn’t be the desirable case, it is like the previous case, except the only uniqueness criteria is a monotonically increasing value.

You can download the code directly from here  or you can see all my downloadable queries on my downloadable package page: DownloadablePackages.

The queries:

–Tables with no primary key

SELECT  schemas.name + ‘.’ + tables.name AS tableName
FROM    sys.tables
          JOIN sys.schemas
             ON tables.schema_id = schemas.schema_id
WHERE   tables.type_desc = ‘USER_TABLE’
        –no PK key constraint exists
    AND NOT EXISTS ( SELECT *
                        FROM   sys.key_constraints
                        WHERE  key_constraints.type = ‘PK’
                            AND key_constraints.parent_object_id = tables.object_id )

–Tables with no identity column

SELECT  schemas.name + ‘.’ + tables.name AS tableName
FROM    sys.tables
           JOIN sys.schemas
               ON tables.schema_id = schemas.schema_id
WHERE   tables.type_desc = ‘USER_TABLE’
–no column in the table has the identity property
    AND NOT EXISTS ( SELECT *
                     FROM   sys.columns
                     WHERE  tables.object_id = columns.object_id
                       AND is_identity = 1 )

–Tables with identity column and PK, identity column in AK

SELECT schemas.name + ‘.’ + tables.name AS tableName
FROM   sys.tables
        JOIN sys.schemas
            ON tables.schema_id = schemas.schema_id
WHERE tables.type_desc = ‘USER_TABLE’
        — table does have identity column 
  AND   EXISTS (    SELECT *
                    FROM   sys.columns
                    WHERE  tables.object_id = columns.object_id
                        AND is_identity = 1 ) 
        — table does have primary key 
  AND   EXISTS (    SELECT *
                    FROM   sys.key_constraints
                    WHERE  key_constraints.type = ‘PK’
                      AND key_constraints.parent_object_id = tables.object_id )
        — but it is not the PK 
  AND   EXISTS (    SELECT *
                    FROM   sys.key_constraints
                        JOIN sys.index_columns
                            ON index_columns.object_id = key_constraints.parent_object_id
                                AND index_columns.index_id = key_constraints.unique_index_id
                        JOIN sys.columns
                            ON columns.object_id = index_columns.object_id
                                AND columns.column_id = index_columns.column_id
                    WHERE  key_constraints.type = ‘UQ’
                        AND key_constraints.parent_object_id = tables.object_id
                        AND columns.is_identity = 1 )

–Tables with an identity based column in the primary key along with other columns

SELECT schemas.name + ‘.’ + tables.name AS tableName
FROM   sys.tables
         JOIN sys.schemas
            ON tables.schema_id = schemas.schema_id
WHERE tables.type_desc = ‘USER_TABLE’
        — table does have identity column
  AND   EXISTS ( SELECT *
                 FROM   sys.columns
                 WHERE  tables.object_id = columns.object_id
                   AND is_identity = 1 )
        –any PK has identity column
  AND   EXISTS( SELECT  *
                FROM    sys.key_constraints
                           JOIN sys.index_columns
                                ON index_columns.object_id = key_constraints.parent_object_id
                                   AND index_columns.index_id = key_constraints.unique_index_id
                           JOIN sys.columns
                                ON columns.object_id = index_columns.object_id
                                   AND columns.column_id = index_columns.column_id
                WHERE    key_constraints.type = ‘PK’
                  AND    key_constraints.parent_object_id = tables.object_id
                  AND    columns.is_identity = 1 )
    –and there are > 1 columns in the PK constraint
    AND (  SELECT  COUNT(*)
           FROM    sys.key_constraints
                      JOIN sys.index_columns
                          ON index_columns.object_id = key_constraints.parent_object_id
                             AND index_columns.index_id = key_constraints.unique_index_id
            WHERE   key_constraints.type = ‘PK’
              AND   key_constraints.parent_object_id = tables.object_id
        ) > 1

–Tables with a single column identity based primary key but no alternate key

SELECT schemas.name + ‘.’ + tables.name AS tableName
FROM sys.tables
         JOIN sys.schemas
             ON tables.schema_id = schemas.schema_id
WHERE tables.type_desc = ‘USER_TABLE’
        –a PK key constraint exists 
  AND   EXISTS ( SELECT * 
                 FROM   sys.key_constraints 
                 WHERE  key_constraints.type = ‘PK’ 
                   AND key_constraints.parent_object_id = tables.object_id )
    –any PK only has identity column 
  AND ( SELECT COUNT(*) 
        FROM   sys.key_constraints 
                  JOIN sys.index_columns 
                      ON index_columns.object_id = key_constraints.parent_object_id 
                         AND index_columns.index_id = key_constraints.unique_index_id 
                  JOIN sys.columns 
                      ON columns.object_id = index_columns.object_id 
                         AND columns.column_id = index_columns.column_id 
        WHERE  key_constraints.type = ‘PK’ 
          AND  key_constraints.parent_object_id = tables.object_id 
          AND columns.is_identity = 0
        ) = 0 –must have > 0 columns in pkey, can only have 1 identity column 

  –but no Unique Constraint Exists 
  AND NOT EXISTS ( SELECT * 
                   FROM   sys.key_constraints 
                   WHERE  key_constraints.type = ‘UQ’ 
                     AND key_constraints.parent_object_id = tables.object_id )  

–Test Cases

–The following are some sample tables that can be built to test these queries. If you have other ideas
–for cases (or find errors, email louis@drsql.org)

IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id(‘dbo.NoPrimaryKey’))
        DROP TABLE dbo.NoPrimaryKey;
IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id(‘dbo.NoIdentityColumn’))
        DROP TABLE dbo.NoIdentityColumn;
IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id(‘dbo.IdentityButNotInPkey’))
        DROP TABLE dbo.IdentityButNotInPkey;
IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id(‘dbo.TooManyColumnsInPkey’))
        DROP TABLE dbo.TooManyColumnsInPkey;
IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id(‘dbo.MultipleColumnsInPkeyOk’))
        DROP TABLE dbo.MultipleColumnsInPkeyOk;
IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id(‘dbo.NoAlternateKey’))
        DROP TABLE dbo.NoAlternateKey;
IF EXISTS (SELECT * FROM sys.tables WHERE object_id = object_id(‘dbo.IdentityInAlternateKey’))
        DROP TABLE dbo.IdentityInAlternateKey;

–very common scenario, assuming identity makes the table great
CREATE TABLE NoPrimaryKey
(
    NoPrimaryKeyId int not null identity,
    AnotherColumnId int not null
)
go

–absolutely nothing wrong with this scenario, unless you expect all of your
–tables to have identity columns, of course…
CREATE TABLE NoIdentityColumn
(
    NoIdentityColumnId int primary key,
    AnotherColumnId int not null
)
go

–absolutely nothing wrong with this scenario either, as this could be desired.
–usually it is some form of mistake in a database using surrogate keys though
CREATE TABLE IdentityButNotInPkey
(
    IdentityButNotInPkeyId int primary key,
    AnotherColumnId int identity not null
)
go

–absolutely nothing wrong with this scenario either, as this could be desired.
–usually it is some form of mistake in a database using surrogate keys though
CREATE TABLE IdentityInAlternateKey
(
    IdentityInAlternateKeyId int primary key,
    AnotherColumnId int identity not null unique
)
go

–In this case, the key columns are illogical. The identity value should always be unique and
–be a sufficient primary surrogate key. I definitely want to know why this is built this
–way.  Sometimes people with use this for an invoice line item and make the pk the
–invoiceId and an identity value like invoiceLineItemId. I generally prefer the surrogate key
–to stand alone and have the multi-part key to be something that makes sense for the user
CREATE TABLE TooManyColumnsInSurrogatePkey
(
    TooManyColumnsInPkeyId int identity,
    AnotherColumnId int,
    primary key (TooManyColumnsInPkeyId,AnotherColumnId)
)
go

CREATE TABLE MultipleColumnsInPkeyOk
(
    TooManyColumnsInPkeyId int not null,
    AnotherColumnId int not null,
    primary key (TooManyColumnsInPkeyId,AnotherColumnId)
)
go

–this is my pet peeve, and something that should be avoided. You could end up having
–duplicate rows that are not logical.
CREATE TABLE NoAlternateKey
(
    NoAlternateKeyId int not null identity primary key,
    AnotherColumnThatShouldBeUnique int not null
)
go

Load comments

About the author

Louis Davidson

Simple Talk Editor

See Profile

Louis is the editor of this Simple-Talk website. Prior to that, has was a corporate database developer and data architect for a non-profit organization for 25 years! Louis has been a Microsoft MVP since 2004, and is the author of a series of SQL Server Database Design books, most recently Pro SQL Server Relational Database Design and Implementation.