Exploring Your SQL Server Databases with T-SQL

Most DBAs hoard their own favourite T-SQL scripts to help them with their work, often on a USB 'thumbdrive', but it is rare that one of them offers us a glimpse of the sort of scripts that they find useful. It is our good fortune that Scott Swanberg shows us those scripts he uses for discovering more about database objects.

As a DBA and performance tuning consultant, for Ambient Consulting in Minneapolis, I’m often asked to analyze performance issues on a SQL Server instance with which I’m unfamiliar. It can be a daunting task. Let’s face it, most of the time companies do not have good documentation about their databases. Or if they do, it is outdated, or it takes a few days to locate and retrieve it.

In this article, I share a set of basic scripts that I’ve developed, over the years, which mine the metadata in the various system functions, stored procedures, tables, catalog views, and dynamic management views, Together, they reveal all the secrets of the databases on the instance, their size, file locations, and their design, including columns, data types, defaults, keys and indexes.

If you’ve been used to retrieving some of this information by clinging to the mouse with whited knuckles and stabbing at the screen, then I hope you’ll be pleasantly surprised at the wealth of information some of these simple scripts reveal, instantly.

As with any scripts, always test them before running them in a production environment. I recommend that you start first with one of the SQL Server sample databases like AdventureWorks or pubs that you can download from: AdventureWorksDW Databases – 2012, 2008R2 and 2008.

OK, enough preamble, just show me the scripts!

Exploring your Servers

We’ll start with some queries that offer the server-level view of your databases.

Basic Server Information

First, some simple @@functions that provide basic server information.

Listing 1: Basic server information

How long has your server been running since the last SQL Server startup? Note the tempdb system database is recreated every time the server restarts. Thus this is one method to tell when the database server was last restarted.

Listing 2: how long has your server been running since startup?

Linked Servers

Linked Servers are database connections set up to allow communication from SQL Server to other data servers. Distributed queries can be ran against these linked servers. It’s good to know if your database server is an isolated self-contained database server or if there are links to other database servers.

Listing 3: Linked Servers

List All Databases

First step is to take inventory of all the databases found on the server. Note the four or five system databases (master, model, msdb, tempdb, and distribution if you are using replication). You may want to exclude these system databases in future queries. It is very easy to see a list of database directly from SQL Server Management Studio SQL Server (SSMS). However, these simple database queries are building blocks for more complicated queries.

There are several ways to get a list of the databases in T-SQL, and Listing 4 presents just a few of them. Each method produces a similar result set, but with subtle differences.

Listing 4: Database Inventory

Last Databases Backup?

Stop! Before you go any further, every good DBA should be certain they have recent database backup(s).

Listing 5: Last Database Backup

Better still if you know the physical file location of the last backups.

Listing 6: Physical file location for recent backups

Active User Connections by Database

It’s a good idea to have an understanding of what databases are being used at any point in time, especially if you are experiencing performance problems.

Listing 7: Active connections by database

Basic Database Exploration

Let’s now drill down and see how we can gather information about the objects in each of our databases, using various catalog views and Dynamic management Views. Most of the queries presented in this section let’s look at the individual database so remember to change the current database in SSMS or by a Use database; command. Remember to you can always check the current default database with Select DB_NAME();.

The sys.objects system table is one of the key system tables for gathering a lot of information on the objects that comprise your data model, with sys.objects.type being the key column on which to filter.

Listing 8: Listing out all user-defined tables in a database

The following table shows the list of objects types on which we can filter (see also the sys.objects documentation on Microsoft’s MSDN website).

sys.objects.type

AF = Aggregate function (CLR)

P = SQL Stored Procedure

TA = Assembly (CLR) DML trigger

C = CHECK constraint

PC = Assembly (CLR) stored-procedure

TF = SQL table-valued-function

D = DEFAULT (constraint or stand-alone)

PG = Plan guide

TR = SQL DML trigger

F = FOREIGN KEY constraint

PK = PRIMARY KEY constraint

TT = Table type

FN = SQL scalar function

R = Rule (old-style, stand-alone)

U = Table (user-defined)

FS = Assembly (CLR) scalar-function

RF = Replication-filter-procedure

UQ = UNIQUE constraint

FT = Assembly (CLR) table-valued function

S = System base table

V = View

IF = SQL inline table-valued function

SN = Synonym

X = Extended stored procedure

IT = Internal table

SQ = Service queue

Other catalog views, such as sys.tables and sys.views, inherit from sys.objects and provide the information for that particular object type. With these views, plus the OBJECTPROPERTY metadata function, we can uncover a great deal of information on each of the objects that make up our database schemas.

Database File Location

Physical location and drive of the current database file, including the master database file (*.mdf) and the Log database file (*.ldf) can be found using these queries.

Listing 9: Physical file location of the current database

Tables

Of course, the Object Explorer in SSMS provides a convenient list of the tables in a specific database, but using scripts we can unveil information that isn’t easily available via the GUI. The ANSI Standard approach is to use the INFORMATION_SCHEMA views, but they won’t return information regarding objects that are not part of the Standard (such as triggers, extended properties, and so on), so use of the SQL Server Catalog views is common.

Listing 10: Exploring table details

Row Counts for all Tables

If you know nothing about a table, all tables are equal. The more you know about the tables the more you can mentally begin to determine which tables are more important and which tables are less important. Generally speaking, tables with the largest number of rows tend to be the ones that suffer most often from performance issues.

From SSMS object explorer, we can right-click on any table name and select Properties, and view the Storage page will provide a row count for that table.

2066-img135.jpg

However, it’s hard to collect this information for all tables, manually. Likewise, the brute force approach of executing SELECT COUNT(*) FROM TABLENAME; for every table is likely to involve a lot of typing.

A much better approach is to use T-SQL as a script generator. The script in Listing 11 will generate a set of T-SQL statements to return the row count for each table in the current database. Simply execute it, then copy-and-paste the generated statements into the query window and execute them.

Listing 11: A script to generate a script to return row counts for all tables

2066-img138.jpg

sp_msforeachtable

sp_msforeachtable is an undocumented Microsoft function that will loop through all the tables in a database executing a query, and replacing ‘?’ with each table name. There is also a similar database level function called sp_msforeachdb.

There are some known issues with these undocumented functions, as they do not handle special characters in the object names. For example, table or database names containing a dash character (“-“) will cause the stored procedure to fail.

Listing 12: Using sp_msforeachtable to return row counts for all tables

A faster way to get row counts – use the clustered index

All of the previous methods to return row counts for each table rely on use of COUNT(*), which performs poorly for tables with more than about 500K rows, in my experience.

A faster way to get table row counts is to get the record counts from the clustered index or heap partition. Note while this method is much faster, Microsoft has indicated the record count updates on indexes may not always match the record counts of the table, due to a delay in the index counts getting updated. In most cases they are exactly the same or very, very close and will be the same shortly.

Listing 13: Return table row counts from the index or table partition

Finding Heaps (tables with no clustered index)

Working with heap tables is like working with a flat file, instead of a database. If you want to guarantee a full table scan for any and all queries, use a heap table. My general recommendation would be to add a primary key clustered index to all heap tables.

Listing 14: Finding heaps

Investigating Table Activity

Knowing which tables have the most reads and writes is another important piece of information when performance tuning your database. Previously, we examined queries to return the row counts for each table. The following examples show the number of table reads and writes.

Note these statistics from Dynamic Management Views are cleared out each time SQL Server restarts (wait and latch statistics can also be cleared out manually). The longer the server has been up, the more reliable the statistics. I have a lot more confidence with statistics that are over 30 days (assumes the tables have been through a month end cycle) and a lot less confidence if they are less than 7 days.

Listing 15: Read and write activity for all tables referenced since the last server restart, in a database

A more advanced version of the same query uses a cursor to consolidate the information for all Tables for all databases on the server. While I am not a fan of cursors due to their slow performance, navigating multiple databases does seem to be a good use for one.

Listing 16: Read and write activity for all tables referenced since the last server restart, in all databases

Views

Views are scripted queries that are stored in the database. You can think of them as virtual tables. Data is not stored in the view but we reference the view in our queries in exactly the same way we would reference a table.

In SQL Server, we can even, in some circumstances, update data through a view. To make a view read only, one trick is to use SELECT DISTINCT in the view definition. A view is only updateable if each row in the view maps unambiguously to a single row in the underlying table. Any view that fails this criteria, such as any view built on more than one table, or that uses grouping, aggregations, and calculations in its definition, will be read only.

Listing 17: Exploring views

Synonyms

A synonym is an “also known as (aka)” name for an object in the database. A few times in my career I have been asked to review a query only to be scratching my head trying to figure out the table to which the query is referring. For example, consider a simple query Select * from Client. I search for the table named Client but I can’t find it. OK, it must be a view then, search for view named Client and I still can’t find it. I must have the wrong database? Turns out Client is a synonym for a customer and the actual table is Customer. The marketing group wanted to refer to this table as Client so created a synonym. Thankfully, use of synonyms is rare, but they can cause confusion if you are not aware of them.

Listing 18: Exploring synonyms

Stored Procedures

A stored procedure is a group of script(s) that are compiled into a single execution plan. We can use the catalog views to find out which stored procedures exist, what activity they perform, and which tables they reference.

Listing 19: Exploring stored procedures

With a simply addition to the WHERE clause of the stored procedure details query, we can investigate, for example, only those stored procedures that perform inserts.

Simply modify the WHERE clause as required to investigate stored procedures that do updates (LIKE '%update%'), deletes (LIKE '%delete%'), or reference a particular table (LIKE '%tablename%').

Functions

A function is stored SQL that accept parameters, performs an action or calculation and returns a result.

Listing 20: Exploring functions

Triggers

A trigger is like a stored procedure that executes in response to a particular event that occurs on the table to which the trigger belongs. For example, we can create INSERT, UPDATE and DELETE triggers.

Listing 21: Exploring triggers

CHECK Constraints

CHECK constraints are a good way to implement business logic in a database. For example, certain fields must be positive or negative or a date in one column must be later than a date in another column.

Listing 22: Exploring CHECK constraints

Exploring your Data Model in depth

Up to this point, we’ve examined scripts that gives us a ‘high level’ view of the objects that comprise our databases. Often though, we’ll want more in-depth knowledge of each table, including the columns, their data types and any default values defined, which keys, constraints and indexes exist (or are missing) and so on.

The queries presented across the coming sections build on this foundation, and provide a means almost to “reverse engineer” your existing data model.

Columns

The following script documents the tables and their column definitions, in a specified database. The resulting output is a good one to cut and paste into Excel, where you can filter or sort on column names to get a good understanding of the data types that exist in a given database. Watch out for and question column names that are the same but have different data types or different lengths.

Listing 23: Exploring columns and their data types

Column Defaults

Column Defaults are values that are stored in the column, if no value is entered for that column, when the record is first inserted. A common default for a column that stores dates is getdate() or current_timestamp. Another common default in auditing is system_user, to identify the login that performed a certain action.

Listing 24: Exploring column default values

Computed columns

Computed columns are columns where the values determined by an equation, usually referencing other columns in the table.

Listing 25: Exploring computed columns

Identity Columns

IDENTITY columns are populated with unique system controlled numbers. A common example is an order number, where each time an order is entered into the system, SQL Server assigns to the IDENTITY column the next sequential number.

Listing 26: Exploring IDENTITY columns

Keys and Indexes

As discussed earlier, a general best practice all tables should have a primary key clustered index. As a second general best practice, foreign keys should have a supporting index on the same columns as the foreign key. Foreign key indexes provide the most likely way that tables be joined together in multi-table queries. Foreign key indexes are also important for performance when deleting records.

What indexes exist?

To see which indexes exist on all tables in the current database.

Listing 27: Exploring existing indexes

Which indexes are missing?

The indexing related DMVs store statistics that SQL Server uses recommend indexes that could offer performance benefits, based on previously executed queries.

Do not add these indexes blindly. I would review and question each index suggested. Included column my come with a high cost of maintaining duplicate data.

Listing 28: Finding missing indexes

Foreign Keys

Foreign Keys define table dependencies and control referential integrity between multiple tables. In an Entity Relationship Diagram (ERD), the lines between the tables indicate the foreign keys.

Listing 29: Exploring Foreign keys

It will produce output similar to the following:

ForeignKey

TableName

ColumnName

Reference

TableName

Reference

ColumnName

FK__discounts__stor___286302EC

discounts

stor_id

stores

stor_id

FK__employee__job_id__34C8D9D1

employee

job_id

jobs

job_id

FK__employee__pub_id__37A5467C

employee

pub_id

publishers

pub_id

FK__pub_info__pub_id__300424B4

pub_info

pub_id

publishers

pub_id

FK__roysched__title___267ABA7A

roysched

title_id

titles

title_id

FK__sales__stor_id__239E4DCF

sales

stor_id

stores

stor_id

FK__sales__title_id__24927208

sales

title_id

titles

title_id

FK__titleauth__au_id__1DE57479

titleauthor

au_id

authors

au_id

FK__titleauth__title__1ED998B2

titleauthor

title_id

titles

title_id

FK__titles__pub_id__1A14E395

titles

pub_id

publishers

pub_id

Missing Indexes that support Foreign Keys.

As a general best practice, it is recommended to have an index associated with each foreign key. This facilitates faster table joins, which are typically joined on foreign key columns anyway. Indexes on foreign keys also facilitate faster deletes. If these supporting indexes are missing, SQL will perform a table scale on the related table each time a record in the first table is deleted.

Listing 30: Finding missing Foreign Key indexes

Object Dependencies

It depends…I’m sure you have heard that before. I will review three different methods to ‘reverse engineer’ database dependencies. The first method use the stored procedure sp_msdependencies. The second method uses the foreign key systems tables. The third method uses a CTE.

sp_msdependencies

sp_msdependencies is a SQL Server undocumented stored procedure that can be helpful in navigating complex table interdependencies.

Listing 31: sp_msdependencies help

If we list all dependencies using sp_msdependencies, it will return four columns: Type, ObjName, Owner (Schema) and Sequence.

Make special note of the Sequence number, this will start at 1 and will grow in sequential order. The Sequence is the number of layers, or rows, of dependencies.

I have used this method several times when asked to perform archiving or deleting of data on some large database models. If you know the table dependencies, you have a road map of the order in which you need to archive or delete records. Start with the table with the largest sequence number first, then work backward from the largest number to the smallest number. Tables with the same sequence number can be removed at the same time. This method does not violate any of the foreign key constraints, and thus allows you to move/delete records without temporarily dropping and rebuilding constraints.

Listing 32: using sp_msdependencies to view all dependencies

2066-img139.jpg

In SSMS, if you right click on a table name you can click on ‘View Dependencies’ and ‘Objects that depend on tablename’.

2066-img13A.jpg

We can see similar information from sp_msdependencies as follows:

Listing 33: using sp_msdependencies to view objects that depend on a table (first level only)

2066-img13B.jpg

In SSMS if you right click on a table name you can click on ‘View Dependencies’ and ‘Objects that depend on tablename’, and then expand all the ‘+’ signs to see all the levels it looks like this.

2066-img13C.jpg

The following msdependencies report would provide similar information.

Listing 34: using sp_msdependencies to view objects that depend on a table (all levels)

2066-img13E.jpg

Similarly, in SSMS, we can see the objects on which a given table depends.

2066-img13F.jpg

The following msdependencies report would provide similar information.

Listing 35: Using sp_msdependencies to view objects on which a table depends

2066-img141.jpg

If you want a list of just table dependencies you could use a temp table to filter the dependency types.

Listing 36: Using sp_msdependencies to view only table dependencies

Query the system catalog views

The second method to reverse engineer your database dependencies is to query the foreign key relationships system tables.

Listing 37: Using catalog views to view dependencies

Using a Common Table Expression (CTE)

The third method to reverse engineer your database hierarchical dependencies is to solve a recursive query using a Common Table Expression (CTE).

Listing 38: Using catalog views and a CTE to view dependencies

Summary

Within an hour or two, I can usually gain a good understanding of any database design, using the ‘reverse engineering’ methods described in this article.

My intention has been to provide a set of sample scripts that you can run immediately on the server and databases you are currently supporting. Microsoft Excel is a great tool to use to help analyze and document your database data. I recommend copying some of the following tables and column query results to excel, so you can filter and sort in a variety of ways. It also is a great way to share the results of your reverse engineering analysis with your managers and peers.