Exploring your database schema with SQL

In the second part of Phil's series of articles on finding stuff (such as objects, scripts, entities, metadata) in SQL Server, he offers some scripts that should be handy for the developer faced with tracking down problem areas and potential weaknesses in a database.

Pretty quickly, if you are doing any serious database development, you will want to know more about a database than SSMS can tell you; there are just more things you might need to know than any one GUI can provide efficiently. For example, if you are doing a review of a development database, there are a number of facts you’ll need to establish regarding the database, its tables, keys and indexes, in order to home-in on any possible problem areas.

Fortunately, SQL Server provides any number of ways to get at the metadata you need. The INFORMATION_SCHEMA views provide basic metadata about most of the entities in each database. The far-more-expansive Catalog views offer just about every piece of metadata that SQL Server currently exposes to the user.

This article provides various scripts for interrogating these views to get all sorts of useful information about your database that you would otherwise have to obtain slowly, click-by-wretched-click, from the sluggish SSMS Object browser. Once you’ve built up your own snippet or template library, you’ll find it very easy to access your databases’ metadata using SQL Code.

Interrogating Information Schema and Catalog Views

Codd’s fifth Rule (no. 4) of what comprises a relational database states that there must be an active online, inline, relational catalog that is accessible to authorized users by means of their regular query language. This means that users must be able to access the database’s structure (catalog) using SQL. XQuery isn’t allowed by Codd’s rule; it must the same query language that they use to access the database’s data. The INFORMATION_SCHEMA  provides a standard way of doing this for SQL-based relational databases.

Unfortunately, the standard doesn’t cover all the features in a SQL Server database. Sybase and SQL Server always provided the System tables to provide all the information that was required of a database’s structure, long before the INFORMATION_SCHEMA views became a SQL Standard. The Catalog Views, introduced in SQL Server 2005, provide a more efficient and concise way of doing this, even if one loses a bit of the feel for the underlying structure. There are many more views than actual system tables and Microsoft has been assiduous in providing simple ways of getting the metadata that you want.

Building a Snippet Library

If you are weaning yourself off dependency on the object browser of SQL Server Management Studio, you’ll need a clip library of handy routines instead. It is impossible to keep all the information in your head. I have a range of snippets, recipes and templates of SQL calls to get the information I want, many of which I present in this article. The SSMS templates are handy for this, though I’ll use SQL Prompt or AceText too, to store code snippets.

Probably my most-used snippet is one of the simplest, and it gets the actual definition of all the views, procedures and functions. This is something I keep as a template. You’ll have to change the MyObjectName for the name of the routine whose code you want.

Sadly, it is impossible to get the build script for tables, along with all its associated objects, columns and indexes. It isn’t stored as such, though it is available via the Object Browser. If you want to get it via code, it has to be generated via SMO.

However, once you get started, there is a whole variety of things you will want to get information about what objects are associated with a given database, how many of them, who owns which objects, and so on.

Searching Schema-scoped Objects in a Database

Using a single Catalog view along with a special catalog function called OBJECTPROPERTY, we can find out the intimate details of any schema-scoped objects in the current database. Details of all schema-scoped objects are stored in the sys.objects Catalog view, from which other views such as  sys.foreign_keys, sys.check_constraints, sys.tables and sys.views  inherits. These additional views have added information that is specific to the particular type of object. There are database entities that are not classed as objects. Columns, indexes and parameters to routines, for example,  aren’t classed by SQL Server as objects.

The OBJECTPROPERTY function cannot be used for objects that are not schema-scoped, such as data definition language (DDL) triggers and event notifications.

Finding Tables with no Primary Keys

You’ll want to know if there tables without primary keys and why. Here is a way of getting that information from the INFORMATION_SCHEMA.tables view.

962-NoPrimaryKey.jpg

The following code, using a Catalog view, should give the same result as the previous code, but much more easily. The TableHasPrimaryKey property of the OBJECTPROPERTY function simply returns 1 if a primary key exists, or 0 if not.

Finding Tables with no Referential Constraints

You can, of course use almost the same query to explore many other characteristics of the tables. You’d certainly want to investigate any tables that seem to have no referential constraints, either as a key or a foreign reference.

962-Waif2.jpg

Finding Tables with no Indexes

You’d also be interested in those tables without clustered indexes and want to find out the reason why.

And you’d scratch your head a bit if there were tables of any great size without any index at all.

962-TablesWithoutInexes4.jpg

A one-stop View of your Table Structures

We can pull of this together in a single query against the sys.tables Catalog view to find out which objects (indexes, constraints and so on) do and don’t exist on a given database. This is a handy query to get a summary of the characteristics of your tables’ structure at a quick glance.

962-TableCharacteristics8.jpg

How many of each Object…

Since the OBJECTPROPERTY function generally returns either a 1 or a 0, it can be used pretty simply in order to find out not just whether there are constraints, defaults, rules or triggers on individual tables, but also how many of them there are.

962-Constraints13.jpg

Too many Indexes…

By a slightly different route, we can also find out which of our tables have the most indexes on them. Are any of them duplications? Here is a query you might use to see where the indexes might have gathered in undue numbers.

962-NumIndexes9.jpg

Seeking out Troublesome Triggers

I find triggers particularly troublesome as it is not always obvious that they are there. I’m not the only developer who has spent an hour trying to work out why the result of an update is nothing like what one was expecting, only to be struck by the thought that some crazed code-jockey has inexplicably placed an update trigger on one of your tables. Yes, there it is. The following code should winkle out these lurking problems, and much more besides.

.. and from this, you can drill down to  see the sort of triggers your tables have:

962-Triggers15.jpg

Querying the Documentation in Extended Properties

Catalog queries are a powerful way of querying the documentation in order to find out more about the business rules governing the database structure. There are several useful queries that you can use if you have been sensible enough to structure your documentation, such as listing out your procedures and functions, along with a brief synopsis of how they are used and why. Here, we’ll just restrict ourselves to a useful list of all the tables that have no documentation in the extended properties. There really aren’t any other places to put your table documentation so you can be fairly sure that these tables have no documentation.

Object Permissions and Owners

There are a whole variety of things you will need information about as well as the details of the database objects; lists of permissions on each object and the type of permissions they represent, for example. Here is a query that lists the database-level permissions for the users (or particular user, if the final condition that is currently commented out is used.)

A different task is to explore the ownership of the various objects in your database. The following code will make this task a lot simpler.

What’s been recently modified then?

If you are working with others on a database, then one of the more useful bits of code you can have is the following, which tells you the date at which your database objects were last-modified. This is the full code, but generally you’ll modify it slightly as you’ll just want to know the twenty latest modifications or so, or maybe list all the objects modified in the past week. Sadly, it will not tell you what has been deleted!

Searching all your Databases

You can use these various routines on all databases, or on a list of databases. You can use undocumented code, of course, but a better approach would be to use yet another system catalog called sys.Databases. You can then execute the code against all databases, collecting the result into a single table. Here is an example:

Interrogating Object Dependencies

If you are faced with the difficult task of refactoring code whilst keeping everything running reliably, one of the most useful things you can determine is the chain of dependencies of database objects. You’ll particularly need this if you are considering renaming anything in the database, changing a column in a table, moving a module, or are replacing a data type. Unfortunately, it isn’t particularly reliable.

One problem that SQL Server faces is that some entities used in an application can contain caller-dependent references, or one-part name references (e.g. they don’t specify the Schema). This can cause all sorts of problems because the binding of the referenced entity depends on the schema of the caller and so the reference cannot be determined until the code is run. Additionally, if code is stored in a string and executed, then the entities that the code is referencing cannot be recorded in the metadata.

One thing you can do, if you are checking on the dependencies of a routine (non-schema-bound stored procedure, user-defined function, view, DML trigger, database-level DDL trigger, or server-level DDL trigger) is to update its metadata. This is because the metadata for these objects, such as data types of parameters, can become outdated because of changes to their underlying objects. This is done by using sys.sp_refreshsqlmodule, e.g.

 sys.sp_refreshsqlmodule 'dbo.ufnGetContactInformation'

Even then, the information you get back from the metadata about dependencies is to be taken with a pinch of salt. It is reasonably easy to get a list of what objects refer to a particular object, and what objects are referred to by an object. Variations of the following query will do it for you, using the SQL Server 2005 catalog view sys.sql_dependencies.

962-Dependencies.jpg

You will have spotted that what you often need is not limited to the dependent objects of the object you are re-engineering. If you are altering the behavior of the object, you will need to then need to look in turn at the objects that are dependent on these dependent objects, and so on (and watch out for mutual dependency!). In other words, you need the dependency chains.

962-DependencyChain3.jpg

It’s worth noting that in SQL Server 2008, you would use the sys.sql_expression_dependencies table, which has a much improved way of working out dependencies. There is a very full discussion, with example code, here at Reporting SQL Dependencies.

Summary

With the various scripts, suggestions and illustrations in this article, I hope I’ve given you a taste for using the Catalog, or Information Schema, views for getting all sorts of useful information about the objects in your databases, and of the dependencies that exist between.

Some of this information is available from the SSMS Object browser but it is slow going. Once you’ve built up your own snippet or template library, you’ll find it quicker and easier to take the Spartan approach, and search your databases’ catalog using SQL.