Source Control and Databases

András's article on Source Control shows a lot of the thinking amongst the developers at Red Gate at that time that eventually crystallized into SQL Source Control. It identified several problems that had to be overcome before the task could be done properly. Andras's suggestions for the use of SQL Compare are now built-in to SQL Source Control, but it remains a fascinating explanation of the complications along the way.

Many DBAs must have made and heard the request for a source control mechanism for SQL Server databases. It sounds like a brilliant idea. Once implemented, one could easily go back to a specific database schema version, compare it against the current version, and have plenty of information for auditing, or for reverting a database object to the previous version.

But how does one even start introducing source control for a SQL Server database? Such a system has particular challenges to overcome, depending on which of the following database development techniques you adopt:

  1. Storing the database schema in the database
  2. Storing the database schema as SQL creation statements in files, which when executed, build up the database.

This article compares and contrasts these two development scenarios and then assesses some source control and team development challenges, as they apply to each.

The majority of available source control systems work with files, and consequently do not work with SQL Server databases directly. The database schema can be stored in files for the purpose of source control but then the need to translate schema objects to files introduces the question: is the schema in the database the same as the one represented in the files?

Storing the schema in the database

Microsoft SQL Server DBAs are spoilt for choice in terms of graphical tools that allow the creation of a database, a table, and any other database schema object. Management Studio makes database development accessible to everyone: with a few mouse clicks and some drag-and-drop one can easily create a database, a few tables, and other dependent objects. Figure 1 shows a screenshot of the view designer that aids creating views in a visual way.

397-Image1.gif

When using the GUI to add database objects to the database, no SQL statements are required from the user. These statements are generated by the GUI and discarded after their execution. While this makes the process very simple, it effectively leaves the user with nothing to check into source control. One can use the scripting functionalities of the GUI tools to generate the SQL statements, but it can become a very messy task to identify and replace the scripts for altered database objects.

Once a database schema is produced in this way, it can be compared, using schema comparison tools, with other development databases. All of the differences can be merged into a single staging or reference database. Regular backups would ensure that it is possible to go back to previous versions, and usually offline documentation would keep a more or less accurate history of who has done what and why.

NOTE:An alternative to offline documentation would be to use extended properties on the database objects. However, many of the graphical tools lose such extended properties during object alter.

Source control is possible using generated creation scripts, or third party tools that can create a snapshot of the schema, but this might not very granular.

In this development approach the database schema evolves in the database. Unfortunately, if not done carefully, it may evolve into something horrible. It is likely that it will contain “stale” objects, stored procedures whose purpose no one knows, stored procedures that do the same thing as ten other stored procedures. But since any object in such a database is a “public API”, heaven knows who is depending on them.

Overall, this development approach allows for rapid prototyping, without significant knowledge of SQL and it’s perhaps no surprise that Management Studio is a popular “database design/development” tool for many. The approach works well with Agile development and, with proper unit testing, refactorings and self discipline, it can prove a very powerful means for development. It is, however, a very different approach from an almost idealised methodology, in which a database schema is properly designed and, after the design is final, the applications are built on this database design. Many developers find it hard to reconcile this “ideal methodology” with the need to accommodate seemingly ever-changing requirements.

Storing the schema in creation SQL files

The other popular database development technique stores the schema in creation scripts. Generally, for every database schema object there is a file that contains the SQL commands that will create that object. These files are the basis for development and for source control.

The development process is similar to the approach taken when using the database as home for the schema. The important difference is that one does not make changes by executing SQL on a live database instance, but instead updates a set of SQL scripts. For every database schema modification the creation SQL files are modified and, once they are deemed ready for deployment, they are executed to create a reference database. This database can be compared against a live database, and then the migration/deployment stage starts.

The main reason for the popularity of this approach is that each creation SQL file can be stored in a source control system. In this way the history of every modification to the database schema is available. Using the locking mechanism of source control, large development teams can work on the same database and be able to work in parallel. Therefore, this approach to database development is more popular with larger organizations where development is performed by a larger team who need to cooperate.

Source control and team development challenges

Both of the above approaches to database development are popular, although they are generally used in different types of organizations. The fact that there are at least two approaches introduces some problems: sometimes these two approaches are mixed. It is obviously faster to fix a database performance problem by changing an index on the live database, even though the schema might be stored in creation SQL files. Of course it would be better to make the change on the script side first. But is this true if this would have meant an hour of extra time at 2 am?

Personally, I prefer to store the database schema in creation SQL files, but it is undoubted that both of these approaches have their shortcomings with regard to ease of use, tracking changes in source control and development as a team.

One of the challenges we need to face is the fact that the source control system stores files, but what we want to see in the source control is the not the files but the database schema itself. In SQL Server, the schema is a set of objects that may have different textual representations. If one uses creation SQL files it is therefore difficult to see if these files actually represent the schema in the database. Also, if one looks at the database schema, there are many different ways to generate the file that can be stored in the source control system.

Textual and non-textual database objects

Textual database objects, such as stored procedures, views and functions, are stored by SQL Server as meta information and as text. For example, consider the following function:

SQL Server will store the above text with all the whitespaces and comments, in a system table. You can view this definiton by calling sp_helptext, or by querying the system table sys.comments on SQL Server 2000, or the sys.sql_modules system view on SQL Server 2005.

In addition to this textual definition, SQL Server will store meta information about the funtions, such as the parameters it accepts, its name, and the owning schema or user, in other system tables. Some of these are accessible via sys.objects or sysobjects, sys.parameters and syscolumns, etc.

Unfortunately, SQL Server is not perfect, and the more modifications one makes to such textual objects, the more likely it is that the system tables that contain the meta information will get out of sync with the object definiton. This inconsistency in system tables is an unfortunate side effect of using the database as the storage location of the schema. It is not a problem if the schema is stored in scripts, and is created every time – in the right order – without alteration statements.

Nevertheless, both schema storage approaches described in this article benefit from the fact that textual database objects are stored by SQL Server exactly as presented to SQL Server. Such database objects can be retrieved from the database in an exact way, therefore they can be compared with the files that are stored in the source control system. This also works the other way around, if one needs to know if such a textual object in the source control is the same as the database version, one can once again just do a textual comparison.

The advantage of SQL Server storing such textual objects without modification is that formatting and comments are preserved, and can be retrieved from the database schema. These textual objects, particularly stored procedures, functions and views may contain complex business logic that needs to be quickly understood by people who want to make changes to them. One way to aid this process is to format the code in a readable way, follow consistent variable naming and add plenty of comments. Many such aids are even standardised within organizations.

Sadly, efforts to make creation scripts readable and comply with organizational policies cannot always be preserved by SQL Server. Non-textual database objects, for example tables, are stored as meta inforamtion only. Comments and formatting for such objects are not preserved in the database. For example, one may have a script like this:

This code contains an important comment that gives information about the EmployeeID. Also, every column is specified on a new line, with the comma at the beginning of the line. If this statement is executed, the formatting information and the important comment are not stored anywhere in the database schema. If there is need to preserve the above information, one has to store it outside the database. This can become tricky if one uses GUI for certain database modifications. If you add a new column to a table via the GUI then the table creation script will need to be updated. If you script the altered database table via the GUI then, again, it would not contain the comments.

This makes non-textual these objects less ideal for source control systems, because it is not obvious whether the script file in the source control system represents the same object that is in the database schema. If one scripts the database schema object from the database, the textual representation of the object could be significantly different from the script. For the above table SQL Server’s Management Studio will create the following:

Are the two tables the same?

The alternative is to create a toy database, execute the object creation script from the source control system, and use third party tools to compare the two database objects.

Processed database objects

Some objects are processed by SQL Server. In other words, the creation SQL bears no resemblance to what SQL Server stores internally. For example, check constraints and DRI defaults are often completely rewritten internally.

For example, if a DRI default is set to “2”, SQL Server 2005 will turn it into “((2))”. This is not a major change, but it does differ from the behaviour in SQL Server 2000. Some examples include the following:

 

Input

SQL Server 2000

SQL Server 2005

2

(2)

((2))

1+2+3

(1+2+3)

(((1)+(2))+3)

cast(3 as int)

(convert(int,3))

(CONVERT([int],(3),0))

This is not a problem when the schema is stored in the database. But if we use scripts, one would expect to validate that what is in the scripts is the same as what is in the database. One will certainly start wondering whether the scripts are in sync with the database if one had a cast statement in the creation script, and a convert statement in the system tables. Some of the rewrites are even more difficult to compare. For example, in SQL Server 2005, an IN statement is usually rewritten as a disjunction, with a lot of extra parentheses.

When one is using creation scripts to store the database schema, it may be worthwhile to create a DRI default or check constraint in SQL Server first, and then copy the processed form of it to the creation scripts. In this case the creation script will resemble the scripts that can be created by GUIs from the database, making it easier to see differences between the database and the creation scripts.

Renaming objects

Sometimes, database objects need to be renamed. A stored procedure may need to be extended to include more functionality, or a stored procedure that is no longer required will need to be replaced with a new one.

Renaming database objects is something that is rather simple to do in creation scripts, since it involves nothing more than a search and replace operation. But with the database as the schema source, there are few things to watch out for. The sp_rename procedure, or the Management Studio object rename, will not do a good job. First of all it will not modify all the system objects. It will update the sys.objects or sysobjects table, but not syscomments or sys.sql_modules.

NOTE: For more details see my blog, A reason to avoid sp_rename.

You should also be aware that if you generate a creation script for a renamed stored procedure using Query Analyzer or sp_helptext, then the generated script will not create the stored procedure that is in the database. It will create one that has the original name.

Furthermore, neither sp_rename nor Management Studio will rename the references to this object. This is one of the shortcomings we sort to overcome with the object renaming feature of SQL Refactor.

When one tries to check whether the source control version of a renamed stored procedure is the same as the one in the database, one may get surprised by the fact that while the textual representation of a stored procedure is the same in both the source control system and live database, the name that the stored procedure can be accessed via is different in the database.

If one is altering schema in the database and is using source control by scripting the database objects, then if you rename an object you will also need to update the corresponding script file. Usually these script files are named after the contained database object, so if a renamed database object is scripted it would be scripted to a file that reflects its new name. It is important to remove the old version of a renamed object by deleting the corresponding old script file.

Constraint names

Foreign keys, DRI defaults, primary keys, indexes and check constraints are parts of a table definition. These are also individual database objects, and can be referenced via a name. This name can be specified by the user when the object is created. For example the following statement will create a foreign key named FK_Employees_Employees:

However, if the name for the foreign key is not specified, a name will be generated automatically. For example if one executes:

This will create a foreign key with a generated name, such as FK__Employees__Repor__0CBAE877. This name is different every time the script is executed. If one is using creation scripts to store the schema, it is good practice to name such constraints explicitly, and not spend time matching such objects based on their content when exploring differences between the scripts and the live database.

Object dependencies

Stored procedures may depend on other stored procedures; views depend on tables. For tracking dependencies, some people make use of the sysdepends table in SQL Server. This table stores dependency information for some database objects. For example, if a stored procedure A calls stored procedure B, one would expect an entry to reflect this in sysdepends.

In practice, it sometimes is there, but sometimes not. If the stored procedure A has been created before stored procedure B has been created then the entry will be there. However, the sysdepends table can easily get out of sync with the reality. It is enough to alter a stored procedure and the dependencies are not updated. The same applies to the SQL Server 2005 equivalent, sys.sql_dependencies.

Having all the database objects in textual form in a creation script enables one to use text operations like find, search and replace, etc. This renders finding a referencing stored procedure to a simple text find, and one does not need to rely in possibly inconsistent sysdepends tables.

However, if your schema is stored in the database, one way to avoid corrupting the sysdepends table is to not alter textual database objects. For example, instead of altering a stored procedure with an ALTER statement, one can drop the old stored procedure, and create the new one. This will ensure that the sysdepends table is updated. However, dropping a stored procedure and creating it will delete all the extended properties and permissions on that object, so these will need to be set as well. Also, it is important to create all referenced objects first, since when the sysdepends table is modified, the dependent objects must exist.

Often, especially when using schema binding, one must execute the scripts that create the database in a specific order. This is a common problem with source control for databases, because for granularity reasons it is good practice to use a single file for each database object; however, to create the database from the SQL files one needs to create and execute a single script.

One solution I’ve seen in practice was to concatenate all the scripts, execute them, look at the error message, reorder, execute, look at the error message, reorder, execute …. Clearly this is a horrible solution, and of course, it will very likely mess up the sysdepends table.

Luckily there are tools that can read in creation scripts and order them in the right dependency order automatically. This will allow the creation of databases that have a consistent sysdepends table, and will help other DBAs in exploring the database. For more details see How to order creation SQL files.

Tools that help to address these issues

It is not the intent of this article to discuss the nuts and bolts of implementing source control system for SQL Server. However, it is worth noting briefly that there are a couple of tools that will help to address the issues involved in using source control with SQL Server databases.

Editor’s note. This list is now out of date, now that a lot of Andras’s thinking that is expressed in this article is now embodied in SQL Source Control

Microsoft Visual Studio Team Edition for Database Professionals (MSVSTEDBP) is biased towards the storing schema in creation scripts. Although it is a pricy solution, it contains many powerful means to aid the database development process. In MSVSTEDBP one is working on SQL scripts, and can unit test the database schema, but the tool also imposes restrictions on the source control systems that can be used for the database scripts.

Another tool that will be available at the end of June 2007 is Red Gate’s SQL Compare 6.0. Its primary purpose is to compare and synchronize two database schemas. The major addition to version 6.0 is that a database source now includes a set of creation scripts. These scripts can be created, read in as if they were a database, and one can synchronize towards the database as well as back from a live database to the script files. Also, when SQL Compare reads in scripts from files, it can synchronize partial changes to another data source, like a live database. It will do this by creating a script that is ordered based on the dependencies between the database objects.

Summary

Setting up source control for a SQL Server database can be tricky. There are two main approaches to database development, and both could be adapted to be used with source control. However, neither of these approaches is perfect, and there are issues one must look out for to ensure smooth source control operation. The main challenge is that in order to use source control one needs to use files, but it is not a trivial task to convert the database schema to files that can easily be compared back to the database schema itself. In this article I explored some of the associated problems, such as preserving documentation and formatting for textual and non-textual objects, as well as working with processed database objects like check constraints and inconsistent system tables.