This article aims to describe the major scripts, documents and lists that are needed to ensure that changes to a database can be easily tested, deployed, and maintained in service.
All RDBMSs can, like an application, be scripted. Of course the TSQL scripts for the DDL of the SQL Server databases need to be in source control, in an VCS (Version control System) repository , but for database configuration-management, this is where the story begins.
An RDBMS is different in many subtle ways from an application, and has rather different requirements. If the aim is to implement a ‘Continuous Integration’ (CI) process that provides a scripted deployment, it is dangerous to assume that the practices of application configuration management will work, without elaboration or modification, for the many procedures and processes that make up database lifecycle management.
From the developer’s perspective, a database that is ready for delivery may be seen as a collection of ‘logical’ database objects, such as tables and routines, columns, keys, and constraints. Other processes such as ETL, alerting, and messaging may only be represented by a stub or are ‘mocked’ sufficiently for a successful build and test. The access-control model may, at this stage, be rudimentary. Processes such as replication and log shipping will not be represented at all. The analysis of data for business analysis and reporting via SSAS is not likely to be considered at this stage. The only data that is required consists of the ‘static’ data that is required for the database to run.
For the Database developer and administrator, there lies ahead a great deal to attend to beyond this simple model for a database of any realistic size and complexity. All the systems that are required, or are already in place, beyond the ‘logical schema’ that has been defined in development must be checked to make sure there are no repercussions of the change. This article aims to explain some of the scripts and documents that are necessary merely to automate the bulk of a database deployment and subsequent management; and which need to be held in a repository.
The more frequently the entire delivery process is done, the easier it gets, because everyone gets alerted more quickly of potential problems. Even the process of changing database objects, regardless of their type, will not always be a trivial operation. Changes often impact dependent objects and sometimes even underlying data. The process of analysing and accommodating these dependencies is traditionally time-consuming and prone to error. By deploying frequently as far as staging, many of these issues are identified, scripts tested, and the process automated where possible, long before the decision is taken to deploy to production.
Keeping track of changes
To prevent problems, or to react quickly when they happen, it helps greatly to know what has changed with the system. Different specialists can spot the repercussions of any change from their own viewpoint. With a corporate-scale database that is essential for the revenue-generation of the organisation, it is essential. It is also likely to be a legal requirement. ‘Version creep’, or ‘version drift’ has to be monitored and dealt with as well as changes that are the result of the legitimate development process.
Custodian of data
Administrators need to continuously monitor changes to production databases to detect unauthorized changes. This is a broad remit, but obliges the DBA to ensure that all changes, even if as innocuous as an index being dropped, are accompanied with whatever change approvals are required, so that no valuable changes get lost.
Sadly, not all unauthorized changes are innocuous. There is, in consequence, a legal requirement for databases used by commerce or holding personal data to comply, in terms of auditing and security, with national and international standards. This is another important aspect of life cycle management. It is not only to ensure database efficiency, maintenance, and ease of operation, but to safeguard the interests of customers, users, and shareholders.
Different requirements for different installations
At various stages in the development cycle, there will be several copies of an application’s database that have to be maintained for various purposes such as development, staging, production, and the several types of testing. Each setting has its own requirements.
Depending on the purpose of the installation and the nature of the data, each copy of the database is likely to have different logins, roles and permissions. They will also have different configuration, physical file locations, replication, Service Broker, ETL systems, instrumentation and scheduled agent jobs. There will also be differences in alerting systems and even possibly in the way that errors are managed.
What constitutes the core database? Not access-control, surely, since a database used for testing is unlikely to share the same access-control as the database in staging or the production system. We can’t script out the replication artefacts since they would cause replication to break in any database with these artefacts in place. Scheduled jobs are rarely scripted out, yet these can easily contain scripts without which the database simply wouldn’t function.
Where changes take place
Production databases are not immune from schema change, even if the change is merely an adjustment to indexing strategy. In a normal business setting, all changes to production databases must adhere to proper production control procedures.
In-House development Applications
To ensure that nothing gets lost, and that the release processes are done using the correct versions of the code, the versions of the database that are deployed and maintained for development purposes must match the versions in source control. If, for example, the version undergoing integration tests has been altered without these changes being reflected in source control, and transferred to trunk, then the results of the tests become meaningless.
When a system is being actively developed , development databases are used. Depending on the methods used, these might be a multitude of local machines where changes are aggregated in source control. It could be one or more shared development databases. Whatever the model, developers make changes that the database administrator must then consolidate and propagate to staging or test databases. There are a number of tasks that are necessary.
The changes made to development will be reflected by the version in Source Control. The database must be built from this TSQL code to ensure that it compiles and runs. Because so many additional services and components may be required, such as CLR components, agent jobs, and server-based COM objects, the build will check that these are ‘scripted’ in the build script. As part of the database scripts, there will probably be simple ‘assertion’ tests provided, at component level, by the database developers that are designed to check that each component does what is expected, and that dependent objects are what is expected. If all is well, then these changes are made to staging or test databases.
Typically, most applications will get upgraded over time by their provider, usually a third-party. In the meantime, many third-party database applications are customized by the business user as required to support the application. This can be for a number of good reasons, such as replication, performance, auditing, or instrumentation / monitoring. Application customizations are usually dependent on upgrade scripts supplied by the application vendor, or in some cases is done via a remote login by the vendor. In most cases, the production team can only guess at the impact of the upgrade procedure on their customizations. Without finding out, all manner of processes could inexplicably break. When customers wish to determine the effects of an upgrade, they compare the database schema before and after the upgrade to find out what’s changed. This tells them how their customizations will be impacted as a result of the database being upgraded.
Managing Database Change
The administrator who is managing a deployment will aim to automate as much as possible of the process. In a simple system where a deployment can fail without a risk to security or business, that could include almost all the process, but with any corporate system there are likely to be tasks that require the DBAs expertise, and will require manual intervention as well as signoff. At one end of the spectrum of risk, a synchronisation script can, in a trivial deployment, be allowed to be automatically generated and immediately applied to the destination. At a certain point in the spectrum, a manual check of the script by a DBA, and signoff, is required. For a typical deployment, The Database Administrator will be involved in three major tasks.
The changes to the logical database model in source control represent a specification of the required changes to the production system. These may be made for a variety of reasons including business mergers, legislative changes, new business requirements, and application changes. There will be an impact on other systems, but these are unlikely to be recorded in the database source control. There are likely to be repercussions: The hardware must be checked to ensure that it can support the changes. The software configuration will need a check-over. A change to the database model will, for example, affect onward analysis and ETL. In some circumstances, this will require changes to the logical model before deployment can take place. To ensure rapid delivery, the DBA works ‘upstream’ with the development team to ensure that this never happens, and the change specification document, in whatever form it takes, becomes a way of ensuring that there are no surprises on either side. At the specification stage all changes are planned, reported and costed. In some organisations, this will require signoff.
The deployment process is that of changing the database to reflect the conditions that were specified in the change specification phase. Deployment includes changes to other systems and services, from the obvious such as hot standbys to the subtle, such as analysis and reporting services, alerting, instrumentation, monitoring. Deployment does not only encompass the process of applying the changes to the actual database, and if necessary, undoing the changes. In some cases, it will involve roll-forward, with the subsequent changes being reflects in source control
Managing the source and configuration
As well as the source build scripts kept in source control, the change-management process requires other scripts and records that allow developers and database administrators to manage a variety of changes in database environments: These should be in a SCM repository as well.
These scripts should be used for all installs so that every participant in the development process effectively ran a product upgrade each time they set up tests or development instances. This results in scripts that are well tested long before they hit production.
After a change is made to a production system, the changes to the database model must be itemised to help any audit to the evolution of the database model. Auditing documents who performed what actions, and why. This allows anyone to detect and, if necessary, prevent any ‘uncontrolled’, unknown or unanticipated access to the data.
These are a variety of scripts and lists that help to manage databases during delivery..
- Schema Baseline This is a build script or artefact that represents a point in time of the definition of the database and its associated database objects. These scripts are sometimes referred to as “create schema from scratch” scripts. It must be possible to compare this with another script or a database. A database will have a number of these over time, usually for each production version. Unlike source control, these are at schema or database level rather than object level, which would mean that for a simple database, one script would build the entire database, with the objects built in the correct dependency order. It must identify when it was generated and what classes of database object were included; the so-called ‘scope definition’. This has to be immutable as it is difficult to automatically compare, or generate synchronization scripts from Schema baselines with different scope definitions.
Normally, you have separate schema Baselines for access control objects such as roles and permissions, since these are best excluded from versioning. The same applies to objects whose definitions include setting-specific configuration information such as file paths or IP addresses. Database objects that participate in replications may need to be kept separate to avoid the problem of comparing tables that have replication artefacts that are added by the replication service.
If a Schema baseline consists of a number of scripts to create an entire database, then these should be linked together by a single script that includes individual component scripts in the right order.
- Server-specific configuration information: Whereas the development scripts will focus on tables, functions and procedures, this information relates to specific types of installation. It is normally kept as a structured document that automated CI scripts can access. It contains machine-specific information such as file paths. It will also contain all information that relates to added components that are relevant to the role of the server, such as whether SSIS packages should be installed for ETL, whether service broker or replication is used. It is usually copied from the repository to the server as appropriate, and is particularly useful when setting up test VMs of various types via scripts.
- Schema Comparison List: This is a summary record of the object-level differences between two databases, either or both of which could be a schema baseline
- Schema Migration, or Schema Synchronization scripts: A script represents a way of making schema changes from one version of a database to another version, whilst preserving the existing data.
- Schema Change Plans: A means of deploying specific changes from a development environment to one or more target databases.
- Data Synchronization scripts: A script that makes changes to synchronise the data between two copies of the same database version. Data scripts are often ephemeral if they deal with live data but they are essential for test environments.
- Data Migration scripts: A script represents a way of making changes to synchronise the data between two copies of different database versions where refactoring has made simple synchronisation impossible.
These are only the configuration management categories of documents. The documentation will also hold all the disaster recovery procedures, and critical error responses. It will also hold the first-line responses by production staff to all errors and warnings encountered within the system. All deployment script will be held here, and if they require change, will be ‘versioned’.
There are a number of uses for this repository. Most importantly, it aids teamwork and ensures that essential information for both development and operations is accessible. It makes it quicker to review and approve change requests, and gives the development project managers a reference that helps track development process. It also aids with defect-tracking and ensures that if disaster strikes, the information necessary to restore services is to hand.
It makes it easy to identify all the configuration items that belong to the systems that make up the database application(s). By providing a structure for the ‘required documents, it becomes obvious when something is missing. Without an easy way of checking, a document that is only necessary when responding to a disaster can be missed out.
This type of SCM repository helps to manage the build process and to identify the requirements of any tools used for builds. It can be used to ensure that the software, hardware, networking, and cloud-based provision keeps pace with the requirements of the database application.
In looking through the list, one might wonder why scripts need to be held to migrate between versions of the database, rather than to simply use a database synchronization tool such as SQL Compare and SQL Data Compare. One would certainly use a tool to provide the basic script, but may require modification to cope with special cases such as a well-used OLTP system where table-changes must be done in a way that does not result in extended table-lock. For schema changes, the synchronization script is usually checked by the DBA or other designated signatory and ‘signed-off’, after testing wherever possible (some release management tools support an ‘approval gate’ model where changes are automatically provided to the DBA for review before production). Data Migration and Synchronization scripts might seem odd candidates for retention but where test data is loaded into a test VM, this must be held in a repository so that the test can be repeated easily and deterministically.
Another question that is often asked is why these documents must be held in source control rather than in the file system. The most obvious reason is that it is an audit requirement. Where a change is fraudulent, the document can only be used as evidence if its state at a particular point of time can be proved.
The automation of the development and delivery of any database application is commonly held to provide the key to rapid delivery. Whereas it is certainly important, one can end up automating an ineffective and arcane system. Automation has to be preceded by a process identifying and managing the essential steps. This ensures that ‘roadblocks’ are avoided and there is sufficient visibility for all the different specialities in the organisation who are involved in the process, such as Ops staff, developers, auditors, compliance experts, network specialists, QA, database administrators and project managers. To make this possible, there must be a SCM archive.
The scripts and other artefacts that make up this archive varies according to the nature of the application, database, and the organisation, but the principle remains that all the essential scripts, lists and document are kept in a form where changes and ‘sign-off’ can be tracked. The term ‘essential’ is used to mean that the items are required to allow audit, change management, disaster recovery, first-line support, and both the build and configuration of all the components of the system. An SCM archive must be, where possible machine-readable, and in a form that makes it as easy as possible to automate the many processes that go into the database lifecycle.