Redgate Software

The Database DevOps Challenges SQL Clone Solves

SQL Clone helps to accelerate the delivery of databases updates, by enabling an organization to provide database copies, and the right data, to all parts of the deployment pipeline that need it, with a light footprint, and securely.

It removes much of the administrative burden of providing a ‘database refresh’, while still allowing operations teams to control where database copies are deployed and to ensure data remains protected at all stages. By reducing the length of time required to deploy a new database copy, or clone, to seconds, it allows users to work in parallel, and so improves productivity.

With its PowerShell integration, SQL Clone becomes an important component in a broader Database DevOps toolchain that provides automated logging, and reporting for all the processes required for database development, testing and delivery. It also allows the delivery process to be customized more precisely to the requirements of the team.

Introducing governance to the database distribution process

By adopting DevOps processes, an organization should be able to make better use of its existing data, when engaged in development, testing and training. For example, the BI team will want to export data for downstream reporting and analysis; developers and testers will want ‘representative’ copies of production data for integration and acceptance testing, or to reproduce a production bug, and so develop and deliver a rapid fix.

These are all legitimate uses for this data, so long as all necessary precautions are taken regarding compliance, before it is moved outside the secure environment of production. If an organization stores and handles personal or sensitive data, the process of copying databases requires increasingly strict management and compliance. These data protection and privacy concerns will restrict the free movement of at least part of the data to the business activities that need it.

Every organization must know which activities need to access personal or sensitive data and why, as well as what processes and databases store and use this data, and where they are located. They then need an automated, reliable and secure way to allow the relevant business processes, and users, to access the data that is essential for their task.

SQL Clone helps to solve this problem, because it offers a lightweight mechanism for copying databases, with centralized control, and allows us to incorporate data masking to protect personal and sensitive data. With SQL Clone, the database administrator can:

This gives the DevOps team an opportunity to create a database provisioning process that can distribute data to the areas of the business that need it, quickly. It can also control what data goes where, while ensuring that any sensitive or personal data is always protected and secure.

Managing disk space for provisioning

Any database can be copied by restoring it from a backup, or attaching the data and log files, or by building it and then importing data. Each of these methods can be automated, but to do it for every server that needs a database copy often requires significant storage space within the domain.

When databases run into hundreds of gigabytes, or even terabytes, in size, and the business requires multiple copies of that database to be delivered to various parts of the business, this becomes a problem. It requires the involvement of storage administrators to manage the file systems and to prepare sufficient space on local servers, or on the SAN. Inevitably, this slows down the delivery of data, and will lead, for example, to delays in the testing schedule, and often means that the development and testing teams work with stale or inaccurate data. This can reduce the effectiveness of certain types of database tests, because the results will be a less reliable indicator of true production behavior.

With SQL Clone, we create just one full copy, called an ‘image’, and then create multiple clones from that image. Each clone has access to the data in the image, which is immutable, and only the changes made to a clone are stored locally. This means that even for databases that are hundreds of GB in size, each copy, or clone, requires only tens of MB of local disk space. Suddenly, creating ten copies of a 500 GB database requires only a little over 500 GB of initial extra storage space, rather than 5 TB.

Fast database provisioning

It is not just the amount of storage space that can cause difficulties. As the number of database copies increases, so does the time required to move the data about.

With the commonly-used backup-and-restore method, the process of copying each database is often too slow to support the team’s requirements; making a single copy of a 500 GB database can take about 5 hours. For SQL Clone, the process of creating the initial image is as slow, but it is done only once. Subsequent clones are almost instantaneous. With SQL Clone’s PowerShell automation, administrators can schedule an overnight job that creates a sanitized image of a source database, and then rapidly deploys clones of that image to all target servers.

With appropriate permissions set up, developers and testers can even ‘self-serve’ clones. This is quite a big cultural shift, because it enables isolated and parallel work practices. It moves the database towards being just another ‘standard resource’ that can be ‘imported’, used and reset, as required. For example:

In addition, setting up a new team member with database copy becomes a trivial task, especially when coupled with use of virtual machines and containers, to standardize the database server configuration. This means all team members can work on a standardized database environment, database schema, and data set, but without the need to adopt a single shared development server.

Isolated development environments

The more frequently that a team must deliver software, the greater the demands placed on the development and testing cycles. While the shared database development model can be made to work, it can also create dependencies that delay work, and limit what can be achieved in each cycle.

Many development teams still have just one development database on one server, and everyone in the team develops and tests on it. This requires discipline, and it has some advantages. For example, it uses less disk space and often requires less DBA time to manage compared to the dedicated development model. Also, the shared model means data is kept in one location, where it is under IT control.

However, shared development servers can also lead to conflict and ‘wait states’. Often, team members will lose time waiting for a long testing cycle to complete before they can proceed with new feature development, or waiting for a ‘database refresh’ after accidental changes made the current copy unusable. In shared environments, it is sometimes harder to diagnose which database modification caused a sudden change in behavior or performance.

Complications most often arise when one developer alters an object with interdependencies on the work of another developer, although judicious use of schemas can avoid a lot of this. Nevertheless, there are other difficulties and compromises. When a team using a shared server has a database in a working state, they tend to favor the ‘status quo’, with team members often requesting to delay database updates that could cause breakages and delays in their current test cycle.

SQL Clone enables teams to use one database per developer model, without causing additional administration or security concerns.

Realistic database testing

Both the development and operations team have a shared interest in stopping as many data-related ‘bugs’ as possible from reaching the production system. This requires team to perform integration, performance and acceptance testing, with a database that matches production as closely as possible, from as early as possible in the development cycle.

Certain types of database testing, such as unit testing, are well-suited to use of small, standard data sets, However, the QA team will struggle to do effective testing if data is old or doesn’t reflect accurately the characteristics and anomalies present in production data. Likewise, use of realistic data during integration tests will help the team find data compatibility issues earlier in the release pipeline, and reduce occurrence of data-related bugs. It will also increase confidence the performance results in test will realistically reflect production behavior, as well as make it much easier to reproduce a ‘production bug’ in other environments, and develop a fix quickly.

If a database contains only public data, then there is no reason why the clone used for test development and analysis, can’t match exactly what’s in production. However, if a database contains any personal, or otherwise sensitive, data, then we need to incorporate extra steps into database provisioning process to ensure this data is protected. Either we need to generate that data, rather than copy it, or implement a data masking or obfuscation strategy, which will prevent unauthorized access of this data, by replacing, anonymizing or removing sensitive pieces of information.

With SQL Clone, you have a few options, depending on requirements:

  1. Deploy all clones from a sanitized image of the production database – you can apply data masking to a database copy, before creating the image. If you need to mask data in index columns, you’ll need to find ways to preserve, as far as possible, the data distribution. For example, if permissible, your masking rules could keep the first few characters of a text column the same, and alter all subsequent characters.
  2. Use a data generator in conjunction with SQL Clone – if you can’t copy production data then you can integrate clone with SQL Data Generator, and create images and clones from a generated data set. Again, you may need to consider techniques to replicate the characteristics and distribution of the production data, such as by generating data from weighted lists.
  3. Copy-and-generate – copy the data you can, and mask it as required, generate the rest.

Techniques 1) and 3) may need to guard against leaving behind artefacts of the original data, such as in the transaction log. This is most often done by building a new version of a database from source control and then importing into it the data from an existing, sanitized copy of the same database version.

A PowerShell-driven DevOps toolchain

DevOps demands not only that people work together, but that software works together, on Windows or Linux, and across other database platforms. Because the delivery process is closely matched to the requirements of the organization, a delivery team and operations team need to be able to pick and choose from a range of specialist tools to automate the end-to-end database delivery process, incorporating building, testing, packaging, releasing, configuring and monitoring. These specialist tools must be able to communicate together and report progress and errors in a standard way.

On Windows, PowerShell provides the toolchain, and allow objects rather than documents to be passed. Windows DevOps tools tend to be PowerShell-aware, or even shipped as PowerShell cmdlets, but they can be useful participants with a Command-line Interface (CLI).

Shift left critical database delivery processes

A DevOps approach to database development, release and deployment, requires cooperation between development and operational teams, and this means working on operational concerns as early as possible during development, rather than leaving them to deployment. This ‘shifts left’ the use of operational knowledge into the development cycle, to reduce the time spent correcting development work in the light of problems only identified in deployment. As soon as possible in the development cycle, the team will practice deploying their changes to a “production” set up.

This principle is often referred to as “fail early, fail often“. Flaws in application architecture and deficiencies in the build, deployment and rollback procedures are discussed sooner rather than later, and the team continually refine their processes based on a continuous flow of feedback.

To achieve this requires good teamwork, and relies on tools that can underpin the deployment process by automating as much as possible. This is greatly assisted by a lightweight, automated and secure database provisioning mechanism, using SQL Clone and PowerShell.

By making the process of database build and ‘tear down’ quicker and less resource-intensive, tasks such as unit testing, basic build and integration testing can be more comprehensive. The operations team can set up an integration environment configured to reflect as closely as possible the production environment. Using SQL Clone’s PowerShell integration, they can automate the process of deploying a clone of the most recent, sanitized, image of the production database.

This will free the team, to a large extent, from the time-consuming logistics of provisioning, and mean far less time is required for many of the deployment processes. They can focus instead on improving their understanding of the support, monitoring, compliance, security and performance characteristics of the functionality they plan to deliver to production. It also means that the development work is deployed earlier and more often, so it is more visible to other teams within IT. The net result is fewer unpleasant surprises.

Summary

Database DevOps promises to make it easier to introduce many of the techniques that underlie the best Database Lifecycle Management practices, such as Continuous Integration and Continuous Delivery. These practices aim to allow an organization to deploy databases at the frequency they need, without introducing an unrealistic administration burden, or a burden on the network resources. It must achieve all this without compromising compliance with data privacy or security regulations.

Some of these practices demand a high level of cooperation, both between teams and between the tools that comprise the database delivery toolchain. SQL Clone is one component in this toolchain. It provides an automated means to copy and deploy databases between environments, quickly in a secure, automated, and auditable fashion. It does a single task and does it well.