{"id":71948,"date":"2017-08-02T08:35:28","date_gmt":"2017-08-02T08:35:28","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=71948"},"modified":"2021-05-11T15:57:18","modified_gmt":"2021-05-11T15:57:18","slug":"automated-database-provisioning-development-testing","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/tools-sql-server\/automated-database-provisioning-development-testing\/","title":{"rendered":"Automated Database Provisioning for Development and Testing"},"content":{"rendered":"<p>It\u2019s a snap to do the first release of any piece of code, because nothing stands in the way: no compatibility issues, no dependency problems. It\u2019s different when you have existing code that you have to maintain while you add the new features. When you have to adjust the existing functionality safely, things can get tricky very fast. This is why it is so important to have a set of tests for your code. You need to be confident that everything about your existing functionality continues to work while you adjust the existing code.<\/p>\n<p>Now, let\u2019s toss in the complication of having databases to process. All that testing just became that much harder because you also need to deal with the fact that you have existing data. How will your code-changes affect that data? To ensure fast, safe deployments, you will need to have comprehensive tests for the database, as well as for your code.<\/p>\n<p>This means that you will need both development and test databases, maybe several of them, with as much data as your production system, and as similar as possible in its distribution. You may even need to have copies of your production databases in your testing environments and development environments. If so, then you will probably now have to deal with the process of cleaning that data to remove or modify sensitive information, because of regulatory requirements or just plain common sense. 
You then have to have mechanisms in place to provision your development and test environments with all this clean data. That\u2019s a lot of work, and it will take time. You also have to account for the fact that your databases might be large, which means your database provisioning will take more time and require more storage space.<\/p>\n<p>Redgate SQL Clone can help with provisioning a large database in a timely manner without straining disk storage, whether the data is artificially generated or is the \u2018clean\u2019 anonymized version of the production data. This article is going to explain how you would automate these processes using native SQL Server tooling and then show how you can automate the same processes using SQL Clone.<\/p>\n<h2>The Setup<\/h2>\n<p>I\u2019m going to use a copy of the StackOverflow database, which weighs in at around 118 GB, to represent our production database. I&#8217;m assuming that you already have a process in place for taking a nightly full backup, and restoring it to a secondary server for testing, so our automated provisioning process will start from the point where the latest production backup has already been restored to a secondary server.<\/p>\n<p>From there, we&#8217;ll need to run through the following automation steps, first using native SQL Server tools and then using SQL Clone:<\/p>\n<table>\n<thead>\n<tr>\n<td>&nbsp;<\/td>\n<td>\n<p><strong>Automation steps using SQL Server tools<\/strong><\/p>\n<\/td>\n<td>\n<p><strong>Automation steps using SQL Clone<\/strong><\/p>\n<\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>\n<p>1.<\/p>\n<\/td>\n<td>\n<p>Clean sensitive data out of the database prior to releasing to testing and development<\/p>\n<\/td>\n<td>\n<p>Clean sensitive data out of the database prior to releasing to testing and development<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>2.<\/p>\n<\/td>\n<td>\n<p>Back up the cleaned-up database<\/p>\n<\/td>\n<td>\n<p>Create an image of the cleaned-up 
database<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>3.<\/p>\n<\/td>\n<td>\n<p>Restore the clean database to a test environment.<\/p>\n<\/td>\n<td>\n<p>Create clone in the test environment<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>4.<\/p>\n<\/td>\n<td>\n<p>Restore the clean database to a development environment.<\/p>\n<\/td>\n<td>\n<p>Create clone in the development environment<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>For this set of tests, I\u2019m going to do everything on a single SQL Server instance, merely changing the database names to represent each environment. In reality, we&#8217;d be working with three different instances: one for clean-up, one for testing, and one for development. That wouldn\u2019t change any of the automation steps, beyond requiring different connection strings for each instance. You will have to imagine the additional network burden.<\/p>\n<p>Having established an automated process for producing a clean copy of the production database each night, steps 3 and 4 would ideally be automated and secured to the point where &#8220;self-service&#8221; becomes possible. In other words, testers could quickly spin up multiple database copies for performing parallel database test runs, with realistic data, and developers could quickly create multiple database copies to, for example, compare side-by-side the behavior of two possible code solutions to the same problem.<\/p>\n<p>In each case, the automation step would just be run locally in the same way that I\u2019m going to automate it as part of the larger process. It would only involve running the restore process locally.<\/p>\n<h2>Automation Using SQL Server Tools<\/h2>\n<p>Setting up the scheduling or triggering of the automation script is a simple part of this process, so I\u2019ll leave that for you to figure out on your own. Instead, I\u2019m going to focus on the PowerShell, yes, PowerShell, commands you need to use. 
If you\u2019re not already using PowerShell to automate your system processes, the time to get started with it is long overdue.<\/p>\n<p>Much of what we\u2019re going to do uses T-SQL, but you need a way to control T-SQL across different environments, which in the absence of PowerShell would require setting up Linked Servers and other mechanisms that, frankly, most smart DBAs and database developers avoid where possible.<\/p>\n<p>With our StackOverflow database already restored from backup, we\u2019ll run a simple clean-up script, modifying all emails to use the standard, safe email address of \u2018NoEmail@example.com\u2019. I had to bump up the default <strong>QueryTimeout<\/strong> because I\u2019m modifying so much data on a machine that\u2019s not that fast. This is something you\u2019ll want to keep an eye out for as you create your own data cleansing scripts. More complex clean-up scripts will usually take longer to run.<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">#1 Clean production data from newly restored database\r\n$Server = 'DEVVM'   #the instance hosting the restored database\r\n$CleanCommand = \"UPDATE dbo.Users\r\nSET EmailHash = 'NoEmail@example.com';\"\r\nInvoke-Sqlcmd -Query $CleanCommand -ServerInstance $Server -Database \"SoDirty\" -QueryTimeout 400\r\n<\/pre>\n<p>In a real cleansing operation, you&#8217;d be performing more complex cleansing procedures, and would also need several subsequent operations to ensure that no artefacts remain in the logs, data pages, buffers, or the query store that would allow a technical user to defeat your masking. 
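One quick example: the Query Store retains the text of previously executed queries, which can include literal values captured before the cleansing ran, so clearing it is a typical follow-up step (shown here for the <strong>SoDirty<\/strong> database; this one statement is by no means a complete defense on its own):<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">-- Remove captured query text that could expose pre-cleansing values\r\nALTER DATABASE SoDirty SET QUERY_STORE CLEAR;\r\n<\/pre>\n<p>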
However, such details are beyond the scope of this article and would be a distraction from its main theme.<\/p>\n<p>With the data cleansing completed, I will back up the now-clean database:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">#2 Backup SoDirty as SoClean.bak\r\nBackup-SqlDatabase -ServerInstance $Server -Database \"SoDirty\" -BackupFile \"F:\\bu\\SoClean.bak\" -Initialize\r\n<\/pre>\n<p>From there, steps #3 and #4 are effectively the same: restore the clean database to the test and development servers:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">#3 Set up Testing database as SoTest\r\n$MoveData = New-Object Microsoft.SqlServer.Management.Smo.RelocateFile(\"StackOverFlow\",\"f:\\data\\SoTest.mdf\")\r\n$MoveLog = New-Object Microsoft.SqlServer.Management.Smo.RelocateFile(\"StackOverFlow_Log\",\"f:\\log\\SoTest.ldf\")\r\nRestore-SqlDatabase -ServerInstance $Server -Database \"SoTest\" -BackupFile \"F:\\bu\\SoClean.bak\" -RelocateFile @($MoveData, $MoveLog)\r\n\r\n#4 Set up Development database as SoDev\r\n$MoveData = New-Object Microsoft.SqlServer.Management.Smo.RelocateFile(\"StackOverFlow\",\"f:\\data\\SoDev.mdf\")\r\n$MoveLog = New-Object Microsoft.SqlServer.Management.Smo.RelocateFile(\"StackOverFlow_Log\",\"f:\\log\\SoDev.ldf\")\r\nRestore-SqlDatabase -ServerInstance $Server -Database \"SoDev\" -BackupFile \"F:\\bu\\SoClean.bak\" -RelocateFile @($MoveData, $MoveLog)\r\n<\/pre>\n<p>Of course, restoring multiple test and development databases to several different machines won\u2019t usually run in a serial fashion like this. If it&#8217;s a self-service process, developers and testers will restore copies on an ad-hoc basis, and each restore will take the same time as step #3.<\/p>\n<p>In a fully automated end-to-end process, if step #3 takes X minutes and you need to restore 8 database copies, then you know that a serial process will take at least 8X. 
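Since the restores are independent of each other, one way to cut that serial cost is to launch them as PowerShell background jobs. A sketch, reusing the backup file and relocation pattern from steps #3 and #4 (the server name and paths are the ones from this article\u2019s examples, and the SqlServer module is assumed to be installed):<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">#Launch the test and dev restores in parallel as background jobs\r\n$Jobs = foreach ($Db in @('SoTest', 'SoDev'))\r\n{\r\n    Start-Job -ArgumentList $Db -ScriptBlock {\r\n        param($Db)\r\n        Import-Module SqlServer   #background jobs must load the module themselves\r\n        $MoveData = New-Object Microsoft.SqlServer.Management.Smo.RelocateFile(\"StackOverFlow\",\"f:\\data\\$Db.mdf\")\r\n        $MoveLog = New-Object Microsoft.SqlServer.Management.Smo.RelocateFile(\"StackOverFlow_Log\",\"f:\\log\\$Db.ldf\")\r\n        Restore-SqlDatabase -ServerInstance 'DEVVM' -Database $Db -BackupFile \"F:\\bu\\SoClean.bak\" -RelocateFile @($MoveData, $MoveLog)\r\n    }\r\n}\r\n$Jobs | Wait-Job | Receive-Job   #block until all restores complete\r\n<\/pre>\n<p>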
Depending on your setup, you may be able to reduce this somewhat by multi-threading the restores.<\/p>\n<h3>SQL Server Tools: Results<\/h3>\n<p>The following times were recorded when running these scripts against a modest Azure Virtual Machine running the latest version of SQL Server 2016 and the latest StackOverflow database:<\/p>\n<table>\n<thead>\n<tr>\n<td>\n<p>Step<\/p>\n<\/td>\n<td>\n<p>Time<\/p>\n<\/td>\n<td>\n<p>Running Total<\/p>\n<\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>\n<p>Clean Data<\/p>\n<\/td>\n<td>\n<p>2 minutes<\/p>\n<\/td>\n<td>\n<p>2 minutes<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Create New Backup<\/p>\n<\/td>\n<td>\n<p>57 minutes<\/p>\n<\/td>\n<td>\n<p>59 minutes<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Restore Test DB<\/p>\n<\/td>\n<td>\n<p>93 minutes<\/p>\n<\/td>\n<td>\n<p>152 minutes<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Restore Dev DB<\/p>\n<\/td>\n<td>\n<p>97 minutes<\/p>\n<\/td>\n<td>\n<p>249 minutes<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The times to highlight are the 57 minutes to create the clean backup and the roughly 90 minutes it takes to restore each additional database from that backup. Those times are not going to change, and each of those 90-minute restores is necessary before a given environment is ready for use, so refreshing an environment more than once a day is unlikely to be practical for most work streams.<\/p>\n<h2>Automation Using SQL Clone<\/h2>\n<p>To repeat the setup, but using SQL Clone, the core process is effectively the same. I\u2019m going to assume that we start at the same place: a restored, but dirty, database. 
We arrive at the following process:<\/p>\n<ol>\n<li>Clean sensitive data out of that database prior to releasing to testing and development.<\/li>\n<li>Create a new image of the cleaned data.<\/li>\n<li>Create a clone of the clean image in a test environment.<\/li>\n<li>Create a clone of the clean image in a development environment.<\/li>\n<\/ol>\n<p>We\u2019ll once again use PowerShell to run through this process. First, we connect to the clone server:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">#Connect to SQL Clone\r\n$SQLCloneServer = \"http:\/\/devvm:14145\"\r\nConnect-SqlClone -ServerUrl $SQLCloneServer\r\n\r\n#Establish locations for SQL Clone storage\r\n$BackupFolder = Get-SqlCloneBackupLocation -Path \"f:\\bu\"\r\n$ImageDestination = Get-SqlCloneImageLocation -Path \"f:\\images\"\r\n$Server = 'DEVVM'\r\n$SqlInstance = Get-SqlCloneSqlServerInstance -MachineName $Server   #instance object used by the cmdlets below\r\n<\/pre>\n<p>We\u2019re ready to clean up the data:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">#1 - Run scripts to manually clean data\r\n$CleanCommand = \"UPDATE dbo.Users\r\nSET EmailHash = 'NoEmail@example.com';\"\r\nInvoke-Sqlcmd -Query $CleanCommand -ServerInstance \"$($Server)\" -Database \"SoDirtyClone\" -QueryTimeout 400\r\n<\/pre>\n<p>You\u2019ll note that this is identical to Step #1 of the process using SQL Server tools. The cleanup scripts we have will be the same whether we\u2019re using SQL Clone or not.<\/p>\n<p>Now we have to create an image of the clean database:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">#2 - Create a new image from clean database\r\nNew-SqlCloneImage -Name \"SoClean\" -SqlServerInstance $SqlInstance -DatabaseName \"SoDirtyClone\" -Destination $ImageDestination | Wait-SqlCloneOperation\r\n$CleanImage = Get-SqlCloneImage -Name \"SoClean\"\r\n<\/pre>\n<p>This is the one operation in the SQL Clone process that is costly in terms of time and resources. Creating an image is the tax we pay in order for the clone processing to be as fast as it is. 
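Note that SQL Clone can also build an image directly from a backup file rather than from a live database. A hedged sketch, assuming the <strong>SoClean.bak<\/strong> file from the earlier backup step and the cmdlet\u2019s <strong>-BackupFileName<\/strong> parameter (check the SQL Clone cmdlet reference for the exact parameters in your version):<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">#Alternative: create the image straight from the clean backup file\r\nNew-SqlCloneImage -Name \"SoCleanFromBackup\" -SqlServerInstance $SqlInstance -BackupFileName @('F:\\bu\\SoClean.bak') -Destination $ImageDestination | Wait-SqlCloneOperation\r\n<\/pre>\n<p>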
With the image in place, we can create clones for Testing and Development:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">#3 - Create test database clone\r\nNew-SqlClone -Image $CleanImage -Name \"SoTestClone\" -Location $SqlInstance | Wait-SqlCloneOperation\r\n\r\n#4 - Create Dev database clone\r\nNew-SqlClone -Image $CleanImage -Name \"SoDevClone\" -Location $SqlInstance | Wait-SqlCloneOperation\r\n<\/pre>\n<p>As with the restore approach, every time you refresh the clean image, you simply repeat this same process. Unlike the restore process, though, each of the clones you\u2019re going to create across your various test and development instances will take only about 15-20 seconds. Speaking of which, let\u2019s see the results of the SQL Clone test provisioning process.<\/p>\n<h3>SQL Clone: Results<\/h3>\n<table>\n<thead>\n<tr>\n<td>\n<p>Step<\/p>\n<\/td>\n<td>\n<p>Time<\/p>\n<\/td>\n<td>\n<p>Total<\/p>\n<\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>\n<p>Clean Data<\/p>\n<\/td>\n<td>\n<p>2:29 minutes<\/p>\n<\/td>\n<td>\n<p>2:29 minutes<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Create Clean Image<\/p>\n<\/td>\n<td>\n<p>91 minutes<\/p>\n<\/td>\n<td>\n<p>93:29 minutes<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Create Test Clone<\/p>\n<\/td>\n<td>\n<p>12 seconds<\/p>\n<\/td>\n<td>\n<p>93:41 minutes<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Create Dev Clone<\/p>\n<\/td>\n<td>\n<p>6 seconds<\/p>\n<\/td>\n<td>\n<p>93:47 minutes<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>As you can see, the cost, the tax if you will, for the incredibly fast access a cloned database provides is the time it takes to create an Image. However, you only pay that cost once per refresh, just as you create the backup only once in the original approach. You then use that Image multiple times, saving about 90 minutes each time you create a Clone. 
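That economy is what makes self-service provisioning realistic: spinning up a clone per tester, or per test run, is just a loop over names. A sketch, reusing <strong>$CleanImage<\/strong> and <strong>$SqlInstance<\/strong> from the earlier steps (the clone names here are hypothetical):<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">#Create a clone per tester from the same clean image\r\nforeach ($CloneName in @('SoTest1', 'SoTest2', 'SoTest3'))\r\n{\r\n    New-SqlClone -Image $CleanImage -Name $CloneName -Location $SqlInstance | Wait-SqlCloneOperation\r\n}\r\n<\/pre>\n<p>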
Additionally, you\u2019re only moving the data once, at the time you create the Image, so you\u2019re also saving disk space in all the places where you create a Clone.<\/p>\n<h2>Conclusion<\/h2>\n<p>It\u2019s pretty easy to see that it takes just about the same amount of work to set up automation using SQL Server tools to do a restore as it does to use SQL Clone. The amount of code involved is nearly identical. The differences come in two places. First, and most dramatic, is the time saved: going from 249 minutes to 94 minutes in total across this whole process is significant.<\/p>\n<p>In addition, you have savings in terms of storage, because with SQL Clone you do not require well over 100 GB per instance in Development and QA for each restored database copy. Each clone will occupy only tens of MBs of disk space.<\/p>\n<p>As you increase the number of servers needing provisioning after the cleaning process, SQL Clone becomes more and more attractive.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If your development team needs to work on anonymised copies of the current production database, and if changes are being delivered rapidly as well, that could mean a lot of time and routine DevOps work copying databases. SQL Clone was designed for tasks like this. 
Grant Fritchey investigates whether  you save time, effort and scripting over  the more traditional approach, and at what point it makes sense to use it.&hellip;<\/p>\n","protected":false},"author":221792,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143516,143532],"tags":[68855],"coauthors":[6785],"class_list":["post-71948","post","type-post","status-publish","format-standard","hentry","category-database-devops","category-tools-sql-server","tag-sql-provision"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/71948","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/221792"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=71948"}],"version-history":[{"count":4,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/71948\/revisions"}],"predecessor-version":[{"id":71958,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/71948\/revisions\/71958"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=71948"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=71948"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=71948"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=71948"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}