Creating a Simple Development Harness with SQL Compare and SQL Data Generator
There is no single, correct approach to developing and testing your SQL batches and routines. It depends on the requirement. For simple tasks, like modifying and testing a view or stored procedure, SQL Compare and SQL Data Generator, combined, provide a useful, lightweight development harness.
Before you make an alteration to a database, it is normal to devise a strategy first. What sort of script do you need to work with? How do you update the database scripts that are in source control? What sort of database development harness do you need? What data do you use? You can, of course, relentlessly use the same strategy for every bit of development work, but that’s like SQL Server using the same query execution plan for every query.
A database development harness? Well, I mean a collection of database objects and test data sufficient to allow you to run the SQL batches and routines (mainly procedures, views and functions) that you’re developing, under varying conditions, and to monitor both their behavior and their outputs.
The conventional idea of working invariably with an entire database that is fully populated with data can be a bad strategy. It can sometimes slow you down and discourage thorough testing; it all depends on what you’re doing. More often, you can scope your work to a smaller group of database objects, filled with data that is more challenging and extreme in its variation and quirkiness. You can run this on a ‘disposable’ database that you can liberally sprinkle with logs, traps, and print statements without annoying anyone else. This gives you a lightness of foot, in the sense that you can run a series of unit tests much more quickly. Take, for example, the search function for a website. It uses just a few of the tables and will generally be a procedure. You’d only need to work on that procedure and the tables that support the search function, filled with similar words, suggested searches, or common searches. You can do automated timings and log any issues as they happen.
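For instance, a crude timing wrapper around the search routine is often all the harness needs in order to log how each run behaves as the code is refined. The snippet below is only a sketch: dbo.SearchProducts and dbo.SearchTimings are hypothetical names, standing in for whatever your search procedure and logging table happen to be.

-- A throwaway timing harness for a hypothetical search procedure, dbo.SearchProducts,
-- logging each run into an equally hypothetical dbo.SearchTimings table.
IF Object_Id('dbo.SearchTimings') IS NULL
  CREATE TABLE dbo.SearchTimings
    (
    SearchTerm nvarchar(100) NOT NULL,
    RunAt datetime2 NOT NULL,
    DurationMs int NOT NULL
    );
GO
DECLARE @SearchTerm nvarchar(100) = N'mountain bike',
        @Started datetime2 = SysUtcDateTime();

EXEC dbo.SearchProducts @SearchTerm = @SearchTerm; -- the routine under development

INSERT INTO dbo.SearchTimings (SearchTerm, RunAt, DurationMs)
  VALUES (@SearchTerm, @Started, DateDiff(millisecond, @Started, SysUtcDateTime()));

Because it runs in a disposable database, the logging table can be as rough-and-ready as you like; it only has to survive long enough to compare one run with the next.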
If, like me, you prefer to work where possible with generated data, you can populate your harness generously, because there are fewer tables and you’re only dealing with a subset of the data.
I’ll often work on ideas in a single-user database, taken from the current build but with just the subset of tables and other objects that I need, and over which I have godlike powers. Once I have something of which I’m no longer ashamed, I’ll transfer it to a shared database, where I can iron out any conflicts with the work of other team members.
Here’s a fictitious example that will, I hope, make the point.
AdventureWorks mobile app developer wants a database change
The project manager of AdventureWorks Development, Ben, was just settling down to his after-lunch Minecraft session when he was interrupted by an application developer for AdventureWorks’ new mobile App.
“We need a change in the interface with the AdventureWorks 2016 database.”
Ben stared indignantly at the developer. “Change?”
“Yes, the view HumanResources.vEmployee that we use to get employee data makes us do extra work to create full names and addresses. Why would we just want to know their middle names? Sure, we search on things like lastname, but generally we only deal with complete names and addresses, and that is what we don’t have. It is awkward for us to do it in JavaScript, especially if one has a grid.”
Ben began mentally thumbing through the standard countermoves. Contravenes GDPR? Seems unlikely. The security team wouldn’t approve it! Good, but overused. Deployment would take too long. Hmm, less likely now.
Sighing, he decided that he must reluctantly reveal his new DevOps persona. “Sure, we’re really pushed right now”, he started, trying to look tired and harassed, “but we’ll somehow manage to put it into the next sprint”.
When the dust settled and the rest of the database dev team somehow found other important things to do, Phil the developer stared sadly at the screen, pining to be allowed to finish his nifty text search algorithm, but faced with this irritation of a change.
First, out with SQL Compare. Phil the dev selected the current build as source and set the target as an empty database. Then he selected just the view he needed to alter:
Now for the SQL Compare options. It is possible to generate an error here. The vital option is found only in the dependencies step of the deployment wizard and is called Deploy all dependencies (IncludeDependencies, alias: incd).
Because it is possible with SQL Compare to select just part of a database for a deployment script, it would be perfectly possible, without this option, to generate a script that cannot be executed to build the database. If, for example, a view depends on three tables, then it cannot be deployed unless these three tables are deployed with the view. With this option, the objects that are referenced by the objects you select are also included in the script, whether you selected them or not.
Basically, it can be used to build a subset of a much larger database. You can occasionally come unstuck unless your other options are well chosen. The defaults should be fine, but if, for example, you’ve opted to ignore constraints and you haven’t ignored extended properties, SQL Compare will try to add extended properties to non-existent constraints when it is building a database from scratch, and produce an error.
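If you want to reassure yourself about which tables ought to travel with the view, a quick query against sys.dm_sql_referenced_entities, run in the source database, will list them. This is just a sanity-check sketch and has nothing to do with how SQL Compare itself works out the dependencies.

-- List the objects that the view references, so you can see what must
-- accompany it into the cut-down development database.
SELECT DISTINCT re.referenced_schema_name, re.referenced_entity_name
  FROM sys.dm_sql_referenced_entities('HumanResources.vEmployee', 'OBJECT') AS re
  WHERE re.referenced_minor_name IS NULL -- object-level rows only, not individual columns
  ORDER BY re.referenced_schema_name, re.referenced_entity_name;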
In this case, twenty-one dependent tables were identified as being required.
All Phil needs to do now is to create the script. He could, of course, go ahead and execute the code in the target database, but he prefers not to. He saves the script and then runs it from SSMS just in case anything goes wrong, normally due to selecting the wrong options. It is easier to debug in SSMS.
Now a database that is empty of data isn’t any use. You could use SQL Data Compare to fill these tables, but Phil wouldn’t have this option because the employee data is entirely confidential.
He points SQL Data Generator at the empty database. It sniffs away at it and comes up with a strategy for filling it, based on the names of the columns and the contents of CHECK constraints. Occasionally, one can get static enumeration data such as, in the example below, the type of phone. Don’t be tempted to get all the data from a real table, because that defeats obfuscation. Also, it would, if it were possible, cut across the referential integrity that has been carefully worked out to tie the fake data together.
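To give a rough idea of what the tool has to work with, here is a generic sketch of the sort of CHECK constraint that gives a data generator a closed list of plausible values to choose from, rather than copying anything from the real table. The table and constraint names are invented for illustration; they are not part of AdventureWorks.

-- A hypothetical contact table: the CHECK constraint enumerates the valid phone types,
-- giving a data generator an explicit list of plausible values for the PhoneType column.
CREATE TABLE dbo.ContactPhone
  (
  ContactPhoneID int IDENTITY(1, 1) PRIMARY KEY,
  PhoneNumber nvarchar(25) NOT NULL,
  PhoneType nvarchar(10) NOT NULL
    CONSTRAINT ContactPhone_ValidPhoneType CHECK (PhoneType IN (N'Cell', N'Home', N'Work'))
  );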
Phil goes through the data that is used by the view and makes sure that it is reasonable. He then queries the view to see whether the results look plausible.
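That check needn’t be elaborate. Something along these lines, run ad hoc in the disposable database, is usually enough to show whether the generated data hangs together:

-- Eyeball a sample of the generated data through the view...
SELECT TOP (20) *
  FROM HumanResources.vEmployee
  ORDER BY BusinessEntityID;

-- ...and confirm that the row counts in the underlying tables look sensible.
SELECT Object_Schema_Name(t.object_id) + '.' + t.name AS TableName,
       Sum(p.rows) AS NumberOfRows
  FROM sys.tables AS t
    INNER JOIN sys.partitions AS p
      ON p.object_id = t.object_id
  WHERE p.index_id IN (0, 1) -- the heap or clustered index carries the row count
  GROUP BY t.object_id, t.name
  ORDER BY TableName;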
Once it is all reasonable, he can increase the amount of data in each table, as required, to do performance testing. The strategy can be saved as a SQL Data Generator project and re-used.
The next job is to create the two derived columns, to give the developers the full names and addresses that they need. To help with this, he quickly executes, from his store of metadata calls, the following:
SELECT --Object_Schema_Name(p.object_id)+'.'+p.name AS name,
       String_Agg(
         r.name + ' ' + system_type_name + ' '
           + CASE WHEN is_nullable = 0 THEN ' NOT' ELSE '' END + ' NULL'
           + CASE WHEN collation_name IS NULL
                    OR collation_name = DatabasePropertyEx(Db_Name(), 'Collation') THEN ''
                  ELSE ' COLLATE ' + collation_name
             END,
         ',' + Char(13) + Char(10)
                 ) WITHIN GROUP (ORDER BY column_ordinal ASC) AS result
  FROM sys.dm_exec_describe_first_result_set(
         'Select * from [HumanResources].[vEmployee]', NULL, 0) AS r
  WHERE r.is_hidden = 0 AND error_state IS NULL;
This tells him which columns are nullable, and so must be treated differently in string concatenation.
BusinessEntityID int NOT NULL,
Title nvarchar(8) NULL,
FirstName nvarchar(50) NOT NULL,
MiddleName nvarchar(50) NULL,
LastName nvarchar(50) NOT NULL,
Suffix nvarchar(10) NULL,
JobTitle nvarchar(50) NOT NULL,
PhoneNumber nvarchar(25) NULL,
PhoneNumberType nvarchar(50) NULL,
EmailAddress nvarchar(50) NULL,
EmailPromotion int NOT NULL,
AddressLine1 nvarchar(60) NOT NULL,
AddressLine2 nvarchar(60) NULL,
City nvarchar(30) NOT NULL,
StateProvinceName nvarchar(50) NOT NULL,
PostalCode nvarchar(15) NOT NULL,
CountryRegionName nvarchar(50) NOT NULL,
AdditionalContactInfo xml NULL
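The reason the NULL columns matter is that, with the default CONCAT_NULL_YIELDS_NULL behaviour, a single NULL in a chain of + concatenations makes the whole result NULL. The usual trick, and the one used in the altered view below, is to wrap each nullable fragment, separator and all, in Coalesce(). A minimal illustration:

DECLARE @FirstName nvarchar(50) = N'Janice',
        @MiddleName nvarchar(50) = NULL, -- the nullable column
        @LastName nvarchar(50) = N'Galvin';

-- A straight concatenation is swallowed by the NULL middle name...
SELECT @FirstName + ' ' + @MiddleName + ' ' + @LastName AS BrokenFullName; -- NULL

-- ...so the nullable fragment, trailing space included, goes inside Coalesce()
SELECT @FirstName + ' ' + Coalesce(@MiddleName + ' ', '') + @LastName AS FullName; -- Janice Galvin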
With this information, it is easy to do the alterations, adding the two derived columns:
ALTER VIEW [HumanResources].[vEmployee]
AS
SELECT e.BusinessEntityID,
       p.Title,
       p.FirstName,
       p.MiddleName,
       p.LastName,
       p.Suffix,
       e.JobTitle,
       pp.PhoneNumber,
       pnt.Name AS PhoneNumberType,
       ea.EmailAddress,
       p.EmailPromotion,
       a.AddressLine1,
       a.AddressLine2,
       a.City,
       sp.Name AS StateProvinceName,
       a.PostalCode,
       cr.Name AS CountryRegionName,
       p.AdditionalContactInfo,
       Coalesce(p.Title + ' ', '') + p.FirstName + ' '
         + Coalesce(p.MiddleName + ' ', '') + p.LastName + ' '
         + Coalesce(p.Suffix, '') + ', ' + e.JobTitle AS FullName,
       a.AddressLine1 + ', ' + Coalesce(a.AddressLine2 + ', ', '')
         + a.City + ', ' + sp.Name + ', ' + a.PostalCode + ', ' + cr.Name AS Address
  FROM HumanResources.Employee AS e
    INNER JOIN Person.Person AS p
      ON p.BusinessEntityID = e.BusinessEntityID
    INNER JOIN Person.BusinessEntityAddress AS bea
      ON bea.BusinessEntityID = e.BusinessEntityID
    INNER JOIN Person.Address AS a
      ON a.AddressID = bea.AddressID
    INNER JOIN Person.StateProvince AS sp
      ON sp.StateProvinceID = a.StateProvinceID
    INNER JOIN Person.CountryRegion AS cr
      ON cr.CountryRegionCode = sp.CountryRegionCode
    LEFT OUTER JOIN Person.PersonPhone AS pp
      ON pp.BusinessEntityID = p.BusinessEntityID
    LEFT OUTER JOIN Person.PhoneNumberType AS pnt
      ON pp.PhoneNumberTypeID = pnt.PhoneNumberTypeID
    LEFT OUTER JOIN Person.EmailAddress AS ea
      ON p.BusinessEntityID = ea.BusinessEntityID;
GO
…And then he tries it out.
The addresses aren’t, of course, particularly convincing, but this isn’t for UAT, and they are fine for development, though a higher percentage of NULL values would have, in retrospect, been a good idea.
Once it is working to his satisfaction, and it takes a few attempts, Phil compares the updated view with the one on the shared development server and, as the development version hasn’t changed in any way, updates the existing view in the shared development database to the new version. He double-checks to ensure that the addition of two columns hasn’t broken anything that had used Select *. Finally, he checks it in to source control. Job done.
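The check on Select * consumers is easy to script, at least for anything inside the database itself (it obviously can’t see the mobile app’s JavaScript). A query along these lines, run on the shared development database, lists every module that references the view, so that each one can be given a quick once-over:

-- List every module in this database that references the altered view,
-- so that anything relying on Select * can be re-checked after the two new columns.
SELECT re.referencing_schema_name + '.' + re.referencing_entity_name AS referencing_object
  FROM sys.dm_sql_referencing_entities('HumanResources.vEmployee', 'OBJECT') AS re
  ORDER BY referencing_object;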
He’d been pessimistically allowed a whole day for this feature, but after three hours, the change was checked in. Aha, Phil thought to himself, plenty of time left over for his pet skunkworks text-search project. Ben, conveniently, was in important meetings all afternoon so he could work undisturbed.
Conclusion
Some tools are too good to be kept just for DBAs. Some should be prised from their steely grip by developers and used for those many development tasks that would otherwise absorb tedious time to perform.
SQL Compare has a surprisingly useful role in development work, but I also rate SQL Data Generator highly, particularly for testing. Without tools like this, the work of increasing code quality would still be done, but less thoroughly and more slowly. An ideal development tool can be automated and can perform routine tasks quickly. After all, database development shouldn’t be a chore, should it?