{"id":97434,"date":"2023-08-14T14:30:07","date_gmt":"2023-08-14T14:30:07","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=97434"},"modified":"2023-10-09T13:29:10","modified_gmt":"2023-10-09T13:29:10","slug":"working-around-schema-drift-sql-server","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/working-around-schema-drift-sql-server\/","title":{"rendered":"Working around schema drift in SQL Server"},"content":{"rendered":"<p>At Stack Overflow, our environment has multiple implementations of a largely &#8211; but not 100% &#8211; identical schema. (By &#8220;schema,&#8221; I mostly mean the set of tables within a database.) I sometimes need to retrieve or update data across a large number of sites. Let&#8217;s say, pulling or removing information from the <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">Users<\/code> table, and related tables, across all of our databases.<\/p>\n<h3>This gets complicated.<\/h3>\n<p>Because:<\/p>\n<ul>\n<li>Each of our <a href=\"https:\/\/stackexchange.com\/sites\" target=\"_blank\" rel=\"noopener\">Q &amp; A sites<\/a> has its own database, and each database has a table called <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">Users<\/code> (and some metadata is split out into a separate <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">UserMetadata<\/code> table).<\/li>\n<li>A few sites are special, like <a href=\"https:\/\/area51.stackexchange.com\/\" target=\"_blank\" rel=\"noopener\">Area 51<\/a>, where the <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">Users<\/code> table looks a little different (it is wider, because the <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">UserMetadata<\/code> split isn&#8217;t present here for historical reasons).<\/li>\n<li>A few others <em>look<\/em> like Q &amp; A databases, like <a href=\"https:\/\/chat.stackexchange.com\/\" target=\"_blank\" rel=\"noopener\">Chat<\/a>. They also contain a table called <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">Users<\/code>, but here the table is used differently and the column structure is unique.<\/li>\n<li><a href=\"https:\/\/stackoverflow.co\/teams\/\" target=\"_blank\" rel=\"noopener\">Stack Overflow for Teams<\/a> is more complex. A team&#8217;s tables are shared with others within a single database, with the boundary being a schema instead of a database. There is no table called <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">dbo.Users<\/code>, but a database may contain multiple <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">Users<\/code> tables, such as <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">Team0001.Users<\/code>, <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">Team0002.Users<\/code>, and so on. I&#8217;ll deal with this special case in a future post.<\/li>\n<\/ul>\n<p>IntelliSense highlights a few of these differences straight away:<\/p>\n<p><img decoding=\"async\" style=\"border: 1px solid #aaa;\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/07\/schema-drift-1A.png\" alt=\"IntelliSense reveals schema drift\" \/><\/p>\n<p>Schema has &#8220;drifted&#8221; in these and other cases, for example when features are only applicable to one site, team, or product. Features like <a href=\"https:\/\/stackoverflow.co\/collectives\/\" target=\"_blank\" rel=\"noopener\">Collectives<\/a>, <a href=\"https:\/\/meta.stackexchange.com\/questions\/382019\/bookmarks-have-evolved-into-saves\" target=\"_blank\" rel=\"noopener\">Saves<\/a>, or <a href=\"https:\/\/stackoverflow.com\/help\/what-is-staging-ground\" target=\"_blank\" rel=\"noopener\">Staging Ground<\/a> can lead to new columns or even entire tables that are only present in some databases.<\/p>\n<p><strong>This is okay<\/strong>, and there are many ways to deal with drift &#8211; even drift by design. But that&#8217;s not the point of this post. Because even if I <em>could<\/em> convince everyone all of our databases should have identical schema, I can&#8217;t snap my fingers and make it happen overnight. In the meantime, I have to deal with these differences.<\/p>\n<h3>A fake example.<\/h3>\n<p>Imagine I want to collect a list of users with more than one answer and who mention groundhogs in the &#8220;About Me&#8221; section of their profile. In any Q &amp; A database, I can issue the following query:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> SELECT u.Id, u.AnswerCount\r\n   FROM Users AS u\r\n   INNER JOIN UserMetadata AS um\r\n   ON u.Id = um.UserId\r\n   WHERE um.AboutMe LIKE N'%groundhogs%'\r\n   AND u.AnswerCount &gt; 1;\r\n<\/pre>\n<p>That works fine across &#8220;normal&#8221; site databases.<\/p>\n<h3>But there are exceptions.<\/h3>\n<p>In the Area 51 database, that query fails. As mentioned above, there is no <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">UserMetadata<\/code> table, and <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">AnswerCount<\/code> is not aggregated anywhere. I have to write the query differently:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> SELECT u.Id, AnswerCount = COUNT(p.Id)\r\n   FROM Users AS u\r\n   INNER JOIN Posts AS p\r\n   ON u.Id = p.OwnerUserId\r\n   AND p.PostTypeId = 2\r\n   WHERE u.AboutMe LIKE N'%groundhogs%'\r\n   GROUP BY u.Id\r\n   HAVING COUNT(p.Id &gt; 1);\r\n<\/pre>\n<p>In the Chat database, this query makes no sense, because there are no answers. So I want an easy way for this database to be skipped entirely.<\/p>\n<h3>The problem.<\/h3>\n<p>I don&#8217;t want to hard-code lists of databases that happen to support one version of the schema or another <em>right now<\/em>. This reminds me of browser detection techniques that rely on the user agent string the browser presents, rather than testing the required functionality. Picture maintaining a list of all possible user agent strings and keeping it up to date. This is simply unmanageable, and I don&#8217;t want to do it for database names, either. I want it to be dynamic, so I don&#8217;t have to update some list somewhere every time a new database gets added, existing schema drift gets corrected, or new schema drift appears. And sometimes new schema drift will appear, like with new features, as mentioned before. We know about this drift; it goes through PRs and reviews, and is what I call <em>controlled drift<\/em>. The technique I use relies on knowing the schemas are well controlled, and being aware of changes coming out. So if someone goes rogue and creates a new table in <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">Area51<\/code> called <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">dbo.MoreUserMetadata<\/code>, and starts stuffing information about groundhogs in a random column there without anyone&#8217;s knowledge, that data will go unnoticed. That is one caveat to the solution I&#8217;ll propose.<\/p>\n<h3>A piece of the puzzle.<\/h3>\n<p>I have to repeat this process for a growing list of queries. So, in our central <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">DBA<\/code> database, I have an <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">Actions<\/code> table that I use to validate the metadata in each database before attempting to run a given query there.<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> USE DBA;\r\n GO\r\n\r\n CREATE TABLE dbo.Actions\r\n (\r\n   ActionID     int,\r\n   QueryText    nvarchar(max),\r\n   CheckTable1  nvarchar(128),\r\n   CheckColumn1 nvarchar(128),\r\n   CheckTable2  nvarchar(128),\r\n   CheckColumn2 nvarchar(128)\r\n );<\/pre>\n<p>I put these queries into the table (minus the statement terminator, since these will later be placed inside a common table expression):<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> INSERT dbo.Actions\r\n (\r\n   ActionID,    QueryText, \r\n   CheckTable1, CheckColumn1, \r\n   CheckTable2, CheckColumn2\r\n )\r\n VALUES\r\n \r\n   (1, N'SELECT u.Id, u.AnswerCount\r\n         FROM Users AS u\r\n         INNER JOIN UserMetadata AS um\r\n         ON u.Id = um.UserId\r\n         WHERE um.AboutMe LIKE N''%groundhogs%''\r\n         AND u.AnswerCount &gt; 1',\r\n       N'Users',        N'AnswerCount',\r\n       N'UserMetadata', N'AboutMe'),\r\n\r\n   (2, N'SELECT u.Id, AnswerCount = COUNT(p.Id)\r\n         FROM Users AS u\r\n         INNER JOIN Posts AS p\r\n         ON u.Id = p.OwnerUserId\r\n         AND p.PostTypeId = 2\r\n         WHERE u.AboutMe LIKE N''%groundhogs%''\r\n         GROUP BY u.Id\r\n         HAVING COUNT(p.Id) &gt; 1',\r\n       N'Users', N'AboutMe', \r\n       NULL,     NULL);<\/pre>\n<p>As I loop through each database, I build dynamic SQL that checks for the proper metadata before replacing tokens and executing the query. To demonstrate, first, let&#8217;s set up a few databases that meet our various criteria:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> CREATE DATABASE StackOverflow;\r\n GO\r\n USE StackOverflow; -- only query 1 can run here\r\n GO\r\n CREATE TABLE dbo.Users(Id int, AnswerCount int);\r\n CREATE TABLE dbo.UserMetadata(UserId int, AboutMe nvarchar(max));\r\n INSERT dbo.Users VALUES(1,5),(2,0);\r\n INSERT dbo.UserMetadata VALUES(1,'I love groundhogs!');\r\n GO\r\n\r\n CREATE DATABASE Chat; -- neither query can run here\r\n GO\r\n USE Chat;\r\n GO\r\n CREATE TABLE dbo.Users(UserId int);\r\n INSERT dbo.Users VALUES(1),(2);\r\n GO\r\n\r\n CREATE DATABASE Area51; -- only query 2 can run here\r\n GO\r\n USE Area51;\r\n GO\r\n CREATE TABLE dbo.Users(Id int, AboutMe nvarchar(max));\r\n CREATE TABLE dbo.Posts(Id int, OwnerUserId int, PostTypeId int);\r\n INSERT dbo.Users VALUES(5,'I hate groundhogs!');\r\n INSERT dbo.Posts VALUES(1,5,2),(2,5,2);\r\n GO\r\n<\/pre>\n<p>I&#8217;ll use a cursor to step through each action within each database, and check if it has supporting metadata. I keep it flexible so that I can check one or two tables, and zero or one columns in each table. Why zero? For some queries, I know the columns exist in all databases, so I don&#8217;t need to check.<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> DECLARE @debug bit = 0; \/* set to 1 to see output instead of execution *\/\r\n\r\n DECLARE @go        bit,       \r\n         @actions   cursor, \r\n         @actionId  int,       \r\n         @db        sysname,         \r\n         @context   nvarchar(255),\r\n         @checkSQL  nvarchar(max),\r\n         @queryText nvarchar(max), \r\n         @t1        nvarchar(128), @c1 nvarchar(128), \r\n         @t2        nvarchar(128), @c2 nvarchar(128),\r\n         @tCheck    nvarchar(255) = N'EXISTS (SELECT 1 FROM sys.tables AS t\r\n                                      WHERE name = ',\r\n         @cCheck    nvarchar(256) = N'EXISTS (SELECT 1 FROM sys.columns AS c\r\n                                      WHERE c.object_id = t.object_id\r\n                                      AND c.name = ',\r\n         @params    nvarchar(max) = N'@t1 nvarchar(128), @c1 nvarchar(128), \r\n                                      @t2 nvarchar(128), @c2 nvarchar(128), \r\n                                      @go bit OUTPUT';\r\n\r\n \/* temp table to hold the results from each database *\/\r\n DROP TABLE IF EXISTS #Results;\r\n CREATE TABLE #Results(ActionID int, Source nvarchar(255), UserId int, AnswerCount int);\r\n\r\n \/* cursor to loop through each action for each database *\/\r\n SET @actions = cursor FOR \r\n SELECT act.ActionID, db.name, act.QueryText, \r\n        act.CheckTable1, act.CheckColumn1, \r\n        act.CheckTable2, act.CheckColumn2\r\n   FROM dbo.Actions AS act \r\n   CROSS JOIN sys.databases AS db\r\n   WHERE db.state = 0 AND db.database_id &gt; 4\r\n   ORDER BY db.name, act.ActionID;\r\n\r\n OPEN @actions;\r\n FETCH @actions INTO @actionId, @db, @queryText, @t1, @c1, @t2, @c2;\r\n\r\n WHILE @@FETCH_STATUS &lt;&gt; -1\r\n BEGIN\r\n   \/* set the right database context, and construct the metadata \r\n      check depending on whether we need to validate one or two \r\n      tables and\/or columns.\r\n\r\n      The tables\/columns are passed into the dynamic SQL based\r\n      on what comes back from the Actions table. *\/\r\n\r\n   SELECT @context  = QUOTENAME(@db) + N'.sys.sp_executesql',\r\n          @checkSQL = N'IF ' + @tCheck + N'@t1'\r\n           + CASE   WHEN @c1 IS NOT NULL THEN N' AND ' + @cCheck + N'@c1)' \r\n             ELSE N'' END + N')'\r\n           + CASE   WHEN @t2 IS NOT NULL THEN N' AND ' + @tCheck + N'@t2'\r\n             + CASE WHEN @c2 IS NOT NULL THEN N' AND ' + @cCheck + N'@c2)'\r\n               ELSE N'' END + N')' \r\n             ELSE N'' END\r\n           + N' SET @go = 1; ELSE SET @go = 0;';\r\n\r\n   IF @debug = 1\r\n   BEGIN\r\n     PRINT CONCAT_WS(char(13), @db, 'Action:', @ActionID, 'CheckSQL:', @checkSQL);\r\n     PRINT CONCAT('@t1\/@c1: ',@t1, '.' + @c1, char(13), '@t2\/@c2: ',@t2, '.' + @c2);\r\n   END\r\n\r\n   \/* check if the metadata exists in the target database: *\/\r\n   EXEC @context @checkSQL, @params, @t1, @c1, @t2, @c2, @go OUTPUT;\r\n \r\n   IF @debug = 1\r\n   BEGIN\r\n     PRINT CONCAT('Should action #', @actionId, ' run here (', @db, ')? ', @go);\r\n   END\r\n\r\n   \/* if it does, we're good to go! *\/\r\n   IF @go = 1\r\n   BEGIN\r\n     \/* wrap the query in a CTE so we can inject action\/DB: *\/\r\n     SET @queryText = N'WITH cte AS (' + @QueryText + ')\r\n                        SELECT @ActionID, DB_NAME(), * FROM cte;';\r\n \r\n     IF @debug = 1\r\n     BEGIN\r\n       PRINT CONCAT('Would have run:', char(13), @queryText);\r\n     END\r\n     ELSE\r\n     BEGIN\r\n       INSERT #Results(ActionID, Source, UserID, AnswerCount)\r\n         EXEC @context @queryText, N'@ActionID int', @actionId;\r\n     END\r\n   END\r\n\r\n   FETCH @actions INTO @actionId, @db, @queryText, @t1, @c1, @t2, @c2;\r\n END\r\n\r\n IF @debug = 0\r\n BEGIN\r\n   SELECT ActionID, Source, UserID, AnswerCount FROM #Results;\r\n END<\/pre>\n<h3>Results.<\/h3>\n<p>Even though I have very different schema across (many!) different databases, this technique allows me to pull a consolidated result without knowing which ones store the same data in slightly different ways:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/07\/schema-drift-B1.png\" alt=\"Results in spite of schema drift\" \/><\/p>\n<p>If you are trying to get a handle on what the code does, the parameter <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">@debug<\/code> provides some insight. If you execute the script with <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">@debug = 1<\/code>, you&#8217;ll see this output instead of the results. This shows what code is produced and executed by <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">sp_executesql<\/code>:<\/p>\n<pre class=\"show-plain-default:true theme:terminal font:consolas font-size:11.5 line-height:11 decode-attributes:false tab-convert:true lang:none \r\n decode:true whitespace-before:1 whitespace-after:1 highlight:0\"> Area51\r\n Action:\r\n 1\r\n CheckSQL:\r\n IF EXISTS (SELECT 1 FROM sys.tables AS t\r\n              WHERE name = @t1 AND EXISTS (SELECT 1 FROM sys.columns AS c\r\n              WHERE c.object_id = t.object_id\r\n              AND c.name = @c1)) AND EXISTS (SELECT 1 FROM sys.tables AS t\r\n              WHERE name = @t2 AND EXISTS (SELECT 1 FROM sys.columns AS c\r\n              WHERE c.object_id = t.object_id\r\n              AND c.name = @c2)) SET @go = 1; ELSE SET @go = 0;\r\n @t1\/@c1: Users.AnswerCount\r\n @t2\/@c2: UserMetadata.AboutMe\r\n Should action #1 run here (Area51)? 0\r\n Area51\r\n Action:\r\n 2\r\n CheckSQL:\r\n IF EXISTS (SELECT 1 FROM sys.tables AS t\r\n              WHERE name = @t1 AND EXISTS (SELECT 1 FROM sys.columns AS c\r\n              WHERE c.object_id = t.object_id\r\n              AND c.name = @c1)) SET @go = 1; ELSE SET @go = 0;\r\n @t1\/@c1: Users.AboutMe\r\n @t2\/@c2: \r\n Should action #2 run here (Area51)? 1\r\n Would have run:\r\n WITH cte AS (SELECT u.Id, AnswerCount = COUNT(p.Id)\r\n         FROM Users AS u\r\n         INNER JOIN Posts AS p\r\n         ON u.Id = p.OwnerUserId\r\n         AND p.PostTypeId = 2\r\n         WHERE u.AboutMe LIKE N'%groundhogs%'\r\n         GROUP BY u.Id\r\n         HAVING COUNT(p.Id) &gt; 1)\r\n       SELECT @ActionID, DB_NAME(), * FROM cte;\r\n ... repeat for every database ...\r\n<\/pre>\n<p>I&#8217;ll follow up soon with details on some of the more special cases.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>At Stack Overflow, our environment has multiple implementations of a largely &#8211; but not 100% &#8211; identical schema. (By &#8220;schema,&#8221; I mostly mean the set of tables within a database.) I sometimes need to retrieve or update data across a large number of sites. Let&#8217;s say, pulling or removing information from the Users table, and&#8230;&hellip;<\/p>\n","protected":false},"author":341115,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[53,143531],"tags":[4183],"coauthors":[158980],"class_list":["post-97434","post","type-post","status-publish","format-standard","hentry","category-featured","category-t-sql-programming-sql-server","tag-t-sql"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/97434","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/341115"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=97434"}],"version-history":[{"count":56,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/97434\/revisions"}],"predecessor-version":[{"id":97611,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/97434\/revisions\/97611"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=97434"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=97434"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=97434"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=97434"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}