{"id":3500,"date":"2012-02-28T01:00:00","date_gmt":"2012-02-28T01:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/database-migration-scripts-getting-from-place-a-to-place-b\/"},"modified":"2021-04-27T13:35:18","modified_gmt":"2021-04-27T13:35:18","slug":"database-migration-scripts-getting-from-place-a-to-place-b","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/blogs\/database-migration-scripts-getting-from-place-a-to-place-b\/","title":{"rendered":"Database Migration Scripts: Getting from place A to place B"},"content":{"rendered":"<p>We&#8217;ll be looking at a typical database &#8216;migration&#8217; script which uses an unusual technique to migrate existing &#8216;de-normalised&#8217; data into a more correct form.<\/p>\n<p>So, the book-distribution business that uses the PUBS database has gradually grown organically, and has slipped into &#8216;de-normalisation&#8217; habits. What&#8217;s this? A new column with a list of tags or &#8216;types&#8217; assigned to books. Because books aren&#8217;t really in just one category, someone has &#8216;cured&#8217; the mismatch between the database and the business requirements. This is fine, but it is now proving difficult for their new website that allows searches by tags. Any request for history book really has to look in the entire list of associated tags rather than the &#8216;Type&#8217; field that only keeps the primary tag.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1437-Refactor.1.PNG\" alt=\"1437-Refactor.1.PNG\" \/><\/p>\n<p>We have other problems. The TypleList column has duplicates in there which will be affecting the reporting, and there is the danger of mis-spellings getting there.<\/p>\n<p>The reporting system can&#8217;t be persuaded to do reports based on the tags and the Database developers are complaining about the unCoddly things going on in their database.<\/p>\n<p>In your version of PUBS, this extra column doesn&#8217;t exist, so we&#8217;ve added it and put in 10,000 titles using <a href=\"http:\/\/www.red-gate.com\/products\/sql-development\/sql-data-generator\/?utm_source=simpletalk&amp;utm_medium=weblink&amp;utm_content=PhilFactorBlogAB&amp;utm_campaign=sqldatagenerator\">SQL Data Generator<\/a>.<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true \">\/*\r\nSo how do we refactor this database?\r\nfirstly, we create a table of all the tags. \r\n*\/\r\n\r\nIF\u00a0\u00a0OBJECT_ID('TagName') IS NULL OR OBJECT_ID('TagTitle') IS NULL\r\n\u00a0\u00a0BEGIN\r\n\u00a0\u00a0CREATE TABLE\u00a0\u00a0TagName (TagName_ID INT IDENTITY(1,1) PRIMARY KEY ,\r\n\u00a0\u00a0\u00a0\u00a0 Tag VARCHAR(20) NOT NULL UNIQUE)\r\n\u00a0\u00a0\/* ...and we insert into it all the tags from the list (remembering to take out any leading spaces *\/\r\n\r\n\u00a0\u00a0INSERT INTO TagName (Tag) \r\n\u00a0\u00a0\u00a0\u00a0SELECT DISTINCT LTRIM(x.y.value('.', 'Varchar(80)')) AS [Tag] \r\n\u00a0\u00a0\u00a0\u00a0FROM \r\n\u00a0\u00a0\u00a0\u00a0(SELECT\u00a0\u00a0Title_ID,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0CONVERT(XML, '&lt;list&gt;&lt;i&gt;' + REPLACE(TypeList, ',', '&lt;\/i&gt;&lt;i&gt;') + '&lt;\/i&gt;&lt;\/list&gt;')\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AS XMLkeywords\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0FROM\u00a0\u00a0 dbo.titles)g\r\n\u00a0\u00a0\u00a0\u00a0CROSS APPLY XMLkeywords.nodes('\/list\/i\/text()') AS x ( y )\r\n\r\n\u00a0\u00a0\/* we can then use this table to provide a table that relates tags to articles *\/\r\n\r\n\u00a0\u00a0CREATE TABLE TagTitle\r\n\u00a0\u00a0 (TagTitle_ID INT IDENTITY(1, 1),\r\n\u00a0\u00a0 [title_id] [dbo].[tid] NOT NULL REFERENCES titles (Title_ID),\r\n\u00a0\u00a0 TagName_ID INT NOT NULL REFERENCES TagName (Tagname_ID)\r\n\u00a0\u00a0 CONSTRAINT [PK_TagTitle]\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 PRIMARY KEY CLUSTERED ([title_id] ASC, TagName_ID)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ON [PRIMARY])\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\r\n\u00a0\u00a0CREATE NONCLUSTERED INDEX idxTagName_ID\r\n\u00a0\u00a0ON\u00a0\u00a0TagTitle (TagName_ID)\r\n\u00a0\u00a0INCLUDE (TagTitle_ID,title_id)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\r\n\u00a0\u00a0\/* ...and it is easy to fill this with the tags for each title ... *\/\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\r\n\u00a0\u00a0INSERT INTO TagTitle (Title_ID, TagName_ID)\r\n\u00a0\u00a0\u00a0\u00a0SELECT DISTINCT Title_ID, TagName_ID \r\n\u00a0\u00a0\u00a0\u00a0 FROM \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (SELECT\u00a0\u00a0Title_ID,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0CONVERT(XML, '&lt;list&gt;&lt;i&gt;' + REPLACE(TypeList, ',', '&lt;\/i&gt;&lt;i&gt;') + '&lt;\/i&gt;&lt;\/list&gt;')\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AS XMLkeywords\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0FROM\u00a0\u00a0 dbo.titles)g\r\n\u00a0\u00a0\u00a0\u00a0CROSS APPLY XMLkeywords.nodes('\/list\/i\/text()') AS x ( y )\r\n\u00a0\u00a0\u00a0\u00a0INNER JOIN TagName ON TagName.Tag=LTRIM(x.y.value('.', 'Varchar(80)'))\r\n\u00a0\u00a0\u00a0\u00a0END\r\n\u00a0\u00a0\u00a0\u00a0\r\n\/* That's all there was to it. Now we can select all titles that have the military tag, just to try things out *\/\r\nSELECT Title FROM titles\r\n\u00a0\u00a0INNER JOIN TagTitle ON titles.title_ID=TagTitle.Title_ID\r\n\u00a0\u00a0INNER JOIN Tagname ON Tagname.TagName_ID=TagTitle.TagName_ID\r\n\u00a0\u00a0WHERE tagname.tag='Military'\r\n\r\n\/* and see the top ten most popular tags for titles *\/\r\nSELECT Tag, COUNT(*) FROM titles\r\n\u00a0\u00a0INNER JOIN TagTitle ON titles.title_ID=TagTitle.Title_ID\r\n\u00a0\u00a0INNER JOIN Tagname ON Tagname.TagName_ID=TagTitle.TagName_ID\r\n\u00a0\u00a0GROUP BY Tag ORDER BY COUNT(*) DESC\r\n\r\n\/* and if you still want your list of tags for each title, then here they are *\/\r\nSELECT title_ID, title, STUFF(\r\n\u00a0\u00a0(SELECT ','+tagname.tag FROM titles thisTitle\r\n\u00a0\u00a0\u00a0\u00a0INNER JOIN TagTitle ON titles.title_ID=TagTitle.Title_ID\r\n\u00a0\u00a0\u00a0\u00a0INNER JOIN Tagname ON Tagname.TagName_ID=TagTitle.TagName_ID\r\n\u00a0\u00a0WHERE ThisTitle.title_id=titles.title_ID\r\n\u00a0\u00a0FOR XML PATH(''), TYPE).value('.', 'varchar(max)')\r\n\u00a0\u00a0,1,1,'')\u00a0\u00a0\r\n\u00a0\u00a0FROM titles\r\n\u00a0\u00a0ORDER BY title_ID<\/pre>\n<p>So we&#8217;ve refactored our PUBS database without pain. We&#8217;ve even put in a check to prevent it being re-run once the new tables are created. Here is the diagram of the new tag relationship<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1437-ERD.jpg\" alt=\"1437-ERD.jpg\" \/><\/p>\n<p>We&#8217;ve done both the DDL to create the tables and their associated components, and the DML to put the data in them. I could have also included the script to remove the de-normalised TypeList column, but I&#8217;d do a whole lot of tests first before doing that. Yes, I&#8217;ve left out the assertion tests too, which should check the edge cases and make sure the result is what you&#8217;d expect.<\/p>\n<p>One thing I can&#8217;t quite figure out is how to deal with an ordered list using this simple XML-based technique. We can ensure that, if we have to produce a list of tags, we can get the primary &#8216;type&#8217; to be first in the list, but what if the entire order is significant? Thank goodness it isn&#8217;t in this case. If it were, we might have to revisit a string-splitter function that returns the ordinal position of each component in the sequence.<\/p>\n<p>You&#8217;ll see immediately that we can create a synchronisation script for deployment from a comparison tool such as <a href=\"http:\/\/www.red-gate.com\/products\/sql-development\/sql-compare\/?utm_source=simpletalk&amp;utm_medium=weblink&amp;utm_content=PhilFactorBlogAB&amp;utm_campaign=sqlcompare\">SQL Compare<\/a>, to change the schema (DDL). On the other hand, no tool could do the DML to stuff the data into the new table, since there is no way that any tool will be able to work out where the data should go. We used some pretty hairy code to deal with a slightly untypical problem. We would have to do this migration by hand, and it has to go into source control as a batch. If most of your database changes are to be deployed by an automated process, then there must be a way of<\/p>\n<ul>\n<li>over-riding this part of the data synchronisation process to do this part of the process<\/li>\n<li>taking the part of the script that fills the tables, Checking that the tables have not already been filled, and executing it as part of the transaction.<\/li>\n<\/ul>\n<p>Of course, you might prefer the approach I&#8217;ve taken with the script of creating the tables in the same batch as the data conversion process, and then using the presence of the tables to prevent the script from being re-run.<\/p>\n<p>The problem with scripting a refactoring change to a database is that it has to work both ways. If we install the new system and then have to rollback the changes, several books may have been added, or had their tags changed, in the meantime. Yes, you have to script any rollback! These have to be mercilessly tested, and put in source control just in case of the rollback of a deployment after it has been in place for any length of time.<\/p>\n<p>I&#8217;ve shown you how to do this with the part of the script ..<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">\/* and if you still want your list of tags for each title, then here they are *\/\r\nSELECT title_ID, title, STUFF(\r\n\u00a0\u00a0(SELECT ','+tagname.tag FROM titles thisTitle\r\n\u00a0\u00a0\u00a0\u00a0INNER JOIN TagTitle ON titles.title_ID=TagTitle.Title_ID\r\n\u00a0\u00a0\u00a0\u00a0INNER JOIN Tagname ON Tagname.TagName_ID=TagTitle.TagName_ID\r\n\u00a0\u00a0WHERE ThisTitle.title_id=titles.title_ID\r\n\u00a0\u00a0FOR XML PATH(''), TYPE).value('.', 'varchar(max)')\r\n\u00a0\u00a0,1,1,'')\u00a0\u00a0\r\n\u00a0\u00a0FROM titles\r\n\u00a0\u00a0ORDER BY title_ID <\/pre>\n<p>&#8230;which would be turned into an UPDATE &#8230; FROM script.<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">UPDATE titles SET\u00a0\u00a0typelist= ThisTaglist\r\nFROM\u00a0\u00a0 \r\n\u00a0\u00a0(SELECT title_ID, title, STUFF(\r\n\u00a0\u00a0\u00a0\u00a0(SELECT ','+tagname.tag FROM titles thisTitle\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0INNER JOIN TagTitle ON titles.title_ID=TagTitle.Title_ID\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0INNER JOIN Tagname ON Tagname.TagName_ID=TagTitle.TagName_ID\r\n\u00a0\u00a0\u00a0\u00a0WHERE ThisTitle.title_id=titles.title_ID\r\n\u00a0\u00a0\u00a0\u00a0ORDER BY CASE WHEN tagname.tag=titles.[type] THEN 1 ELSE 0\u00a0\u00a0END DESC\r\n\u00a0\u00a0\u00a0\u00a0FOR XML PATH(''), TYPE).value('.', 'varchar(max)')\r\n\u00a0\u00a0\u00a0\u00a0,1,1,'')\u00a0\u00a0AS ThisTagList\r\n\u00a0\u00a0FROM titles)f\r\nINNER JOIN Titles ON f.title_ID=Titles.title_ID\r\n<\/pre>\n<p>\nYou&#8217;ll notice that it isn&#8217;t quite a round trip because the tags are in a different order, though we&#8217;ve managed to make sure that the primary tag is the first one as originally.<\/p>\n<p>So, we&#8217;ve improved the database for the poor book distributors using PUBS. It is not a major deal but you&#8217;ve got to be prepared to provide a migration script that will go both forwards and backwards. Ideally, database refactoring scripts should be able to go from any version to any other. Schema synchronization scripts can do this pretty easily, but no data synchronisation scripts can deal with serious refactoring jobs without the developers being able to specify how to deal with cases like this.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We&#8217;ll be looking at a typical database &#8216;migration&#8217; script which uses an unusual technique to migrate existing &#8216;de-normalised&#8217; data into a more correct form. So, the book-distribution business that uses the PUBS database has gradually grown organically, and has slipped into &#8216;de-normalisation&#8217; habits. What&#8217;s this? A new column with a list of tags or &#8216;types&#8217;&#8230;&hellip;<\/p>\n","protected":false},"author":154613,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2,143516],"tags":[4500,4590,4502],"coauthors":[6813],"class_list":["post-3500","post","type-post","status-publish","format-standard","hentry","category-blogs","category-database-devops","tag-refactoring","tag-refactoring-databases","tag-refactoring-sql-server-databases"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/3500","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/154613"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=3500"}],"version-history":[{"count":7,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/3500\/revisions"}],"predecessor-version":[{"id":90061,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/3500\/revisions\/90061"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=3500"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=3500"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=3500"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=3500"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}