Searching Flyway Migration Files using Grep and Regex
This article demonstrates a cross-RDBMS way of searching through a set of SQL migration files, in the right order, to get a narrative summary of what changes were made, or will be made, to one or more of the tables or routines within each migration file. Getting these summary reports, even from a set of SQL migrations, isn't difficult, but having a few examples makes it a lot quicker to get started.
In any migration-based project such as Flyway, you are no longer working from a build script, but rather a series of migration files designed to take the database from one version to the next. This means that it isn’t always obvious in which migration file a particular object was created or changed, or why. When, for example, was that column made NOT
NULL
? You need to search the history of the changes to a table or routine across an entire chain of migrations.
During development, you’ll also often need to search through the source files for the name of the objects on which you are working, and view the context, to understand changes have been made, and view the comments in the source to understand why. A common chore in SQL, for example, is to check on dependency information, to find out which objects depend on the table, view or function on which you’re working, and the objects on which your object depends. If a script has been correctly written, it will be clear from the comments associated with the script why things were done a certain way, and who did them.
This article tackles the task of searching Flyway migration files in order to get the ‘narrative of changes’. It demonstrates both simple GREP-style searches as well as more advanced search techniques with regular expressions (regex) that use the output of the Flyway info
command to ensure the scripts are read in the right version order. In a related article, Searching a Flyway Database, I expand on this topic to include searches of not only the migration files, but also the generated files such as build scripts, database models and documentation generated by my Flyway Teamwork Framework.
Grep-style searches of Flyway migrations scripts
I was a youngster when we first talked of ‘grepping’. Anyone reared on UNIX will know of grep, a command-line utility for searching text data for lines that match a regular expression. It is actually an acronym from the ed
command, g/re/p
(globally search for a regular expression and print matching lines), which was the original painful way of doing it.
Grep still survives. Don’t underestimate its value. Commercial versions such as PowerGREP can nowadays search for strings in Word files, Excel spreadsheets, PDF or Outlook and can do replacements as well as search. PowerShell has a version called Select-string
. It is pretty simple to use as a grep replacement, but the advantage is that the output is an object that allows you to easily process the matches that are found.
The drawback of Grep-style searching is that it is line-oriented, and so it is more useful for words than long strings. This is a drawback for searching SQL because it isn’t at all line oriented. We can’t, for example, use it for finding ALTER
statements for a particular object because the ALTER
statement might be on a different line to the name of the object. The best you can do is to look at the context, the lines before and after.
We can show how you can use Select-string
for simple searches, though. Imagine that we need to find out how the discounts
table has been altered over the run of SQL migrations.
1 2 3 4 5 |
#the file path with, if needed, wildcard characters $Path='<Path-To>\v*.sql' #the name of the object $pattern = 'discounts' ls $path|Select-String -Pattern $pattern -SimpleMatch |
<Path-To>\V1.1.1__Initial_Build.sql:129:PRINT N'Creating [dbo].[discounts]' <Path-To>\V1.1.1__Initial_Build.sql:131:CREATE TABLE [dbo].[discounts] <Path-To>\V1.1.1__Initial_Build.sql:422:PRINT N'Adding foreign keys to [dbo].[discounts]' <Path-To>\V1.1.1__Initial_Build.sql:424:ALTER TABLE [dbo].[discounts] ADD CONSTRAINT [FK__discounts__stor___173876EA] FOREIGN KEY ([stor_id]) REFERENCES [dbo].[stores] ([stor_id]) <Path-To>\V1.1.2__Pubs_Original_Data.sql:1501:PRINT 'Now at the inserts to discounts ....' <Path-To>\V1.1.2__Pubs_Original_Data.sql:1505:INSERT discounts VALUES <Path-To>\V1.1.3__UseNVarcharetc.sql:75:PRINT N'Altering [dbo].[discounts]' <Path-To>\V1.1.3__UseNVarcharetc.sql:79:ALTER TABLE [dbo].[discounts] ADD …and so on…
This is fine. We can add the context parameter to get to see the lines before and after
1 |
ls $path|Select-String -Pattern $pattern -SimpleMatch -Context 2 |
<Path-To>\V1.1.1__Initial_Build.sql:127: <Path-To>\V1.1.1__Initial_Build.sql:128:GO > <Path-To>\V1.1.1__Initial_Build.sql:129:PRINT N'Creating [dbo].[discounts]' <Path-To>\V1.1.1__Initial_Build.sql:130:GO > <Path-To>\V1.1.1__Initial_Build.sql:131:CREATE TABLE [dbo].[discounts] <Path-To>\V1.1.1__Initial_Build.sql:132:( <Path-To>\V1.1.1__Initial_Build.sql:133:[discounttype] [varchar] (40) COLLATE Latin1_General_CI_AS NOT NULL, …
OK, it works, but it isn’t that accurate. Already, in seeing the context, you will see a problem. It will pick up any mention of the table, and it doesn’t allow you to specify the schema, because, in SQL Server, there might, or might not, be square brackets.
We can make a bit of progress by picking up the name of the schema, which will give you the same result:
1 2 3 |
#the name of the object $pattern = '\[{0,1}dbo(\]\.){0,1}\[{0,1}discounts\]\.{0,1}' ls $path|Select-String -Pattern $pattern -Context 2 |
However, we need more than Grep!
More advanced Regex searches
PowerShell offers a way of constructing a more powerful search that scans the whole file at once to find objects. We want the result of every match, so we need to dip into the .NET library and use the Matches
method.
That’s fine, but if we want the search to return not just the objects themselves but what was being done to them, we also need to know the action (CREATE
, ALTER
or DROP
, for example)? No problem because regexes are pretty extendible!
Search a script for changes to a table
Here is a regex that illustrates how we do this. To keep things simple, we will just locate the string that contains the action statement (‘create’, ‘alter’ or ‘drop’) for the discounts
table, but there could be a block comment or a line comment separating the CREATE
TABLE
statement and the word discounts
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
$regex = [regex] '(?s)(?#Find create or alter for the table specified )(CREATE|ALTER|DROP)\s{1,100}?TABLE\s{1,10}(?# Ignore either block comments, IF EXISTS or inline comments )(/\*.{1,1000}\*/|--[\w\s\d]{1,1000}|IF EXISTS){0,1}(?# Ignore square brackets )\s{0,100}\[{0,1}dbo(\]\.){0,1}\[{0,1}discounts\]\.{0,1}' $Sql=@' CREATE TABLE /* it could have comments */ [dbo].[discounts]( [CountryRegionCode] [nvarchar](3) NOT NULL, [Name] [Name] NOT NULL, [ModifiedDate] [datetime] NOT NULL CONSTRAINT [DF_CountryRegion_ModifiedDate] DEFAULT (GETDATE()) ) ON [PRIMARY]; GO '@ $allmatches = $regex.Matches($SQL); if ($allmatches.count -gt 0) {$allmatches|foreach{"We found '$($_.Value)'"}} else {"Sorry, no matches"} |
We found 'CREATE TABLE /* it could have comments */ [dbo].[discounts]'
With just a string, it is easy to try things out by altering the code to see what happens.
Right. We have a viable method for finding where the discount
table was created, dropped or altered. I also will let you view the $AllMatches
object.
Search a set of migration files
The next stage is to try this out on a group of files:
1 2 3 4 5 6 7 8 9 10 11 |
$Path = '<My Path To>FlywayTeamwork\Pubs\Branches\develop\Migrations' $TheNextFile = '' ls "$path\v*.sql" -Name -PipelineVariable TheNextFile | foreach{ [IO.File]::ReadAllText("$Path\$_"); } | foreach{ $allmatches = $regex.Matches($_); if ($allmatches.count -gt 0) { $allmatches | foreach{ "We found '$($_.Value)' in $TheNextFile" } } else { "Sorry, no matches in $TheNextFile" } } |
Sorry, no matches in V1.1.10__AddAddressesPhonesEtc.sql Sorry, no matches in V1.1.11__AddProcedureWithTest.sql Sorry, no matches in V1.1.12__AddTestData.sql We found 'CREATE TABLE [dbo].[discounts]' in V1.1.1__Initial_Build.sql We found 'ALTER TABLE [dbo].[discounts]' in V1.1.1__Initial_Build.sql Sorry, no matches in V1.1.2__Pubs_Original_Data.sql We found 'ALTER TABLE [dbo].[discounts]' in V1.1.3__UseNVarcharetc.sql We found 'ALTER TABLE [dbo].[discounts]' in V1.1.3__UseNVarcharetc.sql We found 'ALTER TABLE [dbo].[discounts]' in V1.1.3__UseNVarcharetc.sql We found 'ALTER TABLE [dbo].[discounts]' in V1.1.4__RenameConstraintsAdd tables.sql We found 'ALTER TABLE [dbo].[discounts]' in V1.1.4__RenameConstraintsAdd tables.sql Sorry, no matches in V1.1.5__Add_New_Data.sql Sorry, no matches in V1.1.6__Add_Tags.sql Sorry, no matches in V1.1.7__Add_Indexes.sql Sorry, no matches in V1.1.8__AddEditions.sql Sorry, no matches in V1.1.9__AddconditionalVersion.sql
We can, if your object names are unique, search for a list of objects. We just need to alter the regex to replace the word discounts
with a list of tables (authors|discounts|publications
, for example), and specify any schema.
Using named capturing groups to find create, alter or drop statements for a table
However, we can do better than this if we use named capturing groups to pick up the values of the action (CREATE
, ALTER
or DELETE
), table and schema. We could include comments that follow the name of the table, but that’s not necessary at this stage. With a refinement of this technique, we can get all sorts of further information about the table in this way, which is very easily searched.
1 2 3 4 5 6 7 8 9 10 |
$Regex=[regex]@' (?s)(?#Find create or alter for the table specified )(?<Action>(CREATE|ALTER|DROP))\s{1,100}?(?<Object>(INDEX|TABLE|TRIGGER|VIEW|FUNCTION))\s{1,10}(?# Ignore block comments, IF EXISTS or inline comments )(/\*.{1,1000}\*/|--[\w\s\d]{1,1000}|IF EXISTS){0,1}(?# treat square brackets if present for schema )\s{0,100}(?<Schema>(\[[\w\s\d]{1,1000}\]|[\w\d]{1,1000}))\.(?# and for the table! )(?<Name>(\[[\w\s\d]{1,1000}\]|[\w\d]{1,1000})) '@ |
You’ll see that we’ve created a named capturing group for the for the SQL action (?<Action>
) and for the schema and the object. This would enable us to filter by both schema and object in a search for the objects in which we are interested, and even to search just for deletion. We have, admittedly, written quite an opaque regex, but it allows more accurate and powerful searches.
Here it is in simple use:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
$Path = '<My Path To>FlywayTeamwork\Pubs\Branches\develop\Migrations' $TheNextFile = '' ls "$path\v*.sql" -Name -PipelineVariable TheNextFile | foreach{ $filename = "$Path\$TheNextFile"; [IO.File]::ReadAllText($filename) | foreach{ $allmatches = $regex.Matches($_); if ($allmatches.count -gt 0) { $allmatches | Select-Object Groups | Foreach{ $group = $_; $ThisMatch = [ordered]@{ }; $_.Groups | Where { $_.Name -match '\D' } | foreach { $What = $_; $TheValue = $_.Value $ThisMatch.Add($_.Name, $TheValue) } $ThisMatch.Add('filename', $TheNextFile) [pscustomobject]$ThisMatch } # for each match } #if there was a match } } | Out-GridView -Title 'Showing all Create/alter/drop actions on objects in scripts within Directory <MyPath>' |
Here’s the report for our sample Pubs Flyway project. On the display, I’ve ordered it by the name of the object:
Searching Flyway migrations in version order
So far, we’ve just looked at all the files in a single directory. Now, if you are working with a Flyway migration, you might want to see what objects were altered to take us up to the current version, or what is in future migrations. You might also want to see the migrations in version order.
The biggest problem is that these Flyway migrations could be in a list of paths, specified in configuration settings within several different config files. Rather than getting the files from the directories, we can get them from Flyway, by using the Flyway
info
command.
It is only a simple modification of the technique, but it saves a lot of effort when dealing with a Flyway migration. In fact, I’d say that getting the info
object from Flyway was the only realistic way. To tidy up the listings, I’ve removed the square-bracket delimiters from the final listing (these are only in the SQL Server dialect- otherwise it is double-quotes).
For the typical search task, you probably won’t want to look through migrations that haven’t yet been applied, so I’ve added a filter that ensures that only the migration files that have been applied to the database are included in the search via the line …
1 |
($_.state -ieq 'Success') |
…but it is easy to alter this if you want to do other types of search.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 |
cd <My Path To>FlywayTeamwork\Pubs\Branches\Develop . .\Preliminary.ps1 #only if you are using my Flyway Teamwork framework #If not using Flyway Teamwork, you'll need to provide credentials. $Migrations = Flyway info -outputType=json | convertfrom-json if ($Migrations.error -ne $null) { #if getting the migration list was unsuccessful write-warning $Migrations.error.message } else { #if getting the migration list was successful $ver=$migrations.schemaVersion; $db=$migrations.Database $migrations.migrations | where { ![string]::IsNullOrEmpty($_.filepath) -and ($_.type -ieq 'SQL') -and <a id="post-6945486-_Hlk136421060"></a>($_.state -ieq 'Success') } | foreach{ # read each file $filename = Split-Path $_.filepath -leaf; $fileVersion=$_.version; [IO.File]::ReadAllText("$($_.filepath)") | foreach{ $allmatches = $regex.Matches($_); if ($allmatches.count -gt 0) { $allmatches | Select-Object Groups | Foreach{ $group = $_; $ThisMatch = [ordered]@{ }; $group.Groups | Where { $_.Name -match '\D' } | foreach { $What = $_; $TheValue = $_.Value $TheValue= $TheValue.Replace('[','').Replace(']','') $ThisMatch.Add($_.Name, $TheValue) } $ThisMatch.Add('filename', $filename) $ThisMatch.Add('Version', $fileVersion) [pscustomobject]$ThisMatch } # for each match } #if there was a match }#end read file contents }| Out-GridView -Title "Showing all Create/alter/drop actions on objects in scripts for $db ($ver)" } |
With the result…
You’ll notice that the ALTER
statements are filtered first. That’s just the Grid view being perverse. The order that emerges from the pipeline is:
Action Object Schema Name filename Version ------ ------ ------ ---- -------- ------- CREATE TABLE dbo employee V1.1.1__Initial_Build.sql 1.1.1 ALTER TABLE dbo employee V1.1.1__Initial_Build.sql 1.1.1 CREATE TABLE dbo jobs V1.1.1__Initial_Build.sql 1.1.1 ALTER TABLE dbo jobs V1.1.1__Initial_Build.sql 1.1.1 CREATE TRIGGER dbo employee_insupd V1.1.1__Initial_Build.sql 1.1.1 CREATE TABLE dbo stores V1.1.1__Initial_Build.sql 1.1.1 ALTER TABLE dbo stores V1.1.1__Initial_Build.sql 1.1.1 CREATE TABLE dbo discounts V1.1.1__Initial_Build.sql 1.1.1 CREATE TABLE dbo publishers V1.1.1__Initial_Build.sql 1.1.1 ALTER TABLE dbo publishers V1.1.1__Initial_Build.sql 1.1.1 CREATE TABLE dbo pub_info V1.1.1__Initial_Build.sql 1.1.1 ALTER TABLE dbo pub_info V1.1.1__Initial_Build.sql 1.1.1 CREATE TABLE dbo titles V1.1.1__Initial_Build.sql 1.1.1 ALTER TABLE dbo titles V1.1.1__Initial_Build.sql 1.1.1 CREATE TABLE dbo roysched V1.1.1__Initial_Build.sql 1.1.1 CREATE TABLE dbo sales V1.1.1__Initial_Build.sql 1.1.1 ALTER TABLE dbo sales V1.1.1__Initial_Build.sql 1.1.1 CREATE TABLE dbo authors V1.1.1__Initial_Build.sql 1.1.1
Allowing for embedded comments in the code
So, when we allow for embedded comments within block comments, allow double quote characters to delimit a name that has illegal characters, and capture any comments, we finally end up with a regex that looks like this, which is probably as far as I’d want to go.
1 2 3 4 5 6 7 8 9 10 11 |
(?s)(?#Find create or alter for the table specified first find the action statement )(?<Action>(CREATE|ALTER|DROP))\s{1,100}?(?# And now the type of object )(?<Object>(INDEX|TABLE|TRIGGER|VIEW|FUNCTION))\s{1,10}(?# block comments with embedded /*..*/, IF EXISTS or inline comments )(?<Comment>(/\*(?>[^*/]+|\*[^/]|/[^*]|/\*(?>[^*/]+|\*[^/]|/[^*])*\*/)*\*/|--[\w\s\d]{1,1000}|IF EXISTS){0,1})(?# Find the schema, including any pesky square brackets or double-quotes if present for schema )\s{0,100}(?<Schema>((\[|")[\w\s\d]{1,1000}(\]|")|[\w\d]{1,1000}))\.(?# and now the object name )(?<Name>((\[|")[\w\s\d]{1,1000}(\]|")|[\w\d]{1,1000})) |
Find out what objects changed and where
I’ve taken the above Regex, which will will find CREATE
, ALTER
or DROP
statements for INDEX
, TABLE
, TRIGGER
, VIEW
or FUNCTION
in the SQL code within Flyway migration files, and incorporated it into a function called Find-MigrationSQLCode
. This function will allow you to search between any two versions to find out what changed where. It gets the list of successful migration files and uses these in the correct order, searching each one for the strings specified by the Regex.
One might think that it would be easier to just run simple GREP-style searches, using the location you’ve defined for your files, but you really only want the successfully executed ones, up to the current migration, all nicely sorted. Flyway locations are actually lists that can be found in several different flyway.conf files or even included in an environment variable.
All we need to do is change to the right working directory and execute this PowerShell, specifying the starting version as 1.1.6:
1 |
Find-MigrationSQLCode -FirstVersion '1.1.6' |
Conclusions
We’ve got a cross-RDBMS way of searching through SQL files to get a summary of what happened within each file. Getting these summary reports, even from a set of SQL migrations, isn’t hard. I’ve just given the basics here, but you’re bound to look at this and find things you don’t like, or you might think of extra information you need. That is the joy of automation in database development: you can get things working exactly as you like them. For a start, I’d want to pull in any documentation within that CREATE table
clause and the name of the table, and add it to the summary, so that it explains why it did the action. No great problem there.
Yes, Regex is ‘terra incognito’ for many database developers. You use LIKE
wildcards, sure, but Regex is less popular. There is good reason for this: a complex regex can sometimes miss a string you are searching for, and it may not be immediately obvious why. From my own experience as a card-carrying dyslexic, I need a good Regex IDE such as Regex Buddy to make it easy and testable, but Regex really is the only game in town for shredding and mangling the more complex text data. Any Developer or DBA who is doing the import of text data really needs to be confident with it. It is certainly a useful tool in automating DevOps processes.