Product articles Flyway Team-based Development
Searching Flyway Migration Files using…

Searching Flyway Migration Files using Grep and Regex

This article demonstrates a cross-RDBMS way of searching through a set of SQL migration files, in the right order, to get a narrative summary of what changes were made, or will be made, to one or more of the tables or routines within each migration file. Getting these summary reports, even from a set of SQL migrations, isn't difficult, but having a few examples makes it a lot quicker to get started.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

In any migration-based project such as Flyway, you are no longer working from a build script, but rather a series of migration files designed to take the database from one version to the next. This means that it isn’t always obvious in which migration file a particular object was created or changed, or why. When, for example, was that column made NOT NULL? You need to search the history of the changes to a table or routine across an entire chain of migrations.

During development, you’ll also often need to search through the source files for the name of the objects on which you are working, and view the context, to understand changes have been made, and view the comments in the source to understand why. A common chore in SQL, for example, is to check on dependency information, to find out which objects depend on the table, view or function on which you’re working, and the objects on which your object depends. If a script has been correctly written, it will be clear from the comments associated with the script why things were done a certain way, and who did them.

This article tackles the task of searching Flyway migration files in order to get the ‘narrative of changes’. It demonstrates both simple GREP-style searches as well as more advanced search techniques with regular expressions (regex) that use the output of the Flyway info command to ensure the scripts are read in the right version order. In a related article, Searching a Flyway Database, I expand on this topic to include searches of not only the migration files, but also the generated files such as build scripts, database models and documentation generated by my Flyway Teamwork Framework.

Grep-style searches of Flyway migrations scripts

I was a youngster when we first talked of ‘grepping’. Anyone reared on UNIX will know of grep, a command-line utility for searching text data for lines that match a regular expression. It is actually an acronym from the ed command, g/re/p (globally search for a regular expression and print matching lines), which was the original painful way of doing it.

Grep still survives. Don’t underestimate its value. Commercial versions such as PowerGREP can nowadays search for strings in Word files, Excel spreadsheets, PDF or Outlook and can do replacements as well as search. PowerShell has a version called Select-string. It is pretty simple to use as a grep replacement, but the advantage is that the output is an object that allows you to easily process the matches that are found.

The drawback of Grep-style searching is that it is line-oriented, and so it is more useful for words than long strings. This is a drawback for searching SQL because it isn’t at all line oriented. We can’t, for example, use it for finding ALTER statements for a particular object because the ALTER statement might be on a different line to the name of the object. The best you can do is to look at the context, the lines before and after.

We can show how you can use Select-string for simple searches, though. Imagine that we need to find out how the discounts table has been altered over the run of SQL migrations.

<Path-To>\V1.1.1__Initial_Build.sql:129:PRINT N'Creating 
[dbo].[discounts]'
<Path-To>\V1.1.1__Initial_Build.sql:131:CREATE TABLE 
[dbo].[discounts]
<Path-To>\V1.1.1__Initial_Build.sql:422:PRINT N'Adding foreign keys 
to [dbo].[discounts]'
<Path-To>\V1.1.1__Initial_Build.sql:424:ALTER TABLE 
[dbo].[discounts] ADD CONSTRAINT [FK__discounts__stor___173876EA] FOREIGN KEY ([stor_id]) REFERENCES [dbo].[stores] 
([stor_id])
<Path-To>\V1.1.2__Pubs_Original_Data.sql:1501:PRINT 'Now at the 
inserts to discounts ....'
<Path-To>\V1.1.2__Pubs_Original_Data.sql:1505:INSERT discounts VALUES
<Path-To>\V1.1.3__UseNVarcharetc.sql:75:PRINT N'Altering 
[dbo].[discounts]'
<Path-To>\V1.1.3__UseNVarcharetc.sql:79:ALTER TABLE 
[dbo].[discounts] ADD
…and so on…

This is fine. We can add the context parameter to get to see the lines before and after

<Path-To>\V1.1.1__Initial_Build.sql:127:
  <Path-To>\V1.1.1__Initial_Build.sql:128:GO
> <Path-To>\V1.1.1__Initial_Build.sql:129:PRINT N'Creating 
[dbo].[discounts]'
  <Path-To>\V1.1.1__Initial_Build.sql:130:GO
> <Path-To>\V1.1.1__Initial_Build.sql:131:CREATE TABLE 
[dbo].[discounts]
  <Path-To>\V1.1.1__Initial_Build.sql:132:(
  <Path-To>\V1.1.1__Initial_Build.sql:133:[discounttype] [varchar] 
(40) COLLATE Latin1_General_CI_AS NOT NULL, 
…

OK, it works, but it isn’t that accurate. Already, in seeing the context, you will see a problem. It will pick up any mention of the table, and it doesn’t allow you to specify the schema, because, in SQL Server, there might, or might not, be square brackets.

We can make a bit of progress by picking up the name of the schema, which will give you the same result:

However, we need more than Grep!

More advanced Regex searches

PowerShell offers a way of constructing a more powerful search that scans the whole file at once to find objects. We want the result of every match, so we need to dip into the .NET library and use the Matches method.

That’s fine, but if we want the search to return not just the objects themselves but what was being done to them, we also need to know the action (CREATE, ALTER or DROP, for example)? No problem because regexes are pretty extendible!

Search a script for changes to a table

Here is a regex that illustrates how we do this. To keep things simple, we will just locate the string that contains the action statement (‘create’, ‘alter’ or ‘drop’) for the discounts table, but there could be a block comment or a line comment separating the CREATE TABLE statement and the word discounts.

We found 'CREATE TABLE /* it could have comments */
[dbo].[discounts]'

With just a string, it is easy to try things out by altering the code to see what happens.

Right. We have a viable method for finding where the discount table was created, dropped or altered. I also will let you view the $AllMatches object.

Search a set of migration files

The next stage is to try this out on a group of files:

Sorry, no matches in V1.1.10__AddAddressesPhonesEtc.sql
Sorry, no matches in V1.1.11__AddProcedureWithTest.sql
Sorry, no matches in V1.1.12__AddTestData.sql
We found 'CREATE TABLE [dbo].[discounts]' in V1.1.1__Initial_Build.sql
We found 'ALTER TABLE [dbo].[discounts]' in V1.1.1__Initial_Build.sql
Sorry, no matches in V1.1.2__Pubs_Original_Data.sql
We found 'ALTER TABLE [dbo].[discounts]' in V1.1.3__UseNVarcharetc.sql
We found 'ALTER TABLE [dbo].[discounts]' in V1.1.3__UseNVarcharetc.sql
We found 'ALTER TABLE [dbo].[discounts]' in V1.1.3__UseNVarcharetc.sql
We found 'ALTER TABLE [dbo].[discounts]' in V1.1.4__RenameConstraintsAdd tables.sql
We found 'ALTER TABLE [dbo].[discounts]' in V1.1.4__RenameConstraintsAdd tables.sql
Sorry, no matches in V1.1.5__Add_New_Data.sql
Sorry, no matches in V1.1.6__Add_Tags.sql
Sorry, no matches in V1.1.7__Add_Indexes.sql
Sorry, no matches in V1.1.8__AddEditions.sql
Sorry, no matches in V1.1.9__AddconditionalVersion.sql

We can, if your object names are unique, search for a list of objects. We just need to alter the regex to replace the word discounts with a list of tables (authors|discounts|publications, for example), and specify any schema.

Using named capturing groups to find create, alter or drop statements for a table

However, we can do better than this if we use named capturing groups to pick up the values of the action (CREATE, ALTER or DELETE), table and schema. We could include comments that follow the name of the table, but that’s not necessary at this stage. With a refinement of this technique, we can get all sorts of further information about the table in this way, which is very easily searched.

You’ll see that we’ve created a named capturing group for the for the SQL action (?<Action>) and for the schema and the object. This would enable us to filter by both schema and object in a search for the objects in which we are interested, and even to search just for deletion. We have, admittedly, written quite an opaque regex, but it allows more accurate and powerful searches.

Here it is in simple use:

Here’s the report for our sample Pubs Flyway project. On the display, I’ve ordered it by the name of the object:

Report of which objects changed in which migration files

Searching Flyway migrations in version order

So far, we’ve just looked at all the files in a single directory. Now, if you are working with a Flyway migration, you might want to see what objects were altered to take us up to the current version, or what is in future migrations. You might also want to see the migrations in version order.

The biggest problem is that these Flyway migrations could be in a list of paths, specified in configuration settings within several different config files. Rather than getting the files from the directories, we can get them from Flyway, by using the Flyway info command.

It is only a simple modification of the technique, but it saves a lot of effort when dealing with a Flyway migration. In fact, I’d say that getting the info object from Flyway was the only realistic way. To tidy up the listings, I’ve removed the square-bracket delimiters from the final listing (these are only in the SQL Server dialect- otherwise it is double-quotes).

For the typical search task, you probably won’t want to look through migrations that haven’t yet been applied, so I’ve added a filter that ensures that only the migration files that have been applied to the database are included in the search via the line …

…but it is easy to alter this if you want to do other types of search.

With the result…

Narrative summary of changes in a set of versioned flyway migration files

You’ll notice that the ALTER statements are filtered first. That’s just the Grid view being perverse. The order that emerges from the pipeline is:

Action Object  Schema Name              filename                                Version
------ ------  ------ ----              --------                                -------
CREATE TABLE   dbo    employee          V1.1.1__Initial_Build.sql               1.1.1  
ALTER  TABLE   dbo    employee          V1.1.1__Initial_Build.sql               1.1.1  
CREATE TABLE   dbo    jobs              V1.1.1__Initial_Build.sql               1.1.1  
ALTER  TABLE   dbo    jobs              V1.1.1__Initial_Build.sql               1.1.1  
CREATE TRIGGER dbo    employee_insupd   V1.1.1__Initial_Build.sql               1.1.1  
CREATE TABLE   dbo    stores            V1.1.1__Initial_Build.sql               1.1.1  
ALTER  TABLE   dbo    stores            V1.1.1__Initial_Build.sql               1.1.1  
CREATE TABLE   dbo    discounts         V1.1.1__Initial_Build.sql               1.1.1  
CREATE TABLE   dbo    publishers        V1.1.1__Initial_Build.sql               1.1.1  
ALTER  TABLE   dbo    publishers        V1.1.1__Initial_Build.sql               1.1.1  
CREATE TABLE   dbo    pub_info          V1.1.1__Initial_Build.sql               1.1.1  
ALTER  TABLE   dbo    pub_info          V1.1.1__Initial_Build.sql               1.1.1  
CREATE TABLE   dbo    titles            V1.1.1__Initial_Build.sql               1.1.1  
ALTER  TABLE   dbo    titles            V1.1.1__Initial_Build.sql               1.1.1  
CREATE TABLE   dbo    roysched          V1.1.1__Initial_Build.sql               1.1.1  
CREATE TABLE   dbo    sales             V1.1.1__Initial_Build.sql               1.1.1  
ALTER  TABLE   dbo    sales             V1.1.1__Initial_Build.sql               1.1.1  
CREATE TABLE   dbo    authors           V1.1.1__Initial_Build.sql               1.1.1

Allowing for embedded comments in the code

So, when we allow for embedded comments within block comments, allow double quote characters to delimit a name that has illegal characters, and capture any comments, we finally end up with a regex that looks like this, which is probably as far as I’d want to go.

Find out what objects changed and where

I’ve taken the above Regex, which will will find CREATE, ALTER or DROP statements for INDEX, TABLE, TRIGGER, VIEW or FUNCTION in the SQL code within Flyway migration files, and incorporated it into a function called Find-MigrationSQLCode. This function will allow you to search between any two versions to find out what changed where. It gets the list of successful migration files and uses these in the correct order, searching each one for the strings specified by the Regex.

One might think that it would be easier to just run simple GREP-style searches, using the location you’ve defined for your files, but you really only want the successfully executed ones, up to the current migration, all nicely sorted. Flyway locations are actually lists that can be found in several different flyway.conf files or even included in an environment variable.
All we need to do is change to the right working directory and execute this PowerShell, specifying the starting version as 1.1.6:

Conclusions

We’ve got a cross-RDBMS way of searching through SQL files to get a summary of what happened within each file. Getting these summary reports, even from a set of SQL migrations, isn’t hard. I’ve just given the basics here, but you’re bound to look at this and find things you don’t like, or you might think of extra information you need. That is the joy of automation in database development: you can get things working exactly as you like them. For a start, I’d want to pull in any documentation within that CREATE table clause and the name of the table, and add it to the summary, so that it explains why it did the action. No great problem there.

Yes, Regex is ‘terra incognito’ for many database developers. You use LIKE wildcards, sure, but Regex is less popular. There is good reason for this: a complex regex can sometimes miss a string you are searching for, and it may not be immediately obvious why. From my own experience as a card-carrying dyslexic, I need a good Regex IDE such as Regex Buddy to make it easy and testable, but Regex really is the only game in town for shredding and mangling the more complex text data. Any Developer or DBA who is doing the import of text data really needs to be confident with it. It is certainly a useful tool in automating DevOps processes.

Tools in this post

Flyway

DevOps for the Database

Find out more