Searching a Flyway Database
As a database gets larger, and development more complex, so it becomes increasingly necessary to be able to search for strings in the source files and the database itself. Maybe you need to find when a table first got created, when a foreign key was added, or to find out which tables lack documentation. I'll show you how to answer these sorts of questions by running simple 'wildcard' searches on your Flyway migration files, or source files, as well as more targeted searches on certain parts of your database model.
To home in on detail, you have source control of course, but often you just want a broader picture. You may be fixing a problem or running checks before refactoring. You might want to find any tables that refer to an entity such as an ’employee’, or to remove some tables or views that seem superfluous but might still be used by one or two legacy procedures or functions (soft references). You might need to investigate a column referred to in an obscure error message, or a reference to an object in a rogue query plan. It is hard to predict what, precisely, you’re likely to need to know about the database, so you need an ad-hoc solution to finding information, as if you were doing a Google search.
Where do you want to search?
It very much depends on what you are searching for and why. If you are interested in the present database, as a working system then you’ll want to search the names of objects in the current version. If you’re interested in how it got to its present state, then you’ll want to through all past migrations, or just recent versions. In either case, you’d probably want to look through the comments and documentation.
if a build script exists for the current version that’s where I’d generally start. A build script is very much in the comfort zone of the average database developer. It is very easily searched and is especially useful if it includes documentation blocks (/*
and */
) and the code of comments attached to objects (extended properties in SQL Server, comments in MySQL and PostgreSQL).
Searching Flyway migration files
In Flyway, it is good to be able to explore your migration files, because this will allow you to search through the whole history of changes. This is especially helpful when the migrations have the /* */
documentation blocks that explain what the code does and why.
We can find all the migrations files for the current Flyway database, as specified by the Flyway locations. If you’re a Flyway user, you’ll already know the locations of the migration files for the current database so you can simply search these locations by supplying the paths as string literals. For a reliable and more general solution, we’d need to interrogate the Flyway configuration files.
The locations in the flyway.conf files consist of a list of file locations, which can be absolute or relative file references and can include wildcards. Every time Flyway is executed, it checks for new SQL-based migrations at runtime, by scanning the filesystem and Java classpath. It will pick up any new SQL migrations as long as they conform to the configured naming convention. Any scan of these files is recursive. All migrations in non-hidden directories below the specified ones must be picked up as well. To search all the migrations, this sort of brute-force search can take quite a while with a large project. We also need to convert the list of file locations into an array of absolute paths, without Flyway’s ‘filesystem:
‘, ‘classpath:
‘, ‘gcs:
‘ or ‘s3:
‘ qualifiers because they are incomprehensible to PowerShell.
If you’re using my Flyway Teamwork Framework, it does all this hard work for you. It scans the settings for the flyway.conf in both the user and current flyway directories, as well as environment settings, taking the last assignment only. It takes the list of Flyway locations, resolves the wildcards and relative filepaths, and delivers all the information as a value (an array) to $dbDetails.migrationsLocation
, within the $dbDetails
hashtable, when you execute preliminary.ps1.
Searching source scripts, models, documentation
The most common pain point for teams, when moving to a Flyway database development, seems to be the lack an easy way to show just the current state of the database.
Fortunately, when working with my Flyway Teamwork framework, you can generate and save a lot of useful ‘build artifacts’, for every successful migration run. This includes object-level source scripts (the $CreateScriptFoldersIfNecessary
task), build scripts ($CreateBuildScriptIfNecessary
), a JSON database model($SaveDatabaseModelIfNecessary
) and JSON-based database documentation ($ExecuteTableDocumentationReport
). We can then do plenty of useful searches on these files, especially with RDBMSs such as MySQL that keep comments with the table source.
My previous article, Simple Reporting with Flyway and Database Models, showed how to generate the database model and run simple reports to help you understand the structure of your databases. In this article, I’ll expand these techniques to include searches on these other artefacts. I’ll demo how to, for example, inspect and search through many directories of migration files for where that pesky duplicate index was created, or for the block comment that explains that mysterious view that is causing trouble.
Please refer to the above article for the basics of setting up a project within Flyway teamwork framework. You can get all the files you need from the Teamwork project on GitHub. To try out the examples in this article, you can run the following code to generate the required artefacts for each new database version created by a Flyway migration run:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
cd MyPathToTheFlywayProjectFolder . '.\preliminary.ps1' Write-Output @" Processing the $($dbDetails.variant) variant of $($dbDetails.branch) branch of the $($dbDetails.project) project using $($dbDetails.database) database on $($dbDetails.server) server with user $($dbDetails.installedBy)" "@ $PostMigrationTasks = @( $GetCurrentVersion, #checks the database and gets the current version number #it does this by reading the Flyway schema history table. $SaveDatabaseModelIfNecessary, #Build a JSON model of the database $CreateScriptFoldersIfNecessary, $ExecuteTableDocumentationReport ) @('1.1.1', '1.1.2', '1.1.3','1.1.4', '1.1.5', '1.1.6', '1.1.7', '1.1.8') | foreach{ Flyway migrate "-target=$($_)" # Migrates the database to a particular version if (!($?)) {break} Process-FlywayTasks $dbDetails $PostMigrationTasks } |
Simple GREP-style searches
We’ll start with a few simple searches that use Select-string
, which is PowerShell’s equivalent to GREP, to search our migration scripts, line-by-line, for a particular string. I’ll also demo how to use [IO.File]::ReadAllText
to return the contents of a JSON doc as a string and then search it.
Searching migration scripts for a string
For this first example, we’ll use Select-string
to search for the string ‘emp‘ in our migration files. Anticipating that we might hit a file doing a large insert into a table with the string ’emp’ in the name, we’ll take the precaution of only including a set number of ‘hits’ per file.
1 2 3 4 5 6 7 |
$pattern='emp' #the string to search for $dbDetails.migrationsLocation | foreach { # for each location listed dir "$_\*.*" -Recurse|foreach { #for each location listed $_ | Select-String $pattern | select -first 5} #a sensible limit } if ($dbDetails.migrationsLocation -eq $null) {throw 'The MigrationsLocation value in the hashtable is missing. Please use latest version of preliminary.ps1'} |
As explained earlier, I’m getting the list of locations where migration files might be from the $dbDetails
hashtable, which is provided by the Teamwork framework when you execute preliminary.ps1.
I’ll just show a few hits without worrying too much about display at this point…
Migrations\afterMigrate__ApplyTableDescriptions.sql:8: ('dbo.employee', 'An employee of any of the publishers'), Migrations\afterMigrate__ApplyTableDescriptions.sql:49:( N'dbo.employee', N'TABLE', N'emp_id', N'The key to the Employee Table' ), Migrations\afterMigrate__ApplyTableDescriptions.sql:50:( N'dbo.employee', N'TABLE', N'fname', N'first name' ), Migrations\afterMigrate__ApplyTableDescriptions.sql:51:( N'dbo.employee', N'TABLE', N'minit', N'middle initial' ), Migrations\afterMigrate__ApplyTableDescriptions.sql:52:( N'dbo.employee', N'TABLE', N'lname', N'last name' ), Migrations\U1.1.1__Undo_Initial_Build.sql:34:-- Dropping dbo.employeI.. Migrations\U1.1.1__Undo_Initial_Build.sql:35: DROP TABLE IF EXISTS dbo.employee; Migrations\U1.1.2__Undo_Pubs_Original_Data.sql:20: WHERE TABLE_NAME='employee' Migrations\U1.1.2__Undo_Pubs_Original_Data.sql:21: AND CONSTRAINT_NAME='fk_Employee_Pubs_JobID')) then Migrations\U1.1.2__Undo_Pubs_Original_Data.sql:22: ALTER TABLE employee DROP FOREIGN KEY fk_Employee_Pubs_JobID; Migrations\U1.1.2__Undo_Pubs_Original_Data.sql:26: WHERE TABLE_NAME='employee' Migrations\U1.1.2__Undo_Pubs_Original_Data.sql:27: AND CONSTRAINT_NAME='fk_Employee_publishers_pub_id')) then Migrations\U1.1.4__Undo_RenameConstraintsAdd tables.sql:36: WHERE TABLE_NAME='employee' Migrations\U1.1.4__Undo_RenameConstraintsAdd tables.sql:37: AND CONSTRAINT_NAME='fk_Employee_publishers_pub_id' AND constraint_SCHEMA = DATABASE())) then Migrations\U1.1.4__Undo_RenameConstraintsAdd tables.sql:38: ALTER TABLE dbo.employee DROP FOREIGN KEY fk_Employee_publishers_pub_id; Migrations\U1.1.4__Undo_RenameConstraintsAdd tables.sql:46: WHERE TABLE_NAME='employee' Migrations\U1.1.4__Undo_RenameConstraintsAdd tables.sql:47: AND CONSTRAINT_NAME='fk_Employee_JobID' AND constraint_SCHEMA = DATABASE())) then
You can use regex expressions in the pattern. In fact, since Select-string
was designed to emulate GREP, it treats all strings we pass to it as regexes.
Select-String
is based on searching lines of text. Unless you specify otherwise, it finds the first match in each line and, for each match, it returns the file name, line number, and all text in the line containing the match. You can fine-tune this to find several matches per line or display the context (the text before and after the match).
With this, you can find out a great deal about the past, but what if your quest is about a particular version of the database? For example, what if you want to search only the current version of the database, for objects, such as tables and views, that contain a particular string?
Searching object source scripts
Here, I’m looking through the generated object-level source, just listing every line that matches my search. Again, I’m using my framework so that I can pull the location of the current version of the database from there.
1 2 3 4 5 6 7 8 9 10 |
$pattern='emp' #the string to search for (regex search) #find the object-level source for the current version Process-FlywayTasks $DBDetails $GetCurrentVersion $SearchLocation="$($DBDetails.reportLocation)\$($DBDetails.version)\Source" If (!(Test-Path -Path $SearchLocation)) { Throw "$SearchLocation does not exist: you need an object-level source to do this" } dir "$SearchLocation\*.sql" -Recurse|Select-String $pattern -Context 2, 3 #-context 2,3 means show the two lines before and three after |
Here’s the output:
Versions\1.1.7\Source\table\dbo.employee.sql:1:/*!40101 SET @saved_cs_client = @@character_set_client */; Versions\1.1.7\Source\table\dbo.employee.sql:2:/*!40101 SET character_set_client = utf8 */; > Versions\1.1.7\Source\table\dbo.employee.sql:3:CREATE TABLE 'employee' ( > Versions\1.1.7\Source\table\dbo.employee.sql:4: 'emp_id' char(9) NOT NULL COMMENT 'The key to the Employee Table', Versions\1.1.7\Source\table\dbo.employee.sql:5: 'fname' varchar(20) NOT NULL COMMENT 'first name', Versions\1.1.7\Source\table\dbo.employee.sql:6: 'minit' char(1) DEFAULT NULL COMMENT 'middle initial', Versions\1.1.7\Source\table\dbo.employee.sql:7: 'lname' varchar(30) NOT NULL COMMENT 'last name', > Versions\1.1.7\Source\table\dbo.employee.sql:8: 'job_id' smallint(6) NOT NULL DEFAULT 1 COMMENT 'the job that the employee does', Versions\1.1.7\Source\table\dbo.employee.sql:9: 'job_lvl' smallint(6) DEFAULT 10 COMMENT 'the job level', > Versions\1.1.7\Source\table\dbo.employee.sql:10: 'pub_id' char(8) NOT NULL COMMENT 'the publisher that the employee works for', > Versions\1.1.7\Source\table\dbo.employee.sql:11: 'hire_date' datetime NOT NULL DEFAULT curdate() COMMENT 'the date that the mployee was hired', > Versions\1.1.7\Source\table\dbo.employee.sql:12: PRIMARY KEY ('emp_id' ), Versions\1.1.7\Source\table\dbo.employee.sql:13: KEY ' Jobid_index' ('job_id' ), Versions\1.1.7\Source\table\dbo.employee.sql:14: KEY ' pub_id_index' ('pub_id' ), > Versions\1.1.7\Source\table\dbo.employee.sql:15: CONSTRAINT ' fk_Employee_JobID' FOREIGN KEY (' job_id' ) REFERENCES ' jobs' ('job_id' ), > Versions\1.1.7\Source\table\dbo.employee.sql:16: CONSTRAINT ' fk_Employee_Pubs_JobID' FOREIGN KEY ('job_id' ) REFERENCES ' jobs' (' job_id' ), > Versions\1.1.7\Source\table\dbo.employee.sql:17: CONSTRAINT ' fk_Employee_publishers_pub_id' FOREIGN KEY ('pub_id' ) REFERENCES ' publishers' ('pub_id' ), > Versions\1.1.7\Source\table\dbo.employee.sql:18: CONSTRAINT ' CK_emp_id' CHECK (' emp_id' regexp '[A-Z][A-Z][A-Z][1-9][0-9][0-9][0-9][0-9][FM]' or ' emp_id' regexp '[A-Z]-[A-Z][1-9][0-9][0-9][0-9][0-9][FM]') > Versions\1.1.7\Source\table\dbo.employee.sql:19:) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COMMENT='An employee of any of the publishers'; Versions\1.1.7\Source\table\dbo.employee.sql:20:/*!40101 SET character_set_client = @saved_cs_client */;
Searching the database model
We’ve just shown a search through the current database source. With the Flyway Teamwork framework, there are plenty of other artefacts to search.
Here we search the database model (a JSON representation of a database), for the current database version. Again, we’re looking for objects that contain lines with the string ‘emp
‘. The [IO.File]::ReadAllText
method returns the content of the JSON model as a string object, which we then search, using a simple wildcard search:
1 2 3 4 5 |
$pattern='emp' #the string to search for (regex search) Process-FlywayTasks $DBDetails $GetCurrentVersion $model = [IO.File]::ReadAllText("$($DBDetails.reportLocation)\$($DBDetails.version)\Reports\DatabaseModel.JSON") | convertfrom-json Display-object $model | where value -like "*$pattern*"| Out-GridView -Title "Searching $($DBDetails.ProjectName) $($DBDetails.version) for lines with the string '$pattern' in them " |
Searching documentation
Similarly, we can scan through the JSON documentation file containing the column comments and both the column and table to which they belong:
1 2 3 4 5 6 7 8 |
$pattern='emp' #the string to search for (regex search) Process-FlywayTasks $DBDetails $GetCurrentVersion $doc = [IO.File]::ReadAllText("$($DBDetails.reportLocation)\$($DBDetails.version)\Reports\TableDocumentation.JSON") | convertfrom-json $doc | foreach{ $TheTable = $_.TableObjectName; $_.TheColumns } | foreach{ $_.psobject.Properties } | where { "$($_.Name)$($_.Value)" -like "*$pattern*" } | foreach{ @{ 'Table' = $TheTable; $_.Name = $_.Value } } | Out-GridView -Title "Searching table comments for lines with the string '$pattern' in them " |
Regex searches: looking for blocks of text
We’ve managed to get by this far by either using simple wildcard matching, or by searching text in files, line by line, looking for string matches. We can, of course, do more with a Regex-based search to return ‘blocks’ of text, such as comment blocks or CREATE
TABLE
statements. These take several lines in a file so we can’t use Select-string
, which will only find a string on a single line.
Searching migration scripts for comment blocks
Imagine, that we needed to search all our migration files fort all occurrences of a comment block. We’ll use a regex for this example, to find a multi-line comment.
1 |
$pattern= '/\*((?s:.)*?)\*/' #find a comment block |
We might think of using select-string
again, but this is for line-based strings, and comment clocks are multi-line. We could do this, which is serviceable (the file path is rather long for display, so we’ve chosen to just use the filename):
1 2 3 4 5 6 7 8 9 10 11 12 13 |
$DBDetails.migrationsLocation | foreach { #for each location listed dir "$_\*.*" -Recurse | foreach { #for every migration file in the directories $currentFile = $_; #remember the name of the file [IO.File]::ReadAllText("$_") | foreach{ #read the entire text [regex]::Matches($_, $pattern) # Find all matches } | foreach{ #display each match [pscustomobject]@{ 'File' = (Split-Path $currentFile -Leaf); 'comment' = $_.Value; }; } } } | Out-GridView -Title "Searching Flyway migrations for comment blocks with '$pattern'" |
Naturally, if you wish to look for a particular string within the comment blocks, you just amend the regex string accordingly.
Searching a build script for CREATE TABLE statements
It isn’t just comment blocks. You can search through scripts for create statements, of course. Here we search a PostgreSQL build script to print out all the CREATE
TABLE
statements
1 2 3 4 5 6 7 8 |
$pattern = 'CREATE TABLE(?s:.)+?(;\n|GO)' $BuildScriptPath = "<pathToTheFile>" $BuildScript = [IO.File]::ReadAllText($buildScriptPath) [regex]::Matches($BuildScript, $pattern) | # Find all matches foreach{ #display each match $_.Value; }| Out-GridView -Title "Searching build script for CREATE TABLE statements with '$pattern'" |
Targeted searches using the database model
There are many ways of representing a database, and each scheme you use requires different logic to query it. Databases hold a relational model of databases in database tables, which are then manifested as INFORMATION_SCHEMA
views. Flyway Teamwork uses this to provide consistent JSON-based models for the most popular RDBMSs that have an information-schema.
We’ve already seen how to search for simple strings in the database model, using wildcards, but we can do more. Sometimes we have more targeted questions such as, ‘Which indexes are duplicated?‘, or ‘Which tables have no primary keys?‘ or ‘What tables reference the customer table?‘. To answer them, we need to be able to search only a specific part the JSON database model.
Exploring foreign key references
Here is an example that uses the same model that we used earlier on. This time we’ll look at all the foreign keys and find out what tables are using them and what they are referencing:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
Process-FlywayTasks $DBDetails $GetCurrentVersion #get the current version of the database $model = [IO.File]::ReadAllText("$($DBDetails.reportLocation)\$($DBDetails.version)\Reports\DatabaseModel.JSON") | convertfrom-json #read in the model of the current version of the database Display-Object $model -reportNodes $true | #get all the nodes in the model... where path -like '*.foreign key.*' | foreach{ # ... that describe the details of foreign keys $node = $_; #get every description of a foreign key $ThePath = $node.Path.split('.'); [pscustomObject]@{ #create a result that describes the referrer and table referred to 'Referrer' = $ThePath[1] + '.' + $ThePath[3]; 'References' = $node.Value.'Foreign Table'; } } | out-gridview -Title "All tables that reference others and the tables they reference" |
OK, if you have direct access to the current version of the database, then you can get this information directly from the database. However, once you have the code, it is dead easy this way, and you can query any version of the database for which you’ve stored a model.
Getting details of primary keys
There are several other things you can query. Here is a list of all tables with primary keys, the key name and the column(s) they use:
1 2 3 4 5 6 7 8 9 10 |
Display-Object $model -reportNodes $true | #get all the nodes in the model... where path -like '$.*.table.*.primary key.*' | foreach{ # ... that describe the details of foreign keys $node = $_; #get every description of a foreign key $ThePath = $node.Path.split('.'); [pscustomObject]@{ #create a result that describes the referrer and table referred to 'Table' = $ThePath[1] + '.' + $ThePath[3]; 'Key Name' = $ThePath[5]; 'column(s)' = $node.Value; } } | out-gridview -Title "tables with primary keys, and their column(s)" |
OK. Here is an opportunity. You can run some checks to find those tables that have no primary key, or the ones that have any indexes that are duplicates or near-duplicates. Perhaps you need to be alerted if the combination of key columns aren’t indexed. You’d want to know if a table had no enforcement of uniqueness at all. It is all possible by simple queries to the model.
Summary
If you’re like me, you’ll hate having to do repetitive tasks. If the chores are automated as much as possible, I’m much more likely to make sure they are done and to use them. This applies to searching and analyzing databases. If I can do searches reasonably easily, I’m much more likely to make certain rather than take a risk. I’m increasingly spending almost as much time with PowerShell as with the various database IDEs such as SSMS and HeidiSQL.
Searching scripts and analyzing models can take away much of the shock of moving from the comfort zone of your favored way of working with databases. Whether you prefer a Migrations-led approach or a state-based methodology, and you change to accommodate both, then you can soon get over the panic of being short of information about the database if you make the process of generating scripts and database information easier and less intrusive. You can than see clearly what is going on with the development, and who’s doing it.