Searching a Flyway Database

As a database gets larger, and development more complex, so it becomes increasingly necessary to be able to search for strings in the source files and the database itself. Maybe you need to find when a table first got created, when a foreign key was added, or to find out which tables lack documentation. I'll show you how to answer these sorts of questions by running simple 'wildcard' searches on your Flyway migration files, or source files, as well as more targeted searches on certain parts of your database model.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

To home in on detail, you have source control of course, but often you just want a broader picture. You may be fixing a problem or running checks before refactoring. You might want to find any tables that refer to an entity such as an ’employee’, or to remove some tables or views that seem superfluous but might still be used by one or two legacy procedures or functions (soft references). You might need to investigate a column referred to in an obscure error message, or a reference to an object in a rogue query plan. It is hard to predict what, precisely, you’re likely to need to know about the database, so you need an ad-hoc solution to finding information, as if you were doing a Google search.

Where do you want to search?

It very much depends on what you are searching for and why. If you are interested in the present database, as a working system then you’ll want to search the names of objects in the current version. If you’re interested in how it got to its present state, then you’ll want to through all past migrations, or just recent versions. In either case, you’d probably want to look through the comments and documentation.

if a build script exists for the current version that’s where I’d generally start. A build script is very much in the comfort zone of the average database developer. It is very easily searched and is especially useful if it includes documentation blocks (/* and */) and the code of comments attached to objects (extended properties in SQL Server, comments in MySQL and PostgreSQL).

Searching Flyway migration files

In Flyway, it is good to be able to explore your migration files, because this will allow you to search through the whole history of changes. This is especially helpful when the migrations have the /* */ documentation blocks that explain what the code does and why.

We can find all the migrations files for the current Flyway database, as specified by the Flyway locations. If you’re a Flyway user, you’ll already know the locations of the migration files for the current database so you can simply search these locations by supplying the paths as string literals. For a reliable and more general solution, we’d need to interrogate the Flyway configuration files.

The locations in the flyway.conf files consist of a list of file locations, which can be absolute or relative file references and can include wildcards. Every time Flyway is executed, it checks for new SQL-based migrations at runtime, by scanning the filesystem and Java classpath. It will pick up any new SQL migrations as long as they conform to the configured naming convention. Any scan of these files is recursive. All migrations in non-hidden directories below the specified ones must be picked up as well. To search all the migrations, this sort of brute-force search can take quite a while with a large project. We also need to convert the list of file locations into an array of absolute paths, without Flyway’s ‘filesystem:‘, ‘classpath:‘, ‘gcs:‘ or ‘s3:‘ qualifiers because they are incomprehensible to PowerShell.

If you’re using my Flyway Teamwork Framework, it does all this hard work for you. It scans the settings for the flyway.conf in both the user and current flyway directories, as well as environment settings, taking the last assignment only. It takes the list of Flyway locations, resolves the wildcards and relative filepaths, and delivers all the information as a value (an array) to $dbDetails.migrationsLocation, within the $dbDetails hashtable, when you execute preliminary.ps1.

Searching source scripts, models, documentation

The most common pain point for teams, when moving to a Flyway database development, seems to be the lack an easy way to show just the current state of the database.

Fortunately, when working with my Flyway Teamwork framework, you can generate and save a lot of useful ‘build artifacts’, for every successful migration run. This includes object-level source scripts (the $CreateScriptFoldersIfNecessary task), build scripts ($CreateBuildScriptIfNecessary), a JSON database model($SaveDatabaseModelIfNecessary) and JSON-based database documentation ($ExecuteTableDocumentationReport). We can then do plenty of useful searches on these files, especially with RDBMSs such as MySQL that keep comments with the table source.

My previous article, Simple Reporting with Flyway and Database Models, showed how to generate the database model and run simple reports to help you understand the structure of your databases. In this article, I’ll expand these techniques to include searches on these other artefacts. I’ll demo how to, for example, inspect and search through many directories of migration files for where that pesky duplicate index was created, or for the block comment that explains that mysterious view that is causing trouble.

Please refer to the above article for the basics of setting up a project within Flyway teamwork framework. You can get all the files you need from the Teamwork project on GitHub. To try out the examples in this article, you can run the following code to generate the required artefacts for each new database version created by a Flyway migration run:

Simple GREP-style searches

We’ll start with a few simple searches that use Select-string, which is PowerShell’s equivalent to GREP, to search our migration scripts, line-by-line, for a particular string. I’ll also demo how to use [IO.File]::ReadAllText to return the contents of a JSON doc as a string and then search it.

Searching migration scripts for a string

For this first example, we’ll use Select-string to search for the string ‘emp‘ in our migration files. Anticipating that we might hit a file doing a large insert into a table with the string ’emp’ in the name, we’ll take the precaution of only including a set number of ‘hits’ per file.

As explained earlier, I’m getting the list of locations where migration files might be from the $dbDetails hashtable, which is provided by the Teamwork framework when you execute preliminary.ps1.

I’ll just show a few hits without worrying too much about display at this point…

Migrations\afterMigrate__ApplyTableDescriptions.sql:8:  ('dbo.employee', 'An employee of any of the publishers'),
Migrations\afterMigrate__ApplyTableDescriptions.sql:49:( N'dbo.employee', N'TABLE', N'emp_id', N'The key to the Employee Table' ), 
Migrations\afterMigrate__ApplyTableDescriptions.sql:50:( N'dbo.employee', N'TABLE', N'fname', N'first name' ), 
Migrations\afterMigrate__ApplyTableDescriptions.sql:51:( N'dbo.employee', N'TABLE', N'minit', N'middle initial' ), 
Migrations\afterMigrate__ApplyTableDescriptions.sql:52:( N'dbo.employee', N'TABLE', N'lname', N'last name' ), 
Migrations\U1.1.1__Undo_Initial_Build.sql:34:-- Dropping dbo.employeI..
Migrations\U1.1.1__Undo_Initial_Build.sql:35:  DROP TABLE  IF EXISTS  dbo.employee;
Migrations\U1.1.2__Undo_Pubs_Original_Data.sql:20:            WHERE TABLE_NAME='employee' 
Migrations\U1.1.2__Undo_Pubs_Original_Data.sql:21:            AND CONSTRAINT_NAME='fk_Employee_Pubs_JobID')) then 
Migrations\U1.1.2__Undo_Pubs_Original_Data.sql:22:        ALTER TABLE employee DROP FOREIGN KEY  fk_Employee_Pubs_JobID;
Migrations\U1.1.2__Undo_Pubs_Original_Data.sql:26:            WHERE TABLE_NAME='employee' 
Migrations\U1.1.2__Undo_Pubs_Original_Data.sql:27:            AND CONSTRAINT_NAME='fk_Employee_publishers_pub_id')) then 
Migrations\U1.1.4__Undo_RenameConstraintsAdd tables.sql:36:            WHERE TABLE_NAME='employee' 
Migrations\U1.1.4__Undo_RenameConstraintsAdd tables.sql:37:            AND CONSTRAINT_NAME='fk_Employee_publishers_pub_id' AND constraint_SCHEMA = DATABASE())) 
then 
Migrations\U1.1.4__Undo_RenameConstraintsAdd tables.sql:38:    ALTER TABLE dbo.employee DROP FOREIGN KEY fk_Employee_publishers_pub_id;
Migrations\U1.1.4__Undo_RenameConstraintsAdd tables.sql:46:            WHERE TABLE_NAME='employee' 
Migrations\U1.1.4__Undo_RenameConstraintsAdd tables.sql:47:            AND CONSTRAINT_NAME='fk_Employee_JobID' AND constraint_SCHEMA = DATABASE())) then  

You can use regex expressions in the pattern. In fact, since Select-string was designed to emulate GREP, it treats all strings we pass to it as regexes.

Select-String is based on searching lines of text. Unless you specify otherwise, it finds the first match in each line and, for each match, it returns the file name, line number, and all text in the line containing the match. You can fine-tune this to find several matches per line or display the context (the text before and after the match).

With this, you can find out a great deal about the past, but what if your quest is about a particular version of the database? For example, what if you want to search only the current version of the database, for objects, such as tables and views, that contain a particular string?

Searching object source scripts

Here, I’m looking through the generated object-level source, just listing every line that matches my search. Again, I’m using my framework so that I can pull the location of the current version of the database from there.

Here’s the output:

   Versions\1.1.7\Source\table\dbo.employee.sql:1:/*!40101 SET @saved_cs_client     = @@character_set_client */;
  Versions\1.1.7\Source\table\dbo.employee.sql:2:/*!40101 SET character_set_client = utf8 */;
> Versions\1.1.7\Source\table\dbo.employee.sql:3:CREATE TABLE 'employee' (
> Versions\1.1.7\Source\table\dbo.employee.sql:4:  'emp_id' char(9) NOT NULL COMMENT 'The key to the Employee Table',
  Versions\1.1.7\Source\table\dbo.employee.sql:5:  'fname' varchar(20) NOT NULL COMMENT 'first name',
  Versions\1.1.7\Source\table\dbo.employee.sql:6:  'minit' char(1) DEFAULT NULL COMMENT 'middle initial',
  Versions\1.1.7\Source\table\dbo.employee.sql:7:  'lname' varchar(30) NOT NULL COMMENT 'last name',
> Versions\1.1.7\Source\table\dbo.employee.sql:8:  'job_id' smallint(6) NOT NULL DEFAULT 1 COMMENT 'the job that the employee does',
  Versions\1.1.7\Source\table\dbo.employee.sql:9:  'job_lvl' smallint(6) DEFAULT 10 COMMENT 'the job level',
> Versions\1.1.7\Source\table\dbo.employee.sql:10:  'pub_id'  char(8) NOT NULL COMMENT 'the publisher that the employee works for',
> Versions\1.1.7\Source\table\dbo.employee.sql:11:  'hire_date'  datetime NOT NULL DEFAULT curdate() COMMENT 'the date that the  mployee was hired',
> Versions\1.1.7\Source\table\dbo.employee.sql:12:  PRIMARY KEY ('emp_id' ),
  Versions\1.1.7\Source\table\dbo.employee.sql:13:  KEY ' Jobid_index'  ('job_id' ),
  Versions\1.1.7\Source\table\dbo.employee.sql:14:  KEY ' pub_id_index'  ('pub_id' ),
> Versions\1.1.7\Source\table\dbo.employee.sql:15:  CONSTRAINT ' fk_Employee_JobID'  FOREIGN KEY (' job_id' ) REFERENCES ' jobs'  ('job_id' ),
> Versions\1.1.7\Source\table\dbo.employee.sql:16:  CONSTRAINT ' fk_Employee_Pubs_JobID'  FOREIGN KEY ('job_id' ) REFERENCES ' jobs'  (' job_id' ),
> Versions\1.1.7\Source\table\dbo.employee.sql:17:  CONSTRAINT ' fk_Employee_publishers_pub_id'  FOREIGN KEY ('pub_id' ) REFERENCES ' publishers'  ('pub_id' ),
> Versions\1.1.7\Source\table\dbo.employee.sql:18:  CONSTRAINT ' CK_emp_id'  CHECK (' emp_id'  regexp '[A-Z][A-Z][A-Z][1-9][0-9][0-9][0-9][0-9][FM]' or ' emp_id'  regexp 
'[A-Z]-[A-Z][1-9][0-9][0-9][0-9][0-9][FM]')
> Versions\1.1.7\Source\table\dbo.employee.sql:19:) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COMMENT='An employee of any of the publishers';
  Versions\1.1.7\Source\table\dbo.employee.sql:20:/*!40101 SET character_set_client = @saved_cs_client */; 

Searching the database model

We’ve just shown a search through the current database source. With the Flyway Teamwork framework, there are plenty of other artefacts to search.

Here we search the database model (a JSON representation of a database), for the current database version. Again, we’re looking for objects that contain lines with the string ‘emp‘. The [IO.File]::ReadAllText method returns the content of the JSON model as a string object, which we then search, using a simple wildcard search:

Searching the Flyway database model for a string

Searching documentation

Similarly, we can scan through the JSON documentation file containing the column comments and both the column and table to which they belong:

Searching table comments for a string

Regex searches: looking for blocks of text

We’ve managed to get by this far by either using simple wildcard matching, or by searching text in files, line by line, looking for string matches. We can, of course, do more with a Regex-based search to return ‘blocks’ of text, such as comment blocks or CREATE TABLE statements. These take several lines in a file so we can’t use Select-string, which will only find a string on a single line.

Searching migration scripts for comment blocks

Imagine, that we needed to search all our migration files fort all occurrences of a comment block. We’ll use a regex for this example, to find a multi-line comment.

We might think of using select-string again, but this is for line-based strings, and comment clocks are multi-line. We could do this, which is serviceable (the file path is rather long for display, so we’ve chosen to just use the filename):

Searching Flyway migration scripts for comment blocks

Naturally, if you wish to look for a particular string within the comment blocks, you just amend the regex string accordingly.

Searching a build script for CREATE TABLE statements

It isn’t just comment blocks. You can search through scripts for create statements, of course. Here we search a PostgreSQL build script to print out all the CREATE TABLE statements

Searching a database build script for CREATE TABLE statements

Targeted searches using the database model

There are many ways of representing a database, and each scheme you use requires different logic to query it. Databases hold a relational model of databases in database tables, which are then manifested as INFORMATION_SCHEMA views. Flyway Teamwork uses this to provide consistent JSON-based models for the most popular RDBMSs that have an information-schema.

We’ve already seen how to search for simple strings in the database model, using wildcards, but we can do more. Sometimes we have more targeted questions such as, ‘Which indexes are duplicated?‘, or ‘Which tables have no primary keys?‘ or ‘What tables reference the customer table?‘. To answer them, we need to be able to search only a specific part the JSON database model.

Exploring foreign key references

Here is an example that uses the same model that we used earlier on. This time we’ll look at all the foreign keys and find out what tables are using them and what they are referencing:

Searching the database model to reveal foreign key references

OK, if you have direct access to the current version of the database, then you can get this information directly from the database. However, once you have the code, it is dead easy this way, and you can query any version of the database for which you’ve stored a model.

Getting details of primary keys

There are several other things you can query. Here is a list of all tables with primary keys, the key name and the column(s) they use:

Searching the database model for tables with primary keys

OK. Here is an opportunity. You can run some checks to find those tables that have no primary key, or the ones that have any indexes that are duplicates or near-duplicates. Perhaps you need to be alerted if the combination of key columns aren’t indexed. You’d want to know if a table had no enforcement of uniqueness at all. It is all possible by simple queries to the model.

Summary

If you’re like me, you’ll hate having to do repetitive tasks. If the chores are automated as much as possible, I’m much more likely to make sure they are done and to use them. This applies to searching and analyzing databases. If I can do searches reasonably easily, I’m much more likely to make certain rather than take a risk. I’m increasingly spending almost as much time with PowerShell as with the various database IDEs such as SSMS and HeidiSQL.

Searching scripts and analyzing models can take away much of the shock of moving from the comfort zone of your favored way of working with databases. Whether you prefer a Migrations-led approach or a state-based methodology, and you change to accommodate both, then you can soon get over the panic of being short of information about the database if you make the process of generating scripts and database information easier and less intrusive. You can than see clearly what is going on with the development, and who’s doing it.

Tools in this post

Flyway

DevOps for the Database

Find out more