Managing PostgreSQL Extensions Using Flyway
Any reliable DevOps deployment process must not only deliver the right database code but also ensure that the correct conditions exist for the deployment to succeed. For PostgreSQL databases, this includes ensuring that any extensions on which the database code relies are installed, and at the correct version. Fortunately, Flyway will automatically track which extensions are installed on a database and at what version, so we can 'propagate' these changes accurately during deployments and avoid inconsistencies in database behavior and even application breakages.
There are many reasons behind the tremendous growth of PostgreSQL over the last 5-8 years, and its adoption across increasingly large and complex commercial estates. It is due in part to its liberal open-source license, deep investments by major cloud hosting providers, and features that provide solutions to unique data challenges. In the latter category, one feature in particular has led to a tremendous amount of innovation unmatched by similar database solutions: PostgreSQL extensions.
These extensions offer an array of extra functionality and increase the ability of the database to tackle a much greater range of business problems. In the article, we’ll review, briefly, the types of extensions available and how we install them into a PostgreSQL database. We’ll then focus on how the use of extensions affects your development and deployment workflow, and how Flyway automatically tackles the additional challenge of ensuring that the right versions of the rights extensions are always installed on the right databases.
For efficient management of each phase of database development, testing and deployment, we always need a way to track the exact ‘state’ of our schema objects, and how these objects changed from one version of the database to the next. For PostgreSQL databases, we need to know not only how the database code changes, but also which extensions are installed on each database and at what version. We also need a way to verify which version of an extension is supposed to be installed and used with the current database and application state.
Managing databases changes in a complex, multi-database platform is challenging at the very least and nearly impossible without the right tools. Even among mainstream relational databases, small differences in how features work can destroy our ability to maintain consistent and predicable migrations and can easily cause applications to stop functioning. This means that any tool that strives to be fully “PostgreSQL compatible” must work with the ecosystem of extensions.
PostgreSQL Extensions at a Glance
PostgreSQL has always been built “in the open”, with a global group of contributors. The extremely permissive license allowed many companies in the early and mid-2000s to fork the source code and create new database variants based on PostgreSQL for unique purposes. As you can imagine, keeping all these unique variants in sync with the PostgreSQL ‘core’ was time-consuming and often difficult.
At the same time, the main contributors realized that there would always be more feature requests than there were contributors and reviewers to implement them. One of the ways they alleviated this tension was to introduce extensions. A PostgreSQL extension is an installable code that can access hooks within a running PostgreSQL instance to modify its runtime behavior, provide new capabilities, or inspect real-time metrics and information, without modifying core PostgreSQL code directly.
- Would you like to create indexes specific to geospatial data and provide functions for querying related data? Install PostGIS
PostGIS
! - Could the search within your application benefit from trigram searching? Use
pg_trgm
! - Have you ever wanted to add an index without incurring the time or space to verify that it would help improve a set of queries?
HypoPG
to the rescue!
These, and hundreds of other useful extensions, are available to install into your PostgreSQL database. Numerous extensions are provided with the core PostgreSQL code base as part of “contrib modules”. Some of these are so ubiquitous, like pg_stat_statements
, that nearly every DBaaS provider installs and enables it by default.
Other extensions, like TimescaleDB
, provide a rich set of features for managing high-volume time-series data, including automatic partitioning, compression, and continuously calculated aggregate views over raw data.
Additionally, with the introduction of the Rust extension library pgrx
, more developers can easily create custom, high-performance extensions in an accessible and memory efficient way.
Installing PostgreSQL Extensions
To use PostgreSQL extensions in your database there are essentially two requirements:
- The extension package (code) must be physically available on the PostgreSQL server host, in the proper location. Most extensions are provided through package managers or repository tools like the PostgreSQL Extension Network (PGXN).
- An extension must be installed into every database where you want to use it.
You install an extension using the PostgreSQL-specific SQL command, CREATE
EXTENSION
.
For example, if you wanted to provide machine learning capabilities over your PostgreSQL data, you might consider using the pgvector
extension. If it is properly installed on the server, running the following command will install the latest version of pgvector
into the currently connected database:
1 |
CREATE EXTENSION IF NOT EXISTS pgvector; |
Similarly, upgrading to a new version of an extension is done through the ALTER
EXTENSION
command. To upgrade pgvector
to the latest version, we would need to run the following command in each database where the application uses extension functionality.
1 |
ALTER EXTENSION pgvector UPDATE; |
If there is a newer version installed on the server, PostgreSQL will run the included upgrade scripts to install new database functions, update any catalog tables to track features, and migrate extension-specific data if necessary.
Managing and Tracking PostgreSQL Extension Versions
Just like other programs (and PostgreSQL itself), extensions must be versioned. As new features are added to an extension, or functions provide new parameters, a new version of the extension is created. For the PostgreSQL developer, managing database changes can start to become tricky when the new functionality you’re delivering relies on the availability of the right versions of numerous extensions.
For example, let’s say you originally developed an application using pgvector
version 0.3.2, and this is the version installed in the production database. However, in the latest development version you now want to exploit the new vector_avg
and vector_combine
functions that are available in version 0.4.0.
When it comes time to deliver these database changes to all of your PostgreSQL databases throughout the pipeline, from development to production, not only must you deliver the new code that relies on the vector functions, but also ensure that every deployment target has the right version of the extension installed. If the extension isn’t updated, the new application code may be referencing functions that don’t exist in the currently installed extension version, potentially causing the application to break.
For consistency, we’d also want all ‘downstream’ databases to get the same version of the extension as was used for development. If our migration scripts simply include CREATE
EXTENSION
or ALTER
EXTENSION…UPDATE
but don’t specify a version, we really have no idea which version will be installed and used in each environment; it will just be the latest version available at that time. To ensure that each environment is using the right the same version of each extension that is required for our application, we need to specify an extension version in our migration code:
1 2 |
CREATE EXTENSION pgvector VERSION '0.3.2'; ALTER EXTENSION pgvector UPDATE TO '0.4.2'; |
If you need to track, manually, which versions of which extensions are installed and then write scripts to ensure all copies of the database on downstream servers are updated to match, it quickly becomes a very time-consuming and error-prone task and distracts from developing the core functionality.
Given all this, it’s perhaps surprising to learn that PostgreSQL itself doesn’t track extension versions in backups or across machines. For example, you might think that you could reproduce a database, complete with all the right extension versions, simply by capturing a pg_dump
file and restoring it. Unfortunately, when you create a snapshot backup with the pg_dump
tool, only the CREATE EXTENSION
command is captured, for a later restore process. The specific version of an extension is not saved or noted. Therefore, when you restore the dump file the database will install the latest version of the currently installed extension, regardless of if it matches what was on the server where the backup was taken. This raises the level of effort for DBAs that manage PostgreSQL clusters and databases.
In short, we need a tool to automatically track and manage extension versions across all our PostgreSQL databases, in dev, test, staging, and production. If the versions of the required extensions, in any of these environments, get “out of sync” with the intended version, unexpected problems are sure to follow. In many cases, your database may be inaccessible if there is a mismatch in versions.
Flyway to the Rescue!
As hopefully you’ve gathered by now, extension management is an essential component of maintaining a PostgreSQL database. Fortunately, Flyway (Teams or Enterprise Editions only) will monitor and track changes, not only changes to your database code, but also to any extensions on which it relies.
In the following example, the Flyway Desktop (Teams) project is connected to my local development database. I’ve made changes to one of the database tables, but also installed a couple of new extensions. Even though I did this simply by issuing CREATE
EXTENSION
or ALTER
EXTENSION…UPDATE
statements, Flyway has detected the exact version of each extension that was installed (v1.5 for pg_trgm
), and we can save these changes to the schema model.
As you make changes to user objects in your development database, and install or upgrade any required PostgreSQL extensions, Flyway Desktop will auto-detect the changes and generate the SQL statements required to update the schema model to the latest ‘state’. By examining the Flyway schema model, DBAs can verify the current, expected state of each database and extension version. We can also commit these changes directly to the version control system, so we can track how the database objects, and database extensions, changed between successive versions of the database.
If you have Flyway Enterprise Edition, then Flyway will also auto-generate the migrations scripts to make sure the extensions are updated appropriately in any target database, as well as the user objects. This allows Flyway to ensure all extensions are upgraded appropriately, preventing mis-matched installations of extension packages across the infrastructure.
As migration scripts are then applied to each environment within the pipeline, teams can be assured that the correct extensions are being used. If an environment doesn’t have the correct extension version available, PostgreSQL will error as part of the migration process, an indicator that additional maintenance work is needed on the server itself.
However, we may want to avoid the migration running at all unless the correct ‘conditions’ exist in the target databases. Flyway can help here as well, via its system of callback scripts.
Verifying Extension Availability
While Flyway Teams and Enterprise track and maintain a list of all current extensions and the required versions for a database project, there is not currently a mechanism to automatically verify that all required extensions are available before migrations are run. However, we can use a ‘hook’, or ‘callback’ within the Flyway framework to verify that the target database has the necessary extensions installed and ready to use.
In his article, Running Flyway Pre-Migration Checks on the Database, Phil Factor demonstrates how to verify that a target database is in the required state before migrations are applied. Building on the example he provides that verifies that a set of PostgreSQL extensions exists, we can run a beforeMigrate.sql script to check for extensions and the required versions.
In this example, we build a list of required extensions using a VALUES
statement and join that to the current list of all available extension versions in the cluster. If any of the required extension versions does not exist, we raise an error which will stop the migration.
This technique will also help in cases where you need to ensure that an exact version of an extension is used. For example, sometimes a PostgreSQL cluster will have multiple versions of the same extension installed and available to use with different databases in the same cluster deliberately using different versions of the extension. A development database might have the most recent version installed, but a database used for nightly regression testing or perhaps to investigate a production bug that can’t be reproduced in the current development version will need the extension versions to match exactly what is in production.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
```sql DO $$ BEGIN IF EXISTS( -- Add a VALUE record for each required extension and version WITH current_extensions(name,version) AS ( VALUES ('pg_stat_statements','1.10'), ('pgvector','0.4.2') ), -- Select all currently installed extensions and versions installed_extensions(name,version) AS ( SELECT name, VERSION FROM pg_catalog.pg_available_extension_versions ) -- This query will only return rows if -- an extension version is not available. SELECT * FROM current_extensions ce LEFT JOIN installed_extensions ie USING(name,VERSION) WHERE ie.VERSION IS NULL ) THEN RAISE EXCEPTION 'At least one required extension/version is not available'; END IF; END $$ ``` |
On my example PostgreSQL cluster, pgvector
was not yet available when I tried to do a test migration. This caused an exception and prevented the migration from moving forward.
With a small amount of maintenance on this beforeMigrate.sql script, the pipeline will clearly warn developers and DBAs and prevent migrations until the proper installation is completed on each target server.
Intelligent Management of Database Changes
To ensure the consistent and predictable behavior of our PostgreSQL applications, it’s vital that we know exactly how a database changes between different versions. This means maintaining an accurate picture of the state of each object in the databases, and the exact ‘state’ of any additional components, such as extensions, on which they rely.
This requires a team with industry-leading experience, and tools, that understand database schema objects, migration best practices, and database-specific features and capabilities. This article has demonstrated how this team can use the schema model in Flyway Teams and Enterprise to manage databases changes consistently, and to apply those changes accurately, throughout the development and deployment cycle.
Beyond extensions, PostgreSQL change management needs to take the entire development process into account. Flyway provides everything your team needs to manage PostgreSQL changes from dev to prod, easily integrated with your pipeline. Database changes are treated like code, including using feature branches within the repository for managing change sets.
Individual developers can view and manage changes through Flyway Desktop, a cross-platform application that integrates with source control, manages the schema model, generates migration scripts, and more. Within the pipeline, Flyway can verify migration scripts against a set of rules specific to your development standards, or to check for destructive actions like DROP TABLE
.
In total, Flyway is the most powerful tool to manage and verify database changes from start to finish, in a repeatable, verifiable, process.
Tools in this post
Flyway Teams
Ideal for organizations looking to improve collaboration and fine tune their processes during development and the deployment of database changes.