How to automate vector embeddings with pgai Vectorizer in PostgreSQL

While pgvector enables powerful semantic search, it doesn’t keep embeddings in sync when your data changes, so you have to regenerate them manually. The pgai Vectorizer closes that gap by generating and updating PostgreSQL vector embeddings automatically whenever your data changes.

It runs in the background using a worker that processes changes via queues, triggers, and embedding APIs. This makes it easy to build near-real-time semantic search in PostgreSQL using pgvector and TigerData’s pgai tools. Learn everything you need to know in this guide.

It’s no secret that PostgreSQL can now store vector embeddings of unstructured data using pgvector, enabling both relational and semantic search. When it comes to keeping your source data and the corresponding AI-generated embeddings in sync as that data changes, however, pgvector falls short.

Simply put, it requires you to manually regenerate the vector embeddings to mirror any changes made in your PostgreSQL database. It doesn’t happen automatically.

Thankfully, the pgai Vectorizer tool, created by Timescale (now TigerData), is here to save the day. With a single SQL command, it creates AI-generated vector embeddings and regenerates them whenever your source data changes.

Timescale also provides Docker images to quickly set up a PostgreSQL environment that is ready for pgai Vectorizer. In this article, I’ll use these images to demonstrate exactly how the tool works.

Before you continue reading…
Are you new to using pgvector in PostgreSQL? If so, please first read this article. It explains how pgvector works and how semantic search is handled in it.

What is the pgai Vectorizer tool for PostgreSQL?

The pgai Vectorizer tool uses pgvector under the hood to store and manage vector embeddings in PostgreSQL. It uses pgai’s SQL functions to define how embeddings are generated: the embedding provider, the source table, the column to load raw text from, the formatting, and so on. The worker that generates the embeddings runs outside your database and is always on standby.

When you create a vectorizer, embeddings are processed asynchronously, as follows:

  • A queue is set up in the database to track the rows that need embedding.
  • Triggers ensure new or updated rows are added to this queue.
  • A background worker runs and polls the queue for pending jobs.
  • The worker processes jobs in batches, calls the embedding API (e.g., OpenAI, Ollama), and writes embeddings back to the database.
  • Any failed jobs are retried on the next polling cycle.

Because the worker runs outside PostgreSQL, your database is insulated from embedding API failures and latency problems. You can also scale the workers horizontally to handle larger embedding workloads.

Note that the tool is third-party, not an official PostgreSQL extension, and depends on pgvector for storage, indexing, and similarity search.
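
To make the queue, trigger, and worker flow described in this section more concrete, here is a minimal in-memory sketch in Python. It is not pgai's implementation: the class name VectorizerSketch, its methods, and the embed callable are all hypothetical names chosen for illustration, and plain dictionaries stand in for the source and embeddings tables.

```python
from collections import deque


class VectorizerSketch:
    """Toy in-memory model of a source table, its work queue, and a worker."""

    def __init__(self, embed, batch_size=2):
        self.source = {}        # row id -> text (stands in for the source table)
        self.embeddings = {}    # row id -> vector (stands in for the embeddings table)
        self.queue = deque()    # row ids waiting to be embedded
        self.embed = embed      # hypothetical callable: text -> list of floats
        self.batch_size = batch_size

    def on_write(self, row_id, text):
        """Plays the role of the trigger: store the row and enqueue its id."""
        self.source[row_id] = text
        self.queue.append(row_id)

    def pending_items(self):
        """Number of queued rows that still need an embedding."""
        return len(self.queue)

    def poll_once(self):
        """One polling cycle: take a batch, embed it, requeue any failures."""
        size = min(self.batch_size, len(self.queue))
        batch = [self.queue.popleft() for _ in range(size)]
        for row_id in batch:
            try:
                self.embeddings[row_id] = self.embed(self.source[row_id])
            except Exception:
                # Leave the job for the next cycle instead of losing it.
                self.queue.append(row_id)
        return len(batch)


if __name__ == "__main__":
    # A stand-in "embedding" that just returns the text length.
    worker = VectorizerSketch(embed=lambda text: [float(len(text))])
    worker.on_write(1, "PostgreSQL and pgvector")
    worker.on_write(2, "Automating embeddings")
    while worker.pending_items():
        worker.poll_once()
    print(worker.embeddings)
```

The point of the design is that writes only touch the queue, so a slow or failing embedding call delays the embedding but never the write itself.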

How to use pgai Vectorizer in a self-hosted PostgreSQL database

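To use pgai Vectorizer in a self-hosted PostgreSQL database, you must:

  • Start a PostgreSQL database and a pgai Vectorizer worker.
  • Install pgai’s database objects into that database.
  • Create a vectorizer for the table you want to embed.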

See the official GitHub docs on how to use pgai Vectorizer on self-hosted and managed PostgreSQL databases.

Additionally, see the official GitHub docs containing the API reference for pgai Vectorizer.

Requirements to use pgai Vectorizer (what you need)

To use pgai Vectorizer, install Docker Engine and the Docker Compose plugin. If you’re on Windows or macOS, install Docker Desktop instead, which includes Compose by default.

You also need an embedding provider API key. The choice of embedding provider is up to you, but I use OpenAI in this article.

How to create a database and pgai Vectorizer worker

Open up a docker-compose.yml file with your default code editor and paste in the code snippets below:

This will pull and start a TimescaleDB PostgreSQL database instance and a single pgai Vectorizer worker. The database will be available on localhost:5432, and the vectorizer will automatically poll for embedding jobs every 10 seconds.

Start both containers:
docker compose up -d

Verify that they are running:
docker compose ps

You should have an output similar to this:

How to set up and run pgai Vectorizer

To set up the pgai Vectorizer tool in your PostgreSQL database, run the following:
docker compose run --rm --entrypoint "python -m pgai install -d postgres://postgres:postgres@db:5432/postgres" vectorizer-worker

This installs the necessary database objects under the ai schema, which you can view with:
docker compose exec db psql -U postgres -c "\\dt ai.*"

Once the install command finishes, you should see output similar to 2026-03-21 03:45:17 [info ] pgai 0.12.1 installed.

How to create a table and insert relational data in pgai Vectorizer

Connect to your database instance (db) interactively with: docker compose exec db psql -U postgres

Create the table articles to work with:

Insert articles into the articles table:

How to create a vectorizer for your table in pgai Vectorizer

pgai’s ai schema provides several SQL functions to perform AI tasks in PostgreSQL. To create a vectorizer, you must use the ai.create_vectorizer function. Run the SQL query below to create a vectorizer for the articles table:

From the SQL command above, the vectorizer will:

  • Source data from the articles table, load contents from the content column, and watch it for changes.
  • Generate embeddings using OpenAI’s text-embedding-3-small model.
  • Recursively split text into overlapping chunks to preserve context.
  • Format input by prepending title and author metadata to each chunk.
  • Store embeddings in a destination table (or view) named articles_embeddings.
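
The chunking and formatting steps in the list above can be illustrated with a short Python sketch. Two assumptions apply. First, this uses a simple sliding window over words rather than the recursive character splitter the vectorizer is configured with, so it only demonstrates the idea of overlap. Second, the function names chunk_words and format_chunk, and the template text, are hypothetical; Python's standard string.Template is used here only to show how metadata can be prepended to each chunk.

```python
from string import Template


def chunk_words(text, chunk_size=8, overlap=2):
    """Split text into windows of `chunk_size` words.

    Each window repeats the last `overlap` words of the previous one,
    so context that straddles a boundary appears in both chunks.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def format_chunk(template, chunk, **metadata):
    """Fill a $-style template with the chunk text and row metadata."""
    return Template(template).substitute(chunk=chunk, **metadata)


if __name__ == "__main__":
    # Hypothetical template: title and author come from the source row.
    template = "Title: $title\nAuthor: $author\n$chunk"
    body = "pgvector stores embeddings but does not refresh them when rows change"
    for piece in chunk_words(body, chunk_size=6, overlap=2):
        print(format_chunk(template, piece, title="Sync", author="Mercy"))
        print("---")
```

Prepending the title and author means every chunk carries enough context to be matched on its own, even when the chunk text never mentions the article's subject by name.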

After running this command, the vectorizer worker will automatically generate and sync embeddings. You can monitor its progress with: SELECT * FROM ai.vectorizer_status;

If the pending_items column shows a value greater than 0, the worker is still processing your embeddings. If it shows 0, the embeddings are up to date.

Alternatively, you can stream real-time logs from the vectorizer worker when embeddings are being generated:
docker compose logs -f vectorizer-worker

You’ll see messages like running vectorizer and finished processing vectorizer, plus helpful messages for debugging if there’s an error:

The articles_embeddings view will include the original content of the content column, plus chunk and embedding for semantic search. You can query it with: SELECT * FROM articles_embeddings LIMIT 1;

Setting the limit to 1 lets you inspect the structure and content of the articles_embeddings view without loading large amounts of data. If you’d like to view all of its contents, omit the LIMIT 1 clause.
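
To show what the chunk and embedding columns are for, here is a small Python sketch of the ranking that a semantic search performs: order the stored chunks by cosine distance to a query vector and keep the closest ones. The vectors are made-up two-dimensional toys, and the function names cosine_distance and nearest_chunks are hypothetical. In the database, pgvector does this work for you.

```python
import math


def cosine_distance(a, b):
    """1 minus cosine similarity: 0 for same direction, 1 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_chunks(query_vec, rows, k=2):
    """Return the k rows whose embedding is closest to the query vector."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query_vec, row["embedding"]))
    return ranked[:k]


if __name__ == "__main__":
    # Toy rows shaped like (chunk, embedding) pairs; real embeddings have far more dimensions.
    rows = [
        {"chunk": "indexes and query plans", "embedding": [1.0, 0.0]},
        {"chunk": "container networking", "embedding": [0.0, 1.0]},
        {"chunk": "running PostgreSQL in Docker", "embedding": [1.0, 1.0]},
    ]
    for row in nearest_chunks([1.0, 0.0], rows, k=2):
        print(row["chunk"])
```

In a real query, the query vector comes from the same embedding model that produced the stored embeddings; vectors from different models are not comparable.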

How to automate embeddings in pgai Vectorizer

Triggers on the source table capture insert, update, and delete operations, and the vectorizer worker processes the affected embeddings in the background. This way, your view (articles_embeddings in this case) stays in sync with the latest content in the source table.

Update the articles table to trigger the vectorizer:

Stream the logs of the vectorizer to view it processing the update, using: docker compose logs -f vectorizer-worker

Insert an article into the articles table:

The vectorizer will generate new embeddings for it in the background. And, with that, the process is complete.
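
The sync behavior described in this section comes down to one rule: when a row changes, its old chunks are discarded and fresh ones are written, and when a row is deleted, its chunks go with it. The Python sketch below models that rule in memory. It is an illustration under assumptions, not pgai's code: EmbeddingStoreSketch, sync_row, and the chunker and embed callables are hypothetical names, and a dictionary keyed by row id and chunk sequence number stands in for the embeddings table.

```python
class EmbeddingStoreSketch:
    """Toy model of an embeddings table kept in sync with source rows."""

    def __init__(self, chunker, embed):
        self.chunker = chunker  # hypothetical callable: text -> list of chunk strings
        self.embed = embed      # hypothetical callable: chunk -> list of floats
        self.rows = {}          # (row_id, chunk_seq) -> (chunk, vector)

    def sync_row(self, row_id, text):
        """Re-embed one source row; pass text=None to model a deleted row."""
        # Drop every chunk of the old version so nothing stale survives.
        for key in [k for k in self.rows if k[0] == row_id]:
            del self.rows[key]
        if text is None:
            return
        for seq, chunk in enumerate(self.chunker(text)):
            self.rows[(row_id, seq)] = (chunk, self.embed(chunk))

    def chunks_for(self, row_id):
        """Chunks currently stored for a row, in sequence order."""
        keys = sorted(k for k in self.rows if k[0] == row_id)
        return [self.rows[k][0] for k in keys]


if __name__ == "__main__":
    store = EmbeddingStoreSketch(
        chunker=lambda text: text.split(". "),
        embed=lambda chunk: [float(len(chunk))],
    )
    store.sync_row(1, "First draft. With two sentences")   # insert
    store.sync_row(1, "Rewritten as a single sentence")    # update replaces both chunks
    print(store.chunks_for(1))
    store.sync_row(1, None)                                # delete removes them
    print(store.chunks_for(1))
```

Replacing all chunks for a row, rather than patching individual ones, keeps the logic simple when an edit changes how many chunks the text splits into.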

What else can you do with pgai Vectorizer?

The pgai Vectorizer tool turns your PostgreSQL database into an AI powerhouse. What we’ve covered here is just one example of this. You can also perform a hybrid search and re-rank the results using re-ranking models from providers like Cohere and Voyage AI. Or, why not translate natural language to SQL via the semantic catalog?

Whether you decide to use it for more than just automating vector embeddings or not, you’ve now seen the power of the pgai Vectorizer tool in PostgreSQL. With its vectorizer, manual embedding lifecycles and stale embeddings are a thing of the past.

In this article, you saw this firsthand, with every create, insert, and update command you made being picked up and processed in the background. I hope you found the guide helpful, and feel free to share your thoughts in the comments below!

FAQs: How to automate vector embeddings with pgai Vectorizer in PostgreSQL

1. What is pgai Vectorizer in PostgreSQL?

It’s a tool from Timescale (TigerData) that automatically generates and updates AI embeddings in PostgreSQL using pgvector.

2. Does pgvector update embeddings automatically?

No. pgvector stores embeddings but requires manual updates when source data changes.

3. How does pgai Vectorizer work?

It uses SQL-defined configurations, triggers, a job queue, and a background worker to generate and refresh embeddings automatically.

4. What is pgai used for?

pgai enables AI workflows in PostgreSQL, including automatic embeddings, semantic search, and RAG pipelines.

5. Do I need Docker to use pgai Vectorizer?

Not strictly. Docker is the common way to run PostgreSQL and the vectorizer worker in self-hosted setups, and it is the approach this guide uses.

About the author

Mercy Bassey

Mercy is a talented technical writer and programmer with deep knowledge of various technologies. She focuses on cloud computing, software development, DevOps/IT, and containerization, and she finds joy in creating documentation, tutorials, and guides for both beginners and advanced technology enthusiasts.