{"id":110134,"date":"2026-05-25T11:45:00","date_gmt":"2026-05-25T11:45:00","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=110134"},"modified":"2026-05-13T15:59:53","modified_gmt":"2026-05-13T15:59:53","slug":"how-to-automate-vector-embeddings-with-pgai-vectorizer-in-postgresql","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/postgresql\/how-to-automate-vector-embeddings-with-pgai-vectorizer-in-postgresql\/","title":{"rendered":"How to automate vector embeddings with pgai Vectorizer in PostgreSQL"},"content":{"rendered":"\n<p><strong>While pgvector enables powerful semantic search, it doesn\u2019t automatically keep embeddings in sync when your data changes, requiring manual updates. The pgai Vectorizer automatically keeps PostgreSQL vector embeddings in sync by generating and updating them whenever your data changes, removing the need for manual regeneration with pgvector. <\/strong><\/p>\n\n\n\n<p><strong>It runs in the background using a worker that processes changes via queues, triggers, and embedding APIs. This makes it easy to build real-time semantic search in PostgreSQL using pgvector and TigerData&#8217;s pgai tools.<\/strong> <strong>Learn everything you need to know in this guide.<\/strong><\/p>\n\n\n\n<p>It&#8217;s no secret that <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/postgresql\/how-to-build-an-ai-powered-semantic-search-in-postgresql-with-pgvector\/#:~:text=This%20is%20changing,know%20and%20love.\" target=\"_blank\" rel=\"noreferrer noopener\">PostgreSQL now stores vector embeddings of unstructured data using pgvector<\/a>, enabling both relational and semantic search. 
When it comes to keeping your source data and corresponding AI-generated embeddings in sync during changes, however, <a href=\"https:\/\/github.com\/pgvector\/pgvector\/\" target=\"_blank\" rel=\"noreferrer noopener\">pgvector<\/a> falls short.<\/p>\n\n\n\n<p>Simply put, it requires you to manually regenerate the vector embeddings to mirror any changes made in your PostgreSQL database. It doesn&#8217;t happen automatically.<\/p>\n\n\n\n<p>Thankfully, the <a href=\"https:\/\/www.tigerdata.com\/blog\/pgai-vectorizer-now-works-with-any-postgres-database\" target=\"_blank\" rel=\"noreferrer noopener\">pgai Vectorizer<\/a> tool, created by Timescale (now TigerData), is here to save the day. With a <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/\" target=\"_blank\" rel=\"noreferrer noopener\">SQL<\/a> command, it creates AI-generated vector <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/ai-in-sql-server-2025-embeddings\/\" target=\"_blank\" rel=\"noreferrer noopener\">embeddings<\/a> and regenerates them when your source data changes.<\/p>\n\n\n\n<p>Timescale also provides <a href=\"https:\/\/www.docker.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Docker<\/a> images to quickly set up a PostgreSQL environment that is ready for pgai Vectorizer. In this article, I&#8217;ll use these images to demonstrate exactly how the tool works.<\/p>\n\n\n\n<div id=\"callout-block_998494bad5da6d9a4222368d529017da\" class=\"callout alignnone\">\n    <div class=\"child-last:mb-0 child-first:mt-0 bg-gray-50 dark:bg-gray-950 p-4xl my-3xl\">\n\n<p><strong>Before you continue reading&#8230;<\/strong><br>Are you new to using pgvector in PostgreSQL? If so, please first read <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/postgresql\/how-to-build-an-ai-powered-semantic-search-in-postgresql-with-pgvector\/\" target=\"_blank\" rel=\"noreferrer noopener\">this article<\/a>. 
It explains how pgvector works and how semantic search is handled in it.<\/p>\n\n<\/div>\n<\/div> \n\n\n<h2 class=\"wp-block-heading\" id=\"h-what-is-the-pgai-vectorizer-tool-for-postgresql\">What is the pgai Vectorizer tool for PostgreSQL?<\/h2>\n\n\n\n<p><strong>The <a href=\"https:\/\/www.tigerdata.com\/blog\/pgai-vectorizer-now-works-with-any-postgres-database\" target=\"_blank\" rel=\"noreferrer noopener\">pgai Vectorizer<\/a> tool uses <a href=\"https:\/\/www.tigerdata.com\/blog\/postgresql-as-a-vector-database-using-pgvector\" target=\"_blank\" rel=\"noreferrer noopener\">pgvector<\/a> under the hood to store and manage vector embeddings in <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/postgresql\/\" target=\"_blank\" rel=\"noreferrer noopener\">PostgreSQL<\/a>. It leverages pgai\u2019s SQL functions to define how embeddings are generated &#8211; specifying the embedding provider to use, the source table, the column to load raw data from to embed, formatting, and so on. 
The worker that generates the embeddings runs outside your database and is always on standby.<\/strong><\/p>\n\n\n\n<p>When you create a Vectorizer, embeddings are processed asynchronously, as follows:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li>A queue is set up in the database to track the rows that need embedding.<br><\/li>\n\n\n\n<li>Triggers ensure new or updated rows are added to this queue.<br><\/li>\n\n\n\n<li>A background worker runs and polls the queue for pending jobs.<br><\/li>\n\n\n\n<li>The worker processes jobs in batches, calls the embedding <a href=\"https:\/\/www.red-gate.com\/simple-talk\/sysadmin\/general\/api-monitoring-key-metrics-and-best-practices\/\" target=\"_blank\" rel=\"noreferrer noopener\">API<\/a> (e.g., OpenAI, Ollama), and writes embeddings back to the database.<br><\/li>\n\n\n\n<li>Any failed jobs are retried on the next polling cycle.<\/li>\n<\/ul>\n<\/div>\n\n\n<section id=\"my-first-block-block_5d83bfe836ee79b6a080e53530971294\" class=\"my-first-block alignwide\">\n    <div class=\"bg-brand-600 text-base-white py-5xl px-4xl rounded-sm bg-gradient-to-r from-brand-600 to-brand-500 red\">\n        <div class=\"gap-4xl items-start md:items-center flex flex-col md:flex-row justify-between\">\n            <div class=\"flex-1 col-span-10 lg:col-span-7\">\n                <h3 class=\"mt-0 font-display mb-2 text-display-sm\">Get started with PostgreSQL &#8211; free book download<\/h3>\n                <div class=\"child:last-of-type:mb-0\">\n                                            &#8216;Introduction to PostgreSQL for the data professional&#8217;, written by Grant Fritchey and Ryan Booz, covers all the basics of how to get started with PostgreSQL.                                    
<\/div>\n            <\/div>\n                                            <a href=\"https:\/\/www.red-gate.com\/hub\/books\/introduction-to-postgresql-for-the-data-professional\/\" class=\"btn btn--secondary btn--lg\" aria-label=\"Download your free copy: Get started with PostgreSQL - free book download\">Download your free copy<\/a>\n                    <\/div>\n    <\/div>\n<\/section>\n\n\n<p>Because it runs outside PostgreSQL, your database is isolated and immune to external API failures or latency problems. You can also scale it horizontally to handle more embedding workloads.<\/p>\n\n\n\n<p>Note that the tool is third-party, not an official PostgreSQL extension, and depends on pgvector for storage, <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/postgresql\/exploring-postgresql-indexes\/\" target=\"_blank\" rel=\"noreferrer noopener\">indexing<\/a>, and similarity search. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-to-use-pgai-vectorizer-in-a-self-hosted-postgresql-database\">How to use pgai Vectorizer in a self-hosted PostgreSQL database<\/h2>\n\n\n\n<p>To use pgai Vectorizer in a self-hosted PostgreSQL database, you must:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li>Install pgai and its Vectorizer component (vectorizer-worker) as <a href=\"https:\/\/docs.python.org\/3\/library\/index.html\" target=\"_blank\" rel=\"noreferrer noopener\">Python libraries<\/a>.<br><\/li>\n\n\n\n<li>Deploy the pgai PostgreSQL extension and run the Vectorizer via the pgai CLI.<br><\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/timescale\/pgvectorscale?tab=readme-ov-file#installing-from-source\" target=\"_blank\" rel=\"noreferrer noopener\">Build and install Pgvectorscale from source<\/a> if you want to use <a href=\"https:\/\/www.tigerdata.com\/blog\/understanding-diskann\" target=\"_blank\" rel=\"noreferrer noopener\">StreamingDiskANN<\/a> indexing.<br><\/li>\n\n\n\n<li>Spin up and manage Vectorizer worker processes to 
control embedding throughput.<\/li>\n<\/ul>\n<\/div>\n\n\n<p><a href=\"https:\/\/github.com\/timescale\/pgai\/blob\/main\/docs\/vectorizer\/worker.md#running-on-self-hosted-postgres-or-other-platforms\" target=\"_blank\" rel=\"noreferrer noopener\"><em>See the official GitHub docs on how to use pgai Vectorizer on self-hosted and managed PostgreSQL databases.<\/em><\/a><\/p>\n\n\n\n<p><em>Additionally,<\/em> <a href=\"https:\/\/github.com\/timescale\/pgai\/blob\/main\/docs\/vectorizer\/api-reference.md\" target=\"_blank\" rel=\"noreferrer noopener\"><em>see the official GitHub docs containing the API reference for pgai Vectorizer<\/em><\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-requirements-to-use-pgai-vectorizer-what-you-need\">Requirements to use pgai Vectorizer (what you need)<\/h2>\n\n\n\n<p><strong>To follow this guide, install <a href=\"https:\/\/docs.docker.com\/engine\/\" target=\"_blank\" rel=\"noreferrer noopener\">Docker Engine<\/a> and the <a href=\"https:\/\/docs.docker.com\/compose\/\" target=\"_blank\" rel=\"noreferrer noopener\">Docker Compose plugin<\/a>. If you&#8217;re on Windows or macOS, install <a href=\"https:\/\/www.docker.com\/products\/docker-desktop\/\" target=\"_blank\" rel=\"noreferrer noopener\">Docker Desktop<\/a> instead (it comes with Compose by default).<\/strong><\/p>\n\n\n\n<p>You also need an embedding provider API key. 
The choice of <a href=\"https:\/\/github.com\/timescale\/pgai\/blob\/main\/docs\/vectorizer\/overview.md#select-an-embedding-provider-and-set-up-your-api-keys\" target=\"_blank\" rel=\"noreferrer noopener\">embedding provider<\/a> is up to you, but I use <a href=\"https:\/\/openai.com\/api\/\" target=\"_blank\" rel=\"noreferrer noopener\">OpenAI<\/a> in this article.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-to-create-a-database-and-pgai-vectorizer-worker\">How to create a database and pgai Vectorizer worker<\/h2>\n\n\n\n<p>Create a <code>docker-compose.yml<\/code> file in your code editor and paste in the configuration below:<\/p>\n\n\n\n<div class=\"wp-block-urvanov-syntax-highlighter-code-block\"><pre class=\"lang:tsql decode:true \">name: pgai\nservices:\n  db:\n    image: timescale\/timescaledb-ha:pg17\n    environment:\n      POSTGRES_PASSWORD: postgres\n      OPENAI_API_KEY: &lt;your-openai-api-key&gt;\n    ports:\n      - \"5432:5432\"\n    volumes:\n      - data:\/home\/postgres\/pgdata\/data\n  vectorizer-worker:\n    image: timescale\/pgai-vectorizer-worker:latest\n    environment:\n      PGAI_VECTORIZER_WORKER_DB_URL: postgres:\/\/postgres:postgres@db:5432\/postgres\n      OPENAI_API_KEY: &lt;your-openai-api-key&gt;\n    command: [\"--poll-interval\", \"10s\", \"--log-level\", \"INFO\"]\nvolumes:\n  data:<\/pre><\/div>\n\n\n\n<p>This defines a TimescaleDB PostgreSQL database instance and a single pgai Vectorizer worker. 
The database will be available on <em>localhost:5432<\/em>, and the vectorizer will automatically poll for embedding jobs every 10 seconds.<\/p>\n\n\n\n<p>Start both containers:<br><code>docker compose up -d<\/code><\/p>\n\n\n\n<p>Verify that they are running:<br><code>docker compose ps<\/code><\/p>\n\n\n\n<p>You should see output similar to this:<\/p>\n\n\n\n<div class=\"wp-block-urvanov-syntax-highlighter-code-block\"><pre class=\"lang:tsql decode:true \">NAME                       IMAGE                                     COMMAND                  SERVICE             CREATED          STATUS          PORTS\npgai-db-1                  timescale\/timescaledb-ha:pg17             \"\/docker-entrypoint.\u2026\"   db                  34 seconds ago   Up 33 seconds   8008\/tcp, 0.0.0.0:5432-&gt;5432\/tcp, [::]:5432-&gt;5432\/tcp, 8081\/tcp\npgai-vectorizer-worker-1   timescale\/pgai-vectorizer-worker:latest   \"python -m pgai vect\u2026\"   vectorizer-worker   14 hours ago     Up 27 minutes  <\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-to-set-up-and-run-pgai-vectorizer\">How to set up and run pgai Vectorizer<\/h2>\n\n\n\n<p>To set up the pgai Vectorizer tool in your PostgreSQL database, run the following:<br><code>docker compose run --rm --entrypoint \"python -m pgai install -d postgres:\/\/postgres:postgres@db:5432\/postgres\" vectorizer-worker<\/code><\/p>\n\n\n\n<p>This installs the necessary database objects under the <code>ai<\/code> schema, which you can list with:<br> <code>docker compose exec db psql -U postgres -c \"\\\\dt ai.*\"<\/code>
<\/p>\n\n\n\n<p>Once it\u2019s installed, you should have an output similar to <code>2026-03-21 03:45:17 [info ] pgai 0.12.1 installed<\/code>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-to-create-a-table-and-insert-relational-data-in-pgai-vectorizer\">How to create a table and insert relational data in pgai Vectorizer<\/h2>\n\n\n\n<p>Connect to your database instance (<code>db<\/code>) interactively with: <code>docker compose exec db psql -U postgres<\/code><\/p>\n\n\n\n<p>Create the table <code>articles<\/code> to work with:<\/p>\n\n\n\n<div class=\"wp-block-urvanov-syntax-highlighter-code-block\"><pre class=\"lang:tsql decode:true \">CREATE TABLE articles (\nid INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,\ntitle TEXT,\nauthor TEXT,\ncontent TEXT\n);  <\/pre><\/div>\n\n\n\n<p>Insert articles into the <code>articles<\/code> table:<\/p>\n\n\n\n<div class=\"wp-block-urvanov-syntax-highlighter-code-block\"><pre class=\"lang:tsql decode:true \">INSERT INTO articles (title, author, content)\nVALUES \n(\n    'The World of Citrus Fruits',\n    'John Doe',\n    'Citrus fruits are among the most widely cultivated fruits in the world, valued for their refreshing flavor and impressive nutritional profile. The citrus family includes oranges, lemons, limes, grapefruits, and tangerines. Oranges alone account for a significant portion of global fruit production, with Brazil, China, and the United States being the largest producers. Citrus fruits are an excellent source of vitamin C, a powerful antioxidant that supports immune function and skin health. Beyond vitamin C, they contain folate, potassium, and beneficial plant compounds like flavonoids and carotenoids linked to reduced risk of chronic diseases. Citrus cultivation dates back thousands of years, with origins traced to Southeast Asia before spreading through the Middle East, Mediterranean, and eventually the Americas. 
Modern citrus farming faces challenges including pests, diseases like citrus greening, and climate change, which threaten yields in major producing regions.'\n),\n(\n    'Tropical Fruits and Their Health Benefits',\n    'Jane Smith',\n    'Tropical fruits thrive in warm, humid climates near the equator, offering an extraordinary range of flavors, textures, and nutritional benefits. Mangoes, pineapples, papayas, bananas, coconuts, and guavas are among the most popular tropical fruits enjoyed globally. The mango, often called the king of fruits, is particularly rich in vitamin A, vitamin C, and folate, and contains powerful antioxidants like mangiferin with anti-inflammatory properties. Pineapple contains bromelain, a unique enzyme that aids protein digestion and reduces inflammation. Papaya is celebrated for its digestive enzyme papain, which soothes digestive discomfort. Bananas provide a quick source of energy through natural sugars while delivering potassium, magnesium, and vitamin B6. The cultivation of tropical fruits plays a vital economic role in many developing countries, providing livelihoods for millions of small-scale farmers across Africa, Asia, and Latin America.'\n),\n(\n    'Stone Fruits: Nature and Nutrition',\n    'Bob Johnson',\n    'Stone fruits, also known as drupes, are characterized by their fleshy outer layer surrounding a hard pit that contains the seed. Peaches, plums, cherries, apricots, and nectarines all belong to this group, sharing a similar botanical structure despite differences in flavor and texture. Peaches are perhaps the most iconic stone fruit, native to Northwest China and rich in vitamins A and C, potassium, and dietary fiber. Cherries have attracted significant scientific interest due to their high concentration of anthocyanins, powerful antioxidants linked to reduced muscle soreness, improved sleep quality, and lower risk of heart disease. 
Plums and prunes are well known for their digestive benefits, containing sorbitol and dietary fiber that promote healthy bowel function. Climate change poses a growing challenge to stone fruit farmers, as milder winters are disrupting the chilling requirements that these trees depend on to produce fruit successfully.'\n);<\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-to-create-a-vectorizer-for-your-table-in-pgai-vectorizer\">How to create a vectorizer for your table in pgai Vectorizer<\/h2>\n\n\n\n<p>The pgai <code>ai<\/code> schema provides several SQL functions to perform AI tasks in PostgreSQL. To create a vectorizer, you must use the <code>ai.create_vectorizer<\/code> function. Run the SQL query below to create a vectorizer for the <code>articles<\/code> table:<\/p>\n\n\n\n<div class=\"wp-block-urvanov-syntax-highlighter-code-block\"><pre class=\"lang:tsql decode:true \">SELECT ai.create_vectorizer(\n  'articles'::regclass,\n  loading =&gt; ai.loading_column('content'),\n  embedding =&gt; ai.embedding_openai('text-embedding-3-small', 1536),\n  destination =&gt; ai.destination_table('articles_embeddings'),\n  formatting =&gt; ai.formatting_python_template('Title: $title\\\\nAuthor: $author\\\\n$chunk')\n);<\/pre><\/div>\n\n\n\n<p>With the SQL command above, the vectorizer will:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li>Source data from the <code>articles<\/code> table, load contents from the <code>content<\/code> column, and watch it for changes.<br><\/li>\n\n\n\n<li>Generate embeddings using OpenAI\u2019s <code>text-embedding-3-small<\/code> model.<br><\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/timescale\/pgai\/blob\/main\/docs\/utils\/chunking.md#chunk_text_recursively\" target=\"_blank\" rel=\"noreferrer noopener\">Recursively<\/a> split text into overlapping chunks to preserve context.<br><\/li>\n\n\n\n<li>Format input by prepending title and author metadata to each chunk.<br><\/li>\n\n\n\n<li>Store embeddings in the <code>articles_embeddings_store<\/code> table and expose them through a view named <code>articles_embeddings<\/code>.<\/li>\n<\/ul>\n<\/div>\n\n\n<p>After running this command, the vectorizer worker will automatically generate and sync embeddings. You can monitor its progress with: <code>SELECT * FROM ai.vectorizer_status;<\/code><\/p>\n\n\n\n<p>If the <code>pending_items<\/code> column shows a value greater than <code>0<\/code>, the worker is still processing your embeddings. If it shows <code>0<\/code>, the embeddings are up to date.<\/p>\n\n\n\n<div class=\"wp-block-urvanov-syntax-highlighter-code-block\"><pre class=\"lang:tsql decode:true \"> id |               name               |  source_table   |           target_table           |            view            | embedding_column | pending_items | disabled \n----+----------------------------------+-----------------+----------------------------------+----------------------------+------------------+---------------+----------\n  1 | public_articles_embeddings_store | public.articles | public.articles_embeddings_store | public.articles_embeddings | embedding        |             0 | f\n(1 row)<\/pre><\/div>\n\n\n\n<p>Alternatively, you can stream real-time logs from the vectorizer worker while embeddings are being generated:<br><code>docker compose logs -f vectorizer-worker<\/code><\/p>\n\n\n\n<p>You\u2019ll see messages like <code>running vectorizer<\/code> and <code>finished processing vectorizer<\/code>, plus error messages that help with debugging if something goes wrong:<\/p>\n\n\n\n<div class=\"wp-block-urvanov-syntax-highlighter-code-block\"><pre class=\"lang:tsql decode:true \">vectorizer-worker-1  | 2026-03-21 04:34:38 [info     ] sleeping for 0:00:30 before polling for new work\nvectorizer-worker-1  | 2026-03-21 04:35:08 [warning  ] no vectorizers found\nvectorizer-worker-1  | 2026-03-21 04:35:08 [info     ] sleeping for 0:00:30 before 
polling for new work\nvectorizer-worker-1  | 2026-03-21 04:35:38 [info     ] running vectorizer             vectorizer_id=1\nvectorizer-worker-1  | 2026-03-21 04:35:56 [info     ] finished processing vectorizer items=3 vectorizer_id=1\nvectorizer-worker-1  | 2026-03-21 04:35:56 [info     ] sleeping for 0:00:30 before polling for new work\nvectorizer-worker-1  | 2026-03-21 04:36:26 [info     ] running vectorizer             vectorizer_id=1\nvectorizer-worker-1  | 2026-03-21 04:36:27 [info     ] finished processing vectorizer items=0 vectorizer_id=1\nvectorizer-worker-1  | 2026-03-21 04:36:27 [info     ] sleeping for 0:00:30 before polling for new work<\/pre><\/div>\n\n\n\n<p>The <code>articles_embeddings<\/code> view includes the original content of the <code>content<\/code> column, plus the <code>chunk<\/code> and <code>embedding<\/code> columns used for semantic search. You can query it with: <code>SELECT * FROM articles_embeddings LIMIT 1;<\/code><\/p>\n\n\n\n<p>Limiting the result to one row lets you inspect the structure and content of the <code>articles_embeddings<\/code> view without loading large amounts of data. If you\u2019d like to view all of its contents, omit the <code>LIMIT 1<\/code> clause.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-to-automate-embeddings-in-pgai-vectorizer\">How to automate embeddings in pgai Vectorizer<\/h2>\n\n\n\n<p>The Vectorizer tracks insert, update, and delete operations on the source table and processes the affected embeddings in the background. 
This way, your view &#8211; <code>articles_embeddings<\/code> in this case &#8211; stays in sync with the latest content in the source table.<\/p>\n\n\n\n<p>Update a row in the <code>articles<\/code> table to trigger the vectorizer:<\/p>\n\n\n\n<div class=\"wp-block-urvanov-syntax-highlighter-code-block\"><pre class=\"lang:tsql decode:true \">UPDATE articles \nSET \n    title = 'The Brilliance of Berries',\n    content = 'Berries, including strawberries, blueberries, and raspberries, are vibrant fruits packed with fiber and antioxidants. Unlike citrus, they thrive in cooler temperate climates. Blueberries are famous for anthocyanins, which may help brain health and memory. They are often eaten fresh or used in desserts.'\nWHERE id = 1;<\/pre><\/div>\n\n\n\n<p>Stream the vectorizer logs to watch it process the update: <code>docker compose logs -f vectorizer-worker<\/code><\/p>\n\n\n\n<p>Insert a new article into the <code>articles<\/code> table:<\/p>\n\n\n\n<div class=\"wp-block-urvanov-syntax-highlighter-code-block\"><pre class=\"lang:tsql decode:true \">INSERT INTO articles (title, author, content)\nVALUES (\n    'Why Papaya Is Called the Fruit of the Angels',\n    'Carlos Rivera',\n    'Papaya, once referred to as the fruit of the angels by Christopher Columbus, is a tropical fruit native to Central America and southern Mexico. Today it is cultivated across tropical regions worldwide, with India, Brazil, and Indonesia among the largest producers. The papaya plant is unique in that it can begin bearing fruit within the first year of planting, making it one of the fastest-yielding fruit crops in tropical agriculture. Papayas are exceptionally rich in vitamin C, vitamin A, folate, and potassium. They also contain lycopene, a powerful antioxidant associated with reduced risk of heart disease and certain cancers. 
The fruit is perhaps best known for containing papain, a proteolytic enzyme found in both the fruit and its latex that breaks down proteins and is widely used in meat tenderizers, digestive supplements, and pharmaceutical applications. Unripe green papaya is commonly used in savory dishes across Southeast Asia, particularly in the popular Thai green papaya salad. Ripe papaya has a soft, buttery texture and a sweet, musky flavor that makes it a staple breakfast fruit across many tropical countries. Papaya cultivation faces threats from the papaya ringspot virus, a destructive pathogen that devastated Hawaiian papaya crops in the 1990s before the introduction of genetically modified virus-resistant varieties saved the industry.'\n);<\/pre><\/div>\n\n\n\n<p>The vectorizer will generate new embeddings for it in the background. And, with that, the process is complete.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-what-else-can-you-do-with-pgai-vectorizer\">What else can you do with pgai Vectorizer?<\/h2>\n\n\n\n<p><strong>The pgai Vectorizer tool turns your PostgreSQL database into an AI powerhouse. 
What we&#8217;ve covered here is just one example of this.<\/strong> <strong>You can also perform <a href=\"https:\/\/www.tigerdata.com\/blog\/combining-semantic-search-and-full-text-search-in-postgresql-with-cohere-pgvector-and-pgai\" target=\"_blank\" rel=\"noreferrer noopener\">hybrid search<\/a> and re-rank the results using re-ranking models from <a href=\"https:\/\/github.com\/timescale\/pgai\/blob\/main\/projects\/extension\/docs\/model_calling\/cohere.md#cohere_rerank\" target=\"_blank\" rel=\"noreferrer noopener\">Cohere<\/a> and <a href=\"https:\/\/github.com\/timescale\/pgai\/blob\/main\/docs\/vectorizer\/quick-start-voyage.md#reranking-with-voyage-ai\" target=\"_blank\" rel=\"noreferrer noopener\">Voyage AI<\/a>. Or, why not translate natural language to SQL via the <a href=\"https:\/\/github.com\/timescale\/pgai\/blob\/main\/docs\/semantic_catalog\/README.md\" target=\"_blank\" rel=\"noreferrer noopener\">semantic catalog<\/a>?<\/strong><\/p>\n\n\n\n<p>Whether or not you use it for more than automating vector embeddings, you&#8217;ve now seen the power of the pgai Vectorizer tool in PostgreSQL. With it, manual embedding lifecycles and stale embeddings are a thing of the past.<\/p>\n\n\n\n<p>In this article, you saw this firsthand: every insert and update you made was picked up and processed in the background. 
I hope you found the guide helpful, and feel free to share your thoughts in the comments below!<\/p>\n\n\n\n<section id=\"my-first-block-block_da04abe024dfa9fbcf6eb0a5d8b19dc3\" class=\"my-first-block alignwide\">\n    <div class=\"bg-brand-600 text-base-white py-5xl px-4xl rounded-sm bg-gradient-to-r from-brand-600 to-brand-500 red\">\n        <div class=\"gap-4xl items-start md:items-center flex flex-col md:flex-row justify-between\">\n            <div class=\"flex-1 col-span-10 lg:col-span-7\">\n                <h3 class=\"mt-0 font-display mb-2 text-display-sm\">Simple Talk is brought to you by Redgate Software<\/h3>\n                <div class=\"child:last-of-type:mb-0\">\n                                            Take control of your databases with the trusted Database DevOps solutions provider. Automate with confidence, scale securely, and unlock growth through AI.                                    <\/div>\n            <\/div>\n                                            <a href=\"https:\/\/www.red-gate.com\/solutions\/overview\/\" class=\"btn btn--secondary btn--lg\" aria-label=\"Discover how Redgate can help you: Simple Talk is brought to you by Redgate Software\">Discover how Redgate can help you<\/a>\n                    <\/div>\n    <\/div>\n<\/section>\n\n\n<section id=\"faq\" class=\"faq-block my-5xl\">\n    <h2>FAQs: How to automate vector embeddings with pgai Vectorizer in PostgreSQL<\/h2>\n\n                        <h3 class=\"mt-4xl\">1. What is pgai Vectorizer in PostgreSQL?<\/h3>\n            <div class=\"faq-answer\">\n                <p data-start=\"198\" data-end=\"366\">It\u2019s a tool from Timescale (TigerData) that automatically generates and updates AI embeddings in PostgreSQL using pgvector.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">2. Does pgvector update embeddings automatically?<\/h3>\n            <div class=\"faq-answer\">\n                <p data-start=\"368\" data-end=\"505\">No. 
pgvector stores embeddings but requires manual updates when source data changes.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">3. How does pgai Vectorizer work?<\/h3>\n            <div class=\"faq-answer\">\n                <p>It uses SQL-defined configurations, triggers, a job queue, and a background worker to generate and refresh embeddings automatically.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">4. What is pgai used for?<\/h3>\n            <div class=\"faq-answer\">\n                <p>pgai enables AI workflows in PostgreSQL, including automatic embeddings, semantic search, and RAG pipelines.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">5. Do I need Docker to use pgai Vectorizer?<\/h3>\n            <div class=\"faq-answer\">\n                <p>No, but it is the easiest route: for self-hosted setups, Docker is commonly used to run PostgreSQL and the vectorizer worker. You can also install pgai and the worker as Python libraries.<\/p>\n            <\/div>\n            <\/section>\n","protected":false},"excerpt":{"rendered":"<p>Automate PostgreSQL vector embeddings with pgai Vectorizer. 
Learn how to sync pgvector embeddings automatically using Timescale\u2019s AI-powered workflow.&hellip;<\/p>\n","protected":false},"author":341730,"featured_media":105920,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[159169,143523,53,143534],"tags":[159075,158978],"coauthors":[158989],"class_list":["post-110134","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","category-databases","category-featured","category-postgresql","tag-ai","tag-postgresql"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/110134","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/341730"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=110134"}],"version-history":[{"count":5,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/110134\/revisions"}],"predecessor-version":[{"id":110170,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/110134\/revisions\/110170"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media\/105920"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=110134"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=110134"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=110134"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com
\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=110134"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}