{"id":107690,"date":"2025-10-22T12:25:31","date_gmt":"2025-10-22T12:25:31","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=107690"},"modified":"2025-12-17T11:15:59","modified_gmt":"2025-12-17T11:15:59","slug":"an-ai-powered-t-sql-assistant-built-with-python-and-sql-server","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/an-ai-powered-t-sql-assistant-built-with-python-and-sql-server\/","title":{"rendered":"How to Build an AI-Powered T-SQL Assistant with Python &amp; SQL Server"},"content":{"rendered":"\n<p>If you\u2019re a <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/database-administration-sql-server\/\" target=\"_blank\" rel=\"noreferrer noopener\">SQL Server DBA<\/a> or developer looking to harness AI for your everyday scripting workflows, this article will walk you through building an <strong>AI-powered T-SQL assistant<\/strong> using <a href=\"https:\/\/www.python.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">Python<\/a> and <a href=\"https:\/\/www.microsoft.com\/en-gb\/sql-server\/sql-server-downloads\" target=\"_blank\" rel=\"noreferrer noopener\">SQL Server<\/a>.<\/p>\n\n\n\n<p>We\u2019ll cover how to index your scripts with embeddings, store and search them semantically in SQL, and then build a chat-style interface that delivers context-aware script recommendations.<\/p>\n\n\n\n<p>Whether you\u2019re already familiar with AI or just starting out, this walkthrough shows how to apply cutting-edge features in a practical, real-world DBA scenario.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-to-start-a-bit-of-background\">To Start: A Bit of Background<\/h2>\n\n\n\n<p>A very strong memory I have is of my father and his toolbox, with all kinds of tools for the varied repairs that were needed at home. Sometimes, he would spend several minutes looking for a specific screw for something he needed to fix. 
Over the years, he collected more and more tools &#8211; exactly as I have done after many years of being a <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/database-administration-sql-server\/\" target=\"_blank\" rel=\"noreferrer noopener\">SQL Server DBA<\/a>.<\/p>\n\n\n\n<p>Over time I&#8217;ve gathered numerous SQL scripts that have helped me in various situations, but quickly discovered that finding the right script for the right situation often took far too long, just as my father experienced with his toolbox.<\/p>\n\n\n\n<p>Thanks to AI, that pain can finally be eased: I&#8217;ve made a tool to easily and quickly find the best script for any given need, using <a href=\"https:\/\/learn.microsoft.com\/en-us\/sql\/sql-server\/what-s-new-in-sql-server-2025?view=sql-server-ver17#ai\" target=\"_blank\" rel=\"noreferrer noopener\">new AI features from SQL Server 2025<\/a> (which are already available in <a href=\"https:\/\/azure.microsoft.com\/en-us\/products\/azure-sql\/database\" target=\"_blank\" rel=\"noreferrer noopener\">Azure SQL Database<\/a>) and <a href=\"https:\/\/www.python.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">Python<\/a> code that loads an AI model on a GPU to generate and search through data. <\/p>\n\n\n\n<p>In this article I\u2019ll give you a detailed look at how it all works. If you&#8217;re a DBA who has never touched AI beyond <a href=\"https:\/\/chatgpt.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">ChatGPT<\/a>, this is a great opportunity to explore a real-world example in a context you already know: finding the right <a href=\"https:\/\/learn.microsoft.com\/en-us\/sql\/t-sql\/language-reference?view=sql-server-ver17\" target=\"_blank\" rel=\"noreferrer noopener\">T-SQL<\/a> script for your needs. 
And, if you&#8217;re not a DBA, you&#8217;ll still get an understanding of how to integrate AI into your own solutions and see how it can be used for much more than just text generation.<\/p>\n\n\n\n<p><em>Before continuing, I recommend reading <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/ai-in-sql-server-2025-embeddings\/\" target=\"_blank\" rel=\"noreferrer noopener\">this article about embeddings<\/a>.<\/em><\/p>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"h-introducing-my-git-repository-sql-server-lib\">Introducing my Git Repository: SQL Server Lib<\/h1>\n\n\n\n<p>So before we get started, here\u2019s a quick introduction to my Git repository, <a href=\"https:\/\/github.com\/rrg92\/sqlserver-lib\" target=\"_blank\" rel=\"noreferrer noopener\">SQL Server Lib<\/a>. <em>The names and comments are in Portuguese but should be pretty obvious for the most part, plus the search function works in any language.<\/em><\/p>\n\n\n\n<p>This repository is the result of a personal project I started this year to review the many useful scripts I&#8217;ve saved in my time as a DBA and share them &#8211; for free &#8211; with the whole world:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"550\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-1024x550.png\" alt=\"An image of the git repository\" class=\"wp-image-107691\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-1024x550.png 1024w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-300x161.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-768x412.png 768w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image.png 1090w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><a 
id=\"_msocom_1\"><\/a><\/p>\n\n\n\n<p>So far, I&#8217;ve saved more than 490 scripts to my local folder (started with a free <a href=\"https:\/\/www.dropbox.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Dropbox<\/a>, later migrated to OneDrive), ranging from simple scripts that identify queries causing CPU usage, to utilities like finding a string in every column that contains text. This number keeps increasing as I keep adding new scripts or enhancing existing ones. <\/p>\n\n\n\n<p>At the time of writing, 102 scripts have been added to the repo, and a progress bar on the repo lets everyone keep track of how many are left to add:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"294\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-1-1024x294.png\" alt=\"The progress bar of the repo\" class=\"wp-image-107692\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-1-1024x294.png 1024w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-1-300x86.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-1-768x221.png 768w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-1.png 1090w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><em>Another advantage of doing this is that anyone can help me enhance these scripts by coming up with new ideas, or just by fixing something. Simply submit a PR via Git and then I (or future maintainers) can review it. 
In doing so, we can build a free SQL Server script library reviewed by the community itself.<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-ai-to-the-rescue-ask-for-a-script-naturally\">AI to the Rescue: Ask For a Script Naturally<\/h2>\n\n\n\n<p>As mentioned in the introduction, even after publishing the scripts I was still spending a significant amount of time trying to find the right script (or the right \u201ctool\u201d, just like my father with his toolbox). Depending on the script, it was sometimes faster to recreate it than to search for it &#8211; but the main issue with that was that some trick or detail I had learned while writing the script could be forgotten.<\/p>\n\n\n\n<p>For years, I found the script I needed just by memory (knowing which folder it was in), or with <a href=\"https:\/\/learn.microsoft.com\/en-us\/powershell\/\" target=\"_blank\" rel=\"noreferrer noopener\">PowerShell<\/a> code like this (to find it by part of code that I vaguely remembered):<\/p>\n\n\n\n<div class=\"wp-block-urvanov-syntax-highlighter-code-block\"><pre class=\"lang:powershell decode:true \">gci -rec *.sql | sls 'dm_exec_query_stats'<\/pre><\/div>\n\n\n\n<p>Then I did a visual check on the results to find the script. But it wasn&#8217;t always simple &#8211; and I didn\u2019t always find the correct one.<\/p>\n\n\n\n<p>But now we live in an AI-powered world, and if there&#8217;s one thing language models are great at, it&#8217;s text processing. And SQL scripts are just text &#8211; so this is a perfect use case. That&#8217;s why I created a web tool that lets you search for scripts from the Git repo. Just type what you need, and the tool will handle the magic. You can access it <a href=\"https:\/\/huggingface.co\/spaces\/rrg92\/sqlserver-lib-assistant\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>.<\/p>\n\n\n\n<p>Following is a video that shows how you can use the tool. 
The first four examples are in Portuguese, but later I demonstrate how the tool can handle other languages including Japanese, French, and even Hindi.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"614\" height=\"460\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-2.png\" alt=\"The text field says: I need to find information about indexes.\" class=\"wp-image-107722\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-2.png 614w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-2-300x225.png 300w\" sizes=\"auto, (max-width: 614px) 100vw, 614px\" \/><\/figure>\n\n\n\n<p>In the rest of this post, I\u2019ll explain the code and services I used to build the tool, so by the end, you\u2019ll have a better understanding of how SQL Server can be used for AI &#8211; and how some AI models work beyond just generating text. Don\u2019t worry if you\u2019re not familiar with AI terms yet; I\u2019ll only assume you have some basic experience with programming logic and a bit of experience with SQL.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Divide and Conquer<\/h1>\n\n\n\n<p>Using AI to assist in search involves two main phases: <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/learn\/sql-server-index-basics\/\" target=\"_blank\" rel=\"noreferrer noopener\">indexing<\/a> the content and the search itself.<\/p>\n\n\n\n<p>To start smoothly, let me show you a diagram with all the technologies I used to build this project and how they relate to these two phases:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"821\" height=\"592\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-3.png\" alt=\"The diagram which is explained in the following paragraph.\" class=\"wp-image-107724\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-3.png 821w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-3-300x216.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-3-768x554.png 768w\" sizes=\"auto, (max-width: 821px) 100vw, 821px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>This diagram might look a bit intimidating at first glance but don&#8217;t worry, we\u2019ll break it down throughout the post. After reading, you can come back and see how many new things you\u2019ve learned about AI. 
You\u2019ll notice that the architecture is actually very simple and includes common elements from a traditional <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/architecture\/data-guide\/relational-data\/etl\" target=\"_blank\" rel=\"noreferrer noopener\">ETL (Extract, Transform, Load)<\/a> project &#8211; although I\u2019ve seen plenty of ETL pipelines out there that are far more complex and don\u2019t use any AI at all.<\/p>\n\n\n\n<p>The left side of the image, where you see the stick figure labeled <em>Rodrigo<\/em>, represents the part that inserts data into SQL Server &#8211; in other words, the indexing phase. The yellow area on the right is the phase where we interact with the chatbot, starting from the stick figure labeled <em>User,<\/em> which represents anyone in the world. A small part of this yellow area is also used during the indexing phase.<\/p>\n\n\n\n<p>What we\u2019re doing here is known as <a href=\"https:\/\/cloud.google.com\/use-cases\/retrieval-augmented-generation?hl=en\" target=\"_blank\" rel=\"noreferrer noopener\">RAG: Retrieval-Augmented Generation<\/a>. That\u2019s just a fancy name for something simple: take data related to the user\u2019s question and provide it to the AI chatbot to improve the answer. But here\u2019s the real challenge: how do we find the relevant data for the question?<\/p>\n\n\n\n<p>To do that, we need to index the content so we can efficiently retrieve what&#8217;s relevant to the user&#8217;s text. In other words, we need to search by meaning, or do a <a href=\"https:\/\/cloud.google.com\/discover\/what-is-semantic-search?hl=en\" target=\"_blank\" rel=\"noreferrer noopener\">semantic search<\/a>. For example, if the user searches for &#8220;performance problem,&#8221; we want to return scripts that check CPU usage, validate what\u2019s currently running, etc. We don\u2019t want to return a script that recreates constraints as it\u2019s not relevant in this case. 
<\/p>\n\n\n\n<p>So, once all the content is stored in a database, and a user sends a query, we go to that database, retrieve the most relevant entries, and use a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model\" target=\"_blank\" rel=\"noreferrer noopener\">Large Language Model (LLM)<\/a> to present and explain those results as if an expert were responding.<\/p>\n\n\n\n<p>Note: LLM stands for Large Language Model, the type of AI that generates text. It&#8217;s the core behind tools like ChatGPT, <a href=\"https:\/\/www.deepseek.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">DeepSeek<\/a> or <a href=\"https:\/\/gemini.google.com\/app\" target=\"_blank\" rel=\"noreferrer noopener\">Google Gemini<\/a>. Maybe your experience with it so far has been through a nice web interface, but there are also APIs and services we can call programmatically. You\u2019ll see an example of that in this project: we\u2019ll use Google Gemini to generate the final response text.<\/p>\n\n\n\n<p>And thanks to the LLM, we can guide it to read the relevant scripts we\u2019ve found and present them in any language we want &#8211; for example, matching the user\u2019s original input language. Even though I wrote or commented my scripts in Brazilian Portuguese or English, AI helps make my work accessible to the entire world, in each person\u2019s own native language.<\/p>\n\n\n\n<p>It\u2019s important to remember that in this project, we\u2019re not only using the type of AI that generates text (the LLM), but also two other types of AI models that don\u2019t generate text &#8211; they generate numbers based on text, which can then be used for other operations.<\/p>\n\n\n\n<p>Throughout the examples, you\u2019ll understand why the new vector support in SQL is so important and truly game-changing right now.<\/p>\n\n\n\n<p>In this article, I\u2019ll point out whenever the code is interacting with or loading an AI model. 
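To make the retrieve-then-generate idea concrete before we dive in, here is a minimal Python sketch of the "augmentation" half of RAG. This is purely illustrative, not the project's actual code: `build_rag_prompt` and the sample script are invented names, and in the real tool the retrieved scripts come from the SQL semantic search and the assembled prompt is sent to the Google Gemini API.

```python
def build_rag_prompt(question, retrieved_scripts):
    """Assemble a grounding prompt from the scripts found by semantic search."""
    context = "\n\n".join(
        f"-- Script: {s['path']}\n{s['content']}" for s in retrieved_scripts
    )
    return (
        "You are a SQL Server expert. Using ONLY the scripts below, "
        "recommend the best one for the user's need and explain it, "
        "answering in the user's own language.\n\n"
        f"{context}\n\n"
        f"User question: {question}"
    )

# Pretend the semantic search already returned one relevant script
scripts = [{
    "path": "perf/top-cpu-queries.sql",
    "content": "SELECT TOP 10 ... FROM sys.dm_exec_query_stats ...",
}]
prompt = build_rag_prompt("performance problem", scripts)
print(prompt)
```

The LLM never has to "know" my repository: everything it needs rides along inside the prompt, which is the whole point of RAG.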
So, let\u2019s dive into each of these phases separately to see how the code works to make all the magic happen!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-phase-1-indexing-the-data\">Phase 1: Indexing the Data<\/h2>\n\n\n\n<p>The first thing needed for a project like this is to index my scripts. But it&#8217;s not just taking the content of the file and throwing it into a database to do a LIKE or a Full-Text search. Remember that we need to support searching by meaning. If I search for the text <em>performance problem<\/em>, then the search should find all the scripts that talk about requests, locks, waits, identifying CPU usage, etc.<\/p>\n\n\n\n<p>In the world of AI, there is a concept called embeddings. They are a numerical representation of text. <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/ai-in-sql-server-2025-embeddings\/\" target=\"_blank\" rel=\"noreferrer noopener\">The post I recommended earlier<\/a> provides a more complete introduction, but here\u2019s a quick summary:<\/p>\n\n\n<div class=\"block-core-list\">\n<ol class=\"wp-block-list\">\n<li>You need an AI model that can generate embeddings.<\/li>\n\n\n\n<li>You input a text into that AI model.<\/li>\n\n\n\n<li>It outputs an array of numbers that represent the text and its meaning.<\/li>\n\n\n\n<li>You can repeat that process for another text and compare the two arrays using a simple math operation called <em>Cosine Distance<\/em>. A result closer to 0 means that the two texts have similar meanings.<\/li>\n<\/ol>\n<\/div>\n\n\n<p>Basically, I need to generate the embeddings of my scripts and store them in a database table. 
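Step 4 of that summary is simple enough to show in code. Here is a small Python sketch of cosine distance on toy 4-dimensional vectors; the numbers are made up for illustration, while a real embedding model emits hundreds or thousands of floats (1024 in this project).

```python
import math

def cosine_distance(a, b):
    # 1 minus cosine similarity; 0 means the vectors point the same way,
    # i.e. the two texts have very similar meanings.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" (invented values)
query      = [0.9, 0.1, 0.0, 0.2]   # "performance problem"
cpu_script = [0.8, 0.2, 0.1, 0.3]   # a script about CPU usage -> close to 0
constraint = [-0.1, 0.9, 0.8, 0.0]  # a script about constraints -> far away

print(cosine_distance(query, cpu_script))
print(cosine_distance(query, constraint))
```

Running this prints a distance near 0 for the CPU script and a much larger one for the constraint script, which is exactly the ordering the semantic search will rely on.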
Since I\u2019m using GitHub, I chose to use <a href=\"https:\/\/github.com\/features\/actions\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub Actions<\/a>, so whenever I update the repository, it automatically triggers a script that generates the embeddings.<\/p>\n\n\n\n<p><strong>GitHub Actions<\/strong><\/p>\n\n\n\n<p>Think of GitHub Actions as a \u201ctrigger\u201d that runs when something happens in my repository. A YAML file called <a href=\"https:\/\/github.com\/rrg92\/sqlserver-lib\/blob\/main\/.github\/workflows\/embed.yml\">embed.yml<\/a> defines all of that for me. This is called a workflow. To better control when this process starts, I configured an action that runs a PowerShell script whenever I push a tag with the prefix embed-*.<\/p>\n\n\n\n<p>This gives me control over when I want to start the process and even lets me do a bit of version control for the indexing phase. For example, I can choose not to trigger the process if I\u2019ve only changed files that I don\u2019t consider important to index right now.<\/p>\n\n\n\n<p>But if you don\u2019t know what a Git tag is, no problem! Just imagine this: whenever I make a change in my repo, something automatically runs a PowerShell script for me on GitHub\u2019s servers.<\/p>\n\n\n\n<p>If you\u2019d like to learn more about GitHub Actions, check out this great series on Simple Talk by Kathi Kellenberger: <a href=\"https:\/\/www.red-gate.com\/simple-talk\/featured\/introduction-to-github-actions-part-1-getting-started\/\" target=\"_blank\" rel=\"noreferrer noopener\">Introduction to GitHub Actions Part 1: Getting Started &#8211; Simple Talk<\/a><\/p>\n\n\n\n<p>The script responsible for this is called <a href=\"https:\/\/github.com\/rrg92\/sqlserver-lib\/blob\/main\/SqlLibEmbeddings\/embed.ps1\" target=\"_blank\" rel=\"noreferrer noopener\">embed.ps1<\/a>. Yes, it\u2019s a PowerShell script. 
Let\u2019s now rebuild our initial diagram to show the components we\u2019ve covered so far:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"940\" height=\"677\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-4.png\" alt=\"A diagram showing the components we've built so far.\" class=\"wp-image-107725\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-4.png 940w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-4-300x216.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-4-768x553.png 768w\" sizes=\"auto, (max-width: 940px) 100vw, 940px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-first-ai-use-generating-script-embeddings\">First AI Use: Generating Script Embeddings<\/h3>\n\n\n\n<p>If you thought we would only use AI to generate text, here is the first surprise. AI goes far beyond generating text, and in our project, the first use is to generate the numerical representation of our scripts&#8217; content, the famous embeddings. This will allow us to search by meaning later in SQL Server (thanks to the new AI support).<\/p>\n\n\n\n<p>For each repository file, I will generate its embeddings using a Hugging Face Space that I created.<\/p>\n\n\n\n<p><strong>About Hugging Face <\/strong><strong>\ud83e\udd17<\/strong><strong> and Gradio<\/strong><\/p>\n\n\n\n<p><a href=\"https:\/\/huggingface.co\/\" target=\"_blank\" rel=\"noreferrer noopener\">Hugging Face<\/a> (yes, the same name as the emoji) is an online service where anyone can upload AI models and build demo applications that use those AI models. Hugging Face provides the infrastructure to host them, including hardware resources like GPUs to showcase projects like the one I\u2019m presenting to you. 
It\u2019s known as the \u201c<em>GitHub of AI\u201d<\/em> and there you\u2019ll find lots of resources related to the AI world! So, it will be common to hear about Hugging Face (or just \ud83e\udd17) when talking about AI.<\/p>\n\n\n\n<p>The <em>Hugging Face Space<\/em> is where we create the demo application. Think of it like this: anyone can create a <em>space<\/em>, which simply contains Python code that runs a web interface, loads AI models, invokes APIs, etc. Each space has its own unique URL for access.<\/p>\n\n\n\n<p>There\u2019s a cool Python library developed specifically for AI\/Machine Learning demos: <a href=\"https:\/\/www.gradio.app\/\" target=\"_blank\" rel=\"noreferrer noopener\">Gradio<\/a>. Thanks to that library, it\u2019s super easy to develop complex UIs like chats, tabs, etc. And because it\u2019s made for Machine Learning, it includes lots of components from the AI world. So, it\u2019s common to see many spaces on Hugging Face using Gradio, which handles all the HTML, JavaScript, and APIs. I don\u2019t need to worry about that &#8211; I can just focus on the code that solves my problem!<\/p>\n\n\n\n<p>The tool you see in the video at the start of this article is the Space I created to showcase this project, and it uses that library, Gradio, to provide the UI and API. We\u2019ll look at the code soon.<\/p>\n\n\n\n<p>If you look at the embed.ps1 file, you will see that I use a function called GetEmbeddings. This function is defined in the file <a href=\"https:\/\/github.com\/rrg92\/sqlserver-lib\/blob\/main\/SqlLibEmbeddings\/util.ps1\">util.ps1<\/a>. It uses a PowerShell module that I created called <a href=\"https:\/\/powersh.ai\/\" target=\"_blank\" rel=\"noreferrer noopener\">PowershAI<\/a>, which has one of the features I like the most: I can invoke the Gradio API as if it were a PowerShell function. 
Thanks to that, I can integrate the AI services I exposed via Python directly into PowerShell, and, in this case, the embed function is available to me from that script:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"797\" height=\"763\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-5.png\" alt=\"\" class=\"wp-image-107726\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-5.png 797w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-5-300x287.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-5-768x735.png 768w\" sizes=\"auto, (max-width: 797px) 100vw, 797px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>This part of the code is straightforward:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li><strong>Line 17<\/strong>: I read all SQL scripts. I could\u2019ve used a simple command like <em>gci *.sql -rec<\/em>, which returns all .sql files in the current directory. But since I was testing on my machine and not every script had been committed to Git yet, I had to make it a bit more complex to get only the files already committed. In the future, once all files are committed, I can simplify this part.<br><\/li>\n\n\n\n<li>Now that I have a list of SQL scripts to index, I can iterate over them using a foreach loop. <strong>Line 27<\/strong> starts that loop.<br><\/li>\n\n\n\n<li>For each file, I read its content (as shown in <strong>line 30<\/strong>). A future improvement here is to ensure the correct file encoding\u2014this helps avoid common issues with special characters, especially since most of my files are documented in pt-BR.<br><\/li>\n\n\n\n<li>Then, I build the final string to be indexed, including the relative path and content of the script. 
In the image above, the text on <strong>line 34<\/strong> is in pt-BR and means \u201cName of Script:\u201d and \u201cScript content:\u201d, respectively. So in the end, $EmbeddingContent will hold the content I want to index.<br><\/li>\n\n\n\n<li>Next, I call the GetEmbeddings function, which generates the embeddings for the given text.<br><\/li>\n\n\n\n<li>Finally, I build a simple object containing all the data I want to insert into SQL Server\u2014think of this object as a row. I keep adding these rows to an array called $ScriptData, so I can insert everything at once. There\u2019s definitely room for optimization in the PowerShell code here, but since we\u2019re only dealing with about 100 items, I\u2019m not too worried for now. Future versions can improve this part.<\/li>\n<\/ul>\n<\/div>\n\n\n<p><strong>Why PowerShell and PowershAI?<\/strong><\/p>\n\n\n\n<p><a href=\"https:\/\/powersh.ai\/\" target=\"_blank\" rel=\"noreferrer noopener\">PowershAI<\/a> is a new module that I\u2019m developing and that allows us to access a lot of AI services from the command line, using the common PowerShell way of doing things. 
I could have used Python, JavaScript, etc., inside GitHub Actions, but I found it interesting to use PowerShell here for the following reasons:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li>I find it simpler and easier to manipulate files with PowerShell.<br><\/li>\n\n\n\n<li>It was a great opportunity to put my project to a more real-world test.<br><\/li>\n\n\n\n<li>Connecting to SQL Server and doing things like BULK INSERT is easier with PowerShell (thanks to the .NET library support).<\/li>\n<\/ul>\n<\/div>\n\n\n<p>Here is our updated diagram with all that flow added:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"846\" height=\"690\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-6.png\" alt=\"The updated diagram with the flow added.\" class=\"wp-image-107727\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-6.png 846w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-6-300x245.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-6-768x626.png 768w\" sizes=\"auto, (max-width: 846px) 100vw, 846px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>As seen in the diagram above, embed.ps1, which was triggered by my change to Git, iterates through each SQL file in my repository and generates embeddings for each one, using PowershAI to access an AI model that our Python code loaded on Hugging Face, where a GPU keeps it fast. Later, in the Phase 2 section, we\u2019ll look at how the embed function works on the Python side. 
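The indexing loop that embed.ps1 performs can be sketched in Python like this. Everything here is illustrative: `fake_embeddings` is a deterministic stand-in for the real model call (the actual tool asks a Hugging Face Space, via PowershAI, for a 1024-float embedding), and the file layout is a throwaway example created just for the demo.

```python
import hashlib
import tempfile
from pathlib import Path

def fake_embeddings(text, dims=8):
    # Deterministic stand-in for the real embedding model; real embeddings
    # carry meaning, these bytes are only here so the sketch runs anywhere.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dims]]

def index_scripts(root):
    rows = []
    for path in sorted(Path(root).rglob("*.sql")):
        content = path.read_text(encoding="utf-8")
        rel = path.relative_to(root).as_posix()
        # Same idea as embed.ps1: prepend the script name so the file path
        # contributes to the meaning being indexed.
        embed_input = f"Script name: {rel}\nScript content:\n{content}"
        rows.append({
            "RelPath": rel,
            "ChunkNum": 1,  # no chunking yet, matching the article
            "ChunkContent": content,
            "embeddings": fake_embeddings(embed_input),
        })
    return rows

# Demo on a throwaway folder containing one script
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "perf").mkdir()
    Path(tmp, "perf", "top-cpu.sql").write_text(
        "SELECT TOP 10 * FROM sys.dm_exec_query_stats;", encoding="utf-8")
    rows = index_scripts(tmp)

print(rows[0]["RelPath"], len(rows[0]["embeddings"]))
```

Each dictionary in `rows` mirrors one row of the Scripts table you'll see shortly.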
For now, just trust that this call produces\u2014for each file\u2014an array of 1024 positions, each being a float, representing the semantics of our script.<\/p>\n\n\n\n<p>Now, we need to store the results in some database\u2014and this is where SQL Server&#8217;s new features come into play.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-inserting-the-data-into-sql-server\">Inserting the Data into SQL Server<\/h3>\n\n\n\n<p>Once I have calculated all the embeddings, I can insert them into the database. There are several approaches I could use: insert each embedding as soon as it is calculated, calculate everything first and insert it all at the end, insert only what has changed from one commit to another, etc.<\/p>\n\n\n\n<p>For this first version, I chose to loop, calculate all the embeddings of all the files, and in the end, do a TRUNCATE TABLE and a BULK INSERT of all the embeddings. The main disadvantage of this is that if one of the embeddings fails, I lose everything else and have to start all over again. In future versions, I will improve this, but for a simple case like this, it works well. 
Therefore, after the loop, I do this:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"836\" height=\"125\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-7.png\" alt=\"\" class=\"wp-image-107728\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-7.png 836w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-7-300x45.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-7-768x115.png 768w\" sizes=\"auto, (max-width: 836px) 100vw, 836px\" \/><\/figure>\n\n\n\n<p>dbulk is just a helper function defined in the <a href=\"https:\/\/github.com\/rrg92\/sqlserver-lib\/blob\/main\/SqlLibEmbeddings\/util.ps1\" target=\"_blank\" rel=\"noreferrer noopener\">util.ps1<\/a> script. It uses <a href=\"https:\/\/learn.microsoft.com\/en-us\/dotnet\/api\/system.data.sqlclient?view=windowsdesktop-9.0\" target=\"_blank\" rel=\"noreferrer noopener\">the native .NET libraries for connecting to SQL Server<\/a> (yes, they are deprecated, but it still works fine. I may update it to use the new library in the future). This function performs a BULK INSERT (you\u2019ll actually see INSERT BULK &#8230; in sys.dm_exec_requests when it runs). The first parameter is an array of objects that have the same columns as the target table where I want to load the data. It also accepts a parameter called <em>-pre<\/em>, which is a script that will be executed before the bulk insert happens. <\/p>\n\n\n\n<p>As mentioned earlier, this part could be improved in the future by implementing incremental loads instead of truncating and re-inserting everything. 
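If you prefer to see the truncate-then-bulk-load pattern in Python, a rough sketch follows. It only illustrates the pattern: the real tool uses .NET's SqlBulkCopy from PowerShell against Azure SQL, while this example uses the standard-library sqlite3 module so it can run anywhere, and `reload_scripts` is an invented name.

```python
import json
import sqlite3

def reload_scripts(conn, rows):
    # Mirrors the embed.ps1 flow: a "pre" step empties the table, then one
    # batched insert loads every row inside a single transaction, so a
    # failure rolls everything back together.
    with conn:
        conn.execute("DELETE FROM Scripts")  # TRUNCATE TABLE equivalent
        conn.executemany(
            "INSERT INTO Scripts (RelPath, ChunkNum, ChunkContent, embeddings) "
            "VALUES (?, ?, ?, ?)",
            [(r["RelPath"], r["ChunkNum"], r["ChunkContent"],
              json.dumps(r["embeddings"])) for r in rows],
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Scripts (id INTEGER PRIMARY KEY, RelPath TEXT, "
    "ChunkNum INT, ChunkContent TEXT, embeddings TEXT)")
reload_scripts(conn, [{"RelPath": "perf/top-cpu.sql", "ChunkNum": 1,
                       "ChunkContent": "SELECT 1;", "embeddings": [0.1, 0.2]}])
print(conn.execute("SELECT COUNT(*) FROM Scripts").fetchone()[0])
```

Because the delete and the inserts share one transaction, the table is never left half-loaded, which is the main property the real pipeline needs too.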
The function gets the connection info from environment variables configured in GitHub\u2014these are only accessible when the pipeline is running.<\/p>\n\n\n\n<p>This is our updated diagram (note: in the diagram, I used an INSERT &#8230; VALUES statement just for layout\/white-space reasons. In the code, the operation is TRUNCATE followed by a BULK INSERT):<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"834\" height=\"685\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-8.png\" alt=\"\" class=\"wp-image-107729\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-8.png 834w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-8-300x246.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-8-768x631.png 768w\" sizes=\"auto, (max-width: 834px) 100vw, 834px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Looking at the diagram, we\u2019ve added <strong>Step 2<\/strong>, flowing out from the embed.ps1 icon. It represents the next part of the data flow: retrieving the embeddings and script information (file name, content, etc.) and inserting them into a table on our SQL instance.<\/p>\n\n\n\n<p>At the time I\u2019m writing this post, <a href=\"https:\/\/www.microsoft.com\/en-us\/evalcenter\/evaluate-sql-server-2025\" target=\"_blank\" rel=\"noreferrer noopener\">SQL Server 2025<\/a> is already in public preview, but I will keep using an <a href=\"https:\/\/azure.microsoft.com\/en-us\/products\/azure-sql\/database\" target=\"_blank\" rel=\"noreferrer noopener\">Azure SQL Database,<\/a> which already supports the AI feature we need: the <strong>vector data type<\/strong>. 
Our database contains just a single table, which is defined in <a href=\"https:\/\/github.com\/rrg92\/sqlserver-lib\/blob\/main\/SqlLibEmbeddings\/tab.Scripts.sql\" target=\"_blank\" rel=\"noreferrer noopener\">tab.Scripts.sql<\/a>:<\/p>\n\n\n\n<div class=\"wp-block-urvanov-syntax-highlighter-code-block\"><pre class=\"lang:tsql decode:true \">CREATE TABLE Scripts (\n\t id int IDENTITY PRIMARY KEY WITH(DATA_COMPRESSION = PAGE)\n\t,RelPath varchar(1000) NOT NULL\n\t,ChunkNum int NOT NULL\n\t,ChunkContent nvarchar(max) NOT NULL\n\t,embeddings vector(1024)\n)\n<\/pre><\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>Some info about these columns:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li><strong>Id<\/strong> is just a unique sequential number for each script. I enabled data compression, but due to the large size of the columns involved, it won\u2019t be very effective, so I\u2019ll probably disable it.<br><\/li>\n\n\n\n<li>RelPath is the relative path of the script from the Git repo root.<br><\/li>\n\n\n\n<li>ChunkNum is a sequential number indicating the chunk of the script. I can split large scripts into chunks &#8211; AI models can have limits on the maximum text they can process for embeddings, so chunking is a common technique to deal with that. In this version, I\u2019m not doing that yet because I want to keep it simple, but I plan to in the future. So, for now, this value is always 1.<br><\/li>\n\n\n\n<li>ChunkContent is the content of the respective chunk. Since I\u2019m not doing any chunking yet, this ends up being the entire file content.<br><\/li>\n\n\n\n<li>Embeddings is the new one here: it uses the new VECTOR data type, which is how SQL represents these vectors. The 1024 is the dimension count, so SQL will only accept embeddings with exactly 1024 positions, no more, no less. 
That means my AI model needs to support that size.<\/li>\n<\/ul>\n<\/div>\n\n\n<p>The Azure SQL Database setup is very simple. I initially tried to start with the <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/azure-sql\/database\/free-offer?view=azuresql\" target=\"_blank\" rel=\"noreferrer noopener\">Azure SQL Free Offer<\/a>, but during my tests, the free seconds ran out, so I decided to upgrade to a Basic tier with 5 DTUs. If I need more processing power &#8211; for example, if more scripts are involved or more users are accessing it &#8211; I can evaluate the various scaling options Azure SQL Database provides, which makes SQL very easy and convenient to use here.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"740\" height=\"548\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-9.png\" alt=\"The various tiers available\" class=\"wp-image-107730\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-9.png 740w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-9-300x222.png 300w\" sizes=\"auto, (max-width: 740px) 100vw, 740px\" \/><\/figure>\n\n\n\n<p>This SQL database is exposed on the internet, so anyone with the address and credentials could connect to it. But for me, that\u2019s not a big problem right now &#8211; the data is completely public, and someone would still need to guess the credentials, which are very strong. So, I decided to focus my efforts on more important parts of the project. An additional layer of security would be to allow requests only from the GitHub and Hugging Face IP ranges. I didn\u2019t do that, for simplicity, but you can explore more options if necessary.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Summary of Indexing Steps<\/h3>\n\n\n\n<p>With that, we\u2019ve covered the entire first phase. 
To summarize, here\u2019s the full process:<\/p>\n\n\n<div class=\"block-core-list\">\n<ol class=\"wp-block-list\">\n<li>When I push scripts to my repo and add a Git tag that starts with embed-*, it triggers a workflow defined in the <a href=\"https:\/\/github.com\/rrg92\/sqlserver-lib\/blob\/main\/.github\/workflows\/embed.yml\" target=\"_blank\" rel=\"noreferrer noopener\">embed.yml<\/a> file.<br><\/li>\n\n\n\n<li>The workflow loads a PowerShell Docker image that runs <a href=\"https:\/\/github.com\/rrg92\/sqlserver-lib\/blob\/main\/SqlLibEmbeddings\/embed.ps1\" target=\"_blank\" rel=\"noreferrer noopener\">embed.ps1<\/a>.<br><\/li>\n\n\n\n<li>The embed.ps1 script uses the <a href=\"https:\/\/powersh.ai\/\" target=\"_blank\" rel=\"noreferrer noopener\">PowershAI<\/a> module to connect to a <a href=\"https:\/\/huggingface.co\/spaces\/rrg92\/sqlserver-lib-assistant\" target=\"_blank\" rel=\"noreferrer noopener\">Hugging Face Space<\/a> I created, generating 1024-dimension embeddings for each SQL script file using a GPU for fast processing.<br><\/li>\n\n\n\n<li>After all embeddings are generated, the <a href=\"https:\/\/github.com\/rrg92\/sqlserver-lib\/blob\/main\/SqlLibEmbeddings\/tab.Scripts.sql\" target=\"_blank\" rel=\"noreferrer noopener\">Scripts<\/a> table is truncated, and I use BULK INSERT to insert the script data (filename, contents, embeddings) into the table.<\/li>\n<\/ol>\n<\/div>\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"834\" height=\"685\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-10.png\" alt=\"\" class=\"wp-image-107731\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-10.png 834w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-10-300x246.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-10-768x631.png 768w\" sizes=\"auto, (max-width: 834px) 100vw, 
834px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Next, let\u2019s see how the other phase &#8211; searching &#8211; is done. We\u2019ll take a look at the Python code that runs the models and how SQL Server is used to help find data related to the user&#8217;s question!<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Phase 2: The Search<\/h1>\n\n\n\n<p>Now that you know how we index the scripts, it&#8217;s time for the most magical part: how do we run a semantic search and generate a response based on this data? This part lives almost entirely in the Hugging Face Space, written in Python, with one step handed off to SQL Server.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-presenting-the-hugging-face-space\">Presenting the Hugging Face Space<\/h2>\n\n\n\n<p>Everything starts when the user types a message in the chat interface. Now let&#8217;s look at the diagram with only what we need for Phase 2:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"940\" height=\"704\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-11.png\" alt=\"\" class=\"wp-image-107732\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-11.png 940w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-11-300x225.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-11-768x575.png 768w\" sizes=\"auto, (max-width: 940px) 100vw, 940px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>The yellow area represents the Hugging Face boundaries, showing that everything is hosted there. 
Up to this point, I hadn\u2019t shared any details about the Space I created\u2014so let\u2019s check it out: <a href=\"https:\/\/huggingface.co\/spaces\/rrg92\/sqlserver-lib-assistant\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/huggingface.co\/spaces\/rrg92\/sqlserver-lib-assistant<\/a><\/p>\n\n\n\n<p>After clicking the link, go to the Files tab and you&#8217;ll see all the code I uploaded to this Space (Hugging Face Spaces are also git repositories):<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"440\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-12-1024x440.png\" alt=\"All the code used to upload this Hugging Face Space.\" class=\"wp-image-107733\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-12-1024x440.png 1024w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-12-300x129.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-12-768x330.png 768w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-12.png 1040w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>The file that gives life to the Space is <a href=\"https:\/\/huggingface.co\/spaces\/rrg92\/sqlserver-lib-assistant\/blob\/main\/app.py\" target=\"_blank\" rel=\"noreferrer noopener\">app.py<\/a>. It\u2019s the entry point, so when you access the demo (in the App tab), you\u2019re seeing what that file outputs. In this case, it\u2019s a web interface built using the Gradio Python library.<\/p>\n\n\n\n<p>Just open the file and you\u2019ll see a lot of Python code. Don\u2019t worry if you\u2019re not familiar with Python &#8211; I\u2019ll explain the important parts. 
So, if you have experience with any programming language, that\u2019s enough to understand what I\u2019m going to show here.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-starting-point-chat-interface\">Starting Point: Chat Interface<\/h2>\n\n\n\n<p>As mentioned previously, the interface is built with the <a href=\"https:\/\/www.gradio.app\/\" target=\"_blank\" rel=\"noreferrer noopener\">Gradio<\/a> library, which is focused on building machine learning applications. It provides several elements common to the world of AI and is therefore tightly integrated into Hugging Face, <a href=\"https:\/\/huggingface.co\/blog\/gradio-joins-hf\" target=\"_blank\" rel=\"noreferrer noopener\">which owns the Gradio startup<\/a>. I quite like it because of how easy it makes integrating these common visual elements.<\/p>\n\n\n\n<p>For example, creating the chat interface and the table that receives the data is extremely simple. At the beginning of the script, I import the library:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"534\" height=\"170\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-13.png\" alt=\"\" class=\"wp-image-107734\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-13.png 534w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-13-300x96.png 300w\" sizes=\"auto, (max-width: 534px) 100vw, 534px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p><strong>gr<\/strong> is just an alias for the library, so I reference it using that shorter name.<\/p>\n\n\n\n<p>Here is where I define the chat interface: about 10 lines of code. Anyone who knows web programming knows how complex it is to create this from scratch. So, thank you Gradio team! 
This saves me a lot of time and allows me to focus more on AI models and solutions, rather than on visual elements.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"940\" height=\"424\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-14.png\" alt=\"\" class=\"wp-image-107735\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-14.png 940w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-14-300x135.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-14-768x346.png 768w\" sizes=\"auto, (max-width: 940px) 100vw, 940px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>The gr.ChatInterface class is the main piece of this code. It handles everything needed to render the chat web interface. When you type a message and press Enter (or click the send button), it calls a function passed as its first parameter: ChatFunc. This is just a regular Python function that I defined earlier in the same file. Let\u2019s take a look at it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-the-chatfunc-function\">The ChatFunc Function<\/h2>\n\n\n\n<p>When you type text in the interface and send the message, the ChatFunc function is invoked. This function defines some elements to interact with the chat interface, such as the current message, history, and other parameters that I can pass &#8211; in this case, the language detected or chosen by the user. The Gradio team has <a href=\"https:\/\/www.gradio.app\/guides\/creating-a-chatbot-fast\" target=\"_blank\" rel=\"noreferrer noopener\">a very complete guide<\/a>. 
I highly recommend that you read it if you want to learn all the details of how to use it.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"688\" height=\"213\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-15.png\" alt=\"\" class=\"wp-image-107736\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-15.png 688w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-15-300x93.png 300w\" sizes=\"auto, (max-width: 688px) 100vw, 688px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Here is our diagram updated up to this point:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"456\" height=\"536\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-16.png\" alt=\"\" class=\"wp-image-107737\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-16.png 456w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-16-255x300.png 255w\" sizes=\"auto, (max-width: 456px) 100vw, 456px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Before we move on to the next relevant piece of code, you\u2019ll notice some auxiliary functions and classes in the code, like the BotMessage class, the Reply function, the yield statement (just a way to stream content back to the caller in Python), and others. These are just helpers I use to output data to the interface in real-time (while the process is running). Since this article is focused on how semantic search works, I won\u2019t go into detail about those parts. But if you\u2019d like more clarification, feel free to leave a comment &#8211; I\u2019ll be happy to write a separate post just about how to display messages using Gradio\u2019s ChatInterface. 
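<\/p>\n\n\n\n<p>To illustrate the streaming idea, here is a minimal, hypothetical handler in the shape gr.ChatInterface expects &#8211; a generator function whose successive yields replace the bot message shown in the UI (the real ChatFunc does far more):<\/p>

```python
def chat_func(message, history):
    """Minimal ChatInterface-style handler: a generator of partial replies.
    Each yielded string replaces the bot message currently shown in the UI."""
    yield "Searching scripts..."
    yield "Searching scripts... done.\nBest match for %r: a backup script." % message

# Consuming it the way the UI would, one update at a time:
replies = list(chat_func("how to backup a database", []))

# Wiring it up (illustrative; requires the gradio package):
# import gradio as gr
# gr.ChatInterface(chat_func).launch()
```

<p>Yielding instead of returning is what lets the interface show progress while the search and the LLM call are still running.<\/p>\n\n\n\n<p>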
Also, the guide I shared earlier has some good examples if you want to explore that further.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-second-use-of-ai-translating-the-user-s-message\">Second Use of AI: Translating the User&#8217;s Message<\/h2>\n\n\n\n<p>Here is the second point in our project where we use an AI model (the first was generating embeddings during the indexing phase). Now, we\u2019ll use an actual language model, but instead of running it directly in the code, I\u2019ll invoke it through an API.<\/p>\n\n\n\n<p>The function that does this is called <strong>ai<\/strong>. This function invokes an LLM using the <a href=\"https:\/\/openai.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">OpenAI<\/a> Python libraries.<br>In this case, its first execution asks for a translation of the user&#8217;s text into English.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"940\" height=\"282\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-17.png\" alt=\"\" class=\"wp-image-107738\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-17.png 940w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-17-300x90.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-17-768x230.png 768w\" sizes=\"auto, (max-width: 940px) 100vw, 940px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Why is that? After testing, I saw that the model I use for generating embeddings and comparing them performs very well in English. So, I decided to use the English text for the search. Thanks to LLMs, doing this is very easy and painless. This is the first place in the project where I call the Google Gemini API. 
Here I also take the opportunity to identify the language of the source text.<\/p>\n\n\n\n<p>Finally, the Question variable holds the translated text, and SourceLang holds the language detected automatically (if automatic mode is enabled in the Settings tab). I could load a smaller model in this script to do the translation, but I chose not to, just to keep the code simpler. Maybe a future improvement here.<\/p>\n\n\n\n<p>In case you&#8217;re curious, the <strong>ai<\/strong> function is just a convenience so I can call the OpenAI library in a more flexible way:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"940\" height=\"838\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-18.png\" alt=\"\" class=\"wp-image-107739\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-18.png 940w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-18-300x267.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-18-768x685.png 768w\" sizes=\"auto, (max-width: 940px) 100vw, 940px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>For simplicity, the ai function accepts two parameters: system and user. System is the <strong>system prompt<\/strong>. In LLMs, the system prompt is a special instruction message that defines how the model should behave when answering the user message, which is passed through the user parameter. The schema parameter accepts a class, allowing the model to generate a response using the structure of that class (i.e., matching its properties). The ai function internally calls another function named llm, which abstracts some of the calls to the underlying OpenAI library. 
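<\/p>\n\n\n\n<p>As a rough sketch (my simplification, not the author\u2019s exact code), such a helper mostly assembles the system\/user message list that any OpenAI-compatible chat endpoint expects:<\/p>

```python
# Simplified sketch of what an ai(system, user) convenience boils down to:
# building the messages list for an OpenAI-compatible chat completion call.
def build_messages(system, user):
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_messages(
    "Translate the user's text to English and report the detected language.",
    "Como fazer backup de um banco de dados?",
)

# With the openai package, the call would look like (model name illustrative):
# response = client.chat.completions.create(model=MODEL_NAME, messages=messages)
```

<p>The system prompt carries the behavior instructions; the user message carries the text to translate.<\/p>\n\n\n\n<p>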
If I want more advanced control over the parameters, I can simply use the llm function directly.<\/p>\n\n\n\n<p>Here, these are just conveniences for me\u2026 All of this can be summarized as: I\u2019m calling the OpenAI library, which sends the request to the Google Gemini API. Now, you might be wondering:<\/p>\n\n\n\n<p><em>How are you using the OpenAI library if you&#8217;re using Google Gemini? This is confusing, Rodrigo!<\/em><\/p>\n\n\n\n<p>Let me explain: OpenAI was the pioneer in popularizing LLMs, so most people started by learning and using its library to connect to its API. When other providers entered the scene, like Google or DeepSeek, they wanted to make adoption easy. So, many of them exposed their APIs using the <strong>same structure and conventions<\/strong> as OpenAI\u2019s, making them compatible with the OpenAI library. This means that if someone wants to switch models or providers, they often just need to change the <strong>base URL<\/strong> of the API &#8211; everything else works as is.<\/p>\n\n\n\n<p>If you look at my code, the ClientOpenai variable is defined on line 66. Notice how I change the base_url parameter to tell the library: <em>&#8220;Hey, use this URL for your API calls, it\u2019s compatible with how this library works.&#8221;<\/em><\/p>\n\n\n\n<p>The values I pass to the function come from environment variables, which I can set in the Space Settings (I\u2019ll show that later). 
If no value is passed for the base URL, I fall back to the default: Google\u2019s API.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"94\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-19-1024x94.png\" alt=\"\" class=\"wp-image-107740\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-19-1024x94.png 1024w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-19-300x27.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-19-768x70.png 768w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-19.png 1026w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>How do I know that? Simple, <strong>Google\u2019s doc<\/strong> tells me: <a href=\"https:\/\/ai.google.dev\/gemini-api\/docs\/openai\" target=\"_blank\" rel=\"noreferrer noopener\">OpenAI compatibility &nbsp;|&nbsp; Gemini API &nbsp;|&nbsp; Google AI for Developers<\/a><\/p>\n\n\n\n<p>By doing it this way, I can use any LLM that\u2019s compatible with the OpenAI library just by changing the base URL. 
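<\/p>\n\n\n\n<p>The fallback logic can be sketched in a few lines (BASE_URL and API_KEY are assumed names here, not necessarily the ones app.py uses; the Google endpoint is the one documented in the compatibility guide above):<\/p>

```python
import os

# Google's documented OpenAI-compatible endpoint for the Gemini API.
GOOGLE_OPENAI_URL = "https://generativelanguage.googleapis.com/v1beta/openai/"

def resolve_base_url():
    """Use BASE_URL from the environment if set, otherwise fall back to Google."""
    return os.environ.get("BASE_URL") or GOOGLE_OPENAI_URL

# The client would then be built once and reused (illustrative):
# from openai import OpenAI
# client = OpenAI(base_url=resolve_base_url(), api_key=os.environ["API_KEY"])
```

<p>Pointing base_url at another compatible provider is the only change needed to switch LLMs.<\/p>\n\n\n\n<p>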
Simple and flexible!<\/p>\n\n\n\n<p>Here is our updated diagram with a call to Google Gemini:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"682\" height=\"672\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-20.png\" alt=\"\" class=\"wp-image-107741\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-20.png 682w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-20-300x296.png 300w\" sizes=\"auto, (max-width: 682px) 100vw, 682px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-third-use-of-ai-generating-embeddings-again\">Third Use of AI: Generating Embeddings Again<\/h2>\n\n\n\n<p>Following the code, the next relevant snippet is the search function. This is where we will again use an AI model that is already our old friend.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"186\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-21-1024x186.png\" alt=\"\" class=\"wp-image-107742\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-21-1024x186.png 1024w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-21-300x55.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-21-768x140.png 768w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-21.png 1067w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>The <em>search<\/em> function basically connects to SQL Server and obtains the most relevant scripts for a text passed as a parameter. 
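<\/p>\n\n\n\n<p>Before diving in, here is a hedged sketch of the kind of query such a search can run against the Scripts table, using the VECTOR_DISTANCE function available in Azure SQL Database (the exact SQL in app.py may differ):<\/p>

```python
TOP_N = 5

# Cosine *distance* is 0 for identical directions and grows toward 2 for
# opposite ones, so ascending order surfaces the most similar scripts first.
SEARCH_SQL = f"""SELECT TOP ({TOP_N})
     RelPath
    ,ChunkContent
    ,VECTOR_DISTANCE('cosine', embeddings, CAST(? AS vector(1024))) AS distance
FROM Scripts
ORDER BY distance ASC"""

# query_vector would be the question's embedding as a JSON-array string:
# cursor.execute(SEARCH_SQL, (query_vector,))
```

<p>The database does the heavy lifting: one distance computation per stored script, sorted, top results back.<\/p>\n\n\n\n<p>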
Let&#8217;s explore it in detail:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"940\" height=\"254\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-22.png\" alt=\"\" class=\"wp-image-107743\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-22.png 940w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-22-300x81.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-22-768x208.png 768w\" sizes=\"auto, (max-width: 940px) 100vw, 940px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>The first thing I need to do is get the text embeddings. Here I use the embed function that you learned about earlier. It is exactly the same function I use in Phase 1 for indexing. This is very important: for this whole process to work well (index and search), you need to generate the embeddings for the user\u2019s text using the same model and parameters that you used to index. <\/p>\n\n\n\n<p>Generating embeddings with a different model or settings can lead to very poor results, bringing content that has absolutely nothing to do with the text. The difference here is that, when indexing, I call the function via an API, using the PowershAI module, while here, in the query, I call it directly, since it is loaded in the same code that generates the interface.<\/p>\n\n\n\n<p>Let\u2019s update the diagram so you can see where we are. 
This time, I\u2019ll include the Phase 1 diagram as well, just to help you remember where we used it earlier:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"934\" height=\"697\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-23.png\" alt=\"\" class=\"wp-image-107744\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-23.png 934w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-23-300x224.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-23-768x573.png 768w\" sizes=\"auto, (max-width: 934px) 100vw, 934px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Note that, on the Hugging Face side, the user reaches the embed function through ChatFunc, which calls the search function, which in turn calls embed. On the GitHub Actions side, we also reach the embed function &#8211; but this time via the <a href=\"https:\/\/powersh.ai\/\" target=\"_blank\" rel=\"noreferrer noopener\">PowershAI<\/a> module, which wraps the API calls to Gradio. 
That\u2019s the point: we\u2019re using the same code to generate embeddings in both phases!<\/p>\n\n\n\n<p>Now it\u2019s time to explain the embed function and finally learn how we load an AI model in my code!<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-the-embed-function\">The &#8216;Embed&#8217; Function<\/h3>\n\n\n\n<p>The embed function is responsible for generating embeddings for a given text. Remember, embeddings are arrays of floating-point numbers that represent the meaning of the text. An AI model, based on its training, is able to generate that array. If we take two sentences and generate their embeddings, we can compare the two arrays to see how similar they are (we\u2019ll look at that later). 
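<\/p>\n\n\n\n<p>As a preview of that comparison, the usual measure is cosine similarity, shown here in pure Python (a simplified illustration of the math a database function or library performs for us):<\/p>

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 = more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (unrelated)
```

<p>Real embeddings have 1024 positions instead of 2, but the math is the same.<\/p>\n\n\n\n<p>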
Now, let\u2019s take a look at the body of the function, starting at line 56:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"940\" height=\"145\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-24.png\" alt=\"\" class=\"wp-image-107745\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-24.png 940w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-24-300x46.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-24-768x118.png 768w\" sizes=\"auto, (max-width: 940px) 100vw, 940px\" \/><figcaption class=\"wp-element-caption\"><em>&#8220;Oh, Rodrigo, this is a very simple function! I was expecting a giant block of code since we\u2019re talking about an AI model!&#8221;.<\/em><\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Yes, I know! Thanks to modularity and the open-source community, invoking AI can be as simple as calling a method. The actual \u201cAI\u201d magic happens on line 58. Look at the variable named Embedder &#8211; it holds an object from a super powerful library called <a href=\"https:\/\/sbert.net\/\" target=\"_blank\" rel=\"noreferrer noopener\">Sentence Transformers<\/a>. 
This object was defined on line 142:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"79\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-25-1024x79.png\" alt=\"\" class=\"wp-image-107746\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-25-1024x79.png 1024w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-25-300x23.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-25-768x60.png 768w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-25.png 1071w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p><strong>A Little Background on Sentence Transformers, Neural Networks and AI Models<\/strong><\/p>\n\n\n\n<p>The <a href=\"https:\/\/sbert.net\/\" target=\"_blank\" rel=\"noreferrer noopener\">Sentence Transformers<\/a> library is an open-source Python library that abstracts away the complexity of using an AI model, like loading the model, moving it to the GPU, getting results, managing the math behind the scenes, and so on. Thanks to libraries like this, combined with the <strong>Hugging Face infrastructure<\/strong>, I only need to pass the name of the AI model I want to use. Actually, SentenceTransformers relies on several other libraries under the hood, like <a href=\"https:\/\/huggingface.co\/docs\/transformers\/en\/index\" target=\"_blank\" rel=\"noreferrer noopener\">Transformers<\/a>, <a href=\"https:\/\/pytorch.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">PyTorch<\/a>, and more. There\u2019s a lot happening behind the scenes to make everything work seamlessly.<\/p>\n\n\n\n<p>The AI model is stored as a binary file, which contains a huge set of numbers &#8211; these are the <strong>parameters<\/strong> learned during training. 
The <a href=\"https:\/\/sbert.net\/\" target=\"_blank\" rel=\"noreferrer noopener\">Sentence Transformers<\/a> library knows how to build the architecture of the neural network (like layers, connections, and operations). This structure is generally the same across many AI models of the same family. What actually makes one AI model different from another are the parameters it has learned during training. It\u2019s like SentenceTransformers builds the shape of the brain and its neurons, and the parameters are what the brain has learned.<\/p>\n\n\n\n<p>When the model is loaded, the library reads the binary file and sets the parameter values into the architecture, bringing the model to life and making it ready for inference (inference = execution). It\u2019s like opening a PDF document. You have software that knows the PDF format and specifications &#8211; how to draw elements, fonts, etc. Any software that follows the PDF specification is able to open a binary file, with .PDF extension, containing the values that the software can interpret to know what to draw: colors, shapes, text, and more.<\/p>\n\n\n\n<p><strong>The AI Model we are Using &#8211; and How it&#8217;s Loaded and Used<\/strong><\/p>\n\n\n\n<p>We\u2019re loading a model called mixedbread-ai\/mxbai-embed-large-v1, which I pass as the first parameter to the SentenceTransformer class. This is an open-source AI model created by an organization called <a href=\"https:\/\/www.mixedbread.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">MixedBread<\/a>.<\/p>\n\n\n\n<p>When I do this, the library automatically downloads the files needed to load and run the model.<\/p>\n\n\n\n<p>So, how does SentenceTransformers know where to find those files? Remember when I mentioned Hugging Face earlier? It doesn\u2019t just host <strong>Spaces<\/strong> &#8211; it also hosts <strong>models<\/strong>. 
SentenceTransformers prepends the URL <em>https:\/\/huggingface.co<\/em> to the model name: <a href=\"https:\/\/huggingface.co\/mixedbread-ai\/mxbai-embed-large-v1\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/huggingface.co\/mixedbread-ai\/mxbai-embed-large-v1<\/a><\/p>\n\n\n\n<p>If you click the link above, you\u2019ll land on the model\u2019s main page on Hugging Face. There\u2019s a tab called <strong>&#8220;Files and versions&#8221;<\/strong>, where you\u2019ll see a list of all the files associated with the model. Based on default settings, SentenceTransformers knows exactly which files to download or read in order to prepare the model for use. All of that happens behind the scenes, so I don\u2019t need to worry about it.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"564\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-26-1024x564.png\" alt=\"\" class=\"wp-image-107747\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-26-1024x564.png 1024w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-26-300x165.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-26-768x423.png 768w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-26.png 1090w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Who uploaded the files above? It was the <strong>MixedBread<\/strong> developers. 
After training the model, they uploaded it to Hugging Face, made it publicly available, and ensured it was compatible with the <strong>SentenceTransformers<\/strong> library.<\/p>\n\n\n\n<p>Hugging Face hosts thousands of AI models, and many of them can be used in exactly the same way, with just a few lines of code.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1018\" height=\"649\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-27.png\" alt=\"\" class=\"wp-image-107748\" style=\"width:832px;height:auto\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-27.png 1018w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-27-300x191.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-27-768x490.png 768w\" sizes=\"auto, (max-width: 1018px) 100vw, 1018px\" \/><figcaption class=\"wp-element-caption\"><em>&#8220;But Rodrigo, why did you choose that exact model?&#8221;<\/em><\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Good question! You noticed earlier that I said invoking the AI is as simple as calling a method, right? Yes &#8211; <strong>invoking<\/strong> is easy. The hard part is deciding <strong>which<\/strong> AI model to use among thousands of available options. I ran several tests with different models and came to the conclusion that this one worked really well for my search use case, so I decided to go with it. The decision on which model to use depends on several factors: <strong>results<\/strong>, <strong>cost<\/strong>, <strong>performance<\/strong>, and more. 
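<\/p>\n\n\n\n<p>As a sketch of what such a comparison can look like, here is a toy evaluation harness. The stub encoder and file names below are illustrative stand-ins (in practice you would call each candidate model\u2019s encode function on labeled query\/script pairs and check which model ranks the expected script first):<\/p>\n\n\n\n

```python
# Toy model evaluation harness (illustrative only): check whether a candidate
# "model" ranks the expected script first for a known query. A real test
# would call SentenceTransformer(name).encode instead of the stub below.

def stub_model(text):
    # Stand-in encoder: counts topic words. Real embeddings have 1024 floats.
    return [text.count("cpu"), text.count("backup")]

# Illustrative script names and contents, not the real repository files
scripts = {
    "cpu_top_queries.sql": "top cpu consuming queries and cpu waits",
    "backup_history.sql": "last backup date per database from msdb",
}

def best_match(model, query):
    q = model(query)
    # Higher dot product = more similar (a crude similarity stand-in)
    score = lambda text: sum(a * b for a, b in zip(q, model(text)))
    return max(scripts, key=lambda name: score(scripts[name]))

# The candidate "passes" this labeled example if the CPU script wins
print(best_match(stub_model, "cpu performance"))  # → cpu_top_queries.sql
```

\n\n\n\n<p>Swapping in real encoders and a handful of labeled examples turns this into a quick, repeatable bake-off between candidate models.<\/p>\n\n\n\n<p>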
Testing different models and relying on past experience &#8211; like reading papers, blog posts, and community discussions &#8211; helps a lot in finding the best one for each case.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-know-the-model\">Know the Model<\/h4>\n\n\n\n<p>For the task of generating embeddings, one of the most important pieces of information to know about a model is the <strong>number of dimensions<\/strong> it generates. From the model\u2019s main page (linked earlier), I found out how many dimensions this particular model outputs by following another link that they documented: <a href=\"https:\/\/www.mixedbread.com\/docs\/embeddings\/models\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/www.mixedbread.com\/docs\/embeddings\/models<\/a><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"367\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-28-1024x367.png\" alt=\"\" class=\"wp-image-107749\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-28-1024x367.png 1024w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-28-300x108.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-28-768x275.png 768w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-28.png 1090w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>This is important because when creating my database table to store the embeddings, I need to specify the length of the vector data type. Embeddings are just arrays of numbers, and the <strong>length of that array<\/strong> depends on the model being used. In the case of this model &#8211; mxbai-embed-large-v1 &#8211; it generates embeddings with <strong>1024 positions<\/strong>. 
That\u2019s why, in the <a href=\"https:\/\/github.com\/rrg92\/sqlserver-lib\/blob\/main\/SqlLibEmbeddings\/tab.Scripts.sql\" target=\"_blank\" rel=\"noreferrer noopener\">tab.Scripts.sql<\/a> file, we use the data type vector(1024). The larger the number of dimensions, the more storage space it requires &#8211; but it can also mean a more precise representation of the text, which may lead to better results in comparisons and searches. That said, more dimensions don\u2019t always mean better quality. But based on my tests, I found that 1024 strikes a good balance and works well for my use case.<\/p>\n\n\n\n<p>This also means: if I decide to change the model later, I may need to <strong>change the table structure<\/strong> and <strong>reindex all rows<\/strong> to match the new embedding size. So this choice is <strong>extremely important<\/strong>! If you are planning to use AI, you must test thoroughly before going to production with a table holding terabytes of data.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Context Size and Chunking<\/h4>\n\n\n\n<p>Another important thing to consider is the Context Length. This defines how much data the model can process at once. The value is measured in <strong>tokens<\/strong>, not characters. Each model uses a <strong>tokenizer<\/strong> &#8211; a component that takes input text and outputs a sequence of tokens, which are just unique pieces of text, each with a unique Id (a number). Multiple models can use the same tokenizer. SentenceTransformers knows which tokenizer to use by checking the model\u2019s downloaded files &#8211; there\u2019s metadata that specifies which tokenizer the model was built with. So you need to know how many tokens your data corresponds to.<\/p>\n\n\n\n<p>In my case, I didn\u2019t worry about that for now, since most of my scripts are short and don\u2019t reach the limit. So, for this first version, I decided to keep things simple. 
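<\/p>\n\n\n\n<p>For reference, splitting long text into pieces that fit a context window can be sketched like this (stdlib-only and illustrative; a naive whitespace split stands in for the model\u2019s real subword tokenizer):<\/p>\n\n\n\n

```python
# Naive text-splitting sketch (illustrative): break a long script into pieces
# that each fit a model's context window. Real tokenizers are subword-based;
# here a whitespace split stands in so the idea stays dependency-free.
def chunk_text(text, max_tokens=8, overlap=2):
    tokens = text.split()  # stand-in for a real tokenizer
    chunks, start = [], 0
    while start < len(tokens):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break
        start += max_tokens - overlap  # overlap keeps context across edges
    return chunks

script = " ".join(f"line{i}" for i in range(20))
pieces = chunk_text(script, max_tokens=8, overlap=2)
print(len(pieces))  # → 3 pieces, each at most 8 "tokens"
```

\n\n\n\n<p>Each piece would then be embedded and stored as its own row, pointing back to the original script.<\/p>\n\n\n\n<p>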
I could either choose a model with a larger context length or use a technique called <strong>chunking<\/strong>, where I split the data into smaller parts that fit within the model&#8217;s context window. As I mentioned earlier, the table is already prepared for chunking &#8211; it&#8217;s just not implemented yet.<\/p>\n\n\n\n<p>So, choosing the right model is an important decision. Before settling on this model as the best choice for this solution, I ran tests with several others. I generated embeddings for a subset of files and tested queries against them. After evaluating the results, I ultimately chose this model based on its precision, cost, and performance. Some other options I tried were: <a href=\"https:\/\/huggingface.co\/jinaai\/jina-embeddings-v2-base-en\" target=\"_blank\" rel=\"noreferrer noopener\">Jina<\/a>, <a href=\"https:\/\/huggingface.co\/nomic-ai\/nomic-embed-code\" target=\"_blank\" rel=\"noreferrer noopener\">Nomic Embed Code<\/a> and <a href=\"https:\/\/platform.openai.com\/docs\/guides\/embeddings\" target=\"_blank\" rel=\"noreferrer noopener\">OpenAI<\/a>.<\/p>\n\n\n\n<p>In the future I might replace the model I\u2019m using if a better one becomes available. So, this is a <strong>continuous process<\/strong> &#8211; something I\u2019ll keep updating and reviewing as new models and improvements come out. But this also requires planning because, as I said before, switching models can be a costly operation!<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Using GPU<\/h4>\n\n\n\n<p>Going back to the embed function, notice it has the decorator <em>@spaces.GPU<\/em>. 
With this, I\u2019m telling Hugging Face that this function should run on a GPU if it&#8217;s available:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"683\" height=\"127\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-29.png\" alt=\"\" class=\"wp-image-107750\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-29.png 683w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-29-300x56.png 300w\" sizes=\"auto, (max-width: 683px) 100vw, 683px\" \/><\/figure>\n\n\n\n<p>This is part of the <a href=\"https:\/\/huggingface.co\/docs\/hub\/en\/spaces-zerogpu\" target=\"_blank\" rel=\"noreferrer noopener\">ZeroGPU feature<\/a>, where Hugging Face allows us to use powerful GPUs to run our Spaces. Technically, it\u2019s not \u201cfree\u201d since I need a PRO account to create a Space that uses it, but compared to paying separately for GPU usage, it\u2019s much more cost-effective. Thanks to this, I don\u2019t need to pay for GPU hours to keep the Space running, and anyone can use it and benefit from the GPU, without me or the user being charged per usage.<\/p>\n\n\n\n<p>However, there\u2019s a drawback: Hugging Face enforces a limited GPU usage time per user. If you\u2019re not logged in, Hugging Face uses your IP address to track usage and apply limits. So, if you try to use my tool and encounter an error related to GPU availability (or even a generic error), feel free to let me know in the comments here or in the Space&#8217;s Community tab. I can then evaluate upgrading the Space to use paid GPUs, so everyone can enjoy full access without those limitations.<\/p>\n\n\n\n<p>But, thanks to this, my embed function runs very fast. 
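<\/p>\n\n\n\n<p>A nice property of this decorator style is that the same code can degrade gracefully outside of Spaces. Here is a sketch of the pattern (the real <em>@spaces.GPU<\/em> does much more; the embed body below is a placeholder, not the app\u2019s actual function):<\/p>\n\n\n\n

```python
# Fallback pattern sketch: if the Hugging Face `spaces` package is not
# available (e.g., running locally on CPU), substitute a no-op decorator
# so the same code runs anywhere. Illustrative only.
try:
    import spaces  # available inside Hugging Face Spaces
    gpu = spaces.GPU
except ImportError:
    def gpu(func):  # no-op stand-in: the function runs unchanged on CPU
        return func

@gpu
def embed(text):
    # Placeholder for model.encode(text); returns a fake one-value vector here
    return [float(len(text))]

print(embed("cpu performance"))  # works with or without a GPU runtime
```
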
Even though the function itself looks simple, it&#8217;s running a full AI model with lots of math under the hood, so having a GPU helps a lot when generating embeddings!<\/p>\n\n\n\n<p>If <strong>ZeroGPU<\/strong> is not available (for example, if I choose <strong>CPU-only hardware<\/strong> in the Space settings), then this decorator has no effect. To help you understand what that setting looks like, here it is:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"446\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-30-1024x446.png\" alt=\"\" class=\"wp-image-107751\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-30-1024x446.png 1024w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-30-300x131.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-30-768x335.png 768w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-30.png 1090w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>To understand the difference between using a GPU and a CPU, you can check out the version of the Space that uses only the CPU: <a href=\"https:\/\/huggingface.co\/spaces\/rrg92\/sqlserver-lib-assistant-cpu\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/huggingface.co\/spaces\/rrg92\/sqlserver-lib-assistant-cpu<\/a>. The advantage of this version is that you&#8217;re not affected by the rate limits imposed on GPU usage. The disadvantage is that it can be slower (much slower). You\u2019ll notice that most of the difference lies in the rerank function, which we\u2019ll discuss later in this article.<\/p>\n\n\n\n<p>One last important thing before we move on: remember the PowerShell code in GitHub Actions that calls that function? 
How is this possible? This is another convenience of Gradio: it automatically exposes the function as a REST API endpoint, thanks to this line:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"265\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-31-1024x265.png\" alt=\"\" class=\"wp-image-107752\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-31-1024x265.png 1024w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-31-300x78.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-31-768x199.png 768w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-31.png 1090w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>So, I can use powershai to invoke it as if it were a simple PowerShell function. Here is an example:<\/p>\n\n\n\n<div class=\"wp-block-urvanov-syntax-highlighter-code-block\"><pre class=\"lang:tsql decode:true \"># install module if not installed\ninstall-module powershai -Scope CurrentUser\n\nimport-module PowershAI\n\n# Activate Hugging Face integration\nSet-AiProvider HuggingFace \n\n# Open a connection with the Space. This will be the default in this powershell session\n# Let's use the CPU version of the Space. slower, but you're not affected by GPU rate limits.\nGet-HfSpace rrg92\/sqlserver-lib-assistant-cpu \n\n# Create PowerShell function wrappers for the Gradio API.\n# This will create one function for each exposed API endpoint.\n# You can specify a function name prefix. let's use \"SimpleTalk\".\nNew-GradioSessionApiProxyFunction -Prefix SimpleTalk \n\n# From here, just call the desired function. 
The 'data' property contains the result.\nSimpleTalkEmbed 'Simple Talk is my favorite SQL blog'<\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Searching by Embeddings<\/h2>\n\n\n\n<p>Now that we have the embeddings of the user&#8217;s text, and we have the embeddings of the scripts saved in a database, we can do the famous semantic search, that is, search by meaning. Unlike a LIKE or Full Text search, where you pass a text to be searched in a Where filter, the approach we&#8217;re going to use here is a bit different:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"814\" height=\"755\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-32.png\" alt=\"\" class=\"wp-image-107753\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-32.png 814w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-32-300x278.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-32-768x712.png 768w\" sizes=\"auto, (max-width: 814px) 100vw, 814px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Note in line 109 that we&#8217;re building a T-SQL query using the embedding values returned by the embed function. This could be implemented as a stored procedure, but once again, I prefer to use an ad-hoc query. That query will be invoked by a SQL function, which is a wrapper around some calls to the pymssql library &#8211; a Python library used to connect to SQL Server. 
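<\/p>\n\n\n\n<p>To make the shape of that ad-hoc query concrete, here is a simplified sketch of how such a query string can be built from the embedding in Python (names and layout are illustrative, not the app\u2019s exact code; the app then executes the string via pymssql):<\/p>\n\n\n\n

```python
import json

# Sketch: turn an embedding (a Python list of floats) into an ad-hoc T-SQL
# vector search query. Illustrative only -- table and column names are
# simplified stand-ins for the app's real query.
def build_search_query(embedding, top=10):
    vector_literal = json.dumps(embedding)  # e.g. "[0.12, -0.5, ...]"
    return f"""
DECLARE @search vector(1024) = '{vector_literal}';
SELECT TOP {top} Id, FileName,
       VECTOR_DISTANCE('cosine', Embedding, @search) AS CosDistance
FROM tab.Scripts
ORDER BY CosDistance ASC;
""".strip()

query = build_search_query([0.1] * 1024)
# With pymssql, this string would then run via cursor.execute(query)
print(query.splitlines()[0][:40])
```
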
Here\u2019s the updated diagram with step 3 added:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"936\" height=\"676\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-33.png\" alt=\"\" class=\"wp-image-107754\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-33.png 936w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-33-300x217.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-33-768x555.png 768w\" sizes=\"auto, (max-width: 936px) 100vw, 936px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-the-sql-function\">The SQL Function<\/h3>\n\n\n\n<p>Following the script, the sql function was created to encapsulate the logic to connect to SQL and return the results. First, I define these values, at beginning of app.py:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"940\" height=\"261\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-34.png\" alt=\"\" class=\"wp-image-107755\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-34.png 940w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-34-300x83.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-34-768x213.png 768w\" sizes=\"auto, (max-width: 940px) 100vw, 940px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>I use them whenever the sql function is called:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"940\" height=\"568\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-35.png\" alt=\"\" class=\"wp-image-107756\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-35.png 940w, 
https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-35-300x181.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-35-768x464.png 768w\" sizes=\"auto, (max-width: 940px) 100vw, 940px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>I chose to use pymssql, a community-maintained library for connecting to SQL Server. I could also have used pyodbc (Microsoft&#8217;s official driver), but since its installation and configuration process has a few more steps, I stuck with pymssql.<\/p>\n\n\n\n<p>I implemented some small timeout logic because in my initial tests I was using the Azure SQL Database Free Offer, which is a free SQL Database. However, it has an auto-pause feature: it turns off after a period of inactivity and turns back on automatically when a connection is attempted. Resuming can take a while, so this timeout scheme allowed me to retry without completely freezing while waiting.<\/p>\n\n\n\n<p><strong>Important: <\/strong>Similar to GitHub, my SQL connection information is stored in environment variables, which I configure in the Space settings. 
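<\/p>\n\n\n\n<p>That retry-on-timeout idea can be sketched in a few lines (illustrative; the real app wraps pymssql.connect, and the fake connection below only simulates a database that is still resuming):<\/p>\n\n\n\n

```python
import time

# Retry sketch for an auto-paused database (illustrative; the real app
# wraps pymssql.connect). `connect` is any callable that raises while the
# database is still resuming from auto-pause.
def connect_with_retry(connect, attempts=3, wait_seconds=0):
    last_error = None
    for _ in range(attempts):
        try:
            return connect()
        except Exception as exc:  # e.g. a pymssql operational error
            last_error = exc
            time.sleep(wait_seconds)  # give the database time to resume
    raise last_error

# Fake connection that fails twice (database resuming), then succeeds
state = {"calls": 0}
def fake_connect():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("database is resuming")
    return "connection"

print(connect_with_retry(fake_connect))  # → connection
```
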
In this case, I left a SQL Database open for internet access, created a user with permissions on the Scripts table, and put that information in the Space&#8217;s environment variables.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"249\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-36-1024x249.png\" alt=\"\" class=\"wp-image-107757\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-36-1024x249.png 1024w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-36-300x73.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-36-768x187.png 768w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-36.png 1090w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>The Secrets section is where I define the environment variables with sensitive information (values that Hugging Face will not show in logs, etc.).<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"940\" height=\"448\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-37.png\" alt=\"\" class=\"wp-image-107758\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-37.png 940w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-37-300x143.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-37-768x366.png 768w\" sizes=\"auto, (max-width: 940px) 100vw, 940px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">The T-SQL Script for Search<\/h3>\n\n\n\n<p>Let\u2019s break down our SELECT to understand it better. Note that at the beginning of the Script I declare a variable with the vector type. 
This is the new data type in Azure SQL Database that allows me to store embeddings (and it will be in SQL 2025 as well) <a href=\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2024\/11\/19\/announcing-microsoft-sql-server-2025-apply-for-the-preview-for-the-enterprise-ai-ready-database\/\" target=\"_blank\" rel=\"noreferrer noopener\">as announced<\/a>. The value of this SQL variable comes from the Python variable containing the embeddings, which is the result of the embed function. When I concatenate it into the string, it is automatically converted to the JSON representation of an array of floats, which is the literal format SQL Server expects for this data type.<\/p>\n\n\n\n<p>It would be the same as if I did it like this:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"940\" height=\"137\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-38.png\" alt=\"\" class=\"wp-image-107759\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-38.png 940w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-38-300x44.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-38-768x112.png 768w\" sizes=\"auto, (max-width: 940px) 100vw, 940px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Returning to the script query, note that I declared it as 1024, exactly the same size declared in the column. And here I reinforce: the embeddings need to be generated in the same way as they were for indexing, which means using the same model and the same SIZE.<\/p>\n\n\n\n<p>Let&#8217;s isolate each part of the SELECT statement, starting with the innermost select. 
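<\/p>\n\n\n\n<p>Before diving in, here is how that Python-to-T-SQL conversion works in miniature (a tiny three-dimension example instead of the real 1024):<\/p>\n\n\n\n

```python
import json

# How a Python embedding list becomes a T-SQL vector literal (sketch).
# SQL Server accepts a JSON array of floats as the literal for vector(N).
embedding = [0.12, -0.5, 0.33]  # a real embedding would have 1024 values

vector_literal = json.dumps(embedding)
tsql = f"DECLARE @search vector({len(embedding)}) = '{vector_literal}';"
print(tsql)  # → DECLARE @search vector(3) = '[0.12, -0.5, 0.33]';
```
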
<\/p>\n\n\n\n<p><em><strong>TIP:<\/strong> If you would like to try these examples on your own local SQL Server 2025 instance, follow the instructions in this script: <a href=\"https:\/\/github.com\/rrg92\/sqlserver-lib\/blob\/main\/SqlLibEmbeddings\/IndexData2025.sql\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/github.com\/rrg92\/sqlserver-lib\/blob\/main\/SqlLibEmbeddings\/IndexData2025.sql<\/a>. It will guide you on how to create a local copy of my repo in your SQL Server, index it using the AI functions of SQL Server 2025, and perform searches using the same queries shown here.<\/em><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"940\" height=\"514\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-39.png\" alt=\"\" class=\"wp-image-107760\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-39.png 940w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-39-300x164.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-39-768x420.png 768w\" sizes=\"auto, (max-width: 940px) 100vw, 940px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Here I am performing a SELECT query on the scripts table and generating a column named CosDistance.<\/p>\n\n\n\n<p><a href=\"https:\/\/learn.microsoft.com\/en-us\/sql\/t-sql\/functions\/vector-distance-transact-sql?view=azuresqldb-current\" target=\"_blank\" rel=\"noreferrer noopener\">VECTOR_DISTANCE<\/a> is a function that accepts 2 embeddings and returns a value between 0 and 2. The closer to 0, the more semantically similar the two texts corresponding to these embeddings are. This metric is called Cosine Distance.<\/p>\n\n\n\n<p>Let&#8217;s see a sample result of this piece of code when searching for the text &#8220;cpu performance&#8221;. 
I will paste the embeddings directly into the code, exactly as Python would send them to SQL Server after replacing the variables in the script from the image above:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"701\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-40-1024x701.png\" alt=\"\" class=\"wp-image-107761\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-40-1024x701.png 1024w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-40-300x205.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-40-768x526.png 768w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-40.png 1090w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>The result above is from a sample SQL Server 2025 Public Preview, where I loaded my repo. Without any further work, we don\u2019t have anything particularly useful here. Note that CosDistance contains different values, and some rows are closer to 0 than others. This is why we need to order the results by CosDistance, which is done by the outermost query. Remember that the closer the Cosine Distance is to 0, the more similar the results are. So, we want the most similar results possible. 
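<\/p>\n\n\n\n<p>To demystify the metric itself, here is what VECTOR_DISTANCE(&#8216;cosine&#8217;, ...) computes, as a stdlib-only sketch (SQL Server\u2019s implementation is native and far faster):<\/p>\n\n\n\n

```python
import math

# What cosine distance means, in plain Python (a sketch of the metric
# that VECTOR_DISTANCE('cosine', a, b) computes).
def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)  # 0 (same direction) up to 2 (opposite)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))   # 0.0 -> same meaning
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # 1.0 -> unrelated
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # 2.0 -> opposite
```
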
If I just order that piece of code and get the top 10, we get the best scripts related to CPU performance at the top:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"764\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-41-1024x764.png\" alt=\"\" class=\"wp-image-107762\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-41-1024x764.png 1024w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-41-300x224.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-41-768x573.png 768w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-41.png 1090w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Now, note that after sorting, we get the rows that are as close to 0 as possible. These are the best embedding matches for \u201ccpu performance\u201d that our AI model can provide. If you compare this with the previous, unordered result, you will see that most of those scripts don\u2019t have a direct relation to the search &#8211; they cover topics like backup, audit, or Always On. This shows how important the ordering is here.<\/p>\n\n\n\n<p>Notice that in the outermost script, I use TOP X ORDER BY CosDistance because I only want to return the top X most similar rows. This is a completely different approach from what we&#8217;re usually accustomed to when it comes to filtering in SQL. Normally, you might think of using a WHERE clause. But remember, in SQL, WHERE is not the only way to filter rows. Using TOP with ORDER BY can be very helpful in scenarios like this.<\/p>\n\n\n\n<p>Here, we\u2019re not filtering based on the absolute values of the embeddings\u2014we\u2019re interested in retrieving the rows that are most similar to the user question, which is represented by our vector in variable @search. 
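<\/p>\n\n\n\n<p>In other words, TOP X with ORDER BY is a k-nearest-neighbours selection. In Python terms (an illustrative analogy with made-up rows, not the app\u2019s code):<\/p>\n\n\n\n

```python
import heapq

# "TOP 2 ... ORDER BY CosDistance" in Python terms: k-nearest selection.
# The rows below are made-up stand-ins for the Scripts table, with their
# cosine distances already computed.
rows = [
    ("sp_cpu_top_queries.sql", 0.08),
    ("backup_history.sql", 0.41),
    ("cpu_waits.sql", 0.12),
    ("audit_logins.sql", 0.55),
]

# No WHERE clause involved: we keep the k rows with the smallest distance.
top2 = heapq.nsmallest(2, rows, key=lambda row: row[1])
print([name for name, _ in top2])  # → ['sp_cpu_top_queries.sql', 'cpu_waits.sql']
```
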
Some other optimizations could fit here, but I didn&#8217;t implement them, for simplicity: I could set a minimum threshold (e.g., only CosDistance &lt;= 0.2), I could incorporate full-text search to check for specific terms, etc. Davide Mauri, from the Azure SQL team, has an excellent article showing this: <a href=\"https:\/\/devblogs.microsoft.com\/azure-sql\/enhancing-search-capabilities-in-sql-server-and-azure-sql-with-hybrid-search-and-rrf-re-ranking\/\" target=\"_blank\" rel=\"noreferrer noopener\">Hybrid Search in SQL Server and Azure SQL Database<\/a><\/p>\n\n\n\n<p>Another important point is that if my Scripts table had 1 million rows, the way I did it would be extremely inefficient, because SQL Server would have to read all the rows, calculate the VECTOR_DISTANCE for all of them, sort all of that (probably spilling to disk), and in the end return only a small portion. It would be a resource-destroying query.<\/p>\n\n\n\n<p>Here is where an INCREDIBLE feature comes in, which was <a href=\"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2024\/11\/19\/announcing-microsoft-sql-server-2025-apply-for-the-preview-for-the-enterprise-ai-ready-database\/#build-ai-applications-with-your-enterprise-database\" target=\"_blank\" rel=\"noreferrer noopener\">also announced<\/a> and which we will probably have soon in Azure SQL and SQL Server 2025: VECTOR INDEX. We will be able to create a special index on the embeddings column that optimizes this search, so it would not need to read all 1 million rows. 
But, for now, for my data set which is extremely small, doing the scan does not yet represent a significant problem.<\/p>\n\n\n\n<p>Another highlight is this:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"774\" height=\"440\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-42.png\" alt=\"\" class=\"wp-image-107763\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-42.png 774w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-42-300x171.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-42-768x437.png 768w\" sizes=\"auto, (max-width: 774px) 100vw, 774px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Here is where I calculate some interesting things, such as the number of characters in my script and the Similarity (Cosine Similarity). This is a value between -1 and 1, and the closer to 1, the more similar the values are. It is just a calculation derived from Cosine Distance. 
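<\/p>\n\n\n\n<p>The derivation is a one-liner; assuming a CosDistance value coming back from the query, the display column could be computed like this:<\/p>\n\n\n\n

```python
# Cosine Similarity is derived from Cosine Distance: similarity = 1 - distance.
# Distance ranges 0..2, so similarity ranges 1..-1 (closer to 1 = more similar).
def to_similarity(cos_distance):
    return 1.0 - cos_distance

for dist in (0.0, 0.5, 2.0):
    print(dist, "->", to_similarity(dist))  # 1.0, 0.5, -1.0
```
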
This excerpt will be used for display and as input so that another AI can analyze the results.<\/p>\n\n\n\n<p>This is a sample result using the same previous example of \u201ccpu performance\u201d (<a href=\"https:\/\/github.com\/rrg92\/sqlserver-lib\/blob\/main\/SqlLibEmbeddings\/SearchData2025.sql\" target=\"_blank\" rel=\"noreferrer noopener\">This script<\/a> contains a sample way to do that test directly from a SQL Server 2025):<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"655\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-43-1024x655.png\" alt=\"\" class=\"wp-image-107764\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-43-1024x655.png 1024w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-43-300x192.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-43-768x491.png 768w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-43.png 1090w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Here we just get the top X results, which is 10 by default, and return them to Python. But Rodrigo, what if there are more scripts related to CPU? Simple: they will not be shown to the user. We found the 10 best options to show the user, and probably one of them is what they need. Our table is still small &#8211; less than 500 rows &#8211; but each row is a script file, which can contain a lot of text, so returning all results is not a good option due to common problems with large data sets.<\/p>\n\n\n\n<p>However, since the top 10 is configurable, the user can try to ask for more results if they think the initial ones are not sufficient, or even enhance the prompt to be more precise and bring other relevant scripts to the top. 
In the future, the app could also assist the user in this process by enhancing the search, or by performing multiple searches for different terms and variations and combining the results, to surface more relevant options. There is a world of creativity available to help SQL find more relevant results.<\/p>\n\n\n\n<p>Now we just need some input text: we generate embeddings from it and let SQL do the work of searching for us.<\/p>\n\n\n\n<p>After the query runs, the queryResults variable will contain the top 10 scripts that best match our input text, sorted from the closest to the farthest match, essentially creating a ranked list of the most relevant results. Then, the search function returns this result to the ChatFunc, allowing the process to continue.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1000\" height=\"313\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-44.png\" alt=\"\" class=\"wp-image-107765\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-44.png 1000w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-44-300x94.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-44-768x240.png 768w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>However, you&#8217;ll notice that the initial ranking may not be ideal, as some of the best matches could appear lower in the list. So, it&#8217;s time to do some extra processing to re-rank the results.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Fourth AI Use: Rerank<\/h2>\n\n\n\n<p>So far, our search function has done the main job: finding the scripts that are most related to the user&#8217;s text.<\/p>\n\n\n\n<p>However, embeddings are not perfect. They may match more than they should. 
And this is completely normal, as it depends on the training of each model, which obviously does not cover all cases. It could find that Script A is closer to my text than Script B, when in fact, Script B might be the most suitable script.<\/p>\n\n\n\n<p>For this case, we have a second very important and effective technique: RERANK. Rerank uses an AI model to calculate a score that indicates how similar two texts are.<\/p>\n\n\n\n<p>Up to this point, we&#8217;ve done this through embeddings: we generated the embeddings separately and compared them to get a proximity value. Rerank is a similar process, but instead of computing numbers for each text and then comparing those numbers, you feed both texts in directly and get back a single number representing how similar they are. This significantly improves the results. This is the snippet that does it:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"355\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-45-1024x355.png\" alt=\"\" class=\"wp-image-107766\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-45-1024x355.png 1024w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-45-300x104.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-45-768x266.png 768w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-45.png 1090w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Now follow path 4 in our diagram:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"710\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-46-1024x710.png\" alt=\"\" class=\"wp-image-107767\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-46-1024x710.png 1024w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-46-300x208.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-46-768x532.png 768w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-46.png 1114w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>First, I need to build a list with only the script contents. The variable doclist<em> (line 241)<\/em> is an array of strings, where each item is the content of a script that our query found with the embeddings search. Then, I call the rerank function (line 244), which is the one that actually calculates the score. Note that I pass the text (in English) and the list with the content of the scripts. The result of the function will have the index of each document and its score, which I then merge back into our FoundScripts variable (line 233). That score represents how similar the user&#8217;s question is to the respective script&#8217;s content. 
A higher value means more similar.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-the-rerank-function\">The Rerank Function<\/h3>\n\n\n\n<p>Like the embed function, the definition of rerank is quite simple:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1003\" height=\"194\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-47.png\" alt=\"\" class=\"wp-image-107768\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-47.png 1003w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-47-300x58.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-47-768x149.png 768w\" sizes=\"auto, (max-width: 1003px) 100vw, 1003px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>I put this simple snippet in a separate function for one very simple reason: to use the GPU! I don&#8217;t need a GPU to call database APIs and so on; I only need it for this snippet (and for embedding), which is where I run the AI model in my Python code. So only that part needs to be separated, which saves unnecessary use of ZeroGPU&#8217;s GPUs, which are limited.<\/p>\n\n\n\n<p>Here, again, the Sentence Transformers library makes using AI child&#8217;s play. Many thanks to the Sentence Transformers team, especially <a href=\"https:\/\/github.com\/tomaarsen\" target=\"_blank\" rel=\"noreferrer noopener\">Tom Aarsen<\/a>, who is a core maintainer of this library and has been evolving it a lot!<\/p>\n\n\n\n<p>The Rerank variable is an instance of SentenceTransformers.CrossEncoder. This class represents the models that perform the operation I explained above: comparing two texts and returning a score. 
They are called &#8220;Cross Encoders&#8221; due to the neural network architecture used (which accepts two texts and generates a score), as opposed to embeddings, where the network is known as a Bi-Encoder: it processes each text separately and generates embeddings that can then be compared.<\/p>\n\n\n\n<p>Here is where it is defined:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"102\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-48-1024x102.png\" alt=\"\" class=\"wp-image-107769\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-48-1024x102.png 1024w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-48-300x30.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-48-768x76.png 768w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-48.png 1139w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Here, again, the model comes from mixedbread-ai, but this one is specific to the rerank task. Like the other one, you can see more details about it at: <a href=\"https:\/\/huggingface.co\/mixedbread-ai\/mxbai-rerank-large-v1\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/huggingface.co\/mixedbread-ai\/mxbai-rerank-large-v1<\/a>. And, just like I did with the embedding model, I ran some tests with other models and found one that works well for this case. 
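<\/p>\n\n\n\n<p>As a rough sketch of this step (not the app&#8217;s exact code), the scoring function is injected here so the example stays self-contained; with Sentence Transformers, it would be the predict method of the CrossEncoder instance:<\/p>\n\n\n\n

```python
def rerank(score_pairs, query, docs):
    # score_pairs: any callable that takes a list of (query, doc) pairs and
    # returns one relevance score per pair -- for example, the predict()
    # method of a sentence_transformers CrossEncoder instance.
    scores = score_pairs([(query, doc) for doc in docs])
    # Pair each document index with its score, highest score first.
    ranked = sorted(enumerate(scores), key=lambda t: t[1], reverse=True)
    return [{'corpus_id': i, 'score': s} for i, s in ranked]

# A toy scorer (word overlap) stands in for the real model here:
toy = lambda pairs: [float(len(set(q.split()) & set(d.split()))) for q, d in pairs]
docs = ['cpu performance tuning', 'backup database', 'cpu usage']
print(rerank(toy, 'cpu performance', docs)[0]['corpus_id'])  # -> 0
```

\n\n\n\n<p>With the real model, score_pairs would be the predict method of the CrossEncoder loaded above; the toy scorer is used here only so the example runs without downloading the model.<\/p>\n\n\n\n<p>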
In the future, I can always switch to a better one if needed.<\/p>\n\n\n\n<p>The CrossEncoder class has several methods that facilitate its use. The <a href=\"https:\/\/sbert.net\/docs\/package_reference\/cross_encoder\/cross_encoder.html#sentence_transformers.cross_encoder.CrossEncoder.rank\" target=\"_blank\" rel=\"noreferrer noopener\">rank<\/a> method allows me to pass a query text and a list of documents, and the model will return a score for each string in the second parameter. An important detail is that it can return negative values by default, which is why, when creating the object, I set the activation_fn parameter to the Sigmoid function, which converts the values to a range between 0 and 1 that I find more intuitive to display (thanks to <a href=\"https:\/\/github.com\/UKPLab\/sentence-transformers\/issues\/1058\" target=\"_blank\" rel=\"noreferrer noopener\">this GitHub issue<\/a>).<\/p>\n\n\n\n<p><strong><em>Rodrigo, what is the difference between the CrossEncoder class and the SentenceTransformer class?<\/em><\/strong><\/p>\n\n\n\n<p>You might be confused about these classes when creating AI model instances. With embeddings, we used a class called SentenceTransformer, and for reranking, another class called CrossEncoder. How do I know that? And why is it different?<\/p>\n\n\n\n<p>Basically, each class implements an architecture that involves some type of neural network. The CrossEncoder is simply the implementation of another architecture, one that uses a neural network designed specifically to accept two texts and generate a score. Meanwhile, the SentenceTransformer class implements the bi-encoder neural network, whose main objective is to generate an array of numbers (an embedding) from an input text.<\/p>\n\n\n\n<p>Also, CrossEncoder models are slower, but they produce results that are much better and more precise (I&#8217;ll show that later). I know this thanks to some research and study. 
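<\/p>\n\n\n\n<p>The Sigmoid activation mentioned above is nothing mysterious; here is a minimal sketch of the mapping:<\/p>\n\n\n\n

```python
import math

def sigmoid(x):
    # Squashes any raw cross-encoder score into the (0, 1) range.
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0.0))  # 0.5 (a raw score of 0 lands exactly in the middle)
# Strongly negative scores approach 0; strongly positive ones approach 1.
```

\n\n\n\n<p>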
If you want to know more, I recommend reading the getting started guide directly from the SentenceTransformers site, so you can dive a little deeper: <a href=\"https:\/\/www.sbert.net\/docs\/quickstart.html\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/www.sbert.net\/docs\/quickstart.html<\/a><\/p>\n\n\n\n<p>The higher the Score, the more related the texts are. So, I need to sort the result by Score. Python&#8217;s sorted function does that for me.<\/p>\n\n\n\n<p>At the end of this process, the RankedScripts variable will contain the following:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li>All scripts that were found to be similar to the user&#8217;s text<br><\/li>\n\n\n\n<li>A score indicating how similar each script is to that text, sorted in descending order<br><\/li>\n\n\n\n<li>The content of each script and its relative path in my GitHub repo<br><\/li>\n\n\n\n<li>Cosine similarity, for cross-checking purposes only<\/li>\n<\/ul>\n<\/div>\n\n\n<p>Now, I have real data that I can present to the user.<\/p>\n\n\n\n<p>At this point, it would be enough to display it in a table; the next snippet does exactly that:<\/p>\n\n\n\n<p>The function m() also returns the CurrentTable variable, and since it now contains our table, the yield causes it to be updated.<\/p>\n\n\n\n<p>Our diagram follows with a small change, representing the updated table in the user interface:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"940\" height=\"644\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-49.png\" alt=\"\" class=\"wp-image-107770\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-49.png 940w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-49-300x206.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-49-768x526.png 768w\" sizes=\"auto, (max-width: 940px) 100vw, 940px\" \/><\/figure>\n\n\n\n<p>At this moment, the user can already see what their query found on the <strong>Rank<\/strong> tab. For example, this is the result when I searched for &#8220;search for a string&#8221;:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"370\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-50-1024x370.png\" alt=\"\" class=\"wp-image-107771\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-50-1024x370.png 1024w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-50-300x108.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-50-768x277.png 768w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-50.png 1100w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>This result contains a very cool example of why rerank was important. Notice the &#8220;Cosine Similarity&#8221; column. The first script has a value of 0.51, while the last one has a value of 0.52. If I didn&#8217;t have rerank, a script about backup would have come first, because according to the definition above (remember, cosine similarity: the closer to 1, the more similar) it would be considered closer than the ProcurarTexto.sql script (Procurar Texto = Search Text in pt-BR).<\/p>\n\n\n\n<p>Now, look at the Rank column. Notice how the game has completely changed. The score of ProcurarTexto.sql is much higher than the others, which reinforces how well the model captured the relationship between the text &#8220;search for a string&#8221; and the script. That&#8217;s why reranking is extremely important here.<\/p>\n\n\n\n<p>Ah, Rodrigo, so why don&#8217;t you always use rerank and take VECTOR_DISTANCE out of the picture? The answer is simple: performance. 
Rerank is effective, but it is not efficient for large amounts of data. I only needed to compare 10 results; comparing 100 would be 10x more work. And the table will grow, as I still have 400 more scripts to add (and counting). Despite the small number of rows, the content can be large: remember that rerank works on the raw text to generate the score, so the more text, the more work it has to do. This is different from VECTOR_DISTANCE, which compares arrays of numbers and is therefore much more efficient.<\/p>\n\n\n\n<p>You can see that difference using the CPU version of the Space that I created: <a href=\"https:\/\/huggingface.co\/spaces\/rrg92\/sqlserver-lib-assistant-cpu\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/huggingface.co\/spaces\/rrg92\/sqlserver-lib-assistant-cpu<\/a>. Note how most of the processing time is spent in the \u2018Doing Rerank\u2019 status.<\/p>\n\n\n\n<p>That&#8217;s why, in the RAG process, it&#8217;s common to combine these two operations: with embeddings, you efficiently find a small set of relevant candidates, and with reranking, applied to that small set, you improve the ordering of the result. Several variations of RAG exist, always aiming to add operations that further improve the accuracy of this result. Reranking is just one of them.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-fifth-use-of-ai-generating-a-response\">Fifth Use of AI: Generating a Response<\/h2>\n\n\n\n<p>At this point, we have already presented the user with the most relevant scripts related to their query. This is already quite impressive: several AI models have assisted along the way.<\/p>\n\n\n\n<p>But the fun doesn&#8217;t stop here, as we can still use the power of an LLM to summarize and explain the content of these scripts. 
Now that we have the data, I can ask an LLM to use this information to generate a more elaborate response for the user: reading the script content, highlighting the important parts and, best of all, answering in the same language as the user.<\/p>\n\n\n\n<p>After updating the table, I ask an LLM to generate a waiting message. The goal here is to provide a personalized message on the screen so that the user knows that something is still being done. I could have left a fixed message or used a smaller model, but, again, I opted for practicality:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"301\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-51-1024x301.png\" alt=\"\" class=\"wp-image-107772\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-51-1024x301.png 1024w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-51-300x88.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-51-768x226.png 768w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-51.png 1128w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Another improvement would be to trigger this message in a separate thread at the beginning of the ChatFunc function. That way, while the app searches for the scripts and does the rerank, the message would already be in preparation. 
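<\/p>\n\n\n\n<p>That idea could be sketched with Python&#8217;s standard ThreadPoolExecutor (all names here are hypothetical stand-ins, not the app&#8217;s real functions):<\/p>\n\n\n\n

```python
from concurrent.futures import ThreadPoolExecutor

def chat_func(llm_call, search, user_text):
    # llm_call and search stand in for the real LLM client and the
    # embeddings-search + rerank pipeline, respectively.
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start generating the wait message early, in parallel ...
        future_msg = pool.submit(
            llm_call, f'Write a short wait message for: {user_text}')
        # ... while the (slower) search runs on the main thread.
        results = search(user_text)
        wait_message = future_msg.result()  # ready, or nearly so, by now
    return wait_message, results

msg, res = chat_func(lambda p: 'Hang on, searching...',
                     lambda t: ['cpu.sql'],
                     'cpu performance')
print(msg, res)
```

\n\n\n\n<p>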
Maybe in future versions I&#8217;ll do that, because then I wouldn&#8217;t have to wait long just to generate a wait message&#8230;<\/p>\n\n\n\n<p>And finally, we get to the part where I ask the LLM to elaborate an answer:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"532\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-52-1024x532.png\" alt=\"\" class=\"wp-image-107773\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-52-1024x532.png 1024w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-52-300x156.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-52-768x399.png 768w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-52.png 1126w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Here, I created a prompt using some best practices (the famous prompt engineering) to provide context explaining what this is about and how to use the results. I pass our table directly, as JSON, so that the model uses it as context. Nowadays, models understand this mixture very well. I also give some guidance reinforcing that it should analyze the scripts and which language it should respond in.<\/p>\n\n\n\n<p>Everything goes in the SystemPrompt, and as the user message, I pass the exact text that the user typed (not the translated text). Thus, I leave all responsibility and creativity to the LLM to use all of this and generate a very personalized response based on what the user asked for and what we found, including giving it the freedom to reorder the results if it deems necessary. Since I already present the table ordered by rerank score to the user, I don&#8217;t need to ensure that the response follows that order. 
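<\/p>\n\n\n\n<p>In spirit, that assembly looks like this sketch (the field names and prompt wording are hypothetical; only the overall shape, a system prompt with JSON context plus the untranslated user message, follows the description above):<\/p>\n\n\n\n

```python
import json

def build_messages(ranked_scripts, user_text):
    # Instructions and the ranked scripts (as JSON) both go into the
    # system prompt; modern models handle this mixture well.
    system_prompt = (
        'You are a T-SQL assistant. Use the scripts below to answer, '
        'analyze their content, and reply in the same language as the user. '
        'You may reorder the results if you think it is necessary.\n'
        'Scripts (JSON): ' + json.dumps(ranked_scripts)
    )
    # The user message is the exact text typed by the user (untranslated).
    return [
        {'role': 'system', 'content': system_prompt},
        {'role': 'user', 'content': user_text},
    ]
```

\n\n\n\n<p>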
Here is the pure creativity of AI in action, and the better the model, the better the response will be.<\/p>\n\n\n\n<p><strong><em>Very important note:<\/em><\/strong> LLMs are probabilistic, so nothing I put there is guaranteed to work 100% of the time or follow all rules exactly. However, the more detailed and well-explained my prompt is, the higher the chance of getting a good response. Also, the better the model, the better it will follow my prompts. But again, this is not the same as writing code: it\u2019s not deterministic. I need to expect some errors or unexpected behavior (and so does the user).<\/p>\n\n\n\n<p>There is more that could be done here. For example, if there were no results, instead of concatenating this empty result, I could just change the prompt to say that there were no relevant results and have the LLM generate a generic response. But I chose to leave it as it is so that it always generates something. This means that when the scripts are not very relevant, the response can feel a bit forced&#8230; but there is a lot of room to improve this.<\/p>\n\n\n\n<p>The rest of the function simply takes the result as it arrives and sends it into the chat. Note that at this point, I\u2019m calling the llm function directly &#8211; not ai &#8211; because I want to use the streaming feature (using the for&#8230;in&#8230; statement). 
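<\/p>\n\n\n\n<p>That loop can be sketched like this, with a plain list of fake chunks standing in for the real LLM stream:<\/p>\n\n\n\n

```python
def answer(chunks):
    # chunks: any iterable of response fragments, e.g. an LLM stream.
    # We accumulate and yield the running text so the UI can redraw the
    # chat message as it grows, giving the live-typing effect.
    text = ''
    for chunk in chunks:
        text += chunk
        yield text

for partial in answer(['Hel', 'lo, ', 'DBA!']):
    print(partial)  # prints 'Hel', then 'Hello, ', then 'Hello, DBA!'
```

\n\n\n\n<p>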
This allows me to access chunks of the LLM\u2019s response as soon as the server sends them, giving the effect of live typing.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"810\" height=\"506\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-53.png\" alt=\"\" class=\"wp-image-107774\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-53.png 810w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-53-300x187.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/10\/image-53-768x480.png 768w\" sizes=\"auto, (max-width: 810px) 100vw, 810px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>At this point, you already know the result: each time a chunk arrives, you see the chat interface being updated with the content. No big secret here: the answer function simply appends the chunk to a string variable and outputs it using a yield statement. Then, Gradio captures that and updates the interface.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"h-next-steps-amp-conclusion\">Next Steps &amp; Conclusion<\/h1>\n\n\n\n<p>As you may have noticed, this is a very simple project, and my intention in building it was mainly to apply the many things I studied throughout 2024.<\/p>\n\n\n\n<p>Certainly, there is plenty of room for improvements in the code, tooling, libraries, and so on.<\/p>\n\n\n\n<p>I have been a SQL Server DBA for over 10 years, and in the last year I have dedicated myself to studying a lot about Machine Learning and AI. 
And I am really happy to see SQL Server arriving with such strong support for AI.<br>I think we will still see a lot of new things and, as demonstrated here, these features have everything they need to be widely adopted, especially by those who already use search features like Full-Text Search.<\/p>\n\n\n\n<p>Some future improvements that I hope to bring:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li>When SQL Server 2025 is released, I will redo the demo with it, so we can test the new features, which I am very excited to see!<br><\/li>\n\n\n\n<li>There are still a lot of scripts to add, so the answers will tend to get better.<br><\/li>\n\n\n\n<li>I want to try running an AI model directly in the code, on the GPU, instead of calling an external API. It might even be an SLM (Small Language Model): a smaller model that could be sufficient for this simple task.<br><\/li>\n\n\n\n<li>Unintentionally, I&#8217;m creating a T-SQL dataset, thanks to the comments I&#8217;m making during review, so I hope one day I can train an AI model with this data and use that model to generate the summaries. I believe the answers could be much better this way. But training a model is a more complex and costly task, so I need some time and experience to do that.<\/li>\n<\/ul>\n<\/div>\n\n\n<p>I think the main insight of this post is to show how much technology already exists and how accessible it is for you to create AI solutions that solve real problems in your daily life. 
In addition, you can see how databases will still play an important role in this new age.<\/p>\n\n\n\n<p>Thank you very much for reading this far. I hope you enjoyed it, and if you have any questions about any code snippet or explanation, just leave a comment and I will answer as soon as possible!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Discover how to build an AI-powered T-SQL assistant using Python, embeddings, vector search and SQL Server\u2019s new AI features. 
Step-by-step for DBAs &#038; developers.&hellip;<\/p>\n","protected":false},"author":170434,"featured_media":107162,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[159169,53,143524],"tags":[159075,5021,4151,4252],"coauthors":[21049],"class_list":["post-107690","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","category-featured","category-sql-server","tag-ai","tag-python","tag-sql-server","tag-t-sql-programming"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/107690","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/170434"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=107690"}],"version-history":[{"count":7,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/107690\/revisions"}],"predecessor-version":[{"id":108125,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/107690\/revisions\/108125"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media\/107162"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=107690"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=107690"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=107690"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coautho
rs?post=107690"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}