{"id":111056,"date":"2026-06-01T12:00:00","date_gmt":"2026-06-01T12:00:00","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=111056"},"modified":"2026-05-28T14:47:15","modified_gmt":"2026-05-28T14:47:15","slug":"how-to-host-an-ai-text-embeddings-model-for-sql-server-using-ollama","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/how-to-host-an-ai-text-embeddings-model-for-sql-server-using-ollama\/","title":{"rendered":"How to host an AI text embeddings model for SQL Server using Ollama"},"content":{"rendered":"\n<p><strong>When we want to use AI-based comparisons of text, via <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/ai-in-sql-server-2025-embeddings\/\" target=\"_blank\" rel=\"noreferrer noopener\">vector search in SQL Server<\/a>, we first need to generate embeddings for the text. An embedding is a numeric representation of meaning, usually expressed as a vector. 
In this article, I&#8217;ll show you how to use <a href=\"https:\/\/ollama.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Ollama<\/a> to host a server locally that can be used to generate embeddings.<\/strong><\/p>\n\n\n\n<p><strong><em>This is the first article in Greg Low&#8217;s series <a href=\"https:\/\/www.red-gate.com\/simple-talk\/collections\/ai-text-embeddings-in-sql-server-everything-you-need-to-know\/\" target=\"_blank\" rel=\"noreferrer noopener\">&#8216;AI text embeddings in SQL Server: everything you need to know&#8217;.<\/a><\/em><\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-what-are-embeddings\">What are embeddings?<\/h2>\n\n\n\n<p><strong>Instead of working directly with text, images, or other rich content, an embedding represents that content as a set of numbers that capture semantic relationships learned by a model.<\/strong> <strong>This lets systems work with meaning in a mathematical way rather than relying on concepts like text matching. <\/strong><\/p>\n\n\n\n<p>Embeddings are the output of trained text-based AI models. One possibly surprising concept is that they&#8217;re used for similarity, as opposed to facts &#8211; and for relative closeness instead of exact matches.<\/p>\n\n\n\n<p>SQL Server is not designed to host or execute those models. This isn&#8217;t a limitation; it is a design choice. While it <em>would<\/em> be possible to run code within SQL Server to generate embeddings, it just wouldn&#8217;t be a good idea. 
<\/p>\n\n\n\n<p>Note that SQL Server can already run <a href=\"https:\/\/www.red-gate.com\/simple-talk\/business-intelligence\/data-science\/building-machine-learning-models-to-solve-practical-problems\/\" target=\"_blank\" rel=\"noreferrer noopener\">machine learning<\/a> models directly, <em>and<\/em> use the <a href=\"https:\/\/learn.microsoft.com\/en-us\/sql\/t-sql\/queries\/predict-transact-sql?view=sql-server-ver17\" target=\"_blank\" rel=\"noreferrer noopener\"><code>PREDICT<\/code> statement<\/a> to make predictions. We don&#8217;t want to be doing that with the language models we need for embeddings, though.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-where-do-we-host-embedding-models\">Where do we host embedding models?<\/h2>\n\n\n\n<p>There are two common ways that systems access embedding models: <a href=\"https:\/\/www.red-gate.com\/simple-talk\/cloud\/\" target=\"_blank\" rel=\"noreferrer noopener\">cloud<\/a>-hosted services and locally-hosted services.<\/p>\n\n\n\n<p><strong>Cloud-hosted embedding services<\/strong> are provided by vendors such as <a href=\"https:\/\/openai.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">OpenAI<\/a> and similar platforms. In this model, text is sent to a remote <a href=\"https:\/\/aws.amazon.com\/what-is\/api\/#:~:text=API%20stands%20for%20Application%20Programming,of%20service%20between%20two%20applications.\" target=\"_blank\" rel=\"noreferrer noopener\">API<\/a> over the network, and embeddings are returned as a service.<\/p>\n\n\n\n<p>These offerings are typically easy to start with, scale well, and are continuously updated by the provider. However, they introduce external dependencies and ongoing usage costs. They also increase the risk of data leaving your environment.<\/p>\n\n\n\n<p><strong>Locally hosted embedding services<\/strong> run models within your own infrastructure. 
Tools such as <a href=\"https:\/\/ollama.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Ollama<\/a> make it possible to run embedding models on a local machine or server and expose them via a local API.<\/p>\n\n\n\n<p>This approach provides greater control over data, avoids external network calls, and can reduce ongoing costs. At the same time, it shifts responsibility for performance, availability, updates, and resource management onto your team.<\/p>\n\n\n\n<p><strong>SQL Server doesn&#8217;t care which choice you make. It has no interest in <em>where<\/em> embeddings were generated: it only cares about having the required number of dimensions each time you retrieve a vector.<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-to-install-ollama\">How to install Ollama<\/h2>\n\n\n\n<p>Ollama is easy to install. You download it from the <a href=\"https:\/\/ollama.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">official website<\/a> and, once installed, it (as of 2026) opens a chat window, as you can see below. For anyone who&#8217;s ever used <a href=\"https:\/\/chatgpt.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">ChatGPT<\/a>, it&#8217;s quite familiar:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"940\" height=\"623\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/05\/image-58.png\" alt=\"\" class=\"wp-image-111057\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/05\/image-58.png 940w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/05\/image-58-300x199.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/05\/image-58-768x509.png 768w\" sizes=\"auto, (max-width: 940px) 100vw, 940px\" \/><\/figure>\n\n\n\n<p><strong>On the right-hand-side, you can choose the model that you want to query. 
Ollama can automatically download and run different models but, for SQL Server use, we want to use Ollama programmatically instead.<\/strong> <\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-to-use-ollama-programmatically\">How to use Ollama programmatically<\/h2>\n\n\n\n<p>We can use Ollama programmatically either by using a <a href=\"https:\/\/aws.amazon.com\/what-is\/cli\/\" target=\"_blank\" rel=\"noreferrer noopener\">command line interface (CLI)<\/a>, or by making REST-based calls to the service.<\/p>\n\n\n\n<p>For testing with SQL Server, what I like to do is:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li>Stop the Ollama application and configure it so it doesn&#8217;t start automatically;<br><br><\/li>\n\n\n\n<li>Open a command line window, and execute Ollama commands directly;<br><br><\/li>\n\n\n\n<li>Start Ollama by executing <code>ollama serve<\/code>.<\/li>\n<\/ul>\n<\/div>\n\n\n<p>Doing it this way offers you the advantage of being able to see the actions Ollama takes. 
They&#8217;ll scroll past in the command line window.<\/p>\n\n\n\n<p>Executing <code>ollama help<\/code> shows you the available commands:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"940\" height=\"528\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/05\/image-59.png\" alt=\"\" class=\"wp-image-111058\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/05\/image-59.png 940w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/05\/image-59-300x169.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/05\/image-59-768x431.png 768w\" sizes=\"auto, (max-width: 940px) 100vw, 940px\" \/><\/figure>\n\n\n\n<p>The important commands for us to use are <code>list<\/code>, <code>pull<\/code>, <code>run<\/code>, and <code>stop<\/code>.<\/p>\n\n\n\n<p>If I execute <code>ollama list<\/code> on my system, it returns this:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"940\" height=\"203\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/05\/image-60.png\" alt=\"\" class=\"wp-image-111059\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/05\/image-60.png 940w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/05\/image-60-300x65.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/05\/image-60-768x166.png 768w\" sizes=\"auto, (max-width: 940px) 100vw, 940px\" \/><\/figure>\n\n\n\n<p>That&#8217;s showing the two text models I currently have downloaded. You can have multiple versions of the same model, and the tag on the end of each model&#8217;s name shows which version you have. In this case, it&#8217;s <code>:latest<\/code>. 
The <code>pull<\/code> command, meanwhile, is used to pull down a model that you don&#8217;t have.<\/p>\n\n\n\n<p>I can test a model by executing the <code>run<\/code> command:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"940\" height=\"110\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/05\/image-61.png\" alt=\"\" class=\"wp-image-111060\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/05\/image-61.png 940w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/05\/image-61-300x35.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/05\/image-61-768x90.png 768w\" sizes=\"auto, (max-width: 940px) 100vw, 940px\" \/><\/figure>\n\n\n\n<p>I&#8217;ve said <code>ollama run all-minilm:latest \"How much stock do we have?\"<\/code>. This lets me calculate embeddings interactively. Note that a vector is returned.<\/p>\n\n\n\n<p>I can then stop the model running by using the <code>stop<\/code> command:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"940\" height=\"260\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/05\/image-62.png\" alt=\"\" class=\"wp-image-111061\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/05\/image-62.png 940w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/05\/image-62-300x83.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/05\/image-62-768x212.png 768w\" sizes=\"auto, (max-width: 940px) 100vw, 940px\" \/><\/figure>\n\n\n\n<p>For use with SQL Server, I could either let Ollama run automatically <em>or<\/em> run the server interactively. I can start the service by executing <code>ollama serve<\/code>. 
<em>You&#8217;ll get an error if Ollama is already running (for example, because the application is still serving a model), since the port is already in use.<\/em><\/p>\n\n\n\n<p>Once I start the server, I&#8217;ll see a lot of configuration information, including the port it&#8217;s listening on:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"940\" height=\"138\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/05\/image-63.png\" alt=\"\" class=\"wp-image-111062\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/05\/image-63.png 940w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/05\/image-63-300x44.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/05\/image-63-768x113.png 768w\" sizes=\"auto, (max-width: 940px) 100vw, 940px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-to-add-a-proxy-in-ollama\">How to add a proxy in Ollama<\/h2>\n\n\n\n<p>This is an <strong>http<\/strong> address (not <strong>https<\/strong>), which is a problem since SQL Server refuses to make REST calls using just <strong>http<\/strong>. So, to get around this roadblock, for local hosting I add a proxy.<\/p>\n\n\n\n<p>My preferred proxy is a tool called <a href=\"https:\/\/caddyserver.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Caddy<\/a>. The wonderful thing about Caddy is that you don&#8217;t even need to install it. You just download the appropriate executable and run it. 
All you need to do is provide a configuration file.<\/p>\n\n\n\n<p>So, I create a file called <code>Caddyfile<\/code> and then execute Caddy with the following command:<\/p>\n\n\n\n<p><code>caddy_windows_amd64.exe run --config Caddyfile<\/code><\/p>\n\n\n\n<p>The contents of my Caddyfile are as follows:<\/p>\n\n\n\n<div class=\"wp-block-urvanov-syntax-highlighter-code-block\"><pre class=\"lang:tsql decode:true \">{\n  auto_https disable_redirects\n}\n\nhttps:\/\/localhost:8443 {\n  tls internal\n  reverse_proxy 127.0.0.1:11434\n}<\/pre><\/div>\n\n\n\n<p>If you don&#8217;t include the option to disable redirects, Caddy also tries to listen on the standard HTTP port (80) so that it can redirect requests to HTTPS, and you&#8217;ll likely get errors if that port isn&#8217;t available. The rest of the file just says that requests to <strong>https:\/\/localhost:8443<\/strong> will be forwarded to <strong>127.0.0.1:11434<\/strong>, which is the address Ollama was listening on.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-summary-and-next-steps\">Summary and next steps<\/h2>\n\n\n\n<p>At this point, we&#8217;ve done two important things. We&#8217;ve installed Ollama and seen it working with a suitable text model. 
And we&#8217;ve installed a proxy server called Caddy so that SQL Server is happy to call Ollama when needed.<\/p>\n\n\n\n<p>In the next article in this series, I&#8217;ll show you how to configure and use these services from the SQL Server end.<\/p>\n\n\n\n<section id=\"faq\" class=\"faq-block my-5xl\">\n    <h2>FAQs: How to host an AI text embeddings model for SQL Server using Ollama<\/h2>\n\n                        <h3 class=\"mt-4xl\">1. What is an embedding?<\/h3>\n            <div class=\"faq-answer\">\n                <p>An embedding is a numeric vector that represents the meaning of text. It lets systems compare content by semantic similarity rather than exact matching.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">2. 
Can SQL Server generate embeddings itself?<\/h3>\n            <div class=\"faq-answer\">\n                <p>No. SQL Server isn&#8217;t designed to host language models. Embeddings should be generated by an external service like Ollama or a cloud provider, then passed to SQL Server as vectors.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">3. Why use Ollama instead of a cloud embedding service?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Ollama runs models locally, giving you greater control over your data, no external API costs, and less risk of data leaving your environment, in exchange for managing performance and updates yourself.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">4. Why does Ollama need a proxy for SQL Server?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Ollama exposes its API over HTTP, but SQL Server only makes REST calls over HTTPS. A reverse proxy like Caddy listens on HTTPS and forwards requests to Ollama&#8217;s local endpoint at <code>127.0.0.1:11434<\/code>.<\/p>\n            <\/div>\n            <\/section>\n","protected":false},"excerpt":{"rendered":"<p>Learn how to install Ollama locally to generate text embeddings for SQL Server vector search, plus configure a Caddy proxy for secure HTTPS 
connections.&hellip;<\/p>\n","protected":false},"author":346483,"featured_media":111067,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[159169,143523,53,143524],"tags":[159075,4168,4170,159401,159400,4150,4151],"coauthors":[159368],"class_list":["post-111056","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","category-databases","category-featured","category-sql-server","tag-ai","tag-database","tag-database-administration","tag-greglowollamaseries","tag-ollama","tag-sql","tag-sql-server"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/111056","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/346483"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=111056"}],"version-history":[{"count":8,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/111056\/revisions"}],"predecessor-version":[{"id":111075,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/111056\/revisions\/111075"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media\/111067"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=111056"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=111056"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=111056"},{"taxonomy":"author","embeddable":t
rue,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=111056"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}