{"id":109154,"date":"2026-03-23T14:34:00","date_gmt":"2026-03-23T14:34:00","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=109154"},"modified":"2026-03-13T16:50:32","modified_gmt":"2026-03-13T16:50:32","slug":"why-most-enterprise-ai-projects-fail-and-how-to-fix-them","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/ai\/why-most-enterprise-ai-projects-fail-and-how-to-fix-them\/","title":{"rendered":"Why most enterprise AI projects fail &#8211; and how to fix them"},"content":{"rendered":"\n<p>Today\u2019s <a href=\"https:\/\/www.red-gate.com\/simple-talk\/opinion\/editorials\/artificial-intelligence-chatgpt-gobbledegook\/\" target=\"_blank\" rel=\"noreferrer noopener\">AI<\/a> landscape is characterized by a gap between prototype and production. While it is often relatively easy to reach the <a href=\"https:\/\/asana.com\/resources\/proof-of-concept\" target=\"_blank\" rel=\"noreferrer noopener\">proof of concept (PoC)<\/a> stage, getting from a PoC to a reliable production system is often much more challenging than teams expect. As a result, by some industry estimates, nearly 80% of <a href=\"https:\/\/cloud.google.com\/discover\/what-is-enterprise-ai\" target=\"_blank\" rel=\"noreferrer noopener\">enterprise AI<\/a> projects never make it out of the lab. <\/p>\n\n\n\n<p>The problem isn\u2019t a <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/data-quality-the-foundation-for-business-agility-and-growth\/\" target=\"_blank\" rel=\"noreferrer noopener\">data quality<\/a> or infrastructure issue, but rather an architectural positioning one. 
If teams over-engineer complex models before knowing what can go wrong in production, they create new problems to solve while increasing the <a href=\"https:\/\/www.ivalua.com\/glossary\/total-cost-of-ownership-tco\/\" target=\"_blank\" rel=\"noreferrer noopener\">total cost of ownership (TCO)<\/a> of the production environment in the process.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-why-do-ai-projects-stall-in-production-despite-often-being-well-designed-and-well-funded\">Why do AI projects stall in production &#8211; despite often being well-designed and well-funded?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-1-the-fine-tuning-trap\">1. The fine-tuning trap<\/h3>\n\n\n\n<p>Many organizations assume that to solve a specialized business problem, they must <a href=\"https:\/\/www.datacamp.com\/tutorial\/fine-tuning-large-language-models\" target=\"_blank\" rel=\"noreferrer noopener\">fine-tune a<\/a>n LLM on their own data. Yet, contrary to conventional wisdom, this need is overstated, and custom training is fraught with technical debt. A fine-tuned model captures a static snapshot: when your underlying business logic or <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/exploring-your-database-schema-with-sql\/\" target=\"_blank\" rel=\"noreferrer noopener\">data schema<\/a> changes, it turns into an expensive legacy model that you cannot easily fix and must instead completely retrain.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-2-the-reliability-paradox\">2. The reliability paradox<\/h3>\n\n\n\n<p>While 90% accuracy is considered excellent in a lab, it\u2019s a disaster in real life. In a production environment &#8211; a high\u2011stakes setting serving millions of interactions &#8211; a 10% failure rate is a catastrophe. The model lacks any mechanism for grounding its responses in real-time, verifiable facts. 
Put simply, it doesn\u2019t know if what it\u2019s saying is true or false.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-3-the-tco-benchmark-the-1-00-vs-0-05-economic-gap\">3. The TCO benchmark: the $1.00 vs. $0.05 economic gap<\/h3>\n\n\n\n<p>Initial performance numbers are seductive, but TCO tells a different story. Most fine-tuning projects fail because the maintenance phase becomes prohibitively expensive, but that&#8217;s not all. Let&#8217;s see how it compares to the other solutions: <a href=\"https:\/\/cloud.google.com\/discover\/what-is-prompt-engineering\" target=\"_blank\" rel=\"noreferrer noopener\">prompt engineering<\/a> and <a href=\"https:\/\/aws.amazon.com\/what-is\/retrieval-augmented-generation\/\" target=\"_blank\" rel=\"noreferrer noopener\">retrieval-augmented generation (RAG)<\/a>.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><td><strong>Feature<\/strong><\/td><td><strong>Prompt Engineering \/ RAG<\/strong><\/td><td><strong>Fine-Tuning<\/strong><\/td><\/tr><\/thead><tbody><tr><td><strong>Setup Time<\/strong><\/td><td>Days to Weeks<\/td><td>Months<\/td><\/tr><tr><td><strong>Data Required<\/strong><\/td><td>Minimal (Few-shot examples)<\/td><td>High (Thousands of labeled rows)<\/td><\/tr><tr><td><strong>Inference Cost<\/strong><\/td><td>Higher (per token)<\/td><td>Lower (on smaller models)<\/td><\/tr><tr><td><strong>Maintenance<\/strong><\/td><td>Low (Update the prompt\/database)<\/td><td>High (Periodic re-training)<\/td><\/tr><tr><td><strong>Reliability<\/strong><\/td><td>Traceable (via citations)<\/td><td>&#8220;Black Box&#8221; behavior<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>There&#8217;s one major standout here. Prompt engineering can be implemented in days, and even RAG &#8211; which is slightly more difficult &#8211; typically only takes from days to a few weeks. Fine-tuning, on the other hand, takes months! 
Also, RAG can still be useful with minimal <a href=\"https:\/\/www.ibm.com\/think\/topics\/data-curation\" target=\"_blank\" rel=\"noreferrer noopener\">curated data<\/a>, whereas fine-tuning relies on thousands of labeled rows. <\/p>\n\n\n\n<p>The one area where fine-tuning does win is cost of <a href=\"https:\/\/www.ibm.com\/think\/topics\/ai-inference\" target=\"_blank\" rel=\"noreferrer noopener\">AI inference<\/a>. Since a fine-tuned LLM is smaller and more specialized, it can be cheaper to serve per query, which explains why teams concerned with &#8216;serving&#8217; costs prefer it. However, the financial overhead of ongoing maintenance eventually creeps back into the picture, so the initial cost advantage disappears over time.<\/p>\n\n\n\n<p>An unexpected stall in production also can result in an architectural pivot. Here are some real-world examples of exactly this happening.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-real-world-scenario-a-enterprise-customer-support-assistant\">Real-world scenario A: enterprise customer support assistant<\/h2>\n\n\n\n<p>A large technology company builds an AI assistant to answer customer support questions. During the proof-of-concept phase, the team fine-tunes a language model using thousands of historical chat transcripts and internal troubleshooting guides, achieving impressive evaluation scores and strong demo performance. <\/p>\n\n\n\n<p>However, once deployed, the system begins to fail quietly. Product policies change, new devices launch, and support procedures evolve every few weeks, but the fine-tuned model continues generating answers based on outdated information embedded in its training data. Updating the system now requires collecting new labeled data and retraining the model &#8211; a process that takes weeks, at a significant cost. 
<\/p>\n\n\n\n<p>Eventually, engineers replace the approach with a retrieval-based system that pulls responses from live documentation, allowing updates to happen instantly without retraining. The project succeeds only after shifting from optimizing model intelligence to optimizing adaptability.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-real-world-scenario-b-ai-contract-analysis-in-financial-services\">Real-world scenario B: AI contract analysis in financial services<\/h2>\n\n\n\n<p>A financial institution deploys an AI system to review legal contracts and extract risk clauses. Early experiments show that a fine-tuned model performs well and offers lower per-query inference costs compared to larger general models, convincing leadership to move forward. In production, however, the hidden complexity emerges: regulations change, contract templates vary by region, and legal language evolves constantly. <\/p>\n\n\n\n<p>Each update requires new annotations from expensive domain experts and repeated retraining cycles, while engineers struggle to explain inconsistent outputs to compliance teams. Maintenance costs quickly exceed the expected savings, and reliability concerns slow adoption. The organization ultimately transitions to an RAG approach that references up-to-date policy documents and provides traceable citations, reducing operational overhead and restoring stakeholder trust. 
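<\/p>\n\n\n\n<p>The retrieval-based pivot in both scenarios can be sketched in a few lines. This is a toy illustration under stated assumptions &#8211; a plain in-memory dictionary stands in for the live documentation store, bag-of-words cosine similarity stands in for a real vector database, and the document ids and texts are invented:<\/p>\n\n\n\n
```python
import math
import re
from collections import Counter

# Live documentation store (hypothetical entries). Updating an answer is a
# dictionary edit here, not a retraining cycle.
DOCS = {
    'returns-policy-v3': 'Customers may return devices within 30 days of purchase.',
    'device-x-reset': 'Hold the power button for 10 seconds to reset Device X.',
}

def _tokens(text):
    # Bag-of-words stand-in for an embedding model.
    return Counter(re.findall(r'[a-z0-9]+', text.lower()))

def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    if not dot:
        return 0.0
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def retrieve(query, k=2):
    # Rank documents by similarity to the query and keep the top k.
    q = _tokens(query)
    ranked = sorted(DOCS.items(), key=lambda kv: _cosine(q, _tokens(kv[1])), reverse=True)
    return ranked[:k]

def build_prompt(query):
    # Ground the model in retrieved sources and demand traceable citations.
    sources = '\n'.join('[%s] %s' % (doc_id, text) for doc_id, text in retrieve(query))
    return 'Answer using only the sources below, citing their ids.\nSources:\n%s\nQuestion: %s' % (sources, query)
```
\n\n\n\n<p>When the returns window changes, editing <code>DOCS<\/code> takes effect on the very next query &#8211; the instant-update property both scenarios ended up needing.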
<\/p>\n\n\n\n<p>There&#8217;s one key takeaway here: the real cost of AI systems is not running them once, but keeping them correct over time.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-the-solution-the-70-rule-and-rag-first-architectures\">The solution: the &#8216;70% rule&#8217; and RAG-first architectures<\/h2>\n\n\n\n<p>In my professional work, the 70% rule has been the remedy for most of our production roadblocks. 
Essentially, around 70% of production-grade AI applications will be more reliable and cost less with RAG and sophisticated prompting than with fine-tuning.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-why-is-rag-better-than-ai-fine-tuning-in-production\">Why is RAG better than AI fine-tuning in production?<\/h2>\n\n\n\n<p>In more detail, RAG is the better option for a few reasons:<\/p>\n\n\n\n<p><strong>Traceability:<\/strong> RAG serves as an &#8216;open book&#8217; model that tells you exactly where it gets its information.<\/p>\n\n\n\n<p><strong>Agility:<\/strong> You can update your data in a <a href=\"https:\/\/www.ibm.com\/think\/topics\/vector-database\" target=\"_blank\" rel=\"noreferrer noopener\">vector database<\/a> in seconds. The system doesn&#8217;t require you to retrain the LLM to apply changes.<\/p>\n\n\n\n<p><strong>Predictability<\/strong>: By keeping the model weights that govern the LLM\u2019s core behavior frozen, we can 1) ensure the model doesn\u2019t suddenly worsen, and 2) prevent catastrophic forgetting.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-explaining-prescriptive-guardrails-and-human-in-the-loop\">Explaining prescriptive guardrails and human-in-the-loop<\/h2>\n\n\n\n<p>The final bridge from lab to production is the <a href=\"https:\/\/community.openai.com\/t\/building-a-role-aware-guardrail-layer-between-llms-and-production-databases\/1370566\" target=\"_blank\" rel=\"noreferrer noopener\">guardrail layer<\/a>: secondary checks that vet inputs and outputs against your particular business constraints. 
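<\/p>\n\n\n\n<p>As a minimal sketch of such a guardrail layer &#8211; with made-up placeholder rules standing in for real business constraints &#8211; the checks are just deterministic functions wrapped around the model call:<\/p>\n\n\n\n
```python
# Hypothetical guardrail layer. The rules below (banned topics, a required
# citation marker, an overclaiming word) are illustrative placeholders, not
# any product's actual policy.
BANNED_TOPICS = ('medical advice', 'legal advice')

def check_input(user_text):
    # Vet the prompt before it ever reaches the model.
    issues = [topic for topic in BANNED_TOPICS if topic in user_text.lower()]
    if len(user_text) > 4000:
        issues.append('input too long')
    return issues  # an empty list means the input passes

def check_output(model_text):
    # Vet the response before it reaches the user.
    issues = []
    if '[' not in model_text:
        issues.append('missing citation')  # assumes responses must cite [source-id]
    if 'guaranteed' in model_text.lower():
        issues.append('overclaiming')
    return issues
```
\n\n\n\n<p>Any non-empty issue list blocks the response and can hand the interaction to the human-in-the-loop workflow described next.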
<\/p>\n\n\n\n<p>Using scoring mechanisms such as the <a href=\"https:\/\/docs.statsig.com\/experiments\/advanced-setup\/sprt\" target=\"_blank\" rel=\"noreferrer noopener\">Sequential Probability Ratio Test (SPRT)<\/a>, the system can automatically flag low-confidence interactions and route them to a human expert in a human-in-the-loop (HITL) workflow. This process includes:<\/p>\n\n\n\n<p><strong><a href=\"https:\/\/www.ibm.com\/think\/topics\/anomaly-detection\" target=\"_blank\" rel=\"noreferrer noopener\">Anomaly detection<\/a><\/strong>: An SPRT model tracks the cumulative log-likelihood ratio of an incoming data stream against a baseline and flags the stream as anomalous if the ratio crosses a pre-defined threshold.<\/p>\n\n\n\n<p><strong>Automated escalation:<\/strong> The score may hover near the upper threshold for a while, but once it crosses, the system should produce a \u201ccode blue\u201d alert for immediate, documented, manual review.<\/p>\n\n\n\n<p><strong>Filtering:<\/strong> The framework filters out the \u201cknown-goods\u201d and allows experts to home in on high-risk edge cases. 
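<\/p>\n\n\n\n<p>The scoring loop behind this process can be sketched as a classic Wald SPRT over a stream of per-interaction pass\/fail flags. The baseline and anomalous failure rates, and the error targets, are illustrative assumptions:<\/p>\n\n\n\n
```python
import math

# Illustrative parameters: baseline vs. anomalous failure rate, and the
# tolerated false-alarm / miss probabilities.
P0, P1 = 0.02, 0.10
ALPHA, BETA = 0.05, 0.05
UPPER = math.log((1 - BETA) / ALPHA)  # crossing this escalates to a human
LOWER = math.log(BETA / (1 - ALPHA))  # crossing this filters a known-good

def sprt(failure_flags):
    # Track the cumulative log-likelihood ratio of the stream against the
    # baseline and stop as soon as either threshold is crossed.
    llr = 0.0
    for failed in failure_flags:
        if failed:
            llr += math.log(P1 / P0)
        else:
            llr += math.log((1 - P1) / (1 - P0))
        if llr >= UPPER:
            return 'escalate'    # the "code blue" path: documented manual review
        if llr <= LOWER:
            return 'known-good'  # filtered out so experts see only edge cases
    return 'undecided'           # keep observing
```
\n\n\n\n<p>With these numbers, a couple of early failures cross the upper threshold and escalate, while a few dozen clean interactions classify the stream as a known-good to be filtered.<\/p>\n\n\n\n<p>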
This could save more than 10,000 man-hours in some high-volume environments.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"651\" height=\"576\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/03\/image-6.png\" alt=\"Image explaining AI reliability measures via guardrail layers and human-in-the-loop processes.\" class=\"wp-image-109166\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/03\/image-6.png 651w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2026\/03\/image-6-300x265.png 300w\" sizes=\"auto, (max-width: 651px) 100vw, 651px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-summary\">Summary<\/h2>\n\n\n\n<p>Rather than seeking perfection, the 70% rule is based on the law of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Diminishing_returns\" target=\"_blank\" rel=\"noreferrer noopener\">diminishing returns<\/a>. In real-world deployments, a system using RAG that achieves 95% reliability at five cents a query is preferable to a fine-tuned system that achieves 96% reliability but costs $1.00 a query (or more) in infrastructure, specialized talent, care and feeding.<\/p>\n\n\n\n<p>So, before you commit to months of fine-tuning, ask your team if you\u2019ve really found the limits of RAG and your prescriptive guardrails. Simplicity is often the best course of action.<\/p>\n\n\n\n<section id=\"faq\" class=\"faq-block my-5xl\">\n    <h2>FAQs: AI in production<\/h2>\n\n                        <h3 class=\"mt-4xl\">1. Why do so many AI projects fail to reach production?<\/h3>\n            <div class=\"faq-answer\">\n                <p data-start=\"50\" data-end=\"310\">Many teams reach a proof of concept quickly but struggle with reliability, maintenance, and architecture challenges. 
As a result, up to 80% of enterprise AI projects never move beyond experimentation.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">2. What is the \u201cfine-tuning trap\u201d?<\/h3>\n            <div class=\"faq-answer\">\n                <p data-start=\"312\" data-end=\"500\">Fine-tuning can create models that quickly become outdated when business rules or data change, requiring expensive retraining and ongoing maintenance.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">3. Why isn\u2019t 90% AI accuracy enough in production?<\/h3>\n            <div class=\"faq-answer\">\n                <p data-start=\"502\" data-end=\"698\">At scale, a 10% failure rate can cause serious issues. Production systems need stronger reliability, validation, and fact-checking mechanisms.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">4. What is RAG in AI?<\/h3>\n            <div class=\"faq-answer\">\n                <p data-start=\"700\" data-end=\"892\">Retrieval-Augmented Generation (RAG) combines a language model with real-time data retrieval, allowing responses to be based on current and verifiable information.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">5. Why is RAG often better than fine-tuning?<\/h3>\n            <div class=\"faq-answer\">\n                <p data-start=\"894\" data-end=\"1080\">RAG is more adaptable. You can update data instantly without retraining the model, making it easier and cheaper to maintain in production.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">6. What is the \u201c70% rule\u201d for AI systems?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Roughly 70% of production AI applications work better with RAG and prompt engineering rather than fine-tuning.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">7. 
What are AI guardrails?<\/h3>\n            <div class=\"faq-answer\">\n                <p>AI guardrails are validation layers that check inputs and outputs, detect anomalies, and escalate uncertain cases to humans.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">8. What is the key lesson for enterprise AI?<\/h3>\n            <div class=\"faq-answer\">\n                <p>The biggest cost isn\u2019t running AI once &#8211; it\u2019s keeping the system accurate and reliable over time.<\/p>\n            <\/div>\n            <\/section>\n","protected":false},"excerpt":{"rendered":"<p>Discover why many enterprise AI projects fail and learn how to build reliable, cost-effective LLM workflows.&hellip;<\/p>\n","protected":false},"author":346673,"featured_media":109165,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[159169,53],"tags":[159075,5992,4168,4170],"coauthors":[159377],"class_list":["post-109154","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","category-featured","tag-ai","tag-data-analysis","tag-database","tag-database-administration"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/109154","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/346673"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=109154"}],"version-history":[{"count":8,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/109154\/revisions"}],"predecessor-version":[{"id":109408,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\
/109154\/revisions\/109408"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media\/109165"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=109154"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=109154"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=109154"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=109154"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}