{"id":1793,"date":"2014-04-15T00:00:00","date_gmt":"2014-04-15T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/searching-for-strings-in-sql-server-databases\/"},"modified":"2021-09-29T16:21:38","modified_gmt":"2021-09-29T16:21:38","slug":"searching-for-strings-in-sql-server-databases","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/searching-for-strings-in-sql-server-databases\/","title":{"rendered":"Searching for Strings in SQL Server Databases"},"content":{"rendered":"<div id=\"pretty\">\n<p class=\"start\">For most of the time you will get adequate powerful searches from a well-normalized database that has all possible constraints and indexes. Just occasionally, usually when you&#8217;re searching through text, or you&#8217;ve inherited the work of someone whose major talents lie in other areas of life, you need other magic. This article is about some of the SQL magic that might be required.<\/p>\n<p>The first type of search I&#8217;ll refer to is the &#8216;Brute-force&#8217; search, where performance isn&#8217;t the primary concern. Even with the best of intentions, it is possible to find oneself searching whole blocks of text in a database, or searching across columns. Even as an experienced database developer, I find myself occasionally having to do ad-hoc searches for strings in databases, especially in large chunks of text. Anyone like me, who has to mop up after a database goes wrong, has a number of techniques in their metaphorical toolbox for getting to grips with the data.<\/p>\n<p>The second type of search is the &#8216;search-engine&#8217; type of search when you need to implement clever text-searching facilities in a database, the <a href=\"https:\/\/docs.microsoft.com\/en-us\/sql\/relational-databases\/search\/get-started-with-full-text-search\">FullText indexing features of SQL Server (FTS)<\/a> are great, and should always be the first port of call. By &#8216;clever&#8217; we mean finding two words near each other, matching any word derived from a particular root (for example &#8216;search&#8217;, &#8216;searching&#8217;, &#8216;searched&#8217;), finding several words with distinct weightings, or doing a fuzzy search for a word or phrase. There are a few rare , but interesting, exceptions when you actually need something more special-purpose and &#8216;hand-crafted&#8217;.<\/p>\n<p>We&#8217;ll tackle both types of searches in this article, but not Full-Text Search (FTS). For more on FTS, <a href=\"https:\/\/www.simple-talk.com\/sql\/database-administration\/full-text-searches-on-documents-in-filetables\/\">see Full-Text searches on documents in Filetables,<\/a>, <a href=\"https:\/\/www.simple-talk.com\/sql\/learn-sql-server\/sql-server-full-text-search-language-features\/\">FTS features<\/a>, and Rob Sheldon&#8217;s <a href=\"https:\/\/www.simple-talk.com\/sql\/learn-sql-server\/full-text-indexing-workbench\/\">Full-Text Indexing Workbench<\/a>.<\/p>\n<h2>Ad-hoc Brute-force searches of databases.<\/h2>\n<p>With no great sweat, it is possible to search for a phrase in a table or result using LIKE or PATINDEX. This is often called the &#8216;brute-force search. In many cases, it is surprisingly effective. I&#8217;ve gone into detail in how to use the wildcard features of PATINDEX here in the <a href=\"https:\/\/www.simple-talk.com\/sql\/t-sql-programming\/patindex-workbench\/\">PATINDEX Workbench<\/a>, and give an example of its use in a migration script for a database refactoring that involved creating two new columns in an address table.<\/p>\n<h3>The blunt instrument &#8211; Search anywhere in a query result<\/h3>\n<p>For the crudest sort of searches, we need to represent all the data from the result of a SQL query in string-format. The easiest way of accomplishing this is to get the result of a SQL Expression in XML form and then just keep the contents of every element in one big string. To do this, we&#8217;ll use this function.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">IF OBJECT_ID(N'dbo.StripXMLTags') IS NOT NULL DROP FUNCTION dbo.StripXMLTags\r\nGO\r\ncreate function dbo.StripXMLTags( @xml XML ) returns varchar(max) as\r\nbegin\r\n\u00a0\u00a0\u00a0 declare @TextWithoutTags varchar(max);\r\n\u00a0\u00a0\u00a0 with TheXMLDocument(contents) as\r\n\u00a0\u00a0\u00a0 (\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 select TheXML.node.query('.') from @XML.nodes('\/') as TheXML(node)\r\n\u00a0\u00a0\u00a0 )\r\n\u00a0\u00a0\u00a0 select @TextWithoutTags = contents.value('.', 'varchar(max)') from TheXMLDocument\r\n\u00a0\u00a0\u00a0 return @TextWithoutTags\r\nend\r\n\r\n<\/pre>\n<p>To show how quick and effective this is, we&#8217;ll\u00a0 search all columns of 20,000 rows in AdventureWorks&#8217; <b>person.contact<\/b> table under two seconds *\/<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">Select patindex('%P.O Box 5%',dbo.StripXMLTags((Select * from person.contact for xml path))) \r\n<\/pre>\n<p>Note that we&#8217;re using the old 2008 version of AdventureWorks. Please amend to taste if using the later version. Of course, if you were looking for a particular string, you could display the context of the first string found, if any (otherwise NULL)<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">with TheText(TheResult) as (Select dbo.StripXMLTags((Select * from person.contact for xml path)))\r\nSelect case when HitLocation&gt;0 then substring(TheResult,HitLocation-20,40) else null end\u00a0 from\r\n(select patindex('%P.O Box 5%',theResult) as hitLocation, Theresult from\u00a0 TheText)f\r\n\r\n<\/pre>\n<p>If we wanted to know the location of all the strings, then we might put the matches into a table<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">Declare @string varchar(max),\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 @len int, @Hit int, @ii int\r\nDeclare @FoundStrings table (context varchar(40), location int)\r\nSelect @string= dbo.StripXMLTags((Select * from person.contact for xml path))\r\nSelect @len=len(@String),@ii=@len\r\nWhile @ii&gt;0\r\n\u00a0 begin \r\n\u00a0 Select @hit= patindex('%hank%',right(@string,@ii))\r\n\u00a0 if @hit=0 break\r\n\u00a0 insert into @FoundStrings(Context, Location)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 Select substring (@String,@len-@ii+@hit,40),@len-@ii+@hit\r\n\u00a0 Select @ii=@ii-@hit\r\n\u00a0 end\r\nSelect * from\u00a0 @FoundStrings\r\n<\/pre>\n<p>As you can see, it is possible to search the results of any SQL expression you need.\u00a0 It is perfectly possible to automatically search every table in the database if you&#8217;re really keen on finding all occurrences of a string.<\/p>\n<p>You&#8217;d want to turn it into a function for use<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">IF OBJECT_ID(N'dbo.SearchWithinAQuery') IS NOT NULL DROP FUNCTION dbo.SearchWithinAQuery\r\nGO\r\ncreate function dbo.SearchWithinAQuery( @xml XML, @SearchPattern Varchar(80) )\r\n\/* searches the inner text within an XML document, returning the context of\r\nevery match. The searches are in PATINDEX format \r\nAuthor: Phil Factor\r\nRevision: 1.3\r\ndate: 8 Apr 2014\r\nexample:\r\n- code: SELECT * from dbo.SearchWithinAQuery((select * from person.contact for xml path ),'%hank%')\r\n*\/\r\n\r\nRETURNS\r\n@FoundStrings TABLE\r\n(\r\ncontext varchar(40), \r\nlocation int --the substring location in which the string is found\r\n)\r\nAS\r\nbegin\r\nDeclare @string varchar(max), @len int, @Hit int, @ii int\r\nSelect @string= dbo.StripXMLTags(@xml), @len=len(@String),@ii=@len\r\nWhile @ii&gt;0--loop around until all instances are found\r\n\u00a0 begin \r\n\u00a0 Select @hit= patindex(@SearchPattern,right(@string,@ii))\r\n\u00a0 if @hit=0 break --if no more are to be found \r\n\u00a0 insert into @FoundStrings(Context, Location)--insert each found item\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 Select substring (@String,@len-@ii+@hit,40),@len-@ii+@hit\r\n\u00a0 Select @ii=@ii-@hit--decrement the index from the right of the string\r\n\u00a0 end\r\nReturn\r\nend\r\n<\/pre>\n<p>You might want to use the same &#8216;PATINDEX&#8217; loop to also search through free-form notes or text documents. Here we look through Bram Stoker&#8217;s Dracula to look for instances of the word &#8216;terror&#8217;. (I&#8217;ve included the text as a zip file at the bottom of the article &#8211; remember to change the path!)<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">DECLARE @string VARCHAR(MAX), @len int, @Hit int, @ii int\r\nSELECT\u00a0 @string = BulkColumn --read the text of a file\r\n\u00a0 FROM\u00a0\u00a0\u00a0 OPENROWSET(BULK 'D:\\files\\dracula.txt', SINGLE_BLOB) AS x \r\n--read in the text of Dracula \r\nDeclare @FoundStrings table (context varchar(40), location int)\r\n---our results table variable\r\nSelect @len=len(@String),@ii=@len\r\nWhile @ii&gt;0 --while there is some of the string to do\r\n\u00a0 begin \r\n\u00a0 Select @hit= patindex('%terror%',right(@string,@ii))\r\n\u00a0 if @hit=0 break\r\n\u00a0 insert into @FoundStrings(Context, Location)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 Select substring (@String,@len-@ii+@hit-20,40),@len-@ii+@hit\r\n\u00a0 Select @ii=@ii-@hit\r\n\u00a0 end\r\nSelect * from\u00a0 @FoundStrings\r\n<\/pre>\n<p>OK. That&#8217;s fine if you are just trying to find a particular string, but you could want something a bit more refined.<\/p>\n<h3>A sharper stick for free-form searches.<\/h3>\n<p>Sometimes, for example, you want a way of selecting customers from your database in a Google-style way, without having to bother with whether it is a surname, town, postcode or customer_ID that you are typing in.<\/p>\n<p>Imagine that you, working for AdventureWorks know only that your customer has the word hacienda in the address but you reckon that his postcode starts with &#8217;94&#8217; With this code, you&#8217;ve homed in on Mr Wright from your 18508 customers in half a second<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">Select ContactString from\r\n\u00a0 (Select coalesce(title+' ','')+firstname+' '+coalesce(middleName+' ','')\r\n\u00a0\u00a0\u00a0\u00a0 +lastname+coalesce(' '+suffix,'')+' Email:'+emailAddress+' Phone:'+Phone\u00a0 \r\n\u00a0\u00a0\u00a0\u00a0 +' Address: '+AddressLine1+ coalesce(' '+AddressLine2,'')+', '+City+ ' '\r\n\u00a0\u00a0\u00a0\u00a0 +postalcode +' State\/Province:'+SP.Name+ ' Country:'+SP.CountryRegionCode\r\n\u00a0\u00a0\u00a0\u00a0 as ContactString\r\n\u00a0 from person.contact c\r\n\u00a0\u00a0\u00a0 inner join Sales.Individual\u00a0 I ON C.ContactID = I.ContactID\r\n\u00a0\u00a0\u00a0 inner join Sales.CustomerAddress AS CA ON CA.CustomerID = I.CustomerID\r\n\u00a0\u00a0\u00a0 inner join\u00a0 Person.Address AS A ON A.AddressID = CA.AddressID\r\n\u00a0\u00a0\u00a0 inner join person.StateProvince SP ON A.StateProvinceID=SP.StateProvinceID\r\n\u00a0\u00a0\u00a0 )f\r\nwhere ContactString like '%Hacienda%94%'\r\n<\/pre>\n<p>Where databases are relatively small, you can do well with routines that search several columns of views, or derived tables. As soon as the amount of data grows, this becomes a bad idea. Note also that if you didn&#8217;t know the order of occurrence within the line, you&#8217;d have to\u00a0 do &#8230;<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">...\r\n\twhere ContactString like '%Hacienda%94%' or ContactString like '%94%Hacienda%'\r\n<\/pre>\n<p>&#8230; which doubles the time it takes. We&#8217;ll revisit this problem later on after I&#8217;ve introduced you to some &#8216;search-engine&#8217; techniques.<\/p>\n<p>If you have a small database, though, the brute-force approach is quick and easy, but mostly just for ad-hoc work, since these queries tend to have a high CPU load. \u00a0<\/p>\n<h2 id=\"inversion\">The Inversion Technique for Searching databases<\/h2>\n<p>Here is a very simple technique, which we&#8217;ve used for a long time with\u00a0 a succession of applications. It is usually called the &#8216;Inverted&#8217; or\u00a0 &#8216;Inversion&#8217; index technique. (see <a href=\"http:\/\/en.wikipedia.org\/wiki\/Search_engine_indexing\">Search engine indexing <\/a> for a full discussion). Both FullText indexing and\u00a0 Apache Lucene use it. MapReduce is an elaboration of the same simple technique. The technique scales very well and we&#8217;ve tested\u00a0 it indexing up to 8 million documents. Even when searching through eight million documents, It gets results within three seconds if the indexes are right, but you&#8217;d expect most searches to be far quicker. In the example code I&#8217;ll use to demonstrate, It found strings in the King James Bible in 20 ms that took just under two seconds with a brute-force search. Because each search takes preparation, and any permanent inversion index takes considerable maintenance to cope with insertions, updates and deletes, it isn&#8217;t much good for ad-hoc work.<\/p>\n<p>Firstly, we will need to create our inversion index. I generally use two tables, one of which has every word in it that exists within the data you want to search, and another table that stores the location of every instance of every word. The term &#8216;location&#8217; will mean different things in practice, depending on the problem you want to solve, but in these examples, I will use the publication(or document), offset character location, and sequence (1..n)<\/p>\n<p>First we&#8217;ll need a way of chopping text into its constituent words (The Map process). This is a routine that does the job but it is slow. I&#8217;ve found nothing as quick in TSQL, but it is an ideal candidate for a CLR. I&#8217;ve left one of my quick unit-test routines just to show how I do such things.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">\/*--------------------*\/\r\nIF OBJECT_ID(N'IterativeWordChop') IS NOT NULL DROP FUNCTION IterativeWordChop\r\nGO\r\nCREATE FUNCTION [dbo].[IterativeWordChop] \r\n\/*\r\nsummary:\u00a0\u00a0 &gt;\r\n\r\nThis Table-valued function takes any text as a parameter and splits it into its constituent words, passing back the order in which they occured and their location in the text. \r\nAuthor: Phil Factor\r\nRevision: 1.3\r\ndate: 2 Apr 2014\r\nexample:\r\n\u00a0\u00a0\u00a0\u00a0 - code: SELECT * FROM IterativeWordChop('this tests stuff. Will it work?')\r\n\u00a0\u00a0\u00a0\u00a0 - code: SELECT * FROM IterativeWordChop('this ------- tests it again; Will it work ...')\r\n\u00a0\u00a0\u00a0\u00a0 - code: SELECT * FROM IterativeWordChop('Do we allow %Wildcards% like %x%?')\r\nreturns:\u00a0\u00a0 &gt;\r\nTable of SequenceNumber, item (word) and sequence no.\r\n**\/\r\n( \r\n@string VARCHAR(MAX)\r\n) \r\nRETURNS\r\n@Results TABLE\r\n(\r\nItem VARCHAR(255),\r\nlocation int,\r\nSequence int identity primary key\r\n)\r\nAS\r\nBEGIN\r\nDECLARE @Len INT, @Start INT, @end INT, @Cursor INT,@length INT\r\nSELECT @Cursor=1, @len=LEN(@string)\r\nWHILE @cursor&lt;@len\r\n\u00a0\u00a0 BEGIN\r\n\u00a0\u00a0 SELECT @start=PATINDEX('%[^A-Za-z0-9][A-Za-z0-9%]%',\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ' '+SUBSTRING (@string,@cursor,50)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 )-1\r\n\u00a0\u00a0 if @start&lt;0 break\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \r\n\u00a0\u00a0 SELECT @length=PATINDEX('%[^A-Z''a-z0-9-%]%',SUBSTRING (@string,@cursor+@start+1,50)+' ')\u00a0\u00a0 \r\n\u00a0\u00a0 INSERT INTO @results(item, location) \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SELECT\u00a0 SUBSTRING(@string,@cursor+@start,@length), @cursor+@start\r\n\u00a0\u00a0 SELECT @Cursor=@Cursor+@Start+@length+1\r\n\u00a0\u00a0 END\r\nRETURN\r\nEND\r\ngo\r\n---sanity-check the word-chop routine to make sure it is more or less working\r\nif exists (select count(*)\r\nfrom (\r\n\u00a0 Select item, location,sequence\r\n\u00a0\u00a0 FROM IterativeWordChop('Hello. This tests it again; Will it work? ...')\r\n\u00a0 union all\r\n\u00a0\u00a0 Select * from ( values\r\n\u00a0\u00a0\u00a0\u00a0 ('Hello',1,1), ('This',8,2), ('tests',13,3), ('it',19,4), ('again',22,5), ('Will',29,6),\r\n\u00a0\u00a0\u00a0\u00a0 ('it',34,7), ('work',37,8)) CorrectResult(item, location,sequence)\r\n\u00a0\u00a0\u00a0\u00a0 )f --our 'correct' values\r\n\u00a0 group by sequence, location,item having count(*)&lt;2\r\n\u00a0 )\r\n\u00a0 Raiserror('fault in IterativeWordChop', 16,1)\r\n<\/pre>\n<p>Now, armed with a way of chopping up text. We can now\u00a0 create the basic tables. We have one table that contains all the words in our set of publications, and another table that maps the text, word by word, recording the location, sequence and publication. In this first example, we&#8217;re only going to have one publication.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">--tidy up any existing data from a previous run. \r\nIF\u00a0 OBJECT_ID(N'publication') IS NOT NULL DROP table publication\r\nGO\r\nIF\u00a0 OBJECT_ID(N'WordOccurence') IS NOT NULL DROP table wordOccurence\r\nGO\r\nIF\u00a0 OBJECT_ID(N'Word') IS NOT NULL DROP table word\r\nGO\r\n\r\n--create a table with the occurrence of each word in it.\r\nCreate table Word\r\n(\r\nItem\u00a0\u00a0 VARCHAR(255) not null primary key,\r\nfrequency\u00a0 int not null default(0)\r\n)\r\ngo\r\n--create a table with the word-mapping in it.\r\nCreate table WordOccurence\r\n(\r\nItem\u00a0 VARCHAR(255) not null ,\r\nlocation int not null,\r\nSequence int not null,\r\nPublication\u00a0 int not null\r\n)\r\n\r\nALTER TABLE WordOccurence ADD PRIMARY KEY(item, sequence);\r\n\r\nCreate table publication\r\n(\r\nPublication_ID int primary key,\r\nTheText Varchar(max)\r\n<\/pre>\n<p>Because of the way we want insert the data, the foreign key gets inserted later.<\/p>\n<p>Now we have got this far, we can read the entire King James Bible (which is there on the server&#8217;s disk -download it from the head of the article) into our publication table and index it up (map each word) This will hopefully give us enough text to give the algorithm a realistic size of data.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">\/* Let's put the text of a large book into a VarChar(MAX)variable *\/\r\nDECLARE @LotsOfText VARCHAR(MAX) \r\nSELECT\u00a0 @LotsOfText = BulkColumn \r\n-- 2005 onwards only for this code)\r\n\u00a0 FROM\u00a0\u00a0\u00a0 OPENROWSET(BULK 'D:\\files\\kjb.txt', SINGLE_BLOB) AS x\u00a0 \r\ninsert into publication (Publication_ID, TheText) values(1,@LotsOfText)\r\n\r\ninsert into WordOccurence (Item,location,Sequence,Publication)\r\n\u00a0 SELECT Item,location,Sequence,1\u00a0 FROM Iterativewordchop(@LotsOfText)--\r\n\r\ninsert into word(item, frequency)\r\nSelect\u00a0 item, count(*) \r\n\u00a0\u00a0\u00a0\u00a0 from WordOccurence group by item \r\n\r\nALTER TABLE WordOccurence ADD FOREIGN KEY (item) REFERENCES word\r\n<\/pre>\n<p>So we can begin testing this and trying various experiments. Just to kick things off, I&#8217;ve implemented a function that will find exact phrases, and show instance where they occur in the text. This receives the words you wish to search for, and then determines how rare they are. It searches only for the rarest in the phrase (you can adjust the proportion.<\/p>\n<p>In practice, I only search for the rarest words in a string and then use a regex to do a brute-force search once the search has been narrowed down to the locations that have the rarest words together in them. This cuts out a lot of tiresome logic! In this case, we use a variant of relational division to find the locations where all the words occur, but it isn&#8217;t so much use for fuzzier searches. The fuzzier searches are too complicated to make for an entertaining read in an article, but it is a great exercise to think them up. For a start, you could do quite a lot of fuzzy searches using a LIKE join between your search words and the WORD table.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">IF OBJECT_ID(N'FindString') IS NOT NULL DROP FUNCTION FindString\r\nGO\r\nCREATE FUNCTION [dbo].[FindString] \r\n\/*\r\nsummary:\u00a0\u00a0 &gt;\r\n\r\nThis Table-valued function takes text as a parameter and tries to find it in the WordOccurence table\r\n\r\nAuthor: Phil Factor\r\nRevision: 1.3\r\ndate: 10 Apr 2014\r\nexample:\r\n\u00a0\u00a0\u00a0\u00a0 - code: SELECT * FROM FindString('Who is he that hideth counsel without knowledge?',1)\r\n\u00a0\u00a0\u00a0\u00a0 - code: SELECT * FROM FindString(' And I turned to see the voice that spake with me. And being turned, I saw seven golden candlesticks')\r\nreturns:\u00a0\u00a0 &gt;\r\npasses back the location where they were found, and the number of words matched in the string.\r\n**\/\r\n( \r\n@string VARCHAR(100),\r\n@Publication int\r\n) \r\nRETURNS\r\n@finds table\r\n(location int,\r\n\u00a0 hits int)\r\n\r\nAS\r\nBEGIN\r\nDeclare @WordsToLookUp table(Item VARCHAR(255), location int, Sequence int primary key)\r\nDeclare @wordCount int,\u00a0 @searches int\r\n\u00a0\u00a0\u00a0 -- chop the string into its constituent words, with the sequence\r\ninsert into @WordsToLookUp(Item,location,Sequence)\r\n\u00a0 Select Item,location,Sequence FROM dbo.Iterativewordchop (@string)\r\n\u00a0\u00a0\u00a0 -- determine how many words and work out what proportion to search for\r\nSelect @WordCount =count(*) from\u00a0 @WordsToLookUp\r\nSelect @searches=case when @wordcount &lt; 3 then @wordcount else 2+(@wordcount\/2) end\r\n\u00a0\u00a0\u00a0 -- \r\ninsert into @finds (location, hits)\r\nSelect min(wordOccurence.location), count(*) as matches\r\nfrom WordOccurence\r\ninner join \r\n(Select top (@searches) word.item, searchterm.sequence from \r\n\u00a0\u00a0\u00a0\u00a0 @WordsToLookUp searchterm\r\n\u00a0\u00a0\u00a0 inner join Word\r\n\u00a0\u00a0\u00a0 on searchTerm.item=word.item order by frequency\r\n) LessFrequentWords(item,Sequence)\r\non\u00a0 LessFrequentWords.item =WordOccurence.item\r\nwhere publication=@Publication\r\ngroup by wordoccurence.sequence-LessFrequentWords.sequence\r\nhaving count(*)&gt;= @searches order by count(*) desc\r\nreturn \r\nend\r\n<\/pre>\n<p>So now, armed with this, we can soon become &#8216;well versed&#8217; in the bible, being able to quote chapter and verse for all sorts of biblical quotations.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">Declare @originalText Varchar(max)\r\nSelect @originalText=TheText from publication where publication_ID=1\r\nSelect substring(@originalText,Location,180), hits FROM FindString('Which is the\r\nearnest of our inheritance until the redemption of the purchased\r\npossession, unto the praise of his glory.',1)\r\nSelect substring(@originalText,Location,180), hits FROM FindString('antichrist',1)\r\nSelect substring(@originalText,Location,60), hits FROM FindString('unto Philadelphia, and unto Laodicea',1)\r\n<\/pre>\n<p>I&#8217;ve already shown you how to find all instances of the word &#8216;Terror&#8217; in the book &#8216;Dracula&#8217; by brute-force. You can substitute Dracula for the KJB so as to\u00a0 use the inversion-index form,\u00a0 and try the two back-to-back. I&#8217;ve shown you a simple algorithm, but with everything laid out workbench-style, I guess that you&#8217;ll soon find ways of improving performance.<\/p>\n<h2>\u00a0<\/h2>\n<p>We&#8217;ll end with a more typical problem, and one that I&#8217;ve had to solve more than once! The application developers want the end users to locate customers or groups of customers by means of a simple google-like search. We negotiate an interface where a stored procedure receives a string containing words and we return all the customers that have all the words in. We allow wildcards<\/p>\n<p>We can start off by creating our inversion tables: very similar to the previous example.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">--if the index tables exist, then delete them\r\nIF\u00a0 OBJECT_ID(N'CustomerWordOccurence') IS NOT NULL DROP table CustomerwordOccurence\r\nIF\u00a0 OBJECT_ID(N'CustomerWord') IS NOT NULL DROP table Customerword\r\nGO\r\n--the table that holds each word (and its frequency\r\nCreate table CustomerWord\r\n(\r\nItem\u00a0\u00a0 VARCHAR(255) not null primary key,\r\nfrequency\u00a0 int not null default(0)\r\n)\r\ngo\r\n--the table that maps customer data, holding each word, and its location\r\nCreate table CustomerWordOccurence\r\n(\r\nItem\u00a0 VARCHAR(255) not null ,\r\nlocation int not null,\r\nSequence int not null,\r\nPublication\u00a0 int not null\r\n)\r\nALTER TABLE CustomerWordOccurence ADD PRIMARY KEY(item, sequence,publication);\r\n<\/pre>\n<p>This time, we can create a text version of each row along with a primary key, and map that text. We use a very similar technique to the previous example , but each row becomes its own &#8216;publication&#8217;<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">\t\t\/* first we map the text value of each row into the CustomerWordOccurence table, recording the location of each word *\/\r\ninsert into CustomerWordOccurence\r\nSelect item, location, sequence,contactid from\r\n\u00a0 (Select c.contactID, coalesce(title+' ','')+firstname+' '+coalesce(middleName+' ','')\r\n\u00a0\u00a0\u00a0\u00a0 +lastname+coalesce(' '+suffix,'')+' Email:'+emailAddress+' Phone:'+Phone\u00a0 \r\n\u00a0\u00a0\u00a0\u00a0 +' Address: '+AddressLine1+ coalesce(' '+AddressLine2,'')+', '+City+ ' '\r\n\u00a0\u00a0\u00a0\u00a0 +postalcode +' State\/Province:'+SP.Name+ ' Country:'+SP.CountryRegionCode\r\n\u00a0\u00a0\u00a0\u00a0 as ContactString\r\n\u00a0 from AdventureWorks.person.contact c\r\n\u00a0\u00a0\u00a0 inner join AdventureWorks.Sales.Individual\u00a0 I ON C.ContactID = I.ContactID\r\n\u00a0\u00a0\u00a0 inner join AdventureWorks.Sales.CustomerAddress AS CA ON CA.CustomerID = I.CustomerID\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 and AddressTypeID=2\r\n\u00a0\u00a0\u00a0 inner join\u00a0 AdventureWorks.Person.Address AS A ON A.AddressID = CA.AddressID\r\n\u00a0\u00a0\u00a0 inner join AdventureWorks.person.StateProvince SP ON A.StateProvinceID=SP.StateProvinceID\r\n\u00a0\u00a0\u00a0 )f\r\ncross apply dbo.iterativeWordChop(contactString)\r\n\r\n\/* then we store each unique word, storing its frequency ( we don't use the frequency in this example code) *\/\r\ninsert into Customerword(item, frequency)\r\nSelect\u00a0 item, count(*) \r\n\u00a0\u00a0\u00a0\u00a0 from CustomerWordOccurence group by item \r\nALTER TABLE CustomerWordOccurence ADD FOREIGN KEY (item) REFERENCES Customerword\r\n<\/pre>\n<p>Now we can define our procedure that returns a conventional result corresponding to the AdventureWorks customers (Sorry, AdventureWorks 2008 only) In this case we are allowing wildcards so we use the CustomerWord table to speed up that search (it takes a fifth of the time compared with an inner joing ON LIKE). \u00a0\u00a0<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">\/* and we create the stored procedure that we've agreed with the application programmers.*\/ \r\nIF OBJECT_ID(N'CustomerAddress') IS NOT NULL DROP procedure CustomerAddress\r\nGO\r\nCREATE PROCEDURE CustomerAddress \r\n\r\n\/***summary:\u00a0\u00a0 &gt;\r\nSelect rows froom AdventureWorks 2008 based on the lookup of the words in the CustomerWordOccurence and CustomerWord tables, in which are stored every word from the data, along with the contactid of the contact\r\n\r\nAuthor: Phil Factor\r\nRevision: 1.0\r\ndate: 14 Apr 2014\r\nexample:\r\n\u00a0\u00a0\u00a0\u00a0 - code: Execute CustomerAddress 'Andrew california'\r\n\u00a0\u00a0\u00a0\u00a0 - code: Execute CustomerAddress '%x% Birmingham\u00a0 england'\r\n\u00a0\u00a0\u00a0\u00a0 - code: Execute CustomerAddress '%orge% british columbia'\r\nreturns:\u00a0\u00a0 &gt;\r\n**\/\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 @SearchString Varchar(200) = '%%'\r\nAS\r\nBEGIN\r\n\r\n\u00a0 Declare @WordsToLookUp table(Item VARCHAR(255), location int, Sequence int primary key)\r\n\u00a0 Declare @WordCount int\r\n\u00a0 insert into @WordsToLookUp(Item,location,Sequence)\r\n\u00a0\u00a0\u00a0 Select Item,location,Sequence FROM dbo.Iterativewordchop (@SearchString)\r\n\u00a0 Select @WordCount =count(*) from\u00a0 @WordsToLookUp\r\n\u00a0 Select c.ContactID,\u00a0\u00a0\u00a0\u00a0\u00a0 NameStyle,\u00a0\u00a0\u00a0 Title, FirstName,\u00a0\u00a0\u00a0 MiddleName,\u00a0\u00a0 LastName,\u00a0\u00a0\u00a0\u00a0 Suffix,\r\n\u00a0\u00a0\u00a0 EmailAddress,\u00a0\u00a0\u00a0 EmailPromotion,\u00a0\u00a0\u00a0\u00a0\u00a0 Phone, AddressLine1,AddressLine2,City,\r\n\u00a0\u00a0\u00a0 PostalCode,SP.Name as StateProvince\r\n\u00a0\u00a0 from AdventureWorks.person.contact c --select your result\r\n\u00a0\u00a0\u00a0 inner join AdventureWorks.Sales.Individual\u00a0 I ON C.ContactID = I.ContactID\r\n\u00a0\u00a0\u00a0 inner join AdventureWorks.Sales.CustomerAddress AS CA ON CA.CustomerID = I.CustomerID\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 and AddressTypeID=2\r\n\u00a0\u00a0\u00a0 inner join\u00a0 AdventureWorks.Person.Address AS A ON A.AddressID = CA.AddressID\r\n\u00a0\u00a0\u00a0 inner join AdventureWorks.person.StateProvince SP ON A.StateProvinceID=SP.StateProvinceID\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 where c.contactID in\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (Select publication from --where you have at least one hit of each\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (select publication,WLU.Sequence\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 from CustomerWord CW\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 inner join @WordsToLookUp WLU on CW.item like WLU.item\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 inner join CustomerWordOccurence CWO on CW.item=CWO.item\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 group by publication,WLU.Sequence)AllWordMatches\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 group by publication having count(*)=@WordCount)\r\n\u00a0\u00a0 End \u00a0 \r\n<\/pre>\n<p>And we can try a few searches out to see what customers AdventureWorks has got.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">Execute CustomerAddress 'taylor palo alto'\r\nExecute CustomerAddress 'Andrew california'\r\nExecute CustomerAddress 'rain drop'\r\nExecute CustomerAddress '%x% Birmingham\u00a0 england'\r\nExecute CustomerAddress '%orge% british columbia'\r\n<\/pre>\n<p>Naturally, in real life, you would need to update the inversion tables every time you do a change to the database. The logic is simple, you just delete all entries in the mapping table that relate to the changed rows and just insert the updated mapping. I&#8217;ll not show you how to do that in AdventureWorks, though, as it is pretty routine stuff.<\/p>\n<h2>\u00a0<\/h2>\n<p>So here we have it. I&#8217;ve introduced a number of different techniques that can be used to do searches through data in SQL Server databases. With all the examples, the more useful they are, the more effort in writing them, or maintaining the inversion\/mapping tables. In the examples, I&#8217;ve tried to keep the code simple and, with the invertion indexes in particular, there are many ingenious optimizations and tricks you can add. I&#8217;ve seen versions, for example, that allow complex expressions in the query terms, and attempts to order by relevance. There are all manner of proximity tricks and wildcard techniques. I hope you&#8217;ll be moved to try some out!<\/p>\n<h3>See Also<\/h3>\n<ul>\n<li><a href=\"http:\/\/www.red-gate.com\/products\/sql-development\/sql-search\/\">SQL Search: Search for SQL (code, not Data) in your databases, for free<\/a><\/li>\n<li><a href=\"http:\/\/en.wikipedia.org\/wiki\/Lucene\">Apache Lucene Search<\/a><\/li>\n<li><a href=\"https:\/\/www.simple-talk.com\/sql\/learn-sql-server\/understanding-full-text-indexing-in-sql-server\/\">Understanding Full-Text Indexing in SQL Server<\/a> by Robert Sheldon<\/li>\n<li><a href=\"https:\/\/www.simple-talk.com\/sql\/learn-sql-server\/sql-server-full-text-search-language-features\/\">SQL Server Full Text Search Language Features<\/a> by Hilary Cotter<\/li>\n<li><a href=\"https:\/\/www.simple-talk.com\/sql\/learn-sql-server\/sql-server-full-text-search-language-features,-part-2\/\">SQL Server Full Text Search Language Features, Part 2<\/a> by Hilary Cotter<\/li>\n<li><a href=\"https:\/\/www.simple-talk.com\/sql\/database-administration\/full-text-searches-on-documents-in-filetables\/\">Full Text Searches on Documents in FileTables<\/a> by Feodor Geogiev<\/li>\n<li><a href=\"https:\/\/www.simple-talk.com\/sql\/learn-sql-server\/full-text-indexing-workbench\/\">Full-Text Indexing Workbench<\/a> by Robert Sheldon<\/li>\n<li><a href=\"https:\/\/www.simple-talk.com\/sql\/database-administration\/exploring-semantic-search-key-term-relevance\/\">Exploring Semantic Search Key Term Relevance<\/a> by Joe Sack<\/li>\n<\/ul>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Sometimes, you just want to do a search in a SQL Server database as if you were using a search engine like Google. Besides the obvious Full-Text search, there are plenty of techniques for finding that pesky data that resists the normal SELECT blandishments. Phil Factor describes some alternative techniques.&hellip;<\/p>\n","protected":false},"author":154613,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143531],"tags":[4168,4150,4151,4252],"coauthors":[6813],"class_list":["post-1793","post","type-post","status-publish","format-standard","hentry","category-t-sql-programming-sql-server","tag-database","tag-sql","tag-sql-server","tag-t-sql-programming"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1793","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/154613"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=1793"}],"version-history":[{"count":8,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1793\/revisions"}],"predecessor-version":[{"id":75327,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1793\/revisions\/75327"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=1793"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=1793"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=1793"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=1793"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}