{"id":73382,"date":"2011-09-06T12:49:43","date_gmt":"2011-09-06T12:49:43","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/uncategorized\/introduction-to-regular-expressions-in-oracle\/"},"modified":"2021-07-14T13:08:10","modified_gmt":"2021-07-14T13:08:10","slug":"introduction-to-regular-expressions-in-oracle","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/oracle-databases\/introduction-to-regular-expressions-in-oracle\/","title":{"rendered":"Introduction to Regular Expressions in Oracle"},"content":{"rendered":"<p>Regular expressions \u2013 RegEx or RegExp to their friends \u2013 were fashionably late in coming to the Oracle party. The seeds of what we know today as regular expressions were formulated in the period immediately following the Second World War in fields as diverse as formal language theory and neurophysiology; but it wasn\u2019t until the PL\/SQL Web toolkit arrived for 8i and 9i that they popped their heads over the Oracle parapet. By this time they were already rock stars in the worlds of Unix and Perl.<\/p>\n<p>But I\u2019m being rude here; I should introduce you. Regular expressions are a codified means to accomplish flexible pattern-matching in strings of text.<\/p>\n<p>And, if you\u2019re rereading that definition and thinking \u2018huh?!\u2019, that\u2019s a feeling you might want to get used to, because, while regular expressions are powerful and definitely useful, they can seem very much like hieroglyphics to the uninitiated. But don\u2019t worry, I\u2019ll hold your hand.<\/p>\n<p>String pattern matching has always been possible in Oracle sql and pl\/sql, of course. With the LIKE condition and the simple metacharacters of \u201c%\u201d and \u201c_\u201d, character patterns could be described and integrated into queries. However, the moment you required a little complexity these basic wildcards cowered in fear. An example: my name is David, but friends tend to call me Dave when they have a beer or two in them. To match either spelling of my name, not using regular expressions, you would have to query the table for \u2018Dav%\u2019. But then your results might be flooded with Davidsons and Davises and Davinas. And I\u2019m definitely not pretty enough to be called Davina!<\/p>\n<p>Even more impossible in a regular-expression-less world is something that sounds like the simplest of tasks \u2013 coming up with a pattern to match an email address.<br \/>\nOracle\u2019s implementation of the regular expression functionality is wrapped up in the following functions: REGEXP_LIKE, REGEXP_INSTR, REGEXP_SUBSTR, REGEXP_REPLACE and REGEXP_COUNT, and they\u2019re like the superhero alter egos of the more mundane LIKE, INSTR, SUBSTR, REPLACE and COUNT functions.<\/p>\n<pre><table border=\"1\" cellspacing=\"0\" cellpadding=\"0\">\r\n<tbody>\r\n<tr>\r\n<td valign=\"top\" width=\"140\">REGEXP_LIKE<\/td>\r\n<td valign=\"top\" width=\"477\">REGEXP_LIKE(&lt;source_string&gt;,&lt;pattern&gt;,\r\n&lt;match_pattern&gt;)<\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"140\">REGEXP_INSTR<\/td>\r\n<td valign=\"top\" width=\"477\">REGEXP_INSTR(&lt;source_string&gt;, &lt;pattern&gt;[[,\r\n&lt;start_position&gt;][, &lt;occurrence&gt;][, &lt;return_option&gt;][,\r\n&lt;match_parameter&gt;][, &lt;sub_expression&gt;]])<\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"140\">REGEXP_SUBSTR<\/td>\r\n<td valign=\"top\" width=\"477\">REGEXP_SUBSTR(source_string, pattern\r\n[, position [, occurrence\r\n[, match_parameter]]])<\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"140\">REGEXP_REPLACE<\/td>\r\n<td valign=\"top\" width=\"477\">REGEXP_REPLACE(&lt;source_string&gt;, &lt;pattern&gt;,\r\n&lt;replace_string&gt;, &lt;position&gt;, &lt;occurrence&gt;,\r\n&lt;match_parameter&gt;)<\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"140\">REGEXP_COUNT<\/td>\r\n<td valign=\"top\" width=\"477\">REGEXP_COUNT(&lt;source_string&gt;, &lt;pattern&gt;[[,\r\n&lt;start_position&gt;], [&lt;match_parameter&gt;]])<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table><\/pre>\n<p>[The possible match parameters are as follows: \u2018c\u2019, which stands for \u2018case sensitive\u2019; \u2018i\u2019, which stands for case insensitive; \u2018m\u2019, which means treat the source string as multiple lines; \u2018n\u2019 which means allow a period (.) to match new line; and \u2018x\u2019, which means ignore whitespace.]<br \/>\nThe true power of regular expressions, however, lies not in these functions but in the amazing versatility of the available metacharacters.<\/p>\n<pre><table border=\"1\" cellspacing=\"0\" cellpadding=\"0\">\r\n<tbody>\r\n<tr>\r\n<td valign=\"top\" width=\"140\">.<\/td>\r\n<td valign=\"top\" width=\"477\">Matches any single character<\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"140\">x?<\/td>\r\n<td valign=\"top\" width=\"477\">Matches \u2018x\u2019 0 or 1 times<\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"140\">x*<\/td>\r\n<td valign=\"top\" width=\"477\">Matches \u2018x\u2019 0 or more times<\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"140\">x+<\/td>\r\n<td valign=\"top\" width=\"477\">Matches \u2018x\u2019 1 or more times<\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"140\">x|y<\/td>\r\n<td valign=\"top\" width=\"477\">Matches either \u2018x\u2019 or \u2018y\u2019<\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"140\">x{n}<\/td>\r\n<td valign=\"top\" width=\"477\">Matches \u2018x\u2019 exactly <em>n<\/em>\r\ntimes.<\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"140\">x{n,m}<\/td>\r\n<td valign=\"top\" width=\"477\">Matches \u2018x\u2019 at least <em>n<\/em>\r\ntimes, but no more than <em>m times<\/em><\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"140\">x{n,}<\/td>\r\n<td valign=\"top\" width=\"477\">Matches \u2018x\u2019 at least <em>n<\/em>\r\ntimes<\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"140\">[xyz]<\/td>\r\n<td valign=\"top\" width=\"477\">Matches \u2018x\u2019,\u2019y\u2019 or \u2018z\u2019<\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"140\">[^xyz]<\/td>\r\n<td valign=\"top\" width=\"477\">Matches any single character other than \u2018x\u2019,\u2019y\u2019 and \u2018z\u2019<\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"140\">(xyz)<\/td>\r\n<td valign=\"top\" width=\"477\">Takes \u2018xyz\u2019 as a group, rather than individual characters<\/td>\r\n<\/tr>\r\n<tr>\r\n<td valign=\"top\" width=\"140\">a-z<\/td>\r\n<td valign=\"top\" width=\"477\">Matches any character from \u2018a\u2019 to \u2018z\u2019<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table><\/pre>\n<p>The list is much longer \u2013 this introductory article will not teach you all you need to know about regular expressions \u2013 and, as with every new skill, adeptness will only come with practice, through encountering real-world problems and surmounting them. Practically every string pattern can be matched using regular expressions.<br \/>\nAn example: imagine you\u2019re looking for a drink. However, you want to return every  complexion of the word: drink, drank, drunk, drinking and drunkard.  A regular expression is perfect for this, using the rules above:<\/p>\n<pre>SELECT col1\r\nFROM table1\r\nWHERE REGEXP_LIKE (col1,  \u2018dr(ink(ing)?|(unk)|(ank)|(unkard))\u2019,\u2019i\u2019);<\/pre>\n<p>The simple pattern above uses the following rules: it uses numerous pairs of parentheses to group characters, the \u2018?\u2019 to indicate that the grouping immediately before it \u2013 \u2018(ing)\u2019 \u2013 may appear 0 or 1 times, and the \u2018|\u2019 to indicate that the pattern should match any of the groupings.  And the match parameter \u2018i\u2019 indicates that we want our pattern to be case insensitive.<\/p>\n<p>However, before you run off and re-engineer your whole database and every application you have ever built, peppering them with regular expressions, it is worth mentioning that they are not the solution to every problem and they do have their drawbacks. <\/p>\n<p>Firstly, there is their complexity. Believe me, no one is completely fluent in regular expressions; they will always be harder to read than simpler matching functions. So if LIKE, INSTR, SUBSTR and REPLACE meet your needs, you should consider sticking with them. <\/p>\n<p>Also, be aware that like the functions they replace, Oracle\u2019s regular expression functions can result in full-table scans that lead to horribly inefficient queries. Ordinary indexes cannot handle their awesomeness. Fortunately, it is possible to use regular expressions in function-based indexes, speeding up queries that use regular expressions \u2013 and even those that do not \u2013 considerably.<\/p>\n<p>Oh, and by the way, the following regular expression will match those email addresses for you. <\/p>\n<pre>REGEXP_LIKE (email_address, '[a-zA-Z0-9._%-]+@[a-zA-Z0-9._%-]+\\.[a-zA-Z]{2,4}');<\/pre>\n<p>And, how rude of me, I don\u2019t think I properly introduced myself. My name is:<\/p>\n<pre>REGEXP_LIKE(first_name, 'Dav(e|id)');<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Regular expressions \u2013 RegEx or RegExp to their friends \u2013 were fashionably late in coming to the Oracle party. The seeds of what we know today as regular expressions were formulated in the period immediately following the Second World War in fields as diverse as formal language theory and neurophysiology; but it wasn\u2019t until the PL\/SQL Web toolkit arrived for&hellip;<\/p>\n","protected":false},"author":221907,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143533],"tags":[],"coauthors":[],"class_list":["post-73382","post","type-post","status-publish","format-standard","hentry","category-oracle-databases"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/73382","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/221907"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=73382"}],"version-history":[{"count":1,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/73382\/revisions"}],"predecessor-version":[{"id":91813,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/73382\/revisions\/91813"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=73382"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=73382"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=73382"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=73382"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}