{"id":77243,"date":"2018-02-12T15:35:44","date_gmt":"2018-02-12T15:35:44","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=77243"},"modified":"2021-07-29T19:44:16","modified_gmt":"2021-07-29T19:44:16","slug":"url-matching-c","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/development\/dotnet-development\/url-matching-c\/","title":{"rendered":"URL Matching in C#"},"content":{"rendered":"<p>Ah, URLs. The Uniform Resource Locator (URL) is ubiquitous in enterprise software. It doesn\u2019t matter whether it\u2019s a desktop, a web application, or a backend service, URLs have the unique ability to catch you off guard when you least expect it.<\/p>\n<p>One can lean on the ASP.NET framework for URL routing which provides its own way of matching URLs to action methods. But alas, as is often the case, a full-featured framework might not get you where you need to be. URL routing has a powerful way to invoke action methods inside MVC controllers but doesn\u2019t help with URL matching.<\/p>\n<p>If you\u2019ve worked with URLs before and found it hard, then you\u2019re doing it right. If it was easy, then this write-up is for you. There are many traps hidden inside these URLs. A URL appears harmless on the surface but, when you look closer, it can be perilous.<\/p>\n<p>In this take, I\u2019d like to give you a deep dive into working with URLs in plain C#. In IT, there may come a time when you have this URL from a config and must match it with another one. The URL can come from the web request that you need to intercept through middleware with a match. 
I\u2019ll stick to real examples I\u2019ve come across in my programming adventures in the enterprise.<\/p>\n<p>To start, let\u2019s define what the internals of a URL look like:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"462\" height=\"95\" class=\"wp-image-77244\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/02\/word-image-11.png\" \/><\/p>\n<p>For our purposes, we care about the scheme, authority, path, query, and fragment. You can think of the scheme as the protocol, e.g., HTTP or HTTPS. The authority is the root or domain, for example, mycompany.com. The path, query, and fragment make up the rest of the URL. The URL spec defines each segment in this specific order. For example, the scheme always comes before the authority. The path comes after the scheme and authority. The query and fragment come after the path if there is one in the URL.<\/p>\n<p>With the textbook definition in place, it\u2019s time for string matching URLs. I\u2019ll stick to the terminology from the figure so it is crystal clear for you.<\/p>\n<h2>String URL Match<\/h2>\n<p>Given a URL, it is somewhat reasonable to do a string comparison:<\/p>\n<pre class=\"lang:c# theme:vs2012\">  const string orig = \"https:\/\/example.com\/abc\";\r\n  const string dest = \"https:\/\/example.com\/abc\";\r\n  var isEqual = orig.Equals(dest);\r\n  Assert.True(isEqual);<\/pre>\n<p>All code samples use <strong>xUnit<\/strong> assertions to prove out matching concepts. Note the <strong>String.Equals<\/strong> comparison to get a string match with a URL.<\/p>\n<p>One thing to look out for is that the scheme and host of a URL are case-insensitive in the spec. This means <strong>https:\/\/example.com<\/strong> matches <strong>HTTPS:\/\/example.com<\/strong>. 
A na\u00efve string comparison with an equals method does not account for this.<\/p>\n<p>To make this more robust, add case-insensitivity to the comparison:<\/p>\n<pre class=\"lang:c# theme:vs2012\">  const string orig = \"HTTPS:\/\/example.com\/Abc\";\r\n  const string dest = \"https:\/\/example.com\/abc\";\r\n  var isEqual = orig.Equals(dest, StringComparison.OrdinalIgnoreCase);\r\n  Assert.True(isEqual);<\/pre>\n<p>The <strong>StringComparison.OrdinalIgnoreCase<\/strong> enumeration value compares the strings character by character by ordinal value while ignoring case. This works well for URLs which are made up of ASCII characters. Note the extra parameter passed to the overloaded <strong>String.Equals<\/strong> method. Since robust matching depends on it, expect this overload to appear often in your code. Keep in mind that this comparison ignores case across the whole URL, path included; many servers treat paths as case-sensitive, so use it only when that assumption holds for your URLs.<\/p>\n<p>Another interesting aspect is that the path of the URL can end with a forward slash. For example, <strong>\/abc\/<\/strong> also matches <strong>\/abc<\/strong> without a trailing forward slash. The config or app providing the URL can go either way and you must account for this.<\/p>\n<p>Using string manipulation, we can trim the ends then do a match:<\/p>\n<pre class=\"lang:c# theme:vs2012\">  const string orig = \"https:\/\/example.com\/abc\/\";\r\n  const string dest = \"https:\/\/example.com\/abc\";\r\n  var trimmedOrig = orig.TrimEnd('\/');\r\n  var trimmedDest = dest.TrimEnd('\/');\r\n  var isEqual = trimmedOrig.Equals(trimmedDest, StringComparison.OrdinalIgnoreCase);\r\n  Assert.True(isEqual);<\/pre>\n<p>This accounts for many mishaps with string comparisons. You will start to notice you need a good amount of trimming around URLs. When engaging in this line of work, it is best to stay alert and practice defensive coding. It\u2019s difficult to imagine the many radical new ways folks can type in a simple URL. 
Human beings are not like computers and may find innovative ways to muck up URLs.<\/p>\n<p>C# uses the .NET framework behind the scenes and provides a list of methods that can aid with URL matching. The <strong>System.String<\/strong> type, for example, has many methods available. It\u2019s like having a full array of tools at your fingertips, so it is time to examine which methods are most useful.<\/p>\n<p>Let\u2019s say we want to match the scheme to make sure it\u2019s HTTPS:<\/p>\n<pre class=\"lang:c# theme:vs2012\">  const string orig = \"https:\/\/example.com\/abc\/\";\r\n  const string dest = \"https:\/\/\";\r\n  var isEqual = orig.StartsWith(dest, StringComparison.OrdinalIgnoreCase);\r\n  Assert.True(isEqual);<\/pre>\n<p>Note that it is safe to assume the scheme comes first according to the spec. The <strong>String.StartsWith<\/strong> method has a sibling method that can match the end of the string. This is useful for doing a match on the path of the URL. This assumes your URLs always end with the path, with no query or fragment after it.<\/p>\n<p>So, for example:<\/p>\n<pre class=\"lang:c# theme:vs2012\">  const string orig = \"https:\/\/example.com\/abc\/\";\r\n  const string dest = \"\/abc\";\r\n  var trimmedOrig = orig.TrimEnd('\/');\r\n  var isEqual = trimmedOrig.EndsWith(dest, StringComparison.OrdinalIgnoreCase);\r\n  Assert.True(isEqual);<\/pre>\n<p>One can be clever with string matching in C#. You have an arsenal of string comparison methods at your disposal, so you can be as effective as possible. Let\u2019s say, for example, I want to know if a given URL even has a query. The spec defines this as <strong>?key=value<\/strong>. Note that the question mark is a unique character. 
This question mark character is reserved in the URL spec as the delimiter that starts the query.<\/p>\n<p>So, for example:<\/p>\n<pre class=\"lang:c# theme:vs2012\">  const string orig = \"https:\/\/example.com\/abc?key=value#fragid\";\r\n  var hasQueryString = orig.Contains(\"?\");\r\n  Assert.True(hasQueryString);<\/pre>\n<p>If you can make safe assumptions about your URLs, like in the example above, feel free to exploit these assumptions to your advantage with string comparison methods. All you need is to know which method to use, plus a little imagination.<\/p>\n<h2>LINQ URL Match<\/h2>\n<p>With URLs coming from a config or any data source, what you might get back is a list. With the .NET framework, you can use LINQ to iterate through URL lists and then do a match. Imagine there is a list of URLs that must match a target URL. All I want to know is whether the URL exists within the list.<\/p>\n<p>Say, for example:<\/p>\n<pre class=\"lang:c# theme:vs2012\">  var mockUrls = new []\r\n  {\r\n    \"https:\/\/example.com\/abc\",\r\n    \"https:\/\/target.com\/abc\/\"\r\n  };\r\n  const string url = \"https:\/\/target.com\/abc\";\r\n  var hasUrl = mockUrls.Any(\r\n    u =&gt; u.TrimEnd('\/').Equals(url, StringComparison.OrdinalIgnoreCase));\r\n  Assert.True(hasUrl);<\/pre>\n<p>The <strong>IEnumerable.Any<\/strong> method allows you to match a list with a URL. Note the use of a lambda expression to further refine the match. This becomes essential when you need to trim and ignore case sensitivity. In the end, this lambda expression returns true or false, which comes from the string equality comparison. If any item in the list returns true then the entire method returns true.<\/p>\n<p>For example, let\u2019s say you have a list of paths that belong to the URL that needs a match. What you need is to combine the paths into the whole URL, then do a match. The string type has a <strong>String.Join<\/strong> method you can use to do the job. 
This join method takes in a list you can further refine using LINQ.<\/p>\n<p>So, for example:<\/p>\n<pre class=\"lang:c# theme:vs2012\">  var paths = new []\r\n  {\r\n    \"\/abc\",\r\n    \"\/123\/\"\r\n  };\r\n  const string url = \"https:\/\/example.com\/\";\r\n  var combinedUrl = url.TrimEnd('\/') + \"\/\" + string.Join(\r\n    \"\/\", paths.Where(p =&gt; !string.IsNullOrWhiteSpace(p)).Select(p =&gt; p.Trim('\/')));\r\n  Assert.Equal(\"https:\/\/example.com\/abc\/123\", combinedUrl);<\/pre>\n<p>I am purposely being naughty with the list of paths. One path has a trailing slash while the other does not. The goal here is to illustrate what kind of assumptions you can and cannot make with URLs. The way you write URL matching can have a life of its own depending on the assumptions.<\/p>\n<p>Note the <strong>IEnumerable.Where<\/strong> method to filter out empty paths. LINQ has many more methods available you can use for URL matching. What I find is that I tend to use both <strong>IEnumerable.Any<\/strong> and <strong>IEnumerable.Select<\/strong> often. These extension methods work on any type that implements the <strong>IEnumerable<\/strong> interface in C#. This means they support a wide array of list types, including an array of integers.<\/p>\n<p>LINQ gets enabled on a list when you add <strong>System.Linq<\/strong> to the using statements. Inside Visual Studio, these extension methods don\u2019t show up in IntelliSense until you do so. Feel free to explore this namespace if you need more ideas when working with URLs.<\/p>\n<p>What you will find in .NET is that each type may have methods that come with it. The string type, for example, has a list of methods through the <strong>System<\/strong> namespace. So far, you can see how these methods are useful to you. It is like having a toolbelt with a whole array of functionality available.<\/p>\n<h2>URI Match<\/h2>\n<p>The .NET framework has a type to encapsulate URLs if necessary. 
There is a <strong>System.Uri<\/strong> type that can parse any valid URL. The string and LINQ methods I have explained so far do not parse but only provide URL matching. The <strong>Uri<\/strong> type has a list of methods and properties you can use to break a URL apart for further analysis.<\/p>\n<p>Let\u2019s say you have a URL with a scheme, authority, path, query, and fragment. Attempting to match against each piece with strings alone requires good Regex skills. The good news is that a <strong>Uri<\/strong> type can do matches in an object-oriented fashion. This OOP (object-oriented programming) approach can help keep the code nice and tidy.<\/p>\n<p>One gotcha is that the <strong>Query<\/strong> property returns a string type, not a dictionary object. This will require that you parse the string into key-value pairs. When you are working with the query inside a URL, you often need it as a dictionary to do lookups.<\/p>\n<p>So, for example:<\/p>\n<pre class=\"lang:c# theme:vs2012\">  \r\nvar uri = new Uri(\"https:\/\/example.com\/abc\/123?key1=value&amp;key2#fragid\");\r\n\r\nvar scheme = uri.GetLeftPart(UriPartial.Scheme);\r\nvar path = uri.GetLeftPart(UriPartial.Path);\r\nvar fragment = uri.Fragment.TrimStart('#');\r\n\r\nvar splitQuery = uri.Query.TrimStart('?').Split('&amp;');\r\nvar queryString = new Dictionary&lt;string, string&gt;();\r\n\r\nforeach (var item in splitQuery)\r\n{\r\n  var splitItem = item.Split('=');\r\n  var itemKey = splitItem[0];\r\n  var itemValue = splitItem.Length &gt; 1 ? splitItem[1] : string.Empty;\r\n\r\n  if (!queryString.ContainsKey(itemKey))\r\n  {\r\n    queryString.Add(itemKey, itemValue);\r\n  }\r\n}\r\n\r\nAssert.Equal(\"https:\/\/\", scheme);\r\nAssert.Equal(\"https:\/\/example.com\/abc\/123\", path);\r\nAssert.Equal(\"fragid\", fragment);\r\nAssert.Equal(\"value\", queryString[\"key1\"]);\r\n<\/pre>\n<p><span style=\"color: #000000;\">You can get the scheme and path through the <strong>Uri.GetLeftPart<\/strong> method. 
Note the use of the <strong>System.UriPartial<\/strong> enumeration to get each segment of the URL. The Fragment property has the fragment of the URL.<\/span><\/p>\n<p><span style=\"color: #000000;\">For the Query, note that<strong> ?key1=value&amp;key2<\/strong> is a valid query string because the spec is lenient. The <strong>String.Split<\/strong> method gives me back an array I can turn into a dictionary object. For duplicate keys, I use a <strong>Dictionary.ContainsKey<\/strong> first then a <strong>Dictionary.Add<\/strong> if it\u2019s not in the dictionary. This is a defensive way of dealing with potential typos from a bad config, for example. For those in .NET Core 2.0+, there is a shiny new <strong>Dictionary.TryAdd<\/strong> that has <\/span><a href=\"https:\/\/github.com\/dotnet\/corefx\/issues\/1942\">this same logic as part of the method<\/a><span style=\"color: #000000;\">. Each <strong>itemValue<\/strong> can come from the Query or get a default value of <strong>string.Empty<\/strong>. Keys without a value in the Query are still plausible. The asserts prove that the code above works as expected.<\/span><\/p>\n<p>One gotcha comes from the scheme segment. Note that it returns the colon and forward slashes as part of the scheme itself. If the goal is to match it against HTTPS, for example, it might be wise to match it with a <strong>String.StartsWith<\/strong> and ignore casing.<\/p>\n<p>This covers just about everything you will encounter when matching URLs with a <strong>Uri<\/strong> type. I hope you can see it is far from trivial. One nice advantage is you get the <strong>Uri<\/strong> type through the <strong>System<\/strong> namespace. This namespace often appears inside many using statements in C#.<\/p>\n<h2>Conclusion<\/h2>\n<p>The .NET framework comes with a set of namespaces useful for working with URLs. So far, you have seen the <strong>System<\/strong> and <strong>System.Linq<\/strong> namespaces at work. 
In C#, there are two types of primary concern, <strong>System.String<\/strong> and <strong>System.Uri<\/strong>. These two types have many methods which are useful to you. For the <strong>System.String<\/strong> type, keep an eye on <strong>String.StartsWith<\/strong> and <strong>String.Equals<\/strong> with case insensitivity. For working with a list of URLs, use any list type that implements the <strong>IEnumerable<\/strong> interface. The <strong>System.Linq<\/strong> namespace will enable a set of extension methods for your favorite type. <span style=\"color: #000000;\">To parse the Query into a dictionary type, use the <strong>System.Collections.Generic<\/strong> namespace.<\/span><\/p>\n<p>All these namespaces have been available in .NET since 3.5 and are part of the .NET Standard library. This means this code should work with many implementations of the .NET framework, including .NET Core. Microsoft is pushing for a standards-based approach and these namespaces are part of it. It is nice to have working code that has a commitment and supports a standard.<\/p>\n<p>Because we are talking about the .NET framework and not only niche features in C#, these same namespaces and object-oriented types are available in PowerShell if you have a language version that supports .NET version 3.5 at a minimum. This means you can go all the way back to PowerShell 3.0.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Comparing URLs in C# code is a common task and seems simple. Camilo Reyes shows us that there are many pitfalls to avoid since people can come up with several ways to type the same URL. 
He then demonstrates how to solve several URL comparison problems.&hellip;<\/p>\n","protected":false},"author":274017,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143538],"tags":[95509],"coauthors":[41241],"class_list":["post-77243","post","type-post","status-publish","format-standard","hentry","category-dotnet-development","tag-standardize"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/77243","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/274017"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=77243"}],"version-history":[{"count":8,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/77243\/revisions"}],"predecessor-version":[{"id":78171,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/77243\/revisions\/78171"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=77243"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=77243"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=77243"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=77243"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}