{"id":82012,"date":"2007-05-20T18:16:59","date_gmt":"2007-05-20T18:16:59","guid":{"rendered":"https:\/\/www.webstaging.red-gate.com\/simple-talk\/?p=73149"},"modified":"2018-12-12T13:42:58","modified_gmt":"2018-12-12T13:42:58","slug":"sorting-and-case-sensitive-collations","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/blogs\/sorting-and-case-sensitive-collations\/","title":{"rendered":"Sorting and case sensitive collations"},"content":{"rendered":"<\/p>\n<p><em>This is an addition to my book, on page 255, to the end of the section on collation (before the statement that I won&#8217;t delve any deeper into collations, of course :)), I want to add the following bit of information about collations and how they affect sorting and searching.&nbsp; <\/em> <\/p>\n<p><em>You can download the entire addendum with source code from the: <br \/><\/em><a title=\"Pro SQL Server 2005 Database Design and Optimization Homepage\" href=\"http:\/\/drsql.org\/ProSQLServerDatabaseDesign.aspx\"><em>Pro SQL Server 2005 Database Design and Optimization Homepage<\/em><\/a>  <\/p>\n<p><strong>Sort and Searching<\/strong>  <\/p>\n<p>One of the more confusing parts of collation is how they affect the ordering of data&nbsp;when sorted and how that can affect other operations like the LIKE operator.&nbsp; When it comes to case insensitive collations, it is pretty clear, data is sorted as if &#8216;A&#8217; and &#8216;a&#8217;,&nbsp;are the same character.&nbsp; The other common situation is accent insensitivity where &#8216;a&#8217; and &#8216;\u1ea5&#8217; appear as the same character.&nbsp; When you are using a case and accent insensitive collation there will be no guarantee when sorting data that either of these characters would come first in the list.&nbsp; The great part about this is that when you search for where like &#8216;%A%&#8217;, you know that you will get back: aardvark, Abel, Pen\u1ea5 with no worries.&nbsp;  <\/p>\n<p>In some situations, this is not desirable to have this be the case, and you set up a column, table, or database to be case sensitive and accent sensitive.&nbsp; This is where sometimes you can get confused when using between, like, or other range type queries on characters.  <\/p>\n<p>As an example, let me create the following table and seed it with some characters:  <\/p>\n<p>create table dbo.TestSorting<br \/>(<br \/>&nbsp;&nbsp;&nbsp;&nbsp; value nvarchar(1) collate Latin1_General_CI_AI <br \/>) <br \/>insert into dbo.TestSorting<br \/>select &#8216;A&#8217;<br \/>union all<br \/>select &#8216;a&#8217;<br \/>union all<br \/>select nchar(256) &#8211;\u0100<br \/>union all <br \/>select &#8216;b&#8217;<br \/>union all<br \/>select &#8216;B&#8217;<\/p>\n<p>Doing a normal case insensitive search using the base collation:<\/p>\n<p>select value<br \/>from dbo.TestSorting<br \/>where value like &#8216;[A-Z]%&#8217; <\/p>\n<p>All rows are returned, as expected:  <\/p>\n<p>value<br \/>&#8212;&#8211;<br \/>A<br \/>a<br \/>\u0100<br \/>b<br \/>B  <\/p>\n<p>The problem comes in when you want to do a case sensitive search.&nbsp; You choose a case sensitive collation, and either use it in the WHERE clause or in your table declaration, and:  <\/p>\n<p>select value<br \/>from dbo.TestSorting<br \/>where value like &#8216;[A-Z]%&#8217; collate Latin1_General_CS_AI  <\/p>\n<p>This returns what seems a confusing set:  <\/p>\n<p>value<br \/>&#8212;&#8211;<br \/>A<br \/>\u0100<br \/>b<br \/>B  <\/p>\n<p>Everyone who does this the first time thinks &#8220;Why?&#8221; And I did too (in fact it was why I wrote this section of the book!)&nbsp; The key is to kook at how the values sort in this collation:  <\/p>\n<p>select value<br \/>from dbo.TestSorting<br \/>order by value collate Latin1_General_CS_AI  <\/p>\n<p>value<br \/>&#8212;&#8211;<br \/>a<br \/>\u0100<br \/>A<br \/>b<br \/>B  <\/p>\n<p>Case sensitive sorts things aAbBcC, not, as most of us&nbsp;programmers expect intuitively: ABCabc. That is actually how a binary collation works, since it sorts based on the ASCII or Unicode value of the character.&nbsp; Using a binary collation would work, but then you lose the ability to include the special characters in your search which is given to you by using accent insensitivity.  <\/p>\n<p>Instead, you can simply use a LIKE such as this:  <\/p>\n<p>select value<br \/>from dbo.TestSorting<br \/>where value like &#8216;[ABCDEFGHIJKLMNOPQRSTUVWXYZ]%&#8217; collate Latin1_General_CS_AI  <\/p>\n<p>This returns only the capital letters including the accented capital letter I included.  <\/p>\n<p>value<br \/>&#8212;&#8211;<br \/>A<br \/>\u0100<br \/>B  <\/p>\n<p>It is a good idea to take a look at the sorting of data when you choose a collation to make sure it is clear to you how data will be sorted.  <\/p>\n<p>First in the data you have in your table, like this, if you want to see how it will sort in a binary collation:  <\/p>\n<p>select value<br \/>from dbo.TestSorting<br \/>order by value collate Latin1_General_bin  <\/p>\n<p>value<br \/>&#8212;&#8211;<br \/>A<br \/>B<br \/>a<br \/>b<br \/>\u0100<\/p>\n<p>And then I have a query I use for this purpose that I can use to generate the characters I am interested and see how they sort <\/p>\n<p>;with digits (i) as( select 1 as i union all select 2 as i union all select 3 union all<br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; select 4 union all select 5 union all select 6 union all select 7 union all <br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; select 8 union all select 9 union all select 0)<br \/>,sequence (i) as (<br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; SELECT D1.i + (10*D2.i) + (100*D3.i) + (1000*D4.i) + (10000*D5.i) <br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; FROM digits AS D1 CROSS JOIN digits AS D2 CROSS JOIN digits AS D3&nbsp;<br \/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; CROSS JOIN digits AS D4 CROSS JOIN digits as D5<br \/>)<br \/>select i, nchar(i) as character<br \/>from sequence<br \/>where i between 48 and 122 &#8211;vary to include any characters in the character set of choice <br \/>order by nchar(i) collate Latin1_General_bin  <\/p>\n<p>Crossposted to <a href=\"http:\/\/drsql.spaces.live.com\/\"><a href=\"http:\/\/drsql.spaces.live.com\">http:\/\/drsql.spaces.live.com<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is an addition to my book, on page 255, to the end of the section on collation (before the statement that I won&#8217;t delve any deeper into collations, of course :)), I want to add the following bit of information about collations and how they affect sorting and searching.&nbsp; You can download the entire&#8230;&hellip;<\/p>\n","protected":false},"author":56085,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2],"tags":[],"coauthors":[19684],"class_list":["post-82012","post","type-post","status-publish","format-standard","hentry","category-blogs"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/82012","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/56085"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=82012"}],"version-history":[{"count":1,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/82012\/revisions"}],"predecessor-version":[{"id":82388,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/82012\/revisions\/82388"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=82012"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=82012"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=82012"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=82012"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}