SQL Server 2008: New languages and older operating systems

SQL Server 2008 introduces a few new languages like Welsh, Tibetan and Norwegian. While these new languages map to the codepages in Windows Server 2008, in earlier operating systems (like XP) this is not so, and simple queries like “select * from mytable” can break with the SQL error 4078.

The problem occurs if you use a collation that uses one of the new languages in a char, varchar or text column. In queries that return such a non-unicode column SQL Server relies on the client to use the appropriate codepage when displaying or converting the returned characters to Unicode. If the codepage does not exist on the client side, the query will fail. For example, in the following we declare a table variable with a varchar column that uses the new FRISIAN_100_CS_AS collation.

DECLARE @t TABLE (

      id INT PRIMARY KEY IDENTITY(1, 1),

      data VARCHAR(100) COLLATE FRISIAN _100_CS_AS)

                                                              

INSERT  INTO @t VALUES  ( N’aaa’ )

SELECT data FROM @t

The above statement will work fine on the computer on which SQL Server 2008 is installed. If you execute it in Query Analyzer on a computer with Windows XP, then you will get the following error:

Server: Msg 4078, Level 16, State 1, Line 13

The statement failed because column ‘data’ (ID=0) uses collation Welsh_100_CS_AS, which is not recognized by older client drivers. Try upgrading the client operating system or applying a service update to the database client software, or use a different collation. See SQL Server Books Online for more information on changing collations.

(If it was run from a .Net application, the SqlException exception message is: “The Collation specified by SQL Server is not supported.”)

(I use Query Analyzer only because with CTP6 of SQL Server 2008 you cannot use Management Studio 2005. However, you can use osql, sqlcmd, Query Analyzer, your own applications, etc.)

The same problem occurs with .Net applications. Whether an application that executes the above query will actually work will depend on what operating system it is executed on.

A reasonably simple workaround is not to use select *, and when querying columns that are based on the char, varchar, or text, cast them to nchar, nvarchar or ntext. In this case the data will be returned in Unicode (actually UCS-2), and the query will work on pre Windows Server 2008 systems. In a .Net application you would also need to collate the result to a known collation, like latin1_general.

The new languages in SQL Server 2008 that seem to be affected are:

Chinese_Traditional_Stroke_Count, Chinese_Traditional_Bopomofo, Chinese_Simplified_Pinyin, Chinese_Simplified_Stroke_Order, Chinese_Traditional_Pinyin, Chinese_Traditional_Stroke_Order, Danish_Greenlandic, Japanese_XJIS, Japanese_Bushu_Kakusu, Norwegian, Romansh, Serbian_Latin, Serbian_Cyrillic, Bosnian_Latin, Bosnian_Cyrillic, Urdu, Persian, Mapudungan, Upper_Sorbian, Bashkir, Maltese, Sami_Norway, Sami_Sweden_Finland, Turkmen, Bengali, Assamese, Pashto, Tibetan, Welsh, Khmer, Lao, Frisian, Tamazight, Nepali, Azeri_Latin, Azeri_Cyrillic