{"id":105570,"date":"2025-03-17T16:51:26","date_gmt":"2025-03-17T16:51:26","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=105570"},"modified":"2025-03-17T16:51:28","modified_gmt":"2025-03-17T16:51:28","slug":"mysql-vs-postgresql-character-sets-and-collations","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/mysql\/mysql-vs-postgresql-character-sets-and-collations\/","title":{"rendered":"MySQL vs. PostgreSQL: Character Sets and Collations"},"content":{"rendered":"\n<p><strong>This is part of a series of posts from Aisha Bukar comparing MySQL and PostgreSQL. You can see the <a href=\"https:\/\/www.red-gate.com\/simple-talk\/collections\/aisha-bukar-series-comparing-mysql-and-postgresql\/\">entire series here<\/a>.<\/strong><\/p>\n\n\n\n\n<p>When working with strings in a database system, you need to ensure that the data is stored, sorted, and compared accurately. That\u2019s where character sets and collations come in. Character sets determine how text is saved while collations define how the database sorts and compares text.<\/p>\n\n\n\n<p>For example, if you\u2019re storing text in different languages (like English, Chinese, or Arabic), choosing the right character set ensures the text is saved properly. Picking the right collation ensures sorting is correct\u2014like ordering names alphabetically while ignoring case differences or accents.<\/p>\n\n\n\n<p>Usually, the default that is chosen at installation is all that is worried about, setting the values to whatever the data for a given region where you are storing data. However, it is important to understand what your initial choices mean, and how to handle any special cases. If you don\u2019t set your values correctly, your database might store or sort text incorrectly, leading to errors, weird behavior, or frustrated users.<\/p>\n\n\n\n<p>&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"574\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/03\/image-1-1024x574.png\" alt=\"\" class=\"wp-image-105610\" style=\"width:445px;height:auto\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/03\/image-1-1024x574.png 1024w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/03\/image-1-300x168.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/03\/image-1-768x431.png 768w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/03\/image-1.png 1355w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-what-are-character-sets-and-collations\"><a id=\"post-105570-_t1g9n34grsws\"><\/a>What are Character Sets and Collations?<\/h2>\n\n\n\n<p><strong>Character Sets<\/strong>: A character set or <code>CHARSET<\/code> defines how a string is stored in a database. These characters can be symbols, letters, or numbers.<\/p>\n\n\n\n<p>Imagine you have a database for storing users&#8217; names from different countries, and you want to ensure that names in any language (English, Chinese, Russian, etc.) can be saved correctly. This can be done using a character set to display these characters correctly.<\/p>\n\n\n\n<p>MySQL and PostgreSQL both support multiple character sets but there are of course some key differences to explore in both databases, so stick around!<\/p>\n\n\n\n<p><strong>Collations<\/strong>: <a id=\"post-105570-_slq5pv4lehgn\"><\/a>A collation is a set of rules for comparing and sorting text in a specific character set. Collations help ensure your database sorts text correctly based on your needs.<\/p>\n\n\n\n<p>For example: Does &#8220;Apple&#8221; come before &#8220;Banana&#8221;? Or is &#8220;\u00e9&#8221; treated the same as &#8220;e&#8221;? In some collations, &#8220;\u00e9&#8221; and &#8220;e&#8221; might be considered equal, even though they look different. Some collations can also be case insensitive where &#8220;Apple&#8221; = &#8220;apple&#8221; as well as \u201cApple\u201d = \u201caPPl\u00e9\u201d.<\/p>\n\n\n\n<p>Every character set has at least one collation, and some have multiple options depending on the use case. Both MySQL and PostgreSQL support various collations, but they handle them in slightly different ways. Let\u2019s dive into how this works!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-character-sets\">Character Sets<\/h2>\n\n\n\n<p>In this section I will compare how character sets are implemented in PostgreSQL and MySQL.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-character-sets-in-mysql\"><a id=\"post-105570-_1nh5i8uazll5\"><\/a>Character Sets in MySQL<\/h3>\n\n\n\n<p>Character sets (or <code>CHARSET<\/code>) define how sets of characters can be stored in a database.<\/p>\n\n\n\n<p>MySQL uses a default character set for storing and handling text. Right now, that default is utf8mb4, which is great because it supports all kinds of characters, including emojis and symbols from many languages. Think of utf8mb4 as the &#8220;universal&#8221; option for storing text.<\/p>\n\n\n\n<p>If you want to check the available character sets in your MySQL schema, you can do this by running the following command:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SHOW CHARACTER SET;<\/pre>\n\n\n\n<p>This returns all the available character set in your MySQL schema<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"608\" height=\"393\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-1-2.png\" alt=\"\" class=\"wp-image-105596\" style=\"box-shadow:var(--wp--preset--shadow--deep)\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-1-2.png 608w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-1-2-300x194.png 300w\" sizes=\"auto, (max-width: 608px) 100vw, 608px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Here is an example of a character set being used in MySQL:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">CREATE TABLE users (\n    id INT PRIMARY KEY,\n    name VARCHAR(255) CHARACTER SET utf8mb4\n);<\/pre>\n\n\n\n<p>The <code>name<\/code> column can store text in any language using the <code>utf8mb4<\/code> character set to support all Unicode characters, including emojis.<\/p>\n\n\n\n<p>Now, let\u2019s insert a statement into the users table using characters from English, French, Arabic, and Chinese language.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">INSERT INTO users (id, name) VALUES\n(1, 'My name is Aisha'),\n(2, '\u0627\u0633\u0645\u064a \u0639\u0627\u0626\u0634\u0629'),\n(3, 'Je m\\'appelle Aisha'),\n(4, '\u6211\u7684\u540d\u5b57\u662f\u827e\u838e');<\/pre>\n\n\n\n<p>Let&#8217;s take a look at the data we just inserted:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT * FROM users;<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"321\" height=\"164\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-2-2.png\" alt=\"\" class=\"wp-image-105597\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-2-2.png 321w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-2-2-300x153.png 300w\" sizes=\"auto, (max-width: 321px) 100vw, 321px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1920\" height=\"1080\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-3-2.jpeg\" alt=\"\" class=\"wp-image-105598\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-3-2.jpeg 1920w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-3-2-300x169.jpeg 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-3-2-1024x576.jpeg 1024w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-3-2-768x432.jpeg 768w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-3-2-1536x864.jpeg 1536w\" sizes=\"auto, (max-width: 1920px) 100vw, 1920px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>The above image is the output of the query we just ran. You might be wondering how the utf8mb4 character set can store so many different characters. It works by using something called the Basic Multilingual Plane (BMP). <a id=\"post-105570-_30j0zll\"><\/a>The <a href=\"https:\/\/en.wikipedia.org\/wiki\/Plane_(Unicode)\">Basic Multilingual Plane (BMP) <\/a> is the main group of Unicode characters, including letters, symbols, and punctuation used in most modern languages. It covers the basic characters you see in everyday text.<\/p>\n\n\n\n<p>With <a href=\"https:\/\/dev.mysql.com\/doc\/refman\/8.4\/en\/charset-unicode-utf8mb4.html\">utf8mb4<\/a>, these BMP characters usually take 1-3 bytes to store. However, utf8mb4 can also handle characters outside this group, like emojis and rare symbols, by using up to 4 bytes. This makes it capable of storing all Unicode characters including various emojis.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1280\" height=\"783\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/basic-multilingual-plane-bmp-2.png\" alt=\"Basic Multilingual Plane (BMP)\" class=\"wp-image-105599\" style=\"box-shadow:var(--wp--preset--shadow--deep)\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/basic-multilingual-plane-bmp-2.png 1280w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/basic-multilingual-plane-bmp-2-300x184.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/basic-multilingual-plane-bmp-2-1024x626.png 1024w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/basic-multilingual-plane-bmp-2-768x470.png 768w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p><em>Roadmap to Unicode BMP multilingual<\/em> by <em>Drmccreedy<\/em>, used under <em>CC BY-SA 4.0 License<\/em> \/cropped from original. Available at:<a href=\"https:\/\/en.wikipedia.org\/wiki\/Plane_(Unicode)#\/media\/File:Roadmap_to_Unicode_BMP_multilingual.svg\"> https:\/\/en.wikipedia.org\/wiki\/Plane_(Unicode)#\/media\/File:Roadmap_to_Unicode_BMP_multilingual.svg<\/a><\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-character-set-repertoire\"><a id=\"post-105570-_63ne7hs6s261\"><\/a>Character Set repertoire?<\/h4>\n\n\n\n<p>A character set repertoire in MySQL simply refers to the set of characters that a particular character set can store and handle. Think of it as the &#8220;alphabet&#8221; for a given character set. Each character set has a specific range of letters, symbols, and other characters it supports. Here are some common MySQL character sets and their repertoires:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li><strong>ASCII<\/strong>: Includes basic English letters (A-Z, a-z), numbers (0-9), and a few symbols like @, #, and %.<\/li>\n\n\n\n<li><strong>latin1<\/strong>: Adds support for Western European characters, like \u00e9, \u00e7, and \u00fc.<\/li>\n\n\n\n<li><strong>utf8<\/strong>: Supports many languages, including most European and Asian scripts, but does not cover all Unicode characters (e.g., some emojis).<\/li>\n\n\n\n<li><strong>utf8mb4<\/strong>: Fully supports all Unicode characters, including emojis, special symbols, and scripts from around the world.<\/li>\n<\/ul>\n<\/div>\n\n\n<p><a id=\"post-105570-_2y7vkuu5nsq0\"><\/a> The repertoire of a character set determines what text you can store. If you choose ASCII, you can\u2019t store characters like \u00f1 or \u00fc. If you choose utf8, you can store many international characters but might run into trouble with some emoji symbols, utf8mb4 is the safest choice if you need to handle a wide variety of text, including emojis.<\/p>\n\n\n\n<p>The character set you choose significantly affects your database in two ways. First, it impacts <strong>data compatibility<\/strong>. For example, a limited character set like ASCII only supports basic English characters. If you try to store characters outside this range, such as accents or emojis, you may encounter errors. For instance, storing the word &#8220;caf\u00e9&#8221; in ASCII might fail because the &#8220;\u00e9&#8221; character isn&#8217;t supported.<\/p>\n\n\n\n<p>Second, it influences <strong>storage space<\/strong>. Character sets with larger repertoires, such as utf8mb4, can store all Unicode characters, including emojis, but they use more storage. For example, utf8mb4 can use up to 4 bytes per character, while latin1 uses just 1 byte per character for Western European languages. Choosing the right character set means balancing your storage requirements with the range of characters your application needs to handle.<\/p>\n\n\n\n<p>MySQL supports several character sets, each with its own use. Here\u2019s a breakdown of some of the character sets available in MySQL:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>\n<p><strong>Charset<\/strong><\/p>\n<\/td><td>\n<p><strong>Description<\/strong><\/p>\n<\/td><td>\n<p><strong>Default collation<\/strong><\/p>\n<\/td><td>\n<p><strong>Max. length (bytes)<\/strong><\/p>\n<\/td><td>\n<p><strong>Use-case<\/strong><\/p>\n<\/td><\/tr><tr><td>\n<p><strong>armscii8<\/strong><\/p>\n<\/td><td>\n<p>ARMSCII-8 Armenian<\/p>\n<\/td><td>\n<p>armscii8_general_ci<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>For Armenian text<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>ascii<\/strong><\/p>\n<\/td><td>\n<p>US ASCII<\/p>\n<\/td><td>\n<p>ascii_general_ci<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>For basic English characters (A-Z, 0-9, and symbols).<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>big5<\/strong><\/p>\n<\/td><td>\n<p>Big5 Traditional Chinese<\/p>\n<\/td><td>\n<p>big5_chinese_ci<\/p>\n<\/td><td>\n<p>2<\/p>\n<\/td><td>\n<p>For traditional Chinese characters (used in Taiwan and Hong Kong).<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>binary<\/strong><\/p>\n<\/td><td>\n<p>Binary pseudo charset<\/p>\n<\/td><td>\n<p>binary<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>For raw binary data, not text. Useful for storing non-text data (e.g., images, encrypted data).<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>cp1250<\/strong><\/p>\n<\/td><td>\n<p>Windows Central European<\/p>\n<\/td><td>\n<p>cp1250_general_ci<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>For Central European languages like Polish, Czech, and Hungarian.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>cp1251<\/strong><\/p>\n<\/td><td>\n<p>Windows Cyrillic<\/p>\n<\/td><td>\n<p>cp1251_general_ci<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>For Cyrillic-based languages like Russian and Bulgarian.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>cp1256<\/strong><\/p>\n<\/td><td>\n<p>Windows Arabic<\/p>\n<\/td><td>\n<p>cp1256_general_ci<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>For Arabic text (primarily used in the Middle East).<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>cp932<\/strong><\/p>\n<\/td><td>\n<p>SJIS for Windows Japanese<\/p>\n<\/td><td>\n<p>cp932_japanese_ci<\/p>\n<\/td><td>\n<p>2<\/p>\n<\/td><td>\n<p>For Japanese text using Shift-JIS encoding, often in Windows.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>dec8<\/strong><\/p>\n<\/td><td>\n<p>DEC West European<\/p>\n<\/td><td>\n<p>dec8_swedish_ci<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>For Western European languages used in DEC systems.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>greek<\/strong><\/p>\n<\/td><td>\n<p>ISO 8859-7 Greek<\/p>\n<\/td><td>\n<p>greek_general_ci<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>For Greek text<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>hebrew<\/strong><\/p>\n<\/td><td>\n<p>ISO 8859-8 Hebrew<\/p>\n<\/td><td>\n<p>hebrew_general_ci<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>For Hebrew text<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>latin1<\/strong><\/p>\n<\/td><td>\n<p>cp1252 West European<\/p>\n<\/td><td>\n<p>latin1_swedish_ci<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>For Western European languages like Spanish, French, and English.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>latin5<\/strong><\/p>\n<\/td><td>\n<p>ISO 8859-9 Turkish<\/p>\n<\/td><td>\n<p>latin5_turkish_ci<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>For Turkish texts.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>latin7<\/strong><\/p>\n<\/td><td>\n<p>ISO 8859-13 Baltic<\/p>\n<\/td><td>\n<p>latin7_general_ci<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>For Baltic languages like Lithuanian and Latvian.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>utf16<\/strong><\/p>\n<\/td><td>\n<p>UTF-16 Unicode<\/p>\n<\/td><td>\n<p>utf16_general_ci<\/p>\n<\/td><td>\n<p>4<\/p>\n<\/td><td>\n<p>For all Unicode characters (supports all languages).<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>utf32<\/strong><\/p>\n<\/td><td>\n<p>UTF-32 Unicode<\/p>\n<\/td><td>\n<p>utf32_general_ci<\/p>\n<\/td><td>\n<p>4<\/p>\n<\/td><td>\n<p>For all Unicode characters, with a 32-bit encoding.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>utf8mb3<\/strong><\/p>\n<\/td><td>\n<p>UTF- 8 Unicode<\/p>\n<\/td><td>\n<p>utf8mb3_general_<\/p>\n<\/td><td>\n<p>3<\/p>\n<\/td><td>\n<p>For most languages, but limited to 3 bytes per character (older version of UTF-8).<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>utf8mb4<\/strong><\/p>\n<\/td><td>\n<p>UTF- 8 Unicode<\/p>\n<\/td><td>\n<p>utf8mb4_0900_ai_ci<\/p>\n<\/td><td>\n<p>4<\/p>\n<\/td><td>\n<p>Full Unicode support, including emojis and all scripts.<\/p>\n<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>For a full list on all the available character sets available in MySQL, kindly visit the official <a href=\"https:\/\/dev.mysql.com\/doc\/refman\/8.4\/en\/charset-charsets.html\">Documentation.<\/a><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-character-sets-in-postgresql\"><a id=\"post-105570-_ge0f7smat37n\"><\/a>Character Sets in PostgreSQL<\/h3>\n\n\n\n<p>PostgreSQL manages text encoding and storage using character sets to ensure proper handling of multilingual text. Much like MySQL, it uses UTF-8 as its default character set. UTF-8 is a universal encoding standard that can represent almost all characters from every language. This ensures maximum compatibility with modern applications and systems that rely on Unicode.<\/p>\n\n\n\n<p>PostgreSQL also allows you set the character set from the server, however, it does not support a few conversion of some character set within the server, such as, big5, gbk, sjis, and a few others that were mentioned in the <a href=\"https:\/\/www.postgresql.org\/docs\/current\/multibyte.html%5D\">documentation<\/a>. It is also worthy to note that postgresql does not support multiple <a href=\"https:\/\/www.postgresql.org\/docs\/current\/infoschema-character-sets.html\">character set<\/a> within a single database.<\/p>\n\n\n\n<p>Here is a breakdown of some of the supported character set that are available on PostgreSQL:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>\n<p><strong>Character Set<\/strong><\/p>\n<\/td><td>\n<p><strong>Description<\/strong><\/p>\n<\/td><td>\n<p><strong>Bytes<\/strong><\/p>\n<\/td><td>\n<p><strong>Use-case<\/strong><\/p>\n<\/td><\/tr><tr><td>\n<p><strong>UTF-8<\/strong><\/p>\n<\/td><td>\n<p>Unicode, variable-width encoding.<\/p>\n<\/td><td>\n<p>1-4<\/p>\n<\/td><td>\n<p>Ideal for modern applications needing support for multiple languages.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>SQL_ASCII<\/strong><\/p>\n<\/td><td>\n<p>Raw byte encoding with no checks.<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>Suitable for legacy systems without Unicode requirements. Avoid for new apps.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>LATIN1<\/strong><\/p>\n<\/td><td>\n<p>ISO 8859-1 (Western European languages)<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>Use in applications limited to Western European text.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>LATIN2<\/strong><\/p>\n<\/td><td>\n<p>ISO 8859-2 (Central and Eastern European)<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>Best for Central and Eastern European languages like Czech, Polish, etc.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>LATIN5<\/strong><\/p>\n<\/td><td>\n<p>ISO 8859-9 (Turkish)<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>Optimized for Turkish language applications.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>LATIN7<\/strong><\/p>\n<\/td><td>\n<p>ISO 8859-13 (Baltic languages)<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>Use for Baltic region languages like Latvian and Lithuanian.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>WIN1250<\/strong><\/p>\n<\/td><td>\n<p>Windows Central European (CP1250).<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>Use for Windows-based systems handling Central European languages.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>WIN1251<\/strong><\/p>\n<\/td><td>\n<p>Windows Cyrillic (CP1251)<\/p>\n<\/td><td>\n<p>1<\/p>\n<\/td><td>\n<p>Ideal for Windows applications supporting Cyrillic scripts.<\/p>\n<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>UTF8 is the most versatile and modern character set for global applications. The other character sets (e.g., LATINx and WIN125x) are primarily for compatibility with specific languages or legacy systems. SQL_ASCII should be avoided for new applications due to lack of encoding validation.<\/p>\n\n\n\n<p>For a full list of all the available character sets available in PostgreSQL, please visit the official <a href=\"https:\/\/www.postgresql.org\/docs\/current\/multibyte.html\">documentation.<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-collations\">Collations<\/h2>\n\n\n\n<p>In this section I will compare how collations are implemented in PostgreSQL and MySQL.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-collations-in-mysql\">Collations in MySQL<\/h3>\n\n\n\n<p>In MySQL, collations determine how string comparisons and sorting are performed. When you create a database, table, or column, you can set a specific collation to decide how text is compared.<\/p>\n\n\n\n<p>Every character set (like <code>utf8mb4<\/code>) has a set of collations and usually a default collation. Utf8mb4 has a default collation of <code>utf8mb4_0900_ai_ci<\/code> . Every collation has a set of attributes that describe how it is used when doing comparisons. The following list will help you understand what these parts are, from the name:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li><strong>utf8mb4<\/strong> is the character set that supports the full range of Unicode characters.<\/li>\n\n\n\n<li><strong>0900<\/strong>: This indicates that the collation is based on the Unicode Collation Algorithm (UCA) version 9.0. The UCA is a set of rules that defines how Unicode text should be sorted and compared.<\/li>\n\n\n\n<li><strong>ai<\/strong>: This stands for Accent Insensitive. It means that during comparisons, accents (diacritics) are ignored. For example, &#8220;\u00e9&#8221; and &#8220;e&#8221; would be treated as equivalent.<\/li>\n\n\n\n<li><strong>ci<\/strong>: This stands for Case Insensitive. It means that uppercase and lowercase letters are considered equal in comparisons. For example, &#8220;A&#8221; and &#8220;a&#8221; would be treated as the same letter.<\/li>\n<\/ul>\n<\/div>\n\n\n<p>So, if you compare the strings &#8220;caf\u00e9&#8221; and &#8220;CAFE&#8221; in MySQL with this collation, they will be considered equal because accents (the acute accent on &#8220;\u00e9&#8221;) are ignored and case differences (uppercase &#8220;C&#8221; vs lowercase &#8220;c&#8221;) are also ignored.<\/p>\n\n\n\n<p>Here are ways you can configure your collations in MySQL:<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-server-level\">Server level<\/h4>\n\n\n\n<p>You can specify the character set and collation at server startup using the &#8211;character-set-server and &#8211;collation-server options directly in the command line when starting the MySQL server using the following command:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">mysqld --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci<\/pre>\n\n\n\n<p>You can also choose to modify the configuration file (my.cnf or my.ini) by adding the above command under the [mysqld] section. After starting the server with these options, log in to MySQL and check the values to ensure they are applied.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SHOW VARIABLES LIKE 'character_set_server';\nSHOW VARIABLES LIKE 'collation_server';<\/pre>\n\n\n\n<p>This will return something like the following, based on your server\u2019s collation and character set.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"342\" height=\"130\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-5-2.png\" alt=\"\" class=\"wp-image-105600\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-5-2.png 342w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-5-2-300x114.png 300w\" sizes=\"auto, (max-width: 342px) 100vw, 342px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"399\" height=\"137\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-6-2.png\" alt=\"\" class=\"wp-image-105601\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-6-2.png 399w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-6-2-300x103.png 300w\" sizes=\"auto, (max-width: 399px) 100vw, 399px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-database-level\">Database level<\/h4>\n\n\n\n<p>You can specify a collation when creating a database by running the following command:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">CREATE DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;  <\/pre>\n\n\n\n<p>To be sure the collation has been set successfully, you can check this by running the following command:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">USE mydb;\nSELECT @@character_set_database, @@collation_database;<\/pre>\n\n\n\n<p>You should get an output similar to the image below<\/p>\n\n\n\n<figure class=\"wp-block-image wp-duotone-unset-1\"><img loading=\"lazy\" decoding=\"async\" width=\"415\" height=\"105\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-7-2.png\" alt=\"\" class=\"wp-image-105602\" style=\"box-shadow:var(--wp--preset--shadow--deep)\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-7-2.png 415w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-7-2-300x76.png 300w\" sizes=\"auto, (max-width: 415px) 100vw, 415px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-table-or-column-level\">Table or column level<\/h4>\n\n\n\n<p>You can specify the collation for a particular table or even column. Here\u2019s how you can do this using the following command:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">CREATE TABLE mytable (  \n       name VARCHAR(100) COLLATE utf8mb4_unicode_ci  );  <\/pre>\n\n\n\n<p>If you want to create a new database with a specific character set and collation, you can run the following command:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">CREATE DATABASE mynew_db\n  CHARACTER SET utf8mb4\n  COLLATE utf8mb4_unicode_ci;<\/pre>\n\n\n\n<p>Then, you can check the output with the following command:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">USE mynew_db;\nSELECT @@character_set_database, @@collation_database;<\/pre>\n\n\n\n<p>You should get an output similar to the image below<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"449\" height=\"104\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-8-2.png\" alt=\"\" class=\"wp-image-105603\" style=\"box-shadow:var(--wp--preset--shadow--deep)\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-8-2.png 449w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-8-2-300x69.png 300w\" sizes=\"auto, (max-width: 449px) 100vw, 449px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>You can also set both the character set and collation in your table by running the following command:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">CREATE TABLE your_table_name (\n  id INT PRIMARY KEY,\n  text_column VARCHAR(255)\n) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;<\/pre>\n\n\n\n<p>Then, verify the character set and collation by running the SELECT statement above.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"449\" height=\"104\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-9-2.png\" alt=\"\" class=\"wp-image-105604\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-9-2.png 449w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-9-2-300x69.png 300w\" sizes=\"auto, (max-width: 449px) 100vw, 449px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>The choice of character set in MySQL directly affects both storage and speed, such as storage and speed impact.<\/p>\n\n\n\n<p>Different character sets require varying amounts of storage per character. For example, latin1 uses 1 byte per character and is suitable for basic Western European languages while utf8mb4 can use up to 4 bytes per character to support all Unicode characters, including emojis and complex scripts like Chinese. This means databases using utf8mb4 may require more disk space than those using latin1.<\/p>\n\n\n\n<p>For speed impact, larger character sets can increase read\/write times because they consume more space, which may slow down operations involving large datasets.<\/p>\n\n\n\n<p>Collations tied to character sets, such as utf8mb4_unicode_ci, can be more computationally intensive because they follow complex Unicode rules for comparisons and sorting. Simpler collations like utf8mb4_general_ci are faster but less accurate for multilingual text.<\/p>\n\n\n\n<p>For more details on character sets and collations in MySQL, please refer to the official <a href=\"https:\/\/dev.mysql.com\/doc\/refman\/8.0\/en\/charset-general.html\">documentation<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-collation-in-postgresql\"><a id=\"post-105570-_s4c1i89c7f1w\"><\/a><a id=\"post-105570-_2et92p0\"><\/a><a id=\"post-105570-_tyjcwt\"><\/a>Collation in PostgreSQL<\/h3>\n\n\n\n<p>PostgreSQL handles collations similarly but with a bit more flexibility in terms of locale and internationalisation support. Collation in PostgreSQL defines how text is sorted and compared according to locale-specific rules. You can specify a collation at the database, table, or column level, just like in MySQL.<\/p>\n\n\n\n<p>Collations in PostgreSQL are heavily tied to the system&#8217;s locale settings. PostgreSQL uses locales, which are sets of rules defining language, region-specific sorting, and text behaviour. You must have the relevant locale installed on your system to use its collation.<\/p>\n\n\n\n<p>When you create a database or a column, you can set the locale to ensure that text is sorted correctly for a specific language or region. For example:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT * FROM users ORDER BY name COLLATE \"en_US\";<\/pre>\n\n\n\n<p>This query sorts the rows in the users table by the name column using the &#8220;en_US&#8221; collation. It orders the names according to the rules of the English (United States) locale. Collation defines how text is compared and sorted, so using &#8220;en_US&#8221; ensures that the sorting respects U.S. English alphabetical conventions, such as handling uppercase and lowercase letters, accents, and special characters in a specific way.<\/p>\n\n\n\n<p>In addition to system locales, PostgreSQL also supports ICU collations, which provide more robust internationalisation support. ICU collations offer consistent behaviour across platforms. For more information on collation in PostgreSQL, please visit the <a href=\"https:\/\/www.postgresql.org\/docs\/current\/collation.html\">docs.<\/a><\/p>\n\n\n\n<p>When you create a database, you can specify the character set to use with the encoding parameter. By default, PostgreSQL uses UTF8, which supports most languages and characters globally.<\/p>\n\n\n\n<p>Here\u2019s an example of how to create a database with a specified character set, using LATIN1 in this case:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">CREATE DATABASE my_db\nWITH ENCODING 'WIN1252'\nLC_COLLATE 'en_US'\nLC_CTYPE 'en_US'\nTEMPLATE template0;<\/pre>\n\n\n\n<p>The template0 ensures that the new database uses a clean slate without inheriting the default UTF-8 encoding.<\/p>\n\n\n\n<p>To verify the character set of the database, run the following command:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT pg_encoding_to_char(encoding) AS encoding_name, \n       datname, datcollate, datctype\nFROM pg_database\nWHERE datname = 'my_db';<\/pre>\n\n\n\n<figure class=\"wp-block-image is-style-default\"><img loading=\"lazy\" decoding=\"async\" width=\"563\" height=\"101\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-10-2.png\" alt=\"\" class=\"wp-image-105605\" style=\"box-shadow:var(--wp--preset--shadow--deep)\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-10-2.png 563w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-10-2-300x54.png 300w\" sizes=\"auto, (max-width: 563px) 100vw, 563px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>You must also ensure the locale you specify (e.g., <code>LC_COLLATE<\/code> and <code>LC_CTYPE<\/code>) must support the encoding. For example, <code>LATIN1<\/code> typical locales include: <code>en_US<\/code> (US English), <code>de_DE<\/code> (German), and <code>fr_FR<\/code> (French). If there is an incompatibility, PostgreSQL will raise an error during database creation. Let\u2019s try this out using an incompatible locale:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">CREATE DATABASE invalid_db\nWITH ENCODING 'LATIN1'\nLC_COLLATE 'zh_CN'\nLC_CTYPE 'zh_CN';<\/pre>\n\n\n\n<p>Would return this error:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1041\" height=\"137\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-11-2.png\" alt=\"\" class=\"wp-image-105606\" srcset=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-11-2.png 1041w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-11-2-300x39.png 300w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-11-2-1024x135.png 1024w, https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2025\/02\/word-image-105570-11-2-768x101.png 768w\" sizes=\"auto, (max-width: 1041px) 100vw, 1041px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>This is why it&#8217;s important to know the right locale for each character set.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-comparison-table-mysql-vs-postgresql\"><a id=\"post-105570-_eugbrxjzkwhp\"><\/a>Comparison table: MySQL vs. PostgreSQL<\/h2>\n\n\n\n<p><a id=\"post-105570-_ldhwsksn9k49\"><\/a> MySQL and PostgreSQL support different character sets to store and manage text in various languages. Picking the right character set is important for compatibility, efficient storage, and correct text handling. Here\u2019s a simple comparison of the character sets they support, along with their differences and use cases.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>\n<p><a id=\"post-105570-_vohzx483ek9j\"><\/a><strong>Character Set<\/strong><\/p>\n<\/td><td>\n<p><strong>MySQL Use Case<\/strong><\/p>\n<\/td><td>\n<p><strong>PostgreSQL Use Case<\/strong><\/p>\n<\/td><td>\n<p><strong>Key Difference<\/strong><\/p>\n<\/td><\/tr><tr><td>\n<p><strong>utf8mb4<\/strong><\/p>\n<\/td><td>\n<p>Recommended for full Unicode support, including emojis (up to 4 bytes per char).<\/p>\n<\/td><td>\n<p>Supports full Unicode, including emojis (up to 4 bytes per char).<\/p>\n<\/td><td>\n<p>No significant difference<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>latin1<\/strong><\/p>\n<\/td><td>\n<p>Suitable for Western European languages like English, French, and German.<\/p>\n<\/td><td>\n<p>Also used for Western European languages, limited to ISO-8859-1.<\/p>\n<\/td><td>\n<p>No significant difference. Both lack Unicode support.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>ascii<\/strong><\/p>\n<\/td><td>\n<p>Good for simple English text or codes without special characters.<\/p>\n<\/td><td>\n<p>Similar use case for plain ASCII text, lacks Unicode or multilingual support.<\/p>\n<\/td><td>\n<p>MySQL offers better error detection for invalid ASCII input compared to PostgreSQL&#8217;s SQL_ASCII<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>latin2<\/strong><\/p>\n<\/td><td>\n<p>Supports Central European languages like Polish, Czech, Slovak, and Hungarian.<\/p>\n<\/td><td>\n<p>Same use case for Central European languages with accented characters.<\/p>\n<\/td><td>\n<p>Functionally similar for regional language needs.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>latin3<\/strong><\/p>\n<\/td><td>\n<p>N\/A<\/p>\n<\/td><td>\n<p>Used for South European languages like Maltese and Esperanto.<\/p>\n<\/td><td>\n<p>PostgreSQL provides latin3 for specific niche use cases not addressed by MySQL.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>latin4<\/strong><\/p>\n<\/td><td>\n<p>N\/A<\/p>\n<\/td><td>\n<p>Designed for Northern European languages like Estonian and Latvian.<\/p>\n<\/td><td>\n<p>PostgreSQL&#8217;s latin4 fills a gap for some Northern European languages.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>latin5<\/strong><\/p>\n<\/td><td>\n<p>Designed for Turkish, replacing some latin1 characters with Turkish-specific ones.<\/p>\n<\/td><td>\n<p>Same use case for Turkish and other Western European languages.<\/p>\n<\/td><td>\n<p>Both implementations are functionally similar.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>latin6<\/strong><\/p>\n<\/td><td>\n<p>N\/A<\/p>\n<\/td><td>\n<p>Used for Nordic languages such as Sami.<\/p>\n<\/td><td>\n<p>Exclusive to PostgreSQL for linguistic needs in Nordic regions.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>latin7<\/strong><\/p>\n<\/td><td>\n<p>Used for Baltic languages like Lithuanian, Latvian, and Estonian.<\/p>\n<\/td><td>\n<p>Same use case for Baltic and Scandinavian languages.<\/p>\n<\/td><td>\n<p>Similar, with no major differences.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>latin8<\/strong><\/p>\n<\/td><td>\n<p>N\/A<\/p>\n<\/td><td>\n<p>Provides support for Celtic languages like Gaelic and Welsh.<\/p>\n<\/td><td>\n<p>PostgreSQL supports languages like Gaelic, which MySQL does not address with a specific encoding.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>latin9<\/strong><\/p>\n<\/td><td>\n<p>N\/A<\/p>\n<\/td><td>\n<p>An updated version of latin1 with the Euro (\u20ac) symbol and accents.<\/p>\n<\/td><td>\n<p>PostgreSQL includes latin9 for modern applications requiring the Euro currency symbol and accents.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>gbk<\/strong><\/p>\n<\/td><td>\n<p>Used for storing Simplified Chinese characters.<\/p>\n<\/td><td>\n<p>Also used for storing Simplified Chinese characters.<\/p>\n<\/td><td>\n<p>Both MySQL and PostgreSQL provide GBK<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>greek<\/strong><\/p>\n<\/td><td>\n<p>Supports greek language texts.<\/p>\n<\/td><td>\n<p>Used for Greek language texts (ISO 8859-7).<\/p>\n<\/td><td>\n<p>Similar, with no major differences.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>hebrew<\/strong><\/p>\n<\/td><td>\n<p>Supports hebrew characters<\/p>\n<\/td><td>\n<p>Used for Hebrew language texts (ISO 8859-8).<\/p>\n<\/td><td>\n<p>Both databases support Hebrew texts with minimal differences.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>cp1251<\/strong><\/p>\n<\/td><td>\n<p>Supports Cyrillic-based languages like Russian.<\/p>\n<\/td><td>\n<p>Equivalent is ISO 8859-5, used for Cyrillic texts.<\/p>\n<\/td><td>\n<p>Functionally similar; PostgreSQL uses a standard naming convention (ISO prefix).<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>cp1256<\/strong><\/p>\n<\/td><td>\n<p>Used for Arabic text storage.<\/p>\n<\/td><td>\n<p>Equivalent is ISO 8859-6, supporting Arabic.<\/p>\n<\/td><td>\n<p>Both support Arabic, but PostgreSQL uses ISO standard naming.<\/p>\n<\/td><\/tr><tr><td>\n<p><strong>SJIS<\/strong><\/p>\n<\/td><td>\n<p>Supports Japanese characters (Shift-JIS). Also has support for windows Japanese with cp932<\/p>\n<\/td><td>\n<p>Used for Japanese texts.<\/p>\n<\/td><td>\n<p>Both databases support Japanese texts, however, MySQL offers better support.<\/p>\n<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The Latin 1, 2, 5 and 7 is a progression based on the limitations of the earlier sets. Each new Latin character set was developed to address the needs of languages or regions not covered by the previous ones. PostgreSQL supports a wider range of specialized character sets like latin3, latin4, and latin9, catering to niche language requirements.<\/p>\n\n\n\n<p>However, Unicode eventually replaced these character sets because it supports all languages in a single encoding, eliminating the need for multiple Latin variants.<\/p>\n\n\n\n<p>Also, characters not supported by the chosen character set might get corrupted or not saved at all. This is why it is important to know the list of character sets supported by both<a href=\"https:\/\/dev.mysql.com\/doc\/refman\/8.4\/en\/charset-charsets.html\"> MySQL<\/a> and <a href=\"https:\/\/www.postgresql.org\/docs\/current\/multibyte.html\">PostgreSQL<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-conclusion\"><a id=\"post-105570-_9j8rjsv2dgsv\"><\/a>Conclusion<\/h2>\n\n\n\n<p>Character sets and collations are critical for ensuring proper storage, retrieval, and comparison of text data in any database. MySQL offers a wider variety of character sets, making it highly customizable for diverse applications, but PostgreSQL simplifies things with a strong emphasis on Unicode and robust support for locales.<\/p>\n\n\n\n<p>When choosing (of moving back and forth) between MySQL and PostgreSQL, consider the languages, regions, and storage requirements of your application. Both databases provide powerful tools to handle multilingual and complex text data. Understanding how they implement character sets and collations will help you design a database that is efficient, compatible, and easy to maintain.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>When working with strings in a database system, you need to ensure that the data is stored, sorted, and compared accurately. That\u2019s where character sets and collations come in. Character sets determine how text is saved while collations define how the database sorts and compares text. For example, if you\u2019re storing text in different languages&#8230;&hellip;<\/p>\n","protected":false},"author":341597,"featured_media":105583,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[145792,143534],"tags":[159268,5854,158978],"coauthors":[158988],"class_list":["post-105570","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-mysql","category-postgresql","tag-aishabukar_mysql_postgresql","tag-mysql","tag-postgresql"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/105570","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/341597"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=105570"}],"version-history":[{"count":7,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/105570\/revisions"}],"predecessor-version":[{"id":105617,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/105570\/revisions\/105617"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media\/105583"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=105570"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=105570"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=105570"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=105570"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}