
Character Set Test Area

Below you can test various charsets in use today by many computers & browsers around the world. Most of them require certain Unicode-compatible fonts, such as Arial Unicode MS (included with Microsoft Office 2000 & higher) & Tahoma. I recommend viewing most of them in a recent browser like Microsoft Internet Explorer 5.0 or Netscape Navigator 4.7 or higher, which have wider charset support, as shown in the table below. I also recommend running a recent operating system like Microsoft Windows 2000 or Apple Macintosh OS 9 or higher, which have notably better support for Unicode, bidirectionality, & CJK charsets. Some charsets display correctly only in Internet Explorer, while others display only in Netscape. A few also display in the third-party browsers listed at the end of this page.
I don’t include any samples of the 8-bit IBM EBCDIC charsets used by IBM mainframes (and supported by some other operating systems, like Windows 2000), because I have access neither to such machines nor to any HTML editor that can save in EBCDIC format instead of ASCII (7-bit single-byte charsets), ANSI (Windows-125x, ISO-8859 series, X-Mac-series, & other 8-bit charsets), or Unicode (7-bit, 8-bit, & 16-bit). (EBCDIC files are much more difficult to show in a browser because, unlike all the charsets listed in the table below, they are not derived from ASCII, so even their HTML tags must be written in EBCDIC rather than the ASCII we are accustomed to.)
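
To illustrate why EBCDIC pages are awkward for browsers, here is a minimal sketch that encodes the same HTML tag in ASCII and in one EBCDIC code page. The choice of Python and of its cp037 codec (EBCDIC US/Canada) is my assumption for illustration; the page itself names no particular EBCDIC variant.

```python
# Encode one HTML tag in ASCII and in an EBCDIC code page.
# cp037 is an assumed example; other EBCDIC variants differ in some positions.
tag = "<HTML>"

ascii_bytes = tag.encode("ascii")
ebcdic_bytes = tag.encode("cp037")

print("ASCII :", ascii_bytes.hex(" "))
print("EBCDIC:", ebcdic_bytes.hex(" "))

# A parser that assumes an ASCII-derived charset sees no tag at all:
# here the EBCDIC bytes are read as if they were ISO-8859-1.
misread = ebcdic_bytes.decode("iso-8859-1")
print("Misread as ISO-8859-1:", misread)
```

None of the six bytes match between the two encodings, so a browser that looks for ASCII `<` and `>` never finds the markup.
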

All of these charsets begin with the basic 7-bit US-ASCII area (C0 control bytes 00-1F & character bytes 20-7F), which is not shown because it would repeat the same characters for every charset. ISO-8859, ISO-IR, & ISCII charsets follow the 7-bit US-ASCII area with the C1 Controls area (bytes 80-9F) & then with their own characters, each defined in terms of Unicode. Other charsets (like Windows-125x, X-Mac-series, & IBMxxx) follow US-ASCII immediately with their own characters, with no provision for C1 Controls.
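
The difference in bytes 80-9F can be seen with a short sketch. It uses Python's built-in iso-8859-1 and windows-1252 codecs as one representative pair; the sample bytes are my own choice.

```python
import unicodedata

# 'A' (US-ASCII), two bytes from the 80-9F area, and one byte from A0-FF.
sample = bytes([0x41, 0x80, 0x93, 0xE9])

as_iso = sample.decode("iso-8859-1")     # 80-9F are C1 controls
as_win = sample.decode("windows-1252")   # 80-9F hold printable characters

def is_control(ch):
    """True when Unicode classifies the character as a control (category Cc)."""
    return unicodedata.category(ch) == "Cc"

for byte, iso_ch, win_ch in zip(sample, as_iso, as_win):
    print(f"{byte:02X}  ISO-8859-1: U+{ord(iso_ch):04X} control={is_control(iso_ch)}"
          f"  Windows-1252: U+{ord(win_ch):04X} control={is_control(win_ch)}")
```

Bytes 41 and E9 decode identically in both charsets; only the 80-9F bytes differ, becoming controls under ISO-8859-1 and the euro sign and a curly quote under Windows-1252.
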
Chinese, Japanese, & Korean charsets (collectively called CJK) are a special case: because each of them involves thousands of characters, they use special rules to encode characters outside US-ASCII, which don’t fit in a standard 8-bit system like ISO-8859. All of them (except ISO-2022 & HZ) follow US-ASCII with a series of double-byte 16-bit characters made up by combining 8-bit Header Bytes with 8-bit Trailer Bytes. EUC (which stands for Extended Unix Code), MacKorean, & MacChineseSimp use bytes A1-FE as both headers & trailers (Japanese EUC also adds header 8E for half-width Kana); Big5 & MacChineseTrad use A1-FE as headers and 40-7E & A1-FE as trailers; Shift-JIS & MacJapanese use 81-9F & E0-FC as headers and 40-7E & 80-FC as trailers, and also provide 8-bit half-width Kana through single bytes A1-DF (with no preceding headers); UHC uses 81-FD (skipping C9) as headers and 41-5A, 61-7A, & 81-FE as trailers (with no provision for half-width Hangeul letters); Johab, being more difficult to describe, uses 84-F9 (with several gaps) as headers and 31-7E & 81-FE as trailers; and the GB family, being more character-complete, uses 81-FE as headers and 40-7E & 80-FE as trailers, thus covering all known CJK Ideographs (also known as Hanzi in Chinese, Kanji in Japanese, & Hanja in Korean).
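
The header and trailer ranges above can be spot-checked with a small sketch. It assumes Python's built-in CJK codecs as stand-ins for the charsets named in the paragraph; the table name, helper names, and sample characters are hypothetical choices of mine.

```python
# Header/trailer byte ranges (inclusive), copied from the paragraph above.
RANGES = {
    "euc_jp":    ([(0xA1, 0xFE)], [(0xA1, 0xFE)]),
    "euc_kr":    ([(0xA1, 0xFE)], [(0xA1, 0xFE)]),
    "gb2312":    ([(0xA1, 0xFE)], [(0xA1, 0xFE)]),  # EUC form of GB2312
    "big5":      ([(0xA1, 0xFE)], [(0x40, 0x7E), (0xA1, 0xFE)]),
    "shift_jis": ([(0x81, 0x9F), (0xE0, 0xFC)], [(0x40, 0x7E), (0x80, 0xFC)]),
    "gbk":       ([(0x81, 0xFE)], [(0x40, 0x7E), (0x80, 0xFE)]),
}

def within(byte, ranges):
    """True when the byte falls inside any of the inclusive ranges."""
    return any(low <= byte <= high for low, high in ranges)

def fits_described_ranges(char, codec):
    """Encode one character and test its two bytes against RANGES[codec]."""
    encoded = char.encode(codec)
    if len(encoded) != 2:
        return False
    headers, trailers = RANGES[codec]
    return within(encoded[0], headers) and within(encoded[1], trailers)

SAMPLES = [("日", "euc_jp"), ("日", "shift_jis"), ("中", "big5"),
           ("中", "gb2312"), ("中", "gbk"), ("한", "euc_kr")]

for char, codec in SAMPLES:
    print(codec, char.encode(codec).hex(" "), fits_described_ranges(char, codec))

# Half-width Kana: one byte in Shift-JIS, header 8E plus one byte in EUC-JP.
half_width_a = "\uFF71"
print(half_width_a.encode("shift_jis").hex(), half_width_a.encode("euc_jp").hex(" "))
```

A handful of sample characters only spot-checks the ranges; it does not prove them for every character in each charset.
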
ISO-2022 charsets, intended for use in e-mail messages (which are mostly handled by 7-bit systems), use 7-bit byte strings & escape/shift sequences to generate 14-bit characters (two 7-bit bytes each) that follow US-ASCII. Japanese JIS can also use escape sequences to generate 7-bit half-width Kana through single bytes 21-5F. Chinese HZ, also intended for e-mail messages, uses 7-bit strings & tilde/brace sequences (~{ & ~}) instead of escape/shift sequences to generate 14-bit characters.
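
As a rough illustration of these 7-bit schemes, the sketch below encodes short samples with Python's iso2022_jp and hz codecs (my choice of tool and sample text, not the page's). It also compares the payload with the EUC bytes of the same text, with the high bit cleared.

```python
def is_seven_bit(data):
    """True when every byte is below 80 hex, i.e. safe for 7-bit mail systems."""
    return all(byte < 0x80 for byte in data)

def strip_high_bit(data):
    """Clear the top bit of every byte (turns EUC bytes into 7-bit bytes)."""
    return bytes(byte & 0x7F for byte in data)

jis = "日本語".encode("iso2022_jp")   # escape sequences switch charsets
hz = "中文".encode("hz")             # ~{ and ~} switch charsets

print(jis)
print(hz)
print(is_seven_bit(jis), is_seven_bit(hz))

# The payload between the switching sequences equals the EUC bytes minus 80 hex.
print(jis[3:-3] == strip_high_bit("日本語".encode("euc_jp")))
print(hz[2:-2] == strip_high_bit("中文".encode("gb2312")))
```

Each character still costs two bytes; the escape or tilde/brace sequences only tell the reader when a byte pair should stop being read as two ASCII characters.
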
Finally, Unicode involves four distinct transformations: a 7-bit mail-safe one, a standard 8-bit Web-safe one, and two 16-bit versions that differ only in byte ordering (Little-Endian & Big-Endian). The 7-bit UTF-7 uses Base64 strings (beginning with byte 2B [+] and ending with byte 2D [-]) to generate all characters & controls defined in Unicode. The 8-bit UTF-8 follows US-ASCII characters with a series of double-byte (16-bit) & triple-byte (24-bit) characters, using bytes C0-DF as 16-bit headers (although C0 & C1 never occur in valid UTF-8), E0-EF as 24-bit headers, & 80-BF as 16-bit trailers & 24-bit middles & trailers; characters above FFFF take four bytes, with F0-F4 as headers. The 16-bit UTF-16 (in its Little- & Big-Endian versions) is purely 16-bit (unlike ASCII, which is 7-bit, and unlike ISO-8859, which is 8-bit), ranging from 0000-FFFF (yes, two bytes for every single character among 65,536 possible ones); characters above FFFF are written as pairs of 16-bit surrogate values.
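
The four transformations can be compared side by side with a short sketch. It assumes Python's built-in utf-7, utf-8, utf-16-le, & utf-16-be codecs; the sample text and the `utf8_role` helper are hypothetical names of mine, and the helper uses only the byte ranges given in the paragraph above.

```python
text = "A\u20ac"  # 'A' (US-ASCII) followed by the euro sign, U+20AC

encoded = {name: text.encode(name)
           for name in ("utf-7", "utf-8", "utf-16-le", "utf-16-be")}

for name, data in encoded.items():
    print(f"{name:10} {data.hex(' ')}")

def utf8_role(byte):
    """Classify one UTF-8 byte using the ranges given in the text."""
    if byte < 0x80:
        return "US-ASCII"
    if 0x80 <= byte <= 0xBF:
        return "trailer"
    if 0xC0 <= byte <= 0xDF:
        return "16-bit header"
    if 0xE0 <= byte <= 0xEF:
        return "24-bit header"
    return "other"

roles = [utf8_role(byte) for byte in encoded["utf-8"]]
print(roles)
```

The two UTF-16 results contain the same byte pairs in swapped order, which is the only difference between the Little-Endian & Big-Endian versions.
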

Users of Macintosh OS X: you may also want to test these charsets with the new Apple Safari browser (included with MacOS X Jaguar & higher; also available for download at the link) & then tell me about your results.
Windows & Internet Explorer users: many of the charsets below require certain files to be installed before they display correctly. These are mainly NLS files, which are installed in your WINDOWS/SYSTEM or WINNT/SYSTEM32 folder, depending on your version. Windows 95, 98 & Me include NLS files for some of the charsets; the rest are available from Internet Explorer’s Language Packs, from Charset Decoding, & from the National Language Support (NLS) Files link on this site. Windows 2000 includes NLS files for all of them except ISO-8859-13, for which an NLS file is also available at the previous link. Windows XP & Server 2003 include NLS files for all of them.

Character Set Test Pages | MIME Name | Windows Codepage | Internet Explorer (Win) | Internet Explorer (Mac) | Netscape Navigator (Win) | Netscape Navigator (Mac) | Opera Browser (Win) | Opera Browser (Mac) | Apple Safari | Mozilla | Other Browsers
For each Supported Browser cell below: a version number indicates the minimum version I’m aware of that supports the charset (the + means that later versions still support it); No indicates that the browser does not support it; and ? indicates that I don’t know whether the browser supports it.
For the case of Other Browsers, the cells may contain notes about certain browsers that support certain charset(s), or simply a ?.
If you know of older versions of any browser that support certain charset(s), or have version or support corrections, you are welcome to tell me.
American/Western European Charsets
American/Western European (Latin 1-IBM) | ibm850 | 850 | 5.0+ | 5.1+ | 6.0+ | 6.0+ | No | No | ? | 0.9+ | Ximian Galeon (Linux)
American/Western European (Latin 1-ISO) | iso-8859-1, latin1 | 28591 | 3.0+ | 5.1+ | 3.0+ | 3.0+ | 7.0+ | 7.0+ | 1.0+ | 0.9+ | Ximian Galeon (Linux), Alis Tango 3.0 (Win3.1/95/98 only)
American/Western European (Latin 1-Windows) | windows-1252 | 1252 | 3.0+ | 5.1+ | 3.0+ | 4.7+ | 7.0+ | ? | ? | 0.9+ | Ximian Galeon (Linux), Alis Tango 3.0 (Win3.1/95/98 only)
American/Western European (Roman-Macintosh) | macintosh, x-mac-roman‡ | 10000 | 5.0+ | 5.1+ | 6.0+ | 3.0+ | No | ? | 1.0+ | 0.9+ | Ximian Galeon (Linux)
Celtic (Latin 8) | iso-8859-14, latin8 | 28604 | No | No | 6.0+ | 6.0+ | 7.0+ | 7.0+ | ? | 0.9+ | Ximian Galeon (Linux)
French Canadian (IBM) | ibm863 | 863 | 6.0+ | ? | No | No | No | No | ? | No | ?
Icelandic (IBM) | ibm861 | 861 | 5.0+ | ? | No | No | No | No | ? | No | ?
Icelandic (Macintosh) | x-mac-icelandic | 10079 | 5.0+ | 5.1+ | 6.0+ | 6.0+ | No | ? | 1.0+ | 0.9+ | Ximian Galeon (Linux)
New Western European (Latin 9) | iso-8859-15, latin9 | 28605 | 5.0+ | ? | 6.0+ | 6.0+ | 7.0+ | 7.0+ | ? | 0.9+ | Ximian Galeon (Linux)
New Western European (OEM Latin I) | ibm00858, PC-Multilingual-850+Euro | 858 | 5.0+ (2000+ only) | ? | No | No | No | No | ? | No | ?
Portuguese (IBM) | ibm860 | 860 | 6.0+ | ? | No | No | No | No | ? | No | ?
Arabic/Farsi/Urdu Charsets
Arabic (ASMO) | asmo-708 | 708 | 4.0+ | ? | No | No | 7.0+* | 7.0+* | ? | No | Alis Tango 3.0 (Win3.1/95/98 only)
Arabic (ASMO-Transparent) | asmo-720, dos-720 | 720 | 4.0+ | ? | No | No | No | No | ? | No | ?
Arabic (IBM) | ibm864 | 864 | 6.0+ | 5.1+ | 6.0+ | 6.0+ | No | No | ? | 0.9+ | Ximian Galeon (Linux)
Arabic (ISO) | iso-8859-6, arabic | 28596 | 4.0+ | 5.1+ | 6.0+ | 6.0+ | 7.0+ | 7.0+ | 1.0+ | 0.9+ | Ximian Galeon (Linux), Alis Tango 3.0 (Win3.1/95/98 only)
Arabic/Farsi/Urdu (Windows) | windows-1256 | 1256 | 4.0+ | 5.1+ | 6.0+ | 6.0+ | 7.0+ | ? | ? | 0.9+ | Ximian Galeon (Linux), Alis Tango 3.0 (Win3.1/95/98 only)
Arabic/Urdu (Macintosh) | x-mac-arabic | 10004 | 5.0+ | 5.1+ | 6.0+*, 7.0+ | 6.0+ | No | ? | 1.0+ | 0.9+ | Ximian Galeon (Linux)
Farsi (ISIRI) | isiri-3342 | None yet | No | No | No | No | No | No | ? | No | Alis Tango 3.0 (Win3.1/95/98 only)
Farsi (Macintosh) | x-mac-farsi | 10014? | No | No | 6.0+* | ? | No | No | ? | 0.9+* | ?
Armenian Charsets
Armenian (ArmSCII) | armscii-8 | None yet | No | No | 6.0+ | 6.0+ | No | No | ? | 0.9+ | Ximian Galeon (Linux)
Baltic Charsets
Baltic (Latin 7-IBM) | ibm775 | 775 | 5.0+ | 5.1+ | No | No | No | No | ? | No | ?
Baltic (Latin 7-ISO) | iso-8859-13, latin7 | 28603 | 6.0+ | ? | 6.0+ | 6.0+ | 7.0+ | 7.0+ | ? | 0.9+ | Ximian Galeon (Linux)
Baltic (Latin 7-Windows) | windows-1257 | 1257 | 4.0+ | 5.1+ | 4.7+ | 4.7+ | 7.0+ | ? | ? | 0.9+ | Ximian Galeon (Linux)
North European/Baltic (Latin 4) | iso-8859-4, latin4 | 28594 | 4.0+ | 5.1+ | 4.7+ | 4.7+ | 7.0+ | 7.0+ | 1.0+ | 0.9+ | Ximian Galeon (Linux)
Central/Eastern European Charsets
Baltic/Central European (Macintosh) | x-mac-ce | 10029 | 5.0+ | 5.1+ | 6.0+ | 3.0+ | No | ? | 1.0+ | 0.9+ | Ximian Galeon (Linux)
Central/Eastern European (Latin 2-IBM) | ibm852 | 852 | 4.0+ | 5.1+ | 6.0+ | 6.0+ | No | No | ? | 0.9+ | Ximian Galeon (Linux)
Central/Eastern European (Latin 2-ISO) | iso-8859-2, latin2 | 28592 | 3.0+ | 5.1+ | 3.0+ | 3.0+ | 7.0+ | 7.0+ | 1.0+ | 0.9+ | Ximian Galeon (Linux), Alis Tango 3.0 (Win3.1/95/98 only)
Central/Eastern European (Latin 2-Windows) | windows-1250 | 1250 | 3.0+ | 5.1+ | 3.0+ | 3.0+ | 7.0+ | ? | ? | 0.9+ | Ximian Galeon (Linux), Alis Tango 3.0 (Win3.1/95/98 only)
Croatian (Macintosh) | x-mac-croatian | 10082 | 6.0+ | 5.1+ | 6.0+ | 6.0+ | No | ? | 1.0+ | 0.9+ | Ximian Galeon (Linux)
Romanian (Latin 10) | iso-8859-16, latin10 | 28606 | No | No | 7.0+ | 7.0+ | 7.0+ | 7.0+ | ? | 0.9+ | Ximian Galeon (Linux)
Romanian (Macintosh) | x-mac-romanian | 10010 | 6.0+ | ? | 6.0+ | 6.0+ | No | ? | 1.0+ | 0.9+ | Ximian Galeon (Linux)
Cyrillic Charsets
Cyrillic (ECMA) | iso-ir-111 | None yet | No | No | 6.0+ | 6.0+ | No | No | ? | 0.9+ | Ximian Galeon (Linux)
Cyrillic (ISO) | iso-8859-5, cyrillic | 28595 | 3.0+ | 5.1+ | 3.0+ | 3.0+ | 7.0+ | 7.0+ | 1.0+ | 0.9+ | Ximian Galeon (Linux), Alis Tango 3.0 (Win3.1/95/98 only)
Cyrillic (Windows) | windows-1251 | 1251 | 3.0+ | 5.1+ | 3.0+ | 3.0+ | 7.0+ | ? | ? | 0.9+ | Ximian Galeon (Linux), Alis Tango 3.0 (Win3.1/95/98 only)
Cyrillic OEM (IBM) | ibm855 | 855 | 6.0+ | ? | 6.0+ | 6.0+ | No | No | ? | 0.9+ | Ximian Galeon (Linux)
Russian Cyrillic (IBM) | ibm866 | 866 | 4.0+ | 5.1+ | 4.7+*, 6.0+ | 4.7+*, 6.0+ | 7.0+ | ? | ? | 0.9+ | Ximian Galeon (Linux)
Russian Cyrillic (KOI) | koi8-r | 20866 | 4.0+ | 5.1+ | 3.0+*, 6.0+ | 3.0+*, 6.0+ | 7.0+ | 7.0+ | ? | 0.9+ | Ximian Galeon (Linux)
Russian Cyrillic (Macintosh) | x-mac-cyrillic | 10007 | 5.0+ | 5.1+ | 6.0+ | 3.0+ | No | ? | 1.0+ | 0.9+ | Ximian Galeon (Linux)
Ukrainian Cyrillic (KOI) | koi8-u, koi8-ru | 21866 | 4.0+ | 5.1+ | 6.0+ | 6.0+ | 7.0+ | 7.0+ | ? | 0.9+ | Ximian Galeon (Linux)
Ukrainian Cyrillic (Macintosh) | x-mac-ukrainian | 10017 | 6.0+ | 5.1+ | 6.0+ | 6.0+ | No | ? | 1.0+ | 0.9+ | Ximian Galeon (Linux)
Georgian Charsets
Georgian (GeoSTD) | geostd8 | None yet | No | No | 7.0+* | ? | ? | ? | ? | 0.9+* | ?
Greek Charsets
Greek (IBM) | ibm737 | 737 | 5.0+ | 5.1+ | No | No | No | No | ? | No | ?
Greek (ISO) | iso-8859-7, greek | 28597 | 3.0+ | 5.1+ | 3.0+ | 3.0+ | 7.0+ | 7.0+ | 1.0+ | 0.9+ | Ximian Galeon (Linux)
Greek (Macintosh) | x-mac-greek | 10006 | 5.0+ | 5.1+ | 6.0+ | 3.0+ | No | ? | 1.0+ | 0.9+ | Ximian Galeon (Linux)
Greek (Windows) | windows-1253 | 1253 | 3.0+ | 5.1+ | 4.0+ | 4.0+ | 7.0+ | ? | ? | 0.9+ | Ximian Galeon (Linux)
Greek Modern (IBM) | ibm869 | 869 | 6.0+ | ? | No | No | No | No | ? | No | ?
Hebrew Charsets
Hebrew (IBM) | ibm862, dos-862 | 862 | 4.0+ | ? | 6.0+ | 6.0+ | No | No | ? | 0.9+ | Ximian Galeon (Linux)
Hebrew (ISO-Logical) | iso-8859-8-i, logical | 38598 | 4.0+ | 5.1+ | 6.0+ | 6.0+ | 7.0+ | 7.0+ | 1.0+ | 0.9+ | Ximian Galeon (Linux), Alis Tango 3.0 (Win3.1/95/98 only)
Hebrew (ISO-Visual) | iso-8859-8, visual | 28598 | 4.0+ | 5.1+ | 6.0+ | 6.0+ | 7.0+ | 7.0+ | 1.0+ | 0.9+ | Ximian Galeon (Linux), Alis Tango 3.0 (Win3.1/95/98 only)
Hebrew (Macintosh) | x-mac-hebrew | 10005 | 5.0+ | 5.1+ | 6.0+*, 7.0+ | 6.0+ | No | ? | 1.0+ | 0.9+ | Ximian Galeon (Linux)
Hebrew (Windows) | windows-1255 | 1255 | 4.0+ | 5.1+ | 6.0+ | 6.0+ | 7.0+ | ? | ? | 0.9+ | Ximian Galeon (Linux), Alis Tango 3.0 (Win3.1/95/98 only)
Indic Charsets
Assamese (ISCII) | x-iscii-as | 57006 | 5.0+ (2000+ only) | No | No | No | No | No | ? | No | ?
Bengali (ISCII) | x-iscii-be | 57003 | 5.0+ (2000+ only) | No | No | No | No | No | ? | No | ?
Devanagari (ISCII) | x-iscii-de | 57002 | 5.0+ (2000+ only) | No | No | No | No | No | ? | No | ?
Devanagari (Macintosh) | x-mac-devanagari | 100?? | No | No | 6.0+* | ? | No | No | ? | 0.9+* | Ximian Galeon (Linux)*
Gujarati (ISCII) | x-iscii-gu | 57010 | 5.0+ (2000+ only) | No | No | No | No | No | ? | No | ?
Gujarati (Macintosh) | x-mac-gujarati | 100?? | No | No | 6.0+* | ? | No | No | ? | 0.9+* | Ximian Galeon (Linux)*
Gurmukhi (ISCII) | x-iscii-pa | 57011 | 5.0+ (2000+ only) | No | No | No | No | No | ? | No | ?
Gurmukhi (Macintosh) | x-mac-gurmukhi | 100?? | No | No | 6.0+* | ? | No | No | ? | 0.9+* | Ximian Galeon (Linux)*
Kannada (ISCII) | x-iscii-ka | 57008 | 5.0+ (2000+ only) | No | No | No | No | No | ? | No | ?
Malayalam (ISCII) | x-iscii-ma | 57009 | 5.0+ (2000+ only) | No | No | No | No | No | ? | No | ?
Oriya (ISCII) | x-iscii-or | 57007 | 5.0+ (2000+ only) | No | No | No | No | No | ? | No | ?
Tamil (ISCII) | x-iscii-ta | 57004 | 5.0+ (2000+ only) | No | No | No | No | No | ? | No | ?
Telugu (ISCII) | x-iscii-te | 57005 | 5.0+ (2000+ only) | No | No | No | No | No | ? | No | ?
Japanese Charsets
Japanese (EUC) | euc-jp | 51932 | 4.0+ | 5.1+ | 3.0+ | 3.0+ | 7.0+ | 7.0+ | 1.0+ | 0.9+ | Ximian Galeon (Linux), Alis Tango 3.0 (Win3.1/95/98 only), NJStar Asian Explorer
Japanese (ISO/JIS) | iso-2022-jp, JIS_X0208-1983 | 50220 | 4.0+† | 5.1+ | 3.0+ | 3.0+ | 7.0+ | 7.0+ | 1.0+ | 0.9+ | Ximian Galeon (Linux), Alis Tango 3.0 (Win3.1/95/98 only), NJStar Asian Explorer
Japanese (ISO/JIS-2) | iso-2022-jp-2 | None yet | No | No | 6.0+^ | 6.0+^ | No | No | ? | 0.9+^ | ?
Japanese (JIS-Allow 1-byte Kana) | _ISO-2022-JP$ESC, csISO2022JP | 50221 | 4.0+† | 5.1+ | 6.0+ | 6.0+ | 7.0+^ | 7.0+^ | 1.0+ | 0.9+ | NJStar Asian Explorer
Japanese (JIS-Allow 1-byte Kana, SO/SI) | _ISO-2022-JP$SIO | 50222 | 4.0+† | 5.1+ | No | No | 7.0+^ | 7.0+^ | 1.0+ | No | NJStar Asian Explorer
Japanese (JIS-Extended) | JIS_X0212-1990 | 20932 | ** | ** | No | No | No | No | ? | No | ?
Japanese (Macintosh) | x-mac-japanese | 10001 | 5.0+ | 5.1+ | No | ? | No | ? | 1.0+ | No | ?
Japanese (ShiftJIS) | shift_jis | 932 | 4.0+ | 5.1+ | 3.0+ | 3.0+ | 7.0+ | 7.0+ | 1.0+ | 0.9+ | Ximian Galeon (Linux), Alis Tango 3.0 (Win3.1/95/98 only), NJStar Asian Explorer
Korean Charsets
Korean (EUC) | euc-kr | 51949 | 4.0+ | 5.1+ | 3.0+ | 3.0+ | 7.0+ | 7.0+ | ? | 0.9+ | Ximian Galeon (Linux), NJStar Asian Explorer
Korean (ISO) | iso-2022-kr | 50225 | 4.0+† | 5.1+ | 4.0+ (not 6.x) | 4.0+ (not 6.x) | No | No | ? | 0.9+ | Ximian Galeon (Linux), Alis Tango 3.0 (Win3.1/95/98 only), NJStar Asian Explorer
Korean (Johab) | johab, x-johab‡ | 1361 | 5.0+ | ? | 6.0+*, 7.0+ | 6.0+*, 7.0+ | No | No | ? | 0.9+ | Ximian Galeon (Linux)
Korean (Macintosh) | x-mac-korean | 10003 | 5.0+ | 5.1+ | No | ? | No | ? | 1.0+ | No | ?
Korean (UHC) | ks_c_5601-1987, ksc5601 | 949 | 4.0+ | 5.1+ | 3.0+ | 3.0+ | 7.0+ | 7.0+ | 1.0+ | 0.9+ | Ximian Galeon (Linux), Alis Tango 3.0 (Win3.1/95/98 only), NJStar Asian Explorer
North European/Nordic Charsets
Lappish (Sami) | windows-sami-2 | 1259? | No | No | No | No | 7.0+ | ? | ? | No | ?
Nordic (IBM) | ibm865 | 865 | 6.0+ | ? | No | No | No | No | ? | No | ?
North European/Nordic (Latin 6) | iso-8859-10, latin6 | 28600 | No | No | 6.0+ | 6.0+ | 7.0+ | 7.0+ | ? | 0.9+ | Ximian Galeon (Linux)
Simplified Chinese Charsets
Simplified Chinese (EUC) | euc-cn | 51936 | 5.0+ | 5.1+ | No | No | 7.0+ | 7.0+ | ? | No | NJStar Asian Explorer
Simplified Chinese (GB18030) | gb18030 | 54936 | 6.0+ | ? | 6.0+ | 6.0+ | 7.0+ | 7.0+ | ? | 0.9+ | Ximian Galeon (Linux)
Simplified Chinese (GB2312) | gb2312 | 20936 | 5.0+ | ? | 3.0+ | 3.0+ | 7.0+ | 7.0+ | ? | 0.9+ | Ximian Galeon (Linux), Alis Tango 3.0 (Win3.1/95/98 only), NJStar Asian Explorer
Simplified Chinese (GBK) | gbk | 936 | 4.0+ | 5.1+ | 6.0+ | 6.0+ | 7.0+ | 7.0+ | 1.0+ | 0.9+ | Ximian Galeon (Linux), Alis Tango 3.0 (Win3.1/95/98 only), NJStar Asian Explorer
Simplified Chinese (HZ) | hz-gb-2312 | 52936 | 4.0+¶ | 5.1+ | 4.7+¶ | 4.7+ | 7.0+ | 7.0+ | 1.0+ | 0.9+ | Ximian Galeon (Linux), Alis Tango 3.0 (Win3.1/95/98 only), NJStar Asian Explorer
Simplified Chinese (ISO) | iso-2022-cn | 50227 | **† | ** | 7.0+ | 7.0+ | 7.0+ | 7.0+ | ? | 1.0+ | NJStar Asian Explorer
Simplified Chinese (ISO-Extended) | iso-2022-cn-ext | None yet | No | No | 7.0+^ | 7.0+^ | 7.0+^ | 7.0+^ | ? | 1.0+^ | ?
Simplified Chinese (Macintosh) | x-mac-chinesesimp | 10008 | 5.0+ | 5.1+ | No | ? | No | ? | 1.0+ | No | ?
South European/Turkish Charsets
South European (Latin 3) | iso-8859-3, latin3 | 28593 | 4.0+ | 5.1+ | 6.0+ | 6.0+ | 7.0+ | 7.0+ | ? | 0.9+ | Ximian Galeon (Linux)
Turkish (Latin 5-IBM) | ibm857 | 857 | 5.0+ | 5.1+ | 6.0+ | 6.0+ | No | No | ? | 0.9+ | Ximian Galeon (Linux)
Turkish (Latin 5-ISO) | iso-8859-9, latin5 | 28599 | 4.0+ | 5.1+ | 3.0+ | 3.0+ | 7.0+ | 7.0+ | 1.0+ | 0.9+ | Ximian Galeon (Linux)
Turkish (Latin 5-Macintosh) | x-mac-turkish | 10081 | 5.0+ | 5.1+ | 6.0+ | 3.0+ | No | ? | 1.0+ | 0.9+ | Ximian Galeon (Linux)
Turkish (Latin 5-Windows) | windows-1254 | 1254 | 4.0+ | 5.1+ | 3.0+ | 4.7+ | 7.0+ | 7.0+ | ? | 0.9+ | Ximian Galeon (Linux)
Thai Charsets
Thai (ISO) | iso-8859-11 | 28601 | 4.0+ | 5.1+ | No | No | 7.0+ | 7.0+ | ? | No | ?
Thai (Macintosh) | x-mac-thai | 10021 | 6.0+ | 5.1+ | No | ? | No | ? | 1.0+ | No | ?
Thai (TIS/Windows) | tis-620, windows-874 | 874 | 4.0+ | 5.1+ | 6.0+ | 6.0+ | 7.0+ | 7.0+ | 1.0+ | 0.9+ | Ximian Galeon (Linux)
Traditional Chinese Charsets
Traditional Chinese (Big5) | big5 | 950 | 4.0+ | 5.1+ | 3.0+ | 3.0+ | 7.0+ | 7.0+ | 1.0+ | 0.9+ | Ximian Galeon (Linux), Alis Tango 3.0 (Win3.1/95/98 only), NJStar Asian Explorer
Traditional Chinese (EUC) | euc-tw, x-euc-tw‡ | 51950 | ** | ** | 3.0+*, 6.0+ | 3.0+*, 6.0+ | 7.0+ | 7.0+ | ? | 0.9+ | Ximian Galeon (Linux), NJStar Asian Explorer
Traditional Chinese (HKSCS) | big5-hkscs | 54950 | ** | ** | 6.0+ | 6.0+ | 7.0+* | 7.0+* | ? | 0.9+ | Ximian Galeon (Linux)
Traditional Chinese (ISO) | iso-2022-tw | 50229 | **† | ** | No | No | No | No | ? | No | ?
Traditional Chinese (Macintosh) | x-mac-chinesetrad | 10002 | 5.0+ | 5.1+ | No | ? | No | ? | 1.0+ | No | ?
Unicode Charsets (Further information about Unicode is available at its official site.)
United States Charsets
United States (ASCII) | iso-646-us, us-ascii | 20127 | 3.0+• | 3.0+ | 3.0+ | 3.0+ | 7.0+ | 7.0+ | 1.0+ | 0.9+ | Ximian Galeon (Linux), Alis Tango 3.0 (Win3.1/95/98 only)
United States (OEM) | ibm437 | 437 | 5.0+ | ? | No | No | No | No | ? | No | ?
Vietnamese Charsets
Vietnamese (TCVN) | x-viet-tcvn5712 | None yet | No | No | 6.0+ | 6.0+ | No | No | ? | 0.9+ | Ximian Galeon (Linux)
Vietnamese (VISCII) | viscii | None yet | No | No | 6.0+ | 6.0+ | 7.0+ | 7.0+ | ? | 0.9+ | Ximian Galeon (Linux)
Vietnamese (VPS) | x-viet-vps, x-vps‡ | None yet | No | No | 6.0+ | 6.0+ | 7.0+ | 7.0+ | ? | 0.9+ | Ximian Galeon (Linux)
Vietnamese (Windows) | windows-1258 | 1258 | 4.0+ | 5.1+ | 6.0+ | 6.0+ | 7.0+ | ? | ? | 0.9+ | Ximian Galeon (Linux)
* This browser claims to support this charset, but in practice the support is absent (or incomplete), at least on Windows versions earlier than XP.
** East Asian versions of Windows 2000, XP, & Server 2003 claim to support this charset in Internet Explorer 5.0 & higher, but I haven’t confirmed this claim.
^ This browser has no official support for this charset, although it may interpret certain escape sequences or 8-bit bytes.
† Internet Explorer can also interpret 8-bit bytes from ISO-2022 charsets as 16-bit double-byte characters in terms of Big5, GBK, Shift-JIS, or UHC, at least on Windows.
‡ Different browsers may differ in using MIME names to associate this charset; some MIME names will only work on some browsers, whereas other names will only work on other browsers. (For example, Internet Explorer for Windows only interprets Macintosh as Western Mac, whereas Netscape & Mozilla only interpret X-Mac-Roman for the same purpose instead.)
• Internet Explorer 5.0+ can also interpret 8-bit bytes from US-ASCII as repetitions of 7-bit bytes, at least on Windows.
¶ Internet Explorer & Netscape/Mozilla can also interpret 8-bit bytes from HZ-GB-2312 as 16-bit double-byte characters in terms of GBK.

Browsers listed here

Run by: Leroy Vargas. For feedback related to charsets or this website, Leroy can be contacted through his Lycos address.