merged tcl 8.1 branch back into the main trunk

author: stanton <stanton> 1999-04-16 00:46:29 (GMT)
committer: stanton <stanton> 1999-04-16 00:46:29 (GMT)
commit: 97464e6cba8eb0008cf2727c15718671992b913f
tree: ce9959f2747257d98d52ec8d18bf3b0de99b9535 /tools/encoding/cjk.inf
parent: a8c96ddb94d1483a9de5e340b740cb74ef6cafa7
1 files changed, 4467 insertions, 0 deletions
diff --git a/tools/encoding/cjk.inf b/tools/encoding/cjk.inf
new file mode 100644
index 0000000..9fbe527
--- /dev/null
+++ b/tools/encoding/cjk.inf
@@ -0,0 +1,4467 @@
+--- BEGIN (CJK.INF VERSION 2.1 07/12/96) 185553 BYTES ---
+CJK.INF Version 2.1 (July 12, 1996)
+
+Copyright (C) 1995-1996 Ken Lunde. All Rights Reserved.
+
+CJK is a registered trademark and service mark of The Research
+  Libraries Group, Inc.
+
+Online Companion to "Understanding Japanese Information Processing"
+- ENGLISH: 1993, O'Reilly & Associates, Inc., ISBN 1-56592-043-0
+- JAPANESE: 1995, SOFTBANK Corporation, ISBN 4-89052-708-7
+
+
+	This online document provides information on CJK (that is,
+Chinese, Japanese, and Korean) character set standards and encoding
+systems. In short, it provides detailed information on how CJK text is
+handled electronically. I am happy to share this information with
+others, and I would appreciate any comments/feedback on its content.
+The current version (master copy) of this document is maintained at:
+
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf
+
+This file may also be obtained by contacting me directly using one of
+the e-mail addresses listed in the CONTACT INFORMATION section.
+
+
+TABLE OF CONTENTS
+
+  VERSION HISTORY
+  RESTRICTIONS
+  CONTACT INFORMATION
+  WHAT HAPPENED TO JAPAN.INF?
+  DISCLAIMER
+  CONVENTIONS
+  INTRODUCTION
+  PART 1: WHAT'S UP WITH UJIP?
+  PART 2: CJK CHARACTER SET STANDARDS
+    2.1: JAPANESE
+      2.1.1: JIS X 0201-1976
+      2.1.2: JIS X 0208-1990
+      2.1.3: JIS X 0212-1990
+      2.1.4: JIS X 0221-1995
+      2.1.5: JIS X 0213-199X
+      2.1.6: OBSOLETE STANDARDS
+    2.2: CHINESE (PRC)
+      2.2.1: GB 1988-89
+      2.2.2: GB 2312-80
+      2.2.3: GB 6345.1-86
+      2.2.4: GB 7589-87
+      2.2.5: GB 7590-87
+      2.2.6: GB 8565.2-88
+      2.2.7: GB/T 12345-90
+      2.2.8: GB/T 13131-9X
+      2.2.9: GB/T 13132-9X
+      2.2.10: GB 13000.1-93
+      2.2.11: ISO-IR-165:1992
+      2.2.12: OBSOLETE STANDARDS
+    2.3: CHINESE (TAIWAN)
+      2.3.1: BIG FIVE
+      2.3.2: CNS 11643-1992
+      2.3.3: CNS 5205
+      2.3.4: OBSOLETE STANDARDS
+    2.4: KOREAN
+      2.4.1: KS C 5636-1993
+      2.4.2: KS C 5601-1992
+      2.4.3: KS C 5657-1991
+      2.4.4: GB 12052-89
+      2.4.5: KS C 5700-1995
+      2.4.6: OBSOLETE STANDARDS
+    2.5: CJK
+      2.5.1: ISO 10646-1:1993
+      2.5.2: CCCII
+      2.5.3: ANSI Z39.64-1989
+    2.6: OTHER
+      2.6.1: GB 8045-87
+      2.6.2: TCVN-5773:1993
+  PART 3: CJK ENCODING SYSTEMS
+    3.1: 7-BIT ISO 2022 ENCODING
+      3.1.1: CODE SPACE
+      3.1.2: ISO-REGISTERED ESCAPE SEQUENCES
+      3.1.3: ISO-2022-JP AND ISO-2022-JP-2
+      3.1.4: ISO-2022-KR
+      3.1.5: ISO-2022-CN AND ISO-2022-CN-EXT
+    3.2: EUC ENCODING
+      3.2.1: JAPANESE REPRESENTATION
+      3.2.2: CHINESE (PRC) REPRESENTATION
+      3.2.3: CHINESE (TAIWAN) REPRESENTATION
+      3.2.4: KOREAN REPRESENTATION
+    3.3: LOCALE-SPECIFIC ENCODINGS
+      3.3.1: SHIFT-JIS
+      3.3.2: HZ (HZ-GB-2312)
+      3.3.3: zW
+      3.3.4: BIG FIVE
+      3.3.5: JOHAB
+      3.3.6: N-BYTE HANGUL
+      3.3.7: UCS-2
+      3.3.8: UCS-4
+      3.3.9: UTF-7
+      3.3.10: UTF-8
+      3.3.11: UTF-16
+      3.3.12: ANSI Z39.64-1989
+      3.3.13: BASE64
+      3.3.14: IBM DBCS-HOST
+      3.3.15: IBM DBCS-PC
+      3.3.16: IBM DBCS-/TBCS-EUC
+      3.3.17: UNIFIED HANGUL CODE
+      3.3.18: TRON CODE
+      3.3.19: GBK
+    3.4: CJK CODE PAGES
+  PART 4: CJK CHARACTER SET COMPATIBILITY ISSUES
+    4.1: JAPANESE
+    4.2: CHINESE (PRC)
+    4.3: CHINESE (TAIWAN)
+    4.4: KOREAN
+    4.5: ISO 10646-1:1993
+    4.6: UNICODE
+    4.7: CODE CONVERSION TIPS
+  PART 5: CJK-CAPABLE OPERATING SYSTEMS
+    5.1: MS-DOS
+    5.2: WINDOWS
+    5.3: MACINTOSH
+    5.4: UNIX AND X WINDOWS
+    5.5: OTHERS
+  PART 6: CJK TEXT AND INTERNET SERVICES
+    6.1: ELECTRONIC MAIL
+    6.2: USENET NEWS
+    6.3: GOPHER
+    6.4: WORLD-WIDE WEB
+    6.5: FILE TRANSFER TIPS
+  PART 7: CJK TEXT HANDLING SOFTWARE
+    7.1: MULE
+    7.2: CNPRINT
+    7.3: MASS
+    7.4: ADOBE TYPE MANAGER (ATM)
+    7.5: MACINTOSH SOFTWARE
+    7.6: MACBLUE TELNET
+    7.7: CXTERM
+    7.8: UW-DBM
+    7.9: POSTSCRIPT
+    7.10: NJWIN
+  PART 8: CJK PROGRAMMING ISSUES
+    8.1: C AND C++
+    8.2: PERL
+    8.3: JAVA
+  A FINAL NOTE
+  ACKNOWLEDGMENTS
+  APPENDIX A: INFORMATION SOURCES
+    A.1: USENET NEWSGROUPS AND MAILING LISTS
+      A.1.1: USENET NEWSGROUPS
+      A.1.2: MAILING LISTS
+    A.2: INTERNET RESOURCES
+      A.2.1: USEFUL FTP SITES
+      A.2.2: USEFUL TELNET SITES
+      A.2.3: USEFUL GOPHER SITES
+      A.2.4: USEFUL WWW SITES
+      A.2.5: USEFUL MAIL SERVERS
+    A.3: OTHER RESOURCES
+      A.3.1: BOOKS
+      A.3.2: MAGAZINES
+      A.3.3: JOURNALS
+      A.3.4: RFCs
+      A.3.5: FAQs
+
+
+VERSION HISTORY
+
+	The following is a complete listing of the earlier versions of
+this document along with their release dates and sizes (in bytes):
+
+  Document   Version  Release Date  Size
+  ^^^^^^^^   ^^^^^^^  ^^^^^^^^^^^^  ^^^^
+  JAPAN.INF  1.0      Unknown       Unknown
+  JAPAN.INF  1.1      08/19/91      101,784
+  JAPAN.INF  1.2      03/20/92      166,929 (JIS) or 165,639 (Shift-JIS/EUC)
+  CJK.INF    1.0      06/09/95      103,985
+  CJK.INF    1.1      06/12/95      112,771
+  CJK.INF    1.2      06/14/95      125,275
+  CJK.INF    1.3      06/16/95      130,069
+  CJK.INF    1.4      06/19/95      142,543
+  CJK.INF    1.5      06/22/95      146,064
+  CJK.INF    1.6      06/29/95      150,882
+  CJK.INF    1.7      08/15/95      153,772
+  CJK.INF    1.8      09/11/95      157,295
+  CJK.INF    1.9      12/18/95      170,698
+  CJK.INF    2.0      03/12/96      175,973
+
+With the release of this version, all of the above are now considered
+obsolete. Also, note the three-year gap between the last installment
+of JAPAN.INF and the first installment of CJK.INF -- I was writing
+UJIP and my PhD dissertation during those three years. Ah, so much for
+excuses...
+
+
+RESTRICTIONS
+
+	This document is provided free-of-charge to *anyone*, but no
+person or company is permitted to modify, sell, or otherwise
+distribute it for profit or other purposes. This document may be
+bundled with commercial products only with the prior consent of the
+author, and provided that it is not modified in any way whatsoever.
+The point here is that I worked long and hard on this document so that
+lots of fine folks and companies can benefit from its contents -- not
+profit from it.
+
+
+CONTACT INFORMATION
+
+	I would enjoy hearing from readers of this document, even if
+it is just to say "hello" or whatever. I can be contacted as follows:
+
+  Ken Lunde
+  Adobe Systems Incorporated
+  1585 Charleston Road
+  P.O. Box 7900
+  Mountain View, CA 94039-7900 USA
+  415-962-3866 (office phone)
+  415-960-0886 (facsimile)
+  lunde@adobe.com (preferred)
+  lunde@ora.com or ujip@ora.com
+  WWW Home Page: http://jasper.ora.com/lunde/
+
+If you wonder what I do for my day job, read on.
+	I have been working for Adobe Systems for over four years now
+(before that I was a graduate student at UW-Madison), and my current
+position is Project Manager, CJK Type Development.
+
+
+WHAT HAPPENED TO JAPAN.INF?
+
+	Put bluntly, JAPAN.INF died. It first evolved into my first
+book entitled "Understanding Japanese Information Processing" (this
+book is now in its second printing, and the Japanese translation was
+just published). After my book came out, I did attempt to update
+JAPAN.INF, but the effort felt a bit futile. I decided that something
+fresh was necessary.
+	JAPAN.INF also evolved into this document, which breaks the
+Japanese barrier by providing similar information on Chinese and
+Korean character sets and encodings. It fills the Chinese and Korean
+gap, so to speak. My specialty (and hobby, believe it or not) is the
+field of CJK character sets and encoding systems, so I felt that
+shifting this document more towards those lines was an appropriate use
+of my (copious) free time (I wish there were more than 24 hours in a
+day!). Besides, this document now becomes useful to a much broader
+audience.
+
+
+DISCLAIMER
+
+	Ah yes, the ever popular disclaimer! Here's mine. Although I
+list my address here at Adobe Systems Incorporated for contact
+purposes, Adobe Systems does not endorse this document, which I
+created and have continued (and will continue) to update on a regular
+basis (uh, yeah, I promise this time!). This document is a personal
+endeavor to inform people of how CJK text can be handled on a variety
+of platforms.
+
+
+CONVENTIONS
+
+	The notation that is used for detailing Internet resource
+information, such as the Internet protocol type, site name, path, and
+file follows the URL (Uniform Resource Locator) notation, namely:
+
+  protocol://site-name/path/file
+
+An example URL is as follows:
+
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/00README
+
+The protocol is FTP, the site-name is ftp.ora.com, the path is pub/
+examples/nutshell/ujip/, and the file is 00README. Also note that this
+same notation is used for invoking FTP on WWW (World Wide Web)
+browsing software, such as Mosaic, Netscape, or Lynx.
+	Note that most references to HTTP documents use the four-
+letter file extension ".html". However, some HTTP documents are on
+file systems that support only three-letter file extensions (can you
+say "MS-DOS"?), so you may encounter just ".htm". This is just to let
+you know that what you see is not a typo.
+	References to my book "Understanding Japanese Information
+Processing" are (affectionately) abbreviated as UJIP. These references
+also apply to the Japanese translation (UJIP-J).
+	Hexadecimal values are prefixed with 0x, and every two
+hexadecimal digits represent a one-byte value. Other values can be
+assumed to be in decimal notation.
+	Chinese characters are referred to as kanji (Japanese), hanzi
+(Chinese), or hanja (Korean), depending on context.
+	References to ISO 10646-1:1993 also refer to Unicode
+(usually). I have done this so that I do not have to repeat "Unicode"
+in the same context as ISO 10646-1:1993. There are times, however,
+when I need to distinguish ISO 10646-1:1993 from Unicode.
+
+
+INTRODUCTION
+
+	Electronic mail (e-mail), just one of the many Internet
+resources, has become a very efficient means of communicating both
+locally and world-wide. While it is very simple to send text which
+uses only the 94 printable ASCII characters, character sets that
+contain more than these ASCII characters pose special problems.
+	This document is primarily concerned with CJK character set
+and encoding issues. Much of this sort of information is not easily
+obtained. This represents one person's attempt at making such
+information more widely available.
+
+
+PART 1: WHAT'S UP WITH UJIP?
+
+	UJIP (First Edition) was published in September 1993 by
+O'Reilly & Associates, Incorporated. The second printing (*not* the
+Second Edition) was subsequently published in March 1994. The page
+count for both printings is unchanged at 470.
+	The following files contain the latest information about
+changes (additions and corrections) made to UJIP and UJIP-J for
+various printings, both for those that have taken place (such as for
+the second printing of the English edition) and for those that are
+planned (the first digit is the edition, and the second is the
+printing):
+
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/errata/ujip-errata-1-2.txt
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/errata/ujip-errata-1-3.txt
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/errata/ujip-j-errata-1-2.txt
+
+I *highly* recommend that all readers of UJIP obtain these errata
+files. Those without FTP access can request copies directly from me.
+	The Japanese translation of UJIP (UJIP-J), co-published by
+O'Reilly & Associates, Incorporated and SOFTBANK Corporation, was just
+released. The translation was done by my good friend Jack Halpern,
+along with one of his colleagues, Takeo Suzuki. The Japanese edition
+incorporates corrections and updates not yet found in the English
+edition. The page count is 535.
+	Late-breaking news! I am currently working on UJIP Second
+Edition (to be retitled as "Understanding CJK Information Processing"
+and abbreviated UCJKIP). If all goes well, it should be available by
+January 1997, and will be well over 700 pages. If there was something
+you wanted to see in UJIP, now's your chance to send me a request...
+
+
+PART 2: CJK CHARACTER SET STANDARDS
+
+	These sections describe the character sets used in Japan,
+China (PRC and Taiwan), and Korea. Exact numbers of characters are
+provided for each character set standard (when known), as well as
+tidbits of information not otherwise available. This provides the
+basic foundations for understanding how CJK scripts are handled on
+computer systems.
+	The two basic types of characters enumerated by CJK character
+set standards are Chinese characters (kanji, hanzi, or hanja), which
+number in the thousands (and, in some cases, tens of thousands), and
+characters other than Chinese characters (symbols, numerals, kana,
+hangul, alphabets, and so on), which usually number in the hundreds
+(there are thousands of pre-combined hangul, though).
+	If you happen to be running X Windows, it is very easy to
+display these CJK character sets (if a bitmapped font for the
+character set exists, that is). Here is what I usually do:
+
+o Obtain a BDF (Bitmap Distribution Format) font for the target
+  character set. Try the following URLs for starters:
+
+  ftp://cair-archive.kaist.ac.kr/pub/hangul/fonts/
+  ftp://etlport.etl.go.jp/pub/mule/fonts/
+  ftp://ftp.ifcss.org/pub/software/fonts/{big5,cns,gb,misc,unicode}/bdf/
+  ftp://ftp.kuis.kyoto-u.ac.jp/misc/fonts/jisksp-fonts/
+  ftp://ftp.net.tsinghua.edu.cn/pub/Chinese/fonts/
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/unix/
+  ftp://ftp.technet.sg/pub/chinese/fonts/
+  http://ccic.ifcss.org/www/pub/software/fonts/
+
+  BDF files usually have the string "bdf" somewhere in their file
+  name, typically at the end. If the file is compressed (a name ending
+  in .gz or .Z is a good indication), decompress it. BDF files are
+  text files.
+
+o Convert the BDF file to SNF (Server Natural Format) or PCF (Portable
+  Compiled Format) using the programs "bdftosnf" or "bdftopcf,"
+  respectively. Example command lines are as follows:
+
+  % bdftopcf jiskan16-1990.bdf > k16-90.pcf
+  % bdftosnf jiskan16-1990.bdf > k16-90.snf
+
+  SNF files (and the "bdftosnf" program) are used on X11R4 and
+  earlier, and PCF files (and the "bdftopcf" program) are used on
+  X11R5 and later.
+
+o Copy the SNF or PCF file to a directory in the font search path (or
+  make a new path). Supposing I made a new directory called "fonts" in
+  my home directory, I then run "mkfontdir" on the directory
+  containing the SNF or PCF files as follows:
+
+  % mkfontdir ~/fonts
+
+  This creates a fonts.dir file in ~/fonts. I can now add this
+  directory to my font search path with the following command:
+
+  % xset +fp ~/fonts
+
+o The command "xfd" (X Font Displayer), given the "-fn" switch followed
+  by a font name, opens a window that displays all the characters of
+  the font. In the case of two-byte (CJK) fonts, one row is displayed
+  at a time. The following is an example command line:
+
+  % xfd -fn -misc-fixed-medium-r-normal--16-150-75-75-c-160-jisx0208.1990-0
+
+  You can create a "fonts.alias" file in the same directory as the
+  "fonts.dir" file in order to shorten the name when accessing the
+  font. The alias "k16-90" could be used instead if the content of the
+  fonts.alias file is as follows:
+
+  k16-90  -misc-fixed-medium-r-normal--16-150-75-75-c-160-jisx0208.1990-0
+
+  Don't forget to execute the following command in order to make the
+  X server reread its font path and notice the new alias:
+
+  % xset fp rehash
+
+  Now you can use a simpler command line for "xfd" as follows:
+
+  % xfd -fn k16-90
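+
+	Because BDF files are plain text, you can also see which code
+points a font covers without running X at all. The Python sketch
+below (bdf_rows is a hypothetical helper, not part of any X
+distribution) collects the ENCODING value of each glyph and groups the
+results by row, much as xfd shows a two-byte font one row at a time.
+It assumes, as in a jisx0208.1990-0 font such as jiskan16, that each
+ENCODING value is the two-byte code written as one decimal integer
+(8481 is 0x2121). It converts each value to row-cell form by
+subtracting 0x20 from each byte. The sample font is a trimmed,
+made-up fragment.

```python
from collections import defaultdict


def bdf_rows(bdf_text):
    """Map each row (1-94) to the cells (1-94) present in a BDF font.

    Assumption: ENCODING values are two-byte GL codes (0x2121-0x7E7E)
    written as decimal integers, as in jisx0208.1990-0 fonts.
    """
    rows = defaultdict(list)
    for line in bdf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ENCODING":
            code = int(fields[1])
            if code < 0:  # -1 marks a glyph outside the font's encoding
                continue
            row = (code >> 8) - 0x20     # first byte 0x21 -> row 1
            cell = (code & 0xFF) - 0x20  # second byte 0x21 -> cell 1
            rows[row].append(cell)
    return dict(rows)


# Trimmed, hypothetical font: a real BDF file also has SIZE, BITMAP, etc.
sample = """STARTFONT 2.1
FONT -misc-fixed-medium-r-normal--16-150-75-75-c-160-jisx0208.1990-0
CHARS 3
STARTCHAR 2121
ENCODING 8481
ENDCHAR
STARTCHAR 3021
ENCODING 12321
ENDCHAR
STARTCHAR 3022
ENCODING 12322
ENDCHAR
ENDFONT
"""

for row, cells in sorted(bdf_rows(sample).items()):
    print(f"row {row}: {len(cells)} glyph(s), cells {cells}")
```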
+
+	The "X Window System User's Guide" (Volume 3 of the X Window
+System series by O'Reilly & Associates, Inc.) provides detailed
+information on managing fonts under X Windows (pp 123-160). The
+article entitled "The X Administrator: Font Formats and Utilities" (pp
+14-34 in "The X Resource," Issue 2) describes the BDF, SNF, and PCF
+formats in great detail.
+	There is another bitmap format called HBF (Hanzi Bitmap
+Format), which is similar to BDF, but optimized for fixed-width
+(monospaced) fonts. It is described in the article entitled "The HBF
+Font Format: Optimizing Fixed-pitch Font Support" (pp 113-123 in "The
+X Resource," Issue 10), and also at the following URL:
+
+  ftp://ftp.ifcss.org/pub/software/fonts/hbf-discussion/
+
+HBF fonts can be found at the following URL:
+
+  ftp://ftp.ifcss.org/pub/software/fonts/{big5,cns,gb,misc,unicode}/hbf/
+
+	Lastly, you may wish to check out my newly-developed CJK
+Character Set Server, which generates various CJK character sets with
+proper encoding applied. It is written in Perl, and accessed through
+an HTML form. This server can be considered an upgrade to my JChar
+tool (written in C). The URL is:
+
+  http://jasper.ora.com/lunde/cjk-char.html
+
+
+2.1: JAPANESE
+
+	All (national) character set standards that originate in Japan
+have names that begin with the three letters JIS. JIS is short for
+"Japanese Industrial Standard." However, it is JSA (Japanese Standards
+Association) that publishes the corresponding manuals. Chapter 3 and
+Appendixes H and J of UJIP provide more detailed information on
+Japanese character set standards.
+
+
+2.1.1: JIS X 0201-1976
+
+	JIS X 0201-1976 (formerly JIS C 6220-1969; reaffirmed in 1989;
+and its revision [with no character set changes] is currently under
+public review) enumerates two sets of characters: JIS-Roman and
+half-width katakana.
+	JIS-Roman is the Japanese equivalent of the ASCII character
+set, namely 128 characters consisting of the following:
+
+o 10 numerals
+o 52 uppercase and lowercase characters of the Latin alphabet
+o 32 symbols (punctuation and so on)
+o 34 non-printing characters (white space and control characters)
+
+The term "white space" refers to characters that occupy space, but
+have no appearance, such as tabs, spaces, and termination characters
+(line feed, carriage return, and form feed).
+	So, how are JIS-Roman and ASCII different? The following
+three codes are (usually) different:
+
+  Code   ASCII        JIS-Roman
+  ^^^^   ^^^^^        ^^^^^^^^^
+  0x5C   backslash    yen symbol
+  0x7C   broken bar   bar
+  0x7E   tilde        overbar
+
+	Half-width katakana consists of 63 characters that provide a
+minimal set of characters necessary for expressing Japanese. The
+shapes are compressed, and visually occupy a space half that of
+*normal* Japanese characters.
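+	The differences above can be made concrete with a minimal
+Python decoder for JIS X 0201 text in its common 8-bit form. In that
+form, JIS-Roman occupies 0x00-0x7F and the 63 half-width katakana
+occupy 0xA1-0xDF. The Unicode targets are choices made for this
+sketch (decode_jis_x0201 is a hypothetical name): 0x5C becomes U+00A5
+(YEN SIGN), 0x7E becomes U+203E (OVERLINE), 0x7C stays U+007C, and
+the katakana map in order onto U+FF61-U+FF9F.

```python
def decode_jis_x0201(data: bytes) -> str:
    """Decode 8-bit JIS X 0201 (JIS-Roman plus half-width katakana).

    Mapping choices (assumptions of this sketch): 0x5C -> U+00A5,
    0x7E -> U+203E, other 0x00-0x7F as ASCII, 0xA1-0xDF -> U+FF61-U+FF9F.
    """
    out = []
    for b in data:
        if b == 0x5C:
            out.append("\u00a5")                  # yen symbol
        elif b == 0x7E:
            out.append("\u203e")                  # overbar
        elif b < 0x80:
            out.append(chr(b))                    # same as ASCII
        elif 0xA1 <= b <= 0xDF:
            out.append(chr(0xFF61 + (b - 0xA1)))  # 63 half-width katakana
        else:
            raise ValueError(f"byte 0x{b:02X} is not handled by this sketch")
    return "".join(out)


print([hex(ord(ch)) for ch in decode_jis_x0201(b"\x5c1\xb1")])
# ['0xa5', '0x31', '0xff71']
```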
+
+
+2.1.2: JIS X 0208-1990
+
+	This basic Japanese character set standard enumerates 6,879
+characters, 6,355 of which are kanji separated into two levels. Kanji
+in the first level are arranged by (most frequent) reading, and those
+in the second level are arranged by radical then total number of
+(remaining) strokes.
+
+o Row 1: 94 symbols
+o Row 2: 53 symbols
+o Row 3: 10 numerals and 52 uppercase and lowercase Latin alphabet
+o Row 4: 83 hiragana
+o Row 5: 86 katakana
+o Row 6: 48 uppercase and lowercase Greek alphabet
+o Row 7: 66 uppercase and lowercase Cyrillic (Russian) alphabet
+o Row 8: 32 line-drawing elements
+o Rows 16 through 47: 2,965 kanji (JIS Level 1 Kanji; last is 47-51)
+o Rows 48 through 84: 3,390 kanji (JIS Level 2 Kanji; last is 84-06)
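+
+	The "last is" figures make the level sizes easy to verify. If
+every row of a level is full (94 cells) except its last, a level that
+runs from row R1 through row R2 and ends at cell C holds
+(R2 - R1) * 94 + C characters. A quick Python check (level_size is
+an illustrative name) reproduces the totals above:

```python
CELLS_PER_ROW = 94


def level_size(first_row, last_row, last_cell):
    """Characters in rows first_row..last_row, assuming every row before
    last_row is full and last_row ends at last_cell (no gaps)."""
    return (last_row - first_row) * CELLS_PER_ROW + last_cell


level1 = level_size(16, 47, 51)  # JIS Level 1 Kanji, last is 47-51
level2 = level_size(48, 84, 6)   # JIS Level 2 Kanji, last is 84-06
print(level1, level2, level1 + level2)  # 2965 3390 6355
```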
+
+Appendix B of UJIP provides a complete illustration of the JIS X
+0208-1990 character set standard by KUTEN (row-cell) code. Appendix G
+(pp 294-317) of "Developing International Software for Windows 95 and
+Windows NT" by Nadine Kano illustrates the JIS X 0208-1990 character
+set standard plus the Microsoft extensions by Shift-JIS code
+(Microsoft calls this Code Page 932).
+	Earlier versions of this standard were dated 1978 (JIS C
+6226-1978) and 1983 (JIS X 0208-1983, formerly JIS C 6226-1983).
+	JIS X 0208 went through a revision (from November 1995 until
+February 1996), and is slated for publication sometime in 1996 (to
+become JIS X 0208-1996). More information on this revision is
+available at the following URL:
+
+  ftp://ftp.tiu.ac.jp/jis/jisx0208/
+
+
+2.1.3: JIS X 0212-1990
+
+	This supplemental Japanese character set standard enumerates
+6,067 characters, 5,801 of which are kanji ordered by radical then
+total number of (remaining) strokes. All 5,801 kanji are unique when
+compared to those in JIS X 0208-1990 (see Section 2.1.2). The
+remaining 266 characters are categorized as non-kanji.
+
+o Row 2: 21 diacritics and symbols
+o Row 6: 21 Greek characters with diacritics
+o Row 7: 26 Eastern European characters
+o Rows 9 through 11: 198 alphabetic characters
+o Rows 16 through 77: 5,801 kanji (last is 77-67)
+
+Appendix C of UJIP provides a complete illustration of the JIS X
+0212-1990 character set standard by KUTEN (row-cell) code.
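+	The kanji total follows from the row listing. Rows 16 through
+76 hold 94 kanji each and row 77 holds 67, so 61 * 94 + 67 = 5,801.
+Likewise, 21 + 21 + 26 + 198 = 266 non-kanji. The claim that these
+kanji are unique relative to JIS X 0208-1990 can also be spot-checked
+in Python. In EUC-JP, a JIS X 0212-1990 character takes three bytes:
+0x8F followed by the row and the cell, each plus 0xA0 (see Section
+3.2.1). The sketch below (jisx0212_to_euc_jp is a hypothetical name)
+decodes 16-01, the first JIS X 0212-1990 kanji, with Python's euc_jp
+codec. It then shows that Python's shift_jis codec, which covers only
+JIS X 0208, cannot encode it.

```python
def jisx0212_to_euc_jp(row, cell):
    """EUC-JP code set 3: 0x8F, then row + 0xA0 and cell + 0xA0."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in 1..94")
    return bytes([0x8F, row + 0xA0, cell + 0xA0])


# Count checks from the row listing: rows 16-76 full, row 77 ends at 67.
assert (77 - 16) * 94 + 67 == 5801
assert 21 + 21 + 26 + 198 == 266

kanji = jisx0212_to_euc_jp(16, 1).decode("euc_jp")  # first JIS X 0212 kanji
print(hex(ord(kanji)))

try:
    kanji.encode("shift_jis")  # Python's shift_jis covers JIS X 0208 only
    print("also in JIS X 0208")
except UnicodeEncodeError:
    print("not in JIS X 0208")
```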
+	The only commercial operating system that provides JIS X
+0212-1990 support is BTRON by Personal Media Corporation:
+
+  http://www.personal-media.co.jp/
+
+Section 3.3.18 provides information about TRON Code (used by BTRON),
+and details how it encodes the JIS X 0212-1990 character set.
+
+
+2.1.4: JIS X 0221-1995
+
+	This document is, for all practical purposes, the Japanese
+translation of ISO 10646-1:1993 (see Section 2.5.1). Like ISO
+10646-1:1993, it is based on Unicode Version 1.1.
+	It is noteworthy that JIS X 0221-1995 enumerates subsets that
+are applicable for Japanese use (each is followed by a brief
+description of its contents in parentheses):
+
+o BASIC JAPANESE (JIS X 0208-1990 and JIS X 0201-1976 -- characters
+  that can be created by means of combining are not included -- 6,884
+  characters)
+o JAPANESE NON IDEOGRAPHICS SUPPLEMENT (1,913 characters: all non-
+  kanji of JIS X 0212-1990 plus hundreds of non-JIS characters)
+o JAPANESE IDEOGRAPHICS SUPPLEMENT 1 (918 frequently-used kanji from
+  JIS X 0212-1990, including 28 that are identical to kanji forms in
+  JIS C 6226-1978)
+o JAPANESE IDEOGRAPHICS SUPPLEMENT 2 (the remainder of JIS X 0212-
+  1990, namely 4,883 kanji)
+o JAPANESE IDEOGRAPHICS SUPPLEMENT 3 (the remaining kanji of ISO
+  10646-1:1993, namely 8,746 characters)
+o FULLWIDTH ALPHANUMERICS (94 characters; for compatibility)
+o HALFWIDTH KATAKANA (63 characters; for compatibility)
+
+	Pages 893 through 993 of JIS X 0221-1995 provide Kangxi
+Zidian (a classic 300-year-old Chinese character dictionary
+containing approximately 50,000 characters) and Dai Kanwa Jiten (also
+known as Morohashi) indexes for the entire Chinese character block,
+namely from 0x4E00 through 0x9FA5.
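+	The kanji totals of the subsets listed above add up to exactly
+the size of that block. The range 0x4E00 through 0x9FA5 spans 20,902
+code points. The 6,355 kanji of JIS X 0208-1990 (within BASIC
+JAPANESE) plus the 918, 4,883, and 8,746 kanji of Supplements 1
+through 3 also total 20,902. The Python check below uses the
+standard unicodedata module to confirm that both ends of the range
+are CJK Unified Ideographs:

```python
import unicodedata

FIRST, LAST = 0x4E00, 0x9FA5
block_size = LAST - FIRST + 1

kanji_per_subset = {
    "BASIC JAPANESE (JIS X 0208-1990 kanji only)": 6355,
    "JAPANESE IDEOGRAPHICS SUPPLEMENT 1": 918,
    "JAPANESE IDEOGRAPHICS SUPPLEMENT 2": 4883,
    "JAPANESE IDEOGRAPHICS SUPPLEMENT 3": 8746,
}

print(block_size, sum(kanji_per_subset.values()))  # 20902 20902
# Supplements 1 and 2 together are all of JIS X 0212-1990's kanji
print(kanji_per_subset["JAPANESE IDEOGRAPHICS SUPPLEMENT 1"]
      + kanji_per_subset["JAPANESE IDEOGRAPHICS SUPPLEMENT 2"])  # 5801
print(unicodedata.name(chr(FIRST)), unicodedata.name(chr(LAST)))
```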
+	At 25,750 Yen, it is actually cheaper than ISO 10646-1:1993!
+
+
+2.1.5: JIS X 0213-199X
+
+	I recently became aware that JSA plans to publish an extension
+to JIS X 0208, containing approximately 2,000 characters (kanji and
+non-kanji). A public review of this new standard is planned for Summer
+1996. I would expect that its information will eventually be available
+at the following URL:
+
+  ftp://ftp.tiu.ac.jp/jis/
+
+
+2.1.6: OBSOLETE STANDARDS
+
+	JIS C 6226-1978 and JIS X 0208-1983 (formerly JIS C 6226-1983)
+have been superseded by JIS X 0208-1990. Section 4.1 provides details
+on the changes made between these earlier versions of JIS X 0208.
+	JIS X 0221-1995 does not mean the end of JIS X 0201-1976, JIS
+X 0208-1990, and JIS X 0212-1990. Instead, it will co-exist with those
+standards.
+
+
+2.2: CHINESE (PRC)
+
+	All character set standards that originate in PRC have
+designations that begin with "GB." "GB" is short for "Guo Biao" (which
+is, in turn, short for "Guojia Biaozhun") and means "National
+Standard." A select few also have "/T" attached. The "T" is short for
+"Tuijian" ("recommended"), marking a recommended rather than a
+mandatory standard. Section 2.2.11 describes ISO-IR-165:1992,
+which is a variant of GB 2312-80. It is included here because of this
+relationship.
+	Most people correlate GB character set standards with
+simplified Chinese, but as you will see below, that is not always the
+case.
+	There are three basic character sets, each one having a
+simplified and a traditional version.
+
+  Character Set  Set Number  Character Forms
+  ^^^^^^^^^^^^^  ^^^^^^^^^^  ^^^^^^^^^^^^^^^
+  GB 2312-80     0           Simplified
+  GB/T 12345-90  1           Traditional of GB 2312-80
+  GB 7589-87     2           Simplified
+  GB/T 13131-9X  3           Traditional of GB 7589-87
+  GB 7590-87     4           Simplified
+  GB/T 13132-9X  5           Traditional of GB 7590-87
+
+
+2.2.1: GB 1988-89
+
+	This character set, formerly GB 1988-80 and sometimes referred
+to as GB-Roman, is the Chinese analog to ASCII and ISO 646. The main
+difference is that the currency symbol (0x24), which is represented as
+a dollar sign ($) in ASCII, is represented as a Chinese Yuan
+(currency) symbol instead.
+
+
+2.2.2: GB 2312-80
+
+	This basic (simplified) Chinese character set standard
+enumerates 7,445 characters, 6,763 of which are hanzi separated into
+two levels. Hanzi in the first level are arranged by reading, and
+those in the second level are arranged by radical then total number of
+(remaining) strokes. GB 2312-80 is also known as the "Primary Set,"
+GB0 (zero), or just GB.
+
+o Row 1: 94 symbols
+o Row 2: 72 numerals
+o Row 3: 94 full-width GB 1988-89 characters (see Section 2.2.1)
+o Row 4: 83 hiragana
+o Row 5: 86 katakana
+o Row 6: 48 uppercase and lowercase Greek alphabet
+o Row 7: 66 uppercase and lowercase Cyrillic (Russian) alphabet
+o Row 8: 26 Pinyin and 37 Bopomofo characters
+o Row 9: 76 line-drawing elements (09-04 through 09-79)
+o Rows 16 through 55: 3,755 hanzi (Level 1 Hanzi; last is 55-89)
+o Rows 56 through 87: 3,008 hanzi (Level 2 Hanzi; last is 87-94)
+
+Compare some of the structure with JIS X 0208-1990, and you will find
+many similarities, such as:
+
+o Hiragana, katakana, Greek, and Cyrillic characters are in Rows 4, 5,
+  6, and 7, respectively
+o Chinese characters begin at Row 16
+o Chinese characters are separated into two levels
+o Level 1 arranged by reading
+o Level 2 arranged by radical then total number of strokes
+
+The Japanese standard, JIS C 6226-1978, came out in 1978, which means
+that it pre-dates GB 2312-80. The above similarities are thus by
+design, not by coincidence.
+	Appendix G (pp 318-344) of "Developing International Software
+for Windows 95 and Windows NT" by Nadine Kano illustrates the GB 2312-
+80 character set standard by EUC code (Microsoft calls this Code Page
+936). Code Page 936 incorporates the correction of the hanzi at 79-81,
+and the correction of the order of 07-22 and 07-23 (see Section 2.2.3
+for more details).
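+	The row-cell codes above become the EUC codes used in Kano's
+tables by adding 0xA0 to both the row and the cell (see Section
+3.2.2). The following Python sketch, with hypothetical function
+names, relies on Python's gb2312 codec to confirm the level
+boundaries listed above. It checks that 55-89 is the last Level 1
+hanzi, that the rest of Row 55 is empty, that 87-94 is the last
+Level 2 hanzi, and that Rows 16 through 87 hold 6,763 hanzi in all.

```python
def gb2312_to_euc_cn(row, cell):
    """EUC-CN: row + 0xA0 and cell + 0xA0 (row and cell each 1-94)."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in 1..94")
    return bytes([row + 0xA0, cell + 0xA0])


def is_assigned(row, cell):
    """True if Python's gb2312 codec defines a character at row-cell."""
    try:
        gb2312_to_euc_cn(row, cell).decode("gb2312")
        return True
    except UnicodeDecodeError:
        return False


print(gb2312_to_euc_cn(16, 1).hex())             # b0a1, first Level 1 hanzi
print(is_assigned(55, 89), is_assigned(55, 90))  # True False
print(is_assigned(87, 94))                       # True, last Level 2 hanzi
print(sum(is_assigned(r, c) for r in range(16, 88) for c in range(1, 95)))
# 6763
```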
+
+
+2.2.3: GB 6345.1-86
+
+	This document specifies corrections and additions to GB
+2312-80 (see Section 2.2.2). The following is a detailed enumeration
+of the changes:
+
+o The form of "g" in Row 3 (position 71) was altered
+o Row 8 has six additional Pinyin characters (08-27 through 08-32)
+o Row 10 contains half-width versions of Row 3 (94 characters)
+o Row 11 contains half-width versions of the Pinyin characters from
+  Row 8 (32 characters; 11-01 through 11-32)
+o The hanzi at 79-81 was corrected to have a simplified left-side
+  radical (this was an error in GB 2312-80)
+
+Note that these changes affect the total number of characters in GB
+2312-80 -- an increase of 132 characters (6 Pinyin in Row 8, 94 in
+Row 10, and 32 in Row 11). This brings the total number of characters
+in GB 2312-80 to 7,577 (7,445 plus 132).
+	There was, however, an undocumented correction made in GB
+6345.1-86. The order of characters 07-22 and 07-23 (uppercase
+Cyrillic) was reversed. This error is apparently in the first and
+perhaps second printing of the GB 2312-80 manual, because the copy I
+have is from the third printing, and this has been corrected. Page 145
+(Figure 113) of John Clews' "Language Automation Worldwide: The
+Development of Character Set Standards" illustrates this error.
+Developers should take special note of this -- I have seen GB 2312-80
+based font products that propagate this ordering error.
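+	One way to act on this advice is to compare a font's Row 7
+with the Russian alphabet. The Python sketch below builds the
+expected order of the 33 uppercase letters (07-01 through 07-33)
+from Unicode code points. It then reads the same row through Python's
+gb2312 codec (assumed here to follow the corrected order) and reports
+any cells that are out of place. The swapped list stands in for a
+hypothetical font that propagates the error.

```python
def expected_row7():
    """The 33 uppercase Cyrillic letters in Russian alphabetical order,
    as they should appear in GB 2312-80 cells 07-01 through 07-33."""
    return ([chr(c) for c in range(0x0410, 0x0416)]     # U+0410 A .. U+0415 IE
            + ["\u0401"]                                # IO comes right after IE
            + [chr(c) for c in range(0x0416, 0x0430)])  # U+0416 ZHE .. U+042F YA


def misordered_cells(glyphs):
    """1-based cell numbers where `glyphs` differs from the expected row."""
    return [cell
            for cell, (got, want) in enumerate(zip(glyphs, expected_row7()), start=1)
            if got != want]


# Row 7, cells 1-33, as decoded by Python's gb2312 codec (assumed to
# follow the corrected order).
row7 = [bytes([0xA7, 0xA0 + cell]).decode("gb2312") for cell in range(1, 34)]
print(misordered_cells(row7))  # expected: []

# A hypothetical font that still swaps 07-22 and 07-23 (list indexes 21, 22).
bad_font = list(row7)
bad_font[21], bad_font[22] = bad_font[22], bad_font[21]
print(misordered_cells(bad_font))  # [22, 23]
```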
+
+
+2.2.4: GB 7589-87
+
+	This character set enumerates 7,237 hanzi in Rows 16 through
+92 (last is 92-93), and they are ordered by radical then total number
+of (remaining) strokes. GB 7589-87 is also known as the "Second
+Supplementary Set" or GB2.
+
+
+2.2.5: GB 7590-87
+
+	This character set enumerates 7,039 hanzi in Rows 16 through
+90 (last is 90-83), and they are ordered by radical then total number
+of (remaining) strokes. GB 7590-87 is also known as the "Fourth
+Supplementary Set" or GB4.
+
+
+2.2.6: GB 8565.2-88
+
+	This standard makes additions to GB 2312-80 (these additions
+are separate from those made in GB 6345.1-86 described in Section
+2.2.3). GB 8565.2-88 is also known as GB8. In this case there are 705
+additions, indicated as follows:
+
+o Row 13 contains 50 hanzi from GB 7589-87 (last is 13-50)
+o Row 14 contains 92 hanzi from GB 7590-87 (last is 14-92)
+o Row 15 contains 69 non-hanzi indicating dates and times, plus 24
+  miscellaneous hanzi (for personal/place names and radicals; last is
+  15-93).
+o Rows 90 through 94 contain 470 hanzi from GB 7589-87 (94 each)
+
+GB 8565.2-88 therefore provides a total of 8,150 characters (7,445
+plus 705).
+
+
+2.2.7: GB/T 12345-90
+
+	This character set is nearly identical to GB 2312-80 (see
+Section 2.2.2) in terms of the number and arrangement of characters,
+but simplified hanzi are replaced by their traditional versions. GB/T
+12345-90 is also known as the "Supplementary Set" or GB1.
+	The following are some interesting facts about this character
+set (some instances of simplified/traditional pairs that appear below
+are actually character form differences):
+
+o 29 vertical-use characters (punctuation and parentheses) included in
+  Row 6 (06-57 through 06-85).
+
+o 2,118 traditional hanzi replace simplified hanzi in Rows 16 through
+  87. The "G1-Unique" appendix of the unofficial version (supplied to
+  the CJK-JRG for Han Unification purposes) is missing the following
+  four (specifies only 2,114):
+
+  0x5B3B    0x6D2F
+  0x5E7C    0x6F71
+
+  Nevertheless, ISO 10646-1:1993 ended up including these hanzi,
+  with correct mappings.
+
+o Four simplified/traditional hanzi pairs (eight affected code points)
+  in rows 16 through 87 are swapped:
+
+  0x3A73 <-> 0x6161
+  0x5577 <-> 0x6167
+  0x5360 <-> 0x6245 (see the next bullet)
+  0x4334 <-> 0x7761
+
+o One hanzi (0x6245), after being swapped, had its left-side radical
+  unsimplified (this character, now at 0x5360, is considered part of
+  the 2,118 traditional hanzi from the second bullet):
+
+  0x6245 -> 0x5360
+
+o 103 hanzi included in Rows 88 (94 characters) and 89 (9 characters;
+  89-01 through 89-09). These are all related to characters between
+  Rows 16 and 87.
+
+  - 41 simplified hanzi from Rows 16 through 87 moved to Rows 88 and
+    89 (traditional hanzi are now at the original code points):
+
+    0x3327 -> 0x7827  0x3E5D -> 0x7846  0x4B49 -> 0x7869
+    0x3365 -> 0x7828  0x3F64 -> 0x7849  0x4C28 -> 0x786B
+    0x3373 -> 0x7829  0x402F -> 0x784B  0x4D3F -> 0x786F
+    0x3533 -> 0x782C  0x4030 -> 0x784C  0x4D72 -> 0x7871
+    0x356D -> 0x782D  0x406F -> 0x784E  0x5236 -> 0x7878
+    0x3637 -> 0x782F  0x4131 -> 0x7850  0x5374 -> 0x7879
+    0x3736 -> 0x7832  0x463B -> 0x785C  0x5438 -> 0x787C
+    0x3761 -> 0x7833  0x463E -> 0x785D  0x5446 -> 0x787D
+    0x3849 -> 0x7835  0x464B -> 0x785E  0x5622 -> 0x7921
+    0x3963 -> 0x7838  0x464D -> 0x785F  0x563B -> 0x7923
+    0x3B2E -> 0x783B  0x4653 -> 0x7860  0x5656 -> 0x7926
+    0x3C38 -> 0x7840  0x4837 -> 0x7866  0x567E -> 0x7928
+    0x3C5B -> 0x7842  0x4961 -> 0x7867  0x573C -> 0x7929
+    0x3C76 -> 0x7843  0x4A75 -> 0x7868
+
+  - 62 hanzi added to Rows 88 and 89 (the gaps from the above are
+    filled in). These were mostly to account for multiple traditional
+    hanzi collapsing into a single simplified form.
+
+  - The following code point mappings illustrate how all of these 103
+    hanzi are related to hanzi between Rows 16 and 87 (note how many
+    of these 103 hanzi map to a single code point):
+
+    0x7821 -> 0x305A  0x7844 -> 0x3D2A  0x7867 -> 0x4961
+    0x7822 -> 0x3065  0x7845 -> 0x3E21  0x7868 -> 0x4A75
+    0x7823 -> 0x316D  0x7846 -> 0x3E5D  0x7869 -> 0x4B49
+    0x7824 -> 0x3170  0x7847 -> 0x3E6D  0x786A -> 0x4B55
+    0x7825 -> 0x3237  0x7848 -> 0x3F4B  0x786B -> 0x4C28
+    0x7826 -> 0x3245  0x7849 -> 0x3F64  0x786C -> 0x4C28
+    0x7827 -> 0x3327  0x784A -> 0x4027  0x786D -> 0x4C28
+    0x7828 -> 0x3365  0x784B -> 0x402F  0x786E -> 0x4C33
+    0x7829 -> 0x3373  0x784C -> 0x4030  0x786F -> 0x4D3F
+    0x782A -> 0x3376  0x784D -> 0x405B  0x7870 -> 0x4D45
+    0x782B -> 0x3531  0x784E -> 0x406F  0x7871 -> 0x4D72
+    0x782C -> 0x3533  0x784F -> 0x407A  0x7872 -> 0x4F35
+    0x782D -> 0x356D  0x7850 -> 0x4131  0x7873 -> 0x4F35
+    0x782E -> 0x362C  0x7851 -> 0x414B  0x7874 -> 0x4F4C
+    0x782F -> 0x3637  0x7852 -> 0x4231  0x7875 -> 0x4F72
+    0x7830 -> 0x3671  0x7853 -> 0x425E  0x7876 -> 0x506B
+    0x7831 -> 0x3722  0x7854 -> 0x4339  0x7877 -> 0x5229
+    0x7832 -> 0x3736  0x7855 -> 0x4349  0x7878 -> 0x5236
+    0x7833 -> 0x3761  0x7856 -> 0x4349  0x7879 -> 0x5374
+    0x7834 -> 0x3834  0x7857 -> 0x4349  0x787A -> 0x5379
+    0x7835 -> 0x3849  0x7858 -> 0x4356  0x787B -> 0x5375
+    0x7836 -> 0x3948  0x7859 -> 0x4366  0x787C -> 0x5438
+    0x7837 -> 0x394E  0x785A -> 0x436F  0x787D -> 0x5446
+    0x7838 -> 0x3963  0x785B -> 0x3159  0x787E -> 0x5460
+    0x7839 -> 0x6358  0x785C -> 0x463B  0x7921 -> 0x5622
+    0x783A -> 0x3A7A  0x785D -> 0x463E  0x7922 -> 0x563B
+    0x783B -> 0x3B2E  0x785E -> 0x464B  0x7923 -> 0x563B
+    0x783C -> 0x3B58  0x785F -> 0x464D  0x7924 -> 0x5642
+    0x783D -> 0x3B63  0x7860 -> 0x4653  0x7925 -> 0x5646
+    0x783E -> 0x3B71  0x7861 -> 0x4727  0x7926 -> 0x5656
+    0x783F -> 0x3C22  0x7862 -> 0x4729  0x7927 -> 0x566C
+    0x7840 -> 0x3C38  0x7863 -> 0x4F4B  0x7928 -> 0x567E
+    0x7841 -> 0x3C52  0x7864 -> 0x476F  0x7929 -> 0x573C
+    0x7842 -> 0x3C5B  0x7865 -> 0x477A
+    0x7843 -> 0x3C76  0x7866 -> 0x4837
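You can find the many-to-one relationships in the table above mechanically by inverting the mapping. The Python sketch below does this for a small excerpt copied by hand from the table. `EXCERPT` and `collapses` are illustrative names, and the full table would simply be a larger dictionary.

```python
from collections import defaultdict

# Hand-copied excerpt of the Rows 88-89 -> Rows 16-87 mappings above
# (GB/T 12345-90 code points).
EXCERPT = {
    0x7821: 0x305A, 0x7822: 0x3065,
    0x7855: 0x4349, 0x7856: 0x4349, 0x7857: 0x4349,
    0x786B: 0x4C28, 0x786C: 0x4C28, 0x786D: 0x4C28,
    0x7872: 0x4F35, 0x7873: 0x4F35,
    0x7922: 0x563B, 0x7923: 0x563B,
}


def collapses(mapping):
    """Return {target: sorted sources} for every target reached by 2+ sources."""
    by_target = defaultdict(list)
    for source, target in sorted(mapping.items()):
        by_target[target].append(source)
    return {t: s for t, s in by_target.items() if len(s) > 1}


for target, sources in sorted(collapses(EXCERPT).items()):
    print(f"0x{target:04X} <- " + ", ".join(f"0x{s:04X}" for s in sources))
```

For this excerpt, the sketch reports four targets. 0x4349 and 0x4C28 each receive three sources, and 0x4F35 and 0x563B each receive two. The entries 0x7821 and 0x7822 map one-to-one, so the sketch does not list them.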
+
+So, if we total everything up, we see that GB/T 12345-90 has 2,180
+hanzi (2,118 are replacements for GB 2312-80 code points, and 62 are
+additional) and 29 non-hanzi not found in GB 2312-80.
+	Note that the printed GB/T 12345-90 manual contains some
+character-form errors. The errors I am aware of are as follows:
+
+  Code Point  Description of Error
+  ^^^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^
+  0x4125      The upper-left element should be "tree" instead of
+              "warrior"
+  0x596C      The "bird" radical should not include the "fire" element
+
+
+2.2.8: GB/T 13131-9X
+
+	This character set is identical to GB 7589-87 (see Section
+2.2.4) in terms of the number of characters, but simplified hanzi are
+replaced by their traditional versions. The exact number of such
+substitutions is currently unknown to this author. GB/T 13131-9X is
+also known as the "Third Supplementary Set" or GB3.
+
+
+2.2.9: GB/T 13132-9X
+
+	This character set is identical to GB 7590-87 (see Section
+2.2.5) in terms of the number of characters, but simplified hanzi are
+replaced by their traditional versions. The exact number of such
+substitutions is currently unknown to this author. GB/T 13132-9X is
+also known as the "Fifth Supplementary Set" or GB5.
+
+
+2.2.10: GB 13000.1-93
+
+	This document is, for all practical purposes, the Chinese
+translation of ISO 10646-1:1993 (see Section 2.5.1).
+
+
+2.2.11: ISO-IR-165:1992
+
+	This standard, also known as the CCITT Chinese Set, is a
+variant of GB 2312-80 with the following characteristics:
+
+o GB 6345.1-86 modifications (including the undocumented one) and
+  additions, namely 132 characters (see Section 2.2.3)
+o GB 8565.2-88 additions, namely 705 characters (see Section 2.2.6)
+o Row 6 contains 22 background (shading) characters (06-60 through
+  06-81)
+o Row 12 contains 94 hanzi
+o Row 13 contains 44 additional hanzi (13-51 through 13-94; fills the
+  row)
+o Row 15 contains 1 additional hanzi (15-94)
+
+ISO-IR-165:1992 can therefore be considered a superset of GB 2312-80,
+GB 6345.1-86, and GB 8565.2-88. This means 8,443 total characters
+compared to the 7,445 in GB 2312-80, 7,577 in GB 6345.1-86, and the
+8,150 in GB 8565.2-88.
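You can check these totals with simple arithmetic. The Python sketch below uses only the counts stated in this section. The dictionary keys are descriptive labels, not official names, and the two range-derived counts are computed from the row-cell ranges in the list above.

```python
# Counts as stated in this section (ISO-IR-165:1992 relative to GB 2312-80).
GB_2312_80 = 7445
ISO_IR_165_ADDITIONS = {
    "GB 6345.1-86 additions": 132,
    "GB 8565.2-88 additions": 705,
    "Row 6 background (shading) characters, 06-60 to 06-81": 81 - 60 + 1,
    "Row 12 hanzi": 94,
    "Row 13 hanzi, 13-51 to 13-94": 94 - 51 + 1,
    "Row 15 hanzi, 15-94": 1,
}

gb_6345_1_86 = GB_2312_80 + ISO_IR_165_ADDITIONS["GB 6345.1-86 additions"]
gb_8565_2_88 = GB_2312_80 + ISO_IR_165_ADDITIONS["GB 8565.2-88 additions"]
iso_ir_165 = GB_2312_80 + sum(ISO_IR_165_ADDITIONS.values())

print(gb_6345_1_86, gb_8565_2_88, iso_ir_165)  # 7577 8150 8443
```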
+
+
+2.2.12: OBSOLETE STANDARDS
+
+	Most GB standards seem to be revised through other documents,
+so it is hard to point to a standard and claim that it is obsolete.
+The only revision I am aware of is GB 1988-89 (the original was
+GB 1988-80).
+
+
+2.3: CHINESE (TAIWAN)
+
+	The sections below describe two major Taiwanese character
+sets, namely Big Five and CNS 11643-1992. As you will learn, they are
+somewhat compatible. CCCII, also developed in Taiwan, is described in
+Section 2.5.2.
+
+
+2.3.1: BIG FIVE
+
+	The Big Five character set is composed of 94 rows of 157
+characters each (the 157 characters of each row are encoded in an
+initial group of 63 codes followed by the remaining 94 codes). The
+following is a break-down of its contents:
+
+o Row 1: 157 symbols
+o Row 2: 157 symbols
+o Row 3: 94 symbols
+o Rows 4 through 38: 5,401 hanzi (Level 1 Hanzi; last is 38-63)
+o Rows 41 through 89: 7,652 hanzi (Level 2 Hanzi; last is 89-116)
+
+This forms what I consider to be the basic Big Five set. However, two
+of the hanzi in Level 2 are duplicates, so Level 2 contains only 7,650
+unique hanzi.
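This section does not give the byte values behind the "63 codes followed by 94 codes" layout. The Python sketch below assumes the usual Big Five byte ranges. Under that assumption, row 1 corresponds to lead byte 0xA1. Cells 1-63 use trail bytes 0x40-0x7E, and cells 64-157 use trail bytes 0xA1-0xFE. The helper `big5_code` is a hypothetical name used only for illustration.

```python
def big5_code(row: int, cell: int) -> int:
    """Convert a Big Five (row, cell) position to a two-byte code.

    Assumed layout: lead byte = 0xA0 + row; cells 1-63 use trail bytes
    0x40-0x7E and cells 64-157 use trail bytes 0xA1-0xFE.
    """
    if not (1 <= row <= 94 and 1 <= cell <= 157):
        raise ValueError("row must be 1-94 and cell must be 1-157")
    lead = 0xA0 + row
    if cell <= 63:
        trail = 0x40 + (cell - 1)   # initial group of 63 codes
    else:
        trail = 0xA1 + (cell - 64)  # remaining group of 94 codes
    return (lead << 8) | trail


print(hex(big5_code(4, 1)), hex(big5_code(89, 116)))  # 0xa440 0xf9d5
```

With these assumptions, the first and last Level 1 hanzi (04-01 and 38-63) fall at 0xA440 and 0xC67E. The last Level 2 hanzi (89-116) falls at 0xF9D5. The two Big Five codes in the table at the start of Part 3, 0xBA7E and 0xA672, correspond to positions 26-63 and 06-51.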
+	There are two major extensions to Big Five. The first has no
+real name, and can be considered part of the basic Big Five set as
+specified above. It adds the following characters:
+
+o Rows 38-39: 4 Japanese iteration marks, 83 hiragana, 86 katakana, 66
+  uppercase and lowercase Cyrillic (Russian) alphabet, 10 circled
+  digits, and 10 parenthesized digits
+
+	The other extension was developed by a company called ETen
+Information System in Taiwan, and is actually considered to be the
+most widely used version of Big Five. It provides the following
+extensions to Big Five (different from the above extension):
+
+o Rows 38-40: 10 circled digits, 10 parenthesized digits, 10 lowercase
+  Roman numerals, 25 classical radicals, 15 Japanese-specific symbols,
+  83 hiragana, 86 katakana, 66 uppercase and lowercase Cyrillic
+  (Russian) alphabet, 3 arrows, 10 radical-like hanzi elements, 40
+  fraction-like digits, and 7 symbols
+o Row 89: 7 hanzi, 33 double-lined line-drawing elements, and a black
+  box
+
+	It is *very* important to note that while these two extensions
+have many common portions (in particular, hiragana, katakana, the
+Cyrillic alphabet, and so on), they do not share the same code points
+for such characters.
+	Appendix G (pp 407-450) of "Developing International Software
+for Windows 95 and Windows NT" by Nadine Kano illustrates the Big Five
+character set standard by Big Five code (Microsoft calls this Code
+Page 950). Code Page 950 incorporates some of the ETen extensions,
+namely those in Row 89.
+
+
+2.3.2: CNS 11643-1992
+
+	CNS 11643-1992 (also known as CNS 11643 X 5012), by
+definition, consists of 16 planes of characters, seven of which have
+character assignments. Each plane is a 94-row-by-94-cell matrix
+capable of holding a total of 8,836 characters. CNS stands for
+"Chinese National Standard."
+	CNS 11643-1992 specifies characters only in the first seven
+planes. A break-down of characters, by plane, is as follows:
+
+o Plane 1:
+  - 438 symbols in Rows 1 through 6
+  - 213 classical radicals in Rows 7 through 9
+  - 33 graphic representations of control characters in Row 34
+  - 5,401 hanzi in Rows 36 through 93 (last is 93-43)
+o Plane 2: 7,650 hanzi in Rows 1 through 82 (last is 82-36)
+o Plane 3: 6,148 hanzi in Rows 1 through 66 (last is 66-38)
+o Plane 4: 7,298 hanzi in Rows 1 through 78 (last is 78-60)
+o Plane 5: 8,603 hanzi in Rows 1 through 92 (last is 92-49)
+o Plane 6: 6,388 hanzi in Rows 1 through 68 (last is 68-90)
+o Plane 7: 6,539 hanzi in Rows 1 through 70 (last is 70-53)
+
+The total number of characters in CNS 11643-1992 is a staggering
+48,711, 48,027 of which are hanzi. Also note that the number of hanzi
+in Plane 1 is identical to the number of Level 1 hanzi in Big Five
+(see Section 2.3.1). The two extra hanzi in Big Five Level 2 are
+duplicates, and are therefore not in CNS 11643-1992 Plane 2.
+	It is rumored that Plane 8 is currently being defined, and
+will add yet more hanzi to this standard.
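The per-plane counts above follow from the "last is" positions. Each block starts at cell 1 of its first row, every row except the last is full (94 cells), and the last row ends at the stated cell. That reading is an assumption, but it reproduces every count listed. The Python sketch below checks the plane totals and the grand totals. `block_size` is a hypothetical helper name.

```python
CELLS_PER_ROW = 94


def block_size(first_row: int, last_row: int, last_cell: int) -> int:
    """Characters in a block that starts at cell 1 of first_row, fills every
    row before last_row, and ends at last_cell of last_row."""
    return (last_row - first_row) * CELLS_PER_ROW + last_cell


# (first row, last row, last cell) of each hanzi block listed above.
HANZI_BLOCKS = {
    1: (36, 93, 43),
    2: (1, 82, 36),
    3: (1, 66, 38),
    4: (1, 78, 60),
    5: (1, 92, 49),
    6: (1, 68, 90),
    7: (1, 70, 53),
}
NON_HANZI = 438 + 213 + 33  # Plane 1 symbols, radicals, control pictures

hanzi_per_plane = {plane: block_size(*b) for plane, b in HANZI_BLOCKS.items()}
total_hanzi = sum(hanzi_per_plane.values())
print(hanzi_per_plane)
print(total_hanzi, total_hanzi + NON_HANZI)  # 48027 48711
```

The same helper reproduces other counts in Part 2. For example, `block_size(37, 54, 77)` gives the 1,675 pre-combined hangul in Rows 37 through 54 of KS C 5657-1991 (Section 2.4.3).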
+
+
+2.3.3: CNS 5205
+
+	This character set is Taiwan's analog to ASCII and ISO 646,
+and is reportedly rarely used. How it differs from ASCII, if at all,
+is unknown to this author.
+
+
+2.3.4: OBSOLETE STANDARDS
+
+	CNS 11643-1986 specified characters only in the first three
+planes, as described in Section 2.3.2. Also, Plane 3 of CNS 11643-1992
+was called Plane 14 of CNS 11643-1986.
+
+
+2.4: KOREAN
+
+	The sections below describe the most current Korean character
+sets, namely KS C 5636-1993, KS C 5601-1992, KS C 5657-1991, and KS C
+5700-1995. "KS" stands for "Korean Standard."
+
+
+2.4.1: KS C 5636-1993
+
+	This character set (published on January 6, 1993), formerly KS
+C 5636-1989 (published on April 22, 1989) and sometimes referred to as
+KS-Roman, is the Korean analog to ASCII and ISO 646-1991. The primary
+difference is that the ASCII backslash (0x5C) is represented as a Won
+symbol.
+
+
+2.4.2: KS C 5601-1992
+
+	This basic Korean character set standard enumerates 8,224
+characters, 4,888 of which are hanja, and 2,350 of which are pre-
+combined hangul. The hanja and hangul blocks are arranged by reading.
+The following is a break-down of its contents:
+
+o Row 1: 94 symbols
+o Row 2: 69 abbreviations and symbols
+o Row 3: 94 full-width KS C 5636-1993 characters (see Section 2.4.1)
+o Row 4: 94 hangul elements
+o Row 5: 68 lowercase and uppercase Roman numerals and lowercase and
+  uppercase Greek alphabet
+o Row 6: 68 line-drawing elements
+o Row 7: 79 abbreviations
+o Row 8: 91 phonetic symbols, circled characters, and fractions
+o Row 9: 94 phonetic symbols, parenthesized characters, subscripts,
+  and superscripts
+o Row 10: 83 hiragana
+o Row 11: 86 katakana
+o Row 12: 66 lowercase and uppercase Cyrillic (Russian) alphabet
+o Rows 16 through 40: 2,350 pre-combined hangul (last is 40-94)
+o Rows 42 through 93: 4,888 hanja (last is 93-94)
+
+Rows 41 and 94 are designated for user-defined characters.
+	There are many similarities with JIS X 0208-1990 and GB
+2312-80, such as hiragana, katakana, Greek, and Cyrillic characters,
+but they are assigned to different rows.
+	There is an interesting note about the hanja block (Rows 42
+through 93). Although there are 4,888 hanja, not all are unique. The
+hanja block is arranged by reading, and in those cases when a hanja
+has more than one reading, that hanja is duplicated (sometimes more
+than once) in the same character set. There are 268 such cases of
+duplicate hanja in KS C 5601-1992, meaning that it contains 4,620
+unique hanja. If you have a copy of the KS C 5601-1992 manual handy,
+you can compare the following four code points:
+
+  0x6445
+  0x5162
+  0x5525
+  0x6879
+
+While most of these cases involve two hanja instances, there are four
+hanja that have three instances, and one (listed above) that has four!
+This is the only CJK character set that has this property of
+intentionally duplicating Chinese characters. See Section 4.4 for more
+details.
+	Annex 3 of this standard defines the complete set of 11,172
+pre-combined hangul characters, also known as Johab. Johab refers to
+the encoding method, and is almost like encoding all possible three-
+letter words (meaning that most are nonsense). See Section 3.3.5 for
+more details on Johab encoding.
+
+
+2.4.3: KS C 5657-1991
+
+	This character set standard provides supplemental characters
+for Korean writing, including symbols, pre-combined hangul, and
+hanja. The following is a break-down of its contents:
+
+o Rows 1 through 7: 613 lowercase and uppercase Latin characters with
+  diacritics (see note below)
+o Rows 8 through 10: 273 lowercase and uppercase Greek characters with
+  diacritics
+o Rows 11 through 13: 275 symbols
+o Row 14: 27 compound hangul elements
+o Rows 16 through 36: 1,930 pre-combined hangul (last is 36-50)
+o Rows 37 through 54: 1,675 pre-combined hangul (last is 54-77; see
+  note below)
+o Rows 55 through 85: 2,856 hanja (last is 85-36)
+
+The KS C 5657-1991 manual has a possible error (or at least an
+inconsistency) for Rows 1 through 7. The manual says that there are
+615 characters in that range, but I only counted 613. The difference
+consists of the following two characters on page 19:
+
+  Character Code  Character
+  ^^^^^^^^^^^^^^  ^^^^^^^^^
+  0x2137          X
+  0x217A          TM
+
+An "X" doesn't belong there (it is already in KS C 5601-1992 at code
+point 0x2358), and the trademark symbol is also part of KS C 5601-1992
+at code point 0x2262. This is why I feel that my count of 613 is more
+accurate than what is explicitly stated in the manual on page 2.
+	Also, page 2 of the manual says that Rows 37 through 54
+contain 1,677 pre-combined hangul, but I only counted 1,675 (17 rows
+of 94 characters plus a final row with 77 characters -- do the math
+for yourself).
+	Here's another interesting note. My official copy of this
+standard has all of its 2,856 hanja hand-written.
+
+
+2.4.4: GB 12052-89
+
+	You may be asking yourself why a GB standard is listed under
+the Korean section of this document. Well, there is a rather large
+Korean population in China (Korea was considered part of China before
+the 1890s), and they need a character set standard for communicating
+using hangul. GB 12052-89 is a Korean character set standard
+established by China (PRC), and enumerates a total of 5,979
+characters.
+	The following is the arrangement of this character set:
+
+o Row 1: 94 symbols
+o Row 2: 72 numerals
+o Row 3: 94 full-width ASCII characters
+o Row 4: 83 hiragana
+o Row 5: 86 katakana
+o Row 6: 48 uppercase and lowercase Greek alphabet
+o Row 7: 66 uppercase and lowercase Cyrillic (Russian) alphabet
+o Row 8: 26 Pinyin and 37 Bopomofo characters
+o Row 9: 76 line-drawing elements (09-04 through 09-79)
+o Rows 16 through 37: 2,068 pre-combined hangul (Level 1 Hangul, Part
+  1; last is 37-94)
+o Rows 38 through 52: 1,356 pre-combined hangul (Level 1 Hangul, Part
+  2; last is 52-40)
+o Rows 53 through 71: 1,779 pre-combined hangul (Level 2 Hangul; last
+  is 71-87)
+o Rows 71 through 72: 94 "Idu" hanja (71-89 through 72-88)
+
+	There are a few interesting notes I can make about this
+character set:
+
+o Rows 1 through 9 are identical to the same rows in GB 2312-80,
+  except that 03-04 is a dollar sign, not a Chinese Yuan (currency)
+  symbol.
+
+o The GB 12052-89 manual states on pp 1 and 3 that Rows 53 through 72
+  contain 1,876 characters, but I only counted 1,873 (1,779 hangul
+  plus 94 hanja).
+
+o The total number of characters, 5,979, is correctly stated in the
+  manual although the hangul count is incorrect.
+
+o The arrangement and ordering of these hangul bear no relationship to
+  that of KS C 5601-1992. Both standards order by reading, which is
+  the only way in which they are similar.
+
+	I do not know to what extent this character set is being
+used, or who might be using it.
+
+
+2.4.5: KS C 5700-1995
+
+	Korea has developed a new character set standard called KS C
+5700-1995. It is equivalent to ISO 10646-1:1993, but has pre-combined
+hangul as provided (and ordered) in Unicode Version 2.0 (meaning that
+all 11,172 hangul are in a contiguous block).
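All 11,172 syllables form one contiguous block, so you can compute a syllable's position from its letters. The Python sketch below uses the arithmetic published in the Unicode Standard. It starts at base U+AC00 and combines 19 initial consonants, 21 vowels, and 28 final-consonant slots, where slot 0 means "no final." It illustrates the contiguous arrangement and is not quoted from KS C 5700-1995 itself. `compose` and `decompose` are hypothetical helper names.

```python
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28   # initials, vowels, finals (0 = none)
S_BASE = 0xAC00                          # first pre-combined hangul syllable
S_COUNT = L_COUNT * V_COUNT * T_COUNT    # 11,172


def compose(l_index: int, v_index: int, t_index: int = 0) -> int:
    """Code point of the syllable with the given jamo indices."""
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT
            and 0 <= t_index < T_COUNT):
        raise ValueError("jamo index out of range")
    return S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index


def decompose(code_point: int) -> tuple:
    """Inverse of compose: (initial, vowel, final) indices."""
    s_index = code_point - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a pre-combined hangul syllable")
    return (s_index // (V_COUNT * T_COUNT),
            (s_index % (V_COUNT * T_COUNT)) // T_COUNT,
            s_index % T_COUNT)


print(S_COUNT, hex(compose(0, 0)), hex(compose(18, 20, 27)))  # 11172 0xac00 0xd7a3
```

For example, `compose(18, 0, 4)` is U+D55C, the syllable "han." It combines the initial h (index 18), the vowel a (index 0), and the final n (index 4).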
+
+
+2.4.6: OBSOLETE STANDARDS
+
+	KS C 5601-1986, KS C 5601-1987, and KS C 5601-1989 are
+identical, character-set-wise, to KS C 5601-1992. The 1992 edition
+provides more material in the form of annexes. KS C 5601-1982, the
+original version, enumerated only the 51 basic hangul elements in a
+one-byte 7- and 8-bit encoding. This information is still part of
+KS C 5601-1992, but in Annex 4.
+	There were two earlier multiple-byte standards called KS C
+5619-1982 and KIPS. KS C 5619-1982 enumerated 51 hangul elements,
+1,316 pre-combined hangul, and 1,672 hanja. KIPS (Korean Information
+Processing System) enumerated 2,058 pre-combined hangul and 2,392
+hanja. Both have been rendered obsolete by KS C 5601-1987.
+
+
+2.5: CJK
+
+	The only true CJK character sets available today are CCCII,
+ANSI Z39.64-1989 (also known as EACC or REACC), and ISO 10646-1:1993.
+ISO 10646-1:1993 is unique in that it goes beyond CJK (Chinese
+characters) to provide virtually all commonly-used alphabetic scripts.
+	Of these three, only ISO 10646-1:1993 is expected to gain
+widespread acceptance. CCCII and ANSI Z39.64-1989 are still used
+today, but primarily for bibliographic purposes.
+
+
+2.5.1: ISO 10646-1:1993
+
+	Published by ISO (International Organization for
+Standardization) in Switzerland, this character set enumerates over
+34,000 characters. Its I-zone ("I" stands for "Ideograph") enumerates
+approximately 21,000 Chinese characters, which is the result of a
+massive effort by the CJK-JRG (CJK Joint Research Group) called "Han
+Unification." The CJK-JRG is now called the IRG (Ideographic
+Rapporteur Group), and is off doing additional research for future
+Chinese character allocations to ISO 10646-1:1993.
+	The Basic Multilingual Plane (BMP) of ISO 10646-1:1993 is
+equivalent to Unicode. While Unicode consists of a single plane of
+characters (which doesn't allow much room for future expansion), ISO
+10646-1:1993 contains hundreds of such planes.
+	One very nice feature of this standard's manual is the set of CJK
+code correspondence tables in Section 26 (pp 262-698). Four columns
+are provided for each ISO 10646-1:1993 I-zone code point -- simplified
+Chinese, traditional Chinese, Japanese, and Korean. If the ISO
+10646-1:1993 Chinese character maps to one of these locales, the
+hexadecimal character code, (decimal) row-cell value, and glyph for
+that locale are provided. The corresponding tables in Volume 2 of "The
+Unicode Standard" provide character codes (sometimes the hexadecimal
+character code, and sometimes the row-cell value) and a single
+glyph. Quite unfortunate. I hear that a new edition of "The Unicode
+Standard" is about to be released. I hope that this problem has been
+addressed.
+	ISO 10646-1:1993 does not replace existing national character
+set standards. It simply provides a single character set that is a
+superset of *most* national character sets. For example, only a
+fraction of the 48,027 hanzi in CNS 11643-1992 are included in ISO
+10646-1:1993. I feel that it is best to think of ISO 10646-1:1993 as
+"just another character set." My philosophy is to support as many
+character sets and encodings as possible.
+	A note about ordering this standard. If you order through ANSI
+in the United States, try to get an original manual. It is not easy,
+though. You see, ANSI has duplication rights for ISO documents, so
+the manual you receive may well be a photocopy. Photocopying Section
+26 (pp 262-698) doesn't do the Chinese characters much justice, and
+some characters become hard to read. Unfortunately,
+there is no way to indicate that you want an original ISO document
+through ANSI's ordering process, so some post-ordering haggling may
+become necessary.
+	More information on ISO 10646-1:1993 can be found at the
+following URL:
+
+  http://www.unicode.org/
+
+	Japan, China (PRC), and Korea have developed their own
+national standards that are based on ISO 10646-1:1993. They are
+designated as JIS X 0221-1995 (see Section 2.1.4), GB 13000.1-93 (see
+Section 2.2.10), and KS C 5700-1995 (see Section 2.4.5), respectively.
+	Note that these national standards are aligned with different
+versions of Unicode:
+
+  Unicode Version 1.0
+  Unicode Version 1.1 <-> ISO 10646-1:1993, JIS X 0221-1995, GB 13000.1-93
+  Unicode Version 2.0 <-> KS C 5700-1995
+
+One of the major changes made for Unicode Version 2.0 is the inclusion
+of all 11,172 hangul. Version 1.1 has only 6,656 hangul.
+
+
+2.5.2: CCCII
+
+	The Chinese Character Analysis Group in Taiwan developed CCCII
+(Chinese Character Code for Information Interchange) in the 1980s.
+This character set is composed of 94 planes that have 94 rows and 94
+cells (94 x 94 x 94 = 830,584 characters). Furthermore, every six
+planes constitute a "layer" (6 x 94 x 94 = 53,016 characters). The
+following are the contents of each of the 16 layers (the 16th layer
+contains only four planes):
+
+o Layer 1: Symbols and Traditional Chinese characters
+o Layer 2: Simplified Chinese characters from PRC
+o Layers 3 through 12: Variant Chinese character forms
+o Layer 13: Japanese kana and kokuji (Japanese-made kanji)
+o Layer 14: Korean hangul
+o Layer 15: Reserved
+o Layer 16: Miscellaneous characters (Japanese and Korean)
+
+	Layers 1 through 12 have a special meaning and relationship.
+The same code point in these layers is designed to hold the same
+character, but with different forms. Layer 1 code points contain the
+traditional character forms, Layer 2 code points contain the
+simplified character forms (if any), and Layers 3 through 12 contain
+variant character forms (if any). For example, given a Chinese
+character with three forms, its encoding and arrangement may be as
+follows:
+
+  Character Form  Code Point  Layer
+  ^^^^^^^^^^^^^^  ^^^^^^^^^^  ^^^^^
+  Traditional     0x224E41    1
+  Simplified      0x284E41    2
+  Variant         0x2E4E41    3
+
+Note how the second and third bytes (0x4E41) are identical in all
+three instances -- only the first byte's value, which indicates the
+layer, differs. Needless to say, this method of arrangement provides
+easy access to related Chinese character forms. No wonder it is used
+for bibliographic purposes.
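The example suggests that moving between layers is just arithmetic on the first byte. Consecutive layers are six planes apart, and the first bytes 0x22, 0x28, and 0x2E differ by 6. The Python sketch below assumes that Plane n is encoded with a first byte of 0x20 + n. That assumption matches the example but is not stated explicitly in this document. `cccii_layer` and `cccii_counterpart` are hypothetical names.

```python
PLANES_PER_LAYER = 6
FIRST, LAST = 0x21, 0x7E  # assumed first-byte range for Planes 1-94


def cccii_layer(code: int) -> int:
    """Layer (1-16) of a three-byte CCCII code, assuming first byte = 0x20 + plane."""
    first = code >> 16
    if not FIRST <= first <= LAST:
        raise ValueError("first byte out of range")
    return (first - FIRST) // PLANES_PER_LAYER + 1


def cccii_counterpart(code: int, layer: int) -> int:
    """Same plane offset, row, and cell in another layer (meaningful for Layers 1-12)."""
    shift = (layer - cccii_layer(code)) * PLANES_PER_LAYER
    new_first = (code >> 16) + shift
    if not FIRST <= new_first <= LAST:
        raise ValueError("no such plane in the requested layer")
    return (new_first << 16) | (code & 0xFFFF)


print(cccii_layer(0x224E41), hex(cccii_counterpart(0x224E41, 2)))  # 1 0x284e41
```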
+	The first layer is composed as follows:
+
+o Plane 1/Row 2: 56 mathematical symbols
+o Plane 1/Row 3: The ASCII character set
+o Plane 1/Row 11: 35 Chinese punctuation marks
+o Plane 1/Rows 12 through 14: 214 classical radicals
+o Plane 1/Row 15: 41 Chinese numerical symbols, 37 phonetic symbols,
+  and 4 tone marks
+o Plane 1/Rows 16 through 67: 4,808 common Chinese characters
+o Plane 1/Row 68 through Plane 3/Row 64: 17,032 less common Chinese
+  characters
+o Plane 3/Row 65 through Plane 6/Row 5: 20,583 rare Chinese characters
+
+Note that Row 1 of all planes is reserved, and never assigned
+characters. Take this into account when studying the above table
+ranges that span planes (that is, skip Row 1).
+	In addition to the above, there are 11,517 simplified Chinese
+characters in Layer 2 (3,625 are considered PRC simplified forms, and
+the remaining 7,892 are regular simplified forms). This provides a
+total of 53,940 Chinese characters.
+	Further information on CCCII (including very interesting
+historical notes) can be found on pp 146-149 of John Clews' "Language
+Automation Worldwide: The Development of Character Set Standards" and
+Chapter 6 of Huang & Huang's "An Introduction to Chinese, Japanese,
+and Korean Computing."
+
+
+2.5.3: ANSI Z39.64-1989
+
+	This national standard is designated as ANSI Z39.64-1989 and
+named "East Asian Character Code" (EACC), but was originally known as
+REACC (RLIN East Asian Character Code), that is, before it became a
+national standard. RLIN stands for "Research Libraries Information
+Network," which was developed by the Research Libraries Group (RLG)
+located in Mountain View, California.
+	RLG's Home Page is at the following URL:
+
+  http://www.rlg.org/
+
+	The structure of ANSI Z39.64-1989 is based on CCCII, but with
+a few differences. Many consider it to be superior to and a
+replacement for CCCII (see Section 2.5.2).
+	The ANSI Z39.64-1989 standard is available through ANSI, but
+you should be aware that it is distributed in the form of several
+microfiche. Not a terribly useful storage medium these days. I had my
+set transformed into tangible printed pages. You can also obtain this
+standard through NISO (National Information Standards Organization)
+Press Fulfillment. Their URL is:
+
+  http://www.niso.org/
+
+	EACC has been designated by the Library of Congress as a
+character set for use in USMARC (United States MAchine-Readable
+Cataloging) records, and is used extensively by East Asian libraries
+across North America.
+	EACC is also being used in Australia for the National CJK
+Project. Check out the following URL for more details:
+
+  http://www.nla.gov.au/1/asian/ncjk/cjkhome.html
+
+	Further information on ANSI Z39.64-1989 (including very
+interesting historical notes) can be found on pp 150-156 of John
+Clews' "Language Automation Worldwide: The Development of Character
+Set Standards" (although a source at RLG tells me that some of Clews'
+facts are wrong) and Chapter 6 of Huang & Huang's "An Introduction to
+Chinese, Japanese, and Korean Computing."
+	The authoritative paper on EACC is "RLIN East Asian Character
+Code and the RLIN CJK Thesaurus" by Karen Smith Yoshimura and Alan
+Tucker, published in "Proceedings of the Second Asian-Pacific
+Conference on Library Science," May 20-24, 1985, Seoul, Korea.
+
+
+2.6: OTHER
+
+	This section includes character set standards that don't
+properly fall under the above sections.
+
+
+2.6.1: GB 8045-87
+
+	GB 8045-87 is a Mongolian character set standard established
+by China (PRC). This standard enumerates 94 Mongolian characters. Of
+these 94 characters, 12 are punctuation (vertically-oriented), and the
+remaining 82 are characters specific to the Mongolian script.
+Mongolian is written vertically like Chinese.
+	I do not discuss the encoding for GB 8045-87 in Part 3, so I
+will do so here. The GB 8045-87 manual describes a 7- and 8-bit
+encoding. The 7-bit encoding puts these 94 characters in the standard
+ASCII printable range, namely 0x21 through 0x7E. Code point 0x20 is
+marked as "MSP" which stands for "Mongolian space." The 8-bit encoding
+puts these 94 characters in the range 0xA1 through 0xFE, with the
+"MSP" character at code point 0xA0. The GB 1988-89 set is then encoded
+in the range 0x21 through 0x7E.
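The 8-bit form places the Mongolian block exactly 0x80 above its 7-bit position, so converting between the two means setting or clearing the high bit. The Python sketch below splits 8-bit GB 8045-87 text into Mongolian and GB 1988-89 bytes. `split_8bit` is a hypothetical helper. Passing space and control codes through as "other" bytes is an assumption.

```python
def split_8bit(data: bytes) -> list:
    """Label each byte of 8-bit GB 8045-87 text.

    Returns (set name, code) pairs. Mongolian bytes are reported with their
    7-bit equivalents (0xA0 -> 0x20 for MSP, 0xA1-0xFE -> 0x21-0x7E).
    """
    result = []
    for byte in data:
        if 0xA0 <= byte <= 0xFE:
            result.append(("GB 8045-87", byte - 0x80))
        elif 0x21 <= byte <= 0x7E:
            result.append(("GB 1988-89", byte))
        else:
            # Space, control codes, and unassigned bytes (assumed pass-through).
            result.append(("other", byte))
    return result


print(split_8bit(b"\xa1\xa0A"))
```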
+
+
+2.6.2: TCVN-5773:1993
+
+	TCVN-5773:1993 (also called NSCII, which is short for Nom
+Standard Code for Information Interchange) is the Vietnamese analog to
+ISO 10646-1:1993, but adds 1,775 Vietnamese-specific Chinese
+characters. These 1,775 characters are encoded in the range 0xA000
+through 0xA6EE.
+	More information on TCVN-5773:1993 can be found at the
+following URL:
+
+  ftp://unicode.org/pub/MappingTables/EastAsiaMaps/
+
+There are two files at the above URL that pertain to this standard.
+The first is a README, and the second is a Macintosh HyperCard stack
+(requires HyperCard):
+
+  TCVN-NSCII.README
+  TCVN-NSCIIstack_1.0.sea.hqx
+
+
+PART 3: CJK ENCODING SYSTEMS
+
+	These sections describe the various systems for encoding the
+character set standards listed in Part 2. The first two described,
+7-bit ISO 2022 and EUC, are not specific to a locale, and in some
+cases not specific to CJK.
+	The CJK Character Set Server at the following URL can generate
+character sets based on encodings described in this section:
+
+  http://jasper.ora.com/lunde/cjk-char.html
+
+I suggest that you use this as a way to obtain files that illustrate
+these encodings in action.
+	But first, please take a peek at the following table, which is
+an attempt to illustrate how two Chinese characters (that stand for
+"kanji/hanzi/hanja") are encoded using the various methods presented
+in the following sections (character codes as hexadecimal digits, and
+escape sequences or shift sequences as printable characters):
+
+o Japanese (JIS X 0208-1990 & JIS X 0201-1976):
+  - 7-bit ISO 2022        <ESC> & @ <ESC> $ B 0x3441 0x3B7A <ESC> ( J
+  - ISO-2022-JP           <ESC> $ B 0x3441 0x3B7A <ESC> ( J
+  - EUC                   0xB4C1 0xBBFA
+  - Shift-JIS             0x8ABF 0x8E9A
+
+o Simplified Chinese (GB 2312-80 & GB 1988-89 or ASCII):
+  - 7-bit ISO 2022        <ESC> $ A 0x3A3A 0x5756 <ESC> ( T
+  - ISO-2022-CN           <ESC> $ ) A <SO> 0x3A3A 0x5756 <SI>
+  - EUC                   0xBABA 0xD7D6
+  - HZ (HZ-GB-2312)       ~{ 0x3A3A 0x5756 ~}
+  - zW                    zW 0x3A3A 0x5756
+
+o Traditional Chinese (CNS 11643-1992):
+  - 7-bit ISO 2022        <ESC> $ ( G 0x6947 0x4773 <ESC> ( B
+  - ISO-2022-CN           <ESC> $ ) G <SO> 0x6947 0x4773 <SI>
+  - EUC                   0xE9C7 0xC7F3 or 0x8EA1E9C7 0x8EA1C7F3
+
+o Traditional Chinese (Big Five):
+  - Big Five              0xBA7E 0xA672
+
+o Korean (KS C 5601-1992 & ASCII):
+  - 7-bit ISO 2022        <ESC> $ ( C 0x7953 0x6D2E <ESC> ( B
+  - ISO-2022-KR           <ESC> $ ) C <SO> 0x7953 0x6D2E <SI>
+  - EUC                   0xF9D3 0xEDAE
+  - Johab                 0xF7D3 0xF1AE
+
+o CJK (ISO 10646-1:1993, JIS X 0221-1995, GB 13000.1-93, or KS C
+  5700-1995):
+  - UCS-2                 0x6F22 0x5B57
+  - UCS-4                 0x00006F22 0x00005B57
+
+The above should have given you a taste of what information the
+following sections provide.
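You can cross-check many of the byte values in the table above with the CJK codecs in Python's standard library. The Python sketch below assumes those codecs follow the same standards as the table.

Some encodings have no Python codec and are left out:

- the 7-bit ISO 2022 form with the <ESC> & @ prefix
- zW
- ISO-2022-CN
- CNS 11643 EUC

One difference is expected: Python's `iso2022_jp` codec returns to ASCII with <ESC> ( B, not to JIS X 0201-1976 Roman with <ESC> ( J. UCS-2 and UCS-4 are approximated here by UTF-16BE and UTF-32BE, which produce the same bytes for these two characters. `bytes.hex` with a separator requires Python 3.8 or later.

```python
# Python standard-library codecs applied to the two characters in the table.
KANJI = "\u6f22\u5b57"     # U+6F22 U+5B57 (traditional form of the first character)
HANZI_GB = "\u6c49\u5b57"  # U+6C49 U+5B57 (simplified form, as in GB 2312-80)

SAMPLES = [
    ("Japanese EUC", KANJI, "euc_jp"),
    ("Shift-JIS", KANJI, "shift_jis"),
    ("ISO-2022-JP", KANJI, "iso2022_jp"),
    ("Simplified Chinese EUC", HANZI_GB, "gb2312"),
    ("HZ", HANZI_GB, "hz"),
    ("Big Five", KANJI, "big5"),
    ("Korean EUC", KANJI, "euc_kr"),
    ("UCS-2 (UTF-16BE)", KANJI, "utf_16_be"),
    ("UCS-4 (UTF-32BE)", KANJI, "utf_32_be"),
]


def encode_all(samples):
    """Return {label: encoded bytes} for each (label, text, codec) sample."""
    return {label: text.encode(codec) for label, text, codec in samples}


for label, data in encode_all(SAMPLES).items():
    print(f"{label:24} {data.hex(' ').upper()}")
```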
+
+
+3.1: 7-BIT ISO 2022 ENCODING
+
+	7-bit ISO 2022 is the name commonly given to the encoding
+system that uses escape sequences to shift between character sets.
+(ISO 2022-encoded Japanese text is also known as "JIS" encoding; it
+differs from ISO-2022-JP and ISO-2022-JP-2, as will be explained in
+Section 3.1.3.) This encoding comes from the ISO 2022-1993 standard.
+	An escape sequence, as the name implies, consists of an escape
+character followed by a sequence of one or more characters. These
+escape sequences are used to change the character set of the text
+stream. This may also mean a shift from one- to two-byte-per-character
+mode (or vice versa).
+	7-bit ISO 2022 character sets fall into two types: one-byte
+and two-byte. CJK character sets, for obvious reasons, fall into the
+latter group.
+	One advantage that 7-bit ISO 2022 encoding has over other
+encoding systems is that its escape sequences specify the character
+set, and thus the locale. 7-bit ISO 2022 encoding also encodes
+text using only seven-bit bytes, which has the benefit of being able
+to survive Internet travel (e-mail).
+
+
+3.1.1: CODE SPACE
+
+	Each byte in the representation of graphic (printable)
+characters falls into the range 0x21 (decimal 33) through 0x7E (decimal
+126). For one-byte character sets, this means a maximum of 94
+characters. For two-byte character sets, this means a maximum of 8,836
+characters (94 x 94 = 8,836).
+
+  One-byte Characters                           Encoding Range
+  ^^^^^^^^^^^^^^^^^^^                           ^^^^^^^^^^^^^^
+  first byte range                              0x21-0x7E
+
+  Two-byte Characters                           Encoding Range
+  ^^^^^^^^^^^^^^^^^^^                           ^^^^^^^^^^^^^^
+  first byte range                              0x21-0x7E
+  second byte range                             0x21-0x7E
+
+White space and control characters (of which the "escape" character is
+one) are still found in 0x00-0x20 and 0x7F.
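Part 2 identifies characters by row-cell positions such as 06-57. Each row and cell value from 1 to 94 maps onto this code space by adding 0x20. The EUC codes in the table at the start of Part 3 are the same byte pairs with the high bit set (0x3441 and 0xB4C1, for example). The Python sketch below assumes a 94 x 94 two-byte set. The function names are hypothetical.

```python
def to_iso2022(row: int, cell: int) -> bytes:
    """Two 7-bit bytes for a row-cell position (row and cell 1-94)."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must be in the range 1-94")
    return bytes((row + 0x20, cell + 0x20))


def to_euc(row: int, cell: int) -> bytes:
    """The same position with the high bit set on both bytes."""
    return bytes(b | 0x80 for b in to_iso2022(row, cell))


def to_row_cell(pair: bytes) -> tuple:
    """Row-cell position of a two-byte code given in either the 7- or 8-bit form."""
    if len(pair) != 2:
        raise ValueError("expected exactly two bytes")
    first, second = (b & 0x7F for b in pair)
    if not (0x21 <= first <= 0x7E and 0x21 <= second <= 0x7E):
        raise ValueError("not a two-byte graphic character code")
    return (first - 0x20, second - 0x20)


print(to_iso2022(20, 33).hex().upper(), to_euc(20, 33).hex().upper())  # 3441 B4C1
```

Going the other way, `to_row_cell(bytes.fromhex("F9D3"))` gives (89, 51). That is row-cell 89-51, the first KS C 5601-1992 character in that table.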
+
+
+3.1.2: ISO-REGISTERED ESCAPE SEQUENCES
+
+	The following is a table that provides the ISO-registered
+escape sequences for various one- and two-byte character sets
+mentioned in Part 2 of this document (ISO registration numbers
+provided in the fourth column):
+
+  One-byte Character Set  Escape Sequence      Hexadecimal     ISO Reg
+  ^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^      ^^^^^^^^^^^     ^^^^^^^
+  ASCII (ANSI X3.4-1986)  <ESC> ( B            0x1B2842        6
+  Half-width katakana     <ESC> ( I            0x1B2849        13
+  JIS X 0201-1976 Roman   <ESC> ( J            0x1B284A        14
+  GB 1988-89 Roman        <ESC> ( T            0x1B2854        57
+
+  Two-byte Character Set  Escape Sequence      Hexadecimal     ISO Reg
+  ^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^      ^^^^^^^^^^^     ^^^^^^^
+  JIS C 6226-1978         <ESC> $ @            0x1B2440        42
+  GB 2312-80              <ESC> $ A            0x1B2441        58
+  JIS X 0208-1983         <ESC> $ B            0x1B2442        87
+  KS C 5601-1992          <ESC> $ ( C          0x1B242843      149
+  JIS X 0212-1990         <ESC> $ ( D          0x1B242844      159
+  ISO-IR-165:1992         <ESC> $ ( E          0x1B242845      165
+  JIS X 0208-1990         <ESC> & @ <ESC> $ B  0x1B26401B2442  168
+  CNS 11643-1992 Plane 1  <ESC> $ ( G          0x1B242847      171
+  CNS 11643-1992 Plane 2  <ESC> $ ( H          0x1B242848      172
+  CNS 11643-1992 Plane 3  <ESC> $ ( I          0x1B242849      183
+  CNS 11643-1992 Plane 4  <ESC> $ ( J          0x1B24284A      184
+  CNS 11643-1992 Plane 5  <ESC> $ ( K          0x1B24284B      185
+  CNS 11643-1992 Plane 6  <ESC> $ ( L          0x1B24284C      186
+  CNS 11643-1992 Plane 7  <ESC> $ ( M          0x1B24284D      187
+
+Note that the first three two-byte character sets (JIS C 6226-1978,
+GB 2312-80, and JIS X 0208-1983) do not use an opening parenthesis
+(0x28 or "(") in their escape sequences, which means that they don't
+follow the 7-bit ISO 2022 rules precisely. They are shorter for
+historical reasons, and are retained for backwards compatibility.
+Also note that not all of the CJK character set standards described in
+Part 2 have ISO-registered escape sequences.
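A decoder typically recognizes these escape sequences by matching the bytes that follow <ESC> against a table. The C sketch below covers only a subset of the rows above; the table layout and the lookup_escape() name are illustrative assumptions, not part of any standard API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of a (partial) escape sequence table. */
struct esc_entry {
    const char *seq;      /* bytes of the sequence, beginning with ESC */
    const char *charset;  /* name of the designated character set */
};

static const struct esc_entry ESC_TABLE[] = {
    { "\x1B(B",  "ASCII (ANSI X3.4-1986)" },
    { "\x1B(I",  "Half-width katakana" },
    { "\x1B(J",  "JIS X 0201-1976 Roman" },
    { "\x1B$@",  "JIS C 6226-1978" },
    { "\x1B$A",  "GB 2312-80" },
    { "\x1B$B",  "JIS X 0208-1983" },
    { "\x1B$(C", "KS C 5601-1992" },
    { "\x1B$(D", "JIS X 0212-1990" },
};

/* Return the character set designated by the escape sequence at the
 * start of buf (len bytes available) and store the sequence length in
 * *seq_len; return NULL if no table entry matches. */
static const char *lookup_escape(const unsigned char *buf, size_t len,
                                 size_t *seq_len)
{
    assert(buf != NULL && seq_len != NULL);
    for (size_t i = 0; i < sizeof ESC_TABLE / sizeof ESC_TABLE[0]; i++) {
        size_t n = strlen(ESC_TABLE[i].seq);
        if (n <= len && memcmp(buf, ESC_TABLE[i].seq, n) == 0) {
            *seq_len = n;
            return ESC_TABLE[i].charset;
        }
    }
    return NULL;
}
```

Because no sequence in this table is a prefix of another, the first match is unambiguous.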
+	There are other encoding methods that are similar to 7-bit ISO
+2022 in that they are suitable for Internet use, but are locale-
+specific. These include HZ and zW encoding, both of which are specific
+to the GB 2312-80 character set (see Sections 3.3.2 and 3.3.3).
+ISO-2022-JP, ISO-2022-KR, ISO-2022-CN, and ISO-2022-CN-EXT are
+described below.
+
+
+3.1.3: ISO-2022-JP AND ISO-2022-JP-2
+
+	ISO-2022-JP is best described as a subset of 7-bit ISO 2022
+encoding for Japanese, and reflects how Japanese text is encoded for
+e-mail messages. ISO-2022-JP-2 is an extension that supports
+additional character sets.
+	There are only four escape sequences permitted in ISO-2022-JP,
+indicated as follows:
+
+  One-byte Character Set  Escape Sequence      Hexadecimal
+  ^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^      ^^^^^^^^^^^
+  ASCII (ANSI X3.4-1986)  <ESC> ( B            0x1B2842
+  JIS X 0201-1976 Roman   <ESC> ( J            0x1B284A
+
+  Two-byte Character Set  Escape Sequence      Hexadecimal
+  ^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^      ^^^^^^^^^^^
+  JIS C 6226-1978         <ESC> $ @            0x1B2440
+  JIS X 0208-1983         <ESC> $ B            0x1B2442
+
+Note the lack of JIS X 0208-1990, JIS X 0212-1990, and half-width
+katakana escape sequences. The JIS X 0208-1983 escape sequence is used
+to indicate both JIS X 0208-1983 and JIS X 0208-1990 (for practical
+reasons).
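The restriction to four escape sequences makes ISO-2022-JP easy to validate. This C sketch (hypothetical function name) walks a buffer, accepts only the four sequences listed above, and counts the two-byte characters; any other escape sequence, such as the JIS X 0212-1990 one, is treated as an error.

```c
#include <assert.h>
#include <stddef.h>

/* Count two-byte characters in an ISO-2022-JP buffer. Returns -1 if an
 * escape sequence other than the four permitted ones appears, or if a
 * two-byte character is truncated or has bytes outside 0x21-0x7E. */
static long iso2022jp_count_kanji(const unsigned char *buf, size_t len)
{
    int two_byte = 0;   /* 0: ASCII or JIS-Roman mode, 1: two-byte mode */
    long count = 0;
    size_t i = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        if (buf[i] == 0x1B) {
            if (i + 2 >= len)
                return -1;
            if (buf[i + 1] == '(' && (buf[i + 2] == 'B' || buf[i + 2] == 'J'))
                two_byte = 0;               /* ESC ( B or ESC ( J */
            else if (buf[i + 1] == '$' && (buf[i + 2] == '@' || buf[i + 2] == 'B'))
                two_byte = 1;               /* ESC $ @ or ESC $ B */
            else
                return -1;                  /* not permitted in ISO-2022-JP */
            i += 3;
        } else if (two_byte) {
            if (i + 1 >= len || buf[i] < 0x21 || buf[i] > 0x7E ||
                buf[i + 1] < 0x21 || buf[i + 1] > 0x7E)
                return -1;
            count++;
            i += 2;
        } else {
            i++;                            /* one-byte character */
        }
    }
    return count;
}
```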
+	ISO-2022-JP-2 permits additional escape sequences, indicated
+as follows:
+
+  One-byte Character Set  Escape Sequence      Hexadecimal
+  ^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^      ^^^^^^^^^^^
+  ASCII (ANSI X3.4-1986)  <ESC> ( B            0x1B2842
+  JIS X 0201-1976 Roman   <ESC> ( J            0x1B284A
+
+  Two-byte Character Set  Escape Sequence      Hexadecimal
+  ^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^      ^^^^^^^^^^^
+  JIS C 6226-1978         <ESC> $ @            0x1B2440
+  JIS X 0208-1983         <ESC> $ B            0x1B2442
+  JIS X 0212-1990         <ESC> $ ( D          0x1B242844
+  GB 2312-80              <ESC> $ A            0x1B2441
+  KS C 5601-1992          <ESC> $ ( C          0x1B242843
+
+With the introduction of ISO-2022-KR (see Section 3.1.4), ISO-2022-CN
+(see Section 3.1.5), and ISO-2022-CN-EXT (see Section 3.1.5), the
+usefulness of supporting GB 2312-80 and KS C 5601-1992 can be
+questioned. However, ISO-2022-JP-2 remains useful because it also
+provides support for JIS X 0212-1990.
+	More detailed information on ISO-2022-JP encoding can be found
+in RFC 1468, and on ISO-2022-JP-2 encoding in RFC 1554.
+
+
+3.1.4: ISO-2022-KR
+
+	ISO-2022-KR is similar to ISO-2022-JP (see Section 3.1.3) in
+that it reflects how Korean text is encoded for e-mail messages.
+However, its actual implementation is a bit different. Below is a
+summary.
+	There are only two shift sequences used in ISO-2022-KR,
+indicated as follows:
+
+  One-byte Character Set  Shift Sequence       Hexadecimal
+  ^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^       ^^^^^^^^^^^
+  ASCII (ANSI X3.4-1986)  <SI>                 0x0F
+
+  Two-byte Character Set  Shift Sequence       Hexadecimal
+  ^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^       ^^^^^^^^^^^
+  KS C 5601-1992          <SO>                 0x0E
+
+Furthermore, the following designator sequence must appear only once,
+at the beginning of a line, before any KS C 5601-1992 characters (this
+usually means that it appears by itself on the first line of the
+file):
+
+  <ESC> $ ) C             0x1B242943
+
+It looks almost the same as the KS C 5601-1992 escape sequence in
+7-bit ISO 2022, but look again. The opening parenthesis (0x28 or "(")
+is replaced by a closing parenthesis (0x29 or ")"). This designator
+sequence serves a different purpose than an escape sequence. It is
+like a flag indicating that "this document contains KS C 5601-1992
+characters." The <SO> and <SI> control characters actually perform the
+switching between one- (ASCII) and two-byte (KS C 5601-1992) codes.
+	More detailed information on ISO-2022-KR encoding can be found
+in RFC 1557.
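Because the two-byte codes in ISO-2022-KR are the same KS C 5601-1992 codes that EUC-KR (Section 3.2.4) stores with the high bit set, conversion mostly consists of dropping the designator and the SO/SI characters and setting that bit. Here is a minimal C sketch; the function name is hypothetical, and it assumes well-formed input in which the designator precedes any SO (it does not enforce that ordering).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SO 0x0E   /* shift out: switch to KS C 5601-1992 */
#define SI 0x0F   /* shift in: switch back to ASCII */

/* Convert ISO-2022-KR to EUC-KR. Drops the ESC $ ) C designator and
 * SO/SI, and sets the high bit of both bytes of each two-byte character
 * (0x21-0x7E becomes 0xA1-0xFE). Returns bytes written or -1 on error.
 * out must have room for at least len bytes. */
static long iso2022kr_to_euckr(const unsigned char *in, size_t len,
                               unsigned char *out)
{
    static const unsigned char designator[] = { 0x1B, '$', ')', 'C' };
    int shifted = 0;      /* 1 after SO, 0 after SI */
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);
    while (i < len) {
        unsigned char b = in[i];
        if (b == 0x1B) {
            if (len - i < sizeof designator ||
                memcmp(in + i, designator, sizeof designator) != 0)
                return -1;
            i += sizeof designator;
        } else if (b == SO) {
            shifted = 1;
            i++;
        } else if (b == SI) {
            shifted = 0;
            i++;
        } else if (shifted && b >= 0x21 && b <= 0x7E) {
            if (i + 1 >= len || in[i + 1] < 0x21 || in[i + 1] > 0x7E)
                return -1;
            out[o++] = (unsigned char)(b | 0x80);
            out[o++] = (unsigned char)(in[i + 1] | 0x80);
            i += 2;
        } else {
            out[o++] = b;   /* ASCII, or a control character such as newline */
            i++;
        }
    }
    return (long)o;
}
```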
+
+
+3.1.5: ISO-2022-CN AND ISO-2022-CN-EXT
+
+	ISO-2022-CN and ISO-2022-CN-EXT are similar to ISO-2022-JP
+(see Section 3.1.3) and ISO-2022-KR (see Section 3.1.4) in that they
+reflect how Chinese text is encoded for e-mail messages.
+	As with ISO-2022-KR, there are only two shift sequences,
+indicated as follows:
+
+  One-byte Character Set  Shift Sequence       Hexadecimal
+  ^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^       ^^^^^^^^^^^
+  ASCII (ANSI X3.4-1986)  <SI>                 0x0F
+
+  Two-byte Character Set  Shift Sequence       Hexadecimal
+  ^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^       ^^^^^^^^^^^
+  <Too Many to List>      <SO>                 0x0E
+
+Unlike ISO-2022-KR, however, ISO-2022-CN also uses single shift
+sequences. A single shift applies only to the one character that
+immediately follows it, so it must appear before every such character
+rather than once before a sequence of characters.
+
+  Single Shift Type       Shift Sequence       Hexadecimal
+  ^^^^^^^^^^^^^^^^^       ^^^^^^^^^^^^^^       ^^^^^^^^^^^
+  SS2                     <ESC> N              0x1B4E
+  SS3                     <ESC> O (not zero!)  0x1B4F
+
+	ISO-2022-CN supports the following character sets using SO and
+SS2 designations:
+
+  Character Set           Type   Designation Sequence  Hexadecimal
+  ^^^^^^^^^^^^^           ^^^^   ^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^
+  GB 2312-80              SO     <ESC> $ ) A           0x1B242941
+  CNS 11643-1992 Plane 1  SO     <ESC> $ ) G           0x1B242947
+  CNS 11643-1992 Plane 2  SS2    <ESC> $ * H           0x1B242A48
+
+Each designator sequence must appear once on a line before any
+instance of the character set it designates. If two lines contain
+characters from the same character set, both lines must include the
+designator sequence (this is so the text can be displayed correctly
+when scrolling back in a window). This is different behavior from
+ISO-2022-KR, where the designator sequence appears once in the entire
+file (this is because ISO-2022-KR supports a single two-byte character
+set).
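To show how SO and SS2 interact, here is a C sketch of an encoder for a single line. The cn_char structure and the function name are hypothetical, and the caller is assumed to have already split the line into characters tagged with their character set. Each designator is emitted at most once per line (again if the SO set changes), every CNS 11643-1992 Plane 2 character is preceded by SS2, and the sketch shifts back to ASCII with SI before the end of the line, which is an assumption of this sketch rather than something stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum cn_set { CN_ASCII, CN_GB2312, CN_CNS_P1, CN_CNS_P2 };

struct cn_char {
    enum cn_set set;
    unsigned char b1, b2;   /* 7-bit bytes 0x21-0x7E; b2 unused for ASCII */
};

/* Encode one line of n characters. Returns the number of bytes written;
 * 8 * n + 1 bytes of output space is always sufficient. */
static size_t iso2022cn_encode_line(const struct cn_char *c, size_t n,
                                    unsigned char *out)
{
    enum cn_set g1 = CN_ASCII;   /* SO set designated on this line (none yet) */
    int g2_designated = 0;       /* has ESC $ * H been written on this line? */
    int shifted_out = 0;
    size_t o = 0;

    assert(out != NULL && (c != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        switch (c[i].set) {
        case CN_ASCII:
            if (shifted_out) { out[o++] = 0x0F; shifted_out = 0; }   /* SI */
            out[o++] = c[i].b1;
            break;
        case CN_GB2312:
        case CN_CNS_P1:
            if (g1 != c[i].set) {            /* ESC $ ) A or ESC $ ) G */
                out[o++] = 0x1B; out[o++] = '$'; out[o++] = ')';
                out[o++] = (c[i].set == CN_GB2312) ? 'A' : 'G';
                g1 = c[i].set;
            }
            if (!shifted_out) { out[o++] = 0x0E; shifted_out = 1; }  /* SO */
            out[o++] = c[i].b1;
            out[o++] = c[i].b2;
            break;
        case CN_CNS_P2:
            if (!g2_designated) {            /* ESC $ * H, once per line */
                out[o++] = 0x1B; out[o++] = '$'; out[o++] = '*'; out[o++] = 'H';
                g2_designated = 1;
            }
            out[o++] = 0x1B; out[o++] = 'N'; /* SS2: next character only */
            out[o++] = c[i].b1;
            out[o++] = c[i].b2;
            break;
        }
    }
    if (shifted_out)
        out[o++] = 0x0F;                     /* assumed: end the line in ASCII */
    return o;
}
```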
+	ISO-2022-CN-EXT supports the following character sets using
+SO, SS2, and SS3 designations (notice how ISO-2022-CN is still
+supported in the same manner):
+
+  Character Set           Type   Designation Sequence  Hexadecimal
+  ^^^^^^^^^^^^^           ^^^^   ^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^
+  GB 2312-80              SO     <ESC> $ ) A           0x1B242941
+  GB/T 12345-90           SO     NOT REGISTERED
+  ISO-IR-165              SO     <ESC> $ ) E           0x1B242945
+  CNS 11643-1992 Plane 1  SO     <ESC> $ ) G           0x1B242947
+  CNS 11643-1992 Plane 2  SS2    <ESC> $ * H           0x1B242A48
+  GB 7589-87              SS2    NOT REGISTERED
+  GB/T 13131-9X           SS2    NOT REGISTERED
+  CNS 11643-1992 Plane 3  SS3    <ESC> $ + I           0x1B242B49
+  CNS 11643-1992 Plane 4  SS3    <ESC> $ + J           0x1B242B4A
+  CNS 11643-1992 Plane 5  SS3    <ESC> $ + K           0x1B242B4B
+  CNS 11643-1992 Plane 6  SS3    <ESC> $ + L           0x1B242B4C
+  CNS 11643-1992 Plane 7  SS3    <ESC> $ + M           0x1B242B4D
+  GB 7590-87              SS3    NOT REGISTERED
+  GB/T 13132-9X           SS3    NOT REGISTERED
+
+Support for character sets indicated as NOT REGISTERED will be added
+once they are ISO-registered.
+	More detailed information on ISO-2022-CN and ISO-2022-CN-EXT
+encodings can be found in RFC 1922.
+
+
+3.2: EUC ENCODING
+
+	EUC stands for "Extended UNIX Code," and is a rich encoding
+system from ISO 2022-1993 that is designed to handle large or multiple
+character sets. It is primarily used on UNIX systems, such as Sun's
+Solaris.
+	EUC consists of four code sets, numbered 0 through 3. The
+only code set that is more or less fixed by definition is code set 0,
+which is specified to contain ASCII or a locale's equivalent (such as
+JIS X 0201-1976 for Japanese or GB 1988-89 for PRC Chinese).
+	It is quite common to append the locale name to "EUC" when
+designating a specific instance of EUC encoding. Common designations
+include EUC-JP, EUC-CN, EUC-KR, and EUC-TW.
+
+
+3.2.1: JAPANESE REPRESENTATION
+
+	The following table illustrates the Japanese representation of
+EUC packed format:
+
+  EUC Code Sets                                 Encoding Range
+  ^^^^^^^^^^^^^                                 ^^^^^^^^^^^^^^
+  Code set 0 (ASCII or JIS X 0201-1976 Roman):  0x21-0x7E
+  Code set 1 (JIS X 0208):                      0xA1A1-0xFEFE
+  Code set 2 (half-width katakana):             0x8EA1-0x8EDF
+  Code set 3 (JIS X 0212-1990):                 0x8FA1A1-0x8FFEFE
+
+An earlier version of EUC for Japanese used code set 3 as the user-
+defined range.
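A decoder can tell the code sets apart from the first byte alone. The C sketch below follows the ranges in the table above, relying on the fact that 0x8E and 0x8F act as prefixes (the single shifts SS2 and SS3) for code sets 2 and 3; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Identify the EUC-JP code set of the character at buf[0], store it in
 * *code_set, and return the character length in bytes (0 if malformed
 * or truncated). */
static size_t eucjp_char(const unsigned char *buf, size_t len, int *code_set)
{
    assert(buf != NULL && code_set != NULL);
    if (len == 0)
        return 0;
    unsigned char b = buf[0];
    if (b < 0x80) {                      /* code set 0 (and control codes) */
        *code_set = 0;
        return 1;
    }
    if (b == 0x8E) {                     /* SS2 + half-width katakana */
        if (len < 2 || buf[1] < 0xA1 || buf[1] > 0xDF)
            return 0;
        *code_set = 2;
        return 2;
    }
    if (b == 0x8F) {                     /* SS3 + JIS X 0212-1990 */
        if (len < 3 || buf[1] < 0xA1 || buf[1] > 0xFE ||
            buf[2] < 0xA1 || buf[2] > 0xFE)
            return 0;
        *code_set = 3;
        return 3;
    }
    if (b >= 0xA1 && b <= 0xFE) {        /* code set 1: JIS X 0208 */
        if (len < 2 || buf[1] < 0xA1 || buf[1] > 0xFE)
            return 0;
        *code_set = 1;
        return 2;
    }
    return 0;
}
```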
+
+
+3.2.2: CHINESE (PRC) REPRESENTATION
+
+	The following table illustrates the Chinese (PRC)
+representation of EUC packed format:
+
+  EUC Code Sets                                 Encoding Range
+  ^^^^^^^^^^^^^                                 ^^^^^^^^^^^^^^
+  Code set 0 (ASCII or GB 1988-89):             0x21-0x7E
+  Code set 1 (GB 2312-80):                      0xA1A1-0xFEFE
+  Code set 2:                                   unused
+  Code set 3:                                   unused
+
+Note how code sets 2 and 3 are unused.
+	The encoding used on Macintosh is quite similar, but has a
+shortened two-byte range (0xA1A1 through 0xFCFE) plus additional
+one-byte code points, namely 0x80 ("u" with dieresis), 0xFD
+("copyright" symbol: "c" in a circle), 0xFE ("trademark" symbol: "TM"
+as a superscript), and 0xFF ("ellipsis" symbol: three dots).
+
+
+3.2.3: CHINESE (TAIWAN) REPRESENTATION
+
+	The following table illustrates the Chinese (Taiwan)
+representation of EUC packed format:
+
+  EUC Code Sets                                 Encoding Range
+  ^^^^^^^^^^^^^                                 ^^^^^^^^^^^^^^
+  Code set 0 (ASCII):                           0x21-0x7E
+  Code set 1 (CNS 11643-1992 Plane 1):          0xA1A1-0xFEFE
+  Code set 2 (CNS 11643-1992 Planes 1-16):      0x8EA1A1A1-0x8EB0FEFE
+  Code set 3:                                   unused
+
+Note how CNS 11643-1992 Plane 1 is redundantly encoded in code set 1
+(two-byte) and code set 2 (four-byte). The second byte of code set 2
+indicates the plane number: 0xA1 is Plane 1, 0xA2 is Plane 2, and so
+on up to 0xB0, which is Plane 16.
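The redundancy is easy to see in code. This hypothetical C helper decodes either form into a plane number and a pair of 7-bit row/cell bytes (0x21-0x7E) by clearing the high bit of each byte, so 0xC4A1 and 0x8EA1C4A1 both decode to Plane 1, 0x4421.

```c
#include <assert.h>
#include <stddef.h>

/* Decode an EUC-TW character into a CNS 11643-1992 plane number and a
 * 7-bit two-byte code. Returns the number of bytes consumed, or 0 if
 * the input is not a valid code set 1 or code set 2 character. */
static size_t euctw_decode(const unsigned char *buf, size_t len, int *plane,
                           unsigned char *c1, unsigned char *c2)
{
    assert(buf != NULL && plane != NULL && c1 != NULL && c2 != NULL);
    if (len >= 2 && buf[0] >= 0xA1 && buf[0] <= 0xFE &&
        buf[1] >= 0xA1 && buf[1] <= 0xFE) {       /* code set 1: Plane 1 */
        *plane = 1;
        *c1 = buf[0] & 0x7F;
        *c2 = buf[1] & 0x7F;
        return 2;
    }
    if (len >= 4 && buf[0] == 0x8E && buf[1] >= 0xA1 && buf[1] <= 0xB0 &&
        buf[2] >= 0xA1 && buf[2] <= 0xFE &&
        buf[3] >= 0xA1 && buf[3] <= 0xFE) {       /* code set 2 */
        *plane = buf[1] - 0xA0;                   /* 0xA1 -> 1 ... 0xB0 -> 16 */
        *c1 = buf[2] & 0x7F;
        *c2 = buf[3] & 0x7F;
        return 4;
    }
    return 0;
}
```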
+
+
+3.2.4: KOREAN REPRESENTATION
+
+	The following table illustrates the Korean representation of
+EUC packed format (this is also known as "Wansung" encoding -- the
+Korean word "wansung" means "pre-compose"):
+
+  EUC Code Sets                                 Encoding Range
+  ^^^^^^^^^^^^^                                 ^^^^^^^^^^^^^^
+  Code set 0 (ASCII or KS C 5636-1993):         0x21-0x7E
+  Code set 1 (KS C 5601-1992):                  0xA1A1-0xFEFE
+  Code set 2:                                   unused
+  Code set 3:                                   unused
+
+Note how code sets 2 and 3 are unused.
+	The encoding used on Macintosh is quite similar, but has a
+shortened two-byte range (0xA1A1 through 0xFDFE) plus additional
+one-byte code points, namely 0x81 ("won" symbol), 0x82 (hyphen), 0x83
+("copyright" symbol: "c" in a circle), 0xFE ("trademark" symbol: "TM"
+as a superscript), and 0xFF ("ellipsis" symbol: three dots).
+	See Section 3.3.17 for a description of Microsoft's extension
+to this encoding, called Unified Hangul Code.
+
+
+3.3: LOCALE-SPECIFIC ENCODINGS
+
+	The encoding systems described in the following sections are
+considered to be locale-specific, that is, each is used to encode a
+specific character set standard. This is not to say that they are not
+widely used (actually, some of these are among the most widely used
+encoding systems!), but rather that they are tied to a specific
+character set.
+
+
+3.3.1: SHIFT-JIS
+
+	Shift-JIS (also known as MS Kanji, SJIS, or DBCS-PC) is the
+encoding system used on machines that support MS-DOS or Windows, and
+also for Macintosh (KanjiTalk or Japanese Language Kit). It was
+originally developed by Microsoft Corporation as a way to support the
+Japanese character set on MS-DOS. The following tables provide the
+Shift-JIS encoding ranges:
+
+  Two-byte Standard Characters                  Encoding Ranges
+  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^                  ^^^^^^^^^^^^^^^
+  first byte ranges                             0x81-0x9F, 0xE0-0xEF
+  second byte ranges                            0x40-0x7E, 0x80-0xFC
+
+  Two-byte User-defined Characters              Encoding Ranges
+  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^              ^^^^^^^^^^^^^^^
+  first byte range                              0xF0-0xFC
+  second byte ranges                            0x40-0x7E, 0x80-0xFC
+
+  One-byte Characters                           Encoding Range
+  ^^^^^^^^^^^^^^^^^^^                           ^^^^^^^^^^^^^^
+  Half-width katakana                           0xA1-0xDF
+  ASCII/JIS-Roman                               0x21-0x7E
+
+It is important to note that the user-defined range does not
+correspond to code points in other encodings that support Japanese,
+such as 7-bit ISO 2022 or EUC. This is a portability problem. Shift-JIS
+also differs from EUC (see Section 3.2.1) in that it cannot encode the
+JIS X 0212-1990 character set standard.
+	The encoding used on Macintosh is quite similar to the above
+table, but has additional one-byte code points, namely 0x80
+(backslash), 0xFD ("copyright" symbol: "c" in a circle), 0xFE
+("trademark" symbol: "TM" as a superscript), and 0xFF ("ellipsis"
+symbol: three dots).
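The ranges above are enough to step through Shift-JIS text one character at a time. Here is a hypothetical C classifier that follows the standard (non-Macintosh) table, so bytes such as 0x80, 0xA0, and 0xFD-0xFF are rejected as lead bytes.

```c
#include <assert.h>
#include <stddef.h>

enum sjis_kind { SJIS_INVALID, SJIS_ONE_BYTE, SJIS_STANDARD, SJIS_USER_DEFINED };

/* Second-byte ranges shared by standard and user-defined characters. */
static int sjis_trail_ok(unsigned char b)
{
    return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

/* Classify the character at buf[0] and store its length in *char_len
 * (0 when the result is SJIS_INVALID). */
static enum sjis_kind sjis_classify(const unsigned char *buf, size_t len,
                                    size_t *char_len)
{
    assert(buf != NULL && char_len != NULL);
    *char_len = 0;
    if (len == 0)
        return SJIS_INVALID;
    unsigned char b = buf[0];
    if (b <= 0x7F || (b >= 0xA1 && b <= 0xDF)) {   /* ASCII/JIS-Roman, katakana */
        *char_len = 1;
        return SJIS_ONE_BYTE;
    }
    if (len < 2 || !sjis_trail_ok(buf[1]))
        return SJIS_INVALID;
    if ((b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xEF)) {
        *char_len = 2;
        return SJIS_STANDARD;
    }
    if (b >= 0xF0 && b <= 0xFC) {
        *char_len = 2;
        return SJIS_USER_DEFINED;
    }
    return SJIS_INVALID;
}
```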
+
+
+3.3.2: HZ (HZ-GB-2312)
+
+	HZ is a simple yet very powerful and reliable system for
+encoding GB 2312-80 text which was developed by Fung Fung Lee
+(lee@umunhum.stanford.edu). HZ encoding is commonly used when
+exchanging e-mail or posting messages to Usenet News (specifically, to
+alt.chinese.text).
+	The actual encoding ranges used for one- and two-byte
+characters are almost identical to 7-bit ISO 2022 encoding (see Section
+3.1.1), except that the first-byte range of two-byte characters is
+limited to 0x21 through 0x77. But, instead of using an escape sequence
+to shift between one- and two-byte character modes, a simple string of
+two printable characters is used.
+
+  One-byte Character Set  Shift Sequence       Hexadecimal
+  ^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^       ^^^^^^^^^^^
+  ASCII                   ~}                   0x7E7D
+
+  Two-byte Character Set  Shift Sequence       Hexadecimal
+  ^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^       ^^^^^^^^^^^
+  GB 2312-80              ~{                   0x7E7B
+
+The tilde character (0x7E) is interpreted as an escape character in HZ
+encoding, so it has special meaning. If a tilde character is to appear
+in one-byte-per-character mode, it must be doubled (so ~~ would appear
+as just ~). This means that there are three escape sequences used in
+HZ encoding:
+
+  Escape Sequence  Meaning
+  ^^^^^^^^^^^^^^^  ^^^^^^^
+  ~~               ~ in one-byte-per-character mode
+  ~}               Shift into one-byte-per-character mode
+  ~{               Shift into two-byte-per-character mode
+
+There is also a fourth escape sequence, namely ~ plus a newline
+character (~\n). This escape sequence is a line-continuation marker to
+be consumed with no output produced.
+	This method works without problems because the shift sequences
+represent empty positions in the very last row of the GB 2312-80 table
+(0x7E7B and 0x7E7D, the fourth- and second-from-last code points). HZ
+encoding makes 87 of the 94 rows accessible (first bytes 0x21 through
+0x77), and because there are no defined characters beyond row 87, this
+causes no problems.
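The rules above translate into a small state machine. The C sketch below (hypothetical function name) decodes HZ into EUC-CN, that is, GB 2312-80 with the high bit set on both bytes as in Section 3.2.2. Because the first byte of a two-byte character never exceeds 0x77, a tilde (0x7E) in two-byte mode can only begin an escape sequence. Accepting ~<newline> in either mode is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Decode HZ text into EUC-CN. Handles ~~, ~{, ~}, and ~ followed by a
 * newline. Returns bytes written or -1 on malformed input.
 * out must have room for at least len bytes. */
static long hz_to_euccn(const unsigned char *in, size_t len, unsigned char *out)
{
    int gb_mode = 0;     /* 0: one-byte mode, 1: two-byte (GB) mode */
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);
    while (i < len) {
        if (in[i] == '~') {
            if (i + 1 >= len)
                return -1;
            unsigned char next = in[i + 1];
            if (next == '{' && !gb_mode) {
                gb_mode = 1;
            } else if (next == '}' && gb_mode) {
                gb_mode = 0;
            } else if (next == '~' && !gb_mode) {
                out[o++] = '~';
            } else if (next == '\n') {
                /* line continuation: consumed, no output */
            } else {
                return -1;
            }
            i += 2;
        } else if (gb_mode) {
            if (i + 1 >= len || in[i] < 0x21 || in[i] > 0x77 ||
                in[i + 1] < 0x21 || in[i + 1] > 0x7E)
                return -1;
            out[o++] = (unsigned char)(in[i] | 0x80);
            out[o++] = (unsigned char)(in[i + 1] | 0x80);
            i += 2;
        } else {
            out[o++] = in[i++];
        }
    }
    return (long)o;
}
```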
+	The complete HZ specification is part of the HZ package,
+described in RFC 1843, and available in HTML format. These are
+available at the following URLs:
+
+  ftp://ftp.ifcss.org/pub/software/unix/convert/HZ-2.0.tar.gz
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/Ch9/rfc-1843.txt
+  http://umunhum.stanford.edu/~lee/chicomp/HZ_spec.html
+
+In addition, RFC 1842 establishes "HZ-GB-2312" as the "charset"
+parameter in MIME-encoded e-mail headers. Its properties are identical
+to HZ encoding as described in RFC 1843.
+
+
+3.3.3: zW
+
+	zW encoding, developed by Ya-Gui Wei and Edmund Lai, is older
+than and somewhat similar to HZ encoding (HZ is considered to be a
+better encoding system, and users are encouraged to switch over to HZ
+encoding).
+	zW encoding is named after how it encodes each line of GB 2312-80
+text, namely lines that contain Chinese text must begin with the two
+characters "z" and "W" ("zW"). This encoding method does not permit
+the mixture of one- (ASCII) and two-byte (GB 2312-80) characters on a
+per-character basis, but rather on a per-line basis. That is, each
+line can contain only Chinese or ASCII text, but not both.
+	More information on zW encoding can be found as part of the
+ZWDOS package available at the following URL:
+
+  ftp://ftp.ifcss.org/pub/software/dos/ZWDOS/
+
+
+3.3.4: BIG FIVE
+
+	Big Five is the encoding system used on machines that support
+MS-DOS or Windows, and also for Macintosh (such as the Chinese
+Language Kit or the fully-localized operating system).
+
+  Two-byte Standard Characters                  Encoding Ranges
+  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^                  ^^^^^^^^^^^^^^^
+  first byte range                              0xA1-0xFE
+  second byte ranges                            0x40-0x7E, 0xA1-0xFE
+
+  One-byte Characters                           Encoding Range
+  ^^^^^^^^^^^^^^^^^^^                           ^^^^^^^^^^^^^^
+  ASCII                                         0x21-0x7E
+
+	The encoding used on Macintosh is quite similar to the above,
+but has a slightly shortened two-byte range (second byte range up to
+0xFC only) plus additional one-byte code points, namely 0x80
+(backslash), 0xFD ("copyright" symbol: "c" in a circle), 0xFE
+("trademark" symbol: "TM" as a superscript), and 0xFF ("ellipsis"
+symbol: three dots).
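Here is a hypothetical C helper that applies the standard ranges above to find the length of the character at the start of a buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Return 2 for a Big Five two-byte character, 1 for an ASCII byte, and
 * 0 for an invalid or truncated sequence. */
static int big5_char_len(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len == 0)
        return 0;
    if (buf[0] < 0x80)
        return 1;
    if (buf[0] >= 0xA1 && buf[0] <= 0xFE && len >= 2) {
        unsigned char t = buf[1];
        if ((t >= 0x40 && t <= 0x7E) || (t >= 0xA1 && t <= 0xFE))
            return 2;
    }
    return 0;
}
```

Because the second byte may fall in 0x40-0x7E, a scanner has to move character by character: searching for an ASCII byte such as a backslash (0x5C) without tracking character boundaries can match the second half of a two-byte character.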
+
+
+3.3.5: JOHAB
+
+	Korean hangul characters are typically encoded in what is
+known as pre-combined form, namely 2 or 3 hangul elements bound into a
+single character. KS C 5601-1992 enumerates 2,350 such pre-combined
+forms. While this number is felt to be sufficient for most purposes,
+it does not account for the total number of possible permutations. The
+encoding system that encodes all possible pre-combined hangul is known
+as Johab encoding (also known as "two-byte combination code" -- the
+Korean word "johab" means "combine"), and is described in Annex 3 of
+the KS C 5601-1992 standard. This encoding is almost like encoding all
+possible three-letter words in English -- while all combinations are
+possible, only a fraction represent *real* words.
+	Pre-combined hangul can be composed of 19 different initial,
+21 different medial, and 27 different final hangul elements (28,
+actually, if you count the placeholder). This provides a maximum of
+11,172 pre-combined hangul. Of these 67 hangul elements, 51 are unique
+(some can occur in different positions). Each position is encoded
+using five bits (five bits can encode up to 32 unique objects). The
+encoding array looks as follows:
+
+o Bit 1: always on
+o Bits 2-6: initial hangul element
+o Bits 7-11: medial hangul element
+o Bits 12-16: final hangul element
+
+Initial and final elements are consonants, and the medial elements are
+vowels. This encoding must be treated as a 16-bit entity because the
+bit array of the medial hangul element spans the first and second
+bytes.
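The bit layout can be written directly as shifts. In this C sketch (hypothetical function names), the 5-bit element values come from the N-byte Hangul table in Section 3.3.6: initial g is 00010 (2), medial a is 00011 (3), and the final placeholder is 00001 (1), which yields 0x8861. The all-placeholder combination (1, 2, 1) gives 0x8441, whose first byte is 0x84, the start of the hangul range listed below.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 5-bit initial, medial, and final element values into a Johab
 * code: bit 1 (the most significant bit) is always on, followed by
 * three 5-bit fields. */
static uint16_t johab_compose(unsigned initial, unsigned medial, unsigned final)
{
    assert(initial < 32 && medial < 32 && final < 32);
    return (uint16_t)(0x8000u | (initial << 10) | (medial << 5) | final);
}

/* Split a Johab hangul code back into its three 5-bit fields. */
static void johab_split(uint16_t code, unsigned *initial, unsigned *medial,
                        unsigned *final)
{
    *initial = (code >> 10) & 0x1F;
    *medial  = (code >> 5) & 0x1F;
    *final   = code & 0x1F;
}
```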
+	Johab encoding also provides the complete set of KS C 5601-
+1992 symbols and hanja, but in different code points. Annex 3 of the
+KS C 5601-1992 manual (pp 33-34) contains a complete symbol and hanja
+mapping table between EUC and Johab code points. (The KS C 5601-1989
+manual did not have this.) The code space ranges for Johab encoding
+are as follows:
+
+  One-byte Characters                           Encoding Range
+  ^^^^^^^^^^^^^^^^^^^                           ^^^^^^^^^^^^^^
+  ASCII or KS C 5636-1993                       0x21-0x7E
+
+  Two-byte Pre-combined Hangul                  Encoding Ranges
+  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^                  ^^^^^^^^^^^^^^^
+  first byte range                              0x84-0xD3
+  second byte ranges                            0x41-0x7E, 0x81-0xFE
+
+  Two-byte Symbols and Hanja                    Encoding Ranges
+  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^                  ^^^^^^^^^^^^^^^
+  first byte ranges                             0xD8-0xDE, 0xE0-0xF9
+  second byte ranges                            0x31-0x7E, 0x91-0xFE
+
+Note that the second byte ranges encode a total of 188 characters, and
+that the second byte ranges for hangul and symbols/hanja are slightly
+different (yet the same size, namely 188 characters).
+	Here is a summary of the above table, which better describes
+what is encoded where. Rows 0x84 through 0xD3 provide 80 rows of 188
+characters each (15,040 code points, which is more than enough for the
+11,172 pre-combined hangul). Row 0xD8 provides 188 user-defined
+positions, the same as Rows 41 and 94 in the standard KS C 5601-1992
+table. Rows 0xD9 through 0xDE encode Rows 1 through 12 of the standard
+KS C 5601-1992 table (symbols). Rows 0xE0 through 0xF9 encode Rows 42
+through 93 of the KS C 5601-1992 table (hanja). The following URL
+provides a complete mapping table for the KS C 5601-1992 symbols and
+hanja:
+
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/map/non-hangul-codes.txt
+
+The following URLs provide similar information (they are the same
+file), but only for the 11,172 pre-combined hangul:
+
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/map/hangul-codes.txt
+  ftp://unicode.org/pub/MappingTables/EastAsiaMaps/hangul-codes.txt
+
+	Of further interest may be that Microsoft designates Johab
+encoding as its Code Page 1361. Microsoft is planning to support Johab
+encoding for Korean Windows NT.
+
+
+3.3.6: N-BYTE HANGUL
+
+	In the days before full two-byte capable operating systems,
+each of the 51 basic hangul elements was encoded using a single
+(7-bit) byte. The encoding range spans 0x40 through 0x7C, but there
+are several unassigned gaps. This is known as the "N-byte Hangul"
+code, and is described in Annex 4 (page 35) of the KS C 5601-1992
+manual.
+	The following table illustrates these 51 one-byte code points
+(the pronunciation or meaning of the hangul element is provided in
+parentheses) and how they map to the three 5-bit arrays in Johab
+encoding (expressed as binary patterns):
+
+  Element        Initial  Medial   Final
+  ^^^^^^^        ^^^^^^^  ^^^^^^   ^^^^^
+  0x40 ("fill")  00001    00010    00001
+  0x41 (g)       00010    *****    00010
+  0x42 (gg)      00011    *****    00011
+  0x43 (gs)      *****    *****    00100
+  0x44 (n)       00100    *****    00101
+  0x45 (nj)      *****    *****    00110
+  0x46 (nh)      *****    *****    00111
+  0x47 (d)       00101    *****    01000
+  0x48 (dd)      00110    *****    *****
+  0x49 (r)       00111    *****    01001
+  0x4A (rg)      *****    *****    01010
+  0x4B (rm)      *****    *****    01011
+  0x4C (rb)      *****    *****    01100
+  0x4D (rs)      *****    *****    01101
+  0x4E (rt)      *****    *****    01110
+  0x4F (rp)      *****    *****    01111
+  0x50 (rh)      *****    *****    10000
+  0x51 (m)       01000    *****    10001
+  0x52 (b)       01001    *****    10011
+  0x53 (bb)      01010    *****    *****
+  0x54 (bs)      *****    *****    10100
+  0x55 (s)       01011    *****    10101
+  0x56 (ss)      01100    *****    10110
+  0x57 (ng)      01101    *****    10111
+  0x58 (j)       01110    *****    11000
+  0x59 (jj)      01111    *****    *****
+  0x5A (c)       10000    *****    11001
+  0x5B (k)       10001    *****    11010
+  0x5C (t)       10010    *****    11011
+  0x5D (p)       10011    *****    11100
+  0x5E (h)       10100    *****    11101
+  0x5F UNASSIGNED
+  0x60 UNASSIGNED
+  0x61 UNASSIGNED
+  0x62 (a)       *****    00011    *****
+  0x63 (ae)      *****    00100    *****
+  0x64 (ya)      *****    00101    *****
+  0x65 (yae)     *****    00110    *****
+  0x66 (eo)      *****    00111    *****
+  0x67 (e)       *****    01010    *****
+  0x68 UNASSIGNED
+  0x69 UNASSIGNED
+  0x6A (yeo)     *****    01011    *****
+  0x6B (ye)      *****    01100    *****
+  0x6C (o)       *****    01101    *****
+  0x6D (wa)      *****    01110    *****
+  0x6E (wae)     *****    01111    *****
+  0x6F (oe)      *****    10010    *****
+  0x70 UNASSIGNED
+  0x71 UNASSIGNED
+  0x72 (yo)      *****    10011    *****
+  0x73 (u)       *****    10100    *****
+  0x74 (weo)     *****    10101    *****
+  0x75 (we)      *****    10110    *****
+  0x76 (wi)      *****    10111    *****
+  0x77 (yu)      *****    11010    *****
+  0x78 UNASSIGNED
+  0x79 UNASSIGNED
+  0x7A (eu)      *****    11011    *****
+  0x7B (yi)      *****    11100    *****
+  0x7C (i)       *****    11101    *****
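The table above can be turned into lookup arrays. The following C sketch (hypothetical names) converts a syllable that the caller has already split into its initial, medial, and final N-byte codes, with 0x40 ("fill") standing in for an empty position, into a Johab code using the bit layout from Section 3.3.5. How N-byte text delimits syllables is not covered here and is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* 5-bit Johab field values copied from the table, indexed by
 * (byte - 0x40). A zero entry means the element cannot occupy that
 * position (shown as ***** in the table). */
static const unsigned char NB_INITIAL[0x3D] = {
    [0x00] = 1,  [0x01] = 2,  [0x02] = 3,  [0x04] = 4,  [0x07] = 5,
    [0x08] = 6,  [0x09] = 7,  [0x11] = 8,  [0x12] = 9,  [0x13] = 10,
    [0x15] = 11, [0x16] = 12, [0x17] = 13, [0x18] = 14, [0x19] = 15,
    [0x1A] = 16, [0x1B] = 17, [0x1C] = 18, [0x1D] = 19, [0x1E] = 20,
};

static const unsigned char NB_MEDIAL[0x3D] = {
    [0x00] = 2,  [0x22] = 3,  [0x23] = 4,  [0x24] = 5,  [0x25] = 6,
    [0x26] = 7,  [0x27] = 10, [0x2A] = 11, [0x2B] = 12, [0x2C] = 13,
    [0x2D] = 14, [0x2E] = 15, [0x2F] = 18, [0x32] = 19, [0x33] = 20,
    [0x34] = 21, [0x35] = 22, [0x36] = 23, [0x37] = 26, [0x3A] = 27,
    [0x3B] = 28, [0x3C] = 29,
};

static const unsigned char NB_FINAL[0x3D] = {
    [0x00] = 1,  [0x01] = 2,  [0x02] = 3,  [0x03] = 4,  [0x04] = 5,
    [0x05] = 6,  [0x06] = 7,  [0x07] = 8,  [0x09] = 9,  [0x0A] = 10,
    [0x0B] = 11, [0x0C] = 12, [0x0D] = 13, [0x0E] = 14, [0x0F] = 15,
    [0x10] = 16, [0x11] = 17, [0x12] = 19, [0x14] = 20, [0x15] = 21,
    [0x16] = 22, [0x17] = 23, [0x18] = 24, [0x1A] = 25, [0x1B] = 26,
    [0x1C] = 27, [0x1D] = 28, [0x1E] = 29,
};

/* Hypothetical helper: N-byte initial/medial/final codes -> Johab.
 * Returns 0 if any byte cannot occupy its position. */
static uint16_t nbyte_to_johab(unsigned char ini, unsigned char med,
                               unsigned char fin)
{
    if (ini < 0x40 || ini > 0x7C || med < 0x40 || med > 0x7C ||
        fin < 0x40 || fin > 0x7C)
        return 0;
    unsigned i = NB_INITIAL[ini - 0x40];
    unsigned m = NB_MEDIAL[med - 0x40];
    unsigned f = NB_FINAL[fin - 0x40];
    assert(i < 32 && m < 32 && f < 32);
    if (i == 0 || m == 0 || f == 0)
        return 0;
    return (uint16_t)(0x8000u | (i << 10) | (m << 5) | f);
}
```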
+
+	There are utilities to convert N-byte Hangul code to other,
+more widely-used, encoding methods. Pointers to these and other code
+conversion utilities can be found in Section 4.7.
+
+
+3.3.7: UCS-2
+
+	UCS-2 (Universal Character Set containing 2 bytes) encoding is
+one way to encode ISO 10646-1:1993 text, and is considered identical
+to Unicode encoding. Its encoding range, which is quite simple, is as
+follows:
+
+  ISO 10646-1:1993 Characters                   Encoding Range
+  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^                  ^^^^^^^^^^^^^^
+  first byte range                              0x00-0xFF
+  second byte range                             0x00-0xFF
+
+Yes, folks, the whole range of 65,536 possible code points is
+available for encoding characters. The "signature" that indicates a
+file using UCS-2 is as follows:
+
+  0xFEFF
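One practical use of the signature is detecting byte order: if a file begins with 0xFF 0xFE instead of 0xFE 0xFF, the 16-bit values were most likely written in the opposite (little-endian) byte order, since 0xFFFE is not a valid character. Here is a minimal C sketch; both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Inspect the first two bytes of UCS-2 data. Returns 1 for the signature
 * in big-endian order (0xFE 0xFF), -1 if it appears byte-swapped
 * (0xFF 0xFE, i.e. little-endian), and 0 if there is no signature. */
static int ucs2_signature(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len < 2)
        return 0;
    if (buf[0] == 0xFE && buf[1] == 0xFF)
        return 1;
    if (buf[0] == 0xFF && buf[1] == 0xFE)
        return -1;
    return 0;
}

/* Read the i-th 16-bit value, using the byte order found above
 * (order < 0 means little-endian, otherwise big-endian). */
static unsigned ucs2_at(const unsigned char *buf, size_t i, int order)
{
    const unsigned char *p = buf + 2 * i;
    return order < 0 ? (unsigned)(p[0] | (p[1] << 8))
                     : (unsigned)((p[0] << 8) | p[1]);
}
```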
+
+	Escape sequences for UCS-2 have already been registered with
+ISO, and are as follows:
+
+  ISO 10646-1:1993        Escape Sequence      Hexadecimal     ISO Reg
+  ^^^^^^^^^^^^^^^^        ^^^^^^^^^^^^^^^      ^^^^^^^^^^^     ^^^^^^^
+  UCS-2 Level 1           <ESC> % / @          0x1B252F40      162
+  UCS-2 Level 2           <ESC> % / C          0x1B252F43      174
+  UCS-2 Level 3           <ESC> % / E          0x1B252F45      176
+
+So what do these three levels mean? Level 3 means all characters in
+ISO 10646-1:1993 with no restrictions (0x0000 through 0xFFFF).
+	Level 2 begins to restrict the character set by not including
+the following characters or character ranges:
+
+  0x0300-0x0345  0x09D7         0x0BD7         0x11A8-0x11F9
+  0x0360-0x0361  0x0A3C         0x0C55-0x0C56  0x20D0-0x20E1
+  0x0483-0x0486  0x0A70-0x0A71  0x0CD5-0x0CD6  0x302A-0x302F
+  0x093C         0x0ABC         0x0D57         0x3099-0x309A
+  0x0953-0x0954  0x0B3C         0x1100-0x1159  0xFE20-0xFE23
+  0x09BC         0x0B56-0x0B57  0x115F-0x11A2
+
+These are all combining characters, and represent 364 code points.
+	Level 1 further restricts the character set by not including
+the following characters or character ranges:
+
+  0x05B0-0x05B9  0x09BE-0x09C4  0x0B47-0x0B48  0x0D02-0x0D03
+  0x05BB-0x05BD  0x09C7-0x09C8  0x0B4B-0x0B4D  0x0D3E-0x0D43
+  0x05BF         0x09CB-0x09CD  0x0B82-0x0B83  0x0D46-0x0D48
+  0x05C1-0x05C2  0x09E2-0x09E3  0x0BBE-0x0BC2  0x0D4A-0x0D4D
+  0x064B-0x0652  0x0A02         0x0BC6-0x0BC8  0x0E31
+  0x0670         0x0A3E-0x0A42  0x0BCA-0x0BCD  0x0E34-0x0E3A
+  0x06D6-0x06E4  0x0A47-0x0A48  0x0C01-0x0C03  0x0E47-0x0E4E
+  0x06E7-0x06E8  0x0A4B-0x0A4D  0x0C3E-0x0C44  0x0EB1
+  0x06EA-0x06ED  0x0A81-0x0A83  0x0C46-0x0C48  0x0EB4-0x0EB9
+  0x0901-0x0903  0x0ABE-0x0AC5  0x0C4A-0x0C4D  0x0EBB-0x0EBC
+  0x093E-0x094D  0x0AC7-0x0AC9  0x0C82-0x0C83  0x0EC8-0x0ECD
+  0x0951-0x0952  0x0ACB-0x0ACD  0x0CBE-0x0CC4  0xFB1E
+  0x0962-0x0963  0x0B01-0x0B03  0x0CC6-0x0CC8
+  0x0981-0x0983  0x0B3E-0x0B43  0x0CCA-0x0CCD
+
+These, too, are all combining characters, and represent 586 code
+points (222 above plus the 364 characters from the Level 2
+restriction).
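The counts above can be checked mechanically. This C sketch (hypothetical names) encodes the Level 2 exclusions as a table of ranges and counts the excluded code points in 0x0000-0xFFFF, which comes to 364; a Level 1 check would add the second list of ranges in the same way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct range { unsigned lo, hi; };   /* inclusive code point range */

/* Combining characters excluded from UCS-2 Level 2 (first list above). */
static const struct range LEVEL2_EXCLUDED[] = {
    {0x0300, 0x0345}, {0x0360, 0x0361}, {0x0483, 0x0486}, {0x093C, 0x093C},
    {0x0953, 0x0954}, {0x09BC, 0x09BC}, {0x09D7, 0x09D7}, {0x0A3C, 0x0A3C},
    {0x0A70, 0x0A71}, {0x0ABC, 0x0ABC}, {0x0B3C, 0x0B3C}, {0x0B56, 0x0B57},
    {0x0BD7, 0x0BD7}, {0x0C55, 0x0C56}, {0x0CD5, 0x0CD6}, {0x0D57, 0x0D57},
    {0x1100, 0x1159}, {0x115F, 0x11A2}, {0x11A8, 0x11F9}, {0x20D0, 0x20E1},
    {0x302A, 0x302F}, {0x3099, 0x309A}, {0xFE20, 0xFE23},
};

/* True if code point cp is permitted at UCS-2 Level 2. */
static bool ucs2_allowed_at_level2(unsigned cp)
{
    for (size_t i = 0; i < sizeof LEVEL2_EXCLUDED / sizeof LEVEL2_EXCLUDED[0]; i++)
        if (cp >= LEVEL2_EXCLUDED[i].lo && cp <= LEVEL2_EXCLUDED[i].hi)
            return false;
    return true;
}

/* Count the code points in 0x0000-0xFFFF that Level 2 excludes. */
static unsigned level2_excluded_count(void)
{
    unsigned n = 0;
    for (unsigned cp = 0; cp <= 0xFFFF; cp++)
        if (!ucs2_allowed_at_level2(cp))
            n++;
    return n;
}
```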
+
+
+3.3.8: UCS-4
+
+	UCS-4 (Universal Character Set containing 4 bytes) encoding is
+another way to encode ISO 10646-1:1993 text, and is used for future
+expansion of the character set. Its encoding range is as follows:
+
+  ISO 10646-1:1993 Characters                   Encoding Range
+  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^                  ^^^^^^^^^^^^^^
+  first byte range                              0x00-0x7F
+  second byte range                             0x00-0xFF
+  third byte range                              0x00-0xFF
+  fourth byte range                             0x00-0xFF
+
+Note that the first byte range only goes up to 0x7F. This means that
+UCS-4 is a 31-bit encoding. And, in case you're wondering, 31 bits
+provide 2,147,483,648 code points. The "signature" that indicates a
+file using UCS-4 is as follows:
+
+  0x0000 0xFEFF
+
+	Escape sequences for UCS-4 have already been registered with
+ISO, and are as follows:
+
+  ISO 10646-1:1993        Escape Sequence      Hexadecimal     ISO Reg
+  ^^^^^^^^^^^^^^^^        ^^^^^^^^^^^^^^^      ^^^^^^^^^^^     ^^^^^^^
+  UCS-4 Level 1           <ESC> % / A          0x1B252F41      163
+  UCS-4 Level 2           <ESC> % / D          0x1B252F44      175
+  UCS-4 Level 3           <ESC> % / F          0x1B252F46      177
+
+See the end of Section 3.3.7 for a description of these three levels.
+But, in the case of UCS-4, simply prepend "0000" to all the values.
+
+
+3.3.9: UTF-7
+
+	It turns out that *raw* ISO 10646-1:1993 encoding (that is,
+UCS-2 or UCS-4) can cause problems because null bytes (0x00) are
+possible (and frequent). Several UTFs (UCS Transformation Formats)
+have been developed to deal with this and other problems. I must admit
+that I don't know too much about UTFs, and what I provide below is
+minimal, but does include pointers to more complete descriptions.
+	UTF-7 is a mail-safe 7-bit transformation format for UCS-2
+(including UTF-16). It uses straight ASCII for many ASCII characters,
+and switches into a Base64 encoding of UCS-2 or UTF-16 for everything
+else. It was designed to be usable in MIME-compliant e-mail headers as
+well as message bodies, and to pass through gateways to non-ASCII mail
+systems (like Bitnet). More detailed information on UTF-7 can be found
+in RFC 1642, and a UTF-7 converter is available. The following URLs
+provide this information:
+
+  http://www.stonehand.com/unicode/standard/utf7.html
+  ftp://unicode.org/pub/Programs/ConvertUTF/
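+
+	Python's standard library happens to include a "utf-7" codec,
+so this behavior is easy to observe. In the sketch below, the ASCII
+letters pass through unchanged. The two kanji are written as a Base64
+run introduced by "+" and normally closed by "-". The sample text is
+arbitrary.
+
```python
text = "Tokyo \u6771\u4eac"  # "Tokyo" followed by the two kanji for Tokyo

encoded = text.encode("utf-7")
print(encoded)

assert all(b < 0x80 for b in encoded)   # every byte is 7-bit ASCII
assert encoded.decode("utf-7") == text  # the transformation is reversible
```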
+
+
+3.3.10: UTF-8
+
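+	UTF-8 (also known as UTF-2 or FSS-UTF -- FSS stands for "file
+system safe") can represent any character in UCS-2 and UCS-4, and is
+officially an annex to ISO 10646-1:1993. It differs from UTF-7 in that
+it uses all eight bits of each byte. UCS-2 and UCS-4 cause problems
+for some file systems and utilities, because their encoded forms can
+contain bytes such as 0x00 (null) and 0x2F ("/", the UNIX path
+separator). UTF-8 was developed to avoid this. It never uses those
+byte values except to represent the corresponding ASCII characters.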
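+	The following Python sketch encodes a single UCS-4 value as
+UTF-8. It uses the original (31-bit) form of the scheme, which needs
+one to six bytes. It is an illustration written for this document, not
+a reference implementation. Every byte of a multi-byte sequence is 0x80
+or higher, so such a sequence can never contain 0x00 or 0x2F.
+
```python
# (upper limit, sequence length, lead-byte marker) for the multi-byte forms
_FORMS = (
    (0x800, 2, 0xC0),
    (0x10000, 3, 0xE0),
    (0x200000, 4, 0xF0),
    (0x4000000, 5, 0xF8),
    (0x80000000, 6, 0xFC),
)


def utf8_encode(value: int) -> bytes:
    """Encode one UCS-4 value (0x00000000-0x7FFFFFFF) as UTF-8."""
    if not 0 <= value <= 0x7FFFFFFF:
        raise ValueError("UCS-4 values are limited to 31 bits")
    if value < 0x80:
        return bytes([value])  # ASCII is unchanged
    for limit, length, lead in _FORMS:
        if value < limit:
            break
    out = bytearray(length)
    for i in range(length - 1, 0, -1):
        out[i] = 0x80 | (value & 0x3F)  # each trailing byte carries 6 bits
        value >>= 6
    out[0] = lead | value  # the lead byte holds the remaining high bits
    return bytes(out)
```
+
+For example, utf8_encode(0x65E5) yields the three bytes 0xE6 0x97 0xA5.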
+	More detailed information on UTF-8 and its relationship with
+ISO 10646-1:1993 can be found at the following URLs:
+
+  http://www.stonehand.com/unicode/standard/utf8.html
+  ftp://unicode.org/pub/Programs/ConvertUTF/
+
+	X/Open Company Limited also published a document that
+describes UTF-8 in detail (they call it FSS-UTF), and you can find
+information about it at the following URL:
+
+  http://www.xopen.co.uk/public/pubs/catalog/c501.htm
+
+The new programming language called Java supports Unicode. Its
+characters are 16-bit Unicode values, and it uses a form of UTF-8 when
+storing strings in class files. More information on Java is at the
+following URL:
+
+  http://www.javasoft.com/
+
+
+3.3.11: UTF-16
+
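+	UTF-16 (formerly UCS-2E), like UTF-8, is now officially an
+annex to ISO 10646-1:1993. UTF-16 transforms the UCS-4 values
+0x00010000 through 0x0010FFFF into a 16-bit form. Each such value is
+represented as a pair of 16-bit values: the first is taken from the
+range 0xD800-0xDBFF and the second from 0xDC00-0xDFFF. UCS-2 values are
+used as they are. UTF-16 text can in turn be carried by UTF-7 (see
+Section 3.3.9). However, encoding the two halves of a pair separately
+in UTF-8 is not according to the standard. UTF-8 can represent the
+original UCS-4 value directly, so there is little to gain by doing so.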
+	More detailed information on UTF-16 and its relationship with
+ISO 10646-1:1993 can be found at the following URLs:
+
+  http://www.stonehand.com/unicode/standard/utf16.html
+  ftp://unicode.org/pub/Programs/ConvertUTF/
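+
+	The surrogate-pair arithmetic behind UTF-16 is short enough to
+show directly. This Python sketch splits a value above 0xFFFF into a
+high and a low 16-bit unit. The function name is illustrative. Its
+result can be compared against Python's own "utf-16-be" codec.
+
```python
def utf16_units(value: int) -> list[int]:
    """Return the UTF-16 code units for one character value."""
    if value < 0x10000:
        return [value]  # UCS-2 values are used as they are
    if value > 0x10FFFF:
        raise ValueError("UTF-16 cannot represent values above 0x10FFFF")
    offset = value - 0x10000  # 20 bits remain
    high = 0xD800 | (offset >> 10)  # top 10 bits
    low = 0xDC00 | (offset & 0x3FF)  # bottom 10 bits
    return [high, low]


print([hex(u) for u in utf16_units(0x10000)])   # ['0xd800', '0xdc00']
print([hex(u) for u in utf16_units(0x10FFFF)])  # ['0xdbff', '0xdfff']
```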
+
+
+3.3.12: ANSI Z39.64-1989
+
+	The encoding used for ANSI Z39.64-1989 (and CCCII) is three-
+byte 7-bit ISO 2022, namely the following code space:
+
+  Three-byte ANSI Z39.64-1989                   Encoding Range
+  ^^^^^^^^^^^^^^^^^^^^^^^^^^^                   ^^^^^^^^^^^^^^
+  first byte range                              0x21-0x7E
+  second byte range                             0x21-0x7E
+  third byte range                              0x21-0x7E
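+
+	Each of the three bytes takes one of 94 values, so the code
+space holds 94 x 94 x 94 = 830,584 code points. The Python sketch below
+checks that a byte string is a sequence of such three-byte codes. The
+names are illustrative. It deliberately ignores the ISO 2022 escape
+sequences that would designate the set in a real data stream.
+
```python
VALID = range(0x21, 0x7F)  # 0x21-0x7E, the 94 printable ISO 2022 values


def z39_64_codes(data: bytes) -> list[int]:
    """Split data into three-byte ANSI Z39.64-1989 codes, validating each byte."""
    if len(data) % 3:
        raise ValueError("data length is not a multiple of three")
    codes = []
    for i in range(0, len(data), 3):
        triple = data[i:i + 3]
        if not all(b in VALID for b in triple):
            raise ValueError(f"byte outside 0x21-0x7E at offset {i}")
        codes.append(int.from_bytes(triple, "big"))
    return codes


print(len(VALID) ** 3)  # 830584 code points in the three-byte space
```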
+
+
+3.3.13: BASE64
+
+	Base64 encoding is mentioned here only because of its common
+usage in e-mail headers, and its relationship with MIME (Multipurpose
+Internet Mail Extensions). It is also a source of confusion. Base64 is
+a method of encoding arbitrary bytes into the safest 64-character
+ASCII subset, and is defined in RFC 1341 (which adapted it from RFC
+1113). RFC 1341 was made obsolete by RFC 1521. RFC 1522 also provides
+useful information, particularly for handling non-ASCII text, and
+obsoletes RFC 1342.
+	Here is how it works. Every three bytes are encoded as a
+four-byte sequence. That is, the 24 bits that make up the three bytes
+are split into four 6-bit segments (6 bits can represent 64 distinct
+values). Each 6-bit segment is then converted into a character in the
+Base64 Alphabet (see below). There is a 65th character, "=", which has
+a special purpose. It functions as a "pad" when the input ends with
+only one or two bytes instead of a full three-byte group. This all may
+sound a bit like uuencoding, but it is different. The Base64 Alphabet
+is as follows:
+
+  ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
+
+	My name, written in Japanese kanji, is as follows when it is
+EUC-encoded (six bytes, expressed as three groups of hexadecimal
+values, one group for each character):
+
+  0xBEAE 0xCED3 0xB7F5
+
+When these three EUC-encoded characters are converted to Base64
+encoding, they appear as follows (eight bytes):
+
+  vq7O07f1
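+
+	The three-bytes-to-four-characters procedure described above
+fits in a few lines. This Python sketch is a hand-written
+illustration; real programs would simply call the standard base64
+module. It encodes the six EUC bytes of the example and produces the
+same eight characters.
+
```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def base64_encode(data: bytes) -> str:
    """Encode bytes as Base64, one three-byte (24-bit) group at a time."""
    out = []
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        bits = int.from_bytes(group.ljust(3, b"\x00"), "big")  # pad to 24 bits
        chars = [ALPHABET[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if len(group) < 3:
            # a short final group yields len(group) + 1 real characters;
            # "=" fills the remaining positions
            chars[len(group) + 1:] = "=" * (3 - len(group))
        out.extend(chars)
    return "".join(out)


euc_name = bytes.fromhex("BEAECED3B7F5")  # 0xBEAE 0xCED3 0xB7F5
print(base64_encode(euc_name))  # vq7O07f1
assert base64_encode(euc_name) == base64.b64encode(euc_name).decode("ascii")
```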
+
+	Base64 encoding is most commonly used for encoding non-ASCII
+text that appears in e-mail headers. Of all the portions of an e-mail
+message, its header gets manipulated the most during transmission, and
+Base64 encoding offers a safe way to further encode non-ASCII text so
+that it is not altered by mail-routing software. This is where Base64
+encoding can cause confusion. For example, what goes through your mind
+when you see the following chunk o' text?
+
+  From: lunde@adobe.com (=?ISO-2022-JP?B?GyRCPi5OUzd1GyhC?=)
+
+Many folks think that they are seeing ISO-2022-JP encoding. Not
+true. The "ISO-2022-JP" portion is just a flag that indicates the
+encoding of the text before Base64 encoding was applied. Here, the
+three kanji were first converted to ISO-2022-JP, escape sequences and
+all, and only then Base64-encoded. The actual Base64-encoded portion
+is enclosed between question marks (?) as follows:
+
+  From: lunde@adobe.com (=?ISO-2022-JP?B?GyRCPi5OUzd1GyhC?=)
+                                        >^^^^^^^^^^^^^^^^<
+
+The whole string enclosed in parentheses has several components, and
+the following explains their purpose and relationships (using the
+above string as an example):
+
+  Component         Explanation
+  ^^^^^^^^^         ^^^^^^^^^^^
+  =?                Signals start of encoded string
+  ISO-2022-JP       Charset name ("ISO-2022-JP" is for Japanese)
+  ?                 Delimiter
+  B                 Encoding ("B" is for Base64)
+  ?                 Delimiter
+  GyRCPi5OUzd1GyhC  Example string of type "charset" encoded by "encoding"
+  ?=                Signals end of encoded string
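+
+	The charset field names the encoding of the bytes inside the
+Base64 payload, so the text must be converted to that charset before
+Base64 is applied. The Python sketch below starts from the EUC bytes
+given earlier and converts them to ISO-2022-JP with the standard
+"euc_jp" and "iso2022_jp" codecs. It then wraps the result using the
+components listed in the table and takes the word apart again with a
+regular expression. The function name is illustrative.
+
```python
import base64
import re

ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")


def make_b_word(text: str, charset: str) -> str:
    """Build a "B" encoded word of the form =?charset?B?payload?="""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


name = bytes.fromhex("BEAECED3B7F5").decode("euc_jp")  # the three kanji
word = make_b_word(name, "ISO-2022-JP")
print(word)

charset, encoding, payload = ENCODED_WORD.fullmatch(word).groups()
assert (charset, encoding) == ("ISO-2022-JP", "B")
assert base64.b64decode(payload).decode(charset) == name
```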
+
+	One typically does not need to worry about encoding text as
+Base64 (MIME-compliant mailing software usually performs this task for
+you). The problem is usually trying to decode Base64-encoded text. A
+Base64 decoder is available in Perl at the following URL:
+
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/perl/b64decode.pl
+
+Note that this program takes "raw" Base64 data as input. Any non-
+Base64 stuff must be stripped. I usually run this from within Mule
+("C-u M-| b64decode.pl") after defining a region around the Base64-
+encoded material. I hope to replace this program soon with one that
+automatically recognizes the Base64-encoded portions.
+	Most MIME-compliant e-mail software can decode Base64-encoded
+text.
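+	A decoder that recognizes the Base64-encoded portions by itself
+is straightforward to sketch. This Python function scans a header line
+for "B" encoded words with a regular expression and replaces each one
+with its decoded text, using the charset named inside the word. The
+names are illustrative. "Q"-encoded words and the rules for whitespace
+between adjacent encoded words are not handled. Python's
+email.header.decode_header function is a more complete,
+standard-library alternative.
+
```python
import base64
import re

B_WORD = re.compile(r"=\?([^?]+)\?[Bb]\?([A-Za-z0-9+/=]*)\?=")


def decode_b_words(header: str) -> str:
    """Replace every Base64 ("B") encoded word in header with decoded text."""
    def decode(match: re.Match) -> str:
        charset, payload = match.group(1), match.group(2)
        return base64.b64decode(payload).decode(charset)
    return B_WORD.sub(decode, header)


header = "From: lunde@adobe.com (=?ISO-2022-JP?B?GyRCPi5OUzd1GyhC?=)"
print(decode_b_words(header))
```
+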
+
+
+3.3.14: IBM DBCS-HOST
+
+	The oldest two-byte encoding system is IBM's DBCS-Host. DBCS
+stands for Double-Byte Character Set. DBCS-Host is still in use on
+IBM's mainframe computer systems (hence the use of "Host").
+	DBCS-Host encoding is EBCDIC-based, and uses Shift characters,
+0x0E and 0x0F, to switch between one- and two-byte mode. Its encoding
+specifications are as follows:
+
+  Two-byte Characters                           Encoding Range
+  ^^^^^^^^^^^^^^^^^^^                           ^^^^^^^^^^^^^^
+  first byte range                              0x41-0xFE
+  second byte range                             0x41-0xFE
+
+  Two-byte "Space" Character                    Code Point
+  ^^^^^^^^^^^^^^^^^^^^^^^^^^                    ^^^^^^^^^^
+  first and second bytes                        0x4040
+
+  One-byte Characters                           Encoding Range
+  ^^^^^^^^^^^^^^^^^^^                           ^^^^^^^^^^^^^^
+  EBCDIC                                        0x41-0xF9
+
+  Shifting Characters                           Code Point
+  ^^^^^^^^^^^^^^^^^^^                           ^^^^^^^^^^^^^^
+  Two-byte                                      0x0E
+  One-byte                                      0x0F
+
+This same encoding specification is shared by all of IBM's CJK
+character sets, namely for Japanese, Simplified Chinese, Traditional
+Chinese, and Korean.
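+	The shifting mechanism can be sketched as a small state
+machine. The Python function below is hypothetical and was written for
+this document. It splits a DBCS-Host byte stream into one-byte and
+two-byte runs at each 0x0E and 0x0F, and checks two-byte characters
+against the ranges above. It assumes that the stream starts in
+one-byte mode. It does not convert any characters, because that
+requires IBM's tables.
+
```python
SO, SI = 0x0E, 0x0F  # shift to two-byte mode / shift back to one-byte mode


def _valid_pair(first: int, second: int) -> bool:
    """True for 0x4040 (the two-byte space) or a pair within 0x41-0xFE."""
    if first == 0x40 and second == 0x40:
        return True
    return 0x41 <= first <= 0xFE and 0x41 <= second <= 0xFE


def split_dbcs_host(data: bytes) -> list[tuple[str, bytes]]:
    """Split a DBCS-Host stream into ("SBCS", ...) and ("DBCS", ...) runs."""
    runs = []
    mode = "SBCS"  # assumption: the stream starts in one-byte (EBCDIC) mode
    current = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte in (SO, SI):
            if current:  # close the run that this shift character ends
                runs.append((mode, bytes(current)))
                current = bytearray()
            mode = "DBCS" if byte == SO else "SBCS"
            i += 1
        elif mode == "DBCS":
            if i + 1 >= len(data) or not _valid_pair(byte, data[i + 1]):
                raise ValueError(f"bad two-byte character at offset {i}")
            current += data[i:i + 2]
            i += 2
        else:
            current.append(byte)
            i += 1
    if current:
        runs.append((mode, bytes(current)))
    return runs
```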
+
+
+3.3.15: IBM DBCS-PC
+
+	IBM's DBCS-PC encoding is used on IBM personal computers (that
+is where the "PC" comes from). DBCS-PC encoding is ASCII-based, and
+uses the values of characters' bytes themselves to switch between one-
+and two-byte mode. Its encoding specifications are as follows:
+
+  Two-byte Characters                           Encoding Ranges
+  ^^^^^^^^^^^^^^^^^^^                           ^^^^^^^^^^^^^^^
+  first byte range                              0x81-0xFE
+  second byte range                             0x40-0x7E, 0x80-0xFE
+
+  One-byte Characters                           Encoding Range
+  ^^^^^^^^^^^^^^^^^^^                           ^^^^^^^^^^^^^^
+  ASCII                                         0x21-0x7E
+
+This same encoding specification is shared by all of IBM's CJK
+character sets, namely for Japanese, Simplified Chinese, Traditional
+Chinese, and Korean.
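+	Because the byte values themselves signal one- or two-byte
+mode, a decoder only has to inspect each potential first byte. This
+Python sketch is hypothetical and applies the generic ranges in the
+table above. It does not model the Japanese variant described next, in
+which 0xA1 through 0xDF are one-byte half-width katakana.
+
```python
def split_dbcs_pc(data: bytes) -> list[bytes]:
    """Split DBCS-PC bytes into one- and two-byte characters."""
    chars = []
    i = 0
    while i < len(data):
        first = data[i]
        if 0x81 <= first <= 0xFE:  # first byte of a two-byte character
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte character")
            second = data[i + 1]
            if not (0x40 <= second <= 0x7E or 0x80 <= second <= 0xFE):
                raise ValueError(f"invalid second byte 0x{second:02X}")
            chars.append(data[i:i + 2])
            i += 2
        else:
            chars.append(data[i:i + 1])  # one-byte (ASCII-range) character
            i += 1
    return chars
```
+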
+	DBCS-PC encoding for Japanese, although conforming to the
+above encoding specifications, actually follows the Shift-JIS encoding
+specifications, including the full user-defined range (see Section
+3.3.1 for more details on Shift-JIS encoding). One big accommodation
+is that the half-width katakana range, namely 0xA1 through 0xDF, is
+reserved for one-byte characters. Those values therefore never serve
+as first bytes of two-byte characters. Further, the DBCS-PC code space
+that is outside the Shift-JIS specification is unused.
+	DBCS-PC encoding for Korean uses the equivalent of EUC code
+set 1 code points (0xA1A1 through 0xFEFE) for those characters that
+are common with KS C 5601-1992. Those characters that are not common
+with KS C 5601-1992, namely IBM's extensions, are within the DBCS-PC
+encoding space, but outside EUC encoding space (0x9A through 0xA0).
+Many hanja and pre-combined hangul are part of IBM's Korean extension.
+	Note that DBCS-PC is sort of useless without a corresponding
+SBCS (Single-Byte Character Set) for the one-byte range. Mixing DBCS
+and SBCS results in an MBCS (Multiple-Byte Character Set). How these
+are mixed to form MBCSs is detailed in Section 3.4.
+
+
+3.3.16: IBM DBCS-/TBCS-EUC
+
+	IBM has also developed DBCS-EUC and TBCS-EUC encodings. TBCS
+stands for Triple-Byte Character Set. These essentially follow the EUC
+encoding specifications, and were developed for use with IBM's AIX
+(Advanced Interactive Executive) operating system, which is
+UNIX-based.
+	Refer to Section 3.2 for all the details on EUC encoding.
+
+
+3.3.17: UNIFIED HANGUL CODE
+
+	Microsoft has developed what is called "Unified Hangul Code"
+(UHC) for its Windows 95 operating system (this was also known as
+"Extended Wansung"). It is an optional, not the standard, character set
+of Win95K (the Korean version of Windows 95).
+	UHC provides full compatibility with KS C 5601-1992 EUC
+encoding (see Section 3.2.4), but adds encoding ranges for holding
+more pre-combined hangul. More precisely, it adds the 8,822 that,
+together with the 2,350 already in KS C 5601-1992, are needed to fully
+support the Johab character set. The following is a table that
+provides the encoding ranges for UHC encoding:
+
+  Two-byte Standard Characters                  Encoding Ranges
+  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^                  ^^^^^^^^^^^^^^^
+  first byte range                              0x81-0xFE
+  second byte ranges                            0x41-0x5A, 0x61-0x7A,
+                                                and 0x81-0xFE
+
+  One-byte Characters                           Encoding Range
+  ^^^^^^^^^^^^^^^^^^^                           ^^^^^^^^^^^^^^
+  ASCII                                         0x21-0x7E
+
+Note that 0xA1A1 through 0xFEFE in the above encoding is still
+identical, in terms of character-to-code allocation, with KS C 5601-
+1992 in EUC encoding.
+	Appendix G (pp 345-406) of "Developing International Software
+for Windows 95 and Windows NT" by Nadine Kano illustrates the KS C
+5601-1992 character set standard plus these Microsoft extensions
+(8,822 pre-combined hangul) by UHC code (Microsoft calls this Code
+Page 949).
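+
+	The arithmetic behind these figures is simple. KS C 5601-1992
+contains 2,350 pre-combined hangul, and adding the 8,822 from UHC's
+extension ranges gives 11,172, the full set of modern hangul that
+Johab encodes. The Python sketch below checks a byte pair against the
+UHC ranges in the table. The function name is illustrative. The sketch
+also uses Python's standard "cp949" codec to show that a syllable
+missing from KS C 5601-1992 (U+B620) lands outside the 0xA1A1-0xFEFE
+area.
+
```python
def is_uhc_pair(first: int, second: int) -> bool:
    """True if (first, second) falls within the UHC two-byte ranges."""
    return 0x81 <= first <= 0xFE and (
        0x41 <= second <= 0x5A or 0x61 <= second <= 0x7A or 0x81 <= second <= 0xFE
    )


print(2350 + 8822)  # 11172 pre-combined hangul in all

for syllable in ("\uac00", "\ub620"):  # U+AC00 is in KS C 5601-1992; U+B620 is not
    code = syllable.encode("cp949")
    in_euc_area = code[0] >= 0xA1 and code[1] >= 0xA1
    print(f"U+{ord(syllable):04X} -> 0x{code.hex().upper()}",
          "EUC area" if in_euc_area else "extension")
```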
+
+
+3.3.18: TRON CODE
+
+	TRON (The Real-time Operating system Nucleus) is an OS
+developed in Japan some time ago. Personal Media Corporation has done
+work to develop BTRON (Business TRON), which is unique in that it is
+the only commercially-available OS that supports JIS X 0212-1990.
+	TRON Code provides a one- and two-byte encoding space and a
+method for switching between them.
+	The following is how the two-byte space in TRON Code is
+allocated:
+
+  A-Zone (8,836 characters; JIS X 0208-1990)    Encoding Range
+  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^    ^^^^^^^^^^^^^^
+  first byte range                              0x21-0x7E
+  second byte range                             0x21-0x7E
+
+  B-Zone (11,844 characters; JIS X 0212-1990)   Encoding Range
+  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^   ^^^^^^^^^^^^^^
+  first byte range                              0x80-0xFD
+  second byte range                             0x21-0x7E
+
+  C-Zone (11,844 characters; unassigned)        Encoding Range
+  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^        ^^^^^^^^^^^^^^
+  first byte range                              0x21-0x7E
+  second byte range                             0x80-0xFD
+
+  D-Zone (15,876 characters; unassigned)        Encoding Range
+  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^        ^^^^^^^^^^^^^^
+  first byte range                              0x80-0xFD
+  second byte range                             0x80-0xFD
+
+Note how the B-Zone is larger than the conventional 94-by-94
+matrix. In fact, the JIS X 0212-1990 portion of the B-Zone is
+restricted to 0xA121-0xFD7E (93-by-94 matrix -- 0xFE as a first-byte
+value is unavailable, and you will see why in a minute).
+	TRON Code implements "language specifying codes" consisting of
+two bytes as follows:
+
+  Two-byte Japanese                             0xFE21
+  One-byte English                              0xFE80
+
+0xFE21 in a one-byte stream invokes two-byte Japanese mode, and 0xFE80
+in a two-byte stream invokes one-byte English mode.
+	The following is the one-byte encoding range for TRON Code:
+
+  One-byte Characters                           0x21-0x7E and 0x80-0xFD
+
+Control codes are in 0x00-0x20 and 0x7F (the usual ASCII control code
+range). Also, 0xA0 is reserved as a fixed-width space character.
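+
+	The language specifying codes make TRON Code easy to parse,
+because 0xFE never appears as a character byte in either mode. The
+Python sketch below is hypothetical and is not Personal Media's
+implementation. It walks a byte stream, switches modes on 0xFE21 and
+0xFE80, and reports the zone of each two-byte code. Starting in
+one-byte mode is an assumption made for the example.
+
```python
def tron_zone(code: int) -> str:
    """Name the zone (A, B, C, or D) that a two-byte TRON Code value falls in."""
    high_first = (code >> 8) >= 0x80
    high_second = (code & 0xFF) >= 0x80
    return "ABCD"[high_first + 2 * high_second]


def parse_tron(data: bytes, two_byte: bool = False) -> list[tuple[str, int]]:
    """Return (kind, value) pairs; kind is "1-byte" or a zone letter."""
    out = []
    i = 0
    while i < len(data):
        if data[i] == 0xFE:  # 0xFE only ever starts a language specifying code
            if i + 1 >= len(data) or data[i + 1] not in (0x21, 0x80):
                raise ValueError(f"unknown language specifying code at offset {i}")
            two_byte = data[i + 1] == 0x21  # 0xFE21: Japanese; 0xFE80: English
            i += 2
        elif two_byte:
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte character")
            code = (data[i] << 8) | data[i + 1]
            out.append((tron_zone(code), code))
            i += 2
        else:
            out.append(("1-byte", data[i]))
            i += 1
    return out
```
+
+For example, parse_tron(b"AB\xfe\x21\x30\x21\xa1\x21\xfe\x80C")
+reports two one-byte characters, an A-Zone code (0x3021), a B-Zone
+code (0xA121), and a final one-byte character.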
+
+
+3.3.19: GBK
+
+	GBK is an extension to GB 2312-80 that adds all ISO 10646-
+1:1993 (GB 13000.1-93) hanzi not already in GB 2312-80. GBK is defined
+as a normative annex of GB 13000.1-93 (see Section 2.2.10). The "K" in
+"GBK" is the first sound in the Chinese word meaning "extension" (read
+"Kuo Zhan").
+	GBK is divided into five levels as follows:
+
+  Level  Encoded Range  Total Code Points  Total Encoded Characters
+  ^^^^^  ^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^^^^^
+  GBK/1  0xA1A1-0xA9FE    846                717
+  GBK/2  0xB0A1-0xF7FE  6,768              6,763
+  GBK/3  0x8140-0xA0FE  6,080              6,080
+  GBK/4  0xAA40-0xFEA0  8,160              8,160
+  GBK/5  0xA840-0xA9A0    192                166
+
+	There are also 1,894 user-defined code points as follows:
+
+  Encoded Range  Total Code Points
+  ^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^^
+  0xAAA1-0xAFFE  564
+  0xF8A1-0xFEFE  658
+  0xA140-0xA7A0  672
+
+	GBK thus provides a total of 23,940 code points, 21,886 of
+which are assigned.
+	Each "row" in the GBK code table consists of 190 code points
+(63 second-byte values from 0x40 through 0x7E plus 127 from 0x80
+through 0xFE). The following describes the encoding ranges of GBK in
+detail:
+
+  Two-byte Standard Characters                  Encoding Ranges
+  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^                  ^^^^^^^^^^^^^^^
+  first byte range                              0x81-0xFE
+  second byte ranges                            0x40-0x7E and 0x80-0xFE
+
+  One-byte Characters                           Encoding Range
+  ^^^^^^^^^^^^^^^^^^^                           ^^^^^^^^^^^^^^
+  ASCII                                         0x21-0x7E
+
+Note that the sub-range 0xA1A1-0xFEFE in the above encoding is still
+identical, in terms of character-to-code allocation, with GB 2312-80
+in EUC encoding. GBK is therefore backward-compatible with GB 2312-80
+and forward-compatible with ISO 10646-1:1993.
+	GBK is the standard character set and encoding for the
+Simplified Chinese version of Windows 95.
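+
+	The code point totals in the tables above follow directly from
+the ranges. Rows with second bytes 0xA1-0xFE hold 94 code points.
+GBK/3 rows (0x40-0x7E and 0x80-0xFE) hold 190. The GBK/4, GBK/5, and
+0xA140-0xA7A0 rows (0x40-0x7E and 0x80-0xA0) hold 96. The Python
+sketch below recomputes the totals. As a spot check, it also
+classifies characters encoded with Python's standard "gbk" codec. The
+helper names are illustrative.
+
```python
def count(start: int, end: int) -> int:
    """Code points in a rectangular GBK range such as 0xA1A1-0xA9FE."""
    rows = (end >> 8) - (start >> 8) + 1
    columns = [b for b in range(start & 0xFF, (end & 0xFF) + 1) if b != 0x7F]
    return rows * len(columns)  # 0x7F is never used as a second byte


LEVELS = {
    "GBK/1": (0xA1A1, 0xA9FE),
    "GBK/2": (0xB0A1, 0xF7FE),
    "GBK/3": (0x8140, 0xA0FE),
    "GBK/4": (0xAA40, 0xFEA0),
    "GBK/5": (0xA840, 0xA9A0),
}
USER_DEFINED = [(0xAAA1, 0xAFFE), (0xF8A1, 0xFEFE), (0xA140, 0xA7A0)]

for name, (start, end) in LEVELS.items():
    print(name, count(start, end))
user = sum(count(s, e) for s, e in USER_DEFINED)
total = sum(count(s, e) for s, e in LEVELS.values()) + user
print(user, total)  # 1894 23940


def gbk_region(code: int) -> str:
    """Name the level (or user-defined area) that a two-byte GBK code is in."""
    first, second = code >> 8, code & 0xFF
    areas = list(LEVELS.items()) + [("user-defined", r) for r in USER_DEFINED]
    for name, (start, end) in areas:
        if ((start >> 8) <= first <= (end >> 8)
                and (start & 0xFF) <= second <= (end & 0xFF)
                and second != 0x7F):
            return name
    return "outside GBK"
```
+
+For example, the hanzi U+4E2D encodes as 0xD6D0, which gbk_region
+places in GBK/2.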
+
+
+3.4: CJK CODE PAGES
+
+	One often encounters references to "Code Pages" in material
+about CJK (and other) character sets and encodings. These are not
+literal pages, but rather references to a character set and encoding
+combination. In the case of CJK Code Pages, they definitely comprise
+more than one page!
+	Microsoft refers to its supported CJK character sets and
+encodings through such Code Page designations. The following is a
+listing of several Microsoft CJK Code Pages along with their
+characteristics:
+
+  Code Page  Characteristics
+  ^^^^^^^^^  ^^^^^^^^^^^^^^^
+  932        JIS X 0208-1990 base, Shift-JIS encoding, Microsoft
+             extensions (NEC Row 13 and IBM select characters,
+             redundantly encoded in Rows 89 through 92 and Rows 115
+             through 119)
+  936        GB 2312-80 base, EUC encoding
+  949        KS C 5601-1992 base, Unified Hangul Code encoding,
+             remaining 8,822 pre-combined hangul as extension (all of
+             this is referred to as Unified Hangul Code)
+  950        Big Five base, Big Five encoding, Microsoft extensions
+             (actually, the ETen extensions of Row 89)
+  1361       Johab base, Johab encoding
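+
+	For readers experimenting in Python, the standard library
+happens to ship codecs named after these Code Pages: "cp932", "cp936",
+"cp949", "cp950", and "johab" for Code Page 1361. The sketch below
+encodes one arbitrary character per Code Page as a spot check. Python's
+"cp936" is an alias for its "gbk" codec, so it covers the GBK
+extensions (Section 3.3.19) rather than only GB 2312-80.
+
```python
import codecs

SAMPLES = {
    932: ("cp932", "\u65e5"),   # Japanese: U+65E5
    936: ("cp936", "\u4e2d"),   # Simplified Chinese: U+4E2D
    949: ("cp949", "\uac00"),   # Korean: U+AC00
    950: ("cp950", "\u4e2d"),   # Traditional Chinese: U+4E2D
    1361: ("johab", "\uac00"),  # Korean (Johab): U+AC00
}

for page, (codec, char) in SAMPLES.items():
    encoded = char.encode(codec)
    print(page, codecs.lookup(codec).name, encoded.hex().upper())
```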
+
+	IBM also uses Code Page designations, and, in fact, some
+designations (and associated characteristics) are nearly identical to
+those in the above table, most notably, Code Pages 932 and 936. IBM's
+Code Page 932 does not include NEC Row 13 or IBM select characters in
+Rows 89 through 92.
+	The best way to describe IBM Code Page designations is by
+first listing the SBCS (Single-Byte Character Set) and DBCS (Double-
+Byte Character Set) Code Page designations (those designated by "Host"
+use EBCDIC-based encodings):
+
+  IBM SBCS Code Page          Characteristics
+  ^^^^^^^^^^^^^^^^^^          ^^^^^^^^^^^^^^^
+  37 (US)                     SBCS-Host
+  290 (Japanese)              SBCS-Host
+  833 (Korean)                SBCS-Host
+  836 (Simplified Chinese)    SBCS-Host
+  891 (Korean)                SBCS-PC
+  897 (Japanese)              SBCS-PC
+  903 (Simplified Chinese)    SBCS-PC
+  904 (Traditional Chinese)   SBCS-PC
+
+  IBM DBCS Code Page          Characteristics
+  ^^^^^^^^^^^^^^^^^^          ^^^^^^^^^^^^^^^
+  300 (Japanese)              DBCS-Host
+  301 (Japanese)              DBCS-PC
+  834 (Korean)                DBCS-Host
+  835 (Traditional Chinese)   DBCS-Host
+  837 (Simplified Chinese)    DBCS-Host
+  926 (Korean)                DBCS-PC
+  927 (Traditional Chinese)   DBCS-PC
+  928 (Simplified Chinese)    DBCS-PC
+
+So far there appears to be no relationship with Microsoft's CJK Code
+Pages, but when we combine the above SBCS and DBCS Code Pages into
+MBCS (Multiple-Byte Character Set) Code Pages, things become a bit
+more revealing:
+
+  IBM MBCS Code Page          Characteristics
+  ^^^^^^^^^^^^^^^^^^          ^^^^^^^^^^^^^^^
+  930 (Japanese)              MBCS-Host (Code Pages 300 and 290)
+  932 (Japanese)              MBCS-PC (Code Pages 301 and 897)
+  933 (Korean)                MBCS-Host (Code Pages 834 and 833)
+  934 (Korean)                MBCS-PC (Code Pages 926 and 891)
+  938 (Traditional Chinese)   MBCS-PC (Code Pages 927 and 904)
+  936 (Simplified Chinese)    MBCS-PC (Code Pages 928 and 903)
+  5031 (Simplified Chinese)   MBCS-Host (Code Pages 837 and 836)
+  5033 (Traditional Chinese)  MBCS-Host (Code Pages 835 and 37)
+
+So, you can now see that many of Microsoft's CJK Code Pages are
+derived from those established by IBM.
+	More detailed information on the encoding specifications for
+DBCS-Host and DBCS-PC can be found in Sections 3.3.14 and 3.3.15,
+respectively.
+
+
+PART 4: CJK CHARACTER SET COMPATIBILITY ISSUES
+
+	The sections below provide detailed information about
+compatibility issues between CJK character sets, to include tidbits of
+useful information.
+	One thing to mention first is that conversion to and from
+IBM's DBCS-Host (Section 3.3.14) and DBCS-PC (Section 3.3.15)
+encodings is table-driven, and fully documented in the following IBM
+publication:
+
+o IBM Corporation. "Character Data Representation Architecture - Level
+  2, Registry." 1993. IBM order number SC09-1391-01.
+
+Unfortunately, the CJK-related tables are not supplied in machine-
+readable format, and must be obtained from IBM directly. The only real
+compatibility issue is trying to obtain the conversion tables from
+IBM.
+
+
+4.1: JAPANESE
+
+	In general, when a Japanese character set was revised,
+characters were simply added (usually appended at the end). However,
+when JIS C 6226-1978 was revised in 1983 (to become JIS X 0208-1983),
+a bit more happened (this is still a controversy).
+	A detailed treatment of the two main transitions, JIS C 6226-
+1978 to JIS X 0208-1983 and JIS X 0208-1983 to JIS X 0208-1990, is
+covered in Appendix J of UJIP. I provide machine-readable files that
+detail these transitions at the following URL:
+
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/AppJ/
+
+	An interesting side note here is that there is a reason why
+the many lists that illustrate JIS C 6226-1978 and JIS X 0208-1983
+kanji form differences do not fully agree. While most share the same
+basic set of changes, there are some inconsistencies. It turns out that
+JIS C 6226-1978 had ten printings, and not all of them shared the same
+kanji forms. Comparisons between JIS C 6226-1978 and JIS X 0208-1983
+that used different printings of the JIS C 6226-1978 manual could
+therefore produce slightly different results.
+	There are also interesting correspondences between JIS X
+0208-1990 and JIS X 0212-1990. 28 kanji that vanished during the JIS C
+6226-1978 to JIS X 0208-1983 transition (they were replaced by
+simplified versions) were restored in JIS X 0212-1990 (at totally
+different code points). Appendix J of UJIP discusses this, and a file
+at the following URL details the 28 mappings:
+
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/AppJ/TJ2.jis
+
+
+4.2: CHINESE (PRC)
+
+	The basic PRC standard, GB 2312-80, has been revised, but not
+through a later version of the standard. Instead, the revisions were
+carried out in the form of three other documents. Specifically, they
+are (in order of publication):
+
+o GB 6345.1-86 (see Section 2.2.3)
+o GB 8565.2-88 (see Section 2.2.6)
+o GB/T 12345-90 (see Section 2.2.7)
+
+Unless you are aware of these documents, figuring out what has been
+corrected or added to GB 2312-80 is nearly impossible.
+
+
+4.3: CHINESE (TAIWAN)
+
+	The first question people think of with regard to Big Five and
+CNS 11643-1992 is compatibility. It turns out that Planes 1 and 2 of
+CNS 11643-1992 are more or less equivalent to Big Five, but a handful
+of hanzi are in a different order. The following tables detail the
+mapping from Big Five (with the ETen extension) to CNS 11643-1992
+(when using this conversion table, keep in mind the encoding space
+ranges for both Big Five and CNS 11643-1992):
+
+Big Five Level 1 Correspondence to CNS 11643-1992 Plane 1:
+
+  0xA140-0xA1F5 <-> 0x2121-0x2256
+         0xA1F6 <-> 0x2258
+         0xA1F7 <-> 0x2257
+  0xA1F8-0xA2AE <-> 0x2259-0x234E
+  0xA2AF-0xA3BF <-> 0x2421-0x2570
+  0xA3C0-0xA3E0 <-> 0x4221-0x4241  # Symbols for control characters
+  0xA440-0xACFD <-> 0x4421-0x5322  # Level 1 Hanzi BEGIN
+         0xACFE <-> 0x5753
+  0xAD40-0xAFCF <-> 0x5323-0x5752
+  0xAFD0-0xBBC7 <-> 0x5754-0x6B4F
+  0xBBC8-0xBE51 <-> 0x6B51-0x6F5B
+         0xBE52 <-> 0x6B50
+  0xBE53-0xC1AA <-> 0x6F5C-0x7534
+  0xC1AB-0xC2CA <-> 0x7536-0x7736
+         0xC2CB <-> 0x7535
+  0xC2CC-0xC360 <-> 0x7737-0x782C
+  0xC361-0xC3B8 <-> 0x782E-0x7863
+         0xC3B9 <-> 0x7865
+         0xC3BA <-> 0x7864
+  0xC3BB-0xC455 <-> 0x7866-0x7961
+         0xC456 <-> 0x782D
+  0xC457-0xC67E <-> 0x7962-0x7D4B  # Level 1 Hanzi END
+  0xC6A1-0xC6AA <-> 0x2621-0x262A  # Circled numerals
+  0xC6AB-0xC6B4 <-> 0x262B-0x2634  # Parenthesized numerals
+  0xC6B5-0xC6BE <-> 0x2635-0x263E  # Lowercase Roman numerals
+  0xC6BF-0xC6C0 <-> 0x2723-0x2724  # 213 radicals BEGIN
+  0xC6C1-0xC6C2 <-> 0x2726, 0x2728
+  0xC6C3-0xC6C5 <-> 0x272D-0x272F
+  0xC6C6-0xC6C7 <-> 0x2734, 0x2737
+  0xC6C8-0xC6C9 <-> 0x273A, 0x273C
+  0xC6CA-0xC6CB <-> 0x2742, 0x2747
+  0xC6CC-0xC6CD <-> 0x274E, 0x2753
+  0xC6CE-0xC6CF <-> 0x2754-0x2755
+  0xC6D0-0xC6D1 <-> 0x2759-0x275A
+  0xC6D2-0xC6D3 <-> 0x2761, 0x2766
+  0xC6D4-0xC6D5 <-> 0x2829-0x282A
+  0xC6D6-0xC6D7 <-> 0x2863, 0x286C # 213 radicals END
+  0xC6D8-0xC6E6  -> ******         # Japanese symbols
+  0xC6E7-0xC77A  -> ******         # Hiragana
+  0xC77B-0xC7F2  -> ******         # Katakana
+  0xC7F3-0xC875  -> ******         # Cyrillic alphabet
+  0xC876-0xC878  -> ******         # Symbols
+         0xC87A  -> ******         # Hanzi element
+         0xC87C  -> ******         # Hanzi element
+  0xC87E-0xC8A1  -> ******         # Hanzi elements
+  0xC8A3-0xC8A4  -> ******         # Hanzi elements
+  0xC8A5-0xC8CC  -> ******         # Combined numerals
+  0xC8CD-0xC8D3  -> ******         # Japanese symbols
+
+Big Five Level 1 Correspondences to CNS 11643-1992 Plane 4:
+
+         0xC879 <-> 0x2123         # Hanzi element
+         0xC87B <-> 0x2124         # Hanzi element
+         0xC87D <-> 0x212A         # Hanzi element
+         0xC8A2 <-> 0x2152         # Hanzi element
+
+Big Five Level 2 Correspondence to CNS 11643-1992 Plane 1:
+
+         0xC94A  -> 0x4442         # duplicate of 0xA461
+
+Big Five Level 2 Correspondences to CNS 11643-1992 Plane 2:
+
+  0xC940-0xC949 <-> 0x2121-0x212A  # Level 2 Hanzi BEGIN
+  0xC94B-0xC96B <-> 0x212B-0x214B
+  0xC96C-0xC9BD <-> 0x214D-0x217C
+         0xC9BE <-> 0x214C
+  0xC9BF-0xC9EC <-> 0x217D-0x224C
+  0xC9ED-0xCAF6 <-> 0x224E-0x2438
+         0xCAF7 <-> 0x224D
+  0xCAF8-0xD6CB <-> 0x2439-0x376E
+         0xD6CC <-> 0x3E63
+  0xD6CD-0xD779 <-> 0x3770-0x387D
+         0xD77A <-> 0x3F6A
+  0xD77B-0xDADE <-> 0x387E-0x3E62
+         0xDADF <-> 0x376F
+  0xDAE0-0xDBA6 <-> 0x3E64-0x3F69
+  0xDBA7-0xDDFB <-> 0x3F6B-0x4423
+         0xDDFC  -> 0x4176         # duplicate of 0xDCD1
+  0xDDFD-0xE8A2 <-> 0x4424-0x554A
+  0xE8A3-0xE975 <-> 0x554C-0x5721
+  0xE976-0xEB5A <-> 0x5723-0x5A27
+  0xEB5B-0xEBF0 <-> 0x5A29-0x5B3E
+         0xEBF1 <-> 0x554B
+  0xEBF2-0xECDD <-> 0x5B3F-0x5C69
+         0xECDE <-> 0x5722
+  0xECDF-0xEDA9 <-> 0x5C6A-0x5D73
+  0xEDAA-0xEEEA <-> 0x5D75-0x6038
+         0xEEEB <-> 0x642F
+  0xEEEC-0xF055 <-> 0x6039-0x6242
+         0xF056 <-> 0x5D74
+  0xF057-0xF0CA <-> 0x6243-0x6336
+         0xF0CB <-> 0x5A28
+  0xF0CC-0xF162 <-> 0x6337-0x642E
+  0xF163-0xF16A <-> 0x6430-0x6437
+         0xF16B <-> 0x6761
+  0xF16C-0xF267 <-> 0x6438-0x6572
+         0xF268 <-> 0x6934
+  0xF269-0xF2C2 <-> 0x6573-0x664C
+  0xF2C3-0xF374 <-> 0x664E-0x6760
+  0xF375-0xF465 <-> 0x6762-0x6933
+  0xF466-0xF4B4 <-> 0x6935-0x6961
+         0xF4B5 <-> 0x664D
+  0xF4B6-0xF4FC <-> 0x6962-0x6A4A
+  0xF4FD-0xF662 <-> 0x6A4C-0x6C51
+         0xF663 <-> 0x6A4B
+  0xF664-0xF976 <-> 0x6C52-0x7165
+  0xF977-0xF9C3 <-> 0x7167-0x7233
+         0xF9C4 <-> 0x7166
+         0xF9C5 <-> 0x7234
+         0xF9C6 <-> 0x7240
+  0xF9C7-0xF9D1 <-> 0x7235-0x723F
+  0xF9D2-0xF9D5 <-> 0x7241-0x7244  # Level 2 Hanzi END
+  0xF9DD-0xF9FE  -> ******         # Symbols
+
+Big Five Level 2 Correspondence to CNS 11643-1992 Plane 3:
+
+         0xF9D6 <-> 0x4337         # ETen-specific hanzi
+         0xF9D7 <-> 0x4F50         # ETen-specific hanzi
+         0xF9D8 <-> 0x444E         # ETen-specific hanzi
+         0xF9D9 <-> 0x504A         # ETen-specific hanzi
+         0xF9DA <-> 0x2C5D         # ETen-specific hanzi
+         0xF9DB <-> 0x3D7E         # ETen-specific hanzi
+         0xF9DC <-> 0x4B5C         # ETen-specific hanzi
+
+I adapted the above from material Ross Paterson (rap@doc.ic.ac.uk)
+kindly made available at the following URL:
+
+  http://www.ifcss.org:8001/www/pub/software/info/cjk-codes/
+
+Check it out. Basically, I just changed the CNS 11643-1992 codes from
+decimal row-cell values to hexadecimal codes, and corrected the
+mappings to correspond to ETen's Big Five (which is considered to be
+the most standard).
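+	A range line such as "0xA440-0xACFD <-> 0x4421-0x5322" means
+that the two ranges are walked in step, character by character. A Big
+Five row holds 157 code points (second bytes 0x40-0x7E and 0xA1-0xFE),
+while a CNS 11643-1992 row holds 94 (0x21-0x7E). The mapping therefore
+works on positions within each encoding rather than on raw numeric
+differences. The Python sketch below applies a handful of lines copied
+from the tables above. The function names are illustrative. A complete
+converter would simply include every line.
+
```python
def big5_index(code: int) -> int:
    """Position of a Big Five code when rows are laid end to end (157 per row)."""
    first, second = code >> 8, code & 0xFF
    column = second - 0x40 if second <= 0x7E else second - 0xA1 + 63
    return (first - 0xA1) * 157 + column


def cns_index(code: int) -> int:
    """Position of a CNS 11643-1992 code within its plane (94 per row)."""
    return ((code >> 8) - 0x21) * 94 + (code & 0xFF) - 0x21


def cns_code(index: int) -> int:
    """Inverse of cns_index."""
    row, cell = divmod(index, 94)
    return ((row + 0x21) << 8) | (cell + 0x21)


# (Big Five start, Big Five end, CNS plane, CNS start): a few lines from the tables
RANGES = [
    (0xA440, 0xACFD, 1, 0x4421),
    (0xACFE, 0xACFE, 1, 0x5753),
    (0xAD40, 0xAFCF, 1, 0x5323),
    (0xC940, 0xC949, 2, 0x2121),
    (0xC94B, 0xC96B, 2, 0x212B),
]


def big5_to_cns(code: int) -> tuple[int, int]:
    """Map a Big Five code to (plane, CNS code) using RANGES."""
    for start, end, plane, cns_start in RANGES:
        if start <= code <= end:
            offset = big5_index(code) - big5_index(start)
            return plane, cns_code(cns_index(cns_start) + offset)
    raise KeyError(f"0x{code:04X} is not covered by this partial table")
```
+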
+	It turns out that corrections were made to Big Five (at least
+in the ETen and Microsoft implementations thereof) which made it a bit
+closer to CNS 11643-1992 as far as character ordering is concerned.
+The following seven lines of code correspondences:
+
+  0xCAF8-0xD6CB <-> 0x2439-0x376E
+         0xD6CC <-> 0x3E63
+  0xD6CD-0xD779 <-> 0x3770-0x387D
+         0xD77A <-> 0x3F6A
+  0xD77B-0xDADE <-> 0x387E-0x3E62
+         0xDADF <-> 0x376F
+  0xDAE0-0xDBA6 <-> 0x3E64-0x3F69
+
+can now be expressed as the following three lines:
+
+  0xCAF8-0xD779 <-> 0x2439-0x387D
+         0xD77A <-> 0x3F6A
+  0xD77B-0xDBA6 <-> 0x387E-0x3F69
+
+In essence, the Big Five characters at 0xD6CC and 0xDADF were swapped.
+This resulted in the same order as found in CNS 11643-1992 Plane 2.
+	As for the two duplicate hanzi in Big Five (as indicated in
+the above tables), they have been placed into a compatibility zone in
+ISO 10646-1:1993 (this allows for round-trip conversion). The mapping
+is as follows:
+
+  Big Five  ISO 10646-1:1993
+  ^^^^^^^^  ^^^^^^^^^^^^^^^^
+  0xC94A -> 0xFA0C
+  0xDDFC -> 0xFA0D
+
+	Speaking of duplicate hanzi, Plane 1 of CNS 11643-1992
+contains 213 classical radicals in rows 7 through 9 (first-byte values
+0x27 through 0x29). However, 187 of them map directly to hanzi code
+points in Planes 1, 2, and 3 (and naturally to Big Five). Below is a
+detailed mapping of these 213 radicals:
+
+  Radical   CNS 11643   Big Five    Radical   CNS 11643   Big Five
+  ^^^^^^^   ^^^^^^^^^   ^^^^^^^^    ^^^^^^^   ^^^^^^^^^   ^^^^^^^^
+  0x2721 -> 0x4421      0xA440      0x282E -> 0x4678      0xA5D8
+  0x2722 -> 0x2121 (3)  ******      0x282F -> 0x4679      0xA5D9
+  0x2723 -> 0x2122 (3)  0xC6BF      0x2830 -> 0x467A      0xA5DA
+  0x2724 -> 0x2123 (3)  0xC6C0      0x2831 -> 0x467B      0xA5DB
+  0x2725 -> 0x4422      0xA441      0x2832 -> 0x467C      0xA5DC
+  0x2726 -> 0x2124 (3)  0xC6C1      0x2833 -> 0x2167 (2)  0xC9A8
+  0x2727 -> 0x4428      0xA447      0x2834 -> 0x467D      0xA5DD
+  0x2728 -> ******      0xC6C2      0x2835 -> 0x467E      0xA5DE
+  0x2729 -> 0x4429      0xA448      0x2836 -> 0x4721      0xA5DF
+  0x272A -> 0x442A      0xA449      0x2837 -> 0x484C      0xA6CB
+  0x272B -> 0x442B      0xA44A      0x2838 -> 0x484D      0xA6CC
+  0x272C -> 0x442C      0xA44B      0x2839 -> 0x484E      0xA6CD
+  0x272D -> 0x2127 (3)  0xC6C3      0x283A -> 0x484F      0xA6CE
+  0x272E -> 0x2128 (3)  0xC6C4      0x283B -> 0x2269 (2)  0xCA49
+  0x272F -> ******      0xC6C5      0x283C -> 0x4850      0xA6CF
+  0x2730 -> 0x442D      0xA44C      0x283D -> 0x4851      0xA6D0
+  0x2731 -> 0x2123 (2)  0xC942      0x283E -> 0x4852      0xA6D1
+  0x2732 -> 0x442E      0xA44D      0x283F -> 0x4854      0xA6D3
+  0x2733 -> 0x4430      0xA44F      0x2840 -> 0x4855      0xA6D4
+  0x2734 -> ******      0xC6C6      0x2841 -> 0x4856      0xA6D5
+  0x2735 -> 0x4431      0xA450      0x2842 -> 0x4857      0xA6D6
+  0x2736 -> 0x2124 (2)  0xC943      0x2843 -> 0x4858      0xA6D7
+  0x2737 -> 0x2129 (3)  0xC6C7      0x2844 -> 0x485B      0xA6DA
+  0x2738 -> 0x4432      0xA451      0x2845 -> 0x485C      0xA6DB
+  0x2739 -> 0x4433      0xA452      0x2846 -> 0x485D      0xA6DC
+  0x273A -> 0x212A (3)  0xC6C8      0x2847 -> 0x485E      0xA6DD
+  0x273B -> 0x2125 (2)  0xC944      0x2848 -> 0x485F      0xA6DE
+  0x273C -> 0x212B (3)  0xC6C9      0x2849 -> 0x4860      0xA6DF
+  0x273D -> 0x4434      0xA453      0x284A -> 0x4861      0xA6E0
+  0x273E -> 0x4447      0xA466      0x284B -> 0x4862      0xA6E1
+  0x273F -> 0x212A (2)  0xC949      0x284C -> 0x4863      0xA6E2
+  0x2740 -> 0x4448      0xA467      0x284D -> 0x226A (2)  0xCA4A
+  0x2741 -> 0x4449      0xA468      0x284E -> 0x226F (2)  0xCA4F
+  0x2742 -> 0x213A (3)  0xC6CA      0x284F -> 0x4865      0xA6E4
+  0x2743 -> 0x444A      0xA469      0x2850 -> 0x4866      0xA6E5
+  0x2744 -> 0x444B      0xA46A      0x2851 -> 0x4867      0xA6E6
+  0x2745 -> 0x444C      0xA46B      0x2852 -> 0x4868      0xA6E7
+  0x2746 -> 0x444D      0xA46C      0x2853 -> 0x2270 (2)  0xCA50
+  0x2747 -> 0x213B (3)  0xC6CB      0x2854 -> 0x4B44      0xA8A3
+  0x2748 -> 0x4450      0xA46F      0x2855 -> 0x4B45      0xA8A4
+  0x2749 -> 0x4451      0xA470      0x2856 -> 0x4B46      0xA8A5
+  0x274A -> 0x4452      0xA471      0x2857 -> 0x4B47      0xA8A6
+  0x274B -> 0x4453      0xA472      0x2858 -> 0x4B48      0xA8A7
+  0x274C -> 0x212B (2)  0xC94B      0x2859 -> 0x4B49      0xA8A8
+  0x274D -> 0x4454      0xA473      0x285A -> 0x2524 (2)  0xCBA4
+  0x274E -> 0x213C (3)  0xC6CC      0x285B -> 0x4B4A      0xA8A9
+  0x274F -> 0x4456      0xA475      0x285C -> 0x4B4B      0xA8AA
+  0x2750 -> 0x4457      0xA476      0x285D -> 0x4B4C      0xA8AB
+  0x2751 -> 0x445A      0xA479      0x285E -> 0x4B4D      0xA8AC
+  0x2752 -> 0x445B      0xA47A      0x285F -> 0x4B4E      0xA8AD
+  0x2753 -> 0x213D (3)  0xC6CD      0x2860 -> 0x4B4F      0xA8AE
+  0x2754 -> 0x213E (3)  0xC6CE      0x2861 -> 0x4B50      0xA8AF
+  0x2755 -> 0x213F (3)  0xC6CF      0x2862 -> 0x4B51      0xA8B0
+  0x2756 -> 0x445C      0xA47B      0x2863 -> 0x272F (3)  0xC6D6
+  0x2757 -> 0x445D      0xA47C      0x2864 -> 0x4B57      0xA8B6
+  0x2758 -> 0x445E      0xA47D      0x2865 -> 0x4B5C      0xA8BB
+  0x2759 -> 0x2140 (3)  0xC6D0      0x2866 -> 0x4B5D      0xA8BC
+  0x275A -> 0x2142 (3)  0xC6D1      0x2867 -> 0x4B5E      0xA8BD
+  0x275B -> 0x212C (2)  0xC94C      0x2868 -> 0x4F5A      0xAAF7
+  0x275C -> 0x4540      0xA4DF      0x2869 -> 0x4F5B      0xAAF8
+  0x275D -> 0x4541      0xA4E0      0x286A -> 0x4F5C      0xAAF9
+  0x275E -> 0x4542      0xA4E1      0x286B -> 0x4F5D      0xAAFA
+  0x275F -> 0x4543      0xA4E2      0x286C -> 0x2A7D (3)  0xC6D7
+  0x2760 -> 0x4545      0xA4E4      0x286D -> 0x4F63      0xAB41
+  0x2761 -> 0x2167 (3)  0xC6D2      0x286E -> 0x4F64      0xAB42
+  0x2762 -> 0x4546      0xA4E5      0x286F -> 0x4F65      0xAB43
+  0x2763 -> 0x4547      0xA4E6      0x2870 -> 0x4F66      0xAB44
+  0x2764 -> 0x4548      0xA4E7      0x2871 -> 0x5372      0xADB1
+  0x2765 -> 0x4549      0xA4E8      0x2872 -> 0x5373      0xADB2
+  0x2766 -> 0x2169 (3)  0xC6D3      0x2873 -> 0x5374      0xADB3
+  0x2767 -> 0x454A      0xA4E9      0x2874 -> 0x5375      0xADB4
+  0x2768 -> 0x454B      0xA4EA      0x2875 -> 0x5376      0xADB5
+  0x2769 -> 0x454C      0xA4EB      0x2876 -> 0x5377      0xADB6
+  0x276A -> 0x454D      0xA4EC      0x2877 -> 0x5378      0xADB7
+  0x276B -> 0x454E      0xA4ED      0x2878 -> 0x5379      0xADB8
+  0x276C -> 0x454F      0xA4EE      0x2879 -> 0x537A      0xADB9
+  0x276D -> 0x4550      0xA4EF      0x287A -> 0x537B      0xADBA
+  0x276E -> 0x213F (2)  0xC95F      0x287B -> 0x537C      0xADBB
+  0x276F -> 0x4551      0xA4F0      0x287C -> 0x586B      0xB0A8
+  0x2770 -> 0x4552      0xA4F1      0x287D -> 0x586C      0xB0A9
+  0x2771 -> 0x4553      0xA4F2      0x287E -> 0x586D      0xB0AA
+  0x2772 -> 0x4554      0xA4F3      0x2921 -> 0x334C (2)  0xD449
+  0x2773 -> 0x2141 (2)  0xC961      0x2922 -> 0x586E      0xB0AB
+  0x2774 -> 0x4555      0xA4F4      0x2923 -> 0x334D (2)  0xD44A
+  0x2775 -> 0x4556      0xA4F5      0x2924 -> 0x586F      0xB0AC
+  0x2776 -> 0x4557      0xA4F6      0x2925 -> 0x5870      0xB0AD
+  0x2777 -> 0x4558      0xA4F7      0x2926 -> 0x5E23      0xB3BD
+  0x2778 -> 0x4559      0xA4F8      0x2927 -> 0x5E24      0xB3BE
+  0x2779 -> 0x2142 (2)  0xC962      0x2928 -> 0x5E25      0xB3BF
+  0x277A -> 0x455A      0xA4F9      0x2929 -> 0x5E26      0xB3C0
+  0x277B -> 0x455B      0xA4FA      0x292A -> 0x5E27      0xB3C1
+  0x277C -> 0x455C      0xA4FB      0x292B -> 0x5E28      0xB3C2
+  0x277D -> 0x455D      0xA4FC      0x292C -> 0x6327      0xB6C0
+  0x277E -> 0x4668      0xA5C8      0x292D -> 0x6328      0xB6C1
+  0x2821 -> 0x4669      0xA5C9      0x292E -> 0x6329      0xB6C2
+  0x2822 -> 0x466A      0xA5CA      0x292F -> 0x4155 (2)  0xDCB0
+  0x2823 -> 0x466B      0xA5CB      0x2930 -> 0x4875 (2)  0xE0EF
+  0x2824 -> 0x466C      0xA5CC      0x2931 -> 0x676F      0xB9A9
+  0x2825 -> 0x466D      0xA5CD      0x2932 -> 0x6770      0xB9AA
+  0x2826 -> 0x466E      0xA5CE      0x2933 -> 0x6771      0xB9AB
+  0x2827 -> 0x4670      0xA5D0      0x2934 -> 0x6B7C      0xBBF3
+  0x2828 -> 0x4674      0xA5D4      0x2935 -> 0x6B7D      0xBBF4
+  0x2829 -> 0x225B (3)  0xC6D4      0x2936 -> 0x702F      0xBEA6
+  0x282A -> 0x225C (3)  0xC6D5      0x2937 -> 0x733E      0xC073
+  0x282B -> 0x4675      0xA5D5      0x2938 -> 0x733F      0xC074
+  0x282C -> 0x4676      0xA5D6      0x2939 -> 0x6142 (2)  0xEFB6
+  0x282D -> 0x4677      0xA5D7
+
+
+4.4: KOREAN
+
+	The 268 duplicate hanja in KS C 5601-1992 can cause problems
+when converting to and from other CJK character sets. When converting
+from KS C 5601-1992, two or more hanja can collapse into a single code
+point. When converting these 268 hanja to KS C 5601-1992, a decision
+about which KS C 5601-1992 code point to map to must be made. The only
+exception to this is mapping to and from ISO 10646-1:1993. That
+standard encodes these 268 duplicate hanja in a compatibility zone,
+namely from 0xF900 through 0xFA0B.
+	The following is a listing of 262 hanja that map to two or
+more code points in KS C 5601-1992 (four map to three code points, and
+one maps to four, for a total of 268 redundantly-encoded hanja):
+
+  Standard  Extra     Standard  Extra     Standard  Extra
+  ^^^^^^^^  ^^^^^     ^^^^^^^^  ^^^^^     ^^^^^^^^  ^^^^^
+  0x4A39 -> 0x4D4F    0x5573 -> 0x6631    0x573C -> 0x6B29
+  0x4B3D -> 0x7A22    0x5574 -> 0x6633    0x573E -> 0x6B3A
+  0x4C38 -> 0x7A66    0x5575 -> 0x6637    0x573F -> 0x6B3B
+  0x4C5A -> 0x4B56    0x5576 -> 0x6638    0x5740 -> 0x6B3D
+  0x4C78 -> 0x5050    0x5579 -> 0x663C    0x5741 -> 0x6B41
+  0x4D7A -> 0x4E2D    0x557B -> 0x6646    0x5743 -> 0x6B42
+  0x4E29 -> 0x7C29    0x557C -> 0x6647    0x5744 -> 0x6B46
+  0x4F23 -> 0x4F7B    0x557E -> 0x6652    0x5745 -> 0x6B47
+  0x4F4F -> 0x5022    0x5621 -> 0x6656    0x5747 -> 0x6B4C
+            0x5038    0x5622 -> 0x6659    0x5748 -> 0x6B4F
+  0x5142 -> 0x4B50    0x5623 -> 0x665F    0x5749 -> 0x6B50
+  0x5151 -> 0x505D    0x5624 -> 0x6661    0x574A -> 0x6B51
+  0x5159 -> 0x547C    0x5625 -> 0x6665    0x574C -> 0x6B58
+  0x5167 -> 0x552B    0x5626 -> 0x6664    0x574D -> 0x5270
+  0x522F -> 0x5155    0x5627 -> 0x6666    0x574E -> 0x5271
+  0x5233 -> 0x657C    0x5628 -> 0x6668    0x574F -> 0x5272
+  0x5234 -> 0x6644    0x562A -> 0x666A    0x5750 -> 0x5273
+  0x5235 -> 0x664A    0x562B -> 0x666B    0x5752 -> 0x5274
+  0x5236 -> 0x665C    0x562D -> 0x666F    0x5753 -> 0x5275
+  0x5237 -> 0x6676    0x562E -> 0x6671    0x5754 -> 0x5277
+  0x523A -> 0x6677    0x562F -> 0x6675    0x5755 -> 0x5278
+  0x523B -> 0x5638    0x5631 -> 0x6679    0x5757 -> 0x6C26
+            0x672C    0x5633 -> 0x6721    0x5759 -> 0x6C27
+  0x5241 -> 0x564D    0x5634 -> 0x6726    0x575B -> 0x6C2A
+  0x5263 -> 0x6871    0x5635 -> 0x6729    0x575D -> 0x6C30
+  0x526E -> 0x6A74    0x5637 -> 0x672A    0x575E -> 0x6C31
+  0x526F -> 0x6B2A    0x563A -> 0x672D    0x5762 -> 0x6C35
+  0x527A -> 0x6C32    0x563B -> 0x6730    0x5765 -> 0x6C38
+  0x527B -> 0x6C49    0x563C -> 0x673F    0x5767 -> 0x6C3A
+  0x527C -> 0x6C4A    0x563E -> 0x6746    0x576A -> 0x6C40
+  0x527E -> 0x7331    0x5640 -> 0x6747    0x576B -> 0x6C41
+  0x5321 -> 0x552E    0x5642 -> 0x674B    0x576C -> 0x6C45
+  0x5358 -> 0x7738    0x5643 -> 0x674D    0x576E -> 0x6C46
+  0x536B -> 0x7748    0x5644 -> 0x674F    0x5770 -> 0x6C55
+  0x5378 -> 0x7674    0x5645 -> 0x6750    0x5772 -> 0x6C5D
+  0x5441 -> 0x5466    0x5647 -> 0x6753    0x5773 -> 0x6C5E
+  0x5457 -> 0x7753    0x5649 -> 0x675F    0x5774 -> 0x6C61
+  0x547A -> 0x5154    0x564A -> 0x6764    0x5776 -> 0x6C64
+  0x547B -> 0x5158    0x564B -> 0x6766    0x5777 -> 0x6C67
+  0x547D -> 0x515B    0x564C -> 0x523E    0x5778 -> 0x6C68
+  0x547E -> 0x515C    0x564F -> 0x5242    0x5779 -> 0x6C77
+  0x5521 -> 0x515D    0x5650 -> 0x5243    0x577A -> 0x6C78
+  0x5522 -> 0x515E    0x5653 -> 0x5244    0x577C -> 0x6C7A
+  0x5523 -> 0x515F    0x5654 -> 0x5246    0x5821 -> 0x6D21
+  0x5524 -> 0x5160    0x5655 -> 0x5247    0x5822 -> 0x6D22
+  0x5526 -> 0x5163    0x5656 -> 0x5248    0x5823 -> 0x6D23
+  0x5527 -> 0x5164    0x5657 -> 0x5249    0x5A72 -> 0x5B64
+  0x5528 -> 0x5165    0x5658 -> 0x524A    0x5C56 -> 0x5D25
+  0x552A -> 0x5166    0x565A -> 0x524B    0x5C5F -> 0x7870
+  0x552C -> 0x5168    0x565B -> 0x524D    0x5C74 -> 0x5D55
+  0x552D -> 0x5169    0x565C -> 0x524E    0x5D41 -> 0x5B45
+  0x552F -> 0x516A    0x565E -> 0x524F    0x5F2F -> 0x616D
+  0x5530 -> 0x516B    0x565F -> 0x5250    0x5F52 -> 0x6D6E
+  0x5531 -> 0x516D    0x5660 -> 0x5251    0x5F5D -> 0x5F61
+  0x5534 -> 0x516F    0x5661 -> 0x5252    0x5F63 -> 0x5E7E
+  0x5535 -> 0x5170    0x5662 -> 0x5253    0x6063 -> 0x612D
+  0x5536 -> 0x5172    0x5663 -> 0x5254              0x6672
+  0x5539 -> 0x5176    0x5665 -> 0x5255    0x607D -> 0x5F68
+  0x553D -> 0x517A    0x5666 -> 0x5256    0x6163 -> 0x574B
+  0x5540 -> 0x517C    0x5667 -> 0x5257              0x6B52
+  0x5541 -> 0x517D    0x566B -> 0x5259    0x6226 -> 0x5E7C
+  0x5543 -> 0x517E    0x566C -> 0x525A    0x6326 -> 0x6429
+  0x5544 -> 0x5222    0x566F -> 0x525E    0x635B -> 0x723D
+  0x5545 -> 0x5223    0x5670 -> 0x525F    0x6427 -> 0x727A
+  0x5546 -> 0x5227    0x5671 -> 0x5261    0x6442 -> 0x6777
+  0x5547 -> 0x5228    0x5674 -> 0x5262    0x6445 -> 0x5162
+  0x5548 -> 0x5229    0x5675 -> 0x6867              0x5525
+  0x5549 -> 0x522A    0x5676 -> 0x6868              0x6879
+  0x554D -> 0x522B    0x5677 -> 0x6870    0x6534 -> 0x652E
+  0x554E -> 0x522D    0x5679 -> 0x6877    0x6636 -> 0x6C2F
+  0x5552 -> 0x5232    0x567A -> 0x687B    0x6728 -> 0x6071
+  0x5553 -> 0x6531    0x567B -> 0x687E    0x6856 -> 0x6A41
+  0x5554 -> 0x6532    0x567E -> 0x6927    0x6C36 -> 0x5764
+  0x5555 -> 0x6539    0x5721 -> 0x692C    0x6C56 -> 0x666C
+  0x5557 -> 0x653B    0x5723 -> 0x694C    0x6D29 -> 0x7427
+  0x5558 -> 0x653C    0x5724 -> 0x5264    0x6D33 -> 0x6E5B
+  0x5559 -> 0x6544    0x5726 -> 0x5265    0x6F37 -> 0x746E
+  0x555D -> 0x654E    0x5727 -> 0x5266    0x7263 -> 0x6375
+  0x555E -> 0x6550    0x5728 -> 0x5267    0x7333 -> 0x4B67
+  0x555F -> 0x6552    0x5729 -> 0x5268    0x7351 -> 0x5F33
+  0x5561 -> 0x6556    0x572B -> 0x5269    0x742C -> 0x7676
+  0x5564 -> 0x657A    0x572C -> 0x526A    0x7658 -> 0x6421
+  0x5565 -> 0x657B    0x5730 -> 0x526B    0x7835 -> 0x5C25
+  0x5566 -> 0x657E    0x5731 -> 0x6A65    0x786C -> 0x785B
+  0x5569 -> 0x6621    0x5733 -> 0x6A77    0x7932 -> 0x5D74
+  0x556B -> 0x6624    0x5735 -> 0x6A7C    0x7A3C -> 0x7A21
+  0x556C -> 0x6627    0x5736 -> 0x6A7E    0x7B29 -> 0x6741
+  0x556F -> 0x662D    0x5738 -> 0x6B24    0x7C41 -> 0x4D68
+  0x5571 -> 0x662F    0x573A -> 0x6B27    0x7D3B -> 0x6977
+  0x5572 -> 0x6630
+
+The above table represents a weekend of my time (but time well spent,
+in my opinion).
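+	The collapse described above can be sketched in a few lines of
+Python. This is a minimal sketch, not a complete converter: it copies
+only a handful of rows from the table, and the policy of folding every
+extra code point back onto its standard code point is an assumption
+made for illustration (a converter could instead keep the duplicates
+apart, for example by mapping them into the ISO 10646-1:1993
+compatibility zone). The names DUPLICATES and normalize() are
+hypothetical.
+
```python
# Hypothetical excerpt of the KS C 5601-1992 duplicate-hanja table above.
# Keys are "Standard" code points; values are the "Extra" code points.
DUPLICATES = {
    0x4A39: [0x4D4F],
    0x4F4F: [0x5022, 0x5038],          # one of the four that map to three
    0x523B: [0x5638, 0x672C],          # another that maps to three
    0x6445: [0x5162, 0x5525, 0x6879],  # the single hanja that maps to four
}

# Invert the table so that each extra code point points to its standard one.
TO_STANDARD = {extra: std for std, extras in DUPLICATES.items() for extra in extras}


def normalize(code: int) -> int:
    """Fold a duplicate hanja onto its standard code point; leave others alone."""
    return TO_STANDARD.get(code, code)


# ISO 10646-1:1993 reserves 0xF900-0xFA0B for the redundantly-encoded hanja.
COMPAT_ZONE_SIZE = 0xFA0B - 0xF900 + 1   # 268 code points


if __name__ == "__main__":
    print(hex(normalize(0x6879)))   # 0x6445
    print(COMPAT_ZONE_SIZE)         # 268
```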
+
+
+4.5: ISO 10646-1:1993
+
+	The Chinese character subset of ISO 10646-1:1993
+has excellent round-trip conversion capability with the various
+national character sets. Those national character sets with duplicate
+characters, such as KS C 5601-1992 (268 hanja) and Big Five (2 hanzi),
+have corresponding code points in ISO 10646-1:1993 within
+a compatibility zone. See Sections 4.3 and 4.4 for more details.
+	Other issues regarding ISO 10646-1:1993 have to do with proper
+character rendering (that is, how characters are displayed, printed,
+or otherwise imaged). Many character form differences, some of them
+subtle, have been collapsed under ISO 10646-1:1993. Language or locale
+was not one of the factors used in performing Han Unification. This
+means that it is nearly impossible to create a single ISO 10646-1:1993
+font that meets the character form criteria of each of the four CJK
+locales. An ISO 10646-1:1993 code point alone is not enough
+information to render a Chinese character correctly. If a font is
+designed for a single locale, this is not a problem, but if text is
+intended for more than one CJK locale, it must be flagged for language
+or locale.
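+	The point can be illustrated with Python's standard codecs. In
+this sketch (an illustration, not a rendering system), the character
+U+9AA8 ("bone") has a single ISO 10646-1:1993 code point, yet it is
+also encoded in each of the four national character sets, whose glyph
+conventions can differ. Nothing in the code point says which
+convention applies; only an external language or locale tag does. The
+LOCALE_CODECS table is a hypothetical stand-in for what a renderer
+would really consult: a table of locale-specific fonts.
+
```python
# Hypothetical tag-to-codec table; a renderer would map tags to fonts instead.
LOCALE_CODECS = {
    "ja": "euc_jp",     # JIS X 0208
    "zh-CN": "gb2312",  # GB 2312-80
    "zh-TW": "big5",    # Big Five
    "ko": "euc_kr",     # KS C 5601
}


def national_forms(ch: str) -> dict:
    """Encode one character in each locale's national character set."""
    return {tag: ch.encode(codec) for tag, codec in LOCALE_CODECS.items()}


bone = "\u9aa8"              # one code point shared by all four locales
forms = national_forms(bone)

if __name__ == "__main__":
    print(hex(ord(bone)))    # 0x9aa8
    for tag, raw in forms.items():
        print(tag, raw.hex())
```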
+
+
+4.6: UNICODE
+
+	One of the most interesting (and major) differences between
+the current three flavors of Unicode is the number and arrangement of
+pre-combined hangul. The following table provides a summary of the
+differences:
+
+  Unicode       Number of Pre-combined Hangul   UCS-2 Ranges
+  ^^^^^^^       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^   ^^^^^^^^^^^^
+  Version 1.0   2,350 Basic Hangul              0x3400-0x3D2D
+
+  Version 1.1   2,350 Basic Hangul              0x3400-0x3D2D
+                1,930 Supplemental Hangul A     0x3D2E-0x44B7
+                2,376 Supplemental Hangul B     0x44B8-0x4DFF
+
+  Version 2.0  11,172 Hangul                    0xAC00-0xD7A3
+
+Of the above three versions, the most controversial is Version 2.0.
+Why? Because its hangul block is located in the user-defined range of
+Unicode (O-Zone: 16,384 code points in 0xA000-0xDFFF), and occupies
+approximately two-thirds of that range.
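+	The Version 2.0 block is laid out algorithmically: every
+combination of 19 initial consonants, 21 vowels, and 28 final slots
+(27 final consonants plus "no final") gets its own code point, and
+19 x 21 x 28 = 11,172. The following Python sketch implements that
+composition arithmetic as described in The Unicode Standard; the
+function names are hypothetical.
+
```python
# Constants of the Unicode 2.0 pre-combined hangul layout.
S_BASE = 0xAC00
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28   # initials, vowels, finals (incl. none)
N_COUNT = V_COUNT * T_COUNT              # syllables per initial consonant
S_COUNT = L_COUNT * N_COUNT              # 11,172 syllables in all


def compose(l: int, v: int, t: int = 0) -> int:
    """Map jamo indices (initial l, vowel v, final t) to a code point."""
    if not (0 <= l < L_COUNT and 0 <= v < V_COUNT and 0 <= t < T_COUNT):
        raise ValueError("jamo index out of range")
    return S_BASE + (l * V_COUNT + v) * T_COUNT + t


def decompose(s: int) -> tuple:
    """Inverse of compose(): split a syllable code point into jamo indices."""
    index = s - S_BASE
    if not 0 <= index < S_COUNT:
        raise ValueError("not a pre-combined hangul syllable")
    return index // N_COUNT, (index % N_COUNT) // T_COUNT, index % T_COUNT


if __name__ == "__main__":
    print(hex(S_BASE + S_COUNT - 1))    # 0xd7a3, the end of the block
    print(chr(compose(18, 0, 4)))       # 한 (hieut + a + nieun)
```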
+	The information in the above table is courtesy of the
+following useful document:
+
+  ftp://unicode.org/pub/MappingTables/EastAsiaMaps/Hangul-Codes.txt
+
+The same file is also mirrored at the following URL:
+
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/map/hangul-codes.txt
+
+
+4.7: CODE CONVERSION TIPS
+
+	There are two types of conversions that can be performed. The
+first type is converting between different encodings for the same
+character set. This is usually, but not always, trouble-free. The
+second type is converting from one character set to another (it is not
+usually relevant whether the underlying encoding has changed or not).
+This usually involves the handling of characters that are in one
+character set, but not the other. So, what to do?
+	I suggest JConv for handling Japanese code conversion (this
+means converting between JIS, Shift-JIS, and EUC encodings). This is
+in the category of different encodings for the same character set. The
+following URLs provide executables or source code:
+
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/mac/jconv-30.hqx
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/mac/jconv-dd-181.hqx
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/dos/jconv.exe
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/src/jconv.c
+
+There are other programs available that do the same basic thing as
+JConv, such as kc and nkf. They are available at the following URL:
+
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/unix/
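+
+	For JIS X 0208 characters, the encoding conversions these
+programs perform are pure arithmetic on the two bytes of each
+character. The following Python sketch shows that arithmetic for a
+single character; it is not taken from JConv, kc, or nkf, it ignores
+half-width katakana and user-defined characters, and the function
+names are hypothetical. A JIS value here is the pair of 7-bit bytes
+that appears between the escape sequences of 7-bit ISO 2022 text.
+
```python
def jis_to_euc(j1: int, j2: int) -> bytes:
    """EUC code set 1: set the high bit of both JIS X 0208 bytes."""
    return bytes([j1 | 0x80, j2 | 0x80])


def jis_to_sjis(j1: int, j2: int) -> bytes:
    """Shift-JIS: pack two JIS rows into each lead byte."""
    # Rows 0x21-0x5E map to lead bytes 0x81-0x9F; rows 0x5F-0x7E to 0xE0-0xEF.
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 <= 0x5E else 0xB0)
    if j1 & 1:
        # Odd rows use trail bytes 0x40-0x9E, skipping 0x7F.
        s2 = j2 + (0x1F if j2 <= 0x5F else 0x20)
    else:
        # Even rows use trail bytes 0x9F-0xFC.
        s2 = j2 + 0x7E
    return bytes([s1, s2])


if __name__ == "__main__":
    print(jis_to_euc(0x46, 0x7C).hex())    # c6fc
    print(jis_to_sjis(0x46, 0x7C).hex())   # 93fa
```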
+
+	For software and tables that handle Chinese code conversion
+(this includes conversion to and from Japanese), I suggest browsing
+the following URLs:
+
+  ftp://etlport.etl.go.jp/pub/iso-2022-cn/convert/
+  ftp://ftp.ifcss.org/pub/software/dos/convert/
+  ftp://ftp.ifcss.org/pub/software/mac/convert/
+  ftp://ftp.ifcss.org/pub/software/ms-win/convert/
+  ftp://ftp.ifcss.org/pub/software/unix/convert/
+  ftp://ftp.ifcss.org/pub/software/vms/convert/
+  ftp://ftp.net.tsinghua.edu.cn/pub/Chinese/convert/
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/map/
+  ftp://ftp.seed.net.tw/Pub/Chinese/DOS/code-convert/
+  http://www.yajima.kuis.kyoto-u.ac.jp/staffs/yasuoka/CJK.html
+
+The last URL has FTP links to tables created by Koichi Yasuoka
+(yasuoka@kudpc.kyoto-u.ac.jp).
+	The following URLs provide utilities or tables for converting
+between various Korean encodings (the last two represent the same file):
+
+  ftp://cair-archive.kaist.ac.kr/pub/hangul/code/
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/map/non-hangul-codes.txt
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/map/hangul-codes.txt
+  ftp://unicode.org/pub/MappingTables/EastAsiaMaps/Hangul-Codes.txt
+
+A popular Korean code conversion utility seems to be "hcode" by
+June-Yub Lee (jylee@cims.nyu.edu).
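+	Converting between EUC-KR and 7-bit ISO-2022-KR is mostly a
+matter of clearing the high bit and adding shift characters. The
+following Python sketch assumes the conventions of RFC 1557: the
+designator sequence ESC $ ) C appears once, before any shift
+character, SO (0x0E) switches to KS C 5601, and SI (0x0F) switches
+back to ASCII. The sketch shifts back before every ASCII byte,
+including newlines. It is not how hcode works internally, and the
+function name is hypothetical.
+
```python
ESC_DESIGNATE_KSC = b"\x1b$)C"   # designate KS C 5601 to G1 (RFC 1557)
SO, SI = b"\x0e", b"\x0f"         # shift out to KS C 5601 / shift in to ASCII


def euc_kr_to_iso2022_kr(data: bytes) -> bytes:
    """Convert well-formed EUC-KR bytes to 7-bit ISO-2022-KR."""
    out = bytearray(ESC_DESIGNATE_KSC)   # designator once, before any SO
    shifted = False
    i = 0
    while i < len(data):
        b = data[i]
        if b >= 0x80:                    # two-byte KS C 5601 character
            if not shifted:
                out += SO
                shifted = True
            out += bytes([b & 0x7F, data[i + 1] & 0x7F])
            i += 2
        else:                            # ASCII byte, including CR and LF
            if shifted:
                out += SI
                shifted = False
            out.append(b)
            i += 1
    if shifted:                          # always end in ASCII
        out += SI
    return bytes(out)
```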
+	Finally, the following URLs provide many Unicode- and CJK-
+related mapping tables:
+
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/map/
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/unicode/
+  ftp://unicode.org/pub/MappingTables/
+  http://www.yajima.kuis.kyoto-u.ac.jp/staffs/yasuoka/CJK.html
+
+Note that the official and authoritative Unicode mapping tables (from
+Unicode values to various international, national and vendor
+standards) are maintained by the Unicode Consortium at the following
+URL:
+
+  ftp://unicode.org/pub/MappingTables/
+
+Version 2.0 of "The Unicode Standard" (to be published by Addison-
+Wesley shortly) will include these mapping tables on CD-ROM.
+
+
+PART 5: CJK-CAPABLE OPERATING SYSTEMS
+
+	The first step in being able to display CJK text is to obtain
+an operating system that handles such text (or an application that
+sets up its own CJK-capable environment). Below I describe how
+different types of machines can handle CJK text.
+	Actually, for the first few releases of CJK.INF, these
+subsections will be far from complete (some may even be empty!). The
+purpose of CJK.INF is to provide detailed information on character set
+standards and encoding systems, so I consider this sort of
+information secondary.
+
+
+5.1: MS-DOS
+
+	I am not aware of any CJK-capable MS-DOS operating system, but
+localized versions do exist. CJK support has been introduced with
+Microsoft's Windows operating system (see Section 5.2).
+
+
+5.2: WINDOWS
+
+	Microsoft has CJK versions of its Windows operating system
+available. The latest versions of their Windows operating system are
+called Windows 95 and Windows NT. Windows 95 supports the same
+character sets and encodings as in Windows Version 3.1 -- Windows NT
+supports Unicode (ISO 10646-1:1993). Contact Microsoft Corporation for
+more details. The URL of their WWW Home Page is:
+
+  http://www.microsoft.com/
+
+Nadine Kano's "Developing International Software for Windows 95 and
+Windows NT" provides abundant reference material for how CJK is
+supported in Windows 95 and Windows NT. Check it out.
+	TwinBridge is a package that adds CJK functionality to non-CJK
+Windows. Demo versions of TwinBridge for Japanese and Chinese are at
+the following URLs:
+
+  ftp://ftp.netcom.com/pub/tw/twinbrg/Japanese/demo/tbjdemo.zip
+  ftp://ftp.netcom.com/pub/tw/twinbrg/Chinese/demo/tbcdemo.zip
+
+	Another useful CJK add-on for Windows 95 is NJWIN (see Section
+7.10) by Hongbo Data Systems.
+
+
+5.3: MACINTOSH
+
+	Macintosh is well-known as a computer that was designed to
+handle multilingual texts. There are currently fully-localized
+operating systems available for Japanese (KanjiTalk), Chinese
+(simplified and traditional available), and Korean (HangulTalk). In
+addition, Apple has developed "Language Kits" (*LK) for Chinese (CLK)
+and Japanese (JLK). A Korean Language Kit (KLK) will be released
+shortly.
+	These localized operating systems can usually be installed
+together in order to make your system CJK-capable.
+	The common portion of these CJK-capable operating systems is a
+technology Apple calls "WorldScript II" ("WorldScript I" is for one-
+byte scripts). It provides the basic one- and two-byte functionality.
+
+
+5.4: UNIX AND X WINDOWS
+
+	The typical encoding system used on UNIX and X Windows is EUC
+(see Section 3.2). Many systems, such as IBM's AIX, can be configured
+to handle both EUC and Shift-JIS (for Japanese). In addition, X11R6 (X
+Window System, Version 11, Release 6) has many CJK-capable features.
+	If you have a fast PC and a good amount of RAM (more than
+4MB), you should consider replacing MS-DOS (and Microsoft Windows,
+too, if you have it) with Linux, which is a full-blown UNIX operating
+system that runs on Intel processors. You can even run X Windows
+(X11R6). "Running Linux" by Matt Welsh and Lar Kaufman is an excellent
+guide to installing and using Linux. The companion volume, "Linux
+Network Administrator's Guide" by Olaf Kirch is also useful. Because
+there is a fine line -- or no line at all -- between a user and System
+Administrator when using Linux, "Essential System Administration"
+Second Edition by AEleen Frisch is a must-have.
+	Linux and Linux information are available at the following
+URLs:
+
+  ftp://sunsite.unc.edu/pub/Linux/
+  http://sunsite.unc.edu/mdw/linux.html
+
+I personally use Linux, and find it quite useful and powerful. My bias
+comes from being a UNIX user. But, you can't beat the price (free),
+and all of my favorite text-manipulation tools (such as Perl) are
+readily available.
+
+
+5.5: OTHERS
+
+	No information yet.
+
+
+PART 6: CJK TEXT AND INTERNET SERVICES
+
+	Part 5 described how CJK text is handled on a machine
+internally, but this part goes into the implications of handling such
+text externally, namely for information interchange purposes. This
+boils down to handling CJK text on Internet services.
+	For more detailed information on how these and other Internet
+services are used, I suggest "The Whole Internet User's Guide &
+Catalog" by Ed Krol. For more information on setting up and
+maintaining these and other Internet services, I suggest "Managing
+Internet Information Services" by Cricket Liu et al.
+
+
+6.1: ELECTRONIC MAIL
+
+	The most basic Internet service is electronic mail (henceforth
+to be called "e-mail"), which is virtually guaranteed to be available
+to all users regardless of their system.
+	Several Internet standards (called RFCs, short for Request For
+Comments) have been developed to describe how CJK text is to be handled
+over e-mail systems (see Section A.3.4).
+	The bottom line is that most e-mail systems do not support
+8-bit characters (that is, bytes that have their 8th bit set). Some do
+offer 8-bit support, but you can never know what path your e-mail
+might take en route to its recipient. This means that 7-bit ISO
+2022 (or equivalent) is the ideal encoding to use when sending CJK
+text through e-mail. If your operating system processes another
+encoding system, you must convert from that encoding to one that is
+compatible with 7-bit ISO 2022.
+	However, even 7-bit ISO 2022 encoding can get mangled by
+mail-routing software -- the escape character, sometimes even part of
+the escape sequence (meaning more than just the escape character), is
+stripped. The JConv tool described in Section 4.7 restores stripped
+escape sequences for Japanese 7-bit ISO 2022.
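+	Restoring stripped escape characters is possible because the
+rest of each sequence usually survives: "$B" or "$@" still marks the
+start of a two-byte run, and "(B" or "(J" marks its end. The following
+Python sketch is a heuristic of that kind, not JConv's actual
+algorithm. It assumes that only the ESC bytes (0x1B) were removed and
+that the text uses only the three-byte sequences ESC $ @, ESC $ B,
+ESC ( B, and ESC ( J. ASCII text containing "$B", or a two-byte
+character whose bytes happen to be "(B" or "(J", would be misread, so
+the output deserves a check. The function name is hypothetical.
+
```python
ESC = b"\x1b"


def restore_escapes(data: bytes) -> bytes:
    """Re-insert ESC bytes stripped from 7-bit ISO 2022 Japanese text."""
    out = bytearray()
    in_kanji = False    # True while inside a two-byte JIS X 0208 run
    i = 0
    while i < len(data):
        pair = data[i:i + 2]
        if data[i] == 0x1B:                       # escape survived: copy as is
            seq = data[i:i + 3]
            out += seq
            in_kanji = seq[1:2] == b"$"
            i += 3
        elif not in_kanji and pair in (b"$B", b"$@"):
            out += ESC + pair                     # restore "shift to two-byte"
            in_kanji = True
            i += 2
        elif in_kanji and pair in (b"(B", b"(J"):
            out += ESC + pair                     # restore "shift to one-byte"
            in_kanji = False
            i += 2
        elif in_kanji:
            out += pair                           # copy one two-byte character
            i += 2
        else:
            out.append(data[i])                   # copy one ASCII byte
            i += 1
    return bytes(out)
```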
+	If your mailing software is MIME-compliant, there is a means
+to identify the character set and encoding of the message using the
+"charset" parameter. Some valid "charset" values include the
+following:
+
+o iso-2022-jp     (see Section 3.1.3)
+o iso-2022-jp-2   (see Section 3.1.3)
+o iso-2022-kr     (see Section 3.1.4)
+o iso-2022-cn     (see Section 3.1.5)
+o iso-2022-cn-ext (see Section 3.1.5)
+o iso-8859-1
+
+Insertion of these values should happen automatically.
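+	As an illustration of MIME-compliant software doing this
+automatically, the following sketch uses Python's standard email
+package (a modern library, used here only as an example). Asking it
+for the "iso-2022-jp" charset produces a 7-bit body and the matching
+"charset" parameter in the Content-Type header.
+
```python
from email.mime.text import MIMEText

body = "日本語のテキスト"

# The third argument becomes the MIME "charset" parameter; the body is
# encoded with the matching codec before it is stored in the message.
msg = MIMEText(body, "plain", "iso-2022-jp")
msg["Subject"] = "CJK test"

raw_body = body.encode("iso2022_jp")       # the same encoding, done by hand

if __name__ == "__main__":
    print(msg["Content-Type"])
    print(all(b < 0x80 for b in raw_body))  # True: no byte has its 8th bit set
```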
+	A last-ditch effort to send CJK text through e-mail is to use
+uuencode or Base64 encoding (see Section 3.3.13). Base64 is something
+that is usually done automatically by mailing software -- explicit
+Base64 encoding is not common. The recipient must then run uudecode or
+a Base64 decoder to get the original file (if such utilities are
+available).
+
+
+6.2: USENET NEWS
+
+	Usenet News follows many of the same requirements as e-mail,
+namely that 7-bit ISO 2022 encoding is ideal. However, some newsgroups
+use specific encoding methods, such as:
+
+  alt.chinese.text             (HZ encoding used for Chinese text)
+  alt.chinese.text.big5        (Big Five encoding used for Chinese text)
+  chinese.flame                (UTF-7)
+  chinese.text.unicode         (UTF-8)
+
+Also, the newsgroups in Korean (all begin with "han.*") use EUC
+(EUC-KR) because the news-handling software in Korea has been designed to
+handle eight-bit characters correctly. Mailing list versions of Korean
+newsgroups are likely to use ISO-2022-KR encoding.
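+	HZ, used in alt.chinese.text, keeps GB 2312 text within 7 bits
+by clearing the high bit of each byte and bracketing each run of
+hanzi with "~{" and "~}"; a literal tilde is written as "~~". The
+following Python sketch converts EUC-CN (EUC-encoded GB 2312) text to
+HZ under those rules. It omits HZ's "~" line-continuation convention,
+assumes well-formed input, and the function name is hypothetical.
+
```python
def euc_cn_to_hz(data: bytes) -> bytes:
    """Convert well-formed EUC-CN (GB 2312) bytes to HZ encoding."""
    out = bytearray()
    in_gb = False      # True while inside a "~{ ... ~}" run
    i = 0
    while i < len(data):
        b = data[i]
        if b >= 0x80:                        # two-byte GB 2312 character
            if not in_gb:
                out += b"~{"
                in_gb = True
            out += bytes([b & 0x7F, data[i + 1] & 0x7F])
            i += 2
            continue
        if in_gb:                            # leave GB mode before any ASCII byte
            out += b"~}"
            in_gb = False
        out += b"~~" if b == 0x7E else bytes([b])   # a literal "~" is doubled
        i += 1
    if in_gb:
        out += b"~}"
    return bytes(out)
```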
+	One common problem with Usenet News is that the escape
+characters used in 7-bit ISO 2022 encoding are sometimes stripped,
+usually by the software used to post the article. This can be quite
+annoying. There are programs available, such as JConv, that repair
+such files by restoring the escape characters.
+	Another common problem is news readers that do not allow
+escape characters to function. One simple solution is to "pipe" the
+article through a display command, such as "more," "page," "less," or
+"cat." This is done by typing a "pipe" character (|) followed by the
+command name anywhere within the article being displayed.
+
+
+6.3: GOPHER
+
+	The World-Wide Web (WWW) has almost eliminated the need for
+using Gopher, so I won't discuss it here. This is not to say that I
+don't appreciate Gopher servers; rather, WWW browsing software also
+permits access to Gopher sites.
+
+
+6.4: WORLD-WIDE WEB
+
+	First, there are two types of WWW browsers available. The most
+common type is the graphics-based browser (examples include Mosaic and
+Netscape). Graphics-based browsers have the unfortunate requirement of
+a TCP/IP connection (for example, over SLIP or PPP). Lynx and
+the W3 client for Emacs, which are text-based browsers, can be run
+from the host computer through a standard terminal connection. They
+don't display all the pretty pictures that folks put into their WWW
+documents, but you get all the text (this is, in many ways, a blessing
+in disguise -- transferring graphics is what slows down graphics-based
+browsers the most). When the W3 client is run using Mule, it becomes a
+fully CJK-capable WWW browser. Both Lynx and the W3 client for Emacs
+are freely available. A Japanese-capable Lynx is available at the
+following URL:
+
+  ftp://ftp.ipc.chiba-u.ac.jp/pub.asada/www/lynx/
+
+There is also a WWW page that provides information on Japanese-capable
+Lynx. Its URL is as follows:
+
+  http://www.icsd6.tj.chiba-u.ac.jp/lynx/
+
+	When WWW documents first came online, there was no method for
+handling CJK character sets. This has, fortunately, changed. As of
+this writing, two commercial WWW browsers support Japanese. They are
+Infomosaic by Fujitsu Limited, and Netscape Navigator by Netscape
+Communications Corporation (Version 1.1 added Japanese support). Both
+are graphics-based browsers. The former can be ordered at the
+following URL:
+
+  http://www.fujitsu.co.jp/
+
+The latter can be found at the following URLs:
+
+  http://www.netscape.com/
+  ftp://ftp.netscape.com/
+
+	One can also use a delegate server to *filter* Japanese text
+into the encoding supported by your browser. It is also possible to
+"Japanize" existing WWW browsers using assorted tools and patches.
+Katsuhiko Momoi (momoi@tigger.stcloud.msus.edu) has authored an
+excellent guide to Japanizing WWW browsers. Its URL is:
+
+  http://condor.stcloud.msus.edu:20020/netscape.html
+
+I *highly* suggest reading it.
+	Japanese-capable WWW browsers support automatic detection of
+the three Japanese encoding methods (JIS, Shift-JIS, and EUC). Hey,
+but what about support for the "C" and "K" of CJK? Attempting to
+answer this question provides us an answer to another question: "What
+is the best encoding method to use for CJK WWW documents?"
+	Encoding methods such as EUC and Shift-JIS provide for mixing
+only two character sets. This is because they provide no way to *flag*
+or *tag* text for locale (character set) information. Without flagging
+information, it is impossible to distinguish Japanese EUC from Chinese
+or Korean EUC. However, the escape sequences used in 7-bit ISO 2022
+encoding explicitly provide locale information. 7-bit ISO 2022 is
+ideal for static documents, which is exactly what one finds on WWW.
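+	The automatic detection mentioned above relies on the fact that
+each Japanese encoding uses byte values that another avoids. The
+following Python sketch is a simplified heuristic, not the algorithm
+of any particular browser: an escape sequence means 7-bit ISO 2022;
+bytes 0x80-0x8D and 0x90-0xA0 never occur in Japanese EUC, so they
+indicate Shift-JIS; bytes 0xFD and 0xFE never occur in Shift-JIS, so
+they indicate EUC. Many texts contain neither and remain ambiguous,
+and an "euc_jp" verdict could equally describe Chinese or Korean EUC,
+which is exactly the problem described above. The names are
+hypothetical.
+
```python
# Byte values that can never occur in one of the two 8-bit encodings.
NOT_IN_EUC_JP = set(range(0x80, 0x8E)) | set(range(0x90, 0xA1))
NOT_IN_SHIFT_JIS = {0xFD, 0xFE}


def guess_japanese_encoding(data: bytes) -> str:
    """Return 'jis', 'shift_jis', 'euc_jp', or 'ambiguous'."""
    if b"\x1b$B" in data or b"\x1b$@" in data:   # 7-bit ISO 2022 flags itself
        return "jis"
    for b in data:
        if b in NOT_IN_EUC_JP:
            return "shift_jis"
        if b in NOT_IN_SHIFT_JIS:
            return "euc_jp"
    return "ambiguous"   # e.g. pure ASCII, or only bytes valid in both
```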
+	My personal recommendation (for the short-term) is to compose
+WWW documents (also called HTML documents; HTML stands for Hyper Text
+Markup Language) using 7-bit ISO 2022 encoding. The escape sequences
+themselves act as explicit flags that indicate locale. Some WWW
+clients are confused by 7-bit ISO 2022 encoding, but the products by
+Netscape Communications and Fujitsu Limited prove that this can work.
+See the following URL for a description of this problem:
+
+  http://www.ntt.jp/japan/note-on-JP/LibWWW-patch.html
+
+	Check out the following URLs for information on and proposals
+for international support for WWW:
+
+  http://www.ebt.com:8080/docs/multilingual-www.html
+  http://www.w3.org/hypertext/WWW/International/Overview/
+
+	There is currently an RFC in the works (called an Internet
+Draft) to address the problem of internationalizing HTML by using
+Unicode. It is very promising. The latest draft is available at the
+following URLs:
+
+  ftp://ds.internic.net/internet-drafts/draft-ietf-html-i18n-04.txt.Z
+  ftp://ftp.isi.edu/internet-drafts/draft-ietf-html-i18n-04.txt
+  ftp://munnari.oz.au/internet-drafts/draft-ietf-html-i18n-04.txt.Z
+  ftp://nic.nordu.net/internet-drafts/draft-ietf-html-i18n-04.txt
+
+Note that some have been compressed.
+
+
+6.5: FILE TRANSFER TIPS
+
+	Although CJK encoding systems such as Shift-JIS and EUC make
+extensive use of 8-bit bytes, that does not mean that you need to
+treat the data as binary. Such files are simply to be treated as text,
+and should be transferred in text mode (for example, FTP's ASCII mode,
+which is also called "Type A Transfer").
+	When text files are transferred in binary mode (such as FTP's
+BINARY mode, which is also called "Type I Transfer"), line termination
+characters are left unaltered. For example, when transferring a text
+file from UNIX to Macintosh, a text transfer will translate the UNIX
+newline (0x0A) characters to Macintosh carriage return (0x0D)
+characters, but a binary transfer will make no such modifications.
+Text-style conversion is typically desired.
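+	Text-mode translation is safe for Shift-JIS and EUC because
+every byte of a multiple-byte character is 0x40 or higher, so the line
+termination bytes 0x0A and 0x0D can never be part of one. The
+following Python sketch mimics what a text transfer from UNIX to
+Macintosh does to the data; the function name is hypothetical.
+
```python
def unix_to_mac_text(data: bytes) -> bytes:
    """Translate UNIX newlines (0x0A) into Macintosh carriage returns (0x0D)."""
    # Safe for Shift-JIS and EUC: no byte of a multiple-byte character is
    # below 0x40, so 0x0A never occurs inside one.
    return data.replace(b"\n", b"\r")


if __name__ == "__main__":
    unix = "日本語\nテキスト\n".encode("shift_jis")
    mac = unix_to_mac_text(unix)
    print(mac.count(b"\r"), b"\n" in mac)   # 2 False
```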
+	The most common types of files that need to be handled as
+binary include tar archives (*.tar), compressed files (*.Z, *.gz,
+*.zip, *.zoo, *.lzh, and so on), and executables (*.exe, *.bin, and so
+on).
+
+
+PART 7: CJK TEXT HANDLING SOFTWARE
+
+	This part describes various CJK-capable software packages.
+I expect this part to grow with future versions of this document. I
+define "CJK-capable" as being able to support Chinese, Japanese, and
+Korean text.
+	The descriptions I provide below are intentionally short. You
+are encouraged to use the information pointers to obtain further
+information or the software itself.
+
+
+7.1: MULE
+
+	Mule (multilingual enhancement to GNU Emacs), written by
+Kenichi Handa (handa@etl.go.jp), is the first (and only?) CJK-capable
+editor for UNIX systems, and is freely available under the terms of
+the GNU General Public License. Mule was developed from Nemacs
+(Nihongo Emacs).
+	Mule is available at the following URL:
+
+  ftp://etlport.etl.go.jp/pub/mule/
+
+	Mule, beginning with Version 2.2, includes handy utilities
+(any2ps and m2ps) for printing files in any of the encodings supported
+by Mule (which is a lot of encodings, by the way). These programs use
+BDF fonts. See the beginning of Part 2 for a list of URLs that have
+CJK BDF fonts.
+	GNU Emacs is a fine editor, and Mule takes it several steps
+further by providing multilingual support. I personally use Mule
+together with SKK (for Japanese input) -- it is a superb combination.
+
+
+7.2: CNPRINT
+
+	CNPRINT, developed by Yidao Cai (cai@neurophys.wisc.edu), is a
+utility to print CJK text (or convert it to a PostScript file), and is
+available for MS-DOS, VMS, and UNIX systems. A wide range of encoding
+methods are supported by CNPRINT.
+	CNPRINT is available at the following URLs:
+
+  ftp://ftp.ifcss.org/pub/software/{dos,unix,vms}/print/
+  ftp://neurophys.wisc.edu/[public.cn]/
+
+
+7.3: MASS
+
+	MASS (Multilingual Application Support Service), developed at
+the National University of Singapore, is a suite of software tools
+that speed and ease the development of UNIX-based CJK (actually, more
+than just CJK) applications. It supports a wide variety of character
+sets and encodings, including ISO 10646-1:1993 (UCS-2, UTF-7, and
+UTF-8), EACC, and CCCII.
+	More information on MASS, to include contact information for
+its developers, can be found at the following URL:
+
+  http://www.iss.nus.sg/RND/MLP/Projects/MASS/MASS.html
+
+
+7.4: ADOBE TYPE MANAGER (ATM)
+
+	Adobe Type Manager for Macintosh, beginning with Version 3.8,
+is CJK-capable (as long as the underlying operating system is CJK-
+capable). Actually, ATM generically supports CID-keyed fonts, which
+are based on a newly-developed file specification for fonts with large
+numbers of characters (like CJK fonts). See Section 7.9 for more
+details.
+	ATM is very easy to obtain. It is bundled with fonts and
+applications from Adobe Systems (chances are you have ATM if you
+recently purchased an Adobe product). But what about Windows? The
+Windows version of ATM should soon follow with identical
+functionality.
+
+
+7.5: MACINTOSH SOFTWARE
+
+	WorldScript II, a System Extension introduced with System 7,
+provides multi-byte script handling, namely CJK support. If a
+Macintosh product claims to support WorldScript II, chances are it is
+CJK-capable (provided that your operating system has the necessary
+extensions loaded).
+	The CJK encodings supported by WorldScript II-capable
+applications are the same as those made available by the underlying
+Macintosh operating system. No import/export of other encodings is
+supported at the operating system level. You must run separate
+conversion utilities for both import and export. Anyway, below are
+some products that are known to be CJK capable.
+	Nisus Writer, written by Nisus Software, is fully CJK-capable
+as long as you have the appropriate scripts installed (such as CLK for
+Chinese or JLK for Japanese). A "Language Key" (read "dongle") is also
+required for Chinese and Korean (and some one-byte scripts such as
+Arabic and Hebrew). A demo version of Nisus Writer is available at the
+following URL:
+
+  ftp://ftp.nisus-soft.com/pub/nisus/demos/
+
+Give it a try! Updates are also available at the same FTP site. Nisus
+Software can be contacted using the following e-mail address or
+through their WWW page:
+
+  info@nisus-soft.com
+  http://www.nisus-soft.com/
+
+I also suggest reading "The Nisus Way" by Joe Kissell. Chapter 13
+provides detailed information about using Nisus Writer with
+WorldScript, and the book includes a CD-ROM containing, among other
+things, a trial version of Nisus Writer (it expires after 90 days) and
+a non-expiring version of Nisus Compact.
+	ClarisWorks by Claris Corporation, beginning with Version 4.0,
+is compatible with WorldScript II and all Apple language kits. This
+translates into full CJK support. The following URL provides a trial
+version of ClarisWorks:
+
+  ftp://ftp.claris.com/pub/USA-Macintosh/Trial_Software/
+
+The following URL has detailed information on this and other Claris
+products:
+
+  http://www.claris.com/
+
+	The latest version of WordPerfect by Novell Incorporated is
+also compatible with WorldScript II. The following URL has detailed
+information:
+
+  http://wp.novell.com/tree.htm
+
+
+7.6: MACBLUE TELNET
+
+	Although MacBlue Telnet (a modified version of NCSA Telnet) is
+Macintosh software, I describe it separately because it does not
+require the various Apple Language Kits or localized operating
+systems. Input methods adapted from cxterm (see Section 7.7) are also
+available, covering the CJK spectrum (Japanese, Simplified Chinese,
+Traditional Chinese, and Korean).
+	MacBlue Telnet is available at the following URL:
+
+  ftp://ftp.ifcss.org/pub/software/mac/networking/MacBlueTelnet/
+
+Its associated CJK input methods are at the following URL:
+
+  ftp://ftp.ifcss.org/pub/software/mac/input/
+
+
+7.7: CXTERM
+
+	This program, cxterm, is a CJK-capable xterm for the X Window
+System (it works with X11R4, X11R5, and X11R6). It is based on the
+X11R6 xterm. It is available at the following URL:
+It is available at the following URL:
+
+  ftp://ftp.ifcss.org/pub/software/x-win/cxterm/
+
+	The following URL is for a program that adds Unicode
+capability to cxterm:
+
+  ftp://ftp.ifcss.org/pub/software/unix/convert/hztty-2.0.tar.gz
+
+The following URL is for a program that adds support for other
+encodings to cxterm:
+
+  ftp://ftp.ifcss.org/pub/software/unix/convert/BeTTY-1.534.tar.gz
+
+
+7.8: UW-DBM
+
+	UW-DBM, for Windows 3.1, Windows 95, and Windows NT, is a
+program that allows users to handle Chinese (Big Five, GB 2312-80, or
+HZ code), Japanese (Shift-JIS), and Korean (KS C 5601-1992)
+simultaneously. More information on UW-DBM is available at the
+following URL:
+
+  http://www.gy.com/ccd/win95/cjkw95.htm
+
+	A demo version of UW-DBM is available at the following URL:
+
+  ftp://ftp.aimnet.com/pub/users/chinabus/uwdbm40.zip
+
+
+7.9: POSTSCRIPT
+
+	With the introduction of CID-keyed Font Technology, PostScript
+has become fully CJK capable.
+	Adobe Systems has developed the following CJK character
+collections for CID-keyed fonts (font developers are encouraged to
+conform to these specifications):
+
+  Character Collection  CIDs   Supported Character Sets & Encodings
+  ^^^^^^^^^^^^^^^^^^^^  ^^^^   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  Adobe-GB1-1           9,897  GB 2312-80 and GB/T 12345-90; 7-bit ISO
+                               2022 and EUC
+  Adobe-CNS1-0         14,099  Big Five (ETen extensions) and CNS
+                               11643-1992 Planes 1 and 2; Big Five,
+                               7-bit ISO 2022, and EUC
+  Adobe-Japan1-2        8,720  JIS X 0208-1990; Shift-JIS, 7-bit ISO
+                               2022, and EUC
+  Adobe-Japan2-0        6,068  JIS X 0212-1990; 7-bit ISO 2022 and EUC
+  Adobe-Korea1-1       18,155  KS C 5601-1992 (Macintosh extensions
+                               plus Johab); 7-bit ISO 2022, EUC, UHC,
+                               and Johab
+
+Note that Macintosh and Windows do not support any of the encodings
+for Adobe-Japan2-0, so fonts based on that specification are unusable
+on those platforms.
+	Adobe Systems also has a few things in the works (that is,
+they are either proposed or in draft form), all of which are
+supplements to the above character collections (that is, they add CIDs):
+
+  Character Collection  CIDs   Supported Character Sets & Encodings
+  ^^^^^^^^^^^^^^^^^^^^  ^^^^   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  Adobe-CNS1-1         +6,018  Add CNS 11643-1992 Plane 3 support (30
+                               of the 6,148 hanzi are in Adobe-CNS1-0)
+
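+	Character collection names such as Adobe-Japan1-2 can be read
+as Registry-Ordering-Supplement. As I understand the CID-keyed font
+specification, these three parts correspond to the Registry, Ordering,
+and Supplement entries of a CIDFont's CIDSystemInfo dictionary. The
+following Python sketch splits such a name apart. It also checks
+whether one collection is a supplement of another, meaning that it
+has the same Registry and Ordering but a higher Supplement number. The
+class and function names are my own, used only for illustration:
+
```python
from typing import NamedTuple


class Collection(NamedTuple):
    """Hypothetical container for the three parts of a collection name."""
    registry: str
    ordering: str
    supplement: int


def parse_collection(name: str) -> Collection:
    """Split a name such as 'Adobe-Japan1-2' into its three parts."""
    head, _, supplement = name.rpartition("-")   # 'Adobe-Japan1', '2'
    registry, _, ordering = head.partition("-")  # 'Adobe', 'Japan1'
    if not (registry and ordering and supplement.isdigit()):
        raise ValueError(f"not a Registry-Ordering-Supplement name: {name!r}")
    return Collection(registry, ordering, int(supplement))


def is_supplement_of(newer: str, older: str) -> bool:
    """True if `newer` adds CIDs to `older` (same Registry and Ordering)."""
    a, b = parse_collection(newer), parse_collection(older)
    return ((a.registry, a.ordering) == (b.registry, b.ordering)
            and a.supplement > b.supplement)


if __name__ == "__main__":
    print(parse_collection("Adobe-Korea1-1"))
    print(is_supplement_of("Adobe-CNS1-1", "Adobe-CNS1-0"))      # True
    print(is_supplement_of("Adobe-Japan2-0", "Adobe-Japan1-2"))  # False
```
+
+Supplements only add CIDs, so a font built for Adobe-CNS1-1 should
+also cover every CID in Adobe-CNS1-0. The reverse does not hold.
+Adobe-Japan2-0 is a separate Ordering, not a supplement of
+Adobe-Japan1-2.
+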
+	To find out more about these CJK character collections or
+CID-keyed font technology, contact the Adobe Developers Association.
+Several CID-related documents have been published. ADA's contact
+information is as follows:
+
+  Adobe Developers Association
+  Adobe Systems Incorporated
+  1585 Charleston Road
+  P.O. Box 7900
+  Mountain View, CA 94039-7900
+  USA
+  +1-415-961-4111 (phone)
+  +1-415-967-9231 (facsimile)
+  devsupp-person@adobe.com
+  http://www.adobe.com/Support/
+
+Adobe Systems has recently developed the CID SDK (CID Software
+Developers Kit), which is on a single CD-ROM. Contact the Adobe
+Developers Association for information on obtaining a copy.
+	The complete CID-keyed font file specification and an overview
+document are available at the following URLs (as a PostScript or PDF
+[Adobe Acrobat] file, respectively):
+
+  ftp://ftp.adobe.com/pub/adobe/DeveloperSupport/TechNotes/PSfiles/
+  ftp://ftp.adobe.com/pub/adobe/DeveloperSupport/TechNotes/PDFfiles/
+
+The file names (not provided above due to URL length) are:
+
+  5014.CMap_CIDFont_Spec.ps    (complete CID engineering specification)
+  5014.CMap_CIDFont_Spec.pdf
+  5092.CID_Overview.ps         (CID technology overview)
+  5092.CID_Overview.pdf
+
+Other related files, mostly character collection specifications, are
+available only in PDF format at the latter URL indicated above:
+
+  5004.AFM_Spec.pdf            (Includes CID-keyed AFM specification)
+  5078b.pdf                    (Adobe-Japan1-2 character collection)
+  5079b.pdf                    (Adobe-GB1-0 character collection)
+  5080b.pdf                    (Adobe-CNS1-0 character collection)
+  5093b.pdf                    (Adobe-Korea1-0 character collection)
+  5094.pdf                     (Adobe CJK CMap file descriptions)
+  5097b.pdf                    (Adobe-Japan2-0 character collection)
+
+If you do not have Adobe Acrobat, there is a freely-available Acrobat
+Reader (for Macintosh, Windows, MS-DOS, and UNIX) at the following
+URL:
+
+  ftp://ftp.adobe.com/pub/adobe/Applications/Acrobat/
+
+	I have also placed some CJK character collection materials,
+including prototype Unicode (UCS-2 and UTF-8) CMap files, at the
+following URL:
+
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/adobe/
+
+A sample (Adobe-Korea1-0) CIDFont is also available at the above URL.
+	There is also a somewhat brief description of CID-keyed fonts
+at the end of Chapter 6 in UJIP.
+
+
+7.10: NJWIN
+
+	Hongbo Data Systems has recently released a shareware (US$49)
+product called NJWIN whose purpose is to force the display of CJK text
+in non-CJK applications running under US Windows 95. Actually, there
+are two versions: full CJK and Japanese only.
+	NJWIN and its full description are available at the following
+URL:
+
+  http://www.njstar.com.au/njstar/njwin.htm
+
+Other (popular) URLs that carry NJWIN are as follows:
+
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/windows/
+  ftp://ftp.cc.monash.edu.au/pub/nihongo/
+
+	Hongbo Data Systems' e-mail address is:
+
+  hongbo@njstar.com.au
+
+Their WWW Home Page is at the following URL:
+
+  http://www.njstar.com.au/
+
+
+PART 8: CJK PROGRAMMING ISSUES
+
+	This new section describes issues related to using specific
+programming languages to process CJK text.
+
+
+8.1: C AND C++
+
+	At one time I used C on a regular basis for my CJK programming
+needs, and released three tools for others to use: JConv, JChar, and
+JCode. While these tools are specific to Japanese, they can be easily
+adapted for CJK use. Their source code is available at the following
+URL:
+
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/src/
+
+	I also provided several C code snippets in Chapter 7 of
+UJIP. These are available in machine-readable form at the following
+URL:
+
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/Ch7/
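+
+	Those tools are not reproduced here. The core of a
+JIS-to-Shift-JIS conversion, however, is short enough to sketch. The
+following Python version is a minimal sketch and is not code taken
+from JConv or UJIP; its function names are hypothetical. It handles
+only ASCII and two-byte JIS X 0208 characters in EUC-JP input. Half-width
+katakana (0x8E prefix) and JIS X 0212 characters (0x8F prefix) are
+rejected rather than converted:
+
```python
from __future__ import annotations


def jis_to_sjis(c1: int, c2: int) -> tuple[int, int]:
    """Map one JIS X 0208 code (both bytes 0x21-0x7E) to Shift-JIS."""
    if not (0x21 <= c1 <= 0x7E and 0x21 <= c2 <= 0x7E):
        raise ValueError("not a JIS X 0208 code")
    if c1 % 2:
        # Odd first byte: second byte goes to 0x40-0x9E, skipping 0x7F.
        c2 += 0x1F if c2 < 0x60 else 0x20
    else:
        # Even first byte: second byte goes to 0x9F-0xFC.
        c2 += 0x7E
    # Two JIS rows share one Shift-JIS first byte; rows from 0x5F up
    # jump past 0xA0-0xDF (half-width katakana) into 0xE0-0xEF.
    c1 = ((c1 + 1) >> 1) + (0x70 if c1 < 0x5F else 0xB0)
    return c1, c2


def euc_to_sjis(data: bytes) -> bytes:
    """Convert EUC-JP (ASCII + JIS X 0208 only) to Shift-JIS."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(b)  # single-byte range passes through unchanged
            i += 1
        elif (0xA1 <= b <= 0xFE and i + 1 < len(data)
              and 0xA1 <= data[i + 1] <= 0xFE):
            # EUC-JP stores JIS X 0208 with the high bit set on both bytes.
            out += bytes(jis_to_sjis(b & 0x7F, data[i + 1] & 0x7F))
            i += 2
        else:
            raise ValueError(f"unsupported byte 0x{b:02X} at offset {i}")
    return bytes(out)
```
+
+For example, the EUC-JP bytes 0xC6 0xFC (JIS 0x467C) come out as
+the Shift-JIS bytes 0x93 0xFA. Python's own euc_jp and shift_jis
+codecs are a convenient way to cross-check the arithmetic.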
+
+
+8.2: PERL
+
+	Although Perl does not have any special CJK facilities (note
+that most implementations of C and C++ do not either), it provides a
+powerful programming environment that is useful for many CJK-related
+tasks.
+	The noteworthy features of Perl are associative arrays and
+regular expressions. These are features not found in C or C++, and
+allow one to write meaningful code in little time.
+	JPerl is an implementation of Perl that provides two-byte
+support for Japanese (EUC or Shift-JIS encoding). It is not ideal
+because JPerl scripts often cannot run under (non-Japanese) Perl.
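+	Two-byte support matters because a byte-oriented regex can match
+the second byte of one character together with the first byte of the
+next. A common workaround is to consume the text one whole EUC-JP
+character at a time, starting from the beginning of the string, and to
+tally the characters in an associative array. The following sketch
+shows this in Python rather than Perl, using a Python dictionary as the
+associative array. The pattern and function names are illustrative:
+
```python
import re
from collections import Counter

# One EUC-JP character: ASCII, a JIS X 0208 pair, SS2 + half-width
# katakana, or SS3 + a JIS X 0212 pair.
EUC_JP_CHAR = re.compile(
    rb"[\x00-\x7F]"
    rb"|[\xA1-\xFE][\xA1-\xFE]"
    rb"|\x8E[\xA1-\xDF]"
    rb"|\x8F[\xA1-\xFE][\xA1-\xFE]"
)


def count_characters(data: bytes) -> Counter:
    """Tally EUC-JP characters, keyed by their byte sequences."""
    counts: Counter = Counter()
    pos = 0
    while pos < len(data):
        # match() is anchored at pos, so we never start mid-character.
        m = EUC_JP_CHAR.match(data, pos)
        if m is None:
            raise ValueError(f"invalid EUC-JP byte at offset {pos}")
        counts[m.group()] += 1
        pos = m.end()
    return counts


if __name__ == "__main__":
    text = "日本の日".encode("euc_jp")
    for char, n in count_characters(text).items():
        print(char.decode("euc_jp"), n)
```
+
+The key design choice is to anchor each match at the current
+position. This keeps the pattern aligned with character boundaries,
+which is the kind of character awareness JPerl is meant to provide.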
+	If you often write programs for internal use, I suggest that
+you check out Perl to see if it can offer you something. Chances are
+that it can. Good places to start are books on the subject (see
+Section A.3.1) and the following URL:
+
+  http://www.perl.com/
+
+	For those who like additional reading, "The Perl Journal" is
+just starting up; information about it is at the following URL:
+
+  http://work.media.mit.edu/the_perl_journal/
+
+
+8.3: JAVA
+
+	I am just starting to learn about the Java programming
+language (and rightly so since my wife is Javanese!). It seems to have
+a lot to offer.
+	The most interesting aspects of Java are:
+
+o Built-in support for Unicode and UTF-8.
+o The programmer must write code in the object-oriented paradigm.
+o Provides a portable way to supply compiled code.
+o Security features for Internet use.
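+
+	To illustrate what UTF-8 support involves, the Python sketch
+below encodes one 16-bit UCS-2 value into one, two, or three UTF-8
+bytes. This is general UTF-8 arithmetic, not Java code or a Java API,
+and the function name is my own:
+
```python
def ucs2_to_utf8(code: int) -> bytes:
    """Encode one UCS-2 value (0x0000-0xFFFF) as UTF-8."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("UCS-2 values are 16 bits wide")
    if code < 0x80:
        # 0xxxxxxx: ASCII stays a single byte
        return bytes([code])
    if code < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code >> 6), 0x80 | (code & 0x3F)])
    # 1110xxxx 10xxxxxx 10xxxxxx: every CJK character in UCS-2 lands here
    return bytes([0xE0 | (code >> 12),
                  0x80 | ((code >> 6) & 0x3F),
                  0x80 | (code & 0x3F)])


if __name__ == "__main__":
    for ch in "A\u00e9\u65e5":  # 'A', e-acute, and the kanji for "day"
        print(hex(ord(ch)), ucs2_to_utf8(ord(ch)).hex())
```
+
+For example, U+65E5 becomes the three bytes 0xE6 0x97 0xA5. This
+sketch does not treat surrogate values (0xD800-0xDFFF) specially.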
+
+More information on Java is available at the following URLs:
+
+  http://www.gamelan.com/
+  http://www.javasoft.com/
+
+Oh, Gamelan is the name of Javanese music.
+	Of the books about Java published thus far, the one I consider
+to be the best is "Java in a Nutshell" by David Flanagan.
+	One programming feature of Perl that I dearly miss in Java is
+regular expressions (regexes). Luckily, some kind person wrote a regex
+package for Java based on Perl regexes. Information on this Java regex
+package is available at the following URL:
+
+  http://www.win.net/~stevesoft/pat/
+
+
+A FINAL NOTE
+
+	I hope that the information presented here will prove
+useful. I would like to keep the electronic version of this document
+as up-to-date as possible, and through readers' input, I am able to
+do so.
+	Many readers will notice that I am very heavy into UNIX and
+Macintosh (well, I recently got my first PC). If anyone has any
+information on CJK-capable interfaces for other platforms, please feel
+free to send it to me, and I will be sure to include it in the next
+version of CJK.INF. Please include sources for the software or
+documentation by providing addresses, phone numbers, FTP sites, and so
+on.
+	Please do not hesitate to ask me further questions concerning
+any subject presented in this document.
+
+
+ACKNOWLEDGMENTS
+
+	I would like to express my deepest thanks to Kazumasa Utashiro
+of Internet Initiative Japan (IIJ). He taught me how to send and
+receive Japanese text using the 7-bit ISO 2022 codes back in 1989.
+With his help I was able to write JAPAN.INF, my book, and this
+document in order to inform others about what he has taught me plus
+more.
+	Next, I thank all the folks at O'Reilly & Associates for
+publishing UJIP. Special thanks to Tim O'Reilly for accepting the book
+proposal, and to Peter Mui for guiding me through the process. I have
+had nothing but good experiences with "them there fine folks."
+	I got to know Jack Halpern through UJIP, and he subsequently
+translated it into Japanese. Many thanks to him.
+	I am also grateful to my employer, Adobe Systems, for letting
+me work on interesting CJK-related projects. I really like what I do
+here. In particular, I want to thank Dan Mills, my manager, for
+putting up with me for these past four years.
+	Lastly, I would also like to thank the countless people who
+provided comments on JAPAN.INF, UJIP, and CJK.INF. I hope that this
+new document lives up to the spirit of my previous efforts.
+
+
+APPENDIX A: OTHER INFORMATION SOURCES
+
+	One of the most useful types of information is pointers to
+other information sources. This appendix provides just that.
+
+
+A.1: USENET NEWSGROUPS AND MAILING LISTS
+
+	Appendix L of UJIP provided information on a number of mailing
+lists. This section supplements that appendix with information on
+other useful mailing lists, and points out which ones in UJIP are
+relevant to readers of CJK.INF.
+
+
+A.1.1: USENET NEWSGROUPS
+
+	The following Usenet Newsgroups typically have postings with
+information relevant to issues discussed in CJK.INF (in alphabetical
+order):
+
+  alt.chinese.computing
+  alt.chinese.text                (HZ encoding used for Chinese text)
+  alt.chinese.text.big5           (Big Five encoding used for Chinese text)
+  alt.japanese.text               (JIS encoding used for Japanese text)
+  chinese.flame                   (UTF-7)
+  chinese.text.unicode            (UTF-8)
+  comp.lang.c
+  comp.lang.c++
+  comp.lang.java
+  comp.lang.perl.misc
+  comp.software.international
+  comp.std.internat
+  fj.editor.mule                  (JIS encoding used for Japanese text)
+  fj.kanji                        (JIS encoding used for Japanese text)
+  fj.net.infosystems.www.browsers (JIS encoding used for Japanese text)
+  fj.news.reader                  (JIS encoding used for Japanese text)
+  han.comp.hangul
+  han.sys.mac
+  sci.lang.japan                  (JIS encoding used for Japanese text)
+
+	If your local news host does not provide a feed of the fj.*
+newsgroups (shame on them!), or if you do not have access to Usenet
+News, you can alternatively fetch them from the following URL:
+
+  ftp://kuso.shef.ac.uk/pub/News/
+
+The subdirectories correspond to the newsgroup name, but with the
+"dots" being replaced by "slashes." For example, the "fj.binaries.mac"
+newsgroup is archived in the "fj/binaries/mac" subdirectory. Many
+thanks to Earl Kinmonth (jp1ek@sunc.shef.ac.uk) for this service.
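+	The mapping from newsgroup name to archive directory is simple
+enough to automate. The following is a tiny Python sketch. The helper
+name is my own, and the base URL is the one given above:
+
```python
ARCHIVE_BASE = "ftp://kuso.shef.ac.uk/pub/News/"


def archive_url(newsgroup: str) -> str:
    """Turn 'fj.binaries.mac' into the matching archive directory URL."""
    # Each "dot" in the newsgroup name becomes a "slash" in the path.
    return ARCHIVE_BASE + newsgroup.replace(".", "/") + "/"


if __name__ == "__main__":
    print(archive_url("fj.binaries.mac"))
```
+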
+	There are some sites that carry full feeds of the fj.*
+newsgroups, and permit public access (meaning that you can configure
+your news reader to point to it). The only one I know of thus far is
+as follows:
+
+  ume.cc.tsukuba.ac.jp
+
+
+A.1.2: MAILING LISTS
+
+	The following are mailing lists that should interest readers
+of this document (some are more active than others). The first line
+after each entry indicates the address (or addresses) that can be used
+for subscribing. The second line is the address for posting.
+
+o CCNET-L MAILING LIST
+  listserv@uga.uga.edu (or listserv@uga)
+  ccnet-l@uga.uga.edu
+
+o China Net Mailing List
+  majordomo@lists.mindspring.com
+  (See http://www.asia-net.com/ or jobs@asia-net.com)
+
+o EASUG (East Asian Software Users Group) Mailing List
+  easug-request@guvax.acc.georgetown.edu
+  easug@guvax.acc.georgetown.edu
+
+o EBTI-L (Electronic Buddhist Text Initiative) Mailing List
+  ebti-l-request@uxmail.ust.hk
+  ebti-l@uxmail.ust.hk
+
+o EFJ (Electronic Frontiers Japan) Mailing List
+  majordomo@lists.twics.com
+  efj@lists.twics.com
+
+o Hangul Mailing List (han.comp.hangul newsgroup)
+  majordomo@cair.kaist.ac.kr
+  hangul@cair.kaist.ac.kr
+
+o INSOFT-L Mailing List
+  majordomo@trans2.b30.ingr.com
+  insoft-l@trans2.b30
+
+o ISO 10646 Mailing List
+  listproc@listproc.hcf.jhu.edu
+  iso10646@listproc.hcf.jhu.edu
+
+o Japan Net Mailing List
+  majordomo@lists.mindspring.com
+  (See http://www.asia-net.com/ or jobs@asia-net.com)
+
+o KanjiTalk Mailing List
+  kanjitalk-request@cs15.atr-sw.atr.co.jp (or kanjitalk-request@crl.go.jp)
+  kanjitalk@cs15.atr-sw.atr.co.jp (or kanjitalk@crl.go.jp)
+
+o Mac Mailing List (han.sys.mac newsgroup)
+  majordomo@krnic.net
+  mac@krnic.net
+
+o Mule Mailing List
+  mule-request@etl.go.jp
+  mule@etl.go.jp or mule-jp@etl.go.jp
+
+o NIHONGO Mailing List (sci.lang.japan newsgroup)
+  listserv@mitvma.mit.edu (or listserv@mitvma)
+  nihongo@mitvma.mit.edu
+
+o Nihongo-Hiroba Mailing List
+  listproc@mcfeeley.cc.utexas.edu
+  nihongo-hiroba@mcfeeley.cc.utexas.edu
+
+o Nisus Mailing List
+  listserv@dartmouth.edu
+  nisus@dartmouth.edu
+
+o TLUG (Tokyo Linux User's Group) Mailing List
+  majordomo@lists.twics.com
+  tlug@lists.twics.com
+
+o Unicode Mailing List
+  unicode-request@unicode.org
+  unicode@unicode.org
+
+o WNN User Mailing List
+  wnn-user-request@wnn.astem.or.jp
+  wnn-user-jp@wnn.astem.or.jp
+
+o WWW Multilingual Mailing List
+  www-mling-request@square.ntt.jp
+  www-mling@square.ntt.jp
+
+If the name of the mailing list is part of the subscription address
+(such as "easug-request"), the message body should look like this:
+
+  subscribe
+
+Including your name is optional. If the username in the subscription
+address is "listserv" or "majordomo" (these are names of mailing list
+managing software), the mailing list name must appear after
+"subscribe" in the message body as follows:
+
+  subscribe ccnet-l
+
+Again, including your name is optional.
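+	The rule above can be expressed as a small Python function. It
+is only a sketch. It recognizes "-request" addresses and the
+"listserv" and "majordomo" usernames, and it deliberately refuses any
+other address (such as the "listproc" ones in the list above), because
+the text does not say what those expect. It also leaves out the
+optional name. The function name is hypothetical:
+
```python
LIST_MANAGERS = {"listserv", "majordomo"}


def subscribe_body(subscription_address: str, list_name: str) -> str:
    """Return the message body needed to join a mailing list."""
    user = subscription_address.split("@", 1)[0].lower()
    if user.endswith("-request"):
        return "subscribe"               # list name is already in the address
    if user in LIST_MANAGERS:
        return f"subscribe {list_name}"  # list manager needs the list name
    raise ValueError(f"unknown subscription address: {subscription_address}")


if __name__ == "__main__":
    print(subscribe_body("easug-request@guvax.acc.georgetown.edu", "easug"))
    print(subscribe_body("listserv@uga.uga.edu", "ccnet-l"))
```
+
+For example, subscribe_body("listserv@uga.uga.edu", "ccnet-l")
+returns "subscribe ccnet-l".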
+	The following URL has information about Japanese-related
+mailing lists:
+
+  gopher://gan1.ncc.go.jp/11/INFO/mail-lists/
+
+
+A.2: INTERNET RESOURCES
+
+	The Internet provides what I would consider to be the greatest
+information resources of all. These can be subcategorized into FTP,
+Telnet, Gopher, WWW, and e-mail.
+
+
+A.2.1: USEFUL FTP SITES
+
+	Below are the URLs for useful FTP sites. The directory
+specified is the recommended place from which to start poking around
+for useful files.
+
+  ftp://cair-archive.kaist.ac.kr/pub/hangul/
+  ftp://etlport.etl.go.jp/pub/mule/
+  ftp://ftp.adobe.com/pub/adobe/
+  ftp://ftp.cc.monash.edu.au/pub/nihongo/
+  ftp://ftp.ifcss.org/pub/software/
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/
+  ftp://ftp.sra.co.jp/pub/
+  ftp://ftp.uwtc.washington.edu/pub/Japanese/
+  ftp://kuso.shef.ac.uk/pub/Japanese/
+  ftp://unicode.org/pub/
+
+This list is expected to grow.
+
+
+A.2.2: USEFUL TELNET SITES
+
+	For those who have a NIFTY-Serve account, there is now a very
+convenient way to access NIFTY-Serve using telnet. The URL is as
+follows:
+
+  telnet://r2.niftyserve.or.jp/
+
+Information about what NIFTY-Serve has to offer (and how to subscribe)
+can be found at the following URL:
+
+  http://www.nifty.co.jp/
+
+	Another information service with a similar access mechanism is
+CompuServe, whose URL is as follows:
+
+  telnet://compuserve.com/
+
+You will need to press the return key to get the "Host Name:" prompt,
+at which time you type "cis" (just follow the menus from this point
+on).
+	You can also do a search on fj.* newsgroup articles at the
+following URL:
+
+  telnet://asahi-net.or.jp/
+
+You log in as "fj-db" once you are connected.
+
+
+A.2.3: USEFUL GOPHER SITES
+
+	I am not too much of a Gopher user. There, of course, is the
+following:
+
+  gopher://gopher.ora.com/
+
+Another Gopher site provides information on Japanese-related mailing
+lists:
+
+  gopher://gan1.ncc.go.jp/11/INFO/mail-lists/
+
+If you happen to know of others, please let me know.
+
+
+A.2.4: USEFUL WWW SITES
+
+	Because the World-Wide Web is a constantly changing place (and
+more importantly, because I don't want to re-issue a new version of
+this document every month!), I will maintain links to useful documents
+at my WWW Home Page. Its URL is as follows:
+
+  http://jasper.ora.com/lunde/
+
+If you cannot get to my WWW Home Page, you couldn't get to any that I
+would list here anyway.
+
+
+A.2.5: USEFUL MAIL SERVERS
+
+	In the past (that is, in JAPAN.INF) I included a full list of
+the domains in the "jp" hierarchy. That list took up a lot of space
+and changes very rapidly. You can now send a request to a mail server
+to obtain the most current listing. The mail server is:
+
+  mail-server@nic.ad.jp
+
+The most common command is "send," and the following arguments can be
+supplied to retrieve specific documents (and should be in the message
+body, not on the "Subject:" line):
+
+  send help
+  send index
+  send jpnic/domain-list.txt
+  send jpnic/domain-list-e.txt
+
+The first sends back a help file, the second sends back a complete
+index of files that can be retrieved (use this one to see what other
+useful stuff is available), and the last two send back a complete
+listing of domains in the "jp" hierarchy (the last one sends it back
+in English/romanized form).
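+
+	If you prefer to script such requests, you can assemble the
+message with Python's standard email package. In this sketch, the
+commands go only into the message body, as required above. The From
+address and the actual sending are left to your own mail setup. The
+function name is hypothetical:
+
```python
from email.message import EmailMessage


def jpnic_request(*documents: str) -> EmailMessage:
    """Build a request to the JPNIC mail server, one "send" per line."""
    msg = EmailMessage()
    msg["To"] = "mail-server@nic.ad.jp"
    # Commands belong in the body, not on the "Subject:" line.
    msg.set_content("".join(f"send {doc}\n" for doc in documents))
    return msg


if __name__ == "__main__":
    print(jpnic_request("index", "jpnic/domain-list-e.txt").as_string())
```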
+
+
+A.3: OTHER RESOURCES
+
+	This section provides pointers to specific documentation
+available electronically or in print.
+
+
+A.3.1: BOOKS
+
+	There are other useful reference materials available in print
+or online, in addition to the various national and international
+standards mentioned throughout this document. The following are books
+that I recommend for further reading or mental stimulus. (Sorry for
+plugging my own books in this list, but they are relevant.)
+
+o Clews, John. "Language Automation Worldwide: The Development of
+  Character Set Standards." SESAME Computer Projects. 1988. ISBN
+  1-870095-01-4.
+
+o Flanagan, David. "Java in a Nutshell." O'Reilly & Associates,
+  Inc. 1996. ISBN 1-56592-183-6.
+
+o Frisch, AEleen. "Essential System Administration." Second Edition.
+  O'Reilly & Associates, Inc. 1995. ISBN 1-56592-127-5.
+
+o Huang, Jack & Timothy Huang. "An Introduction to Chinese, Japanese
+  and Korean Computing." World Scientific Computing. 1989. ISBN
+  9971-50-664-5.
+
+o IBM Corporation. "Character Data Representation Architecture - Level
+  2, Registry." 1993. IBM order number SC09-1391-01.
+
+o Kano, Nadine. "Developing International Software for Windows 95 and
+  Windows NT." Microsoft Press. 1995. ISBN 1-55615-840-8.
+
+o Kirch, Olaf. "Linux Network Administrator's Guide." O'Reilly &
+  Associates, Inc. 1995. ISBN 1-56592-087-2.
+
+o Kissell, Joe. "The Nisus Way." MIS:Press. 1996. ISBN 1-55828-455-9.
+
+o Krol, Ed. "The Whole Internet User's Guide & Catalog." Second
+  Edition. O'Reilly & Associates, Inc. 1994. ISBN 1-56592-063-5.
+
+o Liu, Cricket et al. "Managing Internet Information Services."
+  O'Reilly & Associates, Inc. 1994. ISBN 1-56592-062-7.
+
+o Lunde, Ken. "Understanding Japanese Information Processing."
+  O'Reilly & Associates, Inc. 1993. ISBN 1-56592-043-0. LCCN
+  PL524.5.L86 1993.
+
+o Lunde, Ken. "Nihongo Joho Shori." SOFTBANK Corporation. 1995. ISBN
+  4-89052-708-7.
+
+o Luong, Tuoc V. et al. "Internationalization: Developing Software for
+  Global Markets." John Wiley & Sons, Inc. 1995. ISBN
+  0-471-07661-9.
+
+o Schwartz, Randal L. "Learning Perl." O'Reilly & Associates,
+  Inc. 1993. ISBN 1-56592-042-2.
+
+o Stallman, Richard M. "GNU Emacs Manual." Tenth Edition. Free
+  Software Foundation. 1994. ISBN 1-882114-04-3.
+
+o Tuthill, Bill. "Solaris International Developer's Guide." SunSoft
+  Press and PTR Prentice Hall. 1993. ISBN 0-13-031063-8.
+
+o Unicode Consortium, The. "The Unicode Standard: Worldwide Character
+  Encoding." Version 1.0. Volume 2. Addison-Wesley. 1992. ISBN
+  0-201-60845-6.
+
+o Vromans, Johan. "Perl 5 Desktop Reference." O'Reilly & Associates,
+  Inc. 1996. ISBN 1-56592-187-9.
+
+o Wall, Larry & Randal L. Schwartz. "Programming Perl." O'Reilly &
+  Associates, Inc. 1991. ISBN 0-937175-64-1.
+
+o Welsh, Matt & Lar Kaufman. "Running Linux." O'Reilly & Associates,
+  Inc. 1995. ISBN 1-56592-100-3.
+
+	If you want to get your hands on any of the national or
+international standards mentioned in this document, I suggest the
+following:
+
+o The American National Standards Institute can provide ISO, KS, and
+  JIS standards. Bear in mind that ISO standards will most likely
+  arrive as a photocopy of the original.
+
+  ANSI
+  11 West 42nd Street
+  New York, NY 10036
+  USA
+  +1-212-642-4900 (phone)
+  +1-212-302-1286 (facsimile)
+
+o The International Organization for Standardization can provide
+  ISO standards.
+
+  ISO
+  1, rue de Varembé
+  Case postale 56
+  CH-1211, Geneva 20
+  SWITZERLAND
+  +41-22-749-01-11 (phone)
+  +41-22-733-34-30 (facsimile)
+  central@isocs.iso.ch (e-mail)
+  http://www.iso.ch/ (WWW)
+
+o Chinese (GB and CNS) standards are the hardest to obtain, which is
+  quite unfortunate.
+
+
+A.3.2: MAGAZINES
+
+o "Computing Japan," published monthly, ISSN 1340-7228,
+  editors@cj.gol.com.
+
+o "MANGAJIN," published 10 times per year, ISSN 1051-8177.
+
+o "Multilingual Communications & Computing," published bi-monthly,
+  ISSN 1065-7657, info@multilingual.com.
+
+o "The Perl Journal," published quarterly, ISSN 1087-903X,
+  perl-journal-subscriptions@perl.com.
+
+
+A.3.3: JOURNALS
+
+o "Chinese Information Processing" (CIP), published bi-monthly, ISSN
+  1003-9082. (In Chinese.)
+
+o "Computer Processing of Chinese & Oriental Languages" (CPCOL),
+  co-published twice a year by World Scientific Publishing and Chinese
+  Language Computer Society (CLCS), ISSN 0715-9048.
+
+o "The Electronic Bodhidharma," published by the International
+  Research Institute for Zen Buddhism (IRIZ), Hanazono University,
+  Japan. More information on the organization that publishes this
+  journal is available at the following URL:
+
+  http://www.iijnet.or.jp/iriz/irizhtml/irizhome.htm
+
+
+A.3.4: RFCs
+
+	Many RFCs (Request For Comments) are relevant to this
+document. They are:
+
+o RFC 1341: "MIME (Multipurpose Internet Mail Extensions): Mechanisms
+  for Specifying and Describing the Format of Internet Message
+  Bodies," by Nathaniel Borenstein and Ned Freed, June 1992.
+
+o RFC 1342: "Representation of Non-ASCII Text in Internet Message
+  Headers," by Keith Moore, June 1992.
+
+o RFC 1468: "Japanese Character Encoding for Internet Messages," by
+  Jun Murai et al., June 1993.
+
+o RFC 1521: "MIME (Multipurpose Internet Mail Extensions) Part One:
+  Mechanisms for Specifying and Describing the Format of Internet
+  Message Bodies," by Nathaniel Borenstein and Ned Freed, September
+  1993. Obsoletes RFC 1341.
+
+o RFC 1522: "MIME (Multipurpose Internet Mail Extensions) Part Two:
+  Message Header Extensions for Non-ASCII Text," by Keith Moore,
+  September 1993. Obsoletes RFC 1342.
+
+o RFC 1554: "ISO-2022-JP-2: Multilingual Extension of ISO-2022-JP," by
+  Masataka Ohta and Kenichi Handa, December 1993.
+
+o RFC 1557: "Korean Character Encoding for Internet Messages," by
+  Uhhyung Choi et al., December 1993.
+
+o RFC 1642: "UTF-7: A Mail-Safe Transformation Format of Unicode," by
+  David Goldsmith and Mark Davis, July 1994.
+
+o RFC 1815: "Character Sets ISO-10646 and ISO-10646-J-1," by Masataka
+  Ohta, July 1995.
+
+o RFC 1842: "ASCII Printable Characters-Based Chinese Character
+  Encoding for Internet Messages," by Ya-Gui Wei et al., August 1995.
+
+o RFC 1843: "HZ - A Data Format for Exchanging Files of Arbitrarily
+  Mixed Chinese and ASCII Characters," by Fung Fung Lee, August 1995.
+
+o RFC 1922: "Chinese Character Encoding for Internet Messages," by
+  Haifeng Zhu et al., March 1996.
+
+These RFCs can be obtained from FTP archives that contain all RFC
+documents, such as those at the following URLs:
+
+  ftp://nic.ddn.mil/rfc/
+  ftp://ftp.uu.net/inet/rfc/
+
+But these specific ones are mirrored at the following URL for
+convenience:
+
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/Ch9/
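+
+	To see how RFC 1468 and RFC 1522 fit together, consider the
+Python sketch below. It encodes a short Japanese string as ISO-2022-JP
+and wraps the result in a "B" (base64) encoded-word for use in a header
+such as "Subject:". The function name is my own. Also, the sketch does
+not split long text into several encoded-words, which RFC 1522
+requires whenever an encoded-word would exceed 75 characters:
+
```python
import base64


def encoded_word(text: str) -> str:
    """Wrap text as an RFC 1522 'B' encoded-word in ISO-2022-JP."""
    raw = text.encode("iso2022_jp")  # 7-bit JIS encoding per RFC 1468
    b64 = base64.b64encode(raw).decode("ascii")
    return f"=?ISO-2022-JP?B?{b64}?="


if __name__ == "__main__":
    print(encoded_word("日本"))  # =?ISO-2022-JP?B?GyRCRnxLXBsoQg==?=
```
+
+The base64 text decodes to the following sequence: ESC $ B, then the
+JIS codes 0x467C and 0x4B5C, then ESC ( B, which returns to ASCII.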
+
+
+A.3.5: FAQs
+
+	There are several FAQ (Frequently Asked Questions) files that
+provide useful information. The following is a listing of some along
+with their URLs:
+
+o "Japanese Language Information" FAQ (formerly the "sci.lang.japan"
+  FAQ) by Rafael Santos (santos@mickey.ai.kyutech.ac.jp) at:
+
+  http://www.mickey.ai.kyutech.ac.jp/cgi-bin/japanese/
+
+  Update announcements are usually posted to the sci.lang.japan
+  newsgroup.
+
+o "Programming for Internationalization" FAQ by Michael Gschwind
+  (mike@vlsivie.tuwien.ac.at) at:
+
+  ftp://ftp.vlsivie.tuwien.ac.at/pub/8bit/ISO-programming
+
+  Also posted to the comp.software.international newsgroup. This and
+  other internationalization documents are also accessible through the
+  following URL:
+
+  http://www.vlsivie.tuwien.ac.at/mike/i18n.html
+
+o Three FAQs about Internet Service Providers in Japan by Taki Naruto
+  (tn@panix.com), Jesse Casman (jcasman@unm.edu), and Kenji Yoshida
+  (kenny@mb.tokyo.infoweb.or.jp), respectively, at:
+
+  http://www.panix.com/~tn/ispj.html
+  http://nobunaga.unm.edu/internet.html
+  http://cswww2.essex.ac.uk/users/whean/japan/net.html
+
+o "Internationalization Reference List" by Eugene Dorr
+  (gdorr@pgh.legent.com) at:
+
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/i18n-books.txt
+
+  Not really a FAQ, but quite useful because it is a very complete
+  listing of I18N-related books.
+
+o "INSOFT-L Service" by Brian Tatro (btatro@tatro.com) at:
+
+  http://iquest.com/~btatro/in2.html
+
+  This includes a link to the FAQ for the INSOFT-L Mailing List (see
+  Section A.1.2).
+
+o "How to Use Japanese on the Internet with a PC: From Login to WWW"
+  by Hideki Hirayama (sgw01623@niftyserve.or.jp) at:
+
+  ftp://ftp.ora.com/pub/examples/nutshell/ujip/faq/jpn-inet.FAQ
+
+o "Hangul and Internet in Korea" FAQ by Jungshik Shin
+  (jshin@minerva.cis.yale.edu) at:
+
+  http://pantheon.cis.yale.edu/~jshin/faq/
+---  END (CJK.INF VERSION 2.1 07/12/96) 185553 BYTES  ---