Diffstat (limited to 'doc/Encoding.3')
-rw-r--r-- | doc/Encoding.3 | 51 |
1 files changed, 32 insertions, 19 deletions
diff --git a/doc/Encoding.3 b/doc/Encoding.3
index 40fe886..890ae54 100644
--- a/doc/Encoding.3
+++ b/doc/Encoding.3
@@ -4,7 +4,7 @@
 '\" See the file "license.terms" for information on usage and redistribution
 '\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
 '\"
-'\" RCS: @(#) $Id: Encoding.3,v 1.27 2007/10/26 20:11:51 dgp Exp $
+'\" RCS: @(#) $Id: Encoding.3,v 1.28 2007/10/28 14:17:38 dkf Exp $
 '\"
 .so man.macros
 .TH Tcl_GetEncoding 3 "8.1" Tcl "Tcl Library Procedures"
@@ -118,7 +118,7 @@ block in a (potentially multi-block) input stream, telling the conversion
 routine to perform any finalization that needs to occur after the last
 byte is converted and then to reset to an initial state.
 \fBTCL_ENCODING_STOPONERROR\fR signifies that the conversion routine should
-return immediately upon reading a source character that doesn't exist in
+return immediately upon reading a source character that does not exist in
 the target encoding; otherwise a default fallback character will
 automatically be substituted.
 .AP Tcl_EncodingState *statePtr in/out
@@ -277,18 +277,25 @@ is filled with the corresponding number of bytes that were stored in
 Windows-only convenience
 functions for converting between UTF-8 and Windows strings. On Windows 95
 (as with the Unix operating system),
-all strings exchanged between Tcl and the operating system are "char"
+all strings exchanged between Tcl and the operating system are
+.QW "char"
 based. On Windows NT, some strings exchanged between Tcl and the
-operating system are "char" oriented while others are in Unicode. By
+operating system are
+.QW "char"
+oriented while others are in Unicode. By
 convention, in Windows a TCHAR is a character in the ANSI code page
 on Windows 95 and a Unicode character on Windows NT.
 .PP
-If you planned to use the same "char" based interfaces on both Windows
+If you planned to use the same
+.QW "char"
+based interfaces on both Windows
 95 and Windows NT, you could use \fBTcl_UtfToExternal\fR and
 \fBTcl_ExternalToUtf\fR (or their \fBTcl_DString\fR equivalents) with an
 encoding of NULL (the current system encoding). On the other hand,
 if you planned to use the Unicode interface when running on Windows NT
-and the "char" interfaces when running on Windows 95, you would have
+and the
+.QW "char"
+interfaces when running on Windows 95, you would have
 to perform the following type of test over and over in your program
 (as represented in pseudo-code):
 .CS
@@ -457,8 +464,9 @@ are obsolete interfaces best replaced with calls to
 \fBTcl_GetEncodingSearchPath\fR and \fBTcl_SetEncodingSearchPath\fR.
 They are called to access and set the first element of the \fIsearchPath\fR
 list. Since Tcl searches \fIsearchPath\fR for encoding data files in
-list order, these routines establish the ``default'' directory in which
-to find encoding data files.
+list order, these routines establish the
+.QW default
+directory in which to find encoding data files.
 .VE 8.5
 .SH "ENCODING FILES"
 Space would prohibit precompiling into Tcl every possible encoding
@@ -471,7 +479,9 @@ external encoding may consist of single-byte, multi-byte, or double-byte
 characters.
 .PP
 Each dynamically-loadable encoding is represented as a text file. The
-initial line of the file, beginning with a ``#'' symbol, is a comment
+initial line of the file, beginning with a
+.QW #
+symbol, is a comment
 that provides a human-readable description of the file. The next
 line identifies the type of encoding file. It can be one of
 the following letters:
@@ -557,7 +567,7 @@ and 0x8163 in \fBshiftjis\fR map to 203E and 2026 in Unicode, respectively.
 Following the first page will be all the other pages, each in the same
 format as the first: one number identifying the page followed by 256
 double-byte Unicode characters. If a character in the encoding maps to the
-Unicode character 0000, it means that the character doesn't actually exist.
+Unicode character 0000, it means that the character does not actually exist.
 If all characters on a page would map to 0000, that page can be omitted.
 .PP
 Case [4] is the escape-sequence encoding file. The lines in an this type of
@@ -569,13 +579,13 @@ encoding:
 E
 init {}
 final {}
-iso8859-1 \\x1b(B
-jis0201 \\x1b(J
-jis0208 \\x1b$@
-jis0208 \\x1b$B
-jis0212 \\x1b$(D
-gb2312 \\x1b$A
-ksc5601 \\x1b$(C
+iso8859-1 \ex1b(B
+jis0201 \ex1b(J
+jis0208 \ex1b$@
+jis0208 \ex1b$B
+jis0212 \ex1b$(D
+gb2312 \ex1b$A
+ksc5601 \ex1b$(C
 .CE
 .PP
 In the file, the first column represents an option and the second column
@@ -584,8 +594,11 @@ the first character is converted, while \fBfinal\fR is a string to emit
 or expect after the last character. All other options are names of
 table-based encodings; the associated value is the escape-sequence that
 marks that encoding. Tcl syntax is used for the values; in the above
-example, for instance, ``\fB{}\fR'' represents the empty string and
-``\fB\\x1b\fR'' represents character 27.
+example, for instance,
+.QW \fB{}\fR
+represents the empty string and
+.QW \fB\ex1b\fR
+represents character 27.
 .PP
 When \fBTcl_GetEncoding\fR encounters an encoding \fIname\fR that has not
 been loaded, it attempts to load an encoding file called \fIname\fB.enc\fR