diff options
Diffstat (limited to 'doc/encoding.n')
-rw-r--r-- | doc/encoding.n | 79 |
1 files changed, 79 insertions, 0 deletions
diff --git a/doc/encoding.n b/doc/encoding.n new file mode 100644 index 0000000..fc6d4f7 --- /dev/null +++ b/doc/encoding.n @@ -0,0 +1,79 @@ +'\" +'\" Copyright (c) 1998 by Scriptics Corporation. +'\" +'\" See the file "license.terms" for information on usage and redistribution +'\" of this file, and for a DISCLAIMER OF ALL WARRANTIES. +'\" +'\" RCS: @(#) $Id: encoding.n,v 1.2 1999/04/16 00:46:34 stanton Exp $ +'\" +.so man.macros +.TH encoding n "8.1" Tcl "Tcl Built-In Commands" +.BS +.SH NAME +encoding \- Manipulate encodings +.SH SYNOPSIS +\fBencoding \fIoption\fR ?\fIarg arg ...\fR? +.BE + +.SH INTRODUCTION +.PP +Strings in Tcl are encoded using 16-bit Unicode characters. Different +operating system interfaces or applications may generate strings in +other encodings such as Shift-JIS. The \fBencoding\fR command helps +to bridge the gap between Unicode and these other formats. + +.SH DESCRIPTION +.PP +Performs one of several encoding related operations, depending on +\fIoption\fR. The legal \fIoption\fRs are: +.TP +\fBencoding convertfrom ?\fIencoding\fR? \fIdata\fR +Convert \fIdata\fR to Unicode from the specified \fIencoding\fR. The +characters in \fIdata\fR are treated as binary data where the lower +8-bits of each character is taken as a single byte. The resulting +sequence of bytes is treated as a string in the specified +\fIencoding\fR. If \fIencoding\fR is not specified, the current +system encoding is used. +.TP +\fBencoding convertto ?\fIencoding\fR? \fIstring\fR +Convert \fIstring\fR from Unicode to the specified \fIencoding\fR. +The result is a sequence of bytes that represents the converted +string. Each byte is stored in the lower 8-bits of a Unicode +character. If \fIencoding\fR is not specified, the current +system encoding is used. +.TP +\fBencoding names\fR +Returns a list containing the names of all of the encodings that are +currently available. +.TP +\fBencoding system\fR ?\fIencoding\fR? +Set the system encoding to \fIencoding\fR. If \fIencoding\fR is +omitted then the command returns the current system encoding. The +system encoding is used whenever Tcl passes strings to system calls. + +.SH EXAMPLE +.PP +It is common practice to write script files using a text editor that +produces output in the euc-jp encoding, which represents the ASCII +characters as singe bytes and Japanese characters as two bytes. This +makes it easy to embed literal strings that correspond to non-ASCII +characters by simply typing the strings in place in the script. +However, because the \fBsource\fR command always reads files using the +ISO8859-1 encoding, Tcl will treat each byte in the file as a separate +character that maps to the 00 page in Unicode. The +resulting Tcl strings will not contain the expected Japanese +characters. Instead, they will contain a sequence of Latin-1 +characters that correspond to the bytes of the original string. The +\fBencoding\fR command can be used to convert this string to the +expected Japanese Unicode characters. For example, +.CS + set s [encoding convertfrom euc-jp "\\xA4\\xCF"] +.CE +would return the Unicode string "\\u306F", which is the Hiragana +letter HA. + +.SH "SEE ALSO" +Tcl_GetEncoding + +.SH KEYWORDS +encoding |