summaryrefslogtreecommitdiffstats
path: root/doc/encoding.n
diff options
context:
space:
mode:
Diffstat (limited to 'doc/encoding.n')
-rw-r--r--doc/encoding.n79
1 files changed, 79 insertions, 0 deletions
diff --git a/doc/encoding.n b/doc/encoding.n
new file mode 100644
index 0000000..fc6d4f7
--- /dev/null
+++ b/doc/encoding.n
@@ -0,0 +1,79 @@
+'\"
+'\" Copyright (c) 1998 by Scriptics Corporation.
+'\"
+'\" See the file "license.terms" for information on usage and redistribution
+'\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
+'\"
+'\" RCS: @(#) $Id: encoding.n,v 1.2 1999/04/16 00:46:34 stanton Exp $
+'\"
+.so man.macros
+.TH encoding n "8.1" Tcl "Tcl Built-In Commands"
+.BS
+.SH NAME
+encoding \- Manipulate encodings
+.SH SYNOPSIS
+\fBencoding \fIoption\fR ?\fIarg arg ...\fR?
+.BE
+
+.SH INTRODUCTION
+.PP
+Strings in Tcl are encoded using 16-bit Unicode characters. Different
+operating system interfaces or applications may generate strings in
+other encodings such as Shift-JIS. The \fBencoding\fR command helps
+to bridge the gap between Unicode and these other formats.
+
+.SH DESCRIPTION
+.PP
+Performs one of several encoding related operations, depending on
+\fIoption\fR. The legal \fIoption\fRs are:
+.TP
+\fBencoding convertfrom ?\fIencoding\fR? \fIdata\fR
+Convert \fIdata\fR to Unicode from the specified \fIencoding\fR. The
+characters in \fIdata\fR are treated as binary data where the lower
+8-bits of each character is taken as a single byte. The resulting
+sequence of bytes is treated as a string in the specified
+\fIencoding\fR. If \fIencoding\fR is not specified, the current
+system encoding is used.
+.TP
+\fBencoding convertto ?\fIencoding\fR? \fIstring\fR
+Convert \fIstring\fR from Unicode to the specified \fIencoding\fR.
+The result is a sequence of bytes that represents the converted
+string. Each byte is stored in the lower 8-bits of a Unicode
+character. If \fIencoding\fR is not specified, the current
+system encoding is used.
+.TP
+\fBencoding names\fR
+Returns a list containing the names of all of the encodings that are
+currently available.
+.TP
+\fBencoding system\fR ?\fIencoding\fR?
+Set the system encoding to \fIencoding\fR. If \fIencoding\fR is
+omitted then the command returns the current system encoding. The
+system encoding is used whenever Tcl passes strings to system calls.
+
+.SH EXAMPLE
+.PP
+It is common practice to write script files using a text editor that
+produces output in the euc-jp encoding, which represents the ASCII
+characters as singe bytes and Japanese characters as two bytes. This
+makes it easy to embed literal strings that correspond to non-ASCII
+characters by simply typing the strings in place in the script.
+However, because the \fBsource\fR command always reads files using the
+ISO8859-1 encoding, Tcl will treat each byte in the file as a separate
+character that maps to the 00 page in Unicode. The
+resulting Tcl strings will not contain the expected Japanese
+characters. Instead, they will contain a sequence of Latin-1
+characters that correspond to the bytes of the original string. The
+\fBencoding\fR command can be used to convert this string to the
+expected Japanese Unicode characters. For example,
+.CS
+ set s [encoding convertfrom euc-jp "\\xA4\\xCF"]
+.CE
+would return the Unicode string "\\u306F", which is the Hiragana
+letter HA.
+
+.SH "SEE ALSO"
+Tcl_GetEncoding
+
+.SH KEYWORDS
+encoding