Diffstat (limited to 'doc/encoding.n')
-rw-r--r-- doc/encoding.n 108
1 files changed, 72 insertions, 36 deletions
diff --git a/doc/encoding.n b/doc/encoding.n
index c1dbf27..bbe197d 100644
--- a/doc/encoding.n
+++ b/doc/encoding.n
@@ -28,30 +28,39 @@ formats.
Performs one of several encoding related operations, depending on
\fIoption\fR. The legal \fIoption\fRs are:
.TP
-\fBencoding convertfrom\fR ?\fB-nocomplain\fR? ?\fB-failindex var\fR?
-?\fIencoding\fR? \fIdata\fR
+\fBencoding convertfrom\fR ?\fB-nocomplain\fR? ?\fB-failindex var\fR? ?\fB-strict\fR? ?\fIencoding\fR? \fIdata\fR
.
Convert \fIdata\fR to a Unicode string from the specified \fIencoding\fR. The
characters in \fIdata\fR are 8 bit binary data. The resulting
sequence of bytes is a string created by applying the given \fIencoding\fR
to the data. If \fIencoding\fR is not specified, the current
system encoding is used.
-.
-The call fails on convertion errors, like an incomplete utf-8 sequence.
-The option \fB-failindex\fR is followed by a variable name. The variable
-is set to \fI-1\fR if no conversion error occured. It is set to the
-first error location in \fIdata\fR in case of a conversion error. All data
-until this error location is transformed and retured. This option may not
-be used together with \fB-nocomplain\fR.
-.
-The call does not fail on conversion errors, if the option
-\fB-nocomplain\fR is given. In this case, any error locations are replaced
-by \fB?\fR. Incomplete sequences are written verbatim to the output string.
-The purpose of this switch is to gain compatibility to prior versions of TCL.
-It is not recommended for any other usage.
+.VS "TCL8.7 TIP346, TIP607, TIP601"
+.PP
+.RS
+The command does not fail on encoding errors. Instead, any bytes that cannot be
+converted (like incomplete UTF-8 sequences, see example below) are put as byte
+values into the output stream.
+.PP
+If the option \fB-failindex\fR with a variable name is given, the error reporting
+is changed in the following manner:
+in case of a conversion error, the position of the input byte causing the error
+is returned in the given variable. The return value of the command consists of the
+converted characters up to the first error position.
+In case of no error, the value \fI-1\fR is written to the variable. This option
+may not be used together with \fB-nocomplain\fR.
+.PP
+The option \fB-nocomplain\fR has no effect and is available for compatibility with
+TCL 9. In TCL 9, the encoding command fails with an error on any encoding issue.
+This switch restores the TCL8.7 behaviour.
+.PP
+The \fB-strict\fR option applies stricter rules in conversion. Currently, only
+the sequence \fB\\xC0\\x80\fR in \fButf-8\fR encoding is disallowed. Additional rules
+may follow.
+.VE "TCL8.7 TIP346, TIP607, TIP601"
+.RE
.TP
-\fBencoding convertto\fR ?\fB-nocomplain\fR? ?\fB-failindex var\fR?
-?\fIencoding\fR? \fIstring\fR
+\fBencoding convertto\fR ?\fB-nocomplain\fR? ?\fB-failindex var\fR? ?\fB-strict\fR? ?\fIencoding\fR? \fIstring\fR
.
Convert \fIstring\fR from Unicode to the specified \fIencoding\fR.
The result is a sequence of bytes that represents the converted
@@ -59,21 +68,29 @@ string. Each byte is stored in the lower 8-bits of a Unicode
character (indeed, the resulting string is a binary string as far as
Tcl is concerned, at least initially). If \fIencoding\fR is not
specified, the current system encoding is used.
-.
-The call fails on convertion errors, like a Unicode character not representable
-in the given \fIencoding\fR.
-.
-The option \fB-failindex\fR is followed by a variable name. The variable
-is set to \fI-1\fR if no conversion error occured. It is set to the
-first error location in \fIdata\fR in case of a conversion error. All data
-until this error location is transformed and retured. This option may not
-be used together with \fB-nocomplain\fR.
-.
-The call does not fail on conversion errors, if the option
-\fB-nocomplain\fR is given. In this case, any error locations are replaced
-by \fB?\fR. Incomplete sequences are written verbatim to the output string.
-The purpose of this switch is to gain compatibility to prior versions of TCL.
-It is not recommended for any other usage.
+.VS "TCL8.7 TIP346, TIP607, TIP601"
+.PP
+.RS
+The command does not fail on encoding errors. Instead, the replacement character
+\fB?\fR is output for any character that cannot be represented (like the bullet
+\fB\\U2022\fR in \fBiso-8859-1\fR encoding, see example below).
+.PP
+If the option \fB-failindex\fR with a variable name is given, the error reporting
+is changed in the following manner:
+in case of a conversion error, the position of the input character causing the error
+is returned in the given variable. The return value of the command consists of the
+converted bytes up to the first error position. No error condition is raised.
+In case of no error, the value \fI-1\fR is written to the variable. This option
+may not be used together with \fB-nocomplain\fR.
+.PP
+The option \fB-nocomplain\fR has no effect and is available for compatibility with
+TCL 9. In TCL 9, the encoding command fails with an error on any encoding issue.
+This switch restores the TCL8.7 behaviour.
+.PP
+The \fB-strict\fR option applies stricter rules in conversion. Currently, it has
+no effect but may be used in the future to add additional encoding checks.
+.VE "TCL8.7 TIP346, TIP607, TIP601"
+.RE
.TP
\fBencoding dirs\fR ?\fIdirectoryList\fR?
.
@@ -104,7 +121,7 @@ omitted then the command returns the current system encoding. The
system encoding is used whenever Tcl passes strings to system calls.
.SH EXAMPLE
.PP
-The following example converts a byte sequence in Japanese euc-jp encoding to a TCL string:
+Example 1: convert a byte sequence in Japanese euc-jp encoding to a TCL string:
.PP
.CS
set s [\fBencoding convertfrom\fR euc-jp "\exA4\exCF"]
@@ -113,8 +130,9 @@ set s [\fBencoding convertfrom\fR euc-jp "\exA4\exCF"]
The result is the unicode codepoint:
.QW "\eu306F" ,
which is the Hiragana letter HA.
+.VS "TCL8.7 TIP346, TIP607, TIP601"
.PP
-The following example detects the error location in an incomplete UTF-8 sequence:
+Example 2: detect the error location in an incomplete UTF-8 sequence:
.PP
.CS
% set s [\fBencoding convertfrom\fR -failindex i utf-8 "A\exC3"]
@@ -123,7 +141,15 @@ A
1
.CE
.PP
-The following example detects the error location while transforming to ISO8859-1
+Example 3: return the incomplete UTF-8 sequence as raw bytes:
+.PP
+.CS
+% set s [\fBencoding convertfrom\fR -nocomplain utf-8 "A\exC3"]
+.CE
+The result is "A" followed by the byte \exC3. The option \fB-nocomplain\fR
+has no effect, but ensures the same result with TCL9.
+.PP
+Example 4: detect the error location while transforming to ISO8859-1
(ISO-Latin 1):
.PP
.CS
@@ -133,8 +159,18 @@ A
1
.CE
.PP
+Example 5: replace a character that cannot be represented with the replacement character:
+.PP
+.CS
+% set s [\fBencoding convertto\fR -nocomplain iso8859-1 "A\eu0141"]
+A?
+.CE
+The option \fB-nocomplain\fR has no effect, but ensures the same result
+with TCL9.
+.VE "TCL8.7 TIP346, TIP607, TIP601"
+.PP
.SH "SEE ALSO"
-Tcl_GetEncoding(3)
+Tcl_GetEncoding(3), fconfigure(n)
.SH KEYWORDS
encoding, unicode
.\" Local Variables: