diff options
author | oehhar <harald.oehlmann@elmicron.de> | 2022-10-28 16:00:48 (GMT) |
---|---|---|
committer | oehhar <harald.oehlmann@elmicron.de> | 2022-10-28 16:00:48 (GMT) |
commit | ebaccb9022ecea9225c81b6272dd134bd731a021 (patch) | |
tree | 3ce52a6790defacf560a458913f7e62e2d53ef60 | |
parent | 98005a91dcde7aec64750bcaa58c2ecd8634256d (diff) | |
parent | b0b713c9dc32fd464832adf0ca14b08c55d4a0c4 (diff) | |
download | tcl-ebaccb9022ecea9225c81b6272dd134bd731a021.zip tcl-ebaccb9022ecea9225c81b6272dd134bd731a021.tar.gz tcl-ebaccb9022ecea9225c81b6272dd134bd731a021.tar.bz2 |
Merge core-8-branch
-rw-r--r-- | doc/encoding.n | 102 | ||||
-rw-r--r-- | unix/Makefile.in | 2 | ||||
-rw-r--r-- | win/Makefile.in | 2 |
3 files changed, 68 insertions, 38 deletions
diff --git a/doc/encoding.n b/doc/encoding.n index c1dbf27..4e18798 100644 --- a/doc/encoding.n +++ b/doc/encoding.n @@ -28,30 +28,37 @@ formats. Performs one of several encoding related operations, depending on \fIoption\fR. The legal \fIoption\fRs are: .TP -\fBencoding convertfrom\fR ?\fB-nocomplain\fR? ?\fB-failindex var\fR? -?\fIencoding\fR? \fIdata\fR +\fBencoding convertfrom\fR ?\fB-nocomplain\fR? ?\fB-failindex var\fR? ?\fB-strict\fR? ?\fIencoding\fR? \fIdata\fR . Convert \fIdata\fR to a Unicode string from the specified \fIencoding\fR. The characters in \fIdata\fR are 8 bit binary data. The resulting sequence of bytes is a string created by applying the given \fIencoding\fR to the data. If \fIencoding\fR is not specified, the current system encoding is used. -. -The call fails on convertion errors, like an incomplete utf-8 sequence. -The option \fB-failindex\fR is followed by a variable name. The variable -is set to \fI-1\fR if no conversion error occured. It is set to the -first error location in \fIdata\fR in case of a conversion error. All data -until this error location is transformed and retured. This option may not -be used together with \fB-nocomplain\fR. -. -The call does not fail on conversion errors, if the option -\fB-nocomplain\fR is given. In this case, any error locations are replaced -by \fB?\fR. Incomplete sequences are written verbatim to the output string. -The purpose of this switch is to gain compatibility to prior versions of TCL. -It is not recommended for any other usage. +.VS "TCL8.7 TIP346, TIP607, TIP601" +.PP +.RS +If the option \fB-nocomplain\fR is given, the command does not fail on +encoding errors. Instead, any not convertable bytes (like incomplete UTF-8 + sequences, see example below) are put as byte values into the output stream. +If the option \fB-nocomplain\fR is not given, the command will fail with an +appropriate error message. +.PP +If the option \fB-failindex\fR with a variable name is given, the error reporting +is changed in the following manner: +in case of a conversion error, the position of the input byte causing the error +is returned in the given variable. The return value of the command are the +converted characters until the first error position. No error condition is raised. +In case of no error, the value \fI-1\fR is written to the variable. This option +may not be used together with \fB-nocomplain\fR. +.PP +The \fB-strict\fR option followes more strict rules in conversion. Currently, only +the sequence \fB\\xC0\\x80\fR in \fButf-8\fR encoding is disallowed. Additional rules +may follow. +.VE "TCL8.7 TIP346, TIP607, TIP601" +.RE .TP -\fBencoding convertto\fR ?\fB-nocomplain\fR? ?\fB-failindex var\fR? -?\fIencoding\fR? \fIstring\fR +\fBencoding convertto\fR ?\fB-nocomplain\fR? ?\fB-failindex var\fR? ?\fB-strict\fR? ?\fIencoding\fR? \fIstring\fR . Convert \fIstring\fR from Unicode to the specified \fIencoding\fR. The result is a sequence of bytes that represents the converted @@ -59,21 +66,28 @@ string. Each byte is stored in the lower 8-bits of a Unicode character (indeed, the resulting string is a binary string as far as Tcl is concerned, at least initially). If \fIencoding\fR is not specified, the current system encoding is used. -. -The call fails on convertion errors, like a Unicode character not representable -in the given \fIencoding\fR. -. -The option \fB-failindex\fR is followed by a variable name. The variable -is set to \fI-1\fR if no conversion error occured. It is set to the -first error location in \fIdata\fR in case of a conversion error. All data -until this error location is transformed and retured. This option may not -be used together with \fB-nocomplain\fR. -. -The call does not fail on conversion errors, if the option -\fB-nocomplain\fR is given. In this case, any error locations are replaced -by \fB?\fR. Incomplete sequences are written verbatim to the output string. -The purpose of this switch is to gain compatibility to prior versions of TCL. -It is not recommended for any other usage. +.VS "TCL8.7 TIP346, TIP607, TIP601" +.PP +.RS +If the option \fB-nocomplain\fR is given, the command does not fail on +encoding errors. Instead, the replacement character \fB?\fR is output +for any not representable character (like the dot \fB\\U2022\fR +in \fBiso-8859-1\fR encoding, see example below). +If the option \fB-nocomplain\fR is not given, the command will fail with an +appropriate error message. +.PP +If the option \fB-failindex\fR with a variable name is given, the error reporting +is changed in the following manner: +in case of a conversion error, the position of the input character causing the error +is returned in the given variable. The return value of the command are the +converted bytes until the first error position. No error condition is raised. +In case of no error, the value \fI-1\fR is written to the variable. This option +may not be used together with \fB-nocomplain\fR. +.PP +The \fB-strict\fR option followes more strict rules in conversion. Currently, it has +no effect but may be used in future to add additional encoding checks. +.VE "TCL8.7 TIP346, TIP607, TIP601" +.RE .TP \fBencoding dirs\fR ?\fIdirectoryList\fR? . @@ -104,7 +118,7 @@ omitted then the command returns the current system encoding. The system encoding is used whenever Tcl passes strings to system calls. .SH EXAMPLE .PP -The following example converts a byte sequence in Japanese euc-jp encoding to a TCL string: +Example 1: convert a byte sequence in Japanese euc-jp encoding to a TCL string: .PP .CS set s [\fBencoding convertfrom\fR euc-jp "\exA4\exCF"] @@ -113,8 +127,9 @@ set s [\fBencoding convertfrom\fR euc-jp "\exA4\exCF"] The result is the unicode codepoint: .QW "\eu306F" , which is the Hiragana letter HA. +.VS "TCL8.7 TIP346, TIP607, TIP601" .PP -The following example detects the error location in an incomplete UTF-8 sequence: +Example 2: detect the error location in an incomplete UTF-8 sequence: .PP .CS % set s [\fBencoding convertfrom\fR -failindex i utf-8 "A\exC3"] @@ -123,7 +138,14 @@ A 1 .CE .PP -The following example detects the error location while transforming to ISO8859-1 +Example 3: return the incomplete UTF-8 sequence by raw bytes: +.PP +.CS +% set s [\fBencoding convertfrom\fR -nocomplain utf-8 "A\exC3"] +.CE +The result is "A" followed by the byte \exC3. +.PP +Example 4: detect the error location while transforming to ISO8859-1 (ISO-Latin 1): .PP .CS @@ -133,8 +155,16 @@ A 1 .CE .PP +Example 5: replace a not representable character by the replacement character: +.PP +.CS +% set s [\fBencoding convertto\fR -nocomplain utf-8 "A\eu0141"] +A? +.CE +.VE "TCL8.7 TIP346, TIP607, TIP601" +.PP .SH "SEE ALSO" -Tcl_GetEncoding(3) +Tcl_GetEncoding(3), fconfigure(n) .SH KEYWORDS encoding, unicode .\" Local Variables: diff --git a/unix/Makefile.in b/unix/Makefile.in index d9bd68c..a5f8b23 100644 --- a/unix/Makefile.in +++ b/unix/Makefile.in @@ -857,7 +857,7 @@ clean: clean-packages distclean: distclean-packages clean rm -rf Makefile config.status config.cache config.log tclConfig.sh \ - tclConfig.h *.plist Tcl.framework tcl.pc + tclConfig.h *.plist Tcl.framework tcl.pc tclUuid.h (cd dltest ; $(MAKE) distclean) depend: diff --git a/win/Makefile.in b/win/Makefile.in index 2dc59d9..a74808b 100644 --- a/win/Makefile.in +++ b/win/Makefile.in @@ -1024,7 +1024,7 @@ clean: cleanhelp clean-packages distclean: distclean-packages clean $(RM) Makefile config.status config.cache config.log tclConfig.sh \ - config.status.lineno tclsh.exe.manifest + config.status.lineno tclsh.exe.manifest tclUuid.h # # Bundled package targets |