merge trunk (dgp_pkg_migration)

author: dgp <dgp@users.sourceforge.net> 2014-07-15 13:18:33 (GMT)
committer: dgp <dgp@users.sourceforge.net> 2014-07-15 13:18:33 (GMT)
commit: 64c066f9bb2daa64cc0cf366d0c0812a828d2d61 (patch)
tree: 934228bab58b4be058c64ed75ac9363292cc7474 /doc/string.n
parent: 773c4c75ee2ec9d60766f09ced30a5fef6d812c6 (diff)
parent: 74d71bafde63ca49cecadc990df7b3a2d7797849 (diff)
download: tcl-dgp_pkg_migration.zip, tcl-dgp_pkg_migration.tar.gz, tcl-dgp_pkg_migration.tar.bz2
1 files changed, 30 insertions, 9 deletions
diff --git a/doc/string.n b/doc/string.n
index f5eae39..72a69ff 100644
--- a/doc/string.n
+++ b/doc/string.n
@@ -5,8 +5,8 @@
 .\" See the file "license.terms" for information on usage and redistribution
 .\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
 .\" 
-.so man.macros
 .TH string n 8.1 Tcl "Tcl Built-In Commands"
+.so man.macros
 .BS
 .\" Note:  do not modify the .SH NAME line immediately below!
 .SH NAME
@@ -19,7 +19,7 @@ string \- Manipulate strings
 Performs one of several string operations, depending on \fIoption\fR.
 The legal \fIoption\fRs (which may be abbreviated) are:
 .TP
-\fBstring compare\fR ?\fB\-nocase\fR? ?\fB\-length int\fR? \fIstring1 string2\fR
+\fBstring compare\fR ?\fB\-nocase\fR? ?\fB\-length\fI length\fR? \fIstring1 string2\fR
 .
 Perform a character-by-character comparison of strings \fIstring1\fR
 and \fIstring2\fR.  Returns \-1, 0, or 1, depending on whether
@@ -29,7 +29,7 @@ first \fIlength\fR characters are used in the comparison.  If
 \fB\-length\fR is negative, it is ignored.  If \fB\-nocase\fR is
 specified, then the strings are compared in a case-insensitive manner.
 .TP
-\fBstring equal\fR ?\fB\-nocase\fR? ?\fB\-length int\fR? \fIstring1 string2\fR
+\fBstring equal\fR ?\fB\-nocase\fR? ?\fB\-length\fI length\fR? \fIstring1 string2\fR
 .
 Perform a character-by-character comparison of strings \fIstring1\fR
 and \fIstring2\fR.  Returns 1 if \fIstring1\fR and \fIstring2\fR are
@@ -131,8 +131,9 @@ Any Unicode printing character, including space.
 .IP \fBpunct\fR 12
 Any Unicode punctuation character.
 .IP \fBspace\fR 12
-Any Unicode whitespace character, zero width space (U+200b),
-word joiner (U+2060) and zero width no-break space (U+feff) (=BOM).
+Any Unicode whitespace character, mongolian vowel separator
+(U+180e), zero width space (U+200b), word joiner (U+2060) or
+zero width no-break space (U+feff) (=BOM).
 .IP \fBtrue\fR 12
 Any of the forms allowed to \fBTcl_GetBoolean\fR where the value is
 true.
@@ -343,10 +344,13 @@ misleading.
 \fBstring bytelength \fIstring\fR
 .
 Returns a decimal string giving the number of bytes used to represent
-\fIstring\fR in memory.  Because UTF\-8 uses one to three bytes to
-represent Unicode characters, the byte length will not be the same as
-the character length in general.  The cases where a script cares about
-the byte length are rare.
+\fIstring\fR in memory when encoded as Tcl's internal modified UTF\-8;
+Tcl may use other encodings for \fIstring\fR as well, and does not
+guarantee to only use a single encoding for a particular \fIstring\fR.
+Because UTF\-8 uses a variable number of bytes to represent Unicode
+characters, the byte length will not be the same as the character
+length in general.  The cases where a script cares about the byte
+length are rare.
 .RS
 .PP
 In almost all cases, you should use the
@@ -354,10 +358,27 @@ In almost all cases, you should use the
 Tcl byte array value).  Refer to the \fBTcl_NumUtfChars\fR manual
 entry for more details on the UTF\-8 representation.
 .PP
+Formally, the \fBstring bytelength\fR operation returns the content of
+the \fIlength\fR field of the \fBTcl_Obj\fR structure, after calling
+\fBTcl_GetString\fR to ensure that the \fIbytes\fR field is populated.
+This is highly unlikely to be useful to Tcl scripts, as Tcl's internal
+encoding is not strict UTF\-8, but rather a modified CESU\-8 with a
+denormalized NUL (identical to that used in a number of places by
+Java's serialization mechanism) to enable basic processing with
+non-Unicode-aware C functions.  As this representation should only
+ever be used by Tcl's implementation, the number of bytes used to
+store the representation is of very low value (except to C extension
+code, which has direct access for the purpose of memory management,
+etc.)
+.PP
 \fICompatibility note:\fR it is likely that this subcommand will be
 withdrawn in a future version of Tcl. It is better to use the
 \fBencoding convertto\fR command to convert a string to a known
 encoding and then apply \fBstring length\fR to that.
+.PP
+.CS
+\fBstring length\fR [encoding convertto utf-8 $theString]
+.CE
 .RE
 .TP
 \fBstring wordend \fIstring charIndex\fR
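
The byte counts discussed in the new \fBstring bytelength\fR paragraphs can be illustrated with a small sketch. It is written in Python rather than Tcl purely to show the arithmetic, and it is not how Tcl computes the value. It assumes the internal encoding is exactly as the patch text describes: CESU-8 style surrogate pairs for characters above U+FFFF, and a two-byte form for NUL. The function name modified_utf8_length is hypothetical.

```python
def modified_utf8_length(text: str) -> int:
    """Count bytes for text under the encoding the patch describes (assumed)."""
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            total += 2      # denormalized NUL, stored as two bytes (0xC0 0x80)
        elif cp < 0x80:
            total += 1      # ASCII
        elif cp < 0x800:
            total += 2
        elif cp < 0x10000:
            total += 3
        else:
            total += 6      # surrogate pair, three bytes each (CESU-8 style)
    return total


if __name__ == "__main__":
    # Print character count next to the assumed internal byte count.
    for sample in ["abc", "\u00e9", "\u20ac", "\x00", "\U0001F600"]:
        print(repr(sample), len(sample), modified_utf8_length(sample))
```

Under these assumptions the count differs from strict UTF-8 in two cases, NUL and characters outside the Basic Multilingual Plane, which is why the patch recommends converting to a known encoding and measuring that instead.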