diff options
Diffstat (limited to 'doc/string.n')
-rw-r--r-- | doc/string.n | 56 |
1 files changed, 46 insertions, 10 deletions
diff --git a/doc/string.n b/doc/string.n index 351c865..00ce85c 100644 --- a/doc/string.n +++ b/doc/string.n @@ -4,9 +4,9 @@ .\" .\" See the file "license.terms" for information on usage and redistribution .\" of this file, and for a DISCLAIMER OF ALL WARRANTIES. -.\" -.so man.macros +.\" .TH string n 8.1 Tcl "Tcl Built-In Commands" +.so man.macros .BS .\" Note: do not modify the .SH NAME line immediately below! .SH NAME @@ -19,6 +19,21 @@ string \- Manipulate strings Performs one of several string operations, depending on \fIoption\fR. The legal \fIoption\fRs (which may be abbreviated) are: .TP +\fBstring cat\fR ?\fIstring1\fR? ?\fIstring2...\fR? +.VS 8.6.2 +Concatenate the given \fIstring\fRs just like placing them directly +next to each other and return the resulting compound string. If no +\fIstring\fRs are present, the result is an empty string. +.RS +.PP +This primitive is occasionally handier than juxtaposition of strings +when mixed quoting is wanted, or when the aim is to return the result +of a concatenation without resorting to \fBreturn\fR \fB\-level 0\fR, +and is more efficient than building a list of arguments and using +\fBjoin\fR with an empty join string. +.RE +.VE +.TP \fBstring compare\fR ?\fB\-nocase\fR? ?\fB\-length\fI length\fR? \fIstring1 string2\fR . Perform a character-by-character comparison of strings \fIstring1\fR @@ -131,8 +146,9 @@ Any Unicode printing character, including space. .IP \fBpunct\fR 12 Any Unicode punctuation character. .IP \fBspace\fR 12 -Any Unicode whitespace character, zero width space (U+200b), -word joiner (U+2060) and zero width no-break space (U+feff) (=BOM). +Any Unicode whitespace character, mongolian vowel separator +(U+180e), zero width space (U+200b), word joiner (U+2060) or +zero width no-break space (U+feff) (=BOM). .IP \fBtrue\fR 12 Any of the forms allowed to \fBTcl_GetBoolean\fR where the value is true. @@ -343,10 +359,13 @@ misleading. \fBstring bytelength \fIstring\fR . Returns a decimal string giving the number of bytes used to represent -\fIstring\fR in memory. Because UTF\-8 uses one to three bytes to -represent Unicode characters, the byte length will not be the same as -the character length in general. The cases where a script cares about -the byte length are rare. +\fIstring\fR in memory when encoded as Tcl's internal modified UTF\-8; +Tcl may use other encodings for \fIstring\fR as well, and does not +guarantee to only use a single encoding for a particular \fIstring\fR. +Because UTF\-8 uses a variable number of bytes to represent Unicode +characters, the byte length will not be the same as the character +length in general. The cases where a script cares about the byte +length are rare. .RS .PP In almost all cases, you should use the @@ -354,10 +373,27 @@ In almost all cases, you should use the Tcl byte array value). Refer to the \fBTcl_NumUtfChars\fR manual entry for more details on the UTF\-8 representation. .PP +Formally, the \fBstring bytelength\fR operation returns the content of +the \fIlength\fR field of the \fBTcl_Obj\fR structure, after calling +\fBTcl_GetString\fR to ensure that the \fIbytes\fR field is populated. +This is highly unlikely to be useful to Tcl scripts, as Tcl's internal +encoding is not strict UTF\-8, but rather a modified CESU\-8 with a +denormalized NUL (identical to that used in a number of places by +Java's serialization mechanism) to enable basic processing with +non-Unicode-aware C functions. As this representation should only +ever be used by Tcl's implementation, the number of bytes used to +store the representation is of very low value (except to C extension +code, which has direct access for the purpose of memory management, +etc.) +.PP \fICompatibility note:\fR it is likely that this subcommand will be withdrawn in a future version of Tcl. It is better to use the \fBencoding convertto\fR command to convert a string to a known encoding and then apply \fBstring length\fR to that. +.PP +.CS +\fBstring length\fR [encoding convertto utf-8 $theString] +.CE .RE .TP \fBstring wordend \fIstring charIndex\fR @@ -407,7 +443,7 @@ would refer to the in .QW abcd ). .IP \fIM\fB+\fIN\fR 10 -The char specified at the integral index that is the sum of +The char specified at the integral index that is the sum of integer values \fIM\fR and \fIN\fR (e.g., .QW \fB1+1\fR would refer to the @@ -415,7 +451,7 @@ would refer to the in .QW abcd ). .IP \fIM\fB\-\fIN\fR 10 -The char specified at the integral index that is the difference of +The char specified at the integral index that is the difference of integer values \fIM\fR and \fIN\fR (e.g., .QW \fB2\-1\fR would refer to the |