Diffstat (limited to 'doc')
-rw-r--r-- | doc/ToUpper.3 | 10
-rw-r--r-- | doc/Utf.3     | 15
-rw-r--r-- | doc/string.n  | 68
3 files changed, 67 insertions, 26 deletions
diff --git a/doc/ToUpper.3 b/doc/ToUpper.3
index fd9ddfb..5456538 100644
--- a/doc/ToUpper.3
+++ b/doc/ToUpper.3
@@ -8,7 +8,7 @@
 .so man.macros
 .BS
 .SH NAME
-Tcl_UniCharToUpper, Tcl_UniCharToLower, Tcl_UniCharToTitle, Tcl_UtfToUpper, Tcl_UtfToLower, Tcl_UtfToTitle \- routines for manipulating the case of Unicode characters and UTF-8 strings
+Tcl_UniCharToUpper, Tcl_UniCharToLower, Tcl_UniCharFold, Tcl_UniCharToTitle, Tcl_UtfToUpper, Tcl_UtfToLower, Tcl_UtfToTitle \- routines for manipulating the case of Unicode characters and UTF-8 strings
 .SH SYNOPSIS
 .nf
 \fB#include <tcl.h>\fR
@@ -17,6 +17,9 @@ int
 \fBTcl_UniCharToUpper\fR(\fIch\fR)
 .sp
 int
+\fBTcl_UniCharFold\fR(\fIch\fR)
+.sp
+int
 \fBTcl_UniCharToLower\fR(\fIch\fR)
 .sp
 int
@@ -52,6 +55,11 @@ If \fIch\fR represents an upper-case character,
 character. If no lower-case character is defined, it returns the
 character unchanged.
 .PP
+If \fIch\fR represents an upper-case or lower-case character,
+\fBTcl_UniCharFold\fR returns the corresponding folded
+character. If no upper-case or lower-case character is defined, it returns the
+character unchanged.
+.PP
 If \fIch\fR represents a lower-case character,
 \fBTcl_UniCharToTitle\fR returns the corresponding title-case
 character. If no title-case character is defined, it returns the
diff --git a/doc/Utf.3 b/doc/Utf.3
--- a/doc/Utf.3
+++ b/doc/Utf.3
@@ -233,10 +233,10 @@ characters.
 .PP
 \fBTcl_UtfCharComplete\fR returns 1 if the source UTF-8 string \fIsrc\fR
 of \fIlength\fR bytes is long enough to be decoded by
-\fBTcl_UtfToUniChar\fR, or 0 otherwise. This function does not guarantee
-that the UTF-8 string is properly formed. This routine is used by
-procedures that are operating on a byte at a time and need to know if a
-full Unicode character has been seen.
+\fBTcl_UtfToUniChar\fR/\fBTcl_UtfNext\fR, or 0 otherwise. This function
+does not guarantee that the UTF-8 string is properly formed. This routine
+is used by procedures that are operating on a byte at a time and need to
+know if a full Unicode character has been seen.
 .PP
 \fBTcl_NumUtfChars\fR corresponds to \fBstrlen\fR for UTF-8 strings. It
 returns the number of Tcl_UniChars that are represented by the UTF-8 string
@@ -257,7 +257,8 @@ Given \fIsrc\fR, a pointer to some location in a UTF-8 string,
 \fBTcl_UtfNext\fR returns a pointer to the next UTF-8 character in the
 string. The caller must not ask for the next character after the last
 character in the string if the string is not terminated by a null
-character.
+character. \fBTcl_UtfCharComplete\fR can be used in that case to
+make sure enough bytes are available before calling \fBTcl_UtfNext\fR.
 .PP
 \fBTcl_UtfPrev\fR is used to step backward through but not beyond the
 UTF-8 string that begins at \fIstart\fR. If the UTF-8 string is made
@@ -274,12 +275,12 @@ always a pointer to a location in the string. It always returns a pointer
 to a byte that begins a character when scanning for characters beginning
 from \fIstart\fR. When \fIsrc\fR is greater than \fIstart\fR, it
 always returns a pointer less than \fIsrc\fR and greater than or
-equal to (\fIsrc\fR - \fBTCL_UTF_MAX\fR). The character that begins
+equal to (\fIsrc\fR - \fB4\fR). The character that begins
 at the returned pointer is the first one that either includes the byte
 \fIsrc[-1]\fR, or might include it if the right trail bytes are
 present at \fIsrc\fR and greater. \fBTcl_UtfPrev\fR never reads the
 byte \fIsrc[0]\fR nor the byte \fIstart[-1]\fR nor the byte
-\fIsrc[-\fBTCL_UTF_MAX\fI-1]\fR.
+\fIsrc[-\fB5\fI]\fR.
 .PP
 \fBTcl_UniCharAtIndex\fR corresponds to a C string array dereference or the
 Pascal Ord() function. It returns the Unicode character represented at the
diff --git a/doc/string.n b/doc/string.n
index 7cd53ca..f9cc373 100644
--- a/doc/string.n
+++ b/doc/string.n
@@ -33,6 +33,20 @@ and is more efficient than building a list of arguments and using
 \fBjoin\fR with an empty join string.
 .RE
 .TP
+\fBstring charend \fIstring charIndex\fR
+.
+Returns the index of the next character just after position \fIcharIndex\fR
+in the \fIstring\fR. \fIcharIndex\fR may be specified using the forms in
+\fBSTRING INDICES\fR. If position \fIcharIndex\fR of \fIstring\fR holds a
+character > U+FFFF, the returned index will be 2 higher than \fIcharIndex\fR.
+.TP
+\fBstring charstart \fIstring charIndex\fR
+.
+Returns the index of the character containing \fIcharIndex\fR in the \fIstring\fR.
+\fIcharIndex\fR may be specified using the forms in \fBSTRING INDICES\fR.
+Normally this will return \fIcharIndex\fR, except if position \fIcharIndex\fR-1
+holds a character > U+FFFF: in that case the returned index will be one lower.
+.TP
 \fBstring compare\fR ?\fB\-nocase\fR? ?\fB\-length\fI length\fR? \fIstring1 string2\fR
 .
 Perform a character-by-character comparison of strings \fIstring1\fR
@@ -223,6 +237,24 @@ number of bytes used to store the string. If the value is a byte array
 value (such as those returned from reading a binary encoded channel),
 then this will return the actual byte length of the value.
 .TP
+\fBstring lineend \fIstring charIndex\fR
+.
+Returns the index of the character just after the last one in the word
+containing character \fIcharIndex\fR of \fIstring\fR. \fIcharIndex\fR
+may be specified using the forms in \fBSTRING INDICES\fR. A word is
+considered to be any contiguous range of alphanumeric (Unicode letters
+or decimal digits) or underscore (Unicode connector punctuation)
+characters, or any single character other than these.
+.TP
+\fBstring linestart \fIstring charIndex\fR
+.
+Returns the index of the first character in the word containing character
+\fIcharIndex\fR of \fIstring\fR. \fIcharIndex\fR may be specified using the
+forms in \fBSTRING INDICES\fR. A word is considered to be any contiguous
+range of alphanumeric (Unicode letters or decimal digits) or underscore
+(Unicode connector punctuation) characters, or any single character other than
+these.
+.TP
 \fBstring map\fR ?\fB\-nocase\fR? \fImapping string\fR
 .
 Replaces substrings in \fIstring\fR based on the key-value pairs in
@@ -371,6 +403,24 @@ characters present in the string given by \fIchars\fR are removed. If
 \fIchars\fR is not specified then white space is removed (any character
 for which \fBstring is space\fR returns 1, and "\e0").
 .TP
+\fBstring wordend \fIstring charIndex\fR
+.
+Returns the index of the character just after the last one in the word
+containing character \fIcharIndex\fR of \fIstring\fR. \fIcharIndex\fR
+may be specified using the forms in \fBSTRING INDICES\fR. A word is
+considered to be any contiguous range of alphanumeric (Unicode letters
+or decimal digits) or underscore (Unicode connector punctuation)
+characters, or any single character other than these.
+.TP
+\fBstring wordstart \fIstring charIndex\fR
+.
+Returns the index of the first character in the word containing character
+\fIcharIndex\fR of \fIstring\fR. \fIcharIndex\fR may be specified using the
+forms in \fBSTRING INDICES\fR. A word is considered to be any contiguous
+range of alphanumeric (Unicode letters or decimal digits) or underscore
+(Unicode connector punctuation) characters, or any single character other than
+these.
+.TP
 \fBstring trimright \fIstring\fR ?\fIchars\fR?
 .
 Returns a value equal to \fIstring\fR except that any trailing
@@ -422,24 +472,6 @@ encoding and then apply \fBstring length\fR to that.
 \fBstring length\fR [encoding convertto utf-8 $theString]
 .CE
 .RE
-.TP
-\fBstring wordend \fIstring charIndex\fR
-.
-Returns the index of the character just after the last one in the word
-containing character \fIcharIndex\fR of \fIstring\fR. \fIcharIndex\fR
-may be specified using the forms in \fBSTRING INDICES\fR. A word is
-considered to be any contiguous range of alphanumeric (Unicode letters
-or decimal digits) or underscore (Unicode connector punctuation)
-characters, or any single character other than these.
-.TP
-\fBstring wordstart \fIstring charIndex\fR
-.
-Returns the index of the first character in the word containing character
-\fIcharIndex\fR of \fIstring\fR. \fIcharIndex\fR may be specified using the
-forms in \fBSTRING INDICES\fR. A word is considered to be any contiguous
-range of alphanumeric (Unicode letters or decimal digits) or underscore
-(Unicode connector punctuation) characters, or any single character other than
-these.
 .SH "STRING INDICES"
 .PP
 When referring to indices into a string (e.g., for \fBstring index\fR
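The Utf.3 change recommends checking a buffer with Tcl_UtfCharComplete before calling Tcl_UtfNext when the string is not null-terminated. The following is a minimal sketch of that pattern in plain C, written without linking against Tcl. The names utf8_seq_len, utf8_char_complete, utf8_next and count_complete_chars are hypothetical stand-ins invented for this illustration, and the lead-byte length table is a simplified assumption, not Tcl's actual implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many bytes a sequence starting with lead byte b
 * claims to occupy. Simplified model, not Tcl's internal lookup table. */
static size_t utf8_seq_len(unsigned char b) {
    if (b < 0xC0) return 1;  /* ASCII, or a stray continuation byte */
    if (b < 0xE0) return 2;
    if (b < 0xF0) return 3;
    if (b < 0xF8) return 4;
    return 1;                /* not a valid lead byte: step over it alone */
}

/* Stand-in for Tcl_UtfCharComplete: 1 if length bytes at src are enough to
 * decode one character, else 0. Trail bytes are not validated. */
static int utf8_char_complete(const char *src, size_t length) {
    return length > 0 && length >= utf8_seq_len((unsigned char)src[0]);
}

/* Stand-in for Tcl_UtfNext: start of the next character. Only reads bytes
 * inside the sequence announced by the lead byte, and stops early at the
 * first byte that is not a continuation byte (10xxxxxx). */
static const char *utf8_next(const char *src) {
    size_t n = utf8_seq_len((unsigned char)*src);
    const char *p = src + 1;
    while (--n > 0 && ((unsigned char)*p & 0xC0) == 0x80) {
        p++;
    }
    return p;
}

/* Count complete characters in a buffer that is NOT null-terminated.
 * Completeness is checked before every step, so no byte at or beyond
 * src + length is read. Bytes of a truncated final character are
 * reported through *leftover. */
static size_t count_complete_chars(const char *src, size_t length, size_t *leftover) {
    const char *p = src;
    const char *end = src + length;
    size_t count = 0;

    assert(leftover != NULL);
    while (p < end && utf8_char_complete(p, (size_t)(end - p))) {
        p = utf8_next(p);
        count++;
    }
    *leftover = (size_t)(end - p);
    return count;
}
```

For the eight bytes of "aé€" followed by only the first two bytes of the four-byte encoding of U+1F600, count_complete_chars reports 3 characters and 2 leftover bytes: the completeness check stops the loop instead of letting the step function read past the buffer.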
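The string.n entries for charstart and charend describe an index space in which a character above U+FFFF occupies two positions. The following C sketch models that with an array of UTF-16 code units, where such a character is stored as a surrogate pair. char_start and char_end are hypothetical names, and the surrogate-pair model is an assumption about how these indices behave. Following "the index of the character containing charIndex", char_start steps back one position when the index falls on the second half of a pair. Having char_end return one past that half is an inference that the manual text does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the string is modelled as an array of UTF-16 code units, so a
 * character above U+FFFF occupies two index positions (a surrogate pair). */
static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Hypothetical model of "string charstart": index of the character that
 * contains position i. Steps back one position when i is the second half
 * of a surrogate pair, otherwise returns i unchanged. */
static size_t char_start(const uint16_t *s, size_t len, size_t i) {
    assert(i < len);  /* out-of-range indices are not modelled here */
    if (i > 0 && is_low_surrogate(s[i]) && is_high_surrogate(s[i - 1])) {
        return i - 1;
    }
    return i;
}

/* Hypothetical model of "string charend": index just after the character
 * containing position i, i.e. start + 2 for a surrogate pair, else start + 1. */
static size_t char_end(const uint16_t *s, size_t len, size_t i) {
    size_t start = char_start(s, len, i);
    if (start + 1 < len && is_high_surrogate(s[start]) && is_low_surrogate(s[start + 1])) {
        return start + 2;
    }
    return start + 1;
}
```

For the code units {0x0061, 0xD83D, 0xDE00, 0x0062} ("a", U+1F600, "b"), char_end(s, 4, 1) is 3, two higher than the index, and char_start(s, 4, 2) is 1, one lower.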