author | jan.nijtmans <nijtmans@users.sourceforge.net> | 2021-03-15 11:52:09 (GMT) |
---|---|---|
committer | jan.nijtmans <nijtmans@users.sourceforge.net> | 2021-03-15 11:52:09 (GMT) |
commit | 6e657cfb83595bc0481d833b708554a44e2142fb (patch) | |
tree | 1ca1c944beacd08dc50a55b0e6cf69a1c9d2e739 /doc/Utf.3 | |
parent | 57bf9fe12f8859459556da1de27dbdef24048a68 (diff) | |
parent | 185b0d14932f4cc8503e6dd235da5bd90ebc777c (diff) | |
Implement TIP #575: Switchable Tcl_UtfCharComplete()/Tcl_UtfNext()/Tcl_UtfPrev()
Diffstat (limited to 'doc/Utf.3')
-rw-r--r-- | doc/Utf.3 | 15 |
1 files changed, 8 insertions, 7 deletions
@@ -233,10 +233,10 @@ characters.
 .PP
 \fBTcl_UtfCharComplete\fR returns 1 if the source UTF-8 string \fIsrc\fR
 of \fIlength\fR bytes is long enough to be decoded by
-\fBTcl_UtfToUniChar\fR, or 0 otherwise. This function does not guarantee
-that the UTF-8 string is properly formed. This routine is used by
-procedures that are operating on a byte at a time and need to know if a
-full Unicode character has been seen.
+\fBTcl_UtfToUniChar\fR/\fBTcl_UtfNext\fR, or 0 otherwise. This function
+does not guarantee that the UTF-8 string is properly formed. This routine
+is used by procedures that are operating on a byte at a time and need to
+know if a full Unicode character has been seen.
 .PP
 \fBTcl_NumUtfChars\fR corresponds to \fBstrlen\fR for UTF-8 strings. It
 returns the number of Tcl_UniChars that are represented by the UTF-8 string
@@ -257,7 +257,8 @@ Given \fIsrc\fR, a pointer to some location in a UTF-8 string,
 \fBTcl_UtfNext\fR returns a pointer to the next UTF-8 character in the
 string. The caller must not ask for the next character after the last
 character in the string if the string is not terminated by a null
-character.
+character. \fBTcl_UtfCharComplete\fR can be used in that case to
+make sure enough bytes are available before calling \fBTcl_UtfNext\fR.
 .PP
 \fBTcl_UtfPrev\fR is used to step backward through but not beyond the
 UTF-8 string that begins at \fIstart\fR. If the UTF-8 string is made
@@ -274,12 +275,12 @@ always a pointer to a location in the string. It always returns a pointer to
 a byte that begins a character when scanning for characters beginning
 from \fIstart\fR. When \fIsrc\fR is greater than \fIstart\fR, it
 always returns a pointer less than \fIsrc\fR and greater than or
-equal to (\fIsrc\fR - \fBTCL_UTF_MAX\fR). The character that begins
+equal to (\fIsrc\fR - 4). The character that begins
 at the returned pointer is the first one that either includes the
 byte \fIsrc[-1]\fR, or might include it if the right trail bytes are
 present at \fIsrc\fR and greater. \fBTcl_UtfPrev\fR never reads the
 byte \fIsrc[0]\fR nor the byte \fIstart[-1]\fR nor the byte
-\fIsrc[-\fBTCL_UTF_MAX\fI-1]\fR.
+\fIsrc[-5]\fR.
 .PP
 \fBTcl_UniCharAtIndex\fR corresponds to a C string array dereference or the
 Pascal Ord() function. It returns the Unicode character represented at the
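
The second hunk recommends checking that enough bytes are available before stepping forward in a buffer that is not null-terminated. The sketch below illustrates that pattern in plain C without linking against Tcl. The helpers `lead_len`, `char_complete`, and `count_complete` are hypothetical stand-ins written for this illustration. They are not the Tcl routines, and they assume the usual UTF-8 lead-byte length rules rather than Tcl's actual decoding logic. Like the routine the man page describes, the check only looks at lengths and does not guarantee that the bytes are properly formed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes implied by a lead byte.
 * Bytes that cannot start a multi-byte sequence count as one unit. */
static size_t lead_len(unsigned char b)
{
    if (b < 0xC0) return 1;   /* ASCII or a stray trail byte */
    if (b < 0xE0) return 2;
    if (b < 0xF0) return 3;
    if (b < 0xF8) return 4;
    return 1;                 /* not a usable lead byte */
}

/* Hypothetical stand-in for the "is this character complete?" check:
 * returns 1 if `length` bytes at `src` are enough to hold the character
 * announced by the lead byte, 0 otherwise. Trail bytes are not validated. */
static int char_complete(const char *src, size_t length)
{
    if (length == 0) return 0;
    return length >= lead_len((unsigned char)src[0]);
}

/* Walk a buffer that is NOT null-terminated. The completeness check runs
 * before every step, so the scan never reads past buf + len. Returns the
 * number of complete characters; *leftover receives the count of trailing
 * bytes that belong to an unfinished character. */
static size_t count_complete(const char *buf, size_t len, size_t *leftover)
{
    size_t pos = 0, n = 0;

    assert(buf != NULL || len == 0);
    while (pos < len && char_complete(buf + pos, len - pos)) {
        pos += lead_len((unsigned char)buf[pos]);
        n++;
    }
    *leftover = len - pos;
    return n;
}
```

A caller reading from a stream could keep the `leftover` bytes and prepend them to the next chunk. In code that uses Tcl itself, the same guard would be written with `Tcl_UtfCharComplete` before each `Tcl_UtfNext` call, as the updated man page text suggests.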