diff options
author | kjnash <k.j.nash@usa.net> | 2020-06-19 12:37:17 (GMT) |
---|---|---|
committer | kjnash <k.j.nash@usa.net> | 2020-06-19 12:37:17 (GMT) |
commit | 4b7534f6ab5679c6ad37ceadc11d6ac0b16d5b5c (patch) | |
tree | 1e4788149c5cf047fe29cfcaaf208fb0f1829d1d /doc/Utf.3 | |
parent | f14b27d11940c7ec28136bcd04a21b6f75870ad9 (diff) | |
parent | 0a8c9a817e923652ceea7b0c797e5b15e80e0a77 (diff) | |
download | tcl-4b7534f6ab5679c6ad37ceadc11d6ac0b16d5b5c.zip tcl-4b7534f6ab5679c6ad37ceadc11d6ac0b16d5b5c.tar.gz tcl-4b7534f6ab5679c6ad37ceadc11d6ac0b16d5b5c.tar.bz2 |
Merge 8.6
Diffstat (limited to 'doc/Utf.3')
-rw-r--r-- | doc/Utf.3 | 34 |
1 files changed, 24 insertions, 10 deletions
@@ -141,8 +141,8 @@ source buffer is long enough such that this routine does not run off the end and dereference non-existent or random memory; if the source buffer is known to be null-terminated, this will not happen. If the input is not in proper UTF-8 format, \fBTcl_UtfToUniChar\fR will store the first -byte of \fIsrc\fR in \fI*chPtr\fR as a Tcl_UniChar between 0x0000 and -0x00ff and return 1. +byte of \fIsrc\fR in \fI*chPtr\fR as a Tcl_UniChar between 0x0080 and +0x00FF and return 1. .PP \fBTcl_UniCharToUtfDString\fR converts the given Unicode string to UTF-8, storing the result in a previously initialized \fBTcl_DString\fR. @@ -223,13 +223,27 @@ string. The caller must not ask for the next character after the last character in the string if the string is not terminated by a null character. .PP -Given \fIsrc\fR, a pointer to some location in a UTF-8 string (or to a -null byte immediately following such a string), \fBTcl_UtfPrev\fR -returns a pointer to the closest preceding byte that starts a UTF-8 -character. -This function will not back up to a position before \fIstart\fR, -the start of the UTF-8 string. If \fIsrc\fR was already at \fIstart\fR, the -return value will be \fIstart\fR. +\fBTcl_UtfPrev\fR is used to step backward through but not beyond the +UTF-8 string that begins at \fIstart\fR. If the UTF-8 string is made +up entirely of complete and well-formed characters, and \fIsrc\fR points +to the lead byte of one of those characters (or to the location one byte +past the end of the string), then repeated calls of \fBTcl_UtfPrev\fR will +return pointers to the lead bytes of each character in the string, one +character at a time, terminating when it returns \fIstart\fR. +.PP +When the conditions of completeness and well-formedness may not be satisfied, +a more precise description of the function of \fBTcl_UtfPrev\fR is necessary. +It always returns a pointer greater than or equal to \fIstart\fR; that is, +always a pointer to a location in the string. It always returns a pointer to +a byte that begins a character when scanning for characters beginning +from \fIstart\fR. When \fIsrc\fR is greater than \fIstart\fR, it +always returns a pointer less than \fIsrc\fR and greater than or +equal to (\fIsrc\fR - \fBTCL_UTF_MAX\fR). The character that begins +at the returned pointer is the first one that either includes the +byte \fIsrc[-1]\fR, or might include it if the right trail bytes are +present at \fIsrc\fR and greater. \fBTcl_UtfPrev\fR never reads the +byte \fIsrc[0]\fR nor the byte \fIstart[-1]\fR nor the byte +\fIsrc[-\fBTCL_UTF_MAX\fI-1]\fR. .PP \fBTcl_UniCharAtIndex\fR corresponds to a C string array dereference or the Pascal Ord() function. It returns the Tcl_UniChar represented at the @@ -240,7 +254,7 @@ characters. Behavior is undefined if a negative \fIindex\fR is given. \fBTcl_UtfAtIndex\fR returns a pointer to the specified character (not byte) \fIindex\fR in the UTF-8 string \fIsrc\fR. The source string must contain at least \fIindex\fR characters. This is equivalent to calling -\fBTcl_UtfNext\fR \fIindex\fR times. If a negative \fIindex\fR is given, +\fBTcl_UtfToUniChar\fR \fIindex\fR times. If a negative \fIindex\fR is given, the return pointer points to the first character in the source string. .PP \fBTcl_UtfBackslash\fR is a utility procedure used by several of the Tcl |