author jan.nijtmans <nijtmans@users.sourceforge.net> 2021-03-14 16:12:25 (GMT)
committer jan.nijtmans <nijtmans@users.sourceforge.net> 2021-03-14 16:12:25 (GMT)
commit b0e0d4b618d58c962735cb62982229a8f67fb632
tree de0303a5df24ef02ac0c0c66924c0b974087cf88 /doc
parent 2ea2ef0609d7e306bf981672cda2e66782ed4db3
Document that Tcl_UtfCharComplete() can (now) be used to protect Tcl_UtfNext() calls against overflow, if the string being handled is not NULL-terminated.
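
A minimal sketch of the guard this commit documents, for a length-delimited buffer with no terminating null. So that it compiles without tcl.h, it uses simplified hypothetical stand-ins (sketch_seq_len, sketch_char_complete, sketch_next) that only approximate the documented behaviour; they are not Tcl's implementation. In real code the two calls in the loop would be Tcl_UtfCharComplete(p, end - p) and Tcl_UtfNext(p).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes announced by a UTF-8 lead byte (simplified). */
int sketch_seq_len(unsigned char lead)
{
    if (lead < 0xC0) return 1;   /* ASCII or stray continuation byte */
    if (lead < 0xE0) return 2;
    if (lead < 0xF0) return 3;
    if (lead < 0xF8) return 4;
    return 1;                    /* invalid lead byte: treat as one byte */
}

/* Stand-in for Tcl_UtfCharComplete: 1 if `length` bytes are enough to
 * decode the character starting at src, 0 otherwise. Like the real
 * routine, it does not check that the sequence is well formed. */
int sketch_char_complete(const char *src, int length)
{
    if (length <= 0) return 0;
    return length >= sketch_seq_len((unsigned char)src[0]);
}

/* Stand-in for Tcl_UtfNext: may read every byte of the announced
 * sequence, which is why the caller must check completeness first. */
const char *sketch_next(const char *src)
{
    int need = sketch_seq_len((unsigned char)src[0]);
    int i;
    for (i = 1; i < need; i++) {
        if (((unsigned char)src[i] & 0xC0) != 0x80) {
            return src + 1;      /* malformed: step over a single byte */
        }
    }
    return src + need;
}

/* Count the characters in buf[0..len) without reading past the end.
 * *leftover receives the number of trailing bytes that belong to an
 * incomplete character (0 if the buffer ends on a boundary). */
int count_chars_bounded(const char *buf, int len, int *leftover)
{
    const char *p = buf;
    const char *end = buf + len;
    int count = 0;

    assert(buf != NULL && len >= 0 && leftover != NULL);
    while (p < end && sketch_char_complete(p, (int)(end - p))) {
        p = sketch_next(p);      /* safe: enough bytes are available */
        count++;
    }
    *leftover = (int)(end - p);
    return count;
}
```

The completeness check runs before every step, so a multi-byte sequence truncated at the end of the buffer stops the loop and is reported through leftover rather than being read past the end.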
Diffstat (limited to 'doc')
-rw-r--r-- doc/Utf.3 15
1 files changed, 8 insertions, 7 deletions
diff --git a/doc/Utf.3 b/doc/Utf.3
index cca6498..9687eb6 100644
--- a/doc/Utf.3
+++ b/doc/Utf.3
@@ -141,8 +141,8 @@ source buffer is long enough such that this routine does not run off the
end and dereference non-existent or random memory; if the source buffer
is known to be null-terminated, this will not happen. If the input is
not in proper UTF-8 format, \fBTcl_UtfToUniChar\fR will store the first
-byte of \fIsrc\fR in \fI*chPtr\fR as a Tcl_UniChar between 0x0080 and
-0x00FF and return 1.
+byte of \fIsrc\fR in \fI*chPtr\fR as a Tcl_UniChar between 0x80 and
+0xFF and return 1.
.PP
\fBTcl_UniCharToUtfDString\fR converts the given Unicode string
to UTF-8, storing the result in a previously initialized \fBTcl_DString\fR.
@@ -197,10 +197,10 @@ characters.
.PP
\fBTcl_UtfCharComplete\fR returns 1 if the source UTF-8 string \fIsrc\fR
of \fIlength\fR bytes is long enough to be decoded by
-\fBTcl_UtfToUniChar\fR, or 0 otherwise. This function does not guarantee
-that the UTF-8 string is properly formed. This routine is used by
-procedures that are operating on a byte at a time and need to know if a
-full Tcl_UniChar has been seen.
+\fBTcl_UtfToUniChar\fR/\fBTcl_UtfNext\fR, or 0 otherwise. This function
+does not guarantee that the UTF-8 string is properly formed. This routine
+is used by procedures that are operating on a byte at a time and need to
+know if a full Tcl_UniChar has been seen.
.PP
\fBTcl_NumUtfChars\fR corresponds to \fBstrlen\fR for UTF-8 strings. It
returns the number of Tcl_UniChars that are represented by the UTF-8 string
@@ -221,7 +221,8 @@ Given \fIsrc\fR, a pointer to some location in a UTF-8 string,
\fBTcl_UtfNext\fR returns a pointer to the next UTF-8 character in the
string. The caller must not ask for the next character after the last
character in the string if the string is not terminated by a null
-character.
+character. \fBTcl_UtfCharComplete\fR can be used in that case to
+make sure enough bytes are available before calling \fBTcl_UtfNext\fR.
.PP
\fBTcl_UtfPrev\fR is used to step backward through but not beyond the
UTF-8 string that begins at \fIstart\fR. If the UTF-8 string is made