Fix "string tolower" and friends for handling unpaired surrogates correctly. Also add test-cases for those situations.

Various typos in comments.
author: jan.nijtmans <nijtmans@users.sourceforge.net> 2018-06-24 20:26:27 (GMT)
committer: jan.nijtmans <nijtmans@users.sourceforge.net> 2018-06-24 20:26:27 (GMT)
commit: 94f9cf81ed3e156bd372a3cac249974d1acb4e1d (patch)
tree: 08b03824658d05ae51a9f7494bed57eb8dd6d8d4 /doc/Utf.3
parent: 5c87050dd8b6765c40eeef94ab5773d955c3de17 (diff)
1 files changed, 4 insertions, 1 deletions
diff --git a/doc/Utf.3 b/doc/Utf.3
index 160575b..922fd81 100644
--- a/doc/Utf.3
+++ b/doc/Utf.3
@@ -132,7 +132,10 @@ represent one Unicode character in the UTF-8 representation.
 .PP
 \fBTcl_UniCharToUtf\fR stores the character \fIch\fR as a UTF-8 string
 in starting at \fIbuf\fR.  The return value is the number of bytes stored
-in \fIbuf\fR.
+in \fIbuf\fR. If ch is an upper surrogate (range U+D800 - U+DBFF), then
+the return value will be 0 and nothing will be stored. If you still
+want to produce UTF-8 output for it (even though knowing it's an illegal
+code-point on its own), just call \fBTcl_UniCharToUtf\fR again using ch = -1.
 .PP
 \fBTcl_UtfToUniChar\fR reads one UTF-8 character starting at \fIsrc\fR
 and stores it as a Tcl_UniChar in \fI*chPtr\fR.  The return value is the
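
To illustrate the calling pattern the added paragraph describes, here is a minimal, self-contained sketch. It does not call into Tcl: model_unichar_to_utf, encode_utf8, pending_high and model_demo are hypothetical names invented for this example, and the function only models the documented contract (return 0 for an upper surrogate, then pass ch = -1 to emit it anyway). Combining a pending upper surrogate with a following lower surrogate into one 4-byte sequence is an assumption about why nothing is stored at first; the patch text itself does not state it. The real Tcl_UniCharToUtf may keep its state differently and its behavior can vary between Tcl versions.

```c
#include <assert.h>

/* Hypothetical model only, NOT Tcl's implementation. */
static int pending_high = 0;   /* remembered upper surrogate, 0 if none */

/* Plain UTF-8 encoder; out-of-range values become U+FFFD (a choice made
 * for this model, not taken from the patch). Returns bytes written. */
static int encode_utf8(unsigned cp, char *buf)
{
    if (cp > 0x10FFFF) cp = 0xFFFD;
    if (cp < 0x80) {
        buf[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        buf[0] = (char)(0xC0 | (cp >> 6));
        buf[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        buf[0] = (char)(0xE0 | (cp >> 12));
        buf[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    buf[0] = (char)(0xF0 | (cp >> 18));
    buf[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    buf[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    buf[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Models the documented contract: 0 for an upper surrogate, ch == -1
 * flushes a pending one as a (formally illegal) 3-byte sequence. */
static int model_unichar_to_utf(int ch, char *buf)
{
    if (ch == -1) {
        int lone = pending_high;
        pending_high = 0;
        return lone ? encode_utf8((unsigned)lone, buf) : 0;
    }
    if (ch >= 0xD800 && ch <= 0xDBFF) {
        pending_high = ch;      /* store nothing yet */
        return 0;
    }
    if (ch >= 0xDC00 && ch <= 0xDFFF && pending_high) {
        /* Assumed behavior: combine the pair into one code point. */
        unsigned cp = 0x10000
            + (((unsigned)pending_high - 0xD800) << 10)
            + ((unsigned)ch - 0xDC00);
        pending_high = 0;
        return encode_utf8(cp, buf);
    }
    pending_high = 0;           /* this model drops an unflushed surrogate */
    return encode_utf8((unsigned)ch, buf);
}

/* Lone-surrogate walk-through; returns the byte count of the flush. */
static int model_demo(char *out)
{
    int n = model_unichar_to_utf(0xD800, out);   /* nothing stored */
    assert(n == 0);
    return model_unichar_to_utf(-1, out);        /* force output */
}
```

In this model, a caller that gets 0 back for an upper surrogate either supplies the matching lower surrogate next or passes -1 to force output of the unpaired value.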