author jan.nijtmans <nijtmans@users.sourceforge.net> 2023-02-01 08:10:12 (GMT)
committer jan.nijtmans <nijtmans@users.sourceforge.net> 2023-02-01 08:10:12 (GMT)
commit 3eaad4bbc95c9cb3eaaf79872646d4fa7f6d8c6e
tree 4952179bcfbada1be4941093c77d7a531dc7f135 /generic/tclUtf.c
parent 537c8e77ba967fbb6f2ef1d7b2134420a3117bad
(cherry-pick) Make Tcl_UniCharToUtf more readable and add test to exercise surrogate handling. (test-case was still missing, which cannot be used in Tcl 8.6)
Diffstat (limited to 'generic/tclUtf.c')
-rw-r--r-- generic/tclUtf.c 14
1 files changed, 6 insertions, 8 deletions
diff --git a/generic/tclUtf.c b/generic/tclUtf.c
index db2be84..cb8bb3e 100644
--- a/generic/tclUtf.c
+++ b/generic/tclUtf.c
@@ -185,17 +185,15 @@ Invalid(
* Stores the given Tcl_UniChar as a sequence of UTF-8 bytes in the
* provided buffer. Equivalent to Plan 9 runetochar().
*
- * Special handling of Surrogate pairs is handled as follows:
- * When this function is called for ch being a high surrogate,
- * the first byte of the 4-byte UTF-8 sequence is produced and
- * the function returns 1. Calling the function again with a
- * low surrogate, the remaining 3 bytes of the 4-byte UTF-8
- * sequence is produced, and the function returns 3. The buffer
- * is used to remember the high surrogate between the two calls.
+ * Surrogate pairs are handled as follows: When ch is a high surrogate,
+ * the first byte of the 4-byte UTF-8 sequence is stored in the buffer and
+ * the function returns 1. If the function is called again with a low
+ * surrogate and the same buffer, the remaining 3 bytes of the 4-byte
+ * UTF-8 sequence are produced.
*
* If no low surrogate follows the high surrogate (which is actually
* illegal), this can be handled reasonably by calling Tcl_UniCharToUtf
- * again with ch = -1. This will produce a 3-byte UTF-8 sequence
+ * again with ch = -1. This produces a 3-byte UTF-8 sequence
* representing the high surrogate.
*
* Results:
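The comment above describes a two-call protocol for surrogate pairs. The block below is a minimal sketch of that protocol, not Tcl's actual `Tcl_UniCharToUtf`. The names `put_unit` and `encode_utf16` are hypothetical. The way a pending high surrogate is recognized (a 4-byte lead byte sitting just before the write position, with two leftover bits parked in the buffer past the returned count) is an assumption of this sketch. As the comment states, it returns 1 for a high surrogate and 3 for the low surrogate that follows. On `ch = -1` it rewrites a dangling high surrogate as its 3-byte encoding, returning 2 so that the total is 3 bytes; that return value is this sketch's own choice.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: put_unit() and encode_utf16() are illustrative names,
 * not Tcl API. put_unit() follows the calling protocol described for
 * Tcl_UniCharToUtf: write at buf + pos, then advance pos by the return value.
 * Assumptions: buf[0..pos) holds only bytes produced by put_unit, and there is
 * room at buf + pos for the bytes it writes (at most 4).
 */
static size_t put_unit(unsigned char *buf, size_t pos, int ch)
{
    unsigned char *p;
    int pending;

    assert(buf != NULL);
    p = buf + pos;
    /* A complete UTF-8 sequence never ends in a 4-byte lead byte (0xF0-0xF7),
     * so one just before pos means a high surrogate is still waiting. */
    pending = pos > 0 && (buf[pos - 1] & 0xF8) == 0xF0;

    if (ch == -1) {
        unsigned int u, hi;
        if (!pending) {
            return 0;                               /* nothing to flush */
        }
        /* Recover the 11 bits u saved by the high-surrogate call. */
        u = ((buf[pos - 1] & 0x07u) << 8) | ((p[0] & 0x3Fu) << 2)
                | ((p[1] >> 4) & 0x03u);
        hi = 0xD800u | (u - 0x40u);
        /* Rewrite the dangling lead byte as the 3-byte encoding of hi. */
        p[-1] = (unsigned char) (0xE0u | (hi >> 12));
        p[0] = (unsigned char) (0x80u | ((hi >> 6) & 0x3Fu));
        p[1] = (unsigned char) (0x80u | (hi & 0x3Fu));
        return 2;                                   /* 1 byte counted earlier + 2 */
    }
    if (ch >= 0xD800 && ch <= 0xDBFF) {
        /* u = bits 20..10 of the final code point (the 0x10000 offset adds 0x40). */
        unsigned int u = ((unsigned int) ch & 0x3FFu) + 0x40u;
        p[0] = (unsigned char) (0xF0u | (u >> 8));           /* lead byte */
        p[1] = (unsigned char) (0x80u | ((u >> 2) & 0x3Fu)); /* final 2nd byte */
        p[2] = (unsigned char) ((u & 0x03u) << 4);           /* scratch bits */
        return 1;
    }
    if (ch >= 0xDC00 && ch <= 0xDFFF && pending) {
        unsigned int lo = (unsigned int) ch & 0x3FFu;
        /* p[0] is already final; merge the scratch bits in p[1] with lo. */
        p[1] = (unsigned char) (0x80u | (p[1] & 0x30u) | (lo >> 6));
        p[2] = (unsigned char) (0x80u | (lo & 0x3Fu));
        return 3;
    }
    if (ch < 0 || ch > 0x10FFFF) {
        ch = 0xFFFD;                                /* sketch choice: U+FFFD */
    }
    if (ch < 0x80) {
        p[0] = (unsigned char) ch;
        return 1;
    }
    if (ch < 0x800) {
        p[0] = (unsigned char) (0xC0 | (ch >> 6));
        p[1] = (unsigned char) (0x80 | (ch & 0x3F));
        return 2;
    }
    if (ch < 0x10000) {                             /* includes lone low surrogates */
        p[0] = (unsigned char) (0xE0 | (ch >> 12));
        p[1] = (unsigned char) (0x80 | ((ch >> 6) & 0x3F));
        p[2] = (unsigned char) (0x80 | (ch & 0x3F));
        return 3;
    }
    p[0] = (unsigned char) (0xF0 | (ch >> 18));
    p[1] = (unsigned char) (0x80 | ((ch >> 12) & 0x3F));
    p[2] = (unsigned char) (0x80 | ((ch >> 6) & 0x3F));
    p[3] = (unsigned char) (0x80 | (ch & 0x3F));
    return 4;
}

/* Usage: encode n UTF-16 code units; out needs room for 3 * n + 1 bytes. */
static size_t encode_utf16(const unsigned short *units, size_t n,
                           unsigned char *out)
{
    size_t pos = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        int ch = units[i];
        if (!(ch >= 0xDC00 && ch <= 0xDFFF)) {
            pos += put_unit(out, pos, -1);  /* no-op unless a high surrogate dangles */
        }
        pos += put_unit(out, pos, ch);
    }
    return pos + put_unit(out, pos, -1);    /* flush a trailing high surrogate */
}
```

For example, the code units `0xD83D 0xDE00` (U+1F600) come out as `F0 9F 98 80`, while a lone `0xD83D` followed by `A` comes out as `ED A0 BD 41`. The sketch detects a pending high surrogate from the byte before the write position rather than from the bytes at it, so the check never depends on whatever the buffer held before the call.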