author | dgp <dgp@users.sourceforge.net> | 2020-04-16 19:02:40 (GMT) |
---|---|---|
committer | dgp <dgp@users.sourceforge.net> | 2020-04-16 19:02:40 (GMT) |
commit | 60d5424069845124e51d4032f295d913a17454a1 (patch) | |
tree | 9c2a9dbd34c10c552182aef6a874c34bfaea77ab /generic/tclUtf.c | |
parent | 443928c10f1ac94e6a6adfafb478eb9fa09ac39a (diff) | |
More detailed comments.
Diffstat (limited to 'generic/tclUtf.c')
-rw-r--r-- | generic/tclUtf.c | 70 |
1 files changed, 46 insertions, 24 deletions
diff --git a/generic/tclUtf.c b/generic/tclUtf.c
index 91e9c73..67603af 100644
--- a/generic/tclUtf.c
+++ b/generic/tclUtf.c
@@ -678,13 +678,35 @@ Tcl_UtfFindLast(
  *
  * Tcl_UtfNext --
  *
- * Given a pointer to some current location in a UTF-8 string, move
- * forward one character. The caller must ensure that they are not asking
- * for the next character after the last character in the string.
+ * The aim of this routine is to provide a way to iterate forward
+ * through a UTF-8 string. The caller is expected to pass a non-NULL
+ * pointer argument /src/ which points to a location within a string.
+ * (*src) will be read, so /src/ must not point to an unreadable
+ * location past the end of the string. If /src/ points to the
+ * beginning of a complete, well-formed and valid UTF_8 byte sequence
+ * of no more than TCL_UTF_MAX bytes, Tcl_UtfNext returns the pointer
+ * just past the end of that sequence. In any other circumstance,
+ * Tcl_UtfNext returns /src/+1.
+ *
+ * Because this routine always returns a value > /src/, it is useful
+ * as a forward iterator that will always make progress. If the string
+ * is NUL-terminated, Tcl_UtfNext will not read beyond the terminating
+ * NUL character. If it is not NUL-terminated, the caller must make
+ * use of the companion routine Tcl_UtfCharComplete to test whether
+ * there is risk that Tcl_UtfNext will read beyond the end of the string.
+ * Tcl_UtfNext will never read more than TCL_UTF_MAX bytes.
+ *
+ * In a string where all characters are complete and properly formed,
+ * and /src/ points to the first byte of a character, repeated
+ * Tcl_UtfNext calls will step to the starting bytes of characters, one
+ * character at a time. Within those limitations, Tcl_UtfPrev and
+ * Tcl_UtfNext are inverses. If either condition cannot be met,
+ * Tcl_UtfPrev and Tcl_UtfNext may not function as inverses and the
+ * caller will have to take greater care.
  *
  * Results:
- * The return value is the pointer to the next character in the UTF-8
- * string.
+ * A pointer to the start of the next character in the string (or to
+ * the end of the string) as described above.
  *
  * Side effects:
  * None.
@@ -725,37 +747,37 @@ Tcl_UtfNext(
  *
  * The aim of this routine is to provide a way to move backward
  * through a UTF-8 string. The caller is expected to pass non-NULL
- * pointer arguments start and src. start points to the beginning
- * of a string, and src >= start points to a location within (or just
- * past the end) of the string. This routine always returns a
- * pointer within the string (>= start). When (src == start), it
- * returns start. When (src > start), it returns a pointer (< src)
- * and (>= src - TCL_UTF_MAX). Subject to these constraints, the
- * routine returns a pointer to the earliest byte in the string that
- * starts a character when characters are read starting at start and
+ * pointer arguments /start/ and /src/. /start/ points to the beginning
+ * of a string, and /src/ (>= /start/) points to a location within (or
+ * just past the end) of the string. This routine always returns a
+ * pointer within the string (>= /start/). When (/src/ == /start/),
+ * it returns /start/. When (/src/ > /start/), it returns a pointer
+ * (< /src/) and (>= /src/ - TCL_UTF_MAX). Subject to these constraints,
+ * the routine returns a pointer to the earliest byte in the string that
+ * starts a character when characters are read starting at /start/ and
  * that character might include the byte src[-1]. The routine will
  * examine only those bytes in the range that might be returned.
- * It will not examine the byte *src, and because of that cannot
+ * It will not examine the byte (*src), and because of that cannot
  * determine for certain in all circumstances whether the character
  * that begins with the returned pointer will or will not include
- * the byte src[-1]. In the scenario, where src points to the end of
- * a buffer being filled, the returned pointer point to either the
+ * the byte src[-1]. In the scenario where /src/ points to the end of
+ * a buffer being filled, the returned pointer points to either the
  * final complete character in the string or to the earliest byte
  * that might start an incomplete character waiting for more bytes to
  * complete.
  *
- * Because this routine always returns a value < src until the point
- * it is forced to return start, it is useful as a backward iterator
+ * Because this routine always returns a value < /src/ until the point
+ * it is forced to return /start/, it is useful as a backward iterator
  * through a string that will always make progress and always be
  * prevented from running past the beginning of the string.
  *
  * In a string where all characters are complete and properly formed,
- * and the value of src points to the first byte of a character,
- * repeated Tcl_UtfPrev calls will step to the starting bytes of
- * characters, one character at a time. Within those limitations,
- * Tcl_UtfPrev and Tcl_UtfNext are inverses. If either condition cannot
- * be met, Tcl_UtfPrev and Tcl_UtfNext may not function as inverses and
- * the caller will have to take greater care.
+ * and /src/ points to the first byte of a character, repeated
+ * Tcl_UtfPrev calls will step to the starting bytes of characters, one
+ * character at a time. Within those limitations, Tcl_UtfPrev and
+ * Tcl_UtfNext are inverses. If either condition cannot be met,
+ * Tcl_UtfPrev and Tcl_UtfNext may not function as inverses and the
+ * caller will have to take greater care.
  *
  * Results:
  * A pointer to the start of a character in the string as described
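
The forward-iteration contract described in the Tcl_UtfNext comment can be illustrated with a small stand-alone sketch. This is not Tcl's implementation: `sketch_utf_next` and `count_steps` are hypothetical names, and the validity check is simplified (it looks only at the lead byte and the continuation bytes, and does not reject overlong or surrogate encodings, nor model TCL_UTF_MAX). It shows only the two return cases the comment names: just past a complete sequence, otherwise /src/+1.

```c
#include <assert.h>
#include <stddef.h>

/*
 * sketch_utf_next: hypothetical stand-in, NOT the Tcl routine.
 * Returns the pointer just past a complete 1- to 4-byte sequence
 * starting at src, and src + 1 in every other case, so the result
 * is always greater than src.
 */
static const char *
sketch_utf_next(const char *src)
{
    const unsigned char *p;
    int len, i;

    assert(src != NULL);            /* the contract requires non-NULL */
    p = (const unsigned char *) src;

    if (p[0] < 0x80) {
        return src + 1;             /* ASCII byte, including NUL */
    } else if ((p[0] & 0xE0) == 0xC0) {
        len = 2;
    } else if ((p[0] & 0xF0) == 0xE0) {
        len = 3;
    } else if ((p[0] & 0xF8) == 0xF0) {
        len = 4;
    } else {
        return src + 1;             /* stray continuation or bad lead byte */
    }

    /*
     * Each trail byte is read only after the previous one was a valid
     * continuation byte, so a NUL terminator stops the scan.
     */
    for (i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80) {
            return src + 1;         /* incomplete sequence */
        }
    }
    return src + len;
}

/* Count iterator steps over a NUL-terminated string. */
static int
count_steps(const char *s)
{
    int n = 0;

    while (*s != '\0') {
        s = sketch_utf_next(s);     /* always advances at least one byte */
        n++;
    }
    return n;
}
```

Under these assumptions, `count_steps("a\xC3\xA9z")` takes three steps (a, the two-byte sequence, z), and `count_steps("a\xC3z")` also takes three, because the lead byte with no continuation byte is skipped one byte at a time. For a buffer that is not NUL-terminated, the comment directs callers to Tcl_UtfCharComplete before each step; this sketch does not model that case.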