| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| | | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |
|
| | | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |
|
| | | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |
|
| | | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |
|
| | | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |
|
| | | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | |
do special things with byte arrays. You can use "testbytestring" for that. This makes it more clear which test-cases use invalid byte-sequences. Now "testutfnext" also checks whether Tcl_UtfNext() reads src[-1] and src[end], and will warn you when it does.
|
| | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | |
not only those) test-cases:
Selector "ucs2" is meant for Tcl 8.5: No knowledge at all about 4-byte sequences.
Selector "ucs4" is meant for Tcl 8.5 or 8.7 with TCL_UTF_MAX=4 or Tcl 8.6 with TCL_UTF_MAX=6
Selector "tip389" is meant for Tcl 8.6 with TCL_UTF_MAX=4 and Tcl 8.7 with TCL_UTF_MAX=3
(of course, this is too simple: there will be testcases needing more than those 3
|
| | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | |
typo. Fixed now.
|
| | | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
bytes.
Tcl_UtfToUniChar() now never reads more than TCL_UTF_MAX bytes any more.
Since the UtfToUtf encoder/decoder now uses TclUtfToUCS4() it doesn't join 2 surrogates as 2 x 3-byte sequences any more. Actually, it shouldn't, because such sequences are invalid UTF-8.
Therefore, added the ucs2 constraint to testcase encoding-15.4. Let's see how TIP #573 goes, this TIP should make this change official.
Other callers of Tcl_UtfToUniChar() needs to be revised for the same problem. Most callers will need to change Tcl_UtfToUniChar() -> TclUtfToUCS4() and Tcl_UtfCharComplete() -> TclUCS4Complete(), but that's not done yet.
|
| | | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |
|
| | | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |
|
| | |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|/
| |/| | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
[6596c4af31e29b5d]. Just look at the Tcl_UtfAtIndex() implementation for TCL_UTF_MAX=4: It's not the same.
There are no test-cases for Tcl_UniCharAtIndex(), see [f45d0dc1a7], not really worth to write one, since the implementation of this function didn't change in 20 years.
|
| |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | |/ / / / / / / / / / / / / / / / / / / / / / / / / / / /
| | | | | | | | | | | | | | | | | | | | | | | | | | | | / /
| | |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|/ /
| |/| | | | | | | | | | | | | | | | | | | | | | | | | | | |
|
| | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
results for different TCL_UTF_MAX values.
|
| | |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|/ /
| |/| | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | |
Christian Werner for the Bug report and the Fix.
|
| | |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|/ /
| |/| | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | |
testcases in 4 groups (utf-7.10, utf-7.15, utf-7.40 and utf-7.48) , where Tcl_UtfPrev() didn't jump to the beginning of the UTF-8 character, even though there was no limitation which prevented that. So, this is actually a bug-fix for the TIP #389 implementation.
|
| | |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|/ /
| |/| | | | | | | | | | | | | | | | | | | | | | | | |
|
| | |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|/ /
| |/| | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | |
For start bytes F0-F4, case TCL_UTF_MAX=4, Tcl_UtfToUniChar() reads 3 bytes but only advances 1 byte. So Tcl_UtfCharComplete() must make sure 3 bytes are available, not 1. Adapt Tcl_UtfCharComplete() accordingly. No change for TCL_UTF_MAX=[3|6]
|
| |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|/ /
| |/| | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | |
Discussion is not over yet, but we need a base for comparision in order to come up with alternatives.
|
| | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| |/ / / / / / / / / / / / / / / / / / / / / / / / |
|
| | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | | | |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|/
| | | |/| | | | | | | | | | | | | | | | | | | | | |
|
| | | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | | | | |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|/ /
| | | | |/| | | | | | | | | | | | | | | | | | | | |
|
| | | | | | | | | | | | | | | | | | | | | | | | | |
|
| | |/ / / / / / / / / / / / / / / / / / / / / /
| | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | |
recent reforms. Older efforts aborted.
|
| | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | | | |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|/ /
| | | |/| | | | | | | | | | | | | | | | | | | | |
|
| | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | | | |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|/ /
| | | |/| | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | |
in "utf16" build any more.
|
| | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | |
Make sure that Tcl_UtfPrev() never reads more than 3 trail bytes (or 4 when TCL_UTF_MAX > 4). Those are the same limits as for Tcl_UtfNext() and Tcl_UtfToUniChar()
|
| | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | | | |_|_|_|_|_|_|_|_|_|_|_|_|/ / / / / / / /
| | | |/| | | | | | | | | | | | | | | | | | | | |
|
| | | | | | | | | | | | | | | | | | | | | | | | |
|
| | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | | | |_|_|_|_|_|_|_|_|_|_|_|_|/ / / / / / / /
| | | |/| | | | | | | | | | | | | | | | | | | | |
|
| | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | | | |_|_|_|_|_|_|_|_|_|_|_|_|/ / / / / / / /
| | | |/| | | | | | | | | | | | | | | | | | | | |
|
| | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | | | |_|_|_|_|_|_|_|_|_|_|_|_|/ / / / / / / /
| | | |/| | | | | | | | | | | | | | | | | | | | |
|
| | | | | | | | | | | | | | | | | | | | | | | | |
|
| | | |_|_|_|_|_|_|_|_|_|_|/ / / / / / / / / /
| | |/| | | | | | | | | | | | | | | | | | | | |
|
| |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | | |_|_|_|_|_|_|_|/ / / / / / / / / / / / /
| | |/| | | | | | | | | | | | | | | | | | | /
| | |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|/
| |/| | | | | | | | | | | | | | | | | | | | |
|
| | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | |
bytes after the string-end are read. The command will return -1 in that case. No need for additional arguments any more.
|
| | |_|_|_|_|_|_|_|/ / / / / / / / / / / /
| |/| | | | | | | | | | | | | | | | | | | |
|
| |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |
|
| | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |
|
| |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | |/ / / / / / / / / / / / / / / / / / / / /
| |/| | / / / / / / / / / / / / / / / / / / /
| | | |/ / / / / / / / / / / / / / / / / / /
| | |/| | | | | | | | | | | | | | | | | | | |
|
| | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | | |/ / / / / / / / / / / / / / / / / / / |
|
| | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | | |
without it. e.g. [testbytestring \xC2\xA2] is the same as just \xA2.
|
| | |/ / / / / / / / / / / / / / / / / / / |
|
| |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | |/ / / / / / / / / / / / / / / / / / /
| | | | | | | | | / / / / / / / / / / / /
| | |_|_|_|_|_|_|/ / / / / / / / / / / /
| |/| | | | | | | | | | | | | | | | | | |
|
| | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ |
|
| | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | | | |
character, we cannot conclude that the next byte also will be, or can by
taken as a single byte. At least we cannot when TCL_UTF_MAX > 3 so that we
have room for valid two-byte sequences after incomplete sequence detection.
No need for conditional code, just use an algorithm that always works.
|
| | |/ / / / / / / / / / / / / / / / / / |
|
| | |_|_|_|_|_|/ / / / / / / / / / / /
| |/| | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | | |
more than 3 bytes. This is more consistant with what Tcl 8.7 does too.
For TCL_UTF_MAX==6: Make sure that Tcl_UtfNext()/Tcl_UtfPrev() never move more than 4 bytes.
For TCL_UTF_MAX==3: No change.
Introduce ucs2_utf16 test constraint, since many test results now become the same for ucs2 and utf16.
|
| |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | |/ / / / / / / / / / / / / / / / /
| | | | | | | | | | | | | | | | | | /
| | |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|/
| |/| | | | | | | | | | | | | | | | |
|
| | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | |
Tcl_NumUtfChars(). Don't use "-1" in the Tcl_NumUtfChars() calculation, since that raises more questions than it solves, but that's easy to be remedied as well: Juse use >= in stead of > in the comparation. Great idea, Don!
Backport more code formatting from Tcl 8.6 (e.g. use of CONST, which makes no sense any more in c-files)
|