path: root/generic/tclUtf.c
Commit message  Author  Age  Files  Lines
...
| * | | | | Cherry-pick Tcl_UniCharAtIndex() implementation from [6596c4af31], but adapted to the needs of TIPs 389/542.  jan.nijtmans  2020-04-26  1  -22/+4
| * | | | | Merge 8.6  jan.nijtmans  2020-04-25  1  -39/+11
| |\ \ \ \ \
| * \ \ \ \ \ Merge 8.6  jan.nijtmans  2020-04-24  1  -1/+1
| |\ \ \ \ \ \
| * \ \ \ \ \ \ Merge 8.6. This mainly introduces the overlong check into Tcl_UtfPrev(). 10 test cases changed results, all of them due to the Tcl_UtfPrev() improvement. Tcl_UtfNext() is not affected: the previous implementation was based on Tcl_UtfToUniChar(), which already did this check.  jan.nijtmans  2020-04-24  1  -27/+152
| |\ \ \ \ \ \ \
| * \ \ \ \ \ \ \ Fix [27944a3661]: Taming test utf-6.88. Long-standing bug in Tcl_UtfNext(). Corner-case when the pointer doesn't advance to the start-byte of the next UTF-8 character.  jan.nijtmans  2020-04-22  1  -4/+14
| |\ \ \ \ \ \ \ \
| | * | | | | | | | Wrong indent in comment  jan.nijtmans  2020-04-21  1  -1/+1
| | | | | | | | | |
| | * | | | | | | | Merge 8.7  jan.nijtmans  2020-04-21  1  -6/+14
| | |\ \ \ \ \ \ \ \
| | * | | | | | | | | Proposed fix for [27944a3661]: Taming test utf-6.88. This fix is not optimized, it still uses TclUtfToUniChar() in its implementation. But optimizing work is on its way, hopefully, coming through 8.5 .. 8.6 .. and up.  jan.nijtmans  2020-04-19  1  -4/+14
| * | | | | | | | | | More test cleanup  jan.nijtmans  2020-04-21  1  -2/+2
| | |/ / / / / / / /
| |/| | | | | | | |
| * | | | | | | | | Teach Tcl_UtfPrev() that 0xC1 is _always_ an invalid byte. Test-case utf-7.34. Make sure that Tcl_UtfCharComplete(src, TCL_UTF_MAX) always returns 1, for whatever bytes, since that's the maximum number of bytes Tcl_UtfToUniChar() can read in a single call.  jan.nijtmans  2020-04-20  1  -6/+14
| |/ / / / / / / /
| * | | | | | | | Merge 8.6  jan.nijtmans  2020-04-15  1  -2/+2
| |\ \ \ \ \ \ \ \
| * \ \ \ \ \ \ \ \ Merge 8.6  jan.nijtmans  2020-04-14  1  -12/+53
| |\ \ \ \ \ \ \ \ \
| * \ \ \ \ \ \ \ \ \ Merge 8.6  jan.nijtmans  2020-04-10  1  -3/+3
| |\ \ \ \ \ \ \ \ \ \
| * | | | | | | | | | | Make Tcl_UtfCharComplete() usable for both Tcl_UtfToUniChar() and Tcl_UtfToChar16(). Defect noticed by Don Porter. Thanks! Add test-cases, assuring correct handling of 4-byte UTF-8 sequences. Use "end-1", "end" and "end+1" in testcases related to Tcl_NumUtfChars(), that's more readable/maintainable than integers.  jan.nijtmans  2020-04-06  1  -5/+2
| * | | | | | | | | | | Merge 8.6  jan.nijtmans  2020-04-05  1  -0/+5
| |\ \ \ \ \ \ \ \ \ \ \
| * | | | | | | | | | | | Fix broken build.  dgp  2020-04-03  1  -1/+1
| | | | | | | | | | | | |
| * | | | | | | | | | | | Simplify implementation of TclUtfToUCS4: The #undefined Tcl_UtfToUniChar() already does everything for use here (Unlike in Tcl 8.6, which has to live without TIP #542)  jan.nijtmans  2020-04-03  1  -31/+4
| | | | | | | | | | | | |
| | \ \ \ \ \ \ \ \ \ \ \
| | \ \ \ \ \ \ \ \ \ \ \
| | \ \ \ \ \ \ \ \ \ \ \
| *---. \ \ \ \ \ \ \ \ \ \ \ merge 8.6  dgp  2020-04-02  1  -25/+72
| |\ \ \ \ \ \ \ \ \ \ \ \ \ \
| * \ \ \ \ \ \ \ \ \ \ \ \ \ \ Merge 8.6  jan.nijtmans  2020-03-18  1  -14/+12
| |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| * \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ Merge 8.7  jan.nijtmans  2019-12-03  1  -1/+1
| |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | * \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ Merge 8.6  jan.nijtmans  2019-12-02  1  -1/+1
| | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| * | \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ Merge 8.7  jan.nijtmans  2019-09-25  1  -13/+13
| |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | |/ / / / / / / / / / / / / / / / /
| | * | | | | | | | | | | | | | | | | Merge 8.6  jan.nijtmans  2019-09-16  1  -3/+3
| | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | * \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ Merge 8.7  jan.nijtmans  2019-08-15  1  -10/+10
| | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | | * \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ Merge 8.7  jan.nijtmans  2019-08-14  1  -1/+1
| | | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | | | * | | | | | | | | | | | | | | | | | Document that the *Backslash parsing functions output maximum 4 bytes, irrespective of the TCL_UTF_MAX setting: It could be 4 for the "\Uxxxxxx" construct, but never more. Move <stddef.h> and <locale.h> to tclInt.h, so they can be removed from various other places.  jan.nijtmans  2019-08-02  1  -1/+1
| | | * | | | | | | | | | | | | | | | | | | Eliminate "register" keyword _everywhere_ in Tcl. This keyword is deprecated in C++ (removed in C++17, even), and essentially does nothing with most modern compilers.  jan.nijtmans  2019-07-17  1  -10/+10
| | | |/ / / / / / / / / / / / / / / / / /
| * | | | | | | | | | | | | | | | | | | | Merge branch tip-548. No longer define additional stub-entries for functions that will be removed (because of deprecation) anyway  jan.nijtmans  2019-08-12  1  -51/+40
| |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | |/ / / / / / / / / / / / / / / / / / /
| * | | | | | | | | | | | | | | | | | | | Merge tip-548  jan.nijtmans  2019-08-02  1  -2/+2
| |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | |/ / / / / / / / / / / / / / / / / / /
| | * | | | | | | | | | | | | | | | | | | Merge 8.7. Some formatting.  jan.nijtmans  2019-08-02  1  -2/+2
| | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | | |/ / / / / / / / / / / / / / / / / /
| | * | | | | | | | | | | | | | | | | | | Merge 8.7. Documentation improvements and code cleanup. Approaching finish.  jan.nijtmans  2019-08-01  1  -9/+9
| | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | | |/ / / / / / / / / / / / / / / / / /
| * | | | | | | | | | | | | | | | | | | | Protect Tcl_AToB() functions against NULL input  jan.nijtmans  2019-08-01  1  -0/+12
| | | | | | | | | | | | | | | | | | | | |
| * | | | | | | | | | | | | | | | | | | | Merge tip-548  jan.nijtmans  2019-08-01  1  -6/+6
| |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | |/ / / / / / / / / / / / / / / / / / /
| | * | | | | | | | | | | | | | | | | | | Rename UTF-related functions to "WChar" and "Char16" variants, more intuitive because they represent wchar_t and char16_t (since C++11) types in modern compilers.  jan.nijtmans  2019-07-11  1  -15/+24
| | * | | | | | | | | | | | | | | | | | | Merge 8.7, and a few tweaks: Only provide Tcl_WinUtfToTChar on Tcl 8.x, not on 9.0 any more  jan.nijtmans  2019-07-07  1  -0/+2
| | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | | |/ / / / / / / / / / / / / / / / / /
| | * | | | | | | | | | | | | | | | | | | Fix UNIX/Mac build  jan.nijtmans  2019-07-06  1  -1/+1
| | | | | | | | | | | | | | | | | | | | |
| | * | | | | | | | | | | | | | | | | | | Improvement: always export both 16-bit and 32-bit UTF functions  jan.nijtmans  2019-07-05  1  -50/+22
| | | | | | | | | | | | | | | | | | | | |
| | * | | | | | | | | | | | | | | | | | | TIP #548: Deprecate Tcl_WinUtfToTChar() and Tcl_WinTCharToUtf() and provide more flexible replacement functions  jan.nijtmans  2019-06-03  1  -35/+39
| | |/ / / / / / / / / / / / / / / / / /
| * | | | | | | | | | | | | | | | | | | More simplifications, taking deprecations into account  jan.nijtmans  2019-05-22  1  -227/+10
| | | | | | | | | | | | | | | | | | | |
| * | | | | | | | | | | | | | | | | | | More WIP: eliminate all usage of (platform-specific) Tcl_WinTCharToUtf()/Tcl_WinUtfToTChar() in favor of its proposed portable replacements: Tcl_Utf16ToUtfDString()/Tcl_UtfToUtf16DString(). This allows for Tcl_WinTCharToUtf()/Tcl_WinUtfToTChar() to be declared deprecated.  jan.nijtmans  2019-05-22  1  -27/+29
| * | | | | | | | | | | | | | | | | | | merge 8.7  jan.nijtmans  2019-05-10  1  -4/+4
| |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | |/ / / / / / / / / / / / / / / / / /
| | * | | | | | | | | | | | | | | | | | Replace memcpy() calls with memmove() to avoid undefined behavior when source and destination overlap.  dgp  2019-04-17  1  -4/+4
| * | | | | | | | | | | | | | | | | | | Merge 8.7  jan.nijtmans  2019-03-28  1  -3/+1
| |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | |/ / / / / / / / / / / / / / / / / /
| | * | | | | | | | | | | | | | | | | | Since only bytes 0xF0 - 0xF4 can be the first byte of a valid 4-byte UTF-8 byte sequence, account for that in Tcl_UtfCharComplete().  jan.nijtmans  2019-03-24  1  -3/+1
| | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| * | \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ Merge 8.7  jan.nijtmans  2019-03-24  1  -1/+1
| |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | |/ / / / / / / / / / / / / / / / / / /
| * | | | | | | | | | | | | | | | | | | | Merge 8.7. Also fix invalid reference to TclUtfToWChar, causing build failure  jan.nijtmans  2019-03-21  1  -7/+3
| |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | |/ / / / / / / / / / / / / / / / / / /
| | * | | | | | | | | | | | | | | | | | | Remove incorrect comment. Simplify handling of last bytes in Tcl_UniCharToUtfDString(), since TclUtfToUniChar() already turns out to handle cp1252 fall-back correctly.  jan.nijtmans  2019-03-21  1  -7/+3
| | |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| * | \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ Merge 8.7  jan.nijtmans  2019-03-20  1  -8/+8
| |\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
| | |/ / / / / / / / / / / / / / / / / / / /
| | * | | | | | | | | | | | | | | | | | | | Fix Tcl_UtfToUniCharDString() function, handling invalid byte at the end of the string: Not quite correct for bytes between 0x80-0x9F, according to TIP  jan.nijtmans  2019-03-20  1  -7/+7
| | * | | | | | | | | | | | | | | | | | | | Comment Tcl_UniCharToUtf() better, what happens handling surrogates. Add type cast in tclUtf.c, making actual check clearer  jan.nijtmans  2019-03-18  1  -1/+1
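
Several entries in this log turn on which bytes may begin a UTF-8 sequence: only 0xF0 - 0xF4 can start a valid 4-byte sequence, and 0xC1 is never a valid byte. The sketch below illustrates that lead-byte classification under strict RFC 3629 rules. It is not Tcl's code: the function name `utf8_lead_length` is hypothetical, and the sketch also rejects 0xC0, which Tcl's internal encoding may handle differently.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of Tcl. Returns the total length in bytes
 * of the UTF-8 sequence introduced by lead byte b under strict RFC 3629
 * rules, or 0 if b cannot start a sequence.
 */
size_t
utf8_lead_length(unsigned char b)
{
    if (b <= 0x7F) {
        return 1;               /* ASCII */
    }
    if (b >= 0xC2 && b <= 0xDF) {
        return 2;               /* 0xC0 and 0xC1 would only encode overlong forms */
    }
    if (b >= 0xE0 && b <= 0xEF) {
        return 3;
    }
    if (b >= 0xF0 && b <= 0xF4) {
        return 4;               /* 0xF5..0xFF would exceed U+10FFFF */
    }
    return 0;                   /* continuation byte (0x80..0xBF) or invalid lead */
}
```

A caller checking whether a buffer holds a complete character could compare the returned length against the number of bytes remaining, treating 0 as a single invalid byte.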