Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | Bugfix in Tcl_UtfPrev/Tcl_UtfNext: When handling 4-byte UTF-8 byte ↵ | jan.nijtmans | 2019-09-16 | 1 | -7/+7 |
| | | | | | sequences, those should be able to move back/forward 4 bytes if TCL_UTF_MAX <= 4. Update comment accordingly. Bugfix in Tcl_UtfFindFirst/Tcl_UtfFindLast: Those functions should be able to find both the high surrogate (if asked for) and the full character (combination of both surrogates) | ||||
* | Attempt to fix ↵ | jan.nijtmans | 2019-08-01 | 1 | -5/+3 |
| | | | | [https://core.tcl-lang.org/tk/tktview?name=a179564826|a179564826]: Tk 8.6: prevent issues when encountering non-BMP Unicode characters | ||||
* | Merge 8.5 | jan.nijtmans | 2019-07-31 | 1 | -6/+6 |
|\ | |||||
| * | (cherry-pick from core-8-branch): Replace memcpy() calls with memmove() to ↵ | jan.nijtmans | 2019-07-31 | 1 | -4/+4 |
| | | | | | | | | avoid undefined behavior when source and destination overlap | ||||
* | | Since only bytes 0xF0 - 0xF4 can be the first byte of a valid 4-byte UTF-8 ↵ | jan.nijtmans | 2019-03-24 | 1 | -3/+3 |
|\ \ | |/ | | | | | byte sequence, account for that in Tcl_UtfCharComplete(). Only effective when TCL_UTF_MAX>3 | ||||
| * | Since only bytes 0xF0 - 0xF4 can be the first byte of a valid 4-byte UTF-8 ↵ | jan.nijtmans | 2019-03-24 | 1 | -3/+3 |
| | | | | | | | | byte sequence, account for that in Tcl_UtfCharComplete(). Only effective when TCL_UTF_MAX>3 | ||||
| * | Backport various minor issues from 8.6: | jan.nijtmans | 2018-10-27 | 1 | -4/+14 |
| | | | | | | | | | | | | - gcc compiler warning in tclDate.c - protect Tcl_UtfToUniCharDString() from ever reading more than "length" bytes from its input, not even in the case of invalid UTF-8. - update to latest tzdata - fix 2 failing test-cases on MacOSX | ||||
* | | Backport [bd94500678e837d7] from 8.7, preventing endless loops in UTF-8 ↵ | jan.nijtmans | 2019-03-02 | 1 | -66/+75 |
| | | | | | | | | conversions when handling surrogates. Only effective when compiling with -DTCL_UTF_MAX=4|6 (default: 3). Meant for benefit of Androwish. | ||||
* | | Tcl_UniCharToUtfDString: Don't allocate too much memory for this function. | jan.nijtmans | 2018-10-03 | 1 | -5/+15 |
| | | | | | | Tcl_UtfToUniCharDString: Don't allocate too much memory for this function. And make sure that we never access more than 'length' bytes from the string, not even when encountering invalid UTF-8. | ||||
* | | Fix [53cad613d8]: TIP 389 implementation makes Tk tests font-4.12 and ↵ | jan.nijtmans | 2018-06-18 | 1 | -0/+7 |
| | | | | | | | | font-4.15 fail. One more situation in which a high surrogate causes problems | ||||
* | | Merge 8.5. This adds Emoji 11.0 support, when Tcl is compiled with ↵ | jan.nijtmans | 2018-05-11 | 1 | -3/+10 |
|\ \ | |/ | | | | | TCL_UTF_MAX>3. Useful for Androwish, for example. | ||||
| * | Add emoji 11.0 to the set. Only active when compiled with TCL_UTF_MAX>3. ↵ | jan.nijtmans | 2018-05-11 | 1 | -15/+83 |
| | | | | | | | | Also prepare tooling for Unicode 11.0 (while at it) | ||||
* | | Bug-fix in Tcl_UtfAtIndex (for TCL_UTF_MAX=4 only). With test-case (in ↵ | jan.nijtmans | 2018-04-23 | 1 | -0/+8 |
| | | | | | | | | "string totitle") demonstrating the bug. | ||||
* | | Slightly improved (more fail-safe) surrogate handling for TCL_UTF_MAX>3. ↵ | jan.nijtmans | 2018-04-19 | 1 | -7/+14 |
| | | | | | | | | Backported from latest TIP 389 implementation. (to be used for androwish) | ||||
* | | Fix handling of surrogates (when TCL_UTF_MAX > 3) in ↵ | jan.nijtmans | 2017-12-28 | 1 | -28/+29 |
| | | | | | | | | Tcl_UtfNcmp()/Tcl_UtfNcasecmp()/TclUtfCasecmp(). Backported from core-8-branch, where this was fixed already. | ||||
* | | Fix Tcl_UtfFindFirst()/Tcl_UtfFindLast(), which were broken by [83c0c569d6]. ↵ | jan.nijtmans | 2017-11-29 | 1 | -7/+7 |
| | | | | | | | | | | Not detected, because those functions aren't used anywhere in Tcl. So, added new test-cases, making sure this doesn't happen again. | ||||
* | | Update some functions in tclUtf.c to handle surrogate pairs when TCL_UTF_MAX ↵ | jan.nijtmans | 2017-11-29 | 1 | -16/+58 |
| | | | | | | | | | | == 4. Also update documentation to distinguish better between "Tcl_UniChar" and "Unicode character": Those are not necessarily the same when TCL_UTF_MAX == 4. No change when TCL_UTF_MAX == 3 or TCL_UTF_MAX == 6. | ||||
* | | merge core-8-6-branch | jan.nijtmans | 2017-07-03 | 1 | -1/+1 |
|\ \ | |||||
| * | | 'inline static' -> 'static inline' and 'INLINE' -> 'inline', for consistency. | jan.nijtmans | 2017-07-03 | 1 | -2/+2 |
| | | | |||||
* | | | merge core-8-6-branch | jan.nijtmans | 2017-06-13 | 1 | -22/+21 |
|\ \ \ | |/ / | |||||
| * | | Fix [2738427]: Tcl_NumUtfChars(...) no overflow check. | jan.nijtmans | 2017-06-08 | 1 | -13/+14 |
| |\ \ | | |/ | |||||
| | * | Fix [2738427]: Tcl_NumUtfChars(...) no overflow check. | jan.nijtmans | 2017-06-08 | 1 | -13/+14 |
| | | | |||||
* | | | Better UTF-8 surrogate handling, only functional when TCL_UTF_MAX>3 | jan.nijtmans | 2017-06-08 | 1 | -19/+49 |
|/ / | |||||
* | | Follow-up to [67aa9a2070]: Use uppercase consistently, slight optimization ↵ | jan.nijtmans | 2017-06-06 | 1 | -18/+18 |
|\ \ | |/ | | | | | in character tests, comment fixes. No change in functionality. | ||||
| * | [67aa9a2070] Tcl_UtfToUniChar returns single byte for invalid UTF-8 input as ↵ | jan.nijtmans | 2017-06-06 | 1 | -75/+52 |
| | | | | | | | | documented. | ||||
* | | Fix [67aa9a207037ae67f9014b544c3db34fa732f2dc|67aa9a2070]: Security: Invalid ↵ | jan.nijtmans | 2017-06-02 | 1 | -3/+9 |
| | | | | | | | | UTF-8 can inject unexpected characters | ||||
* | | Don't ever allow UTF-8 sequences of more than 4 bytes to be generated ↵ | jan.nijtmans | 2016-08-30 | 1 | -44/+24 |
| | | | | | | | | | | or parsed, even when TCL_UTF_MAX>4: According to the current Unicode standard, a byte string of >4 bytes can never form a single UTF-8 character. And a few minor micro-optimizations related to UTF-8 handling. | ||||
* | | Various Unicode handling enhancements, when building with TCL_UTF_MAX > 3, ↵ | jan.nijtmans | 2015-09-01 | 1 | -32/+93 |
| | | | | | | | | inspired by androwish. No effect if TCL_UTF_MAX=3 (which is the default) | ||||
* | | Make sure that "string is space \u202f" will continue to return "1", even if ↵ | jan.nijtmans | 2013-07-29 | 1 | -1/+1 |
|\ \ | |/ | | | | | | | in future Unicode this character (NARROW_NO_BREAK_SPACE) will cease to be a space. See: [http://www.unicode.org/review/pri249/]. Don't hardcode "tclWinError.o" for Cygwin | ||||
| * | Make sure that "string is space \u202f" will continue to return "1", even if ↵ | jan.nijtmans | 2013-07-29 | 1 | -1/+1 |
| | | | | | | | | in future Unicode this character (NARROW_NO_BREAK_SPACE) will cease to be a space. See: [http://www.unicode.org/review/pri249/] | ||||
* | | Unbreak MSVC6 debug build (thanks Andreas Kupries!) | jan.nijtmans | 2013-07-08 | 1 | -1/+1 |
|\ \ | |/ | |||||
| * | Unbreak MSVC6 debug build (thanks Andreas Kupries!) | jan.nijtmans | 2013-07-08 | 1 | -1/+1 |
| | | |||||
* | | Use more portable TclIsSpaceProc() instead of isspace(). | jan.nijtmans | 2013-06-17 | 1 | -1/+1 |
|\ \ | |/ | |||||
| * | Use more portable TclIsSpaceProc() instead of isspace(). | jan.nijtmans | 2013-06-17 | 1 | -1/+3 |
| | | | | | | Make sure that "string is space \u180e" continues to return 1 for any Unicode version. | ||||
* | | [3613609]: Replace strcasecmp() with UTF-8-aware version. | dkf | 2013-05-22 | 1 | -0/+40 |
|\ \ | |/ | |||||
| * | Fixed the weird edge case. | dkf | 2013-05-22 | 1 | -12/+25 |
| | | |||||
| * | Slight improvement: if cs = "\xC0\x80" and ct = "\x00", loop would continue ↵ | jan.nijtmans | 2013-05-21 | 1 | -4/+4 |
| | | | | | | | | after NUL-byte, this should not happen. | ||||
| * | Proposed solution for 3613609: lsort -nocase does not sort non-ASCII correctly | jan.nijtmans | 2013-05-21 | 1 | -0/+27 |
| | | |||||
* | | For Unicode 6.3, Mongolian vowel separator (U+180e) is nominated to change ↵ | jan.nijtmans | 2013-02-25 | 1 | -2/+3 |
| | | | | | | | | | | character class from Space to Control character. Make sure that "string is space" will continue to return 1 for this character. See TIP #413. | ||||
* | | merge trunk | jan.nijtmans | 2012-10-09 | 1 | -2/+1 |
|\ \ | | | | | | Don't include U+0082 and U+0083 in the Tcl space set | ||||
* | | | tip 318 update | jan.nijtmans | 2012-09-23 | 1 | -0/+4 |
|/ / | |||||
* | | [Frq 3473670]: Various Unicode-related speedups/robustness | jan.nijtmans | 2012-01-22 | 1 | -10/+7 |
|\ \ | |/ | |||||
| * | [Frq 3473670]: Various Unicode-related speedups/robustness | jan.nijtmans | 2012-01-22 | 1 | -10/+7 |
| |\ | |||||
| | * | rfe-3473670: Various Unicode-related speedups/robustness | jan.nijtmans | 2012-01-14 | 1 | -10/+7 |
| | | | |||||
* | | | [Bug 3464428] string is graph \u0120 is wrong | jan.nijtmans | 2012-01-09 | 1 | -35/+23 |
|\ \ \ | |/ / | |||||
| * | | [Bug 3464428] string is graph \u0120 is wrong | jan.nijtmans | 2012-01-09 | 1 | -35/+23 |
| |\ \ | | |/ | |||||
| | * | [Bug 3464428] string is graph \u0120 is wrong | jan.nijtmans | 2012-01-09 | 1 | -69/+56 |
| | | | |||||
* | | | [Bug 3464428] string is graph \u0120 is wrong | jan.nijtmans | 2011-12-24 | 1 | -1/+1 |
|\ \ \ | |/ / | |||||
| * | | [Bug 3464428] string is graph \u0120 is wrong | jan.nijtmans | 2011-12-24 | 1 | -1/+1 |
| |\ \ | | |/ | |||||
| | * | [Bug 3464428] string is graph \u0120 is wrong | jan.nijtmans | 2011-12-23 | 1 | -1/+1 |
| | | |