path: root/generic/tclUtf.c
Commit message (author, date, files changed, lines removed/added)
* merge core-8-6-branch (jan.nijtmans, 2017-08-18, 1 file, -25/+53)
|\
| * merge core-8-6-branch (jan.nijtmans, 2017-07-03, 1 file, -1/+1)
| |\
| | * 'inline static' -> 'static inline' and 'INLINE' -> 'inline', for consistency. (jan.nijtmans, 2017-07-03, 1 file, -2/+2)
| | |
| * | merge core-8-6-branch (jan.nijtmans, 2017-06-13, 1 file, -22/+21)
| |\ \
| | |/
| * | Better UTF-8 surrogate handling, only functional when TCL_UTF_MAX>3 (jan.nijtmans, 2017-06-08, 1 file, -19/+49)
| | |
* | | merge core-8-6-branch (jan.nijtmans, 2017-06-08, 1 file, -13/+14)
|\ \ \
| | |/
| |/|
| * | Fix [2738427]: Tcl_NumUtfChars(...) no overflow check. (jan.nijtmans, 2017-06-08, 1 file, -13/+14)
| |\ \
| | |/
| |/|
| | * Fix [2738427]: Tcl_NumUtfChars(...) no overflow check. (jan.nijtmans, 2017-06-08, 1 file, -13/+14)
| | |
* | | merge core-8-6-branch (jan.nijtmans, 2017-06-06, 1 file, -18/+18)
|\ \ \
| |/ /
| * | Follow-up to [67aa9a2070]: Use uppercase consistently, slight optimization in character tests, comment fixes. No change in functionality. (jan.nijtmans, 2017-06-06, 1 file, -18/+18)
| |\ \
| | |/
| | * [67aa9a2070] Tcl_UtfToUniChar returns single byte for invalid UTF-8 input as documented. (jan.nijtmans, 2017-06-06, 1 file, -75/+52)
| | |
* | | [67aa9a2070] Tcl_UtfToUniChar returns single byte for invalid UTF-8 input as documented. (dgp, 2017-06-05, 1 file, -3/+9)
|\ \ \
| |/ /
| * | Fix [67aa9a207037ae67f9014b544c3db34fa732f2dc|67aa9a2070]: Security: Invalid UTF-8 can inject unexpected characters (jan.nijtmans, 2017-06-02, 1 file, -3/+9)
| | |
* | | Merge core-8-6-branch. This removes the work currently being done in "sebres-8-6-clock-speedup-cr1" branch, but that will be merged again as soon as the work is done. All other changes in "trunk" since then (e.g. the INST_STR_CONCAT1 performance improvement, and the removal of SunOS-4) are retained. (jan.nijtmans, 2017-06-02, 1 file, -9/+3)
|\ \ \
| |/ /
* | | merge core-8-6-branch (jan.nijtmans, 2017-05-31, 1 file, -3/+9)
|\ \ \
| * | | Fix [67aa9a207037ae67f9014b544c3db34fa732f2dc|67aa9a2070]: Security: Invalid UTF-8 can inject unexpected characters (jan.nijtmans, 2017-05-31, 1 file, -3/+9)
| |/ /
* | | Don't ever allow UTF-8 sequences of more than 4 characters to be generated or parsed, even when TCL_UTF_MAX>4: According to current Unicode standard, a byte string of >4 characters can never form a single UTF-8 character. And a few minor micro-optimizations related to UTF-8 handling. (jan.nijtmans, 2016-08-30, 1 file, -44/+24)
|\ \ \
| |/ /
| * | Don't ever allow UTF-8 sequences of more than 4 characters to be generated or parsed, even when TCL_UTF_MAX>4: According to current Unicode standard, a byte string of >4 characters can never form a single UTF-8 character. And a few minor micro-optimizations related to UTF-8 handling. (jan.nijtmans, 2016-08-30, 1 file, -44/+24)
| | |
* | | Rename UtfCount() to TclUtfCount() and use it in more places. Suggested by pspjuth here: [e99a79a32650e7e5] (jan.nijtmans, 2016-04-05, 1 file, -14/+8)
|/ /
* | Various Unicode handling enhancements, when building with TCL_UTF_MAX > 3, inspired by androwish. No effect if TCL_UTF_MAX=3 (which is the default) (jan.nijtmans, 2015-09-01, 1 file, -32/+93)
| |
* | Make sure that "string is space \u202f" will continue to return "1", even if in future Unicode this character (NARROW_NO_BREAK_SPACE) will cease to be a space. See: [http://www.unicode.org/review/pri249/]. Don't hardcode "tclWinError.o" for Cygwin (jan.nijtmans, 2013-07-29, 1 file, -1/+1)
|\ \
| |/
| * Make sure that "string is space \u202f" will continue to return "1", even if in future Unicode this character (NARROW_NO_BREAK_SPACE) will cease to be a space. See: [http://www.unicode.org/review/pri249/] (jan.nijtmans, 2013-07-29, 1 file, -1/+1)
| |
* | Unbreak MSVC6 debug build (thanks Andreas Kupries!) (jan.nijtmans, 2013-07-08, 1 file, -1/+1)
|\ \
| |/
| * Unbreak MSVC6 debug build (thanks Andreas Kupries!) (jan.nijtmans, 2013-07-08, 1 file, -1/+1)
| |
* | Use more portable TclIsSpaceProc() instead of isspace(). (jan.nijtmans, 2013-06-17, 1 file, -1/+1)
|\ \
| |/
| * Use more portable TclIsSpaceProc() instead of isspace(). Make sure that "string is space \u180e" continues to return 1 for whatever unicode version. (jan.nijtmans, 2013-06-17, 1 file, -1/+3)
| |
* | [3613609]: Replace strcasecmp() with UTF-8-aware version. (dkf, 2013-05-22, 1 file, -0/+40)
|\ \
| |/
| * Fixed the weird edge case. [bug_3613609] (dkf, 2013-05-22, 1 file, -12/+25)
| |
| * Slight improvement: if cs = "\xC0\x80" and ct = "\x00", loop would continue after NUL-byte, this should not happen. (jan.nijtmans, 2013-05-21, 1 file, -4/+4)
| |
| * Proposed solution for 3613609: lsort -nocase does not sort non-ASCII correctly (jan.nijtmans, 2013-05-21, 1 file, -0/+27)
| |
* | For Unicode 6.3, mongolian vowel separator (U+180e) is nominated to change character class from Space to Control character. Make sure that "string is space" will continue to return 1 for this character. See TIP #413. (jan.nijtmans, 2013-02-25, 1 file, -2/+3)
| |
* | merge trunk. Don't include U+0082 and U+0083 in the Tcl space set (jan.nijtmans, 2012-10-09, 1 file, -2/+1)
|\ \
| | |
* | | tip 318 update (jan.nijtmans, 2012-09-23, 1 file, -0/+4)
|/ /
* | [Frq 3473670]: Various Unicode-related (jan.nijtmans, 2012-01-22, 1 file, -10/+7)
|\ \
| |/
| * [Frq 3473670]: Various Unicode-related speedups/robustness (jan.nijtmans, 2012-01-22, 1 file, -10/+7)
| |\
| | * rfe-3473670: Various Unicode-related speedups/robustness (jan.nijtmans, 2012-01-14, 1 file, -10/+7)
| | |
* | | [Bug 3464428] string is graph \u0120 is wrong (jan.nijtmans, 2012-01-09, 1 file, -35/+23)
|\ \ \
| |/ /
| * | [Bug 3464428] string is graph \u0120 is wrong (jan.nijtmans, 2012-01-09, 1 file, -35/+23)
| |\ \
| | |/
| | * [Bug 3464428] string is graph \u0120 is wrong (jan.nijtmans, 2012-01-09, 1 file, -69/+56)
| | |
* | | [Bug 3464428] string is graph \u0120 is wrong (jan.nijtmans, 2011-12-24, 1 file, -1/+1)
|\ \ \
| |/ /
| * | [Bug 3464428] string is graph \u0120 is wrong (jan.nijtmans, 2011-12-24, 1 file, -1/+1)
| |\ \
| | |/
| | * [Bug 3464428] string is graph \u0120 is wrong (jan.nijtmans, 2011-12-23, 1 file, -1/+1)
| | |
* | | More isspace() callers. (dgp, 2011-04-28, 1 file, -1/+1)
|\ \ \
| |/ /
| * | More isspace() callers. (dgp, 2011-04-28, 1 file, -1/+1)
| | |
* | | Now that we're no longer using SCM based on RCS, the RCS Keyword lines cause more harm than good. Purged them (except in zlib files). (dgp, 2011-03-02, 1 file, -2/+0)
|\ \ \
| |/ /
| * | Now that we're no longer using SCM based on RCS, the RCS Keyword lines cause more harm than good. Purged them. (dgp, 2011-03-02, 1 file, -2/+0)
| |\ \
| | |/
| | * Now that we're no longer using SCM based on RCS, the RCS Keyword lines cause more harm than good. Purged them. (dgp, 2011-03-01, 1 file, -2/+0)
| | |
| | * generic/tclUtf.c (Tcl_UniCharToUtf), tests/utf.test (utf-1.5): Corrected handling of negative Tcl_UniChar input value. Incorrect handling was producing byte sequences outside of Tcl's legal internal encoding. [Bug 1283976]. (dgp, 2005-09-07, 1 file, -36/+38)
| | |
| | * Made Tcl_NumUtfChars do the right thing with \u0000 when guessing the length because of a negative 'length' parameter. [Bug 769812] (dkf, 2003-10-08, 1 file, -5/+2)
| | |
| | * generic/TclUtf.c (Tcl_UniCharNcasecmp), tests/utf.test (utf-25.*): Corrected failure to properly compare Unicode strings of different case in a case insensitive manner. [Bug 699042] (dgp, 2003-03-06, 1 file, -4/+7)
| | |