path: root/generic/tclUtf.c
Commit message (Author, Age; Files, Lines)
* Don't ever allow UTF-8 sequences of more than 4 characters to be generated or parsed, even when TCL_UTF_MAX>4: According to current Unicode standard, a byte string of >4 characters can never form a single UTF-8 character. And a few minor micro-optimizations related to UTF-8 handling. (jan.nijtmans, 2016-08-30; 1 file, -44/+24)
|\
| * Don't ever allow UTF-8 sequences of more than 4 characters to be generated or parsed, even when TCL_UTF_MAX>4: According to current Unicode standard, a byte string of >4 characters can never form a single UTF-8 character. And a few minor micro-optimizations related to UTF-8 handling. (jan.nijtmans, 2016-08-30; 1 file, -44/+24)
| |
* | Rename UtfCount() to TclUtfCount() and use it in more places. Suggested by pspjuth here: [e99a79a32650e7e5] (jan.nijtmans, 2016-04-05; 1 file, -14/+8)
|/
* Various Unicode handling enhancements, when building with TCL_UTF_MAX > 3, inspired by androwish. No effect if TCL_UTF_MAX=3 (which is the default) (jan.nijtmans, 2015-09-01; 1 file, -32/+93)
|
* Make sure that "string is space \u202f" will continue to return "1", even if in future Unicode this character (NARROW_NO_BREAK_SPACE) will cease to be a space. See: [http://www.unicode.org/review/pri249/]. Don't hardcode "tclWinError.o" for Cygwin (jan.nijtmans, 2013-07-29; 1 file, -1/+1)
|\
| * Make sure that "string is space \u202f" will continue to return "1", even if in future Unicode this character (NARROW_NO_BREAK_SPACE) will cease to be a space. See: [http://www.unicode.org/review/pri249/] (jan.nijtmans, 2013-07-29; 1 file, -1/+1)
| |
* | Unbreak MSVC6 debug build (thanks Andreas Kupries!) (jan.nijtmans, 2013-07-08; 1 file, -1/+1)
|\ \
| |/
| * Unbreak MSVC6 debug build (thanks Andreas Kupries!) (jan.nijtmans, 2013-07-08; 1 file, -1/+1)
| |
* | Use more portable TclIsSpaceProc() in stead of isspace(). (jan.nijtmans, 2013-06-17; 1 file, -1/+1)
|\ \
| |/
| * Use more portable TclIsSpaceProc() in stead of isspace(). Make sure that "string is space \u180e" continues to return 1 for whatever unicode version. (jan.nijtmans, 2013-06-17; 1 file, -1/+3)
| |
* | [3613609]: Replace strcasecmp() with UTF-8-aware version. (dkf, 2013-05-22; 1 file, -0/+40)
|\ \
| |/
| * Fixed the weird edge case. [bug_3613609] (dkf, 2013-05-22; 1 file, -12/+25)
| |
| * Slight improvement: if cs = "\xC0\x80" and ct = "\x00", loop would continue after NUL-byte, this should not happen. (jan.nijtmans, 2013-05-21; 1 file, -4/+4)
| |
| * Proposed solution for 3613609: lsort -nocase does not sort non-ASCII correctly (jan.nijtmans, 2013-05-21; 1 file, -0/+27)
| |
* | For Unicode 6.3, mongolian vowel separator (U+180e) is nominated to change character class from Space to Control character. Make sure that "string is space" will continue to return 1 for this character. See TIP #413. (jan.nijtmans, 2013-02-25; 1 file, -2/+3)
| |
* | merge trunk. Dont include U+0082 and U+0083 in the Tcl space set (jan.nijtmans, 2012-10-09; 1 file, -2/+1)
|\ \
* | | tip 318 update (jan.nijtmans, 2012-09-23; 1 file, -0/+4)
|/ /
* | [Frq 3473670]: Various Unicode-related (jan.nijtmans, 2012-01-22; 1 file, -10/+7)
|\ \
| |/
| * [Frq 3473670]: Various Unicode-related speedups/robustness (jan.nijtmans, 2012-01-22; 1 file, -10/+7)
| |\
| | * rfe-3473670: Various Unicode-related speedups/robustness (jan.nijtmans, 2012-01-14; 1 file, -10/+7)
| | |
* | | [Bug 3464428] string is graph \u0120 is wrong (jan.nijtmans, 2012-01-09; 1 file, -35/+23)
|\ \ \
| |/ /
| * | [Bug 3464428] string is graph \u0120 is wrong (jan.nijtmans, 2012-01-09; 1 file, -35/+23)
| |\ \
| | |/
| | * [Bug 3464428] string is graph \u0120 is wrong (jan.nijtmans, 2012-01-09; 1 file, -69/+56)
| | |
* | | [Bug 3464428] string is graph \u0120 is wrong (jan.nijtmans, 2011-12-24; 1 file, -1/+1)
|\ \ \
| |/ /
| * | [Bug 3464428] string is graph \u0120 is wrong (jan.nijtmans, 2011-12-24; 1 file, -1/+1)
| |\ \
| | |/
| | * [Bug 3464428] string is graph \u0120 is wrong (jan.nijtmans, 2011-12-23; 1 file, -1/+1)
| | |
* | | More isspace() callers. (dgp, 2011-04-28; 1 file, -1/+1)
|\ \ \
| |/ /
| * | More isspace() callers. (dgp, 2011-04-28; 1 file, -1/+1)
| | |
* | | Now that we're no longer using SCM based on RCS, the RCS Keyword lines cause more harm than good. Purged them (except in zlib files). (dgp, 2011-03-02; 1 file, -2/+0)
|\ \ \
| |/ /
| * | Now that we're no longer using SCM based on RCS, the RCS Keyword lines cause more harm than good. Purged them. (dgp, 2011-03-02; 1 file, -2/+0)
| |\ \
| | |/
| | * Now that we're no longer using SCM based on RCS, the RCS Keyword lines cause more harm than good. Purged them. (dgp, 2011-03-01; 1 file, -2/+0)
| | |
| | * generic/tclUtf.c (Tcl_UniCharToUtf), tests/utf.test (utf-1.5): Corrected handling of negative Tcl_UniChar input value. Incorrect handling was producing byte sequences outside of Tcl's legal internal encoding. [Bug 1283976]. (dgp, 2005-09-07; 1 file, -36/+38)
| | |
| | * Made Tcl_NumUtfChars do the right thing with \u0000 when guessing the length because of a negative 'length' parameter. [Bug 769812] (dkf, 2003-10-08; 1 file, -5/+2)
| | |
| | * generic/TclUtf.c (Tcl_UniCharNcasecmp), tests/utf.test (utf-25.*): Corrected failure to properly compare Unicode strings of different case in a case insensitive manner. [Bug 699042] (dgp, 2003-03-06; 1 file, -4/+7)
| | |
* | | generic/tclExecute.c, generic/tclFCmd.c, generic/tclProc.c, generic/tclTimer.c, generic/tclUtf.c: fix potential uninitialized variable use and null dereference flagged by clang static analyzer. generic/tclExecute.c, generic/tclIO.c, generic/tclScan.c, generic/tclCompExpr.c: silence false positives from clang static analyzer about potential null dereference. (das, 2009-09-07; 1 file, -2/+2)
| | |
* | | generic/tclStringObj.c: Changed type of the 'allocated' field of the String struct from size_t to int since only int values are ever stored in it. (dgp, 2009-02-11; 1 file, -1/+2)
| | |
* | | Get rid of pre-C89-isms (esp. CONST vs const). (dkf, 2008-04-27; 1 file, -40/+40)
|/ /
* | Convert to using ANSI decls/definitions and using the (ANSI) assumption that NULL can be cast to any pointer type transparently. (dkf, 2005-10-31; 1 file, -163/+167)
| |
* | generic/tclUtf.c (Tcl_UniCharToUtf), tests/utf.test (utf-1.5): Corrected handling of negative Tcl_UniChar input value. Incorrect handling was producing byte sequences outside of Tcl's legal internal encoding. [Bug 1283976]. (dgp, 2005-09-07; 1 file, -36/+38)
| |
* | Systematizing the formatting (dkf, 2005-07-21; 1 file, -198/+217)
| |
* | Merged kennykb-numerics-branch back to the head; TIPs 132 and 232 (Kevin B Kenny, 2005-05-10; 1 file, -1/+1)
| |
* | doc/DString.3, doc/Environment.3, doc/Eval.3, doc/ExprLong.3, doc/ExprLongObj.3, doc/GetInt.3, doc/GetOpnFl.3, doc/ParseCmd.3, doc/RegExp.3, doc/SetResult.3, doc/StrMatch.3, doc/Utf.3, generic/tcl.decls, generic/tclBasic.c, generic/tclEnv.c, generic/tclGet.c, generic/tclParse.c, generic/tclParseExpr.c, generic/tclRegexp.c, generic/tclResult.c, generic/tclUtf.c, generic/tclUtil.c, unix/tclUnixChan.c: Eliminated use of identifier "string" in Tcl's public C API to avoid conflict/confusion with the std::string of C++. generic/tclDecls.h: `make genstubs` (dgp, 2005-05-03; 1 file, -161/+161)
| |
* | Made Tcl_NumUtfChars do the right thing with \u0000 when guessing the length because of a negative 'length' parameter. [Bug 769812] (dkf, 2003-10-08; 1 file, -5/+2)
| |
* | generic/TclUtf.c (Tcl_UniCharNcasecmp), tests/utf.test (utf-25.*): Corrected failure to properly compare Unicode strings of different case in a case insensitive manner. [Bug 699042] (dgp, 2003-03-06; 1 file, -4/+7)
|/
* generic/tclExecute.c (TclExecuteByteCode INST_STR_MATCH), generic/tclCmdMZ.c (Tcl_StringObjCmd STR_MATCH), generic/tclUtf.c (TclUniCharMatch), generic/tclInt.decls, generic/tclIntDecls.h, generic/tclStubInit.c, tests/string.test, tests/stringComp.test: add private TclUniCharMatch function that does string match on counted unicode strings. Tcl_UniCharCaseMatch has the failing that it can't handle strings or patterns with embedded NULLs. Added tests that actually try strings/pats with NULLs. TclUniCharMatch should be TIPed and made public in the next minor version rev. (hobbs, 2003-02-18; 1 file, -2/+193)
|
* generic/tclUtf.c: make use of TclUtfToUniChar macro throughout the functions, and add extra optimization to Tcl_NumUtfChars for one-byte/char case. (hobbs, 2002-11-12; 1 file, -22/+31)
|
* doc/CmdCmplt.3, doc/Concat.3, doc/CrtCommand.3, doc/CrtSlave.3, doc/CrtTrace.3, doc/Eval.3, doc/ExprLong.3, doc/LinkVar.3, doc/ParseCmd.3, doc/SetVar.3, doc/TraceVar.3, doc/UpVar.3, generic/tcl.decls, generic/tcl.h, generic/tclBasic.c, generic/tclCmdMZ.c, generic/tclCompCmds.c, generic/tclCompExpr.c, generic/tclCompile.c, generic/tclCompile.h, generic/tclDecls.h, generic/tclEnv.c, generic/tclEvent.c, generic/tclInt.decls, generic/tclInt.h, generic/tclIntDecls.h, generic/tclInterp.c, generic/tclLink.c, generic/tclObj.c, generic/tclParse.c, generic/tclParseExpr.c, generic/tclProc.c, generic/tclTest.c, generic/tclUtf.c, generic/tclUtil.c, generic/tclVar.c, mac/tclMacTest.c, tests/expr-old.test, tests/parseExpr.test, unix/tclUnixTest.c, unix/tclXtTest.c, win/tclWinTest.c: Applied Patch 585105 to fully CONST-ify all remaining public interfaces of Tcl. Notably, the parser no longer writes on the string it is parsing, so it is no longer necessary for Tcl_Eval() to be given a writable string. Also, the refactoring of the Tcl_*Var* routines by Miguel Sofer is included, so that the "part1" argument for them no longer needs to be writable either. Compatibility support has been enhanced so that a #define of USE_NON_CONST will remove all possible source incompatibilities with the 8.3 version of the header file(s). The new #define of USE_COMPAT_CONST now does what USE_NON_CONST used to do -- disable only those new CONST's that introduce irreconcilable incompatibilities. Several bugs are also fixed by this patch. [Bugs 584051,580433] [Patches 585105,582429] (dgp, 2002-08-05; 1 file, -121/+11)
|
* Global symbols are now all either prefixed with 'tcl' (or 'Tcl' or ...) or have file-scope. (dkf, 2002-07-19; 1 file, -3/+3)
|
* unix/configure: regen'ed. unix/configure.in: replaced bigendian check with autoconf standard AC_C_BIG_ENDIAN, which defined WORDS_BIGENDIAN on bigendian systems. generic/tclUtf.c (Tcl_UniCharNcmp), generic/tclInt.h (TclUniCharNcmp): use WORDS_BIGENDIAN instead of TCL_OPTIMIZE_UNICODE_COMPARE to enable memcmp alternative. (hobbs, 2002-05-30; 1 file, -4/+4)
|
* Made Tcl_UniCharNcmp faster on big-endian machines; the system memcmp() is probably optimized far in excess of anything we could do! Little-endian just use the old code... (dkf, 2002-05-29; 1 file, -4/+11)
|