path: root/generic/tclUtf.c
Commit message (Author, Age, Files, Lines)
* Backport various minor issues from 8.6: (jan.nijtmans, 2018-10-27, 1 file, -4/+14)
|   - gcc compiler warning in tclDate.c
|   - protect Tcl_UtfToUniCharDString() from ever reading more than "length" bytes from its input, not even in the case of invalid UTF-8.
|   - update to latest tzdata
|   - fix 2 failing test-cases on MacOSX
* Add emoji 11.0 to the set. Only active when compiled with TCL_UTF_MAX>3. (jan.nijtmans, 2018-05-11, 1 file, -15/+83)
|   Also prepare tooling for Unicode 11.0 (while being on it)
* Fix [2738427]: Tcl_NumUtfChars(...) no overflow check. (jan.nijtmans, 2017-06-08, 1 file, -13/+14)
|
* [67aa9a2070] Tcl_UtfToUniChar returns single byte for invalid UTF-8 input as documented. (jan.nijtmans, 2017-06-06, 1 file, -75/+52)
|
* Make sure that "string is space \u202f" will continue to return "1", even if in a future Unicode version this character (NARROW_NO_BREAK_SPACE) ceases to be a space. (jan.nijtmans, 2013-07-29, 1 file, -1/+1)
|   See: [http://www.unicode.org/review/pri249/]
* Unbreak MSVC6 debug build (thanks Andreas Kupries!) (jan.nijtmans, 2013-07-08, 1 file, -1/+1)
|
* Use more portable TclIsSpaceProc() instead of isspace(). (jan.nijtmans, 2013-06-17, 1 file, -1/+3)
|   Make sure that "string is space \u180e" continues to return 1 for whatever Unicode version.
* Fixed the weird edge case. (dkf, 2013-05-22, 1 file, -12/+25)
|
* Slight improvement: if cs = "\xC0\x80" and ct = "\x00", the loop would continue after the NUL byte; this should not happen. (jan.nijtmans, 2013-05-21, 1 file, -4/+4)
|
* Proposed solution for 3613609: lsort -nocase does not sort non-ASCII correctly (jan.nijtmans, 2013-05-21, 1 file, -0/+27)
|
* [Frq 3473670]: Various Unicode-related speedups/robustness (jan.nijtmans, 2012-01-22, 1 file, -10/+7)
|\
| * rfe-3473670: Various Unicode-related speedups/robustness (jan.nijtmans, 2012-01-14, 1 file, -10/+7)
| |
* | [Bug 3464428] string is graph \u0120 is wrong (jan.nijtmans, 2012-01-09, 1 file, -35/+23)
|\ \
| |/
| * [Bug 3464428] string is graph \u0120 is wrong (jan.nijtmans, 2012-01-09, 1 file, -69/+56)
| |
* | [Bug 3464428] string is graph \u0120 is wrong (jan.nijtmans, 2011-12-24, 1 file, -1/+1)
|\ \
| |/
| * [Bug 3464428] string is graph \u0120 is wrong (jan.nijtmans, 2011-12-23, 1 file, -1/+1)
| |
* | More isspace() callers. (dgp, 2011-04-28, 1 file, -1/+1)
| |
* | Now that we're no longer using SCM based on RCS, the RCS Keyword lines cause more harm than good. Purged them. (dgp, 2011-03-02, 1 file, -2/+0)
|\ \
| |/
| * Now that we're no longer using SCM based on RCS, the RCS Keyword lines cause more harm than good. Purged them. (dgp, 2011-03-01, 1 file, -2/+0)
| |
| * * generic/tclUtf.c (Tcl_UniCharToUtf), tests/utf.test (utf-1.5): Corrected handling of negative Tcl_UniChar input value. (dgp, 2005-09-07, 1 file, -36/+38)
| |   Incorrect handling was producing byte sequences outside of Tcl's legal internal encoding. [Bug 1283976].
| * Made Tcl_NumUtfChars do the right thing with \u0000 when guessing the length because of a negative 'length' parameter. [Bug 769812] (dkf, 2003-10-08, 1 file, -5/+2)
| |
| * * generic/TclUtf.c (Tcl_UniCharNcasecmp), tests/utf.test (utf-25.*): Corrected failure to properly compare Unicode strings of different case in a case insensitive manner. [Bug 699042] (dgp, 2003-03-06, 1 file, -4/+7)
| |
* | Convert to using ANSI decls/definitions and using the (ANSI) assumption that NULL can be cast to any pointer type transparently. (dkf, 2005-10-31, 1 file, -163/+167)
| |
* | * generic/tclUtf.c (Tcl_UniCharToUtf), tests/utf.test (utf-1.5): Corrected handling of negative Tcl_UniChar input value. (dgp, 2005-09-07, 1 file, -36/+38)
| |   Incorrect handling was producing byte sequences outside of Tcl's legal internal encoding. [Bug 1283976].
* | Systematizing the formatting (dkf, 2005-07-21, 1 file, -198/+217)
| |
* | Merged kennykb-numerics-branch back to the head; TIPs 132 and 232 (Kevin B Kenny, 2005-05-10, 1 file, -1/+1)
| |
* | * Eliminated use of identifier "string" in Tcl's public C API to avoid conflict/confusion with the std::string of C++. (dgp, 2005-05-03, 1 file, -161/+161)
| |   Files: doc/DString.3, doc/Environment.3, doc/Eval.3, doc/ExprLong.3, doc/ExprLongObj.3, doc/GetInt.3, doc/GetOpnFl.3, doc/ParseCmd.3, doc/RegExp.3, doc/SetResult.3, doc/StrMatch.3, doc/Utf.3, generic/tcl.decls, generic/tclBasic.c, generic/tclEnv.c, generic/tclGet.c, generic/tclParse.c, generic/tclParseExpr.c, generic/tclRegexp.c, generic/tclResult.c, generic/tclUtf.c, generic/tclUtil.c, unix/tclUnixChan.c
| |   * generic/tclDecls.h: `make genstubs`
* | Made Tcl_NumUtfChars do the right thing with \u0000 when guessing the length because of a negative 'length' parameter. [Bug 769812] (dkf, 2003-10-08, 1 file, -5/+2)
| |
* | * generic/TclUtf.c (Tcl_UniCharNcasecmp), tests/utf.test (utf-25.*): Corrected failure to properly compare Unicode strings of different case in a case insensitive manner. [Bug 699042] (dgp, 2003-03-06, 1 file, -4/+7)
|/
* * Add private TclUniCharMatch function that does string match on counted unicode strings. (hobbs, 2003-02-18, 1 file, -2/+193)
|   Tcl_UniCharCaseMatch has the failing that it can't handle strings or patterns with embedded NULLs. Added tests that actually try strings/pats with NULLs. TclUniCharMatch should be TIPed and made public in the next minor version rev.
|   Files: generic/tclExecute.c (TclExecuteByteCode INST_STR_MATCH), generic/tclCmdMZ.c (Tcl_StringObjCmd STR_MATCH), generic/tclUtf.c (TclUniCharMatch), generic/tclInt.decls, generic/tclIntDecls.h, generic/tclStubInit.c, tests/string.test, tests/stringComp.test
* * generic/tclUtf.c: make use of TclUtfToUniChar macro throughout the functions, and add extra optimization to Tcl_NumUtfChars for one-byte/char case. (hobbs, 2002-11-12, 1 file, -22/+31)
|
* * Applied Patch 585105 to fully CONST-ify all remaining public interfaces of Tcl. (dgp, 2002-08-05, 1 file, -121/+11)
|   Notably, the parser no longer writes on the string it is parsing, so it is no longer necessary for Tcl_Eval() to be given a writable string. Also, the refactoring of the Tcl_*Var* routines by Miguel Sofer is included, so that the "part1" argument for them no longer needs to be writable either.
|   Compatibility support has been enhanced so that a #define of USE_NON_CONST will remove all possible source incompatibilities with the 8.3 version of the header file(s). The new #define of USE_COMPAT_CONST now does what USE_NON_CONST used to do -- disable only those new CONST's that introduce irreconcilable incompatibilities.
|   Several bugs are also fixed by this patch. [Bugs 584051,580433] [Patches 585105,582429]
|   Files: doc/CmdCmplt.3, doc/Concat.3, doc/CrtCommand.3, doc/CrtSlave.3, doc/CrtTrace.3, doc/Eval.3, doc/ExprLong.3, doc/LinkVar.3, doc/ParseCmd.3, doc/SetVar.3, doc/TraceVar.3, doc/UpVar.3, generic/tcl.decls, generic/tcl.h, generic/tclBasic.c, generic/tclCmdMZ.c, generic/tclCompCmds.c, generic/tclCompExpr.c, generic/tclCompile.c, generic/tclCompile.h, generic/tclDecls.h, generic/tclEnv.c, generic/tclEvent.c, generic/tclInt.decls, generic/tclInt.h, generic/tclIntDecls.h, generic/tclInterp.c, generic/tclLink.c, generic/tclObj.c, generic/tclParse.c, generic/tclParseExpr.c, generic/tclProc.c, generic/tclTest.c, generic/tclUtf.c, generic/tclUtil.c, generic/tclVar.c, mac/tclMacTest.c, tests/expr-old.test, tests/parseExpr.test, unix/tclUnixTest.c, unix/tclXtTest.c, win/tclWinTest.c
* Global symbols are now all either prefixed with 'tcl' (or 'Tcl' or ...) or have file-scope. (dkf, 2002-07-19, 1 file, -3/+3)
|
* * unix/configure: regen'ed (hobbs, 2002-05-30, 1 file, -4/+4)
|   * unix/configure.in: replaced bigendian check with autoconf standard AC_C_BIG_ENDIAN, which defined WORDS_BIGENDIAN on bigendian systems.
|   * generic/tclUtf.c (Tcl_UniCharNcmp):
|   * generic/tclInt.h (TclUniCharNcmp): use WORDS_BIGENDIAN instead of TCL_OPTIMIZE_UNICODE_COMPARE to enable memcmp alternative.
* Made Tcl_UniCharNcmp faster on big-endian machines; the system memcmp() is probably optimized far in excess of anything we could do! Little-endian just use the old code... (dkf, 2002-05-29, 1 file, -4/+11)
|
* * generic/tclInt.decls, generic/tclIntDecls.h, generic/tclStubInit.c, generic/tclUtf.c: added TclpUtfNcmp2 private command that mirrors Tcl_UtfNcmp, but takes n in bytes, not utf-8 chars. (hobbs, 2002-05-29, 1 file, -12/+55)
|   This provides a faster alternative for comparing utf strings internally.
|   (Tcl_UniCharNcmp, Tcl_UniCharNcasecmp): removed the explicit end of string check as it wasn't correct for the function (by doc and logic).
|   * generic/tclCmdMZ.c (Tcl_StringObjCmd): reworked the string equal comparison code to use TclpUtfNcmp2 as well as short-circuit for equal objects or unequal length strings in the equal case. Removed the use of goto and streamlined the other parts.
|   * generic/tclExecute.c (TclExecuteByteCode): added check for object equality in the comparison instructions. Added short-circuit for != length strings in INST_EQ, INST_NEQ and INST_STR_CMP. Reworked INST_STR_CMP to use TclpUtfNcmp2 where appropriate, and only use Tcl_UniCharNcmp when at least one of the objects is a Unicode obj with no utf bytes.
* * Partial TIP 27 rollback. (dgp, 2002-02-08, 1 file, -3/+3)
|   Following routines restored to return (char *): Tcl_DStringAppend, Tcl_DStringAppendElement, Tcl_JoinPath, Tcl_TranslateFileName, Tcl_ExternalToUtfDString, Tcl_UtfToExternalDString, Tcl_UniCharToUtfDString, Tcl_GetCwd, Tcl_WinTCharToUtf. Also restored Tcl_WinUtfToTChar to return (TCHAR *) and Tcl_UtfToUniCharDString to return (Tcl_UniChar *). Modified some callers. This change recognizes that Tcl_DStrings are de-facto white-box objects.
|   * generic/tclCmdMZ.c: corrected use of C++-style comment.
* * Sought out and eliminated instances of CONST-casting that are no longer needed after the TIP 27 effort. (dgp, 2002-01-26, 1 file, -2/+2)
|
* * Updated APIs in generic/tclUtf.c and generic/tclRegexp.c according to the guidelines of TIP 27. Updated callers. (dgp, 2002-01-17, 1 file, -14/+14)
|
* Fixed fault with case-insensitive string matching (Bug#233257) and rewrote some tests to test what they claimed to be testing. (dkf, 2002-01-02, 1 file, -1/+4)
|
* Undo of mistaken commit. Sorry! (dgp, 2001-10-16, 1 file, -12/+12)
|
* * Added test to demonstrate memory corruption problems. [Bug 219393]. (dgp, 2001-10-16, 1 file, -12/+12)
|
* * generic/tclUtf.c (Tcl_UtfPrev): corrected to return the proper location when the middle of a UTF-8 byte was passed in. [Bug #450504] (hobbs, 2001-09-13, 1 file, -5/+3)
|
* * tests/subst.test, generic/tclUtf.c (Tcl_UtfBackslash): Corrected backslash handling of multibyte utf-8 chars. [Bug #217987] (hobbs, 2001-06-28, 1 file, -6/+17)
|
* Fixed problem with [string compare \x00 \x01] and hopefully sped the command up in a few cases too (notably byte arrays and UNICODE objects.) [Bug #219201] (dkf, 2001-04-06, 1 file, -6/+4)
|
* Comment typo correction. (ericm, 2000-06-05, 1 file, -2/+2)
|
* removed unreferenced var (hobbs, 2000-05-08, 1 file, -2/+1)
|
* * doc/Utf.3, generic/tclStubInit.c, generic/tcl.decls, generic/tclDecls.h, generic/tclUtf.c: Added new functions Tcl_UniCharNcasecmp and Tcl_UniCharCaseMatch (unicode parallel to Tcl_StringCaseMatch) (hobbs, 2000-05-08, 1 file, -2/+215)
|   * generic/tclUtil.c: rewrote Tcl_StringCaseMatch algorithm for optimization and made Tcl_StringMatch just call Tcl_StringCaseMatch
* * generic/tclUtf.c: changed Tcl_UtfBackslash to not allow non-octal digits (8,9) in \ooo substs. [Bug: 3975] (hobbs, 2000-01-11, 1 file, -4/+7)
|   * generic/tcl.h: noted need to change win/tcl.m4 and tools/tclSplash.bmp for minor version changes
* * doc/Utf.3, generic/tcl.decls, generic/tclInt.decls, generic/tclDecls.h, generic/tclIntDecls.h, generic/tclUtf.c, compat/strftime.c, unix/tclUnixTime.c: (redman, 1999-07-22, 1 file, -4/+4)
    Changed function declarations in non-platform-specific APIs to use "unsigned long" instead of "size_t", which may not be defined on certain compilers (rather than include sys/types.h, which may not exist).