Commit log for Lib/test/test_unicode.py, newest first. Each entry gives the date, author, files changed and lines removed/added.
* 2001-05-13, Tim Peters (1 file, -2/+5)
  Get rid of the superstitious "~" in dict hashing's "i = (~hash) & mask".
  The comment following used to say:

      /* We use ~hash instead of hash, as degenerate hash functions, such
         as for ints <sigh>, can have lots of leading zeros. It's not
         really a performance risk, but better safe than sorry.
         12-Dec-00 tim: so ~hash produces lots of leading ones instead --
         what's the gain? */

  That is, there was never a good reason for doing it. And to the
  contrary, as explained on Python-Dev last December, it tended to make
  the *sum* (i + incr) & mask (which is the first table index examined
  in case of collision) the same "too often" across distinct hashes.

  Changing to the simpler "i = hash & mask" reduced the number of
  string-dict collisions (== number of times we go around the lookup
  for-loop) from about 6 million to 5 million during a full run of the
  test suite (these are approximate because the test suite does some
  random stuff from run to run). The number of collisions in non-string
  dicts also decreased, but not as dramatically.

  Note that this may, for a given dict, change the order (wrt previous
  releases) of entries exposed by .keys(), .values() and .items(). A
  number of std tests suffered bogus failures as a result. For dicts
  keyed by small ints, or (less so) by characters, the order is much
  more likely to be in increasing order of key now; e.g.,

      >>> d = {}
      >>> for i in range(10):
      ...     d[i] = i
      ...
      >>> d
      {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
      >>>

  Unfortunately, people may latch on to that in small examples and draw
  a bogus conclusion.

  test_support.py
      Moved test_extcall's sortdict() into test_support, made it
      stronger, and imported sortdict into other std tests that needed it.
  test_unicode.py
      Excluded cp875 from the "roundtrip over range(128)" test, because
      cp875 doesn't have a well-defined inverse for unicode("?", "cp875").
      See Python-Dev for excruciating details.
  Cookie.py
      Changed various output functions to sort dicts before building
      strings from them.
  test_extcall
      Fiddled the expected-result file. This remains sensitive to native
      dict ordering, because, e.g., if there are multiple errors in a
      keyword-arg dict (and test_extcall sets up many cases like that),
      the specific error Python complains about first depends on native
      dict ordering.
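A small Python 3 sketch of the two ideas in this entry. first_probe_slots() is a hypothetical helper that models only the first probe, "i = hash & mask" versus the old "(~hash) & mask", assuming an 8-slot table and CPython's hash(n) == n for small non-negative ints; it does not model the later (i + incr) & mask probes. sortdict() is a re-creation in the spirit of the test_support helper, not its original code. Note that since Python 3.7 dicts preserve insertion order, so iteration order no longer follows the slot layout.

```python
def first_probe_slots(keys, mask, invert=False):
    """Return the first table slot examined for each key.

    invert=True models the old "i = (~hash) & mask" formula;
    invert=False models the simpler "i = hash & mask".
    """
    slots = []
    for key in keys:
        h = hash(key)
        slots.append((~h if invert else h) & mask)
    return slots


def sortdict(d):
    """Like repr(d), but with the items listed in sorted key order."""
    pairs = ", ".join("%r: %r" % item for item in sorted(d.items()))
    return "{%s}" % pairs


mask = 7  # an 8-slot table
print(first_probe_slots(range(8), mask))               # [0, 1, 2, 3, 4, 5, 6, 7]
print(first_probe_slots(range(8), mask, invert=True))  # [7, 6, 5, 4, 3, 2, 1, 0]
print(sortdict({"b": 2, "a": 1}))                       # {'a': 1, 'b': 2}
```

With either formula the small-int keys land in distinct first slots; only their order differs, which is why output that depends on native ordering needed sortdict().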
* 2001-05-02, Marc-André Lemburg (1 file, -0/+6)
  Fix for bug #417030: "print '%*s' fails for unicode string"
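In Python 3 every str is Unicode, so what the bug report expected from '%*s' can be shown directly. A minimal sketch of the '*' width specifier, which takes the field width from the argument tuple just before the value:

```python
# '*' reads the field width from the argument tuple, just before the value.
right = "%*s" % (6, "abc")   # right-justified in a 6-character field
left = "%-*s|" % (6, "abc")  # the '-' flag left-justifies instead
print(repr(right))  # '   abc'
print(repr(left))   # 'abc   |'
```

A width smaller than the string does not truncate it; the value is printed in full.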
* 2001-02-10, Marc-André Lemburg (1 file, -5/+10)
  Patch by Finn Bock to make test_unicode.py work for Jython.
* 2001-01-29, Marc-André Lemburg (1 file, -0/+2)
  Fixed .capitalize() method of Unicode objects to work like the
  corresponding string method. Added tests for this too. Patch written
  by Marc-Andre Lemburg. Copyright assigned to Guido van Rossum.
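Python 3 still has both flavours (str and bytes), so the parity this entry aims for can be checked directly. capitalize_matches() is a hypothetical helper for illustration, limited to ASCII input because bytes methods leave non-ASCII bytes alone:

```python
def capitalize_matches(text: str) -> bool:
    """Return True if str.capitalize() agrees with bytes.capitalize().

    Only meaningful for ASCII text.
    """
    as_str = text.capitalize()
    as_bytes = text.encode("ascii").capitalize().decode("ascii")
    return as_str == as_bytes


print("hELLO wORLD".capitalize())         # Hello world
print(capitalize_matches("hELLO wORLD"))  # True
```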
* 2001-01-19, Guido van Rossum (1 file, -6/+6)
  Change verify() function to raise TestFailed, not AssertionError.
  (I realize that I didn't really test this, because all the tests
  succeed, so verify() never raised an AssertionError -- but the test
  suite still succeeds, so I'm not too worried.)
* 2001-01-18, Tim Peters (1 file, -6/+6)
  Whitespace normalization. Leaving tokenize_tests.py alone for now.
* 2001-01-17, Marc-André Lemburg (1 file, -93/+93)
  This patch removes all uses of "assert" in the regression test suite
  and replaces them with a new API verify(). As a result, the regression
  suite also performs its tests in optimization mode (python -O), where
  assert statements are compiled away. Written by Marc-Andre Lemburg.
  Copyright assigned to Guido van Rossum.
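A minimal sketch of such a helper. The names verify and TestFailed come from this log (the later 2001-01-19 entry switches verify() to raising TestFailed); the body is an assumption, not the original test_support code. The key property is that an ordinary function call survives python -O, whereas assert statements are removed:

```python
class TestFailed(Exception):
    """Raised when a regression-test check does not hold."""


def verify(condition, reason="test failed"):
    """Raise TestFailed unless condition is true.

    Unlike an assert statement, this call is not stripped when Python
    runs with -O, so the check is performed in optimization mode too.
    """
    if not condition:
        raise TestFailed(reason)


verify("abc".upper() == "ABC")  # passes silently
```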
* 2001-01-16, Marc-André Lemburg (1 file, -0/+7)
  Added checks to prevent PyUnicode_Count() from dumping core in case
  the parameters are out of bounds, and fixed error handling for
  .count(), .startswith() and .endswith() for the case of mixed
  string/Unicode objects. This patch adds Python-style index semantics
  to PyUnicode_Count() indices (including the special handling of
  negative indices). The patch is an extended version of patch #103249
  submitted by Michael Hudson (mwh) on SF. It also includes new test
  cases.
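Python 3's str methods keep these index semantics (the str/bytes mixing case no longer applies, since the two types cannot be mixed). A short sketch showing that the optional start/end arguments behave like slice bounds, including negative and out-of-range values:

```python
s = "abcabcabc"

# start/end act like slice bounds: s.count(sub, start, end) == s[start:end].count(sub)
print(s.count("a", -3))         # 1: only s[-3:] == "abc" is searched
print(s.count("a", 0, 100))     # 3: an end past the string is clipped, not an error
print(s.count("a", 50))         # 0: a start past the end leaves nothing to search
print(s.startswith("abc", -3))  # True: tests s[-3:]
print(s.endswith("ab", 0, -1))  # True: tests s[0:-1] == "abcabcab"
```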
* 2001-01-03, Marc-André Lemburg (1 file, -2/+3)
  This patch changes the default behaviour of the built-in charmap codec
  to not apply Latin-1 mappings for keys which are not found in the
  mapping dictionaries, but instead to treat them as undefined mappings.
  The patch was originally written by Martin v. Loewis with some
  additional (cosmetic) changes and an updated test script by
  Marc-Andre Lemburg. The standard codecs were recreated from the most
  current files available at the Unicode.org site using the
  Tools/scripts/gencodec.py tool. This patch closes bugs #116285 and
  #119960.
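In Python 3 the same rule can be observed through codecs.charmap_decode(), the low-level function that table-driven codecs build on. The two-entry decoding_map below is hypothetical. Passing None as the mapping falls back to Latin-1, while a byte missing from a supplied mapping is treated as undefined:

```python
import codecs

# Hypothetical decoding table: only bytes 0x41 ('A') and 0x42 ('B') are defined.
decoding_map = {0x41: "A", 0x42: "B"}

# Every byte is in the table; the result is (text, bytes consumed).
print(codecs.charmap_decode(b"AB", "strict", decoding_map))  # ('AB', 2)

# No mapping at all: Latin-1 is used, so 0xE9 becomes 'é'.
print(codecs.charmap_decode(b"\xe9", "strict", None))        # ('é', 1)

# 0xE9 is missing from the table: an undefined mapping, not a Latin-1 fallback.
try:
    codecs.charmap_decode(b"A\xe9", "strict", decoding_map)
except UnicodeDecodeError as exc:
    print("undefined mapping at byte", exc.start)
```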
* 2000-12-19, Guido van Rossum (1 file, -0/+7)
  Test more split argument combinations:
  1) multi-char separator
  2) multi-char separator that only occurs at last position
  3) all of the above with mixed Unicode and 8-bit-string arguments
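A quick Python 3 illustration of cases 1 and 2. Case 3 has no direct counterpart any more: Python 3 refuses to mix str and bytes, so the mixed call raises TypeError instead of coercing.

```python
cases = [
    ("a--b--c", "--"),  # 1) multi-char separator
    ("abc--", "--"),    # 2) separator occurring only at the last position
]
for text, sep in cases:
    print(repr(text), "->", text.split(sep))
# 'a--b--c' -> ['a', 'b', 'c']
# 'abc--' -> ['abc', '']

# 3) mixing text and bytes is rejected in Python 3
try:
    b"a--b".split("--")
except TypeError:
    print("bytes.split() needs a bytes separator")
```

Note the trailing empty string in case 2: a separator at the very end still produces a (empty) final field.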
* 2000-11-29, Guido van Rossum (1 file, -9/+11)
  Slight improvement to Unicode test suite, inspired by patch #102563:
  also test the join method of 8-bit strings. Also changed the test()
  function to (1) compare the types of the expected and actual result,
  and (2) in verbose mode, print the repr() of the output.
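Why compare types as well as values: in Python 2, u"abc" == "abc" is true, so an equality check alone cannot tell a Unicode result from an 8-bit one (in Python 3 the analogous trap is 1 == 1.0). A sketch of such a checker, where check_method and CheckFailed are hypothetical stand-ins for the test() helper described above, not its original code:

```python
class CheckFailed(Exception):
    """Raised when a method call does not produce the expected result."""


def check_method(expected, obj, method, *args, verbose=False):
    """Call obj.<method>(*args); require an equal value *and* an identical type."""
    result = getattr(obj, method)(*args)
    if verbose:
        print(repr(result))  # repr() makes str vs bytes results visible
    if type(result) is not type(expected) or result != expected:
        raise CheckFailed("%r.%s%r returned %r, expected %r"
                          % (obj, method, args, result, expected))
    return result


check_method("a,b", ",", "join", ["a", "b"])                    # str.join
check_method(b"a,b", b",", "join", [b"a", b"b"], verbose=True)  # prints b'a,b'
```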
* 2000-10-23, Fred Drake (1 file, -43/+42)
  Make reindent.py happy (convert everything to 4-space indents!).
* 2000-10-07, Marc-André Lemburg (1 file, -0/+1)
  Updated test with a case which checks for the bug reported in
* 2000-08-08, Marc-André Lemburg (1 file, -50/+53)
  Removing UTF-16 aware Unicode comparison code. This kind of compare
  function (together with other locale-aware ones) should go into a new
  collation support module. See python-dev for a discussion of this
  removal. Note: This patch should also be applied to the 1.6 branch.
* 2000-07-07, Marc-André Lemburg (1 file, -0/+72)
  Tests for new surrogate support in the UTF-8 codec. By Bill Tutt.
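Background for what such tests presumably exercise: a character outside the BMP is stored in UTF-16 as a surrogate pair, and UTF-8 must encode the combined code point as one 4-byte sequence (encoding each surrogate separately would give two 3-byte sequences, which is not valid UTF-8). A hand-rolled sketch of that rule, with hypothetical helper names, checked against Python 3's built-in codec:

```python
def combine_surrogates(high: int, low: int) -> int:
    """Turn a UTF-16 surrogate pair into the code point it encodes."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf8_encode_supplementary(cp: int) -> bytes:
    """Encode a code point in U+10000..U+10FFFF as 4-byte UTF-8."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    return bytes([
        0xF0 | (cp >> 18),           # 11110xxx: top 3 bits
        0x80 | ((cp >> 12) & 0x3F),  # 10xxxxxx: next 6 bits
        0x80 | ((cp >> 6) & 0x3F),   # 10xxxxxx: next 6 bits
        0x80 | (cp & 0x3F),          # 10xxxxxx: low 6 bits
    ])


cp = combine_surrogates(0xD83D, 0xDE00)
print(hex(cp))                        # 0x1f600
print(utf8_encode_supplementary(cp))  # b'\xf0\x9f\x98\x80'
print(chr(cp).encode("utf-8"))        # same bytes from the built-in codec
```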
* 2000-07-07, Marc-André Lemburg (1 file, -0/+15)
  Tests for new instance support in unicode().
* 2000-07-05, Marc-André Lemburg (1 file, -0/+16)
  Added tests for the new .isalpha() and .isalnum() methods.
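The methods survive unchanged on Python 3's str. A short sketch of the edge cases worth testing, including the empty string (False for both) and a non-ASCII letter:

```python
samples = ["abc", "abc123", "123", "", "h\u00e9llo", "a b"]
for s in samples:
    # isalpha(): non-empty and every character is a letter.
    # isalnum(): non-empty and every character is a letter or a numeric character.
    print(repr(s), s.isalpha(), s.isalnum())
```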
* 2000-06-30, Marc-André Lemburg (1 file, -74/+0)
  Marc-Andre Lemburg <mal@lemburg.com>:
  Moved tests of new Unicode Char Name support to a separate test.
* 2000-06-28, Marc-André Lemburg (1 file, -0/+75)
  Marc-Andre Lemburg <mal@lemburg.com>:
  Added tests for the new Unicode character name support in the
  standard unicode-escape codec.
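Python 3 still supports this: \N{...} works in string literals and in the unicode_escape codec, and unicodedata gives the reverse lookup. A minimal sketch:

```python
import unicodedata

# In a string literal, \N{...} is resolved at compile time.
alpha = "\N{GREEK SMALL LETTER ALPHA}"

# The unicode_escape codec resolves the same escape when decoding bytes.
decoded = b"\\N{GREEK SMALL LETTER ALPHA}".decode("unicode_escape")

print(alpha == decoded == "\u03b1")  # True
print(unicodedata.name(decoded))     # GREEK SMALL LETTER ALPHA

# An unknown character name is a decoding error, not a silent pass-through.
try:
    b"\\N{NO SUCH CHARACTER}".decode("unicode_escape")
except UnicodeDecodeError as exc:
    print("unknown name rejected:", exc.reason)
```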
* 2000-06-14, Marc-André Lemburg (1 file, -6/+0)
  Marc-Andre Lemburg <mal@lemburg.com>:
  Removed a test which can fail when the default locale setting uses a
  Latin-1 encoding. The test case is not applicable anymore.
* 2000-06-13, Marc-André Lemburg (1 file, -3/+12)
  Marc-Andre Lemburg <mal@lemburg.com>:
  Fixed some tests to not cause the script to fail, but rather output a
  warning (which then is caught by regrtest.py as wrong output). This is
  needed to make test_unicode.py run through on JPython. Thanks to Finn
  Bock.
* 2000-06-08, Marc-André Lemburg (1 file, -2/+2)
  Marc-Andre Lemburg <mal@lemburg.com>:
  Updated to the fix in %c formatting: it now always checks for a
  one-character argument.
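Python 3 keeps the same check. A short sketch: %c accepts a single character or an integer code point, and a longer string is rejected with TypeError:

```python
print("%c" % "x")  # x
print("%c" % 65)   # A: an int is taken as a code point

try:
    "%c" % "xy"    # more than one character
except TypeError as exc:
    print("rejected:", exc)
```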
* 2000-05-09, Fred Drake (1 file, -0/+6)
  M.-A. Lemburg <mal@lemburg.com>:
  Added another test for string formatting (the one that produced the
  core dump now fixed in unicodeobject.c).
* 2000-04-28, Guido van Rossum (1 file, -2/+2)
  Get rid of memory leak caused by assigning sys.exc_info() to a local.
  Store sys.exc_info()[:2] instead.
* 2000-04-13, Fred Drake (1 file, -0/+8)
  M.-A. Lemburg <mal@lemburg.com>:
  Added test for Unicode string concatenation.
* 2000-04-11, Guido van Rossum (1 file, -2/+1)
  Marc-Andre Lemburg:
  Modified .splitlines() tests according to the changes in
  unicodeobject.c.
* 2000-04-10, Guido van Rossum (1 file, -0/+29)
  Marc-Andre Lemburg:
  * '...%s...' % u"abc" now coerces to Unicode just like string
    methods. Care is taken not to reevaluate already formatted
    arguments -- only the first Unicode object appearing in the
    argument mapping is looked up twice. Added test cases for this to
    test_unicode.py.
* 2000-04-05, Guido van Rossum (1 file, -8/+93)
  Marc-Andre's third try at this bulk patch seems to work (except that
  his copy of test_contains.py seems to be broken -- the lines he
  deleted were already absent). Checkin messages:

  New Unicode support for int(), float(), complex() and long().

  - new APIs PyInt_FromUnicode() and PyLong_FromUnicode()
  - added support for Unicode to PyFloat_FromString()
  - new encoding API PyUnicode_EncodeDecimal() which converts Unicode
    to a decimal char* string (used in the above new APIs)
  - shortcuts for calls like int(<int object>) and float(<float obj>)
  - tests for all of the above

  Unicode compares and contains checks:

  - comparing Unicode and non-string types now works; TypeErrors are
    masked, all other errors such as ValueError during Unicode coercion
    are passed through (note that PyUnicode_Compare does not implement
    the masking -- PyObject_Compare does this)
  - contains now works for non-string types too; TypeErrors are masked
    and 0 returned; all other errors are passed through

  Better testing support for the standard codecs.

  Misc minor enhancements, such as an alias dbcs for the mbcs codec.

  Changes:

  - PyLong_FromString() now applies the same error checks as does
    PyInt_FromString(): trailing garbage is reported as an error and no
    longer silently ignored. The only characters which may be trailing
    the digits are 'L' and 'l' -- these are still silently ignored.
  - string.ato?() now directly interfaces to int(), long() and float().
    The error strings are now a little different, but the type still
    remains the same. These functions are now ready to get declared
    obsolete ;-)
  - PyNumber_Int() now also does a check for embedded NULL chars in the
    input string; PyNumber_Long() already did this (and still does)

  Followed by:

  Looks like I've gone a step too far there... (and test_contains.py
  seems to have a bug too). I've changed back to reporting all errors
  in PyUnicode_Contains() and added a few more test cases to
  test_contains.py (plus corrected the join() NameError).
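Python 3 keeps the core of the int()/float() part of this change: both accept any Unicode decimal digits (general category Nd), not just ASCII ones. The 'L'/'l' suffix mentioned in this entry belonged to Python 2's long type; Python 3's int() rejects it as trailing garbage. A short sketch:

```python
# ARABIC-INDIC DIGIT ONE, TWO, THREE (category Nd) instead of ASCII digits
print(int("\u0661\u0662\u0663"))  # 123
print(float("\u0661.\u0665"))     # 1.5
print(int("  42  "))              # 42: surrounding whitespace is allowed

try:
    int("42L")  # the Python 2 long suffix is trailing garbage in Python 3
except ValueError:
    print("trailing 'L' rejected")
```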
* 2000-03-28, Guido van Rossum (1 file, -45/+0)
  Marc-Andre Lemburg:
  The attached patch set includes a workaround to get Python with
  Unicode to compile on BSDI 4.x (courtesy Thomas Wouters; the cause is
  a bug in the BSDI wchar.h header file) and Python interfaces for the
  MBCS codec donated by Mark Hammond. Also included are some minor
  corrections with respect to the docs of the new "es" and "es#" parser
  markers (use PyMem_Free() instead of free(); thanks to Mark Hammond
  for finding these). The unicodedata tests are now in a separate file
  (test_unicodedata.py) to avoid problems if the module cannot be found.
* 2000-03-24, Guido van Rossum (1 file, -0/+30)
  Marc-Andre Lemburg:
  Attached you find the latest update of the Unicode implementation.
  The patch is against the current CVS version. It includes the fix I
  posted yesterday for the core dump problem in codecs.c (was introduced
  by my previous patch set -- sorry), adds more tests for the codecs and
  two new parser markers "es" and "es#".
* 2000-03-20, Barry Warsaw (1 file, -0/+1)
  On 17-Mar-2000, Marc-Andre Lemburg said:
  Attached you find an update of the Unicode implementation. The patch
  is against the current CVS version. I would appreciate it if someone
  with CVS checkin permissions could check the changes in. The patch
  contains all bug fixes and patches sent this week and also fixes a
  leak in the codecs code and a bug in the free list code for Unicode
  objects (which only shows up when compiling Python with Py_DEBUG;
  thanks to MarkH for spotting this one).
* 2000-03-13, Guido van Rossum (1 file, -0/+13)
  Marc-Andre Lemburg: Add tests for mixed use of char in string.
* 2000-03-10, Guido van Rossum (1 file, -0/+281)
  Marc-Andre Lemburg: test script for Unicode implementation.