path: root/Objects/unicodeobject.c
Commit message | Author | Age | Files | Lines
* This patch adds a new feature to the builtin charmap codec: | Marc-André Lemburg | 2001-01-06 | 1 | -8/+48
  The mapping dictionaries can now contain 1-n mappings, meaning that character ordinals may be mapped to strings or Unicode objects, e.g. 0x0078 ('x') -> u"abc", causing the ordinal to be replaced by the complete string or Unicode object instead of just one character. Another feature introduced by the patch is that of mapping ordinals to the empty string. This allows removing characters. The patch is different from patch #103100 in that it does not cause a performance hit for the normal use case of 1-1 mappings. Written by Marc-Andre Lemburg, copyright assigned to Guido van Rossum.
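The 1-n and empty-string mappings described in this entry can still be observed through the charmap codec helpers of a current Python 3 interpreter. This is a minimal sketch, not the 2001 code path: it assumes Python 3 semantics (bytes in, str out), and `decoding_map` is a hypothetical mapping made up for illustration.

```python
import codecs

# Hypothetical decoding map for illustration: byte value -> replacement text.
decoding_map = {
    0x78: "abc",  # 1-n mapping: 'x' expands to three characters
    0x79: "",     # mapping to the empty string removes 'y'
    0x7A: "z",    # ordinary 1-1 mapping
}

# charmap_decode returns a (decoded_text, bytes_consumed) tuple.
text, consumed = codecs.charmap_decode(b"xyz", "strict", decoding_map)
print(repr(text), consumed)
```

The three input bytes are all consumed, while the output grows for 'x' and shrinks for 'y'.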
* This patch changes the default behaviour of the builtin charmap | Marc-André Lemburg | 2001-01-03 | 1 | -13/+8
  codec to not apply Latin-1 mappings for keys which are not found in the mapping dictionaries, but instead treat them as undefined mappings. The patch was originally written by Martin v. Loewis with some additional (cosmetic) changes and an updated test script by Marc-Andre Lemburg. The standard codecs were recreated from the most current files available at the Unicode.org site using the Tools/scripts/gencodec.py tool. This patch closes the bugs #116285 and #119960.
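The "undefined mapping" behaviour can be illustrated with a current Python 3 interpreter. This is a hedged sketch: `decoding_map` and `decode_or_none` are hypothetical names introduced here, and the code reflects how Python 3 behaves today, not the patched 2001 source.

```python
import codecs

# Hypothetical map that defines a single byte value only.
decoding_map = {0x41: "A"}

def decode_or_none(data):
    """Decode with the strict handler; return None for undefined mappings."""
    try:
        return codecs.charmap_decode(data, "strict", decoding_map)[0]
    except UnicodeDecodeError:
        return None

mapped = decode_or_none(b"A")        # key present in the map
unmapped = decode_or_none(b"\xe9")   # key missing: no Latin-1 fallback
# With the "replace" handler the undefined byte becomes U+FFFD instead.
replaced = codecs.charmap_decode(b"\xe9", "replace", decoding_map)[0]
```

A missing key is reported as an error (or handled by the chosen error handler) and is not silently decoded as Latin-1.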
* Patch #102940: use only printable Unicode chars in reporting | Andrew M. Kuchling | 2000-12-19 | 1 | -1/+2
  incorrect % characters; characters outside the printable range are replaced with '?'
* Fix off-by-one error in split_substring(). Fixes SF bug #122162. | Guido van Rossum | 2000-12-19 | 1 | -1/+1
* [ Patch #102852 ] Make % error a bit more informative by indicating the | Andrew M. Kuchling | 2000-12-15 | 1 | -2/+3
  index at which an unknown %-escape was found
* Fix for SF bug #123859: %[duxXo] long formats inconsistent. | Tim Peters | 2000-11-30 | 1 | -3/+1
* _PyUnicode_Fini(): Initialize the local freelist walking variable `u' | Barry Warsaw | 2000-10-03 | 1 | -2/+3
  after unicode_empty has been freed, otherwise it might not point to the real start of the unicode_freelist. Final closure for SF bug #110681, Jitterbug PR#398.
* In _PyUnicode_Fini(), decref unicode_empty before tearing down the free | Guido van Rossum | 2000-10-03 | 1 | -2/+2
  list. Discovered by Barry, fix approved by MAL.
* Rationalize use of limits.h, moving the inclusion to Python.h. | Fred Drake | 2000-09-26 | 1 | -6/+0
  Add definitions of INT_MAX and LONG_MAX to pyport.h. Remove includes of limits.h and conditional definitions of INT_MAX and LONG_MAX elsewhere. This closes SourceForge patch #101659 and bug #115323.
* Derived from Martin's SF patch 110609: support unbounded ints in | Tim Peters | 2000-09-21 | 1 | -24/+68
  %d,i,u,x,X,o formats. Note a curious extension to the std C rules: x, X and o formatting can never produce a sign character in C, so the '+' and ' ' flags are meaningless for them. But unbounded ints *can* produce a sign character under these conversions (no fixed-width bitstring is wide enough to hold all negative values in 2's-comp form). So these flags become meaningful in Python when formatting a Python long which is too big to fit in a C long. This required shuffling around existing code, which hacked x and X conversions to death when both the '#' and '0' flags were specified: the hacks weren't strong enough to deal with the simultaneous possibility of the ' ' or '+' flags too, since signs were always meaningless before for x and X conversions. Isomorphic shuffling was required in unicodeobject.c. Also added dozens of non-trivial new unbounded-int test cases to test_format.py.
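The sign flags on hex conversions can be tried out in a current Python 3 interpreter, where every int is unbounded. This is only an illustration of the resulting behaviour, not of the C code that was shuffled; the name `big` is an arbitrary example value.

```python
big = 2 ** 64  # does not fit in a 64-bit C long

plus_hex = "%+x" % big    # '+' flag forces a sign on a hex conversion
space_hex = "% X" % big   # ' ' flag reserves a column for the sign
neg_alt = "%#x" % -big    # the sign is written before the '0x' prefix

print(plus_hex, space_hex, neg_alt, sep="\n")
```

In C, the same flags would have no effect on `%x`, because a fixed-width unsigned conversion never carries a sign.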
* Fix for bug 113934. string*n and unicode*n did no overflow checking at | Tim Peters | 2000-09-09 | 1 | -2/+19
  all, either to see whether the # of chars fit in an int, or that the amount of memory needed fit in a size_t. Checking these is expensive, but the alternative is silently wrong answers (as in the bug report) or core dumps (which were easy to provoke using Unicode strings).
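The two checks named in this entry (character count fits in an int, byte count fits in a size_t) can be modelled in a few lines. This is a sketch under stated assumptions, not the committed C code: `INT_MAX` and `SIZE_MAX` assume a 32-bit platform, and `checked_repeat_size` and its error message are hypothetical.

```python
INT_MAX = 2**31 - 1    # assumption: 32-bit C int
SIZE_MAX = 2**32 - 1   # assumption: 32-bit size_t

def checked_repeat_size(length, n, char_size):
    """Return the bytes needed for `length` chars repeated `n` times.

    Python ints never overflow, so the C overflow checks are modelled
    here as explicit range tests on the exact products.
    """
    if n < 0:
        n = 0
    nchars = length * n
    if nchars > INT_MAX:                 # check 1: number of characters
        raise OverflowError("repeated string is too long")
    nbytes = (nchars + 1) * char_size    # +1 for a trailing NUL
    if nbytes > SIZE_MAX:                # check 2: amount of memory
        raise OverflowError("repeated string is too long")
    return nbytes

print(checked_repeat_size(3, 4, 2))
```

Without such checks, a wrapped product would request a buffer smaller than the data written into it, which matches the wrong answers and core dumps the entry mentions.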
* changed \x to consume exactly two hex digits, also for unicode | Fredrik Lundh | 2000-09-03 | 1 | -55/+66
  strings. closes PEP-223. also added \U escape (eight hex digits).
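Both escape rules are still visible in string literals of a current Python 3 interpreter, which is what this small sketch assumes (Python 3.3 or later, where every code point is a single character).

```python
# \x consumes exactly two hex digits; the trailing '4' and '2' are plain text.
literal = "\x4142"

# \U consumes exactly eight hex digits and can name any code point.
wide = "\U0001F600"

print(len(literal), hex(ord(wide)))
```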
* PyUnicode_AsUTF8String(): /F picks up what I missed: the local var | Barry Warsaw | 2000-08-18 | 1 | -2/+0
  `str' is no longer necessary. Gotta turn on -Wall!
* PyUnicode_AsUTF8String(): Don't need to explicitly incref str since | Barry Warsaw | 2000-08-18 | 1 | -7/+3
  PyUnicode_EncodeUTF8() already returns the created object with the proper reference count. This fixes an Insure reported memory leak.
* Fixed a couple of instances where a 0-length string was being | Marc-André Lemburg | 2000-08-14 | 1 | -6/+13
  resized after creation. 0-length strings are usually shared and _PyString_Resize() fails on these shared strings. Fixes [ Bug #111667 ] unicode core dump.
* Clean up warning from Monterey compiler. | Trent Mick | 2000-08-12 | 1 | -1/+1
  Properly end a comment block. It was terminated fine later, but by a subsequent block's end. It was also in #if 0. This patch is so trivial I can't believe I am talking about it. :)
* Removing UTF-16 aware Unicode comparison code. This kind of compare | Marc-André Lemburg | 2000-08-08 | 1 | -0/+33
  function (together with other locale aware ones) should go into a new collation support module. See python-dev for a discussion of this removal. Note: This patch should also be applied to the 1.6 branch.
* This patch finalizes the move from UTF-8 to a default encoding in | Marc-André Lemburg | 2000-08-03 | 1 | -40/+40
  the Python Unicode implementation. The internal buffer used for implementing the buffer protocol is renamed to defenc to make this change visible. It now holds the default encoded version of the Unicode object and is calculated on demand (NULL otherwise). Since the default encoding defaults to ASCII, this will mean that Unicode objects which hold non-ASCII characters will no longer work on C APIs using the "s" or "t" parser markers. C APIs must now explicitly provide Unicode support via the "u", "U" or "es"/"es#" parser markers in order to work with non-ASCII Unicode strings. (Note: this patch will also have to be applied to the 1.6 branch of the CVS tree.)
* Changing the CNRI copyright notice according to CNRI's instructions. | Guido van Rossum | 2000-08-03 | 1 | -1/+1
  This is a notice without a date, which apparently is not a claim to copyright but only advice to the reader. IANAL. :-)
* merge Include/my*.h into Include/pyport.h | Peter Schneider-Kamp | 2000-07-31 | 1 | -1/+0
  marked my*.h as obsolete
* Miscellaneous ANSIfications. I'm assuming here 'main' should take (int, | Thomas Wouters | 2000-07-22 | 1 | -20/+4
  char**) and return an int even on PC platforms. If not, please fix PC/utils/makesrc.c ;-P
* Fixed problems with UTF error reporting macros and some formatting bugs. | Marc-André Lemburg | 2000-07-17 | 1 | -45/+64
* gcc is being stupid with if/else constructs | Greg Stein | 2000-07-17 | 1 | -6/+14
  clean out some other warnings
* stop messing around with goto and just write the macro correctly. | Greg Stein | 2000-07-16 | 1 | -7/+6
* - change \x to mean "byte" also in unicode literals | Fredrik Lundh | 2000-07-16 | 1 | -3/+5
  (patch #100912)
* Fix fatal compiler (MSVC6) error: | Tim Peters | 2000-07-16 | 1 | -0/+1
  unicodeobject.c(735) : error C2143: syntax error : missing ';' before '}'
* Fix to a bug found by Florian Weimer: | Marc-André Lemburg | 2000-07-16 | 1 | -1/+2
  The UTF-8 decoder is still buggy (i.e. it doesn't pass Markus Kuhn's stress test), mainly due to the following construct:

      #define UTF8_ERROR(details) do { \
          if (utf8_decoding_error(&s, &p, errors, details)) \
              goto onError; \
          continue; \
      } while (0)

  (The "continue" statement is supposed to exit from the outer loop, but of course, it doesn't. Indeed, this is a marvelous example of the dangers of the C programming language and especially of the C preprocessor.)
* Spelling fixes supplied by Rob W. W. Hooft. All these are fixes in either | Thomas Wouters | 2000-07-16 | 1 | -2/+2
  comments, docstrings or error messages. I fixed two minor things in test_winreg.py ("didn't" -> "Didn't" and "Didnt" -> "Didn't"). There is a minor style issue involved: Guido seems to have preferred English grammar (behaviour, honour) in a couple places. This patch changes that to American, which is the more prominent style in the source. I prefer English myself, so if English is preferred, I'd be happy to supply a patch myself ;)
* replace PyXXX_Length calls with PyXXX_Size calls | Jeremy Hylton | 2000-07-12 | 1 | -1/+1
* Jeremy Hylton: | Marc-André Lemburg | 2000-07-11 | 1 | -2/+4
  better error message for unicode coercion failure
* - changed hash calculation for unicode strings. the new | Fredrik Lundh | 2000-07-10 | 1 | -18/+20
  value is calculated from the character values, in a way that makes sure an 8-bit ASCII string and a unicode string with the same contents get the same hash value. (as a side effect, this also works for ISO Latin 1 strings). for more details, see the python-dev discussion.
* New surrogate support in the UTF-8 codec. By Bill Tutt. | Marc-André Lemburg | 2000-07-07 | 1 | -29/+80
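The entry gives no detail on how surrogates are handled, so the following is only the standard arithmetic for turning a UTF-16 surrogate pair into a code point, followed by its 4-byte UTF-8 form. `combine_surrogates` is a hypothetical helper, not a function from unicodeobject.c.

```python
def combine_surrogates(high, low):
    """Combine a UTF-16 surrogate pair into one Unicode code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate contributes 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

code_point = combine_surrogates(0xD800, 0xDC00)
encoded = chr(code_point).encode("utf-8")  # supplementary plane: 4 bytes
print(hex(code_point), encoded)
```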
* Added new API PyUnicode_FromEncodedObject() which supports decoding | Marc-André Lemburg | 2000-07-07 | 1 | -6/+49
  objects including instance objects. The old API PyUnicode_FromObject() is still available as shortcut.
* Fix to bug #393 (UTF16 codec didn't like empty strings) and | Marc-André Lemburg | 2000-07-07 | 1 | -7/+6
  corrected some usage of 'unsigned long' where Py_UNICODE should have been used.
* Two more places where long should be used instead of int. Especially | Sjoerd Mullender | 2000-07-07 | 1 | -2/+2
  true after revision 2.36 was checked in...
* Fixed some code that used 'short' to use 'long' instead. | Marc-André Lemburg | 2000-07-06 | 1 | -3/+3
* Fixed a couple of places where 'int' was used where 'long' | Marc-André Lemburg | 2000-07-06 | 1 | -7/+7
  should have been used.
* Added new .isalpha() and .isalnum() methods which provide interfaces | Marc-André Lemburg | 2000-07-05 | 1 | -0/+66
  to the new alphabetic lookup APIs in unicodectype.c.
* Bill Tutt: | Marc-André Lemburg | 2000-07-04 | 1 | -6/+29
  Make unicode_compare a true UTF-16 compare function (includes support for surrogates).
* Marc-Andre Lemburg <mal@lemburg.com>: | Marc-André Lemburg | 2000-06-30 | 1 | -1/+1
  A previous patch by Jack Jansen was accidentally reverted.
* Marc-Andre Lemburg <mal@lemburg.com>: | Marc-André Lemburg | 2000-06-30 | 1 | -23/+61
  New buffer overflow checks for formatting strings. By Trent Mick.
* Jack Jansen: Use include "" instead of <>; and staticforward declarations | Guido van Rossum | 2000-06-29 | 1 | -1/+1
* Marc-Andre Lemburg <mal@lemburg.com>: | Marc-André Lemburg | 2000-06-28 | 1 | -0/+121
  Patch to the standard unicode-escape codec which dynamically loads the Unicode name to ordinal mapping from the module ucnhash. By Bill Tutt.
* Marc-Andre Lemburg <mal@lemburg.com>: | Marc-André Lemburg | 2000-06-28 | 1 | -1/+4
  Better error message for "1 in unicodestring". Submitted by Andrew Kuchling.
* Marc-Andre Lemburg <mal@lemburg.com>: | Marc-André Lemburg | 2000-06-18 | 1 | -6/+4
  Fixed a bug in PyUnicode_Count() which would have caused a core dump in case of substring coercion failure. Synchronized .count() with the string method of the same name to return len(s)+1 for s.count('').
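The len(s)+1 convention for counting the empty string is still how str and bytes behave in a current Python 3 interpreter, as this short check shows. The variable names are arbitrary.

```python
s = "abc"

# The empty string matches before each character and once at the end.
empty_matches = s.count("")
bytes_matches = b"abc".count(b"")

print(empty_matches, bytes_matches)
```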
* Vladimir MARANGOZOV <Vladimir.Marangozov@inrialpes.fr>: | Marc-André Lemburg | 2000-06-17 | 1 | -3/+4
  This patch fixes an optimisation mystery in _PyUnicodeNew causing segfaults on AIX when the interpreter is compiled with -O.
* Marc-Andre Lemburg <mal@lemburg.com>: | Marc-André Lemburg | 2000-06-14 | 1 | -0/+28
  Added code so that .isXXX() testing returns 0 for empty strings.
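In a current Python 3 interpreter the same rule shows up as `False` for the classification methods on an empty string. The list of methods below is a selection chosen for illustration, not the set touched by this commit.

```python
# Classification methods that existed in some form for Unicode strings.
method_names = [
    "isalpha", "isalnum", "isdigit", "isdecimal", "isnumeric",
    "isspace", "islower", "isupper", "istitle",
]

# Call each method on the empty string and collect the results.
results = {name: getattr("", name)() for name in method_names}
print(results)
```

An empty string contains no character that could satisfy any of the tests, so none of them reports success.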
* Marc-Andre Lemburg <mal@lemburg.com>: | Marc-André Lemburg | 2000-06-10 | 1 | -2/+1
  Fixed a typo and removed a debug printf(). Thanks to Finn Bock for finding these.
* Patch from Michael Hudson: improve unclear error message | Andrew M. Kuchling | 2000-06-09 | 1 | -1/+1
* Marc-Andre Lemburg <mal@lemburg.com>: | Marc-André Lemburg | 2000-06-08 | 1 | -8/+29
  Fixed %c formatting to check for one character arguments. Thanks to Finn Bock for finding this bug. Added a fix for bug PR#348 which originated from not resetting the globals correctly in _PyUnicode_Fini().
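The one-character check for %c can be seen in a current Python 3 interpreter. This sketch shows today's behaviour only; the exact error text differs between versions, so it checks the exception type alone.

```python
from_char = "%c" % "a"   # a one-character string is accepted
from_int = "%c" % 65     # an integer is treated as a code point

try:
    "%c" % "ab"          # more than one character is rejected
    rejected = False
except TypeError:
    rejected = True

print(from_char, from_int, rejected)
```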