| Commit message (Collapse) | Author | Age | Files | Lines |
| | |
|
| |
|
|
|
|
|
|
|
|
|
| |
Fix SF bug #697220, string.strip implementation/doc mismatch
Attempt to make all the various string/unicode *strip methods the same.
* Doc - add doc for when functions were added
* UserString
* string/unicode object methods
* string module functions
'chars' is used for the last parameter everywhere.
|
| | |
|
| | |
|
| |
|
|
| |
Fix for bug #626172: crash using unicode latin1 single char
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Fix a nasty endcase reported by Armin Rigo in SF bug 618623:
'%2147483647d' % -123 segfaults. This was because an integer overflow
in a comparison caused the string resize to be skipped. After fixing
the overflow, this could call _PyString_Resize() with a negative size,
so I (1) test for that and raise MemoryError instead; (2) also added a
test for negative newsize to _PyString_Resize(), raising SystemError
as for all bad arguments.
An identical bug existed in unicodeobject.c, of course.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
2002/08/11 12:23:04 lemburg Python/bltinmodule.c 2.262
2002/08/11 12:23:04 lemburg Objects/unicodeobject.c 2.162
2002/08/11 12:23:03 lemburg Misc/NEWS 1.461
2002/08/11 12:23:03 lemburg Lib/test/test_unicode.py 1.65
2002/08/11 12:23:03 lemburg Include/unicodeobject.h 2.39
Add C API PyUnicode_FromOrdinal() which exposes unichr() at C level.
u'%c' will now raise a ValueError in case the argument is an
integer outside the valid range of Unicode code point ordinals.
Closes SF bug #593581.
|
| | |
|
| |
|
|
|
|
|
|
| |
UTF-8 decoder accept broken UTF-8 sequences which encode lone
high surrogates (the pre-2.2.2 versions forgot to generate the
UTF-8 prefix \xed for these).
Fixes SF bug #610783: Lone surrogates cause bad .pyc files.
|
| | |
|
| |
|
|
|
|
|
|
| |
unicodeobject.c 2.169
stringobject.c 2.189
Fix warnings on 64-bit platforms about casts from pointers to ints.
Two of these were real bugs.
|
| |
|
|
|
|
|
|
|
| |
Fix SF bug 599128, submitted by Inyeol Lee: .replace() would do the
wrong thing for a unicode subclass when there were zero string
replacements. The example given in the SF bug report was only one way
to trigger this; replacing a string of length >= 2 that's not found is
another. The code would actually write outside allocated memory if
replacement string was longer than the search string.
|
| |
|
|
|
|
| |
These were reported and fixed by Inyeol Lee in SF bug 595350. The
endswith() bug is already fixed in 2.3; I'll fix the others in
2.3 next.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Repair widespread misuse of _PyString_Resize. Since it's clear people
don't understand how this function works, also beefed up the docs. The
most common usage error is of this form (often spread out across gotos):
if (_PyString_Resize(&s, n) < 0) {
Py_DECREF(s);
s = NULL;
goto outtahere;
}
The error is that if _PyString_Resize runs out of memory, it automatically
decrefs the input string object s (which also deallocates it, since its
refcount must be 1 upon entry), and sets s to NULL. So if the "if"
branch ever triggers, it's an error to call Py_DECREF(s): s is already
NULL! A correct way to write the above is the simpler (and intended)
if (_PyString_Resize(&s, n) < 0)
goto outtahere;
Bugfix candidate.
Original patch(es):
python/dist/src/Objects/fileobject.c:2.161
python/dist/src/Objects/stringobject.c:2.161
python/dist/src/Objects/unicodeobject.c:2.147
|
| |
|
|
|
|
|
|
|
|
| |
Apply patch diff.txt from SF feature request
http://www.python.org/sf/444708
This adds the optional argument for str.strip
to unicode.strip too and makes it possible
to call str.strip with a unicode argument
and unicode.strip with a str argument.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Misc/NEWS 1.387->1.388
Lib/test/string_tests.py 1.10->1.11, 1.12->1.14,
Lib/test/test_unicode.py 1.50->1.51, 1.53->1.54, 1.55->1.56
Lib/test/test_string.py 1.15->1.16
Lib/string.py 1.61->1.63
Lib/test/test_userstring.py 1.5->1.6, 1.11, 1.12
Objects/stringobject.c 2.156->2.159
Objects/unicodeobject.c 2.137->2.139
Doc/lib/libstdtypes.tec 1.87->1.88
Add a method zfill to str, unicode and UserString
and change Lib/string.py accordingly
(see SF patch http://www.python.org/sf/536241)
This also adds Guido's fix to test_userstring.py
and the subinstance checks in test_string.py
and test_unicode.py.
|
| |
|
|
|
|
|
|
| |
[ 529104 ] broken error handling in unicode-escape
I presume this will need to be fixed on the trunk, too.
Later.
|
| |
|
|
|
|
|
|
| |
[ 531306 ] ucs4 build horked.
Classic C mistake, I think.
Also squashed a couple of warnings in the ucs4 build.
|
| |
|
|
|
|
|
|
|
|
|
| |
free" glitch).
unicodeobject.c: squash compiler warnings.
Noting that test_pyclbr currently fails in 2.2.1:
test_others (__main__.PyclbrTest) ... ??? HTTP11
FAIL
|
| | |
|
| | |
|
| |
|
|
|
|
| |
[ #495401 ] Build troubles: --with-pymalloc
in a slightly different manner to the trunk, as discussed on python-dev.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is best reproduced by
while 1:
class U(unicode):
pass
U(u"xxxxxx")
The unicode_dealloc() code wasn't properly freeing the str and defenc
fields of the Unicode object when freeing a subtype instance. Fixed
this by a subtle refactoring that actually reduces the amount of code
slightly.
|
| |
|
|
| |
for buffer overrun avoidance.
|
| | |
|
| |
|
|
|
| |
This patch also does away with an incompatibility between Jython
and CPython.
|
| |
|
|
| |
This patch should also be applied to the 2.2b1 trunk.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements what we have discussed on python-dev late in
September: str(obj) and unicode(obj) should behave similar, while
the old behaviour is retained for unicode(obj, encoding, errors).
The patch also adds a new feature with which objects can provide
unicode(obj) with input data: the __unicode__ method. Currently no
new tp_unicode slot is implemented; this is left as option for the
future.
Note that PyUnicode_FromEncodedObject() no longer accepts Unicode
objects as input. The API name already suggests that Unicode
objects do not belong in the list of acceptable objects and the
functionality was only needed because
PyUnicode_FromEncodedObject() was being used directly by
unicode(). The latter was changed in the discussed way:
* unicode(obj) calls PyObject_Unicode()
* unicode(obj, encoding, errors) calls PyUnicode_FromEncodedObject()
One thing left open to discussion is whether to leave the
PyUnicode_FromObject() API as a thin API extension on top of
PyUnicode_FromEncodedObject() or to turn it into a (macro) alias
for PyObject_Unicode() and deprecate it. Doing so would have some
surprising consequences though, e.g. u"abc" + 123 would turn out
as u"abc123"...
[Marc-Andre didn't have time to check this in before the deadline. I
hope this is OK, Marc-Andre! You can still make changes and commit
them on the trunk after the branch has been made, but then please mail
Barry a context diff if you want the change to be merged into the
2.2b1 release branch. GvR]
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
many types were subclassable but had a xxx_dealloc function that
called PyObject_DEL(self) directly instead of deferring to
self->ob_type->tp_free(self). It is permissible to set tp_free in the
type object directly to _PyObject_Del, for non-GC types, or to
_PyObject_GC_Del, for GC types. Still, PyObject_DEL was a tad faster,
so I'm fearing that our pystone rating is going down again. I'm not
sure if doing something like
void xxx_dealloc(PyObject *self)
{
if (PyXxxCheckExact(self))
PyObject_DEL(self);
else
self->ob_type->tp_free(self);
}
is any faster than always calling the else branch, so I haven't
attempted that -- however those types whose own dealloc is fancier
(int, float, unicode) do use this pattern.
|
| |
|
|
| |
of \\.
|
| |
|
|
|
|
|
|
|
|
| |
elements which are not Unicode objects or strings. (This matches
the string.join() behaviour.)
Fix a memory leak in the .join() method which occurs in case
the Unicode resize fails.
Restore the test_unicode output.
|
| |
|
|
|
| |
works just like str(obj) in that it tries __str__/tp_str on the object
in case it finds that the object is not a string or buffer.
|
| | |
|
| |
|
|
|
|
|
|
| |
+ These were leaving the hash fields at 0, which all string and unicode
routines believe is a legitimate hash code. As a result, hash() applied
to str and unicode subclass instances always returned 0, which in turn
confused dict operations, etc.
+ Changed local names "new"; no point to antagonizing C++ compilers.
|
| | |
|
| |
|
|
|
| |
Changed unicode(i) to return a true Unicode object when i is an instance of
a unicode subclass. Added PyUnicode_CheckExact macro.
|
| | |
|
| | |
|
| | |
|
| |
|
|
|
|
|
|
| |
Removed all instances of Py_UCS2 from the codebase, and so also (I hope)
the last remaining reliance on the platform having an integral type
with exactly 16 bits.
PyUnicode_DecodeUTF16() and PyUnicode_EncodeUTF16() now read and write
one byte at a time.
|
| | |
|
| |
|
|
|
|
|
| |
And remove all the extern decls in the middle of .c files.
Apparently, it was excluded from the header file because it is
intended for internal use by the interpreter. It's still intended for
internal use and documented as such in the header file.
|
| |
|
|
| |
wide builds.
|
| |
|
|
|
|
|
|
| |
correctly and thus roundtrip-safe.
Some minor cleanups of the code.
Added tests for the roundtrip-safety.
|
| |
|
|
|
|
|
|
|
|
| |
#caused warnings with the VMS C compiler. (SF bug #442998, in part.)
On a narrow system the current code should never be executed since ch
will always be < 0x10000.
Marc-Andre: you may end up fixing this a different way, since I
believe you have plans to generate \U for surrogate pairs. I'll leave
that to you.
|
| |
|
|
| |
tests.
|
| |
|
|
|
|
|
| |
Implement sys.maxunicode.
Explicitly wrap around upper/lower computations for wide Py_UNICODE.
When decoding large characters with UTF-8, represent expected test
results using the \U notation.
|
| |
|
|
| |
when checking surrogates.
|
| |
|
|
|
|
|
|
|
|
| |
Add configure option --enable-unicode.
Add config.h macros Py_USING_UNICODE, PY_UNICODE_TYPE, Py_UNICODE_SIZE,
SIZEOF_WCHAR_T.
Define Py_UCS2.
Encode and decode large UTF-8 characters into single Py_UNICODE values
for wide Unicode types; likewise for UTF-16.
Remove test whether sizeof Py_UNICODE is two.
|
| |
|
|
|
|
| |
unicodeobject.h, which forces sizeof(Py_UNICODE) == sizeof(Py_UCS4).
(this may be good enough for platforms that doesn't have a 16-bit
type. the UTF-16 codecs don't work, though)
|