| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
| |
sequences.
Patch by Nick Barnes and Victor Stinner.
|
|
|
|
|
|
|
| |
into the extern "C" section.
Add a few more comments and apply some minor edits to make the file contents
fit the original structure again.
|
| |
|
| |
|
|
|
|
|
|
| |
compilation fails with "undefined reference to _Py_ascii_whitespace"
Will backport to 2.6.
|
|
|
|
| |
optimizations.
|
|
|
|
| |
collection of the highest generation.
|
|
|
|
| |
detection. The speedup is about 25% for split() (571 / 457 usec) and 35% (175 / 127 usec )for splitlines()
|
| |
|
|
|
|
|
|
|
| |
PyUnicode_FromString, PyUnicode_Format and PyLong_From/AsSsize_t. The functions are partly required for the backport of the bytearray type and _fileio module. They should also make it easier to port C to 3.0.
First chapter of the Python 3.0 io framework back port: _fileio
The next step depends on a working bytearray type which itself depends on a backport of the nwe buffer API.
|
|
|
|
| |
Py_REFCNT. Macros for b/w compatibility are available.
|
|
|
|
|
|
|
| |
Solves issue1460.
Might not be a backport candidate: a new API function was added,
and some code may rely on details in utf-7.py.
|
| |
|
|
|
|
|
| |
backwards compatibility. Add Py_Refcnt, Py_Type, Py_Size, and
PyVarObject_HEAD_INIT.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
of some of the common builtin types.
Use a bit in tp_flags for each common builtin type. Check the bit
to determine if any instance is a subclass of these common types.
The check avoids a function call and O(n) search of the base classes.
The check is done in the various Py*_Check macros rather than calling
PyType_IsSubtype().
All the bits are set in tp_flags when the type is declared
in the Objects/*object.c files because PyType_Ready() is not called
for all the types. Should PyType_Ready() be called for all types?
If so and the change is made, the changes to the Objects/*object.c files
can be reverted (remove setting the tp_flags). Objects/typeobject.c
would also have to be modified to add conditions
for Py*_CheckExact() in addition to each the PyType_IsSubtype check.
|
|
|
|
|
|
|
|
|
|
| |
Replace UnicodeDecodeErrors raised during == and !=
compares of Unicode and other objects with a new
UnicodeWarning.
All other comparisons continue to raise exceptions.
Exceptions other than UnicodeDecodeErrors are also left
untouched.
|
| |
|
| |
|
|
|
|
|
| |
and use it for string copy operations. this gives a 20% speedup on some
string benchmarks.
|
| |
|
|
|
|
| |
feel free to improve the documentation and the docstrings.
|
| |
|
|
|
|
| |
for long repeats.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
_PyUnicode_IsLinebreak():
Changed the declarations to match the definitions.
Don't know why they differed; MSVC warned about it;
don't know why only these two functions use "const".
Someone who does may want to do something saner ;-).
|
|
|
|
|
| |
about illegal code points. The codec now supports PEP 293 style error handlers.
(This is a variant of the Nik Haldimann's patch that detects truncated data)
|
|
|
|
|
|
|
|
| |
and its usage in PyLocale_strcoll().
Clarify the documentation on this.
Thanks to Andreas Degert for pointing this out.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
decoding incomplete input (when the input stream is temporarily exhausted).
codecs.StreamReader now implements buffering, which enables proper
readline support for the UTF-16 decoders. codecs.StreamReader.read()
has a new argument chars which specifies the number of characters to
return. codecs.StreamReader.readline() and codecs.StreamReader.readlines()
have a new argument keepends. Trailing "\n"s will be stripped from the lines
if keepends is false. Added C APIs PyUnicode_DecodeUTF8Stateful and
PyUnicode_DecodeUTF16Stateful.
|
|
|
|
|
|
|
|
|
|
|
|
| |
unicodedata.east_asian_width(). You can still implement your own
simple width() function using it like this:
def width(u):
w = 0
for c in unicodedata.normalize('NFC', u):
cwidth = unicodedata.east_asian_width(c)
if cwidth in ('W', 'F'): w += 2
else: w += 1
return w
|
|
|
|
|
| |
methods on string and unicode objects. Added unicode.decode()
which was missing for no apparent reason.
|
|
|
|
|
|
|
| |
iswide() for east asian width manipulation. (Inspired by David
Goodger, Reviewed by Martin v. Loewis)
- Move _PyUnicode_TypeRecord.flags to the end of the struct so that
no padding is added for UCS-4 builds. (Suggested by Martin v. Loewis)
|
|
|
|
|
| |
SF feature request #801847.
Original patch is written by Sean Reifschneider.
|
|
|
|
| |
to use new extern macro.
|
|
|
|
| |
Thanks to Skip Montanaro and Kalle Svensson for the patches.
|
|
|
|
|
|
|
| |
u'%c' will now raise a ValueError in case the argument is an
integer outside the valid range of Unicode code point ordinals.
Closes SF bug #593581.
|
| |
|
|
|
|
|
|
|
|
|
| |
http://www.python.org/sf/444708
This adds the optional argument for str.strip
to unicode.strip too and makes it possible
to call str.strip with a unicode argument
and unicode.strip with a str argument.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements what we have discussed on python-dev late in
September: str(obj) and unicode(obj) should behave similar, while
the old behaviour is retained for unicode(obj, encoding, errors).
The patch also adds a new feature with which objects can provide
unicode(obj) with input data: the __unicode__ method. Currently no
new tp_unicode slot is implemented; this is left as option for the
future.
Note that PyUnicode_FromEncodedObject() no longer accepts Unicode
objects as input. The API name already suggests that Unicode
objects do not belong in the list of acceptable objects and the
functionality was only needed because
PyUnicode_FromEncodedObject() was being used directly by
unicode(). The latter was changed in the discussed way:
* unicode(obj) calls PyObject_Unicode()
* unicode(obj, encoding, errors) calls PyUnicode_FromEncodedObject()
One thing left open to discussion is whether to leave the
PyUnicode_FromObject() API as a thin API extension on top of
PyUnicode_FromEncodedObject() or to turn it into a (macro) alias
for PyObject_Unicode() and deprecate it. Doing so would have some
surprising consequences though, e.g. u"abc" + 123 would turn out
as u"abc123"...
[Marc-Andre didn't have time to check this in before the deadline. I
hope this is OK, Marc-Andre! You can still make changes and commit
them on the trunk after the branch has been made, but then please mail
Barry a context diff if you want the change to be merged into the
2.2b1 release branch. GvR]
|
| |
|
| |
|
|
|
|
|
| |
Changed unicode(i) to return a true Unicode object when i is an instance of
a unicode subclass. Added PyUnicode_CheckExact macro.
|
| |
|
|
|
|
|
|
|
|
| |
- Do not compile unicodeobject, unicodectype, and unicodedata if Unicode is disabled
- check for Py_USING_UNICODE in all places that use Unicode functions
- disables unicode literals, and the builtin functions
- add the types.StringTypes list
- remove Unicode literals from most tests.
|
|
|
|
|
|
|
|
| |
Removed all instances of Py_UCS2 from the codebase, and so also (I hope)
the last remaining reliance on the platform having an integral type
with exactly 16 bits.
PyUnicode_DecodeUTF16() and PyUnicode_EncodeUTF16() now read and write
one byte at a time.
|
|
|
|
|
| |
assure that extensions and interpreters using the Unicode APIs
were compiled using the same Unicode width.
|
|
|
|
|
|
|
| |
And remove all the extern decls in the middle of .c files.
Apparently, it was excluded from the header file because it is
intended for internal use by the interpreter. It's still intended for
internal use and documented as such in the header file.
|
|
|
|
| |
predicates
|