| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
just like string codecs.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In C++, it's an error to pass a string literal to a char* function
without a const_cast(). Rather than require every C++ extension
module to put a cast around string literals, fix the API to state the
const-ness.
I focused on parts of the API where people usually pass literals:
PyArg_ParseTuple() and friends, Py_BuildValue(), PyMethodDef, the type
slots, etc. Predictably, there were a large set of functions that
needed to be fixed as a result of these changes. The most pervasive
change was to make the keyword args list passed to
PyArg_ParseTupleAndKewords() to be a const char *kwlist[].
One cast was required as a result of the changes: A type object
mallocs the memory for its tp_doc slot and later frees it.
PyTypeObject says that tp_doc is const char *; but if the type was
created by type_new(), we know it is safe to cast to char *.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
[ 1327110 ] wrong TypeError traceback in generator expressions
by removing the code that can stomp on the users' TypeError raised by the
iterable argument to ''.join() -- PySequence_Fast (now?) gives a perfectly
reasonable message itself. Also, a couple of tests.
|
| |
|
|
|
|
| |
Backport candidate.
|
|
|
|
|
|
|
| |
PyUnicode_DecodeCharmap() the accept a unicode string as the mapping
argument which is used as a mapping table.
This code isn't used by any of the codecs yet.
|
|
|
|
|
| |
about illegal code points. The codec now supports PEP 293 style error handlers.
(This is a variant of the Nik Haldimann's patch that detects truncated data)
|
|
|
|
|
|
|
|
| |
and its usage in PyLocale_strcoll().
Clarify the documentation on this.
Thanks to Andreas Degert for pointing this out.
|
|
|
|
| |
Python 2.3.x candidate.
|
| |
|
| |
|
| |
|
|
|
|
|
| |
to make it clear that it is possible to pass None as the
separator argument to get the default "any whitespace" separator.
|
|
|
|
|
|
|
|
|
|
|
| |
decoding incomplete input (when the input stream is temporarily exhausted).
codecs.StreamReader now implements buffering, which enables proper
readline support for the UTF-16 decoders. codecs.StreamReader.read()
has a new argument chars which specifies the number of characters to
return. codecs.StreamReader.readline() and codecs.StreamReader.readlines()
have a new argument keepends. Trailing "\n"s will be stripped from the lines
if keepends is false. Added C APIs PyUnicode_DecodeUTF8Stateful and
PyUnicode_DecodeUTF16Stateful.
|
|
|
|
|
|
|
|
|
| |
need to convert str objects from the iterable to unicode. So, if
someone set the system default encoding to something nasty enough,
the conversion process could mutate the input iterable as a side
effect, and PySequence_Fast doesn't hide that from us if the input was
a list. IOW, can't assume the size of PySequence_Fast's result is
invariant across PyUnicode_FromObject() calls.
|
|
|
|
|
|
|
|
| |
much to reduce the size of the code, but greatly improves its clarity.
It's also quicker in what's probably the most common case (the argument
iterable is a list). Against it, if the iterable isn't a list or a tuple,
a temp tuple is materialized containing the entire input sequence, and
that's a bigger temp memory burden. Yawn.
|
|
|
|
|
| |
int. I sure wish MS would gripe about that! Whatever, note that the
statement above it guarantees that the cast loses no info.
|
|
|
|
|
|
|
|
| |
1. u1.join([u2]) is u2
2. Be more careful about C-level int overflow.
Since PySequence_Fast() isn't needed to achieve #1, it's not used -- but
the code could sure be simpler if it were.
|
|
|
|
|
|
|
|
|
|
|
|
| |
unicodedata.east_asian_width(). You can still implement your own
simple width() function using it like this:
def width(u):
w = 0
for c in unicodedata.normalize('NFC', u):
cwidth = unicodedata.east_asian_width(c)
if cwidth in ('W', 'F'): w += 2
else: w += 1
return w
|
| |
|
|
|
|
| |
modules and objects.
|
| |
|
| |
|
|
|
|
|
| |
methods on string and unicode objects. Added unicode.decode()
which was missing for no apparent reason.
|
| |
|
|
|
|
|
|
|
| |
iswide() for east asian width manipulation. (Inspired by David
Goodger, Reviewed by Martin v. Loewis)
- Move _PyUnicode_TypeRecord.flags to the end of the struct so that
no padding is added for UCS-4 builds. (Suggested by Martin v. Loewis)
|
|
|
|
| |
been used. (Suggested by Martin v. Loewis)
|
|
|
|
| |
characters instead of character pointers to determine space requirements.
|
| |
|
|
|
|
| |
using specialized splitter for 1 char sep.
|
|
|
|
| |
(Pointy hat goes to perky)
|
| |
|
|
|
|
|
| |
SF feature request #801847.
Original patch is written by Sean Reifschneider.
|
|
|
|
|
|
|
|
|
|
| |
and left shifts. (Thanks to Kalle Svensson for SF patch 849227.)
This addresses most of the remaining semantic changes promised by
PEP 237, except for repr() of a long, which still shows the trailing
'L'. The PEP appears to promise warnings for operations that
changed semantics compared to Python 2.3, but this is not
implemented; we've suffered through enough warnings related to
hex/oct literals and I think it's best to be silent now.
|
| |
|
|
|
|
|
|
|
| |
charmaptranslate_makespace() allocated more memory than required for the
next replacement but didn't remember that fact, so memory size was growing
exponentially every time a replacement string is longer that one character.
This fixes SF bug #828737.
|
|
|
|
| |
Backported to 2.3.
|
| |
|
|
|
|
|
|
|
| |
so fiddle Jeremy's fix to live with that. Also added more comments.
Bugfix candidate (this bug is in all versions of Python, at least since
2.1).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a length-1 Unicode string was in the freelist and it was
uninitialized or pointed to a very large (magnitude) negative number,
the check
unicode_latin1[unicode->str[0]] == unicode
could cause a segmentation violation, e.g. unicode->str[0] is 0xcbcbcbcb.
Fix this in two ways:
1. Change guard befor unicode_latin1[] to test against 256U. If I
understand correctly, the unsigned long used to store UCS4 on my
box was getting converted to a signed long to compare with the
signed constant 256.
2. Change _PyUnicode_New() to make sure the first element of str is
always initialized to zero. There are several places in the code
where the caller can exit with an error before initializing any
of str, which would leave junk in str[0].
Also, silence a compiler warning on pointer vs. int arithmetic.
Bug fix candidate.
|
|
|
|
|
|
|
| |
The unicode_resize() family only returns -1 or 0 so simply checking
for != 0 is sufficient, but somewhat unclear. Many Python API
functions return < 0 on error, reserving the right to return 0 or 1 on
success. Change the call sites for consistency with these calls.
|
|
|
|
|
|
| |
Adding missing support for '%F'.
Will backport to 2.3.1.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
when an encoding error occurs and the callback name is unknown,
i.e. when the callback has to be called. The problem was that
the fact that the callback has already been looked up was only
recorded in a local variable in charmap_encoding_error(), because
charmap_encoding_error() got it's own copy of the errorHandler
pointer instead of a pointer to the pointer in
PyUnicode_EncodeCharmap().
|