| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
_PyUnicode_CompareWithId() is faster than PyUnicode_CompareWithASCIIString()
when both strings are equal and interned.
Also add the _PyId_builtins identifier for the common string "builtins".
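A rough Python-level analogy of why the comparison is cheap when both strings are interned: interning maps equal strings to one shared object, so equality can be settled by an identity check before any character is compared. This is a sketch, not the C implementation; the helper name fast_equal is hypothetical.

```python
import sys


def fast_equal(a, b):
    """Hypothetical helper: try identity first, fall back to a full comparison."""
    if a is b:
        # Same object, so the contents are necessarily equal.
        return True
    return a == b


# Build the string at run time so that only sys.intern() can make them share an object.
left = sys.intern("".join(["built", "ins"]))
right = sys.intern("builtins")
same_object = left is right
```

When the two strings differ, or are not interned, the helper still has to fall back to a full comparison.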
|
* Also add a min_char attribute to the _PyUnicodeWriter structure (currently unused)
* _PyUnicodeWriter_Init() no longer takes any argument except the writer itself:
min_length and overallocate must be set explicitly
* In error handlers, only enable overallocation if the replacement string
is longer than 1 character
* CJK decoders don't use overallocation anymore
* Set min_length, instead of preallocating memory using
_PyUnicodeWriter_Prepare(), in many decoders
* _PyUnicode_DecodeUnicodeInternal() checks for integer overflow
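The effect of min_length and overallocate can be pictured with a small simulation that only counts buffer reallocations. This is a sketch under stated assumptions: the class SketchWriter, its method names and the 25% growth factor are illustrative and are not taken from the _PyUnicodeWriter source.

```python
class SketchWriter:
    """Illustrative model of a writer buffer; it tracks sizes, not characters."""

    def __init__(self):
        # As in the list above, both settings are set explicitly by the caller.
        self.min_length = 0
        self.overallocate = False
        self.capacity = 0
        self.length = 0
        self.reallocs = 0

    def prepare(self, extra):
        """Make room for `extra` more characters, growing the buffer if needed."""
        needed = self.length + extra
        if needed <= self.capacity:
            return
        new_capacity = needed
        if self.overallocate:
            new_capacity += new_capacity // 4  # assumed growth factor
        new_capacity = max(new_capacity, self.min_length)
        self.capacity = new_capacity
        self.reallocs += 1

    def write(self, count):
        self.prepare(count)
        self.length += count


exact = SketchWriter()
for _ in range(100):
    exact.write(1)

hinted = SketchWriter()
hinted.min_length = 100
hinted.overallocate = True
for _ in range(100):
    hinted.write(1)
```

In this model the writer without hints grows the buffer on every write, while the one given a min_length of 100 allocates once.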
|
|
|
|
|
|
| |
the legacy Py_UNICODE API.
Also add a new _PyUnicodeWriter_WriteChar() function.
|
|
|
|
|
|
|
|
|
| |
Write a function to enable more optimizations:
* If the substring is the whole string and overallocation is disabled, just
keep a reference to the string, don't copy characters
* Avoid a call to the expensive _PyUnicode_FindMaxChar() function when
possible
|
|
|
|
|
|
|
| |
ASCII/surrogateescape codec is now used, instead of the locale encoding, to
decode the command line arguments. This change fixes inconsistencies with
os.fsencode() and os.fsdecode() because these operating systems announce an
ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.
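A short illustration of the ASCII/surrogateescape behaviour at the Python level, assuming a command line argument that contains an ISO-8859-1 byte. Undecodable bytes are mapped to lone surrogates and restored when the string is encoded again with the same error handler.

```python
raw = b"caf\xe9"  # ISO-8859-1 bytes for "café"; 0xE9 is not valid ASCII

# Bytes >= 0x80 become lone surrogates U+DC80..U+DCFF instead of raising an error.
decoded = raw.decode("ascii", "surrogateescape")

# Encoding with the same error handler restores the original bytes exactly.
roundtrip = decoded.encode("ascii", "surrogateescape")
```

Because the round trip is lossless, the original bytes of the argument can always be recovered.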
|
|
|
|
| |
Patch written by Serhiy Storchaka.
|
* Simplify the code: replace 4 steps with a single step using the
_PyUnicodeWriter API. PyUnicode_Format() has the same design. This avoids
storing intermediate results, which requires allocating an array of pointers on
the heap.
* Use the _PyUnicodeWriter API for speed (and its convenient API):
overallocate the buffer to reduce the number of "realloc()" calls
* Implement "width" and "precision" in Python, don't rely on sprintf(). This
avoids the need for a temporary buffer allocated on the heap: only a small
buffer allocated on the stack is used.
* Add _PyUnicodeWriter_WriteCstr() function
* Split PyUnicode_FromFormatV() into two functions: add
unicode_fromformat_arg().
* Inline parse_format_flags(): the format of an argument is now parsed only
once, so a subfunction is no longer needed.
* Optimize PyUnicode_FromFormatV() for characters between two "%" arguments:
search for the next "%" and copy the substring in one chunk, instead of copying
character by character.
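The last optimization in the list can be sketched at the Python level: find the next "%" and take everything before it as one chunk. The function split_format is hypothetical, and it simplifies a directive to "%" plus one character, which is much cruder than what PyUnicode_FromFormatV() actually accepts.

```python
def split_format(fmt):
    """Hypothetical helper: split a format string into literal chunks and directives."""
    parts = []
    pos = 0
    while pos < len(fmt):
        nxt = fmt.find("%", pos)
        if nxt == -1:
            # No more directives: the rest of the string is one literal chunk.
            parts.append(("literal", fmt[pos:]))
            break
        if nxt > pos:
            # Copy the whole run before "%" in one slice, not character by character.
            parts.append(("literal", fmt[pos:nxt]))
        # Simplifying assumption: a directive is "%" plus the single following character.
        parts.append(("directive", fmt[nxt:nxt + 2]))
        pos = nxt + 2
    return parts


chunks = split_format("abc %s def %d")
```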
|
|\ |
|
| | |
|
|/
|
|
| |
It was already implemented in PyUnicode_RichCompare()
|
|
|
|
| |
Patch by Serhiy Storchaka.
|
and str.format(args)
* Formatting string, int, float and complex use the _PyUnicodeWriter API. It
avoids a temporary buffer in most cases.
* Add _PyUnicodeWriter_WriteStr() to restore the PyAccu optimization: just
keep a reference to the string if the output is only composed of one string
* Disable overallocation when formatting the last argument of str%args and
str.format(args)
* Overallocation allocates at least 100 characters: add min_length attribute
to the _PyUnicodeWriter structure
* Add new private functions: _PyUnicode_FastCopyCharacters(),
_PyUnicode_FastFill() and _PyUnicode_FromASCII()
The speedup is around 20% on average.
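The "keep a reference" optimization described for _PyUnicodeWriter_WriteStr() can be pictured with a toy writer: when the output consists of exactly one string, that string is returned as-is and no characters are copied. The class OneStringWriter is a hypothetical illustration, not the C structure.

```python
class OneStringWriter:
    """Toy writer: remembers what was written and joins the parts only if needed."""

    def __init__(self):
        self._parts = []

    def write_str(self, text):
        self._parts.append(text)

    def finish(self):
        if len(self._parts) == 1:
            # Only one string was written: hand back the same object, no copy.
            return self._parts[0]
        return "".join(self._parts)


original = "hello, world"
single = OneStringWriter()
single.write_str(original)
result = single.finish()

double = OneStringWriter()
double.write_str("ab")
double.write_str("cd")
joined = double.finish()
```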
|
| |
|
|
|
|
|
| |
Add checks in PyUnicode_WriteChar() and convert PyUnicode_New() assertion to a
test raising a Python exception.
|
* Decode thousands separator and decimal point using PyUnicode_DecodeLocale()
(from the locale encoding), instead of decoding them implicitly from latin1
* Remove _PyUnicode_InsertThousandsGroupingLocale(), it was not used
* Change _PyUnicode_InsertThousandsGrouping() API to return the maximum
character if unicode is NULL
* Replace the MIN/MAX macros with Py_MIN/Py_MAX
* stringlib/undef.h undefines STRINGLIB_IS_UNICODE
* stringlib/localeutil.h only supports Unicode
|
| |
|
|\
| |
| |
| |
| |
| | |
in the file name.
Patch by Hynek Schlawack.
|
| |
| |
| |
| |
| |
| | |
in the file name.
Patch by Hynek Schlawack.
|
| |
| |
| |
| |
| | |
I had to move the static identifier code from unicodeobject.h to object.h in
order for this to work.
|
| | |
|
| | |
|
| |
| |
| |
| | |
Also broaden the category of characters that count as lowercase/uppercase.
|
| |
| |
| |
| |
| | |
It is faster than the unicode_fill() function which was implemented in
formatter_unicode.c.
|
| | |
|
| | |
|
| | |
|
| |
| |
| |
| |
| |
| | |
instead of surrogateescape
So it would be possible to support more error handlers later.
|
| |
| |
| |
| |
| |
| | |
* Use PyUnicode_EncodeLocale() in time.strftime() if wcsftime() is not
available
* Document my last changes in Misc/NEWS
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale() decode a string
from the current locale encoding
* _Py_char2wchar() writes an "error code" in the size argument to indicate
if the function failed because of memory allocation failure or because of a
decoding error. The function doesn't write the error message directly to
stderr.
* Fix time.strftime() (if wcsftime() is missing): decode strftime() result
from the current locale encoding, not from the filesystem encoding.
|
| |
| |
| |
| | |
Also call unicode_resize() directly in unicodeobject.c
|
| |
| |
| |
| | |
Also fix the PyUnicode_Resize() doc
|
| |
| |
| |
| |
| |
| | |
Undocument the function.
Also make decode_utf8_errors() private (static).
|
| | |
|
| |
| |
| |
| | |
It makes no sense to check whether a non-ready string is ASCII.
|
| |
| |
| |
| | |
And use the surrogate macros everywhere in unicodeobject.c
|
| |
| |
| |
| | |
using an assertion
|
| |
| |
| |
| | |
not only its minimum value
|
| | |
|
| | |
|
| | |
|
| | |
|
| |
| |
| |
| | |
http://mail.python.org/pipermail/python-dev/2011-November/114347.html
|
| |
| |
| |
| |
| | |
_PyUnicode_CheckConsistency() also checks the hash and length values of
non-ready Unicode strings.
|
| |
| |
| |
| |
| | |
* _PyUnicode_CheckConsistency() now takes a PyObject* instead of void*
* Remove casts to PyObject* that are now useless
|
| | |
|
| | |
|
| |
| |
| |
| |
| |
| | |
A Unicode string can now be a PyASCIIObject, PyCompactUnicodeObject or
PyUnicodeObject. Aliasing a PyASCIIObject* or PyCompactUnicodeObject* to
PyUnicodeObject* is wrong.
|