| Commit message (Collapse) | Author | Age | Files | Lines |
* Don't overallocate by 400% when recode is needed: only overallocate on demand
using _PyBytesWriter.
* Use _PyLong_DigitValue to convert hexadecimal digit to int
* Create _PyBytes_DecodeEscapeRecode() subfunction
|
enhance code to detect buffer under- and overflow.
|
Issue #25401: Optimize bytes.fromhex() and bytearray.fromhex(): they are now
between 2x and 3.5x faster. Changes:
* Use a fast-path working on a char* string for ASCII string
* Use a slow-path for non-ASCII string
* Replace slow hex_digit_to_int() function with a O(1) lookup in
_PyLong_DigitValue precomputed table
* Use _PyBytesWriter API to handle the buffer
* Add unit tests to check the error position in error messages
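
The table-lookup idea in the list above can be sketched in Python. This is only an illustration of the technique, not the C code from the commit: the names `DIGIT_VALUE`, `INVALID` and `fromhex_ascii` are hypothetical, and using 37 as the "not a digit" marker is an assumption modelled on the `_PyLong_DigitValue` table.

```python
# Hypothetical sketch: a 256-entry table maps each byte to its digit value,
# so converting a hex digit is one index operation instead of a chain of
# range comparisons.
INVALID = 37  # assumption: any value >= 16 means "not a hexadecimal digit"

DIGIT_VALUE = [INVALID] * 256
for i, ch in enumerate(b"0123456789"):
    DIGIT_VALUE[ch] = i
for i, ch in enumerate(b"abcdef"):
    DIGIT_VALUE[ch] = 10 + i       # lower-case digit
    DIGIT_VALUE[ch - 32] = 10 + i  # matching upper-case digit


def fromhex_ascii(text):
    """Decode an ASCII hex string, skipping spaces between byte pairs."""
    data = text.encode("ascii")  # fast path only: non-ASCII input raises here
    out = bytearray()
    pos = 0
    while pos < len(data):
        if data[pos] == 0x20:  # skip a space
            pos += 1
            continue
        hi = DIGIT_VALUE[data[pos]]
        lo = DIGIT_VALUE[data[pos + 1]] if pos + 1 < len(data) else INVALID
        if hi >= 16:
            raise ValueError(f"non-hexadecimal digit at position {pos}")
        if lo >= 16:
            raise ValueError(f"non-hexadecimal digit at position {pos + 1}")
        out.append((hi << 4) | lo)
        pos += 2
    return bytes(out)
```

For valid input the sketch should agree with the built-in, for example `fromhex_ascii("dead BEEF") == bytes.fromhex("dead BEEF")`. The error position is tracked per character, which is what the added unit tests check in the real implementation.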
|
Issue #25399: Don't create temporary bytes objects: modify _PyBytes_Format() to
work directly on bytearray objects.
* Rename _PyBytes_Format() to _PyBytes_FormatEx() in case something
outside CPython uses it
* _PyBytes_FormatEx() now uses (char*, Py_ssize_t) for the input string, so
bytearray_format() doesn't need to create a temporary input bytes object
* Add use_bytearray parameter to _PyBytes_FormatEx() which is passed to
_PyBytesWriter, to create a bytearray buffer instead of a bytes buffer
Most formatting operations are now between 2.5 and 5 times faster.
|
Issue #25399: Add a new use_bytearray attribute to _PyBytesWriter to use a
bytearray buffer, instead of using a bytes object.
|
Issue #25399: Fix long_format_binary(), allocate bytes for the bytes writer.
|
* Add many more unit tests on PyBytes_FromFormatV()
* Remove the first loop to compute the length of the output string
* Use _PyBytesWriter to handle the bytes buffer, use overallocation
* Clean up the code to make it simpler and easier to review
|
the new _PyBytesWriter API.
|
Modify _PyBytesWriter_Finish() and _PyUnicodeWriter_Finish() to return the
empty bytes/Unicode string if the string is empty.
|
Don't require _PyBytesWriter pointer to be a "char *". Same change for
_PyBytesWriter_WriteBytes() parameter.
For example, binascii uses "unsigned char*".
|
can now be pickled using pickle protocols older than protocol version 4.
|
Optimize also the %% formatter.
|
Optimize bytes.__mod__(args) for integer formats: %d (%i, %u), %o, %x and %X.
_PyBytesWriter is now used to format the integer directly into the writer
buffer, instead of using a temporary bytes object.
Formatting is between 30% and 50% faster on a microbenchmark.
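
For reference, the formats covered by this optimization behave as follows at the Python level. This is a plain usage illustration: the variable names are arbitrary and the snippet shows nothing about the internal writer.

```python
value = 255

# %d, %i and %u all produce the decimal form; %o is octal; %x and %X are
# lower-case and upper-case hexadecimal.
formatted = b"%d %i %u %o %x %X" % ((value,) * 6)
assert formatted == b"255 255 255 377 ff FF"

# The same operator works on bytearray and returns a bytearray.
as_bytearray = bytearray(b"%x") % (value,)
assert as_bytearray == bytearray(b"ff")

# %% emits a literal percent sign and consumes no argument.
percent = b"100%%" % ()
assert percent == b"100%"
```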
|
string is pure ASCII: use _PyBytesWriter_WriteBytes(), don't check individual
characters.
Cleanup unicode_encode_ucs1():
* Rename repunicode to rep
* Clear rep object on error
* Factorize code between bytes and unicode path
|
Subtract preallocated bytes from min_size before calling
_PyBytesWriter_Prepare().
|
* Thanks to the _PyBytesWriter API, outputs smaller than 512 bytes are allocated
on the stack, which avoids calling _PyBytes_Resize(). Because of that, change
the default buffer size to fmtcnt instead of fmtcnt+100.
* Rely on the _PyBytesWriter algorithm to overallocate the buffer instead of
using custom code. For example, _PyBytesWriter uses a different overallocation
factor (25% or 50%) depending on the platform to get the best performance.
* Disable overallocation for the last write.
* Replace C loops to fill characters with memset()
* Add also many comments to _PyBytes_Format()
* Remove unused FORMATBUFLEN constant
* Avoid the creation of a temporary bytes object when formatting a floating
point number (when no custom formatting option is used)
* Fix also reference leaks on error handling
* Use Py_MEMCPY() to copy bytes between two formatters (%)
|
Rename "stack buffer" to "small buffer".
Add also an assertion in _PyBytesWriter_GetPos().
|
Fix code to estimate the needed space.
|
Rewrite backslashreplace() to be closer to PyCodec_BackslashReplaceErrors().
Add also unit tests for non-BMP characters.
|
Replace "#if Py_DEBUG" with "#ifdef Py_DEBUG".
|
Declare also the private API in bytesobject.h.
|
Issue #25318: Optimize backslashreplace and xmlcharrefreplace error handlers in
UTF-8 encoder. Optimize also backslashreplace error handler for ASCII and
Latin1 encoders.
Use the new _PyBytesWriter API to optimize these error handlers for the
encoders. This avoids creating an exception and calling the slow implementation
of the error handler.
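
The handlers named in this message produce the following output from Python code. This only illustrates their visible behavior, which the optimization leaves unchanged; the variable names are arbitrary.

```python
text = "caf\xe9 \u20ac"  # "café €": two characters that ASCII cannot encode

# backslashreplace writes a Python-style escape for each unencodable character.
ascii_escaped = text.encode("ascii", "backslashreplace")
assert ascii_escaped == b"caf\\xe9 \\u20ac"

# Latin1 can encode U+00E9 directly, so only the euro sign is escaped.
latin1_escaped = text.encode("latin-1", "backslashreplace")
assert latin1_escaped == b"caf\xe9 \\u20ac"

# UTF-8 can encode every character except lone surrogates, so the error
# handler only runs for input such as U+DC80.
lone_surrogate = "\udc80"
utf8_escaped = lone_surrogate.encode("utf-8", "backslashreplace")
assert utf8_escaped == b"\\udc80"

# xmlcharrefreplace writes a decimal XML character reference instead.
utf8_xml = lone_surrogate.encode("utf-8", "xmlcharrefreplace")
assert utf8_xml == b"&#56448;"
```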
|
Add a new private API to optimize Unicode encoders. It uses a small buffer
allocated on the stack and supports overallocation.
Use _PyBytesWriter API for UCS1 (ASCII and Latin1) and UTF-8 encoders. Enable
overallocation for the UTF-8 encoder with error handlers.
unicode_encode_ucs1(): initialize collend to collstart+1 to not check the
current character twice, we already know that it is not ASCII.
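
The combination of a small initial buffer and on-demand overallocation can be sketched as a toy Python class. Everything here is hypothetical: `SketchBytesWriter`, its methods and the `factor` parameter are illustration names, not the C API, and a Python bytearray stands in for the stack buffer. The 512-byte size and the 25% growth (one quarter, `factor=4`) are taken from figures mentioned elsewhere in this log.

```python
class SketchBytesWriter:
    """Toy model of a writer with a small initial buffer and overallocation."""

    SMALL_BUFFER_SIZE = 512  # outputs up to this size never trigger a resize

    def __init__(self, overallocate=True, factor=4):
        self.buffer = bytearray(self.SMALL_BUFFER_SIZE)
        self.pos = 0
        self.overallocate = overallocate
        self.factor = factor  # assumption: grow by 1/factor (4 means 25%)
        self.resizes = 0      # counts how often the buffer had to grow

    def prepare(self, size):
        """Make sure `size` more bytes fit after the current position."""
        needed = self.pos + size
        if needed <= len(self.buffer):
            return
        if self.overallocate:
            needed += needed // self.factor  # reserve room for later writes
        self.buffer.extend(bytes(needed - len(self.buffer)))
        self.resizes += 1

    def write(self, data):
        self.prepare(len(data))
        self.buffer[self.pos:self.pos + len(data)] = data
        self.pos += len(data)

    def finish(self):
        """Return only the bytes actually written."""
        return bytes(self.buffer[:self.pos])


small = SketchBytesWriter()
for _ in range(100):
    small.write(b"abc")  # 300 bytes in total: fits in the small buffer

greedy = SketchBytesWriter(overallocate=True)
exact = SketchBytesWriter(overallocate=False)
for _ in range(1000):
    greedy.write(b"0123456789")
    exact.write(b"0123456789")
```

In this model `small` never resizes, while `greedy` resizes far less often than `exact` for the same 10,000 bytes of output. That trade of some spare memory for fewer reallocations is the reason to enable overallocation only when the final size is unknown, and to disable it for the last write.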
|
(#24806)
|
handlers: ``ignore``, ``replace`` and ``surrogateescape``.
|
Initialize kind to 0 (PyUnicode_WCHAR_KIND) to ensure that
_PyUnicodeWriter_PrepareKind() correctly handles a read-only buffer: copy the
buffer.
|
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
|
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
|
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
|
hash only once.
|
hash only once.
|
unicodeobject.h exposes PyUnicode_TranslateCharmap() and PyUnicode_Translate().
|
handlers: ``ignore``, ``replace``, ``surrogateescape``, ``surrogatepass``.
Patch co-written with Serhiy Storchaka.
|
Restore also errno value before calling PyErr_SetFromErrno().
|
imported at startup) now uses the backslashreplace error handler.
|
imported at startup) now uses the backslashreplace error handler.
|