| Commit message | Author | Age | Files | Lines |
|
|
|
|
| |
Subtract the preallocated bytes from min_size before calling
_PyBytesWriter_Prepare().
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Thanks to the _PyBytesWriter API, outputs smaller than 512 bytes are allocated
on the stack and so avoid calling _PyBytes_Resize(). Because of that, change
the default buffer size to fmtcnt instead of fmtcnt+100.
* Rely on the _PyBytesWriter algorithm to overallocate the buffer instead of
using custom code. For example, _PyBytesWriter uses a different overallocation
factor (25% or 50%) depending on the platform to get the best performance.
* Disable overallocation for the last write.
* Replace C loops to fill characters with memset()
* Also add many comments to _PyBytes_Format()
* Remove unused FORMATBUFLEN constant
* Avoid the creation of a temporary bytes object when formatting a floating
point number (when no custom formatting option is used)
* Also fix reference leaks in error handling
* Use Py_MEMCPY() to copy bytes between two formatters (%)
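
For orientation, the following is a small sketch of the Python-level behaviour that _PyBytes_Format() implements (the % operator on bytes, available since Python 3.5). It only illustrates the code paths named in the list above (padding, float formatting with and without options, literal bytes between two format specifiers); the variable names are made up for the example and it says nothing about the C implementation itself.

```python
# Padding: the kind of fill that the C code performs with memset().
zero_padded = b"%05d" % 42
left_aligned = b"%-5d|" % 42

# Floating point number without and with custom formatting options.
float_default = b"%f" % 1.5
float_custom = b"%8.3f" % 3.14159

# Literal bytes between two format specifiers are copied as-is.
mixed = b"%s=%d, %s" % (b"x", 5, b"done")

print(zero_padded, left_aligned, float_default, float_custom, mixed)
```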
|
|
|
|
|
|
| |
Rename "stack buffer" to "small buffer".
Also add an assertion in _PyBytesWriter_GetPos().
|
|
|
|
| |
Fix code to estimate the needed space.
|
|
|
|
|
|
| |
Rewrite backslashreplace() to be closer to PyCodec_BackslashReplaceErrors().
Also add unit tests for non-BMP characters.
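
As an illustration of what such tests are expected to observe, here is a short sketch of the backslashreplace error handler at the Python level, including a non-BMP character. The sample strings and variable names are chosen for the example only and are not taken from the test suite.

```python
# backslashreplace picks the escape form from the code point's magnitude:
# \xNN below U+0100, \uNNNN for the rest of the BMP, \UNNNNNNNN for non-BMP.
latin_range = "\xe9".encode("ascii", "backslashreplace")
bmp = "\u20ac".encode("ascii", "backslashreplace")
non_bmp = "\U0001F600".encode("ascii", "backslashreplace")

print(latin_range, bmp, non_bmp)
```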
|
|
|
|
| |
Replace "#if Py_DEBUG" with "#ifdef Py_DEBUG".
|
|
|
|
| |
Also declare the private API in bytesobject.h.
|
|
|
|
|
|
|
|
|
|
| |
Issue #25318: Optimize the backslashreplace and xmlcharrefreplace error
handlers in the UTF-8 encoder. Also optimize the backslashreplace error handler
for the ASCII and Latin1 encoders.
Use the new _PyBytesWriter API to optimize these error handlers for the
encoders. This avoids creating an exception and calling the slow implementation
of the error handler.
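
The optimization is not meant to change results; a minimal sketch of the behaviour involved follows. In the UTF-8 encoder only lone surrogates are unencodable, so one is used as the sample input; the strings and variable names are illustrative assumptions.

```python
# UTF-8 encoder: a lone surrogate cannot be encoded, so the handler runs.
utf8_backslash = "a\udc80b".encode("utf-8", "backslashreplace")
utf8_xmlref = "a\udc80b".encode("utf-8", "xmlcharrefreplace")

# ASCII and Latin1 encoders with backslashreplace.
ascii_backslash = "caf\xe9".encode("ascii", "backslashreplace")
latin1_backslash = "\u20ac".encode("latin-1", "backslashreplace")

print(utf8_backslash, utf8_xmlref, ascii_backslash, latin1_backslash)
```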
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new private API to optimize Unicode encoders. It uses a small buffer
allocated on the stack and supports overallocation.
Use _PyBytesWriter API for UCS1 (ASCII and Latin1) and UTF-8 encoders. Enable
overallocation for the UTF-8 encoder with error handlers.
unicode_encode_ucs1(): initialize collend to collstart+1 to avoid checking the
current character twice: we already know that it is not ASCII.
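
The collstart/collend idea can be sketched in Python as follows. This is a simplified model, not the C code: the function name find_unencodable_run and the limit parameter (128 for ASCII, 256 for Latin1) are assumptions made for the illustration.

```python
def find_unencodable_run(text, collstart, limit):
    """Return (collstart, collend) for the run of unencodable characters.

    The caller has already established that text[collstart] is not
    encodable (ord >= limit), so the scan starts at collstart + 1
    instead of testing that character a second time.
    """
    collend = collstart + 1
    while collend < len(text) and ord(text[collend]) >= limit:
        collend += 1
    return collstart, collend


# Two euro signs starting at index 2 form one run when encoding to ASCII.
run = find_unencodable_run("ab\u20ac\u20acc", 2, 128)
print(run)
```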
|
|\ |
|
| | |
|
| |\ |
|
| | | |
|
|\ \ \
| |/ / |
|
| |\ \
| | |/ |
|
| | |
| | |
| | |
| | | |
(#24806)
|
| | | |
|
| | |
| | |
| | |
| | | |
handlers: ``ignore``, ``replace`` and ``surrogateescape``.
|
| | |
| | |
| | |
| | |
| | |
| | | |
Initialize kind to 0 (PyUnicode_WCHAR_KIND) to ensure that
_PyUnicodeWriter_PrepareKind() correctly handles a read-only buffer by copying
the buffer.
|
|\ \ \
| |/ /
| | |
| | |
| | |
| | |
| | | |
1. Non-ASCII bytes were accepted after a shift sequence.
2. A low surrogate could be emitted in case of an error in the high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
|
| |\ \
| | |/
| | |
| | |
| | |
| | |
| | | |
1. Non-ASCII bytes were accepted after a shift sequence.
2. A low surrogate could be emitted in case of an error in the high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
|
| | |
| | |
| | |
| | |
| | | |
1. Non-ASCII bytes were accepted after a shift sequence.
2. A low surrogate could be emitted in case of an error in the high surrogate.
|
|\ \ \
| |/ /
| | |
| | | |
hash only once.
|
| | |
| | |
| | |
| | | |
hash only once.
|
| | |
| | |
| | |
| | | |
unicodeobject.h exposes PyUnicode_TranslateCharmap() and PyUnicode_Translate().
|
| | |
| | |
| | |
| | |
| | | |
handlers: ``ignore``, ``replace``, ``surrogateescape``, ``surrogatepass``.
Patch co-written with Serhiy Storchaka.
|
|\ \ \
| |/ / |
|
| |\ \
| | |/ |
|
| | |
| | |
| | |
| | | |
Also restore the errno value before calling PyErr_SetFromErrno().
|
|\ \ \
| |/ /
| | |
| | | |
imported at startup) now uses the backslashreplace error handler.
|
| |\ \
| | |/
| | |
| | | |
imported at startup) now uses the backslashreplace error handler.
|
| | |
| | |
| | |
| | | |
imported at startup) now uses the backslashreplace error handler.
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Issue #25227: Optimize ASCII and latin1 encoders with the ``surrogateescape``
error handler: the encoders are now up to 3 times as fast.
Initial patch written by Serhiy Storchaka.
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
* Change limit type from unsigned int to Py_UCS4, to use the same type as the
"ch" variable (a Unicode character).
* Reuse ch variable for _Py_ERROR_XMLCHARREFREPLACE
* Add some newlines for readability
|
| | |
| | |
| | |
| | | |
Sorry, I pushed the patch on the UTF-8 decoder by mistake :-(
|
| | |
| | |
| | |
| | |
| | | |
It doesn't work to use "#define XXX defined(YYY)" and then "#ifdef XXX"
to check YYY.
|
| | | |
|
| | |
| | |
| | |
| | | |
Add a macro which ensures that the writer has at least the requested kind.
|
| | |
| | |
| | |
| | |
| | |
| | | |
Factorize code with the new get_error_handler() function.
Add some empty lines for readability.
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
ignore and replace. Initial patch written by Naoki Inada.
The decoder is now up to 60 times as fast for these error handlers.
Also add unit tests for the ASCII decoder.
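
A minimal sketch of the two handlers as seen from Python, assuming the optimized paths are the ignore and replace handlers of the ASCII decoder mentioned above. The input bytes and variable names are made up for the example.

```python
data = b"caf\xc3\xa9"  # the last two bytes are outside the ASCII range

# ignore drops every invalid byte; replace substitutes U+FFFD for each one.
ignored = data.decode("ascii", "ignore")
replaced = data.decode("ascii", "replace")

print(ascii(ignored), ascii(replaced))
```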
|
|\ \ \
| |/ / |
|
| | |
| | |
| | |
| | |
| | |
| | | |
avoid undefined behaviour when LONG_MAX type is smaller than 60 bits.
This change should fix a warning with the ICC compiler.
|
|\ \ \
| |/ / |
|
| | |
| | |
| | |
| | |
| | | |
PyObject_Length() returns a Py_ssize_t, not an int. Use a Py_ssize_t to avoid
overflow.
|
| | |
| | |
| | |
| | | |
It is very unlikely that they can occur in real code for now.
|
|\ \ \
| |/ /
| | |
| | | |
(Merge 3.5 -> 3.6)
|
| |\ \
| | | |
| | | |
| | | | |
(Merge 3.5.0 -> 3.5)
|
| | | | |
|
|\ \ \ \
| |/ / / |
|