| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
_PyCoreConfig: Change filesystem_encoding, filesystem_errors,
stdio_encoding and stdio_errors fields type from char* to wchar_t*.
Changes:
* PyInterpreterState: replace fscodec_initialized (int) with fs_codec
structure.
* Add get_error_handler_wide() and unicode_encode_utf8() helper
functions.
* Add error_handler parameter to unicode_encode_locale()
and unicode_decode_locale().
* Remove _PyCoreConfig_SetString().
* Rename _PyCoreConfig_SetWideString() to _PyCoreConfig_SetString().
* Rename _PyCoreConfig_SetWideStringFromString()
to _PyCoreConfig_DecodeLocale().
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for the "surrogatepass" error handler in
PyUnicode_DecodeFSDefault() and PyUnicode_EncodeFSDefault()
for the UTF-8 encoding.
Changes:
* _Py_DecodeUTF8Ex() and _Py_EncodeUTF8Ex() now support the
surrogatepass error handler (_Py_ERROR_SURROGATEPASS).
* _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() now use
the _Py_error_handler enum instead of "int surrogateescape" to pass
the error handler. These functions now return -3 if the error
handler is unknown.
* Add unit tests on _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx()
in test_codecs.
* Rename get_error_handler() to _Py_GetErrorHandler() and expose it
as a private function.
* _freeze_importlib doesn't need config.filesystem_errors="strict"
workaround anymore.
|
|
|
|
| |
(#3157)
|
|
|
|
| |
Patch by Xiang Zhang.
|
|
|
|
| |
Add also a newline for readability in normalize_encoding().
|
| |
|
|
|
|
| |
with correct type.
|
|
|
|
|
|
|
|
|
|
|
| |
string is pure ASCII: use _PyBytesWriter_WriteBytes(), don't check individual
character.
Cleanup unicode_encode_ucs1():
* Rename repunicode to rep
* Clear rep object on error
* Factorize code between bytes and unicode path
|
| |
|
|
|
|
|
| |
Substract preallocate bytes from min_size before calling
_PyBytesWriter_Prepare().
|
|
|
|
|
|
|
|
|
|
| |
Issue #25318: Optimize backslashreplace and xmlcharrefreplace error handlers in
UTF-8 encoder. Optimize also backslashreplace error handler for ASCII and
Latin1 encoders.
Use the new _PyBytesWriter API to optimize these error handlers for the
encoders. It avoids to create an exception and call the slow implementation of
the error handler.
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new private API to optimize Unicode encoders. It uses a small buffer
allocated on the stack and supports overallocation.
Use _PyBytesWriter API for UCS1 (ASCII and Latin1) and UTF-8 encoders. Enable
overallocation for the UTF-8 encoder with error handlers.
unicode_encode_ucs1(): initialize collend to collstart+1 to not check the
current character twice, we already know that it is not ASCII.
|
|
|
|
|
| |
handlers: ``ignore``, ``replace``, ``surrogateescape``, ``surrogatepass``.
Patch co-written with Serhiy Storchaka.
|
|\ |
|
| | |
|
|/ |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The utf-16* and utf-32* encoders no longer allow surrogate code points
(U+D800-U+DFFF) to be encoded.
The utf-32* decoders no longer decode byte sequences that correspond to
surrogate code points.
The surrogatepass error handler now works with the utf-16* and utf-32* codecs.
Based on patches by Victor Stinner and Kang-Hao (Kenny) Lu.
|
| |
|
|\ |
|
| | |
|
|\ \
| |/ |
|
| |
| |
| |
| | |
characters when used with the "replace" error handler on invalid utf-8 sequences. Patch by Serhiy Storchaka, tests by Ezio Melotti.
|
|/
|
|
| |
endianess detection and handling.
|
|
|
|
|
|
| |
integer values, by using Py_uintptr_t instead of size_t.
Patch by Serhiy Storchaka.
|
| |
|
|
|
|
| |
Serhiy Storchaka.
|
|
|
|
| |
Patch by Serhiy Storchaka.
|
|
|
|
| |
Patch by Serhiy Storchaka.
|
|
|
|
| |
Storchaka.
|
|
|
|
| |
The main bottleneck was the PyUnicode_READ() macro.
|
|
This almost catches up with pre-PEP 393 performance, when decoding needed
only one pass.
|