path: root/Objects/stringlib/codecs.h
Commit message [Author, Age, Files, Lines]
* bpo-36775: _PyCoreConfig only uses wchar_t* (GH-13062) [Victor Stinner, 2019-05-02, 1 file, -1/+1]
    _PyCoreConfig: Change filesystem_encoding, filesystem_errors, stdio_encoding and stdio_errors fields type from char* to wchar_t*.
    Changes:
      * PyInterpreterState: replace fscodec_initialized (int) with fs_codec structure.
      * Add get_error_handler_wide() and unicode_encode_utf8() helper functions.
      * Add error_handler parameter to unicode_encode_locale() and unicode_decode_locale().
      * Remove _PyCoreConfig_SetString().
      * Rename _PyCoreConfig_SetWideString() to _PyCoreConfig_SetString().
      * Rename _PyCoreConfig_SetWideStringFromString() to _PyCoreConfig_DecodeLocale().
* bpo-34523: Support surrogatepass in locale codecs (GH-8995) [Victor Stinner, 2018-08-29, 1 file, -1/+1]
    Add support for the "surrogatepass" error handler in PyUnicode_DecodeFSDefault() and PyUnicode_EncodeFSDefault() for the UTF-8 encoding.
    Changes:
      * _Py_DecodeUTF8Ex() and _Py_EncodeUTF8Ex() now support the surrogatepass error handler (_Py_ERROR_SURROGATEPASS).
      * _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() now use the _Py_error_handler enum instead of "int surrogateescape" to pass the error handler. These functions now return -3 if the error handler is unknown.
      * Add unit tests on _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() in test_codecs.
      * Rename get_error_handler() to _Py_GetErrorHandler() and expose it as a private function.
      * _freeze_importlib doesn't need config.filesystem_errors="strict" workaround anymore.
* bpo-30923: Silence fall-through warnings included in -Wextra since gcc-7.0. (#3157) [Stefan Krah, 2017-08-21, 1 file, -2/+2]
* Issue #28561: Clean up UTF-8 encoder: remove dead code, update comments, etc. [Serhiy Storchaka, 2016-10-30, 1 file, -10/+4]
    Patch by Xiang Zhang.
* PEP 7 style for if/else in C [Victor Stinner, 2016-09-02, 1 file, -1/+2]
    Add also a newline for readability in normalize_encoding().
* Issue #27895: Spelling fixes (Contributed by Ville Skyttä). [Raymond Hettinger, 2016-08-30, 1 file, -3/+3]
* Issue #26765: Ensure that bytes- and unicode-specific stringlib files are used with correct type. [Serhiy Storchaka, 2016-05-16, 1 file, -3/+3]
* Optimize error handlers of ASCII and Latin1 encoders when the replacement string is pure ASCII: use _PyBytesWriter_WriteBytes(), don't check individual character. [Victor Stinner, 2015-10-09, 1 file, -11/+7]
    Cleanup unicode_encode_ucs1():
      * Rename repunicode to rep
      * Clear rep object on error
      * Factorize code between bytes and unicode path
* Add _PyBytesWriter_WriteBytes() to factorize the code [Victor Stinner, 2015-10-09, 1 file, -11/+11]
* _PyBytesWriter: simplify code to avoid "prealloc" parameters [Victor Stinner, 2015-10-09, 1 file, -8/+12]
    Subtract preallocated bytes from min_size before calling _PyBytesWriter_Prepare().
* Optimize backslashreplace error handler [Victor Stinner, 2015-10-08, 1 file, -2/+16]
    Issue #25318: Optimize backslashreplace and xmlcharrefreplace error handlers in UTF-8 encoder. Optimize also backslashreplace error handler for ASCII and Latin1 encoders.
    Use the new _PyBytesWriter API to optimize these error handlers for the encoders. This avoids creating an exception and calling the slow implementation of the error handler.
* Issue #25318: Add _PyBytesWriter API [Victor Stinner, 2015-10-08, 1 file, -63/+21]
    Add a new private API to optimize Unicode encoders. It uses a small buffer allocated on the stack and supports overallocation.
    Use _PyBytesWriter API for UCS1 (ASCII and Latin1) and UTF-8 encoders. Enable overallocation for the UTF-8 encoder with error handlers.
    unicode_encode_ucs1(): initialize collend to collstart+1 to not check the current character twice, we already know that it is not ASCII.
* Issue #25267: The UTF-8 encoder is now up to 75 times as fast for error handlers: ``ignore``, ``replace``, ``surrogateescape``, ``surrogatepass``. [Victor Stinner, 2015-10-01, 1 file, -51/+96]
    Patch co-written with Serhiy Storchaka.
* Fixed typos in comments. [Serhiy Storchaka, 2015-05-18, 1 file, -4/+4]
|\
| * Fixed typos in comments. [Serhiy Storchaka, 2015-05-18, 1 file, -2/+2]
* | Issue #15027: The UTF-32 encoder is now 3x to 7x faster. [Serhiy Storchaka, 2015-05-12, 1 file, -0/+87]
|/
* Reverted changeset b72c5573c5e7 (issue #15027). [Serhiy Storchaka, 2014-01-04, 1 file, -87/+0]
* Issue #15027: Rewrite the UTF-32 encoder. It is now 1.6x to 3.5x faster. [Serhiy Storchaka, 2014-01-04, 1 file, -0/+87]
* Remove dead code committed in issue #12892. [Serhiy Storchaka, 2013-11-19, 1 file, -104/+0]
* Issue #12892: The utf-16* and utf-32* codecs now reject (lone) surrogates. [Serhiy Storchaka, 2013-11-19, 1 file, -16/+182]
    The utf-16* and utf-32* encoders no longer allow surrogate code points (U+D800-U+DFFF) to be encoded.
    The utf-32* decoders no longer decode byte sequences that correspond to surrogate code points.
    The surrogatepass error handler now works with the utf-16* and utf-32* codecs.
    Based on patches by Victor Stinner and Kang-Hao (Kenny) Lu.
* Issue #18722: Remove uses of the "register" keyword in C code. [Antoine Pitrou, 2013-08-13, 1 file, -3/+3]
* (Merge 3.3) Issue #8271: Fix compilation on Windows [Victor Stinner, 2012-11-04, 1 file, -1/+1]
|\
| * Issue #8271: Fix compilation on Windows [Victor Stinner, 2012-11-04, 1 file, -1/+1]
* | #8271: merge with 3.3. [Ezio Melotti, 2012-11-04, 1 file, -30/+62]
|\ \
| |/
| * #8271: the utf-8 decoder now outputs the correct number of U+FFFD characters when used with the "replace" error handler on invalid utf-8 sequences. [Ezio Melotti, 2012-11-04, 1 file, -30/+62]
| |   Patch by Serhiy Storchaka, tests by Ezio Melotti.
* | Issue #16166: Add PY_LITTLE_ENDIAN and PY_BIG_ENDIAN macros and unified endianness detection and handling. [Christian Heimes, 2012-10-17, 1 file, -3/+3]
|/
* Issue #15144: Fix possible integer overflow when handling pointers as integer values, by using Py_uintptr_t instead of size_t. [Antoine Pitrou, 2012-09-20, 1 file, -9/+5]
    Patch by Serhiy Storchaka.
* Use correct types for ASCII_CHAR_MASK integer constants. [Mark Dickinson, 2012-07-07, 1 file, -2/+2]
* Issue #14923: Optimize continuation-byte check in UTF-8 decoding. Patch by Serhiy Storchaka. [Mark Dickinson, 2012-06-23, 1 file, -6/+10]
* Issue #15026: utf-16 encoding is now significantly faster (up to 10x). [Antoine Pitrou, 2012-06-15, 1 file, -0/+64]
    Patch by Serhiy Storchaka.
* Issue #14624: UTF-16 decoding is now 3x to 4x faster on various inputs. [Antoine Pitrou, 2012-05-15, 1 file, -1/+148]
    Patch by Serhiy Storchaka.
* Issue #14738: Speed-up UTF-8 decoding on non-ASCII data. Patch by Serhiy Storchaka. [Antoine Pitrou, 2012-05-10, 1 file, -78/+143]
* Issue #13624: Write a specialized UTF-8 encoder to allow more optimization [Victor Stinner, 2011-12-18, 1 file, -0/+197]
    The main bottleneck was the PyUnicode_READ() macro.
* Issue #13417: speed up utf-8 decoding by around 2x for the non-fully-ASCII case. [Antoine Pitrou, 2011-11-21, 1 file, -0/+156]
    This almost catches up with pre-PEP 393 performance, when decoding needed only one pass.
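
Several entries in this log (Issue #12892, #8271, Issue #25318) change behaviour that is visible from Python rather than only in codecs.h. The sketch below is not taken from the commits themselves; it is a small illustration, assuming a CPython release that includes those changes (3.4 or later), of what the log messages describe. The variable names are made up for the example.

```python
# Illustrative only: Python-level behaviour described by the log entries above.

# Issue #12892: the utf-16*/utf-32* encoders reject lone surrogates by default.
lone = "\ud800"
try:
    lone.encode("utf-16-le")
    strict_rejected = False
except UnicodeEncodeError:
    strict_rejected = True

# The surrogatepass error handler lets the code point through unchanged.
utf16_pass = lone.encode("utf-16-le", "surrogatepass")   # b"\x00\xd8"
utf32_pass = lone.encode("utf-32-le", "surrogatepass")   # b"\x00\xd8\x00\x00"

# #8271: with the "replace" handler, each maximal invalid subsequence
# becomes one U+FFFD instead of one per byte or one per whole sequence.
decoded_a = b"\xe2\x28\xa1".decode("utf-8", "replace")   # "\ufffd(\ufffd"
decoded_b = b"\xf1\x80\x41".decode("utf-8", "replace")   # "\ufffdA"

# Issue #25318: the handlers whose fast paths were added to the encoders.
text = "\xe9\u20ac"  # e-acute followed by the euro sign
escaped = text.encode("ascii", "backslashreplace")       # b"\\xe9\\u20ac"
xmlrefs = text.encode("ascii", "xmlcharrefreplace")      # b"&#233;&#8364;"

print(strict_rejected, utf16_pass, utf32_pass)
print(repr(decoded_a), repr(decoded_b))
print(escaped, xmlrefs)
```

The optimisations listed in the log change how fast these results are produced, not the results themselves, so the same checks should hold before and after the performance commits.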