path: root/Lib/test/test_codecs.py
Commit message (Author, Age, Files, Lines)
* bpo-39674: Revert "bpo-37330: open() no longer accept 'U' in file mode (GH-16959)" (GH-18767) (Victor Stinner, 2020-03-04, 1 file, -15/+3)
|     This reverts commit e471e72977c83664f13d041c78549140c86c92de. The mode will be removed from Python 3.10.
* bpo-38971: Open file in codecs.open() closes if exception raised. (GH-17666) (Chris A, 2020-03-02, 1 file, -0/+9)
|     An open BPO issue asked for codecs.open() to reach parity with io.open(), which uses try/except to ensure the file stream is closed before the exception propagates.
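The close-on-failure pattern described in that commit can be sketched roughly as follows. This is an illustration only, not the stdlib implementation: `wrap_stream` and the bogus codec name are hypothetical, and the sketch wraps an already-open binary stream rather than opening a file.

```python
import codecs
import io


def wrap_stream(stream, encoding, errors="strict"):
    """Hypothetical helper: wrap a binary stream, closing it if wrapping fails."""
    try:
        info = codecs.lookup(encoding)  # raises LookupError for unknown codecs
        return codecs.StreamReaderWriter(
            stream, info.streamreader, info.streamwriter, errors
        )
    except BaseException:
        # Do not leak the underlying stream when the codec setup fails.
        stream.close()
        raise


stream = io.BytesIO(b"data")
try:
    wrap_stream(stream, "no-such-codec-xyz")  # made-up codec name
except LookupError:
    pass
closed_after_failure = stream.closed

# On success the stream stays open and decodes as usual.
ok_text = wrap_stream(io.BytesIO(b"abc"), "utf-8").read()
```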
* bpo-30566: Fix IndexError when using punycode codec (GH-18632) (Berker Peksag, 2020-02-25, 1 file, -0/+12)
|     Trying to decode an invalid string with the punycode codec should raise UnicodeError.
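A small sketch of the behaviour this fix targets, assuming an interpreter that includes it. The helper name `try_punycode` is made up for illustration, and the invalid input is just one example of a string the codec rejects.

```python
def try_punycode(data: bytes) -> str:
    """Hypothetical helper: decode punycode, reporting rejection as text."""
    try:
        return data.decode("punycode")
    except UnicodeError as exc:  # also covers UnicodeDecodeError subclasses
        return f"rejected: {type(exc).__name__}"


valid = try_punycode(b"bcher-kva")  # well-formed punycode for "bücher"
invalid = try_punycode(b"xn--w&")   # "&" is not a valid punycode digit
```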
* Remove binding of captured exceptions when not used to reduce the chances of creating cycles (GH-17246) (Pablo Galindo, 2019-11-19, 1 file, -1/+1)
|     Capturing exceptions into names can lead to reference cycles through the __traceback__ attribute of the exceptions in some obscure cases that have been reported previously and fixed individually. As these variables are not used anyway, we can remove the binding to reduce the chances of creating reference cycles. See for example GH-13135.
* bpo-37330: open() no longer accept 'U' in file mode (GH-16959) (Victor Stinner, 2019-10-28, 1 file, -3/+15)
|     open(), io.open(), codecs.open() and fileinput.FileInput no longer accept "U" ("universal newline") in the file mode. This flag has been deprecated since Python 3.3.
* bpo-37876: Tests for ROT-13 codec (GH-15314) (Zeth, 2019-09-09, 1 file, -0/+37)
|     The Rot-13 codec is for educational use but does not have unit tests, dragging down test coverage. This adds a few very simple tests.
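For context, a minimal round trip through the ROT-13 text codec. It is an illustration of the codec's behaviour, not one of the tests added by the commit.

```python
import codecs

plain = "Hello, World!"
# rot_13 is a str-to-str codec, so it goes through codecs.encode/decode
# rather than str.encode.
scrambled = codecs.encode(plain, "rot_13")
restored = codecs.decode(scrambled, "rot_13")
```

Because the rotation is by half the alphabet, encoding and decoding are the same operation.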
* bpo-36311: Fixes decoding multibyte characters around chunk boundaries and improves decoding performance (GH-15083) (Steve Dower, 2019-08-21, 1 file, -3/+17)
* Remove unused imports in tests (GH-14518) (Victor Stinner, 2019-07-01, 1 file, -1/+1)
|
* bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (GH-14304) (Serhiy Storchaka, 2019-06-25, 1 file, -0/+24)
|     * The UTF-8 incremental decoder now fails fast if it encounters a sequence that can't be handled by the error handler.
|     * The UTF-16 incremental decoders with the surrogatepass error handler now decode a lone low surrogate with final=False.
* bpo-36778: Remove outdated comment from CodePageTest (GH-13807) (Victor Stinner, 2019-06-04, 1 file, -1/+0)
|     CP65001Test has been removed.
* bpo-33361: Fix bug with seeking in StreamRecoders (GH-8278) (Ammar Askar, 2019-05-31, 1 file, -0/+25)
|
* bpo-33482: fix codecs.StreamRecoder.writelines (GH-6779) (Jelle Zijlstra, 2019-05-22, 1 file, -0/+21)
|     A very simple fix. I found this while writing typeshed stubs for StreamRecoder.
|     https://bugs.python.org/issue33482
* bpo-36778: cp65001 encoding becomes an alias to utf_8 (GH-13230) (Victor Stinner, 2019-05-10, 1 file, -89/+0)
|
* bpo-35920: Windows 10 ARM32 platform support (GH-11774) (Paul Monson, 2019-04-25, 1 file, -1/+31)
|
* bpo-24214: Fixed the UTF-8 incremental decoder. (GH-12603) (Serhiy Storchaka, 2019-03-30, 1 file, -0/+9)
|     The bug occurred when the encoded surrogate character was passed to the incremental decoder in two chunks.
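The two-chunk scenario can be sketched like this, assuming an interpreter that includes the fix. The code point U+D901 and the split position are arbitrary choices for illustration.

```python
import codecs

# A lone high surrogate encodes to three bytes under surrogatepass.
encoded = "\ud901".encode("utf-8", "surrogatepass")

decoder = codecs.getincrementaldecoder("utf-8")("surrogatepass")
# Feed the first byte only: the decoder should buffer it and emit nothing.
first = decoder.decode(encoded[:1])
# Feed the remainder and signal the end of input.
second = decoder.decode(encoded[1:], final=True)
```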
* bpo-36312: Fix decoders for some code pages. (GH-12369) (Serhiy Storchaka, 2019-03-20, 1 file, -0/+9)
|
* bpo-36297: remove "unicode_internal" codec (GH-12342) (Inada Naoki, 2019-03-18, 1 file, -102/+5)
|
* bpo-22831: Use "with" to avoid possible fd leaks in tests (part 2). (GH-10929) (Serhiy Storchaka, 2019-03-05, 1 file, -3/+2)
|
* bpo-35372: Fix the code page decoder for input > 2 GiB. (GH-10848) (Serhiy Storchaka, 2018-12-03, 1 file, -0/+18)
|
* bpo-34523, bpo-35322: Fix unicode_encode_locale() (GH-10759) (Victor Stinner, 2018-11-28, 1 file, -2/+12)
|     Fix memory leak in PyUnicode_EncodeLocale() and PyUnicode_EncodeFSDefault() on error handling.
|     Changes:
|     * Fix unicode_encode_locale() error handling
|     * Fix test_codecs.LocaleCodecTest
* bpo-34523: Support surrogatepass in locale codecs (GH-8995) (Victor Stinner, 2018-08-29, 1 file, -5/+113)
|     Add support for the "surrogatepass" error handler in PyUnicode_DecodeFSDefault() and PyUnicode_EncodeFSDefault() for the UTF-8 encoding.
|     Changes:
|     * _Py_DecodeUTF8Ex() and _Py_EncodeUTF8Ex() now support the surrogatepass error handler (_Py_ERROR_SURROGATEPASS).
|     * _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() now use the _Py_error_handler enum instead of "int surrogateescape" to pass the error handler. These functions now return -3 if the error handler is unknown.
|     * Add unit tests on _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() in test_codecs.
|     * Rename get_error_handler() to _Py_GetErrorHandler() and expose it as a private function.
|     * _freeze_importlib doesn't need config.filesystem_errors="strict" workaround anymore.
* bpo-22602: Raise an exception in the UTF-7 decoder for ill-formed sequences starting with "+". (GH-8741) (Zackery Spytz, 2018-08-19, 1 file, -0/+1)
|     The UTF-7 decoder now raises UnicodeDecodeError for ill-formed sequences starting with "+" (as specified in RFC 2152).
* bpo-29240: PEP 540: Add a new UTF-8 Mode (#855) (Victor Stinner, 2017-12-13, 1 file, -8/+2)
|     * Add -X utf8 command line option, PYTHONUTF8 environment variable and a new sys.flags.utf8_mode flag.
|     * If the LC_CTYPE locale is "C" at startup: automatically enable the UTF-8 mode.
|     * Add _winapi.GetACP(). encodings._alias_mbcs() now calls _winapi.GetACP() to get the ANSI code page.
|     * locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8 mode. As a side effect, open() now uses the UTF-8 encoding by default in this mode.
|     * Py_DecodeLocale() and Py_EncodeLocale() now use the UTF-8 encoding in the UTF-8 Mode.
|     * Update subprocess._args_from_interpreter_flags() to handle -X utf8.
|     * Skip some tests relying on the current locale if the UTF-8 mode is enabled.
|     * Add test_utf8mode.py.
|     * _Py_DecodeUTF8_surrogateescape() gets a new optional parameter to also return the length (number of wide characters).
|     * pymain_get_global_config() and pymain_set_global_config() now always copy flag values, rather than only copying if the new value is greater than the old value.
* bpo-32110: codecs.StreamReader.read(n) now returns not more than n (#4499) (Serhiy Storchaka, 2017-11-28, 1 file, -2/+16)
|     characters/bytes for non-negative n. This makes it compatible with read() methods of other file-like objects.
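A short sketch of the read(n) contract described above, assuming an interpreter that includes this change. The sample text and the chunk size of 4 are arbitrary.

```python
import codecs
import io

text = "héllo wörld"  # contains two-byte UTF-8 sequences
reader = codecs.getreader("utf-8")(io.BytesIO(text.encode("utf-8")))

chunks = []
while True:
    chunk = reader.read(4)  # ask for at most 4 characters per call
    if not chunk:
        break
    chunks.append(chunk)
```

Each chunk holds at most four characters, and joining the chunks gives back the original text even where a multibyte character straddles a read boundary.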
* bpo-31825: Fixed OverflowError in the 'unicode-escape' codec (#4058) (Serhiy Storchaka, 2017-10-20, 1 file, -0/+4)
|     and in codecs.escape_decode() when decoding an escaped non-ASCII byte.
* Issue #25270: Merge from 3.5 (Berker Peksag, 2016-09-16, 1 file, -0/+20)
|\
| * Issue #25270: Prevent codecs.escape_encode() from raising SystemError when an empty bytestring is passed (Berker Peksag, 2016-09-16, 1 file, -0/+20)
* | #27364: Deprecate invalid escape strings in str/bytes. (R David Murray, 2016-09-08, 1 file, -11/+24)
| |     Patch by Emanuel Barry, reviewed by Serhiy Storchaka and Martin Panter.
* | #27364: fix "incorrect" uses of escape character in the stdlib. (R David Murray, 2016-09-08, 1 file, -7/+7)
| |     And most of the tools. Patch by Emanuel Barry, reviewed by me, Serhiy Storchaka, and Martin Panter.
* | Issue #27959: Adds oem encoding, alias ansi to mbcs, move aliasmbcs to codec lookup (Steve Dower, 2016-09-07, 1 file, -33/+29)
* | Issue #23277: Remove unused imports in tests. (Serhiy Storchaka, 2016-04-24, 1 file, -1/+0)
| |
* | Merge typo fixes from 3.5 (Martin Panter, 2016-04-16, 1 file, -2/+2)
|\ \
| |/
| * Fix typos in code comments and documentation (Martin Panter, 2016-04-16, 1 file, -2/+2)
| |
* | Issue #15984: Merge PyUnicode doc from 3.5 (Martin Panter, 2016-04-15, 1 file, -1/+1)
|\ \
| |/
| * Correct “an” → “a” with “Unicode”, “user”, “UTF”, etc (Martin Panter, 2016-04-15, 1 file, -1/+1)
| |     This affects documentation, code comments, and debugging messages.
* | Issue #25523: Merge a-to-an corrections from 3.5 (Martin Panter, 2015-11-02, 1 file, -1/+1)
|\ \
| |/
| * Issue #25523: Merge "a" to "an" fixes from 3.4 into 3.5 (Martin Panter, 2015-11-02, 1 file, -1/+1)
| |\
| | * Issue #25523: Correct "a" article to "an" article (Martin Panter, 2015-11-02, 1 file, -1/+1)
| | |     This changes the main documentation, doc strings, source code comments, and a couple of error messages in the test suite. In some cases the word was removed or edited some other way to fix the grammar.
* | | Issue #25318: Avoid sprintf() in backslashreplace() (Victor Stinner, 2015-10-09, 1 file, -2/+4)
| | |     Rewrite backslashreplace() to be closer to PyCodec_BackslashReplaceErrors(). Also add unit tests for non-BMP characters.
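To illustrate what the backslashreplace error handler produces for BMP and non-BMP characters, here is a minimal example. The sample string is arbitrary and not taken from the test suite.

```python
# U+00E9 fits in one byte of escape (\xe9); U+1F600 is outside the BMP
# and is written with the eight-digit \U form.
text = "caf\u00e9 \U0001f600"
encoded = text.encode("ascii", errors="backslashreplace")
```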
* | | Issue #25301: The UTF-8 decoder is now up to 15 times as fast for error (Victor Stinner, 2015-10-05, 1 file, -0/+12)
| | |     handlers: ``ignore``, ``replace`` and ``surrogateescape``.
* | | Issue #24848: Fixed bugs in UTF-7 decoding of misformed data: (Serhiy Storchaka, 2015-10-02, 1 file, -1/+59)
|\ \ \
| |/ /
| | |     1. Non-ASCII bytes were accepted after shift sequence.
| | |     2. A low surrogate could be emitted in case of error in high surrogate.
| | |     3. In some circumstances the '\xfd' character was produced instead of the replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
| * | Issue #24848: Fixed bugs in UTF-7 decoding of misformed data: (Serhiy Storchaka, 2015-10-02, 1 file, -1/+59)
| |\ \
| | |/
| | |     1. Non-ASCII bytes were accepted after shift sequence.
| | |     2. A low surrogate could be emitted in case of error in high surrogate.
| | |     3. In some circumstances the '\xfd' character was produced instead of the replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
| | * Issue #24848: Fixed bugs in UTF-7 decoding of misformed data: (Serhiy Storchaka, 2015-10-02, 1 file, -1/+59)
| | |     1. Non-ASCII bytes were accepted after shift sequence.
| | |     2. A low surrogate could be emitted in case of error in high surrogate.
* | | Issue #25267: The UTF-8 encoder is now up to 75 times as fast for error (Victor Stinner, 2015-10-01, 1 file, -10/+27)
| | |     handlers: ``ignore``, ``replace``, ``surrogateescape``, ``surrogatepass``. Patch co-written with Serhiy Storchaka.
* | | Optimize ascii/latin1+surrogateescape encoders (Victor Stinner, 2015-09-29, 1 file, -0/+60)
| | |     Issue #25227: Optimize ASCII and latin1 encoders with the ``surrogateescape`` error handler: the encoders are now up to 3 times as fast. Initial patch written by Serhiy Storchaka.
* | | Issue #24870: Optimize the ASCII decoder for error handlers: surrogateescape, (Victor Stinner, 2015-09-21, 1 file, -0/+32)
|/ /
| |     ignore and replace. Initial patch written by Naoki Inada. The decoder is now up to 60 times as fast for these error handlers. Also add unit tests for the ASCII decoder.
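For reference, this is what the three error handlers named in that commit do when the ASCII decoder meets a non-ASCII byte. The input bytes are an arbitrary example; the sketch shows behaviour only and says nothing about the speedups.

```python
raw = b"caf\xe9"  # 0xE9 is not valid ASCII

# surrogateescape maps the bad byte to a lone surrogate (U+DCE9) so the
# original bytes can be recovered when encoding with the same handler.
escaped = raw.decode("ascii", errors="surrogateescape")
round_trip = escaped.encode("ascii", errors="surrogateescape")

ignored = raw.decode("ascii", errors="ignore")    # drops the bad byte
replaced = raw.decode("ascii", errors="replace")  # substitutes U+FFFD
```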
* | Issue #16473: Merge codecs doc and test from 3.4 into 3.5 (Martin Panter, 2015-09-12, 1 file, -0/+8)
|\ \
| |/
| * Issue #16473: Fix byte transform codec documentation; test quotetabs=True (Martin Panter, 2015-09-12, 1 file, -0/+8)
| |     This changes the equivalent functions listed for the Base-64, hex and Quoted-Printable codecs to reflect the functions actually used. Also mention and test the "quotetabs" setting for Quoted-Printable encoding.
* | Issue #22681: Added support for the koi8_t encoding. (Serhiy Storchaka, 2015-05-12, 1 file, -0/+1)
| |
* | Issue #22682: Added support for the kz1048 encoding. (Serhiy Storchaka, 2015-05-12, 1 file, -0/+1)
| |