path: root/Lib/encodings
Commit message [Author, Age, Files, Lines]
* gh-85287: Change codecs to raise precise UnicodeEncodeError and UnicodeDecodeError (#113674) [John Sloboda, 2024-03-17, 5 files, -66/+152]
|   Co-authored-by: Inada Naoki <songofacandy@gmail.com>
* gh-63283: IDNA prefix should be case insensitive (GH-17726) [Zackery Spytz, 2024-03-15, 1 file, -3/+3]
|   Any capitalization of "xn--" should be acceptable for the ACE prefix (see https://tools.ietf.org/html/rfc3490#section-5).
|   Co-authored-by: Pepijn de Vos <pepijndevos@gmail.com>
|   Co-authored-by: Erlend E. Aasland <erlend@python.org>
|   Co-authored-by: Petr Viktorin <encukou@gmail.com>
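As a rough illustration of the ACE prefix handling described in this entry, the sketch below decodes a hostname with the built-in "idna" codec. The helper name decode_hostname is hypothetical. Only the lower-case prefix result is stated as fact; how the upper-case form decodes depends on whether the running interpreter includes this change.

```python
def decode_hostname(raw: bytes) -> str:
    """Decode an ASCII-compatible (ACE) hostname using the stdlib idna codec."""
    return raw.decode("idna")


# Lower-case prefix: decodes to the Unicode hostname.
lower = decode_hostname(b"xn--mnchen-3ya.de")
print(lower == "m\u00fcnchen.de")

# Upper-case prefix: interpreters with the change above decode it the same way;
# older ones may return the label unchanged, so no output is claimed here.
upper = decode_hostname(b"XN--mnchen-3ya.de")
print(upper == lower)
```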
* gh-102388: Add windows_31j to aliases for cp932 codec (#102389) [Masayuki Moriyama, 2024-02-19, 1 file, -0/+1]
|   The charset name "Windows-31J" is registered in the IANA Charset Registry[1] and is implemented in Python as the cp932 codec.
|   [1] https://www.iana.org/assignments/charset-reg/windows-31J
|   Signed-off-by: Masayuki Moriyama <masayuki.moriyama@miraclelinux.com>
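One way to see which alias names map to a codec is to inspect the alias table in encodings.aliases. The helper aliases_for below is a hypothetical name used for illustration; whether "windows_31j" appears in the result depends on the Python version in use.

```python
from encodings.aliases import aliases


def aliases_for(codec_name: str) -> list[str]:
    """Return every alias in the stdlib table that points at codec_name."""
    return sorted(alias for alias, target in aliases.items() if target == codec_name)


cp932_aliases = aliases_for("cp932")
print(cp932_aliases)
# True only on interpreters that include the alias added by this commit.
print("windows_31j" in cp932_aliases)
```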
* gh-98433: Fix quadratic time idna decoding. (#99092) [Gregory P. Smith, 2022-11-08, 1 file, -17/+25]
|   There was an unnecessary quadratic loop in idna decoding. This restores the behavior to linear.
|   This also adds an early length check in IDNA decoding to outright reject huge inputs early on given the ultimate result is defined to be 63 or fewer characters.
* bpo-46659: Fix the MBCS codec alias on Windows (GH-31218) [Victor Stinner, 2022-02-22, 1 file, -4/+7]
|
* bpo-46659: Update the test on the mbcs codec alias (GH-31168) [Victor Stinner, 2022-02-06, 1 file, -3/+4]
|   encodings registers the _alias_mbcs() codec search function before the search_function() codec search function. Previously, _alias_mbcs() was never used.
|   Fix the test_codecs.test_mbcs_alias() test: use the current ANSI code page, not a fake ANSI code page number.
|   Remove the test_site.test_aliasing_mbcs() test: the alias is now implemented in the encodings module, no longer in the site module.
* bpo-45467: Fix IncrementalDecoder and StreamReader in the "raw-unicode-escape" codec (GH-28944) [Serhiy Storchaka, 2021-10-14, 1 file, -4/+5]
|   They now support splitting escape sequences between input chunks.
|   Add the third parameter "final" in codecs.raw_unicode_escape_decode(). It is True by default to match the former behavior.
* bpo-45461: Fix IncrementalDecoder and StreamReader in the "unicode-escape" codec (GH-28939) [Serhiy Storchaka, 2021-10-14, 1 file, -4/+5]
|   They now support splitting escape sequences between input chunks.
|   Add the third parameter "final" in codecs.unicode_escape_decode(). It is True by default to match the former behavior.
* bpo-39337: encodings.normalize_encoding() now ignores non-ASCII characters (GH-22219) [Hai Shi, 2020-10-14, 1 file, -1/+2]
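To make the normalization rule concrete, here is a hypothetical re-implementation written only for illustration; it is not presented as the code in Lib/encodings. It assumes the rule is: runs of non-alphanumeric characters (other than ".") collapse into a single underscore, and non-ASCII characters are dropped. For plain ASCII names it is compared against the real encodings.normalize_encoding().

```python
from encodings import normalize_encoding


def normalize_sketch(name: str) -> str:
    """Illustrative sketch of the normalization rule (an assumption, see above)."""
    out = []
    pending_separator = False
    for ch in name:
        if ch.isalnum() or ch == ".":
            if pending_separator and out:
                out.append("_")
            if ch.isascii():  # non-ASCII alphanumerics are ignored
                out.append(ch)
            pending_separator = False
        else:
            pending_separator = True
    return "".join(out)


print(normalize_sketch("utf-8") == normalize_encoding("utf-8"))
print(normalize_sketch("ISO 8859-1"))
print(normalize_sketch("utf\u00e9-8"))
```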
* bpo-30566: Fix IndexError when using punycode codec (GH-18632) [Berker Peksag, 2020-02-25, 1 file, -1/+1]
|   Trying to decode an invalid string with the punycode codec should raise UnicodeError.
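A short sketch of the error contract described here: callers can catch UnicodeError when decoding untrusted punycode input. The helper try_punycode is a hypothetical name, and returning None on failure is a choice made for this example only.

```python
def try_punycode(raw: bytes):
    """Decode punycode bytes; return None if the input is not valid punycode."""
    try:
        return raw.decode("punycode")
    except UnicodeError:
        # UnicodeDecodeError is a subclass of UnicodeError, so this also covers
        # interpreters that raise the more precise exception.
        return None


print(try_punycode(b"mnchen-3ya") == "m\u00fcnchen")
print(try_punycode(b"xn--w&") is None)
```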
* bpo-38945: UU Encoding: Don't let newline in filename corrupt the output format (#17418) [Matthew Rollings, 2019-12-02, 1 file, -0/+4]
* bpo-34519: Add additional aliases for HP Roman 8 (GH-8956) [Michael Osipov, 2019-09-11, 1 file, -0/+2]
|   HP Roman 8 is known under more aliases than are listed in aliases.py.
|   Patch by Michael Osipov.
* bpo-35551: remove mac_centeuro encoding (GH-13856) [Inada Naoki, 2019-06-06, 1 file, -307/+0]
|   It is an alias to mac_latin2 now.
* bpo-35551: encodings update (GH-11446) [Ashwin Ramaswami, 2019-06-05, 1 file, -3/+1]
|
* Fix typos in docs and docstrings (GH-13745) [Xtreak, 2019-06-02, 1 file, -1/+1]
|
* bpo-36778: cp65001 encoding becomes an alias to utf_8 (GH-13230) [Victor Stinner, 2019-05-10, 2 files, -43/+1]
|
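The alias relationship in the entry above can be observed through codecs.lookup(). This is a small sketch assuming an interpreter that includes the change, where looking up "cp65001" resolves to the UTF-8 codec.

```python
import codecs

# Look up the codec by its Windows code page name.
info = codecs.lookup("cp65001")
print(info.name)

# Encoding through the alias should match encoding through "utf-8" directly.
sample = "h\u00e9llo"
print(sample.encode("cp65001") == sample.encode("utf-8"))
```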
* bpo-36297: remove "unicode_internal" codec (GH-12342) [Inada Naoki, 2019-03-18, 1 file, -45/+0]
|
* Remove obsolete comment about latin-1 in `normalize_encoding` (GH-8739) [Anthony Sottile, 2018-09-11, 1 file, -2/+1]
|   This docstring has drifted since python2: https://github.com/python/cpython/blob/ca079a3ea30098aff3197c559a0e32d42dda6d84/Lib/encodings/__init__.py#L68
* bpo-32943: Fix confusing error message for rot13 codec (GH-5869) [Xiang Zhang, 2018-03-25, 1 file, -4/+4]
|
* bpo-29240: PEP 540: Add a new UTF-8 Mode (#855) [Victor Stinner, 2017-12-13, 1 file, -2/+3]
|   * Add -X utf8 command line option, PYTHONUTF8 environment variable and a new sys.flags.utf8_mode flag.
|   * If the LC_CTYPE locale is "C" at startup: automatically enable the UTF-8 Mode.
|   * Add _winapi.GetACP(). encodings._alias_mbcs() now calls _winapi.GetACP() to get the ANSI code page.
|   * locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8 Mode. As a side effect, open() now uses the UTF-8 encoding by default in this mode.
|   * Py_DecodeLocale() and Py_EncodeLocale() now use the UTF-8 encoding in the UTF-8 Mode.
|   * Update subprocess._args_from_interpreter_flags() to handle -X utf8.
|   * Skip some tests relying on the current locale if the UTF-8 Mode is enabled.
|   * Add test_utf8mode.py.
|   * _Py_DecodeUTF8_surrogateescape() gets a new optional parameter to also return the length (number of wide characters).
|   * pymain_get_global_config() and pymain_set_global_config() now always copy flag values, rather than only copying if the new value is greater than the old value.
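The flag and locale behaviour listed above can be inspected at runtime. The sketch below only reads state; the helper describe_utf8_mode is a hypothetical name. Running the interpreter with -X utf8 or PYTHONUTF8=1 would be expected to flip the reported flag.

```python
import locale
import sys


def describe_utf8_mode() -> dict:
    """Report whether UTF-8 Mode is active and which encoding is preferred."""
    return {
        "utf8_mode": bool(sys.flags.utf8_mode),
        "preferred_encoding": locale.getpreferredencoding(False),
    }


print(describe_utf8_mode())
```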
* Revert #27959: ImportError within an encoding module should also skip the encoding [Steve Dower, 2016-09-09, 1 file, -3/+4]
* Issue #28005: Allow ImportErrors in encoding implementation to propagate. [Steve Dower, 2016-09-08, 1 file, -2/+3]
|
* Issue #27959: Prevent ImportError from escaping codec search function [Steve Dower, 2016-09-07, 1 file, -4/+8]
|
* Issue #27959: Adds oem encoding, alias ansi to mbcs, move aliasmbcs to codec lookup [Steve Dower, 2016-09-07, 3 files, -0/+52]
* PEP 7 style for if/else in C [Victor Stinner, 2016-09-02, 1 file, -0/+1]
|   Also add a newline for readability in normalize_encoding().
* Issue #27076: Doc, comment and tests spelling fixes [Martin Panter, 2016-05-26, 2 files, -2/+2]
|   Most fixes to Doc/ and Lib/ directories by Ville Skyttä.
* Add some "used with permission" mentions where external resources are referenced. [Brett Cannon, 2016-01-15, 1 file, -0/+2]
|   Permission was validated prior to adding these markings.
* Issue #16473: Merge codecs doc and test from 3.4 into 3.5 [Martin Panter, 2015-09-12, 1 file, -1/+1]
|\
| * Issue #16473: Fix byte transform codec documentation; test quotetabs=True [Martin Panter, 2015-09-12, 1 file, -1/+1]
| |   This changes the equivalent functions listed for the Base-64, hex and Quoted-Printable codecs to reflect the functions actually used.
| |   Also mention and test the "quotetabs" setting for Quoted-Printable encoding.
* | Added forgotten new files for issues #22681 and #22682. [Serhiy Storchaka, 2015-05-12, 2 files, -0/+615]
| |
* | Issue #22682: Added support for the kz1048 encoding. [Serhiy Storchaka, 2015-05-12, 1 file, -0/+5]
| |
* | Issue #22406: Fixed the uu_codec codec incorrectly ported to 3.x. [Serhiy Storchaka, 2014-11-07, 1 file, -1/+1]
|\ \
| |/
| |   Based on patch by Martin Panter.
| * Issue #22406: Fixed the uu_codec codec incorrectly ported to 3.x. [Serhiy Storchaka, 2014-11-07, 1 file, -1/+1]
| |   Based on patch by Martin Panter.
* | Issue #21171: Fixed undocumented filter API of the rot13 codec. [Serhiy Storchaka, 2014-04-13, 1 file, -1/+1]
|\ \
| |/
| |   Patch by Berker Peksag.
| * Issue #21171: Fixed undocumented filter API of the rot13 codec. [Serhiy Storchaka, 2014-04-13, 1 file, -1/+1]
| |   Patch by Berker Peksag.
* | Issue #20574: Implement incremental decoder for cp65001 code (Windows code page 65001, Microsoft UTF-8). [Victor Stinner, 2014-03-17, 1 file, -3/+6]
|/
* Merge #7475: Remove references to '.transform' from transform codec docstrings. [R David Murray, 2014-03-14, 6 files, -12/+6]
|\
| * #7475: Remove references to '.transform' from transform codec docstrings. [R David Murray, 2014-03-14, 6 files, -12/+6]
| |
| * Issue #19619: Blacklist non-text codecs in method API [Serhiy Storchaka, 2014-02-24, 7 files, -0/+7]
| |   str.encode, bytes.decode and bytearray.decode now use an internal API to throw LookupError for known non-text encodings, rather than attempting the encoding or decoding operation and then throwing a TypeError for an unexpected output type.
| |   The latter mechanism remains in place for third party non-text encodings.
| |   Backported changeset d68df99d7a57.
* | whatsnew: cp273 codec (#1097797) [R David Murray, 2014-03-08, 1 file, -0/+5]
| |   Also updated the docs and added the aliases mentioned by the references.
* | Fixed incorrectly applying a patch for issue19668. [Serhiy Storchaka, 2013-11-23, 2 files, -26/+724]
| |
* | Issue #19668: Added support for the cp1125 encoding. [Serhiy Storchaka, 2013-11-23, 2 files, -26/+32]
| |
* | Close #7475: Restore binary & text transform codecs [Nick Coghlan, 2013-11-23, 1 file, -18/+18]
| |   The codecs themselves were restored in Python 3.2, this completes the restoration by adding back the convenience aliases.
| |   These aliases were originally left out due to confusing errors when attempting to use them with the text encoding specific convenience methods. Python 3.4 includes several improvements to those errors, thus permitting the aliases to be restored as well.
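The restored convenience aliases can be exercised through the module-level codecs.encode() and codecs.decode() functions. A brief sketch, assuming the short names "base64" and "hex" resolve to the binary transform codecs as this entry describes:

```python
import codecs

# Bytes-to-bytes transforms addressed by their short alias names.
encoded = codecs.encode(b"hello", "base64")
decoded = codecs.decode(encoded, "base64")
hex_form = codecs.encode(b"hello", "hex")

print(encoded)
print(decoded)
print(hex_form)
```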
* | Issue #19619: Blacklist non-text codecs in method API [Nick Coghlan, 2013-11-22, 7 files, -0/+7]
| |   str.encode, bytes.decode and bytearray.decode now use an internal API to throw LookupError for known non-text encodings, rather than attempting the encoding or decoding operation and then throwing a TypeError for an unexpected output type.
| |   The latter mechanism remains in place for third party non-text encodings.
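The difference between the text-only convenience methods and the general codecs functions can be seen with rot13, a str-to-str transform. In this sketch the helper name rot13 is hypothetical; the LookupError from str.encode is the behaviour this entry describes.

```python
import codecs


def rot13(text: str) -> str:
    """Apply the rot13 transform through the general-purpose codecs API."""
    return codecs.encode(text, "rot13")


print(rot13("hello"))

# str.encode() is reserved for text encodings, so a known non-text codec
# is rejected up front with LookupError.
try:
    "hello".encode("rot13")
except LookupError as exc:
    rejected_message = str(exc)
    print(rejected_message)
```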
* | #1097797: Add CP273 codec, and exercise it in the test suite [Andrew Kuchling, 2013-11-10, 1 file, -0/+307]
| |
* | Issue #18200: Back out usage of ModuleNotFoundError (8d28d44f3a9a) [Brett Cannon, 2013-07-04, 1 file, -3/+4]
| |
* | Issue #18200: Update the stdlib (except tests) to use ModuleNotFoundError. [Brett Cannon, 2013-06-14, 1 file, -4/+3]
* | Add fast-path in PyUnicode_DecodeCharmap() for pure 8 bit encodings: cp037, cp500 and iso8859_1 codecs [Victor Stinner, 2013-04-09, 3 files, -3/+0]
|/
* Normalize whitespace [Antoine Pitrou, 2012-06-16, 2 files, -2/+0]
|
* Issue #14874: Restore charmap decoding speed to pre-PEP 393 levels. [Antoine Pitrou, 2012-06-16, 7 files, -422/+1078]
|   Patch by Serhiy Storchaka.