UnicodeDecodeError (#113674)
Co-authored-by: Inada Naoki <songofacandy@gmail.com>
|
Any capitalization of "xn--" should be acceptable for the ACE prefix
(see https://tools.ietf.org/html/rfc3490#section-5).
Co-authored-by: Pepijn de Vos <pepijndevos@gmail.com>
Co-authored-by: Erlend E. Aasland <erlend@python.org>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
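A minimal sketch of the rule follows. It uses a hypothetical per-label helper, `decode_ace_label()`, which is not the codec's real internals. The prefix test lowercases the first four characters before comparing, so "xn--", "XN--" and "Xn--" all count as the ACE prefix. The rest of the label goes to the standard punycode codec.

```python
ACE_PREFIX = "xn--"


def decode_ace_label(label: str) -> str:
    """Hypothetical helper: decode one IDNA label whose ACE prefix may use any case."""
    # RFC 3490, section 5: compare the prefix case-insensitively.
    if label[:4].lower() != ACE_PREFIX:
        return label  # Not an ACE label: return it unchanged.
    # Strip the prefix and decode the remainder with the stdlib punycode codec.
    return label[4:].encode("ascii").decode("punycode")


for candidate in ("xn--mnchen-3ya", "XN--mnchen-3ya", "Xn--mnchen-3ya", "example"):
    print(candidate, "->", decode_ace_label(candidate))
```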
|
The charset name "Windows-31J" is registered in the IANA Charset Registry[1]
and is implemented in Python as the cp932 codec.
[1] https://www.iana.org/assignments/charset-reg/windows-31J
Signed-off-by: Masayuki Moriyama <masayuki.moriyama@miraclelinux.com>
|
There was an unnecessary quadratic loop in IDNA decoding. This restores
the behavior to linear.
This also adds an early length check to IDNA decoding that rejects huge
inputs outright, since the final result is defined to be 63 characters
or fewer.
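The following is a rough sketch of the early-rejection idea, not the codec's actual code. `decode_domain()` is hypothetical, and `EARLY_LIMIT` is an assumed bound of 1024 characters, since the real threshold is not stated here. Labels that are absurdly long are refused before any punycode work starts. Each decoded label is then held to the 63-character limit.

```python
MAX_LABEL_LEN = 63   # the final label must be 63 characters or fewer
EARLY_LIMIT = 1024   # hypothetical generous bound checked before decoding


def decode_domain(domain: str) -> str:
    """Hypothetical sketch: decode an ACE domain name label by label."""
    labels = []
    for label in domain.split("."):  # one pass over the input
        # Refuse huge labels before doing any expensive decoding work.
        if len(label) > EARLY_LIMIT:
            raise UnicodeError("label way too long")
        if label[:4].lower() == "xn--":
            label = label[4:].encode("ascii").decode("punycode")
        # The decoded label must still respect the 63-character limit.
        if len(label) > MAX_LABEL_LEN:
            raise UnicodeError("label too long")
        labels.append(label)
    return ".".join(labels)


print(decode_domain("xn--mnchen-3ya.de"))
```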
|
The encodings module now registers the _alias_mbcs() codec search function
before the search_function() codec search function. Previously,
_alias_mbcs() was never used.
Fix the test_codecs.test_mbcs_alias() test: use the current ANSI code
page, not a fake ANSI code page number.
Remove the test_site.test_aliasing_mbcs() test: the alias is now
implemented in the encodings module, no longer in the site module.
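Codec search functions are consulted in registration order, and the first one to return a `CodecInfo` wins. That is why `_alias_mbcs()` went unused while it was registered second. The demonstration below uses two hypothetical search functions that both claim a made-up name, "demoalias". Only the function registered first ever answers for that name.

```python
import codecs


def first_search(name):
    # Hypothetical search function: maps "demoalias" to the UTF-8 codec.
    if name == "demoalias":
        return codecs.lookup("utf-8")
    return None


def second_search(name):
    # Hypothetical search function that would map the same name to Latin-1.
    if name == "demoalias":
        return codecs.lookup("latin-1")
    return None


codecs.register(first_search)
codecs.register(second_search)

# The registry asks first_search() before second_search(), so UTF-8 wins.
print(codecs.lookup("demoalias").name)  # utf-8
```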
|
"raw-unicode-escape" codec (GH-28944)
They now support splitting escape sequences between input chunks.
Add a third parameter, "final", to codecs.raw_unicode_escape_decode().
It defaults to True to match the former behavior.
|
codec (GH-28939)
They now support splitting escape sequences between input chunks.
Add a third parameter, "final", to codecs.unicode_escape_decode().
It defaults to True to match the former behavior.
|
(GH-22219)
|
Trying to decode an invalid string with the punycode codec
should raise UnicodeError.
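Because malformed input now raises UnicodeError, callers can validate data with a single except clause. The helper `is_valid_punycode()` below is hypothetical. In `b"abc-!"`, the "!" after the last hyphen is not a valid punycode digit.

```python
def is_valid_punycode(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes with the punycode codec."""
    try:
        data.decode("punycode")
    except UnicodeError:  # also catches subclasses such as UnicodeDecodeError
        return False
    return True


print(is_valid_punycode(b"mnchen-3ya"))  # True: decodes to "münchen"
print(is_valid_punycode(b"abc-!"))       # False: "!" is not a punycode digit
```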
|
format (#17418)
|
* bpo-34519: Add additional aliases for HP Roman 8
HP Roman 8 is known under more aliases than those listed in aliases.py.
Patch by Michael Osipov.
|
It is now an alias for mac_latin2.
|
This docstring has drifted since Python 2: https://github.com/python/cpython/blob/ca079a3ea30098aff3197c559a0e32d42dda6d84/Lib/encodings/__init__.py#L68
|
* Add the -X utf8 command line option, the PYTHONUTF8 environment
variable and a new sys.flags.utf8_mode flag.
* If the LC_CTYPE locale is "C" at startup, the UTF-8 mode is enabled
automatically.
* Add _winapi.GetACP(). encodings._alias_mbcs() now calls
_winapi.GetACP() to get the ANSI code page.
* locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8
mode. As a side effect, open() now uses the UTF-8 encoding by
default in this mode.
* Py_DecodeLocale() and Py_EncodeLocale() now use the UTF-8 encoding
in the UTF-8 mode.
* Update subprocess._args_from_interpreter_flags() to handle -X utf8.
* Skip some tests relying on the current locale if the UTF-8 mode is
enabled.
* Add test_utf8mode.py.
* _Py_DecodeUTF8_surrogateescape() gets a new optional parameter to
also return the length (number of wide characters).
* pymain_get_global_config() and pymain_set_global_config() now
always copy flag values, rather than only copying if the new value
is greater than the old value.
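The new mode is visible from Python code. The quick check below relies only on the option and attributes named in the list above. It launches a child interpreter with -X utf8 and prints `sys.flags.utf8_mode` and `locale.getpreferredencoding()`. Newer releases may spell the encoding name in lower case, so the comparison is case-insensitive.

```python
import subprocess
import sys

# Start a child interpreter in UTF-8 mode and report what it sees.
script = (
    "import locale, sys; "
    "print(sys.flags.utf8_mode); "
    "print(locale.getpreferredencoding())"
)
result = subprocess.run(
    [sys.executable, "-X", "utf8", "-c", script],
    capture_output=True, text=True, check=True,
)
utf8_flag, preferred = result.stdout.split()
print(utf8_flag, preferred)
```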
|
encoding
|
lookup
|
Also add a newline to normalize_encoding() for readability.
|
Most fixes to the Doc/ and Lib/ directories are by Ville Skyttä.
|
referenced.
Permission was validated prior to adding these markings.
|
This changes the equivalent functions listed for the Base-64, hex and Quoted-
Printable codecs to reflect the functions actually used. Also mention and
test the "quotetabs" setting for Quoted-Printable encoding.
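The equivalences can be checked directly. This sketch assumes the codecs delegate to `base64.encodebytes()`, `binascii.b2a_hex()` and a Quoted-Printable encoder run with quotetabs enabled. The assertions compare each codec's output with that function, so the sample tab shows up as "=09" in the Quoted-Printable result.

```python
import base64
import binascii
import codecs

data = b"tab\there"

# Base-64 codec: same output as base64.encodebytes().
assert codecs.encode(data, "base64_codec") == base64.encodebytes(data)

# Hex codec: same output as binascii.b2a_hex().
assert codecs.encode(data, "hex_codec") == binascii.b2a_hex(data)

# Quoted-Printable codec: tabs are quoted, as with quotetabs=True.
qp = codecs.encode(data, "quopri_codec")
assert qp == binascii.b2a_qp(data, quotetabs=True)
print(qp)  # b'tab=09here'
```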
|
Based on patch by Martin Panter.
|
Based on patch by Martin Panter.
|
Patch by Berker Peksag.
|
Patch by Berker Peksag.
|
(Windows code page 65001, Microsoft UTF-8).
|
str.encode, bytes.decode and bytearray.decode now use an
internal API to raise LookupError for known non-text encodings,
rather than attempting the encoding or decoding operation and
then raising a TypeError for an unexpected output type.
The latter mechanism remains in place for third-party non-text
encodings.
Backported changeset d68df99d7a57.
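The difference is easy to observe. `raises_lookup_error()` below is a hypothetical helper. The text-only methods refuse a str-to-str codec (rot_13) and a bytes-to-bytes codec (hex_codec) with LookupError. `codecs.encode()` and `codecs.decode()` still accept both.

```python
import codecs


def raises_lookup_error(func, *args):
    """Hypothetical helper: True if calling func(*args) raises LookupError."""
    try:
        func(*args)
    except LookupError:
        return True
    return False


# rot_13 maps str to str, so str.encode() rejects it up front.
print(raises_lookup_error(str.encode, "hello", "rot_13"))              # True
# hex_codec maps bytes to bytes, so bytes.decode() rejects it as well.
print(raises_lookup_error(bytes.decode, b"68656c6c6f", "hex_codec"))  # True
# The generic codecs functions still handle non-text codecs.
print(codecs.encode("hello", "rot_13"))          # uryyb
print(codecs.decode(b"68656c6c6f", "hex_codec"))  # b'hello'
```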
|
Also updated the docs and added the aliases mentioned by the
references.
|
The codecs themselves were restored in Python 3.2; this
completes the restoration by adding back the convenience
aliases.
These aliases were originally left out due to confusing
errors when attempting to use them with the convenience
methods specific to text encodings. Python 3.4 includes several
improvements to those errors, thus permitting the aliases
to be restored as well.
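One way to check the restored aliases is to look up each short name next to its full module name. The alias-to-module pairs below are a selection assumed from Lib/encodings/aliases.py, not an exhaustive copy.

```python
import codecs

# Selected short aliases and the codec modules they are assumed to map to.
pairs = {
    "base64": "base64_codec",
    "hex": "hex_codec",
    "quopri": "quopri_codec",
    "zip": "zlib_codec",
    "rot13": "rot_13",
}
for alias, module_name in pairs.items():
    # Each alias should resolve to the same codec as its full module name.
    assert codecs.lookup(alias).name == codecs.lookup(module_name).name

print(codecs.encode(b"hi", "hex"))                          # b'6869'
print(codecs.decode(codecs.encode(b"data", "zip"), "zip"))  # b'data'
```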
|
str.encode, bytes.decode and bytearray.decode now use an
internal API to raise LookupError for known non-text encodings,
rather than attempting the encoding or decoding operation and
then raising a TypeError for an unexpected output type.
The latter mechanism remains in place for third-party non-text
encodings.
|
ModuleNotFoundError.
|
cp037, cp500 and iso8859_1 codecs
|
| |
|
|
|
|
| |
Patch by Serhiy Storchaka.