author | Victor Stinner <victor.stinner@gmail.com> | 2018-01-15 09:45:49 (GMT) |
---|---|---|
committer | GitHub <noreply@github.com> | 2018-01-15 09:45:49 (GMT) |
commit | 7ed7aead9503102d2ed316175f198104e0cd674c (patch) | |
tree | 0b70b3b7d2eed5ea92552c1b93953d0333f5a869 /Doc/c-api | |
parent | ee3b83547c6b0cac1da2cb44aaaea533a1d1bbc8 (diff) | |
bpo-29240: Fix locale encodings in UTF-8 Mode (#5170)
Modify locale.localeconv(), time.tzname, os.strerror() and other
functions to ignore the UTF-8 Mode: always use the current locale
encoding.
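
As a rough illustration of the behaviour described above, the following Python sketch collects a few of the affected values next to the UTF-8 Mode flag and the locale encoding. The helper name `describe_locale_state` is hypothetical and not part of the commit; `locale.getencoding()` only exists on Python 3.11 and later, so the sketch falls back to other calls on older versions.

```python
import errno
import locale
import os
import sys
import time


def describe_locale_state():
    """Collect values that, per the commit message, use the current locale encoding."""
    # locale.getencoding() (Python 3.11+) ignores the UTF-8 Mode.
    get_encoding = getattr(locale, "getencoding", None)
    if get_encoding is not None:
        locale_encoding = get_encoding()
    elif hasattr(locale, "nl_langinfo"):
        # Unix fallback for older Python versions.
        locale_encoding = locale.nl_langinfo(locale.CODESET)
    else:
        # Last resort; this call may be influenced by the UTF-8 Mode.
        locale_encoding = locale.getpreferredencoding(False)
    return {
        "utf8_mode": bool(getattr(sys.flags, "utf8_mode", 0)),
        "locale_encoding": locale_encoding,
        "strerror": os.strerror(errno.ENOENT),
        "decimal_point": locale.localeconv()["decimal_point"],
        "tzname": time.tzname,
    }


if __name__ == "__main__":
    for key, value in describe_locale_state().items():
        print(f"{key}: {value!r}")
```

Running the script once normally and once with `python -X utf8` gives a way to compare the reported values between the two modes.
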
Changes:
* Add _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx(). On decoding or
encoding error, they return the position of the error and an error
message which are used to raise Unicode errors in
PyUnicode_DecodeLocale() and PyUnicode_EncodeLocale().
* Replace _Py_DecodeCurrentLocale() with _Py_DecodeLocaleEx().
* PyUnicode_DecodeLocale() now uses _Py_DecodeLocaleEx() for all
cases, especially for the strict error handler.
* Add _Py_DecodeUTF8Ex(): returns more information on decoding error
and supports the strict error handler.
* Rename _Py_EncodeUTF8_surrogateescape() to _Py_EncodeUTF8Ex().
* Replace _Py_EncodeCurrentLocale() with _Py_EncodeLocaleEx().
* Ignore the UTF-8 mode to encode/decode localeconv(), strerror()
and time zone name.
* PyUnicode_DecodeLocale(), PyUnicode_DecodeLocaleAndSize()
and PyUnicode_EncodeLocale() now ignore the UTF-8 mode: always use
the "current" locale.
* Remove _PyUnicode_DecodeCurrentLocale(),
_PyUnicode_DecodeCurrentLocaleAndSize() and
_PyUnicode_EncodeCurrentLocale().
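
The list above mentions that decoding errors carry a position and a message, and that both the strict and surrogateescape handlers are supported. The Python sketch below shows the same two ideas at the Python level. It uses the `utf-8` codec as a stand-in for the locale encoding, which is an assumption made for reproducibility, and the helper names `decode_report` and `roundtrip_surrogateescape` are hypothetical.

```python
def decode_report(data, encoding="utf-8"):
    """Strict decode: return (text, None), or (None, (position, reason)) on error."""
    try:
        return data.decode(encoding, "strict"), None
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first undecodable byte.
        return None, (exc.start, exc.reason)


def roundtrip_surrogateescape(data, encoding="utf-8"):
    """Decode with surrogateescape, then encode back to the original bytes."""
    # Undecodable bytes 0x80..0xFF become lone surrogates U+DC80..U+DCFF.
    text = data.decode(encoding, "surrogateescape")
    return text, text.encode(encoding, "surrogateescape")


if __name__ == "__main__":
    raw = b"caf\xe9"  # Latin-1 bytes, not valid UTF-8
    print(decode_report(raw))
    print(roundtrip_surrogateescape(raw))
```

With the strict handler the caller learns where decoding failed; with surrogateescape the undecodable byte survives a decode/encode round trip unchanged.
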
Diffstat (limited to 'Doc/c-api')
-rw-r--r-- | Doc/c-api/sys.rst | 22 |
-rw-r--r-- | Doc/c-api/unicode.rst | 16 |
2 files changed, 38 insertions, 0 deletions
diff --git a/Doc/c-api/sys.rst b/Doc/c-api/sys.rst
index 20bc7bd..e4da96c 100644
--- a/Doc/c-api/sys.rst
+++ b/Doc/c-api/sys.rst
@@ -106,6 +106,16 @@ Operating System Utilities
    surrogate character, escape the bytes using the surrogateescape error
    handler instead of decoding them.

+   Encoding, highest priority to lowest priority:
+
+   * ``UTF-8`` on macOS and Android;
+   * ``UTF-8`` if the Python UTF-8 mode is enabled;
+   * ``ASCII`` if the ``LC_CTYPE`` locale is ``"C"``,
+     ``nl_langinfo(CODESET)`` returns the ``ASCII`` encoding (or an alias),
+     and :c:func:`mbstowcs` and :c:func:`wcstombs` functions uses the
+     ``ISO-8859-1`` encoding.
+   * the current locale encoding.
+
    Return a pointer to a newly allocated wide character string, use
    :c:func:`PyMem_RawFree` to free the memory. If size is not ``NULL``, write
    the number of wide characters excluding the null character into ``*size``
@@ -137,6 +147,18 @@ Operating System Utilities
    :ref:`surrogateescape error handler <surrogateescape>`: surrogate characters
    in the range U+DC80..U+DCFF are converted to bytes 0x80..0xFF.

+   Encoding, highest priority to lowest priority:
+
+   * ``UTF-8`` on macOS and Android;
+   * ``UTF-8`` if the Python UTF-8 mode is enabled;
+   * ``ASCII`` if the ``LC_CTYPE`` locale is ``"C"``,
+     ``nl_langinfo(CODESET)`` returns the ``ASCII`` encoding (or an alias),
+     and :c:func:`mbstowcs` and :c:func:`wcstombs` functions uses the
+     ``ISO-8859-1`` encoding.
+   * the current locale encoding.
+
+   The function uses the UTF-8 encoding in the Python UTF-8 mode.
+
    Return a pointer to a newly allocated byte string, use :c:func:`PyMem_Free`
    to free the memory. Return ``NULL`` on encoding error or memory allocation
    error
diff --git a/Doc/c-api/unicode.rst b/Doc/c-api/unicode.rst
index 45aff1b..3f6c055 100644
--- a/Doc/c-api/unicode.rst
+++ b/Doc/c-api/unicode.rst
@@ -770,12 +770,20 @@ system.
    :c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at
    Python startup).

+   This function ignores the Python UTF-8 mode.
+
    .. seealso::

       The :c:func:`Py_DecodeLocale` function.

    .. versionadded:: 3.3

+   .. versionchanged:: 3.7
+      The function now also uses the current locale encoding for the
+      ``surrogateescape`` error handler. Previously, :c:func:`Py_DecodeLocale`
+      was used for the ``surrogateescape``, and the current locale encoding was
+      used for ``strict``.
+

 .. c:function:: PyObject* PyUnicode_DecodeLocale(const char *str, const char *errors)

@@ -797,12 +805,20 @@ system.
    :c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at
    Python startup).

+   This function ignores the Python UTF-8 mode.
+
    .. seealso::

       The :c:func:`Py_EncodeLocale` function.

    .. versionadded:: 3.3

+   .. versionchanged:: 3.7
+      The function now also uses the current locale encoding for the
+      ``surrogateescape`` error handler. Previously, :c:func:`Py_EncodeLocale`
+      was used for the ``surrogateescape``, and the current locale encoding was
+      used for ``strict``.
+

 File System Encoding
 """"""""""""""""""""
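
The priority list added to `Doc/c-api/sys.rst` can be read as a simple decision chain. The sketch below is only a Python model of that documented order, not the C implementation; the function `select_encoding` and all of its parameters are hypothetical names introduced for illustration.

```python
def select_encoding(platform, utf8_mode, lc_ctype, codeset_is_ascii,
                    c_funcs_use_latin1, locale_encoding):
    """Model the documented encoding priority, highest to lowest.

    All arguments are plain values supplied by the caller; nothing is
    probed from the running system.
    """
    # 1. macOS and Android always use UTF-8.
    if platform in ("macos", "android"):
        return "UTF-8"
    # 2. The Python UTF-8 mode forces UTF-8.
    if utf8_mode:
        return "UTF-8"
    # 3. "C" locale that reports ASCII while mbstowcs()/wcstombs()
    #    actually behave like ISO-8859-1: force ASCII.
    if lc_ctype == "C" and codeset_is_ascii and c_funcs_use_latin1:
        return "ASCII"
    # 4. Otherwise, the current locale encoding.
    return locale_encoding


if __name__ == "__main__":
    print(select_encoding("linux", False, "fr_FR.ISO-8859-1",
                          False, False, "ISO-8859-1"))
```

Passing the inputs explicitly keeps the model deterministic, so each rule in the list can be checked in isolation.
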