author | Victor Stinner <victor.stinner@gmail.com> | 2017-12-13 11:29:09 (GMT) |
---|---|---|
committer | GitHub <noreply@github.com> | 2017-12-13 11:29:09 (GMT) |
commit | 91106cd9ff2f321c0f60fbaa09fd46c80aa5c266 (patch) | |
tree | ff002e0532736a97f3ddd367c1491e7b04611816 /Doc | |
parent | c3e070f84931c847d1b35e7fb36aa71edd6215f6 (diff) | |
bpo-29240: PEP 540: Add a new UTF-8 Mode (#855)
* Add -X utf8 command line option, PYTHONUTF8 environment variable
and a new sys.flags.utf8_mode flag.
* If the LC_CTYPE locale is "C" at startup: automatically enable the
UTF-8 mode.
* Add _winapi.GetACP(). encodings._alias_mbcs() now calls
_winapi.GetACP() to get the ANSI code page.
* locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8
mode. As a side effect, open() now uses the UTF-8 encoding by
default in this mode.
* Py_DecodeLocale() and Py_EncodeLocale() now use the UTF-8 encoding
in the UTF-8 Mode.
* Update subprocess._args_from_interpreter_flags() to handle -X utf8.
* Skip some tests relying on the current locale if the UTF-8 mode is
enabled.
* Add test_utf8mode.py.
* _Py_DecodeUTF8_surrogateescape() gets a new optional parameter to
also return the length (number of wide characters).
* pymain_get_global_config() and pymain_set_global_config() now
always copy flag values, rather than only copying if the new value
is greater than the old value.
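The user-visible effects listed above can be observed from Python itself. The following is a minimal sketch, assuming Python 3.7 or later; the helper name `probe_utf8_mode` is hypothetical and not part of this commit. It starts a child interpreter and reports `sys.flags.utf8_mode`, `locale.getpreferredencoding(False)` and `sys.getfilesystemencoding()`.

```python
import subprocess
import sys


def probe_utf8_mode(*interpreter_args):
    """Run a child interpreter and report its UTF-8 mode state.

    Hypothetical helper for illustration only. Returns a tuple of
    (utf8_mode flag, preferred encoding, filesystem encoding).
    """
    code = (
        "import locale, sys; "
        "print(sys.flags.utf8_mode, "
        "locale.getpreferredencoding(False), "
        "sys.getfilesystemencoding())"
    )
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c", code],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    flag, preferred, filesystem = result.stdout.split()
    return int(flag), preferred, filesystem


# -X utf8 enables the mode; -X utf8=0 disables it explicitly.
print("enabled: ", probe_utf8_mode("-X", "utf8"))
print("disabled:", probe_utf8_mode("-X", "utf8=0"))
```

With the mode disabled, the two encodings depend on the platform and the current locale, so only the flag value is predictable there. Encoding names are best compared case-insensitively, since the exact spelling is not something this sketch relies on.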
Diffstat (limited to 'Doc')
-rw-r--r-- | Doc/c-api/sys.rst | 13 |
-rw-r--r-- | Doc/library/locale.rst | 7 |
-rw-r--r-- | Doc/library/sys.rst | 13 |
-rw-r--r-- | Doc/using/cmdline.rst | 13 |
-rw-r--r-- | Doc/whatsnew/3.7.rst | 21 |
5 files changed, 63 insertions, 4 deletions
diff --git a/Doc/c-api/sys.rst b/Doc/c-api/sys.rst
index 95d9d65..20bc7bd 100644
--- a/Doc/c-api/sys.rst
+++ b/Doc/c-api/sys.rst
@@ -127,6 +127,9 @@ Operating System Utilities

    .. versionadded:: 3.5

+   .. versionchanged:: 3.7
+      The function now uses the UTF-8 encoding in the UTF-8 mode.
+

 .. c:function:: char* Py_EncodeLocale(const wchar_t *text, size_t *error_pos)

@@ -138,12 +141,15 @@ Operating System Utilities
    to free the memory. Return ``NULL`` on encoding error or memory allocation
    error

-   If error_pos is not ``NULL``, ``*error_pos`` is set to the index of the
-   invalid character on encoding error, or set to ``(size_t)-1`` otherwise.
+   If error_pos is not ``NULL``, ``*error_pos`` is set to ``(size_t)-1`` on
+   success, or set to the index of the invalid character on encoding error.

    Use the :c:func:`Py_DecodeLocale` function to decode the bytes string back
    to a wide character string.

+   .. versionchanged:: 3.7
+      The function now uses the UTF-8 encoding in the UTF-8 mode.
+
    .. seealso::

       The :c:func:`PyUnicode_EncodeFSDefault` and
@@ -151,6 +157,9 @@ Operating System Utilities

    .. versionadded:: 3.5

+   .. versionchanged:: 3.7
+      The function now supports the UTF-8 mode.
+

 .. _systemfunctions:

diff --git a/Doc/library/locale.rst b/Doc/library/locale.rst
index e8567a7..7da94a2 100644
--- a/Doc/library/locale.rst
+++ b/Doc/library/locale.rst
@@ -316,6 +316,13 @@ The :mod:`locale` module defines the following exception and functions:
    preferences, so this function is not thread-safe. If invoking setlocale is not
    necessary or desired, *do_setlocale* should be set to ``False``.

+   On Android or in the UTF-8 mode (:option:`-X` ``utf8`` option), always
+   return ``'UTF-8'``, the locale and the *do_setlocale* argument are ignored.
+
+   .. versionchanged:: 3.7
+      The function now always returns ``UTF-8`` on Android or if the UTF-8 mode
+      is enabled.
+

 .. function:: normalize(localename)

diff --git a/Doc/library/sys.rst b/Doc/library/sys.rst
index 9e47681..957d02b 100644
--- a/Doc/library/sys.rst
+++ b/Doc/library/sys.rst
@@ -313,6 +313,9 @@ always available.
       has caught :exc:`SystemExit` (such as an error flushing buffered data
       in the standard streams), the exit status is changed to 120.

+   .. versionchanged:: 3.7
+      Added ``utf8_mode`` attribute for the new :option:`-X` ``utf8`` flag.
+

 .. data:: flags

@@ -335,6 +338,7 @@ always available.
    :const:`quiet`                :option:`-q`
    :const:`hash_randomization`   :option:`-R`
    :const:`dev_mode`             :option:`-X` ``dev``
+   :const:`utf8_mode`            :option:`-X` ``utf8``
    ============================= =============================

    .. versionchanged:: 3.2
@@ -347,7 +351,8 @@ always available.
       Removed obsolete ``division_warning`` attribute.

    .. versionchanged:: 3.7
-      Added ``dev_mode`` attribute for the new :option:`-X` ``dev`` flag.
+      Added ``dev_mode`` attribute for the new :option:`-X` ``dev`` flag
+      and ``utf8_mode`` attribute for the new :option:`-X` ``utf8`` flag.


 .. data:: float_info
@@ -492,6 +497,8 @@ always available.
    :func:`os.fsencode` and :func:`os.fsdecode` should be used to ensure that
    the correct encoding and errors mode are used.

+   * In the UTF-8 mode, the encoding is ``utf-8`` on any platform.
+
    * On Mac OS X, the encoding is ``'utf-8'``.

    * On Unix, the encoding is the locale encoding.
@@ -506,6 +513,10 @@ always available.
       Windows is no longer guaranteed to return ``'mbcs'``. See :pep:`529`
       and :func:`_enablelegacywindowsfsencoding` for more information.

+   .. versionchanged:: 3.7
+      Return 'utf-8' in the UTF-8 mode.
+
+
 .. function:: getfilesystemencodeerrors()

    Return the name of the error mode used to convert between Unicode filenames
diff --git a/Doc/using/cmdline.rst b/Doc/using/cmdline.rst
index e32f77e..5cb9071 100644
--- a/Doc/using/cmdline.rst
+++ b/Doc/using/cmdline.rst
@@ -439,6 +439,9 @@ Miscellaneous options
      * Set the :attr:`~sys.flags.dev_mode` attribute of :attr:`sys.flags` to
        ``True``

+   * ``-X utf8`` enables the UTF-8 mode, whereas ``-X utf8=0`` disables the
+     UTF-8 mode.
+
    It also allows passing arbitrary values and retrieving them through the
    :data:`sys._xoptions` dictionary.

@@ -455,7 +458,7 @@ Miscellaneous options
       The ``-X showalloccount`` option.

    .. versionadded:: 3.7
-      The ``-X importtime`` and ``-X dev`` options.
+      The ``-X importtime``, ``-X dev`` and ``-X utf8`` options.


 Options you shouldn't use
@@ -816,6 +819,14 @@ conflict.

    .. versionadded:: 3.7

+.. envvar:: PYTHONUTF8
+
+   If set to ``1``, enable the UTF-8 mode. If set to ``0``, disable the UTF-8
+   mode. Any other non-empty string cause an error.
+
+   .. versionadded:: 3.7
+
+
 Debug-mode variables
 ~~~~~~~~~~~~~~~~~~~~

diff --git a/Doc/whatsnew/3.7.rst b/Doc/whatsnew/3.7.rst
index 58bfaef..81a88a0 100644
--- a/Doc/whatsnew/3.7.rst
+++ b/Doc/whatsnew/3.7.rst
@@ -185,6 +185,23 @@ resolution on Linux and Windows.
       PEP written and implemented by Victor Stinner


+PEP 540: Add a new UTF-8 mode
+-----------------------------
+
+Add a new UTF-8 mode to ignore the locale, use the UTF-8 encoding, and change
+:data:`sys.stdin` and :data:`sys.stdout` error handlers to ``surrogateescape``.
+This mode is enabled by default in the POSIX locale, but otherwise disabled by
+default.
+
+The new :option:`-X` ``utf8`` command line option and :envvar:`PYTHONUTF8`
+environment variable are added to control the UTF-8 mode.
+
+.. seealso::
+
+    :pep:`540` -- Add a new UTF-8 mode
+       PEP written and implemented by Victor Stinner
+
+
 New Development Mode: -X dev
 ----------------------------

@@ -353,6 +370,10 @@ Added another argument *monetary* in :meth:`format_string` of :mod:`locale`.
 If *monetary* is true, the conversion uses monetary thousands separator and
 grouping strings. (Contributed by Garvit in :issue:`10379`.)

+The :func:`locale.getpreferredencoding` function now always returns ``'UTF-8'``
+on Android or in the UTF-8 mode (:option:`-X` ``utf8`` option), the locale and
+the *do_setlocale* argument are ignored.
+
 math
 ----
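
The `PYTHONUTF8` entry documented in the `Doc/using/cmdline.rst` hunk can be tried out with a short script. This is only a sketch, assuming Python 3.7 or later; the helper name `run_with_pythonutf8` and the sample value `"yes"` are hypothetical and chosen for illustration.

```python
import os
import subprocess
import sys


def run_with_pythonutf8(value):
    """Start a child interpreter with PYTHONUTF8 set to *value*.

    Hypothetical helper for illustration only. Returns a tuple of
    (exit status, text printed for sys.flags.utf8_mode).
    """
    env = dict(os.environ)
    env["PYTHONUTF8"] = value
    proc = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.flags.utf8_mode)"],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout.strip()


status_on, mode_on = run_with_pythonutf8("1")    # enable the UTF-8 mode
status_off, mode_off = run_with_pythonutf8("0")  # disable the UTF-8 mode
# Any other non-empty value is documented above as an error.
status_bad, _ = run_with_pythonutf8("yes")

print("PYTHONUTF8=1   ->", status_on, mode_on)
print("PYTHONUTF8=0   ->", status_off, mode_off)
print("PYTHONUTF8=yes -> exit status", status_bad)
```

The child's standard error is captured rather than shown, so the startup error for the invalid value only appears here as the exit status.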