Diffstat (limited to 'Doc')
-rw-r--r-- Doc/using/cmdline.rst 97
-rw-r--r-- Doc/whatsnew/3.7.rst 31
2 files changed, 104 insertions, 24 deletions
diff --git a/Doc/using/cmdline.rst b/Doc/using/cmdline.rst
index e72dea9..c6bb0be 100644
--- a/Doc/using/cmdline.rst
+++ b/Doc/using/cmdline.rst
@@ -438,8 +438,10 @@ Miscellaneous options
* Set the :attr:`~sys.flags.dev_mode` attribute of :attr:`sys.flags` to
``True``
- * ``-X utf8`` enables the UTF-8 mode, whereas ``-X utf8=0`` disables the
- UTF-8 mode.
+ * ``-X utf8`` enables UTF-8 mode for operating system interfaces, overriding
+ the default locale-aware mode. ``-X utf8=0`` explicitly disables UTF-8
+ mode (even when it would otherwise activate automatically).
+ See :envvar:`PYTHONUTF8` for more details.
It also allows passing arbitrary values and retrieving them through the
:data:`sys._xoptions` dictionary.
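A minimal sketch of how the effect of the ``-X utf8`` switch documented in this hunk could be observed. It launches a child interpreter with the given options and reads back ``sys.flags.utf8_mode``. That attribute and the helper name ``utf8_mode_flag`` are not part of the patch text; the helper is purely illustrative.

```python
import subprocess
import sys


def utf8_mode_flag(*interpreter_args):
    """Run a child interpreter and return its sys.flags.utf8_mode value.

    Hypothetical helper for illustration; it is not part of the patch.
    """
    cmd = [
        sys.executable,
        *interpreter_args,
        "-c",
        "import sys; print(sys.flags.utf8_mode)",
    ]
    # The child prints a single integer, which is plain ASCII in any locale.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return int(result.stdout.strip())
```

Calling ``utf8_mode_flag("-X", "utf8")`` should report the mode as enabled, and ``utf8_mode_flag("-X", "utf8=0")`` as disabled, whatever the ambient locale is.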
@@ -789,14 +791,16 @@ conflict.
.. envvar:: PYTHONCOERCECLOCALE
If set to the value ``0``, causes the main Python command line application
- to skip coercing the legacy ASCII-based C locale to a more capable UTF-8
- based alternative.
+ to skip coercing the legacy ASCII-based C and POSIX locales to a more
+ capable UTF-8 based alternative.
- If this variable is *not* set, or is set to a value other than ``0``, and
- the current locale reported for the ``LC_CTYPE`` category is the default
- ``C`` locale, then the Python CLI will attempt to configure the following
- locales for the ``LC_CTYPE`` category in the order listed before loading the
- interpreter runtime:
+ If this variable is *not* set (or is set to a value other than ``0``), the
+ ``LC_ALL`` locale override environment variable is also not set, and the
+ current locale reported for the ``LC_CTYPE`` category is either the default
+ ``C`` locale, or else the explicitly ASCII-based ``POSIX`` locale, then the
+ Python CLI will attempt to configure the following locales for the
+ ``LC_CTYPE`` category in the order listed before loading the interpreter
+ runtime:
* ``C.UTF-8``
* ``C.utf8``
@@ -804,21 +808,32 @@ conflict.
If setting one of these locale categories succeeds, then the ``LC_CTYPE``
environment variable will also be set accordingly in the current process
- environment before the Python runtime is initialized. This ensures the
- updated setting is seen in subprocesses, as well as in operations that
- query the environment rather than the current C locale (such as Python's
- own :func:`locale.getdefaultlocale`).
+ environment before the Python runtime is initialized. This ensures that in
+ addition to being seen by both the interpreter itself and other locale-aware
+ components running in the same process (such as the GNU ``readline``
+ library), the updated setting is also seen in subprocesses (regardless of
+ whether or not those processes are running a Python interpreter), as well as
+ in operations that query the environment rather than the current C locale
+ (such as Python's own :func:`locale.getdefaultlocale`).
Configuring one of these locales (either explicitly or via the above
- implicit locale coercion) will automatically set the error handler for
- :data:`sys.stdin` and :data:`sys.stdout` to ``surrogateescape``. This
- behavior can be overridden using :envvar:`PYTHONIOENCODING` as usual.
+ implicit locale coercion) automatically enables the ``surrogateescape``
+ :ref:`error handler <error-handlers>` for :data:`sys.stdin` and
+ :data:`sys.stdout` (:data:`sys.stderr` continues to use ``backslashreplace``
+ as it does in any other locale). This stream handling behavior can be
+ overridden using :envvar:`PYTHONIOENCODING` as usual.
For debugging purposes, setting ``PYTHONCOERCECLOCALE=warn`` will cause
Python to emit warning messages on ``stderr`` if either the locale coercion
activates, or else if a locale that *would* have triggered coercion is
still active when the Python runtime is initialized.
+ Also note that even when locale coercion is disabled, or when it fails to
+ find a suitable target locale, :envvar:`PYTHONUTF8` will still activate by
+ default in legacy ASCII-based locales. Both features must be disabled in
+ order to force the interpreter to use ``ASCII`` instead of ``UTF-8`` for
+ system interfaces.
+
Availability: \*nix
.. versionadded:: 3.7
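The interaction between locale coercion and UTF-8 mode described in this hunk can be explored with a small experiment. The sketch below is an assumption-laden illustration, not part of the patch: ``child_fs_encoding`` is a hypothetical helper that clears the locale-related variables, applies the overrides it is given, and reports the filesystem encoding seen by a child interpreter.

```python
import os
import subprocess
import sys

# Variables cleared before each experiment so the overrides start from a
# known state (the selection is an assumption made for this sketch).
_CLEARED = ("LC_ALL", "LC_CTYPE", "LANG", "PYTHONUTF8", "PYTHONCOERCECLOCALE")


def child_fs_encoding(**env_overrides):
    """Return sys.getfilesystemencoding() as seen by a child interpreter."""
    env = dict(os.environ)
    for name in _CLEARED:
        env.pop(name, None)
    env.update(env_overrides)
    result = subprocess.run(
        [sys.executable, "-c",
         "import sys; print(sys.getfilesystemencoding())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

For example, ``child_fs_encoding(LC_CTYPE="C", PYTHONCOERCECLOCALE="0", PYTHONUTF8="0")`` disables both features, which is the combination the text says is needed to force ``ASCII``. The value actually reported depends on the platform, so it is not asserted here.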
@@ -834,10 +849,56 @@ conflict.
.. envvar:: PYTHONUTF8
- If set to ``1``, enable the UTF-8 mode. If set to ``0``, disable the UTF-8
- mode. Any other non-empty string cause an error.
+ If set to ``1``, enables the interpreter's UTF-8 mode, where ``UTF-8`` is
+ used as the text encoding for system interfaces, regardless of the
+ current locale setting.
+
+ This means that:
+
+ * :func:`sys.getfilesystemencoding()` returns ``'UTF-8'`` (the locale
+ encoding is ignored).
+ * :func:`locale.getpreferredencoding()` returns ``'UTF-8'`` (the locale
+ encoding is ignored, and the function's ``do_setlocale`` parameter has no
+ effect).
+ * :data:`sys.stdin`, :data:`sys.stdout`, and :data:`sys.stderr` all use
+ UTF-8 as their text encoding, with the ``surrogateescape``
+ :ref:`error handler <error-handlers>` being enabled for :data:`sys.stdin`
+ and :data:`sys.stdout` (:data:`sys.stderr` continues to use
+ ``backslashreplace`` as it does in the default locale-aware mode)
+
+ As a consequence of the changes in those lower level APIs, other higher
+ level APIs also exhibit different default behaviours:
+
+ * Command line arguments, environment variables and filenames are decoded
+ to text using the UTF-8 encoding.
+ * :func:`os.fsdecode()` and :func:`os.fsencode()` use the UTF-8 encoding.
+ * :func:`open()`, :func:`io.open()`, and :func:`codecs.open()` use the UTF-8
+ encoding by default. However, they still use the strict error handler by
+ default so that attempting to open a binary file in text mode is likely
+ to raise an exception rather than producing nonsense data.
+
+ Note that the standard stream settings in UTF-8 mode can be overridden by
+ :envvar:`PYTHONIOENCODING` (just as they can be in the default locale-aware
+ mode).
+
+ If set to ``0``, the interpreter runs in its default locale-aware mode.
+
+ Setting any other non-empty string causes an error during interpreter
+ initialisation.
+
+ If this environment variable is not set at all, then the interpreter defaults
+ to using the current locale settings, *unless* the current locale is
+ identified as a legacy ASCII-based locale
(as described for :envvar:`PYTHONCOERCECLOCALE`), and locale coercion is
+ either disabled or fails. In such legacy locales, the interpreter will
+ default to enabling UTF-8 mode unless explicitly instructed not to do so.
+
+ Also available as the :option:`-X` ``utf8`` option.
+
+ Availability: \*nix
.. versionadded:: 3.7
+ See :pep:`540` for more details.
Debug-mode variables
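The ``PYTHONUTF8`` behaviour listed in the hunk above (preferred encoding and standard stream error handlers) can be checked with a sketch along these lines. ``probe_utf8_mode`` is a hypothetical helper name. The child's ``stdin`` is redirected from the null device and ``PYTHONIOENCODING`` is removed so that neither a missing stream nor an inherited override distorts the result.

```python
import os
import subprocess
import sys

# One-liner executed in the child: print the preferred encoding followed by
# the error handler of each standard stream.
_PROBE = (
    "import locale, sys; "
    "print(locale.getpreferredencoding(False), "
    "sys.stdin.errors, sys.stdout.errors, sys.stderr.errors)"
)


def probe_utf8_mode():
    """Report the text settings of a child interpreter run with PYTHONUTF8=1."""
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)  # would override the stream settings
    env["PYTHONUTF8"] = "1"
    result = subprocess.run(
        [sys.executable, "-c", _PROBE],
        env=env,
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        check=True,
    )
    encoding, stdin_err, stdout_err, stderr_err = result.stdout.split()
    return {
        "preferred_encoding": encoding,
        "stdin_errors": stdin_err,
        "stdout_errors": stdout_err,
        "stderr_errors": stderr_err,
    }
```

The returned dictionary should match the documented behaviour: a UTF-8 preferred encoding, ``surrogateescape`` for ``stdin`` and ``stdout``, and ``backslashreplace`` for ``stderr``.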
diff --git a/Doc/whatsnew/3.7.rst b/Doc/whatsnew/3.7.rst
index 8a3afdf..762d84a 100644
--- a/Doc/whatsnew/3.7.rst
+++ b/Doc/whatsnew/3.7.rst
@@ -97,9 +97,10 @@ Significant improvements in the standard library:
CPython implementation improvements:
+* Avoiding the use of ASCII as a default text encoding:
+ * :ref:`PEP 538 <whatsnew37-pep538>`, legacy C locale coercion
+ * :ref:`PEP 540 <whatsnew37-pep540>`, forced UTF-8 runtime mode
* :ref:`PEP 552 <whatsnew37-pep552>`, deterministic .pycs
-* :ref:`PEP 538 <whatsnew37-pep538>`, legacy C locale coercion
-* :ref:`PEP 540 <whatsnew37-pep540>`, forced UTF-8 runtime mode
* :ref:`the new development runtime mode <whatsnew37-devmode>`
* :ref:`PEP 565 <whatsnew37-pep565>`, improved :exc:`DeprecationWarning`
handling
@@ -184,7 +185,8 @@ PEP 538: Legacy C Locale Coercion
An ongoing challenge within the Python 3 series has been determining a sensible
default strategy for handling the "7-bit ASCII" text encoding assumption
-currently implied by the use of the default C locale on non-Windows platforms.
+currently implied by the use of the default C or POSIX locale on non-Windows
+platforms.
:pep:`538` updates the default interpreter command line interface to
automatically coerce that locale to an available UTF-8 based locale as
@@ -205,10 +207,18 @@ continues to be ``backslashreplace``, regardless of locale.
Locale coercion is silent by default, but to assist in debugging potentially
locale related integration problems, explicit warnings (emitted directly on
-:data:`~sys.stderr` can be requested by setting ``PYTHONCOERCECLOCALE=warn``.
+:data:`~sys.stderr`) can be requested by setting ``PYTHONCOERCECLOCALE=warn``.
This setting will also cause the Python runtime to emit a warning if the
legacy C locale remains active when the core interpreter is initialized.
+While :pep:`538`'s locale coercion has the benefit of also affecting extension
+modules (such as GNU ``readline``), as well as child processes (including those
+running non-Python applications and older versions of Python), it has the
+downside of requiring that a suitable target locale be present on the running
+system. To better handle the case where no suitable target locale is available
+(as occurs on RHEL/CentOS 7, for example), Python 3.7 also implements
+:ref:`whatsnew37-pep540`.
+
.. seealso::
:pep:`538` -- Coercing the legacy C locale to a UTF-8 based locale
@@ -231,8 +241,17 @@ The forced UTF-8 mode can be used to change the text handling behavior in
an embedded Python interpreter without changing the locale settings of
an embedding application.
-The UTF-8 mode is enabled by default when the locale is "C". See
-:ref:`whatsnew37-pep538` for details.
+While :pep:`540`'s UTF-8 mode has the benefit of working regardless of which
+locales are available on the running system, it has the downside of having no
+effect on extension modules (such as GNU ``readline``), child processes running
+non-Python applications, and child processes running older versions of Python.
+To reduce the risk of corrupting text data when communicating with such
+components, Python 3.7 also implements :ref:`whatsnew37-pep538`.
+
+The UTF-8 mode is enabled by default when the locale is ``C`` or ``POSIX``, and
+the :pep:`538` locale coercion feature fails to change it to a UTF-8 based
+alternative (whether that failure is due to ``PYTHONCOERCECLOCALE=0`` being set,
+``LC_ALL`` being set, or the lack of a suitable target locale).
.. seealso::