Diffstat (limited to 'Doc/using/cmdline.rst')
-rw-r--r-- Doc/using/cmdline.rst 97
1 files changed, 79 insertions, 18 deletions
diff --git a/Doc/using/cmdline.rst b/Doc/using/cmdline.rst
index e72dea9..c6bb0be 100644
--- a/Doc/using/cmdline.rst
+++ b/Doc/using/cmdline.rst
@@ -438,8 +438,10 @@ Miscellaneous options
* Set the :attr:`~sys.flags.dev_mode` attribute of :attr:`sys.flags` to
``True``
- * ``-X utf8`` enables the UTF-8 mode, whereas ``-X utf8=0`` disables the
- UTF-8 mode.
+ * ``-X utf8`` enables UTF-8 mode for operating system interfaces, overriding
+ the default locale-aware mode. ``-X utf8=0`` explicitly disables UTF-8
+ mode (even when it would otherwise activate automatically).
+ See :envvar:`PYTHONUTF8` for more details.
It also allows passing arbitrary values and retrieving them through the
:data:`sys._xoptions` dictionary.
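
Reviewer note: the ``-X utf8`` wording in the hunk above can be checked with a small script. This is a minimal sketch, assuming a CPython 3.7+ interpreter is reachable through ``sys.executable``; ``probe_utf8_option`` is a hypothetical helper name used only for illustration and is not part of CPython.

```python
import subprocess
import sys


def probe_utf8_option(x_option):
    """Run a child interpreter with ``-X <x_option>`` and report what it sees.

    Hypothetical helper for illustration only. The child prints
    ``sys.flags.utf8_mode`` followed by the raw ``utf8`` entry stored in
    ``sys._xoptions``.
    """
    child_code = (
        "import sys; "
        "print(sys.flags.utf8_mode, repr(sys._xoptions.get('utf8')))"
    )
    result = subprocess.run(
        [sys.executable, "-X", x_option, "-c", child_code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Compare the explicit "enable" and "disable" spellings of the option.
    for option in ("utf8", "utf8=0"):
        print(option, "->", probe_utf8_option(option))
```

The first field of each result is the effective UTF-8 mode flag, and the second shows that the option is also retrievable through ``sys._xoptions``, as the surrounding text describes.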
@@ -789,14 +791,16 @@ conflict.
.. envvar:: PYTHONCOERCECLOCALE
If set to the value ``0``, causes the main Python command line application
- to skip coercing the legacy ASCII-based C locale to a more capable UTF-8
- based alternative.
+ to skip coercing the legacy ASCII-based C and POSIX locales to a more
+ capable UTF-8 based alternative.
- If this variable is *not* set, or is set to a value other than ``0``, and
- the current locale reported for the ``LC_CTYPE`` category is the default
- ``C`` locale, then the Python CLI will attempt to configure the following
- locales for the ``LC_CTYPE`` category in the order listed before loading the
- interpreter runtime:
+ If this variable is *not* set (or is set to a value other than ``0``), the
+ ``LC_ALL`` locale override environment variable is also not set, and the
+ current locale reported for the ``LC_CTYPE`` category is either the default
+ ``C`` locale, or else the explicitly ASCII-based ``POSIX`` locale, then the
+ Python CLI will attempt to configure the following locales for the
+ ``LC_CTYPE`` category in the order listed before loading the interpreter
+ runtime:
* ``C.UTF-8``
* ``C.utf8``
@@ -804,21 +808,32 @@ conflict.
If setting one of these locale categories succeeds, then the ``LC_CTYPE``
environment variable will also be set accordingly in the current process
- environment before the Python runtime is initialized. This ensures the
- updated setting is seen in subprocesses, as well as in operations that
- query the environment rather than the current C locale (such as Python's
- own :func:`locale.getdefaultlocale`).
+ environment before the Python runtime is initialized. This ensures that in
+ addition to being seen by both the interpreter itself and other locale-aware
+ components running in the same process (such as the GNU ``readline``
+ library), the updated setting is also seen in subprocesses (regardless of
+ whether or not those processes are running a Python interpreter), as well as
+ in operations that query the environment rather than the current C locale
+ (such as Python's own :func:`locale.getdefaultlocale`).
Configuring one of these locales (either explicitly or via the above
- implicit locale coercion) will automatically set the error handler for
- :data:`sys.stdin` and :data:`sys.stdout` to ``surrogateescape``. This
- behavior can be overridden using :envvar:`PYTHONIOENCODING` as usual.
+ implicit locale coercion) automatically enables the ``surrogateescape``
+ :ref:`error handler <error-handlers>` for :data:`sys.stdin` and
+ :data:`sys.stdout` (:data:`sys.stderr` continues to use ``backslashreplace``
+ as it does in any other locale). This stream handling behavior can be
+ overridden using :envvar:`PYTHONIOENCODING` as usual.
For debugging purposes, setting ``PYTHONCOERCECLOCALE=warn`` will cause
Python to emit warning messages on ``stderr`` if either the locale coercion
activates, or else if a locale that *would* have triggered coercion is
still active when the Python runtime is initialized.
+ Also note that even when locale coercion is disabled, or when it fails to
+ find a suitable target locale, UTF-8 mode (see :envvar:`PYTHONUTF8`) will
+ still activate by default in legacy ASCII-based locales. Both features
+ must be disabled in order to force the interpreter to use ``ASCII``
+ instead of ``UTF-8`` for system interfaces.
+
Availability: \*nix
.. versionadded:: 3.7
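
Reviewer note: the coercion behaviour described in the hunk above can be observed from a parent process. This is a rough sketch, assuming a POSIX system; ``run_in_c_locale`` is a hypothetical helper name. Whether coercion actually succeeds depends on which target locales the platform provides, so no particular output is claimed for the ``warn`` case.

```python
import os
import subprocess
import sys


def run_in_c_locale(coerce_setting="warn"):
    """Start a child interpreter with ``LC_CTYPE=C`` and report the result.

    Hypothetical helper for illustration only. Returns a tuple of
    (return code, LC_CTYPE as seen by the child, child's stderr text).
    """
    env = dict(os.environ)
    # LC_ALL would override LC_CTYPE, and LANG or PYTHONUTF8 could mask the
    # effect being demonstrated, so remove them from the child environment.
    for name in ("LC_ALL", "LANG", "PYTHONUTF8"):
        env.pop(name, None)
    env["LC_CTYPE"] = "C"
    env["PYTHONCOERCECLOCALE"] = coerce_setting

    child_code = "import os; print(os.environ.get('LC_CTYPE'))"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        env=env,
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip(), result.stderr.strip()


if __name__ == "__main__":
    for setting in ("warn", "0"):
        code, lc_ctype, warnings = run_in_c_locale(setting)
        print(f"PYTHONCOERCECLOCALE={setting}: LC_CTYPE={lc_ctype!r}")
        if warnings:
            print("  stderr:", warnings)
```

With ``PYTHONCOERCECLOCALE=0`` the child should report ``LC_CTYPE`` unchanged, since coercion is skipped. With ``warn``, any coercion message appears on the child's ``stderr``.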
@@ -834,10 +849,56 @@ conflict.
.. envvar:: PYTHONUTF8
- If set to ``1``, enable the UTF-8 mode. If set to ``0``, disable the UTF-8
- mode. Any other non-empty string cause an error.
+ If set to ``1``, enables the interpreter's UTF-8 mode, where ``UTF-8`` is
+ used as the text encoding for system interfaces, regardless of the
+ current locale setting.
+
+ This means that:
+
+ * :func:`sys.getfilesystemencoding()` returns ``'UTF-8'`` (the locale
+ encoding is ignored).
+ * :func:`locale.getpreferredencoding()` returns ``'UTF-8'`` (the locale
+ encoding is ignored, and the function's ``do_setlocale`` parameter has no
+ effect).
+ * :data:`sys.stdin`, :data:`sys.stdout`, and :data:`sys.stderr` all use
+ UTF-8 as their text encoding, with the ``surrogateescape``
+ :ref:`error handler <error-handlers>` being enabled for :data:`sys.stdin`
+ and :data:`sys.stdout` (:data:`sys.stderr` continues to use
+ ``backslashreplace`` as it does in the default locale-aware mode).
+
+ As a consequence of the changes in those lower level APIs, other higher
+ level APIs also exhibit different default behaviors:
+
+ * Command line arguments, environment variables and filenames are decoded
+ to text using the UTF-8 encoding.
+ * :func:`os.fsdecode()` and :func:`os.fsencode()` use the UTF-8 encoding.
+ * :func:`open()`, :func:`io.open()`, and :func:`codecs.open()` use the UTF-8
+ encoding by default. However, they still use the strict error handler by
+ default so that attempting to open a binary file in text mode is likely
+ to raise an exception rather than producing nonsense data.
+
+ Note that the standard stream settings in UTF-8 mode can be overridden by
+ :envvar:`PYTHONIOENCODING` (just as they can be in the default locale-aware
+ mode).
+
+ If set to ``0``, the interpreter runs in its default locale-aware mode.
+
+ Setting any other non-empty string causes an error during interpreter
+ initialization.
+
+ If this environment variable is not set at all, then the interpreter defaults
+ to using the current locale settings, *unless* the current locale is
+ identified as a legacy ASCII-based locale
+ (as described for :envvar:`PYTHONCOERCECLOCALE`), and locale coercion is
+ either disabled or fails. In such legacy locales, the interpreter will
+ default to enabling UTF-8 mode unless explicitly instructed not to do so.
+
+ Also available as the :option:`-X` ``utf8`` option.
+
+ Availability: \*nix
.. versionadded:: 3.7
+ See :pep:`540` for more details.
Debug-mode variables
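
Reviewer note: the list of effects documented for :envvar:`PYTHONUTF8` above can be spot-checked with a child interpreter. This is a minimal sketch, assuming CPython 3.7+; ``probe_utf8_mode`` is a hypothetical helper name. ``PYTHONIOENCODING`` is removed from the child environment because the text notes that it overrides the standard stream settings. Encoding names are returned as reported by the interpreter, and their letter case may differ between Python versions.

```python
import json
import os
import subprocess
import sys


def probe_utf8_mode(value):
    """Run a child interpreter with ``PYTHONUTF8=<value>``.

    Hypothetical helper for illustration only. Returns a dict describing the
    encodings and error handlers the child interpreter reports.
    """
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)  # would override the stream settings
    env["PYTHONUTF8"] = value

    child_code = (
        "import json, locale, sys; "
        "print(json.dumps({"
        "'filesystem': sys.getfilesystemencoding(), "
        "'preferred': locale.getpreferredencoding(), "
        "'stdout_encoding': sys.stdout.encoding, "
        "'stdin_errors': sys.stdin.errors, "
        "'stdout_errors': sys.stdout.errors, "
        "'stderr_errors': sys.stderr.errors}))"
    )
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        env=env,
        stdin=subprocess.DEVNULL,  # make sure the child has a usable stdin
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    for setting in ("1", "0"):
        print(f"PYTHONUTF8={setting}:")
        for key, reported in probe_utf8_mode(setting).items():
            print(f"  {key}: {reported}")
```

Comparing the two runs side by side shows which of the listed interfaces change when UTF-8 mode is enabled on a given platform and locale.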