author     Steve Dower <steve.dower@microsoft.com>  2016-09-08 17:35:16 (GMT)
committer  Steve Dower <steve.dower@microsoft.com>  2016-09-08 17:35:16 (GMT)
commit     cc16be85c0b7119854c00fb5c666825deef641cf
tree       18b9a8020679f8a0e6e0dd1ecb5668024be499b7 /Doc
parent     cfbd48bc56980823dd8e2560e0ce4e46e33e4e3d
Issue #27781: Change file system encoding on Windows to UTF-8 (PEP 529)
Diffstat (limited to 'Doc')
-rw-r--r-- Doc/c-api/unicode.rst   30
-rw-r--r-- Doc/library/sys.rst     51
-rw-r--r-- Doc/using/cmdline.rst   14
-rw-r--r-- Doc/whatsnew/3.6.rst    29
4 files changed, 100 insertions, 24 deletions
diff --git a/Doc/c-api/unicode.rst b/Doc/c-api/unicode.rst
index 44e9259..0835477 100644
--- a/Doc/c-api/unicode.rst
+++ b/Doc/c-api/unicode.rst
@@ -802,10 +802,11 @@ File System Encoding
""""""""""""""""""""
To encode and decode file names and other environment strings,
-:c:data:`Py_FileSystemEncoding` should be used as the encoding, and
-``"surrogateescape"`` should be used as the error handler (:pep:`383`). To
-encode file names during argument parsing, the ``"O&"`` converter should be
-used, passing :c:func:`PyUnicode_FSConverter` as the conversion function:
+:c:data:`Py_FileSystemDefaultEncoding` should be used as the encoding, and
+:c:data:`Py_FileSystemDefaultEncodeErrors` should be used as the error handler
+(:pep:`383` and :pep:`529`). To encode file names to :class:`bytes` during
+argument parsing, the ``"O&"`` converter should be used, passing
+:c:func:`PyUnicode_FSConverter` as the conversion function:
.. c:function:: int PyUnicode_FSConverter(PyObject* obj, void* result)
@@ -820,8 +821,9 @@ used, passing :c:func:`PyUnicode_FSConverter` as the conversion function:
.. versionchanged:: 3.6
Accepts a :term:`path-like object`.
-To decode file names during argument parsing, the ``"O&"`` converter should be
-used, passing :c:func:`PyUnicode_FSDecoder` as the conversion function:
+To decode file names to :class:`str` during argument parsing, the ``"O&"``
+converter should be used, passing :c:func:`PyUnicode_FSDecoder` as the
+conversion function:
.. c:function:: int PyUnicode_FSDecoder(PyObject* obj, void* result)
@@ -840,7 +842,7 @@ used, passing :c:func:`PyUnicode_FSDecoder` as the conversion function:
.. c:function:: PyObject* PyUnicode_DecodeFSDefaultAndSize(const char *s, Py_ssize_t size)
Decode a string using :c:data:`Py_FileSystemDefaultEncoding` and the
- ``"surrogateescape"`` error handler, or ``"strict"`` on Windows.
+ :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.
If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
locale encoding.
@@ -854,28 +856,28 @@ used, passing :c:func:`PyUnicode_FSDecoder` as the conversion function:
The :c:func:`Py_DecodeLocale` function.
- .. versionchanged:: 3.2
- Use ``"strict"`` error handler on Windows.
+ .. versionchanged:: 3.6
+ Use :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.
.. c:function:: PyObject* PyUnicode_DecodeFSDefault(const char *s)
Decode a null-terminated string using :c:data:`Py_FileSystemDefaultEncoding`
- and the ``"surrogateescape"`` error handler, or ``"strict"`` on Windows.
+ and the :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.
If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
locale encoding.
Use :c:func:`PyUnicode_DecodeFSDefaultAndSize` if you know the string length.
- .. versionchanged:: 3.2
- Use ``"strict"`` error handler on Windows.
+ .. versionchanged:: 3.6
+ Use :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.
.. c:function:: PyObject* PyUnicode_EncodeFSDefault(PyObject *unicode)
Encode a Unicode object to :c:data:`Py_FileSystemDefaultEncoding` with the
- ``"surrogateescape"`` error handler, or ``"strict"`` on Windows, and return
+ :c:data:`Py_FileSystemDefaultEncodeErrors` error handler, and return
:class:`bytes`. Note that the resulting :class:`bytes` object may contain
null bytes.
@@ -892,6 +894,8 @@ used, passing :c:func:`PyUnicode_FSDecoder` as the conversion function:
.. versionadded:: 3.2
+ .. versionchanged:: 3.6
+ Use :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.
wchar_t Support
"""""""""""""""
diff --git a/Doc/library/sys.rst b/Doc/library/sys.rst
index 8c9ca2a..9460b84 100644
--- a/Doc/library/sys.rst
+++ b/Doc/library/sys.rst
@@ -428,25 +428,42 @@ always available.
.. function:: getfilesystemencoding()
- Return the name of the encoding used to convert Unicode filenames into
- system file names. The result value depends on the operating system:
+ Return the name of the encoding used to convert between Unicode
+ filenames and bytes filenames. For best compatibility, str should be
+ used for filenames in all cases, although representing filenames as bytes
+ is also supported. Functions accepting or returning filenames should support
+ either str or bytes and internally convert to the system's preferred
+ representation.
- * On Mac OS X, the encoding is ``'utf-8'``.
+ This encoding is always ASCII-compatible.
+
+ :func:`os.fsencode` and :func:`os.fsdecode` should be used to ensure that
+ the correct encoding and errors mode are used.
- * On Unix, the encoding is the user's preference according to the result of
- nl_langinfo(CODESET).
+ * On Mac OS X, the encoding is ``'utf-8'``.
- * On Windows NT+, file names are Unicode natively, so no conversion is
- performed. :func:`getfilesystemencoding` still returns ``'mbcs'``, as
- this is the encoding that applications should use when they explicitly
- want to convert Unicode strings to byte strings that are equivalent when
- used as file names.
+ * On Unix, the encoding is the locale encoding.
- * On Windows 9x, the encoding is ``'mbcs'``.
+ * On Windows, the encoding may be ``'utf-8'`` or ``'mbcs'``, depending
+ on user configuration.
.. versionchanged:: 3.2
:func:`getfilesystemencoding` result cannot be ``None`` anymore.
+ .. versionchanged:: 3.6
+ Windows is no longer guaranteed to return ``'mbcs'``. See :pep:`529`
+ and :func:`_enablelegacywindowsfsencoding` for more information.
+
+.. function:: getfilesystemencodeerrors()
+
+ Return the name of the error mode used to convert between Unicode filenames
+ and bytes filenames. The corresponding encoding name is returned by
+ :func:`getfilesystemencoding`.
+
+ :func:`os.fsencode` and :func:`os.fsdecode` should be used to ensure that
+ the correct encoding and errors mode are used.
+
+ .. versionadded:: 3.6
.. function:: getrefcount(object)
@@ -1138,6 +1155,18 @@ always available.
This function has been added on a provisional basis (see :pep:`411`
for details.) Use it only for debugging purposes.
+.. function:: _enablelegacywindowsfsencoding()
+
+ Changes the default filesystem encoding and errors mode to ``'mbcs'`` and
+ ``'replace'`` respectively, for consistency with versions of Python prior to 3.6.
+
+ This is equivalent to defining the :envvar:`PYTHONLEGACYWINDOWSFSENCODING`
+ environment variable before launching Python.
+
+ Availability: Windows
+
+ .. versionadded:: 3.6
+ See :pep:`529` for more details.
.. data:: stdin
stdout
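The ``sys.rst`` changes above split the filename codec into an encoding (``getfilesystemencoding``) and an error mode (``getfilesystemencodeerrors``), and state that the encoding is always ASCII-compatible. A minimal sketch, assuming Python 3.6 or later; ``filesystem_codec`` and ``is_ascii_compatible`` are hypothetical helper names, and the expected pairs in the comments restate the documentation rather than anything the code checks.

```python
import sys


def filesystem_codec():
    """Return the (encoding, errors) pair used for filename conversion."""
    return sys.getfilesystemencoding(), sys.getfilesystemencodeerrors()


def is_ascii_compatible(encoding):
    """Check that every ASCII character encodes to its own single byte."""
    ascii_bytes = bytes(range(128))
    return ascii_bytes.decode("ascii").encode(encoding) == ascii_bytes


encoding, errors = filesystem_codec()
print("filesystem codec:", encoding, errors)

# Expected pairs (assumption, per the documentation and PEP 529):
#   Windows, 3.6+ default:  ('utf-8', 'surrogatepass')
#   Windows, legacy mode:   ('mbcs', 'replace')
#   Unix:                   (locale encoding, 'surrogateescape')
# sys._enablelegacywindowsfsencoding() is deliberately not called here,
# because it changes the process-wide codec.
```

A codec such as UTF-16 fails the ASCII-compatibility check because it emits a byte order mark and two bytes per character.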
diff --git a/Doc/using/cmdline.rst b/Doc/using/cmdline.rst
index 37a9e14..2a83bd1 100644
--- a/Doc/using/cmdline.rst
+++ b/Doc/using/cmdline.rst
@@ -672,6 +672,20 @@ conflict.
It now has no effect if set to an empty string.
+.. envvar:: PYTHONLEGACYWINDOWSFSENCODING
+
+ If set to a non-empty string, the default filesystem encoding and errors mode
+ will revert to their pre-3.6 values of ``'mbcs'`` and ``'replace'``, respectively.
+ Otherwise, the new defaults ``'utf-8'`` and ``'surrogatepass'`` are used.
+
+ This may also be enabled at runtime with
+ :func:`sys._enablelegacywindowsfsencoding`.
+
+ Availability: Windows
+
+ .. versionadded:: 3.6
+ See :pep:`529` for more details.
+
Debug-mode variables
~~~~~~~~~~~~~~~~~~~~
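Because ``PYTHONLEGACYWINDOWSFSENCODING`` is read at interpreter startup, one way to observe its effect is to launch a child interpreter with and without it. A minimal sketch, assuming Python 3.6 or later and a usable ``sys.executable``; the helper ``child_fs_codec`` is a hypothetical name. On non-Windows platforms the variable has no effect, so both calls should report the same pair.

```python
import os
import subprocess
import sys

REPORT = "import sys; print(sys.getfilesystemencoding(), sys.getfilesystemencodeerrors())"


def child_fs_codec(legacy):
    """Start a fresh interpreter and return its (encoding, errors) pair.

    The variable is read at startup, so it must be in the child's environment.
    """
    env = dict(os.environ)
    if legacy:
        env["PYTHONLEGACYWINDOWSFSENCODING"] = "1"  # any non-empty string
    else:
        env.pop("PYTHONLEGACYWINDOWSFSENCODING", None)
    result = subprocess.run(
        [sys.executable, "-c", REPORT],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    encoding, errors = result.stdout.split()
    return encoding, errors


default_codec = child_fs_codec(legacy=False)
legacy_codec = child_fs_codec(legacy=True)
# On Windows, legacy_codec is expected to be ('mbcs', 'replace');
# elsewhere the variable is ignored, so both results should match.
```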
diff --git a/Doc/whatsnew/3.6.rst b/Doc/whatsnew/3.6.rst
index f2b53fb..ce1c44e 100644
--- a/Doc/whatsnew/3.6.rst
+++ b/Doc/whatsnew/3.6.rst
@@ -76,6 +76,8 @@ Security improvements:
Windows improvements:
+* PEP 529: :ref:`Change Windows filesystem encoding to UTF-8 <pep-529>`
+
* The ``py.exe`` launcher, when used interactively, no longer prefers
Python 2 over Python 3 when the user doesn't specify a version (via
command line arguments or a config file). Handling of shebang lines
@@ -218,6 +220,33 @@ evaluated at run time, and then formatted using the :func:`format` protocol.
See :pep:`498` and the main documentation at :ref:`f-strings`.
+.. _pep-529:
+
+PEP 529: Change Windows filesystem encoding to UTF-8
+----------------------------------------------------
+
+Representing filesystem paths is best performed with str (Unicode) rather than
+bytes. However, there are some situations where using bytes is sufficient and
+correct.
+
+Prior to Python 3.6, data loss could result when using bytes paths on Windows,
+because the ``'mbcs'`` encoding (the active ANSI code page) cannot represent
+every character that may appear in a file name. With this change, using bytes
+to represent paths is now supported on Windows, provided those bytes are
+encoded with the encoding returned by :func:`sys.getfilesystemencoding()`,
+which now defaults to ``'utf-8'``.
+
+Applications that do not use str to represent paths should use
+:func:`os.fsencode()` and :func:`os.fsdecode()` to ensure their bytes are
+correctly encoded. To revert to the previous behaviour, set
+:envvar:`PYTHONLEGACYWINDOWSFSENCODING` or call
+:func:`sys._enablelegacywindowsfsencoding`.
+
+See :pep:`529` for more information and discussion of code modifications that
+may be required.
+
+.. note::
+
+ This change is considered experimental for 3.6.0 beta releases. The default
+ encoding may change before the final release.
PEP 487: Simpler customization of class creation
------------------------------------------------
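The PEP 529 what's-new entry recommends ``os.fsencode()`` and ``os.fsdecode()`` for code that keeps paths as bytes. A minimal sketch, assuming Python 3.6 or later with the default (non-legacy) filesystem encoding on Windows, or a UTF-8 filesystem encoding elsewhere; the helper ``names_via_bytes`` is a hypothetical name. It lists a directory through the bytes API and checks that a non-ASCII name survives the round trip.

```python
import os
import tempfile


def names_via_bytes(directory):
    """List a directory through the bytes API, returning str names.

    os.fsencode/os.fsdecode apply sys.getfilesystemencoding() and
    sys.getfilesystemencodeerrors(), so bytes and str names round-trip.
    """
    raw_entries = os.listdir(os.fsencode(directory))  # bytes in -> bytes out
    return sorted(os.fsdecode(entry) for entry in raw_entries)


with tempfile.TemporaryDirectory() as tmp:
    # The CJK name has no Unicode decomposition, so filesystems that
    # normalize names return it unchanged.
    expected = sorted(["\u6570\u636e.txt", "notes.txt"])
    for name in expected:
        with open(os.path.join(tmp, name), "w"):
            pass
    listed = names_via_bytes(tmp)
```

Under the legacy ``'mbcs'``/``'replace'`` mode the non-ASCII name may come back altered, which is the data loss the entry describes.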