diff options
author | Steve Dower <steve.dower@microsoft.com> | 2016-09-08 17:35:16 (GMT) |
---|---|---|
committer | Steve Dower <steve.dower@microsoft.com> | 2016-09-08 17:35:16 (GMT) |
commit | cc16be85c0b7119854c00fb5c666825deef641cf (patch) | |
tree | 18b9a8020679f8a0e6e0dd1ecb5668024be499b7 /Doc | |
parent | cfbd48bc56980823dd8e2560e0ce4e46e33e4e3d (diff) | |
download | cpython-cc16be85c0b7119854c00fb5c666825deef641cf.zip cpython-cc16be85c0b7119854c00fb5c666825deef641cf.tar.gz cpython-cc16be85c0b7119854c00fb5c666825deef641cf.tar.bz2 |
Issue #27781: Change file system encoding on Windows to UTF-8 (PEP 529)
Diffstat (limited to 'Doc')
-rw-r--r-- | Doc/c-api/unicode.rst | 30 | ||||
-rw-r--r-- | Doc/library/sys.rst | 51 | ||||
-rw-r--r-- | Doc/using/cmdline.rst | 14 | ||||
-rw-r--r-- | Doc/whatsnew/3.6.rst | 29 |
4 files changed, 100 insertions, 24 deletions
diff --git a/Doc/c-api/unicode.rst b/Doc/c-api/unicode.rst index 44e9259..0835477 100644 --- a/Doc/c-api/unicode.rst +++ b/Doc/c-api/unicode.rst @@ -802,10 +802,11 @@ File System Encoding """""""""""""""""""" To encode and decode file names and other environment strings, -:c:data:`Py_FileSystemEncoding` should be used as the encoding, and -``"surrogateescape"`` should be used as the error handler (:pep:`383`). To -encode file names during argument parsing, the ``"O&"`` converter should be -used, passing :c:func:`PyUnicode_FSConverter` as the conversion function: +:c:data:`Py_FileSystemDefaultEncoding` should be used as the encoding, and +:c:data:`Py_FileSystemDefaultEncodeErrors` should be used as the error handler +(:pep:`383` and :pep:`529`). To encode file names to :class:`bytes` during +argument parsing, the ``"O&"`` converter should be used, passing +:c:func:`PyUnicode_FSConverter` as the conversion function: .. c:function:: int PyUnicode_FSConverter(PyObject* obj, void* result) @@ -820,8 +821,9 @@ used, passing :c:func:`PyUnicode_FSConverter` as the conversion function: .. versionchanged:: 3.6 Accepts a :term:`path-like object`. -To decode file names during argument parsing, the ``"O&"`` converter should be -used, passing :c:func:`PyUnicode_FSDecoder` as the conversion function: +To decode file names to :class:`str` during argument parsing, the ``"O&"`` +converter should be used, passing :c:func:`PyUnicode_FSDecoder` as the +conversion function: .. c:function:: int PyUnicode_FSDecoder(PyObject* obj, void* result) @@ -840,7 +842,7 @@ used, passing :c:func:`PyUnicode_FSDecoder` as the conversion function: .. c:function:: PyObject* PyUnicode_DecodeFSDefaultAndSize(const char *s, Py_ssize_t size) Decode a string using :c:data:`Py_FileSystemDefaultEncoding` and the - ``"surrogateescape"`` error handler, or ``"strict"`` on Windows. + :c:data:`Py_FileSystemDefaultEncodeErrors` error handler. If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the locale encoding. @@ -854,28 +856,28 @@ used, passing :c:func:`PyUnicode_FSDecoder` as the conversion function: The :c:func:`Py_DecodeLocale` function. - .. versionchanged:: 3.2 - Use ``"strict"`` error handler on Windows. + .. versionchanged:: 3.6 + Use :c:data:`Py_FileSystemDefaultEncodeErrors` error handler. .. c:function:: PyObject* PyUnicode_DecodeFSDefault(const char *s) Decode a null-terminated string using :c:data:`Py_FileSystemDefaultEncoding` - and the ``"surrogateescape"`` error handler, or ``"strict"`` on Windows. + and the :c:data:`Py_FileSystemDefaultEncodeErrors` error handler. If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the locale encoding. Use :c:func:`PyUnicode_DecodeFSDefaultAndSize` if you know the string length. - .. versionchanged:: 3.2 - Use ``"strict"`` error handler on Windows. + .. versionchanged:: 3.6 + Use :c:data:`Py_FileSystemDefaultEncodeErrors` error handler. .. c:function:: PyObject* PyUnicode_EncodeFSDefault(PyObject *unicode) Encode a Unicode object to :c:data:`Py_FileSystemDefaultEncoding` with the - ``"surrogateescape"`` error handler, or ``"strict"`` on Windows, and return + :c:data:`Py_FileSystemDefaultEncodeErrors` error handler, and return :class:`bytes`. Note that the resulting :class:`bytes` object may contain null bytes. @@ -892,6 +894,8 @@ used, passing :c:func:`PyUnicode_FSDecoder` as the conversion function: .. versionadded:: 3.2 + .. versionchanged:: 3.6 + Use :c:data:`Py_FileSystemDefaultEncodeErrors` error handler. wchar_t Support """"""""""""""" diff --git a/Doc/library/sys.rst b/Doc/library/sys.rst index 8c9ca2a..9460b84 100644 --- a/Doc/library/sys.rst +++ b/Doc/library/sys.rst @@ -428,25 +428,42 @@ always available. .. function:: getfilesystemencoding() - Return the name of the encoding used to convert Unicode filenames into - system file names. The result value depends on the operating system: + Return the name of the encoding used to convert between Unicode + filenames and bytes filenames. For best compatibility, str should be + used for filenames in all cases, although representing filenames as bytes + is also supported. Functions accepting or returning filenames should support + either str or bytes and internally convert to the system's preferred + representation. - * On Mac OS X, the encoding is ``'utf-8'``. + This encoding is always ASCII-compatible. + + :func:`os.fsencode` and :func:`os.fsdecode` should be used to ensure that + the correct encoding and errors mode are used. - * On Unix, the encoding is the user's preference according to the result of - nl_langinfo(CODESET). + * On Mac OS X, the encoding is ``'utf-8'``. - * On Windows NT+, file names are Unicode natively, so no conversion is - performed. :func:`getfilesystemencoding` still returns ``'mbcs'``, as - this is the encoding that applications should use when they explicitly - want to convert Unicode strings to byte strings that are equivalent when - used as file names. + * On Unix, the encoding is the locale encoding. - * On Windows 9x, the encoding is ``'mbcs'``. + * On Windows, the encoding may be ``'utf-8'`` or ``'mbcs'``, depending + on user configuration. .. versionchanged:: 3.2 :func:`getfilesystemencoding` result cannot be ``None`` anymore. + .. versionchanged:: 3.6 + Windows is no longer guaranteed to return ``'mbcs'``. See :pep:`529` + and :func:`_enablelegacywindowsfsencoding` for more information. + +.. function:: getfilesystemencodeerrors() + + Return the name of the error mode used to convert between Unicode filenames + and bytes filenames. The encoding name is returned from + :func:`getfilesystemencoding`. + + :func:`os.fsencode` and :func:`os.fsdecode` should be used to ensure that + the correct encoding and errors mode are used. + + .. versionadded:: 3.6 .. function:: getrefcount(object) @@ -1138,6 +1155,18 @@ always available. This function has been added on a provisional basis (see :pep:`411` for details.) Use it only for debugging purposes. +.. function:: _enablelegacywindowsfsencoding() + + Changes the default filesystem encoding and errors mode to 'mbcs' and + 'replace' respectively, for consistency with versions of Python prior to 3.6. + + This is equivalent to defining the :envvar:`PYTHONLEGACYWINDOWSFSENCODING` + environment variable before launching Python. + + Availability: Windows + + .. versionadded:: 3.6 + See :pep:`529` for more details. .. data:: stdin stdout diff --git a/Doc/using/cmdline.rst b/Doc/using/cmdline.rst index 37a9e14..2a83bd1 100644 --- a/Doc/using/cmdline.rst +++ b/Doc/using/cmdline.rst @@ -672,6 +672,20 @@ conflict. It now has no effect if set to an empty string. +.. envvar:: PYTHONLEGACYWINDOWSFSENCODING + + If set to a non-empty string, the default filesystem encoding and errors mode + will revert to their pre-3.6 values of 'mbcs' and 'replace', respectively. + Otherwise, the new defaults 'utf-8' and 'surrogatepass' are used. + + This may also be enabled at runtime with + :func:`sys._enablelegacywindowsfsencoding()`. + + Availability: Windows + + .. versionadded:: 3.6 + See :pep:`529` for more details. + Debug-mode variables ~~~~~~~~~~~~~~~~~~~~ diff --git a/Doc/whatsnew/3.6.rst b/Doc/whatsnew/3.6.rst index f2b53fb..ce1c44e 100644 --- a/Doc/whatsnew/3.6.rst +++ b/Doc/whatsnew/3.6.rst @@ -76,6 +76,8 @@ Security improvements: Windows improvements: +* PEP 529: :ref:`Change Windows filesystem encoding to UTF-8 <pep-529>` + * The ``py.exe`` launcher, when used interactively, no longer prefers Python 2 over Python 3 when the user doesn't specify a version (via command line arguments or a config file). Handling of shebang lines @@ -218,6 +220,33 @@ evaluated at run time, and then formatted using the :func:`format` protocol. See :pep:`498` and the main documentation at :ref:`f-strings`. +.. _pep-529: + +PEP 529: Change Windows filesystem encoding to UTF-8 +---------------------------------------------------- + +Representing filesystem paths is best performed with str (Unicode) rather than +bytes. However, there are some situations where using bytes is sufficient and +correct. + +Prior to Python 3.6, data loss could result when using bytes paths on Windows. +With this change, using bytes to represent paths is now supported on Windows, +provided those bytes are encoded with the encoding returned by +:func:`sys.getfilesystemencoding()`, which now defaults to ``'utf-8'``. + +Applications that do not use str to represent paths should use +:func:`os.fsencode()` and :func:`os.fsdecode()` to ensure their bytes are +correctly encoded. To revert to the previous behaviour, set +:envvar:`PYTHONLEGACYWINDOWSFSENCODING` or call +:func:`sys._enablelegacywindowsfsencoding`. + +See :pep:`529` for more information and discussion of code modifications that +may be required. + +.. note:: + + This change is considered experimental for 3.6.0 beta releases. The default + encoding may change before the final release. PEP 487: Simpler customization of class creation ------------------------------------------------ |