author     Martin v. Löwis <martin@v.loewis.de>  2009-05-05 04:43:17 (GMT)
committer  Martin v. Löwis <martin@v.loewis.de>  2009-05-05 04:43:17 (GMT)
commit     011e8420339245f9b55d41082ec6036f2f83a182
tree       6e278775c41c1d50c62e3a42b960797813d245ef /Doc/library
parent     93f65a177b36396dddd1e2938cc037288a7eb400
Issue #5915: Implement PEP 383, Non-decodable Bytes in
System Character Interfaces.
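As a minimal sketch of the PEP 383 round trip, assuming a modern Python 3 interpreter: the error handler this commit documents as ``utf8b`` was renamed ``surrogateescape`` before the final Python 3.1 release, so the example uses that name. Each undecodable byte 0xNN becomes the lone surrogate U+DCNN on decoding and is turned back into the original byte on encoding.

```python
# Bytes that are not valid UTF-8: 0xE9 is a Latin-1 'é' with no continuation byte.
raw = b"caf\xe9.txt"

# Decoding with 'surrogateescape' (called 'utf8b' in this commit) does not fail:
# the undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
name = raw.decode("utf-8", "surrogateescape")
print(repr(name))  # 'caf\udce9.txt'

# Encoding with the same handler maps U+DCE9 back to the original byte.
restored = name.encode("utf-8", "surrogateescape")
print(restored == raw)  # True

# With the default 'strict' handler, the lone surrogate cannot be encoded.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encoding fails:", exc.reason)
```

The surrogate range is used because lone surrogates never appear in text decoded from valid UTF-8, so they cannot collide with real characters in a decoded file name.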
Diffstat (limited to 'Doc/library')
-rw-r--r--  Doc/library/codecs.rst   4
-rw-r--r--  Doc/library/os.rst      38
2 files changed, 31 insertions, 11 deletions
diff --git a/Doc/library/codecs.rst b/Doc/library/codecs.rst
index ab578ea..3f1a5fe 100644
--- a/Doc/library/codecs.rst
+++ b/Doc/library/codecs.rst
@@ -322,6 +322,8 @@ and implemented by all standard Python codecs:
 | ``'backslashreplace'``  | Replace with backslashed escape sequences     |
 |                         | (only for encoding).                          |
 +-------------------------+-----------------------------------------------+
+| ``'utf8b'``             | Replace byte with surrogate U+DCxx.           |
++-------------------------+-----------------------------------------------+

 In addition, the following error handlers are specific to a single codec:

@@ -333,7 +335,7 @@ In addition, the following error handlers are specific to a single codec:
 +------------------+---------+--------------------------------------------+

 .. versionadded:: 3.1
-   The ``'surrogates'`` error handler.
+   The ``'utf8b'`` and ``'surrogates'`` error handlers.

 The set of allowed values can be extended via :meth:`register_error`.

diff --git a/Doc/library/os.rst b/Doc/library/os.rst
index c686baf..83f5ee9 100644
--- a/Doc/library/os.rst
+++ b/Doc/library/os.rst
@@ -51,6 +51,30 @@ the :mod:`os` module, but using them is of course a threat to portability!
    ``'ce'``, ``'java'``.


+.. _os-filenames:
+
+File Names, Command Line Arguments, and Environment Variables
+-------------------------------------------------------------
+
+In Python, file names, command line arguments, and environment
+variables are represented using the string type. On some systems,
+decoding these strings to and from bytes is necessary before passing
+them to the operating system. Python uses the file system encoding to
+perform this conversion (see :func:`sys.getfilesystemencoding`).
+
+.. versionchanged:: 3.1
+   On some systems, conversion using the file system encoding may
+   fail. In this case, Python uses the ``utf8b`` encoding error
+   handler, which means that undecodable bytes are replaced by a
+   Unicode character U+DCxx on decoding, and these are again
+   translated to the original byte on encoding.
+
+
+The file system encoding must guarantee to successfully decode all
+bytes below 128. If the file system encoding fails to provide this
+guarantee, API functions may raise UnicodeErrors.
+
+
 .. _os-procinfo:

 Process Parameters
@@ -688,12 +712,8 @@ Files and Directories

 .. function:: getcwd()

-   Return a string representing the current working directory. On Unix
-   platforms, this function may raise :exc:`UnicodeDecodeError` if the name of
-   the current directory is not decodable in the file system encoding. Use
-   :func:`getcwdb` if you need the call to never fail. Availability: Unix,
-   Windows.
-
+   Return a string representing the current working directory.
+   Availability: Unix, Windows.

 .. function:: getcwdb()

@@ -800,10 +820,8 @@ Files and Directories
    entries ``'.'`` and ``'..'`` even if they are present in the directory.
    Availability: Unix, Windows.

-   This function can be called with a bytes or string argument. In the bytes
-   case, all filenames will be listed as returned by the underlying API. In the
-   string case, filenames will be decoded using the file system encoding, and
-   skipped if a decoding error occurs.
+   This function can be called with a bytes or string argument, and returns
+   filenames of the same datatype.

 .. function:: lstat(path)

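The os.rst hunks in the diff change :func:`os.listdir` to return names of the same type as its argument and drop the ``UnicodeDecodeError`` caveat from :func:`os.getcwd`. A minimal sketch of that behavior, assuming a modern Python 3 (``os.fsencode`` was added later, in Python 3.2, and is not part of this commit); the temporary directory and the file name ``example.txt`` are arbitrary values chosen for the illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one ASCII-named file so the listing is non-empty.
    open(os.path.join(tmp, "example.txt"), "w").close()

    # A str argument yields str names ...
    str_names = os.listdir(tmp)
    # ... and a bytes argument yields bytes names of the same entries.
    bytes_names = os.listdir(os.fsencode(tmp))

    print(str_names)    # ['example.txt']
    print(bytes_names)  # [b'example.txt']

# getcwd() returns str; getcwdb() returns the same path as raw bytes.
cwd, cwdb = os.getcwd(), os.getcwdb()
print(type(cwd).__name__, type(cwdb).__name__)  # str bytes
```

Because undecodable bytes are carried through as U+DCxx surrogates, the str form of a name can be re-encoded to the exact bytes the bytes API would have returned, which is why the bytes variant is no longer needed merely to avoid exceptions.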
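The new os.rst text requires the file system encoding to decode every byte below 128. One way to check that property for a given codec is sketched below; the helper name ``decodes_low_bytes`` is hypothetical and not part of the ``os`` or ``codecs`` modules. It decodes each byte on its own, so multi-byte codecs such as UTF-16 do not pass merely because they can consume pairs of bytes. It checks only that decoding succeeds, which is what the documented guarantee states; a stricter check could also compare each result with the corresponding ASCII character.

```python
import sys

def decodes_low_bytes(encoding: str) -> bool:
    """Hypothetical helper: True if every single byte 0x00-0x7F decodes
    without error under `encoding` with the default 'strict' handler."""
    for value in range(128):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            return False
    return True

fs_encoding = sys.getfilesystemencoding()
print(fs_encoding, decodes_low_bytes(fs_encoding))
# A lone byte is a truncated UTF-16 code unit, so this reports False.
print("utf-16", decodes_low_bytes("utf-16"))
```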