author Martin v. Löwis <martin@v.loewis.de> 2009-05-05 04:43:17 (GMT)
committer Martin v. Löwis <martin@v.loewis.de> 2009-05-05 04:43:17 (GMT)
commit 011e8420339245f9b55d41082ec6036f2f83a182
tree 6e278775c41c1d50c62e3a42b960797813d245ef /Doc
parent 93f65a177b36396dddd1e2938cc037288a7eb400
Issue #5915: Implement PEP 383, Non-decodable Bytes in
System Character Interfaces.
Diffstat (limited to 'Doc')
-rw-r--r-- Doc/library/codecs.rst 4
-rw-r--r-- Doc/library/os.rst 38
2 files changed, 31 insertions, 11 deletions
diff --git a/Doc/library/codecs.rst b/Doc/library/codecs.rst
index ab578ea..3f1a5fe 100644
--- a/Doc/library/codecs.rst
+++ b/Doc/library/codecs.rst
@@ -322,6 +322,8 @@ and implemented by all standard Python codecs:
| ``'backslashreplace'`` | Replace with backslashed escape sequences |
| | (only for encoding). |
+-------------------------+-----------------------------------------------+
+| ``'utf8b'`` | Replace byte with surrogate U+DCxx. |
++-------------------------+-----------------------------------------------+
In addition, the following error handlers are specific to a single codec:
@@ -333,7 +335,7 @@ In addition, the following error handlers are specific to a single codec:
+------------------+---------+--------------------------------------------+
.. versionadded:: 3.1
- The ``'surrogates'`` error handler.
+ The ``'utf8b'`` and ``'surrogates'`` error handlers.
The set of allowed values can be extended via :meth:`register_error`.
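The round trip performed by the ``'utf8b'`` handler added above can be sketched as follows. This assumes a Python 3 release in which the same handler is registered under its later name, ``'surrogateescape'``; the byte string is an arbitrary example, not taken from the commit.

```python
# Assumption: 'surrogateescape' is the later name of the 'utf8b' handler.
raw = b"caf\xe9"  # Latin-1 bytes; 0xE9 is not valid UTF-8 at this position

# Decoding: the undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
text = raw.decode("utf-8", "surrogateescape")
print(ascii(text))

# Encoding with the same handler turns the surrogate back into the byte.
restored = text.encode("utf-8", "surrogateescape")
assert restored == raw

# Without the handler, the lone surrogate cannot be encoded.
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encoding failed:", exc.reason)
```

The handler only has to be applied consistently in both directions; the original bytes are recovered exactly.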
diff --git a/Doc/library/os.rst b/Doc/library/os.rst
index c686baf..83f5ee9 100644
--- a/Doc/library/os.rst
+++ b/Doc/library/os.rst
@@ -51,6 +51,30 @@ the :mod:`os` module, but using them is of course a threat to portability!
``'ce'``, ``'java'``.
+.. _os-filenames:
+
+File Names, Command Line Arguments, and Environment Variables
+-------------------------------------------------------------
+
+In Python, file names, command line arguments, and environment
+variables are represented using the string type. On some systems,
+they must be encoded to bytes when passed to the operating system and
+decoded from bytes when returned by it. Python uses the file system
+encoding to perform this conversion (see :func:`sys.getfilesystemencoding`).
+
+.. versionchanged:: 3.1
+ On some systems, conversion using the file system encoding may
+ fail. In this case, Python uses the ``utf8b`` error handler: on
+ decoding, each undecodable byte is replaced by the surrogate code
+ point U+DCxx, where xx is the byte's hexadecimal value, and on
+ encoding these code points are translated back to the original bytes.
+
+
+The file system encoding must be guaranteed to successfully decode all
+bytes below 128. If the file system encoding fails to provide this
+guarantee, API functions may raise :exc:`UnicodeError`.
+
+
.. _os-procinfo:
Process Parameters
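The conversion described in the new ``os-filenames`` section can be sketched with two helpers. ``to_os_bytes``, ``from_os_bytes`` and ``is_ascii_compatible`` are hypothetical names chosen for illustration. The sketch assumes a POSIX system and uses ``'surrogateescape'``, the later name of the ``'utf8b'`` handler. Later Python versions (3.2+) ship ``os.fsencode()`` and ``os.fsdecode()`` for the same job. ``is_ascii_compatible`` applies a stricter reading of the "bytes below 128" guarantee, requiring those bytes to decode to the identical ASCII characters.

```python
import sys

# Assumed later name of the 'utf8b' error handler.
ERRORS = "surrogateescape"

def to_os_bytes(name):
    """Hypothetical helper: encode a str name for the OS."""
    return name.encode(sys.getfilesystemencoding(), ERRORS)

def from_os_bytes(data):
    """Hypothetical helper: decode OS bytes to str without failing."""
    return data.decode(sys.getfilesystemencoding(), ERRORS)

def is_ascii_compatible(encoding):
    """Hypothetical check: bytes 0..127 must decode, and must decode
    to the same ASCII characters (stricter than 'decodes at all')."""
    ascii_bytes = bytes(range(128))
    try:
        decoded = ascii_bytes.decode(encoding)
    except UnicodeDecodeError:
        return False
    return decoded == ascii_bytes.decode("ascii")

# Usage: a name containing a byte that may not decode survives a round trip.
name = from_os_bytes(b"data\xff.txt")
assert to_os_bytes(name) == b"data\xff.txt"
```

An encoding such as UTF-16 decodes these bytes without error but into different characters, which is why the check compares the decoded text instead of only catching exceptions.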
@@ -688,12 +712,8 @@ Files and Directories
.. function:: getcwd()
- Return a string representing the current working directory. On Unix
- platforms, this function may raise :exc:`UnicodeDecodeError` if the name of
- the current directory is not decodable in the file system encoding. Use
- :func:`getcwdb` if you need the call to never fail. Availability: Unix,
- Windows.
-
+ Return a string representing the current working directory.
+ Availability: Unix, Windows.
.. function:: getcwdb()
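With the ``utf8b`` handler in place, :func:`getcwd` no longer needs the caveat about :exc:`UnicodeDecodeError`, because an undecodable directory name comes back with U+DCxx code points. The sketch below relates the two functions. It assumes a POSIX system and uses ``os.fsencode()``, a helper added later, in Python 3.2, that applies the file system encoding with the same error handler.

```python
import os

cwd_str = os.getcwd()     # str; undecodable bytes appear as U+DCxx
cwd_bytes = os.getcwdb()  # bytes, exactly as returned by the OS

if os.name == "posix":
    # Re-encoding the str form with the file system encoding and the
    # surrogate-escaping handler recovers the raw bytes.
    assert os.fsencode(cwd_str) == cwd_bytes
```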
@@ -800,10 +820,8 @@ Files and Directories
entries ``'.'`` and ``'..'`` even if they are present in the directory.
Availability: Unix, Windows.
- This function can be called with a bytes or string argument. In the bytes
- case, all filenames will be listed as returned by the underlying API. In the
- string case, filenames will be decoded using the file system encoding, and
- skipped if a decoding error occurs.
+ This function can be called with a bytes or string argument, and returns
+ filenames of the same type.
.. function:: lstat(path)
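The revised :func:`listdir` description says the result type follows the argument type, and undecodable names are no longer skipped. The sketch below checks this against a temporary directory. ``list_both`` is a hypothetical helper. The sketch uses the later ``os.fsencode()`` (Python 3.2+), and it only creates the non-UTF-8 file name on Linux, where file systems typically accept arbitrary bytes in names.

```python
import os
import sys
import tempfile

def list_both(directory):
    """Hypothetical helper: listdir() with a str and a bytes argument."""
    names_str = sorted(os.listdir(directory))                 # list of str
    names_bytes = sorted(os.listdir(os.fsencode(directory)))  # list of bytes
    return names_str, names_bytes

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "plain.txt"), "w").close()
    if sys.platform.startswith("linux"):
        # A file name that is not valid UTF-8.
        open(os.path.join(os.fsencode(tmp), b"caf\xe9"), "wb").close()
    str_names, bytes_names = list_both(tmp)

print(str_names)    # str entries; an undecodable name contains U+DCxx
print(bytes_names)  # bytes entries, exactly as stored on disk
```

Both listings contain the same entries. Only their type differs, and encoding each str name with ``os.fsencode()`` yields the corresponding bytes name.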