Issue #8633: Support for POSIX.1-2008 binary pax headers.

tarfile is now able to read and write pax headers with a "hdrcharset=BINARY" record. This record was introduced in POSIX.1-2008 as a method to store unencoded binary strings that cannot be translated to UTF-8. In practice, this is just a workaround that allows a tar implementation to store filenames that do not comply with the current filesystem encoding and thus cannot be decoded correctly.

Additionally, tarfile works around a bug in current versions of GNU tar: undecodable filenames are stored as-is in a pax header without a "hdrcharset" record being added. Technically, these headers are invalid, but tarfile manages to read them correctly anyway.
author: Lars Gustäbel <lars@gustaebel.de> 2010-05-17 18:02:50 (GMT)
committer: Lars Gustäbel <lars@gustaebel.de> 2010-05-17 18:02:50 (GMT)
commit: 1465cc2887be2054cca50c72ef804adcc15fdf65 (patch)
tree: 3f20bc90a15488fcbca7868415cf35d2bc1e114a /Doc
parent: 0f78a94f445c48f5a96a77a1bb77ca88d7c50694 (diff)
download: cpython-1465cc2887be2054cca50c72ef804adcc15fdf65.zip
cpython-1465cc2887be2054cca50c72ef804adcc15fdf65.tar.gz
cpython-1465cc2887be2054cca50c72ef804adcc15fdf65.tar.bz2
1 files changed, 5 insertions, 3 deletions
diff --git a/Doc/library/tarfile.rst b/Doc/library/tarfile.rst
index 8f68c42..c2a9143 100644
--- a/Doc/library/tarfile.rst
+++ b/Doc/library/tarfile.rst
@@ -711,6 +711,8 @@ converted. Possible values are listed in section :ref:`codec-base-classes`.
 The default scheme is ``'surrogateescape'`` which Python also uses for its
 file system calls, see :ref:`os-filenames`.
 
-In case of writing :const:`PAX_FORMAT` archives, *encoding* is ignored because
-non-ASCII metadata is stored using *UTF-8*. Storing surrogate characters is not
-possible and will raise a :exc:`UnicodeEncodeError`.
+In case of :const:`PAX_FORMAT` archives, *encoding* is generally not needed
+because all the metadata is stored using *UTF-8*. *encoding* is only used in
+the rare cases when binary pax headers are decoded or when strings with
+surrogate characters are stored.
+
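
The behaviour described in the commit message can be illustrated with a small round trip. This is only a sketch, assuming a Python 3 version that includes this change; the helper name `roundtrip_binary_name` and the sample filename are made up for illustration and are not part of the patch. A byte string that is not valid UTF-8 is turned into a `str` with surrogate characters, written to an in-memory `PAX_FORMAT` archive, and read back.

```python
import io
import tarfile


def roundtrip_binary_name(raw_name: bytes, encoding: str = "utf-8"):
    """Write one member named `raw_name` to a pax archive and read it back.

    Hypothetical helper for illustration only. Returns the raw archive
    bytes and the member name re-encoded to bytes after reading.
    """
    # Undecodable bytes become lone surrogates, as with os.fsdecode().
    name = raw_name.decode(encoding, "surrogateescape")
    payload = b"hello"

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                      encoding=encoding, errors="surrogateescape") as tar:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    archive = buf.getvalue()

    # Read with the same encoding so binary pax fields decode identically.
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r",
                      encoding=encoding, errors="surrogateescape") as tar:
        member = tar.getmembers()[0]

    return archive, member.name.encode(encoding, "surrogateescape")


if __name__ == "__main__":
    archive, restored = roundtrip_binary_name(b"caf\xe9.txt")
    print(restored == b"caf\xe9.txt")
    print(b"hdrcharset=BINARY" in archive)
```

A name that is valid UTF-8 should take the ordinary path, with the pax header stored as UTF-8 and no "hdrcharset" record, which is why *encoding* only matters in the rare binary case.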