Diffstat (limited to 'Doc/library/tarfile.rst')
-rw-r--r-- Doc/library/tarfile.rst | 99
1 files changed, 76 insertions, 23 deletions
diff --git a/Doc/library/tarfile.rst b/Doc/library/tarfile.rst
index d5a511e..9b7071b 100644
--- a/Doc/library/tarfile.rst
+++ b/Doc/library/tarfile.rst
@@ -8,6 +8,9 @@
.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>
+**Source code:** :source:`Lib/tarfile.py`
+
+--------------
The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip or bz2 compression.
@@ -20,7 +23,8 @@ Some facts and figures:
* read/write support for the POSIX.1-1988 (ustar) format.
* read/write support for the GNU tar format including *longname* and *longlink*
- extensions, read-only support for the *sparse* extension.
+ extensions, read-only support for all variants of the *sparse* extension
+ including restoration of sparse files.
* read/write support for the POSIX.1-2001 (pax) format.
@@ -185,8 +189,8 @@ The following variables are available on module level:
.. data:: ENCODING
- The default character encoding i.e. the value from either
- :func:`sys.getfilesystemencoding` or :func:`sys.getdefaultencoding`.
+ The default character encoding: ``'utf-8'`` on Windows,
+ :func:`sys.getfilesystemencoding` otherwise.
.. seealso::
@@ -209,8 +213,16 @@ a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.
+A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
+statement. It will automatically be closed when the block is completed. Please
+note that in the event of an exception an archive opened for writing will not
+be finalized; only the internally used file object will be closed. See the
+:ref:`tar-examples` section for a use case.
-.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors=None, pax_headers=None, debug=0, errorlevel=0)
+.. versionadded:: 3.2
+ Added support for the context manager protocol.
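The caveat about exceptions can be illustrated with a minimal sketch. The file names, the ``write_member`` helper and the simulated ``RuntimeError`` are made up for the illustration and are not part of the patch. The sketch compares an archive closed normally with one whose ``with`` block was left through an exception:

```python
import io
import os
import tarfile
import tempfile


def write_member(tar):
    """Hypothetical helper: add one small in-memory member to an open archive."""
    payload = b"hello"
    info = tarfile.TarInfo("hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


workdir = tempfile.mkdtemp()
good_path = os.path.join(workdir, "good.tar")
broken_path = os.path.join(workdir, "broken.tar")

# Normal exit: the archive is closed and the end-of-archive padding is written.
with tarfile.open(good_path, "w") as good_tar:
    write_member(good_tar)

# Exit through an exception: the underlying file is closed, but the archive
# is not finalized, so the trailing padding is missing.
try:
    with tarfile.open(broken_path, "w") as broken_tar:
        write_member(broken_tar)
        raise RuntimeError("simulated failure while writing")
except RuntimeError:
    pass

good_size = os.path.getsize(good_path)
broken_size = os.path.getsize(broken_path)
print(broken_tar.closed, good_size, broken_size)
```

The unfinalized file is shorter than the finalized one, which is one way to notice that a write was interrupted.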
+
+.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)
All following arguments are optional and can be accessed as instance attributes
as well.
@@ -259,6 +271,9 @@ object, see :ref:`tarinfo-objects` for details.
to be handled. The default settings will work for most users.
See section :ref:`tar-unicode` for in-depth information.
+ .. versionchanged:: 3.2
+ Use ``'surrogateescape'`` as the default for the *errors* argument.
+
The *pax_headers* argument is an optional dictionary of strings which
will be added as a pax global header if *format* is :const:`PAX_FORMAT`.
@@ -324,12 +339,13 @@ object, see :ref:`tarinfo-objects` for details.
dots ``".."``.
-.. method:: TarFile.extract(member, path="")
+.. method:: TarFile.extract(member, path="", set_attrs=True)
Extract a member from the archive to the current working directory, using its
full name. Its file information is extracted as accurately as possible. *member*
may be a filename or a :class:`TarInfo` object. You can specify a different
- directory using *path*.
+ directory using *path*. File attributes (owner, mtime, mode) are set unless
+ *set_attrs* is False.
.. note::
@@ -340,6 +356,8 @@ object, see :ref:`tarinfo-objects` for details.
See the warning for :meth:`extractall`.
+ .. versionchanged:: 3.2
+ Added the *set_attrs* parameter.
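A minimal sketch of the effect of *set_attrs*, assuming a scratch directory and a made-up member name ``report.txt``. The same member is extracted twice, once with the default and once with ``set_attrs=False``, and the modification times are compared:

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "report.txt")
with open(src, "w") as f:
    f.write("data\n")

# Give the source file an old modification time so the difference is visible.
old_mtime = 1_000_000_000
os.utime(src, (old_mtime, old_mtime))

archive = os.path.join(workdir, "sample.tar")
with tarfile.open(archive, "w") as tar:
    tar.add(src, arcname="report.txt")

default_dir = os.path.join(workdir, "with_attrs")
plain_dir = os.path.join(workdir, "without_attrs")
os.makedirs(default_dir)
os.makedirs(plain_dir)

with tarfile.open(archive) as tar:
    # Default: owner, mtime and mode are restored where possible.
    tar.extract("report.txt", path=default_dir)
    # set_attrs=False: the file is written, but its attributes are left alone.
    tar.extract("report.txt", path=plain_dir, set_attrs=False)

restored_mtime = os.path.getmtime(os.path.join(default_dir, "report.txt"))
fresh_mtime = os.path.getmtime(os.path.join(plain_dir, "report.txt"))
print(restored_mtime, fresh_mtime)
```

With the default, the extracted file carries the archived modification time. With ``set_attrs=False`` it keeps the time at which it was written to disk.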
.. method:: TarFile.extractfile(member)
@@ -355,15 +373,27 @@ object, see :ref:`tarinfo-objects` for details.
and :meth:`close`, and also supports iteration over its lines.
-.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None)
+.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, *, filter=None)
- Add the file *name* to the archive. *name* may be any type of file (directory,
- fifo, symbolic link, etc.). If given, *arcname* specifies an alternative name
- for the file in the archive. Directories are added recursively by default. This
- can be avoided by setting *recursive* to :const:`False`. If *exclude* is given,
- it must be a function that takes one filename argument and returns a boolean
- value. Depending on this value the respective file is either excluded
- (:const:`True`) or added (:const:`False`).
+ Add the file *name* to the archive. *name* may be any type of file
+ (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
+ alternative name for the file in the archive. Directories are added
+ recursively by default. This can be avoided by setting *recursive* to
+ :const:`False`. If *exclude* is given, it must be a function that takes one
+ filename argument and returns a boolean value. Depending on this value the
+ respective file is either excluded (:const:`True`) or added
+ (:const:`False`). If *filter* is specified it must be a keyword argument. It
+ should be a function that takes a :class:`TarInfo` object argument and
+ returns the changed :class:`TarInfo` object. If it instead returns
+ :const:`None` the :class:`TarInfo` object will be excluded from the
+ archive. See :ref:`tar-examples` for an example.
+
+ .. versionchanged:: 3.2
+ Added the *filter* parameter.
+
+ .. deprecated:: 3.2
+ The *exclude* parameter is deprecated, please use the *filter* parameter
+ instead.
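The paragraph above says that a *filter* function returning ``None`` excludes the member. A minimal sketch of that case follows; the ``skip_bytecode`` function and the file names are hypothetical and only serve the illustration:

```python
import os
import tarfile
import tempfile


def skip_bytecode(tarinfo):
    """Return None to drop compiled files, otherwise keep the member unchanged."""
    if tarinfo.name.endswith(".pyc"):
        return None
    return tarinfo


workdir = tempfile.mkdtemp()
project = os.path.join(workdir, "project")
os.makedirs(project)
for filename in ("main.py", "main.pyc", "README"):
    with open(os.path.join(project, filename), "w") as f:
        f.write("placeholder\n")

archive = os.path.join(workdir, "project.tar")
with tarfile.open(archive, "w") as tar:
    # filter must be passed by keyword; it is called once per member,
    # including the directory entry itself.
    tar.add(project, arcname="project", filter=skip_bytecode)

with tarfile.open(archive) as tar:
    names = sorted(tar.getnames())
print(names)
```

Because the function receives the :class:`TarInfo` object rather than a bare filename, the same hook can both exclude members and rewrite their metadata.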
.. method:: TarFile.addfile(tarinfo, fileobj=None)
@@ -430,11 +460,14 @@ It does *not* contain the file's data itself.
a :class:`TarInfo` object.
-.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='strict')
+.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')
Create a string buffer from a :class:`TarInfo` object. For information on the
arguments see the constructor of the :class:`TarFile` class.
+ .. versionchanged:: 3.2
+ Use ``'surrogateescape'`` as the default for the *errors* argument.
+
A ``TarInfo`` object has the following public data attributes:
@@ -582,6 +615,13 @@ How to create an uncompressed tar archive from a list of filenames::
tar.add(name)
tar.close()
+The same example using the :keyword:`with` statement::
+
+ import tarfile
+ with tarfile.open("sample.tar", "w") as tar:
+ for name in ["foo", "bar", "quux"]:
+ tar.add(name)
+
How to read a gzip compressed tar archive and display some member information::
import tarfile
@@ -596,6 +636,18 @@ How to read a gzip compressed tar archive and display some member information::
print("something else.")
tar.close()
+How to create an archive and reset the user information using the *filter*
+parameter in :meth:`TarFile.add`::
+
+ import tarfile
+ def reset(tarinfo):
+ tarinfo.uid = tarinfo.gid = 0
+ tarinfo.uname = tarinfo.gname = "root"
+ return tarinfo
+ tar = tarfile.open("sample.tar.gz", "w:gz")
+ tar.add("foo", filter=reset)
+ tar.close()
+
.. _tar-formats:
@@ -663,11 +715,12 @@ metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.
The *errors* argument defines how characters are treated that cannot be
-converted. Possible values are listed in section :ref:`codec-base-classes`. In
-read mode the default scheme is ``'replace'``. This avoids unexpected
-:exc:`UnicodeError` exceptions and guarantees that an archive can always be
-read. In write mode the default value for *errors* is ``'strict'``. This
-ensures that name information is not altered unnoticed.
-
-In case of writing :const:`PAX_FORMAT` archives, *encoding* is ignored because
-non-ASCII metadata is stored using *UTF-8*.
+converted. Possible values are listed in section :ref:`codec-base-classes`.
+The default scheme is ``'surrogateescape'`` which Python also uses for its
+file system calls, see :ref:`os-filenames`.
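A minimal sketch of what the ``'surrogateescape'`` default means in practice. The byte string ``raw_name`` is a made-up example of a file name that is not valid UTF-8. It is decoded to a string with a surrogate, written into a header block with :meth:`TarInfo.tobuf`, and read back with :meth:`TarInfo.frombuf`:

```python
import tarfile

# Latin-1 encoded name; the byte 0xE9 on its own is not valid UTF-8.
raw_name = b"caf\xe9.txt"
name = raw_name.decode("utf-8", "surrogateescape")

info = tarfile.TarInfo(name)

# With 'surrogateescape' the original bytes are written back unchanged.
header = info.tobuf(format=tarfile.GNU_FORMAT, encoding="utf-8",
                    errors="surrogateescape")
stored_name = header[:len(raw_name)]

# Reading the header block back yields the same string again.
parsed = tarfile.TarInfo.frombuf(header, "utf-8", "surrogateescape")

# With 'strict' the same name cannot be encoded at all.
try:
    info.tobuf(format=tarfile.GNU_FORMAT, encoding="utf-8", errors="strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(stored_name, parsed.name == name, strict_failed)
```

The name survives the round trip byte for byte, which matches how Python treats undecodable file names returned by the operating system.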
+
+In case of :const:`PAX_FORMAT` archives, *encoding* is generally not needed
+because all the metadata is stored using *UTF-8*. *encoding* is only used in
+the rare cases when binary pax headers are decoded or when strings with
+surrogate characters are stored.
+