summaryrefslogtreecommitdiffstats
path: root/Doc/library/tarfile.rst
diff options
context:
space:
mode:
authorPetr Viktorin <encukou@gmail.com>2023-05-15 16:53:58 (GMT)
committerGitHub <noreply@github.com>2023-05-15 16:53:58 (GMT)
commit98016f7c92aa4c1232c68bac1ed6646db31782ec (patch)
tree49afe0e62f68da7d4d387fbe479ce355dd58d954 /Doc/library/tarfile.rst
parent7cb3a4474731f52c74b19dd3c99ca06e227dae3b (diff)
downloadcpython-98016f7c92aa4c1232c68bac1ed6646db31782ec.zip
cpython-98016f7c92aa4c1232c68bac1ed6646db31782ec.tar.gz
cpython-98016f7c92aa4c1232c68bac1ed6646db31782ec.tar.bz2
[3.9] gh-102950: Implement PEP 706 – Filter for tarfile.extractall (GH-102953) (#104382)
Backport of c8c3956d905e019101038b018129a4c90c9c9b8f
Diffstat (limited to 'Doc/library/tarfile.rst')
-rw-r--r--Doc/library/tarfile.rst458
1 files changed, 445 insertions, 13 deletions
diff --git a/Doc/library/tarfile.rst b/Doc/library/tarfile.rst
index 0dc7c61..f1d9bd3 100644
--- a/Doc/library/tarfile.rst
+++ b/Doc/library/tarfile.rst
@@ -206,6 +206,38 @@ The :mod:`tarfile` module defines the following exceptions:
Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.
+.. exception:: FilterError
+
+ Base class for members :ref:`refused <tarfile-extraction-refuse>` by
+ filters.
+
+ .. attribute:: tarinfo
+
+ Information about the member that the filter refused to extract,
+ as :ref:`TarInfo <tarinfo-objects>`.
+
+.. exception:: AbsolutePathError
+
+ Raised to refuse extracting a member with an absolute path.
+
+.. exception:: OutsideDestinationError
+
+ Raised to refuse extracting a member outside the destination directory.
+
+.. exception:: SpecialFileError
+
+ Raised to refuse extracting a special file (e.g. a device or pipe).
+
+.. exception:: AbsoluteLinkError
+
+ Raised to refuse extracting a symbolic link with an absolute path.
+
+.. exception:: LinkOutsideDestinationError
+
+ Raised to refuse extracting a symbolic link pointing outside the destination
+ directory.
+
+
The following constants are available at the module level:
.. data:: ENCODING
@@ -316,11 +348,8 @@ be finalized; only the internally used file object will be closed. See the
*debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
messages). The messages are written to ``sys.stderr``.
- If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
- Nevertheless, they appear as error messages in the debug output, when debugging
- is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError`
- exceptions. If ``2``, all *non-fatal* errors are raised as :exc:`TarError`
- exceptions as well.
+ *errorlevel* controls how extraction errors are handled,
+ see :attr:`the corresponding attribute <~TarFile.errorlevel>`.
The *encoding* and *errors* arguments define the character encoding to be
used for reading or writing the archive and how conversion errors are going
@@ -387,7 +416,7 @@ be finalized; only the internally used file object will be closed. See the
available.
-.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False)
+.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False, filter=None)
Extract all members from the archive to the current working directory or
directory *path*. If optional *members* is given, it must be a subset of the
@@ -401,6 +430,12 @@ be finalized; only the internally used file object will be closed. See the
are used to set the owner/group for the extracted files. Otherwise, the named
values from the tarfile are used.
+ The *filter* argument, which was added in Python 3.9.17, specifies how
+ ``members`` are modified or rejected before extraction.
+ See :ref:`tarfile-extraction-filter` for details.
+ It is recommended to set this explicitly depending on which *tar* features
+ you need to support.
+
.. warning::
Never extract archives from untrusted sources without prior inspection.
@@ -408,14 +443,20 @@ be finalized; only the internally used file object will be closed. See the
that have absolute filenames starting with ``"/"`` or filenames with two
dots ``".."``.
+ Set ``filter='data'`` to prevent the most dangerous security issues,
+ and read the :ref:`tarfile-extraction-filter` section for details.
+
.. versionchanged:: 3.5
Added the *numeric_owner* parameter.
.. versionchanged:: 3.6
The *path* parameter accepts a :term:`path-like object`.
+ .. versionchanged:: 3.9.17
+ Added the *filter* parameter.
+
-.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False)
+.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False, filter=None)
Extract a member from the archive to the current working directory, using its
full name. Its file information is extracted as accurately as possible. *member*
@@ -423,9 +464,8 @@ be finalized; only the internally used file object will be closed. See the
directory using *path*. *path* may be a :term:`path-like object`.
File attributes (owner, mtime, mode) are set unless *set_attrs* is false.
- If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
- are used to set the owner/group for the extracted files. Otherwise, the named
- values from the tarfile are used.
+ The *numeric_owner* and *filter* arguments are the same as
+ for :meth:`extractall`.
.. note::
@@ -436,6 +476,9 @@ be finalized; only the internally used file object will be closed. See the
See the warning for :meth:`extractall`.
+ Set ``filter='data'`` to prevent the most dangerous security issues,
+ and read the :ref:`tarfile-extraction-filter` section for details.
+
.. versionchanged:: 3.2
Added the *set_attrs* parameter.
@@ -445,6 +488,9 @@ be finalized; only the internally used file object will be closed. See the
.. versionchanged:: 3.6
The *path* parameter accepts a :term:`path-like object`.
+ .. versionchanged:: 3.9.17
+ Added the *filter* parameter.
+
.. method:: TarFile.extractfile(member)
@@ -457,6 +503,57 @@ be finalized; only the internally used file object will be closed. See the
.. versionchanged:: 3.3
Return an :class:`io.BufferedReader` object.
+.. attribute:: TarFile.errorlevel
+ :type: int
+
+ If *errorlevel* is ``0``, errors are ignored when using :meth:`TarFile.extract`
+ and :meth:`TarFile.extractall`.
+ Nevertheless, they appear as error messages in the debug output when
+ *debug* is greater than 0.
+ If ``1`` (the default), all *fatal* errors are raised as :exc:`OSError` or
+ :exc:`FilterError` exceptions. If ``2``, all *non-fatal* errors are raised
+ as :exc:`TarError` exceptions as well.
+
+ Some exceptions, e.g. ones caused by wrong argument types or data
+ corruption, are always raised.
+
+ Custom :ref:`extraction filters <tarfile-extraction-filter>`
+ should raise :exc:`FilterError` for *fatal* errors
+ and :exc:`ExtractError` for *non-fatal* ones.
+
+ Note that when an exception is raised, the archive may be partially
+ extracted. It is the user’s responsibility to clean up.
+
+.. attribute:: TarFile.extraction_filter
+
+ .. versionadded:: 3.9.17
+
+ The :ref:`extraction filter <tarfile-extraction-filter>` used
+ as a default for the *filter* argument of :meth:`~TarFile.extract`
+ and :meth:`~TarFile.extractall`.
+
+ The attribute may be ``None`` or a callable.
+ String names are not allowed for this attribute, unlike the *filter*
+ argument to :meth:`~TarFile.extract`.
+
+ If ``extraction_filter`` is ``None`` (the default),
+ calling an extraction method without a *filter* argument will
+ use the :func:`fully_trusted <fully_trusted_filter>` filter for
+ compatibility with previous Python versions.
+
+ In Python 3.12+, leaving ``extraction_filter=None`` will emit a
+ ``DeprecationWarning``.
+
+ In Python 3.14+, leaving ``extraction_filter=None`` will cause
+ extraction methods to use the :func:`data <data_filter>` filter by default.
+
+ The attribute may be set on instances or overridden in subclasses.
+ It also is possible to set it on the ``TarFile`` class itself to set a
+ global default, although, since it affects all uses of *tarfile*,
+ it is best practice to only do so in top-level applications or
+ :mod:`site configuration <site>`.
+ To set a global default this way, a filter function needs to be wrapped in
+ :func:`staticmethod()` to prevent injection of a ``self`` argument.
.. method:: TarFile.add(name, arcname=None, recursive=True, *, filter=None)
@@ -532,7 +629,27 @@ permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.
:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
-:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.
+:meth:`~TarFile.getmember`, :meth:`~TarFile.getmembers` and
+:meth:`~TarFile.gettarinfo`.
+
+Modifying the objects returned by :meth:`~!TarFile.getmember` or
+:meth:`~!TarFile.getmembers` will affect all subsequent
+operations on the archive.
+For cases where this is unwanted, you can use :mod:`copy.copy() <copy>` or
+call the :meth:`~TarInfo.replace` method to create a modified copy in one step.
+
+Several attributes can be set to ``None`` to indicate that a piece of metadata
+is unused or unknown.
+Different :class:`TarInfo` methods handle ``None`` differently:
+
+- The :meth:`~TarFile.extract` or :meth:`~TarFile.extractall` methods will
+ ignore the corresponding metadata, leaving it set to a default.
+- :meth:`~TarFile.addfile` will fail.
+- :meth:`~TarFile.list` will print a placeholder string.
+
+
+.. versionchanged:: 3.9.17
+ Added :meth:`~TarInfo.replace` and handling of ``None``.
.. class:: TarInfo(name="")
@@ -566,24 +683,39 @@ A ``TarInfo`` object has the following public data attributes:
.. attribute:: TarInfo.name
+ :type: str
Name of the archive member.
.. attribute:: TarInfo.size
+ :type: int
Size in bytes.
.. attribute:: TarInfo.mtime
+ :type: int | float
+
+ Time of last modification in seconds since the :ref:`epoch <epoch>`,
+ as in :attr:`os.stat_result.st_mtime`.
- Time of last modification.
+ .. versionchanged:: 3.9.17
+ Can be set to ``None`` for :meth:`~TarFile.extract` and
+ :meth:`~TarFile.extractall`, causing extraction to skip applying this
+ attribute.
.. attribute:: TarInfo.mode
+ :type: int
- Permission bits.
+ Permission bits, as for :func:`os.chmod`.
+ .. versionchanged:: 3.9.17
+
+ Can be set to ``None`` for :meth:`~TarFile.extract` and
+ :meth:`~TarFile.extractall`, causing extraction to skip applying this
+ attribute.
.. attribute:: TarInfo.type
@@ -595,35 +727,76 @@ A ``TarInfo`` object has the following public data attributes:
.. attribute:: TarInfo.linkname
+ :type: str
Name of the target file name, which is only present in :class:`TarInfo` objects
of type :const:`LNKTYPE` and :const:`SYMTYPE`.
.. attribute:: TarInfo.uid
+ :type: int
User ID of the user who originally stored this member.
+ .. versionchanged:: 3.9.17
+
+ Can be set to ``None`` for :meth:`~TarFile.extract` and
+ :meth:`~TarFile.extractall`, causing extraction to skip applying this
+ attribute.
.. attribute:: TarInfo.gid
+ :type: int
Group ID of the user who originally stored this member.
+ .. versionchanged:: 3.9.17
+
+ Can be set to ``None`` for :meth:`~TarFile.extract` and
+ :meth:`~TarFile.extractall`, causing extraction to skip applying this
+ attribute.
.. attribute:: TarInfo.uname
+ :type: str
User name.
+ .. versionchanged:: 3.9.17
+
+ Can be set to ``None`` for :meth:`~TarFile.extract` and
+ :meth:`~TarFile.extractall`, causing extraction to skip applying this
+ attribute.
.. attribute:: TarInfo.gname
+ :type: str
Group name.
+ .. versionchanged:: 3.9.17
+
+ Can be set to ``None`` for :meth:`~TarFile.extract` and
+ :meth:`~TarFile.extractall`, causing extraction to skip applying this
+ attribute.
.. attribute:: TarInfo.pax_headers
+ :type: dict
A dictionary containing key-value pairs of an associated pax extended header.
+.. method:: TarInfo.replace(name=..., mtime=..., mode=..., linkname=...,
+ uid=..., gid=..., uname=..., gname=...,
+ deep=True)
+
+ .. versionadded:: 3.9.17
+
+ Return a *new* copy of the :class:`!TarInfo` object with the given attributes
+ changed. For example, to return a ``TarInfo`` with the group name set to
+ ``'staff'``, use::
+
+ new_tarinfo = old_tarinfo.replace(gname='staff')
+
+ By default, a deep copy is made.
+ If *deep* is false, the copy is shallow, i.e. ``pax_headers``
+ and any custom attributes are shared with the original ``TarInfo`` object.
A :class:`TarInfo` object also provides some convenient query methods:
@@ -673,9 +846,259 @@ A :class:`TarInfo` object also provides some convenient query methods:
Return :const:`True` if it is one of character device, block device or FIFO.
+.. _tarfile-extraction-filter:
+
+Extraction filters
+------------------
+
+.. versionadded:: 3.9.17
+
+The *tar* format is designed to capture all details of a UNIX-like filesystem,
+which makes it very powerful.
+Unfortunately, the features make it easy to create tar files that have
+unintended -- and possibly malicious -- effects when extracted.
+For example, extracting a tar file can overwrite arbitrary files in various
+ways (e.g. by using absolute paths, ``..`` path components, or symlinks that
+affect later members).
+
+In most cases, the full functionality is not needed.
+Therefore, *tarfile* supports extraction filters: a mechanism to limit
+functionality, and thus mitigate some of the security issues.
+
+.. seealso::
+
+ :pep:`706`
+ Contains further motivation and rationale behind the design.
+
+The *filter* argument to :meth:`TarFile.extract` or :meth:`~TarFile.extractall`
+can be:
+
+* the string ``'fully_trusted'``: Honor all metadata as specified in the
+ archive.
+ Should be used if the user trusts the archive completely, or implements
+ their own complex verification.
+
+* the string ``'tar'``: Honor most *tar*-specific features (i.e. features of
+ UNIX-like filesystems), but block features that are very likely to be
+ surprising or malicious. See :func:`tar_filter` for details.
+
+* the string ``'data'``: Ignore or block most features specific to UNIX-like
+ filesystems. Intended for extracting cross-platform data archives.
+ See :func:`data_filter` for details.
+
+* ``None`` (default): Use :attr:`TarFile.extraction_filter`.
+
+ If that is also ``None`` (the default), the ``'fully_trusted'``
+ filter will be used (for compatibility with earlier versions of Python).
+
+ In Python 3.12, the default will emit a ``DeprecationWarning``.
+
+ In Python 3.14, the ``'data'`` filter will become the default instead.
+ It's possible to switch earlier; see :attr:`TarFile.extraction_filter`.
+
+* A callable which will be called for each extracted member with a
+ :ref:`TarInfo <tarinfo-objects>` describing the member and the destination
+ path to where the archive is extracted (i.e. the same path is used for all
+ members)::
+
+ filter(/, member: TarInfo, path: str) -> TarInfo | None
+
+ The callable is called just before each member is extracted, so it can
+ take the current state of the disk into account.
+ It can:
+
+ - return a :class:`TarInfo` object which will be used instead of the metadata
+ in the archive, or
+ - return ``None``, in which case the member will be skipped, or
+ - raise an exception to abort the operation or skip the member,
+ depending on :attr:`~TarFile.errorlevel`.
+ Note that when extraction is aborted, :meth:`~TarFile.extractall` may leave
+ the archive partially extracted. It does not attempt to clean up.
+
+Default named filters
+~~~~~~~~~~~~~~~~~~~~~
+
+The pre-defined, named filters are available as functions, so they can be
+reused in custom filters:
+
+.. function:: fully_trusted_filter(/, member, path)
+
+ Return *member* unchanged.
+
+ This implements the ``'fully_trusted'`` filter.
+
+.. function:: tar_filter(/, member, path)
+
+ Implements the ``'tar'`` filter.
+
+ - Strip leading slashes (``/`` and :attr:`os.sep`) from filenames.
+ - :ref:`Refuse <tarfile-extraction-refuse>` to extract files with absolute
+ paths (in case the name is absolute
+ even after stripping slashes, e.g. ``C:/foo`` on Windows).
+ This raises :class:`~tarfile.AbsolutePathError`.
+ - :ref:`Refuse <tarfile-extraction-refuse>` to extract files whose absolute
+ path (after following symlinks) would end up outside the destination.
+ This raises :class:`~tarfile.OutsideDestinationError`.
+ - Clear high mode bits (setuid, setgid, sticky) and group/other write bits
+ (:attr:`~stat.S_IWGRP`|:attr:`~stat.S_IWOTH`).
+
+ Return the modified ``TarInfo`` member.
+
+.. function:: data_filter(/, member, path)
+
+ Implements the ``'data'`` filter.
+ In addition to what ``tar_filter`` does:
+
+ - :ref:`Refuse <tarfile-extraction-refuse>` to extract links (hard or soft)
+ that link to absolute paths, or ones that link outside the destination.
+
+ This raises :class:`~tarfile.AbsoluteLinkError` or
+ :class:`~tarfile.LinkOutsideDestinationError`.
+
+ Note that such files are refused even on platforms that do not support
+ symbolic links.
+
+ - :ref:`Refuse <tarfile-extraction-refuse>` to extract device files
+ (including pipes).
+ This raises :class:`~tarfile.SpecialFileError`.
+
+ - For regular files, including hard links:
+
+ - Set the owner read and write permissions
+ (:attr:`~stat.S_IRUSR`|:attr:`~stat.S_IWUSR`).
+ - Remove the group & other executable permission
+ (:attr:`~stat.S_IXGRP`|:attr:`~stat.S_IXOTH`)
+ if the owner doesn’t have it (:attr:`~stat.S_IXUSR`).
+
+ - For other files (directories), set ``mode`` to ``None``, so
+ that extraction methods skip applying permission bits.
+ - Set user and group info (``uid``, ``gid``, ``uname``, ``gname``)
+ to ``None``, so that extraction methods skip setting it.
+
+ Return the modified ``TarInfo`` member.
+
+
+.. _tarfile-extraction-refuse:
+
+Filter errors
+~~~~~~~~~~~~~
+
+When a filter refuses to extract a file, it will raise an appropriate exception,
+a subclass of :class:`~tarfile.FilterError`.
+This will abort the extraction if :attr:`TarFile.errorlevel` is 1 or more.
+With ``errorlevel=0`` the error will be logged and the member will be skipped,
+but extraction will continue.
+
+
+Hints for further verification
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Even with ``filter='data'``, *tarfile* is not suited for extracting untrusted
+files without prior inspection.
+Among other issues, the pre-defined filters do not prevent denial-of-service
+attacks. Users should do additional checks.
+
+Here is an incomplete list of things to consider:
+
+* Extract to a :func:`new temporary directory <tempfile.mkdtemp>`
+ to prevent e.g. exploiting pre-existing links, and to make it easier to
+ clean up after a failed extraction.
+* When working with untrusted data, use external (e.g. OS-level) limits on
+ disk, memory and CPU usage.
+* Check filenames against an allow-list of characters
+ (to filter out control characters, confusables, foreign path separators,
+ etc.).
+* Check that filenames have expected extensions (discouraging files that
+ execute when you “click on them”, or extension-less files like Windows special device names).
+* Limit the number of extracted files, total size of extracted data,
+ filename length (including symlink length), and size of individual files.
+* Check for files that would be shadowed on case-insensitive filesystems.
+
+Also note that:
+
+* Tar files may contain multiple versions of the same file.
+ Later ones are expected to overwrite any earlier ones.
+ This feature is crucial to allow updating tape archives, but can be abused
+ maliciously.
+* *tarfile* does not protect against issues with “live” data,
+ e.g. an attacker tinkering with the destination (or source) directory while
+ extraction (or archiving) is in progress.
+
+
+Supporting older Python versions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Extraction filters were added to Python 3.12, and are backported to older
+versions as security updates.
+To check whether the feature is available, use e.g.
+``hasattr(tarfile, 'data_filter')`` rather than checking the Python version.
+
+The following examples show how to support Python versions with and without
+the feature.
+Note that setting ``extraction_filter`` will affect any subsequent operations.
+
+* Fully trusted archive::
+
+ my_tarfile.extraction_filter = (lambda member, path: member)
+ my_tarfile.extractall()
+
+* Use the ``'data'`` filter if available, but revert to Python 3.11 behavior
+ (``'fully_trusted'``) if this feature is not available::
+
+ my_tarfile.extraction_filter = getattr(tarfile, 'data_filter',
+ (lambda member, path: member))
+ my_tarfile.extractall()
+
+* Use the ``'data'`` filter; *fail* if it is not available::
+
+ my_tarfile.extractall(filter=tarfile.data_filter)
+
+ or::
+
+ my_tarfile.extraction_filter = tarfile.data_filter
+ my_tarfile.extractall()
+
+* Use the ``'data'`` filter; *warn* if it is not available::
+
+ if hasattr(tarfile, 'data_filter'):
+ my_tarfile.extractall(filter='data')
+ else:
+ # remove this when no longer needed
+ warn_the_user('Extracting may be unsafe; consider updating Python')
+ my_tarfile.extractall()
+
+
+Stateful extraction filter example
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+While *tarfile*'s extraction methods take a simple *filter* callable,
+custom filters may be more complex objects with an internal state.
+It may be useful to write these as context managers, to be used like this::
+
+ with StatefulFilter() as filter_func:
+ tar.extractall(path, filter=filter_func)
+
+Such a filter can be written as, for example::
+
+ class StatefulFilter:
+ def __init__(self):
+ self.file_count = 0
+
+ def __enter__(self):
+ return self
+
+ def __call__(self, member, path):
+ self.file_count += 1
+ return member
+
+ def __exit__(self, *exc_info):
+ print(f'{self.file_count} files extracted')
+
+
.. _tarfile-commandline:
.. program:: tarfile
+
Command-Line Interface
----------------------
@@ -745,6 +1168,15 @@ Command-line options
Verbose output.
+.. cmdoption:: --filter <filtername>
+
+ Specifies the *filter* for ``--extract``.
+ See :ref:`tarfile-extraction-filter` for details.
+ Only string names are accepted (that is, ``fully_trusted``, ``tar``,
+ and ``data``).
+
+ .. versionadded:: 3.9.17
+
.. _tar-examples:
Examples