diff options
author | Petr Viktorin <encukou@gmail.com> | 2023-05-15 16:53:58 (GMT) |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-05-15 16:53:58 (GMT) |
commit | 98016f7c92aa4c1232c68bac1ed6646db31782ec (patch) | |
tree | 49afe0e62f68da7d4d387fbe479ce355dd58d954 /Doc/library/tarfile.rst | |
parent | 7cb3a4474731f52c74b19dd3c99ca06e227dae3b (diff) | |
download | cpython-98016f7c92aa4c1232c68bac1ed6646db31782ec.zip cpython-98016f7c92aa4c1232c68bac1ed6646db31782ec.tar.gz cpython-98016f7c92aa4c1232c68bac1ed6646db31782ec.tar.bz2 |
[3.9] gh-102950: Implement PEP 706 – Filter for tarfile.extractall (GH-102953) (#104382)
Backport of c8c3956d905e019101038b018129a4c90c9c9b8f
Diffstat (limited to 'Doc/library/tarfile.rst')
-rw-r--r-- | Doc/library/tarfile.rst | 458 |
1 files changed, 445 insertions, 13 deletions
diff --git a/Doc/library/tarfile.rst b/Doc/library/tarfile.rst index 0dc7c61..f1d9bd3 100644 --- a/Doc/library/tarfile.rst +++ b/Doc/library/tarfile.rst @@ -206,6 +206,38 @@ The :mod:`tarfile` module defines the following exceptions: Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid. +.. exception:: FilterError + + Base class for members :ref:`refused <tarfile-extraction-refuse>` by + filters. + + .. attribute:: tarinfo + + Information about the member that the filter refused to extract, + as :ref:`TarInfo <tarinfo-objects>`. + +.. exception:: AbsolutePathError + + Raised to refuse extracting a member with an absolute path. + +.. exception:: OutsideDestinationError + + Raised to refuse extracting a member outside the destination directory. + +.. exception:: SpecialFileError + + Raised to refuse extracting a special file (e.g. a device or pipe). + +.. exception:: AbsoluteLinkError + + Raised to refuse extracting a symbolic link with an absolute path. + +.. exception:: LinkOutsideDestinationError + + Raised to refuse extracting a symbolic link pointing outside the destination + directory. + + The following constants are available at the module level: .. data:: ENCODING @@ -316,11 +348,8 @@ be finalized; only the internally used file object will be closed. See the *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug messages). The messages are written to ``sys.stderr``. - If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`. - Nevertheless, they appear as error messages in the debug output, when debugging - is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError` - exceptions. If ``2``, all *non-fatal* errors are raised as :exc:`TarError` - exceptions as well. + *errorlevel* controls how extraction errors are handled, + see :attr:`the corresponding attribute <~TarFile.errorlevel>`. The *encoding* and *errors* arguments define the character encoding to be used for reading or writing the archive and how conversion errors are going @@ -387,7 +416,7 @@ be finalized; only the internally used file object will be closed. See the available. -.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False) +.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False, filter=None) Extract all members from the archive to the current working directory or directory *path*. If optional *members* is given, it must be a subset of the @@ -401,6 +430,12 @@ be finalized; only the internally used file object will be closed. See the are used to set the owner/group for the extracted files. Otherwise, the named values from the tarfile are used. + The *filter* argument, which was added in Python 3.9.17, specifies how + ``members`` are modified or rejected before extraction. + See :ref:`tarfile-extraction-filter` for details. + It is recommended to set this explicitly depending on which *tar* features + you need to support. + .. warning:: Never extract archives from untrusted sources without prior inspection. @@ -408,14 +443,20 @@ be finalized; only the internally used file object will be closed. See the that have absolute filenames starting with ``"/"`` or filenames with two dots ``".."``. + Set ``filter='data'`` to prevent the most dangerous security issues, + and read the :ref:`tarfile-extraction-filter` section for details. + .. versionchanged:: 3.5 Added the *numeric_owner* parameter. .. versionchanged:: 3.6 The *path* parameter accepts a :term:`path-like object`. + .. versionchanged:: 3.9.17 + Added the *filter* parameter. + -.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False) +.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False, filter=None) Extract a member from the archive to the current working directory, using its full name. Its file information is extracted as accurately as possible. *member* @@ -423,9 +464,8 @@ be finalized; only the internally used file object will be closed. See the directory using *path*. *path* may be a :term:`path-like object`. File attributes (owner, mtime, mode) are set unless *set_attrs* is false. - If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile - are used to set the owner/group for the extracted files. Otherwise, the named - values from the tarfile are used. + The *numeric_owner* and *filter* arguments are the same as + for :meth:`extractall`. .. note:: @@ -436,6 +476,9 @@ be finalized; only the internally used file object will be closed. See the See the warning for :meth:`extractall`. + Set ``filter='data'`` to prevent the most dangerous security issues, + and read the :ref:`tarfile-extraction-filter` section for details. + .. versionchanged:: 3.2 Added the *set_attrs* parameter. @@ -445,6 +488,9 @@ be finalized; only the internally used file object will be closed. See the .. versionchanged:: 3.6 The *path* parameter accepts a :term:`path-like object`. + .. versionchanged:: 3.9.17 + Added the *filter* parameter. + .. method:: TarFile.extractfile(member) @@ -457,6 +503,57 @@ be finalized; only the internally used file object will be closed. See the .. versionchanged:: 3.3 Return an :class:`io.BufferedReader` object. +.. attribute:: TarFile.errorlevel + :type: int + + If *errorlevel* is ``0``, errors are ignored when using :meth:`TarFile.extract` + and :meth:`TarFile.extractall`. + Nevertheless, they appear as error messages in the debug output when + *debug* is greater than 0. + If ``1`` (the default), all *fatal* errors are raised as :exc:`OSError` or + :exc:`FilterError` exceptions. If ``2``, all *non-fatal* errors are raised + as :exc:`TarError` exceptions as well. + + Some exceptions, e.g. ones caused by wrong argument types or data + corruption, are always raised. + + Custom :ref:`extraction filters <tarfile-extraction-filter>` + should raise :exc:`FilterError` for *fatal* errors + and :exc:`ExtractError` for *non-fatal* ones. + + Note that when an exception is raised, the archive may be partially + extracted. It is the user’s responsibility to clean up. + +.. attribute:: TarFile.extraction_filter + + .. versionadded:: 3.9.17 + + The :ref:`extraction filter <tarfile-extraction-filter>` used + as a default for the *filter* argument of :meth:`~TarFile.extract` + and :meth:`~TarFile.extractall`. + + The attribute may be ``None`` or a callable. + String names are not allowed for this attribute, unlike the *filter* + argument to :meth:`~TarFile.extract`. + + If ``extraction_filter`` is ``None`` (the default), + calling an extraction method without a *filter* argument will + use the :func:`fully_trusted <fully_trusted_filter>` filter for + compatibility with previous Python versions. + + In Python 3.12+, leaving ``extraction_filter=None`` will emit a + ``DeprecationWarning``. + + In Python 3.14+, leaving ``extraction_filter=None`` will cause + extraction methods to use the :func:`data <data_filter>` filter by default. + + The attribute may be set on instances or overridden in subclasses. + It also is possible to set it on the ``TarFile`` class itself to set a + global default, although, since it affects all uses of *tarfile*, + it is best practice to only do so in top-level applications or + :mod:`site configuration <site>`. + To set a global default this way, a filter function needs to be wrapped in + :func:`staticmethod()` to prevent injection of a ``self`` argument. .. method:: TarFile.add(name, arcname=None, recursive=True, *, filter=None) @@ -532,7 +629,27 @@ permissions, owner etc.), it provides some useful methods to determine its type. It does *not* contain the file's data itself. :class:`TarInfo` objects are returned by :class:`TarFile`'s methods -:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`. +:meth:`~TarFile.getmember`, :meth:`~TarFile.getmembers` and +:meth:`~TarFile.gettarinfo`. + +Modifying the objects returned by :meth:`~!TarFile.getmember` or +:meth:`~!TarFile.getmembers` will affect all subsequent +operations on the archive. +For cases where this is unwanted, you can use :mod:`copy.copy() <copy>` or +call the :meth:`~TarInfo.replace` method to create a modified copy in one step. + +Several attributes can be set to ``None`` to indicate that a piece of metadata +is unused or unknown. +Different :class:`TarInfo` methods handle ``None`` differently: + +- The :meth:`~TarFile.extract` or :meth:`~TarFile.extractall` methods will + ignore the corresponding metadata, leaving it set to a default. +- :meth:`~TarFile.addfile` will fail. +- :meth:`~TarFile.list` will print a placeholder string. + + +.. versionchanged:: 3.9.17 + Added :meth:`~TarInfo.replace` and handling of ``None``. .. class:: TarInfo(name="") @@ -566,24 +683,39 @@ A ``TarInfo`` object has the following public data attributes: .. attribute:: TarInfo.name + :type: str Name of the archive member. .. attribute:: TarInfo.size + :type: int Size in bytes. .. attribute:: TarInfo.mtime + :type: int | float + + Time of last modification in seconds since the :ref:`epoch <epoch>`, + as in :attr:`os.stat_result.st_mtime`. - Time of last modification. + .. versionchanged:: 3.9.17 + Can be set to ``None`` for :meth:`~TarFile.extract` and + :meth:`~TarFile.extractall`, causing extraction to skip applying this + attribute. .. attribute:: TarInfo.mode + :type: int - Permission bits. + Permission bits, as for :func:`os.chmod`. + .. versionchanged:: 3.9.17 + + Can be set to ``None`` for :meth:`~TarFile.extract` and + :meth:`~TarFile.extractall`, causing extraction to skip applying this + attribute. .. attribute:: TarInfo.type @@ -595,35 +727,76 @@ A ``TarInfo`` object has the following public data attributes: .. attribute:: TarInfo.linkname + :type: str Name of the target file name, which is only present in :class:`TarInfo` objects of type :const:`LNKTYPE` and :const:`SYMTYPE`. .. attribute:: TarInfo.uid + :type: int User ID of the user who originally stored this member. + .. versionchanged:: 3.9.17 + + Can be set to ``None`` for :meth:`~TarFile.extract` and + :meth:`~TarFile.extractall`, causing extraction to skip applying this + attribute. .. attribute:: TarInfo.gid + :type: int Group ID of the user who originally stored this member. + .. versionchanged:: 3.9.17 + + Can be set to ``None`` for :meth:`~TarFile.extract` and + :meth:`~TarFile.extractall`, causing extraction to skip applying this + attribute. .. attribute:: TarInfo.uname + :type: str User name. + .. versionchanged:: 3.9.17 + + Can be set to ``None`` for :meth:`~TarFile.extract` and + :meth:`~TarFile.extractall`, causing extraction to skip applying this + attribute. .. attribute:: TarInfo.gname + :type: str Group name. + .. versionchanged:: 3.9.17 + + Can be set to ``None`` for :meth:`~TarFile.extract` and + :meth:`~TarFile.extractall`, causing extraction to skip applying this + attribute. .. attribute:: TarInfo.pax_headers + :type: dict A dictionary containing key-value pairs of an associated pax extended header. +.. method:: TarInfo.replace(name=..., mtime=..., mode=..., linkname=..., + uid=..., gid=..., uname=..., gname=..., + deep=True) + + .. versionadded:: 3.9.17 + + Return a *new* copy of the :class:`!TarInfo` object with the given attributes + changed. For example, to return a ``TarInfo`` with the group name set to + ``'staff'``, use:: + + new_tarinfo = old_tarinfo.replace(gname='staff') + + By default, a deep copy is made. + If *deep* is false, the copy is shallow, i.e. ``pax_headers`` + and any custom attributes are shared with the original ``TarInfo`` object. A :class:`TarInfo` object also provides some convenient query methods: @@ -673,9 +846,259 @@ A :class:`TarInfo` object also provides some convenient query methods: Return :const:`True` if it is one of character device, block device or FIFO. +.. _tarfile-extraction-filter: + +Extraction filters +------------------ + +.. versionadded:: 3.9.17 + +The *tar* format is designed to capture all details of a UNIX-like filesystem, +which makes it very powerful. +Unfortunately, the features make it easy to create tar files that have +unintended -- and possibly malicious -- effects when extracted. +For example, extracting a tar file can overwrite arbitrary files in various +ways (e.g. by using absolute paths, ``..`` path components, or symlinks that +affect later members). + +In most cases, the full functionality is not needed. +Therefore, *tarfile* supports extraction filters: a mechanism to limit +functionality, and thus mitigate some of the security issues. + +.. seealso:: + + :pep:`706` + Contains further motivation and rationale behind the design. + +The *filter* argument to :meth:`TarFile.extract` or :meth:`~TarFile.extractall` +can be: + +* the string ``'fully_trusted'``: Honor all metadata as specified in the + archive. + Should be used if the user trusts the archive completely, or implements + their own complex verification. + +* the string ``'tar'``: Honor most *tar*-specific features (i.e. features of + UNIX-like filesystems), but block features that are very likely to be + surprising or malicious. See :func:`tar_filter` for details. + +* the string ``'data'``: Ignore or block most features specific to UNIX-like + filesystems. Intended for extracting cross-platform data archives. + See :func:`data_filter` for details. + +* ``None`` (default): Use :attr:`TarFile.extraction_filter`. + + If that is also ``None`` (the default), the ``'fully_trusted'`` + filter will be used (for compatibility with earlier versions of Python). + + In Python 3.12, the default will emit a ``DeprecationWarning``. + + In Python 3.14, the ``'data'`` filter will become the default instead. + It's possible to switch earlier; see :attr:`TarFile.extraction_filter`. + +* A callable which will be called for each extracted member with a + :ref:`TarInfo <tarinfo-objects>` describing the member and the destination + path to where the archive is extracted (i.e. the same path is used for all + members):: + + filter(/, member: TarInfo, path: str) -> TarInfo | None + + The callable is called just before each member is extracted, so it can + take the current state of the disk into account. + It can: + + - return a :class:`TarInfo` object which will be used instead of the metadata + in the archive, or + - return ``None``, in which case the member will be skipped, or + - raise an exception to abort the operation or skip the member, + depending on :attr:`~TarFile.errorlevel`. + Note that when extraction is aborted, :meth:`~TarFile.extractall` may leave + the archive partially extracted. It does not attempt to clean up. + +Default named filters +~~~~~~~~~~~~~~~~~~~~~ + +The pre-defined, named filters are available as functions, so they can be +reused in custom filters: + +.. function:: fully_trusted_filter(/, member, path) + + Return *member* unchanged. + + This implements the ``'fully_trusted'`` filter. + +.. function:: tar_filter(/, member, path) + + Implements the ``'tar'`` filter. + + - Strip leading slashes (``/`` and :attr:`os.sep`) from filenames. + - :ref:`Refuse <tarfile-extraction-refuse>` to extract files with absolute + paths (in case the name is absolute + even after stripping slashes, e.g. ``C:/foo`` on Windows). + This raises :class:`~tarfile.AbsolutePathError`. + - :ref:`Refuse <tarfile-extraction-refuse>` to extract files whose absolute + path (after following symlinks) would end up outside the destination. + This raises :class:`~tarfile.OutsideDestinationError`. + - Clear high mode bits (setuid, setgid, sticky) and group/other write bits + (:attr:`~stat.S_IWGRP`|:attr:`~stat.S_IWOTH`). + + Return the modified ``TarInfo`` member. + +.. function:: data_filter(/, member, path) + + Implements the ``'data'`` filter. + In addition to what ``tar_filter`` does: + + - :ref:`Refuse <tarfile-extraction-refuse>` to extract links (hard or soft) + that link to absolute paths, or ones that link outside the destination. + + This raises :class:`~tarfile.AbsoluteLinkError` or + :class:`~tarfile.LinkOutsideDestinationError`. + + Note that such files are refused even on platforms that do not support + symbolic links. + + - :ref:`Refuse <tarfile-extraction-refuse>` to extract device files + (including pipes). + This raises :class:`~tarfile.SpecialFileError`. + + - For regular files, including hard links: + + - Set the owner read and write permissions + (:attr:`~stat.S_IRUSR`|:attr:`~stat.S_IWUSR`). + - Remove the group & other executable permission + (:attr:`~stat.S_IXGRP`|:attr:`~stat.S_IXOTH`) + if the owner doesn’t have it (:attr:`~stat.S_IXUSR`). + + - For other files (directories), set ``mode`` to ``None``, so + that extraction methods skip applying permission bits. + - Set user and group info (``uid``, ``gid``, ``uname``, ``gname``) + to ``None``, so that extraction methods skip setting it. + + Return the modified ``TarInfo`` member. + + +.. _tarfile-extraction-refuse: + +Filter errors +~~~~~~~~~~~~~ + +When a filter refuses to extract a file, it will raise an appropriate exception, +a subclass of :class:`~tarfile.FilterError`. +This will abort the extraction if :attr:`TarFile.errorlevel` is 1 or more. +With ``errorlevel=0`` the error will be logged and the member will be skipped, +but extraction will continue. + + +Hints for further verification +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Even with ``filter='data'``, *tarfile* is not suited for extracting untrusted +files without prior inspection. +Among other issues, the pre-defined filters do not prevent denial-of-service +attacks. Users should do additional checks. + +Here is an incomplete list of things to consider: + +* Extract to a :func:`new temporary directory <tempfile.mkdtemp>` + to prevent e.g. exploiting pre-existing links, and to make it easier to + clean up after a failed extraction. +* When working with untrusted data, use external (e.g. OS-level) limits on + disk, memory and CPU usage. +* Check filenames against an allow-list of characters + (to filter out control characters, confusables, foreign path separators, + etc.). +* Check that filenames have expected extensions (discouraging files that + execute when you “click on them”, or extension-less files like Windows special device names). +* Limit the number of extracted files, total size of extracted data, + filename length (including symlink length), and size of individual files. +* Check for files that would be shadowed on case-insensitive filesystems. + +Also note that: + +* Tar files may contain multiple versions of the same file. + Later ones are expected to overwrite any earlier ones. + This feature is crucial to allow updating tape archives, but can be abused + maliciously. +* *tarfile* does not protect against issues with “live” data, + e.g. an attacker tinkering with the destination (or source) directory while + extraction (or archiving) is in progress. + + +Supporting older Python versions +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Extraction filters were added to Python 3.12, and are backported to older +versions as security updates. +To check whether the feature is available, use e.g. +``hasattr(tarfile, 'data_filter')`` rather than checking the Python version. + +The following examples show how to support Python versions with and without +the feature. +Note that setting ``extraction_filter`` will affect any subsequent operations. + +* Fully trusted archive:: + + my_tarfile.extraction_filter = (lambda member, path: member) + my_tarfile.extractall() + +* Use the ``'data'`` filter if available, but revert to Python 3.11 behavior + (``'fully_trusted'``) if this feature is not available:: + + my_tarfile.extraction_filter = getattr(tarfile, 'data_filter', + (lambda member, path: member)) + my_tarfile.extractall() + +* Use the ``'data'`` filter; *fail* if it is not available:: + + my_tarfile.extractall(filter=tarfile.data_filter) + + or:: + + my_tarfile.extraction_filter = tarfile.data_filter + my_tarfile.extractall() + +* Use the ``'data'`` filter; *warn* if it is not available:: + + if hasattr(tarfile, 'data_filter'): + my_tarfile.extractall(filter='data') + else: + # remove this when no longer needed + warn_the_user('Extracting may be unsafe; consider updating Python') + my_tarfile.extractall() + + +Stateful extraction filter example +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +While *tarfile*'s extraction methods take a simple *filter* callable, +custom filters may be more complex objects with an internal state. +It may be useful to write these as context managers, to be used like this:: + + with StatefulFilter() as filter_func: + tar.extractall(path, filter=filter_func) + +Such a filter can be written as, for example:: + + class StatefulFilter: + def __init__(self): + self.file_count = 0 + + def __enter__(self): + return self + + def __call__(self, member, path): + self.file_count += 1 + return member + + def __exit__(self, *exc_info): + print(f'{self.file_count} files extracted') + + .. _tarfile-commandline: .. program:: tarfile + Command-Line Interface ---------------------- @@ -745,6 +1168,15 @@ Command-line options Verbose output. +.. cmdoption:: --filter <filtername> + + Specifies the *filter* for ``--extract``. + See :ref:`tarfile-extraction-filter` for details. + Only string names are accepted (that is, ``fully_trusted``, ``tar``, + and ``data``). + + .. versionadded:: 3.9.17 + .. _tar-examples: Examples |