path: root/Lib/tarfile.py
Commit message | Author | Age | Files | Lines
* gh-121267: Improve performance of tarfile (#121267) (#121269)
  Author: Johan Förberg | Age: 2024-10-30 | Files: 1 | Lines: -8/+17
    Tarfile in the default write mode spends much of its time resolving UIDs into usernames and GIDs into group names. By caching these mappings, a significant speedup can be achieved. In my simple benchmark[1], this extra caching speeds up tarfile by 8x.
    [1] https://gist.github.com/jforberg/86af759c796199740c31547ae828aef2
    Co-authored-by: Tian Gao <gaogaotiantian@hotmail.com>
    Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
    Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
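
The caching idea in this commit message can be sketched as follows. This is a minimal illustration, not the code from the commit: `make_cached_lookup` and `slow_uid_to_name` are hypothetical names, and the small dictionary stands in for an expensive system lookup such as `pwd.getpwuid`.

```python
def make_cached_lookup(lookup):
    """Wrap `lookup` so that each distinct key is resolved at most once."""
    cache = {}

    def cached(key):
        if key not in cache:
            try:
                cache[key] = lookup(key)
            except KeyError:
                # Remember failures too, so unknown IDs are not retried.
                cache[key] = ""
        return cache[key]

    return cached


# Hypothetical stand-in for a slow UID -> username lookup.
calls = []


def slow_uid_to_name(uid):
    calls.append(uid)
    return {0: "root", 1000: "alice"}[uid]


uid_to_name = make_cached_lookup(slow_uid_to_name)
names = [uid_to_name(uid) for uid in (1000, 1000, 0, 42, 42)]
```

Five lookups reach the slow function only three times, once per distinct UID, which is the effect the commit message describes for archives with many files owned by the same user.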
* gh-121285: Remove backtracking when parsing tarfile headers (GH-121286)
  Author: Seth Michael Larson | Age: 2024-08-31 | Files: 1 | Lines: -35/+68
    * Remove backtracking when parsing tarfile headers
    * Rewrite PAX header parsing to be stricter
    * Optimize parsing of GNU extended sparse headers v0.0
    Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru>
    Co-authored-by: Gregory P. Smith <greg@krypto.org>
* gh-121999: Change default tarfile filter to 'data' (GH-122002)
  Author: WilliamRoyNelson | Age: 2024-07-26 | Files: 1 | Lines: -7/+1
    Co-authored-by: Tomas R <tomas.roun8@gmail.com>
    Co-authored-by: Scott Odle <scott@sjodle.com>
    Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
    Co-authored-by: Petr Viktorin <encukou@gmail.com>
* gh-118673: Remove shebang and executable bits from stdlib modules. (#119658)
  Author: Jason R. Coombs | Age: 2024-05-29 | Files: 1 | Lines: -1/+0
    * gh-118673: Remove shebang and executable bits from stdlib modules.
    * Removed shebangs and exe bits on turtledemo scripts.
    The setting was inappropriate for '__main__' and inconsistent across the other modules. The scripts can still be executed directly by invoking with the desired interpreter.
* Remove almost all unpaired backticks in docstrings (#119231)
  Author: Geoffrey Thomas | Age: 2024-05-22 | Files: 1 | Lines: -30/+30
    As reported in #117847 and #115366, an unpaired backtick in a docstring tends to confuse e.g. Sphinx running on subclasses of standard library objects, and the typographic style of using a backtick as an opening quote is no longer in favor. Convert almost all uses of the form
        The variable `foo' should do xyz
    to
        The variable 'foo' should do xyz
    and also fix up miscellaneous other unpaired backticks (extraneous / missing characters).
    No functional change is intended here other than in human-readable docstrings.
* gh-115961: Add name and mode attributes for compressed file-like objects (GH-116036)
  Author: Serhiy Storchaka | Age: 2024-04-21 | Files: 1 | Lines: -0/+4
    * Add name and mode attributes for compressed and archived file-like objects in modules bz2, lzma, tarfile and zipfile.
    * Change the value of the mode attribute of GzipFile from integer (1 or 2) to string ('rb' or 'wb').
    * Change the value of the mode attribute of ZipExtFile from 'r' to 'rb'.
* gh-116931: Add fileobj parameter check for TarFile.addfile (GH-117988)
  Author: lyc8503 | Age: 2024-04-19 | Files: 1 | Lines: -4/+7
    TarFile.addfile now raises a ValueError when the user passes in a tarinfo with a non-zero size but does not provide a fileobj, instead of writing an incomplete entry.
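
For reference, a member with a non-zero size is normally added by pairing the `TarInfo` with a file object holding exactly that many bytes. The sketch below works entirely in memory; the member name `hello.txt` and the payload are made up for the example.

```python
import io
import tarfile

payload = b"hello, tar\n"
archive = io.BytesIO()

with tarfile.open(fileobj=archive, mode="w") as tar:
    info = tarfile.TarInfo("hello.txt")
    info.size = len(payload)          # non-zero size ...
    tar.addfile(info, io.BytesIO(payload))  # ... so a fileobj is supplied

# Read the archive back to confirm the entry is complete.
archive.seek(0)
with tarfile.open(fileobj=archive, mode="r") as tar:
    restored = tar.extractfile("hello.txt").read()
```

On versions that include this change, dropping the second argument to `addfile()` here is reported as an error rather than producing a truncated entry.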
* gh-117691: Add an appropriate stacklevel for PEP-706 tarfile deprecation warnings (GH-117872)
  Author: Alex Waygood | Age: 2024-04-16 | Files: 1 | Lines: -1/+1
* gh-115256: Remove refcycles from tarfile writing (GH-115257)
  Author: pan324 | Age: 2024-03-04 | Files: 1 | Lines: -2/+20
* gh-67837, gh-112998: Fix dirs creation in concurrent extraction (GH-115082)
  Author: Serhiy Storchaka | Age: 2024-02-11 | Files: 1 | Lines: -1/+1
    Avoid race conditions in the creation of directories during concurrent extraction in tarfile and zipfile.
    Co-authored-by: Samantha Hughes <shughes-uk@users.noreply.github.com>
    Co-authored-by: Peder Bergebakken Sundt <pbsds@hotmail.com>
* gh-114959: tarfile: do not ignore errors when extracting a directory on top of a file (GH-114960)
  Author: Serhiy Storchaka | Age: 2024-02-03 | Files: 1 | Lines: -1/+2
    Also, add tests common to tarfile and zipfile.
* gh-67641: Clarify documentation on bytes vs text with non-seeking tarfile stream (GH-31610)
  Author: Stanley | Age: 2023-12-27 | Files: 1 | Lines: -4/+5
* gh-87264: Convert tarinfo type to stat type (GH-113230)
  Author: Marat Idrisov | Age: 2023-12-19 | Files: 1 | Lines: -1/+6
    Co-authored-by: val-shkolnikov <val@nvsoft.net>
* gh-109653: Defer importing `warnings` in several modules (#110286)
  Author: Alex Waygood | Age: 2023-10-04 | Files: 1 | Lines: -1/+1
* gh-107811: tarfile: treat overflow in UID/GID as failure to set it (#108369)
  Author: Petr Viktorin | Age: 2023-08-23 | Files: 1 | Lines: -1/+2
* gh-107396: tarfiles: set self.exception before _init_read_gz() (GH-107485)
  Author: balmeida-nokia | Age: 2023-08-21 | Files: 1 | Lines: -1/+1
    In the stack call of: _init_read_gz()
    ```
    _read, tarfile.py:548
    read, tarfile.py:526
    _init_read_gz, tarfile.py:491
    ```
    a try/except exists that uses `self.exception`, so it needs to be set before calling _init_read_gz().
* gh-107845: Fix symlink handling for tarfile.data_filter (GH-107846)
  Author: Petr Viktorin | Age: 2023-08-21 | Files: 1 | Lines: -2/+9
    Co-authored-by: Victor Stinner <vstinner@python.org>
    Co-authored-by: Lumír 'Frenzy' Balhar <frenzy.madness@gmail.com>
* gh-102120: [TarFile] Add an iter function that doesn't cache (GH-102128)
  Author: Robert O'Shea | Age: 2023-05-23 | Files: 1 | Lines: -6/+11
* gh-102950: Implement PEP 706 – Filter for tarfile.extractall (#102953)
  Author: Petr Viktorin | Age: 2023-04-24 | Files: 1 | Lines: -41/+320
* gh-74468: [tarfile] Fix incorrect name attribute of ExFileObject (GH-102424)
  Author: Oleg Iarygin | Age: 2023-03-27 | Files: 1 | Lines: -3/+3
    Co-authored-by: Simeon Visser <svisser@users.noreply.github.com>
* bpo-45975: Simplify some while-loops with walrus operator (GH-29347)
  Author: Nick Drozd | Age: 2022-11-26 | Files: 1 | Lines: -9/+3
* gh-91078: Return None from TarFile.next when the tarfile is empty (GH-91850)
  Author: Sam Ezeh | Age: 2022-11-26 | Files: 1 | Lines: -0/+2
    Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
* gh-99325: Remove unused `NameError` handling (#99326)
  Author: Nikita Sobolev | Age: 2022-11-11 | Files: 1 | Lines: -7/+3
* bpo-26253: Add compressionlevel to tarfile stream (GH-2962)
  Author: Yaron de Leeuw | Age: 2022-06-25 | Files: 1 | Lines: -9/+13
    `tarfile` already accepts a compressionlevel argument for creating files. This patch adds the same for stream-based tarfile usage. The default is 9, the value that was previously hard-coded.
* gh-91387: Strip trailing slash from tarfile longname directories (GH-32423)
  Author: Chris Fernald | Age: 2022-06-17 | Files: 1 | Lines: -0/+10
    Co-authored-by: Brett Cannon <brett@python.org>
* bpo-45863: tarfile: don't zero out header fields unnecessarily (GH-29693)
  Author: Joshua Root | Age: 2022-02-09 | Files: 1 | Lines: -6/+15
    Numeric fields of type float, notably mtime, can't be represented exactly in the ustar header, so the pax header is used. But it is helpful to set them to the nearest int (i.e. second rather than nanosecond precision mtimes) in the ustar header as well, for the benefit of unarchivers that don't understand the pax header.
    Add test for tarfile.TarInfo.create_pax_header to confirm correct behaviour.
* bpo-44289: Keep argument file object's current position in tarfile.is_tarfile (GH-26488)
  Author: Andrzej Mateja | Age: 2022-02-09 | Files: 1 | Lines: -0/+2
* bpo-21987: Fix TarFile.getmember getting a dir with a trailing slash (GH-30283)
  Author: andrei kulakov | Age: 2022-01-21 | Files: 1 | Lines: -1/+1
* bpo-39039: tarfile raises descriptive exception from zlib.error (GH-27766)
  Author: Jack DeVries | Age: 2021-09-29 | Files: 1 | Lines: -0/+9
    * during tarfile parsing, a zlib error indicates invalid data
    * tarfile.open now raises a descriptive exception from the zlib error
    * this makes it clear to the user that they may be trying to open a corrupted tar file
* bpo-8978: improve tarfile.open error message when lzma / bz2 are missing (GH-24850)
  Author: Anthony Sottile | Age: 2021-04-27 | Files: 1 | Lines: -2/+5
    Automerge-Triggered-By: GH:pablogsal
* bpo-39717: [tarfile] update nested exception raising (GH-23739)
  Author: Ethan Furman | Age: 2020-12-12 | Files: 1 | Lines: -32/+33
    - `from None` if the new exception uses, or doesn't need, the previous one
    - `from e` if the previous exception is still relevant
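
The two chaining styles named in this commit message can be illustrated with a small sketch. `HeaderError`, `parse_octal`, `load_header`, and `capture` are hypothetical names written for this example; they are not taken from tarfile.py.

```python
class HeaderError(Exception):
    """Hypothetical error type used only in this sketch."""


def parse_octal(field):
    try:
        return int(field, 8)
    except ValueError:
        # The new message already says everything; hide the ValueError.
        raise HeaderError(f"invalid octal field: {field!r}") from None


def load_header(fields):
    try:
        return fields["size"]
    except KeyError as e:
        # The KeyError is still useful context, so keep it as the cause.
        raise HeaderError("header has no size field") from e


def capture(func, arg):
    """Call func(arg) and return the HeaderError it raises, if any."""
    try:
        func(arg)
    except HeaderError as exc:
        return exc
    return None


hidden = capture(parse_octal, "9z")
chained = capture(load_header, {})
```

With `from None` the traceback shows only the new exception, since `__cause__` stays `None` and the implicit context is suppressed. With `from e` the original exception is attached as `__cause__` and printed as the direct cause.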
* bpo-12800: tarfile: Restore fix from 011525ee9 (GH-21409)
  Author: Julien Palard | Age: 2020-11-25 | Files: 1 | Lines: -0/+3
    Restore fix from 011525ee92eb1c13ad1a62d28725a840e28f8160.
* bpo-39693: mention KeyError in tarfile extractfile documentation (GH-18639)
  Author: Andrey Doroschenko | Age: 2020-10-20 | Files: 1 | Lines: -3/+4
    Co-authored-by: Andrey Darascheka <andrei.daraschenka@leverx.com>
* bpo-41316: Make tarfile follow specs for FNAME (GH-21511)
  Author: Artem Bulgakov | Age: 2020-09-07 | Files: 1 | Lines: -0/+2
    tarfile writes the full path to the FNAME field of the GZIP format, instead of just the basename, if the user specified an absolute path. Some archive viewers may process the file incorrectly. It also creates a security issue, because anyone can learn the directory structure of the system, the username, or other personal information.
    RFC1952 says about FNAME: This is the original name of the file being compressed, with any directory components removed.
    So tarfile must remove directory names from FNAME and write only the basename of the file.
    Automerge-Triggered-By: @jaraco
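
The rule quoted from RFC 1952 amounts to keeping only the final path component. A minimal sketch, assuming a hypothetical helper `gzip_fname`; dropping a trailing `.gz` so that the field names the uncompressed file is an assumption of this sketch, not something stated in the commit message.

```python
import os.path


def gzip_fname(path):
    """Derive a value for the gzip FNAME field from an output path."""
    # Keep only the last path component: no directory names leak out.
    name = os.path.basename(path)
    # Assumption of this sketch: FNAME names the uncompressed file.
    if name.endswith(".gz"):
        name = name[:-3]
    return name


fname = gzip_fname("/home/alice/backup.tar.gz")
```

Here the home directory and username in the example path never reach the archive header.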
* bpo-39017: Avoid infinite loop in the tarfile module (GH-21454)
  Author: Rishi | Age: 2020-07-15 | Files: 1 | Lines: -0/+2
    Avoid infinite loop when reading specially crafted TAR files using the tarfile module (CVE-2019-20907).
* bpo-18819: tarfile: only set device fields for device files (GH-18080)
  Author: William Chargin | Age: 2020-02-12 | Files: 1 | Lines: -2/+10
    The GNU docs describe the `devmajor` and `devminor` fields of the tar header struct only in the context of character and block special files, suggesting that in other cases they are not populated. Typical utilities behave accordingly; this patch teaches `tarfile` to do the same.
* bpo-39430: Fix race condition in lazy imports in tarfile. (GH-18161)
  Author: Serhiy Storchaka | Age: 2020-01-24 | Files: 1 | Lines: -10/+8
    Use `from ... import ...` to ensure module is fully loaded before accessing its attributes.
* bpo-29435: Allow is_tarfile to take a filelike obj (GH-18090)
  Author: William Woodruff | Age: 2020-01-23 | Files: 1 | Lines: -1/+6
    `is_tarfile()` now supports `name` being a file or file-like object.
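
A short sketch of calling `is_tarfile()` on in-memory file objects, assuming a Python version that includes this change. `make_tar_bytes` is a hypothetical helper that builds a tiny archive for the demonstration.

```python
import io
import tarfile


def make_tar_bytes():
    """Build a tiny uncompressed tar archive holding one empty member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.addfile(tarfile.TarInfo("empty.txt"))
    return buf.getvalue()


# A file-like object is passed instead of a path in both calls.
looks_like_tar = tarfile.is_tarfile(io.BytesIO(make_tar_bytes()))
looks_like_text = tarfile.is_tarfile(io.BytesIO(b"plain text, not an archive"))
```

This makes it possible to check uploaded or downloaded data without first writing it to disk.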
* Add missing docstrings for TarInfo objects (#12555)
  Author: Raymond Hettinger | Age: 2019-03-27 | Files: 1 | Lines: -7/+46
* bpo-36268: Change default tar format to pax from GNU. (GH-12355)
  Author: CAM Gerlach | Age: 2019-03-21 | Files: 1 | Lines: -1/+1
* Clean up code which checked presence of os.{stat,lstat,chmod} (#11643)
  Author: Anthony Sottile | Age: 2019-02-25 | Files: 1 | Lines: -8/+6
* bpo-34043: Optimize tarfile uncompress performance (GH-8089)
  Author: INADA Naoki | Age: 2018-07-06 | Files: 1 | Lines: -18/+12
    tarfile._Stream has two buffers, one for compressed and one for uncompressed data. Those buffers are not aligned, so unnecessary bytes slicing happens for every chunk that is read. This commit bypasses compressed buffering.
    In this benchmark [1], user time drops from 300ms to 250ms.
    [1]: https://bugs.python.org/msg320763
* bpo-34010: Fix tarfile read performance regression (GH-8020)
  Author: hajoscher | Age: 2018-07-04 | Files: 1 | Lines: -9/+11
    During buffered read, use a list followed by join instead of extending a bytes object. This is how it was done before, but it was changed in commit b506dc32c1a.
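
The list-then-join pattern mentioned in this commit message looks roughly like the sketch below. `read_exact` and its `chunk_size` parameter are hypothetical and are not the function changed by the commit.

```python
import io


def read_exact(fileobj, size, chunk_size=4):
    """Read up to `size` bytes, collecting chunks in a list."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = fileobj.read(min(chunk_size, remaining))
        if not chunk:
            break  # the underlying stream ended early
        chunks.append(chunk)
        remaining -= len(chunk)
    # One join at the end instead of `data += chunk` on every iteration,
    # which would copy the growing bytes object each time.
    return b"".join(chunks)


data = read_exact(io.BytesIO(b"0123456789"), 7)
```

Because `bytes` objects are immutable, repeated concatenation copies everything read so far on each step, while appending to a list and joining once copies each chunk a single time.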
* bpo-33842: Remove tarfile.filemode (GH-7661)
  Author: INADA Naoki | Age: 2018-06-28 | Files: 1 | Lines: -7/+0
* bpo-32713: Fix tarfile.itn for large/negative float values. (GH-5434)
  Author: Joffrey F | Age: 2018-02-27 | Files: 1 | Lines: -1/+2
* bpo-30693: zip+tarfile: sort directory listing (#2263)
  Author: Bernhard M. Wiedemann | Age: 2018-01-31 | Files: 1 | Lines: -1/+1
    tarfile and zipfile now sort directory listing to generate tar and zip archives in a more reproducible way.
    See also https://reproducible-builds.org/docs/stable-inputs/ on that topic.
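
The effect of sorted directory listings can be observed with a small experiment, sketched here under the assumption of a Python version that includes this change. `archive_names` is a hypothetical helper, and the file names are arbitrary.

```python
import io
import os
import tarfile
import tempfile


def archive_names(filenames):
    """Create files in the given order, archive the directory, list members."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in filenames:
            with open(os.path.join(tmp, name), "w") as f:
                f.write(name)
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode="w") as tar:
            # Recursive add; entries of the directory are visited in sorted order.
            tar.add(tmp, arcname="root")
        buf.seek(0)
        with tarfile.open(fileobj=buf, mode="r") as tar:
            return tar.getnames()


names = archive_names(["c.txt", "a.txt", "b.txt"])
```

The member order no longer depends on the order in which the filesystem happens to return directory entries, which is one ingredient of a reproducible archive. Timestamps and ownership fields still vary between runs unless they are normalized separately.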
* bpo-32297: Few misspellings found in Python source code comments. (#4803)
  Author: Mike | Age: 2017-12-14 | Files: 1 | Lines: -1/+1
    * Fix multiple typos in code comments
    * Add spacing in comments (test_logging.py, test_math.py)
    * Fix spaces at the beginning of comments in test_logging.py
* Remove two legacy constants which hopefully have no consumers (#1087)
  Author: Alex Gaynor | Age: 2017-04-12 | Files: 1 | Lines: -2/+0
    The data contained in them is nonsensical
* bpo-29958: Minor improvements to zipfile and tarfile CLI. (#944)
  Author: Serhiy Storchaka | Age: 2017-04-07 | Files: 1 | Lines: -9/+6
* bpo-29776: Use decorator syntax for properties. (#585)
  Author: Serhiy Storchaka | Age: 2017-03-19 | Files: 1 | Lines: -6/+10