path: root/Lib/pathlib.py
Commit message | Author | Age | Files | Lines
* GH-112361: Speed up pathlib by removing some temporary objects. (#112362)Barney Gale2023-11-251-20/+12
| | | | | Construct only one new list object (using `list.copy()`) when creating a new path object with a modified tail. This slightly speeds up `with_name()` and `with_suffix()`.
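As a minimal sketch of the two public methods this commit speeds up (the path below is a made-up example, not taken from the commit):

```python
from pathlib import PurePosixPath

# Hypothetical path, used only for illustration.
report = PurePosixPath("/srv/data/report.txt")

# Both methods return a new path object; the original is left unchanged.
renamed = report.with_name("summary.txt")
as_csv = report.with_suffix(".csv")

print(renamed)  # /srv/data/summary.txt
print(as_csv)   # /srv/data/report.csv
```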
* gh-110745: add a newline argument to pathlib.Path.read_text (#110880)Junya Okabe2023-11-211-2/+2
| | | | Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> Co-authored-by: Barney Gale <barney.gale@gmail.com>
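A hedged sketch of how the new argument might be used. Because *newline* is only accepted on Python versions that include this commit, the example probes the signature first; the file name and contents are invented for illustration.

```python
import inspect
import tempfile
from pathlib import Path

# Only pass newline= when this interpreter's read_text() accepts it.
supports_newline = "newline" in inspect.signature(Path.read_text).parameters

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "crlf.txt"
    sample.write_bytes(b"one\r\ntwo\r\n")

    # Default behaviour: universal newlines, so "\r\n" is read as "\n".
    translated = sample.read_text(encoding="utf-8")

    # newline="" keeps the line endings exactly as stored on disk.
    untranslated = (
        sample.read_text(encoding="utf-8", newline="") if supports_newline else None
    )
```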
* GH-110109: Speed up `pathlib._PathBase.resolve()` (#110412)Barney Gale2023-11-171-22/+17
| | | | | | | | - Add fast path to `_split_stack()` - Skip unnecessary resolution of the current directory when a relative path is given to `resolve()` - Remove stat and target caches, which slow down most `resolve()` calls in practice. - Slightly refactor code for clarity.
* GH-110109: Churn `pathlib.PurePath` methods (#112012)Barney Gale2023-11-171-120/+120
| | | | | | | | | | | | | | Re-arrange `pathlib.PurePath` methods in source code. No other changes. The `PurePath` implementations of certain special methods, such as `__eq__()` and `__hash__()`, are not usually applicable to user subclasses of `_PathBase`. To facilitate their removal, another patch will split the `PurePath` class into `_PurePathBase` and `PurePath`, with the latter providing these special methods. This patch prepares the ground for splitting `PurePath`. It's similar to e8d77b0, which preceded splitting `Path`. By churning the methods here, subsequent patches will be easier to review and less likely to break things.
* GH-72904: Add `glob.translate()` function (#106703)Barney Gale2023-11-131-104/+21
| | | | | | | | | | | Add `glob.translate()` function that converts a pathname with shell wildcards to a regular expression. The regular expression is used by pathlib to implement `match()` and `glob()`. This function differs from `fnmatch.translate()` in that wildcards do not match path separators by default, and that a `*` pattern segment matches precisely one path segment. When *recursive* is set to true, `**` pattern segments match any number of path segments, and `**` cannot appear outside its own segment. In pathlib, this change speeds up directory walking (because `_make_child_relpath()` does less work), makes path objects smaller (they don't need a `_lines` slot), and removes the need for some gnarly code. Co-authored-by: Jason R. Coombs <jaraco@jaraco.com> Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
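The difference from `fnmatch.translate()` described above can be sketched as follows. `glob.translate()` is only present on Python versions that include this commit, so the example checks for it rather than assuming it exists; the sample file names are invented.

```python
import fnmatch
import glob
import re

# fnmatch.translate() lets "*" run across "/" separators.
loose = re.compile(fnmatch.translate("*.py"))
loose_hit = loose.match("pkg/mod.py") is not None

# With glob.translate(), a "*" segment is confined to one path segment.
if hasattr(glob, "translate"):
    strict = re.compile(glob.translate("*.py", seps="/"))
    same_segment = strict.match("mod.py") is not None
    across_segments = strict.match("pkg/mod.py") is not None
else:
    # Older interpreter: nothing to compare against.
    same_segment = across_segments = None
```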
* GH-111429: Speed up `pathlib.PurePath.[is_]relative_to()` (#111431)Barney Gale2023-11-121-4/+8
|
* gh-111259: Optimize recursive wildcards in pathlib (GH-111303)Serhiy Storchaka2023-10-261-3/+3
| | | Regular expression pattern `(?s:.)` is much faster than `[\s\S]`.
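A small sketch showing that the two spellings accept the same input, which is why one can replace the other. It makes no timing claim of its own, and the sample string is invented.

```python
import re

# DOTALL applied only inside the group, so "." also matches newlines.
any_char_scoped = re.compile(r"(?s:.)+")
# Character-class spelling of "any character at all".
any_char_class = re.compile(r"[\s\S]+")

sample = "src/pkg\nmodule.py"
scoped_ok = any_char_scoped.fullmatch(sample) is not None
class_ok = any_char_class.fullmatch(sample) is not None
```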
* GH-110488: Fix two small issues in `pathlib.PurePath.with_name()` (#110651)Barney Gale2023-10-111-2/+1
| | | | | | Ensure that `PurePath('foo/a').with_name('.')` raises `ValueError`. Ensure that `PureWindowsPath('foo/a').with_name('a:b')` does not raise `ValueError`.
* GH-107465: Add `pathlib.Path.from_uri()` classmethod. (#107640)Barney Gale2023-10-011-5/+35
| | | | | | | This method supports file URIs (including variants) as described in RFC 8089, such as URIs generated by `pathlib.Path.as_uri()` and `urllib.request.pathname2url()`. The method is added to `Path` rather than `PurePath` because it uses `os.fsdecode()`, and so its results vary from system to system. I intend to deprecate `PurePath.as_uri()` and move it to `Path` for the same reason. Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
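A minimal round-trip sketch, assuming an interpreter that may or may not include this commit (hence the `hasattr()` probe). The current working directory is used only because it is guaranteed to be an absolute path on every platform.

```python
from pathlib import Path

here = Path.cwd()
uri = here.as_uri()  # a "file://..." URI for the current directory

# Path.from_uri() exists only on Python versions that include this commit.
if hasattr(Path, "from_uri"):
    round_tripped = Path.from_uri(uri)
else:
    round_tripped = None
```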
* GH-89812: Add `pathlib._PathBase` (#106337)Barney Gale2023-09-301-87/+359
| | | | | Add private `pathlib._PathBase` class. This will be used by an experimental PyPI package to incubate a `tarfile.TarPath` class. Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
* GH-109187: Improve symlink loop handling in `pathlib.Path.resolve()` (GH-109192)Barney Gale2023-09-261-20/+1
| | | | Treat symlink loops like other errors: in strict mode, raise `OSError`, and in non-strict mode, do not raise any exception.
* GH-78722: Raise exceptions from `pathlib.Path.iterdir()` without delay. (#107320)Barney Gale2023-09-021-2/+1
| | | | | | | `pathlib.Path.iterdir()` now immediately raises any `OSError` exception from `os.listdir()`, rather than waiting until its result is iterated over.
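One way to write calling code that behaves the same before and after this change is to keep the `iterdir()` call and the iteration inside a single `try` block. The helper name `child_names` below is hypothetical.

```python
import tempfile
from pathlib import Path


def child_names(directory):
    """Return sorted child names, or None if the directory cannot be listed."""
    try:
        # The OSError is caught whether iterdir() raises it at call time
        # or only once iteration starts.
        return sorted(child.name for child in directory.iterdir())
    except OSError:
        return None


with tempfile.TemporaryDirectory() as tmp:
    missing = child_names(Path(tmp) / "no-such-dir")
    Path(tmp, "a.txt").touch()
    present = child_names(Path(tmp))
```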
* GH-70303: Emit FutureWarning when pathlib glob pattern ends with `**` (GH-105413)Barney Gale2023-08-041-0/+5
| | | | | | In a future Python release, patterns with this ending will match both files and directories. Users may add a trailing slash to remove the warning.
* gh-105002: [pathlib] Fix relative_to with walk_up=True using ".." (#107014)János Kukovecz2023-07-261-2/+4
| | | | It makes sense to raise an Error because ".." can not be resolved and the current working directory is unknown.
* GH-100502: Add `pathlib.PurePath.pathmod` attribute (GH-106533)Barney Gale2023-07-191-42/+42
| | | | This instance attribute stores the implementation of `os.path` used for low-level path operations: either `posixpath` or `ntpath`.
* GH-106330: Fix matching of empty path in `pathlib.PurePath.match()` (GH-106331)Barney Gale2023-07-031-2/+6
| | | | | We match paths using the `_lines` attribute, which is derived from the path's string representation. The bug arises because an empty path's string representation is `'.'` (not `''`), which is matched by the `'*'` wildcard.
* GH-105793: Add follow_symlinks argument to `pathlib.Path.is_dir()` and `is_file()` (GH-105794)Barney Gale2023-06-261-4/+4
| | | | | | | Brings `pathlib.Path.is_dir()` in line with `os.DirEntry.is_dir()`, which will be important for implementing generic path walking and globbing. Likewise `is_file()`.
* GH-89812: Add `pathlib.UnsupportedOperation` (GH-105926)Barney Gale2023-06-221-7/+15
| | | | | | | This new exception type is raised instead of `NotImplementedError` when a path operation is not supported. It can be raised from `Path.readlink()`, `symlink_to()`, `hardlink_to()`, `owner()` and `group()`. In a future version of pathlib, it will be raised by `AbstractPath` for these methods and others, such as `AbstractPath.mkdir()` and `unlink()`.
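Since the new exception type inherits from `NotImplementedError`, existing handlers keep working. A hedged sketch follows; `try_operation` and `always_unsupported` are hypothetical helpers, and the `getattr()` fallback covers interpreters that predate this commit.

```python
import pathlib

# Fall back to NotImplementedError where pathlib.UnsupportedOperation is absent.
Unsupported = getattr(pathlib, "UnsupportedOperation", NotImplementedError)


def try_operation(operation):
    """Run operation(); return None if the platform does not support it."""
    try:
        return operation()
    except NotImplementedError:  # also catches UnsupportedOperation
        return None


def always_unsupported():
    # Stand-in for a path method that is unavailable on this platform.
    raise Unsupported("not available here")


result = try_operation(always_unsupported)
```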
* GH-104996: Defer joining of `pathlib.PurePath()` arguments. (GH-104999)Barney Gale2023-06-071-17/+27
| | | | Joining of arguments is moved to `_load_parts`, which is called when a normalized path is needed.
* GH-102613: Fast recursive globbing in `pathlib.Path.glob()` (GH-104512)Barney Gale2023-06-061-136/+133
| | | | | | | | | | | | | | This commit introduces a 'walk-and-match' strategy for handling glob patterns that include a non-terminal `**` wildcard, such as `**/*.py`. For this example, the previous implementation recursively walked directories using `os.scandir()` when it expanded the `**` component, and then **scanned those same directories again** when it expanded the `*.py` component. This is wasteful. In the new implementation, any components following a `**` wildcard are used to build a `re.Pattern` object, which is used to filter the results of the recursive walk. A pattern like `**/*.py` uses half the number of `os.scandir()` calls; a pattern like `**/*/*.py` a third, etc. This new algorithm does not apply if either: 1. The *follow_symlinks* argument is set to `None` (its default), or 2. The pattern contains `..` components. In these cases we fall back to the old implementation. This commit also replaces selector classes with selector functions. These generators directly yield results rather than calling through to their successors. A new internal `Path._glob()` method takes care to chain these generators together, which simplifies the lazy algorithm and slightly improves performance. It should also be easier to understand and maintain.
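The walk-and-match idea can be illustrated with a simplified sketch. This is not pathlib's internal code: it uses `os.walk()` and `fnmatch.translate()` as stand-ins, handles only a single-segment tail such as `*.py`, and the helper name `walk_and_match` is hypothetical.

```python
import fnmatch
import os
import re
import tempfile


def walk_and_match(root, tail_pattern):
    """Walk root once and filter names with one precompiled regex."""
    matcher = re.compile(fnmatch.translate(tail_pattern))
    for dirpath, dirnames, filenames in os.walk(root):
        # Each directory is scanned a single time; the regex does the rest.
        for name in dirnames + filenames:
            if matcher.match(name):
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                yield rel.replace(os.sep, "/")


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "pkg"))
    for rel in ("a.py", "pkg/b.py", "pkg/c.txt"):
        open(os.path.join(tmp, *rel.split("/")), "w").close()
    found = sorted(walk_and_match(tmp, "*.py"))
```

Here `found` holds the two `.py` files, which is what a `**/*.py` pattern would select in this tree.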
* GH-73435: Implement recursive wildcards in `pathlib.PurePath.match()` (#101398)Barney Gale2023-05-301-14/+85
| | | | | | | | `PurePath.match()` now handles the `**` wildcard as in `Path.glob()`, i.e. it matches any number of path segments. We now compile a `re.Pattern` object for the entire pattern. This is made more difficult by `fnmatch` not treating directory separators as special when evaluating wildcards (`*`, `?`, etc), and so we arrange the path parts onto separate *lines* in a string, and ensure we don't set `re.DOTALL`. Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com> Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
* GH-104898: Revert pathlib os.PathLike registration change. (GH-105073)Barney Gale2023-05-291-1/+5
| | | | | | | | Subclassing `os.PathLike` rather than using `register()` makes initialisation slower, due to the additional `__isinstance__` work. This partially reverts commit bd1b6228d132b8e9836fe352cd8dca2b6c1bd98c. Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
* GH-77609: Add follow_symlinks argument to `pathlib.Path.glob()` (GH-102616)Barney Gale2023-05-291-19/+21
| | | | | Add a keyword-only *follow_symlinks* parameter to `pathlib.Path.glob()` and `rglob()`. When *follow_symlinks* is `None` (the default), these methods follow symlinks except when evaluating "`**`" wildcards. When set to true or false, symlinks are always or never followed, respectively.
* GH-103631: Fix `PurePosixPath(PureWindowsPath(...))` separator handling (GH-104949)Barney Gale2023-05-261-0/+3
| | | | | | | | | For backwards compatibility, accept backslashes as path separators in `PurePosixPath` if an instance of `PureWindowsPath` is supplied. This restores behaviour from Python 3.11. Co-authored-by: Gregory P. Smith <greg@krypto.org>
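A short sketch of the behaviour being restored; the file name is invented. Backslashes act as separators only when they arrive via a `PureWindowsPath` instance, not when they appear in a plain string.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Backslashes inside a PureWindowsPath are treated as separators.
from_windows = PurePosixPath(PureWindowsPath("docs\\guide.txt"))

# In a plain string, a backslash is an ordinary character on POSIX.
from_string = PurePosixPath("docs\\guide.txt")

print(from_windows.parts)  # ('docs', 'guide.txt')
print(from_string.parts)   # ('docs\\guide.txt',)
```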
* GH-104947: Make pathlib.PureWindowsPath comparisons consistent across platforms (GH-104948)Barney Gale2023-05-261-1/+4
| | | | | | | | Use `str.lower()` rather than `ntpath.normcase()` to normalize case of Windows paths. This restores behaviour from Python 3.11. Co-authored-by: Gregory P. Smith <greg@krypto.org>
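For illustration, the comparison behaviour in question looks like this on any host platform (the paths are made up):

```python
from pathlib import PureWindowsPath

lower = PureWindowsPath("c:/users/alice")
upper = PureWindowsPath("C:/USERS/Alice")

# Comparison and hashing ignore case; the original spelling is preserved.
same = lower == upper
same_hash = hash(lower) == hash(upper)

print(same, same_hash)  # True True
print(upper)            # C:\USERS\Alice
```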
* GH-104898: Add __slots__ to os.PathLike (GH-104899)Barney Gale2023-05-251-5/+1
|
* GH-83863: Drop support for using `pathlib.Path` objects as context managers (GH-104807)Barney Gale2023-05-231-19/+0
| | | | | | | | | | In Python 3.8 and prior, `pathlib.Path.__exit__()` marked a path as closed; some subsequent attempts to perform I/O would raise an IOError. This functionality was never documented, and had the effect of making `Path` objects mutable, contrary to PEP 428. In Python 3.9 we made `__exit__()` a no-op, and in 3.11 `__enter__()` began raising deprecation warnings. Here we remove both methods.
* GH-104484: Add case_sensitive argument to `pathlib.PurePath.match()` (GH-104565)thirumurugan2023-05-181-6/+14
| | | Co-authored-by: Barney Gale <barney.gale@gmail.com>
* GH-102613: Fix recursion error from `pathlib.Path.glob()` (GH-104373)Barney Gale2023-05-151-20/+5
| | | | Use `Path.walk()` to implement the recursive wildcard `**`. This method uses an iterative (rather than recursive) walk - see GH-100282.
* GH-90208: Suppress OSError exceptions from `pathlib.Path.glob()` (GH-104141)Barney Gale2023-05-111-20/+13
| | | | | | | | | | | `pathlib.Path.glob()` now suppresses all OSError exceptions, except those raised from calling `is_dir()` on the top-level path. Previously, `glob()` suppressed ENOENT, ENOTDIR, EBADF and ELOOP errors and their Windows equivalents. PermissionError was also suppressed unless it occurred when calling `is_dir()` on the top-level path. However, the selector would abort prematurely if a PermissionError was raised, and so `glob()` could return incomplete results.
* GH-87695: Fix OSError from `pathlib.Path.glob()` (GH-104292)Barney Gale2023-05-101-2/+2
| | | | Fix issue where `pathlib.Path.glob()` raised `OSError` when it encountered a symlink to an overly long path.
* GH-102613: Improve performance of `pathlib.Path.rglob()` (GH-104244)Barney Gale2023-05-071-17/+37
| | | | | | | | | | Stop de-duplicating results in `_RecursiveWildcardSelector`. A new `_DoubleRecursiveWildcardSelector` class is introduced which performs de-duplication, but this is used _only_ for patterns with multiple non-adjacent `**` segments, such as `path.glob('**/foo/**')`. By avoiding the use of a set, `PurePath.__hash__()` is not called, and so paths do not need to be stringified and case-normalised. Also merge adjacent '**' segments in patterns.
* GH-89812: Churn `pathlib.Path` methods (GH-104243)Barney Gale2023-05-071-303/+303
| | | | | | | | | | | | | | | Re-arrange `pathlib.Path` methods in source code. No other changes. The methods are arranged as follows: 1. `stat()` and dependants (`exists()`, `is_dir()`, etc) 2. `open()` and dependants (`read_text()`, `write_bytes()`, etc) 3. `iterdir()` and dependants (`glob()`, `walk()`, etc) 4. All other `Path` methods This patch prepares the ground for a new `_AbstractPath` class, which will support the methods in groups 1, 2 and 3 above. By churning the methods here, subsequent patches will be easier to review and less likely to break things.
* GH-103548: Improve performance of `pathlib.Path.[is_]absolute()` (GH-103549)Barney Gale2023-05-061-1/+10
| | | | Improve performance of `pathlib.Path.absolute()` and `cwd()` by joining paths only when necessary. Also improve performance of `PurePath.is_absolute()` on Posix by skipping path parsing and normalization.
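As a reminder of what `is_absolute()` reports on each flavour, here is a brief sketch with invented paths. On Windows a path needs both a drive and a root to count as absolute.

```python
from pathlib import PurePosixPath, PureWindowsPath

posix_abs = PurePosixPath("/usr/bin").is_absolute()        # True
posix_rel = PurePosixPath("usr/bin").is_absolute()         # False

win_abs = PureWindowsPath("C:/Windows").is_absolute()      # True
win_drive_only = PureWindowsPath("C:Windows").is_absolute()  # False: no root
win_root_only = PureWindowsPath("/Windows").is_absolute()    # False: no drive
```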
* GH-100479: Add `pathlib.PurePath.with_segments()` (GH-103975)Barney Gale2023-05-051-30/+36
| | | | | Add `pathlib.PurePath.with_segments()`, which creates a path object from arguments. This method is called whenever a derivative path is created, such as from `pathlib.PurePath.parent`. Subclasses may override this method to share information between path objects. Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
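A hedged sketch of a subclass that overrides `with_segments()` to carry extra state onto derived paths. The class name `SessionPath` and its `session_id` attribute are hypothetical, and the version check is there because the hook only exists on interpreters that include this commit.

```python
import sys
from pathlib import PurePosixPath

session_of_child = None

if sys.version_info >= (3, 12):

    class SessionPath(PurePosixPath):
        """Hypothetical path type that remembers a session id."""

        def __init__(self, *segments, session_id):
            super().__init__(*segments)
            self.session_id = session_id

        def with_segments(self, *segments):
            # Called whenever a derivative path is created (/, parent, ...).
            return type(self)(*segments, session_id=self.session_id)

    etc = SessionPath("/etc", session_id=42)
    session_of_child = (etc / "hosts").session_id
```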
* GH-81079: Add case_sensitive argument to `pathlib.Path.glob()` (GH-102710)Barney Gale2023-05-041-15/+19
| | | | | | This argument allows case-sensitive matching to be enabled on Windows, and case-insensitive matching to be enabled on Posix. Co-authored-by: Steve Dower <steve.dower@microsoft.com>
* GH-104114: Fix `pathlib.WindowsPath.glob()` use of literal pattern segment case (GH-104116)Barney Gale2023-05-031-39/+13
| | | | | | | | | We now use `_WildcardSelector` to evaluate literal pattern segments, which allows us to retrieve the real filesystem case. This change is necessary in order to implement a *case_sensitive* argument (see GH-81079) and a *follow_symlinks* argument (see GH-77609).
* GH-89769: `pathlib.Path.glob()`: do not follow symlinks when checking for precise match (GH-29655)andrei kulakov2023-05-031-3/+7
| | | | | Co-authored-by: Barney Gale <barney.gale@gmail.com>
* GH-104102: Optimize `pathlib.Path.glob()` handling of `../` pattern segments (GH-104103)Barney Gale2023-05-021-0/+12
| | | | | | | | These segments do not require a `stat()` call, as the selector's `_select_from()` method is called after we've established that the parent is a directory.
* GH-104104: Optimize `pathlib.Path.glob()` by avoiding repeated calls to `os.path.normcase()` (GH-104105)Barney Gale2023-05-021-11/+14
| | | | | | Use `re.IGNORECASE` to implement case-insensitive matching. This restores behaviour from before GH-31691.
* GH-103525: Improve exception message from `pathlib.PurePath()` (GH-103526)Barney Gale2023-05-021-14/+23
| | | | | | | | Check that arguments are strings before calling `os.path.join()`. Also improve performance of `PurePath(PurePath(...))` while we're in the area: we now use the *unnormalized* string path of such arguments. Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
* GH-78079: Fix UNC device path root normalization in pathlib (GH-102003)Barney Gale2023-04-141-3/+8
| | | | | | | We no longer add a root to device paths such as `//./PhysicalDrive0`, `//?/BootPartition` and `//./c:` while normalizing. We also avoid adding a root to incomplete UNC share paths, like `//`, `//a` and `//a/`. Co-authored-by: Eryk Sun <eryksun@gmail.com>
* GH-101362: Omit path anchor from `pathlib.PurePath()._parts` (GH-102476)Barney Gale2023-04-091-65/+106
| | | Improve performance of path construction by skipping the addition of the path anchor (`drive + root`) to the internal `_parts` list. Rename this attribute to `_tail` for clarity.
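The anchor/tail split can be seen through the public attributes. The sketch below does not touch the private `_tail` attribute; it only shows the equivalent public view, using an invented path.

```python
from pathlib import PureWindowsPath

p = PureWindowsPath("C:/Users/alice/notes.txt")

# The anchor is the drive and root joined together.
print(p.drive)   # C:
print(p.root)    # \
print(p.anchor)  # C:\

# parts includes the anchor as its first element; the rest is the tail.
tail = p.parts[1:]
print(tail)      # ('Users', 'alice', 'notes.txt')
```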
* GH-76846, GH-85281: Call `__new__()` and `__init__()` on pathlib subclasses (GH-102789)Barney Gale2023-04-031-67/+78
| | | | | | | | | Fix an issue where `__new__()` and `__init__()` were not called on subclasses of `pathlib.PurePath` and `Path` in some circumstances. Paths are now normalized on-demand. This speeds up path construction, `p.joinpath(q)`, and `p / q`. Co-authored-by: Steve Dower <steve.dower@microsoft.com>
* GH-89727: Fix pathlib.Path.walk RecursionError on deep trees (GH-100282)Stanislav Zmiev2023-03-221-38/+40
| | | | | | Use a stack to implement `pathlib.Path.walk()` iteratively instead of recursively to avoid hitting recursion limits on deeply nested trees. Co-authored-by: Barney Gale <barney.gale@gmail.com> Co-authored-by: Brett Cannon <brett@python.org>
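The stack-based approach can be sketched as below. This is an illustration built on `os.scandir()`, not the code from the commit, and the function name `iterative_walk` is hypothetical.

```python
import os
import tempfile


def iterative_walk(top):
    """Top-down directory walk that uses an explicit stack, not recursion."""
    stack = [top]
    while stack:
        current = stack.pop()
        dirnames, filenames = [], []
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    dirnames.append(entry.name)
                else:
                    filenames.append(entry.name)
        yield current, dirnames, filenames
        # Push children in reverse so they are popped in listing order.
        stack.extend(os.path.join(current, name) for name in reversed(dirnames))


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a", "b"))
    open(os.path.join(tmp, "a", "f.txt"), "w").close()
    visited = [
        os.path.relpath(dirpath, tmp).replace(os.sep, "/")
        for dirpath, _, _ in iterative_walk(tmp)
    ]
```

Because the pending directories live in a list rather than on the call stack, the depth of the tree is limited by memory and not by the interpreter's recursion limit.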
* GH-80486: Fix handling of NTFS alternate data streams in pathlib (GH-102454)Barney Gale2023-03-101-3/+5
| | | Co-authored-by: Maor Kleinberger <kmaork@gmail.com>
* GH-101362: Optimise PurePath(PurePath(...)) (GH-101667)Barney Gale2023-03-051-25/+11
| | | | | | | The previous `_parse_args()` method pulled the `_parts` out of any supplied `PurePath` objects; these were subsequently joined in `_from_parts()` using `os.path.join()`. This is actually a slower form of joining than calling `fspath()` on the path object, because it doesn't take advantage of the fact that the contents of `_parts` is normalized! This reduces the time taken to run `PurePath("foo", "bar")` by ~20%, and the time taken to run `PurePath(p, "cheese")`, where `p = PurePath("/foo", "bar", "baz")`, by ~40%. Automerge-Triggered-By: GH:AlexWaygood
* GH-101362: Check pathlib.Path flavour compatibility at import time (GH-101664)Barney Gale2023-03-051-5/+11
| | | | | This saves a comparison in `pathlib.Path.__new__()` and reduces the time taken to run `Path()` by ~5%. Automerge-Triggered-By: GH:AlexWaygood
* GH-101362: Call join() only when >1 argument supplied to pathlib.PurePath() (#101665)Barney Gale2023-03-051-1/+4
| | | | | | | This reduces the time taken to run `PurePath("foo")` by ~15%.
* gh-100809: Fix handling of drive-relative paths in pathlib.Path.absolute() (GH-100812)Barney Gale2023-02-171-1/+6
| | | | | Resolving the drive independently uses the OS API, which ensures it starts from the current directory on that drive.