path: root/Lib/pathlib
Commit message | Author | Age | Files | Lines
* GH-115060: Speed up `pathlib.Path.glob()` by removing redundant regex matching (#115061) | Barney Gale | 2024-02-10 | 2 | -28/+62
  When expanding and filtering paths for a `**` wildcard segment, build an `re.Pattern` object from the subsequent pattern parts, rather than the entire pattern, and match against the `os.DirEntry` object prior to instantiating a path object.
  Also skip compiling a pattern when expanding a `*` wildcard segment.
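The idea described above, testing each directory entry by name against a precompiled pattern before any path object is built, can be sketched roughly as follows. This is a simplified illustration built on `fnmatch.translate()`, not pathlib's actual implementation, and `iter_matching` is a hypothetical helper name.

```python
import fnmatch
import os
import re
import tempfile
from pathlib import Path


def iter_matching(directory, pattern):
    """Yield Path objects for entries in *directory* whose name matches *pattern*.

    Hypothetical helper: the pattern is compiled once, and each os.DirEntry
    is tested by name, so non-matching entries never become Path objects.
    """
    matcher = re.compile(fnmatch.translate(pattern)).match
    with os.scandir(directory) as entries:
        for entry in entries:
            if matcher(entry.name):
                yield Path(entry.path)


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.py", "b.py", "notes.txt"):
        Path(tmp, name).touch()
    matched = sorted(p.name for p in iter_matching(tmp, "*.py"))
```

Only the two `.py` entries are ever wrapped in `Path`; `notes.txt` is rejected while it is still an `os.DirEntry`.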
* GH-106747: Make pathlib ABC globbing more consistent with `glob.glob()` (#115056) | Barney Gale | 2024-02-06 | 1 | -1/+1
  When expanding `**` wildcards, ensure we add a trailing slash to the topmost directory path. This matches `glob.glob()` behaviour:
      >>> glob.glob('dirA/**', recursive=True)
      ['dirA/', 'dirA/dirB', 'dirA/dirB/dirC']
  This does not affect `pathlib.Path.glob()`, because trailing slashes aren't supported in pathlib proper.
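The `glob.glob()` behaviour quoted above can be reproduced with a throwaway directory tree. The sketch below assumes nothing beyond the standard library; the expected values are built with `os.path.join()` so that they also hold on platforms whose separator is not `/`.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "dirA", "dirB", "dirC"))
    old_cwd = os.getcwd()
    os.chdir(tmp)
    try:
        # A trailing "**" with recursive=True yields the top directory
        # itself (with a trailing separator) plus everything below it.
        results = sorted(glob.glob(os.path.join("dirA", "**"), recursive=True))
    finally:
        os.chdir(old_cwd)

expected = sorted([
    os.path.join("dirA", ""),  # "dirA/" on POSIX
    os.path.join("dirA", "dirB"),
    os.path.join("dirA", "dirB", "dirC"),
])
```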
* pathlib ABCs: drop partial, broken, untested support for `bytes` paths. (#114777) | Barney Gale | 2024-01-31 | 1 | -4/+3
  Methods like `full_match()`, `glob()`, etc., are difficult to make work with byte paths, and it's not worth the effort. This patch makes `PurePathBase` raise `TypeError` when given non-`str` path segments.
* pathlib ABCs: raise `UnsupportedOperation` directly. (#114776) | Barney Gale | 2024-01-31 | 2 | -33/+31
  Raise `UnsupportedOperation` directly, rather than via an `_unsupported()` helper, to give human readers and IDEs/typecheckers/etc. a bigger hint that these methods are abstract.
* GH-70303: Make `pathlib.Path.glob('**')` return both files and directories (#114684) | Barney Gale | 2024-01-30 | 1 | -8/+0
  Return files and directories from `pathlib.Path.glob()` if the pattern ends with `**`. This is more compatible with `PurePath.full_match()` and with other glob implementations such as bash and `glob.glob()`. Users can add a trailing slash to match only directories.
  In my previous patch I added a `FutureWarning` with the intention of fixing this in Python 3.15. Upon further reflection I think this was an unnecessarily cautious remedy to a clear bug.
* GH-114610: Fix `pathlib._abc.PurePathBase.with_suffix('.ext')` handling of stems (#114613) | Barney Gale | 2024-01-30 | 1 | -2/+5
  Raise `ValueError` if `with_suffix('.ext')` is called on a path without a stem. Paths may only have a non-empty suffix if they also have a non-empty stem.
  ABC-only bugfix; no effect on public classes.
* GH-79634: Speed up pathlib globbing by removing `joinpath()` call. (#114623) | Barney Gale | 2024-01-27 | 1 | -1/+1
  Remove `self.joinpath('')` call that should have been removed in 6313cdde.
  This makes `PathBase.glob('')` yield itself *without* adding a trailing slash. It's hard to say whether this is more or less correct, but at least everything else is faster, and there's no behaviour change in the public classes where empty glob patterns are disallowed.
* gh-88569: add `ntpath.isreserved()` (#95486) | Barney Gale | 2024-01-26 | 1 | -21/+7
  Add `ntpath.isreserved()`, which identifies reserved pathnames such as "NUL", "AUX" and "CON".
  Deprecate `pathlib.PurePath.is_reserved()`.
  ---------
  Co-authored-by: Eryk Sun <eryksun@gmail.com>
  Co-authored-by: Brett Cannon <brett@python.org>
  Co-authored-by: Steve Dower <steve.dower@microsoft.com>
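Code that must run both before and after this change could feature-detect the new function, roughly as in the sketch below. `is_reserved_name` is a hypothetical wrapper, and the fallback branch assumes an interpreter that still ships the deprecated `PurePath.is_reserved()` method.

```python
import ntpath
from pathlib import PureWindowsPath


def is_reserved_name(name):
    """Return True if *name* is a reserved Windows pathname."""
    if hasattr(ntpath, "isreserved"):  # present once the commit above has landed
        return ntpath.isreserved(name)
    # Older interpreters: fall back to the method this commit deprecates.
    return PureWindowsPath(name).is_reserved()


checks = {
    name: is_reserved_name(name)
    for name in ("NUL", "AUX", "CON", "readme.txt")
}
```

Both branches are pure string operations, so the check can be run on any host OS.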
* GH-73435: Add `pathlib.PurePath.full_match()` (#114350) | Barney Gale | 2024-01-26 | 2 | -16/+45
  In 49f90ba we added support for the recursive wildcard `**` in `pathlib.PurePath.match()`. This should allow arbitrary prefix and suffix matching, like `p.match('foo/**')` or `p.match('**/foo')`, but there's a problem: for relative patterns only, `match()` implicitly inserts a `**` token on the left hand side, causing all patterns to match from the right. As a result, it's impossible to match relative patterns from the left: `PurePath('foo/bar').match('bar/**')` is true!
  This commit reverts the changes to `match()`, and instead adds a new `full_match()` method that:
  - Allows empty patterns
  - Supports the recursive wildcard `**`
  - Matches the *entire* path when given a relative pattern
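The difference between right-anchored matching and whole-path matching can be seen in a short sketch. Because `full_match()` only exists on interpreters that include this commit, the example guards the calls with `hasattr()`; the sample path is an arbitrary illustration.

```python
from pathlib import PurePosixPath

path = PurePosixPath("a/b/c.py")

# match() anchors relative patterns on the right, so a bare "*.py" matches.
right_anchored = path.match("*.py")

# full_match() must account for the entire path.
if hasattr(PurePosixPath, "full_match"):
    whole_path = (
        path.full_match("*.py"),     # one segment cannot cover three
        path.full_match("**/*.py"),  # "**" spans any number of segments
        path.full_match("a/**"),     # prefix matching from the left
    )
else:
    whole_path = None  # interpreter predates full_match()
```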
* GH-113225: Speed up `pathlib.Path.walk(top_down=False)` (#113693) | Barney Gale | 2024-01-20 | 1 | -4/+5
  Use `_make_child_entry()` rather than `_make_child_relpath()` to retrieve path objects for directories to visit. This saves the allocation of one path object per directory in user subclasses of `PathBase`, and avoids a second loop.
  This trick does not apply when walking top-down, because users can affect the walk by modifying *dirnames* in-place.
  A side effect of this change is that, in bottom-up mode, subdirectories of each directory are visited in reverse order, and that this order doesn't match that of the names in *dirnames*. I suspect this is fine as the order is arbitrary anyway.
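Why top-down walks cannot use the same shortcut is easier to see with a small example. The sketch below uses `os.walk()` as a stand-in, on the assumption that it shares the *dirnames* contract described above, so that it runs on interpreters without `Path.walk()`. `walk_order` is a hypothetical helper.

```python
import os
import tempfile


def walk_order(top, top_down, prune=()):
    """Return the directories visited, relative to *top*, in visiting order."""
    seen = []
    for dirpath, dirnames, _filenames in os.walk(top, topdown=top_down):
        if top_down:
            # In-place edits to dirnames steer a top-down walk.
            dirnames[:] = [d for d in dirnames if d not in prune]
        seen.append(os.path.relpath(dirpath, top))
    return seen


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "keep", "inner"))
    os.makedirs(os.path.join(tmp, "skip", "inner"))
    pruned = walk_order(tmp, top_down=True, prune={"skip"})
    bottom_up = walk_order(tmp, top_down=False)
```

In the top-down run the `skip` subtree is never entered. In the bottom-up run every directory is visited, children before parents, so the list of directories to visit is fixed before the caller sees *dirnames*.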
* GH-79634: Accept path-like objects as pathlib glob patterns. (#114017) | Barney Gale | 2024-01-20 | 2 | -69/+78
  Allow `os.PathLike` objects to be passed as patterns to `pathlib.Path.glob()` and `rglob()`. (It's already possible to use them in `PurePath.match()`.)
  While we're in the area:
  - Allow empty glob patterns in `PathBase` (but not `Path`)
  - Speed up globbing in `PathBase` by generating paths with trailing slashes only as a final step, rather than for every intermediate directory.
  - Simplify and speed up handling of rare patterns involving both `**` and `..` segments.
* Replace `pathlib._abc.PathModuleBase.splitroot()` with `splitdrive()` (#114065) | Barney Gale | 2024-01-14 | 1 | -10/+8
  This allows users of the `pathlib-abc` PyPI package to use `posixpath` or `ntpath` as a path module in versions of Python lacking `os.path.splitroot()` (3.11 and before).
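To illustrate why `splitdrive()` is a workable lowest common denominator, the sketch below derives a drive/root/tail triple using only `splitdrive()`. `split_root` is a hypothetical helper. It is not necessarily how the ABCs compute these values, and it does not reproduce every special case of the real `os.path.splitroot()`.

```python
import ntpath
import posixpath


def split_root(path, pathmod):
    """Approximate splitroot() using only splitdrive() (hypothetical helper)."""
    drive, rest = pathmod.splitdrive(path)
    seps = (pathmod.sep, pathmod.altsep) if pathmod.altsep else (pathmod.sep,)
    # Treat one leading separator after the drive as the root.
    root = rest[:1] if rest[:1] in seps else ""
    return drive, root, rest[len(root):]


windows_parts = split_root("C:/a/b", ntpath)
posix_parts = split_root("/a/b", posixpath)
```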
* Add `pathlib._abc.PathModuleBase` (#113893) | Barney Gale | 2024-01-14 | 2 | -57/+127
  Path modules provide a subset of the `os.path` API, specifically those functions needed to provide `PurePathBase` functionality. Each `PurePathBase` subclass references its path module via a `pathmod` class attribute.
  This commit adds a new `PathModuleBase` class, which provides abstract methods that unconditionally raise `UnsupportedOperation`. An instance of this class is assigned to `PurePathBase.pathmod`, replacing `posixpath`. As a result, `PurePathBase` is no longer POSIX-y by default, and all its methods raise `UnsupportedOperation` courtesy of `pathmod`.
  Users who subclass `PurePathBase` or `PathBase` should choose the path syntax by setting `pathmod` to `posixpath`, `ntpath`, `os.path`, or their own subclass of `PathModuleBase`, as circumstances demand.
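The `pathmod` pattern can be illustrated without touching the private `pathlib._abc` module. In the sketch below, `ToyPurePath` and its subclasses are hypothetical stand-ins, not the real ABCs: the base class has no path syntax of its own, and each subclass chooses one by assigning a path module.

```python
import ntpath
import posixpath


class ToyPurePath:
    """Hypothetical stand-in for a PurePathBase-style class."""

    pathmod = None  # subclasses must choose a path syntax

    def __init__(self, raw):
        self.raw = raw

    @property
    def name(self):
        if self.pathmod is None:
            raise NotImplementedError("no path module configured")
        # All syntax-specific work is delegated to the path module.
        return self.pathmod.basename(self.raw)


class ToyPosixPath(ToyPurePath):
    pathmod = posixpath


class ToyWindowsPath(ToyPurePath):
    pathmod = ntpath


posix_name = ToyPosixPath("a/b\\c").name      # backslash is an ordinary character
windows_name = ToyWindowsPath("a/b\\c").name  # backslash is a separator
```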
* Add module docstring for `pathlib._abc`. (#113691) | Barney Gale | 2024-01-13 | 1 | -0/+13
* pathlib ABCs: add `_raw_path` property (#113976) | Barney Gale | 2024-01-13 | 2 | -20/+31
  It's wrong for the `PurePathBase` methods to rely so much on `__str__()`. Instead, they should treat the raw path(s) as opaque objects and leave the details to `pathmod`.
  This commit adds a `PurePathBase._raw_path` property and uses it through many of the other ABC methods. These methods are all redefined in `PurePath` and `Path`, so this has no effect on the public classes.
* GH-44626, GH-105476: Fix `ntpath.isabs()` handling of part-absolute paths (#113829) | Barney Gale | 2024-01-13 | 1 | -5/+1
  On Windows, `os.path.isabs()` now returns `False` when given a path that starts with exactly one (back)slash. This is more compatible with other functions in `os.path`, and with Microsoft's own documentation.
  Also adjust `pathlib.PureWindowsPath.is_absolute()` to call `ntpath.isabs()`, which corrects its handling of partial UNC/device paths like `//foo`.
  Co-authored-by: Jon Foster <jon@jon-foster.co.uk>
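A quick way to see which behaviour a given interpreter has is to probe `ntpath.isabs()` directly, as in the sketch below. The sample paths are illustrative. Only the single-slash case (`/dir`) is expected to differ between interpreters with and without this change, so no particular output is asserted for it.

```python
import ntpath

samples = ["C:/dir", "//server/share/dir", "dir/file", "C:dir", "/dir"]
results = {sample: ntpath.isabs(sample) for sample in samples}

for sample, absolute in results.items():
    # "/dir" has a root but no drive: True before the change, False after.
    print(f"{sample!r}: {absolute}")
```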
* pathlib ABCs: Require one or more initialiser arguments (#113885) | Barney Gale | 2024-01-10 | 1 | -8/+2
  Refuse to guess what a user means when they initialise a pathlib ABC without any positional arguments. In mainline pathlib it's normalised to `.`, but in the ABCs this guess isn't appropriate; for example, the path type may not represent the current directory as `.`, or may have no concept of a "current directory" at all.
* GH-113528: Deoptimise `pathlib._abc.PurePathBase` (#113559) | Barney Gale | 2024-01-09 | 2 | -114/+132
  Apply pathlib's normalization and performance tuning in `pathlib.PurePath`, but not `pathlib._abc.PurePathBase`.
  With this change, the pathlib ABCs do not normalize away alternate path separators, empty segments, or dot segments. A single string given to the initialiser will round-trip by default, i.e. `str(PurePathBase(my_string)) == my_string`. Implementors can set their own domain-specific path normalization scheme by overriding `__str__()`.
  Eliminating path normalization makes maintaining and caching the path's parts and string representation both optional and not very useful, so this commit moves the `_drv`, `_root`, `_tail_cached` and `_str` slots from `PurePathBase` to `PurePath`. Only `_raw_paths` and `_resolving` slots remain in `PurePathBase`. This frees the ABCs from the burden of some of pathlib's hardest-to-understand code.
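The contrast between normalizing and round-tripping can be sketched with the public `PurePosixPath` class and a toy class. `RawPath` below is a hypothetical minimal type standing in for a non-normalizing ABC subclass; it is not the real `PurePathBase`.

```python
from pathlib import PurePosixPath

raw = "a//b/./c/"

# The public class collapses repeated separators, drops "." segments
# and strips the trailing slash.
normalized = str(PurePosixPath(raw))


class RawPath:
    """Hypothetical minimal path type that keeps its input verbatim."""

    def __init__(self, raw_path):
        self._raw_path = raw_path

    def __str__(self):
        # A subclass could override this to apply its own normalization.
        return self._raw_path


round_tripped = str(RawPath(raw))
```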
* GH-113528: Deoptimise `pathlib._abc.PurePathBase.relative_to()` (again) (#113882) | Barney Gale | 2024-01-09 | 2 | -15/+42
  Restore full battle-tested implementations of `PurePath.[is_]relative_to()`. These were recently split up in 3375dfe and a15a773.
  In `PurePathBase`, add entirely new implementations based on `_stack`, which itself calls `pathmod.split()` repeatedly to disassemble a path. These new implementations preserve features like trailing slashes where possible, while still observing that a `..` segment cannot be added to traverse an empty or `.` segment in *walk_up* mode. They do not rely on `parents` nor `__eq__()`, nor do they spin up temporary path objects.
  Unfortunately calling `pathmod.relpath()` isn't an option, as it calls `abspath()` and in turn `os.getcwd()`, which is impure.
* GH-113528: Deoptimise `pathlib._abc.PurePathBase.parts` (#113883) | Barney Gale | 2024-01-09 | 2 | -4/+13
  Implement `parts` using `_stack`, which itself calls `pathmod.split()` repeatedly. This avoids use of `_tail`, which will be moved to `PurePath` shortly.
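Disassembling a path by calling `split()` repeatedly can be sketched as below. `split_all` is a hypothetical helper written for illustration. The actual `_stack` property may differ in return shape and details, but the loop shows why only `pathmod.split()` is needed.

```python
import ntpath
import posixpath


def split_all(path, pathmod=posixpath):
    """Return (anchor, parts) by peeling names off the right-hand side."""
    parts = []
    parent, name = pathmod.split(path)
    while parent != path:  # split() stops making progress at the anchor
        parts.append(name)
        path = parent
        parent, name = pathmod.split(path)
    return path, parts[::-1]


relative = split_all("a/b/c")
absolute = split_all("/a/b")
trailing = split_all("a/b/")  # the trailing slash survives as an empty part
windows = split_all("C:\\a\\b", ntpath)
```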
* GH-113528: Deoptimise `pathlib._abc.PathBase.resolve()` (#113782) | Barney Gale | 2024-01-09 | 1 | -25/+40
  Replace use of `_from_parsed_parts()` with `with_segments()` in `resolve()`. No effect on `Path.resolve()`, which uses `os.path.realpath()`.
* GH-113528: Deoptimise `pathlib._abc.PathBase._make_child_relpath()` (#113532) | Barney Gale | 2024-01-09 | 2 | -14/+17
  Call straight through to `joinpath()` in `PathBase._make_child_relpath()`. Move optimised/caching code to `pathlib.Path._make_child_relpath()`.
* GH-113528: Speed up pathlib ABC tests. (#113788) | Barney Gale | 2024-01-08 | 1 | -4/+4
  - Add `__slots__` to dummy path classes.
  - Return namedtuple rather than `os.stat_result` from `DummyPath.stat()`.
  - Reduce maximum symlink count in `DummyPathWithSymlinks.resolve()`.
* GH-113528: Deoptimise `pathlib._abc.PurePathBase.relative_to()` (#113529) | Barney Gale | 2024-01-06 | 2 | -2/+5
  Replace use of `_from_parsed_parts()` with `with_segments()` in `PurePathBase.relative_to()`, and move the assignment of `_drv`, `_root` and `_tail_cached` slots into `PurePath.relative_to()`.
* GH-113528: Deoptimise `pathlib._abc.PurePathBase.parent` (#113530) | Barney Gale | 2024-01-06 | 2 | -42/+63
  Replace use of `_from_parsed_parts()` with `with_segments()`, and move assignments to `_drv`, `_root`, `_tail_cached` and `_str` slots into `PurePath`.
* GH-113528: Deoptimise `pathlib._abc.PurePathBase.name` (#113531) | Barney Gale | 2024-01-06 | 2 | -7/+25
  Replace usage of `_from_parsed_parts()` with `with_segments()` in `with_name()`, and take a similar approach in `name` for consistency's sake.
* GH-113568: Stop raising deprecation warnings from pathlib ABCs (#113757) | Barney Gale | 2024-01-05 | 2 | -17/+31
* GH-113568: Stop raising auditing events from pathlib ABCs (#113571) | Barney Gale | 2024-01-05 | 2 | -22/+50
  Raise auditing events in `pathlib.Path.glob()`, `rglob()` and `walk()`, but not in `pathlib._abc.PathBase` methods. Also move generation of a deprecation warning into `pathlib.Path` so it gets the right stack level.
* GH-113225: Speed up `pathlib.Path.glob()` (#113226) | Barney Gale | 2024-01-04 | 1 | -1/+7
  Use `os.DirEntry.path` as the string representation of child paths, unless the parent path is empty, in which case we use the entry `name`.
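The special case for an empty parent path makes sense once you look at what `os.DirEntry.path` contains. The sketch below assumes that an empty path is scanned as `.`, which is an assumption about the implementation rather than something the commit message states. In that situation `entry.path` carries a `./` prefix that the path's string form should not have, whereas `entry.name` does not.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "file.txt"), "w").close()
    old_cwd = os.getcwd()
    os.chdir(tmp)
    try:
        with os.scandir(".") as entries:
            entry = next(entries)
            # entry.path is os.path.join(<scanned path>, entry.name).
            entry_name, entry_path = entry.name, entry.path
    finally:
        os.chdir(old_cwd)
```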
* GH-113225: Speed up `pathlib._abc.PathBase.glob()` (#113556) | Barney Gale | 2023-12-28 | 2 | -5/+12
  `PathBase._scandir()` is implemented using `iterdir()`, so we can use its results directly, rather than passing them through `_make_child_relpath()`.
* GH-110109: pathlib ABCs: drop use of `warnings._deprecated()` (#113419) | Barney Gale | 2023-12-27 | 1 | -6/+4
  The `pathlib._abc` module will be made available as a PyPI backport supporting Python 3.8+. The `warnings._deprecated()` function was only added last year, and it's private from an external package perspective, so here we switch to `warnings.warn()` instead.
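Emitting a deprecation warning with only the public `warnings.warn()` API looks roughly like the sketch below. `legacy_helper` is a hypothetical function used for illustration; the message text and return value are arbitrary.

```python
import warnings


def legacy_helper():
    """Hypothetical deprecated function."""
    warnings.warn(
        "legacy_helper() is deprecated",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this function
    )
    return 42


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    value = legacy_helper()

categories = [w.category for w in caught]
```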
* GH-110109: pathlib ABCs: drop use of `io.text_encoding()` (#113417) | Barney Gale | 2023-12-27 | 2 | -3/+18
  Do not use the locale-specific default encoding in `PathBase.read_text()` and `write_text()`. Locale settings shouldn't influence the operation of these base classes, which are intended mostly for implementing rich paths on *nonlocal* filesystems.
* GH-110109: pathlib ABCs: do not vary path syntax by host OS. (#113219) | Barney Gale | 2023-12-22 | 2 | -2/+2
  Change the value of `pathlib._abc.PurePathBase.pathmod` from `os.path` to `posixpath`.
  User subclasses of `PurePathBase` and `PathBase` previously used the host OS's path syntax, e.g. backslashes as separators on Windows. This is wrong in most use cases, and likely to catch developers out unless they test on both Windows and non-Windows machines.
  In this patch we change the default to POSIX syntax, regardless of OS. This is somewhat arguable (why not make all aspects of syntax abstract and individually configurable?) but an improvement all the same.
  This change has no effect on `PurePath`, `Path`, nor their subclasses. Only private APIs are affected.
* GH-110109: Fix misleading `pathlib._abc.PurePathBase` repr (#113376) | Barney Gale | 2023-12-22 | 2 | -3/+3
  `PurePathBase.__repr__()` produces a string like `MyPath('/foo')`. This repr is incorrect/misleading when a subclass's `__init__()` method is customized, which I expect to be very common.
  This commit moves the `__repr__()` method to `PurePath`, leaving `PurePathBase` with the default `object` repr.
  No user-facing changes because the `pathlib._abc` module remains private.
* GH-112906: Fix performance regression in pathlib path initialisation (#112907) | Barney Gale | 2023-12-10 | 1 | -1/+3
  This was caused by 76929fdeeb, specifically its use of `super()` and its packing/unpacking of `*args`.
  Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
* GH-110109: Move pathlib ABCs to new `pathlib._abc` module. (#112881) | Barney Gale | 2023-12-09 | 2 | -0/+1657
  Move `_PurePathBase` and `_PathBase` to a new `pathlib._abc` module, and drop the underscores from the class names.
  Tests are mostly left alone in this commit, but they'll be similarly split in a subsequent commit.
  The `pathlib._abc` module will be published as an independent PyPI package (similar to how `zipfile._path` is published as `zipp`), to be refined and stabilised prior to its possible addition to the standard library.