summaryrefslogtreecommitdiffstats
path: root/Python/codegen.c
diff options
context:
space:
mode:
authorBarney Gale <barney.gale@gmail.com>2025-02-28 20:33:51 (GMT)
committerGitHub <noreply@github.com>2025-02-28 20:33:51 (GMT)
commitda4899b94a9a9083fed4972b2473546e0d997727 (patch)
tree87fcbf476e02abbb98f46114aa3963373bfb7f18 /Python/codegen.c
parentb5454509612870dd0e09aaba4b79865a5faad284 (diff)
downloadcpython-da4899b94a9a9083fed4972b2473546e0d997727.zip
cpython-da4899b94a9a9083fed4972b2473546e0d997727.tar.gz
cpython-da4899b94a9a9083fed4972b2473546e0d997727.tar.bz2
GH-116380: Speed up `glob.[i]glob()` by making fewer system calls. (#116392)
## Filtered recursive walk Expanding a recursive `**` segment entails walking the entire directory tree, and so any subsequent pattern segments (except special segments) can be evaluated by filtering the expanded paths through a regex. For example, `glob.glob("foo/**/*.py", recursive=True)` recursively walks `foo/` with `os.scandir()`, and then filters paths through a regex based on "`**/*.py`, with no further filesystem access needed. This fixes an issue where `glob()` could return duplicate results. ## Tracking path existence We store a flag alongside each path indicating whether the path is guaranteed to exist. As we process the pattern: - Certain special pattern segments (`""`, `"."` and `".."`) leave the flag unchanged - Literal pattern segments (e.g. `foo/bar`) set the flag to false - Wildcard pattern segments (e.g. `*/*.py`) set the flag to true (because children are found via `os.scandir()`) - Recursive pattern segments (e.g. `**`) leave the flag unchanged for the root path, and set it to true for descendants discovered via `os.scandir()`. If the flag is false at the end, we call `lstat()` on each path to filter out missing paths. ## Minor speed-ups - Exclude paths that don't match a non-terminal non-recursive wildcard pattern _prior_ to calling `is_dir()`. - Use a stack rather than recursion to implement recursive wildcards. - This fixes a recursion error when globbing deep trees. - Pre-compile regular expressions and pre-join literal pattern segments. - Convert to/from `bytes` (a minor use-case) in `iglob()` rather than supporting `bytes` throughout. This particularly simplifies the code needed to handle relative bytes paths with `dir_fd`. - Avoid calling `os.path.join()`; instead we keep paths in a normalized form and append trailing slashes when needed. - Avoid calling `os.path.normcase()`; instead we use case-insensitive regex matching. ## Implementation notes Much of this functionality is already present in pathlib's implementation of globbing. The specific additions we make are: 1. Support for `dir_fd` 2. Support for `include_hidden` 3. Support for generating paths relative to `root_dir` This unifies the implementations of globbing in the `glob` and `pathlib` modules. Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com> Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Diffstat (limited to 'Python/codegen.c')
0 files changed, 0 insertions, 0 deletions