cpython.git - https://github.com/python/cpython.git

author	Barney Gale <barney.gale@gmail.com>	2023-06-06 22:50:36 (GMT)
committer	GitHub <noreply@github.com>	2023-06-06 22:50:36 (GMT)
commit	24af45172f74e4f01eb21d3aee7beab62417b833 (patch)
tree	f4679083343245b4cdc0e35eb9ce2859ba24b66d /Python/Python-tokenize.c
parent	2587b9f64eefde803a5e0b050171ad5f6654f31b (diff)

GH-102613: Fast recursive globbing in `pathlib.Path.glob()` (GH-104512)

This commit introduces a 'walk-and-match' strategy for handling glob patterns that include a non-terminal `**` wildcard, such as `**/*.py`. For this example, the previous implementation recursively walked directories using `os.scandir()` when it expanded the `**` component, and then **scanned those same directories again** when it expanded the `*.py` component. This is wasteful.

In the new implementation, any components following a `**` wildcard are used to build a `re.Pattern` object, which is used to filter the results of the recursive walk. A pattern like `**/*.py` uses half the number of `os.scandir()` calls; a pattern like `**/*/*.py` uses a third, and so on.

This new algorithm does not apply if either:

1. The *follow_symlinks* argument is set to `None` (its default), or
2. The pattern contains `..` components.

In these cases we fall back to the old implementation.

This commit also replaces selector classes with selector functions. These generators directly yield results rather than calling through to their successors. A new internal `Path._glob()` method takes care of chaining these generators together, which simplifies the lazy algorithm and slightly improves performance. It should also be easier to understand and maintain.
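The walk-and-match idea can be illustrated with a minimal sketch, assuming a simplified pattern syntax: the pattern must start with `**/`, contain no further `**` or `..` components, and use only the `*` and `?` wildcards (no `[...]` classes). The names `walk_and_match()`, `compile_recursive_pattern()` and `_translate_component()` are hypothetical and are not pathlib's internals. The sketch walks the tree once with `os.walk()` (which, by default, does not follow symlinks into directories) and filters every relative path through one compiled regex, so each directory is listed only once.

```python
import os
import re


def _translate_component(component):
    """Translate one glob component into a regex fragment (hypothetical helper).

    Simplified on purpose: only '*' and '?' are supported, and neither
    wildcard may match the '/' separator.
    """
    parts = []
    for ch in component:
        if ch == "*":
            parts.append("[^/]*")
        elif ch == "?":
            parts.append("[^/]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def compile_recursive_pattern(pattern):
    """Build one regex for a pattern of the form '**/<rest>' (hypothetical)."""
    head, *rest = pattern.split("/")
    if head != "**" or not rest:
        raise ValueError("sketch only handles patterns like '**/<rest>'")
    if "**" in rest or ".." in rest:
        raise ValueError("sketch does not handle further '**' or '..' components")
    # A leading '**' matches zero or more whole directory names.
    regex = "(?:[^/]+/)*" + "/".join(_translate_component(c) for c in rest)
    return re.compile(regex + r"\Z")


def walk_and_match(root, pattern):
    """Yield paths under root that match pattern, listing each directory once."""
    matcher = compile_recursive_pattern(pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        for name in dirnames + filenames:
            rel = name if rel_dir == os.curdir else os.path.join(rel_dir, name)
            # Normalise separators so the regex only has to handle '/'.
            if matcher.match(rel.replace(os.sep, "/")):
                yield os.path.join(dirpath, name)
```

For example, `walk_and_match(root, "**/*/*.py")` compiles to `(?:[^/]+/)*[^/]*/[^/]*\.py\Z`, so `.py` files directly under `root` are excluded while those one or more levels down are yielded, all from a single walk.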
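The selector-function design can be sketched in the same spirit. The functions `select_literal()`, `select_wildcard()`, `select_recursive()` and `glob_chain()` below are hypothetical stand-ins, not the actual `Path._glob()` code: each selector is a generator that consumes the paths produced by the previous one and yields its own results directly, and `glob_chain()` nests one generator per pattern component. Using `fnmatch` for wildcard matching is an assumption of this sketch.

```python
import fnmatch
import os


def select_literal(paths, name):
    """Yield path/name for each input path where that entry exists."""
    for path in paths:
        candidate = os.path.join(path, name)
        if os.path.lexists(candidate):
            yield candidate


def select_wildcard(paths, pattern):
    """Yield children of each input directory whose names match pattern."""
    for path in paths:
        try:
            with os.scandir(path) as entries:
                names = [entry.name for entry in entries]
        except OSError:
            continue  # not a directory, or not readable
        for name in fnmatch.filter(names, pattern):
            yield os.path.join(path, name)


def select_recursive(paths):
    """Yield each input directory and every directory beneath it ('**')."""
    for path in paths:
        for dirpath, _dirnames, _filenames in os.walk(path):
            yield dirpath


def glob_chain(root, pattern):
    """Lazily chain one selector generator per pattern component."""
    paths = iter([root])
    for component in pattern.split("/"):
        # Generator arguments are bound at call time, so each selector
        # keeps the component it was created with.
        if component == "**":
            paths = select_recursive(paths)
        elif any(ch in component for ch in "*?["):
            paths = select_wildcard(paths, component)
        else:
            paths = select_literal(paths, component)
    return paths
```

Because this chained sketch feeds `select_recursive()` into `select_wildcard()`, a pattern like `**/*.py` lists every directory twice (once inside `os.walk()`, once via `os.scandir()`); that duplicated scanning is what the walk-and-match strategy is meant to avoid.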

Diffstat (limited to 'Python/Python-tokenize.c')

0 files changed, 0 insertions, 0 deletions
