Co-authored-by: Filipe Laíns <lains@riseup.net>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

* The lexer, which includes the actual lexeme-producing logic, goes into
  the `lexer` directory.
* The wrappers, one wrapper per input mode (file, string, utf-8, and
  readline), go into the `tokenizer` directory and include the logic for
  creating a lexer instance and managing the buffer for the different
  modes (a Python-level sketch of these modes follows this message).
---------
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
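
The split by input mode has a rough analogue in the public `tokenize`
module, which is enough to illustrate the distinction; this is a minimal
sketch using only the documented Python API, not the C wrappers themselves:

    import io
    import tokenize

    src = "pi = 3.14  # approximately\n"

    # String mode: readline yields str; no encoding detection is done.
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        print(tok)

    # Bytes (utf-8) mode: tokenize() first detects the source encoding
    # (PEP 263) and emits an ENCODING token before the regular tokens.
    for tok in tokenize.tokenize(io.BytesIO(src.encode("utf-8")).readline):
        print(tok)
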
incorrect soft keywords (#109606)

Remove private _PyErr C API functions: move them to the internal
C API (pycore_pyerrors.h).

This commit replaces the Python implementation of the tokenize module
with an implementation that reuses the real C tokenizer via a private
extension module. The tokenize module now implements a compatibility
layer that transforms tokens from the C tokenizer into Python tokenize
tokens for backward compatibility.

As the C tokenizer does not emit some tokens that the Python tokenizer
provides (such as comments and non-semantic newlines), a new special
mode has been added to the C tokenizer; it is currently only used via
the extension module that exposes it to the Python layer. This new mode
forces the C tokenizer to emit these extra tokens and to add the
appropriate metadata needed to match the old Python implementation
(see the example after this message).
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
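
The extra tokens remain visible through the public API; for example, this
snippet (using only the documented `tokenize` module) shows the COMMENT
and NL tokens that the compatibility layer still produces:

    import io
    import tokenize

    src = "# a comment\nx = 1\n"
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        # tok_name maps numeric token types to names such as COMMENT, NL.
        print(tokenize.tok_name[tok.type], repr(tok.string))
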
(GH-103896)
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>

Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>
Co-authored-by: Marta Gómez Macías <mgmacias@google.com>
Co-authored-by: sunmy2019 <59365878+sunmy2019@users.noreply.github.com>

in the tokenizer (GH-100065)
Automerge-Triggered-By: GH:pablogsal

Right now, the tokenizer only returns the type and two pointers to the
start and end of the token. This PR modifies the tokenizer to return the
type and to set all of the necessary information, so that the parser does
not have to do this (an illustration follows this message).
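
The kind of per-token information involved can be seen in the Python-level
`tokenize.TokenInfo` tuples, which carry the type, the exact token text,
and the start/end coordinates; a small illustration (the C-level struct
itself is not shown here):

    import io
    import tokenize

    for tok in tokenize.generate_tokens(io.StringIO("a + b\n").readline):
        # Each token knows its type, exact text, and (row, column) span.
        print(tokenize.tok_name[tok.type], repr(tok.string), tok.start, tok.end)
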
* bpo-14916: interactive fd is not always stdin
  Related to the merged bugfix https://github.com/python/cpython/pull/31006,
  following https://bugs.python.org/issue14916.
* 📜🤖 Added by blurb_it.
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>

errors from stdin (#94386)
* gh-94360: Fix a tokenizer crash when reading encoded files with syntax
  errors from stdin
  Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
* nitty nit
Co-authored-by: Łukasz Langa <lukasz@langa.pl>

buffers are uninitialized (GH-32129)
Automerge-Triggered-By: GH:pablogsal

'get_error_line_from_tokenizer_buffers' (#30545)

f-strings (GH-30529)
Automerge-Triggered-By: GH:pablogsal

not finished (GH-30378)

inside parentheses (GH-29757)