Commit message

input detection in the codeop module (#113745)
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>

tokenize module (#112949)

* The lexer, which includes the actual lexeme-producing logic, goes into
the `lexer` directory.
* The wrappers, one wrapper per input mode (file, string, utf-8, and
readline), go into the `tokenizer` directory and include logic for
creating a lexer instance and managing the buffer for different modes (see the sketch below).
---------
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
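A rough sketch of the resulting layering, with hypothetical names (lexer_state, lexer_init, and tokenizer_from_string are illustrative only, not CPython's actual internals):

    /* Hypothetical sketch: the lexer core (lexer/ directory) only produces
       lexemes from a buffer, while a per-mode wrapper (tokenizer/ directory)
       sets up the buffer and hands back a ready lexer instance. */
    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        const char *cur;  /* current position in the input buffer */
        const char *end;  /* end of the input buffer */
    } lexer_state;

    static void
    lexer_init(lexer_state *ls, const char *buf, size_t len)
    {
        ls->cur = buf;
        ls->end = buf + len;
    }

    /* Wrapper for the "string" input mode; the file, UTF-8, and readline
       wrappers would differ only in how they acquire and refill the buffer. */
    static lexer_state *
    tokenizer_from_string(const char *str)
    {
        lexer_state *ls = malloc(sizeof(lexer_state));
        if (ls != NULL) {
            lexer_init(ls, str, strlen(str));
        }
        return ls;
    }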
Replace _PyObject_FastCall() calls with PyObject_Vectorcall().
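For context, a minimal sketch of the replacement call style using the public vectorcall API (the divmod call is just an illustration; it assumes an already-initialized interpreter):

    #include <Python.h>

    /* Call divmod(7, 3) through PyObject_Vectorcall() instead of the
       private _PyObject_FastCall(). Arguments are passed as a C array. */
    static PyObject *
    call_divmod(void)
    {
        /* Borrowed reference to the divmod builtin. */
        PyObject *divmod = PyDict_GetItemString(PyEval_GetBuiltins(), "divmod");
        if (divmod == NULL) {
            return NULL;
        }
        PyObject *seven = PyLong_FromLong(7);
        PyObject *three = PyLong_FromLong(3);
        if (seven == NULL || three == NULL) {
            Py_XDECREF(seven);
            Py_XDECREF(three);
            return NULL;
        }
        PyObject *args[] = {seven, three};
        /* 2 positional arguments, no keyword arguments (kwnames == NULL). */
        PyObject *result = PyObject_Vectorcall(divmod, args, 2, NULL);
        Py_DECREF(seven);
        Py_DECREF(three);
        return result;
    }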
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>

This commit replaces the Python implementation of the tokenize module with an implementation
that reuses the real C tokenizer via a private extension module. The tokenize module now implements
a compatibility layer that transforms tokens from the C tokenizer into Python tokenize tokens for backward
compatibility.
As the C tokenizer does not emit some tokens that the Python tokenizer provides (such as comments and non-semantic newlines), a new special mode has been added to the C tokenizer; it is currently only used via
the extension module that exposes it to the Python layer. This new mode forces the C tokenizer to emit these extra tokens and to add the metadata needed to match the old Python implementation.
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
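A small embedding sketch that drives the compatibility layer from C; COMMENT and NL tokens still come through because the extension module enables the tokenizer's extra-tokens mode under the hood (assumes an initialized interpreter):

    #include <stdio.h>
    #include <Python.h>

    /* Tokenize a snippet via the tokenize module, now backed by the C tokenizer. */
    static int
    dump_tokens(const char *source)
    {
        PyObject *io = PyImport_ImportModule("io");
        if (io == NULL) {
            return -1;
        }
        PyObject *stream = PyObject_CallMethod(io, "StringIO", "s", source);
        Py_DECREF(io);
        if (stream == NULL) {
            return -1;
        }
        PyObject *readline = PyObject_GetAttrString(stream, "readline");
        Py_DECREF(stream);  /* the bound method keeps the stream alive */
        if (readline == NULL) {
            return -1;
        }
        PyObject *tokenize = PyImport_ImportModule("tokenize");
        if (tokenize == NULL) {
            Py_DECREF(readline);
            return -1;
        }
        PyObject *tokens = PyObject_CallMethod(tokenize, "generate_tokens", "O", readline);
        Py_DECREF(tokenize);
        Py_DECREF(readline);
        if (tokens == NULL) {
            return -1;
        }
        PyObject *tok;
        while ((tok = PyIter_Next(tokens)) != NULL) {
            PyObject_Print(tok, stdout, 0);  /* TokenInfo(type=..., string=..., ...) */
            fputc('\n', stdout);
            Py_DECREF(tok);
        }
        Py_DECREF(tokens);
        return PyErr_Occurred() ? -1 : 0;
    }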
(GH-103896)
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>

Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>
Co-authored-by: Marta Gómez Macías <mgmacias@google.com>
Co-authored-by: sunmy2019 <59365878+sunmy2019@users.noreply.github.com>

There are some warnings when building Python with clang:
Parser/pegen.c:812:31: warning: a function declaration without a prototype is deprecated in all versions of C [-Wstrict-prototypes]
_PyPegen_clear_memo_statistics()
^
void
Parser/pegen.c:820:29: warning: a function declaration without a prototype is deprecated in all versions of C [-Wstrict-prototypes]
_PyPegen_get_memo_statistics()
^
void
Fix them to make clang happy.
Signed-off-by: Chenxi Mao <chenxi.mao@suse.com>
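The underlying C rule, in a standalone illustration (the function names here are generic, not the ones from the warning text above):

    /* In C, an empty parameter list in a declaration means "unspecified
       parameters", not "no parameters", so clang's -Wstrict-prototypes
       flags it as deprecated. */
    static int
    stats_old()          /* warning: declaration without a prototype */
    {
        return 0;
    }

    /* Writing (void) gives the function a real prototype and silences
       the warning; this is exactly the shape of the fix above. */
    static int
    stats_new(void)
    {
        return 0;
    }

    int
    main(void)
    {
        return stats_old() + stats_new();
    }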
* Make sure that the current exception is always normalized.
* Remove redundant type and traceback fields for the current exception.
* Add new API functions: PyErr_GetRaisedException, PyErr_SetRaisedException
* Add new API functions: PyException_GetArgs, PyException_SetArgs
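A short sketch of the new single-object flow (contrast with the legacy PyErr_Fetch() type/value/traceback triple):

    #include <Python.h>

    /* Grab the raised exception as one normalized object, inspect its
       args via the new accessors, then re-raise it. */
    static void
    inspect_and_reraise(void)
    {
        PyObject *exc = PyErr_GetRaisedException();  /* clears the error indicator */
        if (exc == NULL) {
            return;  /* no exception is being raised */
        }
        PyObject *args = PyException_GetArgs(exc);   /* new reference */
        if (args != NULL) {
            PyObject_Print(args, stderr, 0);
            fputc('\n', stderr);
            Py_DECREF(args);
        }
        else {
            PyErr_Clear();
        }
        PyErr_SetRaisedException(exc);  /* re-raises; steals our reference */
    }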
https://github.com/python/cpython/issues/81057

Replace Py_INCREF() with Py_NewRef() in C files of the Parser/
directory and in the PEG generator.
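The mechanical shape of the change:

    #include <Python.h>

    /* Py_NewRef() increments the reference count and returns its argument,
       so the old two-step incref-then-use pattern becomes one expression. */
    static PyObject *
    get_default(void)
    {
        /* Before:
         *     Py_INCREF(Py_None);
         *     return Py_None;
         * After: */
        return Py_NewRef(Py_None);
    }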
Right now, the tokenizer only returns the type and two pointers to the start and end of the token.
This PR modifies the tokenizer to return the type and set all of the necessary information,
so that the parser does not have to do this.
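An illustrative sketch of the difference (field names are hypothetical, not the actual parser structures):

    /* Before: the parser received only this much and had to derive the
       rest itself. */
    typedef struct {
        int type;
        const char *start;
        const char *end;
    } old_token;

    /* After: the tokenizer fills in everything the parser needs. */
    typedef struct {
        int type;
        const char *start;
        const char *end;
        int lineno, col_offset;          /* start position */
        int end_lineno, end_col_offset;  /* end position */
    } new_token;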
Integer to and from text conversions via CPython's bignum `int` type are not safe against denial-of-service attacks due to malicious input. Very large input strings with hundreds of thousands of digits can consume several CPU seconds.
This PR comes fresh from a pile of work done in our private PSRT security response team repo.
Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).
* Issue: gh-95778
I wrote up [a one-pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#). Much of that text wound up in the Issue. Backport PRs already exist. See the issue for links.
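A sketch of what the mitigation looks like from an embedding application, assuming the PyConfig.int_max_str_digits knob this work introduced (Python-level code can use sys.set_int_max_str_digits() instead):

    #include <Python.h>

    int
    main(void)
    {
        PyConfig config;
        PyConfig_InitPythonConfig(&config);
        /* Cap int<->str conversions at 2000 digits; oversized conversions
           raise ValueError instead of burning CPU time. */
        config.int_max_str_digits = 2000;
        PyStatus status = Py_InitializeFromConfig(&config);
        PyConfig_Clear(&config);
        if (PyStatus_Exception(status)) {
            Py_ExitStatusException(status);
        }
        /* Fails fast with ValueError instead of consuming CPU seconds. */
        PyRun_SimpleString("int('9' * 100000)");
        return Py_FinalizeEx();
    }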
#95355
Automerge-Triggered-By: GH:pablogsal

It combines PyImport_ImportModule() and PyObject_GetAttrString()
and saves 4-6 lines of code on every use.
Also add _PyImport_GetModuleAttr(), which takes Python strings as arguments.
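The pattern being factored out, as a hedged sketch of an equivalent helper (the private helper's exact name and signature live in the commit itself):

    #include <Python.h>

    /* Import a module and fetch one attribute from it, the combination
       the new private helper encapsulates. */
    static PyObject *
    get_module_attr_string(const char *module, const char *attribute)
    {
        PyObject *mod = PyImport_ImportModule(module);
        if (mod == NULL) {
            return NULL;
        }
        PyObject *attr = PyObject_GetAttrString(mod, attribute);
        Py_DECREF(mod);
        return attr;  /* new reference, or NULL with an exception set */
    }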
* Replace deprecated Py_DebugFlag with PyConfig.parser_debug in the
parser.
* Add Parser.debug member.
* Add tok_state.debug member.
* Py_FrozenMain(): Replace Py_VerboseFlag with PyConfig.verbose.
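A minimal embedding sketch of the PyConfig replacements (parser debug output additionally requires a debug build):

    #include <Python.h>

    int
    main(void)
    {
        PyConfig config;
        PyConfig_InitPythonConfig(&config);
        config.parser_debug = 1;  /* replaces the deprecated Py_DebugFlag */
        config.verbose = 1;       /* replaces Py_VerboseFlag, as in Py_FrozenMain() */
        PyStatus status = Py_InitializeFromConfig(&config);
        PyConfig_Clear(&config);
        if (PyStatus_Exception(status)) {
            Py_ExitStatusException(status);
        }
        PyRun_SimpleString("x = 1 + 1");
        return Py_FinalizeEx();
    }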
unnecessary (GH-31812)

This restores commit e9898bf153d26059261ffef11f7643ae991e2a4c.

parser (GH-30177)" (GH-30363)
This reverts commit e9898bf153d26059261ffef11f7643ae991e2a4c temporarily, as we want to confirm whether this commit is the cause of a slowdown at startup time.

(GH-30177)
Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>

PyImport_ImportModule (GH-30046)

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>

inside parentheses (GH-29757)

not provided (GH-29582)

(GH-29580)

characters in the parser

continuation characters (GH-28993)
There are two errors that this commit fixes:
* The parser was not correctly computing the offset and the string
source for E_LINECONT errors, due to the incorrect usage of strtok()
(see the sketch after this list).
* The parser was not correctly unwinding the call stack when a tokenizer
exception happened in rules involving optionals ('?', [...]), as we
always make them return valid results by using the comma operator. We
need to check first whether an error occurred before continuing.
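The strtok() pitfall behind the first bullet, in a standalone illustration: strtok() keeps hidden static state, so interleaving two tokenizations corrupts whichever came first; strtok_r() makes the cursor explicit (how the commit itself avoids the misuse may differ):

    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
        char outer[] = "alpha,beta";
        char inner[] = "one two";
        char *outer_state, *inner_state;

        /* strtok() would share one hidden cursor between both strings,
           so the inner tokenization would clobber the outer one's state.
           strtok_r() gives each tokenization its own explicit cursor. */
        char *tok = strtok_r(outer, ",", &outer_state);
        printf("outer: %s\n", tok);              /* alpha */

        tok = strtok_r(inner, " ", &inner_state);
        printf("inner: %s\n", tok);              /* one */

        tok = strtok_r(NULL, ",", &outer_state);
        printf("outer: %s\n", tok);              /* beta, unaffected */
        return 0;
    }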
Rename PyTokenizer functions to mark them as private:
* PyTokenizer_FindEncodingFilename() => _PyTokenizer_FindEncodingFilename()
* PyTokenizer_FromString() => _PyTokenizer_FromString()
* PyTokenizer_FromFile() => _PyTokenizer_FromFile()
* PyTokenizer_FromUTF8() => _PyTokenizer_FromUTF8()
* PyTokenizer_Free() => _PyTokenizer_Free()
* PyTokenizer_Get() => _PyTokenizer_Get()
Remove the unused PyTokenizer_FindEncoding() function.
import.c: remove unused #include "errcode.h".

pass (GH-28812)

expression in function calls (GH-28576)

The traceback.c and traceback.py mechanisms now use the newly added code.co_positions and PyCode_Addr2Location
to print carets on the specific expressions involved in a traceback.
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
Co-authored-by: Ammar Askar <ammar@ammaraskar.com>
Co-authored-by: Batuhan Taskaya <batuhanosmantaskaya@gmail.com>
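A small sketch of the C-level lookup that powers the caret placement (code.co_positions() is the Python-level counterpart):

    #include <Python.h>

    /* Map a bytecode offset back to the source span it came from. */
    static void
    print_span(PyCodeObject *code, int byte_offset)
    {
        int start_line, start_col, end_line, end_col;
        if (PyCode_Addr2Location(code, byte_offset,
                                 &start_line, &start_col,
                                 &end_line, &end_col)) {
            printf("%d:%d -> %d:%d\n",
                   start_line, start_col, end_line, end_col);
        }
    }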
patterns (GH-26793)

initialization (GH-26712)

keywords (GH-26630)

syntax errors (GH-26611)