summaryrefslogtreecommitdiffstats
path: root/Parser/pegen.c
Commit message (Collapse)AuthorAgeFilesLines
* gh-113744: Add a new IncompleteInputError exception to improve incomplete ↵Pablo Galindo Salgado2024-01-301-1/+1
| | | | | input detection in the codeop module (#113745) Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
* gh-112943: Correctly compute end offsets for multiline tokens in the ↵Pablo Galindo Salgado2023-12-111-5/+11
| | | | tokenize module (#112949)
* gh-110805: Allow the repl to show source code and complete tracebacks (#110775)Pablo Galindo Salgado2023-10-131-1/+11
|
* gh-104169: Refactor tokenizer into lexer and wrappers (#110684)Lysandros Nikolaou2023-10-111-1/+2
| | | | | | | | | | | * The lexer, which include the actual lexeme producing logic, goes into the `lexer` directory. * The wrappers, one wrapper per input mode (file, string, utf-8, and readline), go into the `tokenizer` directory and include logic for creating a lexer instance and managing the buffer for different modes. --------- Co-authored-by: Pablo Galindo <pablogsal@gmail.com> Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
* gh-107015: Remove async_hacks from the tokenizer (#107018)Pablo Galindo Salgado2023-07-261-4/+0
|
* gh-106023: Update code using _PyObject_FastCall() (#106257)Victor Stinner2023-06-301-2/+2
| | | Replace _PyObject_FastCall() calls with PyObject_Vectorcall().
* gh-105017: Include CRLF lines in strings and column numbers (#105030)Marta Gómez Macías2023-05-281-2/+2
| | | Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
* gh-102856: Python tokenizer implementation for PEP 701 (#104323)Marta Gómez Macías2023-05-211-2/+2
| | | | | | | | | | | This commit replaces the Python implementation of the tokenize module with an implementation that reuses the real C tokenizer via a private extension module. The tokenize module now implements a compatibility layer that transforms tokens from the C tokenizer into Python tokenize tokens for backward compatibility. As the C tokenizer does not emit some tokens that the Python tokenizer provides (such as comments and non-semantic newlines), a new special mode has been added to the C tokenizer mode that currently is only used via the extension module that exposes it to the Python layer. This new mode forces the C tokenizer to emit these new extra tokens and add the appropriate metadata that is needed to match the old Python implementation. Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
* gh-103656: Transfer f-string buffers to parser to avoid use-after-free ↵Lysandros Nikolaou2023-04-271-3/+17
| | | | | (GH-103896) Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
* gh-102856: Initial implementation of PEP 701 (#102855)Pablo Galindo Salgado2023-04-191-1/+1
| | | | | | Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com> Co-authored-by: Batuhan Taskaya <isidentical@gmail.com> Co-authored-by: Marta Gómez Macías <mgmacias@google.com> Co-authored-by: sunmy2019 <59365878+sunmy2019@users.noreply.github.com>
* GH-102711: Fix warnings found by clang (#102712)Chenxi Mao2023-03-281-2/+2
| | | | | | | | | | | | | | | | | There are some warnings if build python via clang: Parser/pegen.c:812:31: warning: a function declaration without a prototype is deprecated in all versions of C [-Wstrict-prototypes] _PyPegen_clear_memo_statistics() ^ void Parser/pegen.c:820:29: warning: a function declaration without a prototype is deprecated in all versions of C [-Wstrict-prototypes] _PyPegen_get_memo_statistics() ^ void Fix it to make clang happy. Signed-off-by: Chenxi Mao <chenxi.mao@suse.com>
* GH-101578: Normalize the current exception (GH-101607)Mark Shannon2023-02-081-9/+6
| | | | | | | | | | * Make sure that the current exception is always normalized. * Remove redundant type and traceback fields for the current exception. * Add new API functions: PyErr_GetRaisedException, PyErr_SetRaisedException * Add new API functions: PyException_GetArgs, PyException_SetArgs
* gh-81057: Move More Globals to _PyRuntimeState (gh-100092)Eric Snow2022-12-071-2/+2
| | | https://github.com/python/cpython/issues/81057
* gh-99300: Use Py_NewRef() in Parser/ directory (#99330)Victor Stinner2022-11-101-4/+2
| | | | Replace Py_INCREF() with Py_NewRef() in C files of the Parser/ directory and in the PEG generator.
* gh-97973: Return all necessary information from the tokenizer (GH-97984)Lysandros Nikolaou2022-10-061-30/+24
| | | | | Right now, the tokenizer only returns type and two pointers to the start and end of the token. This PR modifies the tokenizer to return the type and set all of the necessary information, so that the parser does not have to this.
* gh-95778: CVE-2020-10735: Prevent DoS by very large int() (#96499)Gregory P. Smith2022-09-021-0/+23
| | | | | | | | | | | | | | | | Integer to and from text conversions via CPython's bignum `int` type is not safe against denial of service attacks due to malicious input. Very large input strings with hundred thousands of digits can consume several CPU seconds. This PR comes fresh from a pile of work done in our private PSRT security response team repo. Signed-off-by: Christian Heimes [Red Hat] <christian@python.org> Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org> Reviews via the private PSRT repo via many others (see the NEWS entry in the PR). <!-- gh-issue-number: gh-95778 --> * Issue: gh-95778 <!-- /gh-issue-number --> I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#). Much of that text wound up in the Issue. Backports PRs already exist. See the issue for links.
* gh-95355: Check tokens[0] after allocating memory (GH-95356)Honglin Zhu2022-07-281-1/+1
| | | | | #95355 Automerge-Triggered-By: GH:pablogsal
* gh-93741: Add private C API _PyImport_GetModuleAttrString() (GH-93742)Serhiy Storchaka2022-06-141-7/+1
| | | | | | It combines PyImport_ImportModule() and PyObject_GetAttrString() and saves 4-6 lines of code on every use. Add also _PyImport_GetModuleAttr() which takes Python strings as arguments.
* gh-93103: Parser uses PyConfig.parser_debug instead of Py_DebugFlag (#93106)Victor Stinner2022-05-241-0/+3
| | | | | | | * Replace deprecated Py_DebugFlag with PyConfig.parser_debug in the parser. * Add Parser.debug member. * Add tok_state.debug member. * Py_FrozenMain(): Replace Py_VerboseFlag with PyConfig.verbose.
* bpo-46920: Remove disabled debug code added decades ago and likely ↵Oleg Iarygin2022-03-141-11/+0
| | | | unnecessary (GH-31812)
* Don't print rejected tokens when using the debug flags in the parser (GH-31258)Pablo Galindo Salgado2022-02-101-1/+0
|
* Allow the parser to avoid nested processing of invalid rules (GH-31252)Pablo Galindo Salgado2022-02-101-1/+1
|
* bpo-46521: Fix codeop to use a new partial-input mode of the parser (GH-31010)Pablo Galindo Salgado2022-02-081-1/+14
|
* bpo-46237: Fix the line number of tokenizer errors inside f-strings (GH-30463)Pablo Galindo Salgado2022-01-081-4/+4
|
* bpo-46110: Restore commit e9898bf153d26059261ffef11f7643ae991e2a4cPablo Galindo Salgado2022-01-031-0/+1
| | | This restores commit e9898bf153d26059261ffef11f7643ae991e2a4c .
* Revert "bpo-46110: Add a recursion check to avoid stack overflow in the PEG ↵Pablo Galindo Salgado2022-01-031-1/+0
| | | | | parser (GH-30177)" (GH-30363) This reverts commit e9898bf153d26059261ffef11f7643ae991e2a4c temporarily as we want to confirm if this commit is the cause of a slowdown at startup time.
* bpo-46110: Add a recursion check to avoid stack overflow in the PEG parser ↵Pablo Galindo Salgado2021-12-201-0/+1
| | | | | (GH-30177) Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>
* bpo-45855: Replaced deprecated `PyImport_ImportModuleNoBlock` with ↵Kumar Aditya2021-12-121-1/+1
| | | | PyImport_ImportModule (GH-30046)
* bpo-42918: Improve build-in function compile() in mode 'single' (GH-29934)Weipeng Hong2021-12-101-19/+1
| | | Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
* bpo-45727: Only trigger the 'did you forgot a comma' error suggestion if ↵Pablo Galindo Salgado2021-11-241-1/+3
| | | | inside parentheses (GH-29757)
* Refactor parser compilation units into specific components (GH-29676)Pablo Galindo Salgado2021-11-211-1814/+122
|
* bpo-45494: Fix error location in EOF tokenizer errors (GH-29108)Pablo Galindo Salgado2021-11-201-2/+7
|
* bpo-45848: Allow the parser to get error lines from encoded files (GH-29646)Pablo Galindo Salgado2021-11-201-7/+8
|
* bpo-45727: Make the syntax error for missing comma more consistent (GH-29427)Pablo Galindo Salgado2021-11-191-1/+3
|
* bpo-45822: Respect PEP 263's coding cookies in the parser even if flags are ↵Pablo Galindo Salgado2021-11-161-1/+1
| | | | not provided (GH-29582)
* bpo-45820: Fix a segfault when the parser fails without reading any input ↵Pablo Galindo Salgado2021-11-161-0/+8
| | | | (GH-29580)
* bpo-45738: Fix computation of error location for invalid continuation (GH-29550)Pablo Galindo Salgado2021-11-141-10/+5
| | | characters in the parser
* bpo-45494: Fix parser crash when reporting errors involving invalid ↵Pablo Galindo Salgado2021-10-191-2/+10
| | | | | | | | | | | | continuation characters (GH-28993) There are two errors that this commit fixes: * The parser was not correctly computing the offset and the string source for E_LINECONT errors due to the incorrect usage of strtok(). * The parser was not correctly unwinding the call stack when a tokenizer exception happened in rules involving optionals ('?', [...]) as we always make them return valid results by using the comma operator. We need to check first if we don't have an error before continuing.
* bpo-45434: Mark the PyTokenizer C API as private (GH-28924)Victor Stinner2021-10-131-8/+8
| | | | | | | | | | | | | | Rename PyTokenize functions to mark them as private: * PyTokenizer_FindEncodingFilename() => _PyTokenizer_FindEncodingFilename() * PyTokenizer_FromString() => _PyTokenizer_FromString() * PyTokenizer_FromFile() => _PyTokenizer_FromFile() * PyTokenizer_FromUTF8() => _PyTokenizer_FromUTF8() * PyTokenizer_Free() => _PyTokenizer_Free() * PyTokenizer_Get() => _PyTokenizer_Get() Remove the unused PyTokenizer_FindEncoding() function. import.c: remove unused #include "errcode.h".
* bpo-45408: Don't override previous tokenizer errors in the second parser ↵Pablo Galindo Salgado2021-10-071-1/+4
| | | | pass (GH-28812)
* bpo-43914: Correctly highlight SyntaxError exceptions for invalid generator ↵Pablo Galindo Salgado2021-09-271-2/+14
| | | | expression in function calls (GH-28576)
* Update pegen to use the latest upstream developments (GH-27586)Pablo Galindo Salgado2021-08-121-0/+13
|
* bpo-34013: Generalize the invalid legacy statement error message (GH-27389)Pablo Galindo Salgado2021-07-271-0/+12
|
* bpo-43950: Distinguish errors happening on character offset decoding (GH-27217)Batuhan Taskaya2021-07-201-5/+13
|
* bpo-43950: Print columns in tracebacks (PEP 657) (GH-26958)Ammar Askar2021-07-041-23/+23
| | | | | | | | The traceback.c and traceback.py mechanisms now utilize the newly added code.co_positions and PyCode_Addr2Location to print carets on the specific expressions involved in a traceback. Co-authored-by: Pablo Galindo <Pablogsal@gmail.com> Co-authored-by: Ammar Askar <ammar@ammaraskar.com> Co-authored-by: Batuhan Taskaya <batuhanosmantaskaya@gmail.com>
* bpo-44456: Improve the syntax error when mixing keyword and positional ↵Pablo Galindo2021-06-241-0/+7
| | | | patterns (GH-26793)
* bpo-44409: Fix error location in tokenizer errors that happen during ↵Pablo Galindo2021-06-141-0/+1
| | | | initialization (GH-26712)
* Add more const modifiers. (GH-26691)Serhiy Storchaka2021-06-121-7/+7
|
* bpo-44368: Ensure we don't raise incorrect custom syntax errors with soft ↵Pablo Galindo2021-06-091-4/+11
| | | | keywords (GH-26630)
* bpo-44349: Fix edge case when displaying text from files with encoding in ↵Pablo Galindo2021-06-081-2/+5
| | | | syntax errors (GH-26611)