path: root/Parser/pegen.c
Commit message | Author | Age | Files | Lines
* [3.11] gh-95778: CVE-2020-10735: Prevent DoS by very large int() (#96500) | Gregory P. Smith | 2022-09-02 | 1 | -0/+23
  Integer to and from text conversions via CPython's bignum `int` type are not safe against denial of service attacks due to malicious input. Very large input strings with hundreds of thousands of digits can consume several CPU seconds.

  This PR comes fresh from a pile of work done in our private PSRT security response team repo.

  This backports https://github.com/python/cpython/pull/96499 aka 511ca9452033ef95bc7d7fc404b8161068226002

  Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
  Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
  Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).

  Issue: gh-95778

  I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#).
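The effect of the commit above can be observed from Python code. The following is a minimal sketch, assuming an interpreter that may or may not expose `sys.get_int_max_str_digits()` (older builds do not, so the lookup is guarded). The helper names `digit_limit` and `try_parse` are hypothetical and exist only for illustration.

```python
import sys


def digit_limit():
    """Return the active int/str digit limit, or None if this build has none."""
    getter = getattr(sys, "get_int_max_str_digits", None)
    return getter() if getter is not None else None


def try_parse(digits):
    """Convert a decimal string to int, reporting a refusal instead of raising."""
    try:
        return int(digits)
    except ValueError as exc:
        return f"rejected: {exc}"


limit = digit_limit()
small = try_parse("12345")
# With a non-zero limit, a string one digit over it should be refused
# before the expensive conversion starts. A limit of 0 means "disabled".
oversized = try_parse("9" * (limit + 1)) if limit else None
```

On builds without the limit, `oversized` stays `None`, so the sketch degrades gracefully rather than attempting a huge conversion.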
* gh-95355: Check tokens[0] after allocating memory (GH-95356) | Miss Islington (bot) | 2022-07-28 | 1 | -1/+1
  GH-95355
  Automerge-Triggered-By: GH:pablogsal
  (cherry picked from commit b946f529efb4a623ac4ad968d8091edb81ebdcdb)
  Co-authored-by: Honglin Zhu <zhuhonglin.zhl@alibaba-inc.com>
* bpo-46920: Remove disabled debug code added decades ago and likely unnecessary (GH-31812) | Oleg Iarygin | 2022-03-14 | 1 | -11/+0
* Don't print rejected tokens when using the debug flags in the parser (GH-31258) | Pablo Galindo Salgado | 2022-02-10 | 1 | -1/+0
* Allow the parser to avoid nested processing of invalid rules (GH-31252) | Pablo Galindo Salgado | 2022-02-10 | 1 | -1/+1
* bpo-46521: Fix codeop to use a new partial-input mode of the parser (GH-31010) | Pablo Galindo Salgado | 2022-02-08 | 1 | -1/+14
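For context on the `codeop` entry (bpo-46521), the sketch below shows the public behaviour that the partial-input mode supports: `codeop.compile_command` returns `None` for input that is incomplete but could still become valid, and raises for input that cannot. The `classify` helper is a hypothetical name used only here. It does not show the parser internals changed by the commit.

```python
import codeop


def classify(source):
    """Label a snippet as complete, incomplete, or invalid, as a REPL would."""
    try:
        code = codeop.compile_command(source)
    except (SyntaxError, OverflowError, ValueError):
        return "invalid"
    # compile_command returns None when more input is needed.
    return "incomplete" if code is None else "complete"


results = {src: classify(src) for src in ("x = 1", "if x:", "x = = 1")}
```

An interactive shell would keep prompting for more lines in the "incomplete" case and report the error immediately in the "invalid" case.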
* bpo-46237: Fix the line number of tokenizer errors inside f-strings (GH-30463) | Pablo Galindo Salgado | 2022-01-08 | 1 | -4/+4
* bpo-46110: Restore commit e9898bf153d26059261ffef11f7643ae991e2a4c | Pablo Galindo Salgado | 2022-01-03 | 1 | -0/+1
  This restores commit e9898bf153d26059261ffef11f7643ae991e2a4c.
* Revert "bpo-46110: Add a recursion check to avoid stack overflow in the PEG parser (GH-30177)" (GH-30363) | Pablo Galindo Salgado | 2022-01-03 | 1 | -1/+0
  This reverts commit e9898bf153d26059261ffef11f7643ae991e2a4c temporarily as we want to confirm if this commit is the cause of a slowdown at startup time.
* bpo-46110: Add a recursion check to avoid stack overflow in the PEG parser (GH-30177) | Pablo Galindo Salgado | 2021-12-20 | 1 | -0/+1
  Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>
* bpo-45855: Replaced deprecated `PyImport_ImportModuleNoBlock` with PyImport_ImportModule (GH-30046) | Kumar Aditya | 2021-12-12 | 1 | -1/+1
* bpo-42918: Improve build-in function compile() in mode 'single' (GH-29934) | Weipeng Hong | 2021-12-10 | 1 | -19/+1
  Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
* bpo-45727: Only trigger the 'did you forgot a comma' error suggestion if inside parentheses (GH-29757) | Pablo Galindo Salgado | 2021-11-24 | 1 | -1/+3
* Refactor parser compilation units into specific components (GH-29676) | Pablo Galindo Salgado | 2021-11-21 | 1 | -1814/+122
* bpo-45494: Fix error location in EOF tokenizer errors (GH-29108) | Pablo Galindo Salgado | 2021-11-20 | 1 | -2/+7
* bpo-45848: Allow the parser to get error lines from encoded files (GH-29646) | Pablo Galindo Salgado | 2021-11-20 | 1 | -7/+8
* bpo-45727: Make the syntax error for missing comma more consistent (GH-29427) | Pablo Galindo Salgado | 2021-11-19 | 1 | -1/+3
* bpo-45822: Respect PEP 263's coding cookies in the parser even if flags are not provided (GH-29582) | Pablo Galindo Salgado | 2021-11-16 | 1 | -1/+1
* bpo-45820: Fix a segfault when the parser fails without reading any input (GH-29580) | Pablo Galindo Salgado | 2021-11-16 | 1 | -0/+8
* bpo-45738: Fix computation of error location for invalid continuation (GH-29550) | Pablo Galindo Salgado | 2021-11-14 | 1 | -10/+5
  characters in the parser
* bpo-45494: Fix parser crash when reporting errors involving invalid continuation characters (GH-28993) | Pablo Galindo Salgado | 2021-10-19 | 1 | -2/+10
  There are two errors that this commit fixes:
    * The parser was not correctly computing the offset and the string source for E_LINECONT errors due to the incorrect usage of strtok().
    * The parser was not correctly unwinding the call stack when a tokenizer exception happened in rules involving optionals ('?', [...]) as we always make them return valid results by using the comma operator. We need to check first if we don't have an error before continuing.
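The error class discussed above (a stray character after a line-continuation backslash) can be triggered from Python to inspect the location the parser reports. This is only an illustrative sketch: the `continuation_error` helper is a hypothetical name, and the exact `lineno` and `offset` values vary between interpreter versions, so they are returned but not interpreted here.

```python
def continuation_error(source):
    """Compile source and return (msg, lineno, offset) if it is rejected."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return exc.msg, exc.lineno, exc.offset
    return None


# A backslash followed by anything other than a newline is invalid.
report = continuation_error("total = 1 \\ + 2\n")
```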
* bpo-45434: Mark the PyTokenizer C API as private (GH-28924) | Victor Stinner | 2021-10-13 | 1 | -8/+8
  Rename PyTokenize functions to mark them as private:
    * PyTokenizer_FindEncodingFilename() => _PyTokenizer_FindEncodingFilename()
    * PyTokenizer_FromString() => _PyTokenizer_FromString()
    * PyTokenizer_FromFile() => _PyTokenizer_FromFile()
    * PyTokenizer_FromUTF8() => _PyTokenizer_FromUTF8()
    * PyTokenizer_Free() => _PyTokenizer_Free()
    * PyTokenizer_Get() => _PyTokenizer_Get()
  Remove the unused PyTokenizer_FindEncoding() function.
  import.c: remove unused #include "errcode.h".
* bpo-45408: Don't override previous tokenizer errors in the second parser pass (GH-28812) | Pablo Galindo Salgado | 2021-10-07 | 1 | -1/+4
* bpo-43914: Correctly highlight SyntaxError exceptions for invalid generator expression in function calls (GH-28576) | Pablo Galindo Salgado | 2021-09-27 | 1 | -2/+14
* Update pegen to use the latest upstream developments (GH-27586) | Pablo Galindo Salgado | 2021-08-12 | 1 | -0/+13
* bpo-34013: Generalize the invalid legacy statement error message (GH-27389) | Pablo Galindo Salgado | 2021-07-27 | 1 | -0/+12
* bpo-43950: Distinguish errors happening on character offset decoding (GH-27217) | Batuhan Taskaya | 2021-07-20 | 1 | -5/+13
* bpo-43950: Print columns in tracebacks (PEP 657) (GH-26958) | Ammar Askar | 2021-07-04 | 1 | -23/+23
  The traceback.c and traceback.py mechanisms now utilize the newly added code.co_positions and PyCode_Addr2Location to print carets on the specific expressions involved in a traceback.

  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
  Co-authored-by: Ammar Askar <ammar@ammaraskar.com>
  Co-authored-by: Batuhan Taskaya <batuhanosmantaskaya@gmail.com>
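The `code.co_positions` data mentioned above can be read directly from a function's code object. The sketch below assumes an interpreter that provides `co_positions()` and falls back to an empty list otherwise. The names `divide` and `instruction_positions` are hypothetical, chosen for illustration.

```python
def divide(a, b):
    return a / b


def instruction_positions(func):
    """Return per-instruction (lineno, end_lineno, col, end_col) tuples, if available."""
    code = func.__code__
    if not hasattr(code, "co_positions"):
        return []  # interpreter predates PEP 657
    return list(code.co_positions())


positions = instruction_positions(divide)
```

Each tuple describes the source span of one bytecode instruction, which is the information the traceback machinery uses to place carets under a specific expression.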
* bpo-44456: Improve the syntax error when mixing keyword and positional patterns (GH-26793) | Pablo Galindo | 2021-06-24 | 1 | -0/+7
* bpo-44409: Fix error location in tokenizer errors that happen during initialization (GH-26712) | Pablo Galindo | 2021-06-14 | 1 | -0/+1
* Add more const modifiers. (GH-26691) | Serhiy Storchaka | 2021-06-12 | 1 | -7/+7
* bpo-44368: Ensure we don't raise incorrect custom syntax errors with soft keywords (GH-26630) | Pablo Galindo | 2021-06-09 | 1 | -4/+11
* bpo-44349: Fix edge case when displaying text from files with encoding in syntax errors (GH-26611) | Pablo Galindo | 2021-06-08 | 1 | -2/+5
* bpo-44335: Ensure the tokenizer doesn't go into Python with the error set (GH-26608) | Pablo Galindo | 2021-06-08 | 1 | -3/+17
* bpo-44335: Fix a regression when identifying invalid characters in syntax errors (GH-26589) | Pablo Galindo | 2021-06-08 | 1 | -1/+3
* bpo-44273: Improve syntax error message for assigning to "..." (GH-26477) | Serhiy Storchaka | 2021-06-01 | 1 | -1/+1
  Use "ellipsis" instead of "Ellipsis" in syntax error messages to eliminate confusion with built-in variable Ellipsis.
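The wording change described above is visible in the message attached to the `SyntaxError`. A minimal sketch follows, with `assignment_error` as a hypothetical helper name. The capitalisation of the word depends on whether the running interpreter includes this commit, so the sketch does not assume either form.

```python
def assignment_error(source):
    """Return the SyntaxError message produced by compiling source, or None."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return None


message = assignment_error("... = 1")
# True when the interpreter uses the lower-case wording from this commit.
uses_lowercase = message is not None and "ellipsis" in message
```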
* bpo-44201: Avoid side effects of "invalid_*" rules in the REPL (GH-26298) | Pablo Galindo | 2021-05-22 | 1 | -0/+3
  When the parser does a second pass to check for errors, these rules can have some small side-effects as they may advance the parser more than the point reached in the first pass. This can cause the tokenizer to ask for extra tokens in interactive mode causing the tokenizer to show the prompt instead of failing instantly.

  To avoid this, add a new mode to the tokenizer that is activated in the second pass and deactivates asking for new tokens when the interactive line is finished. As the parsing should have reached the last line in the first pass, the second pass should not need to ask for more tokens.
* bpo-44180: Fix edge cases in invalid assigment rules in the parser (GH-26283) | Pablo Galindo | 2021-05-21 | 1 | -1/+1
  The invalid assignment rules are very delicate since the parser can easily raise an invalid assignment when a keyword argument is provided. As they are very deep into the grammar tree, it is very difficult to specify in which contexts these rules can be used and in which they can't. For that, we need to use a different version of the rule that doesn't do error checking in those situations where we don't want the rule to raise (keyword arguments and generator expressions).

  We also need to check if we are in a left-recursive rule, as those can try to eagerly advance the parser even if the parse will fail at the end of the expression. Failing to do this allows the parser to start parsing a call as a tuple and incorrectly identify a keyword argument as an invalid assignment, before it realizes that it was not a tuple after all.
* bpo-44180: Report generic syntax errors in the furthest position reached in the first parser pass (GH-26253) | Pablo Galindo | 2021-05-21 | 1 | -1/+6
* bpo-44143: Fix crash in the parser when raising tokenizer errors with an exception set (GH-26144) | Pablo Galindo | 2021-05-15 | 1 | -0/+1
* bpo-43822: Prioritize tokenizer errors over custom syntax errors when raising parser exceptions (GH-25866) | Pablo Galindo | 2021-05-04 | 1 | -0/+3
* bpo-43892: Validate the first term of complex literal value patterns (GH-25735) | Brandt Bucher | 2021-04-30 | 1 | -1/+11
* bpo-43892: Make match patterns explicit in the AST (GH-25585) | Nick Coghlan | 2021-04-29 | 1 | -1/+56
  Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
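To see what "explicit match patterns in the AST" looks like in practice, the sketch below parses a small `match` statement and lists the node type of each case pattern. It assumes Python 3.10 or later for `match` support and returns an empty list on older interpreters. `SOURCE` and `pattern_node_names` are hypothetical names used for illustration.

```python
import ast
import sys

SOURCE = (
    "match point:\n"
    "    case (0, y):\n"
    "        pass\n"
    "    case {'k': v}:\n"
    "        pass\n"
)


def pattern_node_names(source):
    """Return the AST class name of each case pattern in the first statement."""
    if sys.version_info < (3, 10):
        return []  # no match statement before 3.10
    match_stmt = ast.parse(source).body[0]
    return [type(case.pattern).__name__ for case in match_stmt.cases]


names = pattern_node_names(SOURCE)
```

Each pattern is represented by a dedicated pattern node class rather than by a reused expression node, which is the change the commit title describes.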
* bpo-43914: Highlight invalid ranges in SyntaxErrors (#25525) | Pablo Galindo | 2021-04-23 | 1 | -7/+31
  To improve the user experience understanding what part of the error messages associated with SyntaxErrors is wrong, we can highlight the whole error range and not only place the caret at the first character. In this way:

      >>> foo(x, z for z in range(10), t, w)
        File "<stdin>", line 1
          foo(x, z for z in range(10), t, w)
                 ^
      SyntaxError: Generator expression must be parenthesized

  becomes

      >>> foo(x, z for z in range(10), t, w)
        File "<stdin>", line 1
          foo(x, z for z in range(10), t, w)
                 ^^^^^^^^^^^^^^^^^^^^
      SyntaxError: Generator expression must be parenthesized
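The highlighted range shown above is carried on the exception object itself. The following sketch reads the start and end columns from a `SyntaxError`, assuming the interpreter may or may not provide the `end_offset` attribute (hence the `getattr` fallback). `error_span` is a hypothetical helper name.

```python
def error_span(source):
    """Return (msg, offset, end_offset) for a rejected snippet, else None."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        # end_offset is absent on interpreters that only report a single caret.
        return exc.msg, exc.offset, getattr(exc, "end_offset", None)
    return None


span = error_span("foo(x, z for z in range(10), t, w)")
```

When `end_offset` is available, the difference between the two offsets corresponds to the width of the caret run in the second traceback above.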
* bpo-43822: Improve syntax errors for missing commas (GH-25377) | Pablo Galindo | 2021-04-15 | 1 | -0/+18
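As a rough illustration of the missing-comma diagnostics, the sketch below compiles two snippets that omit a comma inside brackets or parentheses and collects the resulting messages. Whether the message includes a hint about a forgotten comma depends on the interpreter version, so nothing beyond the "invalid syntax" prefix is assumed. `missing_comma_messages` is a hypothetical name.

```python
def missing_comma_messages(snippets):
    """Map each snippet to the SyntaxError message it produces, or None."""
    messages = {}
    for source in snippets:
        try:
            compile(source, "<example>", "exec")
        except SyntaxError as exc:
            messages[source] = exc.msg
        else:
            messages[source] = None
    return messages


messages = missing_comma_messages(["[1, 2 3]", "f(a b)"])
```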
* bpo-43797: Improve syntax error for invalid comparisons (#25317) | Pablo Galindo | 2021-04-12 | 1 | -4/+4
    * bpo-43797: Improve syntax error for invalid comparisons
    * Update Lib/test/test_fstring.py
      Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>
    * Apply review comments
    * can't -> cannot
  Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>
* bpo-43798: Add source location attributes to alias (GH-25324) | Matthew Suozzo | 2021-04-10 | 1 | -3/+3
    * Add source location attributes to alias.
    * Move alias star construction to pegen helper.
  Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
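The location attributes on `alias` nodes can be inspected with the `ast` module. This sketch assumes the attributes may be missing on interpreters that predate the change, so it reads them with `getattr` and a `None` default. The variable names are hypothetical.

```python
import ast

tree = ast.parse("import os.path as osp\nfrom sys import *\n")

# Every Import / ImportFrom statement carries a list of alias nodes.
aliases = [alias for stmt in tree.body for alias in stmt.names]

# Collect (name, lineno, col_offset); the last two are None where unsupported.
locations = [
    (alias.name, getattr(alias, "lineno", None), getattr(alias, "col_offset", None))
    for alias in aliases
]
```

The second statement exercises the star-import case, which the commit body mentions separately.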
* Simplify _PyPegen_fill_token in pegen.c (GH-25295) | Pablo Galindo | 2021-04-09 | 1 | -58/+64
* Sanitize macros and debug functions in pegen.c (GH-25291) | Pablo Galindo | 2021-04-09 | 1 | -2/+7
* Break down some complex functions in pegen.c for readability (GH-25292) | Pablo Galindo | 2021-04-08 | 1 | -79/+91