path: root/Parser/pegen.c
Commit message | Author | Age | Files | Lines
* [3.10] gh-95778: CVE-2020-10735: Prevent DoS by very large int() (#96501) | Gregory P. Smith | 2022-09-02 | 1 | -0/+23
  Integer to and from text conversions via CPython's bignum `int` type are not safe against denial of service attacks due to malicious input. Very large input strings with hundreds of thousands of digits can consume several CPU seconds.

  This PR comes fresh from a pile of work done in our private PSRT security response team repo.

  This backports https://github.com/python/cpython/pull/96499 aka 511ca9452033ef95bc7d7fc404b8161068226002

  Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
  Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
  Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).

  Issue: gh-95778

  I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#).
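The limit described in the entry above can be observed from Python through `sys.set_int_max_str_digits()`. The following is a minimal sketch, assuming an interpreter that ships that API; the helper name `rejected_under_limit` and the chosen digit counts are illustrative and not taken from the commit.

```python
import sys


def rejected_under_limit(digit_count, limit=4300):
    """Return True if int() refuses a decimal string of `digit_count` digits.

    Returns None on interpreters that predate the digit-limit API.
    (Hypothetical helper, written for illustration only.)
    """
    if not hasattr(sys, "set_int_max_str_digits"):
        return None
    previous = sys.get_int_max_str_digits()
    sys.set_int_max_str_digits(limit)
    try:
        int("1" * digit_count)
        return False
    except ValueError:
        # Raised when the string has more digits than the active limit.
        return True
    finally:
        # Restore whatever limit the process was using before.
        sys.set_int_max_str_digits(previous)


print(rejected_under_limit(100))
print(rejected_under_limit(5000))
```

The helper restores the previous limit in a `finally` block so that it does not change process-wide state for other code.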
* [3.10] gh-95876: Fix format string in pegen error location code (GH-95877) (GH-95901) | Christian Heimes | 2022-08-11 | 1 | -1/+1
  (cherry picked from commit b4c857d0fd74abb1ede6fe083c4fa3ca728b2b83)
  Co-authored-by: Christian Heimes <christian@python.org>
* gh-95355: Check tokens[0] after allocating memory (GH-95356) | Miss Islington (bot) | 2022-07-28 | 1 | -1/+1
  GH-95355
  Automerge-Triggered-By: GH:pablogsal
  (cherry picked from commit b946f529efb4a623ac4ad968d8091edb81ebdcdb)
  Co-authored-by: Honglin Zhu <zhuhonglin.zhl@alibaba-inc.com>
* [3.10] gh-94360: Fix a tokenizer crash when reading encoded files with syntax errors from stdin (GH-94386) (GH-94574) | Pablo Galindo Salgado | 2022-07-05 | 1 | -4/+4
  Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
  Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
  Co-authored-by: Łukasz Langa <lukasz@langa.pl>
  (cherry picked from commit 36fcde61ba48c4e918830691ecf4092e4e3b9b99)
* [3.10] Backport bpo-47212 (GH-32302) to Python 3.10 (GH-32334) | Matthieu Dartiailh | 2022-04-05 | 1 | -1/+1
  (cherry picked from commit aa0f056a00c4bcaef83d729e042359ddae903382)

  # Conflicts:
  #   Grammar/python.gram
  #   Parser/action_helpers.c

  Automerge-Triggered-By: GH:pablogsal
* [3.10] bpo-47117: Don't crash if we fail to decode characters when the tokenizer buffers are uninitialized (GH-32129) (GH-32130) | Pablo Galindo Salgado | 2022-03-26 | 1 | -2/+7
  Automerge-Triggered-By: GH:pablogsal.
  (cherry picked from commit 26cca8067bf5306e372c0e90036d832c5021fd90)
  Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
* [3.10] Allow the parser to avoid nested processing of invalid rules (GH-31252). (GH-31257) | Pablo Galindo Salgado | 2022-02-10 | 1 | -1/+0
  (cherry picked from commit 390459de6db1e68b79c0897cc88c0d562693ec5c)
  Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
* [3.10] bpo-46521: Fix codeop to use a new partial-input mode of the parser (GH-31010). (GH-31213) | Pablo Galindo Salgado | 2022-02-08 | 1 | -0/+17
  (cherry picked from commit 69e10976b2e7682c6d57f4272932ebc19f8e8859)
  Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
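For context on the entry above, `codeop` is the standard-library module that decides whether interactive input is complete, incomplete, or invalid. The sketch below only shows its public behaviour; it does not show the parser's internal partial-input mode, and the `classify` helper is a hypothetical name used for illustration.

```python
import codeop


def classify(source):
    """Classify interactive input as 'complete', 'incomplete' or 'invalid'."""
    try:
        code = codeop.compile_command(source, "<input>", "single")
    except (SyntaxError, ValueError, OverflowError):
        return "invalid"
    # compile_command returns None when more input is needed.
    return "incomplete" if code is None else "complete"


for snippet in ("x = 1", "if x:", "x = = 1"):
    print(repr(snippet), "->", classify(snippet))
```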
* [3.10] bpo-46240: Correct the error for unclosed parentheses when the tokenizer is not finished (GH-30378). (GH-30819) | Pablo Galindo Salgado | 2022-01-23 | 1 | -1/+2
  (cherry picked from commit 70f415fb8b632247e28d87998642317ca7a652ae)
  Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
* [3.10] bpo-46339: Fix crash in the parser when computing error text for multi-line f-strings (GH-30529) (GH-30542) | Miss Islington (bot) | 2022-01-20 | 1 | -2/+10
  * bpo-46339: Fix crash in the parser when computing error text for multi-line f-strings (GH-30529)
    Automerge-Triggered-By: GH:pablogsal
    (cherry picked from commit cedec19be81e6bd153678bfb28c8e217af8bda58)
    Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
  * Fix interactive mode

  Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
* bpo-46237: Fix the line number of tokenizer errors inside f-strings (GH-30463) | Miss Islington (bot) | 2022-01-11 | 1 | -4/+4
  (cherry picked from commit 6fa8b2ceee38187b0ae96aee12fe4f0a5c8a2ce7)
  Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
* bpo-42918: Improve built-in function compile() in mode 'single' (GH-29934) (GH-30040) | Miss Islington (bot) | 2021-12-27 | 1 | -19/+1
  Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
  (cherry picked from commit 28179aac796ed1debdce336c4b8ca18e8475d40d)
  Co-authored-by: Weipeng Hong <hongweichen8888@sina.com>
* [3.10] bpo-46110: Add a recursion check to avoid stack overflow in the PEG parser (GH-30177) (GH-30214) | Pablo Galindo Salgado | 2021-12-20 | 1 | -0/+1
  Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>.
  (cherry picked from commit e9898bf153d26059261ffef11f7643ae991e2a4c)
  Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
* [3.10] bpo-45727: Only trigger the 'did you forgot a comma' error suggestion if inside parentheses. (GH-29767) | Pablo Galindo Salgado | 2021-11-25 | 1 | -1/+3
  Backport of GH-29757
  Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
* [3.10] Ensure the str member of the tokenizer is always initialised (GH-29681). (GH-29683) | Pablo Galindo Salgado | 2021-11-21 | 1 | -1/+1
  (cherry picked from commit 4f006a789a35f5d1a7ef142bd1304ce167392457)
  Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
* bpo-45494: Fix error location in EOF tokenizer errors (GH-29108) | Miss Islington (bot) | 2021-11-20 | 1 | -2/+7
  (cherry picked from commit 79ff0d1687e3f823fb121a19f0297ad052871b1b)
  Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
* [3.10] bpo-45727: Make the syntax error for missing comma more consistent (GH-29427) (GH-29647) | Pablo Galindo Salgado | 2021-11-20 | 1 | -1/+3
  (cherry picked from commit 546cefcda75d7150b55c8bc1724bea35a1e12890)
  Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
* [3.10] bpo-45848: Allow the parser to get error lines from encoded files (GH-29646) (GH-29661) | Łukasz Langa | 2021-11-20 | 1 | -7/+8
  (cherry picked from commit fdcc46d9554094994f78bedf6dc9220e5d5ee668)
  Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
* bpo-45820: Fix a segfault when the parser fails without reading any input (GH-29580) | Miss Islington (bot) | 2021-11-17 | 1 | -0/+8
  (cherry picked from commit df4ae55e66e34ea8de6a34f0b104871ddaf35d53)
  Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
* [3.10] bpo-45822: Respect PEP 263's coding cookies in the parser even if flags are not provided (GH-29582) (GH-29586) | Pablo Galindo Salgado | 2021-11-17 | 1 | -1/+1
  (cherry picked from commit da20d7401de97b425897d3069f71f77b039eb16f)
  Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
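As background for the entry above, a PEP 263 coding cookie tells the parser how to decode a source file given as bytes. The sketch below shows this at the Python level with the built-in `compile()`; it does not exercise the C API path that the commit fixed, and `run_with_cookie` is a hypothetical helper name.

```python
def run_with_cookie():
    """Compile bytes whose coding cookie declares latin-1 and return the value."""
    # 0xE9 is "é" in latin-1; on its own it is not valid UTF-8, so the
    # cookie on the first line is what makes this source decodable.
    source = b"# -*- coding: latin-1 -*-\nvalue = '\xe9'\n"
    namespace = {}
    exec(compile(source, "<cookie-demo>", "exec"), namespace)
    return namespace["value"]


print(ascii(run_with_cookie()))
```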
* bpo-45738: Fix computation of error location for invalid continuation characters in the parser (GH-29550) | Miss Islington (bot) | 2021-11-14 | 1 | -10/+5
  (cherry picked from commit 25835c518aa7446f3680b62c1fb43827e0f190d9)
  Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
* [3.10] bpo-45494: Fix parser crash when reporting errors involving invalid continuation characters (GH-28993) (GH-29070) | Łukasz Langa | 2021-10-19 | 1 | -2/+10
  There are two errors that this commit fixes:

  * The parser was not correctly computing the offset and the string source for E_LINECONT errors due to the incorrect usage of strtok().
  * The parser was not correctly unwinding the call stack when a tokenizer exception happened in rules involving optionals ('?', [...]), as we always make them return valid results by using the comma operator. We need to check that no error is set before continuing.

  (cherry picked from commit a106343f632a99c8ebb0136fa140cf189b4a6a57)
  Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
* [3.10] bpo-45408: Don't override previous tokenizer errors in the second parser pass (GH-28812). (GH-28813) | Pablo Galindo Salgado | 2021-10-07 | 1 | -1/+4
  (cherry picked from commit 0219017df7ec41839fd0d56a3076b5f09c58d313)
  Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
* bpo-43914: Correctly highlight SyntaxError exceptions for invalid generator expression in function calls (GH-28576) | Miss Islington (bot) | 2021-09-27 | 1 | -2/+14
  (cherry picked from commit e5f13ce5b48b551c09fdd0faeafa6ecf860de51c)
  Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
* [3.10] bpo-34013: Generalize the invalid legacy statement error message (GH-27389). (GH-27391) | Pablo Galindo Salgado | 2021-07-27 | 1 | -0/+12
  (cherry picked from commit 6948964ecf94e858448dd28eea634317226d2913)
  Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
* bpo-44456: Improve the syntax error when mixing keyword and positional patterns (GH-26793) | Miss Islington (bot) | 2021-06-24 | 1 | -0/+7
  (cherry picked from commit 0acc258fe6f0ec200ca2f6f9294adbf52a244802)
  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
* bpo-44409: Fix error location in tokenizer errors that happen during initialization (GH-26712) | Miss Islington (bot) | 2021-06-14 | 1 | -0/+1
  (cherry picked from commit 507ed6fa1d6661e0f8e6d3282764aa9625a99594)
  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
* [3.10] Add more const modifiers. (GH-26691). (GH-26692) | Serhiy Storchaka | 2021-06-12 | 1 | -7/+7
  (cherry picked from commit be8b631b7a587aa781245e14c8cca32970e1be5b)
  Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
* bpo-44368: Ensure we don't raise incorrect custom syntax errors with soft keywords (GH-26630) | Miss Islington (bot) | 2021-06-09 | 1 | -4/+11
  (cherry picked from commit 457ce60fc70f1c9290023f46fb82b6a490dff32e)
  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
* bpo-44349: Fix edge case when displaying text from files with encoding in syntax errors (GH-26611) (GH-26616) | Miss Islington (bot) | 2021-06-09 | 1 | -2/+5
  (cherry picked from commit 9fd21f649d66dcb10108ee395fd68ed32c8239cd)
  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
* bpo-44335: Ensure the tokenizer doesn't go into Python with the error set (GH-26608) | Miss Islington (bot) | 2021-06-08 | 1 | -3/+17
  (cherry picked from commit bafe0aade5741ab0d13143ee261711fdd65e8a1f)
  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
* bpo-44335: Fix a regression when identifying invalid characters in syntax errors (GH-26589) | Miss Islington (bot) | 2021-06-08 | 1 | -1/+3
  (cherry picked from commit d334c73b56756e90c33ce06e3a6ec23271aa099d)
  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
* [3.10] bpo-44273: Improve syntax error message for assigning to "..." (GH-26477) (GH-26478) | Pablo Galindo | 2021-06-03 | 1 | -1/+1
  Use "ellipsis" instead of "Ellipsis" in syntax error messages to eliminate confusion with built-in variable Ellipsis.

  (cherry picked from commit 39dd141)
  Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
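The wording change in the entry above can be checked by compiling an assignment to `...` and reading the message on the resulting `SyntaxError`. This is a minimal sketch; the helper `syntax_error_message` is a hypothetical name, and the exact message text may differ between Python versions.

```python
def syntax_error_message(source):
    """Return the SyntaxError message for `source`, or None if it compiles."""
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return None


# On interpreters that include this change, the message uses a lowercase
# "ellipsis" rather than the name of the built-in constant.
print(syntax_error_message("... = 1"))
```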
* bpo-44201: Avoid side effects of "invalid_*" rules in the REPL (GH-26298) (GH-26313) | Miss Islington (bot) | 2021-05-22 | 1 | -0/+3
  When the parser does a second pass to check for errors, these rules can have some small side-effects as they may advance the parser more than the point reached in the first pass. This can cause the tokenizer to ask for extra tokens in interactive mode causing the tokenizer to show the prompt instead of failing instantly.

  To avoid this, add a new mode to the tokenizer that is activated in the second pass and deactivates asking for new tokens when the interactive line is finished. As the parsing should have reached the last line in the first pass, the second pass should not need to ask for more tokens.

  (cherry picked from commit bd7476dae337e905e7b1bbf33ddb96cc270fdc84)
  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
* bpo-44180: Fix edge cases in invalid assignment rules in the parser (GH-26283) | Miss Islington (bot) | 2021-05-21 | 1 | -1/+1
  The invalid assignment rules are very delicate, since the parser can easily raise an invalid assignment error when a keyword argument is provided. As they sit very deep in the grammar tree, it is very difficult to specify in which contexts these rules can be used and in which they cannot. For that, we need to use a different version of the rule that doesn't do error checking in those situations where we don't want the rule to raise (keyword arguments and generator expressions).

  We also need to check if we are in a left-recursive rule, as those can try to eagerly advance the parser even if the parse will fail at the end of the expression. Failing to do this allows the parser to start parsing a call as a tuple and incorrectly identify a keyword argument as an invalid assignment, before it realizes that it was not a tuple after all.

  (cherry picked from commit c878a9796841c1f4726e6dd5ac49a478af4c8504)
  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
* bpo-44180: Report generic syntax errors in the furthest position reached in the first parser pass (GH-26253) (GH-26281) | Miss Islington (bot) | 2021-05-21 | 1 | -1/+6
  (cherry picked from commit b51081c1a8cf01b92ba0692173e1b9274a57f455)
  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
* bpo-44143: Fix crash in the parser when raising tokenizer errors with an exception set (GH-26144) (GH-26148) | Miss Islington (bot) | 2021-05-15 | 1 | -0/+1
  (cherry picked from commit 80b089179fa798c8ceaab2ff699c82499b2fcacd)
  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
* bpo-43822: Prioritize tokenizer errors over custom syntax errors when raising parser exceptions (GH-25866) | Miss Islington (bot) | 2021-05-04 | 1 | -0/+3
  (cherry picked from commit 9142088e7454a392b69a627863b235ecc32aea54)
  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
* bpo-43892: Validate the first term of complex literal value patterns (GH-25735) | Brandt Bucher | 2021-04-30 | 1 | -1/+11
* bpo-43892: Make match patterns explicit in the AST (GH-25585) | Nick Coghlan | 2021-04-29 | 1 | -1/+56
  Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
* bpo-43914: Highlight invalid ranges in SyntaxErrors (#25525) | Pablo Galindo | 2021-04-23 | 1 | -7/+31
  To make it easier to see which part of the source a SyntaxError refers to, we can highlight the whole error range instead of only placing the caret at its first character. In this way:

      >>> foo(x, z for z in range(10), t, w)
        File "<stdin>", line 1
          foo(x, z for z in range(10), t, w)
                 ^
      SyntaxError: Generator expression must be parenthesized

  becomes

      >>> foo(x, z for z in range(10), t, w)
        File "<stdin>", line 1
          foo(x, z for z in range(10), t, w)
                 ^^^^^^^^^^^^^^^^^^^^
      SyntaxError: Generator expression must be parenthesized
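The highlighted range in the entry above is also available programmatically. The sketch below assumes an interpreter new enough to set `end_offset` on `SyntaxError` and falls back to `None` otherwise; `error_range` is a hypothetical helper name.

```python
def error_range(source):
    """Return (message, offset, end_offset) for a SyntaxError, else None."""
    try:
        compile(source, "<demo>", "eval")
    except SyntaxError as exc:
        # end_offset is missing on older interpreters, hence getattr.
        return exc.msg, exc.offset, getattr(exc, "end_offset", None)
    return None


result = error_range("foo(x, z for z in range(10), t, w)")
print(result)
```

When `end_offset` is present, the span between `offset` and `end_offset` corresponds to the run of carets shown in the traceback.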
* bpo-43822: Improve syntax errors for missing commas (GH-25377) | Pablo Galindo | 2021-04-15 | 1 | -0/+18
* bpo-43797: Improve syntax error for invalid comparisons (#25317) | Pablo Galindo | 2021-04-12 | 1 | -4/+4
  * bpo-43797: Improve syntax error for invalid comparisons
  * Update Lib/test/test_fstring.py
    Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>
  * Apply review comments
  * can't -> cannot

  Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>
* bpo-43798: Add source location attributes to alias (GH-25324) | Matthew Suozzo | 2021-04-10 | 1 | -3/+3
  * Add source location attributes to alias.
  * Move alias star construction to pegen helper.

  Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
* Simplify _PyPegen_fill_token in pegen.c (GH-25295) | Pablo Galindo | 2021-04-09 | 1 | -58/+64
* Sanitize macros and debug functions in pegen.c (GH-25291) | Pablo Galindo | 2021-04-09 | 1 | -2/+7
* Break down some complex functions in pegen.c for readability (GH-25292) | Pablo Galindo | 2021-04-08 | 1 | -79/+91
* Fix possible refleak involving _PyArena_AddPyObject (GH-25289) | Erlend Egeberg Aasland | 2021-04-08 | 1 | -1/+4
* bpo-43244: Rename pycore_ast.h functions to _PyAST_xxx() (GH-25252) | Victor Stinner | 2021-04-07 | 1 | -38/+45
  Rename AST functions of pycore_ast.h to use the "_PyAST_" prefix. Remove macros creating aliases without prefix. For example, Module() becomes _PyAST_Module(). Update Grammar/python.gram to use _PyAST_xxx() functions.
* bpo-43244: Remove the pyarena.h header (GH-25007) | Victor Stinner | 2021-03-24 | 1 | -15/+15
  Remove the pyarena.h header file with functions:

  * PyArena_New()
  * PyArena_Free()
  * PyArena_Malloc()
  * PyArena_AddPyObject()

  These functions were undocumented, excluded from the limited C API, and were only used internally by the compiler.

  Add pycore_pyarena.h header. Rename functions:

  * PyArena_New() => _PyArena_New()
  * PyArena_Free() => _PyArena_Free()
  * PyArena_Malloc() => _PyArena_Malloc()
  * PyArena_AddPyObject() => _PyArena_AddPyObject()