path: root/Parser
Commit message (Author, Age, Files, Lines)
* [3.9] gh-95778: CVE-2020-10735: Prevent DoS by very large int() (#96502)
  (Gregory P. Smith, 2022-09-05, 1 file, -0/+18)

  * Correctly pre-check for int-to-str conversion (#96537)

  Converting a large enough `int` to a decimal string raises `ValueError` as expected. However, the raise comes _after_ the quadratic-time base-conversion algorithm has run to completion. For effective DoS prevention, we need some kind of check before entering the quadratic-time loop. Oops! =)

  The quick fix: essentially we catch _most_ values that exceed the threshold up front. Those that slip through will still be on the small side (read: sufficiently fast), and will get caught by the existing check so that the limit remains exact.

  Here is the justification for the current check. The C code check is:

  ```c
  max_str_digits / (3 * PyLong_SHIFT) <= (size_a - 11) / 10
  ```

  In GitHub markdown math-speak, writing $M$ for `max_str_digits`, $L$ for `PyLong_SHIFT` and $s$ for `size_a`, that check is:

  $$\left\lfloor\frac{M}{3L}\right\rfloor \le \left\lfloor\frac{s - 11}{10}\right\rfloor$$

  From this it follows, since $x < \lfloor x \rfloor + 1$ and $\lfloor y \rfloor \le y$, that

  $$\frac{M}{3L} < \left\lfloor\frac{M}{3L}\right\rfloor + 1 \le \left\lfloor\frac{s - 11}{10}\right\rfloor + 1 \le \frac{s - 11}{10} + 1 = \frac{s-1}{10}$$

  hence that

  $$\frac{L(s-1)}{M} > \frac{10}{3} > \log_2(10).$$

  So

  $$2^{L(s-1)} > 10^M.$$

  But our input integer $a$ satisfies $|a| \ge 2^{L(s-1)}$, so $|a|$ is larger than $10^M$. This shows that we don't accidentally capture anything _below_ the intended limit in the check.

  * Issue: gh-95778

  Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
  Co-authored-by: Christian Heimes <christian@python.org>
  Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
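A minimal Python sketch of the pre-check, for experimenting with the bound derived above. It assumes `PyLong_SHIFT` is 30 (the usual value on 64-bit builds; other builds may use 15), and `size_in_digits`, `precheck_rejects` and `precheck_is_sound` are hypothetical helpers, not CPython APIs. It uses Python's floor division, which matches the floor brackets in the derivation; C's `/` truncates toward zero instead, which only differs when `size_a < 11`.

```python
# Sketch of the int-to-str pre-check, mirrored in Python.
# Assumption: 30-bit digits (PyLong_SHIFT == 30), as on typical 64-bit builds.
PYLONG_SHIFT = 30


def size_in_digits(a: int) -> int:
    """Number of PyLong_SHIFT-bit digits needed to store |a| (0 for a == 0)."""
    return -(-abs(a).bit_length() // PYLONG_SHIFT)  # ceiling division


def precheck_rejects(a: int, max_str_digits: int) -> bool:
    """M // (3 * L) <= (s - 11) // 10, using floor division as in the derivation."""
    s = size_in_digits(a)
    return max_str_digits // (3 * PYLONG_SHIFT) <= (s - 11) // 10


def precheck_is_sound(max_str_digits: int, max_size: int) -> bool:
    """True if every size s <= max_size that the pre-check rejects has |a| > 10**M."""
    limit = 10 ** max_str_digits
    for s in range(1, max_size + 1):
        smallest = 1 << (PYLONG_SHIFT * (s - 1))  # smallest |a| with exactly s digits
        if precheck_rejects(smallest, max_str_digits) and smallest <= limit:
            return False
    return True


print(precheck_is_sound(4300, 600))
print(precheck_rejects(1 << (PYLONG_SHIFT * 480), 4300))  # size_a == 481: rejected early
print(precheck_rejects(10 ** 4300, 4300))  # size_a == 477: left to the exact check
```

Checking only the smallest value of each size is enough, because the pre-check depends on `size_a` alone. With M = 4300, `10 ** 4300` (4301 decimal digits) slips past the pre-check and is left for the exact check, which is the "small side" case the message describes.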
* bpo-46762: Fix an assert failure in f-strings where > or < is the last character if the f-string is missing a trailing right brace. (GH-31365)
  (Miss Islington (bot), 2022-02-16, 1 file, -10/+10)

  (cherry picked from commit ffd9f8ff84ed53c956b16d027f7d2926ea631051)

  Co-authored-by: Eric V. Smith <ericvsmith@users.noreply.github.com>
* bpo-46503: Prevent an assert from firing when parsing some invalid \N sequences in f-strings. (GH-30865) (30867)
  (Miss Islington (bot), 2022-01-25, 1 file, -2/+14)

  * bpo-46503: Prevent an assert from firing. Also fix one nearby tiny PEP-7 nit.
  * Added blurb.

  (cherry picked from commit 0daf72194bd4e31de7f12020685bb39a14d6f45e)

  Co-authored-by: Eric V. Smith <ericvsmith@users.noreply.github.com>
* [3.9] bpo-46110: Add a recursion check to avoid stack overflow in the PEG parser (GH-30177) (#30215)
  (Pablo Galindo Salgado, 2021-12-20, 1 file, -2316/+3308)

  Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>
  (cherry picked from commit e9898bf153d26059261ffef11f7643ae991e2a4c)

  Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
* bpo-45866: pegen strips directory of "generated from" header (GH-29777) (GH-29792) (GH-29797)
  (Victor Stinner, 2021-11-26, 1 file, -1/+1)

  "make regen-all" now produces the same output when run from a directory other than the source tree, for example when building Python out of the source tree.

  (cherry picked from commit 253b7a0a9fef1d72a4cb87b837885576e68e917c)
  (cherry picked from commit b6defde2afe656db830d6fedf74ca5f6225f5928)
* [3.9] bpo-45820: Fix a segfault when the parser fails without reading any input (GH-29580) (GH-29584)
  (Miss Islington (bot), 2021-11-18, 1 file, -0/+8)

  Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
  Co-authored-by: Łukasz Langa <lukasz@langa.pl>
* bpo-45822: Respect PEP 263's coding cookies in the parser even if flags are not provided (GH-29582) (GH-29585)
  (Pablo Galindo Salgado, 2021-11-17, 1 file, -1/+1)

  (cherry picked from commit da20d7401de97b425897d3069f71f77b039eb16f)
* bpo-45738: Fix computation of error location for invalid continuation characters in the parser (GH-29550) (GH-29552)
  (Pablo Galindo Salgado, 2021-11-14, 2 files, -10/+3)

  (cherry picked from commit 25835c518aa7446f3680b62c1fb43827e0f190d9)
* [3.9] bpo-45494: Fix parser crash when reporting errors involving invalid continuation characters (GH-28993) (#29071)
  (Łukasz Langa, 2021-10-20, 2 files, -88/+96)

  There are two errors that this commit fixes:

  * The parser was not correctly computing the offset and the string source for E_LINECONT errors due to the incorrect usage of strtok().
  * The parser was not correctly unwinding the call stack when a tokenizer exception happened in rules involving optionals ('?', [...]), as we always make them return valid results by using the comma operator. We need to check first that no error has occurred before continuing.

  (cherry picked from commit a106343f632a99c8ebb0136fa140cf189b4a6a57)

  Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>

  NOTE: unlike the cherry-picked original, this commit points at an incorrect location due to a bug in the tokenizer that required a big refactor in 3.10 to fix. We are leaving it as-is for 3.9.
* [3.9] bpo-45461: Fix IncrementalDecoder and StreamReader in the "unicode-escape" codec (GH-28939) (GH-28945)
  (Serhiy Storchaka, 2021-10-14, 1 file, -1/+1)

  They now support splitting escape sequences between input chunks.

  Add the third parameter "final" in codecs.unicode_escape_decode(). It is True by default to match the former behavior.

  (cherry picked from commit c96d1546b11b4c282a7e21737cb1f5d16349656d)

  Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
* [3.9] bpo-44947: Refine the syntax error for trailing commas in import statements (GH-27814) (GH-27817)
  (Łukasz Langa, 2021-08-18, 1 file, -5/+8)

  (cherry picked from commit b2f68b190035540872072ac1d2349e7745e85596)

  Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
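To see which import forms the refined error is about, a small sketch that only checks whether each statement compiles. The exact error wording differs between Python versions, so it is not relied on here; `compiles` is a hypothetical helper.

```python
def compiles(source: str) -> bool:
    """Return True if `source` compiles as a module, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


print(compiles("from os import path, sep"))     # True
print(compiles("from os import path,"))         # False: bare trailing comma
print(compiles("from os import (path, sep,)"))  # True: parentheses allow it
```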
* [3.9] bpo-44885: Correct the ast locations of f-strings with format specs and repeated expressions (GH-27729) (GH-27744)
  (Pablo Galindo Salgado, 2021-08-12, 1 file, -39/+31)

  (cherry picked from commit 8e832fb2a2cb54d7262148b6ec15563dffb48d63)

  Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
* [3.9] Update URLs in comments and metadata to use HTTPS (GH-27458) (GH-27480)
  (Łukasz Langa, 2021-07-30, 1 file, -1/+1)

  (cherry picked from commit be42c06bb01206209430f3ac08b72643dc7cad1c)

  Co-authored-by: Noah Kantrowitz <noah@coderanger.net>
* [3.9] bpo-44409: Fix error location in tokenizer errors that happen during initialization (GH-26712). (GH-26723)
  (Pablo Galindo, 2021-06-14, 1 file, -0/+1)

  (cherry picked from commit 507ed6fa1d6661e0f8e6d3282764aa9625a99594)

  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
* [3.9] bpo-44385: Remove unused grammar rules (GH-26655) (GH-26659)
  (Lysandros Nikolaou, 2021-06-10, 1 file, -932/+467)

  (cherry picked from commit e7b4644607789848f9752a3bd20ff216e25b4156)
* [3.9] bpo-11105: Do not crash when compiling recursive ASTs (GH-20594) (GH-26522)
  (Batuhan Taskaya, 2021-06-03, 1 file, -4/+15)

  When compiling an AST object with direct or indirect reference cycles, the conversion phase crashed with a segfault because of the unbounded number of recursive calls. This patch adds recursion guards so that such user input raises a RecursionError instead of crashing.

  (cherry picked from commit f3491242e41933aa9529add7102edb68b80a25e9)

  Co-authored-by: Batuhan Taskaya <batuhan@python.org>
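A sketch of the kind of input this fix targets: a `UnaryOp` node that is its own operand, i.e. a direct reference cycle. On an interpreter that includes the recursion guards (or a later version), compiling it is expected to raise `RecursionError`; on an unpatched build it may crash, so run it with care. `compile_self_referencing_ast` is a hypothetical name.

```python
import ast


def compile_self_referencing_ast() -> str:
    """Compile an AST whose UnaryOp node is its own operand."""
    node = ast.UnaryOp(op=ast.Not(), operand=ast.Constant(value=True),
                       lineno=1, col_offset=0)
    node.operand = node  # direct reference cycle: node -> node -> ...
    try:
        compile(ast.Expression(body=node), "<cycle>", "eval")
    except RecursionError:
        return "RecursionError"
    return "compiled"


print(compile_self_referencing_ast())
```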
* [3.9] bpo-44168: Fix error message in the parser for keyword arguments for invalid expressions (GH-26210) (GH-26250)
  (Pablo Galindo, 2021-05-19, 1 file, -287/+330)

  (cherry picked from commit 33c0c90dea06fda1df99482521559ebef7210bea)

  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
* [3.9] bpo-43779: Fix possible refleak involving _PyArena_AddPyObject (GH-25289). (GH-25294)
  (Erlend Egeberg Aasland, 2021-04-09, 1 file, -1/+4)

  * [3.9] Fix possible refleak involving _PyArena_AddPyObject (GH-25289).

    (cherry picked from commit c0e11a3ceb9427e09db4224f394c7789bf6deec5)

    Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@innova.no>

  * Update Parser/pegen/pegen.c

  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
* bpo-43555: Report the column offset for invalid line continuation character (GH-24939) (#24975)
  (Miss Islington (bot), 2021-03-22, 2 files, -6/+6)

  (cherry picked from commit 96eeff516204b7cc751103fa33dcc665e387846e)

  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
* [3.9] bpo-42806: Fix ast locations of f-strings inside parentheses (GH-24067) (GH-24069)
  (Pablo Galindo, 2021-01-03, 1 file, -1/+1)

  (cherry picked from commit bd2728b1e8a99ba8f8c2d481f88aeb99b8b8360f)

  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
* [3.9] bpo-40631: Disallow single parenthesized star target (GH-24027) (GH-24068)
  (Lysandros Nikolaou, 2021-01-03, 1 file, -540/+787)

  (cherry picked from commit 2ea320dddd553298038bb7d6789e50e199332f66)

  Automerge-Triggered-By: GH:pablogsal
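A sketch of the distinction, assuming a Python version that includes this change: a lone starred target in parentheses, `(*items) = ...`, is rejected, while a trailing comma turns the target into a one-element tuple and starred unpacking works. Both helper names are illustrative.

```python
def unpack_all(values):
    (*items,) = values  # allowed: the trailing comma makes the target a tuple
    return items


def single_parenthesized_star_is_rejected() -> bool:
    """Return True if `(*items) = ...` is refused by the compiler."""
    try:
        compile("(*items) = [1, 2]", "<example>", "exec")
    except SyntaxError:
        return True
    return False


print(unpack_all(range(3)))
print(single_parenthesized_star_is_rejected())
```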
* [3.9] bpo-42381: Allow walrus in set literals and set comprehensions (GH-23332) (GH-23333)
  (Pablo Galindo, 2020-11-18, 1 file, -1120/+955)

  Currently walruses are not allowed in set literals and set comprehensions:

      >>> {y := 4, 4**2, 3**3}
        File "<stdin>", line 1
          {y := 4, 4**2, 3**3}
             ^
      SyntaxError: invalid syntax

  but they should be allowed as well per PEP 572.

  (cherry picked from commit b0aba1fcdc3da952698d99aec2334faa79a8b68c)

  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
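The set forms that this change accepts, written as a small function. It assumes an interpreter that includes the change; on one without it, the function body itself fails to compile. `walrus_in_sets` is a made-up name.

```python
def walrus_in_sets():
    literal = {y := 4, 4**2, 3**3}               # unparenthesized walrus in a set display
    squares = {last := n * n for n in range(4)}  # and in a set comprehension
    # Per PEP 572, both `y` and `last` are bound in this function's scope.
    return literal, y, squares, last


print(walrus_in_sets())
```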
* bpo-40998: Address compiler warnings found by ubsan (GH-20929)
  (Miss Islington (bot), 2020-11-18, 1 file, -0/+3)

  Signed-off-by: Christian Heimes <christian@python.org>
  Automerge-Triggered-By: GH:tiran
  (cherry picked from commit 07f2adedf0940b06d136208ec386d69b7d2d5b43)

  Co-authored-by: Christian Heimes <christian@python.org>
* bpo-42374: Allow unparenthesized walrus in genexps (GH-23319) (GH-23329)
  (Lysandros Nikolaou, 2020-11-16, 1 file, -6/+6)

  This fixes a regression that was introduced by the new parser.

  (cherry picked from commit cb3e5ed0716114393696ec7201e51fe0595eab4f)
* [3.9] bpo-42218: Correctly handle errors in left-recursive rules (GH-23065) (GH-23066)
  (Lysandros Nikolaou, 2020-10-31, 1 file, -0/+18)

  Left-recursive rules need to check for errors explicitly, since even if the rule returns NULL, the parsing might continue and lead to long-distance failures.

  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
  (cherry picked from commit 02cdfc93f82fecdb7eae97a868d4ee222b9875d9)

  Automerge-Triggered-By: GH:lysnikolaou
* [3.9] bpo-42214: Fix check for NOTEQUAL token in the PEG parser for the barry_as_flufl rule (GH-23048) (GH-23051)
  (Pablo Galindo, 2020-10-31, 3 files, -4/+3)

  (cherry picked from commit 06f8c3328dcd81c84d1ee2b3a57b5381dcb38482)

  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
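The `barry_as_flufl` rule concerns the `barry_as_FLUFL` future import (an April Fools' easter egg), under which `<>` replaces `!=` as the inequality operator. A sketch that exercises both tokens through `compile()` with the feature's `compiler_flag`; the helper names are hypothetical, and the exact error messages are not checked.

```python
import __future__

# Compiler flag behind "from __future__ import barry_as_FLUFL".
FLUFL_FLAG = __future__.barry_as_FLUFL.compiler_flag


def eval_with_flufl(expression: str):
    """Evaluate `expression` with the barry_as_FLUFL future flag enabled."""
    return eval(compile(expression, "<flufl>", "eval", FLUFL_FLAG))


def rejects(expression: str, flags: int = 0) -> bool:
    """Return True if compiling `expression` with `flags` raises SyntaxError."""
    try:
        compile(expression, "<flufl>", "eval", flags)
    except SyntaxError:
        return True
    return False


print(eval_with_flufl("1 <> 2"))      # '<>' acts as the inequality operator
print(rejects("1 != 2", FLUFL_FLAG))  # '!=' is refused under the flag
print(rejects("1 <> 2"))              # without the flag, '<>' is refused
```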
* [3.9] bpo-42123: Run the parser two times and only enable invalid rules on the second run (GH-22111) (GH-23011)
  (Lysandros Nikolaou, 2020-10-28, 3 files, -48/+62)

  * Implement running the parser a second time for the error messages

  The first parser run is only responsible for detecting whether there is a `SyntaxError` or not. If there isn't, the AST gets returned. Otherwise, the parser is run a second time with all the `invalid_*` rules enabled so that all the customized error messages get produced.

  (cherry picked from commit bca701403253379409dece03053dbd739c0bd059)
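A toy model of the two-pass strategy, not the generated C parser: regular expressions stand in for grammar rules, and `parse_once`, `parse`, and the single "invalid" rule are all hypothetical. The point it illustrates is that valid input never pays for the error-reporting rules; they only run after a first pass has already failed.

```python
import re

# Hypothetical stand-ins for grammar rules.
VALID_ASSIGN = re.compile(r"\s*([A-Za-z_]\w*)\s*=\s*(\d+)\s*$")
LITERAL_TARGET = re.compile(r"\s*\d+\s*=")  # plays the role of an invalid_* rule


def parse_once(source: str, call_invalid_rules: bool):
    """One pass: return a tuple 'AST', or None on a generic failure."""
    match = VALID_ASSIGN.match(source)
    if match:
        return ("Assign", match.group(1), int(match.group(2)))
    if call_invalid_rules and LITERAL_TARGET.match(source):
        # Only the second pass builds a tailored error message.
        raise SyntaxError("cannot assign to literal")
    return None


def parse(source: str):
    tree = parse_once(source, call_invalid_rules=False)  # fast first pass
    if tree is not None:
        return tree
    parse_once(source, call_invalid_rules=True)  # may raise a specific error
    raise SyntaxError("invalid syntax")  # no invalid_* rule matched


print(parse("x = 1"))
```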
* [3.9] bpo-41659: Disallow curly brace directly after primary (GH-22996) (#23006)
  (Lysandros Nikolaou, 2020-10-27, 1 file, -167/+234)

  (cherry picked from commit 15acc4eaba8519d7d5f2acaffde65446b44dcf79)
* bpo-42150: Avoid buffer overflow in the new parser (GH-22978)
  (Miss Skeleton (bot), 2020-10-25, 1 file, -1/+2)

  (cherry picked from commit e68c67805e6a4c4ec80bea64be0e8373cc02d322)

  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
* [3.9] bpo-41979: Accept star-unpacking on with-item targets (GH-22611) (GH-22612)
  (Batuhan Taskaya, 2020-10-09, 1 file, -6/+9)

  Co-authored-by: Batuhan Taskaya <batuhanosmantaskaya@gmail.com>

  Automerge-Triggered-By: @pablogsal
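A small sketch of a with-item target that uses star-unpacking, which the parser accepts after this change. It uses `contextlib.nullcontext`, whose `__enter__` simply returns the value it was given; `first_and_rest` is a hypothetical name.

```python
from contextlib import nullcontext


def first_and_rest(values):
    # Star-unpacking inside a parenthesized with-item target.
    with nullcontext(values) as (first, *rest):
        return first, rest


print(first_and_rest([1, 2, 3]))
```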
* [3.9] bpo-41631: _ast module uses again a global state (GH-21961) (GH-22258)
  (Pablo Galindo, 2020-09-15, 1 file, -41/+20)

  Partially revert commit ac46eb4ad6662cf6d771b20d8963658b2186c48c: "bpo-38113: Update the Python-ast.c generator to PEP384 (gh-15957)".

  Using a module state per module instance is causing subtle practical problems.

  For example, the Mercurial project replaces the __import__() function to implement lazy import, whereas Python expected that "import _ast" always returns a fully initialized _ast module.

  Add _PyAST_Fini() to clear the state at exit.

  The _ast module has no state (set _astmodule.m_size to 0). Remove astmodule_traverse(), astmodule_clear() and astmodule_free() functions.

  (cherry picked from commit e5fbe0cbd4be99ced5f000ad382208ad2a561c90)

  Co-authored-by: Victor Stinner <vstinner@python.org>
* [3.9] bpo-41697: Correctly handle KeywordOrStarred when parsing arguments in the parser (GH-22077) (GH-22079)
  (Pablo Galindo, 2020-09-03, 3 files, -10/+22)

  (cherry picked from commit 315a61f7a9418d904e0eea14b1f054fac3a90e9f)

  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
* [3.9] bpo-41690: Use a loop to collect args in the parser instead of recursion (GH-22053) (GH-22067)
  (Pablo Galindo, 2020-09-02, 3 files, -503/+625)

  This program can segfault the parser by stack overflow:

  ```python
  import ast

  code = "f(" + ",".join(['a' for _ in range(100000)]) + ")"
  print("Ready!")
  ast.parse(code)
  ```

  The reason is that the rule for arguments has a simple recursion when collecting args:

      args[expr_ty]:
          [...]
          | a=named_expression b=[',' c=args { c }] { [...] }

  (cherry picked from commit 4a97b1517a6b5ff22e2984b677a680b07ff0ce11)

  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
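A Python toy (not the generated C parser) contrasting the two ways of collecting comma-separated arguments. The recursive version mirrors the shape of the `args` rule above, one nested call per argument, so its stack depth grows with the argument count; the loop keeps the depth constant. The token format and all function names are assumptions made for the illustration, and the token list is assumed to be well formed and non-empty.

```python
def parse_args_recursive(tokens, pos=0):
    """args: NAME [',' args], with one nested call per argument."""
    name = tokens[pos]
    if pos + 1 < len(tokens) and tokens[pos + 1] == ",":
        return [name] + parse_args_recursive(tokens, pos + 2)
    return [name]


def parse_args_loop(tokens):
    """Same grammar, but a loop keeps the stack depth constant."""
    args = [tokens[0]]
    pos = 1
    while pos < len(tokens) and tokens[pos] == ",":
        args.append(tokens[pos + 1])
        pos += 2
    return args


def tokenize_call_args(n):
    """Tokens for the argument list of f(a, a, ..., a) with n arguments."""
    tokens = []
    for i in range(n):
        if i:
            tokens.append(",")
        tokens.append("a")
    return tokens


big = tokenize_call_args(100_000)
print(len(parse_args_loop(big)))  # the loop handles the case from the message
```

In Python, the recursive version hits `RecursionError` on the same input; in C, the equivalent recursion has no such guard and overflows the stack, which is the segfault the message reports.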
* [3.9] bpo-41194: Convert _ast extension to PEP 489 (GH-21807)
  (Victor Stinner, 2020-08-10, 1 file, -47/+75)

  * bpo-41194: Convert _ast extension to PEP 489 (GH-21293)

    Convert the _ast extension module to PEP 489 "Multiphase initialization". Replace the global _ast state with a module state.

    (cherry picked from commit b1cc6ba73a51d5cc3aeb113b5e7378fb50a0e20a)

  * bpo-41204: Fix compiler warning in ast_type_init() (GH-21307)

    (cherry picked from commit 1f76453173267887ed05bb3783e862cb22365ae8)
* bpo-38156: Fix compiler warning in PyOS_StdioReadline() (GH-21721)
  (Miss Islington (bot), 2020-08-04, 1 file, -1/+1)

  incr cannot be larger than INT_MAX: downcast to int explicitly.

  (cherry picked from commit bde48fd8110cc5f128d5db44810d17811e328a24)

  Co-authored-by: Victor Stinner <vstinner@python.org>
* closes bpo-38156: Always handle interrupts in PyOS_StdioReadline. (GH-21569)
  (Miss Islington (bot), 2020-07-29, 1 file, -30/+14)

  This consolidates the handling of my_fgets return values, so that interrupts are always handled, even if they come after EOF.

  I believe PyOS_StdioReadline is still buggy in that I/O errors will not result in a proper Python exception being set. However, that is a separate issue.

  (cherry picked from commit a74eea238f5baba15797e2e8b570d153bc8690a7)

  Co-authored-by: Benjamin Peterson <benjamin@python.org>
* [3.9] Validate the AST produced by the parser in debug mode (GH-21643) (GH-21646)
  (Pablo Galindo, 2020-07-27, 1 file, -0/+9)

  This will improve the debug experience if something fails in the produced AST. Previously, errors in the produced AST could surface much later, for example in the garbage collector or the compiler, making debugging them much more difficult.

  (cherry picked from commit 1332226b32da44087a55e1d71990ee6899dfd28a)

  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
* Fix trivial typo in the PEG string parser (GH-21508)
  (Miss Islington (bot), 2020-07-16, 1 file, -1/+1)

  (cherry picked from commit 0275e0452a773976827c2b9bd1e598ee08e2d7f5)

  Co-authored-by: Eric V. Smith <ericvsmith@users.noreply.github.com>
* Fix possibly-uninitialized warning in string_parser.c. (GH-21503)
  (Miss Islington (bot), 2020-07-16, 1 file, -15/+16)

  GCC says

  ```
  ../cpython/Parser/string_parser.c: In function ‘fstring_find_expr’:
  ../cpython/Parser/string_parser.c:404:93: warning: ‘cols’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    404 |     p2->starting_col_offset = p->tok->first_lineno == p->tok->lineno ? t->col_offset + cols : cols;
        |                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
  ../cpython/Parser/string_parser.c:384:16: note: ‘cols’ was declared here
    384 |     int lines, cols;
        |                ^~~~
  ../cpython/Parser/string_parser.c:403:45: warning: ‘lines’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    403 |     p2->starting_lineno = t->lineno + lines - 1;
        |                           ~~~~~~~~~~~~~~~~~~^~~
  ../cpython/Parser/string_parser.c:384:9: note: ‘lines’ was declared here
    384 |     int lines, cols;
        |         ^~~~~
  ```

  and, indeed, if `PyBytes_AsString` somehow fails, lines & cols will not be initialized.

  (cherry picked from commit 2ad7e9c011b7606c5c7307176df07419a0e60134)

  Co-authored-by: Benjamin Peterson <benjamin@python.org>
* bpo-41215: Make assertion in the new parser more strict (GH-21364)
  (Miss Islington (bot), 2020-07-06, 1 file, -1/+1)

  (cherry picked from commit 782f44b8fb07ec33cee148b2b6b4cf53024fe0cd)

  Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
* [3.9] bpo-41215: Don't use NULL by default in the PEG parser keyword list (GH-21355) (GH-21356)
  (Pablo Galindo, 2020-07-06, 2 files, -9/+12)

  (cherry picked from commit 39e76c0fb07e20acad454deb86a0457b279884a9)

  Co-authored-by: Pablo Galindo <pablogsal@gmail.com>

  Automerge-Triggered-By: @lysnikolaou
* [3.9] bpo-41194: The _ast module cannot be loaded more than once (GH-21290) (GH-21292)
  (Victor Stinner, 2020-07-03, 1 file, -96/+111)

  * bpo-41194: Pass module state in Python-ast.c (GH-21284)

    Rework asdl_c.py to pass the module state to functions in Python-ast.c, instead of using astmodulestate_global.

    Also handle PyState_AddModule() failure in init_types().

    (cherry picked from commit 74419f0c64959bb8392fcf3659058410423038e1)

  * bpo-41194: The _ast module cannot be loaded more than once (GH-21290)

    Fix a crash in the _ast module: it can no longer be loaded more than once. It now uses a global state rather than a module state.

    * Move _ast module state: use a global state instead.
    * Set _astmodule.m_size to -1, so the extension cannot be loaded more than once.

    (cherry picked from commit 91e1bc18bd467a13bceb62e16fbc435b33381c82)
* [3.9] bpo-35975: Only use cf_feature_version if PyCF_ONLY_AST in cf_flags (#21022)
  (Guido van Rossum, 2020-06-28, 1 file, -2/+3)
* [3.9] bpo-41076: Pre-feed the parser with the f-string expression location (GH-21054) (GH-21190)
  (Pablo Galindo, 2020-06-28, 2 files, -242/+25)

  This commit changes the parsing of f-string expressions with the new parser. The parser gets pre-fed with the location of the expression itself (not the f-string, which was what we were doing before). This allows us to completely skip the shifting of the AST nodes after the parsing is completed.

  (cherry picked from commit 1f0f4abb110b9fbade6175842b6a26ab0b8df6dd)
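A quick way to observe the effect with the standard `ast` module: an expression inside an f-string reports its position relative to the whole source, not to the f-string. This sketch assumes a Python version that includes this change (newer versions keep the same absolute locations); the sample source string is made up.

```python
import ast

source = 'y = f"ab{x + 1}"'
tree = ast.parse(source)
joined = tree.body[0].value   # JoinedStr for the whole f-string
formatted = joined.values[1]  # FormattedValue for {x + 1}
expr = formatted.value        # the BinOp x + 1

# Offsets are absolute within `source`, so get_source_segment can slice it.
print(expr.lineno, expr.col_offset, expr.end_col_offset)
print(ast.get_source_segment(source, expr))
```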
* [3.9] bpo-40769: Allow extra surrounding parentheses for invalid annotated assignment rule (GH-20387) (GH-21186)
  (Pablo Galindo, 2020-06-27, 1 file, -205/+261)

  (cherry picked from commit c8f29ad986f8274fc5fbf889bdd2a211878856b9)
* bpo-41084: Adjust message when an f-string expression causes a SyntaxError (GH-21084)
  (Miss Islington (bot), 2020-06-27, 1 file, -0/+21)

  Prefix the error message with `fstring: ` when parsing an f-string expression throws a `SyntaxError`.

  (cherry picked from commit 2e0a920e9eb540654c0bb2298143b00637dc5961)

  Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
* [3.9] bpo-41132: Use pymalloc allocator in the f-string parser (GH-21173) (GH-21183)
  (Lysandros Nikolaou, 2020-06-27, 1 file, -7/+7)

  (cherry picked from commit 6dcbc2422de9e2a7ff89a4689572d84001e230b2)

  Automerge-Triggered-By: @pablogsal
* [3.9] bpo-41119: Output correct error message for list/tuple followed by colon (GH-21160) (GH-21172)
  (Lysandros Nikolaou, 2020-06-26, 1 file, -320/+284)

  (cherry picked from commit 4b85e60601489f9ee9dd2909e28d89a31566887c)
* [3.9] bpo-41060: Avoid SEGFAULT when calling GET_INVALID_TARGET in the grammar (GH-21020) (GH-21024)
  (Lysandros Nikolaou, 2020-06-21, 2 files, -7/+26)

  `GET_INVALID_TARGET` might unexpectedly return `NULL`, which, if not caught, will cause a SEGFAULT. Therefore, this commit introduces a new inline function `RAISE_SYNTAX_ERROR_INVALID_TARGET` that always checks for `GET_INVALID_TARGET` returning NULL and can be used in the grammar, replacing the long C ternary operation used until now.

  (cherry picked from commit 6c4e0bd974f2895d42b63d9d004587e74b286c88)

  Automerge-Triggered-By: @pablogsal
* bpo-40958: Avoid 'possible loss of data' warning on Windows (GH-20970)
  (Miss Islington (bot), 2020-06-20, 2 files, -2/+2)

  (cherry picked from commit 861efc6e8fe7f030b1e193989b13287b31385939)

  Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>