| Commit message | Author | Age | Files | Lines |
|
from public API/ABI (GH-119680)
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
|
(GH-120520)
* Add an InternalDocs file describing how interning should work and how to use it.
* Add internal functions to *explicitly* request what kind of interning is done:
- `_PyUnicode_InternMortal`
- `_PyUnicode_InternImmortal`
- `_PyUnicode_InternStatic`
* Switch uses of `PyUnicode_InternInPlace` to those.
* Disallow using `_Py_SetImmortal` on strings directly.
You should use `_PyUnicode_InternImmortal` instead:
- Strings should be interned before immortalization; otherwise you may end up
  immortalizing a copy rather than the interned string (see the sketch below).
- `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to
`SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in
backports, as they are now part of public API and version-specific ABI.
* Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.
* Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
- `_Py_ID`
- `_Py_STR` (including the empty string)
- one-character latin-1 singletons
Now, when you intern a singleton, that exact singleton will be interned.
* Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).
* Intern `_Py_STR` singletons at startup.
* For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup.
* Beef up the tests. Cover internal details (marked with `@cpython_only`).
* Add lots of assertions
Co-Authored-By: Eric Snow <ericsnowcurrently@gmail.com>
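The entry above replaces direct `_Py_SetImmortal` calls on strings with explicit interning requests. A minimal sketch of the intended call pattern follows, assuming the internal helpers keep the `PyObject **` in/out convention of `PyUnicode_InternInPlace()` and take the interpreter state as their first argument; the exact internal signatures and headers may differ.
```c
/* Minimal sketch (internal API; requires Py_BUILD_CORE): intern first, and
   let the interning helper handle immortalization, instead of calling
   _Py_SetImmortal() on the string directly. */
#include "Python.h"
#include "pycore_unicodeobject.h"

static PyObject *
make_immortal_key(PyInterpreterState *interp, const char *text)
{
    PyObject *key = PyUnicode_FromString(text);
    if (key == NULL) {
        return NULL;
    }
    _PyUnicode_InternImmortal(interp, &key);
    return key;   /* the canonical, immortal object for `text` */
}
```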
|
* gh-119118: Fix performance regression in the tokenize module
- Cache the line object to avoid creating a new Unicode object for every
token on the same line.
- Speed up byte-offset-to-column-offset conversion by measuring the
difference over the smallest buffer possible (see the sketch below).
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
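To illustrate the second bullet, here is a hedged sketch (not the actual patch) of turning a byte offset within a UTF-8 line into a column offset by decoding only the smallest necessary slice; the helper name is illustrative.
```c
#include <Python.h>

static Py_ssize_t
byte_offset_to_column(const char *line_start, Py_ssize_t byte_offset)
{
    /* Decode only the bytes before the token, not the whole line. */
    PyObject *prefix = PyUnicode_DecodeUTF8(line_start, byte_offset, "replace");
    if (prefix == NULL) {
        return -1;
    }
    Py_ssize_t column = PyUnicode_GET_LENGTH(prefix);  /* code points, not bytes */
    Py_DECREF(prefix);
    return column;
}
```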
|
input detection in the codeop module (#113745)
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
|
tokenize module (#112949)
|
* The lexer, which includes the actual lexeme-producing logic, goes into
the `lexer` directory.
* The wrappers, one wrapper per input mode (file, string, utf-8, and
readline), go into the `tokenizer` directory and include logic for
creating a lexer instance and managing the buffer for different modes.
---------
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
|
Replace _PyObject_FastCall() calls with PyObject_Vectorcall().
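A small sketch of what such a call-site change looks like; the wrapper function is illustrative, not code from the commit.
```c
#include <Python.h>

/* Both the old private _PyObject_FastCall() and the public
   PyObject_Vectorcall() take a C array of positional arguments, so the
   call sites change mechanically. */
static PyObject *
call_with_two_args(PyObject *callable, PyObject *a, PyObject *b)
{
    PyObject *args[] = {a, b};

    /* Before: return _PyObject_FastCall(callable, args, 2); */
    /* After (NULL kwnames means no keyword arguments): */
    return PyObject_Vectorcall(callable, args, 2, NULL);
}
```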
|
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
|
This commit replaces the Python implementation of the tokenize module with an implementation
that reuses the real C tokenizer via a private extension module. The tokenize module now implements
a compatibility layer that transforms tokens from the C tokenizer into Python tokenize tokens for backward
compatibility.
As the C tokenizer does not emit some tokens that the Python tokenizer provides (such as comments and non-semantic newlines), a new special mode has been added to the C tokenizer; it is currently only used via the extension module that exposes it to the Python layer. This new mode forces the C tokenizer to emit these extra tokens and to add the appropriate metadata needed to match the old Python implementation.
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
|
(GH-103896)
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
|
Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>
Co-authored-by: Marta Gómez Macías <mgmacias@google.com>
Co-authored-by: sunmy2019 <59365878+sunmy2019@users.noreply.github.com>
|
There are some warnings when building Python with clang:
Parser/pegen.c:812:31: warning: a function declaration without a prototype is deprecated in all versions of C [-Wstrict-prototypes]
_PyPegen_clear_memo_statistics()
^
void
Parser/pegen.c:820:29: warning: a function declaration without a prototype is deprecated in all versions of C [-Wstrict-prototypes]
_PyPegen_get_memo_statistics()
^
void
Fix it to make clang happy.
Signed-off-by: Chenxi Mao <chenxi.mao@suse.com>
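The fix boils down to spelling parameterless functions with an explicit `void`, since an empty parameter list declares a function without a prototype in pre-C23 C. Illustrative form of the change (not the exact patch hunk):
```c
/* Before: an empty parameter list declares the function without a
   prototype, which clang's -Wstrict-prototypes flags as deprecated.

       void
       _PyPegen_clear_memo_statistics()
       { ... }
*/

/* After: spell the empty parameter list as (void). */
void
_PyPegen_clear_memo_statistics(void)
{
    /* ... */
}
```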
|
* Make sure that the current exception is always normalized.
* Remove redundant type and traceback fields for the current exception.
* Add new API functions: PyErr_GetRaisedException, PyErr_SetRaisedException
* Add new API functions: PyException_GetArgs, PyException_SetArgs
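A hedged usage sketch of the new functions: fetch the current exception as a single normalized object, inspect its arguments, and restore it. The surrounding helper is illustrative.
```c
#include <Python.h>

static void
inspect_and_reraise(void)
{
    PyObject *exc = PyErr_GetRaisedException();   /* clears the error state */
    if (exc == NULL) {
        return;                                   /* nothing was raised */
    }
    PyObject *args = PyException_GetArgs(exc);    /* new reference to exc.args */
    /* ... inspect args here ... */
    Py_XDECREF(args);
    PyErr_SetRaisedException(exc);                /* steals exc, re-raises it */
}
```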
|
https://github.com/python/cpython/issues/81057
|
Replace Py_INCREF() with Py_NewRef() in C files of the Parser/
directory and in the PEG generator.
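A minimal sketch of the mechanical change; the surrounding function is illustrative.
```c
#include <Python.h>

static PyObject *
get_cached(PyObject *obj)
{
    /* Before:
     *     Py_INCREF(obj);
     *     return obj;
     */
    return Py_NewRef(obj);  /* same effect; the new reference is explicit */
}
```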
|
Right now, the tokenizer only returns the type and two pointers to the start and end of the token.
This PR modifies the tokenizer to return the type and to set all of the necessary information,
so that the parser does not have to do this.
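A hypothetical sketch of the kind of token record this implies, carrying position information alongside the type and the start/end pointers; the real internal struct and its field names may differ.
```c
/* Hypothetical layout only; the real internal struct may differ. */
typedef struct {
    int type;                        /* token kind (NAME, NUMBER, ...) */
    const char *start;               /* first byte of the lexeme */
    const char *end;                 /* one past the last byte */
    int lineno, col_offset;          /* start position */
    int end_lineno, end_col_offset;  /* end position */
} token_sketch;
```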
|
Integer to and from text conversions via CPython's bignum `int` type are not safe against denial of service attacks caused by malicious input. Very large input strings with hundreds of thousands of digits can consume several CPU seconds.
This PR comes fresh from a pile of work done in our private PSRT security response team repo.
Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).
* Issue: gh-95778
I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#). Much of that text wound up in the Issue. Backports PRs already exist. See the issue for links.
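For C callers, the practical consequence is that oversized decimal conversions can now fail with ValueError. A hedged sketch of a defensive call site (the helper is illustrative, and the limit itself is configurable at the Python level):
```c
#include <Python.h>

/* Parse a decimal string from untrusted input. With the digit limit in
   place, an over-long string makes PyLong_FromString() raise ValueError
   instead of burning CPU time on the conversion. */
static PyObject *
parse_untrusted_decimal(const char *text)
{
    PyObject *value = PyLong_FromString(text, NULL, 10);
    if (value == NULL && PyErr_ExceptionMatches(PyExc_ValueError)) {
        /* Malformed input, or more digits than the configured limit;
           the ValueError is left set for the caller to report. */
        return NULL;
    }
    return value;
}
```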
|
#95355
Automerge-Triggered-By: GH:pablogsal
|
It combines PyImport_ImportModule() and PyObject_GetAttrString()
and saves 4-6 lines of code on every use.
Also add _PyImport_GetModuleAttr(), which takes Python strings as arguments.
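For reference, this is the repetitive pattern the new helper folds into one call (only the pre-existing public APIs are shown; the helper's exact name is cut off above):
```c
#include <Python.h>

static PyObject *
get_module_attr(const char *module, const char *attr)
{
    PyObject *mod = PyImport_ImportModule(module);
    if (mod == NULL) {
        return NULL;
    }
    PyObject *value = PyObject_GetAttrString(mod, attr);
    Py_DECREF(mod);
    return value;
}
```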
|
* Replace deprecated Py_DebugFlag with PyConfig.parser_debug in the
parser.
* Add Parser.debug member.
* Add tok_state.debug member.
* Py_FrozenMain(): Replace Py_VerboseFlag with PyConfig.verbose.
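A hedged sketch of the embedding-side replacement: setting `PyConfig.parser_debug` during initialization instead of poking the deprecated `Py_DebugFlag` global (parser debug output may also depend on how the interpreter was compiled).
```c
#include <Python.h>

int
main(void)
{
    PyConfig config;
    PyConfig_InitPythonConfig(&config);
    config.parser_debug = 1;                 /* replaces setting Py_DebugFlag */

    PyStatus status = Py_InitializeFromConfig(&config);
    PyConfig_Clear(&config);
    if (PyStatus_Exception(status)) {
        Py_ExitStatusException(status);
    }
    /* ... run code ... */
    return Py_FinalizeEx();
}
```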
|
unnecessary (GH-31812)
|
This restores commit e9898bf153d26059261ffef11f7643ae991e2a4c.
|
parser (GH-30177)" (GH-30363)
This reverts commit e9898bf153d26059261ffef11f7643ae991e2a4c temporarily, as we want to confirm whether this commit is the cause of a slowdown at startup time.
|
(GH-30177)
Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>
|
PyImport_ImportModule (GH-30046)
|
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
|
inside parentheses (GH-29757)
|
not provided (GH-29582)
|
(GH-29580)
|
characters in the parser
|
continuation characters (GH-28993)
There are two errors that this commit fixes:
* The parser was not correctly computing the offset and the string
source for E_LINECONT errors due to the incorrect usage of strtok().
* The parser was not correctly unwinding the call stack when a tokenizer
exception happened in rules involving optionals ('?', [...]), because we
always make them return valid results by using the comma operator. We
need to check first whether an error has occurred before continuing.
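A schematic illustration of the second bug (not the generated parser code verbatim): with the comma operator, an optional item always yields a truthy result, so the error indicator has to be checked explicitly before the alternative continues.
```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool error_indicator;   /* set when the tokenizer or a rule raised */
    /* ... */
} Parser;

/* Stub standing in for a generated rule function for an optional item. */
static void *
maybe_rule(Parser *p)
{
    (void)p;
    return NULL;            /* optional item absent */
}

static void *
alt_with_optional(Parser *p)
{
    void *opt;
    /* Before the fix: `(opt = maybe_rule(p), 1)` was always truthy, so a
       tokenizer error raised inside the optional was silently ignored. */
    if ((opt = maybe_rule(p), !p->error_indicator)) {
        /* ... continue matching the alternative; opt may be NULL ... */
        return opt;
    }
    return NULL;            /* propagate the error to the caller */
}
```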
|
Rename PyTokenize functions to mark them as private:
* PyTokenizer_FindEncodingFilename() => _PyTokenizer_FindEncodingFilename()
* PyTokenizer_FromString() => _PyTokenizer_FromString()
* PyTokenizer_FromFile() => _PyTokenizer_FromFile()
* PyTokenizer_FromUTF8() => _PyTokenizer_FromUTF8()
* PyTokenizer_Free() => _PyTokenizer_Free()
* PyTokenizer_Get() => _PyTokenizer_Get()
Remove the unused PyTokenizer_FindEncoding() function.
import.c: remove unused #include "errcode.h".
|
pass (GH-28812)
|
expression in function calls (GH-28576)
|
The traceback.c and traceback.py mechanisms now use the newly added code.co_positions and PyCode_Addr2Location
to print carets on the specific expressions involved in a traceback.
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
Co-authored-by: Ammar Askar <ammar@ammaraskar.com>
Co-authored-by: Batuhan Taskaya <batuhanosmantaskaya@gmail.com>
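A hedged sketch of querying the location range that the caret rendering relies on, via `PyCode_Addr2Location()`; the wrapper function is illustrative.
```c
#include <Python.h>

static void
print_instruction_location(PyCodeObject *code, int instr_offset)
{
    int start_line, start_col, end_line, end_col;
    if (PyCode_Addr2Location(code, instr_offset,
                             &start_line, &start_col,
                             &end_line, &end_col)) {
        printf("%d:%d -> %d:%d\n", start_line, start_col, end_line, end_col);
    }
}
```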
|