path: root/Python/Python-tokenize.c
Commit log (newest first); each entry lists the commit message, author, date, files changed, and lines removed/added.
* gh-120317: Lock around global state in the tokenize module (#120318)
  Lysandros Nikolaou, 2024-07-16, 1 file changed, -43/+72
  Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
* gh-120343: Fix column offsets of multiline tokens in tokenize (#120391)
  Lysandros Nikolaou, 2024-06-12, 1 file changed, -4/+10
* gh-120343: Do not reset byte_col_offset_diff after multiline tokens (#120352)
  Lysandros Nikolaou, 2024-06-11, 1 file changed, -1/+6
  Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
* gh-119704: Fix reference leak in the ``Python/Python-tokenize.c`` (#119705)
  Kirill Podoprigora, 2024-05-29, 1 file changed, -0/+1
* gh-119118: Fix performance regression in tokenize module (#119615)
  Lysandros Nikolaou, 2024-05-28, 1 file changed, -4/+40
  - Cache the line object to avoid creating a Unicode object for every token on the same line.
  - Speed up byte-offset to column-offset conversion by using the smallest buffer possible to measure the difference.
  Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
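  A minimal Python sketch of the byte-offset-to-column idea described in the second bullet above; the real fix lives in the C code of Python-tokenize.c, and every name below is illustrative rather than part of the actual implementation.

      # Convert UTF-8 byte offsets to character columns by decoding only the
      # slice between the previously converted offset and the new one, instead
      # of re-decoding the whole line prefix for every token.
      def make_column_converter():
          last_byte = 0   # byte offset already converted on the current line
          last_col = 0    # its character-column equivalent

          def to_column(line_bytes: bytes, byte_offset: int) -> int:
              nonlocal last_byte, last_col
              if byte_offset < last_byte:
                  # A new line started; reset the cached offsets.
                  last_byte = last_col = 0
              last_col += len(line_bytes[last_byte:byte_offset].decode("utf-8"))
              last_byte = byte_offset
              return last_col

          return to_column

      to_column = make_column_converter()
      line = "x = 'héllo'  # comment".encode("utf-8")
      print(to_column(line, 4))   # 4: column of the opening quote
      print(to_column(line, 11))  # 10: 'é' is two bytes but one column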
* gh-116322: Add Py_mod_gil module slot (#116882)
  Brett Simmers, 2024-05-03, 1 file changed, -0/+1
  This PR adds the ability to enable the GIL if it was disabled at interpreter startup, and modifies the multi-phase module initialization path to enable the GIL when loading a module, unless that module's spec includes a slot indicating it can run safely without the GIL. PEP 703 called the constant for the slot `Py_mod_gil_not_used`; I went with `Py_MOD_GIL_NOT_USED` for consistency with gh-104148. A warning will be issued up to once per interpreter for the first GIL-using module that is loaded. If `-v` is given, a shorter message will be printed to stderr every time a GIL-using module is loaded (including the first one that issues a warning).
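  The effect of this mechanism can be observed from Python; the snippet below is a small sketch that assumes a Python 3.13+ interpreter exposing `sys._is_gil_enabled()`, and is guarded so it still runs on older versions.

      import sys

      # On a free-threaded (PEP 703) build, sys._is_gil_enabled() reports
      # whether the GIL is currently on, e.g. after importing an extension
      # module that did not declare the Py_mod_gil slot as Py_MOD_GIL_NOT_USED.
      if hasattr(sys, "_is_gil_enabled"):
          print("GIL currently enabled:", sys._is_gil_enabled())
      else:
          print("This interpreter predates the Py_mod_gil machinery")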
* gh-112943: Correctly compute end offsets for multiline tokens in the tokenize module (#112949)
  Pablo Galindo Salgado, 2023-12-11, 1 file changed, -1/+1
* Remove unnecessary includes (GH-111633)
  Serhiy Storchaka, 2023-11-02, 1 file changed, -1/+0
* gh-104169: Refactor tokenizer into lexer and wrappers (#110684)
  Lysandros Nikolaou, 2023-10-11, 1 file changed, -1/+3
  - The lexer, which includes the actual lexeme-producing logic, goes into the `lexer` directory.
  - The wrappers, one per input mode (file, string, utf-8, and readline), go into the `tokenizer` directory and include the logic for creating a lexer instance and managing the buffer for the different modes.
  Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
  Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
* gh-107015: Remove async_hacks from the tokenizer (#107018)
  Pablo Galindo Salgado, 2023-07-26, 1 file changed, -3/+0
* gh-105564: Don't include artificial newlines in the line attribute of tokens (#105565)
  Pablo Galindo Salgado, 2023-06-09, 1 file changed, -0/+3
* gh-105390: Add explicit type cast (#105466)
  Kirill Podoprigora, 2023-06-07, 1 file changed, -1/+2
* gh-105435: Fix spurious NEWLINE token if file ends with comment without a newline (#105442)
  Pablo Galindo Salgado, 2023-06-07, 1 file changed, -0/+11
* gh-105390: Correctly raise TokenError instead of SyntaxError for tokenize errors (#105399)
  Pablo Galindo Salgado, 2023-06-07, 1 file changed, -7/+2
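  A small usage sketch of the behavior this commit establishes: tokenization failures surface as `tokenize.TokenError` rather than `SyntaxError` (the input string here is chosen purely for illustration).

      import io
      import tokenize

      # An unterminated triple-quoted string cannot be tokenized; the tokenize
      # module reports the failure as TokenError.
      source = "s = '''unterminated\n"
      try:
          list(tokenize.generate_tokens(io.StringIO(source).readline))
      except tokenize.TokenError as exc:
          print("TokenError:", exc)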
* gh-105259: Ensure we don't show newline characters for trailing NEWLINE tokens (#105364)
  Pablo Galindo Salgado, 2023-06-06, 1 file changed, -4/+6
* gh-105042: Disable unmatched parens syntax error in python tokenize (#105061)
  Lysandros Nikolaou, 2023-05-30, 1 file changed, -1/+1
* gh-105069: Add a readline-like callable to the tokenizer to consume input iteratively (#105070)
  Pablo Galindo Salgado, 2023-05-30, 1 file changed, -5/+7
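  The readline-driven mode is the one `tokenize.generate_tokens()` relies on to feed input line by line; a minimal usage example:

      import io
      import token
      import tokenize

      # generate_tokens() pulls source code one line at a time through a
      # readline-like callable and yields TokenInfo tuples.
      source = "def f():\n    return 1\n"
      readline = io.StringIO(source).readline
      for tok in tokenize.generate_tokens(readline):
          print(token.tok_name[tok.type], repr(tok.string))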
* gh-105017: Include CRLF lines in strings and column numbers (#105030)
  Marta Gómez Macías, 2023-05-28, 1 file changed, -2/+7
  Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
* gh-104976: Ensure trailing dedent tokens are emitted as the previous tokenizer (#104980)
  Pablo Galindo Salgado, 2023-05-26, 1 file changed, -3/+23
  Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
* gh-104972: Ensure that line attributes in tokens in the tokenize module are correct (#104975)
  Pablo Galindo Salgado, 2023-05-26, 1 file changed, -5/+4
* gh-104825: Remove implicit newline in the line attribute in tokens emitted in the tokenize module (#104846)
  Pablo Galindo Salgado, 2023-05-24, 1 file changed, -0/+4
* gh-104741: Add line number attribute to indentation error exception (#104743)
  Marta Gómez Macías, 2023-05-22, 1 file changed, -6/+9
* gh-102856: Tokenize performance improvement (#104731)
  Marta Gómez Macías, 2023-05-22, 1 file changed, -1/+16
* gh-102856: Python tokenizer implementation for PEP 701 (#104323)
  Marta Gómez Macías, 2023-05-21, 1 file changed, -14/+126
  This commit replaces the Python implementation of the tokenize module with an implementation that reuses the real C tokenizer via a private extension module. The tokenize module now implements a compatibility layer that transforms tokens from the C tokenizer into Python tokenize tokens for backward compatibility. As the C tokenizer does not emit some tokens that the Python tokenizer provides (such as comments and non-semantic newlines), a new special mode has been added to the C tokenizer that currently is only used via the extension module that exposes it to the Python layer. This new mode forces the C tokenizer to emit these extra tokens and add the appropriate metadata needed to match the old Python implementation.
  Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
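  The rewrite is visible in the tokens the module emits; on Python 3.12+ the sketch below prints FSTRING_START / FSTRING_MIDDLE / FSTRING_END pieces for the f-string, plus the COMMENT token preserved by the compatibility mode (exact output details may vary by version).

      import io
      import token
      import tokenize

      # On Python 3.12+ the tokenize module (backed by the C tokenizer) splits
      # f-strings into FSTRING_* tokens and still emits COMMENT tokens thanks
      # to the special compatibility mode described above.
      source = 'x = f"value: {1 + 1}"  # a comment\n'
      for tok in tokenize.generate_tokens(io.StringIO(source).readline):
          print(token.tok_name[tok.type], repr(tok.string))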
* gh-99113: Add Py_MOD_PER_INTERPRETER_GIL_SUPPORTED (gh-104205)
  Eric Snow, 2023-05-05, 1 file changed, -0/+1
  Here we are doing no more than adding the value for Py_mod_multiple_interpreters and using it for stdlib modules. We will start checking for it in gh-104206 (once PyInterpreterState.ceval.own_gil is added in gh-104204).
* gh-102856: Initial implementation of PEP 701 (#102855)
  Pablo Galindo Salgado, 2023-04-19, 1 file changed, -2/+2
  Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
  Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>
  Co-authored-by: Marta Gómez Macías <mgmacias@google.com>
  Co-authored-by: sunmy2019 <59365878+sunmy2019@users.noreply.github.com>
* gh-97973: Return all necessary information from the tokenizer (GH-97984)
  Lysandros Nikolaou, 2022-10-06, 1 file changed, -9/+8
  Right now, the tokenizer only returns the type and two pointers to the start and end of the token. This PR modifies the tokenizer to return the type and set all of the necessary information, so that the parser does not have to do this.
* gh-90928: Statically Initialize the Keywords Tuple in Clinic-Generated Code (gh-95860)
  Eric Snow, 2022-08-11, 1 file changed, -0/+1
  We only statically initialize for core code and builtin modules. Extension modules still create the tuple at runtime. We'll solve that part of interpreter isolation separately.
  This change includes generated code. The non-generated changes are in:
  - Tools/clinic/clinic.py
  - Python/getargs.c
  - Include/cpython/modsupport.h
  - Makefile.pre.in (re-generate global strings after running clinic)
  - very minor tweaks to Modules/_codecsmodule.c and Python/Python-tokenize.c
  All other changes are generated code (clinic, global strings).
* bpo-46613: Add PyType_GetModuleByDef to the public API (GH-31081)
  Petr Viktorin, 2022-02-11, 1 file changed, -1/+1
  - Make PyType_GetModuleByDef public (remove underscore)
  Co-authored-by: Victor Stinner <vstinner@python.org>
* bpo-45434: Mark the PyTokenizer C API as private (GH-28924)
  Victor Stinner, 2021-10-13, 1 file changed, -3/+3
  Rename the PyTokenizer functions to mark them as private:
  - PyTokenizer_FindEncodingFilename() => _PyTokenizer_FindEncodingFilename()
  - PyTokenizer_FromString() => _PyTokenizer_FromString()
  - PyTokenizer_FromFile() => _PyTokenizer_FromFile()
  - PyTokenizer_FromUTF8() => _PyTokenizer_FromUTF8()
  - PyTokenizer_Free() => _PyTokenizer_Free()
  - PyTokenizer_Get() => _PyTokenizer_Get()
  Remove the unused PyTokenizer_FindEncoding() function.
  import.c: remove unused #include "errcode.h".
* Remove trailing spaces. (GH-28706)
  Serhiy Storchaka, 2021-10-03, 1 file changed, -1/+1
* Format the Python-tokenize module and fix exit path (GH-27935)
  Pablo Galindo Salgado, 2021-08-25, 1 file changed, -47/+46
* Add tests for the C tokenizer and expose it as a private module (GH-27924)
  Pablo Galindo Salgado, 2021-08-24, 1 file changed, -0/+195