| Commit message | Author | Age | Files | Lines |
|
from public API/ABI (GH-119680)
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
|
(GH-120520)
* Add an InternalDocs file describing how interning should work and how to use it.
* Add internal functions to *explicitly* request what kind of interning is done:
- `_PyUnicode_InternMortal`
- `_PyUnicode_InternImmortal`
- `_PyUnicode_InternStatic`
* Switch uses of `PyUnicode_InternInPlace` to those.
* Disallow using `_Py_SetImmortal` on strings directly.
You should use `_PyUnicode_InternImmortal` instead:
- Strings should be interned before immortalization; otherwise you may end up
  immortalizing a copy rather than the interned string (see the sketch below).
- `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to
`SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in
backports, as they are now part of public API and version-specific ABI.
* Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.
* Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
- `_Py_ID`
- `_Py_STR` (including the empty string)
- one-character latin-1 singletons
Now, when you intern a singleton, that exact singleton will be interned.
* Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).
* Intern `_Py_STR` singletons at startup.
* For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup.
* Beef up the tests. Cover internal details (marked with `@cpython_only`).
* Add lots of assertions
Co-Authored-By: Eric Snow <ericsnowcurrently@gmail.com>
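The entry above replaces direct `_Py_SetImmortal` calls on strings with explicit interning requests. A minimal sketch of the intended call pattern follows, assuming the internal helpers keep the `PyObject **` in/out convention of `PyUnicode_InternInPlace()` and take the interpreter state as their first argument; the exact internal signatures and headers may differ.
```c
/* Minimal sketch (internal API; requires Py_BUILD_CORE): intern first, and
   let the interning helper handle immortalization, instead of calling
   _Py_SetImmortal() on the string directly. */
#include "Python.h"
#include "pycore_unicodeobject.h"

static PyObject *
make_immortal_key(PyInterpreterState *interp, const char *text)
{
    PyObject *key = PyUnicode_FromString(text);
    if (key == NULL) {
        return NULL;
    }
    _PyUnicode_InternImmortal(interp, &key);
    return key;   /* the canonical, immortal object for `text` */
}
```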
|
* gh-119118: Fix performance regression in the tokenize module
- Cache the line object to avoid creating a new Unicode object for every
token on the same line.
- Speed up byte-offset-to-column-offset conversion by measuring the
difference over the smallest buffer possible (see the sketch below).
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
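To illustrate the second bullet, here is a hedged sketch (not the actual patch) of turning a byte offset within a UTF-8 line into a column offset by decoding only the smallest necessary slice; the helper name is illustrative.
```c
#include <Python.h>

static Py_ssize_t
byte_offset_to_column(const char *line_start, Py_ssize_t byte_offset)
{
    /* Decode only the bytes before the token, not the whole line. */
    PyObject *prefix = PyUnicode_DecodeUTF8(line_start, byte_offset, "replace");
    if (prefix == NULL) {
        return -1;
    }
    Py_ssize_t column = PyUnicode_GET_LENGTH(prefix);  /* code points, not bytes */
    Py_DECREF(prefix);
    return column;
}
```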
|
input detection in the codeop module (#113745)
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
|
tokenize module (#112949)
|
* The lexer, which includes the actual lexeme-producing logic, goes into
the `lexer` directory.
* The wrappers, one wrapper per input mode (file, string, utf-8, and
readline), go into the `tokenizer` directory and include logic for
creating a lexer instance and managing the buffer for different modes.
---------
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
|
Replace _PyObject_FastCall() calls with PyObject_Vectorcall().
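A small sketch of what such a call-site change looks like; the wrapper function is illustrative, not code from the commit.
```c
#include <Python.h>

/* Both the old private _PyObject_FastCall() and the public
   PyObject_Vectorcall() take a C array of positional arguments, so the
   call sites change mechanically. */
static PyObject *
call_with_two_args(PyObject *callable, PyObject *a, PyObject *b)
{
    PyObject *args[] = {a, b};

    /* Before: return _PyObject_FastCall(callable, args, 2); */
    /* After (NULL kwnames means no keyword arguments): */
    return PyObject_Vectorcall(callable, args, 2, NULL);
}
```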
|
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
|
This commit replaces the Python implementation of the tokenize module with an implementation
that reuses the real C tokenizer via a private extension module. The tokenize module now implements
a compatibility layer that transforms tokens from the C tokenizer into Python tokenize tokens for backward
compatibility.
As the C tokenizer does not emit some tokens that the Python tokenizer provides (such as comments and non-semantic newlines), a new special mode has been added to the C tokenizer; it is currently only used via the extension module that exposes it to the Python layer. This new mode forces the C tokenizer to emit these extra tokens and to add the appropriate metadata needed to match the old Python implementation.
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
|
(GH-103896)
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
|
Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>
Co-authored-by: Marta Gómez Macías <mgmacias@google.com>
Co-authored-by: sunmy2019 <59365878+sunmy2019@users.noreply.github.com>
|
There are some warnings when building Python with clang:
Parser/pegen.c:812:31: warning: a function declaration without a prototype is deprecated in all versions of C [-Wstrict-prototypes]
_PyPegen_clear_memo_statistics()
^
void
Parser/pegen.c:820:29: warning: a function declaration without a prototype is deprecated in all versions of C [-Wstrict-prototypes]
_PyPegen_get_memo_statistics()
^
void
Fix it to make clang happy.
Signed-off-by: Chenxi Mao <chenxi.mao@suse.com>
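The fix boils down to spelling parameterless functions with an explicit `void`, since an empty parameter list declares a function without a prototype in pre-C23 C. Illustrative form of the change (not the exact patch hunk):
```c
/* Before: an empty parameter list declares the function without a
   prototype, which clang's -Wstrict-prototypes flags as deprecated.

       void
       _PyPegen_clear_memo_statistics()
       { ... }
*/

/* After: spell the empty parameter list as (void). */
void
_PyPegen_clear_memo_statistics(void)
{
    /* ... */
}
```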
|
* Make sure that the current exception is always normalized.
* Remove redundant type and traceback fields for the current exception.
* Add new API functions: PyErr_GetRaisedException, PyErr_SetRaisedException
* Add new API functions: PyException_GetArgs, PyException_SetArgs
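A hedged usage sketch of the new functions: fetch the current exception as a single normalized object, inspect its arguments, and restore it. The surrounding helper is illustrative.
```c
#include <Python.h>

static void
inspect_and_reraise(void)
{
    PyObject *exc = PyErr_GetRaisedException();   /* clears the error state */
    if (exc == NULL) {
        return;                                   /* nothing was raised */
    }
    PyObject *args = PyException_GetArgs(exc);    /* new reference to exc.args */
    /* ... inspect args here ... */
    Py_XDECREF(args);
    PyErr_SetRaisedException(exc);                /* steals exc, re-raises it */
}
```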
|
https://github.com/python/cpython/issues/81057
|
Replace Py_INCREF() with Py_NewRef() in C files of the Parser/
directory and in the PEG generator.
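A minimal sketch of the mechanical change; the surrounding function is illustrative.
```c
#include <Python.h>

static PyObject *
get_cached(PyObject *obj)
{
    /* Before:
     *     Py_INCREF(obj);
     *     return obj;
     */
    return Py_NewRef(obj);  /* same effect; the new reference is explicit */
}
```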
|
Right now, the tokenizer only returns the type and two pointers to the start and end of the token.
This PR modifies the tokenizer to return the type and to set all of the necessary information,
so that the parser does not have to do this.
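A hypothetical sketch of the kind of token record this implies, carrying position information alongside the type and the start/end pointers; the real internal struct and its field names may differ.
```c
/* Hypothetical layout only; the real internal struct may differ. */
typedef struct {
    int type;                        /* token kind (NAME, NUMBER, ...) */
    const char *start;               /* first byte of the lexeme */
    const char *end;                 /* one past the last byte */
    int lineno, col_offset;          /* start position */
    int end_lineno, end_col_offset;  /* end position */
} token_sketch;
```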
|
Integer to and from text conversions via CPython's bignum `int` type are not safe against denial of service attacks caused by malicious input. Very large input strings with hundreds of thousands of digits can consume several CPU seconds.
This PR comes fresh from a pile of work done in our private PSRT security response team repo.
Signed-off-by: Christian Heimes [Red Hat] <christian@python.org>
Tons-of-polishing-up-by: Gregory P. Smith [Google] <greg@krypto.org>
Reviews via the private PSRT repo via many others (see the NEWS entry in the PR).
* Issue: gh-95778
I wrote up [a one pager for the release managers](https://docs.google.com/document/d/1KjuF_aXlzPUxTK4BMgezGJ2Pn7uevfX7g0_mvgHlL7Y/edit#). Much of that text wound up in the Issue. Backports PRs already exist. See the issue for links.
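For C callers, the practical consequence is that oversized decimal conversions can now fail with ValueError. A hedged sketch of a defensive call site (the helper is illustrative, and the limit itself is configurable at the Python level):
```c
#include <Python.h>

/* Parse a decimal string from untrusted input. With the digit limit in
   place, an over-long string makes PyLong_FromString() raise ValueError
   instead of burning CPU time on the conversion. */
static PyObject *
parse_untrusted_decimal(const char *text)
{
    PyObject *value = PyLong_FromString(text, NULL, 10);
    if (value == NULL && PyErr_ExceptionMatches(PyExc_ValueError)) {
        /* Malformed input, or more digits than the configured limit;
           the ValueError is left set for the caller to report. */
        return NULL;
    }
    return value;
}
```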
|
#95355
Automerge-Triggered-By: GH:pablogsal
|
It combines PyImport_ImportModule() and PyObject_GetAttrString()
and saves 4-6 lines of code on every use.
Also add _PyImport_GetModuleAttr(), which takes Python strings as arguments.
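For reference, this is the repetitive pattern the new helper folds into one call (only the pre-existing public APIs are shown; the helper's exact name is cut off above):
```c
#include <Python.h>

static PyObject *
get_module_attr(const char *module, const char *attr)
{
    PyObject *mod = PyImport_ImportModule(module);
    if (mod == NULL) {
        return NULL;
    }
    PyObject *value = PyObject_GetAttrString(mod, attr);
    Py_DECREF(mod);
    return value;
}
```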
|
* Replace deprecated Py_DebugFlag with PyConfig.parser_debug in the
parser.
* Add Parser.debug member.
* Add tok_state.debug member.
* Py_FrozenMain(): Replace Py_VerboseFlag with PyConfig.verbose.
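A hedged sketch of the embedding-side replacement: setting `PyConfig.parser_debug` during initialization instead of poking the deprecated `Py_DebugFlag` global (parser debug output may also depend on how the interpreter was compiled).
```c
#include <Python.h>

int
main(void)
{
    PyConfig config;
    PyConfig_InitPythonConfig(&config);
    config.parser_debug = 1;                 /* replaces setting Py_DebugFlag */

    PyStatus status = Py_InitializeFromConfig(&config);
    PyConfig_Clear(&config);
    if (PyStatus_Exception(status)) {
        Py_ExitStatusException(status);
    }
    /* ... run code ... */
    return Py_FinalizeEx();
}
```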
|
unnecessary (GH-31812)
|
This restores commit e9898bf153d26059261ffef11f7643ae991e2a4c.
|
parser (GH-30177)" (GH-30363)
This reverts commit e9898bf153d26059261ffef11f7643ae991e2a4c temporarily, as we want to confirm whether this commit is the cause of a slowdown at startup time.
|
(GH-30177)
Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>
|
PyImport_ImportModule (GH-30046)
|
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
|
inside parentheses (GH-29757)
|
not provided (GH-29582)
|
(GH-29580)
|
characters in the parser
|
continuation characters (GH-28993)
There are two errors that this commit fixes:
* The parser was not correctly computing the offset and the string
source for E_LINECONT errors due to the incorrect usage of strtok().
* The parser was not correctly unwinding the call stack when a tokenizer
exception happened in rules involving optionals ('?', [...]), because we
always make them return valid results by using the comma operator. We
need to check first whether an error has occurred before continuing.
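A schematic illustration of the second bug (not the generated parser code verbatim): with the comma operator, an optional item always yields a truthy result, so the error indicator has to be checked explicitly before the alternative continues.
```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool error_indicator;   /* set when the tokenizer or a rule raised */
    /* ... */
} Parser;

/* Stub standing in for a generated rule function for an optional item. */
static void *
maybe_rule(Parser *p)
{
    (void)p;
    return NULL;            /* optional item absent */
}

static void *
alt_with_optional(Parser *p)
{
    void *opt;
    /* Before the fix: `(opt = maybe_rule(p), 1)` was always truthy, so a
       tokenizer error raised inside the optional was silently ignored. */
    if ((opt = maybe_rule(p), !p->error_indicator)) {
        /* ... continue matching the alternative; opt may be NULL ... */
        return opt;
    }
    return NULL;            /* propagate the error to the caller */
}
```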
|
Rename PyTokenize functions to mark them as private:
* PyTokenizer_FindEncodingFilename() => _PyTokenizer_FindEncodingFilename()
* PyTokenizer_FromString() => _PyTokenizer_FromString()
* PyTokenizer_FromFile() => _PyTokenizer_FromFile()
* PyTokenizer_FromUTF8() => _PyTokenizer_FromUTF8()
* PyTokenizer_Free() => _PyTokenizer_Free()
* PyTokenizer_Get() => _PyTokenizer_Get()
Remove the unused PyTokenizer_FindEncoding() function.
import.c: remove unused #include "errcode.h".
|
pass (GH-28812)
|
expression in function calls (GH-28576)
|
The traceback.c and traceback.py mechanisms now use the newly added code.co_positions and PyCode_Addr2Location
to print carets on the specific expressions involved in a traceback.
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
Co-authored-by: Ammar Askar <ammar@ammaraskar.com>
Co-authored-by: Batuhan Taskaya <batuhanosmantaskaya@gmail.com>
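A hedged sketch of querying the location range that the caret rendering relies on, via `PyCode_Addr2Location()`; the wrapper function is illustrative.
```c
#include <Python.h>

static void
print_instruction_location(PyCodeObject *code, int instr_offset)
{
    int start_line, start_col, end_line, end_col;
    if (PyCode_Addr2Location(code, instr_offset,
                             &start_line, &start_col,
                             &end_line, &end_col)) {
        printf("%d:%d -> %d:%d\n", start_line, start_col, end_line, end_col);
    }
}
```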
|