|  | Commit message (Collapse) | Author | Age | Files | Lines | 
|---|
| | 
| 
| 
| | tokenize module (#112949) | 
| | |  | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | * The lexer, which include the actual lexeme producing logic, goes into
  the `lexer` directory.
* The wrappers, one wrapper per input mode (file, string, utf-8, and
  readline), go into the `tokenizer` directory and include logic for
  creating a lexer instance and managing the buffer for different modes.
---------
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> | 
| | |  | 
| | 
| 
| 
| | (#105565) | 
| | |  | 
| | 
| 
| 
| | newline (#105442) | 
| | 
| 
| 
| | errors (#105399) | 
| | 
| 
| 
| | tokens (#105364) | 
| | |  | 
| | 
| 
| 
| | iteratively (#105070) | 
| | 
| 
| | Co-authored-by: Pablo Galindo <pablogsal@gmail.com> | 
| | 
| 
| 
| 
| | tokenizer (#104980)
Signed-off-by: Pablo Galindo <pablogsal@gmail.com> | 
| | 
| 
| 
| | correct (#104975) | 
| | 
| 
| 
| | in the tokenize module (#104846) | 
| | |  | 
| | |  | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This commit replaces the Python implementation of the tokenize module with an implementation
that reuses the real C tokenizer via a private extension module. The tokenize module now implements
a compatibility layer that transforms tokens from the C tokenizer into Python tokenize tokens for backward
compatibility.
As the C tokenizer does not emit some tokens that the Python tokenizer provides (such as comments and non-semantic newlines), a new special mode has been added to the C tokenizer mode that currently is only used via
the extension module that exposes it to the Python layer. This new mode forces the C tokenizer to emit these new extra tokens and add the appropriate metadata that is needed to match the old Python implementation.
Co-authored-by: Pablo Galindo <pablogsal@gmail.com> | 
| | 
| 
| | Here we are doing no more than adding the value for Py_mod_multiple_interpreters and using it for stdlib modules.  We will start checking for it in gh-104206 (once PyInterpreterState.ceval.own_gil is added in gh-104204). | 
| | 
| 
| 
| 
| 
| | Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>
Co-authored-by: Marta Gómez Macías <mgmacias@google.com>
Co-authored-by: sunmy2019 <59365878+sunmy2019@users.noreply.github.com> | 
| | 
| 
| 
| 
| | Right now, the tokenizer only returns type and two pointers to the start and end of the token.
This PR modifies the tokenizer to return the type and set all of the necessary information,
so that the parser does not have to this. | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | (gh-95860)
We only statically initialize for core code and builtin modules.  Extension modules still create
the tuple at runtime.  We'll solve that part of interpreter isolation separately.
This change includes generated code. The non-generated changes are in:
* Tools/clinic/clinic.py
* Python/getargs.c
* Include/cpython/modsupport.h
* Makefile.pre.in (re-generate global strings after running clinic)
* very minor tweaks to Modules/_codecsmodule.c and Python/Python-tokenize.c
All other changes are generated code (clinic, global strings). | 
| | 
| 
| 
| 
| | * Make PyType_GetModuleByDef public (remove underscore)
Co-authored-by: Victor Stinner <vstinner@python.org> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Rename PyTokenize functions to mark them as private:
* PyTokenizer_FindEncodingFilename() => _PyTokenizer_FindEncodingFilename()
* PyTokenizer_FromString() => _PyTokenizer_FromString()
* PyTokenizer_FromFile() => _PyTokenizer_FromFile()
* PyTokenizer_FromUTF8() => _PyTokenizer_FromUTF8()
* PyTokenizer_Free() => _PyTokenizer_Free()
* PyTokenizer_Get() => _PyTokenizer_Get()
Remove the unused PyTokenizer_FindEncoding() function.
import.c: remove unused #include "errcode.h". | 
| | |  | 
| | |  | 
|  |  |