| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
* The lexer, which include the actual lexeme producing logic, goes into
the `lexer` directory.
* The wrappers, one wrapper per input mode (file, string, utf-8, and
readline), go into the `tokenizer` directory and include logic for
creating a lexer instance and managing the buffer for different modes.
---------
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
|
|
|
| |
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
| |
* Remove unused <locale.h> includes.
* Remove unused <fcntl.h> include in traceback.h.
* Remove redundant <assert.h> and <stddef.h> includes. They are already
included by "Python.h".
* Remove <object.h> include in faulthandler.c. Python.h already includes it.
* Add missing <stdbool.h> in pycore_pythread.h if HAVE_PTHREAD_STUBS
is defined.
* Fix also warnings in pthread_stubs.h: don't redefine macros if they
are already defined, like the __NEED_pthread_t macro.
|
|
|
|
|
| |
numerical literal (GH-109081)
It now points on the invalid non-ASCII character, not on the valid numerical literal.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replace <ctype.h> locale dependent functions with Python "pyctype.h"
locale independent functions:
* Replace isalpha() with Py_ISALPHA().
* Replace isdigit() with Py_ISDIGIT().
* Replace isxdigit() with Py_ISXDIGIT().
* Replace tolower() with Py_TOLOWER().
Leave Modules/_sre/sre.c unchanged, it uses locale dependent
functions on purpose.
Include explicitly <ctype.h> in _decimal.c to get isascii().
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
literals (#105555)
|
|
|
|
| |
tokens (#105364)
|
| |
|
|
|
|
| |
iteratively (#105070)
|
|
|
| |
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
|
|
|
|
| |
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
|
| |
|
|
|
|
| |
(#104870)
|
| |
|
|
|
|
|
|
|
|
|
| |
* Support for conversion specifiers o (octal) and X (uppercase hexadecimal).
* Support for length modifiers j (intmax_t) and t (ptrdiff_t).
* Length modifiers are now applied to all integer conversions.
* Support for wchar_t C strings (%ls and %lV).
* Support for variable width and precision (*).
* Support for flag - (left alignment).
|
|
|
|
|
|
|
|
|
|
|
| |
This commit replaces the Python implementation of the tokenize module with an implementation
that reuses the real C tokenizer via a private extension module. The tokenize module now implements
a compatibility layer that transforms tokens from the C tokenizer into Python tokenize tokens for backward
compatibility.
As the C tokenizer does not emit some tokens that the Python tokenizer provides (such as comments and non-semantic newlines), a new special mode has been added to the C tokenizer mode that currently is only used via
the extension module that exposes it to the Python layer. This new mode forces the C tokenizer to emit these new extra tokens and add the appropriate metadata that is needed to match the old Python implementation.
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
|
|
|
|
| |
(#104660)
|
|
|
|
| |
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
|
| |
|
|
|
|
| |
(GH-104136)
|
|
|
|
|
|
| |
Co-authored-by: sunmy2019 <59365878+sunmy2019@users.noreply.github.com>
Co-authored-by: Ken Jin <kenjin@python.org>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
|
| |
|
|
|
|
|
| |
(GH-103896)
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
|
|
|
|
|
|
| |
Turns out we always need to remember/restore fstring buffers in all of
the stack of tokenizer modes, cause they might change to
`TOK_REGULAR_MODE` and have newlines inside the braces (which is when we
need to reallocate the buffer and restore the fstring ones).
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>
Co-authored-by: Marta Gómez Macías <mgmacias@google.com>
Co-authored-by: sunmy2019 <59365878+sunmy2019@users.noreply.github.com>
|
|
|
|
|
| |
(GH-99893)
Automerge-Triggered-By: GH:pablogsal
|
|
|
|
| |
fill the available buffer (#99605)
|
|
|
|
| |
Replace Py_INCREF() with Py_NewRef() in C files of the Parser/
directory and in the PEG generator.
|
| |
|
|
|
|
|
| |
Right now, the tokenizer only returns type and two pointers to the start and end of the token.
This PR modifies the tokenizer to return the type and set all of the necessary information,
so that the parser does not have to this.
|
| |
|
|
|
| |
Automerge-Triggered-By: GH:pablogsal
|
|
|
|
|
| |
This makes tokenizer.c:valid_utf8 match stringlib/codecs.h:decode_utf8.
It also fixes an off-by-one error introduced in 3.10 for the line number when the tokenizer reports bad UTF8.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
errors from stdin (#94386)
* gh-94360: Fix a tokenizer crash when reading encoded files with syntax errors from stdin
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
* nitty nit
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
|
|
|
|
|
|
| |
It combines PyImport_ImportModule() and PyObject_GetAttrString()
and saves 4-6 lines of code on every use.
Add also _PyImport_GetModuleAttr() which takes Python strings as arguments.
|
| |
|
|
|
|
|
|
|
| |
* Replace deprecated Py_DebugFlag with PyConfig.parser_debug in the
parser.
* Add Parser.debug member.
* Add tok_state.debug member.
* Py_FrozenMain(): Replace Py_VerboseFlag with PyConfig.verbose.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove the token.h header file. There was never any public tokenizer
C API. The token.h header file was only designed to be used by Python
internals.
Move Include/token.h to Include/internal/pycore_token.h. Including
this header file now requires that the Py_BUILD_CORE macro is
defined. It no longer checks for the Py_LIMITED_API macro.
Rename functions:
* PyToken_OneChar() => _PyToken_OneChar()
* PyToken_TwoChars() => _PyToken_TwoChars()
* PyToken_ThreeChars() => _PyToken_ThreeChars()
|
|
|
|
|
| |
The warning emitted by the Python parser for a numeric literal
immediately followed by keyword has been changed from deprecation
warning to syntax warning.
|