path: root/Parser/tokenizer.c
Commit message | Author | Age | Files | Lines
* gh-94360: Fix a tokenizer crash when reading encoded files with syntax errors from stdin (#94386) | Pablo Galindo Salgado | 2022-07-05 | 1 | -1/+9
  * gh-94360: Fix a tokenizer crash when reading encoded files with syntax errors from stdin
    Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
  * nitty nit
  Co-authored-by: Łukasz Langa <lukasz@langa.pl>
* gh-93741: Add private C API _PyImport_GetModuleAttrString() (GH-93742) | Serhiy Storchaka | 2022-06-14 | 1 | -5/+5
  It combines PyImport_ImportModule() and PyObject_GetAttrString() and saves 4-6 lines of code on every use.
  Also add _PyImport_GetModuleAttr(), which takes Python strings as arguments.
* GH-93207: Remove HAVE_STDARG_PROTOTYPES configure check for stdarg.h (#93215) | Kumar Aditya | 2022-05-27 | 1 | -12/+0
* gh-93103: Parser uses PyConfig.parser_debug instead of Py_DebugFlag (#93106) | Victor Stinner | 2022-05-24 | 1 | -1/+4
  * Replace deprecated Py_DebugFlag with PyConfig.parser_debug in the parser.
  * Add Parser.debug member.
  * Add tok_state.debug member.
  * Py_FrozenMain(): Replace Py_VerboseFlag with PyConfig.verbose.
* gh-92651: Remove the Include/token.h header file (#92652) | Victor Stinner | 2022-05-11 | 1 | -3/+3
  Remove the token.h header file. There was never any public tokenizer C API. The token.h header file was only designed to be used by Python internals.
  Move Include/token.h to Include/internal/pycore_token.h. Including this header file now requires that the Py_BUILD_CORE macro is defined. It no longer checks for the Py_LIMITED_API macro.
  Rename functions:
  * PyToken_OneChar() => _PyToken_OneChar()
  * PyToken_TwoChars() => _PyToken_TwoChars()
  * PyToken_ThreeChars() => _PyToken_ThreeChars()
* gh-87999: Change warning type for numeric literal followed by keyword (GH-91980) | Serhiy Storchaka | 2022-04-27 | 1 | -4/+6
  The warning emitted by the Python parser for a numeric literal immediately followed by a keyword has been changed from a deprecation warning to a syntax warning.
* bpo-46315: Use fopencookie only on Emscripten 3.x and newer (GH-32266) | Christian Heimes | 2022-04-02 | 1 | -1/+1
* bpo-47126: Update to canonical PEP URLs specified by PEP 676 (GH-32124) | Hugo van Kemenade | 2022-03-30 | 1 | -1/+1
* bpo-46315: Use fopencookie() to avoid dup() in _PyTokenizer_FindEncodingFilename (GH-32033) | Christian Heimes | 2022-03-22 | 1 | -6/+34
  WASI does not have dup() and Emscripten's emulation is slow.
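The commit names the technique but not its shape. A minimal sketch, assuming glibc's `fopencookie()` (a GNU extension that needs `_GNU_SOURCE` and is not available on every platform): a `FILE *` whose close callback does nothing lets `fclose()` release the stream while leaving the caller's descriptor open, which is what the usual `fdopen(dup(fd), "r")` pattern achieves at the cost of a `dup()`. The helper `borrow_fd_for_reading()` and its callbacks are hypothetical names, not the code in tokenizer.c.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Read callback: forward reads to the borrowed descriptor. */
static ssize_t borrowed_fd_read(void *cookie, char *buf, size_t size)
{
    int fd = *(int *)cookie;
    return read(fd, buf, size);
}

/* Close callback: deliberately do nothing, so fclose() leaves the
   caller's descriptor open. This is what makes dup() unnecessary. */
static int borrowed_fd_close(void *cookie)
{
    (void)cookie;
    return 0;
}

/* Hypothetical helper: wrap *fd in a read-only FILE * without dup().
   The int that fd points to must outlive the returned stream. */
static FILE *borrow_fd_for_reading(int *fd)
{
    cookie_io_functions_t io = {
        .read = borrowed_fd_read,
        .write = NULL,  /* read-only stream */
        .seek = NULL,   /* not seekable */
        .close = borrowed_fd_close,
    };
    return fopencookie(fd, "r", io);
}
```

Because the stream never owns the descriptor, the caller still closes `fd` itself after `fclose()`.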
* bpo-46920: Remove code that has explainers why it was disabled (GH-31813) | Oleg Iarygin | 2022-03-14 | 1 | -24/+0
* bpo-46820: Fix a SyntaxError in a numeric literal followed by "not in" (GH-31479) | Serhiy Storchaka | 2022-02-22 | 1 | -0/+3
  Fix parsing a numeric literal immediately (without spaces) followed by the "not in" keywords, as in "1not in x". Now the parser only emits a warning, not a syntax error.
* bpo-46541: Replace core use of _Py_IDENTIFIER() with statically initialized global objects. (gh-30928) | Eric Snow | 2022-02-08 | 1 | -9/+10
  We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code. It is still used in a number of non-builtin stdlib modules.
  The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime. A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings).
  https://bugs.python.org/issue46541#msg411799 explains the rationale for this change.
  The core of the change is in:
  * (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros
  * Include/internal/pycore_runtime_init.h - added the static initializers for the global strings
  * Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState
  * Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers
  I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings. That check is added to the PR CI config.
  The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _Py*Id functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()). This includes adding a few functions where there wasn't already an alternative to _Py*Id(), replacing the _Py_Identifier * parameter with PyObject *.
  The following are not changed (yet):
  * stop using _Py_IDENTIFIER() in the stdlib modules
  * (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable, as at least one package on PyPI uses this (private) API
  * (maybe) intern the strings during runtime init
  https://bugs.python.org/issue46541
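A minimal sketch of the pattern described in the commit above, with hypothetical names (`static_string`, `runtime`, `STATIC_STRING`, `GET_GLOBAL_IDENTIFIER`) and plain C structs standing in for `PyUnicodeObject` and `_PyRuntimeState`: every string is a field of one runtime struct, filled in by a static initializer, and a token-pasting macro turns a bare name into the address of its field. As we understand it, `_Py_IDENTIFIER()` instead declared a per-call-site static and created the string object lazily on first use; here nothing is created at run time. This is not CPython's generated code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a statically allocated string object. */
typedef struct {
    size_t length;
    const char *data;
} static_string;

/* Declarations of every global identifier, grouped in one struct. */
struct global_strings {
    static_string id_encoding;
    static_string id_readline;
};

/* Hypothetical stand-in for the runtime state that hosts the strings. */
struct runtime_state {
    struct global_strings strings;
};

/* Compile-time initializer: length excludes the terminating NUL. */
#define STATIC_STRING(LITERAL) { sizeof(LITERAL) - 1, (LITERAL) }

static struct runtime_state runtime = {
    .strings = {
        .id_encoding = STATIC_STRING("encoding"),
        .id_readline = STATIC_STRING("readline"),
    },
};

/* Lookup macro in the spirit of _Py_GET_GLOBAL_IDENTIFIER(). */
#define GET_GLOBAL_IDENTIFIER(NAME) (&runtime.strings.id_##NAME)
```

`GET_GLOBAL_IDENTIFIER(readline)` expands to `&runtime.strings.id_readline`, so a misspelled identifier fails at compile time rather than at run time.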
* bpo-46521: Fix codeop to use a new partial-input mode of the parser (GH-31010) | Pablo Galindo Salgado | 2022-02-08 | 1 | -10/+16
* bpo-14916: use specified tokenizer fd for file input (GH-31006) | Paul m. p. P | 2022-02-01 | 1 | -1/+1
  @pablogsal, sorry, I failed to rebase onto main, so I recreated https://github.com/python/cpython/pull/22190#issuecomment-1024633392
  > PyRun_InteractiveOne*() functions allow explicitly setting the fd instead of stdin, but stdin was hardcoded in the readline call.
  > This patch does not fix the target file for the prompt, unlike the original bpo one: the prompt fd is unrelated to the tokenizer source, which could be read-only. It is more of a bugfix regarding the docs: the actual documentation says "prompt the user", so one would expect the prompt to go to stdout, not a file, for both PyRun_InteractiveOne*() and PyRun_InteractiveLoop*().
  Automerge-Triggered-By: GH:pablogsal
* bpo-46091: Correctly calculate indentation levels for whitespace lines with continuation characters (GH-30130) | Pablo Galindo Salgado | 2022-01-25 | 1 | -13/+33
* bpo-45855: Replaced deprecated `PyImport_ImportModuleNoBlock` with PyImport_ImportModule (GH-30046) | Kumar Aditya | 2021-12-12 | 1 | -1/+1
* bpo-46054: Fix parsing error when parsing non-utf8 characters in source files (GH-30068) | Pablo Galindo Salgado | 2021-12-12 | 1 | -8/+5
* Ensure the str member of the tokenizer is always initialised (GH-29681) | Pablo Galindo Salgado | 2021-11-21 | 1 | -1/+1
* bpo-45811: Improve error message when source code contains invisible control characters (GH-29654) | Pablo Galindo Salgado | 2021-11-20 | 1 | -0/+6
* bpo-45738: Fix computation of error location for invalid continuation (GH-29550) | Pablo Galindo Salgado | 2021-11-14 | 1 | -1/+0
  characters in the parser
* bpo-45562: Ensure all tokenizer debug messages are printed to stderr (GH-29270) | Pablo Galindo Salgado | 2021-10-28 | 1 | -1/+1
* bpo-45562: Print tokenizer debug messages to stderr (GH-29250) | Pablo Galindo Salgado | 2021-10-27 | 1 | -4/+4
* bpo-45574: fix warning about `print_escape` being unused (GH-29172) | Nikita Sobolev | 2021-10-22 | 1 | -0/+2
  It used to be like this:
  [Screenshot 2021-10-22 at 23 07 40]
  A quick `grep` shows that it is only used in one place, under `Py_DEBUG`: https://github.com/python/cpython/blame/f6e8b80d20159596cf641305bad3a833bedd2f4f/Parser/tokenizer.c#L1047-L1051
  [Screenshot 2021-10-22 at 23 08 09]
  I am not sure, but it also looks like a private function, so it should not affect other users.
  Automerge-Triggered-By: GH:pablogsal
* bpo-45562: Only show debug output from the parser in debug builds (GH-29140) | Pablo Galindo Salgado | 2021-10-22 | 1 | -0/+2
* bpo-45434: Mark the PyTokenizer C API as private (GH-28924) | Victor Stinner | 2021-10-13 | 1 | -27/+20
  Rename PyTokenizer functions to mark them as private:
  * PyTokenizer_FindEncodingFilename() => _PyTokenizer_FindEncodingFilename()
  * PyTokenizer_FromString() => _PyTokenizer_FromString()
  * PyTokenizer_FromFile() => _PyTokenizer_FromFile()
  * PyTokenizer_FromUTF8() => _PyTokenizer_FromUTF8()
  * PyTokenizer_Free() => _PyTokenizer_Free()
  * PyTokenizer_Get() => _PyTokenizer_Get()
  Remove the unused PyTokenizer_FindEncoding() function.
  import.c: remove unused #include "errcode.h".
* bpo-45439: Move _PyObject_CallNoArgs() to pycore_call.h (GH-28895) | Victor Stinner | 2021-10-12 | 1 | -0/+1
  * Move _PyObject_CallNoArgs() to pycore_call.h (internal C API).
  * _ssl, _sqlite and _testcapi extensions now call the public PyObject_CallNoArgs() function, rather than _PyObject_CallNoArgs().
  * _lsprof extension is now built with Py_BUILD_CORE_MODULE macro defined to get access to internal _PyObject_CallNoArgs().
* bpo-45439: Rename _PyObject_CallNoArg() to _PyObject_CallNoArgs() (GH-28891) | Victor Stinner | 2021-10-11 | 1 | -1/+1
  Fix typo in the private _PyObject_CallNoArg() function name: rename it to _PyObject_CallNoArgs() to be consistent with the public function PyObject_CallNoArgs().
* Update URLs in comments and metadata to use HTTPS (GH-27458) | Noah Kantrowitz | 2021-07-30 | 1 | -1/+1
* bpo-44317: Improve tokenizer errors with more informative locations (GH-26555) | Pablo Galindo Salgado | 2021-07-10 | 1 | -18/+54
* Fix typos in multiple files (GH-26689) | Binbin | 2021-06-13 | 1 | -3/+3
  Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
* bpo-44396: Update multi-line-start location when reallocating tokenizer buffers (GH-26676) | Pablo Galindo | 2021-06-12 | 1 | -0/+5
  Automerge-Triggered-By: GH:pablogsal
* bpo-43833: Emit warnings for numeric literals followed by keyword (GH-25466) | Serhiy Storchaka | 2021-06-08 | 1 | -0/+128
  Emit a deprecation warning if the numeric literal is immediately followed by one of the keywords: and, else, for, if, in, is, or. Raise a syntax error with a more informative message if it is immediately followed by another keyword or identifier.
  Automerge-Triggered-By: GH:pablogsal
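A minimal sketch of the classification rule in the commit above, assuming the caller already knows where a decimal literal ends. `classify_after_number()` and its enum are hypothetical; the sketch compares the whole ASCII identifier run after the literal, whereas a real tokenizer peeks ahead character by character and must also cope with hex, octal, and exponent forms. It also accepts `not` (the later bpo-46820 case) without checking that `in` follows. It only classifies: which warning is then emitted changed later (a DeprecationWarning here, a SyntaxWarning after gh-87999).

```c
#include <ctype.h>
#include <string.h>

enum literal_suffix {
    SUFFIX_OK,       /* literal is followed by a non-identifier character */
    SUFFIX_WARNING,  /* followed by a keyword that may legally come next */
    SUFFIX_ERROR     /* followed by any other keyword or identifier */
};

/* Hypothetical classifier. `src + end` is the first character after a
   decimal literal. */
static enum literal_suffix classify_after_number(const char *src, size_t end)
{
    static const char *const allowed[] = {
        "and", "else", "for", "if", "in", "is", "or",
        "not",  /* for "1not in x"; "in" is not checked here */
    };
    const char *p = src + end;
    size_t len = 0;

    /* Measure the identifier run glued to the literal (ASCII only). */
    while (isalnum((unsigned char)p[len]) || p[len] == '_') {
        len++;
    }
    if (len == 0) {
        return SUFFIX_OK;
    }
    for (size_t i = 0; i < sizeof allowed / sizeof allowed[0]; i++) {
        if (strlen(allowed[i]) == len && strncmp(p, allowed[i], len) == 0) {
            return SUFFIX_WARNING;
        }
    }
    return SUFFIX_ERROR;
}
```

For example, `classify_after_number("1if x else y", 1)` yields `SUFFIX_WARNING`, while `"1abc"` yields `SUFFIX_ERROR`.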
* bpo-44201: Avoid side effects of "invalid_*" rules in the REPL (GH-26298) | Pablo Galindo | 2021-05-22 | 1 | -0/+9
  When the parser does a second pass to check for errors, these rules can have some small side effects, as they may advance the parser beyond the point reached in the first pass. This can cause the tokenizer to ask for extra tokens in interactive mode, so that it shows the prompt instead of failing immediately.
  To avoid this, add a new mode to the tokenizer that is activated in the second pass and stops it from asking for new tokens once the interactive line is finished. As the parser should have reached the last line in the first pass, the second pass should not need to ask for more tokens.
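A minimal model of the new mode, with hypothetical names (`line_source`, `UNDERFLOW_STOP`, `next_line()`) that are not taken from tokenizer.c: when the tokenizer runs out of buffered input it asks its source for another line, which in the REPL means printing a prompt. In the stop mode used by the second pass, the source reports end of input instead, so an over-reading error rule cannot trigger a prompt.

```c
#include <stddef.h>

/* Hypothetical stand-in for the tokenizer's underflow behaviour. */
enum underflow_mode {
    UNDERFLOW_NORMAL,  /* first pass: prompt for another line if needed */
    UNDERFLOW_STOP     /* second (error-reporting) pass: never prompt */
};

struct line_source {
    const char **pending;  /* lines the user would type at the next prompts */
    size_t n_pending;
    size_t next;
    int prompts_shown;     /* how many times the user was prompted */
    enum underflow_mode mode;
};

/* Called when the tokenizer has consumed its buffer. Returns the next
   line, or NULL for end of input. */
static const char *next_line(struct line_source *src)
{
    if (src->mode == UNDERFLOW_STOP) {
        /* The first pass already read the whole statement: report end
           of input instead of showing a prompt. */
        return NULL;
    }
    if (src->next >= src->n_pending) {
        return NULL;
    }
    src->prompts_shown++;  /* stands in for printing "... " and waiting */
    return src->pending[src->next++];
}
```

The first pass runs in `UNDERFLOW_NORMAL`; switching to `UNDERFLOW_STOP` before the second pass is the whole fix in this model.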
* Fix tokenizer error when raw decoding null bytes (GH-25080) | Pablo Galindo | 2021-03-29 | 1 | -1/+4
* bpo-25643: Refactor the C tokenizer into smaller, logical units (GH-25050) | Pablo Galindo | 2021-03-28 | 1 | -354/+332
* bpo-43410: Fix crash in the parser when producing syntax errors when reading from stdin (GH-24763) | Pablo Galindo | 2021-03-14 | 1 | -26/+52
* bpo-40176: Improve error messages for unclosed string literals (GH-19346) | Batuhan Taskaya | 2021-01-20 | 1 | -10/+16
  Automerge-Triggered-By: GH:isidentical
* bpo-42864: Fix compiler warning in the tokenizer with the new paren stack for column numbers (GH-24266) | Pablo Galindo | 2021-01-20 | 1 | -1/+1
* bpo-42864: Improve error messages regarding unclosed parentheses (GH-24161) | Pablo Galindo | 2021-01-19 | 1 | -1/+4
* bpo-42827: Fix crash on SyntaxError in multiline expressions (GH-24140) | Lysandros Nikolaou | 2021-01-14 | 1 | -0/+21
  When trying to extract the error line for the error message, there are two distinct cases:
  1. The input comes from a file, which means that we can extract the error line by using `PyErr_ProgramTextObject`, which we already do.
  2. The input does not come from a file, in which case we need to get the source code from the tokenizer:
     * If the tokenizer's current line number is the same as the line of the error, we get the line from `tok->buf` and we're done.
     * Otherwise, we can extract the error line from the source code in one of the following two ways:
       * If the input comes from a string, we have all the input in `tok->str` and can extract the error line from it.
       * If the input comes from stdin, i.e. the interactive prompt, we do not have access to the previous line. That's why a new field `tok->stdin_content` is added, which holds the whole input for the current (multiline) statement or expression. We can then extract the error line from `tok->stdin_content` as we do in the string case above.
  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
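Both the string case and the stdin case in the commit above reduce to picking one line out of a buffer that holds the whole input (`tok->str` or `tok->stdin_content`). A minimal sketch of that step, with a hypothetical `get_error_line()` taking a 1-based line number; it is not the CPython function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy line `lineno` (1-based) of `src` into `out`
   without its trailing newline. Returns 0 on success, or -1 if the line
   does not exist or does not fit in `out`. */
static int get_error_line(const char *src, int lineno, char *out, size_t outsize)
{
    if (lineno < 1) {
        return -1;
    }
    const char *start = src;
    /* Skip the lines before the requested one. */
    for (int i = 1; i < lineno; i++) {
        const char *nl = strchr(start, '\n');
        if (nl == NULL) {
            return -1;  /* the input has fewer lines than requested */
        }
        start = nl + 1;
    }
    /* The line ends at the next newline or at the end of the buffer. */
    const char *end = strchr(start, '\n');
    size_t len = end != NULL ? (size_t)(end - start) : strlen(start);
    if (len + 1 > outsize) {
        return -1;  /* no room for the line plus its terminating NUL */
    }
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

With the input `"x = (1,\n    2 +\n)\n"`, asking for line 2 copies `"    2 +"`.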
* bpo-42519: Replace PyMem_MALLOC() with PyMem_Malloc() (GH-23586) | Victor Stinner | 2020-12-01 | 1 | -30/+30
  No longer use deprecated aliases to functions:
  * Replace PyMem_MALLOC() with PyMem_Malloc()
  * Replace PyMem_REALLOC() with PyMem_Realloc()
  * Replace PyMem_FREE() with PyMem_Free()
  * Replace PyMem_Del() with PyMem_Free()
  * Replace PyMem_DEL() with PyMem_Free()
  Also modify the PyMem_DEL() macro to use PyMem_Free() directly.
* bpo-36020: Remove snprintf macro in pyerrors.h (GH-20889) | Victor Stinner | 2020-06-15 | 1 | -1/+1
  On Windows, #include "pyerrors.h" no longer defines "snprintf" and "vsnprintf" macros.
  PyOS_snprintf() and PyOS_vsnprintf() should be used to get portable behavior.
  Replace snprintf() calls with PyOS_snprintf() and replace vsnprintf() calls with PyOS_vsnprintf().
* bpo-40847: Consider a line with only a LINECONT a blank line (GH-20769) | Lysandros Nikolaou | 2020-06-10 | 1 | -1/+2
  A line with only a line continuation character should be considered a blank line at the tokenizer level, so that only a single NEWLINE token gets emitted. The old parser was working around the issue, but the new parser threw a `SyntaxError` for valid input. For example, an empty line following a line continuation character was interpreted as a `SyntaxError`.
  Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
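A minimal sketch of the blank-line test described in the commit above, assuming a NUL-terminated line that still includes its newline. `is_blank_line()` is hypothetical and simplified: it treats an empty line, a comment-only line, and a line holding only a backslash continuation as blank, and ignores the interactive-mode special cases a real tokenizer has.

```c
#include <stdbool.h>

/* Hypothetical, simplified blank-line test. `line` is NUL-terminated and
   still includes its trailing newline, if any. */
static bool is_blank_line(const char *line)
{
    /* Skip leading indentation. */
    while (*line == ' ' || *line == '\t' || *line == '\f') {
        line++;
    }
    /* Empty line, end of input, or a comment-only line. */
    if (*line == '\0' || *line == '\n' || *line == '#') {
        return true;
    }
    /* A lone line continuation: a backslash immediately followed by the
       newline, the case this commit adds. */
    if (line[0] == '\\' && line[1] == '\n') {
        return true;
    }
    return false;
}
```

A backslash that ends a line with real code on it, such as `x = 1 \`, is not blank under this test.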
* Fix peg_generator compiler warnings under MSVC (GH-20405) | Ammar Askar | 2020-05-26 | 1 | -4/+0
* bpo-40593: Improve syntax errors for invalid characters in source code. (GH-20033) | Serhiy Storchaka | 2020-05-12 | 1 | -9/+37
* bpo-40246: Revert reporting of invalid string prefixes (GH-19888) | Lysandros Nikolaou | 2020-05-04 | 1 | -4/+0
  Due to backwards compatibility concerns (keywords immediately followed by a string without whitespace between them, as in `bg="#d00" if clear else"#fca"`, would fail to parse), commit 41d5b94af44e34ac05d4cd57460ed104ccf96628 has to be reverted.
* bpo-40335: Correctly handle multi-line strings in tokenize error scenarios (GH-19619) | Pablo Galindo | 2020-04-21 | 1 | -3/+4
  Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>
* bpo-40246: Report a better error message for invalid string prefixes (GH-19476) | Lysandros Nikolaou | 2020-04-12 | 1 | -0/+4
* bpo-39882: Add _Py_FatalErrorFormat() function (GH-19157) | Victor Stinner | 2020-03-25 | 1 | -1/+1
* bpo-39882: Py_FatalError() logs the function name (GH-18819) | Victor Stinner | 2020-03-06 | 1 | -3/+5
  The Py_FatalError() function is replaced with a macro which automatically logs the name of the current function, unless the Py_LIMITED_API macro is defined.
  Changes:
  * Add _Py_FatalErrorFunc() function.
  * Remove the function name from the message of Py_FatalError() calls which included the function name.
  * Update tests.
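The mechanism in the commit above relies on C99's predefined `__func__` identifier: because a macro expands at the call site, `__func__` there names the calling function. A minimal sketch with hypothetical names (`format_fatal_error`, `FORMAT_FATAL_ERROR`, `tok_example`); it formats into a buffer instead of aborting, and the message layout only approximates what CPython prints.

```c
#include <stdio.h>

/* Hypothetical counterpart of a "function name + message" fatal-error
   function. A real one would write to stderr and abort; this only
   formats, so the mechanism can be shown without terminating. */
static int format_fatal_error(char *buf, size_t size,
                              const char *func, const char *message)
{
    return snprintf(buf, size, "Fatal Python error: %s: %s", func, message);
}

/* Expands at the call site, so __func__ is the caller's name and call
   sites no longer need to repeat it in the message. */
#define FORMAT_FATAL_ERROR(buf, size, message) \
    format_fatal_error((buf), (size), __func__, (message))

/* Example caller; the message text is made up for illustration. */
static int tok_example(char *buf, size_t size)
{
    return FORMAT_FATAL_ERROR(buf, size, "unexpected state");
}
```

Calling `tok_example()` fills the buffer with `Fatal Python error: tok_example: unexpected state`.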