author | Ivan Levkivskyi <levkivskyi@gmail.com> | 2019-01-22 11:18:22 (GMT) |
---|---|---|
committer | GitHub <noreply@github.com> | 2019-01-22 11:18:22 (GMT) |
commit | 9932a22897ef9905161dac7476e6976370e13515 (patch) | |
tree | 5cfbec44c7ecb01f4817274280881a74ec15c605 /Doc | |
parent | 7a2368063f25746d4008a74aca0dc0b82f86ff7b (diff) | |
download | cpython-9932a22897ef9905161dac7476e6976370e13515.zip cpython-9932a22897ef9905161dac7476e6976370e13515.tar.gz cpython-9932a22897ef9905161dac7476e6976370e13515.tar.bz2 |
bpo-33416: Add end positions to Python AST (GH-11605)
The majority of this PR is tediously passing `end_lineno` and `end_col_offset` everywhere. Here are non-trivial points:
* It is not possible to reconstruct end positions in AST "on the fly" because some information is lost after an AST node is constructed, so we need two more attributes for every AST node: `end_lineno` and `end_col_offset`.
* I add end position information to both CST and AST. Although it may be technically possible to avoid adding end positions to CST, the code becomes more cumbersome and less efficient.
* Since the end position is not known for non-leaf CST nodes while the next token is added, this requires a bit of extra care (see `_PyNode_FinalizeEndPos`). Unless I made some mistake, the algorithm should be linear.
* For statements, I "trim" the end position of suites to not include the terminal newlines and dedent (this seems to be what people would expect), for example in
```python
class C:
    pass
pass
```
the end line and end column for the class definition are (2, 8).
* For `end_col_offset` I use the common Python convention for indexing, for example for `pass` the `end_col_offset` is 4 (not 3), so that `[0:4]` gives one the source code that corresponds to the node.
* I added a helper function `ast.get_source_segment()` to get the source text segment corresponding to a given AST node. It is also useful for testing.
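The position conventions listed above can be checked with a short script. This is only a minimal sketch, assuming Python 3.8 or later and ASCII-only source (`col_offset` values are UTF-8 byte offsets, so they coincide with string indices only for ASCII text); the variable names are illustrative and not part of the PR.

```python
import ast

# Same shape as the example above: a class followed by a top-level statement.
source = "class C:\n    pass\npass\n"
tree = ast.parse(source)
class_def, pass_stmt = tree.body

# The suite is trimmed: the class ends where its last statement ends,
# not at the trailing newline or dedent.
class_end = (class_def.end_lineno, class_def.end_col_offset)

# end_col_offset points one past the last character, so a plain slice
# recovers the text (valid here because the line is ASCII-only).
line = source.splitlines()[pass_stmt.lineno - 1]
pass_text = line[pass_stmt.col_offset:pass_stmt.end_col_offset]

# The helper does the same thing for nodes that span several lines.
class_text = ast.get_source_segment(source, class_def)

print(class_end)
print(repr(pass_text))
print(repr(class_text))
```

For non-ASCII source, `ast.get_source_segment()` is the safer choice, since slicing a `str` with byte offsets can give the wrong span.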
An (inevitable) downside of this PR is that AST now takes almost 25% more memory. I think however it is probably justified by the benefits.
Diffstat (limited to 'Doc')
-rw-r--r-- | Doc/library/ast.rst | 40 |
1 files changed, 31 insertions, 9 deletions
diff --git a/Doc/library/ast.rst b/Doc/library/ast.rst
index 2883f3c..7715a28 100644
--- a/Doc/library/ast.rst
+++ b/Doc/library/ast.rst
@@ -61,13 +61,21 @@ Node classes
 
 .. attribute:: lineno
                col_offset
+               end_lineno
+               end_col_offset
 
    Instances of :class:`ast.expr` and :class:`ast.stmt` subclasses have
-   :attr:`lineno` and :attr:`col_offset` attributes. The :attr:`lineno` is
-   the line number of source text (1-indexed so the first line is line 1) and
-   the :attr:`col_offset` is the UTF-8 byte offset of the first token that
-   generated the node. The UTF-8 offset is recorded because the parser uses
-   UTF-8 internally.
+   :attr:`lineno`, :attr:`col_offset`, :attr:`lineno`, and :attr:`col_offset`
+   attributes. The :attr:`lineno` and :attr:`end_lineno` are the first and
+   last line numbers of source text span (1-indexed so the first line is line 1)
+   and the :attr:`col_offset` and :attr:`end_col_offset` are the corresponding
+   UTF-8 byte offsets of the first and last tokens that generated the node.
+   The UTF-8 offset is recorded because the parser uses UTF-8 internally.
+
+   Note that the end positions are not required by the compiler and are
+   therefore optional. The end offset is *after* the last symbol, for example
+   one can get the source segment of a one-line expression node using
+   ``source_line[node.col_offset : node.end_col_offset]``.
 
 The constructor of a class :class:`ast.T` parses its arguments as follows:
 
@@ -162,6 +170,18 @@ and classes for traversing abstract syntax trees:
       :class:`AsyncFunctionDef` is now supported.
 
 
+.. function:: get_source_segment(source, node, *, padded=False)
+
+   Get source code segment of the *source* that generated *node*.
+   If some location information (:attr:`lineno`, :attr:`end_lineno`,
+   :attr:`col_offset`, or :attr:`end_col_offset`) is missing, return ``None``.
+
+   If *padded* is ``True``, the first line of a multi-line statement will
+   be padded with spaces to match its original position.
+
+   .. versionadded:: 3.8
+
+
 .. function:: fix_missing_locations(node)
 
    When you compile a node tree with :func:`compile`, the compiler expects
@@ -173,14 +193,16 @@ and classes for traversing abstract syntax trees:
 
 .. function:: increment_lineno(node, n=1)
 
-   Increment the line number of each node in the tree starting at *node* by *n*.
-   This is useful to "move code" to a different location in a file.
+   Increment the line number and end line number of each node in the tree
+   starting at *node* by *n*. This is useful to "move code" to a different
+   location in a file.
 
 
 .. function:: copy_location(new_node, old_node)
 
-   Copy source location (:attr:`lineno` and :attr:`col_offset`) from *old_node*
-   to *new_node* if possible, and return *new_node*.
+   Copy source location (:attr:`lineno`, :attr:`col_offset`, :attr:`end_lineno`,
+   and :attr:`end_col_offset`) from *old_node* to *new_node* if possible,
+   and return *new_node*.
 
 
 .. function:: iter_fields(node)
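
The documented behaviour of `get_source_segment(..., padded=True)`, `copy_location()`, and `increment_lineno()` with respect to end positions can be tried out as follows. This is a hedged sketch, assuming Python 3.8 or later and ASCII-only source; the sample source string and variable names are made up for illustration and do not come from the commit.

```python
import ast

# A multi-line statement (the `if`) nested inside a function body.
source = "def f():\n    if x:\n        y = 1\n"
tree = ast.parse(source)
if_node = tree.body[0].body[0]

# Without padding, the first line starts at the node's col_offset.
plain = ast.get_source_segment(source, if_node)
# With padded=True, the first line keeps its original indentation.
padded = ast.get_source_segment(source, if_node, padded=True)

# copy_location() carries over the end position together with the start.
new_node = ast.copy_location(ast.Pass(), if_node)
copied = (new_node.lineno, new_node.col_offset,
          new_node.end_lineno, new_node.end_col_offset)

# increment_lineno() shifts lineno and end_lineno of every node in the tree.
ast.increment_lineno(tree, n=10)
shifted = (if_node.lineno, if_node.end_lineno)

print(repr(plain))
print(repr(padded))
print(copied, shifted)
```

The padded form is mainly useful when the segment will be re-parsed or displayed as a block, because the continuation lines keep their original indentation either way.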