summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
-rw-r--r--Doc/library/parser.rst335
1 files changed, 2 insertions, 333 deletions
diff --git a/Doc/library/parser.rst b/Doc/library/parser.rst
index efac1a5..ad336d3 100644
--- a/Doc/library/parser.rst
+++ b/Doc/library/parser.rst
@@ -317,22 +317,8 @@ ST objects have the following methods:
Same as ``st2tuple(st, line_info, col_info)``.
-.. _st-examples:
-
-Examples
---------
-
-.. index:: builtin: compile
-
-The parser modules allows operations to be performed on the parse tree of Python
-source code before the :term:`bytecode` is generated, and provides for inspection of the
-parse tree for information gathering purposes. Two examples are presented. The
-simple example demonstrates emulation of the :func:`compile` built-in function
-and the complex example shows the use of a parse tree for information discovery.
-
-
-Emulation of :func:`compile`
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Example: Emulation of :func:`compile`
+-------------------------------------
While many useful operations may take place between parsing and bytecode
generation, the simplest operation is to do nothing. For this purpose, using
@@ -366,320 +352,3 @@ readily available functions::
def load_expression(source_string):
st = parser.expr(source_string)
return st, st.compile()
-
-
-Information Discovery
-^^^^^^^^^^^^^^^^^^^^^
-
-.. index::
- single: string; documentation
- single: docstrings
-
-Some applications benefit from direct access to the parse tree. The remainder
-of this section demonstrates how the parse tree provides access to module
-documentation defined in docstrings without requiring that the code being
-examined be loaded into a running interpreter via :keyword:`import`. This can
-be very useful for performing analyses of untrusted code.
-
-Generally, the example will demonstrate how the parse tree may be traversed to
-distill interesting information. Two functions and a set of classes are
-developed which provide programmatic access to high level function and class
-definitions provided by a module. The classes extract information from the
-parse tree and provide access to the information at a useful semantic level, one
-function provides a simple low-level pattern matching capability, and the other
-function defines a high-level interface to the classes by handling file
-operations on behalf of the caller. All source files mentioned here which are
-not part of the Python installation are located in the :file:`Demo/parser/`
-directory of the distribution.
-
-The dynamic nature of Python allows the programmer a great deal of flexibility,
-but most modules need only a limited measure of this when defining classes,
-functions, and methods. In this example, the only definitions that will be
-considered are those which are defined in the top level of their context, e.g.,
-a function defined by a :keyword:`def` statement at column zero of a module, but
-not a function defined within a branch of an :keyword:`if` ... :keyword:`else`
-construct, though there are some good reasons for doing so in some situations.
-Nesting of definitions will be handled by the code developed in the example.
-
-To construct the upper-level extraction methods, we need to know what the parse
-tree structure looks like and how much of it we actually need to be concerned
-about. Python uses a moderately deep parse tree so there are a large number of
-intermediate nodes. It is important to read and understand the formal grammar
-used by Python. This is specified in the file :file:`Grammar/Grammar` in the
-distribution. Consider the simplest case of interest when searching for
-docstrings: a module consisting of a docstring and nothing else. (See file
-:file:`docstring.py`.) ::
-
- """Some documentation.
- """
-
-Using the interpreter to take a look at the parse tree, we find a bewildering
-mass of numbers and parentheses, with the documentation buried deep in nested
-tuples. ::
-
- >>> import parser
- >>> import pprint
- >>> st = parser.suite(open('docstring.py').read())
- >>> tup = st.totuple()
- >>> pprint.pprint(tup)
- (257,
- (264,
- (265,
- (266,
- (267,
- (307,
- (287,
- (288,
- (289,
- (290,
- (292,
- (293,
- (294,
- (295,
- (296,
- (297,
- (298,
- (299,
- (300, (3, '"""Some documentation.\n"""'))))))))))))))))),
- (4, ''))),
- (4, ''),
- (0, ''))
-
-The numbers at the first element of each node in the tree are the node types;
-they map directly to terminal and non-terminal symbols in the grammar.
-Unfortunately, they are represented as integers in the internal representation,
-and the Python structures generated do not change that. However, the
-:mod:`symbol` and :mod:`token` modules provide symbolic names for the node types
-and dictionaries which map from the integers to the symbolic names for the node
-types.
-
-In the output presented above, the outermost tuple contains four elements: the
-integer ``257`` and three additional tuples. Node type ``257`` has the symbolic
-name :const:`file_input`. Each of these inner tuples contains an integer as the
-first element; these integers, ``264``, ``4``, and ``0``, represent the node
-types :const:`stmt`, :const:`NEWLINE`, and :const:`ENDMARKER`, respectively.
-Note that these values may change depending on the version of Python you are
-using; consult :file:`symbol.py` and :file:`token.py` for details of the
-mapping. It should be fairly clear that the outermost node is related primarily
-to the input source rather than the contents of the file, and may be disregarded
-for the moment. The :const:`stmt` node is much more interesting. In
-particular, all docstrings are found in subtrees which are formed exactly as
-this node is formed, with the only difference being the string itself. The
-association between the docstring in a similar tree and the defined entity
-(class, function, or module) which it describes is given by the position of the
-docstring subtree within the tree defining the described structure.
-
-By replacing the actual docstring with something to signify a variable component
-of the tree, we allow a simple pattern matching approach to check any given
-subtree for equivalence to the general pattern for docstrings. Since the
-example demonstrates information extraction, we can safely require that the tree
-be in tuple form rather than list form, allowing a simple variable
-representation to be ``['variable_name']``. A simple recursive function can
-implement the pattern matching, returning a Boolean and a dictionary of variable
-name to value mappings. (See file :file:`example.py`.) ::
-
- def match(pattern, data, vars=None):
- if vars is None:
- vars = {}
- if isinstance(pattern, list):
- vars[pattern[0]] = data
- return True, vars
- if not instance(pattern, tuple):
- return (pattern == data), vars
- if len(data) != len(pattern):
- return False, vars
- for pattern, data in zip(pattern, data):
- same, vars = match(pattern, data, vars)
- if not same:
- break
- return same, vars
-
-Using this simple representation for syntactic variables and the symbolic node
-types, the pattern for the candidate docstring subtrees becomes fairly readable.
-(See file :file:`example.py`.) ::
-
- import symbol
- import token
-
- DOCSTRING_STMT_PATTERN = (
- symbol.stmt,
- (symbol.simple_stmt,
- (symbol.small_stmt,
- (symbol.expr_stmt,
- (symbol.testlist,
- (symbol.test,
- (symbol.and_test,
- (symbol.not_test,
- (symbol.comparison,
- (symbol.expr,
- (symbol.xor_expr,
- (symbol.and_expr,
- (symbol.shift_expr,
- (symbol.arith_expr,
- (symbol.term,
- (symbol.factor,
- (symbol.power,
- (symbol.atom,
- (token.STRING, ['docstring'])
- )))))))))))))))),
- (token.NEWLINE, '')
- ))
-
-Using the :func:`match` function with this pattern, extracting the module
-docstring from the parse tree created previously is easy::
-
- >>> found, vars = match(DOCSTRING_STMT_PATTERN, tup[1])
- >>> found
- True
- >>> vars
- {'docstring': '"""Some documentation.\n"""'}
-
-Once specific data can be extracted from a location where it is expected, the
-question of where information can be expected needs to be answered. When
-dealing with docstrings, the answer is fairly simple: the docstring is the first
-:const:`stmt` node in a code block (:const:`file_input` or :const:`suite` node
-types). A module consists of a single :const:`file_input` node, and class and
-function definitions each contain exactly one :const:`suite` node. Classes and
-functions are readily identified as subtrees of code block nodes which start
-with ``(stmt, (compound_stmt, (classdef, ...`` or ``(stmt, (compound_stmt,
-(funcdef, ...``. Note that these subtrees cannot be matched by :func:`match`
-since it does not support multiple sibling nodes to match without regard to
-number. A more elaborate matching function could be used to overcome this
-limitation, but this is sufficient for the example.
-
-Given the ability to determine whether a statement might be a docstring and
-extract the actual string from the statement, some work needs to be performed to
-walk the parse tree for an entire module and extract information about the names
-defined in each context of the module and associate any docstrings with the
-names. The code to perform this work is not complicated, but bears some
-explanation.
-
-The public interface to the classes is straightforward and should probably be
-somewhat more flexible. Each "major" block of the module is described by an
-object providing several methods for inquiry and a constructor which accepts at
-least the subtree of the complete parse tree which it represents. The
-:class:`ModuleInfo` constructor accepts an optional *name* parameter since it
-cannot otherwise determine the name of the module.
-
-The public classes include :class:`ClassInfo`, :class:`FunctionInfo`, and
-:class:`ModuleInfo`. All objects provide the methods :meth:`get_name`,
-:meth:`get_docstring`, :meth:`get_class_names`, and :meth:`get_class_info`. The
-:class:`ClassInfo` objects support :meth:`get_method_names` and
-:meth:`get_method_info` while the other classes provide
-:meth:`get_function_names` and :meth:`get_function_info`.
-
-Within each of the forms of code block that the public classes represent, most
-of the required information is in the same form and is accessed in the same way,
-with classes having the distinction that functions defined at the top level are
-referred to as "methods." Since the difference in nomenclature reflects a real
-semantic distinction from functions defined outside of a class, the
-implementation needs to maintain the distinction. Hence, most of the
-functionality of the public classes can be implemented in a common base class,
-:class:`SuiteInfoBase`, with the accessors for function and method information
-provided elsewhere. Note that there is only one class which represents function
-and method information; this parallels the use of the :keyword:`def` statement
-to define both types of elements.
-
-Most of the accessor functions are declared in :class:`SuiteInfoBase` and do not
-need to be overridden by subclasses. More importantly, the extraction of most
-information from a parse tree is handled through a method called by the
-:class:`SuiteInfoBase` constructor. The example code for most of the classes is
-clear when read alongside the formal grammar, but the method which recursively
-creates new information objects requires further examination. Here is the
-relevant part of the :class:`SuiteInfoBase` definition from :file:`example.py`::
-
- class SuiteInfoBase:
- _docstring = ''
- _name = ''
-
- def __init__(self, tree = None):
- self._class_info = {}
- self._function_info = {}
- if tree:
- self._extract_info(tree)
-
- def _extract_info(self, tree):
- # extract docstring
- if len(tree) == 2:
- found, vars = match(DOCSTRING_STMT_PATTERN[1], tree[1])
- else:
- found, vars = match(DOCSTRING_STMT_PATTERN, tree[3])
- if found:
- self._docstring = eval(vars['docstring'])
- # discover inner definitions
- for node in tree[1:]:
- found, vars = match(COMPOUND_STMT_PATTERN, node)
- if found:
- cstmt = vars['compound']
- if cstmt[0] == symbol.funcdef:
- name = cstmt[2][1]
- self._function_info[name] = FunctionInfo(cstmt)
- elif cstmt[0] == symbol.classdef:
- name = cstmt[2][1]
- self._class_info[name] = ClassInfo(cstmt)
-
-After initializing some internal state, the constructor calls the
-:meth:`_extract_info` method. This method performs the bulk of the information
-extraction which takes place in the entire example. The extraction has two
-distinct phases: the location of the docstring for the parse tree passed in, and
-the discovery of additional definitions within the code block represented by the
-parse tree.
-
-The initial :keyword:`if` test determines whether the nested suite is of the
-"short form" or the "long form." The short form is used when the code block is
-on the same line as the definition of the code block, as in ::
-
- def square(x): "Square an argument."; return x ** 2
-
-while the long form uses an indented block and allows nested definitions::
-
- def make_power(exp):
- "Make a function that raises an argument to the exponent `exp`."
- def raiser(x, y=exp):
- return x ** y
- return raiser
-
-When the short form is used, the code block may contain a docstring as the
-first, and possibly only, :const:`small_stmt` element. The extraction of such a
-docstring is slightly different and requires only a portion of the complete
-pattern used in the more common case. As implemented, the docstring will only
-be found if there is only one :const:`small_stmt` node in the
-:const:`simple_stmt` node. Since most functions and methods which use the short
-form do not provide a docstring, this may be considered sufficient. The
-extraction of the docstring proceeds using the :func:`match` function as
-described above, and the value of the docstring is stored as an attribute of the
-:class:`SuiteInfoBase` object.
-
-After docstring extraction, a simple definition discovery algorithm operates on
-the :const:`stmt` nodes of the :const:`suite` node. The special case of the
-short form is not tested; since there are no :const:`stmt` nodes in the short
-form, the algorithm will silently skip the single :const:`simple_stmt` node and
-correctly not discover any nested definitions.
-
-Each statement in the code block is categorized as a class definition, function
-or method definition, or something else. For the definition statements, the
-name of the element defined is extracted and a representation object appropriate
-to the definition is created with the defining subtree passed as an argument to
-the constructor. The representation objects are stored in instance variables
-and may be retrieved by name using the appropriate accessor methods.
-
-The public classes provide any accessors required which are more specific than
-those provided by the :class:`SuiteInfoBase` class, but the real extraction
-algorithm remains common to all forms of code blocks. A high-level function can
-be used to extract the complete set of information from a source file. (See
-file :file:`example.py`.) ::
-
- def get_docs(fileName):
- import os
- import parser
-
- source = open(fileName).read()
- basename = os.path.basename(os.path.splitext(fileName)[0])
- st = parser.suite(source)
- return ModuleInfo(st.totuple(), basename)
-
-This provides an easy-to-use interface to the documentation of a module. If
-information is required which is not extracted by the code of this example, the
-code may be extended at clearly defined points to provide additional
-capabilities.
-