author | Guido van Rossum <guido@python.org> | 1995-10-11 17:30:04 (GMT) |
---|---|---|
committer | Guido van Rossum <guido@python.org> | 1995-10-11 17:30:04 (GMT) |
commit | 4b73a06e922936c9368957d17023ba5ef537c7e0 (patch) | |
tree | a6f19838f83eeb6e4392d186f9048f77c88de827 /Doc | |
parent | c1822a4dd1b5276211be041c7ac216c549c787a4 (diff) | |
Fred Drake's parser module
Diffstat (limited to 'Doc')
-rw-r--r-- | Doc/lib/libparser.tex | 250 |
-rw-r--r-- | Doc/libparser.tex | 250 |
2 files changed, 500 insertions, 0 deletions
diff --git a/Doc/lib/libparser.tex b/Doc/lib/libparser.tex new file mode 100644 index 0000000..1f5d4fd --- /dev/null +++ b/Doc/lib/libparser.tex @@ -0,0 +1,250 @@ +% libparser.tex +% +% Introductory documentation for the new parser built-in module. +% +% Copyright 1995 Virginia Polytechnic Institute and State University +% and Fred L. Drake, Jr. This copyright notice must be distributed on +% all copies, but this document otherwise may be distributed as part +% of the Python distribution. No fee may be charged for this document +% in any representation, either on paper or electronically. This +% restriction does not affect other elements in a distributed package +% in any way. +% + +\section{Built-in Module \sectcode{parser}} +\bimodindex{parser} + + +% ==== 2. ==== +% Give a short overview of what the module does. +% If it is platform specific, mention this. +% Mention other important restrictions or general operating principles. + +The \code{parser} module provides an interface to Python's internal +parser and byte-code compiler. The primary purpose for this interface +is to allow Python code to edit the parse tree of a Python expression +and create executable code from this. This can be better than trying +to parse and modify an arbitrary Python code fragment as a string, and +ensures that parsing is performed in a manner identical to the code +forming the application. It's also faster. + +There are a few things to note about this module which are important +to making use of the data structures created. This is not a tutorial +on editing the parse trees for Python code. + +Most importantly, a good understanding of the Python grammar processed +by the internal parser is required. For full information on the +language syntax, refer to the Language Reference. The parser itself +is created from a grammar specification defined in the file +\code{Grammar/Grammar} in the standard Python distribution. 
The parse +trees stored in the ``AST objects'' created by this module are the +actual output from the internal parser when created by the +\code{expr()} or \code{suite()} functions, described below. The AST +objects created by \code{tuple2ast()} faithfully simulate those +structures. + +Each element of the tuples returned by \code{ast2tuple()} has a simple +form. Tuples representing non-terminal elements in the grammar always +have a length greater than one. The first element is an integer which +identifies a production in the grammar. These integers are given +symbolic names in the C header file \code{Include/graminit.h} and the +Python module \code{Lib/symbol.py}. Each additional element of the +tuple represents a component of the production as recognized in the +input string: these are always tuples which have the same form as the +parent. An important aspect of this structure which should be noted +is that keywords used to identify the parent node type, such as the +keyword \code{if} in an \emph{if\_stmt}, are included in the node tree +without any special treatment. For example, the \code{if} keyword is +represented by the tuple \code{(1, 'if')}, where \code{1} is the +numeric value associated with all \code{NAME} elements, including +variable and function names defined by the user. + +Terminal elements are represented in much the same way, but without +any child elements and the addition of the source text which was +identified. The example of the \code{if} keyword above is +representative. The various types of terminal symbols are defined in +the C header file \code{Include/token.h} and the Python module +\code{Lib/token.py}. 
+ +The AST objects are not actually required to support the functionality +of this module, but are provided for three purposes: to allow an +application to amortize the cost of processing complex parse trees, to +provide a parse tree representation which conserves memory space when +compared to the Python tuple representation, and to ease the creation +of additional modules in C which manipulate parse trees. A simple +``wrapper'' module may be created in Python if desired to hide the use +of AST objects. + + +% ==== 3. ==== +% List the public functions defined by the module. Begin with a +% standard phrase. You may also list the exceptions and other data +% items defined in the module, insofar as they are important for the +% user. + +The \code{parser} module defines the following functions: + +% ---- 3.1. ---- +% Redefine the ``indexsubitem'' macro to point to this module +% (alternatively, you can put this at the top of the file): + +\renewcommand{\indexsubitem}{(in module parser)} + +% ---- 3.2. ---- +% For each function, use a ``funcdesc'' block. This has exactly two +% parameters (each parameters is contained in a set of curly braces): +% the first parameter is the function name (this automatically +% generates an index entry); the second parameter is the function's +% argument list. If there are no arguments, use an empty pair of +% curly braces. If there is more than one argument, separate the +% arguments with backslash-comma. Optional parts of the parameter +% list are contained in \optional{...} (this generates a set of square +% brackets around its parameter). Arguments are automatically set in +% italics in the parameter list. Each argument should be mentioned at +% least once in the description; each usage (even inside \code{...}) +% should be enclosed in \var{...}. + +\begin{funcdesc}{ast2tuple}{ast} +This function accepts an AST object from the caller in +\code{\var{ast}} and returns a Python tuple representing the +equivelent parse tree. 
The resulting tuple representation can be used +for inspection or the creation of a new parse tree in tuple form. +This function does not fail so long as memory is available to build +the tuple representation. +\end{funcdesc} + + +\begin{funcdesc}{compileast}{ast\optional{\, filename \code{= '<ast>'}}} +The Python byte compiler can be invoked on an AST object to produce +code objects which can be used as part of an \code{exec} statement or +a call to the built-in \code{eval()} function. This function provides +the interface to the compiler, passing the internal parse tree from +\code{\var{ast}} to the parser, using the source file name specified +by the \code{\var{filename}} parameter. The default value supplied +for \code{\var{filename}} indicates that the source was an AST object. +\end{funcdesc} + + +\begin{funcdesc}{expr}{string} +The \code{expr()} function parses the parameter \code{\var{string}} +as if it were an input to \code{compile(\var{string}, 'eval')}. If +the parse succeeds, an AST object is created to hold the internal +parse tree representation, otherwise an appropriate exception is +thrown. +\end{funcdesc} + + +\begin{funcdesc}{isexpr}{ast} +When \code{\var{ast}} represents an \code{'eval'} form, this function +returns a true value (\code{1}), otherwise it returns false +(\code{0}). This is useful, since code objects normally cannot be +queried for this information using existing built-in functions. Note +that the code objects created by \code{compileast()} cannot be queried +like this either, and are identical to those created by the built-in +\code{compile()} function. +\end{funcdesc} + + +\begin{funcdesc}{issuite}{ast} +This function mirrors \code{isexpr()} in that it reports whether an +AST object represents a suite of statements. It is not safe to assume +that this function is equivelent to \code{not isexpr(\var{ast})}, as +additional syntactic fragments may be supported in the future. 
+\end{funcdesc} + + +\begin{funcdesc}{suite}{string} +The \code{suite()} function parses the parameter \code{\var{string}} +as if it were an input to \code{compile(\var{string}, 'exec')}. If +the parse succeeds, an AST object is created to hold the internal +parse tree representation, otherwise an appropriate exception is +thrown. +\end{funcdesc} + + +\begin{funcdesc}{tuple2ast}{tuple} +This function accepts a parse tree represented as a tuple and builds +an internal representation if possible. If it can validate that the +tree conforms to the Python syntax and all nodes are valid node types +in the host version of Python, an AST object is created from the +internal representation and returned to the called. If there is a +problem creating the internal representation, or if the tree cannot be +validated, a \code{ParserError} exception is thrown. An AST object +created this way should not be assumed to compile correctly; normal +exceptions thrown by compilation may still be initiated when the AST +object is passed to \code{compileast()}. This will normally indicate +problems not related to syntax (such as a \code{MemoryError} +exception). +\end{funcdesc} + + +% --- 3.4. --- +% Exceptions are described using a ``excdesc'' block. This has only +% one parameter: the exception name. + +\subsection{Exceptions and Error Handling} + +The parser module defines a single exception, but may also pass other +built-in exceptions from other portions of the Python runtime +environment. See each function for information about the exceptions +it can raise. + +\begin{excdesc}{ParserError} +Exception raised when a failure occurs within the parser module. This +is generally produced for validation failures rather than the built in +\code{SyntaxError} thrown during normal parsing. +The exception argument is either a string describing the reason of the +failure or a tuple containing a tuple causing the failure from a parse +tree passed to \code{tuple2ast()} and an explanatory string. 
Calls to +\code{tuple2ast()} need to be able to handle either type of exception, +while calls to other functions in the module will only need to be +aware of the simple string values. +\end{excdesc} + +Note that the functions \code{compileast()}, \code{expr()}, and +\code{suite()} may throw exceptions which are normally thrown by the +parsing and compilation process. These include the built in +exceptions \code{MemoryError}, \code{OverflowError}, +\code{SyntaxError}, and \code{SystemError}. In these cases, these +exceptions carry all the meaning normally associated with them. Refer +to the descriptions of each function for detailed information. + +% ---- 3.5. ---- +% There is no standard block type for classes. I generally use +% ``funcdesc'' blocks, since class instantiation looks very much like +% a function call. + + +% ==== 4. ==== +% Now is probably a good time for a complete example. (Alternatively, +% an example giving the flavor of the module may be given before the +% detailed list of functions.) + +\subsection{Example} + +A simple example: + +\begin{verbatim} +>>> import parser +>>> ast = parser.expr('a + 5') +>>> code = parser.compileast(ast) +>>> a = 5 +>>> eval(code) +10 +\end{verbatim} + + +\subsection{AST Objects} + +AST objects (returned by \code{expr()}, \code{suite()}, and +\code{tuple2ast()}, described above) have no methods of their own. +Some of the functions defined which accept an AST object as their +first argument may change to object methods in the future. + +Ordered and equality comparisons are supported between AST objects. + +\renewcommand{\indexsubitem}{(ast method)} + +%\begin{funcdesc}{empty}{} +%Empty the can into the trash. +%\end{funcdesc} diff --git a/Doc/libparser.tex b/Doc/libparser.tex new file mode 100644 index 0000000..1f5d4fd --- /dev/null +++ b/Doc/libparser.tex @@ -0,0 +1,250 @@ +% libparser.tex +% +% Introductory documentation for the new parser built-in module. 
+% +% Copyright 1995 Virginia Polytechnic Institute and State University +% and Fred L. Drake, Jr. This copyright notice must be distributed on +% all copies, but this document otherwise may be distributed as part +% of the Python distribution. No fee may be charged for this document +% in any representation, either on paper or electronically. This +% restriction does not affect other elements in a distributed package +% in any way. +% + +\section{Built-in Module \sectcode{parser}} +\bimodindex{parser} + + +% ==== 2. ==== +% Give a short overview of what the module does. +% If it is platform specific, mention this. +% Mention other important restrictions or general operating principles. + +The \code{parser} module provides an interface to Python's internal +parser and byte-code compiler. The primary purpose for this interface +is to allow Python code to edit the parse tree of a Python expression +and create executable code from this. This can be better than trying +to parse and modify an arbitrary Python code fragment as a string, and +ensures that parsing is performed in a manner identical to the code +forming the application. It's also faster. + +There are a few things to note about this module which are important +to making use of the data structures created. This is not a tutorial +on editing the parse trees for Python code. + +Most importantly, a good understanding of the Python grammar processed +by the internal parser is required. For full information on the +language syntax, refer to the Language Reference. The parser itself +is created from a grammar specification defined in the file +\code{Grammar/Grammar} in the standard Python distribution. The parse +trees stored in the ``AST objects'' created by this module are the +actual output from the internal parser when created by the +\code{expr()} or \code{suite()} functions, described below. The AST +objects created by \code{tuple2ast()} faithfully simulate those +structures. 
+ +Each element of the tuples returned by \code{ast2tuple()} has a simple +form. Tuples representing non-terminal elements in the grammar always +have a length greater than one. The first element is an integer which +identifies a production in the grammar. These integers are given +symbolic names in the C header file \code{Include/graminit.h} and the +Python module \code{Lib/symbol.py}. Each additional element of the +tuple represents a component of the production as recognized in the +input string: these are always tuples which have the same form as the +parent. An important aspect of this structure which should be noted +is that keywords used to identify the parent node type, such as the +keyword \code{if} in an \emph{if\_stmt}, are included in the node tree +without any special treatment. For example, the \code{if} keyword is +represented by the tuple \code{(1, 'if')}, where \code{1} is the +numeric value associated with all \code{NAME} elements, including +variable and function names defined by the user. + +Terminal elements are represented in much the same way, but without +any child elements and the addition of the source text which was +identified. The example of the \code{if} keyword above is +representative. The various types of terminal symbols are defined in +the C header file \code{Include/token.h} and the Python module +\code{Lib/token.py}. + +The AST objects are not actually required to support the functionality +of this module, but are provided for three purposes: to allow an +application to amortize the cost of processing complex parse trees, to +provide a parse tree representation which conserves memory space when +compared to the Python tuple representation, and to ease the creation +of additional modules in C which manipulate parse trees. A simple +``wrapper'' module may be created in Python if desired to hide the use +of AST objects. + + +% ==== 3. ==== +% List the public functions defined by the module. Begin with a +% standard phrase. 
You may also list the exceptions and other data +% items defined in the module, insofar as they are important for the +% user. + +The \code{parser} module defines the following functions: + +% ---- 3.1. ---- +% Redefine the ``indexsubitem'' macro to point to this module +% (alternatively, you can put this at the top of the file): + +\renewcommand{\indexsubitem}{(in module parser)} + +% ---- 3.2. ---- +% For each function, use a ``funcdesc'' block. This has exactly two +% parameters (each parameters is contained in a set of curly braces): +% the first parameter is the function name (this automatically +% generates an index entry); the second parameter is the function's +% argument list. If there are no arguments, use an empty pair of +% curly braces. If there is more than one argument, separate the +% arguments with backslash-comma. Optional parts of the parameter +% list are contained in \optional{...} (this generates a set of square +% brackets around its parameter). Arguments are automatically set in +% italics in the parameter list. Each argument should be mentioned at +% least once in the description; each usage (even inside \code{...}) +% should be enclosed in \var{...}. + +\begin{funcdesc}{ast2tuple}{ast} +This function accepts an AST object from the caller in +\code{\var{ast}} and returns a Python tuple representing the +equivelent parse tree. The resulting tuple representation can be used +for inspection or the creation of a new parse tree in tuple form. +This function does not fail so long as memory is available to build +the tuple representation. +\end{funcdesc} + + +\begin{funcdesc}{compileast}{ast\optional{\, filename \code{= '<ast>'}}} +The Python byte compiler can be invoked on an AST object to produce +code objects which can be used as part of an \code{exec} statement or +a call to the built-in \code{eval()} function. 
This function provides +the interface to the compiler, passing the internal parse tree from +\code{\var{ast}} to the parser, using the source file name specified +by the \code{\var{filename}} parameter. The default value supplied +for \code{\var{filename}} indicates that the source was an AST object. +\end{funcdesc} + + +\begin{funcdesc}{expr}{string} +The \code{expr()} function parses the parameter \code{\var{string}} +as if it were an input to \code{compile(\var{string}, 'eval')}. If +the parse succeeds, an AST object is created to hold the internal +parse tree representation, otherwise an appropriate exception is +thrown. +\end{funcdesc} + + +\begin{funcdesc}{isexpr}{ast} +When \code{\var{ast}} represents an \code{'eval'} form, this function +returns a true value (\code{1}), otherwise it returns false +(\code{0}). This is useful, since code objects normally cannot be +queried for this information using existing built-in functions. Note +that the code objects created by \code{compileast()} cannot be queried +like this either, and are identical to those created by the built-in +\code{compile()} function. +\end{funcdesc} + + +\begin{funcdesc}{issuite}{ast} +This function mirrors \code{isexpr()} in that it reports whether an +AST object represents a suite of statements. It is not safe to assume +that this function is equivelent to \code{not isexpr(\var{ast})}, as +additional syntactic fragments may be supported in the future. +\end{funcdesc} + + +\begin{funcdesc}{suite}{string} +The \code{suite()} function parses the parameter \code{\var{string}} +as if it were an input to \code{compile(\var{string}, 'exec')}. If +the parse succeeds, an AST object is created to hold the internal +parse tree representation, otherwise an appropriate exception is +thrown. +\end{funcdesc} + + +\begin{funcdesc}{tuple2ast}{tuple} +This function accepts a parse tree represented as a tuple and builds +an internal representation if possible. 
If it can validate that the +tree conforms to the Python syntax and all nodes are valid node types +in the host version of Python, an AST object is created from the +internal representation and returned to the called. If there is a +problem creating the internal representation, or if the tree cannot be +validated, a \code{ParserError} exception is thrown. An AST object +created this way should not be assumed to compile correctly; normal +exceptions thrown by compilation may still be initiated when the AST +object is passed to \code{compileast()}. This will normally indicate +problems not related to syntax (such as a \code{MemoryError} +exception). +\end{funcdesc} + + +% --- 3.4. --- +% Exceptions are described using a ``excdesc'' block. This has only +% one parameter: the exception name. + +\subsection{Exceptions and Error Handling} + +The parser module defines a single exception, but may also pass other +built-in exceptions from other portions of the Python runtime +environment. See each function for information about the exceptions +it can raise. + +\begin{excdesc}{ParserError} +Exception raised when a failure occurs within the parser module. This +is generally produced for validation failures rather than the built in +\code{SyntaxError} thrown during normal parsing. +The exception argument is either a string describing the reason of the +failure or a tuple containing a tuple causing the failure from a parse +tree passed to \code{tuple2ast()} and an explanatory string. Calls to +\code{tuple2ast()} need to be able to handle either type of exception, +while calls to other functions in the module will only need to be +aware of the simple string values. +\end{excdesc} + +Note that the functions \code{compileast()}, \code{expr()}, and +\code{suite()} may throw exceptions which are normally thrown by the +parsing and compilation process. These include the built in +exceptions \code{MemoryError}, \code{OverflowError}, +\code{SyntaxError}, and \code{SystemError}. 
In these cases, these +exceptions carry all the meaning normally associated with them. Refer +to the descriptions of each function for detailed information. + +% ---- 3.5. ---- +% There is no standard block type for classes. I generally use +% ``funcdesc'' blocks, since class instantiation looks very much like +% a function call. + + +% ==== 4. ==== +% Now is probably a good time for a complete example. (Alternatively, +% an example giving the flavor of the module may be given before the +% detailed list of functions.) + +\subsection{Example} + +A simple example: + +\begin{verbatim} +>>> import parser +>>> ast = parser.expr('a + 5') +>>> code = parser.compileast(ast) +>>> a = 5 +>>> eval(code) +10 +\end{verbatim} + + +\subsection{AST Objects} + +AST objects (returned by \code{expr()}, \code{suite()}, and +\code{tuple2ast()}, described above) have no methods of their own. +Some of the functions defined which accept an AST object as their +first argument may change to object methods in the future. + +Ordered and equality comparisons are supported between AST objects. + +\renewcommand{\indexsubitem}{(ast method)} + +%\begin{funcdesc}{empty}{} +%Empty the can into the trash. +%\end{funcdesc} |
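
The tuple layout described in the added documentation (non-terminals as `(production_number, child, ...)`, terminals as `(token_number, source_text)`) can be illustrated with a small sketch. It does not call the `parser` module, which is not available in current Python releases; it walks a hand-built tuple instead. The non-terminal numbers `EXPR_STUB` and `ARITH_STUB`, the helper names `is_terminal` and `terminals`, and the simplified tree shape are assumptions made for illustration only. Only the token numbers come from the standard `token` module.

```python
import token

# Hypothetical non-terminal numbers, chosen only for illustration; the real
# values are defined in Include/graminit.h and differ between versions.
EXPR_STUB = 300
ARITH_STUB = 301

# A hand-built tuple in the shape the documentation describes for 'a + 5'.
# This is heavily simplified: a real parse tree has many more nesting levels.
tree = (EXPR_STUB,
        (ARITH_STUB,
         (token.NAME, 'a'),
         (token.PLUS, '+'),
         (token.NUMBER, '5')))


def is_terminal(node):
    """A terminal is (token_number, source_text); other nodes hold child tuples."""
    return len(node) == 2 and isinstance(node[1], str)


def terminals(node):
    """Return (token_name, text) pairs for every terminal, left to right."""
    if is_terminal(node):
        return [(token.tok_name[node[0]], node[1])]
    found = []
    for child in node[1:]:
        found.extend(terminals(child))
    return found


leaves = terminals(tree)
print(leaves)  # [('NAME', 'a'), ('PLUS', '+'), ('NUMBER', '5')]
```

Because keywords receive no special treatment, `(token.NAME, 'if')` is the same value as the `(1, 'if')` tuple mentioned in the documentation, and `is_terminal` treats it like any other name.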