diff options
Diffstat (limited to 'Doc/lib/compiler.tex')
-rw-r--r-- | Doc/lib/compiler.tex | 353 |
1 files changed, 0 insertions, 353 deletions
diff --git a/Doc/lib/compiler.tex b/Doc/lib/compiler.tex deleted file mode 100644 index d4f4124..0000000 --- a/Doc/lib/compiler.tex +++ /dev/null @@ -1,353 +0,0 @@ -\chapter{Python compiler package \label{compiler}} - -\sectionauthor{Jeremy Hylton}{jeremy@zope.com} - - -The Python compiler package is a tool for analyzing Python source code -and generating Python bytecode. The compiler contains libraries to -generate an abstract syntax tree from Python source code and to -generate Python bytecode from the tree. - -The \refmodule{compiler} package is a Python source to bytecode -translator written in Python. It uses the built-in parser and -standard \refmodule{parser} module to generated a concrete syntax -tree. This tree is used to generate an abstract syntax tree (AST) and -then Python bytecode. - -The full functionality of the package duplicates the builtin compiler -provided with the Python interpreter. It is intended to match its -behavior almost exactly. Why implement another compiler that does the -same thing? The package is useful for a variety of purposes. It can -be modified more easily than the builtin compiler. The AST it -generates is useful for analyzing Python source code. - -This chapter explains how the various components of the -\refmodule{compiler} package work. It blends reference material with -a tutorial. - -The following modules are part of the \refmodule{compiler} package: - -\localmoduletable - - -\section{The basic interface} - -\declaremodule{}{compiler} - -The top-level of the package defines four functions. If you import -\module{compiler}, you will get these functions and a collection of -modules contained in the package. - -\begin{funcdesc}{parse}{buf} -Returns an abstract syntax tree for the Python source code in \var{buf}. -The function raises \exception{SyntaxError} if there is an error in the -source code. The return value is a \class{compiler.ast.Module} instance -that contains the tree. -\end{funcdesc} - -\begin{funcdesc}{parseFile}{path} -Return an abstract syntax tree for the Python source code in the file -specified by \var{path}. It is equivalent to -\code{parse(open(\var{path}).read())}. -\end{funcdesc} - -\begin{funcdesc}{walk}{ast, visitor\optional{, verbose}} -Do a pre-order walk over the abstract syntax tree \var{ast}. Call the -appropriate method on the \var{visitor} instance for each node -encountered. -\end{funcdesc} - -\begin{funcdesc}{compile}{source, filename, mode, flags=None, - dont_inherit=None} -Compile the string \var{source}, a Python module, statement or -expression, into a code object that can be executed by the exec -statement or \function{eval()}. This function is a replacement for the -built-in \function{compile()} function. - -The \var{filename} will be used for run-time error messages. - -The \var{mode} must be 'exec' to compile a module, 'single' to compile a -single (interactive) statement, or 'eval' to compile an expression. - -The \var{flags} and \var{dont_inherit} arguments affect future-related -statements, but are not supported yet. -\end{funcdesc} - -\begin{funcdesc}{compileFile}{source} -Compiles the file \var{source} and generates a .pyc file. -\end{funcdesc} - -The \module{compiler} package contains the following modules: -\refmodule[compiler.ast]{ast}, \module{consts}, \module{future}, -\module{misc}, \module{pyassem}, \module{pycodegen}, \module{symbols}, -\module{transformer}, and \refmodule[compiler.visitor]{visitor}. - -\section{Limitations} - -There are some problems with the error checking of the compiler -package. The interpreter detects syntax errors in two distinct -phases. One set of errors is detected by the interpreter's parser, -the other set by the compiler. The compiler package relies on the -interpreter's parser, so it get the first phases of error checking for -free. It implements the second phase itself, and that implementation is -incomplete. For example, the compiler package does not raise an error -if a name appears more than once in an argument list: -\code{def f(x, x): ...} - -A future version of the compiler should fix these problems. - -\section{Python Abstract Syntax} - -The \module{compiler.ast} module defines an abstract syntax for -Python. In the abstract syntax tree, each node represents a syntactic -construct. The root of the tree is \class{Module} object. - -The abstract syntax offers a higher level interface to parsed Python -source code. The \refmodule{parser} -module and the compiler written in C for the Python interpreter use a -concrete syntax tree. The concrete syntax is tied closely to the -grammar description used for the Python parser. Instead of a single -node for a construct, there are often several levels of nested nodes -that are introduced by Python's precedence rules. - -The abstract syntax tree is created by the -\module{compiler.transformer} module. The transformer relies on the -builtin Python parser to generate a concrete syntax tree. It -generates an abstract syntax tree from the concrete tree. - -The \module{transformer} module was created by Greg -Stein\index{Stein, Greg} and Bill Tutt\index{Tutt, Bill} for an -experimental Python-to-C compiler. The current version contains a -number of modifications and improvements, but the basic form of the -abstract syntax and of the transformer are due to Stein and Tutt. - -\subsection{AST Nodes} - -\declaremodule{}{compiler.ast} - -The \module{compiler.ast} module is generated from a text file that -describes each node type and its elements. Each node type is -represented as a class that inherits from the abstract base class -\class{compiler.ast.Node} and defines a set of named attributes for -child nodes. - -\begin{classdesc}{Node}{} - - The \class{Node} instances are created automatically by the parser - generator. The recommended interface for specific \class{Node} - instances is to use the public attributes to access child nodes. A - public attribute may be bound to a single node or to a sequence of - nodes, depending on the \class{Node} type. For example, the - \member{bases} attribute of the \class{Class} node, is bound to a - list of base class nodes, and the \member{doc} attribute is bound to - a single node. - - Each \class{Node} instance has a \member{lineno} attribute which may - be \code{None}. XXX Not sure what the rules are for which nodes - will have a useful lineno. -\end{classdesc} - -All \class{Node} objects offer the following methods: - -\begin{methoddesc}{getChildren}{} - Returns a flattened list of the child nodes and objects in the - order they occur. Specifically, the order of the nodes is the - order in which they appear in the Python grammar. Not all of the - children are \class{Node} instances. The names of functions and - classes, for example, are plain strings. -\end{methoddesc} - -\begin{methoddesc}{getChildNodes}{} - Returns a flattened list of the child nodes in the order they - occur. This method is like \method{getChildren()}, except that it - only returns those children that are \class{Node} instances. -\end{methoddesc} - -Two examples illustrate the general structure of \class{Node} -classes. The \keyword{while} statement is defined by the following -grammar production: - -\begin{verbatim} -while_stmt: "while" expression ":" suite - ["else" ":" suite] -\end{verbatim} - -The \class{While} node has three attributes: \member{test}, -\member{body}, and \member{else_}. (If the natural name for an -attribute is also a Python reserved word, it can't be used as an -attribute name. An underscore is appended to the word to make it a -legal identifier, hence \member{else_} instead of \keyword{else}.) - -The \keyword{if} statement is more complicated because it can include -several tests. - -\begin{verbatim} -if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite] -\end{verbatim} - -The \class{If} node only defines two attributes: \member{tests} and -\member{else_}. The \member{tests} attribute is a sequence of test -expression, consequent body pairs. There is one pair for each -\keyword{if}/\keyword{elif} clause. The first element of the pair is -the test expression. The second elements is a \class{Stmt} node that -contains the code to execute if the test is true. - -The \method{getChildren()} method of \class{If} returns a flat list of -child nodes. If there are three \keyword{if}/\keyword{elif} clauses -and no \keyword{else} clause, then \method{getChildren()} will return -a list of six elements: the first test expression, the first -\class{Stmt}, the second text expression, etc. - -The following table lists each of the \class{Node} subclasses defined -in \module{compiler.ast} and each of the public attributes available -on their instances. The values of most of the attributes are -themselves \class{Node} instances or sequences of instances. When the -value is something other than an instance, the type is noted in the -comment. The attributes are listed in the order in which they are -returned by \method{getChildren()} and \method{getChildNodes()}. - -\input{asttable} - - -\subsection{Assignment nodes} - -There is a collection of nodes used to represent assignments. Each -assignment statement in the source code becomes a single -\class{Assign} node in the AST. The \member{nodes} attribute is a -list that contains a node for each assignment target. This is -necessary because assignment can be chained, e.g. \code{a = b = 2}. -Each \class{Node} in the list will be one of the following classes: -\class{AssAttr}, \class{AssList}, \class{AssName}, or -\class{AssTuple}. - -Each target assignment node will describe the kind of object being -assigned to: \class{AssName} for a simple name, e.g. \code{a = 1}. -\class{AssAttr} for an attribute assigned, e.g. \code{a.x = 1}. -\class{AssList} and \class{AssTuple} for list and tuple expansion -respectively, e.g. \code{a, b, c = a_tuple}. - -The target assignment nodes also have a \member{flags} attribute that -indicates whether the node is being used for assignment or in a delete -statement. The \class{AssName} is also used to represent a delete -statement, e.g. \class{del x}. - -When an expression contains several attribute references, an -assignment or delete statement will contain only one \class{AssAttr} -node -- for the final attribute reference. The other attribute -references will be represented as \class{Getattr} nodes in the -\member{expr} attribute of the \class{AssAttr} instance. - -\subsection{Examples} - -This section shows several simple examples of ASTs for Python source -code. The examples demonstrate how to use the \function{parse()} -function, what the repr of an AST looks like, and how to access -attributes of an AST node. - -The first module defines a single function. Assume it is stored in -\file{/tmp/doublelib.py}. - -\begin{verbatim} -"""This is an example module. - -This is the docstring. -""" - -def double(x): - "Return twice the argument" - return x * 2 -\end{verbatim} - -In the interactive interpreter session below, I have reformatted the -long AST reprs for readability. The AST reprs use unqualified class -names. If you want to create an instance from a repr, you must import -the class names from the \module{compiler.ast} module. - -\begin{verbatim} ->>> import compiler ->>> mod = compiler.parseFile("/tmp/doublelib.py") ->>> mod -Module('This is an example module.\n\nThis is the docstring.\n', - Stmt([Function(None, 'double', ['x'], [], 0, - 'Return twice the argument', - Stmt([Return(Mul((Name('x'), Const(2))))]))])) ->>> from compiler.ast import * ->>> Module('This is an example module.\n\nThis is the docstring.\n', -... Stmt([Function(None, 'double', ['x'], [], 0, -... 'Return twice the argument', -... Stmt([Return(Mul((Name('x'), Const(2))))]))])) -Module('This is an example module.\n\nThis is the docstring.\n', - Stmt([Function(None, 'double', ['x'], [], 0, - 'Return twice the argument', - Stmt([Return(Mul((Name('x'), Const(2))))]))])) ->>> mod.doc -'This is an example module.\n\nThis is the docstring.\n' ->>> for node in mod.node.nodes: -... print node -... -Function(None, 'double', ['x'], [], 0, 'Return twice the argument', - Stmt([Return(Mul((Name('x'), Const(2))))])) ->>> func = mod.node.nodes[0] ->>> func.code -Stmt([Return(Mul((Name('x'), Const(2))))]) -\end{verbatim} - -\section{Using Visitors to Walk ASTs} - -\declaremodule{}{compiler.visitor} - -The visitor pattern is ... The \refmodule{compiler} package uses a -variant on the visitor pattern that takes advantage of Python's -introspection features to eliminate the need for much of the visitor's -infrastructure. - -The classes being visited do not need to be programmed to accept -visitors. The visitor need only define visit methods for classes it -is specifically interested in; a default visit method can handle the -rest. - -XXX The magic \method{visit()} method for visitors. - -\begin{funcdesc}{walk}{tree, visitor\optional{, verbose}} -\end{funcdesc} - -\begin{classdesc}{ASTVisitor}{} - -The \class{ASTVisitor} is responsible for walking over the tree in the -correct order. A walk begins with a call to \method{preorder()}. For -each node, it checks the \var{visitor} argument to \method{preorder()} -for a method named `visitNodeType,' where NodeType is the name of the -node's class, e.g. for a \class{While} node a \method{visitWhile()} -would be called. If the method exists, it is called with the node as -its first argument. - -The visitor method for a particular node type can control how child -nodes are visited during the walk. The \class{ASTVisitor} modifies -the visitor argument by adding a visit method to the visitor; this -method can be used to visit a particular child node. If no visitor is -found for a particular node type, the \method{default()} method is -called. -\end{classdesc} - -\class{ASTVisitor} objects have the following methods: - -XXX describe extra arguments - -\begin{methoddesc}{default}{node\optional{, \moreargs}} -\end{methoddesc} - -\begin{methoddesc}{dispatch}{node\optional{, \moreargs}} -\end{methoddesc} - -\begin{methoddesc}{preorder}{tree, visitor} -\end{methoddesc} - - -\section{Bytecode Generation} - -The code generator is a visitor that emits bytecodes. Each visit method -can call the \method{emit()} method to emit a new bytecode. The basic -code generator is specialized for modules, classes, and functions. An -assembler converts that emitted instructions to the low-level bytecode -format. It handles things like generator of constant lists of code -objects and calculation of jump offsets. |