Diffstat (limited to 'Doc/lib/libtokenize.tex')
-rw-r--r-- | Doc/lib/libtokenize.tex | 37 |
1 files changed, 27 insertions, 10 deletions
diff --git a/Doc/lib/libtokenize.tex b/Doc/lib/libtokenize.tex
index 205407c..6cd9348 100644
--- a/Doc/lib/libtokenize.tex
+++ b/Doc/lib/libtokenize.tex
@@ -12,12 +12,33 @@ source code, implemented in Python. The scanner in this module
 returns comments as tokens as well, making it useful for implementing
 ``pretty-printers,'' including colorizers for on-screen displays.
 
-The scanner is exposed by a single function:
+The primary entry point is a generator:
 
+\begin{funcdesc}{generate_tokens}{readline}
+  The \function{generate_tokens()} generator requires one argment,
+  \var{readline}, which must be a callable object which
+  provides the same interface as the \method{readline()} method of
+  built-in file objects (see section~\ref{bltin-file-objects}). Each
+  call to the function should return one line of input as a string.
+
+  The generator produces 5-tuples with these members:
+  the token type;
+  the token string;
+  a 2-tuple \code{(\var{srow}, \var{scol})} of ints specifying the
+  row and column where the token begins in the source;
+  a 2-tuple \code{(\var{erow}, \var{ecol})} of ints specifying the
+  row and column where the token ends in the source;
+  and the line on which the token was found.
+  The line passed is the \emph{logical} line;
+  continuation lines are included.
+  \versionadded{2.2}
+\end{funcdesc}
+
+An older entry point is retained for backward compatibility:
 
 \begin{funcdesc}{tokenize}{readline\optional{, tokeneater}}
   The \function{tokenize()} function accepts two parameters: one
-  representing the input stream, and one providing an output mechanism
+  representing the input stream, and one providing an output mechanism
   for \function{tokenize()}.
 
   The first parameter, \var{readline}, must be a callable object which
@@ -26,17 +47,13 @@ The scanner is exposed by a single function:
   call to the function should return one line of input as a string.
 
   The second parameter, \var{tokeneater}, must also be a callable
-  object. It is called with five parameters: the token type, the
-  token string, a tuple \code{(\var{srow}, \var{scol})} specifying the
-  row and column where the token begins in the source, a tuple
-  \code{(\var{erow}, \var{ecol})} giving the ending position of the
-  token, and the line on which the token was found. The line passed
-  is the \emph{logical} line; continuation lines are included.
+  object. It is called once for each token, with five arguments,
+  corresponding to the tuples generated by \function{generate_tokens()}.
 \end{funcdesc}
 
 
-All constants from the \refmodule{token} module are also exported from
-\module{tokenize}, as are two additional token type values that might be
+All constants from the \refmodule{token} module are also exported from
+\module{tokenize}, as are two additional token type values that might be
 passed to the \var{tokeneater} function by \function{tokenize()}:
 
 \begin{datadesc}{COMMENT}
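
The 5-tuple interface documented in this patch can be tried out with a short script. This is a minimal sketch that assumes a current Python 3 interpreter, not the 2.2 release the patch targets; there, generate_tokens() yields named tuples that still unpack as the five members listed above. The sample source string and the variable names are illustrative only.

```python
import io
import tokenize

# Hypothetical one-line input; any Python source text would do.
source = "x = 1  # set x\n"

# generate_tokens() needs a readline-style callable that returns one
# line of input per call; StringIO provides one for an in-memory string.
readline = io.StringIO(source).readline

comments = []
for tok_type, tok_string, (srow, scol), (erow, ecol), line in \
        tokenize.generate_tokens(readline):
    # COMMENT is one of the extra token types exported by tokenize.
    if tok_type == tokenize.COMMENT:
        comments.append((tok_string, (srow, scol), (erow, ecol)))

print(comments)
```

Because comments are returned as ordinary tokens with start and end positions, a colorizer or pretty-printer can locate them without re-scanning the source line.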