author    | Raymond Hettinger <python@rcn.com> | 2005-06-10 11:05:19 (GMT)
committer | Raymond Hettinger <python@rcn.com> | 2005-06-10 11:05:19 (GMT)
commit    | 68c04534182f2c09783b6506701a8bc25c98b4a9 (patch)
tree      | 4e5f2b764eff65a3201dd2e666355c487e88a9b7 /Doc
parent    | bf7255fffb5dda1b9541892cc40412bb6bbd4409 (diff)
Add untokenize() function to allow full round-trip tokenization.
Should significantly enhance the utility of the module by supporting
the creation of tools that modify the token stream and write back the
modified result.
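
A minimal sketch of the round trip this enables, using the Python 2.x
interfaces generate_tokens() and the new untokenize(); the roundtrip()
helper and the sample source string are illustrative, not part of the
module:

    from StringIO import StringIO
    from tokenize import generate_tokens, untokenize

    def roundtrip(source):
        # Keep only the (token type, token string) pairs; untokenize()
        # ignores any additional sequence elements anyway.
        pairs = [tok[:2] for tok in generate_tokens(StringIO(source).readline)]
        rebuilt = untokenize(pairs)
        # The documented guarantee: the rebuilt text tokenizes back to the
        # same (type, string) pairs, though spacing between tokens may differ.
        again = [tok[:2] for tok in generate_tokens(StringIO(rebuilt).readline)]
        assert pairs == again
        return rebuilt

    print roundtrip("total = 1.5 + 2\n")

A token-rewriting tool would transform the list of pairs between the
tokenize and untokenize steps, as the decistmt() example in the diff
below does.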
Diffstat (limited to 'Doc')
-rw-r--r-- | Doc/lib/libtokenize.tex | 52
1 file changed, 52 insertions, 0 deletions
diff --git a/Doc/lib/libtokenize.tex b/Doc/lib/libtokenize.tex
index 6cd9348..dc5f8c1 100644
--- a/Doc/lib/libtokenize.tex
+++ b/Doc/lib/libtokenize.tex
@@ -45,6 +45,9 @@ An older entry point is retained for backward compatibility:
   provides the same interface as the \method{readline()} method of
   built-in file objects (see section~\ref{bltin-file-objects}).  Each
   call to the function should return one line of input as a string.
+  Alternately, \var{readline} may be a callable object that signals
+  completion by raising \exception{StopIteration}.
+  \versionchanged[Added StopIteration support]{2.5}

   The second parameter, \var{tokeneater}, must also be a callable
   object.  It is called once for each token, with five arguments,
@@ -65,3 +68,52 @@ passed to the \var{tokeneater} function by \function{tokenize()}:
   are generated when a logical line of code is continued over
   multiple physical lines.
 \end{datadesc}
+
+Another function is provided to reverse the tokenization process.
+This is useful for creating tools that tokenize a script, modify
+the token stream, and write back the modified script.
+
+\begin{funcdesc}{untokenize}{iterable}
+  Converts tokens back into Python source code.  The \variable{iterable}
+  must return sequences with at least two elements, the token type and
+  the token string.  Any additional sequence elements are ignored.
+
+  The reconstructed script is returned as a single string.  The
+  result is guaranteed to tokenize back to match the input so that
+  the conversion is lossless and round-trips are assured.  The
+  guarantee applies only to the token type and token string as
+  the spacing between tokens (column positions) may change.
+  \versionadded{2.5}
+\end{funcdesc}
+
+Example of a script re-writer that transforms float literals into
+Decimal objects:
+\begin{verbatim}
+def decistmt(s):
+    """Substitute Decimals for floats in a string of statements.
+
+    >>> from decimal import Decimal
+    >>> s = 'print +21.3e-5*-.1234/81.7'
+    >>> decistmt(s)
+    "print +Decimal ('21.3e-5')*-Decimal ('.1234')/Decimal ('81.7')"
+
+    >>> exec(s)
+    -3.21716034272e-007
+    >>> exec(decistmt(s))
+    -3.217160342717258261933904529E-7
+
+    """
+    result = []
+    g = generate_tokens(StringIO(s).readline)   # tokenize the string
+    for toknum, tokval, _, _, _ in g:
+        if toknum == NUMBER and '.' in tokval:  # replace NUMBER tokens
+            result.extend([
+                (NAME, 'Decimal'),
+                (OP, '('),
+                (STRING, repr(tokval)),
+                (OP, ')')
+            ])
+        else:
+            result.append((toknum, tokval))
+    return untokenize(result)
+\end{verbatim}
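
The first hunk above documents that, as of 2.5, the readline argument may
signal the end of input by raising StopIteration instead of returning an
empty string. A minimal sketch of what that permits, in Python 2.x style;
iter_readline() and print_token() are illustrative names, not part of the
module:

    import tokenize

    def iter_readline(lines):
        # Build a readline-style callable from any iterable of lines; it
        # raises StopIteration when the lines run out, which tokenize()
        # now accepts as the end-of-input signal.
        it = iter(lines)
        def readline():
            return it.next()
        return readline

    def print_token(toktype, tokstring, start, end, line):
        # A tokeneater receives five arguments per token.
        print tokenize.tok_name[toktype], repr(tokstring)

    tokenize.tokenize(iter_readline(["x = 1\n", "y = 2\n"]), print_token)

Returning an empty string from readline remains the conventional EOF
signal; the StopIteration form simply lets iterator-backed callables such
as iter(text.splitlines(1)).next be passed directly.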