author    Georg Brandl <georg@python.org>  2009-01-03 21:18:54 (GMT)
committer Georg Brandl <georg@python.org>  2009-01-03 21:18:54 (GMT)
commit    48310cd3f2e02ced9ae836ccbcb67e9af3097d62 (patch)
tree      04c86b387c11bfd4835a320e76bbb2ee24626e0d /Doc/library/tokenize.rst
parent    3d3558a4653fcfcbdcbb75bda5d61e93c48f4d51 (diff)
Remove trailing whitespace.
Diffstat (limited to 'Doc/library/tokenize.rst')
-rw-r--r--  Doc/library/tokenize.rst  |  58
1 file changed, 29 insertions(+), 29 deletions(-)
diff --git a/Doc/library/tokenize.rst b/Doc/library/tokenize.rst
index b2caded..197b574 100644
--- a/Doc/library/tokenize.rst
+++ b/Doc/library/tokenize.rst
@@ -19,16 +19,16 @@ The primary entry point is a :term:`generator`:
The :func:`tokenize` generator requires one argument, *readline*, which
must be a callable object which provides the same interface as the
:meth:`readline` method of built-in file objects (see section
- :ref:`bltin-file-objects`). Each call to the function should return one
+ :ref:`bltin-file-objects`). Each call to the function should return one
line of input as bytes.
- The generator produces 5-tuples with these members: the token type; the
- token string; a 2-tuple ``(srow, scol)`` of ints specifying the row and
- column where the token begins in the source; a 2-tuple ``(erow, ecol)`` of
- ints specifying the row and column where the token ends in the source; and
+ The generator produces 5-tuples with these members: the token type; the
+ token string; a 2-tuple ``(srow, scol)`` of ints specifying the row and
+ column where the token begins in the source; a 2-tuple ``(erow, ecol)`` of
+ ints specifying the row and column where the token ends in the source; and
the line on which the token was found. The line passed (the last tuple item)
is the *logical* line; continuation lines are included.
-
+
:func:`tokenize` determines the source encoding of the file by looking for a
UTF-8 BOM or encoding cookie, according to :pep:`263`.
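Since this hunk only reflows whitespace, a usage sketch may help; here is a
minimal way to drive the generator over an in-memory buffer (the
``io.BytesIO`` wrapper and the sample source are illustrative scaffolding,
not part of the documented API)::

    import io
    import tokenize
    from token import tok_name

    source = b"total = 21.3e-5 * 2\n"

    # tokenize() takes the readline method of a binary stream and yields
    # 5-tuples: type, string, (srow, scol), (erow, ecol), and the line.
    for tok_type, tok_string, start, end, line in tokenize.tokenize(
            io.BytesIO(source).readline):
        print(tok_name[tok_type], repr(tok_string), start, end)

The first tuple produced should be the ENCODING token described in the next
hunk (``'utf-8'`` here, since the buffer carries no BOM or cookie).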
@@ -44,35 +44,35 @@ All constants from the :mod:`token` module are also exported from
.. data:: NL
Token value used to indicate a non-terminating newline. The NEWLINE token
- indicates the end of a logical line of Python code; NL tokens are generated
+ indicates the end of a logical line of Python code; NL tokens are generated
when a logical line of code is continued over multiple physical lines.
.. data:: ENCODING
- Token value that indicates the encoding used to decode the source bytes
- into text. The first token returned by :func:`tokenize` will always be an
+ Token value that indicates the encoding used to decode the source bytes
+ into text. The first token returned by :func:`tokenize` will always be an
ENCODING token.
-Another function is provided to reverse the tokenization process. This is
-useful for creating tools that tokenize a script, modify the token stream, and
+Another function is provided to reverse the tokenization process. This is
+useful for creating tools that tokenize a script, modify the token stream, and
write back the modified script.
.. function:: untokenize(iterable)
Converts tokens back into Python source code. The *iterable* must return
- sequences with at least two elements, the token type and the token string.
+ sequences with at least two elements, the token type and the token string.
Any additional sequence elements are ignored.
-
+
The reconstructed script is returned as a single string. The result is
guaranteed to tokenize back to match the input so that the conversion is
- lossless and round-trips are assured. The guarantee applies only to the
- token type and token string as the spacing between tokens (column
+ lossless and round-trips are assured. The guarantee applies only to the
+ token type and token string as the spacing between tokens (column
positions) may change.
-
- It returns bytes, encoded using the ENCODING token, which is the first
+
+ It returns bytes, encoded using the ENCODING token, which is the first
token sequence output by :func:`tokenize`.
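A small round-trip sketch of the guarantee described above (the in-memory
buffer is again our own scaffolding)::

    import io
    import tokenize

    source = b"x = 1 + 2\n"
    toks = list(tokenize.tokenize(io.BytesIO(source).readline))

    # untokenize() returns bytes, encoded per the leading ENCODING token.
    rebuilt = tokenize.untokenize(toks)

    # The guarantee covers token type and string, not exact spacing:
    retoks = list(tokenize.tokenize(io.BytesIO(rebuilt).readline))
    assert [t[:2] for t in toks] == [t[:2] for t in retoks]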
@@ -81,43 +81,43 @@ function it uses to do this is available:
.. function:: detect_encoding(readline)
- The :func:`detect_encoding` function is used to detect the encoding that
- should be used to decode a Python source file. It requires one argument,
+ The :func:`detect_encoding` function is used to detect the encoding that
+ should be used to decode a Python source file. It requires one argument,
readline, in the same way as the :func:`tokenize` generator.
-
+
It will call readline a maximum of twice, and return the encoding used
(as a string) and a list of any lines (not decoded from bytes) it has read
in.
-
+
It detects the encoding from the presence of a UTF-8 BOM or an encoding
cookie as specified in :pep:`263`. If both a BOM and a cookie are present,
but disagree, a SyntaxError will be raised.
-
- If no encoding is specified, then the default of 'utf-8' will be returned.
-
+ If no encoding is specified, then the default of 'utf-8' will be returned.
+
+
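A short sketch of calling it directly; the latin-1 cookie and the sample
bytes are invented for illustration::

    import io
    from tokenize import detect_encoding

    source = b"# -*- coding: latin-1 -*-\nname = 'voil\xe0'\n"
    encoding, lines = detect_encoding(io.BytesIO(source).readline)

    print(encoding)  # the cookie's name, normalized (here 'iso-8859-1')
    print(lines)     # the undecoded byte lines consumed while sniffing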
Example of a script re-writer that transforms float literals into Decimal
objects::
def decistmt(s):
"""Substitute Decimals for floats in a string of statements.
-
+
>>> from decimal import Decimal
>>> s = 'print(+21.3e-5*-.1234/81.7)'
>>> decistmt(s)
"print (+Decimal ('21.3e-5')*-Decimal ('.1234')/Decimal ('81.7'))"
-
+
The format of the exponent is inherited from the platform C library.
Known cases are "e-007" (Windows) and "e-07" (not Windows). Since
we're only showing 12 digits, and the 13th isn't close to 5, the
rest of the output should be platform-independent.
-
+
>>> exec(s) #doctest: +ELLIPSIS
-3.21716034272e-0...7
-
+
Output from calculations with Decimal should be identical across all
platforms.
-
+
>>> exec(decistmt(s))
-3.217160342717258261933904529E-7
"""