Diffstat (limited to 'Doc/library/tokenize.rst')
-rw-r--r--  Doc/library/tokenize.rst  53
 1 file changed, 52 insertions(+), 1 deletion(-)
diff --git a/Doc/library/tokenize.rst b/Doc/library/tokenize.rst
index 050d74c..37d9f41 100644
--- a/Doc/library/tokenize.rst
+++ b/Doc/library/tokenize.rst
@@ -15,6 +15,11 @@ implemented in Python. The scanner in this module returns comments as tokens
as well, making it useful for implementing "pretty-printers," including
colorizers for on-screen displays.
+To simplify token stream handling, all :ref:`operator <operators>` and
+:ref:`delimiter <delimiters>` tokens are returned using the generic
+:data:`token.OP` token type. The exact type can be determined by checking
+the ``exact_type`` property on the :term:`named tuple` returned from
+:func:`tokenize.tokenize`.
+
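+For example, a minimal sketch that recovers the exact type of each operator
+token in a stream::
+
+   import io
+   import token
+   import tokenize
+
+   source = b"x = (1 + 2) * 3\n"
+   for tok in tokenize.tokenize(io.BytesIO(source).readline):
+       if tok.type == token.OP:
+           # The generic type is OP; exact_type names, e.g., LPAR or PLUS.
+           print(tok.string, token.tok_name[tok.exact_type])
+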
Tokenizing Input
----------------
@@ -36,9 +41,17 @@ The primary entry point is a :term:`generator`:
returned as a :term:`named tuple` with the field names:
``type string start end line``.
+   The returned :term:`named tuple` has an additional property named
+   ``exact_type`` that contains the exact operator type for
+   :data:`token.OP` tokens.  For all other token types, ``exact_type``
+   equals the named tuple ``type`` field.
+
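+   For instance, a minimal check of this equality::
+
+      import io
+      import token
+      import tokenize
+
+      toks = list(tokenize.tokenize(io.BytesIO(b"spam = 1\n").readline))
+      assert toks[1].type == token.NAME            # the name 'spam'
+      assert toks[1].exact_type == toks[1].type    # equal for non-OP tokens
+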
.. versionchanged:: 3.1
Added support for named tuples.
+   .. versionchanged:: 3.3
+      Added support for ``exact_type``.
+
:func:`tokenize` determines the source encoding of the file by looking for a
UTF-8 BOM or encoding cookie, according to :pep:`263`.
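+   The same detection is exposed directly as :func:`detect_encoding`;
+   a minimal sketch::
+
+      import io
+      import tokenize
+
+      source = b"# -*- coding: utf-8 -*-\nspam = 1\n"
+      encoding, lines = tokenize.detect_encoding(io.BytesIO(source).readline)
+      print(encoding)   # 'utf-8', taken from the encoding cookie
+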
@@ -131,7 +144,19 @@ It is as simple as:
.. code-block:: sh
-   python -m tokenize [filename.py]
+   python -m tokenize [-e] [filename.py]
+
+The following options are accepted:
+
+.. program:: tokenize
+
+.. cmdoption:: -h, --help
+
+   show this help message and exit
+
+.. cmdoption:: -e, --exact
+
+   display token names using the exact type
If :file:`filename.py` is specified, its contents are tokenized to stdout.
Otherwise, tokenization is performed on stdin.
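+For example, to tokenize source piped in on stdin:
+
+.. code-block:: sh
+
+   echo "1 + 2" | python -m tokenize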
@@ -215,3 +240,29 @@ the name of the token, and the final column is the value of the token (if any)
4,10-4,11:          OP             ')'
4,11-4,12:          NEWLINE        '\n'
5,0-5,0:            ENDMARKER      ''
+
+The exact token type names can be displayed using the ``-e`` option:
+
+.. code-block:: sh
+
+   $ python -m tokenize -e hello.py
+   0,0-0,0:            ENCODING       'utf-8'
+   1,0-1,3:            NAME           'def'
+   1,4-1,13:           NAME           'say_hello'
+   1,13-1,14:          LPAR           '('
+   1,14-1,15:          RPAR           ')'
+   1,15-1,16:          COLON          ':'
+   1,16-1,17:          NEWLINE        '\n'
+   2,0-2,4:            INDENT         '    '
+   2,4-2,9:            NAME           'print'
+   2,9-2,10:           LPAR           '('
+   2,10-2,25:          STRING         '"Hello, World!"'
+   2,25-2,26:          RPAR           ')'
+   2,26-2,27:          NEWLINE        '\n'
+   3,0-3,1:            NL             '\n'
+   4,0-4,0:            DEDENT         ''
+   4,0-4,9:            NAME           'say_hello'
+   4,9-4,10:           LPAR           '('
+   4,10-4,11:          RPAR           ')'
+   4,11-4,12:          NEWLINE        '\n'
+   5,0-5,0:            ENDMARKER      ''