Doc/library/token.rst


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137

:mod:`token` --- Constants used with Python parse trees
=======================================================

.. module:: token
   :synopsis: Constants representing terminal nodes of the parse tree.

.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>

**Source code:** :source:`Lib/token.py`

--------------

This module provides constants which represent the numeric values of leaf nodes
of the parse tree (terminal tokens).  Refer to the file :file:`Grammar/Grammar`
in the Python distribution for the definitions of the names in the context of
the language grammar.  The specific numeric values which the names map to may
change between Python versions.

The module also provides a mapping from numeric codes to names and some
functions.  The functions mirror definitions in the Python C header files.


.. data:: tok_name

   Dictionary mapping the numeric values of the constants defined in this module
   back to name strings, allowing more human-readable representation of parse trees
   to be generated.


.. function:: ISTERMINAL(x)

   Return true for terminal token values.


.. function:: ISNONTERMINAL(x)

   Return true for non-terminal token values.


.. function:: ISEOF(x)

   Return true if *x* is the marker indicating the end of input.


The token constants are:

.. data:: ENDMARKER
          NAME
          NUMBER
          STRING
          NEWLINE
          INDENT
          DEDENT
          LPAR
          RPAR
          LSQB
          RSQB
          COLON
          COMMA
          SEMI
          PLUS
          MINUS
          STAR
          SLASH
          VBAR
          AMPER
          LESS
          GREATER
          EQUAL
          DOT
          PERCENT
          LBRACE
          RBRACE
          EQEQUAL
          NOTEQUAL
          LESSEQUAL
          GREATEREQUAL
          TILDE
          CIRCUMFLEX
          LEFTSHIFT
          RIGHTSHIFT
          DOUBLESTAR
          PLUSEQUAL
          MINEQUAL
          STAREQUAL
          SLASHEQUAL
          PERCENTEQUAL
          AMPEREQUAL
          VBAREQUAL
          CIRCUMFLEXEQUAL
          LEFTSHIFTEQUAL
          RIGHTSHIFTEQUAL
          DOUBLESTAREQUAL
          DOUBLESLASH
          DOUBLESLASHEQUAL
          AT
          ATEQUAL
          RARROW
          ELLIPSIS
          OP
          AWAIT
          ASYNC
          ERRORTOKEN
          N_TOKENS
          NT_OFFSET


The following token type values aren't used by the C tokenizer but are needed for
the :mod:`tokenize` module.

.. data:: COMMENT

   Token value used to indicate a comment.


.. data:: NL

   Token value used to indicate a non-terminating newline.  The
   :data:`NEWLINE` token indicates the end of a logical line of Python code;
   ``NL`` tokens are generated when a logical line of code is continued over
   multiple physical lines.


.. data:: ENCODING

   Token value that indicates the encoding used to decode the source bytes
   into text. The first token returned by :func:`tokenize.tokenize` will
   always be an ``ENCODING`` token.


.. versionchanged:: 3.5
   Added :data:`AWAIT` and :data:`ASYNC` tokens. Starting with
   Python 3.7, "async" and "await" will be tokenized as :data:`NAME`
   tokens, and :data:`AWAIT` and :data:`ASYNC` will be removed.

.. versionchanged:: 3.7
   Added :data:`COMMENT`, :data:`NL` and :data:`ENCODING` tokens.