author: Georg Brandl <georg@python.org> 2007-08-31 08:07:45 (GMT)
committer: Georg Brandl <georg@python.org> 2007-08-31 08:07:45 (GMT)
commit: 57e3b68c220ef2a6387419cef69ff1d1c7f283cf
tree: a0bb8df6896cc13872bfbb776563b54e00dcb063 /Doc
parent: 3dc33d18452de871cff98914dda81ff00b4d00f6
Update the first two parts of the reference manual for Py3k,
mainly concerning PEPs 3131 and 3120.
Diffstat (limited to 'Doc')
-rw-r--r-- Doc/documenting/index.rst 1
-rw-r--r-- Doc/reference/introduction.rst 21
-rw-r--r-- Doc/reference/lexical_analysis.rst 387
3 files changed, 175 insertions, 234 deletions
diff --git a/Doc/documenting/index.rst b/Doc/documenting/index.rst
index 1a3778b..5adbd46 100644
--- a/Doc/documenting/index.rst
+++ b/Doc/documenting/index.rst
@@ -27,6 +27,7 @@ are more than welcome as well.
style.rst
rest.rst
markup.rst
+ fromlatex.rst
sphinx.rst
.. XXX add credits, thanks etc.
diff --git a/Doc/reference/introduction.rst b/Doc/reference/introduction.rst
index 0d53719..4da1606 100644
--- a/Doc/reference/introduction.rst
+++ b/Doc/reference/introduction.rst
@@ -22,11 +22,12 @@ language, maybe you could volunteer your time --- or invent a cloning machine
It is dangerous to add too many implementation details to a language reference
document --- the implementation may change, and other implementations of the
-same language may work differently. On the other hand, there is currently only
-one Python implementation in widespread use (although alternate implementations
-exist), and its particular quirks are sometimes worth being mentioned,
-especially where the implementation imposes additional limitations. Therefore,
-you'll find short "implementation notes" sprinkled throughout the text.
+same language may work differently. On the other hand, CPython is the one
+Python implementation in widespread use (although alternate implementations
+continue to gain support), and its particular quirks are sometimes worth being
+mentioned, especially where the implementation imposes additional limitations.
+Therefore, you'll find short "implementation notes" sprinkled throughout the
+text.
Every Python implementation comes with a number of built-in and standard
modules. These are documented in :ref:`library-index`. A few built-in modules
@@ -88,11 +89,7 @@ implementation you're using.
Notation
========
-.. index::
- single: BNF
- single: grammar
- single: syntax
- single: notation
+.. index:: BNF, grammar, syntax, notation
The descriptions of lexical analysis and syntax use a modified BNF grammar
notation. This uses the following style of definition:
@@ -118,9 +115,7 @@ meaningful to separate tokens. Rules are normally contained on a single line;
rules with many alternatives may be formatted alternatively with each line after
the first beginning with a vertical bar.
-.. index::
- single: lexical definitions
- single: ASCII@ASCII
+.. index:: lexical definitions, ASCII
In lexical definitions (as the example above), two more conventions are used:
Two literal characters separated by three dots mean a choice of any single
diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst
index 35e92cf..856137d 100644
--- a/Doc/reference/lexical_analysis.rst
+++ b/Doc/reference/lexical_analysis.rst
@@ -5,38 +5,16 @@
Lexical analysis
****************
-.. index::
- single: lexical analysis
- single: parser
- single: token
+.. index:: lexical analysis, parser, token
A Python program is read by a *parser*. Input to the parser is a stream of
*tokens*, generated by the *lexical analyzer*. This chapter describes how the
lexical analyzer breaks a file into tokens.
-Python uses the 7-bit ASCII character set for program text.
-
-.. versionadded:: 2.3
- An encoding declaration can be used to indicate that string literals and
- comments use an encoding different from ASCII.
-
-For compatibility with older versions, Python only warns if it finds 8-bit
-characters; those warnings should be corrected by either declaring an explicit
-encoding, or using escape sequences if those bytes are binary data, instead of
-characters.
-
-The run-time character set depends on the I/O devices connected to the program
-but is generally a superset of ASCII.
-
-**Future compatibility note:** It may be tempting to assume that the character
-set for 8-bit characters is ISO Latin-1 (an ASCII superset that covers most
-western languages that use the Latin alphabet), but it is possible that in the
-future Unicode text editors will become common. These generally use the UTF-8
-encoding, which is also an ASCII superset, but with very different use for the
-characters with ordinals 128-255. While there is no consensus on this subject
-yet, it is unwise to assume either Latin-1 or UTF-8, even though the current
-implementation appears to favor Latin-1. This applies both to the source
-character set and the run-time character set.
+Python reads program text as Unicode code points; the encoding of a source file
+can be given by an encoding declaration and defaults to UTF-8, see :pep:`3120`
+for details. If the source file cannot be decoded, a :exc:`SyntaxError` is
+raised.
.. _line-structure:
@@ -44,21 +22,17 @@ character set and the run-time character set.
Line structure
==============
-.. index:: single: line structure
+.. index:: line structure
A Python program is divided into a number of *logical lines*.
-.. _logical:
+.. _logical-lines:
Logical lines
-------------
-.. index::
- single: logical line
- single: physical line
- single: line joining
- single: NEWLINE token
+.. index:: logical line, physical line, line joining, NEWLINE token
The end of a logical line is represented by the token NEWLINE. Statements
cannot cross logical line boundaries except where NEWLINE is allowed by the
@@ -67,7 +41,7 @@ constructed from one or more *physical lines* by following the explicit or
implicit *line joining* rules.
-.. _physical:
+.. _physical-lines:
Physical lines
--------------
@@ -89,9 +63,7 @@ representing ASCII LF, is the line terminator).
Comments
--------
-.. index::
- single: comment
- single: hash character
+.. index:: comment, hash character
A comment starts with a hash character (``#``) that is not part of a string
literal, and ends at the end of the physical line. A comment signifies the end
@@ -104,9 +76,7 @@ are ignored by the syntax; they are not tokens.
Encoding declarations
---------------------
-.. index::
- single: source character set
- single: encodings
+.. index:: source character set, encodings
If a comment in the first or second line of the Python script matches the
regular expression ``coding[=:]\s*([-\w.]+)``, this comment is processed as an
@@ -119,19 +89,19 @@ which is recognized also by GNU Emacs, and ::
# vim:fileencoding=<encoding-name>
-which is recognized by Bram Moolenaar's VIM. In addition, if the first bytes of
-the file are the UTF-8 byte-order mark (``'\xef\xbb\xbf'``), the declared file
-encoding is UTF-8 (this is supported, among others, by Microsoft's
-:program:`notepad`).
+which is recognized by Bram Moolenaar's VIM.
+
+If no encoding declaration is found, the default encoding is UTF-8. In
+addition, if the first bytes of the file are the UTF-8 byte-order mark
+(``b'\xef\xbb\xbf'``), the declared file encoding is UTF-8 (this is supported,
+among others, by Microsoft's :program:`notepad`).
If an encoding is declared, the encoding name must be recognized by Python. The
-encoding is used for all lexical analysis, in particular to find the end of a
-string, and to interpret the contents of Unicode literals. String literals are
-converted to Unicode for syntactical analysis, then converted back to their
-original encoding before interpretation starts. The encoding declaration must
-appear on a line of its own.
+encoding is used for all lexical analysis, including string literals, comments
+and identifiers. The encoding declaration must appear on a line of its own.
-.. % XXX there should be a list of supported encodings.
+A list of standard encodings can be found in the section
+:ref:`standard-encodings`.
.. _explicit-joining:
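Reviewer note: the encoding-declaration rule in the hunk above can be sketched as follows. This is only an approximation of what the tokenizer does, written for illustration. The regular expression is the one quoted in the text; the helper name `find_declared_encoding` is hypothetical, and the sketch neither handles the UTF-8 byte-order mark nor checks that the named codec exists.

```python
import re

# Regular expression quoted in the documentation text above.
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")


def find_declared_encoding(source_lines, default="utf-8"):
    """Return the encoding named in the first two lines, else the default.

    Hypothetical helper: a rough approximation of the documented rule.
    """
    for line in source_lines[:2]:
        # Only the comment part of a line can carry the declaration.
        _, hash_mark, comment = line.partition("#")
        if not hash_mark:
            continue
        match = CODING_RE.search(comment)
        if match:
            return match.group(1)
    return default


print(find_declared_encoding(["#!/usr/bin/env python", "# -*- coding: latin-1 -*-"]))
print(find_declared_encoding(["# vim:fileencoding=utf-8"]))
print(find_declared_encoding(["import os"]))
```

A declaration on the third line or later is ignored by this sketch, matching the "first or second line" wording of the text.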
@@ -139,21 +109,13 @@ appear on a line of its own.
Explicit line joining
---------------------
-.. index::
- single: physical line
- single: line joining
- single: line continuation
- single: backslash character
+.. index:: physical line, line joining, line continuation, backslash character
Two or more physical lines may be joined into logical lines using backslash
characters (``\``), as follows: when a physical line ends in a backslash that is
not part of a string literal or comment, it is joined with the following forming
a single logical line, deleting the backslash and the following end-of-line
-character. For example:
-
-.. %
-
-::
+character. For example::
if 1900 < year < 2100 and 1 <= month <= 12 \
and 1 <= day <= 31 and 0 <= hour < 24 \
@@ -197,9 +159,9 @@ Blank lines
A logical line that contains only spaces, tabs, formfeeds and possibly a
comment, is ignored (i.e., no NEWLINE token is generated). During interactive
input of statements, handling of a blank line may differ depending on the
-implementation of the read-eval-print loop. In the standard implementation, an
-entirely blank logical line (i.e. one containing not even whitespace or a
-comment) terminates a multi-line statement.
+implementation of the read-eval-print loop. In the standard interactive
+interpreter, an entirely blank logical line (i.e. one containing not even
+whitespace or a comment) terminates a multi-line statement.
.. _indentation:
@@ -207,14 +169,7 @@ comment) terminates a multi-line statement.
Indentation
-----------
-.. index::
- single: indentation
- single: whitespace
- single: leading whitespace
- single: space
- single: tab
- single: grouping
- single: statement grouping
+.. index:: indentation, leading whitespace, space, tab, grouping, statement grouping
Leading whitespace (spaces and tabs) at the beginning of a logical line is used
to compute the indentation level of the line, which in turn is used to determine
@@ -238,9 +193,7 @@ for the indentation calculations above. Formfeed characters occurring elsewhere
in the leading whitespace have an undefined effect (for instance, they may reset
the space count to zero).
-.. index::
- single: INDENT token
- single: DEDENT token
+.. index:: INDENT token, DEDENT token
The indentation levels of consecutive lines are used to generate INDENT and
DEDENT tokens, using a stack, as follows.
@@ -315,22 +268,48 @@ possible string that forms a legal token, when read from left to right.
Identifiers and keywords
========================
-.. index::
- single: identifier
- single: name
+.. index:: identifier, name
Identifiers (also referred to as *names*) are described by the following lexical
definitions:
-.. productionlist::
- identifier: (`letter`|"_") (`letter` | `digit` | "_")*
- letter: `lowercase` | `uppercase`
- lowercase: "a"..."z"
- uppercase: "A"..."Z"
- digit: "0"..."9"
+The syntax of identifiers in Python is based on the Unicode standard annex
+UAX-31, with elaboration and changes as defined below.
+
+Within the ASCII range (U+0001..U+007F), the valid characters for identifiers
+are the same as in Python 2.5; Python 3.0 introduces additional
+characters from outside the ASCII range (see :pep:`3131`). For other
+characters, the classification uses the version of the Unicode Character
+Database as included in the :mod:`unicodedata` module.
Identifiers are unlimited in length. Case is significant.
+.. productionlist::
+ identifier: `id_start` `id_continue`*
+ id_start: <all characters in general categories Lu, Ll, Lt, Lm, Lo, Nl,
+ the underscore, and characters with the Other_ID_Start property>
+ id_continue: <all characters in `id_start`, plus characters in the categories
+ Mn, Mc, Nd, Pc and others with the Other_ID_Continue property>
+
+The Unicode category codes mentioned above stand for:
+
+* *Lu* - uppercase letters
+* *Ll* - lowercase letters
+* *Lt* - titlecase letters
+* *Lm* - modifier letters
+* *Lo* - other letters
+* *Nl* - letter numbers
+* *Mn* - nonspacing marks
+* *Mc* - spacing combining marks
+* *Nd* - decimal numbers
+* *Pc* - connector punctuations
+
+All identifiers are converted into the normal form NFC while parsing; comparison
+of identifiers is based on NFC.
+
+A non-normative HTML file listing all valid identifier characters for Unicode
+4.1 can be found at
+http://www.dcl.hpi.uni-potsdam.de/home/loewis/table-3131.html.
.. _keywords:
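Reviewer note: the NFC rule added in the hunk above means that two spellings of a name that differ only in composed versus decomposed characters refer to the same identifier. A minimal sketch, assuming a current Python 3 interpreter; `same_identifier` is a hypothetical helper, and `str.isidentifier` is used only as a convenient check, not as part of this commit.

```python
import unicodedata


def same_identifier(a, b):
    """Compare two names as the text describes: after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# "é" as a single code point versus "e" followed by a combining acute accent.
composed = "caf\u00e9"
decomposed = "cafe\u0301"

# Both spellings are valid identifiers (U+0301 is in category Mn).
print(composed.isidentifier(), decomposed.isidentifier())

# As plain strings they differ; as identifiers they are the same name.
print(composed == decomposed, same_identifier(composed, decomposed))
```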
@@ -345,25 +324,13 @@ The following identifiers are used as reserved words, or *keywords* of the
language, and cannot be used as ordinary identifiers. They must be spelled
exactly as written here::
- and def for is raise
- as del from lambda return
- assert elif global not try
- break else if or while
- class except import pass with
- continue finally in print yield
-
-.. versionchanged:: 2.4
- :const:`None` became a constant and is now recognized by the compiler as a name
- for the built-in object :const:`None`. Although it is not a keyword, you cannot
- assign a different object to it.
-
-.. versionchanged:: 2.5
- Both :keyword:`as` and :keyword:`with` are only recognized when the
- ``with_statement`` future feature has been enabled. It will always be enabled in
- Python 2.6. See section :ref:`with` for details. Note that using :keyword:`as`
- and :keyword:`with` as identifiers will always issue a warning, even when the
- ``with_statement`` future directive is not in effect.
-
+ False class finally is return
+ None continue for lambda try
+ True def from nonlocal while
+ and del global not with
+ as elif if or yield
+ assert else import pass
+ break except in raise
.. _id-classes:
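Reviewer note: the keyword table added in the hunk above can be cross-checked against the standard `keyword` module. This is a sketch under the assumption that it runs on a current Python 3 release; later releases added further keywords, so the check is for a subset rather than equality. The name `LISTED` is introduced here for illustration only.

```python
import keyword

# Keywords exactly as listed in the hunk above.
LISTED = """
False class finally is return
None continue for lambda try
True def from nonlocal while
and del global not with
as elif if or yield
assert else import pass
break except in raise
""".split()

# Collect any listed word that the running interpreter does not treat as a keyword.
missing = [word for word in LISTED if not keyword.iskeyword(word)]
print(len(LISTED), missing)

# "print" was removed from the list by this commit and is an ordinary name now.
print(keyword.iskeyword("print"))
```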
@@ -405,71 +372,71 @@ characters:
Literals
========
-.. index::
- single: literal
- single: constant
+.. index:: literal, constant
Literals are notations for constant values of some built-in types.
.. _strings:
-String literals
----------------
+String and Bytes literals
+-------------------------
-.. index:: single: string literal
+.. index:: string literal, bytes literal, ASCII
String literals are described by the following lexical definitions:
-.. index:: single: ASCII@ASCII
-
.. productionlist::
stringliteral: [`stringprefix`](`shortstring` | `longstring`)
- stringprefix: "r" | "u" | "ur" | "R" | "U" | "UR" | "Ur" | "uR"
+ stringprefix: "r" | "R"
shortstring: "'" `shortstringitem`* "'" | '"' `shortstringitem`* '"'
- longstring: ""'" `longstringitem`* ""'"
- : | '"""' `longstringitem`* '"""'
- shortstringitem: `shortstringchar` | `escapeseq`
- longstringitem: `longstringchar` | `escapeseq`
+ longstring: "'''" `longstringitem`* "'''" | '"""' `longstringitem`* '"""'
+ shortstringitem: `shortstringchar` | `stringescapeseq`
+ longstringitem: `longstringchar` | `stringescapeseq`
shortstringchar: <any source character except "\" or newline or the quote>
longstringchar: <any source character except "\">
- escapeseq: "\" <any ASCII character>
+ stringescapeseq: "\" <any source character>
+
+.. productionlist::
+ bytesliteral: `bytesprefix`(`shortbytes` | `longbytes`)
+ bytesprefix: "b" | "B"
+ shortbytes: "'" `shortbytesitem`* "'" | '"' `shortbytesitem`* '"'
+ longbytes: "'''" `longbytesitem`* "'''" | '"""' `longbytesitem`* '"""'
+ shortbytesitem: `shortbyteschar` | `bytesescapeseq`
+ longbytesitem: `longbyteschar` | `bytesescapeseq`
+ shortbyteschar: <any ASCII character except "\" or newline or the quote>
+ longbyteschar: <any ASCII character except "\">
+ bytesescapeseq: "\" <any ASCII character>
One syntactic restriction not indicated by these productions is that whitespace
-is not allowed between the :token:`stringprefix` and the rest of the string
-literal. The source character set is defined by the encoding declaration; it is
-ASCII if no encoding declaration is given in the source file; see section
-:ref:`encodings`.
+is not allowed between the :token:`stringprefix` or :token:`bytesprefix` and the
+rest of the literal. The source character set is defined by the encoding
+declaration; it is UTF-8 if no encoding declaration is given in the source file;
+see section :ref:`encodings`.
-.. index::
- single: triple-quoted string
- single: Unicode Consortium
- single: string; Unicode
- single: raw string
+.. index:: triple-quoted string, Unicode Consortium, raw string
-In plain English: String literals can be enclosed in matching single quotes
+In plain English: Both types of literals can be enclosed in matching single quotes
(``'``) or double quotes (``"``). They can also be enclosed in matching groups
of three single or double quotes (these are generally referred to as
*triple-quoted strings*). The backslash (``\``) character is used to escape
characters that otherwise have a special meaning, such as newline, backslash
-itself, or the quote character. String literals may optionally be prefixed with
-a letter ``'r'`` or ``'R'``; such strings are called :dfn:`raw strings` and use
-different rules for interpreting backslash escape sequences. A prefix of
-``'u'`` or ``'U'`` makes the string a Unicode string. Unicode strings use the
-Unicode character set as defined by the Unicode Consortium and ISO 10646. Some
-additional escape sequences, described below, are available in Unicode strings.
-The two prefix characters may be combined; in this case, ``'u'`` must appear
-before ``'r'``.
+itself, or the quote character.
+
+String literals may optionally be prefixed with a letter ``'r'`` or ``'R'``;
+such strings are called :dfn:`raw strings` and use different rules for
+interpreting backslash escape sequences.
+
+Bytes literals are always prefixed with ``'b'`` or ``'B'``; they produce an
+instance of the :class:`bytes` type instead of the :class:`str` type. They
+may only contain ASCII characters; bytes with a numeric value of 128 or greater
+must be expressed with escapes.
In triple-quoted strings, unescaped newlines and quotes are allowed (and are
retained), except that three unescaped quotes in a row terminate the string. (A
"quote" is the character used to open the string, i.e. either ``'`` or ``"``.)
-.. index::
- single: physical line
- single: escape sequence
- single: Standard C
- single: C
+.. index:: physical line, escape sequence, Standard C, C
Unless an ``'r'`` or ``'R'`` prefix is present, escape sequences in strings are
interpreted according to rules similar to those used by Standard C. The
@@ -478,7 +445,7 @@ recognized escape sequences are:
+-----------------+---------------------------------+-------+
| Escape Sequence | Meaning | Notes |
+=================+=================================+=======+
-| ``\newline`` | Ignored | |
+| ``\newline`` | Backslash and newline ignored | |
+-----------------+---------------------------------+-------+
| ``\\`` | Backslash (``\``) | |
+-----------------+---------------------------------+-------+
@@ -494,83 +461,83 @@ recognized escape sequences are:
+-----------------+---------------------------------+-------+
| ``\n`` | ASCII Linefeed (LF) | |
+-----------------+---------------------------------+-------+
-| ``\N{name}`` | Character named *name* in the | |
-| | Unicode database (Unicode only) | |
-+-----------------+---------------------------------+-------+
| ``\r`` | ASCII Carriage Return (CR) | |
+-----------------+---------------------------------+-------+
| ``\t`` | ASCII Horizontal Tab (TAB) | |
+-----------------+---------------------------------+-------+
-| ``\uxxxx`` | Character with 16-bit hex value | \(1) |
-| | *xxxx* (Unicode only) | |
-+-----------------+---------------------------------+-------+
-| ``\Uxxxxxxxx`` | Character with 32-bit hex value | \(2) |
-| | *xxxxxxxx* (Unicode only) | |
-+-----------------+---------------------------------+-------+
| ``\v`` | ASCII Vertical Tab (VT) | |
+-----------------+---------------------------------+-------+
-| ``\ooo`` | Character with octal value | (3,5) |
+| ``\ooo`` | Character with octal value | (1,3) |
| | *ooo* | |
+-----------------+---------------------------------+-------+
-| ``\xhh`` | Character with hex value *hh* | (4,5) |
+| ``\xhh`` | Character with hex value *hh* | (2,3) |
+-----------------+---------------------------------+-------+
-.. index:: single: ASCII@ASCII
+Escape sequences only recognized in string literals are:
+
++-----------------+---------------------------------+-------+
+| Escape Sequence | Meaning | Notes |
++=================+=================================+=======+
+| ``\N{name}`` | Character named *name* in the | |
+| | Unicode database | |
++-----------------+---------------------------------+-------+
+| ``\uxxxx`` | Character with 16-bit hex value | \(4) |
+| | *xxxx* | |
++-----------------+---------------------------------+-------+
+| ``\Uxxxxxxxx`` | Character with 32-bit hex value | \(5) |
+| | *xxxxxxxx* | |
++-----------------+---------------------------------+-------+
Notes:
(1)
- Individual code units which form parts of a surrogate pair can be encoded using
- this escape sequence.
+ As in Standard C, up to three octal digits are accepted.
(2)
- Any Unicode character can be encoded this way, but characters outside the Basic
- Multilingual Plane (BMP) will be encoded using a surrogate pair if Python is
- compiled to use 16-bit code units (the default). Individual code units which
- form parts of a surrogate pair can be encoded using this escape sequence.
+ Unlike in Standard C, at most two hex digits are accepted.
(3)
- As in Standard C, up to three octal digits are accepted.
+ In a bytes literal, hexadecimal and octal escapes denote the byte with the
+ given value. In a string literal, these escapes denote a Unicode character
+ with the given value.
(4)
- Unlike in Standard C, at most two hex digits are accepted.
+ Individual code units which form parts of a surrogate pair can be encoded using
+ this escape sequence.
(5)
- In a string literal, hexadecimal and octal escapes denote the byte with the
- given value; it is not necessary that the byte encodes a character in the source
- character set. In a Unicode literal, these escapes denote a Unicode character
- with the given value.
+ Any Unicode character can be encoded this way, but characters outside the Basic
+ Multilingual Plane (BMP) will be encoded using a surrogate pair if Python is
+ compiled to use 16-bit code units (the default). Individual code units which
+ form parts of a surrogate pair can be encoded using this escape sequence.
+
-.. index:: single: unrecognized escape sequence
+.. index:: unrecognized escape sequence
Unlike Standard C, all unrecognized escape sequences are left in the string
unchanged, i.e., *the backslash is left in the string*. (This behavior is
useful when debugging: if an escape sequence is mistyped, the resulting output
is more easily recognized as broken.) It is also important to note that the
-escape sequences marked as "(Unicode only)" in the table above fall into the
-category of unrecognized escapes for non-Unicode string literals.
-
-When an ``'r'`` or ``'R'`` prefix is present, a character following a backslash
-is included in the string without change, and *all backslashes are left in the
-string*. For example, the string literal ``r"\n"`` consists of two characters:
-a backslash and a lowercase ``'n'``. String quotes can be escaped with a
-backslash, but the backslash remains in the string; for example, ``r"\""`` is a
-valid string literal consisting of two characters: a backslash and a double
-quote; ``r"\"`` is not a valid string literal (even a raw string cannot end in
-an odd number of backslashes). Specifically, *a raw string cannot end in a
-single backslash* (since the backslash would escape the following quote
-character). Note also that a single backslash followed by a newline is
-interpreted as those two characters as part of the string, *not* as a line
-continuation.
-
-When an ``'r'`` or ``'R'`` prefix is used in conjunction with a ``'u'`` or
-``'U'`` prefix, then the ``\uXXXX`` and ``\UXXXXXXXX`` escape sequences are
-processed while *all other backslashes are left in the string*. For example,
-the string literal ``ur"\u0062\n"`` consists of three Unicode characters: 'LATIN
-SMALL LETTER B', 'REVERSE SOLIDUS', and 'LATIN SMALL LETTER N'. Backslashes can
-be escaped with a preceding backslash; however, both remain in the string. As a
-result, ``\uXXXX`` escape sequences are only recognized when there are an odd
-number of backslashes.
+escape sequences only recognized in string literals fall into the category of
+unrecognized escapes for bytes literals.
+
+When an ``'r'`` or ``'R'`` prefix is used in a string literal, then the
+``\uXXXX`` and ``\UXXXXXXXX`` escape sequences are processed while *all other
+backslashes are left in the string*. For example, the string literal
+``r"\u0062\n"`` consists of three Unicode characters: 'LATIN SMALL LETTER B',
+'REVERSE SOLIDUS', and 'LATIN SMALL LETTER N'. Backslashes can be escaped with a
+preceding backslash; however, both remain in the string. As a result,
+``\uXXXX`` escape sequences are only recognized when there is an odd number of
+backslashes.
+
+Even in a raw string, string quotes can be escaped with a backslash, but the
+backslash remains in the string; for example, ``r"\""`` is a valid string
+literal consisting of two characters: a backslash and a double quote; ``r"\"``
+is not a valid string literal (even a raw string cannot end in an odd number of
+backslashes). Specifically, *a raw string cannot end in a single backslash*
+(since the backslash would escape the following quote character). Note also
+that a single backslash followed by a newline is interpreted as those two
+characters as part of the string, *not* as a line continuation.
.. _string-catenation:
@@ -600,19 +567,9 @@ styles for each component (even mixing raw strings and triple quoted strings).
Numeric literals
----------------
-.. index::
- single: number
- single: numeric literal
- single: integer literal
- single: plain integer literal
- single: long integer literal
- single: floating point literal
- single: hexadecimal literal
- single: octal literal
- single: binary literal
- single: decimal literal
- single: imaginary literal
- single: complex; literal
+.. index:: number, numeric literal, integer literal, plain integer literal,
+ long integer literal, floating point literal, hexadecimal literal,
+ octal literal, binary literal, decimal literal, imaginary literal, complex literal
There are four types of numeric literals: plain integers, long integers,
floating point numbers, and imaginary numbers. There are no complex literals
@@ -633,18 +590,17 @@ Integer literals are described by the following lexical definitions:
.. productionlist::
integer: `decimalinteger` | `octinteger` | `hexinteger`
decimalinteger: `nonzerodigit` `digit`* | "0"+
+ nonzerodigit: "1"..."9"
+ digit: "0"..."9"
octinteger: "0" ("o" | "O") `octdigit`+
hexinteger: "0" ("x" | "X") `hexdigit`+
bininteger: "0" ("b" | "B") `bindigit`+
- nonzerodigit: "1"..."9"
octdigit: "0"..."7"
hexdigit: `digit` | "a"..."f" | "A"..."F"
- bindigit: "0"..."1"
+ bindigit: "0" | "1"
-Plain integer literals that are above the largest representable plain integer
-(e.g., 2147483647 when using 32-bit arithmetic) are accepted as if they were
-long integers instead. [#]_ There is no limit for long integer literals apart
-from what can be stored in available memory.
+There is no limit for the length of integer literals apart from what can be
+stored in available memory.
Note that leading zeros in a non-zero decimal number are not allowed. This is
for disambiguation with C-style octal literals, which Python used before version
@@ -732,7 +688,7 @@ The following tokens serve as delimiters in the grammar::
&= |= ^= >>= <<= **=
The period can also occur in floating-point and imaginary literals. A sequence
-of three periods has a special meaning as an ellipsis in slices. The second half
+of three periods has a special meaning as an ellipsis literal. The second half
of the list, the augmented assignment operators, serve lexically as delimiters,
but also perform an operation.
@@ -741,18 +697,7 @@ tokens or are otherwise significant to the lexical analyzer::
' " # \
-.. index:: single: ASCII@ASCII
-
The following printing ASCII characters are not used in Python. Their
occurrence outside string literals and comments is an unconditional error::
$ ?
-
-.. rubric:: Footnotes
-
-.. [#] In versions of Python prior to 2.4, octal and hexadecimal literals in the range
- just above the largest representable plain integer but below the largest
- unsigned 32-bit number (on a machine using 32-bit arithmetic), 4294967296, were
- taken as the negative plain integer obtained by subtracting 4294967296 from
- their unsigned value.
-
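Reviewer note: the integer-literal productions touched earlier in this diff (the `0o`, `0x` and `0b` prefixes, and `"0"+` as the only decimal form with leading zeros) can be exercised with a small sketch. `is_valid_literal` is a hypothetical helper that simply asks the compiler whether a piece of text parses; it assumes a current Python 3 interpreter.

```python
def is_valid_literal(text):
    """Return True if the compiler accepts the text as an expression."""
    try:
        compile(text, "<literal>", "eval")
    except SyntaxError:
        return False
    return True


# The three prefixed forms all denote the same value as decimal 255.
print(0o377 == 0xFF == 0b11111111 == 255)

# Integers have no fixed size limit.
print(2 ** 100)

# A C-style octal literal is rejected, while a run of zeros is allowed.
print(is_valid_literal("0377"), is_valid_literal("000"), is_valid_literal("0o377"))
```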