Diffstat (limited to 'Doc/ref/ref2.tex')
-rw-r--r-- Doc/ref/ref2.tex 372
1 files changed, 0 insertions, 372 deletions
diff --git a/Doc/ref/ref2.tex b/Doc/ref/ref2.tex
deleted file mode 100644
index b0939988..0000000
--- a/Doc/ref/ref2.tex
+++ /dev/null
@@ -1,372 +0,0 @@
-\chapter{Lexical analysis}
-
-A Python program is read by a {\em parser}. Input to the parser is a
-stream of {\em tokens}, generated by the {\em lexical analyzer}. This
-chapter describes how the lexical analyzer breaks a file into tokens.
-\index{lexical analysis}
-\index{parser}
-\index{token}
-
-\section{Line structure}
-
-A Python program is divided into a number of logical lines. The end of
-a logical line is represented by the token NEWLINE. Statements cannot
-cross logical line boundaries except where NEWLINE is allowed by the
-syntax (e.g. between statements in compound statements).
-\index{line structure}
-\index{logical line}
-\index{NEWLINE token}
-
-\subsection{Comments}
-
-A comment starts with a hash character (\verb@#@) that is not part of
-a string literal, and ends at the end of the physical line. A comment
-always signifies the end of the logical line. Comments are ignored by
-the syntax.
-\index{comment}
-\index{logical line}
-\index{physical line}
-\index{hash character}
-
-\subsection{Explicit line joining}
-
-Two or more physical lines may be joined into a single logical line using
-backslash characters (\verb/\/), as follows: when a physical line ends
-in a backslash that is not part of a string literal or comment, it is
-joined with the following line to form a single logical line, deleting the
-backslash and the following end-of-line character. For example:
-\index{physical line}
-\index{line joining}
-\index{line continuation}
-\index{backslash character}
-%
-\begin{verbatim}
-if 1900 < year < 2100 and 1 <= month <= 12 \
-   and 1 <= day <= 31 and 0 <= hour < 24 \
-   and 0 <= minute < 60 and 0 <= second < 60:   # Looks like a valid date
-        return 1
-\end{verbatim}
-
-A line ending in a backslash cannot carry a comment; a backslash does
-not continue a comment (but it does continue a string literal, see
-below).
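
The joining rule can be sketched in a few lines of Python. This is only
an illustration, not the actual lexical analyzer: the function name
\verb@join_explicit@ is hypothetical, the input is assumed to be a list
of physical lines with their end-of-line characters already removed, and
backslashes inside string literals or comments are not handled.

```python
def join_explicit(physical_lines):
    """Join physical lines that end in a backslash into logical lines.

    Assumption: no line ends in a backslash that belongs to a string
    literal or a comment; a real lexer has to check for that.
    """
    logical = []
    pending = ""  # text carried over from lines that ended in a backslash
    for line in physical_lines:
        if line.endswith("\\"):
            # Delete the backslash; the end-of-line character is already gone.
            pending += line[:-1]
        else:
            logical.append(pending + line)
            pending = ""
    if pending:
        # The input ended on a continued line; keep what was collected.
        logical.append(pending)
    return logical


print(join_explicit(["x = 1 +\\", "2", "y = 3"]))
```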
-
-\subsection{Implicit line joining}
-
-Expressions in parentheses, square brackets or curly braces can be
-split over more than one physical line without using backslashes.
-For example:
-
-\begin{verbatim}
-month_names = ['Januari', 'Februari', 'Maart',      # These are the
-               'April',   'Mei',      'Juni',       # Dutch names
-               'Juli',    'Augustus', 'September',  # for the months
-               'Oktober', 'November', 'December']   # of the year
-\end{verbatim}
-
-Implicitly continued lines can carry comments. The indentation of the
-continuation lines is not important. Blank continuation lines are
-allowed.
-
-\subsection{Blank lines}
-
-A logical line that contains only spaces, tabs, and possibly a
-comment, is ignored (i.e., no NEWLINE token is generated), except that
-during interactive input of statements, an entirely blank logical line
-terminates a multi-line statement.
-\index{blank line}
-
-\subsection{Indentation}
-
-Leading whitespace (spaces and tabs) at the beginning of a logical
-line is used to compute the indentation level of the line, which in
-turn is used to determine the grouping of statements.
-\index{indentation}
-\index{whitespace}
-\index{leading whitespace}
-\index{space}
-\index{tab}
-\index{grouping}
-\index{statement grouping}
-
-First, tabs are replaced (from left to right) by one to eight spaces
-such that the total number of characters up to and including the
-replacement is a multiple of eight (this is intended to be the same
-rule as used by {\UNIX}). The
-total number of spaces preceding the first non-blank character then
-determines the line's indentation. Indentation cannot be split over
-multiple physical lines using backslashes.
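
As a rough illustration of the tab rule, the following Python sketch
computes the indentation of a single line. The name
\verb@indentation_width@ and the \verb@tabsize@ parameter are
hypothetical helpers introduced here; only the multiple-of-eight rule
comes from the text above.

```python
def indentation_width(line, tabsize=8):
    """Return the indentation of `line`, counting tabs as described above."""
    column = 0
    for ch in line:
        if ch == " ":
            column += 1
        elif ch == "\t":
            # Advance to the next multiple of `tabsize` (one to eight spaces).
            column = (column // tabsize + 1) * tabsize
        else:
            break  # first non-blank character: indentation ends here
    return column


print(indentation_width("\t  x = 1"))
```

Note that two spaces followed by a tab give the same indentation as a
single tab, since the tab only fills up to the next multiple of eight.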
-
-The indentation levels of consecutive lines are used to generate
-INDENT and DEDENT tokens, using a stack, as follows.
-\index{INDENT token}
-\index{DEDENT token}
-
-Before the first line of the file is read, a single zero is pushed on
-the stack; this will never be popped off again. The numbers pushed on
-the stack will always be strictly increasing from bottom to top. At
-the beginning of each logical line, the line's indentation level is
-compared to the top of the stack. If it is equal, nothing happens.
-If it is larger, it is pushed on the stack, and one INDENT token is
-generated. If it is smaller, it {\em must} be one of the numbers
-occurring on the stack; all numbers on the stack that are larger are
-popped off, and for each number popped off a DEDENT token is
-generated. At the end of the file, a DEDENT token is generated for
-each number remaining on the stack that is larger than zero.
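
The stack procedure above can be sketched as follows. This is a minimal
model under stated assumptions, not the real lexical analyzer: the
function name \verb@indent_tokens@ is hypothetical, the input is assumed
to be the already computed indentation of each non-blank logical line,
and the placeholder string \verb@"LINE"@ stands for the tokens of the
line itself.

```python
def indent_tokens(levels):
    """Generate INDENT/DEDENT markers for a sequence of indentation levels."""
    stack = [0]  # the initial zero is never popped
    tokens = []
    for level in levels:
        if level > stack[-1]:
            stack.append(level)
            tokens.append("INDENT")
        elif level < stack[-1]:
            if level not in stack:
                # The level must be one of the numbers on the stack.
                raise ValueError("inconsistent dedent")
            while stack[-1] > level:
                stack.pop()
                tokens.append("DEDENT")
        tokens.append("LINE")  # placeholder for the line's own tokens
    # At the end of the file, close every level still open.
    while stack[-1] > 0:
        stack.pop()
        tokens.append("DEDENT")
    return tokens


print(indent_tokens([0, 4, 8, 4, 0]))
```

Because a level is pushed only when it exceeds the top of the stack, the
stack stays strictly increasing from bottom to top, as the text requires.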
-
-Here is an example of a correctly (though confusingly) indented piece
-of Python code:
-
-\begin{verbatim}
-def perm(l):
-        # Compute the list of all permutations of l
-
-    if len(l) <= 1:
-                  return [l]
-    r = []
-    for i in range(len(l)):
-             s = l[:i] + l[i+1:]
-             p = perm(s)
-             for x in p:
-              r.append(l[i:i+1] + x)
-    return r
-\end{verbatim}
-
-The following example shows various indentation errors:
-
-\begin{verbatim}
-    def perm(l):                        # error: first line indented
-    for i in range(len(l)):             # error: not indented
-        s = l[:i] + l[i+1:]
-            p = perm(l[:i] + l[i+1:])   # error: unexpected indent
-            for x in p:
-                    r.append(l[i:i+1] + x)
-                return r                # error: inconsistent dedent
-\end{verbatim}
-
-(Actually, the first three errors are detected by the parser; only the
-last error is found by the lexical analyzer --- the indentation of
-\verb@return r@ does not match a level popped off the stack.)
-
-\section{Other tokens}
-
-Besides NEWLINE, INDENT and DEDENT, the following categories of tokens
-exist: identifiers, keywords, literals, operators, and delimiters.
-Spaces and tabs are not tokens, but serve to delimit tokens. Where
-ambiguity exists, a token comprises the longest possible string that
-forms a legal token, when read from left to right.
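
The longest-match rule can be illustrated with the operator tokens
listed in the Operators section of this chapter. The sketch below is a
simplification that looks at operators only; the names
\verb@OPERATORS@ and \verb@next_operator@ are hypothetical.

```python
# Operator tokens as listed in the Operators section of this chapter.
OPERATORS = [
    "+", "-", "*", "/", "%",
    "<<", ">>", "&", "|", "^", "~",
    "<", "==", ">", "<=", "<>", "!=", ">=",
]


def next_operator(text, pos=0):
    """Return the longest operator starting at `pos`, or None if there is none."""
    candidates = [op for op in OPERATORS if text.startswith(op, pos)]
    return max(candidates, key=len) if candidates else None


print(next_operator("<<1"))
```

Here \verb@<<1@ yields the single token \verb@<<@ rather than two
\verb@<@ tokens, whereas \verb@< =@ yields \verb@<@ because the space
delimits the tokens.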
-
-\section{Identifiers}
-
-Identifiers (also referred to as names) are described by the following
-lexical definitions:
-\index{identifier}
-\index{name}
-
-\begin{verbatim}
-identifier: (letter|"_") (letter|digit|"_")*
-letter: lowercase | uppercase
-lowercase: "a"..."z"
-uppercase: "A"..."Z"
-digit: "0"..."9"
-\end{verbatim}
-
-Identifiers are unlimited in length. Case is significant.
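
The lexical definition above corresponds to a simple regular expression.
The following Python sketch checks a string against it; the names
\verb@IDENTIFIER@ and \verb@is_identifier@ are hypothetical, and the
pattern covers only the grammar given in this section (it does not
exclude keywords).

```python
import re

# identifier: (letter|"_") (letter|digit|"_")*
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(text):
    """Return True if `text` matches the identifier grammar given above."""
    return IDENTIFIER.fullmatch(text) is not None


for candidate in ["_x1", "Spam", "1x", ""]:
    print(repr(candidate), is_identifier(candidate))
```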
-
-\subsection{Keywords}
-
-The following identifiers are used as reserved words, or {\em
-keywords} of the language, and cannot be used as ordinary
-identifiers. They must be spelled exactly as written here:
-\index{keyword}
-\index{reserved word}
-
-\begin{verbatim}
-and       elif      global    not       try
-break     else      if        or        while
-class     except    import    pass
-continue  finally   in        print
-def       for       is        raise
-del       from      lambda    return
-
-% When adding keywords, pipe them through keywords.py for reformatting
-
-\section{Literals} \label{literals}
-
-Literals are notations for constant values of some built-in types.
-\index{literal}
-\index{constant}
-
-\subsection{String literals}
-
-String literals are described by the following lexical definitions:
-\index{string literal}
-
-\begin{verbatim}
-stringliteral: shortstring | longstring
-shortstring: "'" shortstringitem* "'" | '"' shortstringitem* '"'
-longstring: "'''" longstringitem* "'''" | '"""' longstringitem* '"""'
-shortstringitem: shortstringchar | escapeseq
-longstringitem: longstringchar | escapeseq
-shortstringchar: <any ASCII character except "\" or newline or the quote>
-longstringchar: <any ASCII character except "\">
-escapeseq: "\" <any ASCII character>
-\end{verbatim}
-\index{ASCII}
-
-In ``long strings'' (strings surrounded by sets of three quotes),
-unescaped newlines and quotes are allowed (and are retained), except
-that three unescaped quotes in a row terminate the string. (A
-``quote'' is the character used to open the string, i.e. either
-\verb/'/ or \verb/"/.)
-
-Escape sequences in strings are interpreted according to rules similar
-to those used by Standard C. The recognized escape sequences are:
-\index{physical line}
-\index{escape sequence}
-\index{Standard C}
-\index{C}
-
-\begin{center}
-\begin{tabular}{|l|l|}
-\hline
-\verb/\/{\em newline} & Ignored \\
-\verb/\\/ & Backslash (\verb/\/) \\
-\verb/\'/ & Single quote (\verb/'/) \\
-\verb/\"/ & Double quote (\verb/"/) \\
-\verb/\a/ & \ASCII{} Bell (BEL) \\
-\verb/\b/ & \ASCII{} Backspace (BS) \\
-%\verb/\E/ & \ASCII{} Escape (ESC) \\
-\verb/\f/ & \ASCII{} Formfeed (FF) \\
-\verb/\n/ & \ASCII{} Linefeed (LF) \\
-\verb/\r/ & \ASCII{} Carriage Return (CR) \\
-\verb/\t/ & \ASCII{} Horizontal Tab (TAB) \\
-\verb/\v/ & \ASCII{} Vertical Tab (VT) \\
-\verb/\/{\em ooo} & \ASCII{} character with octal value {\em ooo} \\
-\verb/\x/{\em xx...} & \ASCII{} character with hex value {\em xx...} \\
-\hline
-\end{tabular}
-\end{center}
-\index{ASCII}
-
-In strict compatibility with Standard C, up to three octal digits are
-accepted, but an unlimited number of hex digits is taken to be part of
-the hex escape (and then the lower 8 bits of the resulting hex number
-are used in all current implementations...).
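
The treatment of long hex escapes described above amounts to masking
the value to its lower 8 bits. The sketch below models only that
arithmetic; \verb@hex_escape_value@ is a hypothetical helper, and it
illustrates the behavior described in this section rather than that of
any particular interpreter.

```python
def hex_escape_value(hex_digits):
    """Character code of a hex escape: the lower 8 bits of the hex number."""
    return int(hex_digits, 16) & 0xFF


# Under this rule an escape with digits 141 selects the same
# character code as one with digits 41.
print(hex_escape_value("141") == hex_escape_value("41"))
```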
-
-All unrecognized escape sequences are left in the string unchanged,
-i.e., {\em the backslash is left in the string.} (This behavior is
-useful when debugging: if an escape sequence is mistyped, the
-resulting output is more easily recognized as broken. It also helps a
-great deal for string literals used as regular expressions or
-otherwise passed to other modules that do their own escape handling.)
-\index{unrecognized escape sequence}
-
-\subsection{Numeric literals}
-
-There are three types of numeric literals: plain integers, long
-integers, and floating point numbers.
-\index{number}
-\index{numeric literal}
-\index{integer literal}
-\index{plain integer literal}
-\index{long integer literal}
-\index{floating point literal}
-\index{hexadecimal literal}
-\index{octal literal}
-\index{decimal literal}
-
-Integer and long integer literals are described by the following
-lexical definitions:
-
-\begin{verbatim}
-longinteger: integer ("l"|"L")
-integer: decimalinteger | octinteger | hexinteger
-decimalinteger: nonzerodigit digit* | "0"
-octinteger: "0" octdigit+
-hexinteger: "0" ("x"|"X") hexdigit+
-
-nonzerodigit: "1"..."9"
-octdigit: "0"..."7"
-hexdigit: digit|"a"..."f"|"A"..."F"
-\end{verbatim}
-
-Although both lower case `l' and upper case `L' are allowed as a suffix
-for long integers, it is strongly recommended to always use `L', since
-the letter `l' looks too much like the digit `1'.
-
-Plain integer decimal literals must be at most 2147483647 (i.e., the
-largest positive integer, using 32-bit arithmetic). Plain octal and
-hexadecimal literals may be as large as 4294967295, but values larger
-than 2147483647 are converted to a negative value by subtracting
-4294967296. There is no limit for long integer literals apart from
-what can be stored in available memory.
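
The range rules for plain integers can be worked through in Python. The
function \verb@plain_integer_value@ below is a hypothetical sketch: it
assumes its argument already matches the \verb@integer@ grammar above
and models only the 32-bit rules stated in this section.

```python
def plain_integer_value(literal):
    """Value of a plain integer literal under the 32-bit rules above."""
    if literal[:2].lower() == "0x":
        value = int(literal[2:], 16)   # hexinteger
    elif len(literal) > 1 and literal[0] == "0":
        value = int(literal[1:], 8)    # octinteger
    else:
        value = int(literal, 10)       # decimalinteger
        if value > 2147483647:
            raise OverflowError("decimal literal too large")
        return value
    if value > 4294967295:
        raise OverflowError("octal or hexadecimal literal too large")
    if value > 2147483647:
        value -= 4294967296            # converted to a negative value
    return value


print(plain_integer_value("0x80000000"))
```

For example, \verb@0x80000000@ is 2147483648, which exceeds 2147483647,
so it becomes $2147483648 - 4294967296 = -2147483648$.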
-
-Some examples of plain and long integer literals:
-
-\begin{verbatim}
-7     2147483647                        0177    0x80000000
-3L    79228162514264337593543950336L    0377L   0x100000000L
-\end{verbatim}
-
-Floating point literals are described by the following lexical
-definitions:
-
-\begin{verbatim}
-floatnumber: pointfloat | exponentfloat
-pointfloat: [intpart] fraction | intpart "."
-exponentfloat: (intpart | pointfloat) exponent
-intpart: digit+
-fraction: "." digit+
-exponent: ("e"|"E") ["+"|"-"] digit+
-\end{verbatim}
-
-The allowed range of floating point literals is
-implementation-dependent.
-
-Some examples of floating point literals:
-
-\begin{verbatim}
-3.14    10.    .001    1e100    3.14e-10
-\end{verbatim}
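
The floating point grammar can be transcribed rule by rule into a
regular expression, which makes it easy to check the examples. This is
a sketch only; the names \verb@FLOATNUMBER@ and \verb@is_float_literal@
are hypothetical.

```python
import re

INTPART = r"[0-9]+"                    # intpart: digit+
FRACTION = r"\.[0-9]+"                 # fraction: "." digit+
EXPONENT = r"[eE][+-]?[0-9]+"          # exponent: ("e"|"E") ["+"|"-"] digit+
# pointfloat: [intpart] fraction | intpart "."
POINTFLOAT = rf"(?:(?:{INTPART})?{FRACTION}|{INTPART}\.)"
# exponentfloat: (intpart | pointfloat) exponent
EXPONENTFLOAT = rf"(?:{INTPART}|{POINTFLOAT}){EXPONENT}"
# floatnumber: pointfloat | exponentfloat
FLOATNUMBER = re.compile(rf"(?:{EXPONENTFLOAT}|{POINTFLOAT})")


def is_float_literal(text):
    """Return True if `text` matches the floatnumber grammar given above."""
    return FLOATNUMBER.fullmatch(text) is not None


for example in ["3.14", "10.", ".001", "1e100", "3.14e-10", "7"]:
    print(example, is_float_literal(example))
```

The last input, \verb@7@, is rejected because it has neither a point nor
an exponent; it is a plain integer literal.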
-
-Note that numeric literals do not include a sign; a phrase like
-\verb@-1@ is actually an expression composed of the operator
-\verb@-@ and the literal \verb@1@.
-
-\section{Operators}
-
-The following tokens are operators:
-\index{operators}
-
-\begin{verbatim}
-+       -       *       /       %
-<<      >>      &       |       ^       ~
-<       ==      >       <=      <>      !=      >=
-\end{verbatim}
-
-The comparison operators \verb@<>@ and \verb@!=@ are alternate
-spellings of the same operator.
-
-\section{Delimiters}
-
-The following tokens serve as delimiters or otherwise have a special
-meaning:
-\index{delimiters}
-
-\begin{verbatim}
-(       )       [       ]       {       }
-,       :       .       "       `       '
-=       ;
-\end{verbatim}
-
-The following printing \ASCII{} characters are not used in Python. Their
-occurrence outside string literals and comments is an unconditional
-error:
-\index{ASCII}
-
-\begin{verbatim}
-@ $ ?
-\end{verbatim}
-
-They may be used by future versions of the language though!