First round of corrections (lexer only).

author: Guido van Rossum <guido@python.org> 1991-11-25 17:26:57 (GMT)
committer: Guido van Rossum <guido@python.org> 1991-11-25 17:26:57 (GMT)
commit: 4fc43bc377a0e9d0642af32d83459f5c71d8e733
tree: 0ac5e2a3d2dbe9c1018967f29f921df6fe9edd0a /Doc
parent: 01ebbb80ab9a4cdbc8acaa646b2f7a1b234215fc
2 files changed, 278 insertions, 260 deletions
diff --git a/Doc/ref.tex b/Doc/ref.tex
index 6af7535..a2eb381 100644
--- a/Doc/ref.tex
+++ b/Doc/ref.tex
@@ -42,9 +42,8 @@ and MS-DOS.
 This reference manual describes the syntax and ``core semantics'' of
 the language.  It is terse, but exact and complete.  The semantics of
 non-essential built-in object types and of the built-in functions and
-modules are described in the {\em Library Reference} document.  For an
-informal introduction to the language, see the {\em Tutorial}
-document.
+modules are described in the {\em Python Library Reference}.  For an
+informal introduction to the language, see the {\em Python Tutorial}.
 
 \end{abstract}
 
@@ -63,132 +62,119 @@ It is not intended as a tutorial.
 
 \chapter{Lexical analysis}
 
-A Python program is read by a {\em parser}.
-Input to the parser is a stream of {\em tokens}, generated
-by the {\em lexical analyzer}.
+A Python program is read by a {\em parser}.  Input to the parser is a
+stream of {\em tokens}, generated by the {\em lexical analyzer}.  This
+chapter describes how the lexical analyzer breaks a file into tokens.
 
 \section{Line structure}
 
-A Python program is divided in a number of logical lines.
-Statements may not straddle logical line boundaries except where
-explicitly allowed by the syntax.
-To this purpose, the end of a logical line
-is represented by the token NEWLINE.
+A Python program is divided into a number of logical lines.  Statements
+do not straddle logical line boundaries except where explicitly
+indicated by the syntax (i.e., for compound statements).  For this
+purpose, the end of a logical line is represented by the token
+NEWLINE.
 
 \subsection{Comments}
 
-A comment starts with a hash character (\verb/#/) and ends at the end
-of the physical line.  Comments are ignored by the syntax.
-A hash character in a string literal does not start a comment.
+A comment starts with a hash character (\verb\#\) that is not part of
+a string literal, and ends at the end of the physical line.  Comments
+are ignored by the syntax.
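The comment rule can be sketched in Python 3 as a left-to-right scan that tracks whether it is currently inside a string literal. This is an illustrative sketch, not the interpreter's tokenizer. It assumes, as the string literal grammar of this revision does, that literals are single-quoted and that a backslash inside a literal escapes the next character. The function name strip_comment is hypothetical.

```python
def strip_comment(line: str) -> str:
    """Return a physical line with any trailing comment removed.

    Assumes single-quoted string literals in which a backslash escapes
    the following character, as in this revision's string grammar.
    """
    in_string = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_string:
            if ch == "\\":
                i += 2              # skip the backslash and the escaped char
                continue
            if ch == "'":
                in_string = False   # closing quote
        elif ch == "'":
            in_string = True        # opening quote
        elif ch == "#":
            return line[:i]         # comment runs to end of physical line
        i += 1
    return line
```

For example, a hash inside 'a#b' is kept, while the one after the closing quote starts a comment.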
 
 \subsection{Line joining}
 
-Physical lines may be joined into logical lines using backslash
-characters (\verb/\/), as follows.
-If a physical line ends in a backslash that is not part of a string
-literal or comment, it is joined with
-the following forming a single logical line, deleting the backslash
-and the following end-of-line character.  More than two physical
-lines may be joined together in this way.
+Two or more physical lines may be joined into logical lines using
+backslash characters (\verb/\/), as follows: when a physical line ends
+in a backslash that is not part of a string literal or comment, it is
+joined with the following one, forming a single logical line, deleting
+the backslash and the following end-of-line character.
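A minimal Python 3 sketch of the joining rule follows. It uses the same assumption about single-quoted string literals as above, and it also assumes that each input line has already had its end-of-line character removed. The helper names ends_with_continuation and join_lines are hypothetical.

```python
def ends_with_continuation(line: str) -> bool:
    """True if the line ends in a backslash outside strings and comments."""
    in_string = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_string:
            if ch == "\\":
                i += 2              # escaped character inside a literal
                continue
            if ch == "'":
                in_string = False
        elif ch == "'":
            in_string = True
        elif ch == "#":
            return False            # rest of the line is a comment
        elif ch == "\\" and i == len(line) - 1:
            return True             # final backslash in ordinary code
        i += 1
    return False


def join_lines(physical_lines):
    """Yield logical lines from physical lines (end-of-line chars removed)."""
    pending = ""
    for line in physical_lines:
        if ends_with_continuation(line):
            pending += line[:-1]    # delete the backslash, keep accumulating
        else:
            yield pending + line
            pending = ""
    if pending:                     # file ended right after a backslash
        yield pending
```

For example, the physical lines "x = 1 + \" and "    2" join into the single logical line "x = 1 +     2". A backslash at the end of a comment does not cause a join.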
 
 \subsection{Blank lines}
 
-A physical line that is not the continuation of the previous line
-and contains only spaces, tabs and possibly a comment, is ignored
-(i.e., no NEWLINE token is generated),
-except that during interactive input of statements, an empty
-physical line terminates a multi-line statement.
+A logical line that contains only spaces, tabs, and possibly a
+comment is ignored (i.e., no NEWLINE token is generated), except that
+during interactive input of statements, an entirely blank logical line
+terminates a multi-line statement.
 
 \subsection{Indentation}
 
-Spaces and tabs at the beginning of a line are used to compute
+Spaces and tabs at the beginning of a logical line are used to compute
 the indentation level of the line, which in turn is used to determine
 the grouping of statements.
 
-First, each tab is replaced by one to eight spaces such that the column number
-of the next character is a multiple of eight (counting from zero).
-The column number of the first non-space character then defines the
-line's indentation.
-Indentation cannot be split over multiple physical lines using
-backslashes.
+First, each tab is replaced by one to eight spaces such that the total
+number of spaces up to that point is a multiple of eight.  The total
+number of spaces preceding the first non-blank character then
+determines the line's indentation.  Indentation cannot be split over
+multiple physical lines using backslashes.
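The tab rule rounds the running space count up to the next multiple of eight. A short Python 3 sketch of this follows; the function name indentation is hypothetical.

```python
def indentation(line: str) -> int:
    """Compute a line's indentation level under the tab rule above.

    Each space adds one; each tab pads the count up to the next
    multiple of eight (one to eight spaces). Counting stops at the
    first non-blank character.
    """
    count = 0
    for ch in line:
        if ch == " ":
            count += 1
        elif ch == "\t":
            count = (count // 8 + 1) * 8   # pad to the next multiple of 8
        else:
            break
    return count
```

For instance, three spaces followed by a tab give an indentation of 8, because the tab is replaced by five spaces.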
 
 The indentation levels of consecutive lines are used to generate
 INDENT and DEDENT tokens, using a stack, as follows.
 
 Before the first line of the file is read, a single zero is pushed on
-the stack; this will never be popped off again.  The numbers pushed
-on the stack will always be strictly increasing from bottom to top.
-At the beginning of each logical line, the line's indentation level
-is compared to the top of the stack.
-If it is equal, nothing happens.
-If it larger, it is pushed on the stack, and one INDENT token is generated.
-If it is smaller, it {\em must} be one of the numbers occurring on the
-stack; all numbers on the stack that are larger are popped off,
-and for each number popped off a DEDENT token is generated.
-At the end of the file, a DEDENT token is generated for each number
-remaining on the stack that is larger than zero.
+the stack; this will never be popped off again.  The numbers pushed on
+the stack will always be strictly increasing from bottom to top.  At
+the beginning of each logical line, the line's indentation level is
+compared to the top of the stack.  If it is equal, nothing happens.
+If it is larger, it is pushed on the stack, and one INDENT token is
+generated.  If it is smaller, it {\em must} be one of the numbers
+occurring on the stack; all numbers on the stack that are larger are
+popped off, and for each number popped off a DEDENT token is
+generated.  At the end of the file, a DEDENT token is generated for
+each number remaining on the stack that is larger than zero.
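The stack algorithm translates almost line for line into code. This Python 3 sketch uses a hypothetical function name. It takes the indentation level of each non-blank logical line and returns token names. The "LINE" entries are placeholders for each line's ordinary tokens. Raising IndentationError on an inconsistent dedent is an assumption about how the error would be reported.

```python
def indent_tokens(levels):
    """Turn per-line indentation levels into INDENT/DEDENT token names."""
    stack = [0]                     # the initial zero is never popped
    tokens = []
    for level in levels:
        if level > stack[-1]:
            stack.append(level)     # deeper: push and emit one INDENT
            tokens.append("INDENT")
        elif level < stack[-1]:
            if level not in stack:  # must match an enclosing level
                raise IndentationError(f"dedent to {level} matches no outer level")
            while stack[-1] > level:
                stack.pop()         # one DEDENT per popped level
                tokens.append("DEDENT")
        tokens.append("LINE")       # placeholder for the line's own tokens
    while stack[-1] > 0:            # end of file: close remaining levels
        stack.pop()
        tokens.append("DEDENT")
    return tokens
```

For levels 0, 4, 8, 4, 0 this produces exactly two INDENT and two DEDENT tokens, interleaved with the lines they open and close.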
 
 \section{Other tokens}
 
 Besides NEWLINE, INDENT and DEDENT, the following categories of tokens
 exist: identifiers, keywords, literals, operators, and delimiters.
-Spaces and tabs are not tokens, but serve to delimit tokens.
-Where ambiguity exists, a token comprises the longest possible
-string that forms a legal token, when reading from left to right.
+Spaces and tabs are not tokens, but serve to delimit tokens.  Where
+ambiguity exists, a token comprises the longest possible string that
+forms a legal token, when read from left to right.
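The longest-match rule can be illustrated using the comparison operators mentioned later in this manual (<, >, =, ==, <=, >=, <>, !=). That list is only an illustrative subset, not the complete operator table. The helper names are hypothetical.

```python
# Illustrative subset only: comparison operators mentioned elsewhere in
# this manual, not the full operator table.
OPERATORS = ["<", ">", "=", "==", "<=", ">=", "<>", "!="]


def match_operator(text: str, pos: int):
    """Return the longest operator starting at text[pos], or None."""
    best = None
    for op in OPERATORS:
        if text.startswith(op, pos) and (best is None or len(op) > len(best)):
            best = op
    return best


def split_operators(text: str):
    """Split an unbroken run of operator characters, longest match first."""
    pos, out = 0, []
    while pos < len(text):
        op = match_operator(text, pos)
        if op is None:
            raise ValueError(f"no operator at position {pos}")
        out.append(op)
        pos += len(op)
    return out
```

Under this rule, "<==" reads as "<=" followed by "=". Because spaces delimit tokens, "< =" would instead be two separate tokens. The sketch only handles runs without spaces.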
 
 Tokens are described using an extended regular expression notation.
 This is similar to the extended BNF notation used later, except that
-the notation <...> is used to give an informal description of a character,
-and that spaces and tabs are not to be ignored.
+the notation \verb\<...>\ is used to give an informal description of a
+character, and that spaces and tabs are not to be ignored.
 
 \section{Identifiers}
 
 Identifiers are described by the following regular expressions:
 
 \begin{verbatim}
-identifier:     (letter|'_') (letter|digit|'_')*
+identifier:     (letter|"_") (letter|digit|"_")*
 letter:         lowercase | uppercase
-lowercase:      'a'|'b'|...|'z'
-uppercase:      'A'|'B'|...|'Z'
-digit:          '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'
+lowercase:      "a"|"b"|...|"z"
+uppercase:      "A"|"B"|...|"Z"
+digit:          "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"
 \end{verbatim}
 
-Identifiers are unlimited in length.
-Upper and lower case letters are different.
+Identifiers are unlimited in length.  Case is significant.
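The identifier rules above correspond to a single regular expression over ASCII letters and digits. The names IDENTIFIER and is_identifier in this Python 3 sketch are hypothetical. Keywords also match this pattern; they are excluded by the separate keyword rule, not by the expression itself.

```python
import re

# (letter|"_") (letter|digit|"_")*  with ASCII letters and digits only.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(s: str) -> bool:
    """True if the whole string is an identifier per the rules above."""
    return IDENTIFIER.fullmatch(s) is not None
```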
 
 \section{Keywords}
 
-The following tokens are used as reserved words,
-or keywords of the language,
-and may not be used as ordinary identifiers.
-They must be spelled exactly as written here:
-
-{\tt
-	and
-	break
-	class
-	continue
-	def
-	del
-	elif
-	else
-	except
-	finally
-	for
-	from
-	if
-	import
-	in
-	is
-	not
-	or
-	pass
-	print
-	raise
-	return
-	try
-	while
-}
+The following identifiers are used as reserved words, or {\em
+keywords} of the language, and may not be used as ordinary
+identifiers.  They must be spelled exactly as written here:
+
+\begin{verbatim}
+and        del        for        is         raise
+break      elif       from       not        return
+class      else       if         or         try
+continue   except     import     pass       while
+def        finally    in         print
+\end{verbatim}
+
+%	import string
+%	l = []
+%	try:
+%		while 1:
+%			l = l + string.split(raw_input())
+%	except EOFError:
+%		pass
+%	l.sort()
+%	for i in range((len(l)+4)/5):
+%		for j in range(i, len(l), 5):
+%			print string.ljust(l[j], 10),
+%		print
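The commented-out script appears to generate the five-column keyword table above from a whitespace-separated word list. It relies on features absent from current Python: the print statement, the string.split and string.ljust functions, and raw_input. Its stride of 5 happens to equal the number of rows for this 24-word list, but it would omit words for some other list lengths. A Python 3 rendition using the row count as the stride might look like this. It reproduces the same table, and the joining space mimics the separator that a comma-terminated print inserted. The name column_layout is hypothetical.

```python
def column_layout(words, ncols=5, width=10):
    """Lay out sorted words column by column in ncols columns."""
    words = sorted(words)
    nrows = -(-len(words) // ncols)          # ceiling division
    lines = []
    for i in range(nrows):
        # Row i holds every nrows-th word, starting at word i.
        cells = [w.ljust(width) for w in words[i::nrows]]
        lines.append(" ".join(cells).rstrip())
    return lines
```

To read words from standard input as the original does, import sys and print "\n".join(column_layout(sys.stdin.read().split())).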
 
 \section{Literals}
 
@@ -197,24 +183,47 @@ They must be spelled exactly as written here:
 String literals are described by the following regular expressions:
 
 \begin{verbatim}
-stringliteral:  '\'' stringitem* '\''
+stringliteral:  "'" stringitem* "'"
 stringitem:     stringchar | escapeseq
-stringchar:     <any character except newline or '\\' or '\''>
-escapeseq:      '\\' <any character except newline>
-\end{verbatim}
-
-String literals cannot span physical line boundaries.
-Escape sequences in strings are actually interpreted according to almost the
-same rules as used by Standard C
-(XXX which should be made explicit here),
-except that \verb/\E/ is equivalent to \verb/\033/,
-\verb/\"/ is not recognized,
-newline characters cannot be escaped, and
-{\em all unrecognized escape sequences are left in the string unchanged}.
-(The latter rule is useful when debugging: if an escape sequence is
-mistyped, the resulting output is more easily recognized as broken.
-It also helps somewhat for string literals used as regular expressions
-or otherwise passed to other modules that do their own escape handling.)
+stringchar:     <any character except newline or "\" or "'">
+escapeseq:      "\" <any character except newline>
+\end{verbatim}
+
+String literals cannot span physical line boundaries.  Escape
+sequences in strings are actually interpreted according to rules
+similar to those used by Standard C.  The recognized escape sequences
+are:
+
+\begin{center}
+\begin{tabular}{|l|l|}
+\hline
+\verb/\\/	& Backslash (\verb/\/) \\
+\verb/\'/	& Single quote (\verb/'/) \\
+\verb/\a/	& ASCII Bell (BEL) \\
+\verb/\b/	& ASCII Backspace (BS) \\
+\verb/\E/	& ASCII Escape (ESC) \\
+\verb/\f/	& ASCII Formfeed (FF) \\
+\verb/\n/	& ASCII Linefeed (LF) \\
+\verb/\r/	& ASCII Carriage Return (CR) \\
+\verb/\t/	& ASCII Horizontal Tab (TAB) \\
+\verb/\v/	& ASCII Vertical Tab (VT) \\
+\verb/\/{\em ooo}	& ASCII character with octal value {\em ooo} \\
+\verb/\x/{\em xx...}	& ASCII character with hex value {\em xx} \\
+\hline
+\end{tabular}
+\end{center}
+
+For compatibility with Standard C, up to three octal digits are
+accepted, but an unlimited number of hex digits is taken to be part of
+the hex escape (and then the lower 8 bits of the resulting hex number
+are used).
+
+All unrecognized escape sequences are left in the string {\em
+unchanged}, i.e., the backslash is left in the string.  (This rule is
+useful when debugging: if an escape sequence is mistyped, the
+resulting output is more easily recognized as broken.  It also helps
+somewhat for string literals used as regular expressions or otherwise
+passed to other modules that do their own escape handling.)
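The escape table and the two rules after it can be combined into a small decoder for the text between a literal's quotes. This Python 3 sketch uses a hypothetical name. Two details are assumptions not stated above. First, an out-of-range octal escape such as \777 is reduced to its lower 8 bits, as hex escapes are. Second, \x not followed by any hex digit is treated as unrecognized and left unchanged.

```python
# Single-character escapes from the table above.
SIMPLE_ESCAPES = {
    "\\": "\\", "'": "'", "a": "\a", "b": "\b", "E": "\x1b",
    "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v",
}
OCTAL_DIGITS = "01234567"
HEX_DIGITS = "0123456789abcdefABCDEF"


def decode_escapes(body: str) -> str:
    """Decode the escape sequences in the text between a literal's quotes."""
    out = []
    i, n = 0, len(body)
    while i < n:
        if body[i] != "\\" or i + 1 == n:
            out.append(body[i])
            i += 1
            continue
        nxt = body[i + 1]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 2
        elif nxt in OCTAL_DIGITS:
            j = i + 1
            while j < n and j < i + 4 and body[j] in OCTAL_DIGITS:
                j += 1              # at most three octal digits
            # Assumption: keep only the lower 8 bits, as for hex escapes.
            out.append(chr(int(body[i + 1:j], 8) & 0xFF))
            i = j
        elif nxt == "x" and i + 2 < n and body[i + 2] in HEX_DIGITS:
            j = i + 2
            while j < n and body[j] in HEX_DIGITS:
                j += 1              # any number of hex digits
            out.append(chr(int(body[i + 2:j], 16) & 0xFF))  # lower 8 bits
            i = j
        else:
            out.append("\\" + nxt)  # unrecognized: backslash is kept
            i += 2
    return "".join(out)
```

For example, \x141 yields the character with value 0x41 (the letter A), because only the lower 8 bits of 0x141 are kept. \q stays as a backslash followed by q.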
 
 \subsection{Numeric literals}
 
@@ -224,24 +233,24 @@ and floating point numbers.
 Integers and long integers are described by the following regular expressions:
 
 \begin{verbatim}
-longinteger:    integer ('l'|'L')
+longinteger:    integer ("l"|"L")
 integer:        decimalinteger | octinteger | hexinteger
-decimalinteger: nonzerodigit digit* | '0'
-octinteger:     '0' octdigit+
-hexinteger:     '0' ('x'|'X') hexdigit+
+decimalinteger: nonzerodigit digit* | "0"
+octinteger:     "0" octdigit+
+hexinteger:     "0" ("x"|"X") hexdigit+
 
-nonzerodigit:   '1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'
-octdigit:       '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'
-hexdigit:        digit|'a'|'b'|'c'|'d'|'e'|'f'|'A'|'B'|'C'|'D'|'E'|'F'
+nonzerodigit:   "1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"
+octdigit:       "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"
+hexdigit:        digit|"a"|"b"|"c"|"d"|"e"|"f"|"A"|"B"|"C"|"D"|"E"|"F"
 \end{verbatim}
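The integer rules map onto a regular expression with one alternative per form. This Python 3 sketch uses hypothetical names and returns a (value, is_long) pair; Python's int() does the numeric conversion. Read literally, the rules reject 08 and 09, because octinteger admits only octal digits after the leading zero.

```python
import re

# One alternative per rule: hexinteger, octinteger, decimalinteger.
INTEGER = re.compile(
    r"(?P<hex>0[xX][0-9a-fA-F]+)"
    r"|(?P<oct>0[0-7]+)"
    r"|(?P<dec>[1-9][0-9]*|0)"
)


def parse_integer_literal(text: str):
    """Return (value, is_long) for a complete integer literal, or raise."""
    is_long = text[-1:] in ("l", "L")       # longinteger suffix
    body = text[:-1] if is_long else text
    m = INTEGER.fullmatch(body)
    if m is None:
        raise ValueError(f"not an integer literal: {text!r}")
    if m.group("hex"):
        return int(body, 16), is_long
    if m.group("oct"):
        return int(body, 8), is_long
    return int(body), is_long
```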
 
 Floating point numbers are described by the following regular expressions:
 
 \begin{verbatim}
-floatnumber:    [intpart] fraction [exponent] | intpart ['.'] exponent
+floatnumber:    [intpart] fraction [exponent] | intpart ["."] exponent
 intpart:        digit+
-fraction:       '.' digit+
-exponent:       ('e'|'E') ['+'|'-'] digit+
+fraction:       "." digit+
+exponent:       ("e"|"E") ["+"|"-"] digit+
 \end{verbatim}
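The floating point rules can be transcribed the same way. Read literally, they require at least one digit after a decimal point unless an exponent follows. So 1.e5 is accepted but a bare 1. is not. This Python 3 sketch (hypothetical names) follows the rules as written.

```python
import re

FLOATNUMBER = re.compile(
    r"(?:[0-9]+)?\.[0-9]+(?:[eE][+-]?[0-9]+)?"   # [intpart] fraction [exponent]
    r"|[0-9]+\.?[eE][+-]?[0-9]+"                 # intpart ["."] exponent
)


def is_float_literal(text: str) -> bool:
    """True if the whole string is a floatnumber per the rules above."""
    return FLOATNUMBER.fullmatch(text) is not None
```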
 
 \section{Operators}
@@ -292,15 +301,15 @@ conditions.  Conditions are a superset of expressions, and a condition
 may be used where an expression is required by enclosing it in
 parentheses.  The only place where an unparenthesized condition
 is not allowed is on the right-hand side of the assignment operator,
-because this operator is the same token (\verb/'='/) as used for
+because this operator is the same token (\verb\=\) as used for
 comparisons.
 
 The comma plays a somewhat special role in Python's syntax.
 It is an operator with a lower precedence than all others, but
 occasionally serves other purposes as well (e.g., it has special
 semantics in print statements).  When a comma is accepted by the
-syntax, one of the syntactic categories \verb/expression_list/
-or \verb/condition_list/ is always used.
+syntax, one of the syntactic categories \verb\expression_list\
+or \verb\condition_list\ is always used.
 
 When (one alternative of) a syntax rule has the form
 
@@ -308,8 +317,8 @@ When (one alternative of) a syntax rule has the form
 name:           othername
 \end{verbatim}
 
-and no semantics are given, the semantics of this form of \verb/name/
-are the same as for \verb/othername/.
+and no semantics are given, the semantics of this form of \verb\name\
+are the same as for \verb\othername\.
 
 \section{Arithmetic conversions}
 
@@ -414,11 +423,11 @@ key value prevails.
 A string conversion evaluates the contained condition list and converts the
 resulting object into a string according to rules specific to its type.
 
-If the object is a string, a number, \verb/None/, or a tuple, list or
+If the object is a string, a number, \verb\None\, or a tuple, list or
 dictionary containing only objects whose type is in this list,
 the resulting
 string is a valid Python expression which can be passed to the
-built-in function \verb/eval()/ to yield an expression with the
+built-in function \verb\eval()\ to yield an expression with the
 same value (or an approximation, if floating point numbers are
 involved).
 
@@ -459,11 +468,11 @@ Their syntax is:
 factor:         primary | '-' factor | '+' factor | '~' factor
 \end{verbatim}
 
-The unary \verb/'-'/ operator yields the negative of its numeric argument.
+The unary \verb\-\ operator yields the negative of its numeric argument.
 
-The unary \verb/'+'/ operator yields its numeric argument unchanged.
+The unary \verb\+\ operator yields its numeric argument unchanged.
 
-The unary \verb/'~'/ operator yields the bit-wise negation of its
+The unary \verb\~\ operator yields the bit-wise negation of its
 integral numerical argument.
 
 In all three cases, if the argument does not have the proper type,
@@ -477,7 +486,7 @@ Terms represent the most tightly binding binary operators:
 term:           factor | term '*' factor | term '/' factor | term '%' factor
 \end{verbatim}
 
-The \verb/'*'/ operator yields the product of its arguments.
+The \verb\*\ operator yields the product of its arguments.
 The arguments must either both be numbers, or one argument must be
 a (short) integer and the other must be a string.
 In the former case, the numbers are converted to a common type
@@ -572,7 +581,7 @@ it is optional in all other cases (a single expression without
 a trailing comma doesn't create a tuple, but rather yields the
 value of that expression).
 
-To create an empty tuple, use an empty pair of parentheses: \verb/()/.
+To create an empty tuple, use an empty pair of parentheses: \verb\()\.
 
 \section{Comparisons}
 
@@ -597,8 +606,8 @@ Note that $e_0 op_1 e_1 op_2 e_2$ does not imply any kind of comparison
 between $e_0$ and $e_2$, e.g., $x < y > z$ is perfectly legal.
 
 For the benefit of C programmers,
-the comparison operators \verb/=/ and \verb/==/ are equivalent,
-and so are \verb/<>/ and \verb/!=/.
+the comparison operators \verb\=\ and \verb\==\ are equivalent,
+and so are \verb\<>\ and \verb\!=\.
 Use of the C variants is discouraged.
 
 The operators {\tt '<', '>', '=', '>=', '<='}, and {\tt '<>'} compare
@@ -610,7 +619,7 @@ the value \verb\None\ compares smaller than the values of any other type.
 
 (This unusual
 definition of comparison is done to simplify the definition of
-operations like sorting and the \verb/in/ and \verb/not in/ operators.)
+operations like sorting and the \verb\in\ and \verb\not in\ operators.)
 
 Comparison of objects of the same type depends on the type:
 
@@ -869,12 +878,12 @@ A space is written before each object is (converted and) written,
 unless the output system believes it is positioned at the beginning
 of a line.  This is the case: (1) when no characters have been written
 to standard output; or (2) when the last character written to
-standard output is \verb/'\n'/;
+standard output is \verb/\n/;
 or (3) when the last I/O operation
 on standard output was not a \verb\print\ statement.
 
 Finally,
-a \verb/'\n'/ character is written at the end,
+a \verb/\n/ character is written at the end,
 unless the \verb\print\ statement ends with a comma.
 This is the only action if the statement contains just the keyword
 \verb\print\.
diff --git a/Doc/ref/ref.tex b/Doc/ref/ref.tex
index 6af7535..a2eb381 100644
--- a/Doc/ref/ref.tex
+++ b/Doc/ref/ref.tex
@@ -42,9 +42,8 @@ and MS-DOS.
 This reference manual describes the syntax and ``core semantics'' of
 the language.  It is terse, but exact and complete.  The semantics of
 non-essential built-in object types and of the built-in functions and
-modules are described in the {\em Library Reference} document.  For an
-informal introduction to the language, see the {\em Tutorial}
-document.
+modules are described in the {\em Python Library Reference}.  For an
+informal introduction to the language, see the {\em Python Tutorial}.
 
 \end{abstract}
 
@@ -63,132 +62,119 @@ It is not intended as a tutorial.
 
 \chapter{Lexical analysis}
 
-A Python program is read by a {\em parser}.
-Input to the parser is a stream of {\em tokens}, generated
-by the {\em lexical analyzer}.
+A Python program is read by a {\em parser}.  Input to the parser is a
+stream of {\em tokens}, generated by the {\em lexical analyzer}.  This
+chapter describes how the lexical analyzer breaks a file into tokens.
 
 \section{Line structure}
 
-A Python program is divided in a number of logical lines.
-Statements may not straddle logical line boundaries except where
-explicitly allowed by the syntax.
-To this purpose, the end of a logical line
-is represented by the token NEWLINE.
+A Python program is divided into a number of logical lines.  Statements
+do not straddle logical line boundaries except where explicitly
+indicated by the syntax (i.e., for compound statements).  For this
+purpose, the end of a logical line is represented by the token
+NEWLINE.
 
 \subsection{Comments}
 
-A comment starts with a hash character (\verb/#/) and ends at the end
-of the physical line.  Comments are ignored by the syntax.
-A hash character in a string literal does not start a comment.
+A comment starts with a hash character (\verb\#\) that is not part of
+a string literal, and ends at the end of the physical line.  Comments
+are ignored by the syntax.
 
 \subsection{Line joining}
 
-Physical lines may be joined into logical lines using backslash
-characters (\verb/\/), as follows.
-If a physical line ends in a backslash that is not part of a string
-literal or comment, it is joined with
-the following forming a single logical line, deleting the backslash
-and the following end-of-line character.  More than two physical
-lines may be joined together in this way.
+Two or more physical lines may be joined into logical lines using
+backslash characters (\verb/\/), as follows: when a physical line ends
+in a backslash that is not part of a string literal or comment, it is
+joined with the following one, forming a single logical line, deleting
+the backslash and the following end-of-line character.
 
 \subsection{Blank lines}
 
-A physical line that is not the continuation of the previous line
-and contains only spaces, tabs and possibly a comment, is ignored
-(i.e., no NEWLINE token is generated),
-except that during interactive input of statements, an empty
-physical line terminates a multi-line statement.
+A logical line that contains only spaces, tabs, and possibly a
+comment is ignored (i.e., no NEWLINE token is generated), except that
+during interactive input of statements, an entirely blank logical line
+terminates a multi-line statement.
 
 \subsection{Indentation}
 
-Spaces and tabs at the beginning of a line are used to compute
+Spaces and tabs at the beginning of a logical line are used to compute
 the indentation level of the line, which in turn is used to determine
 the grouping of statements.
 
-First, each tab is replaced by one to eight spaces such that the column number
-of the next character is a multiple of eight (counting from zero).
-The column number of the first non-space character then defines the
-line's indentation.
-Indentation cannot be split over multiple physical lines using
-backslashes.
+First, each tab is replaced by one to eight spaces such that the total
+number of spaces up to that point is a multiple of eight.  The total
+number of spaces preceding the first non-blank character then
+determines the line's indentation.  Indentation cannot be split over
+multiple physical lines using backslashes.
 
 The indentation levels of consecutive lines are used to generate
 INDENT and DEDENT tokens, using a stack, as follows.
 
 Before the first line of the file is read, a single zero is pushed on
-the stack; this will never be popped off again.  The numbers pushed
-on the stack will always be strictly increasing from bottom to top.
-At the beginning of each logical line, the line's indentation level
-is compared to the top of the stack.
-If it is equal, nothing happens.
-If it larger, it is pushed on the stack, and one INDENT token is generated.
-If it is smaller, it {\em must} be one of the numbers occurring on the
-stack; all numbers on the stack that are larger are popped off,
-and for each number popped off a DEDENT token is generated.
-At the end of the file, a DEDENT token is generated for each number
-remaining on the stack that is larger than zero.
+the stack; this will never be popped off again.  The numbers pushed on
+the stack will always be strictly increasing from bottom to top.  At
+the beginning of each logical line, the line's indentation level is
+compared to the top of the stack.  If it is equal, nothing happens.
+If it is larger, it is pushed on the stack, and one INDENT token is
+generated.  If it is smaller, it {\em must} be one of the numbers
+occurring on the stack; all numbers on the stack that are larger are
+popped off, and for each number popped off a DEDENT token is
+generated.  At the end of the file, a DEDENT token is generated for
+each number remaining on the stack that is larger than zero.
 
 \section{Other tokens}
 
 Besides NEWLINE, INDENT and DEDENT, the following categories of tokens
 exist: identifiers, keywords, literals, operators, and delimiters.
-Spaces and tabs are not tokens, but serve to delimit tokens.
-Where ambiguity exists, a token comprises the longest possible
-string that forms a legal token, when reading from left to right.
+Spaces and tabs are not tokens, but serve to delimit tokens.  Where
+ambiguity exists, a token comprises the longest possible string that
+forms a legal token, when read from left to right.
 
 Tokens are described using an extended regular expression notation.
 This is similar to the extended BNF notation used later, except that
-the notation <...> is used to give an informal description of a character,
-and that spaces and tabs are not to be ignored.
+the notation \verb\<...>\ is used to give an informal description of a
+character, and that spaces and tabs are not to be ignored.
 
 \section{Identifiers}
 
 Identifiers are described by the following regular expressions:
 
 \begin{verbatim}
-identifier:     (letter|'_') (letter|digit|'_')*
+identifier:     (letter|"_") (letter|digit|"_")*
 letter:         lowercase | uppercase
-lowercase:      'a'|'b'|...|'z'
-uppercase:      'A'|'B'|...|'Z'
-digit:          '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'
+lowercase:      "a"|"b"|...|"z"
+uppercase:      "A"|"B"|...|"Z"
+digit:          "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"
 \end{verbatim}
 
-Identifiers are unlimited in length.
-Upper and lower case letters are different.
+Identifiers are unlimited in length.  Case is significant.
 
 \section{Keywords}
 
-The following tokens are used as reserved words,
-or keywords of the language,
-and may not be used as ordinary identifiers.
-They must be spelled exactly as written here:
-
-{\tt
-	and
-	break
-	class
-	continue
-	def
-	del
-	elif
-	else
-	except
-	finally
-	for
-	from
-	if
-	import
-	in
-	is
-	not
-	or
-	pass
-	print
-	raise
-	return
-	try
-	while
-}
+The following identifiers are used as reserved words, or {\em
+keywords} of the language, and may not be used as ordinary
+identifiers.  They must be spelled exactly as written here:
+
+\begin{verbatim}
+and        del        for        is         raise
+break      elif       from       not        return
+class      else       if         or         try
+continue   except     import     pass       while
+def        finally    in         print
+\end{verbatim}
+
+%	import string
+%	l = []
+%	try:
+%		while 1:
+%			l = l + string.split(raw_input())
+%	except EOFError:
+%		pass
+%	l.sort()
+%	for i in range((len(l)+4)/5):
+%		for j in range(i, len(l), 5):
+%			print string.ljust(l[j], 10),
+%		print
 
 \section{Literals}
 
@@ -197,24 +183,47 @@ They must be spelled exactly as written here:
 String literals are described by the following regular expressions:
 
 \begin{verbatim}
-stringliteral:  '\'' stringitem* '\''
+stringliteral:  "'" stringitem* "'"
 stringitem:     stringchar | escapeseq
-stringchar:     <any character except newline or '\\' or '\''>
-escapeseq:      '\\' <any character except newline>
-\end{verbatim}
-
-String literals cannot span physical line boundaries.
-Escape sequences in strings are actually interpreted according to almost the
-same rules as used by Standard C
-(XXX which should be made explicit here),
-except that \verb/\E/ is equivalent to \verb/\033/,
-\verb/\"/ is not recognized,
-newline characters cannot be escaped, and
-{\em all unrecognized escape sequences are left in the string unchanged}.
-(The latter rule is useful when debugging: if an escape sequence is
-mistyped, the resulting output is more easily recognized as broken.
-It also helps somewhat for string literals used as regular expressions
-or otherwise passed to other modules that do their own escape handling.)
+stringchar:     <any character except newline or "\" or "'">
+escapeseq:      "\" <any character except newline>
+\end{verbatim}
+
+String literals cannot span physical line boundaries.  Escape
+sequences in strings are actually interpreted according to rules
+similar to those used by Standard C.  The recognized escape sequences
+are:
+
+\begin{center}
+\begin{tabular}{|l|l|}
+\hline
+\verb/\\/	& Backslash (\verb/\/) \\
+\verb/\'/	& Single quote (\verb/'/) \\
+\verb/\a/	& ASCII Bell (BEL) \\
+\verb/\b/	& ASCII Backspace (BS) \\
+\verb/\E/	& ASCII Escape (ESC) \\
+\verb/\f/	& ASCII Formfeed (FF) \\
+\verb/\n/	& ASCII Linefeed (LF) \\
+\verb/\r/	& ASCII Carriage Return (CR) \\
+\verb/\t/	& ASCII Horizontal Tab (TAB) \\
+\verb/\v/	& ASCII Vertical Tab (VT) \\
+\verb/\/{\em ooo}	& ASCII character with octal value {\em ooo} \\
+\verb/\x/{\em xx...}	& ASCII character with hex value {\em xx} \\
+\hline
+\end{tabular}
+\end{center}
+
+For compatibility with Standard C, up to three octal digits are
+accepted, but an unlimited number of hex digits is taken to be part of
+the hex escape (and then the lower 8 bits of the resulting hex number
+are used).
+
+All unrecognized escape sequences are left in the string {\em
+unchanged}, i.e., the backslash is left in the string.  (This rule is
+useful when debugging: if an escape sequence is mistyped, the
+resulting output is more easily recognized as broken.  It also helps
+somewhat for string literals used as regular expressions or otherwise
+passed to other modules that do their own escape handling.)
 
 \subsection{Numeric literals}
 
@@ -224,24 +233,24 @@ and floating point numbers.
 Integers and long integers are described by the following regular expressions:
 
 \begin{verbatim}
-longinteger:    integer ('l'|'L')
+longinteger:    integer ("l"|"L")
 integer:        decimalinteger | octinteger | hexinteger
-decimalinteger: nonzerodigit digit* | '0'
-octinteger:     '0' octdigit+
-hexinteger:     '0' ('x'|'X') hexdigit+
+decimalinteger: nonzerodigit digit* | "0"
+octinteger:     "0" octdigit+
+hexinteger:     "0" ("x"|"X") hexdigit+
 
-nonzerodigit:   '1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'
-octdigit:       '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'
-hexdigit:        digit|'a'|'b'|'c'|'d'|'e'|'f'|'A'|'B'|'C'|'D'|'E'|'F'
+nonzerodigit:   "1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"
+octdigit:       "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"
+hexdigit:        digit|"a"|"b"|"c"|"d"|"e"|"f"|"A"|"B"|"C"|"D"|"E"|"F"
 \end{verbatim}
 
 Floating point numbers are described by the following regular expressions:
 
 \begin{verbatim}
-floatnumber:    [intpart] fraction [exponent] | intpart ['.'] exponent
+floatnumber:    [intpart] fraction [exponent] | intpart ["."] exponent
 intpart:        digit+
-fraction:       '.' digit+
-exponent:       ('e'|'E') ['+'|'-'] digit+
+fraction:       "." digit+
+exponent:       ("e"|"E") ["+"|"-"] digit+
 \end{verbatim}
 
 \section{Operators}
@@ -292,15 +301,15 @@ conditions.  Conditions are a superset of expressions, and a condition
 may be used where an expression is required by enclosing it in
 parentheses.  The only place where an unparenthesized condition
 is not allowed is on the right-hand side of the assignment operator,
-because this operator is the same token (\verb/'='/) as used for
+because this operator is the same token (\verb\=\) as used for
 comparisons.
 
 The comma plays a somewhat special role in Python's syntax.
 It is an operator with a lower precedence than all others, but
 occasionally serves other purposes as well (e.g., it has special
 semantics in print statements).  When a comma is accepted by the
-syntax, one of the syntactic categories \verb/expression_list/
-or \verb/condition_list/ is always used.
+syntax, one of the syntactic categories \verb\expression_list\
+or \verb\condition_list\ is always used.
 
 When (one alternative of) a syntax rule has the form
 
@@ -308,8 +317,8 @@ When (one alternative of) a syntax rule has the form
 name:           othername
 \end{verbatim}
 
-and no semantics are given, the semantics of this form of \verb/name/
-are the same as for \verb/othername/.
+and no semantics are given, the semantics of this form of \verb\name\
+are the same as for \verb\othername\.
 
 \section{Arithmetic conversions}
 
@@ -414,11 +423,11 @@ key value prevails.
 A string conversion evaluates the contained condition list and converts the
 resulting object into a string according to rules specific to its type.
 
-If the object is a string, a number, \verb/None/, or a tuple, list or
+If the object is a string, a number, \verb\None\, or a tuple, list or
 dictionary containing only objects whose type is in this list,
 the resulting
 string is a valid Python expression which can be passed to the
-built-in function \verb/eval()/ to yield an expression with the
+built-in function \verb\eval()\ to yield an expression with the
 same value (or an approximation, if floating point numbers are
 involved).
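The round-trip property described above can be illustrated as follows. This is a sketch that assumes a current Python 3, where the string conversion is available as the built-in repr(). The samples list is a hypothetical set of values drawn from the types the text enumerates.

```python
# Values of the listed types: strings, numbers, None, and
# tuples/lists/dictionaries built only from such values.
samples = [
    "hello",
    42,
    2.5,
    None,
    (1, "two", None),
    [1, [2.0, "x"]],
    {"a": 1, "b": (2, 3)},
]

for value in samples:
    text = repr(value)  # string conversion
    back = eval(text)   # evaluate the resulting expression
    print(text, back == value)
```

For these samples every comparison prints True. In current Python, repr() of a float produces the shortest string that reads back to the same value, so 2.5 survives exactly. The text's caveat about approximation reflects floating point conversion in general.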
 
@@ -459,11 +468,11 @@ Their syntax is:
 factor:         primary | '-' factor | '+' factor | '~' factor
 \end{verbatim}
 
-The unary \verb/'-'/ operator yields the negative of its numeric argument.
+The unary \verb\-\ operator yields the negative of its numeric argument.
 
-The unary \verb/'+'/ operator yields its numeric argument unchanged.
+The unary \verb\+\ operator yields its numeric argument unchanged.
 
-The unary \verb/'~'/ operator yields the bit-wise negation of its
+The unary \verb\~\ operator yields the bit-wise negation of its
 integral numerical argument.
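The three unary operators can be demonstrated with a short example. It assumes a current Python 3, where the bit-wise negation of an integer x equals -(x + 1).

```python
x = 5
print(-x)  # negative of the argument
print(+x)  # argument unchanged
print(~x)  # bit-wise negation of an integral argument

# '~' requires an integral argument; a float is rejected.
try:
    ~2.5
except TypeError as exc:
    print("TypeError:", exc)
```

The prints give -5, 5 and -6, followed by the TypeError message raised for the float operand.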
 
 In all three cases, if the argument does not have the proper type,
@@ -477,7 +486,7 @@ Terms represent the most tightly binding binary operators:
 term:           factor | term '*' factor | term '/' factor | term '%' factor
 \end{verbatim}
 
-The \verb/'*'/ operator yields the product of its arguments.
+The \verb\*\ operator yields the product of its arguments.
 The arguments must either both be numbers, or one argument must be
 a (short) integer and the other must be a string.
 In the former case, the numbers are converted to a common type
@@ -572,7 +581,7 @@ it is optional in all other cases (a single expression without
 a trailing comma doesn't create a tuple, but rather yields the
 value of that expression).
 
-To create an empty tuple, use an empty pair of parentheses: \verb/()/.
+To create an empty tuple, use an empty pair of parentheses: \verb\()\.
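The following sketch shows how the trailing-comma rule and the empty-tuple rule behave. It assumes a current Python 3, and the variable names are illustrative only.

```python
empty = ()        # empty pair of parentheses: the empty tuple
single = (1,)     # a trailing comma is required for a one-element tuple
not_a_tuple = (1) # no comma: just the value of the expression
pair = 1, 2       # a comma creates a tuple even without parentheses

print(type(empty).__name__, len(empty))
print(type(single).__name__, len(single))
print(type(not_a_tuple).__name__)
print(pair)
```

The prints show tuple 0, tuple 1, int and (1, 2). Only the comma creates a tuple, and the parentheses serve merely as grouping.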
 
 \section{Comparisons}
 
@@ -597,8 +606,8 @@ Note that $e_0 op_1 e_1 op_2 e_2$ does not imply any kind of comparison
 between $e_0$ and $e_2$, e.g., $x < y > z$ is perfectly legal.
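The following minimal sketch illustrates this point. It assumes a current Python 3, where a chain compares each adjacent pair, so x < y > z behaves like x < y and y > z with y evaluated once. The function name chain is hypothetical.

```python
def chain(x, y, z):
    # Compares x with y and y with z; x and z are never compared.
    return x < y > z

print(chain(1, 5, 3))  # 1 < 5 and 5 > 3, although 1 < 3
print(chain(3, 5, 1))  # 3 < 5 and 5 > 1, although 3 > 1
print(chain(5, 5, 1))  # fails at 5 < 5
```

The first two calls are both true, even though x < z in one and x > z in the other. This shows that the result says nothing about how e_0 relates to e_2.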
 
 For the benefit of C programmers,
-the comparison operators \verb/=/ and \verb/==/ are equivalent,
-and so are \verb/<>/ and \verb/!=/.
+the comparison operators \verb\=\ and \verb\==\ are equivalent,
+and so are \verb\<>\ and \verb\!=\.
 Use of the C variants is discouraged.
 
 The operators {\tt '<', '>', '=', '>=', '<='}, and {\tt '<>'} compare
@@ -610,7 +619,7 @@ the value \verb\None\ compares smaller than the values of any other type.
 
 (This unusual
 definition of comparison is done to simplify the definition of
-operations like sorting and the \verb/in/ and \verb/not in/ operators.)
+operations like sorting and the \verb\in\ and \verb\not in\ operators.)
 
 Comparison of objects of the same type depends on the type:
 
@@ -869,12 +878,12 @@ A space is written before each object is (converted and) written,
 unless the output system believes it is positioned at the beginning
 of a line.  This is the case: (1) when no characters have been written
 to standard output; or (2) when the last character written to
-standard output is \verb/'\n'/;
+standard output is \verb/\n/;
 or (3) when the last I/O operation
 on standard output was not a \verb\print\ statement.
 
 Finally,
-a \verb/'\n'/ character is written at the end,
+a \verb/\n/ character is written at the end,
 unless the \verb\print\ statement ends with a comma.
 This is the only action if the statement contains just the keyword
 \verb\print\.