-rw-r--r-- | Doc/ref.tex | 338 |
-rw-r--r-- | Doc/ref/ref.tex | 338 |
2 files changed, 456 insertions, 220 deletions
diff --git a/Doc/ref.tex b/Doc/ref.tex index c47b084..d6d4f56 100644 --- a/Doc/ref.tex +++ b/Doc/ref.tex @@ -60,6 +60,69 @@ informal introduction to the language, see the {\em Python Tutorial}. This reference manual describes the Python programming language. It is not intended as a tutorial. +While I am trying to be as precise as possible, I chose to use English +rather than formal specifications for everything except syntax and +lexical analysis. This should make the document better understandable +to the average reader, but will leave room for ambiguities. +Consequently, if you were coming from Mars and tried to re-implement +Python from this document alone, you might in fact be implementing +quite a different language. On the other hand, if you are using +Python and wonder what the precise rules about a particular area of +the language are, you should be able to find it here. + +It is dangerous to add too many implementation details to a language +reference document -- the implementation may change, and other +implementations of the same language may work differently. On the +other hand, there is currently only one Python implementation, and +particular quirks of it are sometimes worth mentioning, especially +where it differs from the ``ideal'' specification. + +Every Python implementation comes with a number of built-in and +standard modules. These are not documented here, but in the separate +{\em Python Library Reference} document. A few built-in modules are +mentioned when they interact in a significant way with the language +definition. + +\section{Notation} + +The descriptions of lexical analysis and syntax use a modified BNF +grammar notation. This uses the following style of definition: + +\begin{verbatim} +name: lcletter (lcletter | "_")* +lcletter: "a"..."z" +\end{verbatim} + +The first line says that a \verb\name\ is a \verb\lcletter\ followed by +a sequence of zero or more \verb\lcletter\s and underscores. 
A +\verb\lcletter\ in turn is any of the single characters `a' through `z'. +(This rule is actually adhered to for the names defined in syntax and +grammar rules in this document.) + +Each rule begins with a name (which is the name defined by the rule) +followed by a colon. Each rule is wholly contained on one line. A +vertical bar (\verb\|\) is used to separate alternatives, it is the +least binding operator in this notation. A star (\verb\*\) means zero +or more repetitions of the preceding item; likewise, a plus (\verb\+\) +means one or more repetitions and a question mark (\verb\?\) zero or +one (in other words, the preceding item is optional). These three +operators bind as tight as possible; parentheses are used for +grouping. Literal strings are enclosed in double quotes. White space +is only meaningful to separate tokens. + +In lexical definitions (as the example above), two more conventions +are used: Two literal characters separated by three dots mean a choice +of any single character in the given (inclusive) range of ASCII +characters. A phrase between angular brackets (\verb\<...>\) gives an +informal description of the symbol defined; e.g., this could be used +to describe the notion of `control character' if needed. + +Although the notation used is almost the same, there is a big +difference between the meaning of lexical and syntactic definitions: +a lexical definition operates on the individual characters of the +input source, while a syntax definition operates on the stream of +tokens generated by the lexical analysis. + \chapter{Lexical analysis} A Python program is read by a {\em parser}. Input to the parser is a @@ -130,11 +193,6 @@ Spaces and tabs are not tokens, but serve to delimit tokens. Where ambiguity exists, a token comprises the longest possible string that forms a legal token, when read from left to right. -Tokens are described using an extended regular expression notation. 
-This is similar to the extended BNF notation used later, except that -the notation \verb\<...>\ is used to give an informal description of a -character, and that spaces and tabs are not to be ignored. - \section{Identifiers} Identifiers are described by the following regular expressions: @@ -142,9 +200,9 @@ Identifiers are described by the following regular expressions: \begin{verbatim} identifier: (letter|"_") (letter|digit|"_")* letter: lowercase | uppercase -lowercase: "a"|"b"|...|"z" -uppercase: "A"|"B"|...|"Z" -digit: "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9" +lowercase: "a"..."z" +uppercase: "A"..."Z" +digit: "0"..."9" \end{verbatim} Identifiers are unlimited in length. Case is significant. @@ -156,13 +214,14 @@ keywords} of the language, and may not be used as ordinary identifiers. They must be spelled exactly as written here: \begin{verbatim} -and del for is raise -break elif from not return -class else if or try -continue except import pass while -def finally in print +and del for in print +break elif from is raise +class else global not return +continue except if or try +def finally import pass while \end{verbatim} +% # This Python program sorts and formats the above table % import string % l = [] % try: @@ -185,8 +244,8 @@ String literals are described by the following regular expressions: \begin{verbatim} stringliteral: "'" stringitem* "'" stringitem: stringchar | escapeseq -stringchar: <any character except newline or "\" or "'"> -escapeseq: "'" <any character except newline> +stringchar: <any ASCII character except newline or "\" or "'"> +escapeseq: "'" <any ASCII character except newline> \end{verbatim} String literals cannot span physical line boundaries. 
Escape @@ -208,7 +267,7 @@ are: \verb/\t/ & ASCII Horizontal Tab (TAB) \\ \verb/\v/ & ASCII Vertical Tab (VT) \\ \verb/\/{\em ooo} & ASCII character with octal value {\em ooo} \\ -\verb/\x/{em xx...} & ASCII character with hex value {\em xx} \\ +\verb/\x/{em xx...} & ASCII character with hex value {\em xx...} \\ \hline \end{tabular} \end{center} @@ -221,9 +280,10 @@ are used...). All unrecognized escape sequences are left in the string {\em unchanged}, i.e., the backslash is left in the string. (This rule is useful when debugging: if an escape sequence is mistyped, the -resulting output is more easily recognized as broken. It also helps -somewhat for string literals used as regular expressions or otherwise -passed to other modules that do their own escape handling.) +resulting output is more easily recognized as broken. It also helps a +great deal for string literals used as regular expressions or +otherwise passed to other modules that do their own escape handling -- +but you may end up quadrupling backslashes that must appear literally.) \subsection{Numeric literals} @@ -239,9 +299,9 @@ decimalinteger: nonzerodigit digit* | "0" octinteger: "0" octdigit+ hexinteger: "0" ("x"|"X") hexdigit+ -nonzerodigit: "1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9" -octdigit: "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7" -hexdigit: digit|"a"|"b"|"c"|"d"|"e"|"f"|"A"|"B"|"C"|"D"|"E"|"F" +nonzerodigit: "1"..."9" +octdigit: "0"..."7" +hexdigit: digit|"a"..."f"|"A"..."F" \end{verbatim} Floating point numbers are described by the following regular expressions: @@ -260,16 +320,20 @@ The following tokens are operators: \begin{verbatim} + - * / % << >> & | ^ ~ -< = == > <= <> != >= +< == > <= <> != >= \end{verbatim} +The comparison operators \verb\<>\ and \verb\!=\ are alternate +spellings of the same operator. + \section{Delimiters} -The following tokens are delimiters: +The following tokens serve as delimiters or otherwise have a special +meaning: \begin{verbatim} ( ) [ ] { } -; , : . ` +; , : . 
` = \end{verbatim} The following printing ASCII characters are currently not used; @@ -281,35 +345,83 @@ their occurrence is an unconditional error: \chapter{Execution model} -(XXX This chapter should explain the general model -of the execution of Python code and -the evaluation of expressions. -It should introduce objects, values, code blocks, scopes, name spaces, -name binding, -types, sequences, numbers, mappings, -exceptions, and other technical terms needed to make the following -chapters concise and exact.) +(XXX This chapter should explain the general model of the execution of +Python code and the evaluation of expressions. It should introduce +objects, values, code blocks, scopes, name spaces, name binding, +types, sequences, numbers, mappings, exceptions, and other technical +terms needed to make the following chapters concise and exact.) + +\section{Objects, values and types} + +I won't try to define rigorously here what an object is, but I'll give +some properties of objects that are important to know about. + +Every object has an identity, a type and a value. An object's {\em +identity} never changes once it has been created; think of it as the +object's (permanent) address. An object's {\em type} determines the +operations that an object supports (e.g., can its length be taken?) +and also defines the ``meaning'' of the object's value; it also never +changes. The {\em value} of some objects can change; whether an +object's value can change is a property of its type. + +Objects are never explicitly destroyed; however, when they become +unreachable they may be garbage-collected. An implementation, +however, is allowed to delay garbage collection or omit it altogether +-- it is a matter of implementation quality how garbage collection is +implemented. 
(Implementation note: the current implementation uses a +reference-counting scheme which collects most objects as soon as they +become onreachable, but does not detect garbage containing circular +references.) + +(Some objects contain references to ``external'' resources such as +open files. It is understood that these resources are freed when the +object is garbage-collected, but since garbage collection is not +guaranteed such objects also provide an explicit way to release the +external resource (e.g., a \verb\close\ method) and programs are +recommended to use this.) + +Some objects contain references to other objects. These references +are part of the object's value; in most cases, when such a +``container'' object is compared to another (of the same type), the +comparison takes the {\em values} of the referenced objects into +account (not their identities). + +Except for their identity, types affect almost any aspect of objects. +Even object identities are affected in some sense: for immutable +types, operations that compute new values may actually return a +reference to an existing object with the same type and value, while +for mutable objects this is not allowed. E.g., after + +\begin{verbatim} +a = 1; b = 1; c = []; d = [] +\end{verbatim} + +\verb\a\ and \verb\b\ may or may not refer to the same object, but +\verb\c\ and \verb\d\ are guaranteed to refer to two different, unique, +newly created lists. + +\section{Execution frames, name spaces, and scopes} + +XXX \chapter{Expressions and conditions} -(From now on, extended BNF notation will be used to describe -syntax, not lexical analysis.) -(XXX Explain the notation.) +From now on, extended BNF notation will be used to describe syntax, +not lexical analysis. This chapter explains the meaning of the elements of expressions and conditions. Conditions are a superset of expressions, and a condition may be used where an expression is required by enclosing it in -parentheses. 
The only place where an unparenthesized condition -is not allowed is on the right-hand side of the assignment operator, -because this operator is the same token (\verb\=\) as used for -compasisons. - -The comma plays a somewhat special role in Python's syntax. -It is an operator with a lower precedence than all others, but -occasionally serves other purposes as well (e.g., it has special -semantics in print statements). When a comma is accepted by the -syntax, one of the syntactic categories \verb\expression_list\ -or \verb\condition_list\ is always used. +parentheses. The only place where an unparenthesized condition is not +allowed is on the right-hand side of the assignment operator, because +this operator is the same token (\verb\=\) as used for compasisons. + +The comma plays a somewhat special role in Python's syntax. It is an +operator with a lower precedence than all others, but occasionally +serves other purposes as well (e.g., it has special semantics in print +statements). When a comma is accepted by the syntax, one of the +syntactic categories \verb\expression_list\ or \verb\condition_list\ +is always used. 
When (one alternative of) a syntax rule has the form @@ -351,11 +463,11 @@ Syntax rules for atoms: atom: identifier | literal | parenth_form | string_conversion literal: stringliteral | integer | longinteger | floatnumber parenth_form: enclosure | list_display | dict_display -enclosure: '(' [condition_list] ')' -list_display: '[' [condition_list] ']' -dict_display: '{' [key_datum (',' key_datum)* [','] '}' -key_datum: condition ':' condition -string_conversion:'`' condition_list '`' +enclosure: "(" [condition_list] ")" +list_display: "[" [condition_list] "]" +dict_display: "{" [key_datum ("," key_datum)* [","] "}" +key_datum: condition ":" condition +string_conversion:"`" condition_list "`" \end{verbatim} \subsection{Identifiers (Names)} @@ -413,10 +525,9 @@ define the entries of the dictionary: each key object is used as a key into the dictionary to store the corresponding datum pair. -Key objects must be strings, otherwise a {\tt TypeError} -exception is raised. -Clashes between keys are not detected; the last datum stored for a given -key value prevails. +Keys must be strings, otherwise a {\tt TypeError} exception is raised. +Clashes between keys are not detected; the last datum (textually +rightmost in the display) stored for a given key value prevails. \subsection{String conversions} @@ -445,10 +556,10 @@ Their syntax is: \begin{verbatim} primary: atom | attributeref | call | subscription | slicing -attributeref: primary '.' identifier -call: primary '(' [condition_list] ')' -subscription: primary '[' condition ']' -slicing: primary '[' [condition] ':' [condition] ']' +attributeref: primary "." identifier +call: primary "(" [condition_list] ")" +subscription: primary "[" condition "]" +slicing: primary "[" [condition] ":" [condition] "]" \end{verbatim} \subsection{Attribute references} @@ -465,7 +576,7 @@ Factors represent the unary numeric operators. 
Their syntax is: \begin{verbatim} -factor: primary | '-' factor | '+' factor | '~' factor +factor: primary | "-" factor | "+" factor | "~" factor \end{verbatim} The unary \verb\-\ operator yields the negative of its numeric argument. @@ -483,7 +594,7 @@ a {\tt TypeError} exception is raised. Terms represent the most tightly binding binary operators: \begin{verbatim} -term: factor | term '*' factor | term '/' factor | term '%' factor +term: factor | term "*" factor | term "/" factor | term "%" factor \end{verbatim} The \verb\*\ operator yields the product of its arguments. @@ -494,13 +605,13 @@ and then multiplied together. In the latter case, string repetition is performed; a negative repetition factor yields the empty string. -The \verb|'/'| operator yields the quotient of its arguments. +The \verb|"/"| operator yields the quotient of its arguments. The numeric arguments are first converted to a common type. (Short or long) integer division yields an integer of the same type, truncating towards zero. Division by zero raises a {\tt RuntimeError} exception. -The \verb|'%'| operator yields the remainder from the division +The \verb|"%"| operator yields the remainder from the division of the first argument by the second. The numeric arguments are first converted to a common type. The outcome of $x \% y$ is defined as $x - y*trunc(x/y)$. @@ -511,28 +622,28 @@ $3.14 \% 0.7$ equals $0.34$. \section{Arithmetic expressions} \begin{verbatim} -arith_expr: term | arith_expr '+' term | arith_expr '-' term +arith_expr: term | arith_expr "+" term | arith_expr "-" term \end{verbatim} -The \verb|'+'| operator yields the sum of its arguments. +The \verb|"+"| operator yields the sum of its arguments. The arguments must either both be numbers, or both strings. In the former case, the numbers are converted to a common type and then added together. In the latter case, the strings are concatenated directly, without inserting a space. 
-The \verb|'-'| operator yields the difference of its arguments. +The \verb|"-"| operator yields the difference of its arguments. The numeric arguments are first converted to a common type. \section{Shift expressions} \begin{verbatim} -shift_expr: arith_expr | shift_expr '<<' arith_expr | shift_expr '>>' arith_expr +shift_expr: arith_expr | shift_expr "<<" arith_expr | shift_expr ">>" arith_expr \end{verbatim} These operators accept short integers as arguments only. They shift their left argument to the left or right by the number of bits -given by the right argument. Shifts are ``logical'', e.g., bits shifted +given by the right argument. Shifts are ``logical"", e.g., bits shifted out on one end are lost, and bits shifted in are zero; negative numbers are shifted as if they were unsigned in C. Negative shift counts and shift counts greater than {\em or equal to} @@ -541,7 +652,7 @@ the word size yield undefined results. \section{Bitwise AND expressions} \begin{verbatim} -and_expr: shift_expr | and_expr '&' shift_expr +and_expr: shift_expr | and_expr "&" shift_expr \end{verbatim} This operator yields the bitwise AND of its arguments, @@ -550,7 +661,7 @@ which must be short integers. \section{Bitwise XOR expressions} \begin{verbatim} -xor_expr: and_expr | xor_expr '^' and_expr +xor_expr: and_expr | xor_expr "^" and_expr \end{verbatim} This operator yields the bitwise exclusive OR of its arguments, @@ -559,7 +670,7 @@ which must be short integers. \section{Bitwise OR expressions} \begin{verbatim} -or_expr: xor_expr | or_expr '|' xor_expr +or_expr: xor_expr | or_expr "|" xor_expr \end{verbatim} This operator yields the bitwise OR of its arguments, @@ -569,7 +680,7 @@ which must be short integers. \begin{verbatim} expression: or_expression -expr_list: expression (',' expression)* [','] +expr_list: expression ("," expression)* [","] \end{verbatim} An expression list containing at least one comma yields a new tuple. 
@@ -587,7 +698,7 @@ To create an empty tuple, use an empty pair of parentheses: \verb\()\. \begin{verbatim} comparison: expression (comp_operator expression)* -comp_operator: '<'|'>'|'='|'=='|'>='|'<='|'<>'|'!='|['not'] 'in'|is' ['not'] +comp_operator: "<"|">"|"=="|">="|"<="|"<>"|"!="|"is" ["not"]|["not"] "in" \end{verbatim} Comparisons yield integer value: 1 for true, 0 for false. @@ -605,12 +716,9 @@ $e_{n-1} op_n e_n$, except that each expression is evaluated at most once. Note that $e_0 op_1 e_1 op_2 e_2$ does not imply any kind of comparison between $e_0$ and $e_2$, e.g., $x < y > z$ is perfectly legal. -For the benefit of C programmers, -the comparison operators \verb\=\ and \verb\==\ are equivalent, -and so are \verb\<>\ and \verb\!=\. -Use of the C variants is discouraged. +The forms \verb\<>\ and \verb\!=\ are equivalent. -The operators {\tt '<', '>', '=', '>=', '<='}, and {\tt '<>'} compare +The operators {\tt "<", ">", "==", ">=", "<="}, and {\tt "<>"} compare the values of two objects. The objects needn't have the same type. If both are numbers, they are compared to a common type. Otherwise, objects of different types {\em always} compare unequal, @@ -652,9 +760,9 @@ $x {\tt is not} y$ yields the inverse truth value. \begin{verbatim} condition: or_test -or_test: and_test | or_test 'or' and_test -and_test: not_test | and_test 'and' not_test -not_test: comparison | 'not' not_test +or_test: and_test | or_test "or" and_test +and_test: not_test | and_test "and" not_test +not_test: comparison | "not" not_test \end{verbatim} In the context of Boolean operators, and also when conditions are @@ -686,7 +794,7 @@ Several simple statements may occor on a single line separated by semicolons. 
The syntax for simple statements is: \begin{verbatim} -stmt_list: simple_stmt (';' simple_stmt)* [';'] +stmt_list: simple_stmt (";" simple_stmt)* [";"] simple_stmt: expression_stmt | assignment | pass_stmt @@ -697,6 +805,7 @@ simple_stmt: expression_stmt | break_stmt | continue_stmt | import_stmt + | global_stmt \end{verbatim} \section{Expression statements} @@ -718,9 +827,9 @@ do not cause any output.) \section{Assignments} \begin{verbatim} -assignment: target_list ('=' target_list)* '=' expression_list -target_list: target (',' target)* [','] -target: identifier | '(' target_list ')' | '[' target_list ']' +assignment: target_list ("=" target_list)* "=" expression_list +target_list: target ("," target)* [","] +target: identifier | "(" target_list ")" | "[" target_list "]" | attributeref | subscription | slicing \end{verbatim} @@ -835,7 +944,7 @@ messages.) \section{The {\tt pass} statement} \begin{verbatim} -pass_stmt: 'pass' +pass_stmt: "pass" \end{verbatim} {\tt pass} is a null operation -- when it is executed, @@ -844,7 +953,7 @@ nothing happens. \section{The {\tt del} statement} \begin{verbatim} -del_stmt: 'del' target_list +del_stmt: "del" target_list \end{verbatim} Deletion is recursively defined similar to assignment. @@ -866,7 +975,7 @@ right type (but even this is determined by the sliced object). \section{The {\tt print} statement} \begin{verbatim} -print_stmt: 'print' [ condition (',' condition)* [','] ] +print_stmt: "print" [ condition ("," condition)* [","] ] \end{verbatim} {\tt print} evaluates each condition in turn and writes the resulting @@ -897,7 +1006,7 @@ standard output instead, but this is not safe, and should be fixed.) \section{The {\tt return} statement} \begin{verbatim} -return_stmt: 'return' [condition_list] +return_stmt: "return" [condition_list] \end{verbatim} \verb\return\ may only occur syntactically nested in a function @@ -917,7 +1026,7 @@ before really leaving the function. 
\section{The {\tt raise} statement} \begin{verbatim} -raise_stmt: 'raise' condition [',' condition] +raise_stmt: "raise" condition ["," condition] \end{verbatim} \verb\raise\ evaluates its first condition, which must yield @@ -930,7 +1039,7 @@ with the second one (or \verb\None\) as its parameter. \section{The {\tt break} statement} \begin{verbatim} -break_stmt: 'break' +break_stmt: "break" \end{verbatim} \verb\break\ may only occur syntactically nested in a \verb\for\ @@ -949,7 +1058,7 @@ before really leaving the loop. \section{The {\tt continue} statement} \begin{verbatim} -continue_stmt: 'continue' +continue_stmt: "continue" \end{verbatim} \verb\continue\ may only occur syntactically nested in a \verb\for\ @@ -962,9 +1071,17 @@ It continues with the next cycle of the nearest enclosing loop. \section{The {\tt import} statement} \begin{verbatim} -import_stmt: 'import' identifier (',' identifier)* - | 'from' identifier 'import' identifier (',' identifier)* - | 'from' identifier 'import' '*' +import_stmt: "import" identifier ("," identifier)* + | "from" identifier "import" identifier ("," identifier)* + | "from" identifier "import" "*" +\end{verbatim} + +(XXX To be done.) + +\section{The {\tt global} statement} + +\begin{verbatim} +global_stmt: "global" identifier ("," identifier)* \end{verbatim} (XXX To be done.) 
@@ -982,48 +1099,49 @@ suite: statement | NEWLINE INDENT statement+ DEDENT \section{The {\tt if} statement} \begin{verbatim} -if_stmt: 'if' condition ':' suite - ('elif' condition ':' suite)* - ['else' ':' suite] +if_stmt: "if" condition ":" suite + ("elif" condition ":" suite)* + ["else" ":" suite] \end{verbatim} \section{The {\tt while} statement} \begin{verbatim} -while_stmt: 'while' condition ':' suite ['else' ':' suite] +while_stmt: "while" condition ":" suite ["else" ":" suite] \end{verbatim} \section{The {\tt for} statement} \begin{verbatim} -for_stmt: 'for' target_list 'in' condition_list ':' suite - ['else' ':' suite] +for_stmt: "for" target_list "in" condition_list ":" suite + ["else" ":" suite] \end{verbatim} \section{The {\tt try} statement} \begin{verbatim} -try_stmt: 'try' ':' suite - ('except' condition [',' condition] ':' suite)* - ['finally' ':' suite] +try_stmt: "try" ":" suite + ("except" condition ["," condition] ":" suite)* + ["finally" ":" suite] \end{verbatim} \section{Function definitions} \begin{verbatim} -funcdef: 'def' identifier '(' [parameter_list] ')' ':' suite -parameter_list: parameter (',' parameter)* -parameter: identifier | '(' parameter_list ')' +funcdef: "def" identifier "(" [parameter_list] ")" ":" suite +parameter_list: parameter ("," parameter)* +parameter: identifier | "(" parameter_list ")" \end{verbatim} \section{Class definitions} \begin{verbatim} -classdef: 'class' identifier '(' ')' [inheritance] ':' suite -inheritance: '=' identifier '(' ')' (',' identifier '(' ')')* +classdef: "class" identifier [inheritance] ":" suite +inheritance: "(" expression ("," expression)* ")" \end{verbatim} XXX Syntax for scripts, modules XXX Syntax for interactive input, eval, exec, input +XXX New definition of expressions (as conditions) \end{document} diff --git a/Doc/ref/ref.tex b/Doc/ref/ref.tex index c47b084..d6d4f56 100644 --- a/Doc/ref/ref.tex +++ b/Doc/ref/ref.tex @@ -60,6 +60,69 @@ informal introduction to the language, see the 
{\em Python Tutorial}. This reference manual describes the Python programming language. It is not intended as a tutorial. +While I am trying to be as precise as possible, I chose to use English +rather than formal specifications for everything except syntax and +lexical analysis. This should make the document better understandable +to the average reader, but will leave room for ambiguities. +Consequently, if you were coming from Mars and tried to re-implement +Python from this document alone, you might in fact be implementing +quite a different language. On the other hand, if you are using +Python and wonder what the precise rules about a particular area of +the language are, you should be able to find it here. + +It is dangerous to add too many implementation details to a language +reference document -- the implementation may change, and other +implementations of the same language may work differently. On the +other hand, there is currently only one Python implementation, and +particular quirks of it are sometimes worth mentioning, especially +where it differs from the ``ideal'' specification. + +Every Python implementation comes with a number of built-in and +standard modules. These are not documented here, but in the separate +{\em Python Library Reference} document. A few built-in modules are +mentioned when they interact in a significant way with the language +definition. + +\section{Notation} + +The descriptions of lexical analysis and syntax use a modified BNF +grammar notation. This uses the following style of definition: + +\begin{verbatim} +name: lcletter (lcletter | "_")* +lcletter: "a"..."z" +\end{verbatim} + +The first line says that a \verb\name\ is a \verb\lcletter\ followed by +a sequence of zero or more \verb\lcletter\s and underscores. A +\verb\lcletter\ in turn is any of the single characters `a' through `z'. +(This rule is actually adhered to for the names defined in syntax and +grammar rules in this document.) 
+ +Each rule begins with a name (which is the name defined by the rule) +followed by a colon. Each rule is wholly contained on one line. A +vertical bar (\verb\|\) is used to separate alternatives, it is the +least binding operator in this notation. A star (\verb\*\) means zero +or more repetitions of the preceding item; likewise, a plus (\verb\+\) +means one or more repetitions and a question mark (\verb\?\) zero or +one (in other words, the preceding item is optional). These three +operators bind as tight as possible; parentheses are used for +grouping. Literal strings are enclosed in double quotes. White space +is only meaningful to separate tokens. + +In lexical definitions (as the example above), two more conventions +are used: Two literal characters separated by three dots mean a choice +of any single character in the given (inclusive) range of ASCII +characters. A phrase between angular brackets (\verb\<...>\) gives an +informal description of the symbol defined; e.g., this could be used +to describe the notion of `control character' if needed. + +Although the notation used is almost the same, there is a big +difference between the meaning of lexical and syntactic definitions: +a lexical definition operates on the individual characters of the +input source, while a syntax definition operates on the stream of +tokens generated by the lexical analysis. + \chapter{Lexical analysis} A Python program is read by a {\em parser}. Input to the parser is a @@ -130,11 +193,6 @@ Spaces and tabs are not tokens, but serve to delimit tokens. Where ambiguity exists, a token comprises the longest possible string that forms a legal token, when read from left to right. -Tokens are described using an extended regular expression notation. -This is similar to the extended BNF notation used later, except that -the notation \verb\<...>\ is used to give an informal description of a -character, and that spaces and tabs are not to be ignored. 
- \section{Identifiers} Identifiers are described by the following regular expressions: @@ -142,9 +200,9 @@ Identifiers are described by the following regular expressions: \begin{verbatim} identifier: (letter|"_") (letter|digit|"_")* letter: lowercase | uppercase -lowercase: "a"|"b"|...|"z" -uppercase: "A"|"B"|...|"Z" -digit: "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9" +lowercase: "a"..."z" +uppercase: "A"..."Z" +digit: "0"..."9" \end{verbatim} Identifiers are unlimited in length. Case is significant. @@ -156,13 +214,14 @@ keywords} of the language, and may not be used as ordinary identifiers. They must be spelled exactly as written here: \begin{verbatim} -and del for is raise -break elif from not return -class else if or try -continue except import pass while -def finally in print +and del for in print +break elif from is raise +class else global not return +continue except if or try +def finally import pass while \end{verbatim} +% # This Python program sorts and formats the above table % import string % l = [] % try: @@ -185,8 +244,8 @@ String literals are described by the following regular expressions: \begin{verbatim} stringliteral: "'" stringitem* "'" stringitem: stringchar | escapeseq -stringchar: <any character except newline or "\" or "'"> -escapeseq: "'" <any character except newline> +stringchar: <any ASCII character except newline or "\" or "'"> +escapeseq: "'" <any ASCII character except newline> \end{verbatim} String literals cannot span physical line boundaries. Escape @@ -208,7 +267,7 @@ are: \verb/\t/ & ASCII Horizontal Tab (TAB) \\ \verb/\v/ & ASCII Vertical Tab (VT) \\ \verb/\/{\em ooo} & ASCII character with octal value {\em ooo} \\ -\verb/\x/{em xx...} & ASCII character with hex value {\em xx} \\ +\verb/\x/{em xx...} & ASCII character with hex value {\em xx...} \\ \hline \end{tabular} \end{center} @@ -221,9 +280,10 @@ are used...). All unrecognized escape sequences are left in the string {\em unchanged}, i.e., the backslash is left in the string. 
(This rule is useful when debugging: if an escape sequence is mistyped, the -resulting output is more easily recognized as broken. It also helps -somewhat for string literals used as regular expressions or otherwise -passed to other modules that do their own escape handling.) +resulting output is more easily recognized as broken. It also helps a +great deal for string literals used as regular expressions or +otherwise passed to other modules that do their own escape handling -- +but you may end up quadrupling backslashes that must appear literally.) \subsection{Numeric literals} @@ -239,9 +299,9 @@ decimalinteger: nonzerodigit digit* | "0" octinteger: "0" octdigit+ hexinteger: "0" ("x"|"X") hexdigit+ -nonzerodigit: "1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9" -octdigit: "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7" -hexdigit: digit|"a"|"b"|"c"|"d"|"e"|"f"|"A"|"B"|"C"|"D"|"E"|"F" +nonzerodigit: "1"..."9" +octdigit: "0"..."7" +hexdigit: digit|"a"..."f"|"A"..."F" \end{verbatim} Floating point numbers are described by the following regular expressions: @@ -260,16 +320,20 @@ The following tokens are operators: \begin{verbatim} + - * / % << >> & | ^ ~ -< = == > <= <> != >= +< == > <= <> != >= \end{verbatim} +The comparison operators \verb\<>\ and \verb\!=\ are alternate +spellings of the same operator. + \section{Delimiters} -The following tokens are delimiters: +The following tokens serve as delimiters or otherwise have a special +meaning: \begin{verbatim} ( ) [ ] { } -; , : . ` +; , : . ` = \end{verbatim} The following printing ASCII characters are currently not used; @@ -281,35 +345,83 @@ their occurrence is an unconditional error: \chapter{Execution model} -(XXX This chapter should explain the general model -of the execution of Python code and -the evaluation of expressions. 
-It should introduce objects, values, code blocks, scopes, name spaces, -name binding, -types, sequences, numbers, mappings, -exceptions, and other technical terms needed to make the following -chapters concise and exact.) +(XXX This chapter should explain the general model of the execution of +Python code and the evaluation of expressions. It should introduce +objects, values, code blocks, scopes, name spaces, name binding, +types, sequences, numbers, mappings, exceptions, and other technical +terms needed to make the following chapters concise and exact.) + +\section{Objects, values and types} + +I won't try to define rigorously here what an object is, but I'll give +some properties of objects that are important to know about. + +Every object has an identity, a type and a value. An object's {\em +identity} never changes once it has been created; think of it as the +object's (permanent) address. An object's {\em type} determines the +operations that an object supports (e.g., can its length be taken?) +and also defines the ``meaning'' of the object's value; it also never +changes. The {\em value} of some objects can change; whether an +object's value can change is a property of its type. + +Objects are never explicitly destroyed; however, when they become +unreachable they may be garbage-collected. An implementation, +however, is allowed to delay garbage collection or omit it altogether +-- it is a matter of implementation quality how garbage collection is +implemented. (Implementation note: the current implementation uses a +reference-counting scheme which collects most objects as soon as they +become onreachable, but does not detect garbage containing circular +references.) + +(Some objects contain references to ``external'' resources such as +open files. 
It is understood that these resources are freed when the +object is garbage-collected, but since garbage collection is not +guaranteed such objects also provide an explicit way to release the +external resource (e.g., a \verb\close\ method) and programs are +recommended to use this.) + +Some objects contain references to other objects. These references +are part of the object's value; in most cases, when such a +``container'' object is compared to another (of the same type), the +comparison takes the {\em values} of the referenced objects into +account (not their identities). + +Except for their identity, types affect almost any aspect of objects. +Even object identities are affected in some sense: for immutable +types, operations that compute new values may actually return a +reference to an existing object with the same type and value, while +for mutable objects this is not allowed. E.g., after + +\begin{verbatim} +a = 1; b = 1; c = []; d = [] +\end{verbatim} + +\verb\a\ and \verb\b\ may or may not refer to the same object, but +\verb\c\ and \verb\d\ are guaranteed to refer to two different, unique, +newly created lists. + +\section{Execution frames, name spaces, and scopes} + +XXX \chapter{Expressions and conditions} -(From now on, extended BNF notation will be used to describe -syntax, not lexical analysis.) -(XXX Explain the notation.) +From now on, extended BNF notation will be used to describe syntax, +not lexical analysis. This chapter explains the meaning of the elements of expressions and conditions. Conditions are a superset of expressions, and a condition may be used where an expression is required by enclosing it in -parentheses. The only place where an unparenthesized condition -is not allowed is on the right-hand side of the assignment operator, -because this operator is the same token (\verb\=\) as used for -compasisons. - -The comma plays a somewhat special role in Python's syntax. 
-It is an operator with a lower precedence than all others, but -occasionally serves other purposes as well (e.g., it has special -semantics in print statements). When a comma is accepted by the -syntax, one of the syntactic categories \verb\expression_list\ -or \verb\condition_list\ is always used. +parentheses. The only place where an unparenthesized condition is not +allowed is on the right-hand side of the assignment operator, because +this operator is the same token (\verb\=\) as used for compasisons. + +The comma plays a somewhat special role in Python's syntax. It is an +operator with a lower precedence than all others, but occasionally +serves other purposes as well (e.g., it has special semantics in print +statements). When a comma is accepted by the syntax, one of the +syntactic categories \verb\expression_list\ or \verb\condition_list\ +is always used. When (one alternative of) a syntax rule has the form @@ -351,11 +463,11 @@ Syntax rules for atoms: atom: identifier | literal | parenth_form | string_conversion literal: stringliteral | integer | longinteger | floatnumber parenth_form: enclosure | list_display | dict_display -enclosure: '(' [condition_list] ')' -list_display: '[' [condition_list] ']' -dict_display: '{' [key_datum (',' key_datum)* [','] '}' -key_datum: condition ':' condition -string_conversion:'`' condition_list '`' +enclosure: "(" [condition_list] ")" +list_display: "[" [condition_list] "]" +dict_display: "{" [key_datum ("," key_datum)* [","] "}" +key_datum: condition ":" condition +string_conversion:"`" condition_list "`" \end{verbatim} \subsection{Identifiers (Names)} @@ -413,10 +525,9 @@ define the entries of the dictionary: each key object is used as a key into the dictionary to store the corresponding datum pair. -Key objects must be strings, otherwise a {\tt TypeError} -exception is raised. -Clashes between keys are not detected; the last datum stored for a given -key value prevails. 
+Keys must be strings, otherwise a {\tt TypeError} exception is raised. +Clashes between keys are not detected; the last datum (textually +rightmost in the display) stored for a given key value prevails. \subsection{String conversions} @@ -445,10 +556,10 @@ Their syntax is: \begin{verbatim} primary: atom | attributeref | call | subscription | slicing -attributeref: primary '.' identifier -call: primary '(' [condition_list] ')' -subscription: primary '[' condition ']' -slicing: primary '[' [condition] ':' [condition] ']' +attributeref: primary "." identifier +call: primary "(" [condition_list] ")" +subscription: primary "[" condition "]" +slicing: primary "[" [condition] ":" [condition] "]" \end{verbatim} \subsection{Attribute references} @@ -465,7 +576,7 @@ Factors represent the unary numeric operators. Their syntax is: \begin{verbatim} -factor: primary | '-' factor | '+' factor | '~' factor +factor: primary | "-" factor | "+" factor | "~" factor \end{verbatim} The unary \verb\-\ operator yields the negative of its numeric argument. @@ -483,7 +594,7 @@ a {\tt TypeError} exception is raised. Terms represent the most tightly binding binary operators: \begin{verbatim} -term: factor | term '*' factor | term '/' factor | term '%' factor +term: factor | term "*" factor | term "/" factor | term "%" factor \end{verbatim} The \verb\*\ operator yields the product of its arguments. @@ -494,13 +605,13 @@ and then multiplied together. In the latter case, string repetition is performed; a negative repetition factor yields the empty string. -The \verb|'/'| operator yields the quotient of its arguments. +The \verb|"/"| operator yields the quotient of its arguments. The numeric arguments are first converted to a common type. (Short or long) integer division yields an integer of the same type, truncating towards zero. Division by zero raises a {\tt RuntimeError} exception. 
-The \verb|'%'| operator yields the remainder from the division +The \verb|"%"| operator yields the remainder from the division of the first argument by the second. The numeric arguments are first converted to a common type. The outcome of $x \% y$ is defined as $x - y*trunc(x/y)$. @@ -511,28 +622,28 @@ $3.14 \% 0.7$ equals $0.34$. \section{Arithmetic expressions} \begin{verbatim} -arith_expr: term | arith_expr '+' term | arith_expr '-' term +arith_expr: term | arith_expr "+" term | arith_expr "-" term \end{verbatim} -The \verb|'+'| operator yields the sum of its arguments. +The \verb|"+"| operator yields the sum of its arguments. The arguments must either both be numbers, or both strings. In the former case, the numbers are converted to a common type and then added together. In the latter case, the strings are concatenated directly, without inserting a space. -The \verb|'-'| operator yields the difference of its arguments. +The \verb|"-"| operator yields the difference of its arguments. The numeric arguments are first converted to a common type. \section{Shift expressions} \begin{verbatim} -shift_expr: arith_expr | shift_expr '<<' arith_expr | shift_expr '>>' arith_expr +shift_expr: arith_expr | shift_expr "<<" arith_expr | shift_expr ">>" arith_expr \end{verbatim} These operators accept short integers as arguments only. They shift their left argument to the left or right by the number of bits -given by the right argument. Shifts are ``logical'', e.g., bits shifted +given by the right argument. Shifts are ``logical"", e.g., bits shifted out on one end are lost, and bits shifted in are zero; negative numbers are shifted as if they were unsigned in C. Negative shift counts and shift counts greater than {\em or equal to} @@ -541,7 +652,7 @@ the word size yield undefined results. 
\section{Bitwise AND expressions} \begin{verbatim} -and_expr: shift_expr | and_expr '&' shift_expr +and_expr: shift_expr | and_expr "&" shift_expr \end{verbatim} This operator yields the bitwise AND of its arguments, @@ -550,7 +661,7 @@ which must be short integers. \section{Bitwise XOR expressions} \begin{verbatim} -xor_expr: and_expr | xor_expr '^' and_expr +xor_expr: and_expr | xor_expr "^" and_expr \end{verbatim} This operator yields the bitwise exclusive OR of its arguments, @@ -559,7 +670,7 @@ which must be short integers. \section{Bitwise OR expressions} \begin{verbatim} -or_expr: xor_expr | or_expr '|' xor_expr +or_expr: xor_expr | or_expr "|" xor_expr \end{verbatim} This operator yields the bitwise OR of its arguments, @@ -569,7 +680,7 @@ which must be short integers. \begin{verbatim} expression: or_expression -expr_list: expression (',' expression)* [','] +expr_list: expression ("," expression)* [","] \end{verbatim} An expression list containing at least one comma yields a new tuple. @@ -587,7 +698,7 @@ To create an empty tuple, use an empty pair of parentheses: \verb\()\. \begin{verbatim} comparison: expression (comp_operator expression)* -comp_operator: '<'|'>'|'='|'=='|'>='|'<='|'<>'|'!='|['not'] 'in'|is' ['not'] +comp_operator: "<"|">"|"=="|">="|"<="|"<>"|"!="|"is" ["not"]|["not"] "in" \end{verbatim} Comparisons yield integer value: 1 for true, 0 for false. @@ -605,12 +716,9 @@ $e_{n-1} op_n e_n$, except that each expression is evaluated at most once. Note that $e_0 op_1 e_1 op_2 e_2$ does not imply any kind of comparison between $e_0$ and $e_2$, e.g., $x < y > z$ is perfectly legal. -For the benefit of C programmers, -the comparison operators \verb\=\ and \verb\==\ are equivalent, -and so are \verb\<>\ and \verb\!=\. -Use of the C variants is discouraged. +The forms \verb\<>\ and \verb\!=\ are equivalent. 
-The operators {\tt '<', '>', '=', '>=', '<='}, and {\tt '<>'} compare +The operators {\tt "<", ">", "==", ">=", "<="}, and {\tt "<>"} compare the values of two objects. The objects needn't have the same type. If both are numbers, they are compared to a common type. Otherwise, objects of different types {\em always} compare unequal, @@ -652,9 +760,9 @@ $x {\tt is not} y$ yields the inverse truth value. \begin{verbatim} condition: or_test -or_test: and_test | or_test 'or' and_test -and_test: not_test | and_test 'and' not_test -not_test: comparison | 'not' not_test +or_test: and_test | or_test "or" and_test +and_test: not_test | and_test "and" not_test +not_test: comparison | "not" not_test \end{verbatim} In the context of Boolean operators, and also when conditions are @@ -686,7 +794,7 @@ Several simple statements may occor on a single line separated by semicolons. The syntax for simple statements is: \begin{verbatim} -stmt_list: simple_stmt (';' simple_stmt)* [';'] +stmt_list: simple_stmt (";" simple_stmt)* [";"] simple_stmt: expression_stmt | assignment | pass_stmt @@ -697,6 +805,7 @@ simple_stmt: expression_stmt | break_stmt | continue_stmt | import_stmt + | global_stmt \end{verbatim} \section{Expression statements} @@ -718,9 +827,9 @@ do not cause any output.) \section{Assignments} \begin{verbatim} -assignment: target_list ('=' target_list)* '=' expression_list -target_list: target (',' target)* [','] -target: identifier | '(' target_list ')' | '[' target_list ']' +assignment: target_list ("=" target_list)* "=" expression_list +target_list: target ("," target)* [","] +target: identifier | "(" target_list ")" | "[" target_list "]" | attributeref | subscription | slicing \end{verbatim} @@ -835,7 +944,7 @@ messages.) \section{The {\tt pass} statement} \begin{verbatim} -pass_stmt: 'pass' +pass_stmt: "pass" \end{verbatim} {\tt pass} is a null operation -- when it is executed, @@ -844,7 +953,7 @@ nothing happens. 
\section{The {\tt del} statement} \begin{verbatim} -del_stmt: 'del' target_list +del_stmt: "del" target_list \end{verbatim} Deletion is recursively defined similar to assignment. @@ -866,7 +975,7 @@ right type (but even this is determined by the sliced object). \section{The {\tt print} statement} \begin{verbatim} -print_stmt: 'print' [ condition (',' condition)* [','] ] +print_stmt: "print" [ condition ("," condition)* [","] ] \end{verbatim} {\tt print} evaluates each condition in turn and writes the resulting @@ -897,7 +1006,7 @@ standard output instead, but this is not safe, and should be fixed.) \section{The {\tt return} statement} \begin{verbatim} -return_stmt: 'return' [condition_list] +return_stmt: "return" [condition_list] \end{verbatim} \verb\return\ may only occur syntactically nested in a function @@ -917,7 +1026,7 @@ before really leaving the function. \section{The {\tt raise} statement} \begin{verbatim} -raise_stmt: 'raise' condition [',' condition] +raise_stmt: "raise" condition ["," condition] \end{verbatim} \verb\raise\ evaluates its first condition, which must yield @@ -930,7 +1039,7 @@ with the second one (or \verb\None\) as its parameter. \section{The {\tt break} statement} \begin{verbatim} -break_stmt: 'break' +break_stmt: "break" \end{verbatim} \verb\break\ may only occur syntactically nested in a \verb\for\ @@ -949,7 +1058,7 @@ before really leaving the loop. \section{The {\tt continue} statement} \begin{verbatim} -continue_stmt: 'continue' +continue_stmt: "continue" \end{verbatim} \verb\continue\ may only occur syntactically nested in a \verb\for\ @@ -962,9 +1071,17 @@ It continues with the next cycle of the nearest enclosing loop. 
\section{The {\tt import} statement} \begin{verbatim} -import_stmt: 'import' identifier (',' identifier)* - | 'from' identifier 'import' identifier (',' identifier)* - | 'from' identifier 'import' '*' +import_stmt: "import" identifier ("," identifier)* + | "from" identifier "import" identifier ("," identifier)* + | "from" identifier "import" "*" +\end{verbatim} + +(XXX To be done.) + +\section{The {\tt global} statement} + +\begin{verbatim} +global_stmt: "global" identifier ("," identifier)* \end{verbatim} (XXX To be done.) @@ -982,48 +1099,49 @@ suite: statement | NEWLINE INDENT statement+ DEDENT \section{The {\tt if} statement} \begin{verbatim} -if_stmt: 'if' condition ':' suite - ('elif' condition ':' suite)* - ['else' ':' suite] +if_stmt: "if" condition ":" suite + ("elif" condition ":" suite)* + ["else" ":" suite] \end{verbatim} \section{The {\tt while} statement} \begin{verbatim} -while_stmt: 'while' condition ':' suite ['else' ':' suite] +while_stmt: "while" condition ":" suite ["else" ":" suite] \end{verbatim} \section{The {\tt for} statement} \begin{verbatim} -for_stmt: 'for' target_list 'in' condition_list ':' suite - ['else' ':' suite] +for_stmt: "for" target_list "in" condition_list ":" suite + ["else" ":" suite] \end{verbatim} \section{The {\tt try} statement} \begin{verbatim} -try_stmt: 'try' ':' suite - ('except' condition [',' condition] ':' suite)* - ['finally' ':' suite] +try_stmt: "try" ":" suite + ("except" condition ["," condition] ":" suite)* + ["finally" ":" suite] \end{verbatim} \section{Function definitions} \begin{verbatim} -funcdef: 'def' identifier '(' [parameter_list] ')' ':' suite -parameter_list: parameter (',' parameter)* -parameter: identifier | '(' parameter_list ')' +funcdef: "def" identifier "(" [parameter_list] ")" ":" suite +parameter_list: parameter ("," parameter)* +parameter: identifier | "(" parameter_list ")" \end{verbatim} \section{Class definitions} \begin{verbatim} -classdef: 'class' identifier '(' ')' [inheritance] ':' 
suite -inheritance: '=' identifier '(' ')' (',' identifier '(' ')')* +classdef: "class" identifier [inheritance] ":" suite +inheritance: "(" expression ("," expression)* ")" \end{verbatim} XXX Syntax for scripts, modules XXX Syntax for interactive input, eval, exec, input +XXX New definition of expressions (as conditions) \end{document}
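Note on the remainder rule quoted in the diff: the text defines the outcome of x % y as x - y*trunc(x/y) and gives 3.14 % 0.7 = 0.34 as an example. The sketch below spells that rule out in present-day Python. The helper name `trunc_rem` is hypothetical and not part of the patch, and it goes through floating-point division, so it is only meant for small illustrative values. Present-day Python's built-in `%` rounds the quotient toward negative infinity instead, so the two can disagree when the operands have different signs.

```python
import math


def trunc_rem(x, y):
    """Remainder per the rule quoted in the diff: x - y*trunc(x/y).

    Hypothetical helper for illustration only. It relies on float
    division, so very large integers may lose precision.
    """
    return x - y * math.trunc(x / y)


# Positive operands: same answer as the built-in % operator.
print(trunc_rem(7, 2), 7 % 2)

# Mixed signs: the truncating rule keeps the sign of the left operand,
# while the built-in % in present-day Python follows the right operand.
print(trunc_rem(-7, 2), -7 % 2)

# The float example from the quoted text (result is approximately 0.34).
print(trunc_rem(3.14, 0.7))
```

For -7 and 2 the quotient -3.5 truncates to -3, giving -7 - 2*(-3) = -1, whereas the built-in operator yields 1.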