From 46f3e00407d614e0a1003379197c75e1b835e629 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Fri, 14 Aug 1992 09:11:01 +0000 Subject: Initial revision --- Doc/ref/ref1.tex | 81 +++++++ Doc/ref/ref2.tex | 349 +++++++++++++++++++++++++++ Doc/ref/ref3.tex | 705 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ Doc/ref/ref4.tex | 147 ++++++++++++ Doc/ref/ref5.tex | 672 ++++++++++++++++++++++++++++++++++++++++++++++++++++ Doc/ref1.tex | 81 +++++++ Doc/ref2.tex | 349 +++++++++++++++++++++++++++ Doc/ref3.tex | 705 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ Doc/ref4.tex | 147 ++++++++++++ Doc/ref5.tex | 672 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 10 files changed, 3908 insertions(+) create mode 100644 Doc/ref/ref1.tex create mode 100644 Doc/ref/ref2.tex create mode 100644 Doc/ref/ref3.tex create mode 100644 Doc/ref/ref4.tex create mode 100644 Doc/ref/ref5.tex create mode 100644 Doc/ref1.tex create mode 100644 Doc/ref2.tex create mode 100644 Doc/ref3.tex create mode 100644 Doc/ref4.tex create mode 100644 Doc/ref5.tex diff --git a/Doc/ref/ref1.tex b/Doc/ref/ref1.tex new file mode 100644 index 0000000..b373e36 --- /dev/null +++ b/Doc/ref/ref1.tex @@ -0,0 +1,81 @@ +\chapter{Introduction} + +This reference manual describes the Python programming language. +It is not intended as a tutorial. + +While I am trying to be as precise as possible, I chose to use English +rather than formal specifications for everything except syntax and +lexical analysis. This should make the document better understandable +to the average reader, but will leave room for ambiguities. +Consequently, if you were coming from Mars and tried to re-implement +Python from this document alone, you might have to guess things and in +fact you would probably end up implementing quite a different language. +On the other hand, if you are using +Python and wonder what the precise rules about a particular area of +the language are, you should definitely be able to find them here. + +It is dangerous to add too many implementation details to a language +reference document --- the implementation may change, and other +implementations of the same language may work differently. On the +other hand, there is currently only one Python implementation, and +its particular quirks are sometimes worth being mentioned, especially +where the implementation imposes additional limitations. Therefore, +you'll find short ``implementation notes'' sprinkled throughout the +text. + +Every Python implementation comes with a number of built-in and +standard modules. These are not documented here, but in the separate +{\em Python Library Reference} document. A few built-in modules are +mentioned when they interact in a significant way with the language +definition. + +\section{Notation} + +The descriptions of lexical analysis and syntax use a modified BNF +grammar notation. This uses the following style of definition: +\index{BNF} +\index{grammar} +\index{syntax} +\index{notation} + +\begin{verbatim} +name: lc_letter (lc_letter | "_")* +lc_letter: "a"..."z" +\end{verbatim} + +The first line says that a \verb\name\ is an \verb\lc_letter\ followed by +a sequence of zero or more \verb\lc_letter\s and underscores. An +\verb\lc_letter\ in turn is any of the single characters `a' through `z'. +(This rule is actually adhered to for the names defined in lexical and +grammar rules in this document.) + +Each rule begins with a name (which is the name defined by the rule) +and a colon. A vertical bar (\verb\|\) is used to separate +alternatives; it is the least binding operator in this notation. A +star (\verb\*\) means zero or more repetitions of the preceding item; +likewise, a plus (\verb\+\) means one or more repetitions, and a +phrase enclosed in square brackets (\verb\[ ]\) means zero or one +occurrences (in other words, the enclosed phrase is optional). The +\verb\*\ and \verb\+\ operators bind as tightly as possible; +parentheses are used for grouping. Literal strings are enclosed in +double quotes. White space is only meaningful to separate tokens. +Rules are normally contained on a single line; rules with many +alternatives may be formatted alternatively with each line after the +first beginning with a vertical bar. + +In lexical definitions (as the example above), two more conventions +are used: Two literal characters separated by three dots mean a choice +of any single character in the given (inclusive) range of ASCII +characters. A phrase between angular brackets (\verb\<...>\) gives an +informal description of the symbol defined; e.g. this could be used +to describe the notion of `control character' if needed. +\index{lexical definitions} +\index{ASCII} + +Even though the notation used is almost the same, there is a big +difference between the meaning of lexical and syntactic definitions: +a lexical definition operates on the individual characters of the +input source, while a syntax definition operates on the stream of +tokens generated by the lexical analysis. All uses of BNF in the next +chapter (``Lexical Analysis'') are lexical definitions; uses in +subsequent chapters are syntactic definitions. diff --git a/Doc/ref/ref2.tex b/Doc/ref/ref2.tex new file mode 100644 index 0000000..250bd2e --- /dev/null +++ b/Doc/ref/ref2.tex @@ -0,0 +1,349 @@ +\chapter{Lexical analysis} + +A Python program is read by a {\em parser}. Input to the parser is a +stream of {\em tokens}, generated by the {\em lexical analyzer}. This +chapter describes how the lexical analyzer breaks a file into tokens. +\index{lexical analysis} +\index{parser} +\index{token} + +\section{Line structure} + +A Python program is divided in a number of logical lines. The end of +a logical line is represented by the token NEWLINE. Statements cannot +cross logical line boundaries except where NEWLINE is allowed by the +syntax (e.g. between statements in compound statements). +\index{line structure} +\index{logical line} +\index{NEWLINE token} + +\subsection{Comments} + +A comment starts with a hash character (\verb\#\) that is not part of +a string literal, and ends at the end of the physical line. A comment +always signifies the end of the logical line. Comments are ignored by +the syntax. +\index{comment} +\index{logical line} +\index{physical line} +\index{hash character} + +\subsection{Line joining} + +Two or more physical lines may be joined into logical lines using +backslash characters (\verb/\/), as follows: when a physical line ends +in a backslash that is not part of a string literal or comment, it is +joined with the following forming a single logical line, deleting the +backslash and the following end-of-line character. For example: +\index{physical line} +\index{line joining} +\index{backslash character} +% +\begin{verbatim} +month_names = ['Januari', 'Februari', 'Maart', \ + 'April', 'Mei', 'Juni', \ + 'Juli', 'Augustus', 'September', \ + 'Oktober', 'November', 'December'] +\end{verbatim} + +\subsection{Blank lines} + +A logical line that contains only spaces, tabs, and possibly a +comment, is ignored (i.e., no NEWLINE token is generated), except that +during interactive input of statements, an entirely blank logical line +terminates a multi-line statement. +\index{blank line} + +\subsection{Indentation} + +Leading whitespace (spaces and tabs) at the beginning of a logical +line is used to compute the indentation level of the line, which in +turn is used to determine the grouping of statements. +\index{indentation} +\index{whitespace} +\index{leading whitespace} +\index{space} +\index{tab} +\index{grouping} +\index{statement grouping} + +First, tabs are replaced (from left to right) by one to eight spaces +such that the total number of characters up to there is a multiple of +eight (this is intended to be the same rule as used by {\UNIX}). The +total number of spaces preceding the first non-blank character then +determines the line's indentation. Indentation cannot be split over +multiple physical lines using backslashes. + +The indentation levels of consecutive lines are used to generate +INDENT and DEDENT tokens, using a stack, as follows. +\index{INDENT token} +\index{DEDENT token} + +Before the first line of the file is read, a single zero is pushed on +the stack; this will never be popped off again. The numbers pushed on +the stack will always be strictly increasing from bottom to top. At +the beginning of each logical line, the line's indentation level is +compared to the top of the stack. If it is equal, nothing happens. +If it is larger, it is pushed on the stack, and one INDENT token is +generated. If it is smaller, it {\em must} be one of the numbers +occurring on the stack; all numbers on the stack that are larger are +popped off, and for each number popped off a DEDENT token is +generated. At the end of the file, a DEDENT token is generated for +each number remaining on the stack that is larger than zero. + +Here is an example of a correctly (though confusingly) indented piece +of Python code: + +\begin{verbatim} +def perm(l): + # Compute the list of all permutations of l + + if len(l) <= 1: + return [l] + r = [] + for i in range(len(l)): + s = l[:i] + l[i+1:] + p = perm(s) + for x in p: + r.append(l[i:i+1] + x) + return r +\end{verbatim} + +The following example shows various indentation errors: + +\begin{verbatim} + def perm(l): # error: first line indented + for i in range(len(l)): # error: not indented + s = l[:i] + l[i+1:] + p = perm(l[:i] + l[i+1:]) # error: unexpected indent + for x in p: + r.append(l[i:i+1] + x) + return r # error: inconsistent dedent +\end{verbatim} + +(Actually, the first three errors are detected by the parser; only the +last error is found by the lexical analyzer --- the indentation of +\verb\return r\ does not match a level popped off the stack.) + +\section{Other tokens} + +Besides NEWLINE, INDENT and DEDENT, the following categories of tokens +exist: identifiers, keywords, literals, operators, and delimiters. +Spaces and tabs are not tokens, but serve to delimit tokens. Where +ambiguity exists, a token comprises the longest possible string that +forms a legal token, when read from left to right. + +\section{Identifiers} + +Identifiers (also referred to as names) are described by the following +lexical definitions: +\index{identifier} +\index{name} + +\begin{verbatim} +identifier: (letter|"_") (letter|digit|"_")* +letter: lowercase | uppercase +lowercase: "a"..."z" +uppercase: "A"..."Z" +digit: "0"..."9" +\end{verbatim} + +Identifiers are unlimited in length. Case is significant. + +\subsection{Keywords} + +The following identifiers are used as reserved words, or {\em +keywords} of the language, and cannot be used as ordinary +identifiers. They must be spelled exactly as written here: +\index{keyword} +\index{reserved word} + +\begin{verbatim} +and del for in print +break elif from is raise +class else global not return +continue except if or try +def finally import pass while +\end{verbatim} + +% # This Python program sorts and formats the above table +% import string +% l = [] +% try: +% while 1: +% l = l + string.split(raw_input()) +% except EOFError: +% pass +% l.sort() +% for i in range((len(l)+4)/5): +% for j in range(i, len(l), 5): +% print string.ljust(l[j], 10), +% print + +\section{Literals} \label{literals} + +Literals are notations for constant values of some built-in types. +\index{literal} +\index{constant} + +\subsection{String literals} + +String literals are described by the following lexical definitions: +\index{string literal} + +\begin{verbatim} +stringliteral: "'" stringitem* "'" +stringitem: stringchar | escapeseq +stringchar: +escapeseq: "'" +\end{verbatim} +\index{ASCII} + +String literals cannot span physical line boundaries. Escape +sequences in strings are actually interpreted according to rules +similar to those used by Standard C. The recognized escape sequences +are: +\index{physical line} +\index{escape sequence} +\index{Standard C} +\index{C} + +\begin{center} +\begin{tabular}{|l|l|} +\hline +\verb/\\/ & Backslash (\verb/\/) \\ +\verb/\'/ & Single quote (\verb/'/) \\ +\verb/\a/ & ASCII Bell (BEL) \\ +\verb/\b/ & ASCII Backspace (BS) \\ +%\verb/\E/ & ASCII Escape (ESC) \\ +\verb/\f/ & ASCII Formfeed (FF) \\ +\verb/\n/ & ASCII Linefeed (LF) \\ +\verb/\r/ & ASCII Carriage Return (CR) \\ +\verb/\t/ & ASCII Horizontal Tab (TAB) \\ +\verb/\v/ & ASCII Vertical Tab (VT) \\ +\verb/\/{\em ooo} & ASCII character with octal value {\em ooo} \\ +\verb/\x/{\em xx...} & ASCII character with hex value {\em xx...} \\ +\hline +\end{tabular} +\end{center} +\index{ASCII} + +In strict compatibility with Standard C, up to three octal digits are +accepted, but an unlimited number of hex digits is taken to be part of +the hex escape (and then the lower 8 bits of the resulting hex number +are used in all current implementations...). + +All unrecognized escape sequences are left in the string unchanged, +i.e., {\em the backslash is left in the string.} (This behavior is +useful when debugging: if an escape sequence is mistyped, the +resulting output is more easily recognized as broken. It also helps a +great deal for string literals used as regular expressions or +otherwise passed to other modules that do their own escape handling.) +\index{unrecognized escape sequence} + +\subsection{Numeric literals} + +There are three types of numeric literals: plain integers, long +integers, and floating point numbers. +\index{number} +\index{numeric literal} +\index{integer literal} +\index{plain integer literal} +\index{long integer literal} +\index{floating point literal} +\index{hexadecimal literal} +\index{octal literal} +\index{decimal literal} + +Integer and long integer literals are described by the following +lexical definitions: + +\begin{verbatim} +longinteger: integer ("l"|"L") +integer: decimalinteger | octinteger | hexinteger +decimalinteger: nonzerodigit digit* | "0" +octinteger: "0" octdigit+ +hexinteger: "0" ("x"|"X") hexdigit+ + +nonzerodigit: "1"..."9" +octdigit: "0"..."7" +hexdigit: digit|"a"..."f"|"A"..."F" +\end{verbatim} + +Although both lower case `l' and upper case `L' are allowed as suffix +for long integers, it is strongly recommended to always use `L', since +the letter `l' looks too much like the digit `1'. + +Plain integer decimal literals must be at most $2^{31} - 1$ (i.e., the +largest positive integer, assuming 32-bit arithmetic). Plain octal and +hexadecimal literals may be as large as $2^{32} - 1$, but values +larger than $2^{31} - 1$ are converted to a negative value by +subtracting $2^{32}$. There is no limit for long integer literals. + +Some examples of plain and long integer literals: + +\begin{verbatim} +7 2147483647 0177 0x80000000 +3L 79228162514264337593543950336L 0377L 0x100000000L +\end{verbatim} + +Floating point literals are described by the following lexical +definitions: + +\begin{verbatim} +floatnumber: pointfloat | exponentfloat +pointfloat: [intpart] fraction | intpart "." +exponentfloat: (intpart | pointfloat) exponent +intpart: digit+ +fraction: "." digit+ +exponent: ("e"|"E") ["+"|"-"] digit+ +\end{verbatim} + +The allowed range of floating point literals is +implementation-dependent. + +Some examples of floating point literals: + +\begin{verbatim} +3.14 10. .001 1e100 3.14e-10 +\end{verbatim} + +Note that numeric literals do not include a sign; a phrase like +\verb\-1\ is actually an expression composed of the operator +\verb\-\ and the literal \verb\1\. + +\section{Operators} + +The following tokens are operators: +\index{operators} + +\begin{verbatim} ++ - * / % +<< >> & | ^ ~ +< == > <= <> != >= +\end{verbatim} + +The comparison operators \verb\<>\ and \verb\!=\ are alternate +spellings of the same operator. + +\section{Delimiters} + +The following tokens serve as delimiters or otherwise have a special +meaning: +\index{delimiters} + +\begin{verbatim} +( ) [ ] { } +; , : . ` = +\end{verbatim} + +The following printing ASCII characters are not used in Python. Their +occurrence outside string literals and comments is an unconditional +error: +\index{ASCII} + +\begin{verbatim} +@ $ " ? +\end{verbatim} + +They may be used by future versions of the language though! diff --git a/Doc/ref/ref3.tex b/Doc/ref/ref3.tex new file mode 100644 index 0000000..fff448e --- /dev/null +++ b/Doc/ref/ref3.tex @@ -0,0 +1,705 @@ +\chapter{Data model} + +\section{Objects, values and types} + +{\em Objects} are Python's abstraction for data. All data in a Python +program is represented by objects or by relations between objects. +(In a sense, and in conformance to Von Neumann's model of a +``stored program computer'', code is also represented by objects.) +\index{object} +\index{data} + +Every object has an identity, a type and a value. An object's {\em +identity} never changes once it has been created; you may think of it +as the object's address in memory. An object's {\em type} is also +unchangeable. It determines the operations that an object supports +(e.g. ``does it have a length?'') and also defines the possible +values for objects of that type. The {\em value} of some objects can +change. Objects whose value can change are said to be {\em mutable}; +objects whose value is unchangeable once they are created are called +{\em immutable}. The type determines an object's (im)mutability. +\index{identity of an object} +\index{value of an object} +\index{type of an object} +\index{mutable object} +\index{immutable object} + +Objects are never explicitly destroyed; however, when they become +unreachable they may be garbage-collected. An implementation is +allowed to delay garbage collection or omit it altogether --- it is a +matter of implementation quality how garbage collection is +implemented, as long as no objects are collected that are still +reachable. (Implementation note: the current implementation uses a +reference-counting scheme which collects most objects as soon as they +become unreachable, but never collects garbage containing circular +references.) +\index{garbage collection} +\index{reference counting} +\index{unreachable object} + +Note that the use of the implementation's tracing or debugging +facilities may keep objects alive that would normally be collectable. + +Some objects contain references to ``external'' resources such as open +files or windows. It is understood that these resources are freed +when the object is garbage-collected, but since garbage collection is +not guaranteed to happen, such objects also provide an explicit way to +release the external resource, usually a \verb\close\ method. +Programs are strongly recommended to always explicitly close such +objects. + +Some objects contain references to other objects; these are called +{\em containers}. Examples of containers are tuples, lists and +dictionaries. The references are part of a container's value. In +most cases, when we talk about the value of a container, we imply the +values, not the identities of the contained objects; however, when we +talk about the (im)mutability of a container, only the identities of +the immediately contained objects are implied. (So, if an immutable +container contains a reference to a mutable object, its value changes +if that mutable object is changed.) +\index{container} + +Types affect almost all aspects of objects' lives. Even the meaning +of object identity is affected in some sense: for immutable types, +operations that compute new values may actually return a reference to +any existing object with the same type and value, while for mutable +objects this is not allowed. E.g. after + +\begin{verbatim} +a = 1; b = 1; c = []; d = [] +\end{verbatim} + +\verb\a\ and \verb\b\ may or may not refer to the same object with the +value one, depending on the implementation, but \verb\c\ and \verb\d\ +are guaranteed to refer to two different, unique, newly created empty +lists. + +\section{The standard type hierarchy} \label{types} + +Below is a list of the types that are built into Python. Extension +modules written in C can define additional types. Future versions of +Python may add types to the type hierarchy (e.g. rational or complex +numbers, efficiently stored arrays of integers, etc.). +\index{type} +\indexii{data}{type} +\indexii{type}{hierarchy} +\indexii{extension}{module} +\index{C} + +Some of the type descriptions below contain a paragraph listing +`special attributes'. These are attributes that provide access to the +implementation and are not intended for general use. Their definition +may change in the future. There are also some `generic' special +attributes, not listed with the individual objects: \verb\__methods__\ +is a list of the method names of a built-in object, if it has any; +\verb\__members__\ is a list of the data attribute names of a built-in +object, if it has any. +\index{attribute} +\indexii{special}{attribute} +\indexiii{generic}{special}{attribute} +\ttindex{__methods__} +\ttindex{__members__} + +\begin{description} + +\item[None] +This type has a single value. There is a single object with this value. +This object is accessed through the built-in name \verb\None\. +It is returned from functions that don't explicitly return an object. +\ttindex{None} +\obindex{None@{\tt None}} + +\item[Numbers] +These are created by numeric literals and returned as results by +arithmetic operators and arithmetic built-in functions. Numeric +objects are immutable; once created their value never changes. Python +numbers are of course strongly related to mathematical numbers, but +subject to the limitations of numerical representation in computers. +\obindex{number} +\obindex{numeric} + +Python distinguishes between integers and floating point numbers: + +\begin{description} +\item[Integers] +These represent elements from the mathematical set of whole numbers. +\obindex{integer} + +There are two types of integers: + +\begin{description} + +\item[Plain integers] +These represent numbers in the range $-2^{31}$ through $2^{31}-1$. +(The range may be larger on machines with a larger natural word +size, but not smaller.) +When the result of an operation falls outside this range, the +exception \verb\OverflowError\ is raised. +For the purpose of shift and mask operations, integers are assumed to +have a binary, 2's complement notation using 32 or more bits, and +hiding no bits from the user (i.e., all $2^{32}$ different bit +patterns correspond to different values). +\obindex{plain integer} + +\item[Long integers] +These represent numbers in an unlimited range, subject to available +(virtual) memory only. For the purpose of shift and mask operations, +a binary representation is assumed, and negative numbers are +represented in a variant of 2's complement which gives the illusion of +an infinite string of sign bits extending to the left. +\obindex{long integer} + +\end{description} % Integers + +The rules for integer representation are intended to give the most +meaningful interpretation of shift and mask operations involving +negative integers and the least surprises when switching between the +plain and long integer domains. For any operation except left shift, +if it yields a result in the plain integer domain without causing +overflow, it will yield the same result in the long integer domain or +when using mixed operands. +\indexii{integer}{representation} + +\item[Floating point numbers] +These represent machine-level double precision floating point numbers. +You are at the mercy of the underlying machine architecture and +C implementation for the accepted range and handling of overflow. +\obindex{floating point} +\indexii{floating point}{number} +\index{C} + +\end{description} % Numbers + +\item[Sequences] +These represent finite ordered sets indexed by natural numbers. +The built-in function \verb\len()\ returns the number of elements +of a sequence. When this number is $n$, the index set contains +the numbers $0, 1, \ldots, n-1$. Element \verb\i\ of sequence +\verb\a\ is selected by \verb\a[i]\. +\obindex{seqence} +\bifuncindex{len} +\index{index operation} +\index{item selection} +\index{subscription} + +Sequences also support slicing: \verb\a[i:j]\ selects all elements +with index $k$ such that $i <= k < j$. When used as an expression, +a slice is a sequence of the same type --- this implies that the +index set is renumbered so that it starts at 0 again. +\index{slicing} + +Sequences are distinguished according to their mutability: + +\begin{description} +% +\item[Immutable sequences] +An object of an immutable sequence type cannot change once it is +created. (If the object contains references to other objects, +these other objects may be mutable and may be changed; however +the collection of objects directly referenced by an immutable object +cannot change.) +\obindex{immutable sequence} +\obindex{immutable} + +The following types are immutable sequences: + +\begin{description} + +\item[Strings] +The elements of a string are characters. There is no separate +character type; a character is represented by a string of one element. +Characters represent (at least) 8-bit bytes. The built-in +functions \verb\chr()\ and \verb\ord()\ convert between characters +and nonnegative integers representing the byte values. +Bytes with the values 0-127 represent the corresponding ASCII values. +The string data type is also used to represent arrays of bytes, e.g. +to hold data read from a file. +\obindex{string} +\index{character} +\index{byte} +\index{ASCII} +\bifuncindex{chr} +\bifuncindex{ord} + +(On systems whose native character set is not ASCII, strings may use +EBCDIC in their internal representation, provided the functions +\verb\chr()\ and \verb\ord()\ implement a mapping between ASCII and +EBCDIC, and string comparison preserves the ASCII order. +Or perhaps someone can propose a better rule?) +\index{ASCII} +\index{EBCDIC} +\index{character set} +\indexii{string}{comparison} +\bifuncindex{chr} +\bifuncindex{ord} + +\item[Tuples] +The elements of a tuple are arbitrary Python objects. +Tuples of two or more elements are formed by comma-separated lists +of expressions. A tuple of one element (a `singleton') can be formed +by affixing a comma to an expression (an expression by itself does +not create a tuple, since parentheses must be usable for grouping of +expressions). An empty tuple can be formed by enclosing `nothing' in +parentheses. +\obindex{tuple} +\indexii{singleton}{tuple} +\indexii{empty}{tuple} + +\end{description} % Immutable sequences + +\item[Mutable sequences] +Mutable sequences can be changed after they are created. The +subscription and slicing notations can be used as the target of +assignment and \verb\del\ (delete) statements. +\obindex{mutable sequece} +\obindex{mutable} +\indexii{assignment}{statement} +\index{delete} +\stindex{del} +\index{subscription} +\index{slicing} + +There is currently a single mutable sequence type: + +\begin{description} + +\item[Lists] +The elements of a list are arbitrary Python objects. Lists are formed +by placing a comma-separated list of expressions in square brackets. +(Note that there are no special cases needed to form lists of length 0 +or 1.) +\obindex{list} + +\end{description} % Mutable sequences + +\end{description} % Sequences + +\item[Mapping types] +These represent finite sets of objects indexed by arbitrary index sets. +The subscript notation \verb\a[k]\ selects the element indexed +by \verb\k\ from the mapping \verb\a\; this can be used in +expressions and as the target of assignments or \verb\del\ statements. +The built-in function \verb\len()\ returns the number of elements +in a mapping. +\bifuncindex{len} +\index{subscription} +\obindex{mapping} + +There is currently a single mapping type: + +\begin{description} + +\item[Dictionaries] +These represent finite sets of objects indexed by strings. +Dictionaries are mutable; they are created by the \verb\{...}\ +notation (see section \ref{dict}). (Implementation note: the strings +used for indexing must not contain null bytes.) +\obindex{dictionary} +\obindex{mutable} + +\end{description} % Mapping types + +\item[Callable types] +These are the types to which the function call (invocation) operation, +written as \verb\function(argument, argument, ...)\, can be applied: +\indexii{function}{call} +\index{invocation} +\indexii{function}{argument} +\obindex{callable} + +\begin{description} + +\item[User-defined functions] +A user-defined function object is created by a function definition +(see section \ref{function}). It should be called with an argument +list containing the same number of items as the function's formal +parameter list. +\indexii{user-defined}{function} +\obindex{function} +\obindex{user-defined function} + +Special read-only attributes: \verb\func_code\ is the code object +representing the compiled function body, and \verb\func_globals\ is (a +reference to) the dictionary that holds the function's global +variables --- it implements the global name space of the module in +which the function was defined. +\ttindex{func_code} +\ttindex{func_globals} +\indexii{global}{name space} + +\item[User-defined methods] +A user-defined method (a.k.a. {\em object closure}) is a pair of a +class instance object and a user-defined function. It should be +called with an argument list containing one item less than the number +of items in the function's formal parameter list. When called, the +class instance becomes the first argument, and the call arguments are +shifted one to the right. +\obindex{method} +\obindex{user-defined method} +\indexii{user-defined}{method} +\index{object closure} + +Special read-only attributes: \verb\im_self\ is the class instance +object, \verb\im_func\ is the function object. +\ttindex{im_func} +\ttindex{im_self} + +\item[Built-in functions] +A built-in function object is a wrapper around a C function. Examples +of built-in functions are \verb\len\ and \verb\math.sin\. There +are no special attributes. The number and type of the arguments are +determined by the C function. +\obindex{built-in function} +\obindex{function} +\index{C} + +\item[Built-in methods] +This is really a different disguise of a built-in function, this time +containing an object passed to the C function as an implicit extra +argument. An example of a built-in method is \verb\list.append\ if +\verb\list\ is a list object. +\obindex{built-in method} +\obindex{method} +\indexii{built-in}{method} + +\item[Classes] +Class objects are described below. When a class object is called as a +parameterless function, a new class instance (also described below) is +created and returned. The class's initialization function is not +called --- this is the responsibility of the caller. It is illegal to +call a class object with one or more arguments. +\obindex{class} +\obindex{class instance} +\obindex{instance} +\indexii{class object}{call} + +\end{description} + +\item[Modules] +Modules are imported by the \verb\import\ statement (see section +\ref{import}). A module object is a container for a module's name +space, which is a dictionary (the same dictionary as referenced by the +\verb\func_globals\ attribute of functions defined in the module). +Module attribute references are translated to lookups in this +dictionary. A module object does not contain the code object used to +initialize the module (since it isn't needed once the initialization +is done). +\stindex{import} +\obindex{module} + +Attribute assignment update the module's name space dictionary. + +Special read-only attributes: \verb\__dict__\ yields the module's name +space as a dictionary object; \verb\__name__\ yields the module's name +as a string object. +\ttindex{__dict__} +\ttindex{__name__} +\indexii{module}{name space} + +\item[Classes] +Class objects are created by class definitions (see section +\ref{class}). A class is a container for a dictionary containing the +class's name space. Class attribute references are translated to +lookups in this dictionary. When an attribute name is not found +there, the attribute search continues in the base classes. The search +is depth-first, left-to-right in the order of their occurrence in the +base class list. +\obindex{class} +\obindex{class instance} +\obindex{instance} +\indexii{class object}{call} +\index{container} +\index{dictionary} +\indexii{class}{attribute} + +Class attribute assignments update the class's dictionary, never the +dictionary of a base class. +\indexiii{class}{attribute}{assignment} + +A class can be called as a parameterless function to yield a class +instance (see above). +\indexii{class object}{call} + +Special read-only attributes: \verb\__dict__\ yields the dictionary +containing the class's name space; \verb\__bases__\ yields a tuple +(possibly empty or a singleton) containing the base classes, in the +order of their occurrence in the base class list. +\ttindex{__dict__} +\ttindex{__bases__} + +\item[Class instances] +A class instance is created by calling a class object as a +parameterless function. A class instance has a dictionary in which +attribute references are searched. When an attribute is not found +there, and the instance's class has an attribute by that name, and +that class attribute is a user-defined function (and in no other +cases), the instance attribute reference yields a user-defined method +object (see above) constructed from the instance and the function. +\obindex{class instance} +\obindex{instance} +\indexii{class}{instance} +\indexii{class instance}{attribute} + +Attribute assignments update the instance's dictionary. +\indexiii{class instance}{attribute}{assignment} + +Class instances can pretend to be numbers, sequences, or mappings if +they have methods with certain special names. These are described in +section \ref{specialnames}. +\obindex{number} +\obindex{sequence} +\obindex{mapping} + +Special read-only attributes: \verb\__dict__\ yields the attribute +dictionary; \verb\__class__\ yields the instance's class. +\ttindex{__dict__} +\ttindex{__class__} + +\item[Files] +A file object represents an open file. (It is a wrapper around a C +{\tt stdio} file pointer.) File objects are created by the +\verb\open()\ built-in function, and also by \verb\posix.popen()\ and +the \verb\makefile\ method of socket objects. \verb\sys.stdin\, +\verb\sys.stdout\ and \verb\sys.stderr\ are file objects corresponding +the the interpreter's standard input, output and error streams. +See the Python Library Reference for methods of file objects and other +details. +\obindex{file} +\index{C} +\index{stdio} +\bifuncindex{open} +\bifuncindex{popen} +\bifuncindex{makefile} +\ttindex{stdin} +\ttindex{stdout} +\ttindex{stderr} +\ttindex{sys.stdin} +\ttindex{sys.stdout} +\ttindex{sys.stderr} + +\item[Internal types] +A few types used internally by the interpreter are exposed to the user. +Their definition may change with future versions of the interpreter, +but they are mentioned here for completeness. +\index{internal type} + +\begin{description} + +\item[Code objects] +Code objects represent executable code. The difference between a code +object and a function object is that the function object contains an +explicit reference to the function's context (the module in which it +was defined) which a code object contains no context. There is no way +to execute a bare code object. +\obindex{code} + +Special read-only attributes: \verb\co_code\ is a string representing +the sequence of instructions; \verb\co_consts\ is a list of literals +used by the code; \verb\co_names\ is a list of names (strings) used by +the code; \verb\co_filename\ is the filename from which the code was +compiled. (To find out the line numbers, you would have to decode the +instructions; the standard library module \verb\dis\ contains an +example of how to do this.) +\ttindex{co_code} +\ttindex{co_consts} +\ttindex{co_names} +\ttindex{co_filename} + +\item[Frame objects] +Frame objects represent execution frames. They may occur in traceback +objects (see below). +\obindex{frame} + +Special read-only attributes: \verb\f_back\ is to the previous +stack frame (towards the caller), or \verb\None\ if this is the bottom +stack frame; \verb\f_code\ is the code object being executed in this +frame; \verb\f_globals\ is the dictionary used to look up global +variables; \verb\f_locals\ is used for local variables; +\verb\f_lineno\ gives the line number and \verb\f_lasti\ gives the +precise instruction (this is an index into the instruction string of +the code object). +\ttindex{f_back} +\ttindex{f_code} +\ttindex{f_globals} +\ttindex{f_locals} +\ttindex{f_lineno} +\ttindex{f_lasti} + +\item[Traceback objects] +Traceback objects represent a stack trace of an exception. A +traceback object is created when an exception occurs. When the search +for an exception handler unwinds the execution stack, at each unwound +level a traceback object is inserted in front of the current +traceback. When an exception handler is entered, the stack trace is +made available to the program as \verb\sys.exc_traceback\. When the +program contains no suitable handler, the stack trace is written +(nicely formatted) to the standard error stream; if the interpreter is +interactive, it is also made available to the user as +\verb\sys.last_traceback\. +\obindex{traceback} +\indexii{stack}{trace} +\indexii{exception}{handler} +\indexii{execution}{stack} +\ttindex{exc_traceback} +\ttindex{last_traceback} +\ttindex{sys.exc_traceback} +\ttindex{sys.last_traceback} + +Special read-only attributes: \verb\tb_next\ is the next level in the +stack trace (towards the frame where the exception occurred), or +\verb\None\ if there is no next level; \verb\tb_frame\ points to the +execution frame of the current level; \verb\tb_lineno\ gives the line +number where the exception occurred; \verb\tb_lasti\ indicates the +precise instruction. The line number and last instruction in the +traceback may differ from the line number of its frame object if the +exception occurred in a \verb\try\ statement with no matching +\verb\except\ clause or with a \verb\finally\ clause. +\ttindex{tb_next} +\ttindex{tb_frame} +\ttindex{tb_lineno} +\ttindex{tb_lasti} +\stindex{try} + +\end{description} % Internal types + +\end{description} % Types + + +\section{Special method names} \label{specialnames} + +A class can implement certain operations that are invoked by special +syntax (such as subscription or arithmetic operations) by defining +methods with special names. For instance, if a class defines a +method named \verb\__getitem__\, and \verb\x\ is an instance of this +class, then \verb\x[i]\ is equivalent to \verb\x.__getitem__(i)\. +(The reverse is not true --- if \verb\x\ is a list object, +\verb\x.__getitem__(i)\ is not equivalent to \verb\x[i]\.) + +Except for \verb\__repr__\ and \verb\__cmp__\, attempts to execute an +operation raise an exception when no appropriate method is defined. +For \verb\__repr__\ and \verb\__cmp__\, the traditional +interpretations are used in this case. + + +\subsection{Special methods for any type} + +\begin{description} + +\item[\tt __repr__(self)] +Called by the \verb\print\ statement and conversions (reverse quotes) to +compute the string representation of an object. + +\item[\tt _cmp__(self, other)] +Called by all comparison operations. Should return -1 if +\verb\self < other\, 0 if \verb\self == other\, +1 if +\verb\self > other\. (Implementation note: due to limitations in the +interpreter, exceptions raised by comparisons are ignored, and the +objects will be considered equal in this case.) + +\end{description} + + +\subsection{Special methods for sequence and mapping types} + +\begin{description} + +\item[\tt __len__(self)] +Called to implement the built-in function \verb\len()\. Should return +the length of the object, an integer \verb\>=\ 0. Also, an object +whose \verb\__len__()\ method returns 0 is considered to be false in a +Boolean context. + +\item[\tt __getitem__(self, key)] +Called to implement evaluation of \verb\self[key]\. Note that the +special interpretation of negative keys (if the class wishes to +emulate a sequence type) is up to the \verb\__getitem__\ method. + +\item[\tt __setitem__(self, key, value)] +Called to implement assignment to \verb\self[key]\. Same note as for +\verb\__getitem__\. + +\item[\tt __delitem__(self, key)] +Called to implement deletion of \verb\self[key]\. Same note as for +\verb\__getitem__\. + +\end{description} + + +\subsection{Special methods for sequence types} + +\begin{description} + +\item[\tt __getslice__(self, i, j)] +Called to implement evaluation of \verb\self[i:j]\. Note that missing +\verb\i\ or \verb\j\ are replaced by 0 or \verb\len(self)\, +respectively, and \verb\len(self)\ has been added (once) to originally +negative \verb\i\ or \verb\j\ by the time this function is called +(unlike for \verb\__getitem__\). + +\item[\tt __setslice__(self, i, j, sequence)] +Called to implement assignment to \verb\self[i:j]\. Same notes as for +\verb\__getslice__\. + +\item[\tt __delslice__(self, i, j)] +Called to implement deletion of \verb\self[i:j]\. Same notes as for +\verb\__getslice__\. + +\end{description} + + +\subsection{Special methods for numeric types} + +\begin{description} + +\item[\tt __add__(self, other)]\itemjoin +\item[\tt __sub__(self, other)]\itemjoin +\item[\tt __mul__(self, other)]\itemjoin +\item[\tt __div__(self, other)]\itemjoin +\item[\tt __mod__(self, other)]\itemjoin +\item[\tt __divmod__(self, other)]\itemjoin +\item[\tt __pow__(self, other)]\itemjoin +\item[\tt __lshift__(self, other)]\itemjoin +\item[\tt __rshift__(self, other)]\itemjoin +\item[\tt __and__(self, other)]\itemjoin +\item[\tt __xor__(self, other)]\itemjoin +\item[\tt __or__(self, other)]\itembreak +Called to implement the binary arithmetic operations (\verb\+\, +\verb\-\, \verb\*\, \verb\/\, \verb\%\, \verb\divmod()\, \verb\pow()\, +\verb\<<\, \verb\>>\, \verb\&\, \verb\^\, \verb\|\). + +\item[\tt __neg__(self)]\itemjoin +\item[\tt __pos__(self)]\itemjoin +\item[\tt __abs__(self)]\itemjoin +\item[\tt __invert__(self)]\itembreak +Called to implement the unary arithmetic operations (\verb\-\, \verb\+\, +\verb\abs()\ and \verb\~\). + +\item[\tt __nonzero__(self)] +Called to implement boolean testing; should return 0 or 1. An +alternative name for this method is \verb\__len__\. + +\item[\tt __coerce__(self, other)] +Called to implement ``mixed-mode'' numeric arithmetic. Should either +return a tuple containing self and other converted to a common numeric +type, or None if no way of conversion is known. When the common type +would be the type of other, it is sufficient to return None, since the +interpreter will also ask the other object to attempt a coercion (but +sometimes, if the implementation of the other type cannot be changed, +it is useful to do the conversion to the other type here). + +Note that this method is not called to coerce the arguments to \verb\+\ +and \verb\*\, because these are also used to implement sequence +concatenation and repetition, respectively. Also note that, for the +same reason, in \verb\n*x\, where \verb\n\ is a built-in number and +\verb\x\ is an instance, a call to \verb\x.__mul__(n)\ is made.% +\footnote{The interpreter should really distinguish between +user-defined classes implementing sequences, mappings or numbers, but +currently it doesn't --- hence this strange exception.} + +\item[\tt __int__(self)]\itemjoin +\item[\tt __long__(self)]\itemjoin +\item[\tt __float__(self)]\itembreak +Called to implement the built-in functions \verb\int()\, \verb\long()\ +and \verb\float()\. Should return a value of the appropriate type. + +\end{description} diff --git a/Doc/ref/ref4.tex b/Doc/ref/ref4.tex new file mode 100644 index 0000000..f8677b5 --- /dev/null +++ b/Doc/ref/ref4.tex @@ -0,0 +1,147 @@ +\chapter{Execution model} +\index{execution model} + +\section{Code blocks, execution frames, and name spaces} \label{execframes} +\index{code block} +\indexii{execution}{frame} +\index{name space} + +A {\em code block} is a piece of Python program text that can be +executed as a unit, such as a module, a class definition or a function +body. Some code blocks (like modules) are executed only once, others +(like function bodies) may be executed many times. Code block may +textually contain other code blocks. Code blocks may invoke other +code blocks (that may or may not be textually contained in them) as +part of their execution, e.g. by invoking (calling) a function. +\index{code block} +\indexii{code}{block} + +The following are code blocks: A module is a code block. A function +body is a code block. A class definition is a code block. Each +command typed interactively is a separate code block; a script file is +a code block. The string argument passed to the built-in functions +\verb\eval\ and \verb\exec\ are code blocks. And finally, the +expression read and evaluated by the built-in function \verb\input\ is +a code block. + +A code block is executed in an execution frame. An {\em execution +frame} contains some administrative information (used for debugging), +determines where and how execution continues after the code block's +execution has completed, and (perhaps most importantly) defines two +name spaces, the local and the global name space, that affect +execution of the code block. +\indexii{execution}{frame} + +A {\em name space} is a mapping from names (identifiers) to objects. +A particular name space may be referenced by more than one execution +frame, and from other places as well. Adding a name to a name space +is called {\em binding} a name (to an object); changing the mapping of +a name is called {\em rebinding}; removing a name is {\em unbinding}. +Name spaces are functionally equivalent to dictionaries. +\index{name space} +\indexii{binding}{name} +\indexii{rebinding}{name} +\indexii{unbinding}{name} + +The {\em local name space} of an execution frame determines the default +place where names are defined and searched. The {\em global name +space} determines the place where names listed in \verb\global\ +statements are defined and searched, and where names that are not +explicitly bound in the current code block are searched. +\indexii{local}{name space} +\indexii{global}{name space} +\stindex{global} + +Whether a name is local or global in a code block is determined by +static inspection of the source text for the code block: in the +absence of \verb\global\ statements, a name that is bound anywhere in +the code block is local in the entire code block; all other names are +considered global. The \verb\global\ statement forces global +interpretation of selected names throughout the code block. The +following constructs bind names: formal parameters, \verb\import\ +statements, class and function definitions (these bind the class or +function name), and targets that are identifiers if occurring in an +assignment, \verb\for\ loop header, or \verb\except\ clause header. +(A target occurring in a \verb\del\ statement does not bind a name.) + +When a global name is not found in the global name space, it is +searched in the list of ``built-in'' names (which is actually the +global name space of the module \verb\builtin\). When a name is not +found at all, the \verb\NameError\ exception is raised. + +The following table lists the meaning of the local and global name +space for various types of code blocks. The name space for a +particular module is automatically created when the module is first +referenced. + +\begin{center} +\begin{tabular}{|l|l|l|l|} +\hline +Code block type & Global name space & Local name space & Notes \\ +\hline +Module & n.s. for this module & same as global & \\ +Script & n.s. for \verb\__main__\ & same as global & \\ +Interactive command & n.s. for \verb\__main__\ & same as global & \\ +Class definition & global n.s. of containing block & new n.s. & \\ +Function body & global n.s. of containing block & new n.s. & \\ +String passed to \verb\exec\ or \verb\eval\ + & global n.s. of caller & local n.s. of caller & (1) \\ +File read by \verb\execfile\ + & global n.s. of caller & local n.s. of caller & (1) \\ +Expression read by \verb\input\ + & global n.s. of caller & local n.s. of caller & \\ +\hline +\end{tabular} +\end{center} + +Notes: + +\begin{description} + +\item[n.s.] means {\em name space} + +\item[(1)] The global and local name space for these functions can be +overridden with optional extra arguments. + +\end{description} + +\section{Exceptions} + +Exceptions are a means of breaking out of the normal flow of control +of a code block in order to handle errors or other exceptional +conditions. An exception is {\em raised} at the point where the error +is detected; it may be {\em handled} by the surrounding code block or +by any code block that directly or indirectly invoked the code block +where the error occurred. +\index{exception} +\index{raise an exception} +\index{handle an exception} +\index{exception handler} +\index{errors} +\index{error handling} + +The Python interpreter raises an exception when it detects an run-time +error (such as division by zero). A Python program can also +explicitly raise an exception with the \verb\raise\ statement. +Exception handlers are specified with the \verb\try...except\ +statement. + +Python uses the ``termination'' model of error handling: an exception +handler can find out what happened and continue execution at an outer +level, but it cannot repair the cause of the error and retry the +failing operation (except by re-entering the the offending piece of +code from the top). + +When an exception is not handled at all, the interpreter terminates +execution of the program, or returns to its interactive main loop. + +Exceptions are identified by string objects. Two different string +objects with the same value identify different exceptions. + +When an exception is raised, an object (maybe \verb\None\) is passed +as the exception's ``parameter''; this object does not affect the +selection of an exception handler, but is passed to the selected +exception handler as additional information. + +See also the description of the \verb\try\ and \verb\raise\ +statements. diff --git a/Doc/ref/ref5.tex b/Doc/ref/ref5.tex new file mode 100644 index 0000000..34dfeb1 --- /dev/null +++ b/Doc/ref/ref5.tex @@ -0,0 +1,672 @@ +\chapter{Expressions and conditions} +\index{expression} +\index{condition} + +{\bf Note:} In this and the following chapters, extended BNF notation +will be used to describe syntax, not lexical analysis. +\index{BNF} + +This chapter explains the meaning of the elements of expressions and +conditions. Conditions are a superset of expressions, and a condition +may be used wherever an expression is required by enclosing it in +parentheses. The only places where expressions are used in the syntax +instead of conditions is in expression statements and on the +right-hand side of assignment statements; this catches some nasty bugs +like accidentally writing \verb\x == 1\ instead of \verb\x = 1\. +\indexii{assignment}{statement} + +The comma plays several roles in Python's syntax. It is usually an +operator with a lower precedence than all others, but occasionally +serves other purposes as well; e.g. it separates function arguments, +is used in list and dictionary constructors, and has special semantics +in \verb\print\ statements. +\index{comma} + +When (one alternative of) a syntax rule has the form + +\begin{verbatim} +name: othername +\end{verbatim} + +and no semantics are given, the semantics of this form of \verb\name\ +are the same as for \verb\othername\. +\index{syntax} + +\section{Arithmetic conversions} +\indexii{arithmetic}{conversion} + +When a description of an arithmetic operator below uses the phrase +``the numeric arguments are converted to a common type'', +this both means that if either argument is not a number, a +\verb\TypeError\ exception is raised, and that otherwise +the following conversions are applied: +\exindex{TypeError} +\indexii{floating point}{number} +\indexii{long}{integer} +\indexii{plain}{integer} + +\begin{itemize} +\item first, if either argument is a floating point number, + the other is converted to floating point; +\item else, if either argument is a long integer, + the other is converted to long integer; +\item otherwise, both must be plain integers and no conversion + is necessary. +\end{itemize} + +\section{Atoms} +\index{atom} + +Atoms are the most basic elements of expressions. Forms enclosed in +reverse quotes or in parentheses, brackets or braces are also +categorized syntactically as atoms. The syntax for atoms is: + +\begin{verbatim} +atom: identifier | literal | enclosure +enclosure: parenth_form | list_display | dict_display | string_conversion +\end{verbatim} + +\subsection{Identifiers (Names)} +\index{name} +\index{identifier} + +An identifier occurring as an atom is a reference to a local, global +or built-in name binding. If a name can be assigned to anywhere in a +code block, and is not mentioned in a \verb\global\ statement in that +code block, it refers to a local name throughout that code block. +Otherwise, it refers to a global name if one exists, else to a +built-in name. +\indexii{name}{binding} +\index{code block} +\stindex{global} +\indexii{built-in}{name} +\indexii{global}{name} + +When the name is bound to an object, evaluation of the atom yields +that object. When a name is not bound, an attempt to evaluate it +raises a \verb\NameError\ exception. +\exindex{NameError} + +\subsection{Literals} +\index{literal} + +Python knows string and numeric literals: + +\begin{verbatim} +literal: stringliteral | integer | longinteger | floatnumber +\end{verbatim} + +Evaluation of a literal yields an object of the given type (string, +integer, long integer, floating point number) with the given value. +The value may be approximated in the case of floating point literals. +See section \ref{literals} for details. + +All literals correspond to immutable data types, and hence the +object's identity is less important than its value. Multiple +evaluations of literals with the same value (either the same +occurrence in the program text or a different occurrence) may obtain +the same object or a different object with the same value. +\indexiii{immutable}{data}{type} + +(In the original implementation, all literals in the same code block +with the same type and value yield the same object.) + +\subsection{Parenthesized forms} +\index{parenthesized form} + +A parenthesized form is an optional condition list enclosed in +parentheses: + +\begin{verbatim} +parenth_form: "(" [condition_list] ")" +\end{verbatim} + +A parenthesized condition list yields whatever that condition list +yields. + +An empty pair of parentheses yields an empty tuple object. Since +tuples are immutable, the rules for literals apply here. +\indexii{empty}{tuple} + +(Note that tuples are not formed by the parentheses, but rather by use +of the comma operator. The exception is the empty tuple, for which +parentheses {\em are} required --- allowing unparenthesized ``nothing'' +in expressions would causes ambiguities and allow common typos to +pass uncaught.) +\index{comma} +\indexii{tuple}{display} + +\subsection{List displays} +\indexii{list}{display} + +A list display is a possibly empty series of conditions enclosed in +square brackets: + +\begin{verbatim} +list_display: "[" [condition_list] "]" +\end{verbatim} + +A list display yields a new list object. +\obindex{list} + +If it has no condition list, the list object has no items. Otherwise, +the elements of the condition list are evaluated from left to right +and inserted in the list object in that order. +\indexii{empty}{list} + +\subsection{Dictionary displays} \label{dict} +\indexii{dictionary}{display} + +A dictionary display is a possibly empty series of key/datum pairs +enclosed in curly braces: +\index{key} +\index{datum} +\index{key/datum pair} + +\begin{verbatim} +dict_display: "{" [key_datum_list] "}" +key_datum_list: key_datum ("," key_datum)* [","] +key_datum: condition ":" condition +\end{verbatim} + +A dictionary display yields a new dictionary object. +\obindex{dictionary} + +The key/datum pairs are evaluated from left to right to define the +entries of the dictionary: each key object is used as a key into the +dictionary to store the corresponding datum. + +Keys must be strings, otherwise a \verb\TypeError\ exception is +raised. Clashes between duplicate keys are not detected; the last +datum (textually rightmost in the display) stored for a given key +value prevails. +\exindex{TypeError} + +\subsection{String conversions} +\indexii{string}{conversion} + +A string conversion is a condition list enclosed in reverse (or +backward) quotes: + +\begin{verbatim} +string_conversion: "`" condition_list "`" +\end{verbatim} + +A string conversion evaluates the contained condition list and +converts the resulting object into a string according to rules +specific to its type. + +If the object is a string, a number, \verb\None\, or a tuple, list or +dictionary containing only objects whose type is one of these, the +resulting string is a valid Python expression which can be passed to +the built-in function \verb\eval()\ to yield an expression with the +same value (or an approximation, if floating point numbers are +involved). + +(In particular, converting a string adds quotes around it and converts +``funny'' characters to escape sequences that are safe to print.) + +It is illegal to attempt to convert recursive objects (e.g. lists or +dictionaries that contain a reference to themselves, directly or +indirectly.) +\obindex{recursive} + +\section{Primaries} \label{primaries} +\index{primary} + +Primaries represent the most tightly bound operations of the language. +Their syntax is: + +\begin{verbatim} +primary: atom | attributeref | subscription | slicing | call +\end{verbatim} + +\subsection{Attribute references} +\indexii{attribute}{reference} + +An attribute reference is a primary followed by a period and a name: + +\begin{verbatim} +attributeref: primary "." identifier +\end{verbatim} + +The primary must evaluate to an object of a type that supports +attribute references, e.g. a module or a list. This object is then +asked to produce the attribute whose name is the identifier. If this +attribute is not available, the exception \verb\AttributeError\ is +raised. Otherwise, the type and value of the object produced is +determined by the object. Multiple evaluations of the same attribute +reference may yield different objects. +\obindex{module} +\obindex{list} + +\subsection{Subscriptions} +\index{subscription} + +A subscription selects an item of a sequence (string, tuple or list) +or mapping (dictionary) object: +\obindex{sequence} +\obindex{mapping} +\obindex{string} +\obindex{tuple} +\obindex{list} +\obindex{dictionary} +\indexii{sequence}{item} + +\begin{verbatim} +subscription: primary "[" condition "]" +\end{verbatim} + +The primary must evaluate to an object of a sequence or mapping type. + +If it is a mapping, the condition must evaluate to an object whose +value is one of the keys of the mapping, and the subscription selects +the value in the mapping that corresponds to that key. + +If it is a sequence, the condition must evaluate to a plain integer. +If this value is negative, the length of the sequence is added to it +(so that, e.g. \verb\x[-1]\ selects the last item of \verb\x\.) +The resulting value must be a nonnegative integer smaller than the +number of items in the sequence, and the subscription selects the item +whose index is that value (counting from zero). + +A string's items are characters. A character is not a separate data +type but a string of exactly one character. +\index{character} +\indexii{string}{item} + +\subsection{Slicings} +\index{slicing} +\index{slice} + +A slicing (or slice) selects a range of items in a sequence (string, +tuple or list) object: +\obindex{sequence} +\obindex{string} +\obindex{tuple} +\obindex{list} + +\begin{verbatim} +slicing: primary "[" [condition] ":" [condition] "]" +\end{verbatim} + +The primary must evaluate to a sequence object. The lower and upper +bound expressions, if present, must evaluate to plain integers; +defaults are zero and the sequence's length, respectively. If either +bound is negative, the sequence's length is added to it. The slicing +now selects all items with index $k$ such that $i <= k < j$ where $i$ +and $j$ are the specified lower and upper bounds. This may be an +empty sequence. It is not an error if $i$ or $j$ lie outside the +range of valid indexes (such items don't exist so they aren't +selected). + +\subsection{Calls} \label{calls} +\index{call} + +A call calls a callable object (e.g. a function) with a possibly empty +series of arguments: +\obindex{callable} + +\begin{verbatim} +call: primary "(" [condition_list] ")" +\end{verbatim} + +The primary must evaluate to a callable object (user-defined +functions, built-in functions, methods of built-in objects, class +objects, and methods of class instances are callable). If it is a +class, the argument list must be empty; otherwise, the arguments are +evaluated. + +A call always returns some value, possibly \verb\None\, unless it +raises an exception. How this value is computed depends on the type +of the callable object. If it is: + +\begin{description} + +\item[a user-defined function:] the code block for the function is +executed, passing it the argument list. The first thing the code +block will do is bind the formal parameters to the arguments; this is +described in section \ref{function}. When the code block executes a +\verb\return\ statement, this specifies the return value of the +function call. +\indexii{function}{call} +\indexiii{user-defined}{function}{call} +\obindex{user-defined function} +\obindex{function} + +\item[a built-in function or method:] the result is up to the +interpreter; see the library reference manual for the descriptions of +built-in functions and methods. +\indexii{function}{call} +\indexii{built-in function}{call} +\indexii{method}{call} +\indexii{built-in method}{call} +\obindex{built-in method} +\obindex{built-in function} +\obindex{method} +\obindex{function} + +\item[a class object:] a new instance of that class is returned. +\obindex{class} +\indexii{class object}{call} + +\item[a class instance method:] the corresponding user-defined +function is called, with an argument list that is one longer than the +argument list of the call: the instance becomes the first argument. +\obindex{class instance} +\obindex{instance} +\indexii{instance}{call} +\indexii{class instance}{call} + +\end{description} + +\section{Unary arithmetic operations} +\indexiii{unary}{arithmetic}{operation} +\indexiii{unary}{bit-wise}{operation} + +All unary arithmetic (and bit-wise) operations have the same priority: + +\begin{verbatim} +u_expr: primary | "-" u_expr | "+" u_expr | "~" u_expr +\end{verbatim} + +The unary \verb\"-"\ (minus) operator yields the negation of its +numeric argument. +\index{negation} +\index{minus} + +The unary \verb\"+"\ (plus) operator yields its numeric argument +unchanged. +\index{plus} + +The unary \verb\"~"\ (invert) operator yields the bit-wise inversion +of its plain or long integer argument. The bit-wise inversion of +\verb\x\ is defined as \verb\-(x+1)\. +\index{inversion} + +In all three cases, if the argument does not have the proper type, +a \verb\TypeError\ exception is raised. +\exindex{TypeError} + +\section{Binary arithmetic operations} +\indexiii{binary}{arithmetic}{operation} + +The binary arithmetic operations have the conventional priority +levels. Note that some of these operations also apply to certain +non-numeric types. There is no ``power'' operator, so there are only +two levels, one for multiplicative operators and one for additive +operators: + +\begin{verbatim} +m_expr: u_expr | m_expr "*" u_expr + | m_expr "/" u_expr | m_expr "%" u_expr +a_expr: m_expr | aexpr "+" m_expr | aexpr "-" m_expr +\end{verbatim} + +The \verb\"*"\ (multiplication) operator yields the product of its +arguments. The arguments must either both be numbers, or one argument +must be a plain integer and the other must be a sequence. In the +former case, the numbers are converted to a common type and then +multiplied together. In the latter case, sequence repetition is +performed; a negative repetition factor yields an empty sequence. +\index{multiplication} + +The \verb\"/"\ (division) operator yields the quotient of its +arguments. The numeric arguments are first converted to a common +type. Plain or long integer division yields an integer of the same +type; the result is that of mathematical division with the `floor' +function applied to the result. Division by zero raises the +\verb\ZeroDivisionError\ exception. +\exindex{ZeroDivisionError} +\index{division} + +The \verb\"%"\ (modulo) operator yields the remainder from the +division of the first argument by the second. The numeric arguments +are first converted to a common type. A zero right argument raises +the \verb\ZeroDivisionError\ exception. The arguments may be floating +point numbers, e.g. \verb\3.14 % 0.7\ equals \verb\0.34\. The modulo +operator always yields a result with the same sign as its second +operand (or zero); the absolute value of the result is strictly +smaller than the second operand. +\index{modulo} + +The integer division and modulo operators are connected by the +following identity: \verb\x == (x/y)*y + (x%y)\. Integer division and +modulo are also connected with the built-in function \verb\divmod()\: +\verb\divmod(x, y) == (x/y, x%y)\. These identities don't hold for +floating point numbers; there a similar identity holds where +\verb\x/y\ is replaced by \verb\floor(x/y)\). + +The \verb\"+"\ (addition) operator yields the sum of its arguments. +The arguments must either both be numbers, or both sequences of the +same type. In the former case, the numbers are converted to a common +type and then added together. In the latter case, the sequences are +concatenated. +\index{addition} + +The \verb\"-"\ (subtraction) operator yields the difference of its +arguments. The numeric arguments are first converted to a common +type. +\index{subtraction} + +\section{Shifting operations} +\indexii{shifting}{operation} + +The shifting operations have lower priority than the arithmetic +operations: + +\begin{verbatim} +shift_expr: a_expr | shift_expr ( "<<" | ">>" ) a_expr +\end{verbatim} + +These operators accept plain or long integers as arguments. The +arguments are converted to a common type. They shift the first +argument to the left or right by the number of bits given by the +second argument. + +A right shift by $n$ bits is defined as division by $2^n$. A left +shift by $n$ bits is defined as multiplication with $2^n$; for plain +integers there is no overflow check so this drops bits and flip the +sign if the result is not less than $2^{31}$ in absolute value. + +Negative shift counts raise a \verb\ValueError\ exception. +\exindex{ValueError} + +\section{Binary bit-wise operations} +\indexiii{binary}{bit-wise}{operation} + +Each of the three bitwise operations has a different priority level: + +\begin{verbatim} +and_expr: shift_expr | and_expr "&" shift_expr +xor_expr: and_expr | xor_expr "^" and_expr +or_expr: xor_expr | or_expr "|" xor_expr +\end{verbatim} + +The \verb\"&"\ operator yields the bitwise AND of its arguments, which +must be plain or long integers. The arguments are converted to a +common type. +\indexii{bit-wise}{and} + +The \verb\"^"\ operator yields the bitwise XOR (exclusive OR) of its +arguments, which must be plain or long integers. The arguments are +converted to a common type. +\indexii{bit-wise}{xor} +\indexii{exclusive}{or} + +The \verb\"|"\ operator yields the bitwise (inclusive) OR of its +arguments, which must be plain or long integers. The arguments are +converted to a common type. +\indexii{bit-wise}{or} +\indexii{inclusive}{or} + +\section{Comparisons} +\index{comparison} + +Contrary to C, all comparison operations in Python have the same +priority, which is lower than that of any arithmetic, shifting or +bitwise operation. Also contrary to C, expressions like +\verb\a < b < c\ have the interpretation that is conventional in +mathematics: +\index{C} + +\begin{verbatim} +comparison: or_expr (comp_operator or_expr)* +comp_operator: "<"|">"|"=="|">="|"<="|"<>"|"!="|"is" ["not"]|["not"] "in" +\end{verbatim} + +Comparisons yield integer values: 1 for true, 0 for false. + +Comparisons can be chained arbitrarily, e.g. $x < y <= z$ is +equivalent to $x < y$ \verb\and\ $y <= z$, except that $y$ is +evaluated only once (but in both cases $z$ is not evaluated at all +when $x < y$ is found to be false). +\indexii{chaining}{comparisons} + +Formally, $e_0 op_1 e_1 op_2 e_2 ...e_{n-1} op_n e_n$ is equivalent to +$e_0 op_1 e_1$ \verb\and\ $e_1 op_2 e_2$ \verb\and\ ... \verb\and\ +$e_{n-1} op_n e_n$, except that each expression is evaluated at most once. + +Note that $e_0 op_1 e_1 op_2 e_2$ does not imply any kind of comparison +between $e_0$ and $e_2$, e.g. $x < y > z$ is perfectly legal. + +The forms \verb\<>\ and \verb\!=\ are equivalent; for consistency with +C, \verb\!=\ is preferred; where \verb\!=\ is mentioned below +\verb\<>\ is also implied. + +The operators {\tt "<", ">", "==", ">=", "<="}, and {\tt "!="} compare +the values of two objects. The objects needn't have the same type. +If both are numbers, they are coverted to a common type. Otherwise, +objects of different types {\em always} compare unequal, and are +ordered consistently but arbitrarily. + +(This unusual definition of comparison is done to simplify the +definition of operations like sorting and the \verb\in\ and \verb\not +in\ operators.) + +Comparison of objects of the same type depends on the type: + +\begin{itemize} + +\item +Numbers are compared arithmetically. + +\item +Strings are compared lexicographically using the numeric equivalents +(the result of the built-in function \verb\ord\) of their characters. + +\item +Tuples and lists are compared lexicographically using comparison of +corresponding items. + +\item +Mappings (dictionaries) are compared through lexicographic +comparison of their sorted (key, value) lists.% +\footnote{This is expensive since it requires sorting the keys first, +but about the only sensible definition. It was tried to compare +dictionaries by identity only, but this caused surprises because +people expected to be able to test a dictionary for emptiness by +comparing it to {\tt \{\}}.} + +\item +Most other types compare unequal unless they are the same object; +the choice whether one object is considered smaller or larger than +another one is made arbitrarily but consistently within one +execution of a program. + +\end{itemize} + +The operators \verb\in\ and \verb\not in\ test for sequence +membership: if $y$ is a sequence, $x ~\verb\in\~ y$ is true if and +only if there exists an index $i$ such that $x = y[i]$. +$x ~\verb\not in\~ y$ yields the inverse truth value. The exception +\verb\TypeError\ is raised when $y$ is not a sequence, or when $y$ is +a string and $x$ is not a string of length one.% +\footnote{The latter restriction is sometimes a nuisance.} +\opindex{in} +\opindex{not in} +\indexii{membership}{test} +\obindex{sequence} + +The operators \verb\is\ and \verb\is not\ test for object identity: +$x ~\verb\is\~ y$ is true if and only if $x$ and $y$ are the same +object. $x ~\verb\is not\~ y$ yields the inverse truth value. +\opindex{is} +\opindex{is not} +\indexii{identity}{test} + +\section{Boolean operations} \label{Booleans} +\indexii{Boolean}{operation} + +Boolean operations have the lowest priority of all Python operations: + +\begin{verbatim} +condition: or_test +or_test: and_test | or_test "or" and_test +and_test: not_test | and_test "and" not_test +not_test: comparison | "not" not_test +\end{verbatim} + +In the context of Boolean operations, and also when conditions are +used by control flow statements, the following values are interpreted +as false: \verb\None\, numeric zero of all types, empty sequences +(strings, tuples and lists), and empty mappings (dictionaries). All +other values are interpreted as true. + +The operator \verb\not\ yields 1 if its argument is false, 0 otherwise. +\opindex{not} + +The condition $x ~\verb\and\~ y$ first evaluates $x$; if $x$ is false, +its value is returned; otherwise, $y$ is evaluated and the resulting +value is returned. +\opindex{and} + +The condition $x ~\verb\or\~ y$ first evaluates $x$; if $x$ is true, +its value is returned; otherwise, $y$ is evaluated and the resulting +value is returned. +\opindex{or} + +(Note that \verb\and\ and \verb\or\ do not restrict the value and type +they return to 0 and 1, but rather return the last evaluated argument. +This is sometimes useful, e.g. if \verb\s\ is a string that should be +replaced by a default value if it is empty, the expression +\verb\s or 'foo'\ yields the desired value. Because \verb\not\ has to +invent a value anyway, it does not bother to return a value of the +same type as its argument, so e.g. \verb\not 'foo'\ yields \verb\0\, +not \verb\''\.) + +\section{Expression lists and condition lists} +\indexii{expression}{list} +\indexii{condition}{list} + +\begin{verbatim} +expr_list: or_expr ("," or_expr)* [","] +cond_list: condition ("," condition)* [","] +\end{verbatim} + +The only difference between expression lists and condition lists is +the lowest priority of operators that can be used in them without +being enclosed in parentheses; condition lists allow all operators, +while expression lists don't allow comparisons and Boolean operators +(they do allow bitwise and shift operators though). + +Expression lists are used in expression statements and assignments; +condition lists are used everywhere else where a list of +comma-separated values is required. + +An expression (condition) list containing at least one comma yields a +tuple. The length of the tuple is the number of expressions +(conditions) in the list. The expressions (conditions) are evaluated +from left to right. (Conditions lists are used syntactically is a few +places where no tuple is constructed but a list of values is needed +nevertheless.) +\obindex{tuple} + +The trailing comma is required only to create a single tuple (a.k.a. a +{\em singleton}); it is optional in all other cases. A single +expression (condition) without a trailing comma doesn't create a +tuple, but rather yields the value of that expression (condition). +\indexii{trailing}{comma} + +(To create an empty tuple, use an empty pair of parentheses: +\verb\()\.) diff --git a/Doc/ref1.tex b/Doc/ref1.tex new file mode 100644 index 0000000..b373e36 --- /dev/null +++ b/Doc/ref1.tex @@ -0,0 +1,81 @@ +\chapter{Introduction} + +This reference manual describes the Python programming language. +It is not intended as a tutorial. + +While I am trying to be as precise as possible, I chose to use English +rather than formal specifications for everything except syntax and +lexical analysis. This should make the document better understandable +to the average reader, but will leave room for ambiguities. +Consequently, if you were coming from Mars and tried to re-implement +Python from this document alone, you might have to guess things and in +fact you would probably end up implementing quite a different language. +On the other hand, if you are using +Python and wonder what the precise rules about a particular area of +the language are, you should definitely be able to find them here. + +It is dangerous to add too many implementation details to a language +reference document --- the implementation may change, and other +implementations of the same language may work differently. On the +other hand, there is currently only one Python implementation, and +its particular quirks are sometimes worth being mentioned, especially +where the implementation imposes additional limitations. Therefore, +you'll find short ``implementation notes'' sprinkled throughout the +text. + +Every Python implementation comes with a number of built-in and +standard modules. These are not documented here, but in the separate +{\em Python Library Reference} document. A few built-in modules are +mentioned when they interact in a significant way with the language +definition. + +\section{Notation} + +The descriptions of lexical analysis and syntax use a modified BNF +grammar notation. This uses the following style of definition: +\index{BNF} +\index{grammar} +\index{syntax} +\index{notation} + +\begin{verbatim} +name: lc_letter (lc_letter | "_")* +lc_letter: "a"..."z" +\end{verbatim} + +The first line says that a \verb\name\ is an \verb\lc_letter\ followed by +a sequence of zero or more \verb\lc_letter\s and underscores. An +\verb\lc_letter\ in turn is any of the single characters `a' through `z'. +(This rule is actually adhered to for the names defined in lexical and +grammar rules in this document.) + +Each rule begins with a name (which is the name defined by the rule) +and a colon. A vertical bar (\verb\|\) is used to separate +alternatives; it is the least binding operator in this notation. A +star (\verb\*\) means zero or more repetitions of the preceding item; +likewise, a plus (\verb\+\) means one or more repetitions, and a +phrase enclosed in square brackets (\verb\[ ]\) means zero or one +occurrences (in other words, the enclosed phrase is optional). The +\verb\*\ and \verb\+\ operators bind as tightly as possible; +parentheses are used for grouping. Literal strings are enclosed in +double quotes. White space is only meaningful to separate tokens. +Rules are normally contained on a single line; rules with many +alternatives may be formatted alternatively with each line after the +first beginning with a vertical bar. + +In lexical definitions (as the example above), two more conventions +are used: Two literal characters separated by three dots mean a choice +of any single character in the given (inclusive) range of ASCII +characters. A phrase between angular brackets (\verb\<...>\) gives an +informal description of the symbol defined; e.g. this could be used +to describe the notion of `control character' if needed. +\index{lexical definitions} +\index{ASCII} + +Even though the notation used is almost the same, there is a big +difference between the meaning of lexical and syntactic definitions: +a lexical definition operates on the individual characters of the +input source, while a syntax definition operates on the stream of +tokens generated by the lexical analysis. All uses of BNF in the next +chapter (``Lexical Analysis'') are lexical definitions; uses in +subsequent chapters are syntactic definitions. diff --git a/Doc/ref2.tex b/Doc/ref2.tex new file mode 100644 index 0000000..250bd2e --- /dev/null +++ b/Doc/ref2.tex @@ -0,0 +1,349 @@ +\chapter{Lexical analysis} + +A Python program is read by a {\em parser}. Input to the parser is a +stream of {\em tokens}, generated by the {\em lexical analyzer}. This +chapter describes how the lexical analyzer breaks a file into tokens. +\index{lexical analysis} +\index{parser} +\index{token} + +\section{Line structure} + +A Python program is divided in a number of logical lines. The end of +a logical line is represented by the token NEWLINE. Statements cannot +cross logical line boundaries except where NEWLINE is allowed by the +syntax (e.g. between statements in compound statements). +\index{line structure} +\index{logical line} +\index{NEWLINE token} + +\subsection{Comments} + +A comment starts with a hash character (\verb\#\) that is not part of +a string literal, and ends at the end of the physical line. A comment +always signifies the end of the logical line. Comments are ignored by +the syntax. +\index{comment} +\index{logical line} +\index{physical line} +\index{hash character} + +\subsection{Line joining} + +Two or more physical lines may be joined into logical lines using +backslash characters (\verb/\/), as follows: when a physical line ends +in a backslash that is not part of a string literal or comment, it is +joined with the following forming a single logical line, deleting the +backslash and the following end-of-line character. For example: +\index{physical line} +\index{line joining} +\index{backslash character} +% +\begin{verbatim} +month_names = ['Januari', 'Februari', 'Maart', \ + 'April', 'Mei', 'Juni', \ + 'Juli', 'Augustus', 'September', \ + 'Oktober', 'November', 'December'] +\end{verbatim} + +\subsection{Blank lines} + +A logical line that contains only spaces, tabs, and possibly a +comment, is ignored (i.e., no NEWLINE token is generated), except that +during interactive input of statements, an entirely blank logical line +terminates a multi-line statement. +\index{blank line} + +\subsection{Indentation} + +Leading whitespace (spaces and tabs) at the beginning of a logical +line is used to compute the indentation level of the line, which in +turn is used to determine the grouping of statements. +\index{indentation} +\index{whitespace} +\index{leading whitespace} +\index{space} +\index{tab} +\index{grouping} +\index{statement grouping} + +First, tabs are replaced (from left to right) by one to eight spaces +such that the total number of characters up to there is a multiple of +eight (this is intended to be the same rule as used by {\UNIX}). The +total number of spaces preceding the first non-blank character then +determines the line's indentation. Indentation cannot be split over +multiple physical lines using backslashes. + +The indentation levels of consecutive lines are used to generate +INDENT and DEDENT tokens, using a stack, as follows. +\index{INDENT token} +\index{DEDENT token} + +Before the first line of the file is read, a single zero is pushed on +the stack; this will never be popped off again. The numbers pushed on +the stack will always be strictly increasing from bottom to top. At +the beginning of each logical line, the line's indentation level is +compared to the top of the stack. If it is equal, nothing happens. +If it is larger, it is pushed on the stack, and one INDENT token is +generated. If it is smaller, it {\em must} be one of the numbers +occurring on the stack; all numbers on the stack that are larger are +popped off, and for each number popped off a DEDENT token is +generated. At the end of the file, a DEDENT token is generated for +each number remaining on the stack that is larger than zero. + +Here is an example of a correctly (though confusingly) indented piece +of Python code: + +\begin{verbatim} +def perm(l): + # Compute the list of all permutations of l + + if len(l) <= 1: + return [l] + r = [] + for i in range(len(l)): + s = l[:i] + l[i+1:] + p = perm(s) + for x in p: + r.append(l[i:i+1] + x) + return r +\end{verbatim} + +The following example shows various indentation errors: + +\begin{verbatim} + def perm(l): # error: first line indented + for i in range(len(l)): # error: not indented + s = l[:i] + l[i+1:] + p = perm(l[:i] + l[i+1:]) # error: unexpected indent + for x in p: + r.append(l[i:i+1] + x) + return r # error: inconsistent dedent +\end{verbatim} + +(Actually, the first three errors are detected by the parser; only the +last error is found by the lexical analyzer --- the indentation of +\verb\return r\ does not match a level popped off the stack.) + +\section{Other tokens} + +Besides NEWLINE, INDENT and DEDENT, the following categories of tokens +exist: identifiers, keywords, literals, operators, and delimiters. +Spaces and tabs are not tokens, but serve to delimit tokens. Where +ambiguity exists, a token comprises the longest possible string that +forms a legal token, when read from left to right. + +\section{Identifiers} + +Identifiers (also referred to as names) are described by the following +lexical definitions: +\index{identifier} +\index{name} + +\begin{verbatim} +identifier: (letter|"_") (letter|digit|"_")* +letter: lowercase | uppercase +lowercase: "a"..."z" +uppercase: "A"..."Z" +digit: "0"..."9" +\end{verbatim} + +Identifiers are unlimited in length. Case is significant. + +\subsection{Keywords} + +The following identifiers are used as reserved words, or {\em +keywords} of the language, and cannot be used as ordinary +identifiers. They must be spelled exactly as written here: +\index{keyword} +\index{reserved word} + +\begin{verbatim} +and del for in print +break elif from is raise +class else global not return +continue except if or try +def finally import pass while +\end{verbatim} + +% # This Python program sorts and formats the above table +% import string +% l = [] +% try: +% while 1: +% l = l + string.split(raw_input()) +% except EOFError: +% pass +% l.sort() +% for i in range((len(l)+4)/5): +% for j in range(i, len(l), 5): +% print string.ljust(l[j], 10), +% print + +\section{Literals} \label{literals} + +Literals are notations for constant values of some built-in types. +\index{literal} +\index{constant} + +\subsection{String literals} + +String literals are described by the following lexical definitions: +\index{string literal} + +\begin{verbatim} +stringliteral: "'" stringitem* "'" +stringitem: stringchar | escapeseq +stringchar: +escapeseq: "'" +\end{verbatim} +\index{ASCII} + +String literals cannot span physical line boundaries. Escape +sequences in strings are actually interpreted according to rules +similar to those used by Standard C. The recognized escape sequences +are: +\index{physical line} +\index{escape sequence} +\index{Standard C} +\index{C} + +\begin{center} +\begin{tabular}{|l|l|} +\hline +\verb/\\/ & Backslash (\verb/\/) \\ +\verb/\'/ & Single quote (\verb/'/) \\ +\verb/\a/ & ASCII Bell (BEL) \\ +\verb/\b/ & ASCII Backspace (BS) \\ +%\verb/\E/ & ASCII Escape (ESC) \\ +\verb/\f/ & ASCII Formfeed (FF) \\ +\verb/\n/ & ASCII Linefeed (LF) \\ +\verb/\r/ & ASCII Carriage Return (CR) \\ +\verb/\t/ & ASCII Horizontal Tab (TAB) \\ +\verb/\v/ & ASCII Vertical Tab (VT) \\ +\verb/\/{\em ooo} & ASCII character with octal value {\em ooo} \\ +\verb/\x/{\em xx...} & ASCII character with hex value {\em xx...} \\ +\hline +\end{tabular} +\end{center} +\index{ASCII} + +In strict compatibility with Standard C, up to three octal digits are +accepted, but an unlimited number of hex digits is taken to be part of +the hex escape (and then the lower 8 bits of the resulting hex number +are used in all current implementations...). + +All unrecognized escape sequences are left in the string unchanged, +i.e., {\em the backslash is left in the string.} (This behavior is +useful when debugging: if an escape sequence is mistyped, the +resulting output is more easily recognized as broken. It also helps a +great deal for string literals used as regular expressions or +otherwise passed to other modules that do their own escape handling.) +\index{unrecognized escape sequence} + +\subsection{Numeric literals} + +There are three types of numeric literals: plain integers, long +integers, and floating point numbers. +\index{number} +\index{numeric literal} +\index{integer literal} +\index{plain integer literal} +\index{long integer literal} +\index{floating point literal} +\index{hexadecimal literal} +\index{octal literal} +\index{decimal literal} + +Integer and long integer literals are described by the following +lexical definitions: + +\begin{verbatim} +longinteger: integer ("l"|"L") +integer: decimalinteger | octinteger | hexinteger +decimalinteger: nonzerodigit digit* | "0" +octinteger: "0" octdigit+ +hexinteger: "0" ("x"|"X") hexdigit+ + +nonzerodigit: "1"..."9" +octdigit: "0"..."7" +hexdigit: digit|"a"..."f"|"A"..."F" +\end{verbatim} + +Although both lower case `l' and upper case `L' are allowed as suffix +for long integers, it is strongly recommended to always use `L', since +the letter `l' looks too much like the digit `1'. + +Plain integer decimal literals must be at most $2^{31} - 1$ (i.e., the +largest positive integer, assuming 32-bit arithmetic). Plain octal and +hexadecimal literals may be as large as $2^{32} - 1$, but values +larger than $2^{31} - 1$ are converted to a negative value by +subtracting $2^{32}$. There is no limit for long integer literals. + +Some examples of plain and long integer literals: + +\begin{verbatim} +7 2147483647 0177 0x80000000 +3L 79228162514264337593543950336L 0377L 0x100000000L +\end{verbatim} + +Floating point literals are described by the following lexical +definitions: + +\begin{verbatim} +floatnumber: pointfloat | exponentfloat +pointfloat: [intpart] fraction | intpart "." +exponentfloat: (intpart | pointfloat) exponent +intpart: digit+ +fraction: "." digit+ +exponent: ("e"|"E") ["+"|"-"] digit+ +\end{verbatim} + +The allowed range of floating point literals is +implementation-dependent. + +Some examples of floating point literals: + +\begin{verbatim} +3.14 10. .001 1e100 3.14e-10 +\end{verbatim} + +Note that numeric literals do not include a sign; a phrase like +\verb\-1\ is actually an expression composed of the operator +\verb\-\ and the literal \verb\1\. + +\section{Operators} + +The following tokens are operators: +\index{operators} + +\begin{verbatim} ++ - * / % +<< >> & | ^ ~ +< == > <= <> != >= +\end{verbatim} + +The comparison operators \verb\<>\ and \verb\!=\ are alternate +spellings of the same operator. + +\section{Delimiters} + +The following tokens serve as delimiters or otherwise have a special +meaning: +\index{delimiters} + +\begin{verbatim} +( ) [ ] { } +; , : . ` = +\end{verbatim} + +The following printing ASCII characters are not used in Python. Their +occurrence outside string literals and comments is an unconditional +error: +\index{ASCII} + +\begin{verbatim} +@ $ " ? +\end{verbatim} + +They may be used by future versions of the language though! diff --git a/Doc/ref3.tex b/Doc/ref3.tex new file mode 100644 index 0000000..fff448e --- /dev/null +++ b/Doc/ref3.tex @@ -0,0 +1,705 @@ +\chapter{Data model} + +\section{Objects, values and types} + +{\em Objects} are Python's abstraction for data. All data in a Python +program is represented by objects or by relations between objects. +(In a sense, and in conformance to Von Neumann's model of a +``stored program computer'', code is also represented by objects.) +\index{object} +\index{data} + +Every object has an identity, a type and a value. An object's {\em +identity} never changes once it has been created; you may think of it +as the object's address in memory. An object's {\em type} is also +unchangeable. It determines the operations that an object supports +(e.g. ``does it have a length?'') and also defines the possible +values for objects of that type. The {\em value} of some objects can +change. Objects whose value can change are said to be {\em mutable}; +objects whose value is unchangeable once they are created are called +{\em immutable}. The type determines an object's (im)mutability. +\index{identity of an object} +\index{value of an object} +\index{type of an object} +\index{mutable object} +\index{immutable object} + +Objects are never explicitly destroyed; however, when they become +unreachable they may be garbage-collected. An implementation is +allowed to delay garbage collection or omit it altogether --- it is a +matter of implementation quality how garbage collection is +implemented, as long as no objects are collected that are still +reachable. (Implementation note: the current implementation uses a +reference-counting scheme which collects most objects as soon as they +become unreachable, but never collects garbage containing circular +references.) +\index{garbage collection} +\index{reference counting} +\index{unreachable object} + +Note that the use of the implementation's tracing or debugging +facilities may keep objects alive that would normally be collectable. + +Some objects contain references to ``external'' resources such as open +files or windows. It is understood that these resources are freed +when the object is garbage-collected, but since garbage collection is +not guaranteed to happen, such objects also provide an explicit way to +release the external resource, usually a \verb\close\ method. +Programs are strongly recommended to always explicitly close such +objects. + +Some objects contain references to other objects; these are called +{\em containers}. Examples of containers are tuples, lists and +dictionaries. The references are part of a container's value. In +most cases, when we talk about the value of a container, we imply the +values, not the identities of the contained objects; however, when we +talk about the (im)mutability of a container, only the identities of +the immediately contained objects are implied. (So, if an immutable +container contains a reference to a mutable object, its value changes +if that mutable object is changed.) +\index{container} + +Types affect almost all aspects of objects' lives. Even the meaning +of object identity is affected in some sense: for immutable types, +operations that compute new values may actually return a reference to +any existing object with the same type and value, while for mutable +objects this is not allowed. E.g. after + +\begin{verbatim} +a = 1; b = 1; c = []; d = [] +\end{verbatim} + +\verb\a\ and \verb\b\ may or may not refer to the same object with the +value one, depending on the implementation, but \verb\c\ and \verb\d\ +are guaranteed to refer to two different, unique, newly created empty +lists. + +\section{The standard type hierarchy} \label{types} + +Below is a list of the types that are built into Python. Extension +modules written in C can define additional types. Future versions of +Python may add types to the type hierarchy (e.g. rational or complex +numbers, efficiently stored arrays of integers, etc.). +\index{type} +\indexii{data}{type} +\indexii{type}{hierarchy} +\indexii{extension}{module} +\index{C} + +Some of the type descriptions below contain a paragraph listing +`special attributes'. These are attributes that provide access to the +implementation and are not intended for general use. Their definition +may change in the future. There are also some `generic' special +attributes, not listed with the individual objects: \verb\__methods__\ +is a list of the method names of a built-in object, if it has any; +\verb\__members__\ is a list of the data attribute names of a built-in +object, if it has any. +\index{attribute} +\indexii{special}{attribute} +\indexiii{generic}{special}{attribute} +\ttindex{__methods__} +\ttindex{__members__} + +\begin{description} + +\item[None] +This type has a single value. There is a single object with this value. +This object is accessed through the built-in name \verb\None\. +It is returned from functions that don't explicitly return an object. +\ttindex{None} +\obindex{None@{\tt None}} + +\item[Numbers] +These are created by numeric literals and returned as results by +arithmetic operators and arithmetic built-in functions. Numeric +objects are immutable; once created their value never changes. Python +numbers are of course strongly related to mathematical numbers, but +subject to the limitations of numerical representation in computers. +\obindex{number} +\obindex{numeric} + +Python distinguishes between integers and floating point numbers: + +\begin{description} +\item[Integers] +These represent elements from the mathematical set of whole numbers. +\obindex{integer} + +There are two types of integers: + +\begin{description} + +\item[Plain integers] +These represent numbers in the range $-2^{31}$ through $2^{31}-1$. +(The range may be larger on machines with a larger natural word +size, but not smaller.) +When the result of an operation falls outside this range, the +exception \verb\OverflowError\ is raised. +For the purpose of shift and mask operations, integers are assumed to +have a binary, 2's complement notation using 32 or more bits, and +hiding no bits from the user (i.e., all $2^{32}$ different bit +patterns correspond to different values). +\obindex{plain integer} + +\item[Long integers] +These represent numbers in an unlimited range, subject to available +(virtual) memory only. For the purpose of shift and mask operations, +a binary representation is assumed, and negative numbers are +represented in a variant of 2's complement which gives the illusion of +an infinite string of sign bits extending to the left. +\obindex{long integer} + +\end{description} % Integers + +The rules for integer representation are intended to give the most +meaningful interpretation of shift and mask operations involving +negative integers and the least surprises when switching between the +plain and long integer domains. For any operation except left shift, +if it yields a result in the plain integer domain without causing +overflow, it will yield the same result in the long integer domain or +when using mixed operands. +\indexii{integer}{representation} + +\item[Floating point numbers] +These represent machine-level double precision floating point numbers. +You are at the mercy of the underlying machine architecture and +C implementation for the accepted range and handling of overflow. +\obindex{floating point} +\indexii{floating point}{number} +\index{C} + +\end{description} % Numbers + +\item[Sequences] +These represent finite ordered sets indexed by natural numbers. +The built-in function \verb\len()\ returns the number of elements +of a sequence. When this number is $n$, the index set contains +the numbers $0, 1, \ldots, n-1$. Element \verb\i\ of sequence +\verb\a\ is selected by \verb\a[i]\. +\obindex{seqence} +\bifuncindex{len} +\index{index operation} +\index{item selection} +\index{subscription} + +Sequences also support slicing: \verb\a[i:j]\ selects all elements +with index $k$ such that $i <= k < j$. When used as an expression, +a slice is a sequence of the same type --- this implies that the +index set is renumbered so that it starts at 0 again. +\index{slicing} + +Sequences are distinguished according to their mutability: + +\begin{description} +% +\item[Immutable sequences] +An object of an immutable sequence type cannot change once it is +created. (If the object contains references to other objects, +these other objects may be mutable and may be changed; however +the collection of objects directly referenced by an immutable object +cannot change.) +\obindex{immutable sequence} +\obindex{immutable} + +The following types are immutable sequences: + +\begin{description} + +\item[Strings] +The elements of a string are characters. There is no separate +character type; a character is represented by a string of one element. +Characters represent (at least) 8-bit bytes. The built-in +functions \verb\chr()\ and \verb\ord()\ convert between characters +and nonnegative integers representing the byte values. +Bytes with the values 0-127 represent the corresponding ASCII values. +The string data type is also used to represent arrays of bytes, e.g. +to hold data read from a file. +\obindex{string} +\index{character} +\index{byte} +\index{ASCII} +\bifuncindex{chr} +\bifuncindex{ord} + +(On systems whose native character set is not ASCII, strings may use +EBCDIC in their internal representation, provided the functions +\verb\chr()\ and \verb\ord()\ implement a mapping between ASCII and +EBCDIC, and string comparison preserves the ASCII order. +Or perhaps someone can propose a better rule?) +\index{ASCII} +\index{EBCDIC} +\index{character set} +\indexii{string}{comparison} +\bifuncindex{chr} +\bifuncindex{ord} + +\item[Tuples] +The elements of a tuple are arbitrary Python objects. +Tuples of two or more elements are formed by comma-separated lists +of expressions. A tuple of one element (a `singleton') can be formed +by affixing a comma to an expression (an expression by itself does +not create a tuple, since parentheses must be usable for grouping of +expressions). An empty tuple can be formed by enclosing `nothing' in +parentheses. +\obindex{tuple} +\indexii{singleton}{tuple} +\indexii{empty}{tuple} + +\end{description} % Immutable sequences + +\item[Mutable sequences] +Mutable sequences can be changed after they are created. The +subscription and slicing notations can be used as the target of +assignment and \verb\del\ (delete) statements. +\obindex{mutable sequece} +\obindex{mutable} +\indexii{assignment}{statement} +\index{delete} +\stindex{del} +\index{subscription} +\index{slicing} + +There is currently a single mutable sequence type: + +\begin{description} + +\item[Lists] +The elements of a list are arbitrary Python objects. Lists are formed +by placing a comma-separated list of expressions in square brackets. +(Note that there are no special cases needed to form lists of length 0 +or 1.) +\obindex{list} + +\end{description} % Mutable sequences + +\end{description} % Sequences + +\item[Mapping types] +These represent finite sets of objects indexed by arbitrary index sets. +The subscript notation \verb\a[k]\ selects the element indexed +by \verb\k\ from the mapping \verb\a\; this can be used in +expressions and as the target of assignments or \verb\del\ statements. +The built-in function \verb\len()\ returns the number of elements +in a mapping. +\bifuncindex{len} +\index{subscription} +\obindex{mapping} + +There is currently a single mapping type: + +\begin{description} + +\item[Dictionaries] +These represent finite sets of objects indexed by strings. +Dictionaries are mutable; they are created by the \verb\{...}\ +notation (see section \ref{dict}). (Implementation note: the strings +used for indexing must not contain null bytes.) +\obindex{dictionary} +\obindex{mutable} + +\end{description} % Mapping types + +\item[Callable types] +These are the types to which the function call (invocation) operation, +written as \verb\function(argument, argument, ...)\, can be applied: +\indexii{function}{call} +\index{invocation} +\indexii{function}{argument} +\obindex{callable} + +\begin{description} + +\item[User-defined functions] +A user-defined function object is created by a function definition +(see section \ref{function}). It should be called with an argument +list containing the same number of items as the function's formal +parameter list. +\indexii{user-defined}{function} +\obindex{function} +\obindex{user-defined function} + +Special read-only attributes: \verb\func_code\ is the code object +representing the compiled function body, and \verb\func_globals\ is (a +reference to) the dictionary that holds the function's global +variables --- it implements the global name space of the module in +which the function was defined. +\ttindex{func_code} +\ttindex{func_globals} +\indexii{global}{name space} + +\item[User-defined methods] +A user-defined method (a.k.a. {\em object closure}) is a pair of a +class instance object and a user-defined function. It should be +called with an argument list containing one item less than the number +of items in the function's formal parameter list. When called, the +class instance becomes the first argument, and the call arguments are +shifted one to the right. +\obindex{method} +\obindex{user-defined method} +\indexii{user-defined}{method} +\index{object closure} + +Special read-only attributes: \verb\im_self\ is the class instance +object, \verb\im_func\ is the function object. +\ttindex{im_func} +\ttindex{im_self} + +\item[Built-in functions] +A built-in function object is a wrapper around a C function. Examples +of built-in functions are \verb\len\ and \verb\math.sin\. There +are no special attributes. The number and type of the arguments are +determined by the C function. +\obindex{built-in function} +\obindex{function} +\index{C} + +\item[Built-in methods] +This is really a different disguise of a built-in function, this time +containing an object passed to the C function as an implicit extra +argument. An example of a built-in method is \verb\list.append\ if +\verb\list\ is a list object. +\obindex{built-in method} +\obindex{method} +\indexii{built-in}{method} + +\item[Classes] +Class objects are described below. When a class object is called as a +parameterless function, a new class instance (also described below) is +created and returned. The class's initialization function is not +called --- this is the responsibility of the caller. It is illegal to +call a class object with one or more arguments. +\obindex{class} +\obindex{class instance} +\obindex{instance} +\indexii{class object}{call} + +\end{description} + +\item[Modules] +Modules are imported by the \verb\import\ statement (see section +\ref{import}). A module object is a container for a module's name +space, which is a dictionary (the same dictionary as referenced by the +\verb\func_globals\ attribute of functions defined in the module). +Module attribute references are translated to lookups in this +dictionary. A module object does not contain the code object used to +initialize the module (since it isn't needed once the initialization +is done). +\stindex{import} +\obindex{module} + +Attribute assignment update the module's name space dictionary. + +Special read-only attributes: \verb\__dict__\ yields the module's name +space as a dictionary object; \verb\__name__\ yields the module's name +as a string object. +\ttindex{__dict__} +\ttindex{__name__} +\indexii{module}{name space} + +\item[Classes] +Class objects are created by class definitions (see section +\ref{class}). A class is a container for a dictionary containing the +class's name space. Class attribute references are translated to +lookups in this dictionary. When an attribute name is not found +there, the attribute search continues in the base classes. The search +is depth-first, left-to-right in the order of their occurrence in the +base class list. +\obindex{class} +\obindex{class instance} +\obindex{instance} +\indexii{class object}{call} +\index{container} +\index{dictionary} +\indexii{class}{attribute} + +Class attribute assignments update the class's dictionary, never the +dictionary of a base class. +\indexiii{class}{attribute}{assignment} + +A class can be called as a parameterless function to yield a class +instance (see above). +\indexii{class object}{call} + +Special read-only attributes: \verb\__dict__\ yields the dictionary +containing the class's name space; \verb\__bases__\ yields a tuple +(possibly empty or a singleton) containing the base classes, in the +order of their occurrence in the base class list. +\ttindex{__dict__} +\ttindex{__bases__} + +\item[Class instances] +A class instance is created by calling a class object as a +parameterless function. A class instance has a dictionary in which +attribute references are searched. When an attribute is not found +there, and the instance's class has an attribute by that name, and +that class attribute is a user-defined function (and in no other +cases), the instance attribute reference yields a user-defined method +object (see above) constructed from the instance and the function. +\obindex{class instance} +\obindex{instance} +\indexii{class}{instance} +\indexii{class instance}{attribute} + +Attribute assignments update the instance's dictionary. +\indexiii{class instance}{attribute}{assignment} + +Class instances can pretend to be numbers, sequences, or mappings if +they have methods with certain special names. These are described in +section \ref{specialnames}. +\obindex{number} +\obindex{sequence} +\obindex{mapping} + +Special read-only attributes: \verb\__dict__\ yields the attribute +dictionary; \verb\__class__\ yields the instance's class. +\ttindex{__dict__} +\ttindex{__class__} + +\item[Files] +A file object represents an open file. (It is a wrapper around a C +{\tt stdio} file pointer.) File objects are created by the +\verb\open()\ built-in function, and also by \verb\posix.popen()\ and +the \verb\makefile\ method of socket objects. \verb\sys.stdin\, +\verb\sys.stdout\ and \verb\sys.stderr\ are file objects corresponding +the the interpreter's standard input, output and error streams. +See the Python Library Reference for methods of file objects and other +details. +\obindex{file} +\index{C} +\index{stdio} +\bifuncindex{open} +\bifuncindex{popen} +\bifuncindex{makefile} +\ttindex{stdin} +\ttindex{stdout} +\ttindex{stderr} +\ttindex{sys.stdin} +\ttindex{sys.stdout} +\ttindex{sys.stderr} + +\item[Internal types] +A few types used internally by the interpreter are exposed to the user. +Their definition may change with future versions of the interpreter, +but they are mentioned here for completeness. +\index{internal type} + +\begin{description} + +\item[Code objects] +Code objects represent executable code. The difference between a code +object and a function object is that the function object contains an +explicit reference to the function's context (the module in which it +was defined) which a code object contains no context. There is no way +to execute a bare code object. +\obindex{code} + +Special read-only attributes: \verb\co_code\ is a string representing +the sequence of instructions; \verb\co_consts\ is a list of literals +used by the code; \verb\co_names\ is a list of names (strings) used by +the code; \verb\co_filename\ is the filename from which the code was +compiled. (To find out the line numbers, you would have to decode the +instructions; the standard library module \verb\dis\ contains an +example of how to do this.) +\ttindex{co_code} +\ttindex{co_consts} +\ttindex{co_names} +\ttindex{co_filename} + +\item[Frame objects] +Frame objects represent execution frames. They may occur in traceback +objects (see below). +\obindex{frame} + +Special read-only attributes: \verb\f_back\ is to the previous +stack frame (towards the caller), or \verb\None\ if this is the bottom +stack frame; \verb\f_code\ is the code object being executed in this +frame; \verb\f_globals\ is the dictionary used to look up global +variables; \verb\f_locals\ is used for local variables; +\verb\f_lineno\ gives the line number and \verb\f_lasti\ gives the +precise instruction (this is an index into the instruction string of +the code object). +\ttindex{f_back} +\ttindex{f_code} +\ttindex{f_globals} +\ttindex{f_locals} +\ttindex{f_lineno} +\ttindex{f_lasti} + +\item[Traceback objects] +Traceback objects represent a stack trace of an exception. A +traceback object is created when an exception occurs. When the search +for an exception handler unwinds the execution stack, at each unwound +level a traceback object is inserted in front of the current +traceback. When an exception handler is entered, the stack trace is +made available to the program as \verb\sys.exc_traceback\. When the +program contains no suitable handler, the stack trace is written +(nicely formatted) to the standard error stream; if the interpreter is +interactive, it is also made available to the user as +\verb\sys.last_traceback\. +\obindex{traceback} +\indexii{stack}{trace} +\indexii{exception}{handler} +\indexii{execution}{stack} +\ttindex{exc_traceback} +\ttindex{last_traceback} +\ttindex{sys.exc_traceback} +\ttindex{sys.last_traceback} + +Special read-only attributes: \verb\tb_next\ is the next level in the +stack trace (towards the frame where the exception occurred), or +\verb\None\ if there is no next level; \verb\tb_frame\ points to the +execution frame of the current level; \verb\tb_lineno\ gives the line +number where the exception occurred; \verb\tb_lasti\ indicates the +precise instruction. The line number and last instruction in the +traceback may differ from the line number of its frame object if the +exception occurred in a \verb\try\ statement with no matching +\verb\except\ clause or with a \verb\finally\ clause. +\ttindex{tb_next} +\ttindex{tb_frame} +\ttindex{tb_lineno} +\ttindex{tb_lasti} +\stindex{try} + +\end{description} % Internal types + +\end{description} % Types + + +\section{Special method names} \label{specialnames} + +A class can implement certain operations that are invoked by special +syntax (such as subscription or arithmetic operations) by defining +methods with special names. For instance, if a class defines a +method named \verb\__getitem__\, and \verb\x\ is an instance of this +class, then \verb\x[i]\ is equivalent to \verb\x.__getitem__(i)\. +(The reverse is not true --- if \verb\x\ is a list object, +\verb\x.__getitem__(i)\ is not equivalent to \verb\x[i]\.) + +Except for \verb\__repr__\ and \verb\__cmp__\, attempts to execute an +operation raise an exception when no appropriate method is defined. +For \verb\__repr__\ and \verb\__cmp__\, the traditional +interpretations are used in this case. + + +\subsection{Special methods for any type} + +\begin{description} + +\item[\tt __repr__(self)] +Called by the \verb\print\ statement and conversions (reverse quotes) to +compute the string representation of an object. + +\item[\tt _cmp__(self, other)] +Called by all comparison operations. Should return -1 if +\verb\self < other\, 0 if \verb\self == other\, +1 if +\verb\self > other\. (Implementation note: due to limitations in the +interpreter, exceptions raised by comparisons are ignored, and the +objects will be considered equal in this case.) + +\end{description} + + +\subsection{Special methods for sequence and mapping types} + +\begin{description} + +\item[\tt __len__(self)] +Called to implement the built-in function \verb\len()\. Should return +the length of the object, an integer \verb\>=\ 0. Also, an object +whose \verb\__len__()\ method returns 0 is considered to be false in a +Boolean context. + +\item[\tt __getitem__(self, key)] +Called to implement evaluation of \verb\self[key]\. Note that the +special interpretation of negative keys (if the class wishes to +emulate a sequence type) is up to the \verb\__getitem__\ method. + +\item[\tt __setitem__(self, key, value)] +Called to implement assignment to \verb\self[key]\. Same note as for +\verb\__getitem__\. + +\item[\tt __delitem__(self, key)] +Called to implement deletion of \verb\self[key]\. Same note as for +\verb\__getitem__\. + +\end{description} + + +\subsection{Special methods for sequence types} + +\begin{description} + +\item[\tt __getslice__(self, i, j)] +Called to implement evaluation of \verb\self[i:j]\. Note that missing +\verb\i\ or \verb\j\ are replaced by 0 or \verb\len(self)\, +respectively, and \verb\len(self)\ has been added (once) to originally +negative \verb\i\ or \verb\j\ by the time this function is called +(unlike for \verb\__getitem__\). + +\item[\tt __setslice__(self, i, j, sequence)] +Called to implement assignment to \verb\self[i:j]\. Same notes as for +\verb\__getslice__\. + +\item[\tt __delslice__(self, i, j)] +Called to implement deletion of \verb\self[i:j]\. Same notes as for +\verb\__getslice__\. + +\end{description} + + +\subsection{Special methods for numeric types} + +\begin{description} + +\item[\tt __add__(self, other)]\itemjoin +\item[\tt __sub__(self, other)]\itemjoin +\item[\tt __mul__(self, other)]\itemjoin +\item[\tt __div__(self, other)]\itemjoin +\item[\tt __mod__(self, other)]\itemjoin +\item[\tt __divmod__(self, other)]\itemjoin +\item[\tt __pow__(self, other)]\itemjoin +\item[\tt __lshift__(self, other)]\itemjoin +\item[\tt __rshift__(self, other)]\itemjoin +\item[\tt __and__(self, other)]\itemjoin +\item[\tt __xor__(self, other)]\itemjoin +\item[\tt __or__(self, other)]\itembreak +Called to implement the binary arithmetic operations (\verb\+\, +\verb\-\, \verb\*\, \verb\/\, \verb\%\, \verb\divmod()\, \verb\pow()\, +\verb\<<\, \verb\>>\, \verb\&\, \verb\^\, \verb\|\). + +\item[\tt __neg__(self)]\itemjoin +\item[\tt __pos__(self)]\itemjoin +\item[\tt __abs__(self)]\itemjoin +\item[\tt __invert__(self)]\itembreak +Called to implement the unary arithmetic operations (\verb\-\, \verb\+\, +\verb\abs()\ and \verb\~\). + +\item[\tt __nonzero__(self)] +Called to implement boolean testing; should return 0 or 1. An +alternative name for this method is \verb\__len__\. + +\item[\tt __coerce__(self, other)] +Called to implement ``mixed-mode'' numeric arithmetic. Should either +return a tuple containing self and other converted to a common numeric +type, or None if no way of conversion is known. When the common type +would be the type of other, it is sufficient to return None, since the +interpreter will also ask the other object to attempt a coercion (but +sometimes, if the implementation of the other type cannot be changed, +it is useful to do the conversion to the other type here). + +Note that this method is not called to coerce the arguments to \verb\+\ +and \verb\*\, because these are also used to implement sequence +concatenation and repetition, respectively. Also note that, for the +same reason, in \verb\n*x\, where \verb\n\ is a built-in number and +\verb\x\ is an instance, a call to \verb\x.__mul__(n)\ is made.% +\footnote{The interpreter should really distinguish between +user-defined classes implementing sequences, mappings or numbers, but +currently it doesn't --- hence this strange exception.} + +\item[\tt __int__(self)]\itemjoin +\item[\tt __long__(self)]\itemjoin +\item[\tt __float__(self)]\itembreak +Called to implement the built-in functions \verb\int()\, \verb\long()\ +and \verb\float()\. Should return a value of the appropriate type. + +\end{description} diff --git a/Doc/ref4.tex b/Doc/ref4.tex new file mode 100644 index 0000000..f8677b5 --- /dev/null +++ b/Doc/ref4.tex @@ -0,0 +1,147 @@ +\chapter{Execution model} +\index{execution model} + +\section{Code blocks, execution frames, and name spaces} \label{execframes} +\index{code block} +\indexii{execution}{frame} +\index{name space} + +A {\em code block} is a piece of Python program text that can be +executed as a unit, such as a module, a class definition or a function +body. Some code blocks (like modules) are executed only once, others +(like function bodies) may be executed many times. Code block may +textually contain other code blocks. Code blocks may invoke other +code blocks (that may or may not be textually contained in them) as +part of their execution, e.g. by invoking (calling) a function. +\index{code block} +\indexii{code}{block} + +The following are code blocks: A module is a code block. A function +body is a code block. A class definition is a code block. Each +command typed interactively is a separate code block; a script file is +a code block. The string argument passed to the built-in functions +\verb\eval\ and \verb\exec\ are code blocks. And finally, the +expression read and evaluated by the built-in function \verb\input\ is +a code block. + +A code block is executed in an execution frame. An {\em execution +frame} contains some administrative information (used for debugging), +determines where and how execution continues after the code block's +execution has completed, and (perhaps most importantly) defines two +name spaces, the local and the global name space, that affect +execution of the code block. +\indexii{execution}{frame} + +A {\em name space} is a mapping from names (identifiers) to objects. +A particular name space may be referenced by more than one execution +frame, and from other places as well. Adding a name to a name space +is called {\em binding} a name (to an object); changing the mapping of +a name is called {\em rebinding}; removing a name is {\em unbinding}. +Name spaces are functionally equivalent to dictionaries. +\index{name space} +\indexii{binding}{name} +\indexii{rebinding}{name} +\indexii{unbinding}{name} + +The {\em local name space} of an execution frame determines the default +place where names are defined and searched. The {\em global name +space} determines the place where names listed in \verb\global\ +statements are defined and searched, and where names that are not +explicitly bound in the current code block are searched. +\indexii{local}{name space} +\indexii{global}{name space} +\stindex{global} + +Whether a name is local or global in a code block is determined by +static inspection of the source text for the code block: in the +absence of \verb\global\ statements, a name that is bound anywhere in +the code block is local in the entire code block; all other names are +considered global. The \verb\global\ statement forces global +interpretation of selected names throughout the code block. The +following constructs bind names: formal parameters, \verb\import\ +statements, class and function definitions (these bind the class or +function name), and targets that are identifiers if occurring in an +assignment, \verb\for\ loop header, or \verb\except\ clause header. +(A target occurring in a \verb\del\ statement does not bind a name.) + +When a global name is not found in the global name space, it is +searched in the list of ``built-in'' names (which is actually the +global name space of the module \verb\builtin\). When a name is not +found at all, the \verb\NameError\ exception is raised. + +The following table lists the meaning of the local and global name +space for various types of code blocks. The name space for a +particular module is automatically created when the module is first +referenced. + +\begin{center} +\begin{tabular}{|l|l|l|l|} +\hline +Code block type & Global name space & Local name space & Notes \\ +\hline +Module & n.s. for this module & same as global & \\ +Script & n.s. for \verb\__main__\ & same as global & \\ +Interactive command & n.s. for \verb\__main__\ & same as global & \\ +Class definition & global n.s. of containing block & new n.s. & \\ +Function body & global n.s. of containing block & new n.s. & \\ +String passed to \verb\exec\ or \verb\eval\ + & global n.s. of caller & local n.s. of caller & (1) \\ +File read by \verb\execfile\ + & global n.s. of caller & local n.s. of caller & (1) \\ +Expression read by \verb\input\ + & global n.s. of caller & local n.s. of caller & \\ +\hline +\end{tabular} +\end{center} + +Notes: + +\begin{description} + +\item[n.s.] means {\em name space} + +\item[(1)] The global and local name space for these functions can be +overridden with optional extra arguments. + +\end{description} + +\section{Exceptions} + +Exceptions are a means of breaking out of the normal flow of control +of a code block in order to handle errors or other exceptional +conditions. An exception is {\em raised} at the point where the error +is detected; it may be {\em handled} by the surrounding code block or +by any code block that directly or indirectly invoked the code block +where the error occurred. +\index{exception} +\index{raise an exception} +\index{handle an exception} +\index{exception handler} +\index{errors} +\index{error handling} + +The Python interpreter raises an exception when it detects an run-time +error (such as division by zero). A Python program can also +explicitly raise an exception with the \verb\raise\ statement. +Exception handlers are specified with the \verb\try...except\ +statement. + +Python uses the ``termination'' model of error handling: an exception +handler can find out what happened and continue execution at an outer +level, but it cannot repair the cause of the error and retry the +failing operation (except by re-entering the the offending piece of +code from the top). + +When an exception is not handled at all, the interpreter terminates +execution of the program, or returns to its interactive main loop. + +Exceptions are identified by string objects. Two different string +objects with the same value identify different exceptions. + +When an exception is raised, an object (maybe \verb\None\) is passed +as the exception's ``parameter''; this object does not affect the +selection of an exception handler, but is passed to the selected +exception handler as additional information. + +See also the description of the \verb\try\ and \verb\raise\ +statements. diff --git a/Doc/ref5.tex b/Doc/ref5.tex new file mode 100644 index 0000000..34dfeb1 --- /dev/null +++ b/Doc/ref5.tex @@ -0,0 +1,672 @@ +\chapter{Expressions and conditions} +\index{expression} +\index{condition} + +{\bf Note:} In this and the following chapters, extended BNF notation +will be used to describe syntax, not lexical analysis. +\index{BNF} + +This chapter explains the meaning of the elements of expressions and +conditions. Conditions are a superset of expressions, and a condition +may be used wherever an expression is required by enclosing it in +parentheses. The only places where expressions are used in the syntax +instead of conditions is in expression statements and on the +right-hand side of assignment statements; this catches some nasty bugs +like accidentally writing \verb\x == 1\ instead of \verb\x = 1\. +\indexii{assignment}{statement} + +The comma plays several roles in Python's syntax. It is usually an +operator with a lower precedence than all others, but occasionally +serves other purposes as well; e.g. it separates function arguments, +is used in list and dictionary constructors, and has special semantics +in \verb\print\ statements. +\index{comma} + +When (one alternative of) a syntax rule has the form + +\begin{verbatim} +name: othername +\end{verbatim} + +and no semantics are given, the semantics of this form of \verb\name\ +are the same as for \verb\othername\. +\index{syntax} + +\section{Arithmetic conversions} +\indexii{arithmetic}{conversion} + +When a description of an arithmetic operator below uses the phrase +``the numeric arguments are converted to a common type'', +this both means that if either argument is not a number, a +\verb\TypeError\ exception is raised, and that otherwise +the following conversions are applied: +\exindex{TypeError} +\indexii{floating point}{number} +\indexii{long}{integer} +\indexii{plain}{integer} + +\begin{itemize} +\item first, if either argument is a floating point number, + the other is converted to floating point; +\item else, if either argument is a long integer, + the other is converted to long integer; +\item otherwise, both must be plain integers and no conversion + is necessary. +\end{itemize} + +\section{Atoms} +\index{atom} + +Atoms are the most basic elements of expressions. Forms enclosed in +reverse quotes or in parentheses, brackets or braces are also +categorized syntactically as atoms. The syntax for atoms is: + +\begin{verbatim} +atom: identifier | literal | enclosure +enclosure: parenth_form | list_display | dict_display | string_conversion +\end{verbatim} + +\subsection{Identifiers (Names)} +\index{name} +\index{identifier} + +An identifier occurring as an atom is a reference to a local, global +or built-in name binding. If a name can be assigned to anywhere in a +code block, and is not mentioned in a \verb\global\ statement in that +code block, it refers to a local name throughout that code block. +Otherwise, it refers to a global name if one exists, else to a +built-in name. +\indexii{name}{binding} +\index{code block} +\stindex{global} +\indexii{built-in}{name} +\indexii{global}{name} + +When the name is bound to an object, evaluation of the atom yields +that object. When a name is not bound, an attempt to evaluate it +raises a \verb\NameError\ exception. +\exindex{NameError} + +\subsection{Literals} +\index{literal} + +Python knows string and numeric literals: + +\begin{verbatim} +literal: stringliteral | integer | longinteger | floatnumber +\end{verbatim} + +Evaluation of a literal yields an object of the given type (string, +integer, long integer, floating point number) with the given value. +The value may be approximated in the case of floating point literals. +See section \ref{literals} for details. + +All literals correspond to immutable data types, and hence the +object's identity is less important than its value. Multiple +evaluations of literals with the same value (either the same +occurrence in the program text or a different occurrence) may obtain +the same object or a different object with the same value. +\indexiii{immutable}{data}{type} + +(In the original implementation, all literals in the same code block +with the same type and value yield the same object.) + +\subsection{Parenthesized forms} +\index{parenthesized form} + +A parenthesized form is an optional condition list enclosed in +parentheses: + +\begin{verbatim} +parenth_form: "(" [condition_list] ")" +\end{verbatim} + +A parenthesized condition list yields whatever that condition list +yields. + +An empty pair of parentheses yields an empty tuple object. Since +tuples are immutable, the rules for literals apply here. +\indexii{empty}{tuple} + +(Note that tuples are not formed by the parentheses, but rather by use +of the comma operator. The exception is the empty tuple, for which +parentheses {\em are} required --- allowing unparenthesized ``nothing'' +in expressions would causes ambiguities and allow common typos to +pass uncaught.) +\index{comma} +\indexii{tuple}{display} + +\subsection{List displays} +\indexii{list}{display} + +A list display is a possibly empty series of conditions enclosed in +square brackets: + +\begin{verbatim} +list_display: "[" [condition_list] "]" +\end{verbatim} + +A list display yields a new list object. +\obindex{list} + +If it has no condition list, the list object has no items. Otherwise, +the elements of the condition list are evaluated from left to right +and inserted in the list object in that order. +\indexii{empty}{list} + +\subsection{Dictionary displays} \label{dict} +\indexii{dictionary}{display} + +A dictionary display is a possibly empty series of key/datum pairs +enclosed in curly braces: +\index{key} +\index{datum} +\index{key/datum pair} + +\begin{verbatim} +dict_display: "{" [key_datum_list] "}" +key_datum_list: key_datum ("," key_datum)* [","] +key_datum: condition ":" condition +\end{verbatim} + +A dictionary display yields a new dictionary object. +\obindex{dictionary} + +The key/datum pairs are evaluated from left to right to define the +entries of the dictionary: each key object is used as a key into the +dictionary to store the corresponding datum. + +Keys must be strings, otherwise a \verb\TypeError\ exception is +raised. Clashes between duplicate keys are not detected; the last +datum (textually rightmost in the display) stored for a given key +value prevails. +\exindex{TypeError} + +\subsection{String conversions} +\indexii{string}{conversion} + +A string conversion is a condition list enclosed in reverse (or +backward) quotes: + +\begin{verbatim} +string_conversion: "`" condition_list "`" +\end{verbatim} + +A string conversion evaluates the contained condition list and +converts the resulting object into a string according to rules +specific to its type. + +If the object is a string, a number, \verb\None\, or a tuple, list or +dictionary containing only objects whose type is one of these, the +resulting string is a valid Python expression which can be passed to +the built-in function \verb\eval()\ to yield an expression with the +same value (or an approximation, if floating point numbers are +involved). + +(In particular, converting a string adds quotes around it and converts +``funny'' characters to escape sequences that are safe to print.) + +It is illegal to attempt to convert recursive objects (e.g. lists or +dictionaries that contain a reference to themselves, directly or +indirectly.) +\obindex{recursive} + +\section{Primaries} \label{primaries} +\index{primary} + +Primaries represent the most tightly bound operations of the language. +Their syntax is: + +\begin{verbatim} +primary: atom | attributeref | subscription | slicing | call +\end{verbatim} + +\subsection{Attribute references} +\indexii{attribute}{reference} + +An attribute reference is a primary followed by a period and a name: + +\begin{verbatim} +attributeref: primary "." identifier +\end{verbatim} + +The primary must evaluate to an object of a type that supports +attribute references, e.g. a module or a list. This object is then +asked to produce the attribute whose name is the identifier. If this +attribute is not available, the exception \verb\AttributeError\ is +raised. Otherwise, the type and value of the object produced is +determined by the object. Multiple evaluations of the same attribute +reference may yield different objects. +\obindex{module} +\obindex{list} + +\subsection{Subscriptions} +\index{subscription} + +A subscription selects an item of a sequence (string, tuple or list) +or mapping (dictionary) object: +\obindex{sequence} +\obindex{mapping} +\obindex{string} +\obindex{tuple} +\obindex{list} +\obindex{dictionary} +\indexii{sequence}{item} + +\begin{verbatim} +subscription: primary "[" condition "]" +\end{verbatim} + +The primary must evaluate to an object of a sequence or mapping type. + +If it is a mapping, the condition must evaluate to an object whose +value is one of the keys of the mapping, and the subscription selects +the value in the mapping that corresponds to that key. + +If it is a sequence, the condition must evaluate to a plain integer. +If this value is negative, the length of the sequence is added to it +(so that, e.g. \verb\x[-1]\ selects the last item of \verb\x\.) +The resulting value must be a nonnegative integer smaller than the +number of items in the sequence, and the subscription selects the item +whose index is that value (counting from zero). + +A string's items are characters. A character is not a separate data +type but a string of exactly one character. +\index{character} +\indexii{string}{item} + +\subsection{Slicings} +\index{slicing} +\index{slice} + +A slicing (or slice) selects a range of items in a sequence (string, +tuple or list) object: +\obindex{sequence} +\obindex{string} +\obindex{tuple} +\obindex{list} + +\begin{verbatim} +slicing: primary "[" [condition] ":" [condition] "]" +\end{verbatim} + +The primary must evaluate to a sequence object. The lower and upper +bound expressions, if present, must evaluate to plain integers; +defaults are zero and the sequence's length, respectively. If either +bound is negative, the sequence's length is added to it. The slicing +now selects all items with index $k$ such that $i <= k < j$ where $i$ +and $j$ are the specified lower and upper bounds. This may be an +empty sequence. It is not an error if $i$ or $j$ lie outside the +range of valid indexes (such items don't exist so they aren't +selected). + +\subsection{Calls} \label{calls} +\index{call} + +A call calls a callable object (e.g. a function) with a possibly empty +series of arguments: +\obindex{callable} + +\begin{verbatim} +call: primary "(" [condition_list] ")" +\end{verbatim} + +The primary must evaluate to a callable object (user-defined +functions, built-in functions, methods of built-in objects, class +objects, and methods of class instances are callable). If it is a +class, the argument list must be empty; otherwise, the arguments are +evaluated. + +A call always returns some value, possibly \verb\None\, unless it +raises an exception. How this value is computed depends on the type +of the callable object. If it is: + +\begin{description} + +\item[a user-defined function:] the code block for the function is +executed, passing it the argument list. The first thing the code +block will do is bind the formal parameters to the arguments; this is +described in section \ref{function}. When the code block executes a +\verb\return\ statement, this specifies the return value of the +function call. +\indexii{function}{call} +\indexiii{user-defined}{function}{call} +\obindex{user-defined function} +\obindex{function} + +\item[a built-in function or method:] the result is up to the +interpreter; see the library reference manual for the descriptions of +built-in functions and methods. +\indexii{function}{call} +\indexii{built-in function}{call} +\indexii{method}{call} +\indexii{built-in method}{call} +\obindex{built-in method} +\obindex{built-in function} +\obindex{method} +\obindex{function} + +\item[a class object:] a new instance of that class is returned. +\obindex{class} +\indexii{class object}{call} + +\item[a class instance method:] the corresponding user-defined +function is called, with an argument list that is one longer than the +argument list of the call: the instance becomes the first argument. +\obindex{class instance} +\obindex{instance} +\indexii{instance}{call} +\indexii{class instance}{call} + +\end{description} + +\section{Unary arithmetic operations} +\indexiii{unary}{arithmetic}{operation} +\indexiii{unary}{bit-wise}{operation} + +All unary arithmetic (and bit-wise) operations have the same priority: + +\begin{verbatim} +u_expr: primary | "-" u_expr | "+" u_expr | "~" u_expr +\end{verbatim} + +The unary \verb\"-"\ (minus) operator yields the negation of its +numeric argument. +\index{negation} +\index{minus} + +The unary \verb\"+"\ (plus) operator yields its numeric argument +unchanged. +\index{plus} + +The unary \verb\"~"\ (invert) operator yields the bit-wise inversion +of its plain or long integer argument. The bit-wise inversion of +\verb\x\ is defined as \verb\-(x+1)\. +\index{inversion} + +In all three cases, if the argument does not have the proper type, +a \verb\TypeError\ exception is raised. +\exindex{TypeError} + +\section{Binary arithmetic operations} +\indexiii{binary}{arithmetic}{operation} + +The binary arithmetic operations have the conventional priority +levels. Note that some of these operations also apply to certain +non-numeric types. There is no ``power'' operator, so there are only +two levels, one for multiplicative operators and one for additive +operators: + +\begin{verbatim} +m_expr: u_expr | m_expr "*" u_expr + | m_expr "/" u_expr | m_expr "%" u_expr +a_expr: m_expr | aexpr "+" m_expr | aexpr "-" m_expr +\end{verbatim} + +The \verb\"*"\ (multiplication) operator yields the product of its +arguments. The arguments must either both be numbers, or one argument +must be a plain integer and the other must be a sequence. In the +former case, the numbers are converted to a common type and then +multiplied together. In the latter case, sequence repetition is +performed; a negative repetition factor yields an empty sequence. +\index{multiplication} + +The \verb\"/"\ (division) operator yields the quotient of its +arguments. The numeric arguments are first converted to a common +type. Plain or long integer division yields an integer of the same +type; the result is that of mathematical division with the `floor' +function applied to the result. Division by zero raises the +\verb\ZeroDivisionError\ exception. +\exindex{ZeroDivisionError} +\index{division} + +The \verb\"%"\ (modulo) operator yields the remainder from the +division of the first argument by the second. The numeric arguments +are first converted to a common type. A zero right argument raises +the \verb\ZeroDivisionError\ exception. The arguments may be floating +point numbers, e.g. \verb\3.14 % 0.7\ equals \verb\0.34\. The modulo +operator always yields a result with the same sign as its second +operand (or zero); the absolute value of the result is strictly +smaller than the second operand. +\index{modulo} + +The integer division and modulo operators are connected by the +following identity: \verb\x == (x/y)*y + (x%y)\. Integer division and +modulo are also connected with the built-in function \verb\divmod()\: +\verb\divmod(x, y) == (x/y, x%y)\. These identities don't hold for +floating point numbers; there a similar identity holds where +\verb\x/y\ is replaced by \verb\floor(x/y)\). + +The \verb\"+"\ (addition) operator yields the sum of its arguments. +The arguments must either both be numbers, or both sequences of the +same type. In the former case, the numbers are converted to a common +type and then added together. In the latter case, the sequences are +concatenated. +\index{addition} + +The \verb\"-"\ (subtraction) operator yields the difference of its +arguments. The numeric arguments are first converted to a common +type. +\index{subtraction} + +\section{Shifting operations} +\indexii{shifting}{operation} + +The shifting operations have lower priority than the arithmetic +operations: + +\begin{verbatim} +shift_expr: a_expr | shift_expr ( "<<" | ">>" ) a_expr +\end{verbatim} + +These operators accept plain or long integers as arguments. The +arguments are converted to a common type. They shift the first +argument to the left or right by the number of bits given by the +second argument. + +A right shift by $n$ bits is defined as division by $2^n$. A left +shift by $n$ bits is defined as multiplication with $2^n$; for plain +integers there is no overflow check so this drops bits and flip the +sign if the result is not less than $2^{31}$ in absolute value. + +Negative shift counts raise a \verb\ValueError\ exception. +\exindex{ValueError} + +\section{Binary bit-wise operations} +\indexiii{binary}{bit-wise}{operation} + +Each of the three bitwise operations has a different priority level: + +\begin{verbatim} +and_expr: shift_expr | and_expr "&" shift_expr +xor_expr: and_expr | xor_expr "^" and_expr +or_expr: xor_expr | or_expr "|" xor_expr +\end{verbatim} + +The \verb\"&"\ operator yields the bitwise AND of its arguments, which +must be plain or long integers. The arguments are converted to a +common type. +\indexii{bit-wise}{and} + +The \verb\"^"\ operator yields the bitwise XOR (exclusive OR) of its +arguments, which must be plain or long integers. The arguments are +converted to a common type. +\indexii{bit-wise}{xor} +\indexii{exclusive}{or} + +The \verb\"|"\ operator yields the bitwise (inclusive) OR of its +arguments, which must be plain or long integers. The arguments are +converted to a common type. +\indexii{bit-wise}{or} +\indexii{inclusive}{or} + +\section{Comparisons} +\index{comparison} + +Contrary to C, all comparison operations in Python have the same +priority, which is lower than that of any arithmetic, shifting or +bitwise operation. Also contrary to C, expressions like +\verb\a < b < c\ have the interpretation that is conventional in +mathematics: +\index{C} + +\begin{verbatim} +comparison: or_expr (comp_operator or_expr)* +comp_operator: "<"|">"|"=="|">="|"<="|"<>"|"!="|"is" ["not"]|["not"] "in" +\end{verbatim} + +Comparisons yield integer values: 1 for true, 0 for false. + +Comparisons can be chained arbitrarily, e.g. $x < y <= z$ is +equivalent to $x < y$ \verb\and\ $y <= z$, except that $y$ is +evaluated only once (but in both cases $z$ is not evaluated at all +when $x < y$ is found to be false). +\indexii{chaining}{comparisons} + +Formally, $e_0 op_1 e_1 op_2 e_2 ...e_{n-1} op_n e_n$ is equivalent to +$e_0 op_1 e_1$ \verb\and\ $e_1 op_2 e_2$ \verb\and\ ... \verb\and\ +$e_{n-1} op_n e_n$, except that each expression is evaluated at most once. + +Note that $e_0 op_1 e_1 op_2 e_2$ does not imply any kind of comparison +between $e_0$ and $e_2$, e.g. $x < y > z$ is perfectly legal. + +The forms \verb\<>\ and \verb\!=\ are equivalent; for consistency with +C, \verb\!=\ is preferred; where \verb\!=\ is mentioned below +\verb\<>\ is also implied. + +The operators {\tt "<", ">", "==", ">=", "<="}, and {\tt "!="} compare +the values of two objects. The objects needn't have the same type. +If both are numbers, they are coverted to a common type. Otherwise, +objects of different types {\em always} compare unequal, and are +ordered consistently but arbitrarily. + +(This unusual definition of comparison is done to simplify the +definition of operations like sorting and the \verb\in\ and \verb\not +in\ operators.) + +Comparison of objects of the same type depends on the type: + +\begin{itemize} + +\item +Numbers are compared arithmetically. + +\item +Strings are compared lexicographically using the numeric equivalents +(the result of the built-in function \verb\ord\) of their characters. + +\item +Tuples and lists are compared lexicographically using comparison of +corresponding items. + +\item +Mappings (dictionaries) are compared through lexicographic +comparison of their sorted (key, value) lists.% +\footnote{This is expensive since it requires sorting the keys first, +but about the only sensible definition. It was tried to compare +dictionaries by identity only, but this caused surprises because +people expected to be able to test a dictionary for emptiness by +comparing it to {\tt \{\}}.} + +\item +Most other types compare unequal unless they are the same object; +the choice whether one object is considered smaller or larger than +another one is made arbitrarily but consistently within one +execution of a program. + +\end{itemize} + +The operators \verb\in\ and \verb\not in\ test for sequence +membership: if $y$ is a sequence, $x ~\verb\in\~ y$ is true if and +only if there exists an index $i$ such that $x = y[i]$. +$x ~\verb\not in\~ y$ yields the inverse truth value. The exception +\verb\TypeError\ is raised when $y$ is not a sequence, or when $y$ is +a string and $x$ is not a string of length one.% +\footnote{The latter restriction is sometimes a nuisance.} +\opindex{in} +\opindex{not in} +\indexii{membership}{test} +\obindex{sequence} + +The operators \verb\is\ and \verb\is not\ test for object identity: +$x ~\verb\is\~ y$ is true if and only if $x$ and $y$ are the same +object. $x ~\verb\is not\~ y$ yields the inverse truth value. +\opindex{is} +\opindex{is not} +\indexii{identity}{test} + +\section{Boolean operations} \label{Booleans} +\indexii{Boolean}{operation} + +Boolean operations have the lowest priority of all Python operations: + +\begin{verbatim} +condition: or_test +or_test: and_test | or_test "or" and_test +and_test: not_test | and_test "and" not_test +not_test: comparison | "not" not_test +\end{verbatim} + +In the context of Boolean operations, and also when conditions are +used by control flow statements, the following values are interpreted +as false: \verb\None\, numeric zero of all types, empty sequences +(strings, tuples and lists), and empty mappings (dictionaries). All +other values are interpreted as true. + +The operator \verb\not\ yields 1 if its argument is false, 0 otherwise. +\opindex{not} + +The condition $x ~\verb\and\~ y$ first evaluates $x$; if $x$ is false, +its value is returned; otherwise, $y$ is evaluated and the resulting +value is returned. +\opindex{and} + +The condition $x ~\verb\or\~ y$ first evaluates $x$; if $x$ is true, +its value is returned; otherwise, $y$ is evaluated and the resulting +value is returned. +\opindex{or} + +(Note that \verb\and\ and \verb\or\ do not restrict the value and type +they return to 0 and 1, but rather return the last evaluated argument. +This is sometimes useful, e.g. if \verb\s\ is a string that should be +replaced by a default value if it is empty, the expression +\verb\s or 'foo'\ yields the desired value. Because \verb\not\ has to +invent a value anyway, it does not bother to return a value of the +same type as its argument, so e.g. \verb\not 'foo'\ yields \verb\0\, +not \verb\''\.) + +\section{Expression lists and condition lists} +\indexii{expression}{list} +\indexii{condition}{list} + +\begin{verbatim} +expr_list: or_expr ("," or_expr)* [","] +cond_list: condition ("," condition)* [","] +\end{verbatim} + +The only difference between expression lists and condition lists is +the lowest priority of operators that can be used in them without +being enclosed in parentheses; condition lists allow all operators, +while expression lists don't allow comparisons and Boolean operators +(they do allow bitwise and shift operators though). + +Expression lists are used in expression statements and assignments; +condition lists are used everywhere else where a list of +comma-separated values is required. + +An expression (condition) list containing at least one comma yields a +tuple. The length of the tuple is the number of expressions +(conditions) in the list. The expressions (conditions) are evaluated +from left to right. (Conditions lists are used syntactically is a few +places where no tuple is constructed but a list of values is needed +nevertheless.) +\obindex{tuple} + +The trailing comma is required only to create a single tuple (a.k.a. a +{\em singleton}); it is optional in all other cases. A single +expression (condition) without a trailing comma doesn't create a +tuple, but rather yields the value of that expression (condition). +\indexii{trailing}{comma} + +(To create an empty tuple, use an empty pair of parentheses: +\verb\()\.) -- cgit v0.12