diff options
author | Georg Brandl <georg@python.org> | 2009-10-27 20:20:38 (GMT) |
---|---|---|
committer | Georg Brandl <georg@python.org> | 2009-10-27 20:20:38 (GMT) |
commit | cb7cb247b32acdd0816663d35ad51057a8571894 (patch) | |
tree | 3c0289252f056681f531ba9b6e3f33ae00210c11 /Doc/faq/design.rst | |
parent | ef9c9af915872a9fa2c7480fd637038cc3a445f8 (diff) | |
download | cpython-cb7cb247b32acdd0816663d35ad51057a8571894.zip cpython-cb7cb247b32acdd0816663d35ad51057a8571894.tar.gz cpython-cb7cb247b32acdd0816663d35ad51057a8571894.tar.bz2 |
Merged revisions 75374 via svnmerge from
svn+ssh://svn.python.org/python/branches/py3k
................
r75374 | georg.brandl | 2009-10-11 23:25:26 +0200 (So, 11 Okt 2009) | 9 lines
Merged revisions 75363 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk
........
r75363 | georg.brandl | 2009-10-11 20:31:23 +0200 (So, 11 Okt 2009) | 1 line
Add the Python FAQ lists to the documentation. Copied from sandbox/faq. Many thanks to AMK for the preparation work.
........
................
Diffstat (limited to 'Doc/faq/design.rst')
-rw-r--r-- | Doc/faq/design.rst | 924 |
1 files changed, 924 insertions, 0 deletions
diff --git a/Doc/faq/design.rst b/Doc/faq/design.rst new file mode 100644 index 0000000..aacb476 --- /dev/null +++ b/Doc/faq/design.rst @@ -0,0 +1,924 @@ +====================== +Design and History FAQ +====================== + +Why does Python use indentation for grouping of statements? +----------------------------------------------------------- + +Guido van Rossum believes that using indentation for grouping is extremely +elegant and contributes a lot to the clarity of the average Python program. +Most people learn to love this feature after awhile. + +Since there are no begin/end brackets there cannot be a disagreement between +grouping perceived by the parser and the human reader. Occasionally C +programmers will encounter a fragment of code like this:: + + if (x <= y) + x++; + y--; + z++; + +Only the ``x++`` statement is executed if the condition is true, but the +indentation leads you to believe otherwise. Even experienced C programmers will +sometimes stare at it a long time wondering why ``y`` is being decremented even +for ``x > y``. + +Because there are no begin/end brackets, Python is much less prone to +coding-style conflicts. In C there are many different ways to place the braces. +If you're used to reading and writing code that uses one style, you will feel at +least slightly uneasy when reading (or being required to write) another style. + +Many coding styles place begin/end brackets on a line by themself. This makes +programs considerably longer and wastes valuable screen space, making it harder +to get a good overview of a program. Ideally, a function should fit on one +screen (say, 20-30 lines). 20 lines of Python can do a lot more work than 20 +lines of C. This is not solely due to the lack of begin/end brackets -- the +lack of declarations and the high-level data types are also responsible -- but +the indentation-based syntax certainly helps. + + +Why am I getting strange results with simple arithmetic operations? +------------------------------------------------------------------- + +See the next question. + + +Why are floating point calculations so inaccurate? +-------------------------------------------------- + +People are often very surprised by results like this:: + + >>> 1.2-1.0 + 0.199999999999999996 + +and think it is a bug in Python. It's not. This has nothing to do with Python, +but with how the underlying C platform handles floating point numbers, and +ultimately with the inaccuracies introduced when writing down numbers as a +string of a fixed number of digits. + +The internal representation of floating point numbers uses a fixed number of +binary digits to represent a decimal number. Some decimal numbers can't be +represented exactly in binary, resulting in small roundoff errors. + +In decimal math, there are many numbers that can't be represented with a fixed +number of decimal digits, e.g. 1/3 = 0.3333333333....... + +In base 2, 1/2 = 0.1, 1/4 = 0.01, 1/8 = 0.001, etc. .2 equals 2/10 equals 1/5, +resulting in the binary fractional number 0.001100110011001... + +Floating point numbers only have 32 or 64 bits of precision, so the digits are +cut off at some point, and the resulting number is 0.199999999999999996 in +decimal, not 0.2. + +A floating point number's ``repr()`` function prints as many digits are +necessary to make ``eval(repr(f)) == f`` true for any float f. The ``str()`` +function prints fewer digits and this often results in the more sensible number +that was probably intended:: + + >>> 0.2 + 0.20000000000000001 + >>> print 0.2 + 0.2 + +One of the consequences of this is that it is error-prone to compare the result +of some computation to a float with ``==``. Tiny inaccuracies may mean that +``==`` fails. Instead, you have to check that the difference between the two +numbers is less than a certain threshold:: + + epsilon = 0.0000000000001 # Tiny allowed error + expected_result = 0.4 + + if expected_result-epsilon <= computation() <= expected_result+epsilon: + ... + +Please see the chapter on :ref:`floating point arithmetic <tut-fp-issues>` in +the Python tutorial for more information. + + +Why are Python strings immutable? +--------------------------------- + +There are several advantages. + +One is performance: knowing that a string is immutable means we can allocate +space for it at creation time, and the storage requirements are fixed and +unchanging. This is also one of the reasons for the distinction between tuples +and lists. + +Another advantage is that strings in Python are considered as "elemental" as +numbers. No amount of activity will change the value 8 to anything else, and in +Python, no amount of activity will change the string "eight" to anything else. + + +.. _why-self: + +Why must 'self' be used explicitly in method definitions and calls? +------------------------------------------------------------------- + +The idea was borrowed from Modula-3. It turns out to be very useful, for a +variety of reasons. + +First, it's more obvious that you are using a method or instance attribute +instead of a local variable. Reading ``self.x`` or ``self.meth()`` makes it +absolutely clear that an instance variable or method is used even if you don't +know the class definition by heart. In C++, you can sort of tell by the lack of +a local variable declaration (assuming globals are rare or easily recognizable) +-- but in Python, there are no local variable declarations, so you'd have to +look up the class definition to be sure. Some C++ and Java coding standards +call for instance attributes to have an ``m_`` prefix, so this explicitness is +still useful in those languages, too. + +Second, it means that no special syntax is necessary if you want to explicitly +reference or call the method from a particular class. In C++, if you want to +use a method from a base class which is overridden in a derived class, you have +to use the ``::`` operator -- in Python you can write baseclass.methodname(self, +<argument list>). This is particularly useful for :meth:`__init__` methods, and +in general in cases where a derived class method wants to extend the base class +method of the same name and thus has to call the base class method somehow. + +Finally, for instance variables it solves a syntactic problem with assignment: +since local variables in Python are (by definition!) those variables to which a +value assigned in a function body (and that aren't explicitly declared global), +there has to be some way to tell the interpreter that an assignment was meant to +assign to an instance variable instead of to a local variable, and it should +preferably be syntactic (for efficiency reasons). C++ does this through +declarations, but Python doesn't have declarations and it would be a pity having +to introduce them just for this purpose. Using the explicit "self.var" solves +this nicely. Similarly, for using instance variables, having to write +"self.var" means that references to unqualified names inside a method don't have +to search the instance's directories. To put it another way, local variables +and instance variables live in two different namespaces, and you need to tell +Python which namespace to use. + + +Why can't I use an assignment in an expression? +----------------------------------------------- + +Many people used to C or Perl complain that they want to use this C idiom: + +.. code-block:: c + + while (line = readline(f)) { + // do something with line + } + +where in Python you're forced to write this:: + + while True: + line = f.readline() + if not line: + break + ... # do something with line + +The reason for not allowing assignment in Python expressions is a common, +hard-to-find bug in those other languages, caused by this construct: + +.. code-block:: c + + if (x = 0) { + // error handling + } + else { + // code that only works for nonzero x + } + +The error is a simple typo: ``x = 0``, which assigns 0 to the variable ``x``, +was written while the comparison ``x == 0`` is certainly what was intended. + +Many alternatives have been proposed. Most are hacks that save some typing but +use arbitrary or cryptic syntax or keywords, and fail the simple criterion for +language change proposals: it should intuitively suggest the proper meaning to a +human reader who has not yet been introduced to the construct. + +An interesting phenomenon is that most experienced Python programmers recognize +the ``while True`` idiom and don't seem to be missing the assignment in +expression construct much; it's only newcomers who express a strong desire to +add this to the language. + +There's an alternative way of spelling this that seems attractive but is +generally less robust than the "while True" solution:: + + line = f.readline() + while line: + ... # do something with line... + line = f.readline() + +The problem with this is that if you change your mind about exactly how you get +the next line (e.g. you want to change it into ``sys.stdin.readline()``) you +have to remember to change two places in your program -- the second occurrence +is hidden at the bottom of the loop. + +The best approach is to use iterators, making it possible to loop through +objects using the ``for`` statement. For example, in the current version of +Python file objects support the iterator protocol, so you can now write simply:: + + for line in f: + ... # do something with line... + + + +Why does Python use methods for some functionality (e.g. list.index()) but functions for other (e.g. len(list))? +---------------------------------------------------------------------------------------------------------------- + +The major reason is history. Functions were used for those operations that were +generic for a group of types and which were intended to work even for objects +that didn't have methods at all (e.g. tuples). It is also convenient to have a +function that can readily be applied to an amorphous collection of objects when +you use the functional features of Python (``map()``, ``apply()`` et al). + +In fact, implementing ``len()``, ``max()``, ``min()`` as a built-in function is +actually less code than implementing them as methods for each type. One can +quibble about individual cases but it's a part of Python, and it's too late to +make such fundamental changes now. The functions have to remain to avoid massive +code breakage. + +.. XXX talk about protocols? + +Note that for string operations Python has moved from external functions (the +``string`` module) to methods. However, ``len()`` is still a function. + + +Why is join() a string method instead of a list or tuple method? +---------------------------------------------------------------- + +Strings became much more like other standard types starting in Python 1.6, when +methods were added which give the same functionality that has always been +available using the functions of the string module. Most of these new methods +have been widely accepted, but the one which appears to make some programmers +feel uncomfortable is:: + + ", ".join(['1', '2', '4', '8', '16']) + +which gives the result:: + + "1, 2, 4, 8, 16" + +There are two common arguments against this usage. + +The first runs along the lines of: "It looks really ugly using a method of a +string literal (string constant)", to which the answer is that it might, but a +string literal is just a fixed value. If the methods are to be allowed on names +bound to strings there is no logical reason to make them unavailable on +literals. + +The second objection is typically cast as: "I am really telling a sequence to +join its members together with a string constant". Sadly, you aren't. For some +reason there seems to be much less difficulty with having :meth:`~str.split` as +a string method, since in that case it is easy to see that :: + + "1, 2, 4, 8, 16".split(", ") + +is an instruction to a string literal to return the substrings delimited by the +given separator (or, by default, arbitrary runs of white space). In this case a +Unicode string returns a list of Unicode strings, an ASCII string returns a list +of ASCII strings, and everyone is happy. + +:meth:`~str.join` is a string method because in using it you are telling the +separator string to iterate over a sequence of strings and insert itself between +adjacent elements. This method can be used with any argument which obeys the +rules for sequence objects, including any new classes you might define yourself. + +Because this is a string method it can work for Unicode strings as well as plain +ASCII strings. If ``join()`` were a method of the sequence types then the +sequence types would have to decide which type of string to return depending on +the type of the separator. + +.. XXX remove next paragraph eventually + +If none of these arguments persuade you, then for the moment you can continue to +use the ``join()`` function from the string module, which allows you to write :: + + string.join(['1', '2', '4', '8', '16'], ", ") + + +How fast are exceptions? +------------------------ + +A try/except block is extremely efficient. Actually catching an exception is +expensive. In versions of Python prior to 2.0 it was common to use this idiom:: + + try: + value = dict[key] + except KeyError: + dict[key] = getvalue(key) + value = dict[key] + +This only made sense when you expected the dict to have the key almost all the +time. If that wasn't the case, you coded it like this:: + + if dict.has_key(key): + value = dict[key] + else: + dict[key] = getvalue(key) + value = dict[key] + +(In Python 2.0 and higher, you can code this as ``value = dict.setdefault(key, +getvalue(key))``.) + + +Why isn't there a switch or case statement in Python? +----------------------------------------------------- + +You can do this easily enough with a sequence of ``if... elif... elif... else``. +There have been some proposals for switch statement syntax, but there is no +consensus (yet) on whether and how to do range tests. See :pep:`275` for +complete details and the current status. + +For cases where you need to choose from a very large number of possibilities, +you can create a dictionary mapping case values to functions to call. For +example:: + + def function_1(...): + ... + + functions = {'a': function_1, + 'b': function_2, + 'c': self.method_1, ...} + + func = functions[value] + func() + +For calling methods on objects, you can simplify yet further by using the +:func:`getattr` built-in to retrieve methods with a particular name:: + + def visit_a(self, ...): + ... + ... + + def dispatch(self, value): + method_name = 'visit_' + str(value) + method = getattr(self, method_name) + method() + +It's suggested that you use a prefix for the method names, such as ``visit_`` in +this example. Without such a prefix, if values are coming from an untrusted +source, an attacker would be able to call any method on your object. + + +Can't you emulate threads in the interpreter instead of relying on an OS-specific thread implementation? +-------------------------------------------------------------------------------------------------------- + +Answer 1: Unfortunately, the interpreter pushes at least one C stack frame for +each Python stack frame. Also, extensions can call back into Python at almost +random moments. Therefore, a complete threads implementation requires thread +support for C. + +Answer 2: Fortunately, there is `Stackless Python <http://www.stackless.com>`_, +which has a completely redesigned interpreter loop that avoids the C stack. +It's still experimental but looks very promising. Although it is binary +compatible with standard Python, it's still unclear whether Stackless will make +it into the core -- maybe it's just too revolutionary. + + +Why can't lambda forms contain statements? +------------------------------------------ + +Python lambda forms cannot contain statements because Python's syntactic +framework can't handle statements nested inside expressions. However, in +Python, this is not a serious problem. Unlike lambda forms in other languages, +where they add functionality, Python lambdas are only a shorthand notation if +you're too lazy to define a function. + +Functions are already first class objects in Python, and can be declared in a +local scope. Therefore the only advantage of using a lambda form instead of a +locally-defined function is that you don't need to invent a name for the +function -- but that's just a local variable to which the function object (which +is exactly the same type of object that a lambda form yields) is assigned! + + +Can Python be compiled to machine code, C or some other language? +----------------------------------------------------------------- + +Not easily. Python's high level data types, dynamic typing of objects and +run-time invocation of the interpreter (using :func:`eval` or :keyword:`exec`) +together mean that a "compiled" Python program would probably consist mostly of +calls into the Python run-time system, even for seemingly simple operations like +``x+1``. + +Several projects described in the Python newsgroup or at past `Python +conferences <http://python.org/community/workshops/>`_ have shown that this approach is feasible, +although the speedups reached so far are only modest (e.g. 2x). Jython uses the +same strategy for compiling to Java bytecode. (Jim Hugunin has demonstrated +that in combination with whole-program analysis, speedups of 1000x are feasible +for small demo programs. See the proceedings from the `1997 Python conference +<http://python.org/community/workshops/1997-10/proceedings/>`_ for more information.) + +Internally, Python source code is always translated into a bytecode +representation, and this bytecode is then executed by the Python virtual +machine. In order to avoid the overhead of repeatedly parsing and translating +modules that rarely change, this byte code is written into a file whose name +ends in ".pyc" whenever a module is parsed. When the corresponding .py file is +changed, it is parsed and translated again and the .pyc file is rewritten. + +There is no performance difference once the .pyc file has been loaded, as the +bytecode read from the .pyc file is exactly the same as the bytecode created by +direct translation. The only difference is that loading code from a .pyc file +is faster than parsing and translating a .py file, so the presence of +precompiled .pyc files improves the start-up time of Python scripts. If +desired, the Lib/compileall.py module can be used to create valid .pyc files for +a given set of modules. + +Note that the main script executed by Python, even if its filename ends in .py, +is not compiled to a .pyc file. It is compiled to bytecode, but the bytecode is +not saved to a file. Usually main scripts are quite short, so this doesn't cost +much speed. + +.. XXX check which of these projects are still alive + +There are also several programs which make it easier to intermingle Python and C +code in various ways to increase performance. See, for example, `Psyco +<http://psyco.sourceforge.net/>`_, `Pyrex +<http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/>`_, `PyInline +<http://pyinline.sourceforge.net/>`_, `Py2Cmod +<http://sourceforge.net/projects/py2cmod/>`_, and `Weave +<http://www.scipy.org/site_content/weave>`_. + + +How does Python manage memory? +------------------------------ + +The details of Python memory management depend on the implementation. The +standard C implementation of Python uses reference counting to detect +inaccessible objects, and another mechanism to collect reference cycles, +periodically executing a cycle detection algorithm which looks for inaccessible +cycles and deletes the objects involved. The :mod:`gc` module provides functions +to perform a garbage collection, obtain debugging statistics, and tune the +collector's parameters. + +Jython relies on the Java runtime so the JVM's garbage collector is used. This +difference can cause some subtle porting problems if your Python code depends on +the behavior of the reference counting implementation. + +Sometimes objects get stuck in tracebacks temporarily and hence are not +deallocated when you might expect. Clear the tracebacks with:: + + import sys + sys.exc_clear() + sys.exc_traceback = sys.last_traceback = None + +Tracebacks are used for reporting errors, implementing debuggers and related +things. They contain a portion of the program state extracted during the +handling of an exception (usually the most recent exception). + +In the absence of circularities and tracebacks, Python programs need not +explicitly manage memory. + +Why doesn't Python use a more traditional garbage collection scheme? For one +thing, this is not a C standard feature and hence it's not portable. (Yes, we +know about the Boehm GC library. It has bits of assembler code for *most* +common platforms, not for all of them, and although it is mostly transparent, it +isn't completely transparent; patches are required to get Python to work with +it.) + +Traditional GC also becomes a problem when Python is embedded into other +applications. While in a standalone Python it's fine to replace the standard +malloc() and free() with versions provided by the GC library, an application +embedding Python may want to have its *own* substitute for malloc() and free(), +and may not want Python's. Right now, Python works with anything that +implements malloc() and free() properly. + +In Jython, the following code (which is fine in CPython) will probably run out +of file descriptors long before it runs out of memory:: + + for file in <very long list of files>: + f = open(file) + c = f.read(1) + +Using the current reference counting and destructor scheme, each new assignment +to f closes the previous file. Using GC, this is not guaranteed. If you want +to write code that will work with any Python implementation, you should +explicitly close the file; this will work regardless of GC:: + + for file in <very long list of files>: + f = open(file) + c = f.read(1) + f.close() + + +Why isn't all memory freed when Python exits? +--------------------------------------------- + +Objects referenced from the global namespaces of Python modules are not always +deallocated when Python exits. This may happen if there are circular +references. There are also certain bits of memory that are allocated by the C +library that are impossible to free (e.g. a tool like Purify will complain about +these). Python is, however, aggressive about cleaning up memory on exit and +does try to destroy every single object. + +If you want to force Python to delete certain things on deallocation use the +:mod:`atexit` module to run a function that will force those deletions. + + +Why are there separate tuple and list data types? +------------------------------------------------- + +Lists and tuples, while similar in many respects, are generally used in +fundamentally different ways. Tuples can be thought of as being similar to +Pascal records or C structs; they're small collections of related data which may +be of different types which are operated on as a group. For example, a +Cartesian coordinate is appropriately represented as a tuple of two or three +numbers. + +Lists, on the other hand, are more like arrays in other languages. They tend to +hold a varying number of objects all of which have the same type and which are +operated on one-by-one. For example, ``os.listdir('.')`` returns a list of +strings representing the files in the current directory. Functions which +operate on this output would generally not break if you added another file or +two to the directory. + +Tuples are immutable, meaning that once a tuple has been created, you can't +replace any of its elements with a new value. Lists are mutable, meaning that +you can always change a list's elements. Only immutable elements can be used as +dictionary keys, and hence only tuples and not lists can be used as keys. + + +How are lists implemented? +-------------------------- + +Python's lists are really variable-length arrays, not Lisp-style linked lists. +The implementation uses a contiguous array of references to other objects, and +keeps a pointer to this array and the array's length in a list head structure. + +This makes indexing a list ``a[i]`` an operation whose cost is independent of +the size of the list or the value of the index. + +When items are appended or inserted, the array of references is resized. Some +cleverness is applied to improve the performance of appending items repeatedly; +when the array must be grown, some extra space is allocated so the next few +times don't require an actual resize. + + +How are dictionaries implemented? +--------------------------------- + +Python's dictionaries are implemented as resizable hash tables. Compared to +B-trees, this gives better performance for lookup (the most common operation by +far) under most circumstances, and the implementation is simpler. + +Dictionaries work by computing a hash code for each key stored in the dictionary +using the :func:`hash` built-in function. The hash code varies widely depending +on the key; for example, "Python" hashes to -539294296 while "python", a string +that differs by a single bit, hashes to 1142331976. The hash code is then used +to calculate a location in an internal array where the value will be stored. +Assuming that you're storing keys that all have different hash values, this +means that dictionaries take constant time -- O(1), in computer science notation +-- to retrieve a key. It also means that no sorted order of the keys is +maintained, and traversing the array as the ``.keys()`` and ``.items()`` do will +output the dictionary's content in some arbitrary jumbled order. + + +Why must dictionary keys be immutable? +-------------------------------------- + +The hash table implementation of dictionaries uses a hash value calculated from +the key value to find the key. If the key were a mutable object, its value +could change, and thus its hash could also change. But since whoever changes +the key object can't tell that it was being used as a dictionary key, it can't +move the entry around in the dictionary. Then, when you try to look up the same +object in the dictionary it won't be found because its hash value is different. +If you tried to look up the old value it wouldn't be found either, because the +value of the object found in that hash bin would be different. + +If you want a dictionary indexed with a list, simply convert the list to a tuple +first; the function ``tuple(L)`` creates a tuple with the same entries as the +list ``L``. Tuples are immutable and can therefore be used as dictionary keys. + +Some unacceptable solutions that have been proposed: + +- Hash lists by their address (object ID). This doesn't work because if you + construct a new list with the same value it won't be found; e.g.:: + + d = {[1,2]: '12'} + print d[[1,2]] + + would raise a KeyError exception because the id of the ``[1,2]`` used in the + second line differs from that in the first line. In other words, dictionary + keys should be compared using ``==``, not using :keyword:`is`. + +- Make a copy when using a list as a key. This doesn't work because the list, + being a mutable object, could contain a reference to itself, and then the + copying code would run into an infinite loop. + +- Allow lists as keys but tell the user not to modify them. This would allow a + class of hard-to-track bugs in programs when you forgot or modified a list by + accident. It also invalidates an important invariant of dictionaries: every + value in ``d.keys()`` is usable as a key of the dictionary. + +- Mark lists as read-only once they are used as a dictionary key. The problem + is that it's not just the top-level object that could change its value; you + could use a tuple containing a list as a key. Entering anything as a key into + a dictionary would require marking all objects reachable from there as + read-only -- and again, self-referential objects could cause an infinite loop. + +There is a trick to get around this if you need to, but use it at your own risk: +You can wrap a mutable structure inside a class instance which has both a +:meth:`__cmp_` and a :meth:`__hash__` method. You must then make sure that the +hash value for all such wrapper objects that reside in a dictionary (or other +hash based structure), remain fixed while the object is in the dictionary (or +other structure). :: + + class ListWrapper: + def __init__(self, the_list): + self.the_list = the_list + def __cmp__(self, other): + return self.the_list == other.the_list + def __hash__(self): + l = self.the_list + result = 98767 - len(l)*555 + for i in range(len(l)): + try: + result = result + (hash(l[i]) % 9999999) * 1001 + i + except: + result = (result % 7777777) + i * 333 + return result + +Note that the hash computation is complicated by the possibility that some +members of the list may be unhashable and also by the possibility of arithmetic +overflow. + +Furthermore it must always be the case that if ``o1 == o2`` (ie ``o1.__cmp__(o2) +== 0``) then ``hash(o1) == hash(o2)`` (ie, ``o1.__hash__() == o2.__hash__()``), +regardless of whether the object is in a dictionary or not. If you fail to meet +these restrictions dictionaries and other hash based structures will misbehave. + +In the case of ListWrapper, whenever the wrapper object is in a dictionary the +wrapped list must not change to avoid anomalies. Don't do this unless you are +prepared to think hard about the requirements and the consequences of not +meeting them correctly. Consider yourself warned. + + +Why doesn't list.sort() return the sorted list? +----------------------------------------------- + +In situations where performance matters, making a copy of the list just to sort +it would be wasteful. Therefore, :meth:`list.sort` sorts the list in place. In +order to remind you of that fact, it does not return the sorted list. This way, +you won't be fooled into accidentally overwriting a list when you need a sorted +copy but also need to keep the unsorted version around. + +In Python 2.4 a new builtin -- :func:`sorted` -- has been added. This function +creates a new list from a provided iterable, sorts it and returns it. For +example, here's how to iterate over the keys of a dictionary in sorted order:: + + for key in sorted(dict.iterkeys()): + ... # do whatever with dict[key]... + + +How do you specify and enforce an interface spec in Python? +----------------------------------------------------------- + +An interface specification for a module as provided by languages such as C++ and +Java describes the prototypes for the methods and functions of the module. Many +feel that compile-time enforcement of interface specifications helps in the +construction of large programs. + +Python 2.6 adds an :mod:`abc` module that lets you define Abstract Base Classes +(ABCs). You can then use :func:`isinstance` and :func:`issubclass` to check +whether an instance or a class implements a particular ABC. The +:mod:`collections` modules defines a set of useful ABCs such as +:class:`Iterable`, :class:`Container`, and :class:`MutableMapping`. + +For Python, many of the advantages of interface specifications can be obtained +by an appropriate test discipline for components. There is also a tool, +PyChecker, which can be used to find problems due to subclassing. + +A good test suite for a module can both provide a regression test and serve as a +module interface specification and a set of examples. Many Python modules can +be run as a script to provide a simple "self test." Even modules which use +complex external interfaces can often be tested in isolation using trivial +"stub" emulations of the external interface. The :mod:`doctest` and +:mod:`unittest` modules or third-party test frameworks can be used to construct +exhaustive test suites that exercise every line of code in a module. + +An appropriate testing discipline can help build large complex applications in +Python as well as having interface specifications would. In fact, it can be +better because an interface specification cannot test certain properties of a +program. For example, the :meth:`append` method is expected to add new elements +to the end of some internal list; an interface specification cannot test that +your :meth:`append` implementation will actually do this correctly, but it's +trivial to check this property in a test suite. + +Writing test suites is very helpful, and you might want to design your code with +an eye to making it easily tested. One increasingly popular technique, +test-directed development, calls for writing parts of the test suite first, +before you write any of the actual code. Of course Python allows you to be +sloppy and not write test cases at all. + + +Why are default values shared between objects? +---------------------------------------------- + +This type of bug commonly bites neophyte programmers. Consider this function:: + + def foo(D={}): # Danger: shared reference to one dict for all calls + ... compute something ... + D[key] = value + return D + +The first time you call this function, ``D`` contains a single item. The second +time, ``D`` contains two items because when ``foo()`` begins executing, ``D`` +starts out with an item already in it. + +It is often expected that a function call creates new objects for default +values. This is not what happens. Default values are created exactly once, when +the function is defined. If that object is changed, like the dictionary in this +example, subsequent calls to the function will refer to this changed object. + +By definition, immutable objects such as numbers, strings, tuples, and ``None``, +are safe from change. Changes to mutable objects such as dictionaries, lists, +and class instances can lead to confusion. + +Because of this feature, it is good programming practice to not use mutable +objects as default values. Instead, use ``None`` as the default value and +inside the function, check if the parameter is ``None`` and create a new +list/dictionary/whatever if it is. For example, don't write:: + + def foo(dict={}): + ... + +but:: + + def foo(dict=None): + if dict is None: + dict = {} # create a new dict for local namespace + +This feature can be useful. When you have a function that's time-consuming to +compute, a common technique is to cache the parameters and the resulting value +of each call to the function, and return the cached value if the same value is +requested again. This is called "memoizing", and can be implemented like this:: + + # Callers will never provide a third parameter for this function. + def expensive (arg1, arg2, _cache={}): + if _cache.has_key((arg1, arg2)): + return _cache[(arg1, arg2)] + + # Calculate the value + result = ... expensive computation ... + _cache[(arg1, arg2)] = result # Store result in the cache + return result + +You could use a global variable containing a dictionary instead of the default +value; it's a matter of taste. + + +Why is there no goto? +--------------------- + +You can use exceptions to provide a "structured goto" that even works across +function calls. Many feel that exceptions can conveniently emulate all +reasonable uses of the "go" or "goto" constructs of C, Fortran, and other +languages. For example:: + + class label: pass # declare a label + + try: + ... + if (condition): raise label() # goto label + ... + except label: # where to goto + pass + ... + +This doesn't allow you to jump into the middle of a loop, but that's usually +considered an abuse of goto anyway. Use sparingly. + + +Why can't raw strings (r-strings) end with a backslash? +------------------------------------------------------- + +More precisely, they can't end with an odd number of backslashes: the unpaired +backslash at the end escapes the closing quote character, leaving an +unterminated string. + +Raw strings were designed to ease creating input for processors (chiefly regular +expression engines) that want to do their own backslash escape processing. Such +processors consider an unmatched trailing backslash to be an error anyway, so +raw strings disallow that. In return, they allow you to pass on the string +quote character by escaping it with a backslash. These rules work well when +r-strings are used for their intended purpose. + +If you're trying to build Windows pathnames, note that all Windows system calls +accept forward slashes too:: + + f = open("/mydir/file.txt") # works fine! + +If you're trying to build a pathname for a DOS command, try e.g. one of :: + + dir = r"\this\is\my\dos\dir" "\\" + dir = r"\this\is\my\dos\dir\ "[:-1] + dir = "\\this\\is\\my\\dos\\dir\\" + + +Why doesn't Python have a "with" statement for attribute assignments? +--------------------------------------------------------------------- + +Python has a 'with' statement that wraps the execution of a block, calling code +on the entrance and exit from the block. Some language have a construct that +looks like this:: + + with obj: + a = 1 # equivalent to obj.a = 1 + total = total + 1 # obj.total = obj.total + 1 + +In Python, such a construct would be ambiguous. + +Other languages, such as Object Pascal, Delphi, and C++, use static types, so +it's possible to know, in an unambiguous way, what member is being assigned +to. This is the main point of static typing -- the compiler *always* knows the +scope of every variable at compile time. + +Python uses dynamic types. It is impossible to know in advance which attribute +will be referenced at runtime. Member attributes may be added or removed from +objects on the fly. This makes it impossible to know, from a simple reading, +what attribute is being referenced: a local one, a global one, or a member +attribute? + +For instance, take the following incomplete snippet:: + + def foo(a): + with a: + print x + +The snippet assumes that "a" must have a member attribute called "x". However, +there is nothing in Python that tells the interpreter this. What should happen +if "a" is, let us say, an integer? If there is a global variable named "x", +will it be used inside the with block? As you see, the dynamic nature of Python +makes such choices much harder. + +The primary benefit of "with" and similar language features (reduction of code +volume) can, however, easily be achieved in Python by assignment. Instead of:: + + function(args).dict[index][index].a = 21 + function(args).dict[index][index].b = 42 + function(args).dict[index][index].c = 63 + +write this:: + + ref = function(args).dict[index][index] + ref.a = 21 + ref.b = 42 + ref.c = 63 + +This also has the side-effect of increasing execution speed because name +bindings are resolved at run-time in Python, and the second version only needs +to perform the resolution once. If the referenced object does not have a, b and +c attributes, of course, the end result is still a run-time exception. + + +Why are colons required for the if/while/def/class statements? +-------------------------------------------------------------- + +The colon is required primarily to enhance readability (one of the results of +the experimental ABC language). Consider this:: + + if a == b + print a + +versus :: + + if a == b: + print a + +Notice how the second one is slightly easier to read. Notice further how a +colon sets off the example in this FAQ answer; it's a standard usage in English. + +Another minor reason is that the colon makes it easier for editors with syntax +highlighting; they can look for colons to decide when indentation needs to be +increased instead of having to do a more elaborate parsing of the program text. + + +Why does Python allow commas at the end of lists and tuples? +------------------------------------------------------------ + +Python lets you add a trailing comma at the end of lists, tuples, and +dictionaries:: + + [1, 2, 3,] + ('a', 'b', 'c',) + d = { + "A": [1, 5], + "B": [6, 7], # last trailing comma is optional but good style + } + + +There are several reasons to allow this. + +When you have a literal value for a list, tuple, or dictionary spread across +multiple lines, it's easier to add more elements because you don't have to +remember to add a comma to the previous line. The lines can also be sorted in +your editor without creating a syntax error. + +Accidentally omitting the comma can lead to errors that are hard to diagnose. +For example:: + + x = [ + "fee", + "fie" + "foo", + "fum" + ] + +This list looks like it has four elements, but it actually contains three: +"fee", "fiefoo" and "fum". Always adding the comma avoids this source of error. + +Allowing the trailing comma may also make programmatic code generation easier. |