Diffstat (limited to 'Doc/howto')
-rw-r--r--  Doc/howto/argparse.rst          229
-rw-r--r--  Doc/howto/clinic.rst           1740
-rw-r--r--  Doc/howto/cporting.rst          269
-rw-r--r--  Doc/howto/curses.rst            495
-rw-r--r--  Doc/howto/descriptor.rst        124
-rw-r--r--  Doc/howto/doanddont.rst         327
-rw-r--r--  Doc/howto/functional.rst        892
-rw-r--r--  Doc/howto/index.rst               5
-rw-r--r--  Doc/howto/instrumentation.rst   436
-rw-r--r--  Doc/howto/ipaddress.rst         340
-rw-r--r--  Doc/howto/logging-cookbook.rst 1561
-rw-r--r--  Doc/howto/logging.rst           159
-rwxr-xr-x [-rw-r--r--]  Doc/howto/logging_flow.png  bin 22058 -> 49648 bytes
-rw-r--r--  Doc/howto/pyporting.rst          26
-rw-r--r--  Doc/howto/regex.rst             430
-rw-r--r--  Doc/howto/sockets.rst            93
-rw-r--r--  Doc/howto/sorting.rst            72
-rw-r--r--  Doc/howto/unicode.rst           989
-rw-r--r--  Doc/howto/urllib2.rst           227
-rw-r--r--  Doc/howto/webservers.rst        735
20 files changed, 3141 insertions, 6008 deletions
diff --git a/Doc/howto/argparse.rst b/Doc/howto/argparse.rst
index e78a022..63b0b28 100644
--- a/Doc/howto/argparse.rst
+++ b/Doc/howto/argparse.rst
@@ -8,6 +8,8 @@ Argparse Tutorial
This tutorial is intended to be a gentle introduction to :mod:`argparse`, the
recommended command-line parsing module in the Python standard library.
+This was written for argparse in Python 3. A few details are different in 2.x,
+especially some exception messages, which were improved in 3.x.
.. note::
@@ -24,7 +26,7 @@ Concepts
Let's show the sort of functionality that we are going to explore in this
introductory tutorial by making use of the :command:`ls` command:
-.. code-block:: shell-session
+.. code-block:: sh
$ ls
cpython devguide prog.py pypy rm-unused-function.patch
@@ -77,18 +79,18 @@ Let us start with a very simple example which does (almost) nothing::
Following is a result of running the code:
-.. code-block:: shell-session
+.. code-block:: sh
- $ python3 prog.py
- $ python3 prog.py --help
+ $ python prog.py
+ $ python prog.py --help
usage: prog.py [-h]
optional arguments:
-h, --help show this help message and exit
- $ python3 prog.py --verbose
+ $ python prog.py --verbose
usage: prog.py [-h]
prog.py: error: unrecognized arguments: --verbose
- $ python3 prog.py foo
+ $ python prog.py foo
usage: prog.py [-h]
prog.py: error: unrecognized arguments: foo
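The bare-parser behavior in the hunk above can be checked without a shell session by handing ``parse_args()`` an explicit argv list (the list replaces ``sys.argv``; this is an illustrative sketch, not part of the diff):

```python
import argparse

# A parser with no arguments of its own: only -h/--help is recognized,
# and anything else produces the "unrecognized arguments" error shown above.
parser = argparse.ArgumentParser(prog="prog.py")

# An empty argv list parses successfully and yields an empty Namespace.
args = parser.parse_args([])
print(vars(args))  # {}
```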
@@ -115,16 +117,16 @@ An example::
parser = argparse.ArgumentParser()
parser.add_argument("echo")
args = parser.parse_args()
- print(args.echo)
+ print args.echo
And running the code:
-.. code-block:: shell-session
+.. code-block:: sh
- $ python3 prog.py
+ $ python prog.py
usage: prog.py [-h] echo
prog.py: error: the following arguments are required: echo
- $ python3 prog.py --help
+ $ python prog.py --help
usage: prog.py [-h] echo
positional arguments:
@@ -132,7 +134,7 @@ And running the code:
optional arguments:
-h, --help show this help message and exit
- $ python3 prog.py foo
+ $ python prog.py foo
foo
Here is what's happening:
@@ -160,13 +162,13 @@ by reading the source code. So, let's make it a bit more useful::
parser = argparse.ArgumentParser()
parser.add_argument("echo", help="echo the string you use here")
args = parser.parse_args()
- print(args.echo)
+ print args.echo
And we get:
-.. code-block:: shell-session
+.. code-block:: sh
- $ python3 prog.py -h
+ $ python prog.py -h
usage: prog.py [-h] echo
positional arguments:
@@ -181,16 +183,16 @@ Now, how about doing something even more useful::
parser = argparse.ArgumentParser()
parser.add_argument("square", help="display a square of a given number")
args = parser.parse_args()
- print(args.square**2)
+ print args.square**2
Following is a result of running the code:
-.. code-block:: shell-session
+.. code-block:: sh
- $ python3 prog.py 4
+ $ python prog.py 4
Traceback (most recent call last):
File "prog.py", line 5, in <module>
- print(args.square**2)
+ print args.square**2
TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'
That didn't go so well. That's because :mod:`argparse` treats the options we
@@ -202,15 +204,15 @@ give it as strings, unless we tell it otherwise. So, let's tell
parser.add_argument("square", help="display a square of a given number",
type=int)
args = parser.parse_args()
- print(args.square**2)
+ print args.square**2
Following is a result of running the code:
-.. code-block:: shell-session
+.. code-block:: sh
- $ python3 prog.py 4
+ $ python prog.py 4
16
- $ python3 prog.py four
+ $ python prog.py four
usage: prog.py [-h] square
prog.py: error: argument square: invalid int value: 'four'
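The effect of ``type=int`` can be seen directly: ``parse_args()`` hands back an ``int`` instead of a string, so the exponentiation that failed earlier now works (a small sketch of the same parser):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("square", help="display a square of a given number",
                    type=int)

# "4" on the command line arrives as the integer 4, not the string "4".
args = parser.parse_args(["4"])
print(args.square ** 2)  # 16
```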
@@ -229,23 +231,23 @@ have a look on how to add optional ones::
parser.add_argument("--verbosity", help="increase output verbosity")
args = parser.parse_args()
if args.verbosity:
- print("verbosity turned on")
+ print "verbosity turned on"
And the output:
-.. code-block:: shell-session
+.. code-block:: sh
- $ python3 prog.py --verbosity 1
+ $ python prog.py --verbosity 1
verbosity turned on
- $ python3 prog.py
- $ python3 prog.py --help
+ $ python prog.py
+ $ python prog.py --help
usage: prog.py [-h] [--verbosity VERBOSITY]
optional arguments:
-h, --help show this help message and exit
--verbosity VERBOSITY
increase output verbosity
- $ python3 prog.py --verbosity
+ $ python prog.py --verbosity
usage: prog.py [-h] [--verbosity VERBOSITY]
prog.py: error: argument --verbosity: expected one argument
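The optional-argument behavior shown in this hunk can be sketched programmatically: the option's value is stored as a string, and an omitted option defaults to ``None``, which is why the ``if args.verbosity:`` test works:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--verbosity", help="increase output verbosity")

# The option's value is stored (as a string) on the Namespace ...
args = parser.parse_args(["--verbosity", "1"])
print(args.verbosity)  # the string "1"

# ... and an absent option defaults to None, which is falsy.
args = parser.parse_args([])
print(args.verbosity)  # None
```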
@@ -275,18 +277,18 @@ Let's modify the code accordingly::
action="store_true")
args = parser.parse_args()
if args.verbose:
- print("verbosity turned on")
+ print "verbosity turned on"
And the output:
-.. code-block:: shell-session
+.. code-block:: sh
- $ python3 prog.py --verbose
+ $ python prog.py --verbose
verbosity turned on
- $ python3 prog.py --verbose 1
+ $ python prog.py --verbose 1
usage: prog.py [-h] [--verbose]
prog.py: error: unrecognized arguments: 1
- $ python3 prog.py --help
+ $ python prog.py --help
usage: prog.py [-h] [--verbose]
optional arguments:
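The ``store_true`` action described here turns ``--verbose`` into a true flag; a minimal sketch (the ``add_argument`` call is reconstructed from the surrounding tutorial, not shown in this hunk):

```python
import argparse

parser = argparse.ArgumentParser()
# store_true makes --verbose a flag: it consumes no value, and the
# attribute is True when the flag is present, False otherwise.
parser.add_argument("--verbose", help="increase output verbosity",
                    action="store_true")

print(parser.parse_args(["--verbose"]).verbose)  # True
print(parser.parse_args([]).verbose)             # False
```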
@@ -321,15 +323,15 @@ versions of the options. It's quite simple::
action="store_true")
args = parser.parse_args()
if args.verbose:
- print("verbosity turned on")
+ print "verbosity turned on"
And here goes:
-.. code-block:: shell-session
+.. code-block:: sh
- $ python3 prog.py -v
+ $ python prog.py -v
verbosity turned on
- $ python3 prog.py --help
+ $ python prog.py --help
usage: prog.py [-h] [-v]
optional arguments:
@@ -353,22 +355,22 @@ Our program keeps growing in complexity::
args = parser.parse_args()
answer = args.square**2
if args.verbose:
- print("the square of {} equals {}".format(args.square, answer))
+ print "the square of {} equals {}".format(args.square, answer)
else:
- print(answer)
+ print answer
And now the output:
-.. code-block:: shell-session
+.. code-block:: sh
- $ python3 prog.py
+ $ python prog.py
usage: prog.py [-h] [-v] square
prog.py: error: the following arguments are required: square
- $ python3 prog.py 4
+ $ python prog.py 4
16
- $ python3 prog.py 4 --verbose
+ $ python prog.py 4 --verbose
the square of 4 equals 16
- $ python3 prog.py --verbose 4
+ $ python prog.py --verbose 4
the square of 4 equals 16
* We've brought back a positional argument, hence the complaint.
@@ -387,26 +389,26 @@ multiple verbosity values, and actually get to use them::
args = parser.parse_args()
answer = args.square**2
if args.verbosity == 2:
- print("the square of {} equals {}".format(args.square, answer))
+ print "the square of {} equals {}".format(args.square, answer)
elif args.verbosity == 1:
- print("{}^2 == {}".format(args.square, answer))
+ print "{}^2 == {}".format(args.square, answer)
else:
- print(answer)
+ print answer
And the output:
-.. code-block:: shell-session
+.. code-block:: sh
- $ python3 prog.py 4
+ $ python prog.py 4
16
- $ python3 prog.py 4 -v
+ $ python prog.py 4 -v
usage: prog.py [-h] [-v VERBOSITY] square
prog.py: error: argument -v/--verbosity: expected one argument
- $ python3 prog.py 4 -v 1
+ $ python prog.py 4 -v 1
4^2 == 16
- $ python3 prog.py 4 -v 2
+ $ python prog.py 4 -v 2
the square of 4 equals 16
- $ python3 prog.py 4 -v 3
+ $ python prog.py 4 -v 3
16
These all look good except the last one, which exposes a bug in our program.
@@ -421,20 +423,20 @@ Let's fix it by restricting the values the ``--verbosity`` option can accept::
args = parser.parse_args()
answer = args.square**2
if args.verbosity == 2:
- print("the square of {} equals {}".format(args.square, answer))
+ print "the square of {} equals {}".format(args.square, answer)
elif args.verbosity == 1:
- print("{}^2 == {}".format(args.square, answer))
+ print "{}^2 == {}".format(args.square, answer)
else:
- print(answer)
+ print answer
And the output:
-.. code-block:: shell-session
+.. code-block:: sh
- $ python3 prog.py 4 -v 3
+ $ python prog.py 4 -v 3
usage: prog.py [-h] [-v {0,1,2}] square
prog.py: error: argument -v/--verbosity: invalid choice: 3 (choose from 0, 1, 2)
- $ python3 prog.py 4 -h
+ $ python prog.py 4 -h
usage: prog.py [-h] [-v {0,1,2}] square
positional arguments:
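The restriction described in this hunk comes from the ``choices`` keyword; a runnable sketch (the exact ``add_argument`` call is reconstructed from the ``[-v {0,1,2}]`` usage line above, and is an assumption, not part of this diff):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("square", type=int,
                    help="display a square of a given number")
# choices restricts the accepted values; anything else is rejected
# with the "invalid choice" error shown above.
parser.add_argument("-v", "--verbosity", type=int, choices=[0, 1, 2],
                    help="increase output verbosity")

args = parser.parse_args(["4", "-v", "2"])
print(args.square ** 2, args.verbosity)  # 16 2
```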
@@ -461,29 +463,29 @@ verbosity argument (check the output of ``python --help``)::
args = parser.parse_args()
answer = args.square**2
if args.verbosity == 2:
- print("the square of {} equals {}".format(args.square, answer))
+ print "the square of {} equals {}".format(args.square, answer)
elif args.verbosity == 1:
- print("{}^2 == {}".format(args.square, answer))
+ print "{}^2 == {}".format(args.square, answer)
else:
- print(answer)
+ print answer
We have introduced another action, "count",
to count the number of occurrences of a specific optional argument:
-.. code-block:: shell-session
+.. code-block:: sh
- $ python3 prog.py 4
+ $ python prog.py 4
16
- $ python3 prog.py 4 -v
+ $ python prog.py 4 -v
4^2 == 16
- $ python3 prog.py 4 -vv
+ $ python prog.py 4 -vv
the square of 4 equals 16
- $ python3 prog.py 4 --verbosity --verbosity
+ $ python prog.py 4 --verbosity --verbosity
the square of 4 equals 16
- $ python3 prog.py 4 -v 1
+ $ python prog.py 4 -v 1
usage: prog.py [-h] [-v] square
prog.py: error: unrecognized arguments: 1
- $ python3 prog.py 4 -h
+ $ python prog.py 4 -h
usage: prog.py [-h] [-v] square
positional arguments:
@@ -492,7 +494,7 @@ to count the number of occurrences of a specific optional arguments:
optional arguments:
-h, --help show this help message and exit
-v, --verbosity increase output verbosity
- $ python3 prog.py 4 -vvv
+ $ python prog.py 4 -vvv
16
* Yes, it's now more of a flag (similar to ``action="store_true"``) in the
@@ -503,8 +505,8 @@ to count the number of occurrences of a specific optional arguments:
* Now here's a demonstration of what the "count" action gives. You've probably
seen this sort of usage before.
-* And if you don't specify the ``-v`` flag, that flag is considered to have
- ``None`` value.
+* And, just like the "store_true" action, if you don't specify the ``-v`` flag,
+ that flag is considered to have ``None`` value.
* As should be expected, specifying the long form of the flag, we should get
the same output.
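The "count" behavior this hunk describes, including the ``None`` value for an unspecified flag, can be sketched in a few lines (illustrative, not part of the diff):

```python
import argparse

parser = argparse.ArgumentParser()
# action="count" stores how many times the flag occurred; like
# store_true, an absent flag is left as None (no default is set here).
parser.add_argument("-v", "--verbosity", action="count",
                    help="increase output verbosity")

print(parser.parse_args(["-vv"]).verbosity)  # 2
print(parser.parse_args([]).verbosity)       # None
```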
@@ -529,26 +531,25 @@ Let's fix::
# bugfix: replace == with >=
if args.verbosity >= 2:
- print("the square of {} equals {}".format(args.square, answer))
+ print "the square of {} equals {}".format(args.square, answer)
elif args.verbosity >= 1:
- print("{}^2 == {}".format(args.square, answer))
+ print "{}^2 == {}".format(args.square, answer)
else:
- print(answer)
+ print answer
And this is what it gives:
-.. code-block:: shell-session
+.. code-block:: sh
- $ python3 prog.py 4 -vvv
+ $ python prog.py 4 -vvv
the square of 4 equals 16
- $ python3 prog.py 4 -vvvv
+ $ python prog.py 4 -vvvv
the square of 4 equals 16
- $ python3 prog.py 4
+ $ python prog.py 4
Traceback (most recent call last):
File "prog.py", line 11, in <module>
if args.verbosity >= 2:
- TypeError: '>=' not supported between instances of 'NoneType' and 'int'
-
+ TypeError: unorderable types: NoneType() >= int()
* First output went well, and fixes the bug we had before.
That is, we want any value >= 2 to be as verbose as possible.
@@ -566,11 +567,11 @@ Let's fix that bug::
args = parser.parse_args()
answer = args.square**2
if args.verbosity >= 2:
- print("the square of {} equals {}".format(args.square, answer))
+ print "the square of {} equals {}".format(args.square, answer)
elif args.verbosity >= 1:
- print("{}^2 == {}".format(args.square, answer))
+ print "{}^2 == {}".format(args.square, answer)
else:
- print(answer)
+ print answer
We've just introduced yet another keyword, ``default``.
We've set it to ``0`` in order to make it comparable to the other int values.
@@ -581,9 +582,9 @@ it gets the ``None`` value, and that cannot be compared to an int value
And:
-.. code-block:: shell-session
+.. code-block:: sh
- $ python3 prog.py 4
+ $ python prog.py 4
16
You can go quite far just with what we've learned so far,
@@ -606,20 +607,20 @@ not just squares::
args = parser.parse_args()
answer = args.x**args.y
if args.verbosity >= 2:
- print("{} to the power {} equals {}".format(args.x, args.y, answer))
+ print "{} to the power {} equals {}".format(args.x, args.y, answer)
elif args.verbosity >= 1:
- print("{}^{} == {}".format(args.x, args.y, answer))
+ print "{}^{} == {}".format(args.x, args.y, answer)
else:
- print(answer)
+ print answer
Output:
-.. code-block:: shell-session
+.. code-block:: sh
- $ python3 prog.py
+ $ python prog.py
usage: prog.py [-h] [-v] x y
prog.py: error: the following arguments are required: x, y
- $ python3 prog.py -h
+ $ python prog.py -h
usage: prog.py [-h] [-v] x y
positional arguments:
@@ -629,7 +630,7 @@ Output:
optional arguments:
-h, --help show this help message and exit
-v, --verbosity
- $ python3 prog.py 4 2 -v
+ $ python prog.py 4 2 -v
4^2 == 16
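The two-positional-argument version can be sketched end to end; the ``add_argument`` calls for ``x``, ``y``, and the counted, defaulted ``-v`` are reconstructed from the surrounding tutorial rather than shown in this hunk:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("x", type=int, help="the base")
parser.add_argument("y", type=int, help="the exponent")
# default=0 keeps args.verbosity comparable with >= even when -v is absent.
parser.add_argument("-v", "--verbosity", action="count", default=0)

args = parser.parse_args(["4", "2", "-v"])
answer = args.x ** args.y
if args.verbosity >= 1:
    print("{}^{} == {}".format(args.x, args.y, answer))  # 4^2 == 16
else:
    print(answer)
```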
@@ -645,20 +646,20 @@ to display *more* text instead::
args = parser.parse_args()
answer = args.x**args.y
if args.verbosity >= 2:
- print("Running '{}'".format(__file__))
+ print "Running '{}'".format(__file__)
if args.verbosity >= 1:
- print("{}^{} == ".format(args.x, args.y), end="")
- print(answer)
+ print "{}^{} ==".format(args.x, args.y),
+ print answer
Output:
-.. code-block:: shell-session
+.. code-block:: sh
- $ python3 prog.py 4 2
+ $ python prog.py 4 2
16
- $ python3 prog.py 4 2 -v
+ $ python prog.py 4 2 -v
4^2 == 16
- $ python3 prog.py 4 2 -vv
+ $ python prog.py 4 2 -vv
Running 'prog.py'
4^2 == 16
@@ -686,27 +687,27 @@ which will be the opposite of the ``--verbose`` one::
answer = args.x**args.y
if args.quiet:
- print(answer)
+ print answer
elif args.verbose:
- print("{} to the power {} equals {}".format(args.x, args.y, answer))
+ print "{} to the power {} equals {}".format(args.x, args.y, answer)
else:
- print("{}^{} == {}".format(args.x, args.y, answer))
+ print "{}^{} == {}".format(args.x, args.y, answer)
Our program is now simpler, and we've lost some functionality for the sake of
demonstration. Anyways, here's the output:
-.. code-block:: shell-session
+.. code-block:: sh
- $ python3 prog.py 4 2
+ $ python prog.py 4 2
4^2 == 16
- $ python3 prog.py 4 2 -q
+ $ python prog.py 4 2 -q
16
- $ python3 prog.py 4 2 -v
+ $ python prog.py 4 2 -v
4 to the power 2 equals 16
- $ python3 prog.py 4 2 -vq
+ $ python prog.py 4 2 -vq
usage: prog.py [-h] [-v | -q] x y
prog.py: error: argument -q/--quiet: not allowed with argument -v/--verbose
- $ python3 prog.py 4 2 -v --quiet
+ $ python prog.py 4 2 -v --quiet
usage: prog.py [-h] [-v | -q] x y
prog.py: error: argument -q/--quiet: not allowed with argument -v/--verbose
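The conflict errors above come from a mutually exclusive group; a minimal sketch of that mechanism (illustrative, not part of the diff):

```python
import argparse

parser = argparse.ArgumentParser()
# Arguments added to a mutually exclusive group may not be combined
# on one command line; argparse errors out if both flags appear.
group = parser.add_mutually_exclusive_group()
group.add_argument("-v", "--verbose", action="store_true")
group.add_argument("-q", "--quiet", action="store_true")

args = parser.parse_args(["-q"])
print(args.quiet, args.verbose)  # True False
```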
@@ -729,19 +730,19 @@ your program, just in case they don't know::
answer = args.x**args.y
if args.quiet:
- print(answer)
+ print answer
elif args.verbose:
- print("{} to the power {} equals {}".format(args.x, args.y, answer))
+ print "{} to the power {} equals {}".format(args.x, args.y, answer)
else:
- print("{}^{} == {}".format(args.x, args.y, answer))
+ print "{}^{} == {}".format(args.x, args.y, answer)
Note that slight difference in the usage text. Note the ``[-v | -q]``,
which tells us that we can either use ``-v`` or ``-q``,
but not both at the same time:
-.. code-block:: shell-session
+.. code-block:: sh
- $ python3 prog.py --help
+ $ python prog.py --help
usage: prog.py [-h] [-v | -q] x y
calculate X to the power of Y
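The "calculate X to the power of Y" line in the help output comes from the parser's ``description`` keyword; it can be checked without triggering the help-and-exit path by using ``format_help()`` (an illustrative sketch):

```python
import argparse

# description supplies the explanatory line shown under "usage:" above.
parser = argparse.ArgumentParser(
    description="calculate X to the power of Y")

# format_help() returns the --help text as a string instead of
# printing it and exiting.
print(parser.format_help())
```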
diff --git a/Doc/howto/clinic.rst b/Doc/howto/clinic.rst
deleted file mode 100644
index 5004182..0000000
--- a/Doc/howto/clinic.rst
+++ /dev/null
@@ -1,1740 +0,0 @@
-.. highlight:: c
-
-**********************
-Argument Clinic How-To
-**********************
-
-:author: Larry Hastings
-
-
-.. topic:: Abstract
-
- Argument Clinic is a preprocessor for CPython C files.
- Its purpose is to automate all the boilerplate involved
- with writing argument parsing code for "builtins".
- This document shows you how to convert your first C
- function to work with Argument Clinic, and then introduces
- some advanced topics on Argument Clinic usage.
-
- Currently Argument Clinic is considered internal-only
- for CPython. Its use is not supported for files outside
- CPython, and no guarantees are made regarding backwards
- compatibility for future versions. In other words: if you
- maintain an external C extension for CPython, you're welcome
- to experiment with Argument Clinic in your own code. But the
- version of Argument Clinic that ships with the next version
- of CPython *could* be totally incompatible and break all your code.
-
-The Goals Of Argument Clinic
-============================
-
-Argument Clinic's primary goal
-is to take over responsibility for all argument parsing code
-inside CPython. This means that, when you convert a function
-to work with Argument Clinic, that function should no longer
-do any of its own argument parsing—the code generated by
-Argument Clinic should be a "black box" to you, where CPython
-calls in at the top, and your code gets called at the bottom,
-with ``PyObject *args`` (and maybe ``PyObject *kwargs``)
-magically converted into the C variables and types you need.
-
-In order for Argument Clinic to accomplish its primary goal,
-it must be easy to use. Currently, working with CPython's
-argument parsing library is a chore, requiring maintaining
-redundant information in a surprising number of places.
-When you use Argument Clinic, you don't have to repeat yourself.
-
-Obviously, no one would want to use Argument Clinic unless
-it's solving their problem—and without creating new problems of
-its own.
-So it's paramount that Argument Clinic generate correct code.
-It'd be nice if the code was faster, too, but at the very least
-it should not introduce a major speed regression. (Eventually Argument
-Clinic *should* make a major speedup possible—we could
-rewrite its code generator to produce tailor-made argument
-parsing code, rather than calling the general-purpose CPython
-argument parsing library. That would make for the fastest
-argument parsing possible!)
-
-Additionally, Argument Clinic must be flexible enough to
-work with any approach to argument parsing. Python has
-some functions with some very strange parsing behaviors;
-Argument Clinic's goal is to support all of them.
-
-Finally, the original motivation for Argument Clinic was
-to provide introspection "signatures" for CPython builtins.
-It used to be, the introspection query functions would throw
-an exception if you passed in a builtin. With Argument
-Clinic, that's a thing of the past!
-
-One idea you should keep in mind, as you work with
-Argument Clinic: the more information you give it, the
-better job it'll be able to do.
-Argument Clinic is admittedly relatively simple right
-now. But as it evolves it will get more sophisticated,
-and it should be able to do many interesting and smart
-things with all the information you give it.
-
-
-Basic Concepts And Usage
-========================
-
-Argument Clinic ships with CPython; you'll find it in ``Tools/clinic/clinic.py``.
-If you run that script, specifying a C file as an argument:
-
-.. code-block:: shell-session
-
- $ python3 Tools/clinic/clinic.py foo.c
-
-Argument Clinic will scan over the file looking for lines that
-look exactly like this:
-
-.. code-block:: none
-
- /*[clinic input]
-
-When it finds one, it reads everything up to a line that looks
-exactly like this:
-
-.. code-block:: none
-
- [clinic start generated code]*/
-
-Everything in between these two lines is input for Argument Clinic.
-All of these lines, including the beginning and ending comment
-lines, are collectively called an Argument Clinic "block".
-
-When Argument Clinic parses one of these blocks, it
-generates output. This output is rewritten into the C file
-immediately after the block, followed by a comment containing a checksum.
-The Argument Clinic block now looks like this:
-
-.. code-block:: none
-
- /*[clinic input]
- ... clinic input goes here ...
- [clinic start generated code]*/
- ... clinic output goes here ...
- /*[clinic end generated code: checksum=...]*/
-
-If you run Argument Clinic on the same file a second time, Argument Clinic
-will discard the old output and write out the new output with a fresh checksum
-line. However, if the input hasn't changed, the output won't change either.
-
-You should never modify the output portion of an Argument Clinic block. Instead,
-change the input until it produces the output you want. (That's the purpose of the
-checksum—to detect if someone changed the output, as these edits would be lost
-the next time Argument Clinic writes out fresh output.)
-
-For the sake of clarity, here's the terminology we'll use with Argument Clinic:
-
-* The first line of the comment (``/*[clinic input]``) is the *start line*.
-* The last line of the initial comment (``[clinic start generated code]*/``) is the *end line*.
-* The last line (``/*[clinic end generated code: checksum=...]*/``) is the *checksum line*.
-* In between the start line and the end line is the *input*.
-* In between the end line and the checksum line is the *output*.
-* All the text collectively, from the start line to the checksum line inclusively,
- is the *block*. (A block that hasn't been successfully processed by Argument
- Clinic yet doesn't have output or a checksum line, but it's still considered
- a block.)
-
-
-Converting Your First Function
-==============================
-
-The best way to get a sense of how Argument Clinic works is to
-convert a function to work with it. Here, then, are the bare
-minimum steps you'd need to follow to convert a function to
-work with Argument Clinic. Note that for code you plan to
-check in to CPython, you really should take the conversion farther,
-using some of the advanced concepts you'll see later on in
-the document (like "return converters" and "self converters").
-But we'll keep it simple for this walkthrough so you can learn.
-
-Let's dive in!
-
-0. Make sure you're working with a freshly updated checkout
- of the CPython trunk.
-
-1. Find a Python builtin that calls either :c:func:`PyArg_ParseTuple`
- or :c:func:`PyArg_ParseTupleAndKeywords`, and hasn't been converted
- to work with Argument Clinic yet.
- For my example I'm using ``_pickle.Pickler.dump()``.
-
-2. If the call to the ``PyArg_Parse`` function uses any of the
- following format units:
-
- .. code-block:: none
-
- O&
- O!
- es
- es#
- et
- et#
-
- or if it has multiple calls to :c:func:`PyArg_ParseTuple`,
- you should choose a different function. Argument Clinic *does*
- support all of these scenarios. But these are advanced
- topics—let's do something simpler for your first function.
-
- Also, if the function has multiple calls to :c:func:`PyArg_ParseTuple`
- or :c:func:`PyArg_ParseTupleAndKeywords` where it supports different
- types for the same argument, or if the function uses something besides
- PyArg_Parse functions to parse its arguments, it probably
- isn't suitable for conversion to Argument Clinic. Argument Clinic
- doesn't support generic functions or polymorphic parameters.
-
-3. Add the following boilerplate above the function, creating our block::
-
- /*[clinic input]
- [clinic start generated code]*/
-
-4. Cut the docstring and paste it in between the ``[clinic]`` lines,
- removing all the junk that makes it a properly quoted C string.
- When you're done you should have just the text, based at the left
- margin, with no line wider than 80 characters.
- (Argument Clinic will preserve indents inside the docstring.)
-
- If the old docstring had a first line that looked like a function
- signature, throw that line away. (The docstring doesn't need it
- anymore—when you use ``help()`` on your builtin in the future,
- the first line will be built automatically based on the function's
- signature.)
-
- Sample::
-
- /*[clinic input]
- Write a pickled representation of obj to the open file.
- [clinic start generated code]*/
-
-5. If your docstring doesn't have a "summary" line, Argument Clinic will
- complain. So let's make sure it has one. The "summary" line should
- be a paragraph consisting of a single 80-column line
- at the beginning of the docstring.
-
- (Our example docstring consists solely of a summary line, so the sample
- code doesn't have to change for this step.)
-
-6. Above the docstring, enter the name of the function, followed
- by a blank line. This should be the Python name of the function,
- and should be the full dotted path
- to the function—it should start with the name of the module,
- include any sub-modules, and if the function is a method on
- a class it should include the class name too.
-
- Sample::
-
- /*[clinic input]
- _pickle.Pickler.dump
-
- Write a pickled representation of obj to the open file.
- [clinic start generated code]*/
-
-7. If this is the first time that module or class has been used with Argument
- Clinic in this C file,
- you must declare the module and/or class. Proper Argument Clinic hygiene
- prefers declaring these in a separate block somewhere near the
- top of the C file, in the same way that include files and statics go at
- the top. (In our sample code we'll just show the two blocks next to
- each other.)
-
- The name of the class and module should be the same as the one
- seen by Python. Check the name defined in the :c:type:`PyModuleDef`
- or :c:type:`PyTypeObject` as appropriate.
-
- When you declare a class, you must also specify two aspects of its type
- in C: the type declaration you'd use for a pointer to an instance of
- this class, and a pointer to the :c:type:`PyTypeObject` for this class.
-
- Sample::
-
- /*[clinic input]
- module _pickle
- class _pickle.Pickler "PicklerObject *" "&Pickler_Type"
- [clinic start generated code]*/
-
- /*[clinic input]
- _pickle.Pickler.dump
-
- Write a pickled representation of obj to the open file.
- [clinic start generated code]*/
-
-
-
-
-8. Declare each of the parameters to the function. Each parameter
- should get its own line. All the parameter lines should be
- indented from the function name and the docstring.
-
- The general form of these parameter lines is as follows:
-
- .. code-block:: none
-
- name_of_parameter: converter
-
- If the parameter has a default value, add that after the
- converter:
-
- .. code-block:: none
-
- name_of_parameter: converter = default_value
-
- Argument Clinic's support for "default values" is quite sophisticated;
- please see :ref:`the section below on default values <default_values>`
- for more information.
-
- Add a blank line below the parameters.
-
- What's a "converter"? It establishes both the type
- of the variable used in C, and the method to convert the Python
- value into a C value at runtime.
- For now you're going to use what's called a "legacy converter"—a
- convenience syntax intended to make porting old code into Argument
- Clinic easier.
-
- For each parameter, copy the "format unit" for that
- parameter from the ``PyArg_Parse()`` format argument and
- specify *that* as its converter, as a quoted
- string. ("format unit" is the formal name for the one-to-three
- character substring of the ``format`` parameter that tells
- the argument parsing function what the type of the variable
- is and how to convert it. For more on format units please
- see :ref:`arg-parsing`.)
-
- For multicharacter format units like ``z#``, use the
- entire two-or-three character string.
-
- Sample::
-
- /*[clinic input]
- module _pickle
- class _pickle.Pickler "PicklerObject *" "&Pickler_Type"
- [clinic start generated code]*/
-
- /*[clinic input]
- _pickle.Pickler.dump
-
- obj: 'O'
-
- Write a pickled representation of obj to the open file.
- [clinic start generated code]*/
-
-9. If your function has ``|`` in the format string, meaning some
- parameters have default values, you can ignore it. Argument
- Clinic infers which parameters are optional based on whether
- or not they have default values.
-
- If your function has ``$`` in the format string, meaning it
- takes keyword-only arguments, specify ``*`` on a line by
- itself before the first keyword-only argument, indented the
- same as the parameter lines.
-
- (``_pickle.Pickler.dump`` has neither, so our sample is unchanged.)
-
-
-10. If the existing C function calls :c:func:`PyArg_ParseTuple`
- (as opposed to :c:func:`PyArg_ParseTupleAndKeywords`), then all its
- arguments are positional-only.
-
- To mark all parameters as positional-only in Argument Clinic,
- add a ``/`` on a line by itself after the last parameter,
- indented the same as the parameter lines.
-
- Currently this is all-or-nothing; either all parameters are
- positional-only, or none of them are. (In the future Argument
- Clinic may relax this restriction.)
-
- Sample::
-
- /*[clinic input]
- module _pickle
- class _pickle.Pickler "PicklerObject *" "&Pickler_Type"
- [clinic start generated code]*/
-
- /*[clinic input]
- _pickle.Pickler.dump
-
- obj: 'O'
- /
-
- Write a pickled representation of obj to the open file.
- [clinic start generated code]*/
-
-11. It's helpful to write a per-parameter docstring for each parameter.
- But per-parameter docstrings are optional; you can skip this step
- if you prefer.
-
- Here's how to add a per-parameter docstring. The first line
- of the per-parameter docstring must be indented further than the
- parameter definition. The left margin of this first line establishes
- the left margin for the whole per-parameter docstring; all the text
- you write will be outdented by this amount. You can write as much
- text as you like, across multiple lines if you wish.
-
- Sample::
-
- /*[clinic input]
- module _pickle
- class _pickle.Pickler "PicklerObject *" "&Pickler_Type"
- [clinic start generated code]*/
-
- /*[clinic input]
- _pickle.Pickler.dump
-
- obj: 'O'
- The object to be pickled.
- /
-
- Write a pickled representation of obj to the open file.
- [clinic start generated code]*/
-
-12. Save and close the file, then run ``Tools/clinic/clinic.py`` on
- it. With luck everything worked---your block now has output, and
- a ``.c.h`` file has been generated! Reopen the file in your
- text editor to see::
-
- /*[clinic input]
- _pickle.Pickler.dump
-
- obj: 'O'
- The object to be pickled.
- /
-
- Write a pickled representation of obj to the open file.
- [clinic start generated code]*/
-
- static PyObject *
- _pickle_Pickler_dump(PicklerObject *self, PyObject *obj)
- /*[clinic end generated code: output=87ecad1261e02ac7 input=552eb1c0f52260d9]*/
-
- Obviously, if Argument Clinic didn't produce any output, it's because
- it found an error in your input. Keep fixing your errors and retrying
- until Argument Clinic processes your file without complaint.
-
- For readability, most of the glue code has been generated to a ``.c.h``
- file. You'll need to include that in your original ``.c`` file,
- typically right after the clinic module block::
-
- #include "clinic/_pickle.c.h"
-
-13. Double-check that the argument-parsing code Argument Clinic generated
- looks basically the same as the existing code.
-
- First, ensure both places use the same argument-parsing function.
- The existing code must call either
- :c:func:`PyArg_ParseTuple` or :c:func:`PyArg_ParseTupleAndKeywords`;
- ensure that the code generated by Argument Clinic calls the
- *exact* same function.
-
- Second, the format string passed in to :c:func:`PyArg_ParseTuple` or
- :c:func:`PyArg_ParseTupleAndKeywords` should be *exactly* the same
- as the hand-written one in the existing function, up to the colon
- or semi-colon.
-
- (Argument Clinic always generates its format strings
- with a ``:`` followed by the name of the function. If the
- existing code's format string ends with ``;``, to provide
- usage help, this change is harmless—don't worry about it.)
-
- Third, for parameters whose format units require two arguments
- (like a length variable, or an encoding string, or a pointer
- to a conversion function), ensure that the second argument is
- *exactly* the same between the two invocations.
-
- Fourth, inside the output portion of the block you'll find a preprocessor
- macro defining the appropriate static :c:type:`PyMethodDef` structure for
- this builtin::
-
- #define __PICKLE_PICKLER_DUMP_METHODDEF \
- {"dump", (PyCFunction)__pickle_Pickler_dump, METH_O, __pickle_Pickler_dump__doc__},
-
- This static structure should be *exactly* the same as the existing static
- :c:type:`PyMethodDef` structure for this builtin.
-
- If any of these items differ in *any way*,
- adjust your Argument Clinic function specification and rerun
- ``Tools/clinic/clinic.py`` until they *are* the same.
-
-
-14. Notice that the last line of Argument Clinic's output is the declaration
- of your "impl" function. This is where the builtin's implementation goes.
- Delete the existing prototype of the function you're modifying, but leave
- the opening curly brace. Now delete its argument parsing code and the
- declarations of all the variables it dumps the arguments into.
- Notice how the Python arguments are now arguments to this impl function;
- if the implementation used different names for these variables, fix it.
-
- Let's reiterate, just because it's kind of weird. Your code should now
- look like this::
-
- static return_type
- your_function_impl(...)
- /*[clinic end generated code: checksum=...]*/
- {
- ...
-
- Argument Clinic generated the checksum line and the function prototype just
- above it. You should write the opening (and closing) curly braces for the
- function, and the implementation inside.
-
- Sample::
-
- /*[clinic input]
- module _pickle
- class _pickle.Pickler "PicklerObject *" "&Pickler_Type"
- [clinic start generated code]*/
- /*[clinic end generated code: checksum=da39a3ee5e6b4b0d3255bfef95601890afd80709]*/
-
- /*[clinic input]
- _pickle.Pickler.dump
-
- obj: 'O'
- The object to be pickled.
- /
-
- Write a pickled representation of obj to the open file.
- [clinic start generated code]*/
-
- PyDoc_STRVAR(__pickle_Pickler_dump__doc__,
- "Write a pickled representation of obj to the open file.\n"
- "\n"
- ...
- static PyObject *
- _pickle_Pickler_dump_impl(PicklerObject *self, PyObject *obj)
- /*[clinic end generated code: checksum=3bd30745bf206a48f8b576a1da3d90f55a0a4187]*/
- {
- /* Check whether the Pickler was initialized correctly (issue3664).
- Developers often forget to call __init__() in their subclasses, which
- would trigger a segfault without this check. */
- if (self->write == NULL) {
- PyErr_Format(PicklingError,
- "Pickler.__init__() was not called by %s.__init__()",
- Py_TYPE(self)->tp_name);
- return NULL;
- }
-
- if (_Pickler_ClearBuffer(self) < 0)
- return NULL;
-
- ...
-
-15. Remember the macro with the :c:type:`PyMethodDef` structure for this
- function? Find the existing :c:type:`PyMethodDef` structure for this
- function and replace it with a reference to the macro. (If the builtin
- is at module scope, this will probably be very near the end of the file;
- if the builtin is a class method, this will probably be below but relatively
- near to the implementation.)
-
- Note that the body of the macro contains a trailing comma. So when you
- replace the existing static :c:type:`PyMethodDef` structure with the macro,
- *don't* add a comma to the end.
-
- Sample::
-
- static struct PyMethodDef Pickler_methods[] = {
- __PICKLE_PICKLER_DUMP_METHODDEF
- __PICKLE_PICKLER_CLEAR_MEMO_METHODDEF
- {NULL, NULL} /* sentinel */
- };
-
-
-16. Compile, then run the relevant portions of the regression-test suite.
- This change should not introduce any new compile-time warnings or errors,
- and there should be no externally-visible change to Python's behavior.
-
- Well, except for one difference: ``inspect.signature()`` run on your function
- should now provide a valid signature!
-
- Congratulations, you've ported your first function to work with Argument Clinic!
-
-Advanced Topics
-===============
-
-Now that you've had some experience working with Argument Clinic, it's time
-for some advanced topics.
-
-
-Symbolic default values
------------------------
-
-The default value you provide for a parameter can't be any arbitrary
-expression. Currently the following are explicitly supported:
-
-* Numeric constants (integer and float)
-* String constants
-* ``True``, ``False``, and ``None``
-* Simple symbolic constants like ``sys.maxsize``, which must
- start with the name of the module
-
-In case you're curious, this is implemented in ``from_builtin()``
-in ``Lib/inspect.py``.
-
-(In the future, this may need to get even more elaborate,
-to allow full expressions like ``CONSTANT - 1``.)
-
-
-Renaming the C functions and variables generated by Argument Clinic
--------------------------------------------------------------------
-
-Argument Clinic automatically names the functions it generates for you.
-Occasionally this may cause a problem, if the generated name collides with
-the name of an existing C function. There's an easy solution: override the names
-used for the C functions. Just add the keyword ``"as"``
-to your function declaration line, followed by the function name you wish to use.
-Argument Clinic will use that function name for the base (generated) function,
-then add ``"_impl"`` to the end and use that for the name of the impl function.
-
-For example, if we wanted to rename the C function names generated for
-``pickle.Pickler.dump``, it'd look like this::
-
- /*[clinic input]
- pickle.Pickler.dump as pickler_dumper
-
- ...
-
-The base function would now be named ``pickler_dumper()``,
-and the impl function would now be named ``pickler_dumper_impl()``.
-
-
-Similarly, you may have a problem where you want to give a parameter
-a specific Python name, but that name may be inconvenient in C. Argument
-Clinic allows you to give a parameter different names in Python and in C,
-using the same ``"as"`` syntax::
-
- /*[clinic input]
- pickle.Pickler.dump
-
- obj: object
- file as file_obj: object
- protocol: object = NULL
- *
- fix_imports: bool = True
-
-Here, the name used in Python (in the signature and the ``keywords``
-array) would be ``file``, but the C variable would be named ``file_obj``.
-
-You can use this to rename the ``self`` parameter too!
-
-
-Converting functions using PyArg_UnpackTuple
---------------------------------------------
-
-To convert a function parsing its arguments with :c:func:`PyArg_UnpackTuple`,
-simply write out all the arguments, specifying each as an ``object``. You
-may specify the ``type`` argument to cast the type as appropriate. All
-arguments should be marked positional-only (add a ``/`` on a line by itself
-after the last argument).
-
-Currently the generated code will use :c:func:`PyArg_ParseTuple`, but this
-will change soon.
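-
-For example, a hypothetical builtin ``mymodule.copyto(src, dst)`` that
-currently parses its arguments with :c:func:`PyArg_UnpackTuple` could be
-declared like this (the module and parameter names here are invented for
-illustration)::
-
-    /*[clinic input]
-    mymodule.copyto
-
-    src: object
-    dst: object
-    /
-
-    Copy the contents of src into dst.
-    [clinic start generated code]*/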
-
-Optional Groups
----------------
-
-Some legacy functions have a tricky approach to parsing their arguments:
-they count the number of positional arguments, then use a ``switch`` statement
-to call one of several different :c:func:`PyArg_ParseTuple` calls depending on
-how many positional arguments there are. (These functions cannot accept
-keyword-only arguments.) This approach was used to simulate optional
-arguments back before :c:func:`PyArg_ParseTupleAndKeywords` was created.
-
-While functions using this approach can often be converted to
-use :c:func:`PyArg_ParseTupleAndKeywords`, optional arguments, and default values,
-it's not always possible. Some of these legacy functions have
-behaviors :c:func:`PyArg_ParseTupleAndKeywords` doesn't directly support.
-The most obvious example is the builtin function ``range()``, which has
-an optional argument on the *left* side of its required argument!
-Another example is ``curses.window.addch()``, which has a group of two
-arguments that must always be specified together. (The arguments are
-called ``x`` and ``y``; if you call the function passing in ``x``,
-you must also pass in ``y``—and if you don't pass in ``x`` you may not
-pass in ``y`` either.)
-
-In any case, the goal of Argument Clinic is to support argument parsing
-for all existing CPython builtins without changing their semantics.
-Therefore Argument Clinic supports
-this alternate approach to parsing, using what are called *optional groups*.
-Optional groups are groups of arguments that must all be passed in together.
-They can be to the left or the right of the required arguments. They
-can *only* be used with positional-only parameters.
-
-.. note:: Optional groups are *only* intended for use when converting
- functions that make multiple calls to :c:func:`PyArg_ParseTuple`!
- Functions that use *any* other approach for parsing arguments
- should *almost never* be converted to Argument Clinic using
- optional groups. Functions using optional groups currently
- cannot have accurate signatures in Python, because Python just
- doesn't understand the concept. Please avoid using optional
- groups wherever possible.
-
-To specify an optional group, add a ``[`` on a line by itself before
-the parameters you wish to group together, and a ``]`` on a line by itself
-after these parameters. As an example, here's how ``curses.window.addch``
-uses optional groups to make the first two parameters and the last
-parameter optional::
-
- /*[clinic input]
-
- curses.window.addch
-
- [
- x: int
- X-coordinate.
- y: int
- Y-coordinate.
- ]
-
- ch: object
- Character to add.
-
- [
- attr: long
- Attributes for the character.
- ]
- /
-
- ...
-
-
-Notes:
-
-* For every optional group, one additional parameter will be passed into the
- impl function representing the group. The parameter will be an int named
- ``group_{direction}_{number}``,
- where ``{direction}`` is either ``right`` or ``left`` depending on whether the group
- is before or after the required parameters, and ``{number}`` is a monotonically
- increasing number (starting at 1) indicating how far away the group is from
- the required parameters. When the impl is called, this parameter will be set
- to zero if this group was unused, and set to non-zero if this group was used.
- (By used or unused, I mean whether or not the parameters received arguments
- in this invocation.)
-
-* If there are no required arguments, the optional groups will behave
- as if they're to the right of the required arguments.
-
-* In the case of ambiguity, the argument parsing code
- favors parameters on the left (before the required parameters).
-
-* Optional groups can only contain positional-only parameters.
-
-* Optional groups are *only* intended for legacy code. Please do not
- use optional groups for new code.
-
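-For the ``curses.window.addch`` declaration above, the impl function would
-receive one extra ``int`` flag per optional group, in a prototype roughly
-like this (the exact ``self`` type comes from the class declaration; this is
-an illustrative sketch, not the exact generated code)::
-
-    static PyObject *
-    curses_window_addch_impl(PyCursesWindowObject *self, int group_left_1,
-                             int x, int y, PyObject *ch,
-                             int group_right_1, long attr)
-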
-
-Using real Argument Clinic converters, instead of "legacy converters"
----------------------------------------------------------------------
-
-To save time, and to minimize how much you need to learn
-to achieve your first port to Argument Clinic, the walkthrough above tells
-you to use "legacy converters". "Legacy converters" are a convenience,
-designed explicitly to make porting existing code to Argument Clinic
-easier. And to be clear, their use is acceptable when porting code for
-Python 3.4.
-
-However, in the long term we probably want all our blocks to
-use Argument Clinic's real syntax for converters. Why? A couple
-reasons:
-
-* The proper converters are far easier to read and clearer in their intent.
-* There are some format units that are unsupported as "legacy converters",
- because they require arguments, and the legacy converter syntax doesn't
- support specifying arguments.
-* In the future we may have a new argument parsing library that isn't
- restricted to what :c:func:`PyArg_ParseTuple` supports; this flexibility
- won't be available to parameters using legacy converters.
-
-Therefore, if you don't mind a little extra effort, please use the normal
-converters instead of legacy converters.
-
-In a nutshell, the syntax for Argument Clinic (non-legacy) converters
-looks like a Python function call. However, if there are no explicit
-arguments to the function (all functions take their default values),
-you may omit the parentheses. Thus ``bool`` and ``bool()`` are exactly
-the same converters.
-
-All arguments to Argument Clinic converters are keyword-only.
-All Argument Clinic converters accept the following arguments:
-
- ``c_default``
- The default value for this parameter when defined in C.
- Specifically, this will be the initializer for the variable declared
- in the "parse function". See :ref:`the section on default values <default_values>`
- for how to use this.
- Specified as a string.
-
- ``annotation``
- The annotation value for this parameter. Not currently supported,
- because :pep:`8` mandates that the Python library may not use
- annotations.
-
-In addition, some converters accept additional arguments. Here is a list
-of these arguments, along with their meanings:
-
- ``accept``
- A set of Python types (and possibly pseudo-types);
- this restricts the allowable Python argument to values of these types.
- (This is not a general-purpose facility; as a rule it only supports
- specific lists of types as shown in the legacy converter table.)
-
- To accept ``None``, add ``NoneType`` to this set.
-
- ``bitwise``
- Only supported for unsigned integers. The native integer value of this
- Python argument will be written to the parameter without any range checking,
- even for negative values.
-
- ``converter``
- Only supported by the ``object`` converter. Specifies the name of a
- :ref:`C "converter function" <o_ampersand>`
- to use to convert this object to a native type.
-
- ``encoding``
- Only supported for strings. Specifies the encoding to use when converting
- this string from a Python str (Unicode) value into a C ``char *`` value.
-
-
- ``subclass_of``
- Only supported for the ``object`` converter. Requires that the Python
- value be a subclass of a Python type, as expressed in C.
-
- ``type``
- Only supported for the ``object`` and ``self`` converters. Specifies
- the C type that will be used to declare the variable. Default value is
- ``"PyObject *"``.
-
- ``zeroes``
- Only supported for strings. If true, embedded NUL bytes (``'\\0'``) are
- permitted inside the value. The length of the string will be passed in
- to the impl function, just after the string parameter, as a parameter named
- ``<parameter_name>_length``.
-
-Please note, not every possible combination of arguments will work.
-Usually these arguments are implemented by specific ``PyArg_ParseTuple``
-*format units*, with specific behavior. For example, currently you cannot
-call ``unsigned_short`` without also specifying ``bitwise=True``.
-Although it's perfectly reasonable to think this would work, these semantics don't
-map to any existing format unit. So Argument Clinic doesn't support it. (Or, at
-least, not yet.)
-
-Below is a table showing the mapping of legacy converters into real
-Argument Clinic converters. On the left is the legacy converter,
-on the right is the text you'd replace it with.
-
-========= =================================================================================
-``'B'`` ``unsigned_char(bitwise=True)``
-``'b'`` ``unsigned_char``
-``'c'`` ``char``
-``'C'`` ``int(accept={str})``
-``'d'`` ``double``
-``'D'`` ``Py_complex``
-``'es'`` ``str(encoding='name_of_encoding')``
-``'es#'`` ``str(encoding='name_of_encoding', zeroes=True)``
-``'et'`` ``str(encoding='name_of_encoding', accept={bytes, bytearray, str})``
-``'et#'`` ``str(encoding='name_of_encoding', accept={bytes, bytearray, str}, zeroes=True)``
-``'f'`` ``float``
-``'h'`` ``short``
-``'H'`` ``unsigned_short(bitwise=True)``
-``'i'`` ``int``
-``'I'`` ``unsigned_int(bitwise=True)``
-``'k'`` ``unsigned_long(bitwise=True)``
-``'K'`` ``unsigned_long_long(bitwise=True)``
-``'l'`` ``long``
-``'L'`` ``long long``
-``'n'`` ``Py_ssize_t``
-``'O'`` ``object``
-``'O!'`` ``object(subclass_of='&PySomething_Type')``
-``'O&'`` ``object(converter='name_of_c_function')``
-``'p'`` ``bool``
-``'S'`` ``PyBytesObject``
-``'s'`` ``str``
-``'s#'`` ``str(zeroes=True)``
-``'s*'`` ``Py_buffer(accept={buffer, str})``
-``'U'`` ``unicode``
-``'u'`` ``Py_UNICODE``
-``'u#'`` ``Py_UNICODE(zeroes=True)``
-``'w*'`` ``Py_buffer(accept={rwbuffer})``
-``'Y'`` ``PyByteArrayObject``
-``'y'`` ``str(accept={bytes})``
-``'y#'`` ``str(accept={robuffer}, zeroes=True)``
-``'y*'`` ``Py_buffer``
-``'Z'`` ``Py_UNICODE(accept={str, NoneType})``
-``'Z#'`` ``Py_UNICODE(accept={str, NoneType}, zeroes=True)``
-``'z'`` ``str(accept={str, NoneType})``
-``'z#'`` ``str(accept={str, NoneType}, zeroes=True)``
-``'z*'`` ``Py_buffer(accept={buffer, str, NoneType})``
-========= =================================================================================
-
-As an example, here's our sample ``pickle.Pickler.dump`` using the proper
-converter::
-
- /*[clinic input]
- pickle.Pickler.dump
-
- obj: object
- The object to be pickled.
- /
-
- Write a pickled representation of obj to the open file.
- [clinic start generated code]*/
-
-One advantage of real converters is that they're more flexible than legacy
-converters. For example, the ``unsigned_int`` converter (and all the
-``unsigned_`` converters) can be specified without ``bitwise=True``. Their
-default behavior performs range checking on the value, and they won't accept
-negative numbers. You just can't do that with a legacy converter!
-
-Argument Clinic will show you all the converters it has
-available. For each converter it'll show you all the parameters
-it accepts, along with the default value for each parameter.
-Just run ``Tools/clinic/clinic.py --converters`` to see the full list.
-
-Py_buffer
----------
-
-When using the ``Py_buffer`` converter
-(or the ``'s*'``, ``'w*'``, ``'y*'``, or ``'z*'`` legacy converters),
-you *must* not call :c:func:`PyBuffer_Release` on the provided buffer.
-Argument Clinic generates code that does it for you (in the parsing function).
-
-
-
-Advanced converters
--------------------
-
-Remember those format units you skipped for your first
-time because they were advanced? Here's how to handle those too.
-
-The trick is, all those format units take arguments—either
-conversion functions, or types, or strings specifying an encoding.
-(But "legacy converters" don't support arguments. That's why we
-skipped them for your first function.) The argument you specified
-to the format unit is now an argument to the converter; this
-argument is either ``converter`` (for ``O&``), ``subclass_of`` (for ``O!``),
-or ``encoding`` (for all the format units that start with ``e``).
-
-When using ``subclass_of``, you may also want to use the other
-custom argument for ``object()``: ``type``, which lets you set the type
-actually used for the parameter. For example, if you want to ensure
-that the object is a subclass of ``PyUnicode_Type``, you probably want
-to use the converter ``object(type='PyUnicodeObject *', subclass_of='&PyUnicode_Type')``.
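-
-Put together in a block, a parameter that used the ``O&`` format unit might
-look like this (``mymodule.chmod`` and its docstring are invented for
-illustration, though ``PyUnicode_FSConverter`` is a real C converter
-function)::
-
-    /*[clinic input]
-    mymodule.chmod
-
-    path: object(converter='PyUnicode_FSConverter')
-    /
-
-    Change the access permissions of path.
-    [clinic start generated code]*/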
-
-One possible problem with using Argument Clinic: it takes away some possible
-flexibility for the format units starting with ``e``. When writing a
-``PyArg_Parse`` call by hand, you could theoretically decide at runtime what
-encoding string to pass in to :c:func:`PyArg_ParseTuple`. But now this string must
-be hard-coded at Argument-Clinic-preprocessing-time. This limitation is deliberate;
-it made supporting this format unit much easier, and may allow for future optimizations.
-This restriction doesn't seem unreasonable; CPython itself always passes in static
-hard-coded encoding strings for parameters whose format units start with ``e``.
-
-
-.. _default_values:
-
-Parameter default values
-------------------------
-
-Default values for parameters can be any of a number of values.
-At their simplest, they can be string, int, or float literals:
-
-.. code-block:: none
-
- foo: str = "abc"
- bar: int = 123
- bat: float = 45.6
-
-They can also use any of Python's built-in constants:
-
-.. code-block:: none
-
- yep: bool = True
- nope: bool = False
- nada: object = None
-
-There's also special support for a default value of ``NULL``, and
-for simple expressions, documented in the following sections.
-
-
-The ``NULL`` default value
---------------------------
-
-For string and object parameters, you can set them to ``None`` to indicate
-that there's no default. However, that means the C variable will be
-initialized to ``Py_None``. For convenience's sake, there's a special
-value called ``NULL`` for just this reason: from Python's perspective it
-behaves like a default value of ``None``, but the C variable is initialized
-with ``NULL``.
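-
-For example, this parameter declaration (the name is arbitrary) presents a
-default of ``None`` to Python, while initializing its C variable to
-``NULL``::
-
-    callback: object = NULL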
-
-Expressions specified as default values
----------------------------------------
-
-The default value for a parameter can be more than just a literal value.
-It can be an entire expression, using math operators and looking up attributes
-on objects. However, this support isn't exactly simple, because of some
-non-obvious semantics.
-
-Consider the following example:
-
-.. code-block:: none
-
- foo: Py_ssize_t = sys.maxsize - 1
-
-``sys.maxsize`` can have different values on different platforms. Therefore
-Argument Clinic can't simply evaluate that expression locally and hard-code it
-in C. So it stores the default in such a way that it will get evaluated at
-runtime, when the user asks for the function's signature.
-
-What namespace is available when the expression is evaluated? It's evaluated
-in the context of the module the builtin came from. So, if your module has an
-attribute called "``max_widgets``", you may simply use it:
-
-.. code-block:: none
-
- foo: Py_ssize_t = max_widgets
-
-If the symbol isn't found in the current module, it fails over to looking in
-``sys.modules``. That's how it can find ``sys.maxsize`` for example. (Since you
-don't know in advance what modules the user will load into their interpreter,
-it's best to restrict yourself to modules that are preloaded by Python itself.)
-
-Evaluating default values only at runtime means Argument Clinic can't compute
-the correct equivalent C default value. So you need to tell it explicitly.
-When you use an expression, you must also specify the equivalent expression
-in C, using the ``c_default`` parameter to the converter:
-
-.. code-block:: none
-
- foo: Py_ssize_t(c_default="PY_SSIZE_T_MAX - 1") = sys.maxsize - 1
-
-Another complication: Argument Clinic can't know in advance whether or not the
-expression you supply is valid. It parses it to make sure it looks legal, but
-it can't *actually* know. You must be very careful when using expressions to
-specify values that are guaranteed to be valid at runtime!
-
-Finally, because expressions must be representable as static C values, there
-are many restrictions on legal expressions. Here's a list of Python features
-you're not permitted to use:
-
-* Function calls.
-* Inline conditional expressions (``3 if foo else 5``).
-* Automatic sequence unpacking (``*[1, 2, 3]``).
-* List/set/dict comprehensions and generator expressions.
-* Tuple/list/set/dict literals.
-
-
-
-Using a return converter
-------------------------
-
-By default the impl function Argument Clinic generates for you returns ``PyObject *``.
-But your C function often computes some C type, then converts it into the ``PyObject *``
-at the last moment. Argument Clinic handles converting your inputs from Python types
-into native C types—why not have it convert your return value from a native C type
-into a Python type too?
-
-That's what a "return converter" does. It changes your impl function to return
-some C type, then adds code to the generated (non-impl) function to handle converting
-that value into the appropriate ``PyObject *``.
-
-The syntax for return converters is similar to that of parameter converters.
-You specify the return converter like it was a return annotation on the
-function itself. Return converters behave much the same as parameter converters;
-they take arguments, the arguments are all keyword-only, and if you're not changing
-any of the default arguments you can omit the parentheses.
-
-(If you use both ``"as"`` *and* a return converter for your function,
-the ``"as"`` should come before the return converter.)
-
-There's one additional complication when using return converters: how do you
-indicate an error has occurred? Normally, a function returns a valid (non-``NULL``)
-pointer for success, and ``NULL`` for failure. But if you use an integer return converter,
-all integers are valid. How can Argument Clinic detect an error? Its solution: each return
-converter implicitly looks for a special value that indicates an error. If you return
-that value, and an error has been set (``PyErr_Occurred()`` returns a true
-value), then the generated code will propagate the error. Otherwise it will
-encode the value you return like normal.
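-
-For example, a hypothetical builtin computing a C ``Py_ssize_t`` could be
-declared with a return converter like this (the function name is invented
-for illustration)::
-
-    /*[clinic input]
-    mymodule.count_items -> Py_ssize_t
-
-    container: object
-    /
-
-    Return the number of items in container.
-    [clinic start generated code]*/
-
-The impl function would then return ``Py_ssize_t`` rather than
-``PyObject *``, using the converter's special value (with an exception set)
-to report errors.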
-
-Currently Argument Clinic supports only a few return converters:
-
-.. code-block:: none
-
- bool
- int
- unsigned int
- long
-    unsigned long
- size_t
- Py_ssize_t
- float
- double
- DecodeFSDefault
-
-None of these take parameters. For the first three, return -1 to indicate
-error. For ``DecodeFSDefault``, the return type is ``const char *``; return a ``NULL``
-pointer to indicate an error.
-
-(There's also an experimental ``NoneType`` converter, which lets you
-return ``Py_None`` on success or ``NULL`` on failure, without having
-to increment the reference count on ``Py_None``. I'm not sure it adds
-enough clarity to be worth using.)
-
-To see all the return converters Argument Clinic supports, along with
-their parameters (if any),
-just run ``Tools/clinic/clinic.py --converters`` for the full list.
-
-
-Cloning existing functions
---------------------------
-
-If you have a number of functions that look similar, you may be able to
-use Clinic's "clone" feature. When you clone an existing function,
-you reuse:
-
-* its parameters, including
-
- * their names,
-
- * their converters, with all parameters,
-
- * their default values,
-
- * their per-parameter docstrings,
-
- * their *kind* (whether they're positional only,
- positional or keyword, or keyword only), and
-
-* its return converter.
-
-The only thing not copied from the original function is its docstring;
-the syntax allows you to specify a new docstring.
-
-Here's the syntax for cloning a function::
-
- /*[clinic input]
- module.class.new_function [as c_basename] = module.class.existing_function
-
- Docstring for new_function goes here.
- [clinic start generated code]*/
-
-(The functions can be in different modules or classes. I wrote
-``module.class`` in the sample just to illustrate that you must
-use the full path to *both* functions.)
-
-Sorry, there's no syntax for partially-cloning a function, or cloning a function
-then modifying it. Cloning is an all-or-nothing proposition.
-
-Also, the function you are cloning from must have been previously defined
-in the current file.
-
-Calling Python code
--------------------
-
-The rest of the advanced topics require you to write Python code
-which lives inside your C file and modifies Argument Clinic's
-runtime state. This is simple: you define a Python block.
-
-A Python block uses different delimiter lines than an Argument
-Clinic function block. It looks like this::
-
- /*[python input]
- # python code goes here
- [python start generated code]*/
-
-All the code inside the Python block is executed at the
-time it's parsed. All text written to stdout inside the block
-is redirected into the "output" after the block.
-
-As an example, here's a Python block that adds a static integer
-variable to the C code::
-
- /*[python input]
- print('static int __ignored_unused_variable__ = 0;')
- [python start generated code]*/
- static int __ignored_unused_variable__ = 0;
- /*[python checksum:...]*/
-
-
-Using a "self converter"
-------------------------
-
-Argument Clinic automatically adds a "self" parameter for you
-using a default converter. It automatically sets the ``type``
-of this parameter to the "pointer to an instance" you specified
-when you declared the type. However, you can override
-Argument Clinic's converter and specify one yourself.
-Just add your own ``self`` parameter as the first parameter in a
-block, and ensure that its converter is an instance of
-``self_converter`` or a subclass thereof.
-
-What's the point? This lets you override the type of ``self``,
-or give it a different default name.
-
-How do you specify the custom type you want to cast ``self`` to?
-If you only have one or two functions with the same type for ``self``,
-you can directly use Argument Clinic's existing ``self`` converter,
-passing in the type you want to use as the ``type`` parameter::
-
- /*[clinic input]
-
- _pickle.Pickler.dump
-
- self: self(type="PicklerObject *")
- obj: object
- /
-
- Write a pickled representation of the given object to the open file.
- [clinic start generated code]*/
-
-On the other hand, if you have a lot of functions that will use the same
-type for ``self``, it's best to create your own converter, subclassing
-``self_converter`` but overriding the ``type`` member::
-
- /*[python input]
- class PicklerObject_converter(self_converter):
- type = "PicklerObject *"
- [python start generated code]*/
-
- /*[clinic input]
-
- _pickle.Pickler.dump
-
- self: PicklerObject
- obj: object
- /
-
- Write a pickled representation of the given object to the open file.
- [clinic start generated code]*/
-
-
-
-Writing a custom converter
---------------------------
-
-As we hinted at in the previous section... you can write your own converters!
-A converter is simply a Python class that inherits from ``CConverter``.
-The main purpose of a custom converter is if you have a parameter using
-the ``O&`` format unit—parsing this parameter means calling
-a :c:func:`PyArg_ParseTuple` "converter function".
-
-Your converter class should be named ``*something*_converter``.
-If the name follows this convention, then your converter class
-will be automatically registered with Argument Clinic; its name
-will be the name of your class with the ``_converter`` suffix
-stripped off. (This is accomplished with a metaclass.)
-
-You shouldn't override ``CConverter.__init__``. Instead, you should
-write a ``converter_init()`` function. ``converter_init()``
-always accepts a ``self`` parameter; after that, all additional
-parameters *must* be keyword-only. Any arguments passed in to
-the converter in Argument Clinic will be passed along to your
-``converter_init()``.
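-
-A minimal sketch of such a converter might look like this (the converter
-name and its ``bitwise`` parameter are invented for illustration)::
-
-    /*[python input]
-    class hypothetical_uint_converter(CConverter):
-        type = 'unsigned int'
-        converter = 'uint_converter'
-
-        def converter_init(self, *, bitwise=False):
-            # parameters after self must be keyword-only
-            self.bitwise = bitwise
-    [python start generated code]*/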
-
-There are some additional members of ``CConverter`` you may wish
-to specify in your subclass. Here's the current list:
-
-``type``
- The C type to use for this variable.
- ``type`` should be a Python string specifying the type, e.g. ``int``.
- If this is a pointer type, the type string should end with ``' *'``.
-
-``default``
- The Python default value for this parameter, as a Python value.
- Or the magic value ``unspecified`` if there is no default.
-
-``py_default``
- ``default`` as it should appear in Python code,
- as a string.
- Or ``None`` if there is no default.
-
-``c_default``
- ``default`` as it should appear in C code,
- as a string.
- Or ``None`` if there is no default.
-
-``c_ignored_default``
- The default value used to initialize the C variable when
- there is no default, but not specifying a default may
- result in an "uninitialized variable" warning. This can
-    easily happen when using optional groups—although
- properly-written code will never actually use this value,
- the variable does get passed in to the impl, and the
- C compiler will complain about the "use" of the
- uninitialized value. This value should always be a
- non-empty string.
-
-``converter``
- The name of the C converter function, as a string.
-
-``impl_by_reference``
- A boolean value. If true,
- Argument Clinic will add a ``&`` in front of the name of
- the variable when passing it into the impl function.
-
-``parse_by_reference``
- A boolean value. If true,
- Argument Clinic will add a ``&`` in front of the name of
- the variable when passing it into :c:func:`PyArg_ParseTuple`.
-
-
-Here's the simplest example of a custom converter, from ``Modules/zlibmodule.c``::
-
- /*[python input]
-
- class ssize_t_converter(CConverter):
- type = 'Py_ssize_t'
- converter = 'ssize_t_converter'
-
- [python start generated code]*/
- /*[python end generated code: output=da39a3ee5e6b4b0d input=35521e4e733823c7]*/
-
-This block adds a converter to Argument Clinic named ``ssize_t``. Parameters
-declared as ``ssize_t`` will be declared as type ``Py_ssize_t``, and will
-be parsed by the ``'O&'`` format unit, which will call the
-``ssize_t_converter`` converter function. ``ssize_t`` variables
-automatically support default values.
-
-More sophisticated custom converters can insert custom C code to
-handle initialization and cleanup.
-You can see more examples of custom converters in the CPython
-source tree; grep the C files for the string ``CConverter``.
-
-Writing a custom return converter
----------------------------------
-
-Writing a custom return converter is much like writing
-a custom converter, except that it's somewhat simpler, because return
-converters are themselves much simpler.
-
-Return converters must subclass ``CReturnConverter``.
-There are no examples yet of custom return converters,
-because they are not widely used yet. If you wish to
-write your own return converter, please read ``Tools/clinic/clinic.py``,
-specifically the implementation of ``CReturnConverter`` and
-all its subclasses.
-
-METH_O and METH_NOARGS
-----------------------------------------------
-
-To convert a function using ``METH_O``, make sure the function's
-single argument is using the ``object`` converter, and mark the
-arguments as positional-only::
-
- /*[clinic input]
- meth_o_sample
-
- argument: object
- /
- [clinic start generated code]*/
-
-
-To convert a function using ``METH_NOARGS``, just don't specify
-any arguments.
-
-You can still use a self converter, a return converter, and specify
-a ``type`` argument to the object converter for ``METH_O``.
-
-tp_new and tp_init functions
-----------------------------------------------
-
-You can convert ``tp_new`` and ``tp_init`` functions. Just name
-them ``__new__`` or ``__init__`` as appropriate. Notes:
-
-* The function name generated for ``__new__`` doesn't end in ``__new__``
- like it would by default. It's just the name of the class, converted
- into a valid C identifier.
-
-* No ``PyMethodDef`` ``#define`` is generated for these functions.
-
-* ``__init__`` functions return ``int``, not ``PyObject *``.
-
-* Use the docstring as the class docstring.
-
-* Although ``__new__`` and ``__init__`` functions must always
- accept both the ``args`` and ``kwargs`` objects, when converting
- you may specify any signature for these functions that you like.
- (If your function doesn't support keywords, the parsing function
- generated will raise an exception if it receives any.)
-
-Changing and redirecting Clinic's output
-----------------------------------------
-
-It can be inconvenient to have Clinic's output interspersed with
-your conventional hand-edited C code. Luckily, Clinic is configurable:
-you can buffer up its output for printing later (or earlier!), or write
-its output to a separate file. You can also add a prefix or suffix to
-every line of Clinic's generated output.
-
-While changing Clinic's output in this manner can be a boon to readability,
-it may result in Clinic code using types before they are defined, or
-your code attempting to use Clinic-generated code before it is defined.
-These problems can be easily solved by rearranging the declarations in your file,
-or moving where Clinic's generated code goes. (This is why the default behavior
-of Clinic is to output everything into the current block; while many people
-consider that this hampers readability, it will never require rearranging your
-code to fix definition-before-use problems.)
-
-Let's start with defining some terminology:
-
-*field*
- A field, in this context, is a subsection of Clinic's output.
- For example, the ``#define`` for the ``PyMethodDef`` structure
- is a field, called ``methoddef_define``. Clinic has seven
- different fields it can output per function definition:
-
- .. code-block:: none
-
- docstring_prototype
- docstring_definition
- methoddef_define
- impl_prototype
- parser_prototype
- parser_definition
- impl_definition
-
- All the names are of the form ``"<a>_<b>"``,
- where ``"<a>"`` is the semantic object represented (the parsing function,
- the impl function, the docstring, or the methoddef structure) and ``"<b>"``
- represents what kind of statement the field is. Field names that end in
- ``"_prototype"``
- represent forward declarations of that thing, without the actual body/data
- of the thing; field names that end in ``"_definition"`` represent the actual
- definition of the thing, with the body/data of the thing. (``"methoddef"``
- is special, it's the only one that ends with ``"_define"``, representing that
- it's a preprocessor #define.)
-
-*destination*
- A destination is a place Clinic can write output to. There are
- five built-in destinations:
-
- ``block``
- The default destination: printed in the output section of
- the current Clinic block.
-
- ``buffer``
- A text buffer where you can save text for later. Text sent
- here is appended to the end of any existing text. It's an
- error to have any text left in the buffer when Clinic finishes
- processing a file.
-
- ``file``
- A separate "clinic file" that will be created automatically by Clinic.
- The filename chosen for the file is ``{basename}.clinic{extension}``,
- where ``basename`` and ``extension`` were assigned the output
- from ``os.path.splitext()`` run on the current file. (Example:
- the ``file`` destination for ``_pickle.c`` would be written to
- ``_pickle.clinic.c``.)
-
- **Important: When using a** ``file`` **destination, you**
- *must check in* **the generated file!**
-
- ``two-pass``
- A buffer like ``buffer``. However, a two-pass buffer can only
- be dumped once, and it prints out all text sent to it during
- all processing, even from Clinic blocks *after* the dumping point.
-
- ``suppress``
- The text is suppressed—thrown away.
-
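The ``file`` destination's naming rule above can be sketched in plain Python, assuming only the ``os.path.splitext`` behavior described:

```python
import os.path

def clinic_filename(path):
    # The "file" destination writes to {basename}.clinic{extension},
    # where basename and extension come from os.path.splitext()
    # run on the current filename.
    basename, extension = os.path.splitext(path)
    return basename + '.clinic' + extension

clinic_filename('_pickle.c')  # '_pickle.clinic.c'
```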
-
-Clinic defines five new directives that let you reconfigure its output.
-
-The first new directive is ``dump``:
-
-.. code-block:: none
-
- dump <destination>
-
-This dumps the current contents of the named destination into the output of
-the current block, and empties it. This only works with ``buffer`` and
-``two-pass`` destinations.
-
-The second new directive is ``output``. The most basic form of ``output``
-is like this:
-
-.. code-block:: none
-
- output <field> <destination>
-
-This tells Clinic to output *field* to *destination*. ``output`` also
-supports a special meta-destination, called ``everything``, which tells
-Clinic to output *all* fields to that *destination*.
-
-``output`` has a number of other functions:
-
-.. code-block:: none
-
- output push
- output pop
- output preset <preset>
-
-
-``output push`` and ``output pop`` allow you to push and pop
-configurations on an internal configuration stack, so that you
-can temporarily modify the output configuration, then easily restore
-the previous configuration. Simply push before your change to save
-the current configuration, then pop when you wish to restore the
-previous configuration.
-
-``output preset`` sets Clinic's output to one of several built-in
-preset configurations, as follows:
-
- ``block``
- Clinic's original starting configuration. Writes everything
- immediately after the input block.
-
- Suppress the ``parser_prototype``
- and ``docstring_prototype``, write everything else to ``block``.
-
- ``file``
- Designed to write everything to the "clinic file" that it can.
- You then ``#include`` this file near the top of your file.
- You may need to rearrange your file to make this work, though
- usually this just means creating forward declarations for various
- ``typedef`` and ``PyTypeObject`` definitions.
-
- Suppress the ``parser_prototype``
- and ``docstring_prototype``, write the ``impl_definition`` to
- ``block``, and write everything else to ``file``.
-
- The default filename is ``"{dirname}/clinic/{basename}.h"``.
-
- ``buffer``
- Save up most of the output from Clinic, to be written into
- your file near the end. For Python files implementing modules
- or builtin types, it's recommended that you dump the buffer
- just above the static structures for your module or
- builtin type; these are normally very near the end. Using
- ``buffer`` may require even more editing than ``file``, if
- your file has static ``PyMethodDef`` arrays defined in the
- middle of the file.
-
- Suppress the ``parser_prototype``, ``impl_prototype``,
- and ``docstring_prototype``, write the ``impl_definition`` to
- ``block``, and write everything else to ``buffer``.
-
- ``two-pass``
- Similar to the ``buffer`` preset, but writes forward declarations to
- the ``two-pass`` buffer, and definitions to the ``buffer``; this
- may require less editing than ``buffer``. Dump the ``two-pass``
- buffer near the top of your file, and dump the ``buffer`` near
- the end, just like you would when using the ``buffer`` preset.
-
- Suppress the ``impl_prototype``, write the ``impl_definition``
- to ``block``, write ``docstring_prototype``, ``methoddef_define``,
- and ``parser_prototype`` to ``two-pass``, and write everything else
- to ``buffer``.
-
- ``partial-buffer``
- Similar to the ``buffer`` preset, but writes more things to ``block``,
- only writing the really big chunks of generated code to ``buffer``.
- This avoids the definition-before-use problem of ``buffer`` completely,
- at the small cost of having slightly more stuff in the block's output.
- Dump the ``buffer`` near the end, just like you would when using
- the ``buffer`` preset.
-
- Suppress the ``impl_prototype``, write the ``docstring_definition``
- and ``parser_definition`` to ``buffer``, and write everything else to ``block``.
-
-The third new directive is ``destination``:
-
-.. code-block:: none
-
- destination <name> <command> [...]
-
-This performs an operation on the destination named ``name``.
-
-There are two defined subcommands: ``new`` and ``clear``.
-
-The ``new`` subcommand works like this:
-
-.. code-block:: none
-
- destination <name> new <type>
-
-This creates a new destination with name ``<name>`` and type ``<type>``.
-
-There are five destination types:
-
- ``suppress``
- Throws the text away.
-
- ``block``
- Writes the text to the current block. This is what Clinic
- originally did.
-
- ``buffer``
- A simple text buffer, like the "buffer" builtin destination above.
-
- ``file``
- A text file. The file destination takes an extra argument,
- a template to use for building the filename, like so:
-
- destination <name> new <type> <file_template>
-
- The template can use the following strings internally; each will be
- replaced by bits of the filename:
-
- {path}
- The full path to the file, including directory and full filename.
- {dirname}
- The name of the directory the file is in.
- {basename}
- Just the name of the file, not including the directory.
- {basename_root}
- Basename with the extension clipped off
- (everything up to but not including the last '.').
- {basename_extension}
- The last '.' and everything after it. If the basename
- does not contain a period, this will be the empty string.
-
- If there are no periods in the filename, {basename} and
- {basename_root} are the same, and {basename_extension} is empty.
- "{basename_root}{basename_extension}" is always exactly the same as "{basename}".
-
- ``two-pass``
- A two-pass buffer, like the "two-pass" builtin destination above.
-
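The filename-template substitutions described above can be sketched with a hypothetical helper (not part of Clinic itself), using only the stated ``os.path`` semantics:

```python
import os.path

def template_fields(path):
    # Hypothetical helper computing the substitution values
    # described above for a given path.
    dirname, basename = os.path.split(path)
    basename_root, basename_extension = os.path.splitext(basename)
    return {
        '{path}': path,
        '{dirname}': dirname,
        '{basename}': basename,
        '{basename_root}': basename_root,
        '{basename_extension}': basename_extension,
    }

fields = template_fields('Modules/_pickle.c')
```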
-
-The ``clear`` subcommand works like this:
-
-.. code-block:: none
-
- destination <name> clear
-
-It removes all the accumulated text up to this point in the destination.
-(I don't know what you'd need this for, but I thought maybe it'd be
-useful while someone's experimenting.)
-
-The fourth new directive is ``set``:
-
-.. code-block:: none
-
- set line_prefix "string"
- set line_suffix "string"
-
-``set`` lets you set two internal variables in Clinic.
-``line_prefix`` is a string that will be prepended to every line of Clinic's output;
-``line_suffix`` is a string that will be appended to every line of Clinic's output.
-
-Both of these support two format strings:
-
- ``{block comment start}``
- Turns into the string ``/*``, the start-comment text sequence for C files.
-
- ``{block comment end}``
- Turns into the string ``*/``, the end-comment text sequence for C files.
-
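A rough stand-alone sketch of what ``line_prefix`` and ``line_suffix`` do, including the two format strings; this is an illustration, not Clinic's actual implementation:

```python
def apply_affixes(output, line_prefix='', line_suffix=''):
    # Expand the two supported format strings, then apply the
    # prefix and suffix to every line of the output.
    def expand(s):
        return (s.replace('{block comment start}', '/*')
                 .replace('{block comment end}', '*/'))
    prefix, suffix = expand(line_prefix), expand(line_suffix)
    return '\n'.join(prefix + line + suffix
                     for line in output.splitlines())

wrapped = apply_affixes('int x;\nint y;',
                        line_prefix='{block comment start} ',
                        line_suffix=' {block comment end}')
# wrapped == '/* int x; */\n/* int y; */'
```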
-The final new directive is one you shouldn't need to use directly,
-called ``preserve``:
-
-.. code-block:: none
-
- preserve
-
-This tells Clinic that the current contents of the output should be kept, unmodified.
-This is used internally by Clinic when dumping output into ``file`` files; wrapping
-it in a Clinic block lets Clinic use its existing checksum functionality to ensure
-the file was not modified by hand before it gets overwritten.
-
-
-The #ifdef trick
-----------------------------------------------
-
-If you're converting a function that isn't available on all platforms,
-there's a trick you can use to make life a little easier. The existing
-code probably looks like this::
-
- #ifdef HAVE_FUNCTIONNAME
- static PyObject *module_functionname(...)
- {
- ...
- }
- #endif /* HAVE_FUNCTIONNAME */
-
-And then in the ``PyMethodDef`` structure at the bottom the existing code
-will have:
-
-.. code-block:: none
-
- #ifdef HAVE_FUNCTIONNAME
- {'functionname', ... },
- #endif /* HAVE_FUNCTIONNAME */
-
-In this scenario, you should enclose the body of your impl function inside the ``#ifdef``,
-like so::
-
- #ifdef HAVE_FUNCTIONNAME
- /*[clinic input]
- module.functionname
- ...
- [clinic start generated code]*/
- static PyObject *module_functionname(...)
- {
- ...
- }
- #endif /* HAVE_FUNCTIONNAME */
-
-Then, remove those three lines from the ``PyMethodDef`` structure,
-replacing them with the macro Argument Clinic generated:
-
-.. code-block:: none
-
- MODULE_FUNCTIONNAME_METHODDEF
-
-(You can find the real name for this macro inside the generated code.
-Or you can calculate it yourself: it's the name of your function as defined
-on the first line of your block, but with periods changed to underscores,
-uppercased, and ``"_METHODDEF"`` added to the end.)
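The macro-naming rule just described can be sketched as a small helper (illustrative only):

```python
def methoddef_macro(clinic_name):
    # Periods become underscores, the name is uppercased,
    # and "_METHODDEF" is appended.
    return clinic_name.replace('.', '_').upper() + '_METHODDEF'

methoddef_macro('module.functionname')  # 'MODULE_FUNCTIONNAME_METHODDEF'
```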
-
-Perhaps you're wondering: what if ``HAVE_FUNCTIONNAME`` isn't defined?
-The ``MODULE_FUNCTIONNAME_METHODDEF`` macro won't be defined either!
-
-Here's where Argument Clinic gets very clever. It actually detects that the
-Argument Clinic block might be deactivated by the ``#ifdef``. When that
-happens, it generates a little extra code that looks like this::
-
- #ifndef MODULE_FUNCTIONNAME_METHODDEF
- #define MODULE_FUNCTIONNAME_METHODDEF
- #endif /* !defined(MODULE_FUNCTIONNAME_METHODDEF) */
-
-That means the macro always works. If the function is defined, this turns
-into the correct structure, including the trailing comma. If the function is
-undefined, this turns into nothing.
-
-However, this causes one ticklish problem: where should Argument Clinic put this
-extra code when using the "block" output preset? It can't go in the output block,
-because that could be deactivated by the ``#ifdef``. (That's the whole point!)
-
-In this situation, Argument Clinic writes the extra code to the "buffer" destination.
-This may mean that you get a complaint from Argument Clinic:
-
-.. code-block:: none
-
- Warning in file "Modules/posixmodule.c" on line 12357:
- Destination buffer 'buffer' not empty at end of file, emptying.
-
-When this happens, just open your file, find the ``dump buffer`` block that
-Argument Clinic added to your file (it'll be at the very bottom), then
-move it above the ``PyMethodDef`` structure where that macro is used.
-
-
-
-Using Argument Clinic in Python files
--------------------------------------
-
-It's actually possible to use Argument Clinic to preprocess Python files.
-There's no point to using Argument Clinic blocks, of course, as the output
-wouldn't make any sense to the Python interpreter. But using Argument Clinic
-to run Python blocks lets you use Python as a Python preprocessor!
-
-Since Python comments are different from C comments, Argument Clinic
-blocks embedded in Python files look slightly different. They look like this:
-
-.. code-block:: python3
-
- #/*[python input]
- #print("def foo(): pass")
- #[python start generated code]*/
- def foo(): pass
- #/*[python checksum:...]*/
diff --git a/Doc/howto/cporting.rst b/Doc/howto/cporting.rst
index ce7700f..7cacb0a 100644
--- a/Doc/howto/cporting.rst
+++ b/Doc/howto/cporting.rst
@@ -1,4 +1,4 @@
-.. highlight:: c
+.. highlightlang:: c
.. _cporting-howto:
@@ -6,21 +6,252 @@
Porting Extension Modules to Python 3
*************************************
-We recommend the following resources for porting extension modules to Python 3:
-
-* The `Migrating C extensions`_ chapter from
- *Supporting Python 3: An in-depth guide*, a book on moving from Python 2
- to Python 3 in general, guides the reader through porting an extension
- module.
-* The `Porting guide`_ from the *py3c* project provides opinionated
- suggestions with supporting code.
-* The `Cython`_ and `CFFI`_ libraries offer abstractions over
- Python's C API.
- Extensions generally need to be re-written to use one of them,
- but the library then handles differences between various Python
- versions and implementations.
-
-.. _Migrating C extensions: http://python3porting.com/cextensions.html
-.. _Porting guide: https://py3c.readthedocs.io/en/latest/guide.html
-.. _Cython: http://cython.org/
-.. _CFFI: https://cffi.readthedocs.io/en/latest/
+:author: Benjamin Peterson
+
+
+.. topic:: Abstract
+
+ Although changing the C-API was not one of Python 3's objectives,
+ the many Python-level changes made leaving Python 2's API intact
+ impossible. In fact, some changes such as :func:`int` and
+ :func:`long` unification are more obvious on the C level. This
+ document endeavors to document incompatibilities and how they can
+ be worked around.
+
+
+Conditional compilation
+=======================
+
+The easiest way to compile only some code for Python 3 is to check
+if :c:macro:`PY_MAJOR_VERSION` is greater than or equal to 3. ::
+
+ #if PY_MAJOR_VERSION >= 3
+ #define IS_PY3K
+ #endif
+
+API functions that are not present can be aliased to their equivalents within
+conditional blocks.
+
+
+Changes to Object APIs
+======================
+
+Python 3 merged together some types with similar functions while cleanly
+separating others.
+
+
+str/unicode Unification
+-----------------------
+
+Python 3's :func:`str` type is equivalent to Python 2's :func:`unicode`; the C
+functions are called ``PyUnicode_*`` for both. The old 8-bit string type has become
+:func:`bytes`, with C functions called ``PyBytes_*``. Python 2.6 and later provide a compatibility header,
+:file:`bytesobject.h`, mapping ``PyBytes`` names to ``PyString`` ones. For best
+compatibility with Python 3, :c:type:`PyUnicode` should be used for textual data and
+:c:type:`PyBytes` for binary data. It's also important to remember that
+:c:type:`PyBytes` and :c:type:`PyUnicode` in Python 3 are not interchangeable like
+:c:type:`PyString` and :c:type:`PyUnicode` are in Python 2. The following example
+shows best practices with regards to :c:type:`PyUnicode`, :c:type:`PyString`,
+and :c:type:`PyBytes`. ::
+
+ #include "stdlib.h"
+ #include "Python.h"
+ #include "bytesobject.h"
+
+ /* text example */
+ static PyObject *
+ say_hello(PyObject *self, PyObject *args) {
+ PyObject *name, *result;
+
+ if (!PyArg_ParseTuple(args, "U:say_hello", &name))
+ return NULL;
+
+ result = PyUnicode_FromFormat("Hello, %S!", name);
+ return result;
+ }
+
+ /* just a forward */
+ static char * do_encode(PyObject *);
+
+ /* bytes example */
+ static PyObject *
+ encode_object(PyObject *self, PyObject *args) {
+ char *encoded;
+ PyObject *result, *myobj;
+
+ if (!PyArg_ParseTuple(args, "O:encode_object", &myobj))
+ return NULL;
+
+ encoded = do_encode(myobj);
+ if (encoded == NULL)
+ return NULL;
+ result = PyBytes_FromString(encoded);
+ free(encoded);
+ return result;
+ }
+
+
+long/int Unification
+--------------------
+
+Python 3 has only one integer type, :func:`int`. But it actually
+corresponds to Python 2's :func:`long` type—the :func:`int` type
+used in Python 2 was removed. In the C-API, ``PyInt_*`` functions
+are replaced by their ``PyLong_*`` equivalents.
+
+
+Module initialization and state
+===============================
+
+Python 3 has a revamped extension module initialization system. (See
+:pep:`3121`.) Instead of storing module state in globals, it should
+be stored in an interpreter-specific structure. Creating modules that
+act correctly in both Python 2 and Python 3 is tricky. The following
+simple example demonstrates how. ::
+
+ #include "Python.h"
+
+ struct module_state {
+ PyObject *error;
+ };
+
+ #if PY_MAJOR_VERSION >= 3
+ #define GETSTATE(m) ((struct module_state*)PyModule_GetState(m))
+ #else
+ #define GETSTATE(m) (&_state)
+ static struct module_state _state;
+ #endif
+
+ static PyObject *
+ error_out(PyObject *m) {
+ struct module_state *st = GETSTATE(m);
+ PyErr_SetString(st->error, "something bad happened");
+ return NULL;
+ }
+
+ static PyMethodDef myextension_methods[] = {
+ {"error_out", (PyCFunction)error_out, METH_NOARGS, NULL},
+ {NULL, NULL}
+ };
+
+ #if PY_MAJOR_VERSION >= 3
+
+ static int myextension_traverse(PyObject *m, visitproc visit, void *arg) {
+ Py_VISIT(GETSTATE(m)->error);
+ return 0;
+ }
+
+ static int myextension_clear(PyObject *m) {
+ Py_CLEAR(GETSTATE(m)->error);
+ return 0;
+ }
+
+
+ static struct PyModuleDef moduledef = {
+ PyModuleDef_HEAD_INIT,
+ "myextension",
+ NULL,
+ sizeof(struct module_state),
+ myextension_methods,
+ NULL,
+ myextension_traverse,
+ myextension_clear,
+ NULL
+ };
+
+ #define INITERROR return NULL
+
+ PyMODINIT_FUNC
+ PyInit_myextension(void)
+
+ #else
+ #define INITERROR return
+
+ void
+ initmyextension(void)
+ #endif
+ {
+ #if PY_MAJOR_VERSION >= 3
+ PyObject *module = PyModule_Create(&moduledef);
+ #else
+ PyObject *module = Py_InitModule("myextension", myextension_methods);
+ #endif
+
+ if (module == NULL)
+ INITERROR;
+ struct module_state *st = GETSTATE(module);
+
+ st->error = PyErr_NewException("myextension.Error", NULL, NULL);
+ if (st->error == NULL) {
+ Py_DECREF(module);
+ INITERROR;
+ }
+
+ #if PY_MAJOR_VERSION >= 3
+ return module;
+ #endif
+ }
+
+
+CObject replaced with Capsule
+=============================
+
+The :c:type:`Capsule` object was introduced in Python 3.1 and 2.7 to replace
+:c:type:`CObject`. CObjects were useful,
+but the :c:type:`CObject` API was problematic: it didn't permit distinguishing
+between valid CObjects, which allowed mismatched CObjects to crash the
+interpreter, and some of its APIs relied on undefined behavior in C.
+(For further reading on the rationale behind Capsules, please see :issue:`5630`.)
+
+If you're currently using CObjects, and you want to migrate to 3.1 or newer,
+you'll need to switch to Capsules.
+:c:type:`CObject` was deprecated in 3.1 and 2.7 and completely removed in
+Python 3.2. If you only support 2.7, or 3.1 and above, you
+can simply switch to :c:type:`Capsule`. If you need to support Python 3.0,
+or versions of Python earlier than 2.7,
+you'll have to support both CObjects and Capsules.
+(Note that Python 3.0 is no longer supported, and it is not recommended
+for production use.)
+
+The following example header file :file:`capsulethunk.h` may
+solve the problem for you. Simply write your code against the
+:c:type:`Capsule` API and include this header file after
+:file:`Python.h`. Your code will automatically use Capsules
+in versions of Python with Capsules, and switch to CObjects
+when Capsules are unavailable.
+
+:file:`capsulethunk.h` simulates Capsules using CObjects. However,
+:c:type:`CObject` provides no place to store the capsule's "name". As a
+result the simulated :c:type:`Capsule` objects created by :file:`capsulethunk.h`
+behave slightly differently from real Capsules. Specifically:
+
+ * The name parameter passed in to :c:func:`PyCapsule_New` is ignored.
+
+ * The name parameter passed in to :c:func:`PyCapsule_IsValid` and
+ :c:func:`PyCapsule_GetPointer` is ignored, and no error checking
+ of the name is performed.
+
+ * :c:func:`PyCapsule_GetName` always returns NULL.
+
+ * :c:func:`PyCapsule_SetName` always raises an exception and
+ returns failure. (Since there's no way to store a name
+ in a CObject, noisy failure of :c:func:`PyCapsule_SetName`
+ was deemed preferable to silent failure here. If this is
+ inconvenient, feel free to modify your local
+ copy as you see fit.)
+
+You can find :file:`capsulethunk.h` in the Python source distribution
+as :source:`Doc/includes/capsulethunk.h`. We also include it here for
+your convenience:
+
+.. literalinclude:: ../includes/capsulethunk.h
+
+
+
+Other options
+=============
+
+If you are writing a new extension module, you might consider `Cython
+<http://cython.org/>`_. It translates a Python-like language to C. The
+extension modules it creates are compatible with Python 3 and Python 2.
+
diff --git a/Doc/howto/curses.rst b/Doc/howto/curses.rst
index cc4b478..47585f6 100644
--- a/Doc/howto/curses.rst
+++ b/Doc/howto/curses.rst
@@ -5,43 +5,39 @@
**********************************
:Author: A.M. Kuchling, Eric S. Raymond
-:Release: 2.04
+:Release: 2.03
.. topic:: Abstract
- This document describes how to use the :mod:`curses` extension
- module to control text-mode displays.
+ This document describes how to write text-mode programs with Python 2.x, using
+ the :mod:`curses` extension module to control the display.
What is curses?
===============
The curses library supplies a terminal-independent screen-painting and
-keyboard-handling facility for text-based terminals; such terminals
-include VT100s, the Linux console, and the simulated terminal provided
-by various programs. Display terminals support various control codes
-to perform common operations such as moving the cursor, scrolling the
-screen, and erasing areas. Different terminals use widely differing
-codes, and often have their own minor quirks.
-
-In a world of graphical displays, one might ask "why bother"? It's
-true that character-cell display terminals are an obsolete technology,
-but there are niches in which being able to do fancy things with them
-are still valuable. One niche is on small-footprint or embedded
-Unixes that don't run an X server. Another is tools such as OS
-installers and kernel configurators that may have to run before any
-graphical support is available.
-
-The curses library provides fairly basic functionality, providing the
-programmer with an abstraction of a display containing multiple
-non-overlapping windows of text. The contents of a window can be
-changed in various ways---adding text, erasing it, changing its
-appearance---and the curses library will figure out what control codes
-need to be sent to the terminal to produce the right output. curses
-doesn't provide many user-interface concepts such as buttons, checkboxes,
-or dialogs; if you need such features, consider a user interface library such as
-`Urwid <https://pypi.org/project/urwid/>`_.
+keyboard-handling facility for text-based terminals; such terminals include
+VT100s, the Linux console, and the simulated terminal provided by X11 programs
+such as xterm and rxvt. Display terminals support various control codes to
+perform common operations such as moving the cursor, scrolling the screen, and
+erasing areas. Different terminals use widely differing codes, and often have
+their own minor quirks.
+
+In a world of X displays, one might ask "why bother"? It's true that
+character-cell display terminals are an obsolete technology, but there are
+niches in which being able to do fancy things with them are still valuable. One
+is on small-footprint or embedded Unixes that don't carry an X server. Another
+is for tools like OS installers and kernel configurators that may have to run
+before X is available.
+
+The curses library hides all the details of different terminals, and provides
+the programmer with an abstraction of a display, containing multiple
+non-overlapping windows. The contents of a window can be changed in various
+ways---adding text, erasing it, changing its appearance---and the curses library
+will automagically figure out what control codes need to be sent to the terminal
+to produce the right output.
The curses library was originally written for BSD Unix; the later System V
versions of Unix from AT&T added many enhancements and new functions. BSD curses
@@ -53,27 +49,23 @@ code, all the functions described here will probably be available. The older
versions of curses carried by some proprietary Unixes may not support
everything, though.
-The Windows version of Python doesn't include the :mod:`curses`
-module. A ported version called `UniCurses
-<https://pypi.org/project/UniCurses>`_ is available. You could
-also try `the Console module <http://effbot.org/zone/console-index.htm>`_
-written by Fredrik Lundh, which doesn't
-use the same API as curses but provides cursor-addressable text output
-and full support for mouse and keyboard input.
+No one has made a Windows port of the curses module. On a Windows platform, try
+the Console module written by Fredrik Lundh. The Console module provides
+cursor-addressable text output, plus full support for mouse and keyboard input,
+and is available from http://effbot.org/zone/console-index.htm.
The Python curses module
------------------------
-The Python module is a fairly simple wrapper over the C functions provided by
+The Python module is a fairly simple wrapper over the C functions provided by
curses; if you're already familiar with curses programming in C, it's really
easy to transfer that knowledge to Python. The biggest difference is that the
-Python interface makes things simpler by merging different C functions such as
-:c:func:`addstr`, :c:func:`mvaddstr`, and :c:func:`mvwaddstr` into a single
-:meth:`~curses.window.addstr` method. You'll see this covered in more
-detail later.
+Python interface makes things simpler, by merging different C functions such as
+:func:`addstr`, :func:`mvaddstr`, :func:`mvwaddstr`, into a single
+:meth:`addstr` method. You'll see this covered in more detail later.
-This HOWTO is an introduction to writing text-mode programs with curses
+This HOWTO is simply an introduction to writing text-mode programs with curses
and Python. It doesn't attempt to be a complete guide to the curses API; for
that, see the Python library guide's section on ncurses, and the C manual pages
for ncurses. It will, however, give you the basic ideas.
@@ -82,27 +74,25 @@ for ncurses. It will, however, give you the basic ideas.
Starting and ending a curses application
========================================
-Before doing anything, curses must be initialized. This is done by
-calling the :func:`~curses.initscr` function, which will determine the
-terminal type, send any required setup codes to the terminal, and
-create various internal data structures. If successful,
-:func:`initscr` returns a window object representing the entire
-screen; this is usually called ``stdscr`` after the name of the
+Before doing anything, curses must be initialized. This is done by calling the
+:func:`initscr` function, which will determine the terminal type, send any
+required setup codes to the terminal, and create various internal data
+structures. If successful, :func:`initscr` returns a window object representing
+the entire screen; this is usually called ``stdscr``, after the name of the
corresponding C variable. ::
import curses
stdscr = curses.initscr()
-Usually curses applications turn off automatic echoing of keys to the
-screen, in order to be able to read keys and only display them under
-certain circumstances. This requires calling the
-:func:`~curses.noecho` function. ::
+Usually curses applications turn off automatic echoing of keys to the screen, in
+order to be able to read keys and only display them under certain circumstances.
+This requires calling the :func:`noecho` function. ::
curses.noecho()
-Applications will also commonly need to react to keys instantly,
-without requiring the Enter key to be pressed; this is called cbreak
-mode, as opposed to the usual buffered input mode. ::
+Applications will also commonly need to react to keys instantly, without
+requiring the Enter key to be pressed; this is called cbreak mode, as opposed to
+the usual buffered input mode. ::
curses.cbreak()
@@ -113,18 +103,15 @@ curses can do it for you, returning a special value such as
:const:`curses.KEY_LEFT`. To get curses to do the job, you'll have to enable
keypad mode. ::
- stdscr.keypad(True)
+ stdscr.keypad(1)
Terminating a curses application is much easier than starting one. You'll need
-to call::
+to call ::
- curses.nocbreak()
- stdscr.keypad(False)
- curses.echo()
+ curses.nocbreak(); stdscr.keypad(0); curses.echo()
-to reverse the curses-friendly terminal settings. Then call the
-:func:`~curses.endwin` function to restore the terminal to its original
-operating mode. ::
+to reverse the curses-friendly terminal settings. Then call the :func:`endwin`
+function to restore the terminal to its original operating mode. ::
curses.endwin()
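Taken together, the setup and teardown calls above form a standard pattern. A minimal sketch (the `try`/`finally` is an addition here so the terminal is restored even on error; run it from a real terminal, since `initscr()` fails without one):

```python
import curses

def run():
    """Minimal curses session: manual setup, guaranteed teardown."""
    stdscr = curses.initscr()   # determine terminal type, create stdscr
    curses.noecho()             # don't echo typed keys to the screen
    curses.cbreak()             # react to keys without waiting for Enter
    stdscr.keypad(1)            # translate special keys (curses.KEY_LEFT, ...)
    try:
        stdscr.addstr(0, 0, "Press any key to quit")
        stdscr.refresh()
        stdscr.getch()
    finally:
        # Reverse every setting before handing the terminal back.
        stdscr.keypad(0)
        curses.nocbreak()
        curses.echo()
        curses.endwin()
```

Call ``run()`` from an interactive terminal; it blocks until a key is pressed.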
@@ -135,147 +122,103 @@ raises an uncaught exception. Keys are no longer echoed to the screen when
you type them, for example, which makes using the shell difficult.
In Python you can avoid these complications and make debugging much easier by
-importing the :func:`curses.wrapper` function and using it like this::
-
- from curses import wrapper
-
- def main(stdscr):
- # Clear screen
- stdscr.clear()
-
- # This raises ZeroDivisionError when i == 10.
- for i in range(0, 11):
- v = i-10
- stdscr.addstr(i, 0, '10 divided by {} is {}'.format(v, 10/v))
-
- stdscr.refresh()
- stdscr.getkey()
-
- wrapper(main)
-
-The :func:`~curses.wrapper` function takes a callable object and does the
-initializations described above, also initializing colors if color
-support is present. :func:`wrapper` then runs your provided callable.
-Once the callable returns, :func:`wrapper` will restore the original
-state of the terminal. The callable is called inside a
-:keyword:`try`...\ :keyword:`except` that catches exceptions, restores
-the state of the terminal, and then re-raises the exception. Therefore
-your terminal won't be left in a funny state on exception and you'll be
-able to read the exception's message and traceback.
+importing the :func:`curses.wrapper` function. It takes a callable and does
+the initializations described above, also initializing colors if color support
+is present. It then runs your provided callable and finally deinitializes
+appropriately. The callable is called inside a try-except clause which catches
+exceptions, performs curses deinitialization, and then passes the exception
+upwards. Thus, your terminal won't be left in a funny state on exception.
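A minimal sketch of this pattern (the message text is arbitrary; the `wrapper` call itself is left commented out because it needs a real terminal):

```python
import curses

def main(stdscr):
    # By the time main() runs, wrapper() has already called initscr(),
    # noecho(), cbreak(), keypad(1) and, if supported, start_color().
    stdscr.clear()
    stdscr.addstr(0, 0, "Hello from inside wrapper()")
    stdscr.refresh()
    stdscr.getkey()

# Run from a real terminal; wrapper() restores the terminal afterwards,
# even if main() raises an exception:
#     curses.wrapper(main)
```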
Windows and Pads
================
Windows are the basic abstraction in curses. A window object represents a
-rectangular area of the screen, and supports methods to display text,
+rectangular area of the screen, and supports various methods to display text,
erase it, allow the user to input strings, and so forth.
-The ``stdscr`` object returned by the :func:`~curses.initscr` function is a
-window object that covers the entire screen. Many programs may need
-only this single window, but you might wish to divide the screen into
-smaller windows, in order to redraw or clear them separately. The
-:func:`~curses.newwin` function creates a new window of a given size,
-returning the new window object. ::
+The ``stdscr`` object returned by the :func:`initscr` function is a window
+object that covers the entire screen. Many programs may need only this single
+window, but you might wish to divide the screen into smaller windows, in order
+to redraw or clear them separately. The :func:`newwin` function creates a new
+window of a given size, returning the new window object. ::
begin_x = 20; begin_y = 7
height = 5; width = 40
win = curses.newwin(height, width, begin_y, begin_x)
-Note that the coordinate system used in curses is unusual.
-Coordinates are always passed in the order *y,x*, and the top-left
-corner of a window is coordinate (0,0). This breaks the normal
-convention for handling coordinates where the *x* coordinate comes
-first. This is an unfortunate difference from most other computer
-applications, but it's been part of curses since it was first written,
-and it's too late to change things now.
-
-Your application can determine the size of the screen by using the
-:data:`curses.LINES` and :data:`curses.COLS` variables to obtain the *y* and
-*x* sizes. Legal coordinates will then extend from ``(0,0)`` to
-``(curses.LINES - 1, curses.COLS - 1)``.
-
-When you call a method to display or erase text, the effect doesn't
-immediately show up on the display. Instead you must call the
-:meth:`~curses.window.refresh` method of window objects to update the
-screen.
-
-This is because curses was originally written with slow 300-baud
-terminal connections in mind; with these terminals, minimizing the
-time required to redraw the screen was very important. Instead curses
-accumulates changes to the screen and displays them in the most
-efficient manner when you call :meth:`refresh`. For example, if your
-program displays some text in a window and then clears the window,
-there's no need to send the original text because they're never
-visible.
-
-In practice, explicitly telling curses to redraw a window doesn't
+A word about the coordinate system used in curses: coordinates are always passed
+in the order *y,x*, and the top-left corner of a window is coordinate (0,0).
+This breaks a common convention for handling coordinates, where the *x*
+coordinate usually comes first. This is an unfortunate difference from most
+other computer applications, but it's been part of curses since it was first
+written, and it's too late to change things now.
+
+When you call a method to display or erase text, the effect doesn't immediately
+show up on the display. This is because curses was originally written with slow
+300-baud terminal connections in mind; with these terminals, minimizing the time
+required to redraw the screen is very important. This lets curses accumulate
+changes to the screen, and display them in the most efficient manner. For
+example, if your program displays some characters in a window, and then clears
+the window, there's no need to send the original characters because they'd never
+be visible.
+
+Accordingly, curses requires that you explicitly tell it to redraw windows,
+using the :func:`refresh` method of window objects. In practice, this doesn't
really complicate programming with curses much. Most programs go into a flurry
of activity, and then pause waiting for a keypress or some other action on the
part of the user. All you have to do is to be sure that the screen has been
-redrawn before pausing to wait for user input, by first calling
-``stdscr.refresh()`` or the :meth:`refresh` method of some other relevant
+redrawn before pausing to wait for user input, by simply calling
+``stdscr.refresh()`` or the :func:`refresh` method of some other relevant
window.
A pad is a special case of a window; it can be larger than the actual display
-screen, and only a portion of the pad displayed at a time. Creating a pad
+screen, and only a portion of it displayed at a time. Creating a pad simply
requires the pad's height and width, while refreshing a pad requires giving the
coordinates of the on-screen area where a subsection of the pad will be
-displayed. ::
+displayed. ::
pad = curses.newpad(100, 100)
- # These loops fill the pad with letters; addch() is
+ # These loops fill the pad with letters; this is
# explained in the next section
- for y in range(0, 99):
- for x in range(0, 99):
- pad.addch(y,x, ord('a') + (x*x+y*y) % 26)
-
- # Displays a section of the pad in the middle of the screen.
- # (0,0) : coordinate of upper-left corner of pad area to display.
- # (5,5) : coordinate of upper-left corner of window area to be filled
- # with pad content.
- # (20, 75) : coordinate of lower-right corner of window area to be
- # : filled with pad content.
- pad.refresh( 0,0, 5,5, 20,75)
-
-The :meth:`refresh` call displays a section of the pad in the rectangle
+ for y in range(0, 100):
+ for x in range(0, 100):
+ try:
+ pad.addch(y,x, ord('a') + (x*x+y*y) % 26)
+ except curses.error:
+ pass
+
+ # Displays a section of the pad in the middle of the screen
+ pad.refresh(0,0, 5,5, 20,75)
+
+The :func:`refresh` call displays a section of the pad in the rectangle
extending from coordinate (5,5) to coordinate (20,75) on the screen; the upper
left corner of the displayed section is coordinate (0,0) on the pad. Beyond
that difference, pads are exactly like ordinary windows and support the same
methods.
-If you have multiple windows and pads on screen there is a more
-efficient way to update the screen and prevent annoying screen flicker
-as each part of the screen gets updated. :meth:`refresh` actually
-does two things:
-
-1) Calls the :meth:`~curses.window.noutrefresh` method of each window
- to update an underlying data structure representing the desired
- state of the screen.
-2) Calls the function :func:`~curses.doupdate` function to change the
- physical screen to match the desired state recorded in the data structure.
-
-Instead you can call :meth:`noutrefresh` on a number of windows to
-update the data structure, and then call :func:`doupdate` to update
-the screen.
+If you have multiple windows and pads on screen there is a more efficient way to
+go, which will prevent annoying screen flicker at refresh time. Use the
+:meth:`noutrefresh` method of each window to update the data structure
+representing the desired state of the screen; then change the physical screen to
+match the desired state in one go with the function :func:`doupdate`. The
+normal :meth:`refresh` method calls :func:`doupdate` as its last act.
Displaying Text
===============
-From a C programmer's point of view, curses may sometimes look like a
-twisty maze of functions, all subtly different. For example,
-:c:func:`addstr` displays a string at the current cursor location in
-the ``stdscr`` window, while :c:func:`mvaddstr` moves to a given y,x
-coordinate first before displaying the string. :c:func:`waddstr` is just
-like :c:func:`addstr`, but allows specifying a window to use instead of
-using ``stdscr`` by default. :c:func:`mvwaddstr` allows specifying both
-a window and a coordinate.
+From a C programmer's point of view, curses may sometimes look like a twisty
+maze of functions, all subtly different. For example, :func:`addstr` displays a
+string at the current cursor location in the ``stdscr`` window, while
+:func:`mvaddstr` moves to a given y,x coordinate first before displaying the
+string. :func:`waddstr` is just like :func:`addstr`, but allows specifying a
+window to use, instead of using ``stdscr`` by default. :func:`mvwaddstr` follows
+similarly.
-Fortunately the Python interface hides all these details. ``stdscr``
-is a window object like any other, and methods such as
-:meth:`~curses.window.addstr` accept multiple argument forms. Usually there
-are four different forms.
+Fortunately the Python interface hides all these details; ``stdscr`` is a window
+object like any other, and methods like :func:`addstr` accept multiple argument
+forms. Usually there are four different forms.
+---------------------------------+-----------------------------------------------+
| Form | Description |
@@ -294,26 +237,17 @@ are four different forms.
| | display *str* or *ch*, using attribute *attr* |
+---------------------------------+-----------------------------------------------+
-Attributes allow displaying text in highlighted forms such as boldface,
+Attributes allow displaying text in highlighted forms, such as in boldface,
underline, reverse code, or in color. They'll be explained in more detail in
the next subsection.
-
-The :meth:`~curses.window.addstr` method takes a Python string or
-bytestring as the value to be displayed. The contents of bytestrings
-are sent to the terminal as-is. Strings are encoded to bytes using
-the value of the window's :attr:`encoding` attribute; this defaults to
-the default system encoding as returned by
-:func:`locale.getpreferredencoding`.
-
-The :meth:`~curses.window.addch` methods take a character, which can be
-either a string of length 1, a bytestring of length 1, or an integer.
-
-Constants are provided for extension characters; these constants are
-integers greater than 255. For example, :const:`ACS_PLMINUS` is a +/-
-symbol, and :const:`ACS_ULCORNER` is the upper left corner of a box
-(handy for drawing borders). You can also use the appropriate Unicode
-character.
+The :func:`addstr` function takes a Python string as the value to be displayed,
+while the :func:`addch` functions take a character, which can be either a Python
+string of length 1 or an integer. If it's a string, you're limited to
+displaying characters between 0 and 255. SVr4 curses provides constants for
+extension characters; these constants are integers greater than 255. For
+example, :const:`ACS_PLMINUS` is a +/- symbol, and :const:`ACS_ULCORNER` is the
+upper left corner of a box (handy for drawing borders).
Windows remember where the cursor was left after the last operation, so if you
leave out the *y,x* coordinates, the string or character will be displayed
@@ -323,11 +257,10 @@ you may want to ensure that the cursor is positioned in some location where it
won't be distracting; it can be confusing to have the cursor blinking at some
apparently random location.
-If your application doesn't need a blinking cursor at all, you can
-call ``curs_set(False)`` to make it invisible. For compatibility
-with older curses versions, there's a ``leaveok(bool)`` function
-that's a synonym for :func:`~curses.curs_set`. When *bool* is true, the
-curses library will attempt to suppress the flashing cursor, and you
+If your application doesn't need a blinking cursor at all, you can call
+``curs_set(0)`` to make it invisible. Equivalently, and for compatibility with
+older curses versions, there's a ``leaveok(bool)`` function. When *bool* is
+true, the curses library will attempt to suppress the flashing cursor, and you
won't need to worry about leaving it in odd locations.
@@ -335,16 +268,15 @@ Attributes and Color
--------------------
Characters can be displayed in different ways. Status lines in a text-based
-application are commonly shown in reverse video, or a text viewer may need to
+application are commonly shown in reverse video; a text viewer may need to
highlight certain words. curses supports this by allowing you to specify an
attribute for each cell on the screen.
-An attribute is an integer, each bit representing a different
-attribute. You can try to display text with multiple attribute bits
-set, but curses doesn't guarantee that all the possible combinations
-are available, or that they're all visually distinct. That depends on
-the ability of the terminal being used, so it's safest to stick to the
-most commonly available attributes, listed here.
+An attribute is an integer, each bit representing a different attribute. You can
+try to display text with multiple attribute bits set, but curses doesn't
+guarantee that all the possible combinations are available, or that they're all
+visually distinct. That depends on the ability of the terminal being used, so
+it's safest to stick to the most commonly available attributes, listed here.
+----------------------+--------------------------------------+
| Attribute | Description |
@@ -373,11 +305,10 @@ The curses library also supports color on those terminals that provide it. The
most common such terminal is probably the Linux console, followed by color
xterms.
-To use color, you must call the :func:`~curses.start_color` function soon
-after calling :func:`~curses.initscr`, to initialize the default color set
-(the :func:`curses.wrapper` function does this automatically). Once that's
-done, the :func:`~curses.has_colors` function returns TRUE if the terminal
-in use can
+To use color, you must call the :func:`start_color` function soon after calling
+:func:`initscr`, to initialize the default color set (the
+:func:`curses.wrapper` function does this automatically). Once that's
+done, the :func:`has_colors` function returns TRUE if the terminal in use can
actually display color. (Note: curses uses the American spelling 'color',
instead of the Canadian/British spelling 'colour'. If you're used to the
British spelling, you'll have to resign yourself to misspelling it for the sake
@@ -385,10 +316,9 @@ of these functions.)
The curses library maintains a finite number of color pairs, containing a
foreground (or text) color and a background color. You can get the attribute
-value corresponding to a color pair with the :func:`~curses.color_pair`
-function; this can be bitwise-OR'ed with other attributes such as
-:const:`A_REVERSE`, but again, such combinations are not guaranteed to work
-on all terminals.
+value corresponding to a color pair with the :func:`color_pair` function; this
+can be bitwise-OR'ed with other attributes such as :const:`A_REVERSE`, but
+again, such combinations are not guaranteed to work on all terminals.
An example, which displays a line of text using color pair 1::
@@ -396,16 +326,15 @@ An example, which displays a line of text using color pair 1::
stdscr.refresh()
As I said before, a color pair consists of a foreground and background color.
+:func:`start_color` initializes 8 basic colors when it activates color mode.
+They are: 0:black, 1:red, 2:green, 3:yellow, 4:blue, 5:magenta, 6:cyan, and
+7:white. The curses module defines named constants for each of these colors:
+:const:`curses.COLOR_BLACK`, :const:`curses.COLOR_RED`, and so forth.
+
The ``init_pair(n, f, b)`` function changes the definition of color pair *n*, to
foreground color f and background color b. Color pair 0 is hard-wired to white
on black, and cannot be changed.
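Putting ``init_pair`` and ``color_pair`` together, a hedged sketch (the helper name is illustrative; it assumes color mode is already active, e.g. via ``wrapper``):

```python
import curses

def show_warning(stdscr, text):
    # Redefine color pair 1 as red text on a white background...
    curses.init_pair(1, curses.COLOR_RED, curses.COLOR_WHITE)
    # ...then draw using the attribute value for that pair.
    stdscr.addstr(0, 0, text, curses.color_pair(1))
    stdscr.refresh()
```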
-Colors are numbered, and :func:`start_color` initializes 8 basic
-colors when it activates color mode. They are: 0:black, 1:red,
-2:green, 3:yellow, 4:blue, 5:magenta, 6:cyan, and 7:white. The :mod:`curses`
-module defines named constants for each of these colors:
-:const:`curses.COLOR_BLACK`, :const:`curses.COLOR_RED`, and so forth.
-
Let's put all this together. To change color 1 to red text on a white
background, you would call::
@@ -421,132 +350,90 @@ Very fancy terminals can change the definitions of the actual colors to a given
RGB value. This lets you change color 1, which is usually red, to purple or
blue or any other color you like. Unfortunately, the Linux console doesn't
support this, so I'm unable to try it out, and can't provide any examples. You
-can check if your terminal can do this by calling
-:func:`~curses.can_change_color`, which returns ``True`` if the capability is
-there. If you're lucky enough to have such a talented terminal, consult your
-system's man pages for more information.
+can check if your terminal can do this by calling :func:`can_change_color`,
+which returns TRUE if the capability is there. If you're lucky enough to have
+such a talented terminal, consult your system's man pages for more information.
User Input
==========
-The C curses library offers only very simple input mechanisms. Python's
-:mod:`curses` module adds a basic text-input widget. (Other libraries
-such as `Urwid <https://pypi.org/project/urwid/>`_ have more extensive
-collections of widgets.)
-
-There are two methods for getting input from a window:
-
-* :meth:`~curses.window.getch` refreshes the screen and then waits for
- the user to hit a key, displaying the key if :func:`~curses.echo` has been
- called earlier. You can optionally specify a coordinate to which
- the cursor should be moved before pausing.
-
-* :meth:`~curses.window.getkey` does the same thing but converts the
- integer to a string. Individual characters are returned as
- 1-character strings, and special keys such as function keys return
- longer strings containing a key name such as ``KEY_UP`` or ``^G``.
-
-It's possible to not wait for the user using the
-:meth:`~curses.window.nodelay` window method. After ``nodelay(True)``,
-:meth:`getch` and :meth:`getkey` for the window become
-non-blocking. To signal that no input is ready, :meth:`getch` returns
-``curses.ERR`` (a value of -1) and :meth:`getkey` raises an exception.
-There's also a :func:`~curses.halfdelay` function, which can be used to (in
-effect) set a timer on each :meth:`getch`; if no input becomes
-available within a specified delay (measured in tenths of a second),
-curses raises an exception.
+The curses library itself offers only very simple input mechanisms. Python's
+support adds a text-input widget that makes up for some of this lack.
+
+The most common way to get input to a window is to use its :meth:`getch` method.
+:meth:`getch` pauses and waits for the user to hit a key, displaying it if
+:func:`echo` has been called earlier. You can optionally specify a coordinate
+to which the cursor should be moved before pausing.
+
+It's possible to change this behavior with the method :meth:`nodelay`. After
+``nodelay(1)``, :meth:`getch` for the window becomes non-blocking and returns
+``curses.ERR`` (a value of -1) when no input is ready. There's also a
+:func:`halfdelay` function, which can be used to (in effect) set a timer on each
+:meth:`getch`; if no input becomes available within a specified
+delay (measured in tenths of a second), curses raises an exception.
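A small polling helper might look like this (a sketch only; ``win`` is assumed to be an already-created window):

```python
import curses

def poll_key(win):
    # Switch the window's getch() into non-blocking mode and read once;
    # curses.ERR (-1) signals that no key is waiting.
    win.nodelay(1)
    c = win.getch()
    return None if c == curses.ERR else c
```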
The :meth:`getch` method returns an integer; if it's between 0 and 255, it
represents the ASCII code of the key pressed. Values greater than 255 are
special keys such as Page Up, Home, or the cursor keys. You can compare the
value returned to constants such as :const:`curses.KEY_PPAGE`,
-:const:`curses.KEY_HOME`, or :const:`curses.KEY_LEFT`. The main loop of
-your program may look something like this::
+:const:`curses.KEY_HOME`, or :const:`curses.KEY_LEFT`. Usually the main loop of
+your program will look something like this::
- while True:
+ while 1:
c = stdscr.getch()
if c == ord('p'):
PrintDocument()
elif c == ord('q'):
- break # Exit the while loop
+ break # Exit the while()
elif c == curses.KEY_HOME:
x = y = 0
The :mod:`curses.ascii` module supplies ASCII class membership functions that
-take either integer or 1-character string arguments; these may be useful in
-writing more readable tests for such loops. It also supplies
+take either integer or 1-character-string arguments; these may be useful in
+writing more readable tests for your command interpreters. It also supplies
conversion functions that take either integer or 1-character-string arguments
and return the same type. For example, :func:`curses.ascii.ctrl` returns the
control character corresponding to its argument.
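For instance (``curses.ascii`` works without initializing the screen, so you can try these directly):

```python
import curses.ascii

# Classification functions accept an integer or a 1-character string.
print(curses.ascii.isdigit('7'))       # True
print(curses.ascii.isctrl(ord('\n')))  # True

# Conversion functions return the same type they were given;
# ctrl('g') is the Ctrl-G (BEL) character, chr(7).
print(curses.ascii.ctrl('g') == chr(7))  # True
```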
-There's also a method to retrieve an entire string,
-:meth:`~curses.window.getstr`. It isn't used very often, because its
-functionality is quite limited; the only editing keys available are
-the backspace key and the Enter key, which terminates the string. It
-can optionally be limited to a fixed number of characters. ::
+There's also a method to retrieve an entire string, :meth:`getstr`. It isn't
+used very often, because its functionality is quite limited; the only editing
+keys available are the backspace key and the Enter key, which terminates the
+string. It can optionally be limited to a fixed number of characters. ::
curses.echo() # Enable echoing of characters
# Get a 15-character string, with the cursor on the top line
s = stdscr.getstr(0,0, 15)
-The :mod:`curses.textpad` module supplies a text box that supports an
-Emacs-like set of keybindings. Various methods of the
-:class:`~curses.textpad.Textbox` class support editing with input
-validation and gathering the edit results either with or without
-trailing spaces. Here's an example::
-
- import curses
- from curses.textpad import Textbox, rectangle
+The Python :mod:`curses.textpad` module supplies something better. With it, you
+can turn a window into a text box that supports an Emacs-like set of
+keybindings. Various methods of :class:`Textbox` class support editing with
+input validation and gathering the edit results either with or without trailing
+spaces. See the library documentation on :mod:`curses.textpad` for the
+details.
- def main(stdscr):
- stdscr.addstr(0, 0, "Enter IM message: (hit Ctrl-G to send)")
- editwin = curses.newwin(5,30, 2,1)
- rectangle(stdscr, 1,0, 1+5+1, 1+30+1)
- stdscr.refresh()
-
- box = Textbox(editwin)
-
- # Let the user edit until Ctrl-G is struck.
- box.edit()
+For More Information
+====================
- # Get resulting contents
- message = box.gather()
+This HOWTO didn't cover some advanced topics, such as screen-scraping or
+capturing mouse events from an xterm instance. But the Python library page for
+the curses modules is now pretty complete. You should browse it next.
-See the library documentation on :mod:`curses.textpad` for more details.
+If you're in doubt about the detailed behavior of any of the ncurses entry
+points, consult the manual pages for your curses implementation, whether it's
+ncurses or a proprietary Unix vendor's. The manual pages will document any
+quirks, and provide complete lists of all the functions, attributes, and
+:const:`ACS_\*` characters available to you.
+Because the curses API is so large, some functions aren't supported in the
+Python interface, not because they're difficult to implement, but because no one
+has needed them yet. Feel free to add them and then submit a patch. Also, we
+don't yet have support for the menu library associated with
+ncurses; feel free to add that.
-For More Information
-====================
+If you write an interesting little program, feel free to contribute it as
+another demo. We can always use more of them!
-This HOWTO doesn't cover some advanced topics, such as reading the
-contents of the screen or capturing mouse events from an xterm
-instance, but the Python library page for the :mod:`curses` module is now
-reasonably complete. You should browse it next.
-
-If you're in doubt about the detailed behavior of the curses
-functions, consult the manual pages for your curses implementation,
-whether it's ncurses or a proprietary Unix vendor's. The manual pages
-will document any quirks, and provide complete lists of all the
-functions, attributes, and :const:`ACS_\*` characters available to
-you.
-
-Because the curses API is so large, some functions aren't supported in
-the Python interface. Often this isn't because they're difficult to
-implement, but because no one has needed them yet. Also, Python
-doesn't yet support the menu library associated with ncurses.
-Patches adding support for these would be welcome; see
-`the Python Developer's Guide <https://devguide.python.org/>`_ to
-learn more about submitting patches to Python.
-
-* `Writing Programs with NCURSES <http://invisible-island.net/ncurses/ncurses-intro.html>`_:
- a lengthy tutorial for C programmers.
-* `The ncurses man page <https://linux.die.net/man/3/ncurses>`_
-* `The ncurses FAQ <http://invisible-island.net/ncurses/ncurses.faq.html>`_
-* `"Use curses... don't swear" <https://www.youtube.com/watch?v=eN1eZtjLEnU>`_:
- video of a PyCon 2013 talk on controlling terminals using curses or Urwid.
-* `"Console Applications with Urwid" <http://www.pyvideo.org/video/1568/console-applications-with-urwid>`_:
- video of a PyCon CA 2012 talk demonstrating some applications written using
- Urwid.
+The ncurses FAQ: http://invisible-island.net/ncurses/ncurses.faq.html
diff --git a/Doc/howto/descriptor.rst b/Doc/howto/descriptor.rst
index 6928917..37ba6a8 100644
--- a/Doc/howto/descriptor.rst
+++ b/Doc/howto/descriptor.rst
@@ -11,7 +11,7 @@ Abstract
--------
Defines descriptors, summarizes the protocol, and shows how descriptors are
-called. Examines a custom descriptor and several built-in Python descriptors
+called. Examines a custom descriptor and several built-in Python descriptors
including functions, properties, static methods, and class methods. Shows how
each works by giving a pure Python equivalent and a sample application.
@@ -36,7 +36,9 @@ continuing through the base classes of ``type(a)`` excluding metaclasses. If the
looked-up value is an object defining one of the descriptor methods, then Python
may override the default behavior and invoke the descriptor method instead.
Where this occurs in the precedence chain depends on which descriptor methods
-were defined.
+were defined. Note that descriptors are only invoked for new style objects or
+classes (a class is new style if it inherits from :class:`object` or
+:class:`type`).
Descriptors are a powerful, general purpose protocol. They are the mechanism
behind properties, methods, static methods, class methods, and :func:`super()`.
@@ -48,17 +50,17 @@ a flexible set of new tools for everyday Python programs.
Descriptor Protocol
-------------------
-``descr.__get__(self, obj, type=None) -> value``
+``descr.__get__(self, obj, type=None) --> value``
-``descr.__set__(self, obj, value) -> None``
+``descr.__set__(self, obj, value) --> None``
-``descr.__delete__(self, obj) -> None``
+``descr.__delete__(self, obj) --> None``
That is all there is to it. Define any of these methods and an object is
considered a descriptor and can override default behavior upon being looked up
as an attribute.
-If an object defines :meth:`__set__` or :meth:`__delete__`, it is considered
+If an object defines both :meth:`__get__` and :meth:`__set__`, it is considered
a data descriptor. Descriptors that only define :meth:`__get__` are called
non-data descriptors (they are typically used for methods but other uses are
possible).
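For instance, a minimal data descriptor (defining both :meth:`__get__` and :meth:`__set__`) behaves like this; the class names here are illustrative only:

```python
class Constant(object):
    """A data descriptor: defines both __get__ and __set__."""
    def __init__(self, value):
        self.value = value
    def __get__(self, obj, objtype=None):
        return self.value
    def __set__(self, obj, value):
        raise AttributeError("read-only attribute")

class Config(object):
    version = Constant(42)

c = Config()
print(c.version)             # 42
c.__dict__['version'] = 99   # the data descriptor still wins...
print(c.version)             # ...so this prints 42, not 99
```

Because ``Constant`` is a data descriptor, it always overrides the instance dictionary; a non-data descriptor (``__get__`` only) would be shadowed by the ``c.__dict__`` entry instead.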
@@ -87,6 +89,8 @@ of ``obj``. If ``d`` defines the method :meth:`__get__`, then ``d.__get__(obj)`
is invoked according to the precedence rules listed below.
The details of invocation depend on whether ``obj`` is an object or a class.
+Either way, descriptors only work for new style objects and classes. A class is
+new style if it is a subclass of :class:`object`.
For objects, the machinery is in :meth:`object.__getattribute__` which
transforms ``b.x`` into ``type(b).__dict__['x'].__get__(b, type(b))``. The
@@ -111,21 +115,23 @@ The important points to remember are:
* descriptors are invoked by the :meth:`__getattribute__` method
* overriding :meth:`__getattribute__` prevents automatic descriptor calls
+* :meth:`__getattribute__` is only available with new style classes and objects
* :meth:`object.__getattribute__` and :meth:`type.__getattribute__` make
different calls to :meth:`__get__`.
* data descriptors always override instance dictionaries.
* non-data descriptors may be overridden by instance dictionaries.
The object returned by ``super()`` also has a custom :meth:`__getattribute__`
-method for invoking descriptors. The attribute lookup ``super(B, obj).m`` searches
+method for invoking descriptors. The call ``super(B, obj).m()`` searches
``obj.__class__.__mro__`` for the base class ``A`` immediately following ``B``
and then returns ``A.__dict__['m'].__get__(obj, B)``. If not a descriptor,
``m`` is returned unchanged. If not in the dictionary, ``m`` reverts to a
search using :meth:`object.__getattribute__`.
-The implementation details are in :c:func:`super_getattro()` in
-:source:`Objects/typeobject.c`. and a pure Python equivalent can be found in
-`Guido's Tutorial`_.
+Note, in Python 2.2, ``super(B, obj).m()`` would only invoke :meth:`__get__` if
+``m`` was a data descriptor. In Python 2.3, non-data descriptors also get
+invoked unless an old-style class is involved. The implementation details are
+in :c:func:`super_getattro()` in :source:`Objects/typeobject.c`.
.. _`Guido's Tutorial`: https://www.python.org/download/releases/2.2.3/descrintro/#cooperation
@@ -145,7 +151,7 @@ print a message for each get or set. Overriding :meth:`__getattribute__` is
alternate approach that could do this for every attribute. However, this
descriptor is useful for monitoring just a few chosen attributes::
- class RevealAccess:
+ class RevealAccess(object):
"""A data descriptor that sets and returns values
normally and prints a message logging their access.
"""
@@ -155,14 +161,14 @@ descriptor is useful for monitoring just a few chosen attributes::
self.name = name
def __get__(self, obj, objtype):
- print('Retrieving', self.name)
+ print 'Retrieving', self.name
return self.val
def __set__(self, obj, val):
- print('Updating', self.name)
+ print 'Updating', self.name
self.val = val
- >>> class MyClass:
+ >>> class MyClass(object):
... x = RevealAccess(10, 'var "x"')
... y = 5
...
@@ -180,7 +186,7 @@ descriptor is useful for monitoring just a few chosen attributes::
The protocol is simple and offers exciting possibilities. Several use cases are
so common that they have been packaged into individual function calls.
-Properties, bound methods, static methods, and class methods are all
+Properties, bound and unbound methods, static methods, and class methods are all
based on the descriptor protocol.
@@ -194,7 +200,7 @@ triggers function calls upon access to an attribute. Its signature is::
The documentation shows a typical use to define a managed attribute ``x``::
- class C:
+ class C(object):
def getx(self): return self.__x
def setx(self, value): self.__x = value
def delx(self): del self.__x
@@ -203,7 +209,7 @@ The documentation shows a typical use to define a managed attribute ``x``::
To see how :func:`property` is implemented in terms of the descriptor protocol,
here is a pure Python equivalent::
- class Property:
+ class Property(object):
"Emulate PyProperty_Type() in Objects/descrobject.c"
def __init__(self, fget=None, fset=None, fdel=None, doc=None):
@@ -250,7 +256,7 @@ to be recalculated on every access; however, the programmer does not want to
affect existing client code accessing the attribute directly. The solution is
to wrap access to the value attribute in a property data descriptor::
- class Cell:
+ class Cell(object):
. . .
def getvalue(self):
"Recalculate the cell before returning value"
@@ -266,60 +272,50 @@ Python's object oriented features are built upon a function based environment.
Using non-data descriptors, the two are merged seamlessly.
Class dictionaries store methods as functions. In a class definition, methods
-are written using :keyword:`def` or :keyword:`lambda`, the usual tools for
-creating functions. Methods only differ from regular functions in that the
+are written using :keyword:`def` and :keyword:`lambda`, the usual tools for
+creating functions. The only difference from regular functions is that the
first argument is reserved for the object instance. By Python convention, the
instance reference is called *self* but may be called *this* or any other
variable name.
To support method calls, functions include the :meth:`__get__` method for
binding methods during attribute access. This means that all functions are
-non-data descriptors which return bound methods when they are invoked from an
-object. In pure Python, it works like this::
+non-data descriptors which return bound or unbound methods depending on whether
+they are invoked from an object or a class. In pure Python, it works like
+this::
- class Function:
+ class Function(object):
. . .
def __get__(self, obj, objtype=None):
"Simulate func_descr_get() in Objects/funcobject.c"
- if obj is None:
- return self
- return types.MethodType(self, obj)
+ return types.MethodType(self, obj, objtype)
Running the interpreter shows how the function descriptor works in practice::
- >>> class D:
+ >>> class D(object):
... def f(self, x):
... return x
...
>>> d = D()
-
- # Access through the class dictionary does not invoke __get__.
- # It just returns the underlying function object.
- >>> D.__dict__['f']
- <function D.f at 0x00C45070>
-
- # Dotted access from a class calls __get__() which just returns
- # the underlying function unchanged.
- >>> D.f
- <function D.f at 0x00C45070>
-
- # The function has a __qualname__ attribute to support introspection
- >>> D.f.__qualname__
- 'D.f'
-
- # Dotted access from an instance calls __get__() which returns the
- # function wrapped in a bound method object
- >>> d.f
+ >>> D.__dict__['f'] # Stored internally as a function
+ <function f at 0x00C45070>
+ >>> D.f # Get from a class becomes an unbound method
+ <unbound method D.f>
+ >>> d.f # Get from an instance becomes a bound method
<bound method D.f of <__main__.D object at 0x00B18C90>>
- # Internally, the bound method stores the underlying function,
- # the bound instance, and the class of the bound instance.
- >>> d.f.__func__
- <function D.f at 0x1012e5ae8>
- >>> d.f.__self__
- <__main__.D object at 0x1012e1f98>
- >>> d.f.__class__
- <class 'method'>
+The output suggests that bound and unbound methods are two different types.
+While they could have been implemented that way, the actual C implementation of
+:c:type:`PyMethod_Type` in :source:`Objects/classobject.c` is a single object
+with two different representations depending on whether the :attr:`im_self`
+field is set or is *NULL* (the C equivalent of ``None``).
+
+Likewise, the effects of calling a method object depend on the :attr:`im_self`
+field. If set (meaning bound), the original function (stored in the
+:attr:`im_func` field) is called as expected with the first argument set to the
+instance. If unbound, all of the arguments are passed unchanged to the original
+function. The actual C implementation of :func:`instancemethod_call()` is only
+slightly more complex in that it includes some type checking.
Static Methods and Class Methods
@@ -367,20 +363,20 @@ It can be called either from an object or the class: ``s.erf(1.5) --> .9332`` o
Since staticmethods return the underlying function with no changes, the example
calls are unexciting::
- >>> class E:
+ >>> class E(object):
... def f(x):
- ... print(x)
+ ... print x
... f = staticmethod(f)
...
- >>> E.f(3)
+ >>> print E.f(3)
3
- >>> E().f(3)
+ >>> print E().f(3)
3
Using the non-data descriptor protocol, a pure Python version of
:func:`staticmethod` would look like this::
- class StaticMethod:
+ class StaticMethod(object):
"Emulate PyStaticMethod_Type() in Objects/funcobject.c"
def __init__(self, f):
@@ -393,14 +389,14 @@ Unlike static methods, class methods prepend the class reference to the
argument list before calling the function. This format is the same
whether the caller is an object or a class::
- >>> class E:
+ >>> class E(object):
... def f(klass, x):
- ... return klass.__name__, x
+ ... return klass.__name__, x
... f = classmethod(f)
...
- >>> print(E.f(3))
+ >>> print E.f(3)
('E', 3)
- >>> print(E().f(3))
+ >>> print E().f(3)
('E', 3)
@@ -410,7 +406,7 @@ is to create alternate class constructors. In Python 2.3, the classmethod
:func:`dict.fromkeys` creates a new dictionary from a list of keys. The pure
Python equivalent is::
- class Dict:
+ class Dict(object):
. . .
def fromkeys(klass, iterable, value=None):
"Emulate dict_fromkeys() in Objects/dictobject.c"
@@ -428,7 +424,7 @@ Now a new dictionary of unique keys can be constructed like this::
Using the non-data descriptor protocol, a pure Python version of
:func:`classmethod` would look like this::
- class ClassMethod:
+ class ClassMethod(object):
"Emulate PyClassMethod_Type() in Objects/funcobject.c"
def __init__(self, f):
diff --git a/Doc/howto/doanddont.rst b/Doc/howto/doanddont.rst
new file mode 100644
index 0000000..35e1583
--- /dev/null
+++ b/Doc/howto/doanddont.rst
@@ -0,0 +1,327 @@
+************************************
+ Idioms and Anti-Idioms in Python
+************************************
+
+:Author: Moshe Zadka
+
+This document is placed in the public domain.
+
+
+.. topic:: Abstract
+
+ This document can be considered a companion to the tutorial. It shows how to use
+ Python, and even more importantly, how *not* to use Python.
+
+
+Language Constructs You Should Not Use
+======================================
+
+While Python has relatively few gotchas compared to other languages, it still
+has some constructs which are only useful in corner cases, or are plain
+dangerous.
+
+
+from module import \*
+---------------------
+
+
+Inside Function Definitions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``from module import *`` is *invalid* inside function definitions. While many
+versions of Python do not check for the invalidity, it does not make it more
+valid, no more than having a smart lawyer makes a man innocent. Do not use it
+like that ever. Even in versions where it was accepted, it made the function
+execution slower, because the compiler could not be certain which names were
+local and which were global. In Python 2.1 this construct causes warnings, and
+sometimes even errors.
+
+
+At Module Level
+^^^^^^^^^^^^^^^
+
+While it is valid to use ``from module import *`` at module level it is usually
+a bad idea. For one, this loses an important property Python otherwise has ---
+you can know where each toplevel name is defined by a simple "search" function
+in your favourite editor. You also open yourself to trouble in the future, if
+some module grows additional functions or classes.
+
+One of the most awful questions asked on the newsgroup is why this code::
+
+ f = open("www")
+ f.read()
+
+does not work. Of course, it works just fine (assuming you have a file called
+"www".) But it does not work if somewhere in the module, the statement ``from
+os import *`` is present. The :mod:`os` module has a function called
+:func:`open` which returns an integer. While it is very useful, shadowing a
+builtin is one of its least useful properties.
+
+Remember, you can never know for sure what names a module exports, so either
+take what you need --- ``from module import name1, name2``, or keep them in the
+module and access on a per-need basis --- ``import module; print module.name``.
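The shadowing described above can be demonstrated without polluting the current namespace by running the star-import in a scratch dictionary (a sketch using only the standard :mod:`os` module):

```python
import os

# Simulate a module whose source contains ``from os import *`` by
# executing that statement in a fresh namespace dictionary.
ns = {}
exec("from os import *", ns)

# The name ``open`` in that namespace is now os.open, which returns an
# integer file descriptor -- the builtin open() has been shadowed.
print(ns["open"] is os.open)   # True
print(ns["open"] is open)      # False
```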
+
+
+When It Is Just Fine
+^^^^^^^^^^^^^^^^^^^^
+
+There are situations in which ``from module import *`` is just fine:
+
+* The interactive prompt. For example, ``from math import *`` makes Python an
+ amazing scientific calculator.
+
+* When extending a module in C with a module in Python.
+
+* When the module advertises itself as ``from import *`` safe.
+
+
+Unadorned :keyword:`exec`, :func:`execfile` and friends
+-------------------------------------------------------
+
+The word "unadorned" refers to the use without an explicit dictionary, in which
+case those constructs evaluate code in the *current* environment. This is
+dangerous for the same reasons ``from import *`` is dangerous --- it might step
+over variables you are counting on and mess up things for the rest of your code.
+Simply do not do that.
+
+Bad examples::
+
+ >>> for name in sys.argv[1:]:
+ >>> exec "%s=1" % name
+ >>> def func(s, **kw):
+ >>> for var, val in kw.items():
+ >>> exec "s.%s=val" % var # invalid!
+ >>> execfile("handler.py")
+ >>> handle()
+
+Good examples::
+
+ >>> d = {}
+ >>> for name in sys.argv[1:]:
+ >>> d[name] = 1
+ >>> def func(s, **kw):
+ >>> for var, val in kw.items():
+ >>> setattr(s, var, val)
+ >>> d={}
+ >>> execfile("handle.py", d, d)
+ >>> handle = d['handle']
+ >>> handle()
+
+
+from module import name1, name2
+-------------------------------
+
+This is a "don't" which is much weaker than the previous "don't"s but is still
+something you should not do if you don't have good reasons to do that. The
+reason it is usually a bad idea is that you suddenly have an object which lives
+in two separate namespaces. When the binding in one namespace changes, the
+binding in the other will not, so there will be a discrepancy between them. This
+happens when, for example, one module is reloaded, or changes the definition of
+a function at runtime.
+
+Bad example::
+
+ # foo.py
+ a = 1
+
+ # bar.py
+ from foo import a
+ if something():
+ a = 2 # danger: foo.a != a
+
+Good example::
+
+ # foo.py
+ a = 1
+
+ # bar.py
+ import foo
+ if something():
+ foo.a = 2
+
+
+except:
+-------
+
+Python has the ``except:`` clause, which catches all exceptions. Since *every*
+error in Python raises an exception, using ``except:`` can make many
+programming errors look like runtime problems, which hinders the debugging
+process.
+
+The following code shows a great example of why this is bad::
+
+ try:
+ foo = opne("file") # misspelled "open"
+ except:
+ sys.exit("could not open file!")
+
+The second line triggers a :exc:`NameError`, which is caught by the except
+clause. The program will exit, and the error message the program prints will
+make you think the problem is the readability of ``"file"`` when in fact
+the real error has nothing to do with ``"file"``.
+
+A better way to write the above is ::
+
+ try:
+ foo = opne("file")
+ except IOError:
+ sys.exit("could not open file")
+
+When this is run, Python will produce a traceback showing the :exc:`NameError`,
+and it will be immediately apparent what needs to be fixed.
+
+.. index:: bare except, except; bare
+
+Because ``except:`` catches *all* exceptions, including :exc:`SystemExit`,
+:exc:`KeyboardInterrupt`, and :exc:`GeneratorExit` (which is not an error and
+should not normally be caught by user code), using a bare ``except:`` is almost
+never a good idea. In situations where you need to catch all "normal" errors,
+such as in a framework that runs callbacks, you can catch the base class for
+all normal exceptions, :exc:`Exception`. Unfortunately in Python 2.x it is
+possible for third-party code to raise exceptions that do not inherit from
+:exc:`Exception`, so in Python 2.x there are some cases where you may have to
+use a bare ``except:`` and manually re-raise the exceptions you don't want
+to catch.
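The framework case mentioned above can be sketched like this (the ``run_callback`` helper is invented for the example; it catches :exc:`Exception` rather than using a bare ``except:``):

```python
def run_callback(cb):
    # Catching Exception contains real errors raised by the callback
    # without also swallowing SystemExit, KeyboardInterrupt, or
    # GeneratorExit, which a bare ``except:`` would do.
    try:
        return cb()
    except Exception as err:
        return "callback failed: %s" % err

print(run_callback(lambda: 42))      # 42
print(run_callback(lambda: 1 / 0))   # a "callback failed: ..." message
```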
+
+
+Exceptions
+==========
+
+Exceptions are a useful feature of Python. You should learn to raise them
+whenever something unexpected occurs, and catch them only where you can do
+something about them.
+
+The following is a very popular anti-idiom ::
+
+ def get_status(file):
+ if not os.path.exists(file):
+ print "file not found"
+ sys.exit(1)
+ return open(file).readline()
+
+Consider the case where the file gets deleted between the time the call to
+:func:`os.path.exists` is made and the time :func:`open` is called. In that
+case the last line will raise an :exc:`IOError`. The same thing would happen
+if *file* exists but has no read permission. Since testing this on a normal
+machine on existent and non-existent files makes it seem bugless, the test
+results will seem fine, and the code will get shipped. Later an unhandled
+:exc:`IOError` (or perhaps some other :exc:`EnvironmentError`) escapes to the
+user, who gets to watch the ugly traceback.
+
+Here is a somewhat better way to do it. ::
+
+ def get_status(file):
+ try:
+ return open(file).readline()
+ except EnvironmentError as err:
+ print "Unable to open file: {}".format(err)
+ sys.exit(1)
+
+In this version, *either* the file gets opened and the line is read (so it
+works even on flaky NFS or SMB connections), or an error message is printed
+that provides all the available information on why the open failed, and the
+application is aborted.
+
+However, even this version of :func:`get_status` makes too many assumptions ---
+that it will only be used in a short running script, and not, say, in a long
+running server. Sure, the caller could do something like ::
+
+ try:
+ status = get_status(log)
+ except SystemExit:
+ status = None
+
+But there is a better way. You should try to use as few ``except`` clauses in
+your code as you can --- the ones you do use will usually be inside calls which
+should always succeed, or a catch-all in a main function.
+
+So, an even better version of :func:`get_status()` is probably ::
+
+ def get_status(file):
+ return open(file).readline()
+
+The caller can deal with the exception if it wants (for example, if it tries
+several files in a loop), or just let the exception filter upwards to *its*
+caller.
+
+But the last version still has a serious problem --- due to implementation
+details in CPython, the file would not be closed when an exception is raised
+until the exception handler finishes; and, worse, in other implementations
+(e.g., Jython) it might not be closed at all regardless of whether or not
+an exception is raised.
+
+The best version of this function uses the ``open()`` call as a context
+manager, which will ensure that the file gets closed as soon as the
+function returns::
+
+ def get_status(file):
+ with open(file) as fp:
+ return fp.readline()
+
+
+Using the Batteries
+===================
+
+Every so often, people seem to be writing stuff in the Python library again,
+usually poorly. While the occasional module has a poor interface, it is usually
+much better to use the rich standard library and data types that come with
+Python than inventing your own.
+
+A useful module very few people know about is :mod:`os.path`. It always has the
+correct path arithmetic for your operating system, and will usually be much
+better than whatever you come up with yourself.
+
+Compare::
+
+ # ugh!
+ return dir+"/"+file
+ # better
+ return os.path.join(dir, file)
+
+More useful functions in :mod:`os.path`: :func:`basename`, :func:`dirname` and
+:func:`splitext`.
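A quick illustration of these helpers (the path components are invented for the example; the separator in the results is whatever the platform uses):

```python
import os.path

# Build a path portably instead of concatenating with "/".
p = os.path.join("reports", "2001", "summary.txt")

print(os.path.basename(p))   # summary.txt
print(os.path.dirname(p))    # reports/2001 on POSIX systems
print(os.path.splitext(p))   # (everything but the extension, '.txt')
```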
+
+There are also many useful built-in functions people seem not to be aware of
+for some reason: :func:`min` and :func:`max` can find the minimum/maximum of
+any sequence with comparable semantics, for example, yet many people write
+their own :func:`max`/:func:`min`. Another highly useful function is
+:func:`reduce` which can be used to repeatedly apply a binary operation to a
+sequence, reducing it to a single value. For example, compute a factorial
+with a series of multiply operations::
+
+ >>> n = 4
+ >>> import operator
+ >>> reduce(operator.mul, range(1, n+1))
+ 24
+
+When it comes to parsing numbers, note that :func:`float`, :func:`int` and
+:func:`long` all accept string arguments and will reject ill-formed strings
+by raising a :exc:`ValueError`.
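For instance (Python 3 syntax; 2.x behaves the same for ``int()`` and ``float()``, and additionally has ``long()``):

```python
# int() and float() accept string arguments and signal ill-formed
# input by raising ValueError.
print(int("42"))       # 42
print(float("2.5"))    # 2.5

try:
    int("forty-two")   # not a well-formed integer literal
except ValueError as err:
    print("rejected:", err)
```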
+
+
+Using Backslash to Continue Statements
+======================================
+
+Since Python treats a newline as a statement terminator, and since statements
+are often more than is comfortable to put in one line, many people do::
+
+ if foo.bar()['first'][0] == baz.quux(1, 2)[5:9] and \
+ calculate_number(10, 20) != forbulate(500, 360):
+ pass
+
+You should realize that this is dangerous: a stray space after the ``\`` would
+make this line wrong, and stray spaces are notoriously hard to see in editors.
+In this case, at least it would be a syntax error, but if the code was::
+
+ value = foo.bar()['first'][0]*baz.quux(1, 2)[5:9] \
+ + calculate_number(10, 20)*forbulate(500, 360)
+
+then it would just be subtly wrong.
+
+It is usually much better to use the implicit continuation inside parentheses:
+
+This version is bulletproof::
+
+ value = (foo.bar()['first'][0]*baz.quux(1, 2)[5:9]
+ + calculate_number(10, 20)*forbulate(500, 360))
+
diff --git a/Doc/howto/functional.rst b/Doc/howto/functional.rst
index f8f2aac..a06e29c 100644
--- a/Doc/howto/functional.rst
+++ b/Doc/howto/functional.rst
@@ -3,7 +3,7 @@
********************************
:Author: A. M. Kuchling
-:Release: 0.32
+:Release: 0.31
In this document, we'll take a tour of Python's features suitable for
implementing programs in a functional style. After an introduction to the
@@ -15,9 +15,9 @@ concepts of functional programming, we'll look at language features such as
Introduction
============
-This section explains the basic concept of functional programming; if
-you're just interested in learning about Python language features,
-skip to the next section on :ref:`functional-howto-iterators`.
+This section explains the basic concept of functional programming; if you're
+just interested in learning about Python language features, skip to the next
+section.
Programming languages support decomposing problems in several different ways:
@@ -44,15 +44,14 @@ Programming languages support decomposing problems in several different ways:
functional languages include the ML family (Standard ML, OCaml, and other
variants) and Haskell.
-The designers of some computer languages choose to emphasize one
-particular approach to programming. This often makes it difficult to
-write programs that use a different approach. Other languages are
-multi-paradigm languages that support several different approaches.
-Lisp, C++, and Python are multi-paradigm; you can write programs or
-libraries that are largely procedural, object-oriented, or functional
-in all of these languages. In a large program, different sections
-might be written using different approaches; the GUI might be
-object-oriented while the processing logic is procedural or
+The designers of some computer languages choose to emphasize one particular
+approach to programming. This often makes it difficult to write programs that
+use a different approach. Other languages are multi-paradigm languages that
+support several different approaches. Lisp, C++, and Python are
+multi-paradigm; you can write programs or libraries that are largely
+procedural, object-oriented, or functional in all of these languages. In a
+large program, different sections might be written using different approaches;
+the GUI might be object-oriented while the processing logic is procedural or
functional, for example.
In a functional program, input flows through a set of functions. Each function
@@ -66,9 +65,9 @@ output must only depend on its input.
Some languages are very strict about purity and don't even have assignment
statements such as ``a=3`` or ``c = a + b``, but it's difficult to avoid all
side effects. Printing to the screen or writing to a disk file are side
-effects, for example. For example, in Python a call to the :func:`print` or
-:func:`time.sleep` function both return no useful value; they're only called for
-their side effects of sending some text to the screen or pausing execution for a
+effects, for example. In Python, a ``print`` statement or a
+``time.sleep(1)`` both return no useful value; they're only called for their
+side effects of sending some text to the screen or pausing execution for a
second.
Python programs written in functional style usually won't go to the extreme of
@@ -173,8 +172,6 @@ new programs by arranging existing functions in a new configuration and writing
a few functions specialized for the current task.
-.. _functional-howto-iterators:
-
Iterators
=========
@@ -183,53 +180,52 @@ foundation for writing functional-style programs: iterators.
An iterator is an object representing a stream of data; this object returns the
data one element at a time. A Python iterator must support a method called
-:meth:`~iterator.__next__` that takes no arguments and always returns the next
-element of the stream. If there are no more elements in the stream,
-:meth:`~iterator.__next__` must raise the :exc:`StopIteration` exception.
-Iterators don't have to be finite, though; it's perfectly reasonable to write
-an iterator that produces an infinite stream of data.
+``next()`` that takes no arguments and always returns the next element of the
+stream. If there are no more elements in the stream, ``next()`` must raise the
+``StopIteration`` exception. Iterators don't have to be finite, though; it's
+perfectly reasonable to write an iterator that produces an infinite stream of
+data.
The built-in :func:`iter` function takes an arbitrary object and tries to return
an iterator that will return the object's contents or elements, raising
:exc:`TypeError` if the object doesn't support iteration. Several of Python's
built-in data types support iteration, the most common being lists and
-dictionaries. An object is called :term:`iterable` if you can get an iterator
-for it.
+dictionaries. An object is called an **iterable** object if you can get an
+iterator for it.
You can experiment with the iteration interface manually:
- >>> L = [1, 2, 3]
+ >>> L = [1,2,3]
>>> it = iter(L)
- >>> it #doctest: +ELLIPSIS
+ >>> print it
<...iterator object at ...>
- >>> it.__next__() # same as next(it)
+ >>> it.next()
1
- >>> next(it)
+ >>> it.next()
2
- >>> next(it)
+ >>> it.next()
3
- >>> next(it)
+ >>> it.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>>
Python expects iterable objects in several different contexts, the most
-important being the :keyword:`for` statement. In the statement ``for X in Y``,
-Y must be an iterator or some object for which :func:`iter` can create an
-iterator. These two statements are equivalent::
-
+important being the ``for`` statement. In the statement ``for X in Y``, Y must
+be an iterator or some object for which ``iter()`` can create an iterator.
+These two statements are equivalent::
for i in iter(obj):
- print(i)
+ print i
for i in obj:
- print(i)
+ print i
Iterators can be materialized as lists or tuples by using the :func:`list` or
:func:`tuple` constructor functions:
- >>> L = [1, 2, 3]
+ >>> L = [1,2,3]
>>> iterator = iter(L)
>>> t = tuple(iterator)
>>> t
@@ -238,26 +234,26 @@ Iterators can be materialized as lists or tuples by using the :func:`list` or
Sequence unpacking also supports iterators: if you know an iterator will return
N elements, you can unpack them into an N-tuple:
- >>> L = [1, 2, 3]
+ >>> L = [1,2,3]
>>> iterator = iter(L)
- >>> a, b, c = iterator
- >>> a, b, c
+ >>> a,b,c = iterator
+ >>> a,b,c
(1, 2, 3)
Built-in functions such as :func:`max` and :func:`min` can take a single
iterator argument and will return the largest or smallest element. The ``"in"``
and ``"not in"`` operators also support iterators: ``X in iterator`` is true if
X is found in the stream returned by the iterator. You'll run into obvious
-problems if the iterator is infinite; :func:`max`, :func:`min`
+problems if the iterator is infinite; ``max()`` and ``min()``
will never return, and if the element X never appears in the stream, the
``"in"`` and ``"not in"`` operators won't return either.
Note that you can only go forward in an iterator; there's no way to get the
previous element, reset the iterator, or make a copy of it. Iterator objects
can optionally provide these additional capabilities, but the iterator protocol
-only specifies the :meth:`~iterator.__next__` method. Functions may therefore
-consume all of the iterator's output, and if you need to do something different
-with the same stream, you'll have to create a new iterator.
+only specifies the ``next()`` method. Functions may therefore consume all of
+the iterator's output, and if you need to do something different with the same
+stream, you'll have to create a new iterator.
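A quick sketch of this forward-only behaviour (shown with Python 3's ``next()`` builtin; in 2.x you would call ``it.next()``):

```python
L = [1, 2, 3]
it = iter(L)

first_pass = list(it)    # consumes the entire iterator
second_pass = list(it)   # already exhausted, so nothing is left
print(first_pass)        # [1, 2, 3]
print(second_pass)       # []

# The only way to start over is to create a fresh iterator.
it = iter(L)
print(next(it))          # 1
```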
@@ -269,45 +265,47 @@ sequence type, such as strings, will automatically support creation of an
iterator.
Calling :func:`iter` on a dictionary returns an iterator that will loop over the
-dictionary's keys::
+dictionary's keys:
+
+.. not a doctest since dict ordering varies across Pythons
+
+::
>>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
... 'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
>>> for key in m:
- ... print(key, m[key])
- Jan 1
- Feb 2
+ ... print key, m[key]
Mar 3
+ Feb 2
+ Aug 8
+ Sep 9
Apr 4
- May 5
Jun 6
Jul 7
- Aug 8
- Sep 9
- Oct 10
+ Jan 1
+ May 5
Nov 11
Dec 12
+ Oct 10
-Note that starting with Python 3.7, dictionary iteration order is guaranteed
-to be the same as the insertion order. In earlier versions, the behaviour was
-unspecified and could vary between implementations.
+Note that the order is essentially random, because it's based on the hash
+ordering of the objects in the dictionary.
-Applying :func:`iter` to a dictionary always loops over the keys, but
-dictionaries have methods that return other iterators. If you want to iterate
-over values or key/value pairs, you can explicitly call the
-:meth:`~dict.values` or :meth:`~dict.items` methods to get an appropriate
-iterator.
+Applying ``iter()`` to a dictionary always loops over the keys, but dictionaries
+have methods that return other iterators. If you want to iterate over keys,
+values, or key/value pairs, you can explicitly call the ``iterkeys()``,
+``itervalues()``, or ``iteritems()`` methods to get an appropriate iterator.
The :func:`dict` constructor can accept an iterator that returns a finite stream
of ``(key, value)`` tuples:
>>> L = [('Italy', 'Rome'), ('France', 'Paris'), ('US', 'Washington DC')]
>>> dict(iter(L))
- {'Italy': 'Rome', 'France': 'Paris', 'US': 'Washington DC'}
+ {'Italy': 'Rome', 'US': 'Washington DC', 'France': 'Paris'}
-Files also support iteration by calling the :meth:`~io.TextIOBase.readline`
-method until there are no more lines in the file. This means you can read each
-line of a file like this::
+Files also support iteration by calling the ``readline()`` method until there
+are no more lines in the file. This means you can read each line of a file like
+this::
for line in file:
# do something for each line
@@ -316,9 +314,9 @@ line of a file like this::
Sets can take their contents from an iterable and let you iterate over the set's
elements::
- S = {2, 3, 5, 7, 11, 13}
+ S = set((2, 3, 5, 7, 11, 13))
for i in S:
- print(i)
+ print i
@@ -410,9 +408,12 @@ clauses, the length of the resulting output will be equal to the product of the
lengths of all the sequences. If you have two lists of length 3, the output
list is 9 elements long:
+.. doctest::
+ :options: +NORMALIZE_WHITESPACE
+
>>> seq1 = 'abc'
- >>> seq2 = (1, 2, 3)
- >>> [(x, y) for x in seq1 for y in seq2] #doctest: +NORMALIZE_WHITESPACE
+ >>> seq2 = (1,2,3)
+ >>> [(x,y) for x in seq1 for y in seq2]
[('a', 1), ('a', 2), ('a', 3),
('b', 1), ('b', 2), ('b', 3),
('c', 1), ('c', 2), ('c', 3)]
@@ -422,9 +423,9 @@ creating a tuple, it must be surrounded with parentheses. The first list
comprehension below is a syntax error, while the second one is correct::
# Syntax error
- [x, y for x in seq1 for y in seq2]
+    [x,y for x in seq1 for y in seq2]
# Correct
- [(x, y) for x in seq1 for y in seq2]
+    [(x,y) for x in seq1 for y in seq2]
Generators
@@ -445,13 +446,15 @@ is what generators provide; they can be thought of as resumable functions.
Here's the simplest example of a generator function:
- >>> def generate_ints(N):
- ... for i in range(N):
- ... yield i
+.. testcode::
+
+ def generate_ints(N):
+ for i in range(N):
+ yield i
-Any function containing a :keyword:`yield` keyword is a generator function;
-this is detected by Python's :term:`bytecode` compiler which compiles the
-function specially as a result.
+Any function containing a ``yield`` keyword is a generator function; this is
+detected by Python's :term:`bytecode` compiler which compiles the function
+specially as a result.
When you call a generator function, it doesn't return a single value; instead it
returns a generator object that supports the iterator protocol. On executing
@@ -459,44 +462,44 @@ the ``yield`` expression, the generator outputs the value of ``i``, similar to a
``return`` statement. The big difference between ``yield`` and a ``return``
statement is that on reaching a ``yield`` the generator's state of execution is
suspended and local variables are preserved. On the next call to the
-generator's :meth:`~generator.__next__` method, the function will resume
-executing.
+generator's ``.next()`` method, the function will resume executing.
Here's a sample usage of the ``generate_ints()`` generator:
>>> gen = generate_ints(3)
- >>> gen #doctest: +ELLIPSIS
+ >>> gen
<generator object generate_ints at ...>
- >>> next(gen)
+ >>> gen.next()
0
- >>> next(gen)
+ >>> gen.next()
1
- >>> next(gen)
+ >>> gen.next()
2
- >>> next(gen)
+ >>> gen.next()
Traceback (most recent call last):
File "stdin", line 1, in <module>
File "stdin", line 2, in generate_ints
StopIteration
-You could equally write ``for i in generate_ints(5)``, or ``a, b, c =
+You could equally write ``for i in generate_ints(5)``, or ``a,b,c =
generate_ints(3)``.
-Inside a generator function, ``return value`` causes ``StopIteration(value)``
-to be raised from the :meth:`~generator.__next__` method. Once this happens, or
-the bottom of the function is reached, the procession of values ends and the
-generator cannot yield any further values.
+Inside a generator function, the ``return`` statement can only be used without a
+value, and signals the end of the procession of values; after executing a
+``return`` the generator cannot return any further values. ``return`` with a
+value, such as ``return 5``, is a syntax error inside a generator function. The
+end of the generator's results can also be indicated by raising
+``StopIteration`` manually, or by just letting the flow of execution fall off
+the bottom of the function.
You could achieve the effect of generators manually by writing your own class
and storing all the local variables of the generator as instance variables. For
example, returning a list of integers could be done by setting ``self.count`` to
-0, and having the :meth:`~iterator.__next__` method increment ``self.count`` and
-return it.
+0, and having the ``next()`` method increment ``self.count`` and return it.
However, for a moderately complicated generator, writing a corresponding class
can be much messier.
-The test suite included with Python's library,
-:source:`Lib/test/test_generators.py`, contains
+The test suite included with Python's library, ``test_generators.py``, contains
a number of more interesting examples. Here's one generator that implements an
in-order traversal of a tree using generators recursively. ::
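The two spellings in this hunk are the Python 3 form (``next(gen)``) and the Python 2 form (``gen.next()``); the protocol is otherwise identical. A minimal Python 3 sketch of the generator behaviour described above:

```python
def generate_ints(N):
    # Any function containing 'yield' is a generator function; calling
    # it returns a generator object without running the body.
    for i in range(N):
        yield i

gen = generate_ints(3)

# Each next() resumes the body until the next yield, preserving
# local variables between calls.
values = [next(gen), next(gen), next(gen)]

# A fourth call would raise StopIteration; next() with a default
# returns the default instead of raising.
sentinel = next(gen, 'exhausted')
```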
@@ -539,23 +542,23 @@ when you're doing something with the returned value, as in the above example.
The parentheses aren't always necessary, but it's easier to always add them
instead of having to remember when they're needed.
-(:pep:`342` explains the exact rules, which are that a ``yield``-expression must
+(PEP 342 explains the exact rules, which are that a ``yield``-expression must
always be parenthesized except when it occurs at the top-level expression on the
right-hand side of an assignment. This means you can write ``val = yield i``
but have to use parentheses when there's an operation, as in ``val = (yield i)
+ 12``.)
-Values are sent into a generator by calling its :meth:`send(value)
-<generator.send>` method. This method resumes the generator's code and the
-``yield`` expression returns the specified value. If the regular
-:meth:`~generator.__next__` method is called, the ``yield`` returns ``None``.
+Values are sent into a generator by calling its ``send(value)`` method. This
+method resumes the generator's code and the ``yield`` expression returns the
+specified value. If the regular ``next()`` method is called, the ``yield``
+returns ``None``.
Here's a simple counter that increments by 1 and allows changing the value of
the internal counter.
.. testcode::
- def counter(maximum):
+ def counter (maximum):
i = 0
while i < maximum:
val = (yield i)
@@ -567,40 +570,38 @@ the internal counter.
And here's an example of changing the counter:
- >>> it = counter(10) #doctest: +SKIP
- >>> next(it) #doctest: +SKIP
+ >>> it = counter(10)
+ >>> print it.next()
0
- >>> next(it) #doctest: +SKIP
+ >>> print it.next()
1
- >>> it.send(8) #doctest: +SKIP
+ >>> print it.send(8)
8
- >>> next(it) #doctest: +SKIP
+ >>> print it.next()
9
- >>> next(it) #doctest: +SKIP
+ >>> print it.next()
Traceback (most recent call last):
File "t.py", line 15, in <module>
- it.next()
+ print it.next()
StopIteration
Because ``yield`` will often be returning ``None``, you should always check for
this case. Don't just use its value in expressions unless you're sure that the
-:meth:`~generator.send` method will be the only method used to resume your
-generator function.
+``send()`` method will be the only method used to resume your generator
+function.
-In addition to :meth:`~generator.send`, there are two other methods on
-generators:
+In addition to ``send()``, there are two other new methods on generators:
-* :meth:`throw(type, value=None, traceback=None) <generator.throw>` is used to
- raise an exception inside the generator; the exception is raised by the
- ``yield`` expression where the generator's execution is paused.
+* ``throw(type, value=None, traceback=None)`` is used to raise an exception
+ inside the generator; the exception is raised by the ``yield`` expression
+ where the generator's execution is paused.
-* :meth:`~generator.close` raises a :exc:`GeneratorExit` exception inside the
- generator to terminate the iteration. On receiving this exception, the
- generator's code must either raise :exc:`GeneratorExit` or
- :exc:`StopIteration`; catching the exception and doing anything else is
- illegal and will trigger a :exc:`RuntimeError`. :meth:`~generator.close`
- will also be called by Python's garbage collector when the generator is
- garbage-collected.
+* ``close()`` raises a :exc:`GeneratorExit` exception inside the generator to
+ terminate the iteration. On receiving this exception, the generator's code
+ must either raise :exc:`GeneratorExit` or :exc:`StopIteration`; catching the
+ exception and doing anything else is illegal and will trigger a
+ :exc:`RuntimeError`. ``close()`` will also be called by Python's garbage
+ collector when the generator is garbage-collected.
If you need to run cleanup code when a :exc:`GeneratorExit` occurs, I suggest
using a ``try: ... finally:`` suite instead of catching :exc:`GeneratorExit`.
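The ``counter`` example from this hunk, runnable under Python 3 (where ``it.next()`` is spelled ``next(it)`` and ``print`` is a function):

```python
def counter(maximum):
    i = 0
    while i < maximum:
        val = (yield i)
        # If resumed via send(), jump to the sent value;
        # a plain next() sends None, so just count up.
        if val is not None:
            i = val
        else:
            i += 1

it = counter(10)
first = next(it)      # advance to the first yield: 0
second = next(it)     # val is None, so i becomes 1
jumped = it.send(8)   # the yield expression returns 8, so i becomes 8
after = next(it)      # 9
exhausted = next(it, None)  # i reaches 10; the loop ends
```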
@@ -619,47 +620,99 @@ Built-in functions
Let's look in more detail at built-in functions often used with iterators.
-Two of Python's built-in functions, :func:`map` and :func:`filter` duplicate the
-features of generator expressions:
+Two of Python's built-in functions, :func:`map` and :func:`filter`, are somewhat
+obsolete; they duplicate the features of list comprehensions but return actual
+lists instead of iterators.
-:func:`map(f, iterA, iterB, ...) <map>` returns an iterator over the sequence
- ``f(iterA[0], iterB[0]), f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``.
+``map(f, iterA, iterB, ...)`` returns a list containing ``f(iterA[0], iterB[0]),
+f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``.
>>> def upper(s):
... return s.upper()
- >>> list(map(upper, ['sentence', 'fragment']))
+ >>> map(upper, ['sentence', 'fragment'])
['SENTENCE', 'FRAGMENT']
+
>>> [upper(s) for s in ['sentence', 'fragment']]
['SENTENCE', 'FRAGMENT']
-You can of course achieve the same effect with a list comprehension.
+As shown above, you can achieve the same effect with a list comprehension. The
+:func:`itertools.imap` function does the same thing but can handle infinite
+iterators; it'll be discussed later, in the section on the :mod:`itertools` module.
-:func:`filter(predicate, iter) <filter>` returns an iterator over all the
-sequence elements that meet a certain condition, and is similarly duplicated by
-list comprehensions. A **predicate** is a function that returns the truth
-value of some condition; for use with :func:`filter`, the predicate must take a
-single value.
+``filter(predicate, iter)`` returns a list that contains all the sequence
+elements that meet a certain condition, and is similarly duplicated by list
+comprehensions. A **predicate** is a function that returns the truth value of
+some condition; for use with :func:`filter`, the predicate must take a single
+value.
>>> def is_even(x):
... return (x % 2) == 0
- >>> list(filter(is_even, range(10)))
+ >>> filter(is_even, range(10))
[0, 2, 4, 6, 8]
-
This can also be written as a list comprehension:
- >>> list(x for x in range(10) if is_even(x))
+ >>> [x for x in range(10) if is_even(x)]
[0, 2, 4, 6, 8]
+:func:`filter` also has a counterpart in the :mod:`itertools` module,
+:func:`itertools.ifilter`, that returns an iterator and can therefore handle
+infinite sequences just as :func:`itertools.imap` can.
-:func:`enumerate(iter, start=0) <enumerate>` counts off the elements in the
-iterable returning 2-tuples containing the count (from *start*) and
-each element. ::
+``reduce(func, iter, [initial_value])`` doesn't have a counterpart in the
+:mod:`itertools` module because it cumulatively performs an operation on all the
+iterable's elements and therefore can't be applied to infinite iterables.
+``func`` must be a function that takes two elements and returns a single value.
+:func:`reduce` takes the first two elements A and B returned by the iterator and
+calculates ``func(A, B)``. It then requests the third element, C, calculates
+``func(func(A, B), C)``, combines this result with the fourth element returned,
+and continues until the iterable is exhausted. If the iterable returns no
+values at all, a :exc:`TypeError` exception is raised. If the initial value is
+supplied, it's used as a starting point and ``func(initial_value, A)`` is the
+first calculation.
+
+ >>> import operator
+ >>> reduce(operator.concat, ['A', 'BB', 'C'])
+ 'ABBC'
+ >>> reduce(operator.concat, [])
+ Traceback (most recent call last):
+ ...
+ TypeError: reduce() of empty sequence with no initial value
+ >>> reduce(operator.mul, [1,2,3], 1)
+ 6
+ >>> reduce(operator.mul, [], 1)
+ 1
+
+If you use :func:`operator.add` with :func:`reduce`, you'll add up all the
+elements of the iterable. This case is so common that there's a special
+built-in called :func:`sum` to compute it:
+
+ >>> reduce(operator.add, [1,2,3,4], 0)
+ 10
+ >>> sum([1,2,3,4])
+ 10
+ >>> sum([])
+ 0
+
+For many uses of :func:`reduce`, though, it can be clearer to just write the
+obvious :keyword:`for` loop::
+
+ # Instead of:
+ product = reduce(operator.mul, [1,2,3], 1)
+
+ # You can write:
+ product = 1
+ for i in [1,2,3]:
+ product *= i
+
+
+``enumerate(iter)`` counts off the elements in the iterable, returning 2-tuples
+containing the count and each element.
>>> for item in enumerate(['subject', 'verb', 'object']):
- ... print(item)
+ ... print item
(0, 'subject')
(1, 'verb')
(2, 'object')
@@ -670,66 +723,127 @@ indexes at which certain conditions are met::
f = open('data.txt', 'r')
for i, line in enumerate(f):
if line.strip() == '':
- print('Blank line at line #%i' % i)
+ print 'Blank line at line #%i' % i
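The blank-line detector above, restated as a self-contained Python 3 sketch; the in-memory ``lines`` list stands in for the ``data.txt`` file in the diff, which is only illustrative:

```python
lines = ['first line\n', '\n', 'third line\n', '   \n']

# enumerate() pairs each element with a running count; Python also
# accepts a start argument, e.g. enumerate(lines, 1) for 1-based
# line numbers.
blank_at = [i for i, line in enumerate(lines) if line.strip() == '']
```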
-:func:`sorted(iterable, key=None, reverse=False) <sorted>` collects all the
+``sorted(iterable, [cmp=None], [key=None], [reverse=False])`` collects all the
elements of the iterable into a list, sorts the list, and returns the sorted
-result. The *key* and *reverse* arguments are passed through to the
-constructed list's :meth:`~list.sort` method. ::
+result. The ``cmp``, ``key``, and ``reverse`` arguments are passed through to
+the constructed list's ``.sort()`` method. ::
>>> import random
>>> # Generate 8 random numbers between [0, 10000)
>>> rand_list = random.sample(range(10000), 8)
- >>> rand_list #doctest: +SKIP
+ >>> rand_list
[769, 7953, 9828, 6431, 8442, 9878, 6213, 2207]
- >>> sorted(rand_list) #doctest: +SKIP
+ >>> sorted(rand_list)
[769, 2207, 6213, 6431, 7953, 8442, 9828, 9878]
- >>> sorted(rand_list, reverse=True) #doctest: +SKIP
+ >>> sorted(rand_list, reverse=True)
[9878, 9828, 8442, 7953, 6431, 6213, 2207, 769]
-(For a more detailed discussion of sorting, see the :ref:`sortinghowto`.)
+(For a more detailed discussion of sorting, see the Sorting mini-HOWTO in the
+Python wiki at https://wiki.python.org/moin/HowTo/Sorting.)
+The ``any(iter)`` and ``all(iter)`` built-ins look at the truth values of an
+iterable's contents. :func:`any` returns ``True`` if any element in the iterable is
+a true value, and :func:`all` returns ``True`` if all of the elements are true
+values:
-The :func:`any(iter) <any>` and :func:`all(iter) <all>` built-ins look at the
-truth values of an iterable's contents. :func:`any` returns ``True`` if any element
-in the iterable is a true value, and :func:`all` returns ``True`` if all of the
-elements are true values:
-
- >>> any([0, 1, 0])
+ >>> any([0,1,0])
True
- >>> any([0, 0, 0])
+ >>> any([0,0,0])
False
- >>> any([1, 1, 1])
+ >>> any([1,1,1])
True
- >>> all([0, 1, 0])
+ >>> all([0,1,0])
False
- >>> all([0, 0, 0])
+ >>> all([0,0,0])
False
- >>> all([1, 1, 1])
+ >>> all([1,1,1])
True
-:func:`zip(iterA, iterB, ...) <zip>` takes one element from each iterable and
-returns them in a tuple::
+Small functions and the lambda expression
+=========================================
- zip(['a', 'b', 'c'], (1, 2, 3)) =>
- ('a', 1), ('b', 2), ('c', 3)
+When writing functional-style programs, you'll often need little functions that
+act as predicates or that combine elements in some way.
-It doesn't construct an in-memory list and exhaust all the input iterators
-before returning; instead tuples are constructed and returned only if they're
-requested. (The technical term for this behaviour is `lazy evaluation
-<https://en.wikipedia.org/wiki/Lazy_evaluation>`__.)
+If there's a Python built-in or a module function that's suitable, you don't
+need to define a new function at all::
-This iterator is intended to be used with iterables that are all of the same
-length. If the iterables are of different lengths, the resulting stream will be
-the same length as the shortest iterable. ::
+ stripped_lines = [line.strip() for line in lines]
+ existing_files = filter(os.path.exists, file_list)
- zip(['a', 'b'], (1, 2, 3)) =>
- ('a', 1), ('b', 2)
+If the function you need doesn't exist, you need to write it. One way to write
+small functions is to use the ``lambda`` statement. ``lambda`` takes a number
+of parameters and an expression combining these parameters, and creates a small
+function that returns the value of the expression::
-You should avoid doing this, though, because an element may be taken from the
-longer iterators and discarded. This means you can't go on to use the iterators
-further because you risk skipping a discarded element.
+ lowercase = lambda x: x.lower()
+
+ print_assign = lambda name, value: name + '=' + str(value)
+
+ adder = lambda x, y: x+y
+
+An alternative is to just use the ``def`` statement and define a function in the
+usual way::
+
+ def lowercase(x):
+ return x.lower()
+
+ def print_assign(name, value):
+ return name + '=' + str(value)
+
+ def adder(x,y):
+ return x + y
+
+Which alternative is preferable? That's a style question; my usual course is to
+avoid using ``lambda``.
+
+One reason for my preference is that ``lambda`` is quite limited in the
+functions it can define. The result has to be computable as a single
+expression, which means you can't have multiway ``if... elif... else``
+comparisons or ``try... except`` statements. If you try to do too much in a
+``lambda`` statement, you'll end up with an overly complicated expression that's
+hard to read. Quick, what's the following code doing?
+
+::
+
+ total = reduce(lambda a, b: (0, a[1] + b[1]), items)[1]
+
+You can figure it out, but it takes time to disentangle the expression to figure
+out what's going on. Using a short nested ``def`` statements makes things a
+little bit better::
+
+ def combine (a, b):
+ return 0, a[1] + b[1]
+
+ total = reduce(combine, items)[1]
+
+But it would be best of all if I had simply used a ``for`` loop::
+
+ total = 0
+ for a, b in items:
+ total += b
+
+Or the :func:`sum` built-in and a generator expression::
+
+ total = sum(b for a,b in items)
+
+Many uses of :func:`reduce` are clearer when written as ``for`` loops.
+
+Fredrik Lundh once suggested the following set of rules for refactoring uses of
+``lambda``:
+
+1) Write a lambda function.
+2) Write a comment explaining what the heck that lambda does.
+3) Study the comment for a while, and think of a name that captures the essence
+ of the comment.
+4) Convert the lambda to a def statement, using that name.
+5) Remove the comment.
+
+I really like these rules, but you're free to disagree
+about whether this lambda-free style is better.
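Applying those refactoring rules to the hunk's ``reduce(lambda a, b: (0, a[1] + b[1]), items)[1]`` puzzle, in Python 3 where ``reduce`` lives in :mod:`functools`; the ``items`` data and the ``sum_second_fields`` name are illustrative:

```python
from functools import reduce

items = [('a', 1), ('b', 2), ('c', 3)]

# Rules 3-4: name what the lambda actually does, then convert it
# to a def statement.
def sum_second_fields(a, b):
    # Combine two pairs by adding their second fields; the first
    # field of the result is a throwaway placeholder.
    return (0, a[1] + b[1])

total_reduce = reduce(sum_second_fields, items)[1]

# The sum() + generator-expression form the text recommends instead.
total_sum = sum(b for a, b in items)
```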
The itertools module
@@ -749,46 +863,65 @@ The module's functions fall into a few broad classes:
Creating new iterators
----------------------
-:func:`itertools.count(start, step) <itertools.count>` returns an infinite
-stream of evenly spaced values. You can optionally supply the starting number,
-which defaults to 0, and the interval between numbers, which defaults to 1::
+``itertools.count(n)`` returns an infinite stream of integers, increasing by 1
+each time. You can optionally supply the starting number, which defaults to 0::
itertools.count() =>
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
itertools.count(10) =>
10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...
- itertools.count(10, 5) =>
- 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, ...
-:func:`itertools.cycle(iter) <itertools.cycle>` saves a copy of the contents of
-a provided iterable and returns a new iterator that returns its elements from
-first to last. The new iterator will repeat these elements infinitely. ::
+``itertools.cycle(iter)`` saves a copy of the contents of a provided iterable
+and returns a new iterator that returns its elements from first to last. The
+new iterator will repeat these elements infinitely. ::
- itertools.cycle([1, 2, 3, 4, 5]) =>
+ itertools.cycle([1,2,3,4,5]) =>
1, 2, 3, 4, 5, 1, 2, 3, 4, 5, ...
-:func:`itertools.repeat(elem, [n]) <itertools.repeat>` returns the provided
-element *n* times, or returns the element endlessly if *n* is not provided. ::
+``itertools.repeat(elem, [n])`` returns the provided element ``n`` times, or
+returns the element endlessly if ``n`` is not provided. ::
itertools.repeat('abc') =>
abc, abc, abc, abc, abc, abc, abc, abc, abc, abc, ...
itertools.repeat('abc', 5) =>
abc, abc, abc, abc, abc
-:func:`itertools.chain(iterA, iterB, ...) <itertools.chain>` takes an arbitrary
-number of iterables as input, and returns all the elements of the first
-iterator, then all the elements of the second, and so on, until all of the
-iterables have been exhausted. ::
+``itertools.chain(iterA, iterB, ...)`` takes an arbitrary number of iterables as
+input, and returns all the elements of the first iterator, then all the elements
+of the second, and so on, until all of the iterables have been exhausted. ::
itertools.chain(['a', 'b', 'c'], (1, 2, 3)) =>
a, b, c, 1, 2, 3
-:func:`itertools.islice(iter, [start], stop, [step]) <itertools.islice>` returns
-a stream that's a slice of the iterator. With a single *stop* argument, it
-will return the first *stop* elements. If you supply a starting index, you'll
-get *stop-start* elements, and if you supply a value for *step*, elements
-will be skipped accordingly. Unlike Python's string and list slicing, you can't
-use negative values for *start*, *stop*, or *step*. ::
+``itertools.izip(iterA, iterB, ...)`` takes one element from each iterable and
+returns them in a tuple::
+
+ itertools.izip(['a', 'b', 'c'], (1, 2, 3)) =>
+ ('a', 1), ('b', 2), ('c', 3)
+
+It's similar to the built-in :func:`zip` function, but doesn't construct an
+in-memory list and exhaust all the input iterators before returning; instead
+tuples are constructed and returned only if they're requested. (The technical
+term for this behaviour is `lazy evaluation
+<http://en.wikipedia.org/wiki/Lazy_evaluation>`__.)
+
+This iterator is intended to be used with iterables that are all of the same
+length. If the iterables are of different lengths, the resulting stream will be
+the same length as the shortest iterable. ::
+
+ itertools.izip(['a', 'b'], (1, 2, 3)) =>
+ ('a', 1), ('b', 2)
+
+You should avoid doing this, though, because an element may be taken from the
+longer iterators and discarded. This means you can't go on to use the iterators
+further because you risk skipping a discarded element.
+
+``itertools.islice(iter, [start], stop, [step])`` returns a stream that's a
+slice of the iterator. With a single ``stop`` argument, it will return the
+first ``stop`` elements. If you supply a starting index, you'll get
+``stop-start`` elements, and if you supply a value for ``step``, elements will
+be skipped accordingly. Unlike Python's string and list slicing, you can't use
+negative values for ``start``, ``stop``, or ``step``. ::
itertools.islice(range(10), 8) =>
0, 1, 2, 3, 4, 5, 6, 7
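The ``islice`` behaviours sketched above, exercised as running Python 3 code; ``islice`` is also the standard way to take a finite prefix of an infinite stream such as ``count()``:

```python
import itertools

# With a single stop argument: the first stop elements.
first_eight = list(itertools.islice(range(10), 8))

# With start and stop: stop - start elements.
middle = list(itertools.islice(range(10), 2, 8))

# With a step: elements are skipped accordingly.
stepped = list(itertools.islice(range(10), 2, 8, 2))

# count() never terminates, so islice bounds it safely.
prefix = list(itertools.islice(itertools.count(10), 5))
```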
@@ -797,10 +930,9 @@ use negative values for *start*, *stop*, or *step*. ::
itertools.islice(range(10), 2, 8, 2) =>
2, 4, 6
-:func:`itertools.tee(iter, [n]) <itertools.tee>` replicates an iterator; it
-returns *n* independent iterators that will all return the contents of the
-source iterator.
-If you don't supply a value for *n*, the default is 2. Replicating iterators
+``itertools.tee(iter, [n])`` replicates an iterator; it returns ``n``
+independent iterators that will all return the contents of the source iterator.
+If you don't supply a value for ``n``, the default is 2. Replicating iterators
requires saving some of the contents of the source iterator, so this can consume
significant memory if the iterator is large and one of the new iterators is
consumed more than the others. ::
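A short demonstration of the ``tee`` replication just described; both replicas see the full contents because ``tee`` buffers items one replica has consumed but the other has not:

```python
import itertools

source = iter([1, 2, 3])

# Default n=2: two independent iterators over the same source.
a, b = itertools.tee(source)

from_a = list(a)  # consumes the source; items are buffered for b
from_b = list(b)  # replays the buffered items
```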
@@ -818,21 +950,28 @@ consumed more than the others. ::
Calling functions on elements
-----------------------------
-The :mod:`operator` module contains a set of functions corresponding to Python's
-operators. Some examples are :func:`operator.add(a, b) <operator.add>` (adds
-two values), :func:`operator.ne(a, b) <operator.ne>` (same as ``a != b``), and
-:func:`operator.attrgetter('id') <operator.attrgetter>`
-(returns a callable that fetches the ``.id`` attribute).
+Two functions are used for calling other functions on the contents of an
+iterable.
+
+``itertools.imap(f, iterA, iterB, ...)`` returns a stream containing
+``f(iterA[0], iterB[0]), f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``::
+
+ itertools.imap(operator.add, [5, 6, 5], [1, 2, 3]) =>
+ 6, 8, 8
-:func:`itertools.starmap(func, iter) <itertools.starmap>` assumes that the
-iterable will return a stream of tuples, and calls *func* using these tuples as
-the arguments::
+The ``operator`` module contains a set of functions corresponding to Python's
+operators. Some examples are ``operator.add(a, b)`` (adds two values),
+``operator.ne(a, b)`` (same as ``a!=b``), and ``operator.attrgetter('id')``
+(returns a callable that fetches the ``"id"`` attribute).
+
+``itertools.starmap(func, iter)`` assumes that the iterable will return a stream
+of tuples, and calls ``f()`` using these tuples as the arguments::
itertools.starmap(os.path.join,
- [('/bin', 'python'), ('/usr', 'bin', 'java'),
- ('/usr', 'bin', 'perl'), ('/usr', 'bin', 'ruby')])
+ [('/usr', 'bin', 'java'), ('/bin', 'python'),
+ ('/usr', 'bin', 'perl'),('/usr', 'bin', 'ruby')])
=>
- /bin/python, /usr/bin/java, /usr/bin/perl, /usr/bin/ruby
+ /usr/bin/java, /bin/python, /usr/bin/perl, /usr/bin/ruby
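The same two operations in Python 3 spelling: ``itertools.imap`` no longer exists there because the built-in ``map`` is itself lazy. To keep the example platform-independent, ``operator.mul`` on number pairs stands in for the ``os.path.join`` call in the hunk:

```python
import itertools
import operator

# Python 3's map is lazy, like Python 2's itertools.imap;
# list() forces the stream.
sums = list(map(operator.add, [5, 6, 5], [1, 2, 3]))

# starmap unpacks each tuple as the function's argument list,
# i.e. func(*elem) for each element.
products = list(itertools.starmap(operator.mul, [(2, 3), (4, 5)]))
```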
Selecting elements
@@ -841,19 +980,29 @@ Selecting elements
Another group of functions chooses a subset of an iterator's elements based on a
predicate.
-:func:`itertools.filterfalse(predicate, iter) <itertools.filterfalse>` is the
-opposite of :func:`filter`, returning all elements for which the predicate
-returns false::
+``itertools.ifilter(predicate, iter)`` returns all the elements for which the
+predicate returns true::
+
+ def is_even(x):
+ return (x % 2) == 0
+
+ itertools.ifilter(is_even, itertools.count()) =>
+ 0, 2, 4, 6, 8, 10, 12, 14, ...
+
+``itertools.ifilterfalse(predicate, iter)`` is the opposite, returning all
+elements for which the predicate returns false::
- itertools.filterfalse(is_even, itertools.count()) =>
+ itertools.ifilterfalse(is_even, itertools.count()) =>
1, 3, 5, 7, 9, 11, 13, 15, ...
-:func:`itertools.takewhile(predicate, iter) <itertools.takewhile>` returns
-elements for as long as the predicate returns true. Once the predicate returns
-false, the iterator will signal the end of its results. ::
+``itertools.takewhile(predicate, iter)`` returns elements for as long as the
+predicate returns true. Once the predicate returns false, the iterator will
+signal the end of its results.
+
+::
def less_than_10(x):
- return x < 10
+ return (x < 10)
itertools.takewhile(less_than_10, itertools.count()) =>
0, 1, 2, 3, 4, 5, 6, 7, 8, 9
@@ -861,9 +1010,10 @@ false, the iterator will signal the end of its results. ::
itertools.takewhile(is_even, itertools.count()) =>
0
-:func:`itertools.dropwhile(predicate, iter) <itertools.dropwhile>` discards
-elements while the predicate returns true, and then returns the rest of the
-iterable's results. ::
+``itertools.dropwhile(predicate, iter)`` discards elements while the predicate
+returns true, and then returns the rest of the iterable's results.
+
+::
itertools.dropwhile(less_than_10, itertools.count()) =>
10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...
@@ -871,89 +1021,18 @@ iterable's results. ::
itertools.dropwhile(is_even, itertools.count()) =>
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
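The selection functions above, runnable under Python 3, where ``ifilter``/``ifilterfalse`` became the built-in ``filter`` and ``itertools.filterfalse``; ``islice`` bounds the infinite ``count()`` streams so the lists terminate:

```python
import itertools

def is_even(x):
    return x % 2 == 0

def less_than_10(x):
    return x < 10

# filterfalse keeps elements where the predicate is false.
odds = list(itertools.islice(
    itertools.filterfalse(is_even, itertools.count()), 4))

# takewhile stops at the first false predicate, so this is finite.
taken = list(itertools.takewhile(less_than_10, itertools.count()))

# dropwhile skips the true prefix, then yields everything after.
dropped = list(itertools.islice(
    itertools.dropwhile(less_than_10, itertools.count()), 3))
```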
-:func:`itertools.compress(data, selectors) <itertools.compress>` takes two
-iterators and returns only those elements of *data* for which the corresponding
-element of *selectors* is true, stopping whenever either one is exhausted::
-
- itertools.compress([1, 2, 3, 4, 5], [True, True, False, False, True]) =>
- 1, 2, 5
-
-
-Combinatoric functions
-----------------------
-
-The :func:`itertools.combinations(iterable, r) <itertools.combinations>`
-returns an iterator giving all possible *r*-tuple combinations of the
-elements contained in *iterable*. ::
-
- itertools.combinations([1, 2, 3, 4, 5], 2) =>
- (1, 2), (1, 3), (1, 4), (1, 5),
- (2, 3), (2, 4), (2, 5),
- (3, 4), (3, 5),
- (4, 5)
-
- itertools.combinations([1, 2, 3, 4, 5], 3) =>
- (1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4), (1, 3, 5), (1, 4, 5),
- (2, 3, 4), (2, 3, 5), (2, 4, 5),
- (3, 4, 5)
-
-The elements within each tuple remain in the same order as
-*iterable* returned them. For example, the number 1 is always before
-2, 3, 4, or 5 in the examples above. A similar function,
-:func:`itertools.permutations(iterable, r=None) <itertools.permutations>`,
-removes this constraint on the order, returning all possible
-arrangements of length *r*::
-
- itertools.permutations([1, 2, 3, 4, 5], 2) =>
- (1, 2), (1, 3), (1, 4), (1, 5),
- (2, 1), (2, 3), (2, 4), (2, 5),
- (3, 1), (3, 2), (3, 4), (3, 5),
- (4, 1), (4, 2), (4, 3), (4, 5),
- (5, 1), (5, 2), (5, 3), (5, 4)
-
- itertools.permutations([1, 2, 3, 4, 5]) =>
- (1, 2, 3, 4, 5), (1, 2, 3, 5, 4), (1, 2, 4, 3, 5),
- ...
- (5, 4, 3, 2, 1)
-
-If you don't supply a value for *r* the length of the iterable is used,
-meaning that all the elements are permuted.
-
-Note that these functions produce all of the possible combinations by
-position and don't require that the contents of *iterable* are unique::
-
- itertools.permutations('aba', 3) =>
- ('a', 'b', 'a'), ('a', 'a', 'b'), ('b', 'a', 'a'),
- ('b', 'a', 'a'), ('a', 'a', 'b'), ('a', 'b', 'a')
-
-The identical tuple ``('a', 'a', 'b')`` occurs twice, but the two 'a'
-strings came from different positions.
-
-The :func:`itertools.combinations_with_replacement(iterable, r) <itertools.combinations_with_replacement>`
-function relaxes a different constraint: elements can be repeated
-within a single tuple. Conceptually an element is selected for the
-first position of each tuple and then is replaced before the second
-element is selected. ::
-
- itertools.combinations_with_replacement([1, 2, 3, 4, 5], 2) =>
- (1, 1), (1, 2), (1, 3), (1, 4), (1, 5),
- (2, 2), (2, 3), (2, 4), (2, 5),
- (3, 3), (3, 4), (3, 5),
- (4, 4), (4, 5),
- (5, 5)
-
Grouping elements
-----------------
-The last function I'll discuss, :func:`itertools.groupby(iter, key_func=None)
-<itertools.groupby>`, is the most complicated. ``key_func(elem)`` is a function
-that can compute a key value for each element returned by the iterable. If you
-don't supply a key function, the key is simply each element itself.
+The last function I'll discuss, ``itertools.groupby(iter, key_func=None)``, is
+the most complicated. ``key_func(elem)`` is a function that can compute a key
+value for each element returned by the iterable. If you don't supply a key
+function, the key is simply each element itself.
-:func:`~itertools.groupby` collects all the consecutive elements from the
-underlying iterable that have the same key value, and returns a stream of
-2-tuples containing a key value and an iterator for the elements with that key.
+``groupby()`` collects all the consecutive elements from the underlying iterable
+that have the same key value, and returns a stream of 2-tuples containing a key
+value and an iterator for the elements with that key.
::
@@ -963,8 +1042,8 @@ underlying iterable that have the same key value, and returns a stream of
...
]
- def get_state(city_state):
- return city_state[1]
+ def get_state ((city, state)):
+ return state
itertools.groupby(city_list, get_state) =>
('AL', iterator-1),
@@ -979,9 +1058,9 @@ underlying iterable that have the same key value, and returns a stream of
iterator-3 =>
('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ')
-:func:`~itertools.groupby` assumes that the underlying iterable's contents will
-already be sorted based on the key. Note that the returned iterators also use
-the underlying iterable, so you have to consume the results of iterator-1 before
+``groupby()`` assumes that the underlying iterable's contents will already be
+sorted based on the key. Note that the returned iterators also use the
+underlying iterable, so you have to consume the results of iterator-1 before
requesting iterator-2 and its corresponding key.
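A runnable Python 3 version of the ``groupby`` example; the city list is illustrative (the diff elides most of it with ``...``), and the key function indexes the pair because Python 3 removed the tuple-unpacking ``def get_state((city, state))`` parameter syntax shown in the hunk:

```python
import itertools

city_list = [('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL'),
             ('Anchorage', 'AK'), ('Nome', 'AK'),
             ('Flagstaff', 'AZ')]

def get_state(city_state):
    return city_state[1]

# groupby only groups *consecutive* equal keys, so the input must
# already be sorted by the key. Each group iterator is consumed
# here before advancing to the next key.
grouped = [(state, [city for city, _ in group])
           for state, group in itertools.groupby(city_list, get_state)]
```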
@@ -999,82 +1078,22 @@ Consider a Python function ``f(a, b, c)``; you may wish to create a new function
``g(b, c)`` that's equivalent to ``f(1, b, c)``; you're filling in a value for
one of ``f()``'s parameters. This is called "partial function application".
-The constructor for :func:`~functools.partial` takes the arguments
-``(function, arg1, arg2, ..., kwarg1=value1, kwarg2=value2)``. The resulting
-object is callable, so you can just call it to invoke ``function`` with the
-filled-in arguments.
+The constructor for ``partial`` takes the arguments ``(function, arg1, arg2,
+... kwarg1=value1, kwarg2=value2)``. The resulting object is callable, so you
+can just call it to invoke ``function`` with the filled-in arguments.
Here's a small but realistic example::
import functools
- def log(message, subsystem):
- """Write the contents of 'message' to the specified subsystem."""
- print('%s: %s' % (subsystem, message))
+ def log (message, subsystem):
+ "Write the contents of 'message' to the specified subsystem."
+ print '%s: %s' % (subsystem, message)
...
server_log = functools.partial(log, subsystem='server')
server_log('Unable to open socket')
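The ``partial`` example above, made testable by collecting messages into a list instead of printing them (the ``messages`` list is an addition for that purpose; the hunk's version prints):

```python
import functools

messages = []

def log(message, subsystem):
    """Record 'message' tagged with the subsystem that produced it."""
    messages.append('%s: %s' % (subsystem, message))

# partial pre-fills the 'subsystem' keyword argument, producing a
# one-argument callable.
server_log = functools.partial(log, subsystem='server')
server_log('Unable to open socket')
```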
-:func:`functools.reduce(func, iter, [initial_value]) <functools.reduce>`
-cumulatively performs an operation on all the iterable's elements and,
-therefore, can't be applied to infinite iterables. *func* must be a function
-that takes two elements and returns a single value. :func:`functools.reduce`
-takes the first two elements A and B returned by the iterator and calculates
-``func(A, B)``. It then requests the third element, C, calculates
-``func(func(A, B), C)``, combines this result with the fourth element returned,
-and continues until the iterable is exhausted. If the iterable returns no
-values at all, a :exc:`TypeError` exception is raised. If the initial value is
-supplied, it's used as a starting point and ``func(initial_value, A)`` is the
-first calculation. ::
-
- >>> import operator, functools
- >>> functools.reduce(operator.concat, ['A', 'BB', 'C'])
- 'ABBC'
- >>> functools.reduce(operator.concat, [])
- Traceback (most recent call last):
- ...
- TypeError: reduce() of empty sequence with no initial value
- >>> functools.reduce(operator.mul, [1, 2, 3], 1)
- 6
- >>> functools.reduce(operator.mul, [], 1)
- 1
-
-If you use :func:`operator.add` with :func:`functools.reduce`, you'll add up all the
-elements of the iterable. This case is so common that there's a special
-built-in called :func:`sum` to compute it:
-
- >>> import functools, operator
- >>> functools.reduce(operator.add, [1, 2, 3, 4], 0)
- 10
- >>> sum([1, 2, 3, 4])
- 10
- >>> sum([])
- 0
-
-For many uses of :func:`functools.reduce`, though, it can be clearer to just
-write the obvious :keyword:`for` loop::
-
- import functools
- # Instead of:
- product = functools.reduce(operator.mul, [1, 2, 3], 1)
-
- # You can write:
- product = 1
- for i in [1, 2, 3]:
- product *= i
-
-A related function is :func:`itertools.accumulate(iterable, func=operator.add)
-<itertools.accumulate>`. It performs the same calculation, but instead of
-returning only the final result, :func:`accumulate` returns an iterator that
-also yields each partial result::
-
- itertools.accumulate([1, 2, 3, 4, 5]) =>
- 1, 3, 6, 10, 15
-
- itertools.accumulate([1, 2, 3, 4, 5], operator.mul) =>
- 1, 2, 6, 24, 120
-
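The relationship between ``reduce`` and ``accumulate`` described above can be checked interactively (a quick sketch; ``itertools.accumulate`` needs Python 3.2 or later):

```python
import functools
import itertools
import operator

# reduce() folds the whole iterable down to a single value...
total = functools.reduce(operator.add, [1, 2, 3, 4, 5])

# ...while accumulate() also yields every partial result along the way.
partials = list(itertools.accumulate([1, 2, 3, 4, 5]))
products = list(itertools.accumulate([1, 2, 3, 4, 5], operator.mul))

print(total)     # 15
print(partials)  # [1, 3, 6, 10, 15]
print(products)  # [1, 2, 6, 24, 120]
```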
The operator module
-------------------
@@ -1086,7 +1105,8 @@ that perform a single operation.
Some of the functions in this module are:
-* Math operations: ``add()``, ``sub()``, ``mul()``, ``floordiv()``, ``abs()``, ...
+* Math operations: ``add()``, ``sub()``, ``mul()``, ``div()``, ``floordiv()``,
+ ``abs()``, ...
* Logical operations: ``not_()``, ``truth()``.
* Bitwise operations: ``and_()``, ``or_()``, ``invert()``.
* Comparisons: ``eq()``, ``ne()``, ``lt()``, ``le()``, ``gt()``, and ``ge()``.
@@ -1095,85 +1115,6 @@ Some of the functions in this module are:
Consult the operator module's documentation for a complete list.
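A few of the listed functions in action (a minimal sketch):

```python
import operator

# Each function mirrors the corresponding operator expression.
total = operator.add(2, 3)              # same as 2 + 3
quotient = operator.floordiv(7, 2)      # same as 7 // 2
masked = operator.and_(0b1100, 0b1010)  # same as 0b1100 & 0b1010
is_less = operator.lt(1, 2)             # same as 1 < 2
print(total, quotient, masked, is_less)
```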
-Small functions and the lambda expression
-=========================================
-
-When writing functional-style programs, you'll often need little functions that
-act as predicates or that combine elements in some way.
-
-If there's a Python built-in or a module function that's suitable, you don't
-need to define a new function at all::
-
- stripped_lines = [line.strip() for line in lines]
- existing_files = filter(os.path.exists, file_list)
-
-If the function you need doesn't exist, you need to write it. One way to write
-small functions is to use the :keyword:`lambda` expression. ``lambda`` takes a
-number of parameters and an expression combining these parameters, and creates
-an anonymous function that returns the value of the expression::
-
- adder = lambda x, y: x+y
-
- print_assign = lambda name, value: name + '=' + str(value)
-
-An alternative is to just use the ``def`` statement and define a function in the
-usual way::
-
- def adder(x, y):
- return x + y
-
- def print_assign(name, value):
- return name + '=' + str(value)
-
-Which alternative is preferable? That's a style question; my usual course is to
-avoid using ``lambda``.
-
-One reason for my preference is that ``lambda`` is quite limited in the
-functions it can define. The result has to be computable as a single
-expression, which means you can't have multiway ``if... elif... else``
-comparisons or ``try... except`` statements. If you try to do too much in a
-``lambda`` expression, you'll end up with an overly complicated expression that's
-hard to read. Quick, what's the following code doing? ::
-
- import functools
- total = functools.reduce(lambda a, b: (0, a[1] + b[1]), items)[1]
-
-You can figure it out, but it takes time to disentangle the expression to figure
-out what's going on. Using a short nested ``def`` statement makes things a
-little bit better::
-
- import functools
- def combine(a, b):
- return 0, a[1] + b[1]
-
- total = functools.reduce(combine, items)[1]
-
-But it would be best of all if I had simply used a ``for`` loop::
-
- total = 0
- for a, b in items:
- total += b
-
-Or the :func:`sum` built-in and a generator expression::
-
- total = sum(b for a, b in items)
-
-Many uses of :func:`functools.reduce` are clearer when written as ``for`` loops.
-
-Fredrik Lundh once suggested the following set of rules for refactoring uses of
-``lambda``:
-
-1. Write a lambda function.
-2. Write a comment explaining what the heck that lambda does.
-3. Study the comment for a while, and think of a name that captures the essence
- of the comment.
-4. Convert the lambda to a def statement, using that name.
-5. Remove the comment.
-
-I really like these rules, but you're free to disagree
-about whether this lambda-free style is better.
-
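Applying those rules to the ``reduce`` example above might produce something like this (one possible refactoring, not from the original text):

```python
import functools

# Steps 1-3 of the rules: the lambda combined (key, value) pairs while
# keeping a running total of the values; step 4 names it with a def.
def sum_second_elements(a, b):
    """Combine two (key, value) pairs, keeping a running total of the values."""
    return (0, a[1] + b[1])

items = [('a', 1), ('b', 2), ('c', 3)]
total = functools.reduce(sum_second_elements, items)[1]
print(total)  # 6
```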
-
Revision History and Acknowledgements
=====================================
@@ -1229,6 +1170,7 @@ Text Processing".
Mertz also wrote a 3-part series of articles on functional programming
for IBM's DeveloperWorks site; see
+
`part 1 <https://www.ibm.com/developerworks/linux/library/l-prog/index.html>`__,
`part 2 <https://www.ibm.com/developerworks/linux/library/l-prog2/index.html>`__, and
`part 3 <https://www.ibm.com/developerworks/linux/library/l-prog3/index.html>`__,
@@ -1239,8 +1181,6 @@ Python documentation
Documentation for the :mod:`itertools` module.
-Documentation for the :mod:`functools` module.
-
Documentation for the :mod:`operator` module.
:pep:`289`: "Generator Expressions"
@@ -1250,6 +1190,52 @@ features in Python 2.5.
.. comment
+ Topics to place
+ -----------------------------
+
+ XXX os.walk()
+
+ XXX Need a large example.
+
+ But will an example add much? I'll post a first draft and see
+ what the comments say.
+
+.. comment
+
+ Original outline:
+ Introduction
+ Idea of FP
+ Programs built out of functions
+ Functions are strictly input-output, no internal state
+ Opposed to OO programming, where objects have state
+
+ Why FP?
+ Formal provability
+ Assignment is difficult to reason about
+ Not very relevant to Python
+ Modularity
+ Small functions that do one thing
+ Debuggability:
+ Easy to test due to lack of state
+ Easy to verify output from intermediate steps
+ Composability
+ You assemble a toolbox of functions that can be mixed
+
+ Tackling a problem
+ Need a significant example
+
+ Iterators
+ Generators
+ The itertools module
+ List comprehensions
+ Small functions and the lambda statement
+ Built-in functions
+ map
+ filter
+ reduce
+
+.. comment
+
Handy little function for printing part of an iterator -- used
while writing this document.
@@ -1259,4 +1245,6 @@ features in Python 2.5.
for elem in slice[:-1]:
sys.stdout.write(str(elem))
sys.stdout.write(', ')
- print(elem[-1])
+ print elem[-1]
+
+
diff --git a/Doc/howto/index.rst b/Doc/howto/index.rst
index 593341c..e4c95b1 100644
--- a/Doc/howto/index.rst
+++ b/Doc/howto/index.rst
@@ -17,6 +17,7 @@ Currently, the HOWTOs are:
cporting.rst
curses.rst
descriptor.rst
+ doanddont.rst
functional.rst
logging.rst
logging-cookbook.rst
@@ -25,8 +26,6 @@ Currently, the HOWTOs are:
sorting.rst
unicode.rst
urllib2.rst
+ webservers.rst
argparse.rst
- ipaddress.rst
- clinic.rst
- instrumentation.rst
diff --git a/Doc/howto/instrumentation.rst b/Doc/howto/instrumentation.rst
deleted file mode 100644
index 909deb5..0000000
--- a/Doc/howto/instrumentation.rst
+++ /dev/null
@@ -1,436 +0,0 @@
-.. highlight:: shell-session
-
-.. _instrumentation:
-
-===============================================
-Instrumenting CPython with DTrace and SystemTap
-===============================================
-
-:author: David Malcolm
-:author: Łukasz Langa
-
-DTrace and SystemTap are monitoring tools, each providing a way to inspect
-what the processes on a computer system are doing. They both use
-domain-specific languages allowing a user to write scripts which:
-
- - filter which processes are to be observed
- - gather data from the processes of interest
- - generate reports on the data
-
-As of Python 3.6, CPython can be built with embedded "markers", also
-known as "probes", that can be observed by a DTrace or SystemTap script,
-making it easier to monitor what the CPython processes on a system are
-doing.
-
-.. impl-detail::
-
- DTrace markers are implementation details of the CPython interpreter.
- No guarantees are made about probe compatibility between versions of
- CPython. DTrace scripts can stop working or work incorrectly without
- warning when changing CPython versions.
-
-
-Enabling the static markers
----------------------------
-
-macOS comes with built-in support for DTrace. On Linux, in order to
-build CPython with the embedded markers for SystemTap, the SystemTap
-development tools must be installed.
-
-On a Linux machine, this can be done via::
-
- $ yum install systemtap-sdt-devel
-
-or::
-
- $ sudo apt-get install systemtap-sdt-dev
-
-
-CPython must then be configured ``--with-dtrace``:
-
-.. code-block:: none
-
- checking for --with-dtrace... yes
-
-On macOS, you can list available DTrace probes by running a Python
-process in the background and listing all probes made available by the
-Python provider::
-
- $ python3.6 -q &
- $ sudo dtrace -l -P python$! # or: dtrace -l -m python3.6
-
- ID PROVIDER MODULE FUNCTION NAME
- 29564 python18035 python3.6 _PyEval_EvalFrameDefault function-entry
- 29565 python18035 python3.6 dtrace_function_entry function-entry
- 29566 python18035 python3.6 _PyEval_EvalFrameDefault function-return
- 29567 python18035 python3.6 dtrace_function_return function-return
- 29568 python18035 python3.6 collect gc-done
- 29569 python18035 python3.6 collect gc-start
- 29570 python18035 python3.6 _PyEval_EvalFrameDefault line
- 29571 python18035 python3.6 maybe_dtrace_line line
-
-On Linux, you can verify if the SystemTap static markers are present in
-the built binary by seeing if it contains a ".note.stapsdt" section.
-
-::
-
- $ readelf -S ./python | grep .note.stapsdt
- [30] .note.stapsdt NOTE 0000000000000000 00308d78
-
-If you've built Python as a shared library (with --enable-shared), you
-need to look instead within the shared library. For example::
-
- $ readelf -S libpython3.3dm.so.1.0 | grep .note.stapsdt
- [29] .note.stapsdt NOTE 0000000000000000 00365b68
-
-Sufficiently modern readelf can print the metadata::
-
- $ readelf -n ./python
-
- Displaying notes found at file offset 0x00000254 with length 0x00000020:
- Owner Data size Description
- GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag)
- OS: Linux, ABI: 2.6.32
-
- Displaying notes found at file offset 0x00000274 with length 0x00000024:
- Owner Data size Description
- GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
- Build ID: df924a2b08a7e89f6e11251d4602022977af2670
-
- Displaying notes found at file offset 0x002d6c30 with length 0x00000144:
- Owner Data size Description
- stapsdt 0x00000031 NT_STAPSDT (SystemTap probe descriptors)
- Provider: python
- Name: gc__start
- Location: 0x00000000004371c3, Base: 0x0000000000630ce2, Semaphore: 0x00000000008d6bf6
- Arguments: -4@%ebx
- stapsdt 0x00000030 NT_STAPSDT (SystemTap probe descriptors)
- Provider: python
- Name: gc__done
- Location: 0x00000000004374e1, Base: 0x0000000000630ce2, Semaphore: 0x00000000008d6bf8
- Arguments: -8@%rax
- stapsdt 0x00000045 NT_STAPSDT (SystemTap probe descriptors)
- Provider: python
- Name: function__entry
- Location: 0x000000000053db6c, Base: 0x0000000000630ce2, Semaphore: 0x00000000008d6be8
- Arguments: 8@%rbp 8@%r12 -4@%eax
- stapsdt 0x00000046 NT_STAPSDT (SystemTap probe descriptors)
- Provider: python
- Name: function__return
- Location: 0x000000000053dba8, Base: 0x0000000000630ce2, Semaphore: 0x00000000008d6bea
- Arguments: 8@%rbp 8@%r12 -4@%eax
-
-The above metadata contains information for SystemTap describing how it
-can patch strategically-placed machine code instructions to enable the
-tracing hooks used by a SystemTap script.
-
-
-Static DTrace probes
---------------------
-
-The following example DTrace script can be used to show the call/return
-hierarchy of a Python script, only tracing within the invocation of
-a function called "start". In other words, import-time function
-invocations are not going to be listed:
-
-.. code-block:: none
-
- self int indent;
-
- python$target:::function-entry
- /copyinstr(arg1) == "start"/
- {
- self->trace = 1;
- }
-
- python$target:::function-entry
- /self->trace/
- {
- printf("%d\t%*s:", timestamp, 15, probename);
- printf("%*s", self->indent, "");
- printf("%s:%s:%d\n", basename(copyinstr(arg0)), copyinstr(arg1), arg2);
- self->indent++;
- }
-
- python$target:::function-return
- /self->trace/
- {
- self->indent--;
- printf("%d\t%*s:", timestamp, 15, probename);
- printf("%*s", self->indent, "");
- printf("%s:%s:%d\n", basename(copyinstr(arg0)), copyinstr(arg1), arg2);
- }
-
- python$target:::function-return
- /copyinstr(arg1) == "start"/
- {
- self->trace = 0;
- }
-
-It can be invoked like this::
-
- $ sudo dtrace -q -s call_stack.d -c "python3.6 script.py"
-
-The output looks like this:
-
-.. code-block:: none
-
- 156641360502280 function-entry:call_stack.py:start:23
- 156641360518804 function-entry: call_stack.py:function_1:1
- 156641360532797 function-entry: call_stack.py:function_3:9
- 156641360546807 function-return: call_stack.py:function_3:10
- 156641360563367 function-return: call_stack.py:function_1:2
- 156641360578365 function-entry: call_stack.py:function_2:5
- 156641360591757 function-entry: call_stack.py:function_1:1
- 156641360605556 function-entry: call_stack.py:function_3:9
- 156641360617482 function-return: call_stack.py:function_3:10
- 156641360629814 function-return: call_stack.py:function_1:2
- 156641360642285 function-return: call_stack.py:function_2:6
- 156641360656770 function-entry: call_stack.py:function_3:9
- 156641360669707 function-return: call_stack.py:function_3:10
- 156641360687853 function-entry: call_stack.py:function_4:13
- 156641360700719 function-return: call_stack.py:function_4:14
- 156641360719640 function-entry: call_stack.py:function_5:18
- 156641360732567 function-return: call_stack.py:function_5:21
- 156641360747370 function-return:call_stack.py:start:28
-
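The howto does not show ``call_stack.py`` itself; a hypothetical script with the call structure implied by the trace output above might look like this (the function names and calling order are reconstructed from the trace, and the original line numbers are not reproduced):

```python
# call_stack.py -- hypothetical reconstruction matching the traced hierarchy.

def function_1():
    function_3()

def function_2():
    function_1()

def function_3():
    pass

def function_4():
    pass

def function_5():
    pass

def start():
    # Only calls made inside start() appear in the DTrace output above.
    function_1()
    function_2()
    function_3()
    function_4()
    function_5()

start()
```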
-
-Static SystemTap markers
-------------------------
-
-The low-level way to use the SystemTap integration is to use the static
-markers directly. This requires you to explicitly state the binary file
-containing them.
-
-For example, this SystemTap script can be used to show the call/return
-hierarchy of a Python script:
-
-.. code-block:: none
-
- probe process("python").mark("function__entry") {
- filename = user_string($arg1);
- funcname = user_string($arg2);
- lineno = $arg3;
-
- printf("%s => %s in %s:%d\\n",
- thread_indent(1), funcname, filename, lineno);
- }
-
- probe process("python").mark("function__return") {
- filename = user_string($arg1);
- funcname = user_string($arg2);
- lineno = $arg3;
-
- printf("%s <= %s in %s:%d\\n",
- thread_indent(-1), funcname, filename, lineno);
- }
-
-It can be invoked like this::
-
- $ stap \
- show-call-hierarchy.stp \
- -c "./python test.py"
-
-The output looks like this:
-
-.. code-block:: none
-
- 11408 python(8274): => __contains__ in Lib/_abcoll.py:362
- 11414 python(8274): => __getitem__ in Lib/os.py:425
- 11418 python(8274): => encode in Lib/os.py:490
- 11424 python(8274): <= encode in Lib/os.py:493
- 11428 python(8274): <= __getitem__ in Lib/os.py:426
- 11433 python(8274): <= __contains__ in Lib/_abcoll.py:366
-
-where the columns are:
-
- - time in microseconds since start of script
-
- - name of executable
-
- - PID of process
-
-and the remainder indicates the call/return hierarchy as the script executes.
-
-For a ``--enable-shared`` build of CPython, the markers are contained within the
-libpython shared library, and the probe's dotted path needs to reflect this. For
-example, this line from the above example:
-
-.. code-block:: none
-
- probe process("python").mark("function__entry") {
-
-should instead read:
-
-.. code-block:: none
-
- probe process("python").library("libpython3.6dm.so.1.0").mark("function__entry") {
-
-(assuming a debug build of CPython 3.6)
-
-
-Available static markers
-------------------------
-
-.. I'm reusing the "c:function" type for markers
-
-.. c:function:: function__entry(str filename, str funcname, int lineno)
-
- This marker indicates that execution of a Python function has begun.
- It is only triggered for pure-Python (bytecode) functions.
-
- The filename, function name, and line number are provided back to the
- tracing script as positional arguments, which must be accessed using
- ``$arg1``, ``$arg2``, ``$arg3``:
-
- * ``$arg1`` : ``(const char *)`` filename, accessible using ``user_string($arg1)``
-
- * ``$arg2`` : ``(const char *)`` function name, accessible using
- ``user_string($arg2)``
-
- * ``$arg3`` : ``int`` line number
-
-.. c:function:: function__return(str filename, str funcname, int lineno)
-
- This marker is the converse of :c:func:`function__entry`, and indicates that
- execution of a Python function has ended (either via ``return``, or via an
- exception). It is only triggered for pure-Python (bytecode) functions.
-
- The arguments are the same as for :c:func:`function__entry`
-
-.. c:function:: line(str filename, str funcname, int lineno)
-
- This marker indicates a Python line is about to be executed. It is
- the equivalent of line-by-line tracing with a Python profiler. It is
- not triggered within C functions.
-
- The arguments are the same as for :c:func:`function__entry`.
-
-.. c:function:: gc__start(int generation)
-
- Fires when the Python interpreter starts a garbage collection cycle.
- ``arg0`` is the generation to scan, like :func:`gc.collect()`.
-
-.. c:function:: gc__done(long collected)
-
- Fires when the Python interpreter finishes a garbage collection
- cycle. ``arg0`` is the number of collected objects.
-
-.. c:function:: import__find__load__start(str modulename)
-
- Fires before :mod:`importlib` attempts to find and load the module.
- ``arg0`` is the module name.
-
- .. versionadded:: 3.7
-
-.. c:function:: import__find__load__done(str modulename, int found)
-
- Fires after :mod:`importlib`'s find_and_load function is called.
- ``arg0`` is the module name, ``arg1`` indicates if module was
- successfully loaded.
-
- .. versionadded:: 3.7
-
-
-.. c:function:: audit(str event, void *tuple)
-
- Fires when :func:`sys.audit` or :c:func:`PySys_Audit` is called.
- ``arg0`` is the event name as C string, ``arg1`` is a :c:type:`PyObject`
- pointer to a tuple object.
-
- .. versionadded:: 3.8
-
-
-SystemTap Tapsets
------------------
-
-The higher-level way to use the SystemTap integration is to use a "tapset":
-SystemTap's equivalent of a library, which hides some of the lower-level
-details of the static markers.
-
-Here is a tapset file, based on a non-shared build of CPython:
-
-.. code-block:: none
-
- /*
- Provide a higher-level wrapping around the function__entry and
- function__return markers:
- \*/
- probe python.function.entry = process("python").mark("function__entry")
- {
- filename = user_string($arg1);
- funcname = user_string($arg2);
- lineno = $arg3;
- frameptr = $arg4
- }
- probe python.function.return = process("python").mark("function__return")
- {
- filename = user_string($arg1);
- funcname = user_string($arg2);
- lineno = $arg3;
- frameptr = $arg4
- }
-
-If this file is installed in SystemTap's tapset directory (e.g.
-``/usr/share/systemtap/tapset``), then these additional probepoints become
-available:
-
-.. c:function:: python.function.entry(str filename, str funcname, int lineno, frameptr)
-
- This probe point indicates that execution of a Python function has begun.
- It is only triggered for pure-Python (bytecode) functions.
-
-.. c:function:: python.function.return(str filename, str funcname, int lineno, frameptr)
-
-This probe point is the converse of :c:func:`python.function.entry`, and
- indicates that execution of a Python function has ended (either via
- ``return``, or via an exception). It is only triggered for pure-Python
- (bytecode) functions.
-
-
-Examples
---------
-This SystemTap script uses the tapset above to more cleanly implement the
-example given above of tracing the Python function-call hierarchy, without
-needing to directly name the static markers:
-
-.. code-block:: none
-
- probe python.function.entry
- {
- printf("%s => %s in %s:%d\n",
- thread_indent(1), funcname, filename, lineno);
- }
-
- probe python.function.return
- {
- printf("%s <= %s in %s:%d\n",
- thread_indent(-1), funcname, filename, lineno);
- }
-
-
-The following script uses the tapset above to provide a top-like view of all
-running CPython code, showing the top 20 most frequently-entered bytecode
-frames, each second, across the whole system:
-
-.. code-block:: none
-
- global fn_calls;
-
- probe python.function.entry
- {
- fn_calls[pid(), filename, funcname, lineno] += 1;
- }
-
- probe timer.ms(1000) {
- printf("\033[2J\033[1;1H") /* clear screen \*/
- printf("%6s %80s %6s %30s %6s\n",
- "PID", "FILENAME", "LINE", "FUNCTION", "CALLS")
- foreach ([pid, filename, funcname, lineno] in fn_calls- limit 20) {
- printf("%6d %80s %6d %30s %6d\n",
- pid, filename, lineno, funcname,
- fn_calls[pid, filename, funcname, lineno]);
- }
- delete fn_calls;
- }
-
diff --git a/Doc/howto/ipaddress.rst b/Doc/howto/ipaddress.rst
deleted file mode 100644
index 452e367..0000000
--- a/Doc/howto/ipaddress.rst
+++ /dev/null
@@ -1,340 +0,0 @@
-.. testsetup::
-
- import ipaddress
-
-.. _ipaddress-howto:
-
-***************************************
-An introduction to the ipaddress module
-***************************************
-
-:author: Peter Moody
-:author: Nick Coghlan
-
-.. topic:: Overview
-
- This document aims to provide a gentle introduction to the
- :mod:`ipaddress` module. It is aimed primarily at users that aren't
- already familiar with IP networking terminology, but may also be useful
- to network engineers wanting an overview of how :mod:`ipaddress`
- represents IP network addressing concepts.
-
-
-Creating Address/Network/Interface objects
-==========================================
-
-Since :mod:`ipaddress` is a module for inspecting and manipulating IP addresses,
-the first thing you'll want to do is create some objects. You can use
-:mod:`ipaddress` to create objects from strings and integers.
-
-
-A Note on IP Versions
----------------------
-
-For readers that aren't particularly familiar with IP addressing, it's
-important to know that the Internet Protocol is currently in the process
-of moving from version 4 of the protocol to version 6. This transition is
-occurring largely because version 4 of the protocol doesn't provide enough
-addresses to handle the needs of the whole world, especially given the
-increasing number of devices with direct connections to the internet.
-
-Explaining the details of the differences between the two versions of the
-protocol is beyond the scope of this introduction, but readers need to at
-least be aware that these two versions exist, and it will sometimes be
-necessary to force the use of one version or the other.
-
-
-IP Host Addresses
------------------
-
-Addresses, often referred to as "host addresses", are the most basic unit
-when working with IP addressing. The simplest way to create addresses is
-to use the :func:`ipaddress.ip_address` factory function, which automatically
-determines whether to create an IPv4 or IPv6 address based on the passed in
-value:
-
- >>> ipaddress.ip_address('192.0.2.1')
- IPv4Address('192.0.2.1')
- >>> ipaddress.ip_address('2001:DB8::1')
- IPv6Address('2001:db8::1')
-
-Addresses can also be created directly from integers. Values that will
-fit within 32 bits are assumed to be IPv4 addresses::
-
- >>> ipaddress.ip_address(3221225985)
- IPv4Address('192.0.2.1')
- >>> ipaddress.ip_address(42540766411282592856903984951653826561)
- IPv6Address('2001:db8::1')
-
-To force the use of IPv4 or IPv6 addresses, the relevant classes can be
-invoked directly. This is particularly useful to force creation of IPv6
-addresses for small integers::
-
- >>> ipaddress.ip_address(1)
- IPv4Address('0.0.0.1')
- >>> ipaddress.IPv4Address(1)
- IPv4Address('0.0.0.1')
- >>> ipaddress.IPv6Address(1)
- IPv6Address('::1')
-
-
-Defining Networks
------------------
-
-Host addresses are usually grouped together into IP networks, so
-:mod:`ipaddress` provides a way to create, inspect and manipulate network
-definitions. IP network objects are constructed from strings that define the
-range of host addresses that are part of that network. The simplest form
-for that information is a "network address/network prefix" pair, where the
-prefix defines the number of leading bits that are compared to determine
-whether or not an address is part of the network and the network address
-defines the expected value of those bits.
-
-As for addresses, a factory function is provided that determines the correct
-IP version automatically::
-
- >>> ipaddress.ip_network('192.0.2.0/24')
- IPv4Network('192.0.2.0/24')
- >>> ipaddress.ip_network('2001:db8::0/96')
- IPv6Network('2001:db8::/96')
-
-Network objects cannot have any host bits set. The practical effect of this
-is that ``192.0.2.1/24`` does not describe a network. Such definitions are
-referred to as interface objects since the ip-on-a-network notation is
-commonly used to describe network interfaces of a computer on a given network
-and are described further in the next section.
-
-By default, attempting to create a network object with host bits set will
-result in :exc:`ValueError` being raised. To request that the
-additional bits instead be coerced to zero, the flag ``strict=False`` can
-be passed to the constructor::
-
- >>> ipaddress.ip_network('192.0.2.1/24')
- Traceback (most recent call last):
- ...
- ValueError: 192.0.2.1/24 has host bits set
- >>> ipaddress.ip_network('192.0.2.1/24', strict=False)
- IPv4Network('192.0.2.0/24')
-
-While the string form offers significantly more flexibility, networks can
-also be defined with integers, just like host addresses. In this case, the
-network is considered to contain only the single address identified by the
-integer, so the network prefix includes the entire network address::
-
- >>> ipaddress.ip_network(3221225984)
- IPv4Network('192.0.2.0/32')
- >>> ipaddress.ip_network(42540766411282592856903984951653826560)
- IPv6Network('2001:db8::/128')
-
-As with addresses, creation of a particular kind of network can be forced
-by calling the class constructor directly instead of using the factory
-function.
-
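For example, a short sketch mirroring the address examples above:

```python
import ipaddress

# The class constructors pin the IP version instead of guessing it.
net4 = ipaddress.IPv4Network('192.0.2.0/24')
net6 = ipaddress.IPv6Network('2001:db8::/96')
print(net4.version, net4.num_addresses)  # 4 256
print(net6.version)                      # 6
```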
-
-Host Interfaces
----------------
-
-As mentioned just above, if you need to describe an address on a particular
-network, neither the address nor the network classes are sufficient.
-Notation like ``192.0.2.1/24`` is commonly used by network engineers and the
-people who write tools for firewalls and routers as shorthand for "the host
-``192.0.2.1`` on the network ``192.0.2.0/24``". Accordingly, :mod:`ipaddress`
-provides a set of hybrid classes that associate an address with a particular
-network. The interface for creation is identical to that for defining network
-objects, except that the address portion isn't constrained to being a network
-address.
-
- >>> ipaddress.ip_interface('192.0.2.1/24')
- IPv4Interface('192.0.2.1/24')
- >>> ipaddress.ip_interface('2001:db8::1/96')
- IPv6Interface('2001:db8::1/96')
-
-Integer inputs are accepted (as with networks), and use of a particular IP
-version can be forced by calling the relevant constructor directly.
-
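A brief sketch of both points:

```python
import ipaddress

# Forcing the version works for interfaces too, and integers are accepted.
iface4 = ipaddress.IPv4Interface('192.0.2.1/24')
iface6 = ipaddress.IPv6Interface(1)  # integer form, as in the address examples
print(iface4.network)  # 192.0.2.0/24
print(iface6.ip)       # ::1
```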
-
-Inspecting Address/Network/Interface Objects
-============================================
-
-You've gone to the trouble of creating an IPv(4|6)(Address|Network|Interface)
-object, so you probably want to get information about it. :mod:`ipaddress`
-tries to make doing this easy and intuitive.
-
-Extracting the IP version::
-
- >>> addr4 = ipaddress.ip_address('192.0.2.1')
- >>> addr6 = ipaddress.ip_address('2001:db8::1')
- >>> addr6.version
- 6
- >>> addr4.version
- 4
-
-Obtaining the network from an interface::
-
- >>> host4 = ipaddress.ip_interface('192.0.2.1/24')
- >>> host4.network
- IPv4Network('192.0.2.0/24')
- >>> host6 = ipaddress.ip_interface('2001:db8::1/96')
- >>> host6.network
- IPv6Network('2001:db8::/96')
-
-Finding out how many individual addresses are in a network::
-
- >>> net4 = ipaddress.ip_network('192.0.2.0/24')
- >>> net4.num_addresses
- 256
- >>> net6 = ipaddress.ip_network('2001:db8::0/96')
- >>> net6.num_addresses
- 4294967296
-
-Iterating through the "usable" addresses on a network::
-
- >>> net4 = ipaddress.ip_network('192.0.2.0/24')
- >>> for x in net4.hosts():
- ... print(x) # doctest: +ELLIPSIS
- 192.0.2.1
- 192.0.2.2
- 192.0.2.3
- 192.0.2.4
- ...
- 192.0.2.252
- 192.0.2.253
- 192.0.2.254
-
-
-Obtaining the netmask (i.e. set bits corresponding to the network prefix) or
-the hostmask (any bits that are not part of the netmask):
-
- >>> net4 = ipaddress.ip_network('192.0.2.0/24')
- >>> net4.netmask
- IPv4Address('255.255.255.0')
- >>> net4.hostmask
- IPv4Address('0.0.0.255')
- >>> net6 = ipaddress.ip_network('2001:db8::0/96')
- >>> net6.netmask
- IPv6Address('ffff:ffff:ffff:ffff:ffff:ffff::')
- >>> net6.hostmask
- IPv6Address('::ffff:ffff')
-
-
-Exploding or compressing the address::
-
- >>> addr6.exploded
- '2001:0db8:0000:0000:0000:0000:0000:0001'
- >>> addr6.compressed
- '2001:db8::1'
- >>> net6.exploded
- '2001:0db8:0000:0000:0000:0000:0000:0000/96'
- >>> net6.compressed
- '2001:db8::/96'
-
-While IPv4 doesn't support explosion or compression, the associated objects
-still provide the relevant properties so that version neutral code can
-easily ensure the most concise or most verbose form is used for IPv6
-addresses while still correctly handling IPv4 addresses.
-
-
-Networks as lists of Addresses
-==============================
-
-It's sometimes useful to treat networks as lists. This means it is possible
-to index them like this::
-
- >>> net4[1]
- IPv4Address('192.0.2.1')
- >>> net4[-1]
- IPv4Address('192.0.2.255')
- >>> net6[1]
- IPv6Address('2001:db8::1')
- >>> net6[-1]
- IPv6Address('2001:db8::ffff:ffff')
-
-
-It also means that network objects lend themselves to using the list
-membership test syntax like this::
-
- if address in network:
- # do something
-
-Containment testing is done efficiently based on the network prefix::
-
- >>> addr4 = ipaddress.ip_address('192.0.2.1')
- >>> addr4 in ipaddress.ip_network('192.0.2.0/24')
- True
- >>> addr4 in ipaddress.ip_network('192.0.3.0/24')
- False
-
-
-Comparisons
-===========
-
-:mod:`ipaddress` provides some simple, hopefully intuitive ways to compare
-objects, where it makes sense::
-
- >>> ipaddress.ip_address('192.0.2.1') < ipaddress.ip_address('192.0.2.2')
- True
-
-A :exc:`TypeError` exception is raised if you try to compare objects of
-different versions or different types.
-
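A quick sketch covering both the well-defined and the mixed-version comparisons:

```python
import ipaddress

addr_v4 = ipaddress.ip_address('192.0.2.1')
addr_v6 = ipaddress.ip_address('2001:db8::1')

# Same-version comparisons are well defined...
print(addr_v4 < ipaddress.ip_address('192.0.2.2'))  # True

# ...but mixing versions raises TypeError.
try:
    addr_v4 < addr_v6
except TypeError as exc:
    print('cannot compare:', exc)
```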
-
-Using IP Addresses with other modules
-=====================================
-
-Other modules that use IP addresses (such as :mod:`socket`) usually won't
-accept objects from this module directly. Instead, they must be coerced to
-an integer or string that the other module will accept::
-
- >>> addr4 = ipaddress.ip_address('192.0.2.1')
- >>> str(addr4)
- '192.0.2.1'
- >>> int(addr4)
- 3221225985
-
-
-Getting more detail when instance creation fails
-================================================
-
-When creating address/network/interface objects using the version-agnostic
-factory functions, any errors will be reported as :exc:`ValueError` with
-a generic error message that simply says the passed in value was not
-recognized as an object of that type. The lack of a specific error is
-because it's necessary to know whether the value is *supposed* to be IPv4
-or IPv6 in order to provide more detail on why it has been rejected.
-
-To support use cases where it is useful to have access to this additional
-detail, the individual class constructors actually raise the
-:exc:`ValueError` subclasses :exc:`ipaddress.AddressValueError` and
-:exc:`ipaddress.NetmaskValueError` to indicate exactly which part of
-the definition failed to parse correctly.
-
-The error messages are significantly more detailed when using the
-class constructors directly. For example::
-
- >>> ipaddress.ip_address("192.168.0.256")
- Traceback (most recent call last):
- ...
- ValueError: '192.168.0.256' does not appear to be an IPv4 or IPv6 address
- >>> ipaddress.IPv4Address("192.168.0.256")
- Traceback (most recent call last):
- ...
- ipaddress.AddressValueError: Octet 256 (> 255) not permitted in '192.168.0.256'
-
- >>> ipaddress.ip_network("192.168.0.1/64")
- Traceback (most recent call last):
- ...
- ValueError: '192.168.0.1/64' does not appear to be an IPv4 or IPv6 network
- >>> ipaddress.IPv4Network("192.168.0.1/64")
- Traceback (most recent call last):
- ...
- ipaddress.NetmaskValueError: '64' is not a valid netmask
-
-However, both of the module specific exceptions have :exc:`ValueError` as their
-parent class, so if you're not concerned with the particular type of error,
-you can still write code like the following::
-
- try:
- network = ipaddress.IPv4Network(address)
- except ValueError:
- print('address/netmask is invalid for IPv4:', address)
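A sketch combining both styles: catch the specific subclasses when you want to report which part of the definition failed (the helper name is invented for illustration):

```python
import ipaddress

def describe_failure(value):
    # Hypothetical helper: report which part of an IPv4 network spec is bad.
    try:
        return ipaddress.IPv4Network(value)
    except ipaddress.AddressValueError as exc:
        print('bad address part:', exc)
    except ipaddress.NetmaskValueError as exc:
        print('bad netmask part:', exc)

describe_failure('192.168.0.256/24')   # octet out of range
describe_failure('192.168.0.0/64')     # prefix too long for IPv4
```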
-
diff --git a/Doc/howto/logging-cookbook.rst b/Doc/howto/logging-cookbook.rst
index 17f4ff6..50ff76e 100644
--- a/Doc/howto/logging-cookbook.rst
+++ b/Doc/howto/logging-cookbook.rst
@@ -72,9 +72,7 @@ Here is the auxiliary module::
def some_function():
module_logger.info('received a call to "some_function"')
-The output looks like this:
-
-.. code-block:: none
+The output looks like this::
2005-03-23 23:47:11,663 - spam_application - INFO -
creating an instance of auxiliary_module.Auxiliary
@@ -101,7 +99,7 @@ Logging from multiple threads
-----------------------------
Logging from multiple threads requires no special effort. The following example
-shows logging from the main (initial) thread and another thread::
+shows logging from the main (initial) thread and another thread::
import logging
import threading
@@ -129,9 +127,7 @@ shows logging from the main (initial) thread and another thread::
if __name__ == '__main__':
main()
-When run, the script should print something like the following:
-
-.. code-block:: none
+When run, the script should print something like the following::
0 Thread-1 Hi from myfunc
3 MainThread Hello from main
@@ -186,7 +182,7 @@ previous simple module-based configuration example::
# 'application' code
logger.debug('debug message')
logger.info('info message')
- logger.warning('warn message')
+ logger.warn('warn message')
logger.error('error message')
logger.critical('critical message')
@@ -244,18 +240,14 @@ messages should not. Here's how you can achieve this::
logger2.warning('Jail zesty vixen who grabbed pay from quack.')
logger2.error('The five boxing wizards jump quickly.')
-When you run this, on the console you will see
-
-.. code-block:: none
+When you run this, on the console you will see ::
root : INFO Jackdaws love my big sphinx of quartz.
myapp.area1 : INFO How quickly daft jumping zebras vex.
myapp.area2 : WARNING Jail zesty vixen who grabbed pay from quack.
myapp.area2 : ERROR The five boxing wizards jump quickly.
-and in the file you will see something like
-
-.. code-block:: none
+and in the file you will see something like ::
10-22 22:19 root INFO Jackdaws love my big sphinx of quartz.
10-22 22:19 myapp.area1 DEBUG Quick zephyrs blow, vexing daft Jim.
@@ -295,7 +287,7 @@ Here is an example of a module using the logging configuration server::
while True:
logger.debug('debug message')
logger.info('info message')
- logger.warning('warn message')
+ logger.warn('warn message')
logger.error('error message')
logger.critical('critical message')
time.sleep(5)
@@ -326,81 +318,6 @@ configuration::
print('complete')
-Dealing with handlers that block
---------------------------------
-
-.. currentmodule:: logging.handlers
-
-Sometimes you have to get your logging handlers to do their work without
-blocking the thread you're logging from. This is common in Web applications,
-though of course it also occurs in other scenarios.
-
-A common culprit which demonstrates sluggish behaviour is the
-:class:`SMTPHandler`: sending emails can take a long time, for a
-number of reasons outside the developer's control (for example, a poorly
-performing mail or network infrastructure). But almost any network-based
-handler can block: Even a :class:`SocketHandler` operation may do a
-DNS query under the hood which is too slow (and this query can be deep in the
-socket library code, below the Python layer, and outside your control).
-
-One solution is to use a two-part approach. For the first part, attach only a
-:class:`QueueHandler` to those loggers which are accessed from
-performance-critical threads. They simply write to their queue, which can be
-sized to a large enough capacity or initialized with no upper bound to their
-size. The write to the queue will typically be accepted quickly, though you
-will probably need to catch the :exc:`queue.Full` exception as a precaution
-in your code. If you are a library developer who has performance-critical
-threads in their code, be sure to document this (together with a suggestion to
-attach only ``QueueHandlers`` to your loggers) for the benefit of other
-developers who will use your code.
-
-The second part of the solution is :class:`QueueListener`, which has been
-designed as the counterpart to :class:`QueueHandler`. A
-:class:`QueueListener` is very simple: it's passed a queue and some handlers,
-and it fires up an internal thread which listens to its queue for LogRecords
-sent from ``QueueHandlers`` (or any other source of ``LogRecords``, for that
-matter). The ``LogRecords`` are removed from the queue and passed to the
-handlers for processing.
-
-The advantage of having a separate :class:`QueueListener` class is that you
-can use the same instance to service multiple ``QueueHandlers``. This is more
-resource-friendly than, say, having threaded versions of the existing handler
-classes, which would eat up one thread per handler for no particular benefit.
-
-An example of using these two classes follows (imports omitted)::
-
- que = queue.Queue(-1) # no limit on size
- queue_handler = QueueHandler(que)
- handler = logging.StreamHandler()
- listener = QueueListener(que, handler)
- root = logging.getLogger()
- root.addHandler(queue_handler)
- formatter = logging.Formatter('%(threadName)s: %(message)s')
- handler.setFormatter(formatter)
- listener.start()
- # The log output will display the thread which generated
- # the event (the main thread) rather than the internal
- # thread which monitors the internal queue. This is what
- # you want to happen.
- root.warning('Look out!')
- listener.stop()
-
-which, when run, will produce:
-
-.. code-block:: none
-
- MainThread: Look out!
-
-.. versionchanged:: 3.5
- Prior to Python 3.5, the :class:`QueueListener` always passed every message
- received from the queue to every handler it was initialized with. (This was
- because it was assumed that level filtering was all done on the other side,
- where the queue is filled.) From 3.5 onwards, this behaviour can be changed
- by passing a keyword argument ``respect_handler_level=True`` to the
- listener's constructor. When this is done, the listener compares the level
- of each message with the handler's level, and only passes a message to a
- handler if it's appropriate to do so.
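A minimal sketch of the ``respect_handler_level`` behaviour described in the note above, assuming Python 3.5 or later (the handler names are illustrative):

```python
import logging
import logging.handlers
import queue

que = queue.Queue(-1)
logger = logging.getLogger('demo')
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(logging.handlers.QueueHandler(que))

# Two handlers with different levels; respect_handler_level=True makes
# the listener compare each record's level against each handler's level.
info_handler = logging.StreamHandler()
info_handler.setLevel(logging.INFO)
error_handler = logging.StreamHandler()
error_handler.setLevel(logging.ERROR)

listener = logging.handlers.QueueListener(
    que, info_handler, error_handler, respect_handler_level=True)
listener.start()
logger.debug('dropped by both handlers')   # below both handler levels
logger.error('reaches both handlers')
listener.stop()
```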
-
.. _network-logging:
Sending and receiving logging events across a network
@@ -434,17 +351,17 @@ the receiving end. A simple way of doing this is attaching a
logger2.warning('Jail zesty vixen who grabbed pay from quack.')
logger2.error('The five boxing wizards jump quickly.')
-At the receiving end, you can set up a receiver using the :mod:`socketserver`
+At the receiving end, you can set up a receiver using the :mod:`SocketServer`
module. Here is a basic working example::
import pickle
import logging
import logging.handlers
- import socketserver
+ import SocketServer
import struct
- class LogRecordStreamHandler(socketserver.StreamRequestHandler):
+ class LogRecordStreamHandler(SocketServer.StreamRequestHandler):
"""Handler for a streaming logging request.
This basically logs the record using whatever logging policy is
@@ -486,17 +403,17 @@ module. Here is a basic working example::
# cycles and network bandwidth!
logger.handle(record)
- class LogRecordSocketReceiver(socketserver.ThreadingTCPServer):
+ class LogRecordSocketReceiver(SocketServer.ThreadingTCPServer):
"""
Simple TCP socket-based logging receiver suitable for testing.
"""
- allow_reuse_address = True
+ allow_reuse_address = 1
def __init__(self, host='localhost',
port=logging.handlers.DEFAULT_TCP_LOGGING_PORT,
handler=LogRecordStreamHandler):
- socketserver.ThreadingTCPServer.__init__(self, (host, port), handler)
+ SocketServer.ThreadingTCPServer.__init__(self, (host, port), handler)
self.abort = 0
self.timeout = 1
self.logname = None
@@ -523,9 +440,7 @@ module. Here is a basic working example::
main()
First run the server, and then the client. On the client side, nothing is
-printed on the console; on the server side, you should see something like:
-
-.. code-block:: none
+printed on the console; on the server side, you should see something like::
About to start TCP server...
59 root INFO Jackdaws love my big sphinx of quartz.
@@ -546,6 +461,8 @@ serialization.
Adding contextual information to your logging output
----------------------------------------------------
+.. currentmodule:: logging
+
Sometimes you want logging output to contain contextual information in
addition to the parameters passed to the logging call. For example, in a
networked application, it may be desirable to log client-specific information
@@ -579,7 +496,7 @@ information. When you call one of the logging methods on an instance of
information in the delegated call. Here's a snippet from the code of
:class:`LoggerAdapter`::
- def debug(self, msg, /, *args, **kwargs):
+ def debug(self, msg, *args, **kwargs):
"""
Delegate a debug call to the underlying logger, after adding
contextual information from this adapter instance.
@@ -685,9 +602,7 @@ script::
lvlname = logging.getLevelName(lvl)
a2.log(lvl, 'A message at %s level with %d %s', lvlname, 2, 'parameters')
-which, when run, produces something like:
-
-.. code-block:: none
+which, when run, produces something like::
2010-09-06 22:38:15,292 a.b.c DEBUG IP: 123.231.231.123 User: fred A debug message
2010-09-06 22:38:15,300 a.b.c INFO IP: 192.168.0.1 User: sheila An info message with some parameters
@@ -721,267 +636,15 @@ existing processes to perform this function.)
includes a working socket receiver which can be used as a starting point for you
to adapt in your own applications.
-You could also write your own handler which uses the :class:`~multiprocessing.Lock`
-class from the :mod:`multiprocessing` module to serialize access to the
+If you are using a recent version of Python which includes the
+:mod:`multiprocessing` module, you could write your own handler which uses the
+:class:`~multiprocessing.Lock` class from this module to serialize access to the
file from your processes. The existing :class:`FileHandler` and subclasses do
not make use of :mod:`multiprocessing` at present, though they may do so in the
future. Note that at present, the :mod:`multiprocessing` module does not provide
working lock functionality on all platforms (see
https://bugs.python.org/issue3770).
-.. currentmodule:: logging.handlers
-
-Alternatively, you can use a ``Queue`` and a :class:`QueueHandler` to send
-all logging events to one of the processes in your multi-process application.
-The following example script demonstrates how you can do this; in the example
-a separate listener process listens for events sent by other processes and logs
-them according to its own logging configuration. Although the example only
-demonstrates one way of doing it (for example, you may want to use a listener
-thread rather than a separate listener process -- the implementation would be
-analogous) it does allow for completely different logging configurations for
-the listener and the other processes in your application, and can be used as
-the basis for code meeting your own specific requirements::
-
- # You'll need these imports in your own code
- import logging
- import logging.handlers
- import multiprocessing
-
- # Next two import lines for this demo only
- from random import choice, random
- import time
-
- #
- # Because you'll want to define the logging configurations for listener and workers, the
- # listener and worker process functions take a configurer parameter which is a callable
- # for configuring logging for that process. These functions are also passed the queue,
- # which they use for communication.
- #
- # In practice, you can configure the listener however you want, but note that in this
- # simple example, the listener does not apply level or filter logic to received records.
- # In practice, you would probably want to do this logic in the worker processes, to avoid
- # sending events which would be filtered out between processes.
- #
- # The size of the rotated files is made small so you can see the results easily.
- def listener_configurer():
- root = logging.getLogger()
- h = logging.handlers.RotatingFileHandler('mptest.log', 'a', 300, 10)
- f = logging.Formatter('%(asctime)s %(processName)-10s %(name)s %(levelname)-8s %(message)s')
- h.setFormatter(f)
- root.addHandler(h)
-
- # This is the listener process top-level loop: wait for logging events
- # (LogRecords) on the queue and handle them, quit when you get a None for a
- # LogRecord.
- def listener_process(queue, configurer):
- configurer()
- while True:
- try:
- record = queue.get()
- if record is None: # We send this as a sentinel to tell the listener to quit.
- break
- logger = logging.getLogger(record.name)
- logger.handle(record) # No level or filter logic applied - just do it!
- except Exception:
- import sys, traceback
- print('Whoops! Problem:', file=sys.stderr)
- traceback.print_exc(file=sys.stderr)
-
- # Arrays used for random selections in this demo
-
- LEVELS = [logging.DEBUG, logging.INFO, logging.WARNING,
- logging.ERROR, logging.CRITICAL]
-
- LOGGERS = ['a.b.c', 'd.e.f']
-
- MESSAGES = [
- 'Random message #1',
- 'Random message #2',
- 'Random message #3',
- ]
-
- # The worker configuration is done at the start of the worker process run.
- # Note that on Windows you can't rely on fork semantics, so each process
- # will run the logging configuration code when it starts.
- def worker_configurer(queue):
- h = logging.handlers.QueueHandler(queue) # Just the one handler needed
- root = logging.getLogger()
- root.addHandler(h)
- # send all messages, for demo; no other level or filter logic applied.
- root.setLevel(logging.DEBUG)
-
- # This is the worker process top-level loop, which just logs ten events with
- # random intervening delays before terminating.
- # The print messages are just so you know it's doing something!
- def worker_process(queue, configurer):
- configurer(queue)
- name = multiprocessing.current_process().name
- print('Worker started: %s' % name)
- for i in range(10):
- time.sleep(random())
- logger = logging.getLogger(choice(LOGGERS))
- level = choice(LEVELS)
- message = choice(MESSAGES)
- logger.log(level, message)
- print('Worker finished: %s' % name)
-
- # Here's where the demo gets orchestrated. Create the queue, create and start
- # the listener, create ten workers and start them, wait for them to finish,
- # then send a None to the queue to tell the listener to finish.
- def main():
- queue = multiprocessing.Queue(-1)
- listener = multiprocessing.Process(target=listener_process,
- args=(queue, listener_configurer))
- listener.start()
- workers = []
- for i in range(10):
- worker = multiprocessing.Process(target=worker_process,
- args=(queue, worker_configurer))
- workers.append(worker)
- worker.start()
- for w in workers:
- w.join()
- queue.put_nowait(None)
- listener.join()
-
- if __name__ == '__main__':
- main()
-
-A variant of the above script keeps the logging in the main process, in a
-separate thread::
-
- import logging
- import logging.config
- import logging.handlers
- from multiprocessing import Process, Queue
- import random
- import threading
- import time
-
- def logger_thread(q):
- while True:
- record = q.get()
- if record is None:
- break
- logger = logging.getLogger(record.name)
- logger.handle(record)
-
-
- def worker_process(q):
- qh = logging.handlers.QueueHandler(q)
- root = logging.getLogger()
- root.setLevel(logging.DEBUG)
- root.addHandler(qh)
- levels = [logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR,
- logging.CRITICAL]
- loggers = ['foo', 'foo.bar', 'foo.bar.baz',
- 'spam', 'spam.ham', 'spam.ham.eggs']
- for i in range(100):
- lvl = random.choice(levels)
- logger = logging.getLogger(random.choice(loggers))
- logger.log(lvl, 'Message no. %d', i)
-
- if __name__ == '__main__':
- q = Queue()
- d = {
- 'version': 1,
- 'formatters': {
- 'detailed': {
- 'class': 'logging.Formatter',
- 'format': '%(asctime)s %(name)-15s %(levelname)-8s %(processName)-10s %(message)s'
- }
- },
- 'handlers': {
- 'console': {
- 'class': 'logging.StreamHandler',
- 'level': 'INFO',
- },
- 'file': {
- 'class': 'logging.FileHandler',
- 'filename': 'mplog.log',
- 'mode': 'w',
- 'formatter': 'detailed',
- },
- 'foofile': {
- 'class': 'logging.FileHandler',
- 'filename': 'mplog-foo.log',
- 'mode': 'w',
- 'formatter': 'detailed',
- },
- 'errors': {
- 'class': 'logging.FileHandler',
- 'filename': 'mplog-errors.log',
- 'mode': 'w',
- 'level': 'ERROR',
- 'formatter': 'detailed',
- },
- },
- 'loggers': {
- 'foo': {
- 'handlers': ['foofile']
- }
- },
- 'root': {
- 'level': 'DEBUG',
- 'handlers': ['console', 'file', 'errors']
- },
- }
- workers = []
- for i in range(5):
- wp = Process(target=worker_process, name='worker %d' % (i + 1), args=(q,))
- workers.append(wp)
- wp.start()
- logging.config.dictConfig(d)
- lp = threading.Thread(target=logger_thread, args=(q,))
- lp.start()
- # At this point, the main process could do some useful work of its own
- # Once it's done that, it can wait for the workers to terminate...
- for wp in workers:
- wp.join()
- # And now tell the logging thread to finish up, too
- q.put(None)
- lp.join()
-
-This variant shows how you can e.g. apply configuration for particular loggers
-- e.g. the ``foo`` logger has a special handler which stores all events in the
-``foo`` subsystem in a file ``mplog-foo.log``. This will be used by the logging
-machinery in the main process (even though the logging events are generated in
-the worker processes) to direct the messages to the appropriate destinations.
-
-Using concurrent.futures.ProcessPoolExecutor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-If you want to use :class:`concurrent.futures.ProcessPoolExecutor` to start
-your worker processes, you need to create the queue slightly differently.
-Instead of
-
-.. code-block:: python
-
- queue = multiprocessing.Queue(-1)
-
-you should use
-
-.. code-block:: python
-
- queue = multiprocessing.Manager().Queue(-1) # also works with the examples above
-
-and you can then replace the worker creation from this::
-
- workers = []
- for i in range(10):
- worker = multiprocessing.Process(target=worker_process,
- args=(queue, worker_configurer))
- workers.append(worker)
- worker.start()
- for w in workers:
- w.join()
-
-to this (remembering to first import :mod:`concurrent.futures`)::
-
- with concurrent.futures.ProcessPoolExecutor(max_workers=10) as executor:
- for i in range(10):
- executor.submit(worker_process, queue, worker_configurer)
-
Using file rotation
-------------------
@@ -1022,9 +685,7 @@ logging package provides a :class:`~handlers.RotatingFileHandler`::
print(filename)
The result should be 6 separate files, each with part of the log history for the
-application:
-
-.. code-block:: none
+application::
logging_rotatingfile_example.out
logging_rotatingfile_example.out.1
@@ -1041,329 +702,6 @@ and each time it reaches the size limit it is renamed with the suffix
Obviously this example sets the log length much too small as an extreme
example. You would want to set *maxBytes* to an appropriate value.
-.. _format-styles:
-
-Use of alternative formatting styles
-------------------------------------
-
-When logging was added to the Python standard library, the only way of
-formatting messages with variable content was to use the %-formatting
-method. Since then, Python has gained two new formatting approaches:
-:class:`string.Template` (added in Python 2.4) and :meth:`str.format`
-(added in Python 2.6).
-
-Logging (as of 3.2) provides improved support for these two additional
-formatting styles. The :class:`Formatter` class has been enhanced to take an
-additional, optional keyword parameter named ``style``. This defaults to
-``'%'``, but other possible values are ``'{'`` and ``'$'``, which correspond
-to the other two formatting styles. Backwards compatibility is maintained by
-default (as you would expect), but by explicitly specifying a style parameter,
-you get the ability to specify format strings which work with
-:meth:`str.format` or :class:`string.Template`. Here's an example console
-session to show the possibilities:
-
-.. code-block:: pycon
-
- >>> import logging
- >>> root = logging.getLogger()
- >>> root.setLevel(logging.DEBUG)
- >>> handler = logging.StreamHandler()
- >>> bf = logging.Formatter('{asctime} {name} {levelname:8s} {message}',
- ... style='{')
- >>> handler.setFormatter(bf)
- >>> root.addHandler(handler)
- >>> logger = logging.getLogger('foo.bar')
- >>> logger.debug('This is a DEBUG message')
- 2010-10-28 15:11:55,341 foo.bar DEBUG This is a DEBUG message
- >>> logger.critical('This is a CRITICAL message')
- 2010-10-28 15:12:11,526 foo.bar CRITICAL This is a CRITICAL message
- >>> df = logging.Formatter('$asctime $name ${levelname} $message',
- ... style='$')
- >>> handler.setFormatter(df)
- >>> logger.debug('This is a DEBUG message')
- 2010-10-28 15:13:06,924 foo.bar DEBUG This is a DEBUG message
- >>> logger.critical('This is a CRITICAL message')
- 2010-10-28 15:13:11,494 foo.bar CRITICAL This is a CRITICAL message
- >>>
-
-Note that the formatting of logging messages for final output to logs is
-completely independent of how an individual logging message is constructed.
-That can still use %-formatting, as shown here::
-
- >>> logger.error('This is an%s %s %s', 'other,', 'ERROR,', 'message')
- 2010-10-28 15:19:29,833 foo.bar ERROR This is another, ERROR, message
- >>>
-
-Logging calls (``logger.debug()``, ``logger.info()`` etc.) only take
-positional parameters for the actual logging message itself, with keyword
-parameters used only for determining options for how to handle the actual
-logging call (e.g. the ``exc_info`` keyword parameter to indicate that
-traceback information should be logged, or the ``extra`` keyword parameter
-to indicate additional contextual information to be added to the log). So
-you cannot directly make logging calls using :meth:`str.format` or
-:class:`string.Template` syntax, because internally the logging package
-uses %-formatting to merge the format string and the variable arguments.
-There would be no changing this while preserving backward compatibility, since
-all logging calls which are out there in existing code will be using %-format
-strings.
-
-There is, however, a way that you can use {}- and $- formatting to construct
-your individual log messages. Recall that for a message you can use an
-arbitrary object as a message format string, and that the logging package will
-call ``str()`` on that object to get the actual format string. Consider the
-following two classes::
-
- class BraceMessage:
- def __init__(self, fmt, /, *args, **kwargs):
- self.fmt = fmt
- self.args = args
- self.kwargs = kwargs
-
- def __str__(self):
- return self.fmt.format(*self.args, **self.kwargs)
-
- class DollarMessage:
- def __init__(self, fmt, /, **kwargs):
- self.fmt = fmt
- self.kwargs = kwargs
-
- def __str__(self):
- from string import Template
- return Template(self.fmt).substitute(**self.kwargs)
-
-Either of these can be used in place of a format string, to allow {}- or
-$-formatting to be used to build the actual "message" part which appears in the
-formatted log output in place of "%(message)s" or "{message}" or "$message".
-It's a little unwieldy to use the class names whenever you want to log
-something, but it's quite palatable if you use an alias such as __ (double
-underscore --- not to be confused with _, the single underscore used as a
-synonym/alias for :func:`gettext.gettext` or its brethren).
-
-The above classes are not included in Python, though they're easy enough to
-copy and paste into your own code. They can be used as follows (assuming that
-they're declared in a module called ``wherever``):
-
-.. code-block:: pycon
-
- >>> from wherever import BraceMessage as __
- >>> print(__('Message with {0} {name}', 2, name='placeholders'))
- Message with 2 placeholders
- >>> class Point: pass
- ...
- >>> p = Point()
- >>> p.x = 0.5
- >>> p.y = 0.5
- >>> print(__('Message with coordinates: ({point.x:.2f}, {point.y:.2f})',
- ... point=p))
- Message with coordinates: (0.50, 0.50)
- >>> from wherever import DollarMessage as __
- >>> print(__('Message with $num $what', num=2, what='placeholders'))
- Message with 2 placeholders
- >>>
-
-While the above examples use ``print()`` to show how the formatting works, you
-would of course use ``logger.debug()`` or similar to actually log using this
-approach.
-
-One thing to note is that you pay no significant performance penalty with this
-approach: the actual formatting happens not when you make the logging call, but
-when (and if) the logged message is actually about to be output to a log by a
-handler. So the only slightly unusual thing which might trip you up is that the
-parentheses go around the format string and the arguments, not just the format
-string. That's because the __ notation is just syntax sugar for a constructor
-call to one of the XXXMessage classes.
-
-If you prefer, you can use a :class:`LoggerAdapter` to achieve a similar effect
-to the above, as in the following example::
-
- import logging
-
- class Message:
- def __init__(self, fmt, args):
- self.fmt = fmt
- self.args = args
-
- def __str__(self):
- return self.fmt.format(*self.args)
-
- class StyleAdapter(logging.LoggerAdapter):
- def __init__(self, logger, extra=None):
- super(StyleAdapter, self).__init__(logger, extra or {})
-
- def log(self, level, msg, /, *args, **kwargs):
- if self.isEnabledFor(level):
- msg, kwargs = self.process(msg, kwargs)
- self.logger._log(level, Message(msg, args), (), **kwargs)
-
- logger = StyleAdapter(logging.getLogger(__name__))
-
- def main():
- logger.debug('Hello, {}', 'world!')
-
- if __name__ == '__main__':
- logging.basicConfig(level=logging.DEBUG)
- main()
-
-The above script should log the message ``Hello, world!`` when run with
-Python 3.2 or later.
-
-
-.. currentmodule:: logging
-
-.. _custom-logrecord:
-
-Customizing ``LogRecord``
--------------------------
-
-Every logging event is represented by a :class:`LogRecord` instance.
-When an event is logged and not filtered out by a logger's level, a
-:class:`LogRecord` is created, populated with information about the event and
-then passed to the handlers for that logger (and its ancestors, up to and
-including the logger where further propagation up the hierarchy is disabled).
-Before Python 3.2, there were only two places where this creation was done:
-
-* :meth:`Logger.makeRecord`, which is called in the normal process of
- logging an event. This invoked :class:`LogRecord` directly to create an
- instance.
-* :func:`makeLogRecord`, which is called with a dictionary containing
- attributes to be added to the LogRecord. This is typically invoked when a
- suitable dictionary has been received over the network (e.g. in pickle form
- via a :class:`~handlers.SocketHandler`, or in JSON form via an
- :class:`~handlers.HTTPHandler`).
-
-This has usually meant that if you need to do anything special with a
-:class:`LogRecord`, you've had to do one of the following.
-
-* Create your own :class:`Logger` subclass, which overrides
- :meth:`Logger.makeRecord`, and set it using :func:`~logging.setLoggerClass`
- before any loggers that you care about are instantiated.
-* Add a :class:`Filter` to a logger or handler, which does the
- necessary special manipulation you need when its
- :meth:`~Filter.filter` method is called.
-
-The first approach would be a little unwieldy in the scenario where (say)
-several different libraries wanted to do different things. Each would attempt
-to set its own :class:`Logger` subclass, and the one which did this last would
-win.
-
-The second approach works reasonably well for many cases, but does not allow
-you to e.g. use a specialized subclass of :class:`LogRecord`. Library
-developers can set a suitable filter on their loggers, but they would have to
-remember to do this every time they introduced a new logger (which they would
-do simply by adding new packages or modules and doing ::
-
- logger = logging.getLogger(__name__)
-
-at module level). It's probably one too many things to think about. Developers
-could also add the filter to a :class:`~logging.NullHandler` attached to their
-top-level logger, but this would not be invoked if an application developer
-attached a handler to a lower-level library logger --- so output from that
-handler would not reflect the intentions of the library developer.
-
-In Python 3.2 and later, :class:`~logging.LogRecord` creation is done through a
-factory, which you can specify. The factory is just a callable you can set with
-:func:`~logging.setLogRecordFactory`, and interrogate with
-:func:`~logging.getLogRecordFactory`. The factory is invoked with the same
-signature as the :class:`~logging.LogRecord` constructor, as :class:`LogRecord`
-is the default setting for the factory.
-
-This approach allows a custom factory to control all aspects of LogRecord
-creation. For example, you could return a subclass, or just add some additional
-attributes to the record once created, using a pattern similar to this::
-
- old_factory = logging.getLogRecordFactory()
-
- def record_factory(*args, **kwargs):
- record = old_factory(*args, **kwargs)
- record.custom_attribute = 0xdecafbad
- return record
-
- logging.setLogRecordFactory(record_factory)
-
-This pattern allows different libraries to chain factories together, and as
-long as they don't overwrite each other's attributes or unintentionally
-overwrite the attributes provided as standard, there should be no surprises.
-However, it should be borne in mind that each link in the chain adds run-time
-overhead to all logging operations, and the technique should only be used when
-the use of a :class:`Filter` does not provide the desired result.
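A sketch of two such factories chained together (the attribute names are invented for illustration):

```python
import logging

# The first "library" wraps the currently installed factory...
old_factory = logging.getLogRecordFactory()

def factory_a(*args, **kwargs):
    record = old_factory(*args, **kwargs)
    record.lib_a = 'A'          # hypothetical attribute
    return record

logging.setLogRecordFactory(factory_a)

# ...and a second one wraps whatever is installed by then, so both run.
prev_factory = logging.getLogRecordFactory()

def factory_b(*args, **kwargs):
    record = prev_factory(*args, **kwargs)
    record.lib_b = 'B'          # hypothetical attribute
    return record

logging.setLogRecordFactory(factory_b)

# Invoke the installed factory with the LogRecord constructor signature.
record = logging.getLogRecordFactory()(
    'name', logging.INFO, 'path', 1, 'msg', (), None)
print(record.lib_a, record.lib_b)  # → A B
```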
-
-
-.. _zeromq-handlers:
-
-Subclassing QueueHandler - a ZeroMQ example
--------------------------------------------
-
-You can use a :class:`QueueHandler` subclass to send messages to other kinds
-of queues, for example a ZeroMQ 'publish' socket. In the example below, the
-socket is created separately and passed to the handler (as its 'queue')::
-
- import zmq # using pyzmq, the Python binding for ZeroMQ
- import json # for serializing records portably
-
- ctx = zmq.Context()
- sock = zmq.Socket(ctx, zmq.PUB) # or zmq.PUSH, or other suitable value
- sock.bind('tcp://*:5556') # or wherever
-
- class ZeroMQSocketHandler(QueueHandler):
- def enqueue(self, record):
- self.queue.send_json(record.__dict__)
-
-
- handler = ZeroMQSocketHandler(sock)
-
-
-Of course there are other ways of organizing this, for example passing in the
-data needed by the handler to create the socket::
-
- class ZeroMQSocketHandler(QueueHandler):
- def __init__(self, uri, socktype=zmq.PUB, ctx=None):
- self.ctx = ctx or zmq.Context()
- socket = zmq.Socket(self.ctx, socktype)
- socket.bind(uri)
- super().__init__(socket)
-
- def enqueue(self, record):
- self.queue.send_json(record.__dict__)
-
- def close(self):
- self.queue.close()
-
-
-Subclassing QueueListener - a ZeroMQ example
---------------------------------------------
-
-You can also subclass :class:`QueueListener` to get messages from other kinds
-of queues, for example a ZeroMQ 'subscribe' socket. Here's an example::
-
- class ZeroMQSocketListener(QueueListener):
- def __init__(self, uri, /, *handlers, **kwargs):
- self.ctx = kwargs.get('ctx') or zmq.Context()
- socket = zmq.Socket(self.ctx, zmq.SUB)
- socket.setsockopt_string(zmq.SUBSCRIBE, '') # subscribe to everything
- socket.connect(uri)
- super().__init__(socket, *handlers, **kwargs)
-
- def dequeue(self):
- msg = self.queue.recv_json()
- return logging.makeLogRecord(msg)
-
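The same ``enqueue``/``dequeue`` override points can be exercised without ZeroMQ at all. This stdlib-only sketch pairs a :class:`QueueHandler` with a :class:`QueueListener` over an in-memory queue, which may help when testing subclasses like the ones above (the logger name and the collecting handler are illustrative):

```python
import logging
import logging.handlers
import queue

# In-memory stand-in for the socket: the same enqueue/dequeue hooks the
# ZeroMQ subclasses override are exercised here with a queue.Queue.
q = queue.Queue()

class ListHandler(logging.Handler):
    """Collects formatted messages in a list, for demonstration."""
    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())

collector = ListHandler()
listener = logging.handlers.QueueListener(q, collector)
listener.start()

logger = logging.getLogger('queue-demo')
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.handlers.QueueHandler(q))
logger.info('hello via queue')

listener.stop()  # flushes pending records before returning
```

The ZeroMQ subclasses shown above simply replace the ``queue.Queue`` with a socket and override ``enqueue``/``dequeue`` to serialize and deserialize records.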
-
-.. seealso::
-
- Module :mod:`logging`
- API reference for the logging module.
-
- Module :mod:`logging.config`
- Configuration API for the logging module.
-
- Module :mod:`logging.handlers`
- Useful handlers included with the logging module.
-
- :ref:`A basic logging tutorial <logging-basic-tutorial>`
-
- :ref:`A more advanced logging tutorial <logging-advanced-tutorial>`
-
-
An example dictionary-based configuration
-----------------------------------------
@@ -1427,280 +765,24 @@ For more information about this configuration, you can see the `relevant
section <https://docs.djangoproject.com/en/1.9/topics/logging/#configuring-logging>`_
of the Django documentation.
-.. _cookbook-rotator-namer:
-
-Using a rotator and namer to customize log rotation processing
---------------------------------------------------------------
-
-An example of how you can define a namer and rotator is given in the following
-snippet, which shows zlib-based compression of the log file::
-
- def namer(name):
- return name + ".gz"
-
- def rotator(source, dest):
- with open(source, "rb") as sf:
- data = sf.read()
- compressed = zlib.compress(data, 9)
- with open(dest, "wb") as df:
- df.write(compressed)
- os.remove(source)
-
- rh = logging.handlers.RotatingFileHandler(...)
- rh.rotator = rotator
- rh.namer = namer
-
-These are not "true" .gz files, as they are bare compressed data, with no
-"container" such as you’d find in an actual gzip file. This snippet is just
-for illustration purposes.
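If true gzip containers are wanted, the :mod:`gzip` module can be used instead of :mod:`zlib`. A sketch of that variant (the file path and rotation parameters are illustrative):

```python
import gzip
import logging.handlers
import os
import shutil
import tempfile

def namer(name):
    return name + ".gz"

def rotator(source, dest):
    # Stream the closed log file through gzip so the result is a real
    # .gz archive, then remove the uncompressed original.
    with open(source, "rb") as sf, gzip.open(dest, "wb") as df:
        shutil.copyfileobj(sf, df)
    os.remove(source)

logfile = os.path.join(tempfile.mkdtemp(), 'app.log')  # illustrative path
rh = logging.handlers.RotatingFileHandler(logfile, maxBytes=1024,
                                          backupCount=3)
rh.rotator = rotator
rh.namer = namer
```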
-
-A more elaborate multiprocessing example
-----------------------------------------
-
-The following working example shows how logging can be used with multiprocessing
-using configuration files. The configurations are fairly simple, but serve to
-illustrate how more complex ones could be implemented in a real multiprocessing
-scenario.
-
-In the example, the main process spawns a listener process and some worker
-processes. The main process, the listener and the workers each have their own
-configuration (the workers all share the same one). We can see how the main
-process logs, how the workers log via a QueueHandler, and how the listener
-implements a QueueListener with a more complex logging configuration,
-dispatching events received via the queue to the handlers specified in that
-configuration. These configurations are purely illustrative, but you should
-be able to adapt the example to your own scenario.
-
-Here's the script - the docstrings and the comments hopefully explain how it
-works::
-
- import logging
- import logging.config
- import logging.handlers
- from multiprocessing import Process, Queue, Event, current_process
- import os
- import random
- import time
-
- class MyHandler:
- """
- A simple handler for logging events. It runs in the listener process and
- dispatches events to loggers based on the name in the received record,
- which then get dispatched, by the logging system, to the handlers
- configured for those loggers.
- """
-
- def handle(self, record):
- if record.name == "root":
- logger = logging.getLogger()
- else:
- logger = logging.getLogger(record.name)
-
- if logger.isEnabledFor(record.levelno):
- # The process name is transformed just to show that it's the listener
- # doing the logging to files and console
- record.processName = '%s (for %s)' % (current_process().name, record.processName)
- logger.handle(record)
-
- def listener_process(q, stop_event, config):
- """
- This could be done in the main process, but is just done in a separate
- process for illustrative purposes.
-
- This initialises logging according to the specified configuration,
- starts the listener and waits for the main process to signal completion
- via the event. The listener is then stopped, and the process exits.
- """
- logging.config.dictConfig(config)
- listener = logging.handlers.QueueListener(q, MyHandler())
- listener.start()
- if os.name == 'posix':
- # On POSIX, the setup logger will have been configured in the
- # parent process, but should have been disabled following the
- # dictConfig call.
- # On Windows, since fork isn't used, the setup logger won't
- # exist in the child, so it would be created and the message
- # would appear - hence the "if posix" clause.
- logger = logging.getLogger('setup')
- logger.critical('Should not appear, because of disabled logger ...')
- stop_event.wait()
- listener.stop()
-
- def worker_process(config):
- """
- A number of these are spawned for the purpose of illustration. In
- practice, they could be a heterogeneous bunch of processes rather than
- ones which are identical to each other.
-
- This initialises logging according to the specified configuration,
- and logs a hundred messages with random levels to randomly selected
- loggers.
-
- A small sleep is added to allow other processes a chance to run. This
- is not strictly needed, but it mixes the output from the different
- processes a bit more than if it's left out.
- """
- logging.config.dictConfig(config)
- levels = [logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR,
- logging.CRITICAL]
- loggers = ['foo', 'foo.bar', 'foo.bar.baz',
- 'spam', 'spam.ham', 'spam.ham.eggs']
- if os.name == 'posix':
- # On POSIX, the setup logger will have been configured in the
- # parent process, but should have been disabled following the
- # dictConfig call.
- # On Windows, since fork isn't used, the setup logger won't
- # exist in the child, so it would be created and the message
- # would appear - hence the "if posix" clause.
- logger = logging.getLogger('setup')
- logger.critical('Should not appear, because of disabled logger ...')
- for i in range(100):
- lvl = random.choice(levels)
- logger = logging.getLogger(random.choice(loggers))
- logger.log(lvl, 'Message no. %d', i)
- time.sleep(0.01)
-
- def main():
- q = Queue()
- # The main process gets a simple configuration which prints to the console.
- config_initial = {
- 'version': 1,
- 'handlers': {
- 'console': {
- 'class': 'logging.StreamHandler',
- 'level': 'INFO'
- }
- },
- 'root': {
- 'handlers': ['console'],
- 'level': 'DEBUG'
- }
- }
- # The worker process configuration is just a QueueHandler attached to the
- # root logger, which allows all messages to be sent to the queue.
- # We disable existing loggers to disable the "setup" logger used in the
- # parent process. This is needed on POSIX because the logger will
- # be there in the child following a fork().
- config_worker = {
- 'version': 1,
- 'disable_existing_loggers': True,
- 'handlers': {
- 'queue': {
- 'class': 'logging.handlers.QueueHandler',
- 'queue': q
- }
- },
- 'root': {
- 'handlers': ['queue'],
- 'level': 'DEBUG'
- }
- }
- # The listener process configuration shows that the full flexibility of
- # logging configuration is available to dispatch events to handlers however
- # you want.
- # We disable existing loggers to disable the "setup" logger used in the
- # parent process. This is needed on POSIX because the logger will
- # be there in the child following a fork().
- config_listener = {
- 'version': 1,
- 'disable_existing_loggers': True,
- 'formatters': {
- 'detailed': {
- 'class': 'logging.Formatter',
- 'format': '%(asctime)s %(name)-15s %(levelname)-8s %(processName)-10s %(message)s'
- },
- 'simple': {
- 'class': 'logging.Formatter',
- 'format': '%(name)-15s %(levelname)-8s %(processName)-10s %(message)s'
- }
- },
- 'handlers': {
- 'console': {
- 'class': 'logging.StreamHandler',
- 'formatter': 'simple',
- 'level': 'INFO'
- },
- 'file': {
- 'class': 'logging.FileHandler',
- 'filename': 'mplog.log',
- 'mode': 'w',
- 'formatter': 'detailed'
- },
- 'foofile': {
- 'class': 'logging.FileHandler',
- 'filename': 'mplog-foo.log',
- 'mode': 'w',
- 'formatter': 'detailed'
- },
- 'errors': {
- 'class': 'logging.FileHandler',
- 'filename': 'mplog-errors.log',
- 'mode': 'w',
- 'formatter': 'detailed',
- 'level': 'ERROR'
- }
- },
- 'loggers': {
- 'foo': {
- 'handlers': ['foofile']
- }
- },
- 'root': {
- 'handlers': ['console', 'file', 'errors'],
- 'level': 'DEBUG'
- }
- }
- # Log some initial events, just to show that logging in the parent works
- # normally.
- logging.config.dictConfig(config_initial)
- logger = logging.getLogger('setup')
- logger.info('About to create workers ...')
- workers = []
- for i in range(5):
- wp = Process(target=worker_process, name='worker %d' % (i + 1),
- args=(config_worker,))
- workers.append(wp)
- wp.start()
- logger.info('Started worker: %s', wp.name)
- logger.info('About to create listener ...')
- stop_event = Event()
- lp = Process(target=listener_process, name='listener',
- args=(q, stop_event, config_listener))
- lp.start()
- logger.info('Started listener')
- # We now hang around for the workers to finish their work.
- for wp in workers:
- wp.join()
- # Workers all done, listening can now stop.
- # Logging in the parent still works normally.
- logger.info('Telling listener to stop ...')
- stop_event.set()
- lp.join()
- logger.info('All done.')
-
- if __name__ == '__main__':
- main()
-
-
Inserting a BOM into messages sent to a SysLogHandler
-----------------------------------------------------
-:rfc:`5424` requires that a
+`RFC 5424 <https://tools.ietf.org/html/rfc5424>`_ requires that a
Unicode message be sent to a syslog daemon as a set of bytes which have the
following structure: an optional pure-ASCII component, followed by a UTF-8 Byte
-Order Mark (BOM), followed by Unicode encoded using UTF-8. (See the
-:rfc:`relevant section of the specification <5424#section-6>`.)
+Order Mark (BOM), followed by Unicode encoded using UTF-8. (See the `relevant
+section of the specification <https://tools.ietf.org/html/rfc5424#section-6>`_.)
-In Python 3.1, code was added to
+In Python 2.6 and 2.7, code was added to
:class:`~logging.handlers.SysLogHandler` to insert a BOM into the message, but
unfortunately, it was implemented incorrectly, with the BOM appearing at the
beginning of the message and hence not allowing any pure-ASCII component to
appear before it.
As this behaviour is broken, the incorrect BOM insertion code is being removed
-from Python 3.2.4 and later. However, it is not being replaced, and if you
-want to produce :rfc:`5424`-compliant messages which include a BOM, an optional
+from Python 2.7.4 and later. However, it is not being replaced, and if you
+want to produce RFC 5424-compliant messages which include a BOM, an optional
pure-ASCII sequence before it and arbitrary Unicode after it, encoded using
UTF-8, then you need to do the following:
@@ -1708,10 +790,10 @@ UTF-8, then you need to do the following:
:class:`~logging.handlers.SysLogHandler` instance, with a format string
such as::
- 'ASCII section\ufeffUnicode section'
+ u'ASCII section\ufeffUnicode section'
- The Unicode code point U+FEFF, when encoded using UTF-8, will be
- encoded as a UTF-8 BOM -- the byte-string ``b'\xef\xbb\xbf'``.
+ The Unicode code point ``u'\ufeff'``, when encoded using UTF-8, will be
+ encoded as a UTF-8 BOM -- the byte-string ``'\xef\xbb\xbf'``.
#. Replace the ASCII section with whatever placeholders you like, but make sure
that the data that appears in there after substitution is always ASCII (that
@@ -1721,10 +803,11 @@ UTF-8, then you need to do the following:
which appears there after substitution contains characters outside the ASCII
range, that's fine -- it will be encoded using UTF-8.
-The formatted message *will* be encoded using UTF-8 encoding by
-``SysLogHandler``. If you follow the above rules, you should be able to produce
-:rfc:`5424`-compliant messages. If you don't, logging may not complain, but your
-messages will not be RFC 5424-compliant, and your syslog daemon may complain.
+If the formatted message is Unicode, it *will* be encoded using UTF-8 encoding
+by ``SysLogHandler``. If you follow the above rules, you should be able to
+produce RFC 5424-compliant messages. If you don't, logging may not complain,
+but your messages will not be RFC 5424-compliant, and your syslog daemon may
+complain.
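The effect of these rules can be sketched without a syslog daemon by formatting a record directly; ``'myapp: '`` is an assumed pure-ASCII prefix:

```python
import logging

# The pure-ASCII prefix comes first, the BOM code point separates it from
# the part of the message that may contain arbitrary Unicode.
formatter = logging.Formatter(u'myapp: \ufeff%(message)s')
record = logging.LogRecord('demo', logging.INFO, __file__, 0,
                           u'caf\xe9 message', (), None)
formatted = formatter.format(record)
encoded = formatted.encode('utf-8')
# When encoded, U+FEFF becomes the UTF-8 BOM byte sequence.
assert encoded.startswith(b'myapp: \xef\xbb\xbf')
```

A real :class:`~logging.handlers.SysLogHandler` would perform the UTF-8 encoding itself; the manual ``encode`` call above just makes the byte layout visible.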
Implementing structured logging
@@ -1741,8 +824,8 @@ which uses JSON to serialise the event in a machine-parseable manner::
import json
import logging
- class StructuredMessage:
- def __init__(self, message, /, **kwargs):
+ class StructuredMessage(object):
+ def __init__(self, message, **kwargs):
self.message = message
self.kwargs = kwargs
@@ -1754,9 +837,7 @@ which uses JSON to serialise the event in a machine-parseable manner::
logging.basicConfig(level=logging.INFO, format='%(message)s')
logging.info(_('message 1', foo='bar', bar='baz', num=123, fnum=123.456))
-If the above script is run, it prints:
-
-.. code-block:: none
+If the above script is run, it prints::
message 1 >>> {"fnum": 123.456, "num": 123, "bar": "baz", "foo": "bar"}
@@ -1785,8 +866,8 @@ as in the following complete example::
return o.encode('unicode_escape').decode('ascii')
return super(Encoder, self).default(o)
- class StructuredMessage:
- def __init__(self, message, /, **kwargs):
+ class StructuredMessage(object):
+ def __init__(self, message, **kwargs):
self.message = message
self.kwargs = kwargs
@@ -1798,14 +879,12 @@ as in the following complete example::
def main():
logging.basicConfig(level=logging.INFO, format='%(message)s')
- logging.info(_('message 1', set_value={1, 2, 3}, snowman='\u2603'))
+ logging.info(_('message 1', set_value=set([1, 2, 3]), snowman='\u2603'))
if __name__ == '__main__':
main()
-When the above script is run, it prints:
-
-.. code-block:: none
+When the above script is run, it prints::
message 1 >>> {"snowman": "\u2603", "set_value": [1, 2, 3]}
@@ -1823,15 +902,20 @@ Customizing handlers with :func:`dictConfig`
There are times when you want to customize logging handlers in particular ways,
and if you use :func:`dictConfig` you may be able to do this without
subclassing. As an example, consider that you may want to set the ownership of a
-log file. On POSIX, this is easily done using :func:`shutil.chown`, but the file
+log file. On POSIX, this is easily done using :func:`os.chown`, but the file
handlers in the stdlib don't offer built-in support. You can customize handler
creation using a plain function such as::
def owned_file_handler(filename, mode='a', encoding=None, owner=None):
if owner:
+ import os, pwd, grp
+ # convert user and group names to uid and gid
+ uid = pwd.getpwnam(owner[0]).pw_uid
+ gid = grp.getgrnam(owner[1]).gr_gid
+ owner = (uid, gid)
if not os.path.exists(filename):
open(filename, 'a').close()
- shutil.chown(filename, *owner)
+ os.chown(filename, *owner)
return logging.FileHandler(filename, mode, encoding)
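Here is a sketch of wiring such a factory into :func:`dictConfig` using the special ``'()'`` key. The factory is repeated so the snippet is self-contained; ``owner`` is left as ``None`` so it runs unprivileged, and the log file path is illustrative:

```python
import logging
import logging.config
import os
import tempfile

def owned_file_handler(filename, mode='a', encoding=None, owner=None):
    # Same idea as the factory above: change ownership only when asked to.
    if owner:
        import pwd, grp
        uid = pwd.getpwnam(owner[0]).pw_uid
        gid = grp.getgrnam(owner[1]).gr_gid
        if not os.path.exists(filename):
            open(filename, 'a').close()
        os.chown(filename, uid, gid)
    return logging.FileHandler(filename, mode, encoding)

logfile = os.path.join(tempfile.mkdtemp(), 'chown.log')  # illustrative path
LOGGING = {
    'version': 1,
    'handlers': {
        'file': {
            # The special '()' key makes dictConfig call this factory
            # instead of instantiating a handler class; the remaining
            # keys are passed to it as keyword arguments.
            '()': owned_file_handler,
            'filename': logfile,
            'owner': None,  # e.g. ('pulse', 'pulse') on a real system
        },
    },
    'root': {'handlers': ['file'], 'level': 'DEBUG'},
}
logging.config.dictConfig(LOGGING)
logging.info('hello')
```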
You can then specify, in a logging configuration passed to :func:`dictConfig`,
@@ -1953,130 +1037,8 @@ Of course, the approach could also be extended to types of handler other than a
or a different type of handler altogether.
-.. currentmodule:: logging
-
-.. _formatting-styles:
-
-Using particular formatting styles throughout your application
---------------------------------------------------------------
-
-In Python 3.2, the :class:`~logging.Formatter` gained a ``style`` keyword
-parameter which, while defaulting to ``%`` for backward compatibility, allowed
-the specification of ``{`` or ``$`` to support the formatting approaches
-supported by :meth:`str.format` and :class:`string.Template`. Note that this
-governs the formatting of logging messages for final output to logs, and is
-completely orthogonal to how an individual logging message is constructed.
-
-Logging calls (:meth:`~Logger.debug`, :meth:`~Logger.info` etc.) only take
-positional parameters for the actual logging message itself, with keyword
-parameters used only for determining options for how to handle the logging call
-(e.g. the ``exc_info`` keyword parameter to indicate that traceback information
-should be logged, or the ``extra`` keyword parameter to indicate additional
-contextual information to be added to the log). So you cannot directly make
-logging calls using :meth:`str.format` or :class:`string.Template` syntax,
-because internally the logging package uses %-formatting to merge the format
-string and the variable arguments. There is no way of changing this while
-preserving backward compatibility, since all logging calls in existing code
-will be using %-format strings.
-
-There have been suggestions to associate format styles with specific loggers,
-but that approach also runs into backward compatibility problems because any
-existing code could be using a given logger name and using %-formatting.
-
-For logging to work interoperably between any third-party libraries and your
-code, decisions about formatting need to be made at the level of the
-individual logging call. This opens up a couple of ways in which alternative
-formatting styles can be accommodated.
-
-
-Using LogRecord factories
-^^^^^^^^^^^^^^^^^^^^^^^^^
-
-In Python 3.2, along with the :class:`~logging.Formatter` changes mentioned
-above, the logging package gained the ability to allow users to set their own
-:class:`LogRecord` subclasses, using the :func:`setLogRecordFactory` function.
-You can use this to set your own subclass of :class:`LogRecord`, which does the
-Right Thing by overriding the :meth:`~LogRecord.getMessage` method. The base
-class implementation of this method is where the ``msg % args`` formatting
-happens, and where you can substitute your alternate formatting; however, you
-should be careful to support all formatting styles and allow %-formatting as
-the default, to ensure interoperability with other code. Care should also be
-taken to call ``str(self.msg)``, just as the base implementation does.
-
-Refer to the reference documentation on :func:`setLogRecordFactory` and
-:class:`LogRecord` for more information.
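As an illustration only, here is a minimal :class:`LogRecord` subclass whose ``getMessage`` supports {}-style templates while keeping %-formatting as the default; the style-detection heuristic is deliberately naive and is an assumption of this sketch, not something the logging package provides:

```python
import logging

class StyleAwareRecord(logging.LogRecord):
    """Support {}-style message templates, keeping %-style as the default.

    The detection heuristic below is purely illustrative; production code
    would need something more robust to stay interoperable with other
    libraries that rely on %-formatting.
    """
    def getMessage(self):
        msg = str(self.msg)  # mirror the base implementation
        if self.args:
            if '{' in msg and '%' not in msg:
                return msg.format(*self.args)
            return msg % self.args
        return msg

logging.setLogRecordFactory(StyleAwareRecord)
```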
-
-
-Using custom message objects
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-There is another, perhaps simpler way that you can use {}- and $- formatting to
-construct your individual log messages. You may recall (from
-:ref:`arbitrary-object-messages`) that when logging you can use an arbitrary
-object as a message format string, and that the logging package will call
-:func:`str` on that object to get the actual format string. Consider the
-following two classes::
-
- class BraceMessage:
- def __init__(self, fmt, /, *args, **kwargs):
- self.fmt = fmt
- self.args = args
- self.kwargs = kwargs
-
- def __str__(self):
- return self.fmt.format(*self.args, **self.kwargs)
-
- class DollarMessage:
- def __init__(self, fmt, /, **kwargs):
- self.fmt = fmt
- self.kwargs = kwargs
-
- def __str__(self):
- from string import Template
- return Template(self.fmt).substitute(**self.kwargs)
-
-Either of these can be used in place of a format string, to allow {}- or
-$-formatting to be used to build the actual "message" part which appears in the
-formatted log output in place of “%(message)s” or “{message}” or “$message”.
-If you find it a little unwieldy to use the class names whenever you want to log
-something, you can make it more palatable if you use an alias such as ``M`` or
-``_`` for the message (or perhaps ``__``, if you are using ``_`` for
-localization).
-
-Examples of this approach are given below. Firstly, formatting with
-:meth:`str.format`::
-
- >>> __ = BraceMessage
- >>> print(__('Message with {0} {1}', 2, 'placeholders'))
- Message with 2 placeholders
- >>> class Point: pass
- ...
- >>> p = Point()
- >>> p.x = 0.5
- >>> p.y = 0.5
- >>> print(__('Message with coordinates: ({point.x:.2f}, {point.y:.2f})', point=p))
- Message with coordinates: (0.50, 0.50)
-
-Secondly, formatting with :class:`string.Template`::
-
- >>> __ = DollarMessage
- >>> print(__('Message with $num $what', num=2, what='placeholders'))
- Message with 2 placeholders
- >>>
-
-One thing to note is that you pay no significant performance penalty with this
-approach: the actual formatting happens not when you make the logging call, but
-when (and if) the logged message is actually about to be output to a log by a
-handler. So the only slightly unusual thing which might trip you up is that the
-parentheses go around the format string and the arguments, not just the format
-string. That’s because the __ notation is just syntax sugar for a constructor
-call to one of the ``XXXMessage`` classes shown above.
-
-
.. _filters-dictconfig:
-.. currentmodule:: logging.config
-
Configuring filters with :func:`dictConfig`
-------------------------------------------
@@ -2135,9 +1097,7 @@ most obvious, but you can provide any callable which returns a
This example shows how you can pass configuration data to the callable which
constructs the instance, in the form of keyword parameters. When run, the above
-script will print:
-
-.. code-block:: none
+script will print::
changed: hello
@@ -2176,7 +1136,7 @@ class, as shown in the following example::
Format an exception so that it prints on a single line.
"""
result = super(OneLineExceptionFormatter, self).formatException(exc_info)
- return repr(result) # or format into one line however you want to
+ return repr(result) # or format into one line however you want to
def format(self, record):
s = super(OneLineExceptionFormatter, self).format(record)
@@ -2204,9 +1164,7 @@ class, as shown in the following example::
if __name__ == '__main__':
main()
-When run, this produces a file with exactly two lines:
-
-.. code-block:: none
+When run, this produces a file with exactly two lines::
28/01/2015 07:21:23|INFO|Sample message|
28/01/2015 07:21:23|ERROR|ZeroDivisionError: integer division or modulo by zero|'Traceback (most recent call last):\n File "logtest7.py", line 30, in main\n x = 1 / 0\nZeroDivisionError: integer division or modulo by zero'|
@@ -2268,7 +1226,6 @@ The above approach can, of course, be adapted to other TTS systems and even
other systems altogether which can process messages via external programs run
from a command line.
-
.. _buffered-logging:
Buffering logging messages and outputting them conditionally
@@ -2301,9 +1258,9 @@ The script just arranges to decorate ``foo`` with a decorator which will do the
conditional logging that's required. The decorator takes a logger as a parameter
and attaches a memory handler for the duration of the call to the decorated
function. The decorator can be additionally parameterised using a target handler,
-a level at which flushing should occur, and a capacity for the buffer (number of
-records buffered). These default to a :class:`~logging.StreamHandler` which
-writes to ``sys.stderr``, ``logging.ERROR`` and ``100`` respectively.
+a level at which flushing should occur, and a capacity for the buffer. These
+default to a :class:`~logging.StreamHandler` which writes to ``sys.stderr``,
+``logging.ERROR`` and ``100`` respectively.
Here's the script::
@@ -2368,9 +1325,7 @@ Here's the script::
write_line('Calling decorated foo with True')
assert decorated_foo(True)
-When this script is run, the following output should be observed:
-
-.. code-block:: none
+When this script is run, the following output should be observed::
Calling undecorated foo with False
about to log at DEBUG ...
@@ -2466,9 +1421,7 @@ the following complete example::
logging.config.dictConfig(LOGGING)
logging.warning('The local time is %s', time.asctime())
-When this script is run, it should print something like:
-
-.. code-block:: none
+When this script is run, it should print something like::
2015-10-17 12:53:29,501 The local time is Sat Oct 17 13:53:29 2015
2015-10-17 13:53:29,501 The local time is Sat Oct 17 13:53:29 2015
@@ -2492,7 +1445,7 @@ scope of the context manager::
import logging
import sys
- class LoggingContext:
+ class LoggingContext(object):
def __init__(self, logger, level=None, handler=None, close=True):
self.logger = logger
self.level = level
@@ -2583,399 +1536,3 @@ In this case, the message #5 printed to ``stdout`` doesn't appear, as expected.
Of course, the approach described here can be generalised, for example to attach
logging filters temporarily. Note that the above code works in Python 2 as well
as Python 3.
-
-
-.. _starter-template:
-
-A CLI application starter template
-----------------------------------
-
-Here's an example which shows how you can:
-
-* Use a logging level based on command-line arguments
-* Dispatch to multiple subcommands in separate files, all logging at the same
- level in a consistent way
-* Make use of simple, minimal configuration
-
-Suppose we have a command-line application whose job is to stop, start or
-restart some services. This could be organised for the purposes of illustration
-as a file ``app.py`` that is the main script for the application, with individual
-commands implemented in ``start.py``, ``stop.py`` and ``restart.py``. Suppose
-further that we want to control the verbosity of the application via a
-command-line argument, defaulting to ``logging.INFO``. Here's one way that
-``app.py`` could be written::
-
- import argparse
- import importlib
- import logging
- import os
- import sys
-
- def main(args=None):
- scriptname = os.path.basename(__file__)
- parser = argparse.ArgumentParser(scriptname)
- levels = ('DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL')
- parser.add_argument('--log-level', default='INFO', choices=levels)
- subparsers = parser.add_subparsers(dest='command',
- help='Available commands:')
- start_cmd = subparsers.add_parser('start', help='Start a service')
- start_cmd.add_argument('name', metavar='NAME',
- help='Name of service to start')
- stop_cmd = subparsers.add_parser('stop',
- help='Stop one or more services')
- stop_cmd.add_argument('names', metavar='NAME', nargs='+',
- help='Name of service to stop')
- restart_cmd = subparsers.add_parser('restart',
- help='Restart one or more services')
- restart_cmd.add_argument('names', metavar='NAME', nargs='+',
- help='Name of service to restart')
- options = parser.parse_args()
- # the code to dispatch commands could all be in this file. For the purposes
- # of illustration only, we implement each command in a separate module.
- try:
- mod = importlib.import_module(options.command)
- cmd = getattr(mod, 'command')
- except (ImportError, AttributeError):
- print('Unable to find the code for command \'%s\'' % options.command)
- return 1
- # Could get fancy here and load configuration from file or dictionary
- logging.basicConfig(level=options.log_level,
- format='%(levelname)s %(name)s %(message)s')
- cmd(options)
-
- if __name__ == '__main__':
- sys.exit(main())
-
-And the ``start``, ``stop`` and ``restart`` commands can be implemented in
-separate modules, like so for starting::
-
- # start.py
- import logging
-
- logger = logging.getLogger(__name__)
-
- def command(options):
- logger.debug('About to start %s', options.name)
- # actually do the command processing here ...
- logger.info('Started the \'%s\' service.', options.name)
-
-and thus for stopping::
-
- # stop.py
- import logging
-
- logger = logging.getLogger(__name__)
-
- def command(options):
- n = len(options.names)
- if n == 1:
- plural = ''
- services = '\'%s\'' % options.names[0]
- else:
- plural = 's'
- services = ', '.join('\'%s\'' % name for name in options.names)
- i = services.rfind(', ')
- services = services[:i] + ' and ' + services[i + 2:]
- logger.debug('About to stop %s', services)
- # actually do the command processing here ...
- logger.info('Stopped the %s service%s.', services, plural)
-
-and similarly for restarting::
-
- # restart.py
- import logging
-
- logger = logging.getLogger(__name__)
-
- def command(options):
- n = len(options.names)
- if n == 1:
- plural = ''
- services = '\'%s\'' % options.names[0]
- else:
- plural = 's'
- services = ', '.join('\'%s\'' % name for name in options.names)
- i = services.rfind(', ')
- services = services[:i] + ' and ' + services[i + 2:]
- logger.debug('About to restart %s', services)
- # actually do the command processing here ...
- logger.info('Restarted the %s service%s.', services, plural)
-
-If we run this application with the default log level, we get output like this:
-
-.. code-block:: shell-session
-
- $ python app.py start foo
- INFO start Started the 'foo' service.
-
- $ python app.py stop foo bar
- INFO stop Stopped the 'foo' and 'bar' services.
-
- $ python app.py restart foo bar baz
- INFO restart Restarted the 'foo', 'bar' and 'baz' services.
-
-The first word is the logging level, and the second word is the module or
-package name of the place where the event was logged.
-
-If we change the logging level, then we can change the information sent to the
-log. For example, if we want more information:
-
-.. code-block:: shell-session
-
- $ python app.py --log-level DEBUG start foo
- DEBUG start About to start foo
- INFO start Started the 'foo' service.
-
- $ python app.py --log-level DEBUG stop foo bar
- DEBUG stop About to stop 'foo' and 'bar'
- INFO stop Stopped the 'foo' and 'bar' services.
-
- $ python app.py --log-level DEBUG restart foo bar baz
- DEBUG restart About to restart 'foo', 'bar' and 'baz'
- INFO restart Restarted the 'foo', 'bar' and 'baz' services.
-
-And if we want less:
-
-.. code-block:: shell-session
-
- $ python app.py --log-level WARNING start foo
- $ python app.py --log-level WARNING stop foo bar
- $ python app.py --log-level WARNING restart foo bar baz
-
-In this case, the commands don't print anything to the console, since nothing
-at ``WARNING`` level or above is logged by them.
-
-.. _qt-gui:
-
-A Qt GUI for logging
---------------------
-
-A question that comes up from time to time is about how to log to a GUI
-application. The `Qt <https://www.qt.io/>`_ framework is a popular
-cross-platform UI framework with Python bindings using `PySide2
-<https://pypi.org/project/PySide2/>`_ or `PyQt5
-<https://pypi.org/project/PyQt5/>`_ libraries.
-
-The following example shows how to log to a Qt GUI. This introduces a simple
-``QtHandler`` class which takes a callable, which should be a slot in the main
-thread that does GUI updates. A worker thread is also created to show how you
-can log to the GUI from both the UI itself (via a button for manual logging)
-as well as a worker thread doing work in the background (here, just logging
-messages at random levels with random short delays in between).
-
-The worker thread is implemented using Qt's ``QThread`` class rather than the
-:mod:`threading` module, as there are circumstances where one has to use
-``QThread``, which offers better integration with other ``Qt`` components.
-
-The code should work with recent releases of either ``PySide2`` or ``PyQt5``.
-You should be able to adapt the approach to earlier versions of Qt. Please
-refer to the comments in the code snippet for more detailed information.
-
-.. code-block:: python3
-
- import datetime
- import logging
- import random
- import sys
- import time
-
- # Deal with minor differences between PySide2 and PyQt5
- try:
- from PySide2 import QtCore, QtGui, QtWidgets
- Signal = QtCore.Signal
- Slot = QtCore.Slot
- except ImportError:
- from PyQt5 import QtCore, QtGui, QtWidgets
- Signal = QtCore.pyqtSignal
- Slot = QtCore.pyqtSlot
-
-
- logger = logging.getLogger(__name__)
-
-
- #
- # Signals need to be contained in a QObject or subclass in order to be correctly
- # initialized.
- #
- class Signaller(QtCore.QObject):
- signal = Signal(str, logging.LogRecord)
-
- #
- # Output to a Qt GUI is only supposed to happen on the main thread. So, this
- # handler is designed to take a slot function which is set up to run in the main
- # thread. In this example, the function takes a string argument which is a
- # formatted log message, and the log record which generated it. The formatted
- # string is just a convenience - you could format a string for output any way
- # you like in the slot function itself.
- #
- # You specify the slot function to do whatever GUI updates you want. The handler
- # doesn't know or care about specific UI elements.
- #
- class QtHandler(logging.Handler):
- def __init__(self, slotfunc, *args, **kwargs):
- super(QtHandler, self).__init__(*args, **kwargs)
- self.signaller = Signaller()
- self.signaller.signal.connect(slotfunc)
-
- def emit(self, record):
- s = self.format(record)
- self.signaller.signal.emit(s, record)
-
- #
- # This example uses QThreads, which means that the threads at the Python level
- # are named something like "Dummy-1". The function below gets the Qt name of the
- # current thread.
- #
- def ctname():
- return QtCore.QThread.currentThread().objectName()
-
-
- #
- # Used to generate random levels for logging.
- #
- LEVELS = (logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR,
- logging.CRITICAL)
-
- #
- # This worker class represents work that is done in a thread separate to the
- # main thread. The way the thread is kicked off to do work is via a button press
- # that connects to a slot in the worker.
- #
- # Because the default threadName value in the LogRecord isn't much use, we add
- # a qThreadName which contains the QThread name as computed above, and pass that
- # value in an "extra" dictionary which is used to update the LogRecord with the
- # QThread name.
- #
- # This example worker just outputs messages sequentially, interspersed with
- # random delays of the order of a few seconds.
- #
- class Worker(QtCore.QObject):
- @Slot()
- def start(self):
- extra = {'qThreadName': ctname() }
- logger.debug('Started work', extra=extra)
- i = 1
- # Let the thread run until interrupted. This allows reasonably clean
- # thread termination.
- while not QtCore.QThread.currentThread().isInterruptionRequested():
- delay = 0.5 + random.random() * 2
- time.sleep(delay)
- level = random.choice(LEVELS)
- logger.log(level, 'Message after delay of %3.1f: %d', delay, i, extra=extra)
- i += 1
-
- #
- # Implement a simple UI for this cookbook example. This contains:
- #
- # * A read-only text edit window which holds formatted log messages
- # * A button to start work and log stuff in a separate thread
- # * A button to log something from the main thread
- # * A button to clear the log window
- #
- class Window(QtWidgets.QWidget):
-
- COLORS = {
- logging.DEBUG: 'black',
- logging.INFO: 'blue',
- logging.WARNING: 'orange',
- logging.ERROR: 'red',
- logging.CRITICAL: 'purple',
- }
-
- def __init__(self, app):
- super(Window, self).__init__()
- self.app = app
- self.textedit = te = QtWidgets.QPlainTextEdit(self)
- # Set whatever the default monospace font is for the platform
- f = QtGui.QFont('nosuchfont')
- f.setStyleHint(f.Monospace)
- te.setFont(f)
- te.setReadOnly(True)
- PB = QtWidgets.QPushButton
- self.work_button = PB('Start background work', self)
- self.log_button = PB('Log a message at a random level', self)
- self.clear_button = PB('Clear log window', self)
- self.handler = h = QtHandler(self.update_status)
- # Remember to use qThreadName rather than threadName in the format string.
- fs = '%(asctime)s %(qThreadName)-12s %(levelname)-8s %(message)s'
- formatter = logging.Formatter(fs)
- h.setFormatter(formatter)
- logger.addHandler(h)
- # Set up to terminate the QThread when we exit
- app.aboutToQuit.connect(self.force_quit)
-
- # Lay out all the widgets
- layout = QtWidgets.QVBoxLayout(self)
- layout.addWidget(te)
- layout.addWidget(self.work_button)
- layout.addWidget(self.log_button)
- layout.addWidget(self.clear_button)
- self.setFixedSize(900, 400)
-
- # Connect the non-worker slots and signals
- self.log_button.clicked.connect(self.manual_update)
- self.clear_button.clicked.connect(self.clear_display)
-
- # Start a new worker thread and connect the slots for the worker
- self.start_thread()
- self.work_button.clicked.connect(self.worker.start)
- # Once started, the button should be disabled
- self.work_button.clicked.connect(lambda : self.work_button.setEnabled(False))
-
- def start_thread(self):
- self.worker = Worker()
- self.worker_thread = QtCore.QThread()
- self.worker.setObjectName('Worker')
- self.worker_thread.setObjectName('WorkerThread') # for qThreadName
- self.worker.moveToThread(self.worker_thread)
- # This will start an event loop in the worker thread
- self.worker_thread.start()
-
- def kill_thread(self):
- # Just tell the worker to stop, then tell it to quit and wait for that
- # to happen
- self.worker_thread.requestInterruption()
- if self.worker_thread.isRunning():
- self.worker_thread.quit()
- self.worker_thread.wait()
- else:
- print('worker has already exited.')
-
- def force_quit(self):
- # For use when the window is closed
- if self.worker_thread.isRunning():
- self.kill_thread()
-
- # The functions below update the UI and run in the main thread because
- # that's where the slots are set up
-
- @Slot(str, logging.LogRecord)
- def update_status(self, status, record):
- color = self.COLORS.get(record.levelno, 'black')
- s = '<pre><font color="%s">%s</font></pre>' % (color, status)
- self.textedit.appendHtml(s)
-
- @Slot()
- def manual_update(self):
- # This function uses the formatted message passed in, but also uses
- # information from the record to format the message in an appropriate
- # color according to its severity (level).
- level = random.choice(LEVELS)
- extra = {'qThreadName': ctname() }
- logger.log(level, 'Manually logged!', extra=extra)
-
- @Slot()
- def clear_display(self):
- self.textedit.clear()
-
-
- def main():
- QtCore.QThread.currentThread().setObjectName('MainThread')
- logging.getLogger().setLevel(logging.DEBUG)
- app = QtWidgets.QApplication(sys.argv)
- example = Window(app)
- example.show()
- sys.exit(app.exec_())
-
- if __name__=='__main__':
- main()
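The heart of the ``QtHandler`` in the example above is independent of Qt: a :class:`logging.Handler` that formats each record and passes the result to a caller-supplied callable. A minimal sketch of that pattern with the GUI machinery stripped away (the ``CallbackHandler`` name is invented for illustration, not part of the patch):

```python
import logging

# Sketch: the core of the QtHandler idea without Qt.  A Handler
# formats each record and hands the result to a caller-supplied
# callable; in the Qt example that callable is a slot running in
# the main GUI thread.
class CallbackHandler(logging.Handler):
    def __init__(self, callback, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.callback = callback

    def emit(self, record):
        # Format in whatever thread logged the event; the callback
        # decides how and where to display the resulting string.
        self.callback(self.format(record), record)

if __name__ == '__main__':
    lines = []
    handler = CallbackHandler(lambda text, record: lines.append(text))
    handler.setFormatter(logging.Formatter('%(levelname)s:%(message)s'))
    logger = logging.getLogger('demo')
    logger.addHandler(handler)
    logger.warning('hello')
    print(lines)  # ['WARNING:hello']
```

The Qt version adds a ``Signaller`` so the callback is invoked via a cross-thread signal rather than called directly.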
diff --git a/Doc/howto/logging.rst b/Doc/howto/logging.rst
index fbe5a11..698007a 100644
--- a/Doc/howto/logging.rst
+++ b/Doc/howto/logging.rst
@@ -128,36 +128,23 @@ look at that next. Be sure to try the following in a newly-started Python
interpreter, and don't just continue from the session described above::
import logging
- logging.basicConfig(filename='example.log', encoding='utf-8', level=logging.DEBUG)
+ logging.basicConfig(filename='example.log',level=logging.DEBUG)
logging.debug('This message should go to the log file')
logging.info('So should this')
logging.warning('And this, too')
- logging.error('And non-ASCII stuff, too, like Øresund and Malmö')
-
-.. versionchanged:: 3.9
- The *encoding* argument was added. In earlier Python versions, or if not
- specified, the encoding used is the default value used by :func:`open`. While
- not shown in the above example, an *errors* argument can also now be passed,
- which determines how encoding errors are handled. For available values and
- the default, see the documentation for :func:`open`.
And now if we open the file and look at what we have, we should find the log
-messages:
-
-.. code-block:: none
+messages::
DEBUG:root:This message should go to the log file
INFO:root:So should this
WARNING:root:And this, too
- ERROR:root:And non-ASCII stuff, too, like Øresund and Malmö
This example also shows how you can set the logging level which acts as the
threshold for tracking. In this case, because we set the threshold to
``DEBUG``, all of the messages were printed.
-If you want to set the logging level from a command-line option such as:
-
-.. code-block:: none
+If you want to set the logging level from a command-line option such as::
--log=INFO
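An option such as ``--log=INFO`` is conventionally translated into a numeric level with :func:`getattr`, as the HOWTO goes on to describe. A sketch under that assumption (the ``level_from_option`` helper name is invented for illustration):

```python
import logging

def level_from_option(value):
    """Translate a --log=LEVEL value such as 'INFO' into a numeric
    logging level, rejecting names the logging module doesn't define."""
    numeric_level = getattr(logging, value.upper(), None)
    if not isinstance(numeric_level, int):
        raise ValueError('Invalid log level: %s' % value)
    return numeric_level

# Typical use after option parsing:
# logging.basicConfig(level=level_from_option(args.log))
```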
@@ -221,9 +208,7 @@ could organize logging in it::
def do_something():
logging.info('Doing something')
-If you run *myapp.py*, you should see this in *myapp.log*:
-
-.. code-block:: none
+If you run *myapp.py*, you should see this in *myapp.log*::
INFO:root:Started
INFO:root:Doing something
@@ -258,7 +243,7 @@ uses the old, %-style of string formatting. This is for backwards
compatibility: the logging package pre-dates newer formatting options such as
:meth:`str.format` and :class:`string.Template`. These newer formatting
options *are* supported, but exploring them is outside the scope of this
-tutorial: see :ref:`formatting-styles` for more information.
+tutorial.
Changing the format of displayed messages
@@ -273,9 +258,7 @@ specify the format you want to use::
logging.info('So should this')
logging.warning('And this, too')
-which would print:
-
-.. code-block:: none
+which would print::
DEBUG:This message should appear on the console
INFO:So should this
@@ -299,23 +282,19 @@ your format string::
logging.basicConfig(format='%(asctime)s %(message)s')
logging.warning('is when this event was logged.')
-which should print something like this:
-
-.. code-block:: none
+which should print something like this::
2010-12-12 11:41:42,612 is when this event was logged.
-The default format for date/time display (shown above) is like ISO8601 or
-:rfc:`3339`. If you need more control over the formatting of the date/time, provide
-a *datefmt* argument to ``basicConfig``, as in this example::
+The default format for date/time display (shown above) is ISO8601. If you need
+more control over the formatting of the date/time, provide a *datefmt*
+argument to ``basicConfig``, as in this example::
import logging
logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p')
logging.warning('is when this event was logged.')
-which would display something like this:
-
-.. code-block:: none
+which would display something like this::
12/12/2010 11:46:36 AM is when this event was logged.
@@ -335,7 +314,7 @@ favourite beverage and carry on.
If your logging needs are simple, then use the above examples to incorporate
logging into your own scripts, and if you run into problems or don't
understand something, please post a question on the comp.lang.python Usenet
-group (available at https://groups.google.com/forum/#!forum/comp.lang.python) and you
+group (available at https://groups.google.com/group/comp.lang.python) and you
should receive help before too long.
Still here? You can carry on reading the next few sections, which provide a
@@ -384,10 +363,10 @@ root logger's name is printed as 'root' in the logged output.
It is, of course, possible to log messages to different destinations. Support
is included in the package for writing log messages to files, HTTP GET/POST
-locations, email via SMTP, generic sockets, queues, or OS-specific logging
-mechanisms such as syslog or the Windows NT event log. Destinations are served
-by :dfn:`handler` classes. You can create your own log destination class if
-you have special requirements not met by any of the built-in handler classes.
+locations, email via SMTP, generic sockets, or OS-specific logging mechanisms
+such as syslog or the Windows NT event log. Destinations are served by
+:dfn:`handler` classes. You can create your own log destination class if you
+have special requirements not met by any of the built-in handler classes.
By default, no destination is set for any logging messages. You can specify
a destination (such as console or file) by using :func:`basicConfig` as in the
@@ -397,9 +376,7 @@ if no destination is set; and if one is not set, they will set a destination
of the console (``sys.stderr``) and a default format for the displayed
message before delegating to the root logger to do the actual message output.
-The default format set by :func:`basicConfig` for messages is:
-
-.. code-block:: none
+The default format set by :func:`basicConfig` for messages is::
severity:logger name:message
@@ -452,10 +429,10 @@ With the logger object configured, the following methods create log messages:
:meth:`Logger.error`, and :meth:`Logger.critical` all create log records with
a message and a level that corresponds to their respective method names. The
message is actually a format string, which may contain the standard string
- substitution syntax of ``%s``, ``%d``, ``%f``, and so on. The
+ substitution syntax of :const:`%s`, :const:`%d`, :const:`%f`, and so on. The
rest of their arguments is a list of objects that correspond with the
- substitution fields in the message. With regard to ``**kwargs``, the
- logging methods care only about a keyword of ``exc_info`` and use it to
+ substitution fields in the message. With regard to :const:`**kwargs`, the
+ logging methods care only about a keyword of :const:`exc_info` and use it to
determine whether to log exception information.
* :meth:`Logger.exception` creates a log message similar to
@@ -538,31 +515,20 @@ Formatters
Formatter objects configure the final order, structure, and contents of the log
message. Unlike the base :class:`logging.Handler` class, application code may
instantiate formatter classes, although you could likely subclass the formatter
-if your application needs special behavior. The constructor takes three
-optional arguments -- a message format string, a date format string and a style
-indicator.
+if your application needs special behavior. The constructor takes two
+optional arguments -- a message format string and a date format string.
-.. method:: logging.Formatter.__init__(fmt=None, datefmt=None, style='%')
+.. method:: logging.Formatter.__init__(fmt=None, datefmt=None)
If there is no message format string, the default is to use the
-raw message. If there is no date format string, the default date format is:
-
-.. code-block:: none
+raw message. If there is no date format string, the default date format is::
%Y-%m-%d %H:%M:%S
-with the milliseconds tacked on at the end. The ``style`` is one of '%', '{'
-or '$'. If one of these is not specified, then '%' will be used.
-
-If the ``style`` is '%', the message format string uses
-``%(<dictionary key>)s`` styled string substitution; the possible keys are
-documented in :ref:`logrecord-attributes`. If the style is '{', the message
-format string is assumed to be compatible with :meth:`str.format` (using
-keyword arguments), while if the style is '$' then the message format string
-should conform to what is expected by :meth:`string.Template.substitute`.
+with the milliseconds tacked on at the end.
-.. versionchanged:: 3.2
- Added the ``style`` parameter.
+The message format string uses ``%(<dictionary key>)s`` styled string
+substitution; the possible keys are documented in :ref:`logrecord-attributes`.
The following message format string will log the time in a human-readable
format, the severity of the message, and the contents of the message, in that
@@ -619,7 +585,7 @@ logger, a console handler, and a simple formatter using Python code::
# 'application' code
logger.debug('debug message')
logger.info('info message')
- logger.warning('warn message')
+ logger.warn('warn message')
logger.error('error message')
logger.critical('critical message')
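Pieced together from the context lines of the hunk above, a complete, runnable version of the console-handler setup might read as follows (shown with ``warning``, the non-deprecated spelling):

```python
import logging

# Assembled from the hunk's context lines: a named logger at DEBUG,
# a console (stream) handler, and a simple formatter.
logger = logging.getLogger('simple_example')
logger.setLevel(logging.DEBUG)

ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)

formatter = logging.Formatter(
    '%(asctime)s - %(name)s - %(levelname)s - %(message)s')
ch.setFormatter(formatter)
logger.addHandler(ch)

# 'application' code
logger.debug('debug message')
logger.warning('warn message')
logger.error('error message')
```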
@@ -649,13 +615,11 @@ the names of the objects::
# 'application' code
logger.debug('debug message')
logger.info('info message')
- logger.warning('warn message')
+ logger.warn('warn message')
logger.error('error message')
logger.critical('critical message')
-Here is the logging.conf file:
-
-.. code-block:: ini
+Here is the logging.conf file::
[loggers]
keys=root,simpleExample
@@ -704,19 +668,18 @@ noncoders to easily modify the logging properties.
.. warning:: The :func:`fileConfig` function takes a default parameter,
``disable_existing_loggers``, which defaults to ``True`` for reasons of
backward compatibility. This may or may not be what you want, since it
- will cause any non-root loggers existing before the :func:`fileConfig`
- call to be disabled unless they (or an ancestor) are explicitly named in
- the configuration. Please refer to the reference documentation for more
+ will cause any loggers existing before the :func:`fileConfig` call to
+ be disabled unless they (or an ancestor) are explicitly named in the
+ configuration. Please refer to the reference documentation for more
information, and specify ``False`` for this parameter if you wish.
The dictionary passed to :func:`dictConfig` can also specify a Boolean
value with key ``disable_existing_loggers``, which if not specified
explicitly in the dictionary also defaults to being interpreted as
- ``True``. This leads to the logger-disabling behaviour described above,
+ ``True``. This leads to the logger-disabling behaviour described above,
which may not be what you want - in which case, provide the key
explicitly with a value of ``False``.
-
.. currentmodule:: logging
Note that the class names referenced in config files need to be either relative
@@ -727,7 +690,7 @@ import mechanisms. Thus, you could use either
and module ``mymodule``, where ``mypackage`` is available on the Python import
path).
-In Python 3.2, a new means of configuring logging has been introduced, using
+In Python 2.7, a new means of configuring logging has been introduced, using
dictionaries to hold configuration information. This provides a superset of the
functionality of the config-file-based approach outlined above, and is the
recommended configuration method for new applications and deployments. Because
@@ -740,9 +703,7 @@ construct the dictionary in Python code, receive it in pickled form over a
socket, or use whatever approach makes sense for your application.
Here's an example of the same configuration as above, in YAML format for
-the new dictionary-based approach:
-
-.. code-block:: yaml
+the new dictionary-based approach::
version: 1
formatters:
@@ -774,7 +735,7 @@ where a logging event needs to be output, but no handlers can be found to
output the event. The behaviour of the logging package in these
circumstances is dependent on the Python version.
-For versions of Python prior to 3.2, the behaviour is as follows:
+For Python 2.x, the behaviour is as follows:
* If *logging.raiseExceptions* is ``False`` (production mode), the event is
silently dropped.
@@ -782,19 +743,6 @@ For versions of Python prior to 3.2, the behaviour is as follows:
* If *logging.raiseExceptions* is ``True`` (development mode), a message
'No handlers could be found for logger X.Y.Z' is printed once.
-In Python 3.2 and later, the behaviour is as follows:
-
-* The event is output using a 'handler of last resort', stored in
- ``logging.lastResort``. This internal handler is not associated with any
- logger, and acts like a :class:`~logging.StreamHandler` which writes the
- event description message to the current value of ``sys.stderr`` (therefore
- respecting any redirections which may be in effect). No formatting is
- done on the message - just the bare event description message is printed.
- The handler's level is set to ``WARNING``, so all events at this and
- greater severities will be output.
-
-To obtain the pre-3.2 behaviour, ``logging.lastResort`` can be set to ``None``.
-
.. _library-config:
Configuring Logging for a Library
@@ -803,24 +751,23 @@ Configuring Logging for a Library
When developing a library which uses logging, you should take care to
document how the library uses logging - for example, the names of loggers
used. Some consideration also needs to be given to its logging configuration.
-If the using application does not use logging, and library code makes logging
-calls, then (as described in the previous section) events of severity
-``WARNING`` and greater will be printed to ``sys.stderr``. This is regarded as
-the best default behaviour.
+If the using application does not configure logging, and library code makes
+logging calls, then (as described in the previous section) an error message
+will be printed to ``sys.stderr``.
-If for some reason you *don't* want these messages printed in the absence of
+If for some reason you *don't* want this message printed in the absence of
any logging configuration, you can attach a do-nothing handler to the top-level
logger for your library. This avoids the message being printed, since a handler
-will always be found for the library's events: it just doesn't produce any
+will always be found for the library's events: it just doesn't produce any
output. If the library user configures logging for application use, presumably
that configuration will add some handlers, and if levels are suitably
configured then logging calls made in library code will send output to those
handlers, as normal.
A do-nothing handler is included in the logging package:
-:class:`~logging.NullHandler` (since Python 3.1). An instance of this handler
+:class:`~logging.NullHandler` (since Python 2.7). An instance of this handler
could be added to the top-level logger of the logging namespace used by the
-library (*if* you want to prevent your library's logged events being output to
+library (*if* you want to prevent an error message being output to
``sys.stderr`` in the absence of logging configuration). If all logging by a
library *foo* is done using loggers with names matching 'foo.x', 'foo.x.y',
etc. then the code::
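For a library whose loggers are named ``foo``, ``foo.x`` and so on, attaching the do-nothing handler described above comes down to a single call; a sketch:

```python
import logging

# Sketch: attach the do-nothing NullHandler to the library's
# top-level logger so that, absent any user configuration, the
# library's events are swallowed silently instead of producing
# the 'no handlers could be found' complaint.
logging.getLogger('foo').addHandler(logging.NullHandler())
```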
@@ -938,10 +885,10 @@ provided:
disk files, rotating the log file at certain timed intervals.
#. :class:`~handlers.SocketHandler` instances send messages to TCP/IP
- sockets. Since 3.4, Unix domain sockets are also supported.
+ sockets.
#. :class:`~handlers.DatagramHandler` instances send messages to UDP
- sockets. Since 3.4, Unix domain sockets are also supported.
+ sockets.
#. :class:`~handlers.SMTPHandler` instances send messages to a designated
email address.
@@ -963,24 +910,18 @@ provided:
name. This handler is only useful on Unix-like systems; Windows does not
support the underlying mechanism used.
-#. :class:`~handlers.QueueHandler` instances send messages to a queue, such as
- those implemented in the :mod:`queue` or :mod:`multiprocessing` modules.
-
#. :class:`NullHandler` instances do nothing with error messages. They are used
by library developers who want to use logging, but want to avoid the 'No
handlers could be found for logger XXX' message which can be displayed if
the library user has not configured logging. See :ref:`library-config` for
more information.
-.. versionadded:: 3.1
+.. versionadded:: 2.7
The :class:`NullHandler` class.
-.. versionadded:: 3.2
- The :class:`~handlers.QueueHandler` class.
-
The :class:`NullHandler`, :class:`StreamHandler` and :class:`FileHandler`
classes are defined in the core logging package. The other handlers are
-defined in a sub-module, :mod:`logging.handlers`. (There is also another
+defined in a sub-module, :mod:`logging.handlers`. (There is also another
sub-module, :mod:`logging.config`, for configuration functionality.)
Logged messages are formatted for presentation through instances of the
@@ -1029,7 +970,6 @@ swallowed.
exceptions that occur. It's advised that you set :data:`raiseExceptions` to
``False`` for production usage.
-.. currentmodule:: logging
.. _arbitrary-object-messages:
@@ -1086,8 +1026,7 @@ need:
| | :func:`sys._getframe`, which may help |
| | to speed up your code in environments |
| | like PyPy (which can't speed up code |
-| | that uses :func:`sys._getframe`), if |
-| | and when PyPy supports Python 3.x. |
+| | that uses :func:`sys._getframe`). |
+-----------------------------------------------+----------------------------------------+
| Threading information. | Set ``logging.logThreads`` to ``0``. |
+-----------------------------------------------+----------------------------------------+
diff --git a/Doc/howto/logging_flow.png b/Doc/howto/logging_flow.png
index fac4acd..a883823 100644..100755
--- a/Doc/howto/logging_flow.png
+++ b/Doc/howto/logging_flow.png
Binary files differ
diff --git a/Doc/howto/pyporting.rst b/Doc/howto/pyporting.rst
index f7d12a1..88b0177 100644
--- a/Doc/howto/pyporting.rst
+++ b/Doc/howto/pyporting.rst
@@ -31,26 +31,20 @@ are:
#. Only worry about supporting Python 2.7
#. Make sure you have good test coverage (coverage.py_ can help;
- ``python -m pip install coverage``)
+ ``pip install coverage``)
#. Learn the differences between Python 2 & 3
-#. Use Futurize_ (or Modernize_) to update your code (e.g. ``python -m pip install future``)
+#. Use Futurize_ (or Modernize_) to update your code (e.g. ``pip install future``)
#. Use Pylint_ to help make sure you don't regress on your Python 3 support
- (``python -m pip install pylint``)
+ (``pip install pylint``)
#. Use caniusepython3_ to find out which of your dependencies are blocking your
- use of Python 3 (``python -m pip install caniusepython3``)
+ use of Python 3 (``pip install caniusepython3``)
#. Once your dependencies are no longer blocking you, use continuous integration
to make sure you stay compatible with Python 2 & 3 (tox_ can help test
- against multiple versions of Python; ``python -m pip install tox``)
+ against multiple versions of Python; ``pip install tox``)
#. Consider using optional static type checking to make sure your type usage
works in both Python 2 & 3 (e.g. use mypy_ to check your typing under both
- Python 2 & Python 3; ``python -m pip install mypy``).
+ Python 2 & Python 3).
-.. note::
-
- Note: Using ``python -m pip install`` guarantees that the ``pip`` you invoke
- is the one installed for the Python currently in use, whether it be
- a system-wide ``pip`` or one installed within a
- :ref:`virtual environment <tut-venv>`.
Details
=======
@@ -77,7 +71,7 @@ Drop support for Python 2.6 and older
While you can make Python 2.5 work with Python 3, it is **much** easier if you
only have to work with Python 2.7. If dropping Python 2.5 is not an
option then the six_ project can help you support Python 2.5 & 3 simultaneously
-(``python -m pip install six``). Do realize, though, that nearly all the projects listed
+(``pip install six``). Do realize, though, that nearly all the projects listed
in this HOWTO will not be available to you.
If you are able to skip Python 2.5 and older, then the required changes
@@ -439,12 +433,12 @@ to make sure everything functions as expected in both versions of Python.
.. _Futurize: http://python-future.org/automatic_conversion.html
.. _importlib: https://docs.python.org/3/library/importlib.html#module-importlib
.. _importlib2: https://pypi.org/project/importlib2
-.. _Modernize: https://python-modernize.readthedocs.io/
+.. _Modernize: https://python-modernize.readthedocs.org/en/latest/
.. _mypy: http://mypy-lang.org/
.. _Porting to Python 3: http://python3porting.com/
.. _Pylint: https://pypi.org/project/pylint
-.. _Python 3 Q & A: https://ncoghlan-devs-python-notes.readthedocs.io/en/latest/python3/questions_and_answers.html
+.. _Python 3 Q & A: https://ncoghlan-devs-python-notes.readthedocs.org/en/latest/python3/questions_and_answers.html
.. _pytype: https://github.com/google/pytype
.. _python-future: http://python-future.org/
@@ -455,4 +449,4 @@ to make sure everything functions as expected in both versions of Python.
.. _"What's New": https://docs.python.org/3/whatsnew/index.html
-.. _Why Python 3 exists: https://snarky.ca/why-python-3-exists
+.. _Why Python 3 exists: http://www.snarky.ca/why-python-3-exists
diff --git a/Doc/howto/regex.rst b/Doc/howto/regex.rst
index d574c37..81c0495 100644
--- a/Doc/howto/regex.rst
+++ b/Doc/howto/regex.rst
@@ -23,6 +23,11 @@
Introduction
============
+The :mod:`re` module was added in Python 1.5, and provides Perl-style regular
+expression patterns. Earlier versions of Python came with the :mod:`regex`
+module, which provided Emacs-style patterns. The :mod:`regex` module was
+removed completely in Python 2.5.
+
Regular expressions (called REs, or regexes, or regex patterns) are essentially
a tiny, highly specialized programming language embedded inside Python and made
available through the :mod:`re` module. Using this little language, you specify
@@ -107,25 +112,13 @@ you can still match them in patterns; for example, if you need to match a ``[``
or ``\``, you can precede them with a backslash to remove their special
meaning: ``\[`` or ``\\``.
-Some of the special sequences beginning with ``'\'`` represent
-predefined sets of characters that are often useful, such as the set
-of digits, the set of letters, or the set of anything that isn't
-whitespace.
-
-Let's take an example: ``\w`` matches any alphanumeric character. If
-the regex pattern is expressed in bytes, this is equivalent to the
-class ``[a-zA-Z0-9_]``. If the regex pattern is a string, ``\w`` will
-match all the characters marked as letters in the Unicode database
-provided by the :mod:`unicodedata` module. You can use the more
-restricted definition of ``\w`` in a string pattern by supplying the
-:const:`re.ASCII` flag when compiling the regular expression.
-
-The following list of special sequences isn't complete. For a complete
-list of sequences and expanded class definitions for Unicode string
-patterns, see the last part of :ref:`Regular Expression Syntax
-<re-syntax>` in the Standard Library reference. In general, the
-Unicode versions match any character that's in the appropriate
-category in the Unicode database.
+Some of the special sequences beginning with ``'\'`` represent predefined sets
+of characters that are often useful, such as the set of digits, the set of
+letters, or the set of anything that isn't whitespace. The following predefined
+special sequences are a subset of those available. The equivalent classes are
+for byte string patterns. For a complete list of sequences and expanded class
+definitions for Unicode string patterns, see the last part of
+:ref:`Regular Expression Syntax <re-syntax>`.
``\d``
Matches any decimal digit; this is equivalent to the class ``[0-9]``.
@@ -154,8 +147,8 @@ These sequences can be included inside a character class. For example,
``','`` or ``'.'``.
The final metacharacter in this section is ``.``. It matches anything except a
-newline character, and there's an alternate mode (:const:`re.DOTALL`) where it will
-match even a newline. ``.`` is often used where you want to match "any
+newline character, and there's an alternate mode (``re.DOTALL``) where it will
+match even a newline. ``'.'`` is often used where you want to match "any
character".
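The special sequences and the ``.`` metacharacter discussed above behave as follows in practice (an illustrative sketch, not part of the patch):

```python
import re

# \d matches a decimal digit and \w an alphanumeric character.
assert re.findall(r'\d', 'a1b2c3') == ['1', '2', '3']
assert re.match(r'\w\w', 'ab') is not None

# '.' matches anything except a newline...
assert re.match(r'a.b', 'a\nb') is None
# ...unless re.DOTALL is supplied, in which case it matches that too.
assert re.match(r'a.b', 'a\nb', re.DOTALL) is not None
```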
@@ -169,11 +162,15 @@ wouldn't be much of an advance. Another capability is that you can specify that
portions of the RE must be repeated a certain number of times.
The first metacharacter for repeating things that we'll look at is ``*``. ``*``
-doesn't match the literal character ``'*'``; instead, it specifies that the
+doesn't match the literal character ``*``; instead, it specifies that the
previous character can be matched zero or more times, instead of exactly once.
-For example, ``ca*t`` will match ``'ct'`` (0 ``'a'`` characters), ``'cat'`` (1 ``'a'``),
-``'caaat'`` (3 ``'a'`` characters), and so forth.
+For example, ``ca*t`` will match ``ct`` (0 ``a`` characters), ``cat`` (1 ``a``),
+``caaat`` (3 ``a`` characters), and so forth. The RE engine has various
+internal limitations stemming from the size of C's ``int`` type that will
+prevent it from matching over 2 billion ``a`` characters; you probably don't
+have enough memory to construct a string that large, so you shouldn't run into
+that limit.
Repetitions such as ``*`` are :dfn:`greedy`; when repeating a RE, the matching
engine will try to repeat it as many times as possible. If later portions of the
@@ -183,7 +180,7 @@ fewer repetitions.
A step-by-step example will make this more obvious. Let's consider the
expression ``a[bcd]*b``. This matches the letter ``'a'``, zero or more letters
from the class ``[bcd]``, and finally ends with a ``'b'``. Now imagine matching
-this RE against the string ``'abcbd'``.
+this RE against the string ``abcbd``.
+------+-----------+---------------------------------+
| Step | Matched | Explanation |
@@ -216,7 +213,7 @@ this RE against the string ``'abcbd'``.
| | | it succeeds. |
+------+-----------+---------------------------------+
-The end of the RE has now been reached, and it has matched ``'abcb'``. This
+The end of the RE has now been reached, and it has matched ``abcb``. This
demonstrates how the matching engine goes as far as it can at first, and if no
match is found it will then progressively back up and retry the rest of the RE
again and again. It will back up until it has tried zero matches for
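The step-by-step table above corresponds to a single call you can reproduce directly (shown here with Python 3 syntax):

```python
import re

# The engine first lets [bcd]* greedily consume 'bcbd', then backs up
# one character at a time until the final 'b' of the pattern can match.
m = re.match('a[bcd]*b', 'abcbd')
matched = m.group()  # 'abcb' -- not the whole input string
```

The greedy qualifier tried the longest repetition first and only gave characters back when the rest of the pattern could not otherwise succeed.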
@@ -227,23 +224,24 @@ Another repeating metacharacter is ``+``, which matches one or more times. Pay
careful attention to the difference between ``*`` and ``+``; ``*`` matches
*zero* or more times, so whatever's being repeated may not be present at all,
while ``+`` requires at least *one* occurrence. To use a similar example,
-``ca+t`` will match ``'cat'`` (1 ``'a'``), ``'caaat'`` (3 ``'a'``\ s), but won't
-match ``'ct'``.
+``ca+t`` will match ``cat`` (1 ``a``), ``caaat`` (3 ``a``'s), but won't match
+``ct``.
There are two more repeating qualifiers. The question mark character, ``?``,
matches either once or zero times; you can think of it as marking something as
-being optional. For example, ``home-?brew`` matches either ``'homebrew'`` or
-``'home-brew'``.
+being optional. For example, ``home-?brew`` matches either ``homebrew`` or
+``home-brew``.
The most complicated repeated qualifier is ``{m,n}``, where *m* and *n* are
decimal integers. This qualifier means there must be at least *m* repetitions,
-and at most *n*. For example, ``a/{1,3}b`` will match ``'a/b'``, ``'a//b'``, and
-``'a///b'``. It won't match ``'ab'``, which has no slashes, or ``'a////b'``, which
+and at most *n*. For example, ``a/{1,3}b`` will match ``a/b``, ``a//b``, and
+``a///b``. It won't match ``ab``, which has no slashes, or ``a////b``, which
has four.
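A quick sketch contrasting the three qualifiers just introduced (Python 3 syntax; ``re.fullmatch`` is used so that the whole string must match):

```python
import re

# '+' requires at least one 'a', so 'ct' is rejected.
plus_ok = [s for s in ('ct', 'cat', 'caaat')
           if re.match('ca+t', s)]

# '?' marks the hyphen as optional: zero or one occurrence.
opt_ok = [s for s in ('homebrew', 'home-brew', 'home--brew')
          if re.fullmatch('home-?brew', s)]

# '{1,3}' allows between one and three slashes, inclusive.
braces_ok = [s for s in ('ab', 'a/b', 'a///b', 'a////b')
             if re.fullmatch('a/{1,3}b', s)]
```

``plus_ok`` is ``['cat', 'caaat']``, ``opt_ok`` is ``['homebrew', 'home-brew']``, and ``braces_ok`` is ``['a/b', 'a///b']``, matching the prose descriptions above.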
You can omit either *m* or *n*; in that case, a reasonable value is assumed for
the missing value. Omitting *m* is interpreted as a lower limit of 0, while
-omitting *n* results in an upper bound of infinity.
+omitting *n* results in an upper bound of infinity --- actually, the upper bound
+is the 2-billion limit mentioned earlier, but that might as well be infinity.
Readers of a reductionist bent may notice that the three other qualifiers can
all be expressed using this notation. ``{0,}`` is the same as ``*``, ``{1,}``
@@ -270,8 +268,8 @@ performing string substitutions. ::
>>> import re
>>> p = re.compile('ab*')
- >>> p
- re.compile('ab*')
+ >>> p #doctest: +ELLIPSIS
+ <_sre.SRE_Pattern object at 0x...>
:func:`re.compile` also accepts an optional *flags* argument, used to enable
various special features and syntax variations. We'll go over the available
@@ -290,8 +288,6 @@ Putting REs in strings keeps the Python language simpler, but has one
disadvantage which is the topic of the next section.
-.. _the-backslash-plague:
-
The Backslash Plague
--------------------
@@ -330,13 +326,6 @@ backslashes are not handled in any special way in a string literal prefixed with
while ``"\n"`` is a one-character string containing a newline. Regular
expressions will often be written in Python code using this raw string notation.
-In addition, special escape sequences that are valid in regular expressions,
-but not valid as Python string literals, now result in a
-:exc:`DeprecationWarning` and will eventually become a :exc:`SyntaxError`,
-which means the sequences will be invalid if raw string notation or escaping
-the backslashes isn't used.
-
-
+-------------------+------------------+
| Regular String | Raw string |
+===================+==================+
@@ -372,46 +361,49 @@ for a complete listing.
| | returns them as an :term:`iterator`. |
+------------------+-----------------------------------------------+
-:meth:`~re.Pattern.match` and :meth:`~re.Pattern.search` return ``None`` if no match can be found. If
+:meth:`match` and :meth:`search` return ``None`` if no match can be found. If
they're successful, a :ref:`match object <match-objects>` instance is returned,
containing information about the match: where it starts and ends, the substring
it matched, and more.
You can learn about this by interactively experimenting with the :mod:`re`
-module. If you have :mod:`tkinter` available, you may also want to look at
-:source:`Tools/demo/redemo.py`, a demonstration program included with the
+module. If you have Tkinter available, you may also want to look at
+:source:`Tools/scripts/redemo.py`, a demonstration program included with the
Python distribution. It allows you to enter REs and strings, and displays
whether the RE matches or fails. :file:`redemo.py` can be quite useful when
-trying to debug a complicated RE.
+trying to debug a complicated RE. Phil Schwartz's `Kodos
+<http://kodos.sourceforge.net/>`_ is also an interactive tool for developing and
+testing RE patterns.
This HOWTO uses the standard Python interpreter for its examples. First, run the
Python interpreter, import the :mod:`re` module, and compile a RE::
+ Python 2.2.2 (#1, Feb 10 2003, 12:57:01)
>>> import re
>>> p = re.compile('[a-z]+')
- >>> p
- re.compile('[a-z]+')
+ >>> p #doctest: +ELLIPSIS
+ <_sre.SRE_Pattern object at 0x...>
Now, you can try matching various strings against the RE ``[a-z]+``. An empty
string shouldn't match at all, since ``+`` means 'one or more repetitions'.
-:meth:`~re.Pattern.match` should return ``None`` in this case, which will cause the
+:meth:`match` should return ``None`` in this case, which will cause the
interpreter to print no output. You can explicitly print the result of
-:meth:`!match` to make this clear. ::
+:meth:`match` to make this clear. ::
>>> p.match("")
- >>> print(p.match(""))
+ >>> print p.match("")
None
Now, let's try it on a string that it should match, such as ``tempo``. In this
-case, :meth:`~re.Pattern.match` will return a :ref:`match object <match-objects>`, so you
+case, :meth:`match` will return a :ref:`match object <match-objects>`, so you
should store the result in a variable for later use. ::
>>> m = p.match('tempo')
- >>> m
- <re.Match object; span=(0, 5), match='tempo'>
+ >>> m #doctest: +ELLIPSIS
+ <_sre.SRE_Match object at 0x...>
Now you can query the :ref:`match object <match-objects>` for information
-about the matching string. Match object instances
+about the matching string. :ref:`match object <match-objects>` instances
also have several methods and attributes; the most important ones are:
+------------------+--------------------------------------------+
@@ -436,18 +428,18 @@ Trying these methods will soon clarify their meaning::
>>> m.span()
(0, 5)
-:meth:`~re.Match.group` returns the substring that was matched by the RE. :meth:`~re.Match.start`
-and :meth:`~re.Match.end` return the starting and ending index of the match. :meth:`~re.Match.span`
-returns both start and end indexes in a single tuple. Since the :meth:`~re.Pattern.match`
-method only checks if the RE matches at the start of a string, :meth:`!start`
-will always be zero. However, the :meth:`~re.Pattern.search` method of patterns
+:meth:`group` returns the substring that was matched by the RE. :meth:`start`
+and :meth:`end` return the starting and ending index of the match. :meth:`span`
+returns both start and end indexes in a single tuple. Since the :meth:`match`
+method only checks if the RE matches at the start of a string, :meth:`start`
+will always be zero. However, the :meth:`search` method of patterns
scans through the string, so the match may not start at zero in that
case. ::
- >>> print(p.match('::: message'))
+ >>> print p.match('::: message')
None
- >>> m = p.search('::: message'); print(m)
- <re.Match object; span=(4, 11), match='message'>
+ >>> m = p.search('::: message'); print m #doctest: +ELLIPSIS
+ <_sre.SRE_Match object at 0x...>
>>> m.group()
'message'
>>> m.span()
@@ -460,32 +452,26 @@ In actual programs, the most common style is to store the
p = re.compile( ... )
m = p.match( 'string goes here' )
if m:
- print('Match found: ', m.group())
+ print 'Match found: ', m.group()
else:
- print('No match')
+ print 'No match'
Two pattern methods return all of the matches for a pattern.
-:meth:`~re.Pattern.findall` returns a list of matching strings::
+:meth:`findall` returns a list of matching strings::
- >>> p = re.compile(r'\d+')
+ >>> p = re.compile('\d+')
>>> p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')
['12', '11', '10']
-The ``r`` prefix, making the literal a raw string literal, is needed in this
-example because escape sequences in a normal "cooked" string literal that are
-not recognized by Python, as opposed to regular expressions, now result in a
-:exc:`DeprecationWarning` and will eventually become a :exc:`SyntaxError`. See
-:ref:`the-backslash-plague`.
-
-:meth:`~re.Pattern.findall` has to create the entire list before it can be returned as the
-result. The :meth:`~re.Pattern.finditer` method returns a sequence of
-:ref:`match object <match-objects>` instances as an :term:`iterator`::
+:meth:`findall` has to create the entire list before it can be returned as the
+result. The :meth:`finditer` method returns a sequence of
+:ref:`match object <match-objects>` instances as an :term:`iterator`. [#]_ ::
>>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
>>> iterator #doctest: +ELLIPSIS
- <callable_iterator object at 0x...>
+ <callable-iterator object at 0x...>
>>> for match in iterator:
- ... print(match.span())
+ ... print match.span()
...
(0, 2)
(22, 24)
@@ -496,27 +482,38 @@ Module-Level Functions
----------------------
You don't have to create a pattern object and call its methods; the
-:mod:`re` module also provides top-level functions called :func:`~re.match`,
-:func:`~re.search`, :func:`~re.findall`, :func:`~re.sub`, and so forth. These functions
-take the same arguments as the corresponding pattern method with
+:mod:`re` module also provides top-level functions called :func:`match`,
+:func:`search`, :func:`findall`, :func:`sub`, and so forth. These functions
+take the same arguments as the corresponding pattern method, with
the RE string added as the first argument, and still return either ``None`` or a
:ref:`match object <match-objects>` instance. ::
- >>> print(re.match(r'From\s+', 'Fromage amk'))
+ >>> print re.match(r'From\s+', 'Fromage amk')
None
>>> re.match(r'From\s+', 'From amk Thu May 14 19:12:10 1998') #doctest: +ELLIPSIS
- <re.Match object; span=(0, 5), match='From '>
+ <_sre.SRE_Match object at 0x...>
Under the hood, these functions simply create a pattern object for you
-and call the appropriate method on it. They also store the compiled
-object in a cache, so future calls using the same RE won't need to
-parse the pattern again and again.
+and call the appropriate method on it. They also store the compiled object in a
+cache, so future calls using the same RE are faster.
Should you use these module-level functions, or should you get the
-pattern and call its methods yourself? If you're accessing a regex
-within a loop, pre-compiling it will save a few function calls.
-Outside of loops, there's not much difference thanks to the internal
-cache.
+pattern and call its methods yourself? That choice depends on how
+frequently the RE will be used, and on your personal coding style. If the RE is
+being used at only one point in the code, then the module functions are probably
+more convenient. If a program contains a lot of regular expressions, or re-uses
+the same ones in several locations, then it might be worthwhile to collect all
+the definitions in one place, in a section of code that compiles all the REs
+ahead of time. To take an example from the standard library, here's an extract
+from the deprecated :mod:`xmllib` module::
+
+ ref = re.compile( ... )
+ entityref = re.compile( ... )
+ charref = re.compile( ... )
+ starttagopen = re.compile( ... )
+
+I generally prefer to work with the compiled object, even for one-time uses, but
+few people will be as much of a purist about this as I am.
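The compile-ahead-of-time style described above might look like the following sketch; the pattern and helper names here are invented for illustration and are not taken from :mod:`xmllib`:

```python
import re

# Hypothetical module-level patterns, compiled once at import time so
# every call site reuses the same compiled objects.
ENTITYREF = re.compile(r'&(?P<name>[a-zA-Z][-.a-zA-Z0-9]*);')
CHARREF = re.compile(r'&#(?P<num>[0-9]+);')

def find_entities(text):
    """Return the names of all entity references in *text*."""
    return [m.group('name') for m in ENTITYREF.finditer(text)]

names = find_entities('fish &amp; chips &gt; soup')
```

Collecting the definitions in one place makes it easy to audit every RE a module uses, at the cost of compiling patterns that a given run may never need.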
Compilation Flags
@@ -536,22 +533,22 @@ of each one.
+---------------------------------+--------------------------------------------+
| Flag | Meaning |
+=================================+============================================+
-| :const:`ASCII`, :const:`A` | Makes several escapes like ``\w``, ``\b``, |
-| | ``\s`` and ``\d`` match only on ASCII |
-| | characters with the respective property. |
-+---------------------------------+--------------------------------------------+
| :const:`DOTALL`, :const:`S` | Make ``.`` match any character, including |
-| | newlines. |
+| | newlines |
+---------------------------------+--------------------------------------------+
-| :const:`IGNORECASE`, :const:`I` | Do case-insensitive matches. |
+| :const:`IGNORECASE`, :const:`I` | Do case-insensitive matches |
+---------------------------------+--------------------------------------------+
-| :const:`LOCALE`, :const:`L` | Do a locale-aware match. |
+| :const:`LOCALE`, :const:`L` | Do a locale-aware match |
+---------------------------------+--------------------------------------------+
| :const:`MULTILINE`, :const:`M` | Multi-line matching, affecting ``^`` and |
-| | ``$``. |
+| | ``$`` |
+---------------------------------+--------------------------------------------+
| :const:`VERBOSE`, :const:`X` | Enable verbose REs, which can be organized |
-| (for 'extended') | more cleanly and understandably. |
+| | more cleanly and understandably. |
++---------------------------------+--------------------------------------------+
+| :const:`UNICODE`, :const:`U` | Makes several escapes like ``\w``, ``\b``, |
+| | ``\s`` and ``\d`` dependent on the Unicode |
+| | character database. |
+---------------------------------+--------------------------------------------+
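Two of the flags from the table can be demonstrated in a few lines (Python 3 syntax):

```python
import re

# IGNORECASE makes literal strings and character classes match
# both cases.
ci = re.compile('spam', re.IGNORECASE)
found_ci = bool(ci.search('Lovely SPAM!'))

# DOTALL lets '.' match a newline; without it the match fails here.
plain = re.search('a.c', 'a\nc')
dotall = re.search('a.c', 'a\nc', re.DOTALL)
```

``found_ci`` is ``True``, ``plain`` is ``None``, and ``dotall`` is a match object spanning the whole three-character string.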
@@ -561,41 +558,26 @@ of each one.
Perform case-insensitive matching; character class and literal strings will
match letters by ignoring case. For example, ``[A-Z]`` will match lowercase
- letters, too. Full Unicode matching also works unless the :const:`ASCII`
- flag is used to disable non-ASCII matches. When the Unicode patterns
- ``[a-z]`` or ``[A-Z]`` are used in combination with the :const:`IGNORECASE`
- flag, they will match the 52 ASCII letters and 4 additional non-ASCII
- letters: 'İ' (U+0130, Latin capital letter I with dot above), 'ı' (U+0131,
- Latin small letter dotless i), 'ſ' (U+017F, Latin small letter long s) and
- 'K' (U+212A, Kelvin sign). ``Spam`` will match ``'Spam'``, ``'spam'``,
- ``'spAM'``, or ``'ſpam'`` (the latter is matched only in Unicode mode).
- This lowercasing doesn't take the current locale into account;
- it will if you also set the :const:`LOCALE` flag.
+ letters, too, and ``Spam`` will match ``Spam``, ``spam``, or ``spAM``. This
+ lowercasing doesn't take the current locale into account; it will if you also
+ set the :const:`LOCALE` flag.
.. data:: L
LOCALE
:noindex:
- Make ``\w``, ``\W``, ``\b``, ``\B`` and case-insensitive matching dependent
- on the current locale instead of the Unicode database.
-
- Locales are a feature of the C library intended to help in writing programs
- that take account of language differences. For example, if you're
- processing encoded French text, you'd want to be able to write ``\w+`` to
- match words, but ``\w`` only matches the character class ``[A-Za-z]`` in
- bytes patterns; it won't match bytes corresponding to ``é`` or ``ç``.
- If your system is configured properly and a French locale is selected,
- certain C functions will tell the program that the byte corresponding to
- ``é`` should also be considered a letter.
+ Make ``\w``, ``\W``, ``\b``, and ``\B``, dependent on the current locale.
+
+ Locales are a feature of the C library intended to help in writing programs that
+ take account of language differences. For example, if you're processing French
+ text, you'd want to be able to write ``\w+`` to match words, but ``\w`` only
+ matches the character class ``[A-Za-z]``; it won't match ``'é'`` or ``'ç'``. If
+ your system is configured properly and a French locale is selected, certain C
+ functions will tell the program that ``'é'`` should also be considered a letter.
Setting the :const:`LOCALE` flag when compiling a regular expression will cause
the resulting compiled object to use these C functions for ``\w``; this is
slower, but also enables ``\w+`` to match French words as you'd expect.
- The use of this flag is discouraged in Python 3 as the locale mechanism
- is very unreliable, it only handles one "culture" at a time, and it only
- works with 8-bit locales. Unicode matching is already enabled by default
- in Python 3 for Unicode (str) patterns, and it is able to handle different
- locales/languages.
.. data:: M
@@ -622,13 +604,12 @@ of each one.
newline; without this flag, ``'.'`` will match anything *except* a newline.
-.. data:: A
- ASCII
+.. data:: U
+ UNICODE
:noindex:
- Make ``\w``, ``\W``, ``\b``, ``\B``, ``\s`` and ``\S`` perform ASCII-only
- matching instead of full Unicode matching. This is only meaningful for
- Unicode patterns, and is ignored for byte patterns.
+ Make ``\w``, ``\W``, ``\b``, ``\B``, ``\d``, ``\D``, ``\s`` and ``\S``
+ dependent on the Unicode character properties database.
.. data:: X
@@ -693,11 +674,11 @@ zero-width assertions should never be repeated, because if they match once at a
given location, they can obviously be matched an infinite number of times.
``|``
- Alternation, or the "or" operator. If *A* and *B* are regular expressions,
- ``A|B`` will match any string that matches either *A* or *B*. ``|`` has very
+ Alternation, or the "or" operator. If A and B are regular expressions,
+ ``A|B`` will match any string that matches either ``A`` or ``B``. ``|`` has very
low precedence in order to make it work reasonably when you're alternating
- multi-character strings. ``Crow|Servo`` will match either ``'Crow'`` or ``'Servo'``,
- not ``'Cro'``, a ``'w'`` or an ``'S'``, and ``'ervo'``.
+ multi-character strings. ``Crow|Servo`` will match either ``Crow`` or ``Servo``,
+ not ``Cro``, a ``'w'`` or an ``'S'``, and ``ervo``.
To match a literal ``'|'``, use ``\|``, or enclose it inside a character class,
as in ``[|]``.
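A short sketch of alternation's low precedence and of escaping the metacharacter (Python 3 syntax):

```python
import re

# '|' alternates whole multi-character branches, so this matches
# 'Crow' or 'Servo', never 'Cro' followed by 'w|S'.
p = re.compile('Crow|Servo')
hits = [p.search(s).group() for s in ('A Crow flew', 'Servo units')]

# A backslash escape (or a character class) matches a literal '|'.
literal = re.search(r'\|', 'a|b')
```

``hits`` is ``['Crow', 'Servo']``, and ``literal`` matches the single ``|`` at index 1.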
@@ -710,23 +691,24 @@ given location, they can obviously be matched an infinite number of times.
For example, if you wish to match the word ``From`` only at the beginning of a
line, the RE to use is ``^From``. ::
- >>> print(re.search('^From', 'From Here to Eternity')) #doctest: +ELLIPSIS
- <re.Match object; span=(0, 4), match='From'>
- >>> print(re.search('^From', 'Reciting From Memory'))
+ >>> print re.search('^From', 'From Here to Eternity') #doctest: +ELLIPSIS
+ <_sre.SRE_Match object at 0x...>
+ >>> print re.search('^From', 'Reciting From Memory')
None
- To match a literal ``'^'``, use ``\^``.
+ .. To match a literal \character{\^}, use \regexp{\e\^} or enclose it
+ .. inside a character class, as in \regexp{[{\e}\^]}.
``$``
Matches at the end of a line, which is defined as either the end of the string,
or any location followed by a newline character. ::
- >>> print(re.search('}$', '{block}')) #doctest: +ELLIPSIS
- <re.Match object; span=(6, 7), match='}'>
- >>> print(re.search('}$', '{block} '))
+ >>> print re.search('}$', '{block}') #doctest: +ELLIPSIS
+ <_sre.SRE_Match object at 0x...>
+ >>> print re.search('}$', '{block} ')
None
- >>> print(re.search('}$', '{block}\n')) #doctest: +ELLIPSIS
- <re.Match object; span=(6, 7), match='}'>
+ >>> print re.search('}$', '{block}\n') #doctest: +ELLIPSIS
+ <_sre.SRE_Match object at 0x...>
To match a literal ``'$'``, use ``\$`` or enclose it inside a character class,
as in ``[$]``.
@@ -750,11 +732,11 @@ given location, they can obviously be matched an infinite number of times.
match when it's contained inside another word. ::
>>> p = re.compile(r'\bclass\b')
- >>> print(p.search('no class at all'))
- <re.Match object; span=(3, 8), match='class'>
- >>> print(p.search('the declassified algorithm'))
+ >>> print p.search('no class at all') #doctest: +ELLIPSIS
+ <_sre.SRE_Match object at 0x...>
+ >>> print p.search('the declassified algorithm')
None
- >>> print(p.search('one subclass is'))
+ >>> print p.search('one subclass is')
None
There are two subtleties you should remember when using this special sequence.
@@ -766,10 +748,10 @@ given location, they can obviously be matched an infinite number of times.
in front of the RE string. ::
>>> p = re.compile('\bclass\b')
- >>> print(p.search('no class at all'))
+ >>> print p.search('no class at all')
None
- >>> print(p.search('\b' + 'class' + '\b'))
- <re.Match object; span=(0, 7), match='\x08class\x08'>
+ >>> print p.search('\b' + 'class' + '\b') #doctest: +ELLIPSIS
+ <_sre.SRE_Match object at 0x...>
Second, inside a character class, where there's no use for this assertion,
``\b`` represents the backspace character, for compatibility with Python's
@@ -787,9 +769,7 @@ Frequently you need to obtain more information than just whether the RE matched
or not. Regular expressions are often used to dissect strings by writing a RE
divided into several subgroups which match different components of interest.
For example, an RFC-822 header line is divided into a header name and a value,
-separated by a ``':'``, like this:
-
-.. code-block:: none
+separated by a ``':'``, like this::
From: author@example.com
User-Agent: Thunderbird 1.5.0.9 (X11/20061227)
@@ -808,13 +788,12 @@ of a group with a repeating qualifier, such as ``*``, ``+``, ``?``, or
``ab``. ::
>>> p = re.compile('(ab)*')
- >>> print(p.match('ababababab').span())
+ >>> print p.match('ababababab').span()
(0, 10)
Groups indicated with ``'('``, ``')'`` also capture the starting and ending
index of the text that they match; this can be retrieved by passing an argument
-to :meth:`~re.Match.group`, :meth:`~re.Match.start`, :meth:`~re.Match.end`, and
-:meth:`~re.Match.span`. Groups are
+to :meth:`group`, :meth:`start`, :meth:`end`, and :meth:`span`. Groups are
numbered starting with 0. Group 0 is always present; it's the whole RE, so
:ref:`match object <match-objects>` methods all have group 0 as their default
argument. Later we'll see how to express groups that don't capture the span
@@ -840,13 +819,13 @@ from left to right. ::
>>> m.group(2)
'b'
-:meth:`~re.Match.group` can be passed multiple group numbers at a time, in which case it
+:meth:`group` can be passed multiple group numbers at a time, in which case it
will return a tuple containing the corresponding values for those groups. ::
>>> m.group(2,1,2)
('b', 'abc', 'b')
-The :meth:`~re.Match.groups` method returns a tuple containing the strings for all the
+The :meth:`groups` method returns a tuple containing the strings for all the
subgroups, from 1 up to however many there are. ::
>>> m.groups()
@@ -880,10 +859,11 @@ keep track of the group numbers. There are two features which help with this
problem. Both of them use a common syntax for regular expression extensions, so
we'll look at that first.
-Perl 5 is well known for its powerful additions to standard regular expressions.
-For these new features the Perl developers couldn't choose new single-keystroke metacharacters
-or new special sequences beginning with ``\`` without making Perl's regular
-expressions confusingly different from standard REs. If they chose ``&`` as a
+Perl 5 added several additional features to standard regular expressions, and
+the Python :mod:`re` module supports most of them. It would have been
+difficult to choose new single-keystroke metacharacters or new special sequences
+beginning with ``\`` to represent the new features without making Perl's regular
+expressions confusingly different from standard REs. If you chose ``&`` as a
new metacharacter, for example, old expressions would be assuming that ``&`` was
a regular character and wouldn't have escaped it by writing ``\&`` or ``[&]``.
@@ -895,15 +875,22 @@ what extension is being used, so ``(?=foo)`` is one thing (a positive lookahead
assertion) and ``(?:foo)`` is something else (a non-capturing group containing
the subexpression ``foo``).
-Python supports several of Perl's extensions and adds an extension
-syntax to Perl's extension syntax. If the first character after the
-question mark is a ``P``, you know that it's an extension that's
-specific to Python.
-
-Now that we've looked at the general extension syntax, we can return
-to the features that simplify working with groups in complex REs.
-
-Sometimes you'll want to use a group to denote a part of a regular expression,
+Python adds an extension syntax to Perl's extension syntax. If the first
+character after the question mark is a ``P``, you know that it's an extension
+that's specific to Python. Currently there are two such extensions:
+``(?P<name>...)`` defines a named group, and ``(?P=name)`` is a backreference to
+a named group. If future versions of Perl 5 add similar features using a
+different syntax, the :mod:`re` module will be changed to support the new
+syntax, while preserving the Python-specific syntax for compatibility's sake.
+
+Now that we've looked at the general extension syntax, we can return to the
+features that simplify working with groups in complex REs. Since groups are
+numbered from left to right and a complex expression may use many groups, it can
+become difficult to keep track of the correct numbering. Modifying such a
+complex RE is annoying, too: insert a new group near the beginning and you
+change the numbers of everything that follows it.
+
+Sometimes you'll want to use a group to collect a part of a regular expression,
but aren't interested in retrieving the group's contents. You can make this fact
explicit by using a non-capturing group: ``(?:...)``, where you can replace the
``...`` with any other regular expression. ::
@@ -929,7 +916,7 @@ numbers, groups can be referenced by a name.
The syntax for a named group is one of the Python-specific extensions:
``(?P<name>...)``. *name* is, obviously, the name of the group. Named groups
-behave exactly like capturing groups, and additionally associate a name
+also behave exactly like capturing groups, and additionally associate a name
with a group. The :ref:`match object <match-objects>` methods that deal with
capturing groups all accept either integers that refer to the group by number
or strings that contain the desired group's name. Named groups are still
@@ -942,13 +929,6 @@ given numbers, so you can retrieve information about a group in two ways::
>>> m.group(1)
'Lots'
-Additionally, you can retrieve named groups as a dictionary with
-:meth:`~re.Match.groupdict`::
-
- >>> m = re.match(r'(?P<first>\w+) (?P<last>\w+)', 'Jane Doe')
- >>> m.groupdict()
- {'first': 'Jane', 'last': 'Doe'}
-
Named groups are handy because they let you use easily-remembered names, instead
of having to remember numbers. Here's an example RE from the :mod:`imaplib`
module::
@@ -1003,10 +983,9 @@ The pattern to match this is quite simple:
``.*[.].*$``
Notice that the ``.`` needs to be treated specially because it's a
-metacharacter, so it's inside a character class to only match that
-specific character. Also notice the trailing ``$``; this is added to
-ensure that all the rest of the string must be included in the
-extension. This regular expression matches ``foo.bar`` and
+metacharacter; I've put it inside a character class. Also notice the trailing
+``$``; this is added to ensure that all the rest of the string must be included
+in the extension. This regular expression matches ``foo.bar`` and
``autoexec.bat`` and ``sendmail.cf`` and ``printers.conf``.
Now, consider complicating the problem a bit; what if you want to match
@@ -1069,7 +1048,7 @@ using the following pattern methods:
| ``sub()`` | Find all substrings where the RE matches, and |
| | replace them with a different string |
+------------------+-----------------------------------------------+
-| ``subn()`` | Does the same thing as :meth:`!sub`, but |
+| ``subn()`` | Does the same thing as :meth:`sub`, but |
| | returns the new string and the number of |
| | replacements |
+------------------+-----------------------------------------------+
@@ -1078,10 +1057,10 @@ using the following pattern methods:
Splitting Strings
-----------------
-The :meth:`~re.Pattern.split` method of a pattern splits a string apart
+The :meth:`split` method of a pattern splits a string apart
wherever the RE matches, returning a list of the pieces. It's similar to the
-:meth:`~str.split` method of strings but provides much more generality in the
-delimiters that you can split by; string :meth:`!split` only supports splitting by
+:meth:`split` method of strings but provides much more generality in the
+delimiters that you can split by; :meth:`split` only supports splitting by
whitespace or by a fixed string. As you'd expect, there's a module-level
:func:`re.split` function, too.
@@ -1121,11 +1100,11 @@ following calls::
The module-level function :func:`re.split` adds the RE to be used as the first
argument, but is otherwise the same. ::
- >>> re.split(r'[\W]+', 'Words, words, words.')
+ >>> re.split('[\W]+', 'Words, words, words.')
['Words', 'words', 'words', '']
- >>> re.split(r'([\W]+)', 'Words, words, words.')
+ >>> re.split('([\W]+)', 'Words, words, words.')
['Words', ', ', 'words', ', ', 'words', '.', '']
- >>> re.split(r'[\W]+', 'Words, words, words.', 1)
+ >>> re.split('[\W]+', 'Words, words, words.', 1)
['Words', 'words, words.']
@@ -1133,9 +1112,10 @@ Search and Replace
------------------
Another common task is to find all the matches for a pattern, and replace them
-with a different string. The :meth:`~re.Pattern.sub` method takes a replacement value,
+with a different string. The :meth:`sub` method takes a replacement value,
which can be either a string or a function, and the string to be processed.
+
.. method:: .sub(replacement, string[, count=0])
:noindex:
@@ -1147,7 +1127,7 @@ which can be either a string or a function, and the string to be processed.
replaced; *count* must be a non-negative integer. The default value of 0 means
to replace all occurrences.
-Here's a simple example of using the :meth:`~re.Pattern.sub` method. It replaces colour
+Here's a simple example of using the :meth:`sub` method. It replaces colour
names with the word ``colour``::
>>> p = re.compile('(blue|white|red)')
@@ -1156,7 +1136,7 @@ names with the word ``colour``::
>>> p.sub('colour', 'blue socks and red shoes', count=1)
'colour socks and red shoes'
-The :meth:`~re.Pattern.subn` method does the same work, but returns a 2-tuple containing the
+The :meth:`subn` method does the same work, but returns a 2-tuple containing the
new string value and the number of replacements that were performed::
>>> p = re.compile('(blue|white|red)')
@@ -1165,16 +1145,16 @@ new string value and the number of replacements that were performed::
>>> p.subn('colour', 'no colours at all')
('no colours at all', 0)
-Empty matches are replaced only when they're not adjacent to a previous empty match.
+Empty matches are replaced only when they're not adjacent to a previous match.
::
>>> p = re.compile('x*')
>>> p.sub('-', 'abxd')
- '-a-b--d-'
+ '-a-b-d-'
If *replacement* is a string, any backslash escapes in it are processed. That
is, ``\n`` is converted to a single newline character, ``\r`` is converted to a
-carriage return, and so forth. Unknown escapes such as ``\&`` are left alone.
+carriage return, and so forth. Unknown escapes such as ``\j`` are left alone.
Backreferences, such as ``\6``, are replaced with the substring matched by the
corresponding group in the RE. This lets you incorporate portions of the
original text in the resulting replacement string.
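A small sketch of backreferences in the replacement string (Python 3 syntax): the groups captured by the pattern are spliced back into the result.

```python
import re

# '\2 \1' inserts the text matched by groups 2 and 1, so the two
# words trade places in the output.
swapped = re.sub(r'(\w+) (\w+)', r'\2 \1', 'Hello World')
```

``swapped`` is ``'World Hello'``; using a raw string for the replacement keeps ``\1`` and ``\2`` from being interpreted as string-literal escapes.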
@@ -1241,24 +1221,24 @@ Use String Methods
Sometimes using the :mod:`re` module is a mistake. If you're matching a fixed
string, or a single character class, and you're not using any :mod:`re` features
-such as the :const:`~re.IGNORECASE` flag, then the full power of regular expressions
+such as the :const:`IGNORECASE` flag, then the full power of regular expressions
may not be required. Strings have several methods for performing operations with
fixed strings and they're usually much faster, because the implementation is a
single small C loop that's been optimized for the purpose, instead of the large,
more generalized regular expression engine.
One example might be replacing a single fixed string with another one; for
-example, you might replace ``word`` with ``deed``. :func:`re.sub` seems like the
-function to use for this, but consider the :meth:`~str.replace` method. Note that
-:meth:`!replace` will also replace ``word`` inside words, turning ``swordfish``
+example, you might replace ``word`` with ``deed``. ``re.sub()`` seems like the
+function to use for this, but consider the :meth:`replace` method. Note that
+:func:`replace` will also replace ``word`` inside words, turning ``swordfish``
into ``sdeedfish``, but the naive RE ``word`` would have done that, too. (To
avoid performing the substitution on parts of words, the pattern would have to
be ``\bword\b``, in order to require that ``word`` have a word boundary on
-either side. This takes the job beyond :meth:`!replace`'s abilities.)
+either side. This takes the job beyond :meth:`replace`'s abilities.)
Another common task is deleting every occurrence of a single character from a
string or replacing it with another single character. You might do this with
-something like ``re.sub('\n', ' ', S)``, but :meth:`~str.translate` is capable of
+something like ``re.sub('\n', ' ', S)``, but :meth:`translate` is capable of
doing both tasks and will be faster than any regular expression operation can
be.
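The advice in this section (prefer string methods for fixed-string work) can be sketched with the `word`/`deed` example the text uses, plus a `translate` call; run here under Python 3:

```python
s = 'swordfish eats word salad'

# str.replace handles a fixed substring without the regex engine,
# though, as the text notes, it also matches inside words.
print(s.replace('word', 'deed'))   # sdeedfish eats deed salad

# str.translate maps single characters; here, newline -> space.
table = str.maketrans('\n', ' ')
print('a\nb\nc'.translate(table))  # a b c
```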
@@ -1269,23 +1249,23 @@ can be solved with a faster and simpler string method.
match() versus search()
-----------------------
-The :func:`~re.match` function only checks if the RE matches at the beginning of the
-string while :func:`~re.search` will scan forward through the string for a match.
-It's important to keep this distinction in mind. Remember, :func:`!match` will
+The :func:`match` function only checks if the RE matches at the beginning of the
+string while :func:`search` will scan forward through the string for a match.
+It's important to keep this distinction in mind. Remember, :func:`match` will
only report a successful match which will start at 0; if the match wouldn't
-start at zero, :func:`!match` will *not* report it. ::
+start at zero, :func:`match` will *not* report it. ::
- >>> print(re.match('super', 'superstition').span())
+ >>> print re.match('super', 'superstition').span()
(0, 5)
- >>> print(re.match('super', 'insuperable'))
+ >>> print re.match('super', 'insuperable')
None
-On the other hand, :func:`~re.search` will scan forward through the string,
+On the other hand, :func:`search` will scan forward through the string,
reporting the first match it finds. ::
- >>> print(re.search('super', 'superstition').span())
+ >>> print re.search('super', 'superstition').span()
(0, 5)
- >>> print(re.search('super', 'insuperable').span())
+ >>> print re.search('super', 'insuperable').span()
(2, 7)
Sometimes you'll be tempted to keep using :func:`re.match`, and just add ``.*``
@@ -1314,17 +1294,17 @@ doesn't work because of the greedy nature of ``.*``. ::
>>> s = '<html><head><title>Title</title>'
>>> len(s)
32
- >>> print(re.match('<.*>', s).span())
+ >>> print re.match('<.*>', s).span()
(0, 32)
- >>> print(re.match('<.*>', s).group())
+ >>> print re.match('<.*>', s).group()
<html><head><title>Title</title>
-The RE matches the ``'<'`` in ``'<html>'``, and the ``.*`` consumes the rest of
+The RE matches the ``'<'`` in ``<html>``, and the ``.*`` consumes the rest of
the string. There's still more left in the RE, though, and the ``>`` can't
match at the end of the string, so the regular expression engine has to
backtrack character by character until it finds a match for the ``>``. The
-final match extends from the ``'<'`` in ``'<html>'`` to the ``'>'`` in
-``'</title>'``, which isn't what you want.
+final match extends from the ``'<'`` in ``<html>`` to the ``'>'`` in
+``</title>``, which isn't what you want.
In this case, the solution is to use the non-greedy qualifiers ``*?``, ``+?``,
``??``, or ``{m,n}?``, which match as *little* text as possible. In the above
@@ -1332,7 +1312,7 @@ example, the ``'>'`` is tried immediately after the first ``'<'`` matches, and
when it fails, the engine advances a character at a time, retrying the ``'>'``
at every step. This produces just the right result::
- >>> print(re.match('<.*?>', s).group())
+ >>> print re.match('<.*?>', s).group()
<html>
(Note that parsing HTML or XML with regular expressions is painful.
@@ -1343,14 +1323,14 @@ be *very* complicated. Use an HTML or XML parser module for such tasks.)
Using re.VERBOSE
-----------------
+--------------------
By now you've probably noticed that regular expressions are a very compact
notation, but they're not terribly readable. REs of moderate complexity can
become lengthy collections of backslashes, parentheses, and metacharacters,
making them difficult to read and understand.
-For such REs, specifying the :const:`re.VERBOSE` flag when compiling the regular
+For such REs, specifying the ``re.VERBOSE`` flag when compiling the regular
expression can be helpful, because it allows you to format the regular
expression more clearly.
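A minimal sketch of the ``re.VERBOSE`` formatting being described (the floating-point pattern is an invented example, not from the HOWTO's own hunk):

```python
import re

# The same simplified floating-point pattern, compact and verbose.
compact = re.compile(r'-?\d+(\.\d*)?')
verbose = re.compile(r"""
    -?           # optional sign
    \d+          # integer part
    (\.\d*)?     # optional fractional part
    """, re.VERBOSE)

# In VERBOSE mode, whitespace and comments inside the pattern are ignored.
print(verbose.match('-3.14').group())  # -3.14
```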
@@ -1389,5 +1369,11 @@ Friedl's Mastering Regular Expressions, published by O'Reilly. Unfortunately,
it exclusively concentrates on Perl and Java's flavours of regular expressions,
and doesn't contain any Python material at all, so it won't be useful as a
reference for programming in Python. (The first edition covered Python's
-now-removed :mod:`!regex` module, which won't help you much.) Consider checking
+now-removed :mod:`regex` module, which won't help you much.) Consider checking
it out from your library.
+
+
+.. rubric:: Footnotes
+
+.. [#] Introduced in Python 2.2.2.
+
diff --git a/Doc/howto/sockets.rst b/Doc/howto/sockets.rst
index 4655f28..69abf78 100644
--- a/Doc/howto/sockets.rst
+++ b/Doc/howto/sockets.rst
@@ -19,8 +19,8 @@
Sockets
=======
-I'm only going to talk about INET (i.e. IPv4) sockets, but they account for at least 99% of
-the sockets in use. And I'll only talk about STREAM (i.e. TCP) sockets - unless you really
+I'm only going to talk about INET sockets, but they account for at least 99% of
+the sockets in use. And I'll only talk about STREAM sockets - unless you really
know what you're doing (in which case this HOWTO isn't for you!), you'll get
better behavior and performance from a STREAM socket than anything else. I will
try to clear up the mystery of what a socket is, as well as some hints on how to
@@ -56,10 +56,12 @@ Creating a Socket
Roughly speaking, when you clicked on the link that brought you to this page,
your browser did something like the following::
- # create an INET, STREAMing socket
- s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
- # now connect to the web server on port 80 - the normal http port
- s.connect(("www.python.org", 80))
+ #create an INET, STREAMing socket
+ s = socket.socket(
+ socket.AF_INET, socket.SOCK_STREAM)
+ #now connect to the web server on port 80
+ # - the normal http port
+ s.connect(("www.mcmillan-inc.com", 80))
When the ``connect`` completes, the socket ``s`` can be used to send
in a request for the text of the page. The same socket will read the
@@ -70,11 +72,13 @@ exchanges).
What happens in the web server is a bit more complex. First, the web server
creates a "server socket"::
- # create an INET, STREAMing socket
- serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
- # bind the socket to a public host, and a well-known port
+ #create an INET, STREAMing socket
+ serversocket = socket.socket(
+ socket.AF_INET, socket.SOCK_STREAM)
+ #bind the socket to a public host,
+ # and a well-known port
serversocket.bind((socket.gethostname(), 80))
- # become a server socket
+ #become a server socket
serversocket.listen(5)
A couple things to notice: we used ``socket.gethostname()`` so that the socket
@@ -95,11 +99,11 @@ connections. If the rest of the code is written properly, that should be plenty.
Now that we have a "server" socket, listening on port 80, we can enter the
mainloop of the web server::
- while True:
- # accept connections from outside
+ while 1:
+ #accept connections from outside
(clientsocket, address) = serversocket.accept()
- # now do something with the clientsocket
- # in this case, we'll pretend this is a threaded server
+ #now do something with the clientsocket
+ #in this case, we'll pretend this is a threaded server
ct = client_thread(clientsocket)
ct.run()
@@ -121,13 +125,12 @@ IPC
---
If you need fast IPC between two processes on one machine, you should look into
-pipes or shared memory. If you do decide to use AF_INET sockets, bind the
-"server" socket to ``'localhost'``. On most platforms, this will take a
-shortcut around a couple of layers of network code and be quite a bit faster.
+whatever form of shared memory the platform offers. A simple protocol based
+around shared memory and locks or semaphores is by far the fastest technique.
-.. seealso::
- The :mod:`multiprocessing` integrates cross-platform IPC into a higher-level
- API.
+If you do decide to use sockets, bind the "server" socket to ``'localhost'``. On
+most platforms, this will take a shortcut around a couple of layers of network
+code and be quite a bit faster.
Using a Socket
@@ -180,15 +183,15 @@ righter than others).
Assuming you don't want to end the connection, the simplest solution is a fixed
length message::
- class MySocket:
- """demonstration class only
+ class mysocket:
+ '''demonstration class only
- coded for clarity, not efficiency
- """
+ '''
def __init__(self, sock=None):
if sock is None:
self.sock = socket.socket(
- socket.AF_INET, socket.SOCK_STREAM)
+ socket.AF_INET, socket.SOCK_STREAM)
else:
self.sock = sock
@@ -208,11 +211,11 @@ length message::
bytes_recd = 0
while bytes_recd < MSGLEN:
chunk = self.sock.recv(min(MSGLEN - bytes_recd, 2048))
- if chunk == b'':
+ if chunk == '':
raise RuntimeError("socket connection broken")
chunks.append(chunk)
bytes_recd = bytes_recd + len(chunk)
- return b''.join(chunks)
+ return ''.join(chunks)
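The hunk above elides the sending half of the demonstration class that the next paragraph refers to; a plausible Python 3 reconstruction of that send loop (a sketch, not copied from the patch) is:

```python
import socket

class MySocket:
    """Demonstration only: the send half that pairs with the
    chunked-receive loop shown in the diff above."""

    def __init__(self, sock=None):
        self.sock = sock or socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    def mysend(self, msg):
        totalsent = 0
        while totalsent < len(msg):
            # send() may transmit fewer bytes than asked; loop until done.
            sent = self.sock.send(msg[totalsent:])
            if sent == 0:
                raise RuntimeError("socket connection broken")
            totalsent += sent

# Exercise it over a local socket pair.
a, b = socket.socketpair()
MySocket(a).mysend(b'hello')
print(b.recv(1024))  # b'hello'
```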
The sending code here is usable for almost any messaging scheme - in Python you
send strings, and you can use ``len()`` to determine its length (even if it has
@@ -298,7 +301,7 @@ When Sockets Die
Probably the worst thing about using blocking sockets is what happens when the
other side comes down hard (without doing a ``close``). Your socket is likely to
-hang. TCP is a reliable protocol, and it will wait a long, long time
+hang. SOCKSTREAM is a reliable protocol, and it will wait a long, long time
before giving up on a connection. If you're using threads, the entire thread is
essentially dead. There's not much you can do about it. As long as you aren't
doing something dumb, like holding a lock while doing a blocking read, the
@@ -317,7 +320,7 @@ know about the mechanics of using sockets. You'll still use the same calls, in
much the same ways. It's just that, if you do it right, your app will be almost
inside-out.
-In Python, you use ``socket.setblocking(False)`` to make it non-blocking. In C, it's
+In Python, you use ``socket.setblocking(0)`` to make it non-blocking. In C, it's
more complex, (for one thing, you'll need to choose between the BSD flavor
``O_NONBLOCK`` and the almost indistinguishable Posix flavor ``O_NDELAY``, which
is completely different from ``TCP_NODELAY``), but it's the exact same idea. You
@@ -369,6 +372,12 @@ have created a new socket to ``connect`` to someone else, put it in the
potential_writers list. If it shows up in the writable list, you have a decent
chance that it has connected.
+One very nasty problem with ``select``: if somewhere in those input lists of
+sockets is one which has died a nasty death, the ``select`` will fail. You then
+need to loop through every single damn socket in all those lists and do a
+``select([sock],[],[],0)`` until you find the bad one. That timeout of 0 means
+it won't take long, but it's ugly.
+
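The probe loop described in the added paragraph can be sketched like this (Python 3; a closed socket pair stands in for a socket that has "died a nasty death", and the exact exception raised can vary by platform, hence the broad catch):

```python
import select
import socket

def find_bad_sockets(socks):
    """Probe each socket with a zero-timeout select, as the text
    suggests, collecting the ones that make select() fail."""
    bad = []
    for s in socks:
        try:
            select.select([s], [], [], 0)
        except (OSError, ValueError):
            bad.append(s)
    return bad

a, b = socket.socketpair()
b.close()  # a closed socket no longer has a valid fd for select()
print(find_bad_sockets([a, b]) == [b])  # True
```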
Actually, ``select`` can be handy even with blocking sockets. It's one way of
determining whether you will block - the socket returns as readable when there's
something in the buffers. However, this still doesn't help with the problem of
@@ -378,6 +387,32 @@ determining whether the other end is done, or just busy with something else.
files. Don't try this on Windows. On Windows, ``select`` works with sockets
only. Also note that in C, many of the more advanced socket options are done
differently on Windows. In fact, on Windows I usually use threads (which work
-very, very well) with my sockets.
+very, very well) with my sockets. Face it, if you want any kind of performance,
+your code will look very different on Windows than on Unix.
+
+
+Performance
+-----------
+There's no question that the fastest sockets code uses non-blocking sockets and
+select to multiplex them. You can put together something that will saturate a
+LAN connection without putting any strain on the CPU. The trouble is that an app
+written this way can't do much of anything else - it needs to be ready to
+shuffle bytes around at all times.
+
+Assuming that your app is actually supposed to do something more than that,
+threading is the optimal solution, (and using non-blocking sockets will be
+faster than using blocking sockets). Unfortunately, threading support in Unixes
+varies both in API and quality. So the normal Unix solution is to fork a
+subprocess to deal with each connection. The overhead for this is significant
+(and don't do this on Windows - the overhead of process creation is enormous
+there). It also means that unless each subprocess is completely independent,
+you'll need to use another form of IPC, say a pipe, or shared memory and
+semaphores, to communicate between the parent and child processes.
+
+Finally, remember that even though blocking sockets are somewhat slower than
+non-blocking, in many cases they are the "right" solution. After all, if your
+app is driven by the data it receives over a socket, there's not much sense in
+complicating the logic just so your app can wait on ``select`` instead of
+``recv``.
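The select-based multiplexing this section keeps referring to can be sketched minimally as follows (Python 3, loopback sockets standing in for real clients; all names and ports here are illustrative):

```python
import select
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('localhost', 0))          # port 0: let the OS pick a free port
server.listen(5)

client = socket.create_connection(server.getsockname())
client.sendall(b'ping')

readers = [server]
data = None
while data is None:
    ready, _, _ = select.select(readers, [], [], 1.0)
    for s in ready:
        if s is server:
            conn, _ = s.accept()       # the listener is readable: accept
            readers.append(conn)
        else:
            data = s.recv(1024)        # a client socket is readable: read

print(data)  # b'ping'
```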
diff --git a/Doc/howto/sorting.rst b/Doc/howto/sorting.rst
index 1d6d5c4..6e31637 100644
--- a/Doc/howto/sorting.rst
+++ b/Doc/howto/sorting.rst
@@ -23,7 +23,7 @@ returns a new sorted list::
>>> sorted([5, 2, 3, 1, 4])
[1, 2, 3, 4, 5]
-You can also use the :meth:`list.sort` method. It modifies the list
+You can also use the :meth:`list.sort` method of a list. It modifies the list
in-place (and returns ``None`` to avoid confusion). Usually it's less convenient
than :func:`sorted` - but if you don't need the original list, it's slightly
more efficient.
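The in-place versus new-list distinction above is easy to demonstrate:

```python
data = [5, 2, 3, 1, 4]

# sorted() returns a new list and leaves the original untouched.
print(sorted(data))   # [1, 2, 3, 4, 5]
print(data)           # [5, 2, 3, 1, 4]

# list.sort() sorts in place and returns None (to avoid confusion).
result = data.sort()
print(result, data)   # None [1, 2, 3, 4, 5]
```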
@@ -42,8 +42,9 @@ lists. In contrast, the :func:`sorted` function accepts any iterable.
Key Functions
=============
-Both :meth:`list.sort` and :func:`sorted` have a *key* parameter to specify a
-function to be called on each list element prior to making comparisons.
+Starting with Python 2.4, both :meth:`list.sort` and :func:`sorted` added a
+*key* parameter to specify a function to be called on each list element prior to
+making comparisons.
For example, here's a case-insensitive string comparison:
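The case-insensitive example itself falls outside this hunk; a sketch consistent with the surrounding text looks like this (note how, without the key, uppercase letters sort before all lowercase ones):

```python
words = 'This is a test string from Andrew'.split()

# Default order is by code point, so 'Andrew' and 'This' come first.
print(sorted(words))
# With key=str.lower, each word is compared by its lowercased form.
print(sorted(words, key=str.lower))
```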
@@ -87,9 +88,9 @@ Operator Module Functions
=========================
The key-function patterns shown above are very common, so Python provides
-convenience functions to make accessor functions easier and faster. The
-:mod:`operator` module has :func:`~operator.itemgetter`,
-:func:`~operator.attrgetter`, and a :func:`~operator.methodcaller` function.
+convenience functions to make accessor functions easier and faster. The operator
+module has :func:`operator.itemgetter`, :func:`operator.attrgetter`, and
+starting in Python 2.5 an :func:`operator.methodcaller` function.
Using those functions, the above examples become simpler and faster:
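For the tuple form of the records, :func:`operator.itemgetter` plays the same role that ``attrgetter`` plays below; the student tuples here mirror the records used elsewhere in this HOWTO:

```python
from operator import itemgetter

student_tuples = [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]

# itemgetter(2) fetches element 2 (the age) from each tuple.
print(sorted(student_tuples, key=itemgetter(2)))
# [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
```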
@@ -110,6 +111,16 @@ sort by *grade* then by *age*:
>>> sorted(student_objects, key=attrgetter('grade', 'age'))
[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
+The :func:`operator.methodcaller` function makes method calls with fixed
+parameters for each object being sorted. For example, the :meth:`str.count`
+method could be used to compute message priority by counting the
+number of exclamation marks in a message:
+
+ >>> from operator import methodcaller
+ >>> messages = ['critical!!!', 'hurry!', 'standby', 'immediate!!']
+ >>> sorted(messages, key=methodcaller('count', '!'))
+ ['standby', 'hurry!', 'immediate!!', 'critical!!!']
+
Ascending and Descending
========================
@@ -126,7 +137,7 @@ student data in reverse *age* order:
Sort Stability and Complex Sorts
================================
-Sorts are guaranteed to be `stable
+Starting with Python 2.2, sorts are guaranteed to be `stable
<https://en.wikipedia.org/wiki/Sorting_algorithm#Stability>`_\. That means that
when multiple records have the same key, their original order is preserved.
@@ -145,17 +156,6 @@ ascending *age*, do the *age* sort first and then sort again using *grade*:
>>> sorted(s, key=attrgetter('grade'), reverse=True) # now sort on primary key, descending
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
-This can be abstracted out into a wrapper function that can take a list and
-tuples of field and order to sort them on multiple passes.
-
- >>> def multisort(xs, specs):
- ... for key, reverse in reversed(specs):
- ... xs.sort(key=attrgetter(key), reverse=reverse)
- ... return xs
-
- >>> multisort(list(student_objects), (('grade', True), ('age', False)))
- [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
-
The `Timsort <https://en.wikipedia.org/wiki/Timsort>`_ algorithm used in Python
does multiple sorts efficiently because it can take advantage of any ordering
already present in a dataset.
@@ -198,8 +198,10 @@ Another name for this idiom is
`Schwartzian transform <https://en.wikipedia.org/wiki/Schwartzian_transform>`_\,
after Randal L. Schwartz, who popularized it among Perl programmers.
-Now that Python sorting provides key-functions, this technique is not often needed.
-
+For large lists and lists where the comparison information is expensive to
+calculate, and Python versions before 2.4, DSU is likely to be the fastest way
+to sort the list. For 2.4 and later, key functions provide the same
+functionality.
The Old Way Using the *cmp* Parameter
=====================================
@@ -209,11 +211,11 @@ there was no :func:`sorted` builtin and :meth:`list.sort` took no keyword
arguments. Instead, all of the Py2.x versions supported a *cmp* parameter to
handle user specified comparison functions.
-In Py3.0, the *cmp* parameter was removed entirely (as part of a larger effort to
+In Python 3, the *cmp* parameter was removed entirely (as part of a larger effort to
simplify and unify the language, eliminating the conflict between rich
comparisons and the :meth:`__cmp__` magic method).
-In Py2.x, sort allowed an optional function which can be called for doing the
+In Python 2, :meth:`~list.sort` allowed an optional function which can be called for doing the
comparisons. That function should take two arguments to be compared and then
return a negative value for less-than, return zero if they are equal, or return
a positive value for greater-than. For example, we can do:
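The example the paragraph leads into sits outside this hunk; a sketch matching the described *cmp* protocol (shown here bridged to Python 3, since bare ``cmp=`` only existed in Python 2):

```python
from functools import cmp_to_key

def numeric_compare(x, y):
    # cmp-style: negative, zero, or positive, per the rules above.
    return x - y

# Python 2 accepted sorted(values, cmp=numeric_compare); in Python 3
# the same function must be wrapped with cmp_to_key.
print(sorted([5, 2, 4, 1, 3], key=cmp_to_key(numeric_compare)))
# [1, 2, 3, 4, 5]
```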
@@ -236,7 +238,7 @@ function. The following wrapper makes that easy to do::
def cmp_to_key(mycmp):
'Convert a cmp= function into a key= function'
- class K:
+ class K(object):
def __init__(self, obj, *args):
self.obj = obj
def __lt__(self, other):
@@ -264,8 +266,8 @@ To convert to a key function, just wrap the old comparison function:
>>> sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric))
[5, 4, 3, 2, 1]
-In Python 3.2, the :func:`functools.cmp_to_key` function was added to the
-:mod:`functools` module in the standard library.
+In Python 2.7, the :func:`functools.cmp_to_key` function was added to the
+functools module.
Odd and Ends
============
@@ -274,7 +276,7 @@ Odd and Ends
:func:`locale.strcoll` for a comparison function.
* The *reverse* parameter still maintains sort stability (so that records with
- equal keys retain the original order). Interestingly, that effect can be
+ equal keys retain their original order). Interestingly, that effect can be
simulated without the parameter by using the builtin :func:`reversed` function
twice:
@@ -285,20 +287,28 @@ Odd and Ends
>>> standard_way
[('red', 1), ('red', 2), ('blue', 1), ('blue', 2)]
-* The sort routines are guaranteed to use :meth:`__lt__` when making comparisons
- between two objects. So, it is easy to add a standard sort order to a class by
- defining an :meth:`__lt__` method::
+* To create a standard sort order for a class, just add the appropriate rich
+ comparison methods:
+ >>> Student.__eq__ = lambda self, other: self.age == other.age
+ >>> Student.__ne__ = lambda self, other: self.age != other.age
>>> Student.__lt__ = lambda self, other: self.age < other.age
+ >>> Student.__le__ = lambda self, other: self.age <= other.age
+ >>> Student.__gt__ = lambda self, other: self.age > other.age
+ >>> Student.__ge__ = lambda self, other: self.age >= other.age
>>> sorted(student_objects)
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
+ For general purpose comparisons, the recommended approach is to define all six
+ rich comparison operators. The :func:`functools.total_ordering` class
+ decorator makes this easy to implement.
+
* Key functions need not depend directly on the objects being sorted. A key
function can also access external resources. For instance, if the student grades
are stored in a dictionary, they can be used to sort a separate list of student
names:
>>> students = ['dave', 'john', 'jane']
- >>> newgrades = {'john': 'F', 'jane':'A', 'dave': 'C'}
- >>> sorted(students, key=newgrades.__getitem__)
+ >>> grades = {'john': 'F', 'jane':'A', 'dave': 'C'}
+ >>> sorted(students, key=grades.__getitem__)
['jane', 'dave', 'john']
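The :func:`functools.total_ordering` decorator mentioned in the bullet above can be sketched like this; the ``Student`` class here is an invented stand-in for the HOWTO's student records:

```python
from functools import total_ordering

@total_ordering
class Student:
    """Illustrative stand-in for the HOWTO's student records."""
    def __init__(self, name, grade, age):
        self.name, self.grade, self.age = name, grade, age

    # Define __eq__ and one ordering method; total_ordering derives
    # __le__, __gt__, and __ge__ from them automatically.
    def __eq__(self, other):
        return self.age == other.age

    def __lt__(self, other):
        return self.age < other.age

students = [Student('john', 'A', 15), Student('dave', 'B', 10)]
print([s.name for s in sorted(students)])  # ['dave', 'john']
```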
diff --git a/Doc/howto/unicode.rst b/Doc/howto/unicode.rst
index 51bd64b..6724039 100644
--- a/Doc/howto/unicode.rst
+++ b/Doc/howto/unicode.rst
@@ -1,76 +1,105 @@
-.. _unicode-howto:
-
*****************
Unicode HOWTO
*****************
-:Release: 1.12
-
-This HOWTO discusses Python's support for the Unicode specification
-for representing textual data, and explains various problems that
-people commonly encounter when trying to work with Unicode.
+:Release: 1.03
+This HOWTO discusses Python 2.x's support for Unicode, and explains
+various problems that people commonly encounter when trying to work
+with Unicode. For the Python 3 version, see
+<https://docs.python.org/3/howto/unicode.html>.
Introduction to Unicode
=======================
+History of Character Codes
+--------------------------
+
+In 1968, the American Standard Code for Information Interchange, better known by
+its acronym ASCII, was standardized. ASCII defined numeric codes for various
+characters, with the numeric values running from 0 to
+127. For example, the lowercase letter 'a' is assigned 97 as its code
+value.
+
+ASCII was an American-developed standard, so it only defined unaccented
+characters. There was an 'e', but no 'é' or 'Í'. This meant that languages
+which required accented characters couldn't be faithfully represented in ASCII.
+(Actually the missing accents matter for English, too, which contains words such
+as 'naïve' and 'café', and some publications have house styles which require
+spellings such as 'coöperate'.)
+
+For a while people just wrote programs that didn't display accents. I remember
+looking at Apple ][ BASIC programs, published in French-language publications in
+the mid-1980s, that had lines like these::
+
+ PRINT "MISE A JOUR TERMINEE"
+ PRINT "PARAMETRES ENREGISTRES"
+
+Those messages should contain accents, and they just look wrong to someone who
+can read French.
+
+In the 1980s, almost all personal computers were 8-bit, meaning that bytes could
+hold values ranging from 0 to 255. ASCII codes only went up to 127, so some
+machines assigned values between 128 and 255 to accented characters. Different
+machines had different codes, however, which led to problems exchanging files.
+Eventually various commonly used sets of values for the 128--255 range emerged.
+Some were true standards, defined by the International Organization for
+Standardization, and some were *de facto* conventions that were invented by one
+company or another and managed to catch on.
+
+255 characters aren't very many. For example, you can't fit both the accented
+characters used in Western Europe and the Cyrillic alphabet used for Russian
+into the 128--255 range because there are more than 128 such characters.
+
+You could write files using different codes (all your Russian files in a coding
+system called KOI8, all your French files in a different coding system called
+Latin1), but what if you wanted to write a French document that quotes some
+Russian text? In the 1980s people began to want to solve this problem, and the
+Unicode standardization effort began.
+
+Unicode started out using 16-bit characters instead of 8-bit characters. 16
+bits means you have 2^16 = 65,536 distinct values available, making it possible
+to represent many different characters from many different alphabets; an initial
+goal was to have Unicode contain the alphabets for every single human language.
+It turns out that even 16 bits isn't enough to meet that goal, and the modern
+Unicode specification uses a wider range of codes, 0--1,114,111 (0x10ffff in
+base-16).
+
+There's a related ISO standard, ISO 10646. Unicode and ISO 10646 were
+originally separate efforts, but the specifications were merged with the 1.1
+revision of Unicode.
+
+(This discussion of Unicode's history is highly simplified. I don't think the
+average Python programmer needs to worry about the historical details; consult
+the Unicode consortium site listed in the References for more information.)
+
+
Definitions
-----------
-Today's programs need to be able to handle a wide variety of
-characters. Applications are often internationalized to display
-messages and output in a variety of user-selectable languages; the
-same program might need to output an error message in English, French,
-Japanese, Hebrew, or Russian. Web content can be written in any of
-these languages and can also include a variety of emoji symbols.
-Python's string type uses the Unicode Standard for representing
-characters, which lets Python programs work with all these different
-possible characters.
-
-Unicode (https://www.unicode.org/) is a specification that aims to
-list every character used by human languages and give each character
-its own unique code. The Unicode specifications are continually
-revised and updated to add new languages and symbols.
-
A **character** is the smallest possible component of a text. 'A', 'B', 'C',
-etc., are all different characters. So are 'È' and 'Í'. Characters vary
-depending on the language or context you're talking
-about. For example, there's a character for "Roman Numeral One", 'Ⅰ', that's
-separate from the uppercase letter 'I'. They'll usually look the same,
-but these are two different characters that have different meanings.
-
-The Unicode standard describes how characters are represented by
-**code points**. A code point value is an integer in the range 0 to
-0x10FFFF (about 1.1 million values, with some 110 thousand assigned so
-far). In the standard and in this document, a code point is written
-using the notation ``U+265E`` to mean the character with value
-``0x265e`` (9,822 in decimal).
-
-The Unicode standard contains a lot of tables listing characters and
-their corresponding code points:
-
-.. code-block:: none
+etc., are all different characters. So are 'È' and 'Í'. Characters are
+abstractions, and vary depending on the language or context you're talking
+about. For example, the symbol for ohms (Ω) is usually drawn much like the
+capital letter omega (Ω) in the Greek alphabet (they may even be the same in
+some fonts), but these are two different characters that have different
+meanings.
+
+The Unicode standard describes how characters are represented by **code
+points**. A code point is an integer value, usually denoted in base 16. In the
+standard, a code point is written using the notation U+12ca to mean the
+character with value 0x12ca (4810 decimal). The Unicode standard contains a lot
+of tables listing characters and their corresponding code points::
0061 'a'; LATIN SMALL LETTER A
0062 'b'; LATIN SMALL LETTER B
0063 'c'; LATIN SMALL LETTER C
...
007B '{'; LEFT CURLY BRACKET
- ...
- 2167 'Ⅷ'; ROMAN NUMERAL EIGHT
- 2168 'Ⅸ'; ROMAN NUMERAL NINE
- ...
- 265E '♞'; BLACK CHESS KNIGHT
- 265F '♟'; BLACK CHESS PAWN
- ...
- 1F600 '😀'; GRINNING FACE
- 1F609 '😉'; WINKING FACE
- ...
Strictly, these definitions imply that it's meaningless to say 'this is
-character ``U+265E``'. ``U+265E`` is a code point, which represents some particular
-character; in this case, it represents the character 'BLACK CHESS KNIGHT',
-'♞'. In
+character U+12ca'. U+12ca is a code point, which represents some particular
+character; in this case, it represents the character 'ETHIOPIC SYLLABLE WI'. In
informal contexts, this distinction between code points and characters will
sometimes be forgotten.
@@ -85,19 +114,14 @@ toolkit or a terminal's font renderer.
Encodings
---------
-To summarize the previous section: a Unicode string is a sequence of
-code points, which are numbers from 0 through ``0x10FFFF`` (1,114,111
-decimal). This sequence of code points needs to be represented in
-memory as a set of **code units**, and **code units** are then mapped
-to 8-bit bytes. The rules for translating a Unicode string into a
-sequence of bytes are called a **character encoding**, or just
-an **encoding**.
+To summarize the previous section: a Unicode string is a sequence of code
+points, which are numbers from 0 to 0x10ffff. This sequence needs to be
+represented as a set of bytes (meaning, values from 0--255) in memory. The rules
+for translating a Unicode string into a sequence of bytes are called an
+**encoding**.
-The first encoding you might think of is using 32-bit integers as the
-code unit, and then using the CPU's representation of 32-bit integers.
-In this representation, the string "Python" might look like this:
-
-.. code-block:: none
+The first encoding you might think of is an array of 32-bit integers. In this
+representation, the string "Python" would look like this::
P y t h o n
0x50 00 00 00 79 00 00 00 74 00 00 00 68 00 00 00 6f 00 00 00 6e 00 00 00
@@ -109,221 +133,270 @@ problems.
1. It's not portable; different processors order the bytes differently.
2. It's very wasteful of space. In most texts, the majority of the code points
- are less than 127, or less than 255, so a lot of space is occupied by ``0x00``
+ are less than 127, or less than 255, so a lot of space is occupied by zero
bytes. The above string takes 24 bytes compared to the 6 bytes needed for an
ASCII representation. Increased RAM usage doesn't matter too much (desktop
- computers have gigabytes of RAM, and strings aren't usually that large), but
+ computers have megabytes of RAM, and strings aren't usually that large), but
expanding our usage of disk and network bandwidth by a factor of 4 is
intolerable.
3. It's not compatible with existing C functions such as ``strlen()``, so a new
family of wide string functions would need to be used.
-Therefore this encoding isn't used very much, and people instead choose other
-encodings that are more efficient and convenient, such as UTF-8.
+4. Many Internet standards are defined in terms of textual data, and can't
+ handle content with embedded zero bytes.
+
+Generally people don't use this encoding, instead choosing other
+encodings that are more efficient and convenient. UTF-8 is probably
+the most commonly supported encoding; it will be discussed below.
+
+Encodings don't have to handle every possible Unicode character, and most
+encodings don't. For example, Python's default encoding is the 'ascii'
+encoding. The rules for converting a Unicode string into the ASCII encoding are
+simple; for each code point:
-UTF-8 is one of the most commonly used encodings, and Python often
-defaults to using it. UTF stands for "Unicode Transformation Format",
-and the '8' means that 8-bit values are used in the encoding. (There
-are also UTF-16 and UTF-32 encodings, but they are less frequently
-used than UTF-8.) UTF-8 uses the following rules:
+1. If the code point is < 128, each byte is the same as the value of the code
+ point.
-1. If the code point is < 128, it's represented by the corresponding byte value.
-2. If the code point is >= 128, it's turned into a sequence of two, three, or
- four bytes, where each byte of the sequence is between 128 and 255.
+2. If the code point is 128 or greater, the Unicode string can't be represented
+ in this encoding. (Python raises a :exc:`UnicodeEncodeError` exception in this
+ case.)
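The two rules above can be checked directly; this sketch spells out the
``'ascii'`` codec explicitly (an ``.encode('ascii')`` call behaves the same on
Python 2 and 3, whereas the implicit default differs between the two):

```python
# A quick check of the two ASCII rules above, using the explicit
# 'ascii' codec (the same behaviour as Python 2's default encoding).
print(u'abcdef'.encode('ascii'))      # every code point is < 128

try:
    u'abc\xff'.encode('ascii')        # U+00FF is >= 128: not representable
except UnicodeEncodeError:
    print('U+00FF cannot be represented in ASCII')
```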
+
+Latin-1, also known as ISO-8859-1, is a similar encoding. Unicode code points
+0--255 are identical to the Latin-1 values, so converting to this encoding simply
+requires converting code points to byte values; if a code point larger than 255
+is encountered, the string can't be encoded into Latin-1.
+
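The one-to-one property of Latin-1 can be seen in a couple of lines (a sketch;
the byte values shown are the standard Latin-1 assignments):

```python
# Latin-1 maps code points 0-255 straight to byte values, so encoding
# is a direct copy -- until a code point over 255 turns up.
s = u'caf\xe9'                       # U+00E9 is within Latin-1's range
print(repr(s.encode('latin-1')))     # the last byte is 0xE9

try:
    u'\u0394'.encode('latin-1')      # GREEK CAPITAL LETTER DELTA, > 255
except UnicodeEncodeError:
    print('U+0394 has no Latin-1 byte value')
```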
+Encodings don't have to be simple one-to-one mappings like Latin-1. Consider
+IBM's EBCDIC, which was used on IBM mainframes. Letter values weren't in one
+block: 'a' through 'i' had values from 129 to 137, but 'j' through 'r' were 145
+through 153. If you wanted to use EBCDIC as an encoding, you'd probably use
+some sort of lookup table to perform the conversion, but this is largely an
+internal detail.
+
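As it happens, Python ships an EBCDIC codec (``cp500``), so the non-contiguous
letter blocks described above are easy to observe; ``bytearray`` is used here
only so the byte values print identically on Python 2 and 3:

```python
# 'a'..'i' and 'j'..'r' sit in separate blocks of EBCDIC byte values;
# note the jump from 137 (for 'i') to 145 (for 'j').
print(list(bytearray(u'aij'.encode('cp500'))))   # [129, 137, 145]
```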
+UTF-8 is one of the most commonly used encodings. UTF stands for "Unicode
+Transformation Format", and the '8' means that 8-bit numbers are used in the
+encoding. (There's also a UTF-16 encoding, but it's less frequently used than
+UTF-8.) UTF-8 uses the following rules:
+
+1. If the code point is <128, it's represented by the corresponding byte value.
+2. If the code point is between 128 and 0x7ff, it's turned into two byte values
+ between 128 and 255.
+3. Code points >0x7ff are turned into three- or four-byte sequences, where each
+ byte of the sequence is between 128 and 255.
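The three length rules can be observed by encoding one character from each
range (a sketch; the code points are arbitrary examples):

```python
# One code point from each range in the rules above:
# U+0041 (< 128), U+00E9 (128..0x7ff), U+20AC (> 0x7ff).
for ch in (u'\u0041', u'\u00e9', u'\u20ac'):
    encoded = ch.encode('utf-8')
    print(u'U+%04X -> %d byte(s)' % (ord(ch), len(encoded)))
```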
UTF-8 has several convenient properties:
1. It can handle any Unicode code point.
-2. A Unicode string is turned into a sequence of bytes that contains embedded
- zero bytes only where they represent the null character (U+0000). This means
- that UTF-8 strings can be processed by C functions such as ``strcpy()`` and sent
- through protocols that can't handle zero bytes for anything other than
- end-of-string markers.
+2. A Unicode string is turned into a string of bytes containing embedded zero
+   bytes only where they represent the null character (U+0000). This avoids
+   byte-ordering issues, and means UTF-8 strings can be processed by C
+   functions such as ``strcpy()`` and sent through protocols that can't handle
+   zero bytes for anything other than end-of-string markers.
3. A string of ASCII text is also valid UTF-8 text.
-4. UTF-8 is fairly compact; the majority of commonly used characters can be
- represented with one or two bytes.
+4. UTF-8 is fairly compact; the majority of commonly used code points can be
+   represented in one or two bytes, and values less than 128 occupy only a
+   single byte.
5. If bytes are corrupted or lost, it's possible to determine the start of the
next UTF-8-encoded code point and resynchronize. It's also unlikely that
random 8-bit data will look like valid UTF-8.
-6. UTF-8 is a byte oriented encoding. The encoding specifies that each
- character is represented by a specific sequence of one or more bytes. This
- avoids the byte-ordering issues that can occur with integer and word oriented
- encodings, like UTF-16 and UTF-32, where the sequence of bytes varies depending
- on the hardware on which the string was encoded.
+
References
----------
-The `Unicode Consortium site <http://www.unicode.org>`_ has character charts, a
+The Unicode Consortium site at <http://www.unicode.org> has character charts, a
glossary, and PDF versions of the Unicode specification. Be prepared for some
-difficult reading. `A chronology <http://www.unicode.org/history/>`_ of the
-origin and development of Unicode is also available on the site.
+difficult reading. <http://www.unicode.org/history/> is a chronology of the
+origin and development of Unicode.
-On the Computerphile Youtube channel, Tom Scott briefly
-`discusses the history of Unicode and UTF-8 <https://www.youtube.com/watch?v=MijmeoH9LT4>`_
-(9 minutes 36 seconds).
+To help understand the standard, Jukka Korpela has written an introductory guide
+to reading the Unicode character tables, available at
+<https://www.cs.tut.fi/~jkorpela/unicode/guide.html>.
-To help understand the standard, Jukka Korpela has written `an introductory
-guide <http://jkorpela.fi/unicode/guide.html>`_ to reading the
-Unicode character tables.
+Another good introductory article was written by Joel Spolsky
+<http://www.joelonsoftware.com/articles/Unicode.html>.
+If this introduction didn't make things clear to you, you should try reading this
+alternate article before continuing.
-Another `good introductory article <https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/>`_
-was written by Joel Spolsky.
-If this introduction didn't make things clear to you, you should try
-reading this alternate article before continuing.
+.. Jason Orendorff XXX http://www.jorendorff.com/articles/unicode/ is broken
-Wikipedia entries are often helpful; see the entries for "`character encoding
-<https://en.wikipedia.org/wiki/Character_encoding>`_" and `UTF-8
-<https://en.wikipedia.org/wiki/UTF-8>`_, for example.
+Wikipedia entries are often helpful; see the entries for "character encoding"
+<http://en.wikipedia.org/wiki/Character_encoding> and UTF-8
+<http://en.wikipedia.org/wiki/UTF-8>, for example.
-Python's Unicode Support
-========================
+Python 2.x's Unicode Support
+============================
Now that you've learned the rudiments of Unicode, we can look at Python's
Unicode features.
-The String Type
----------------
-
-Since Python 3.0, the language's :class:`str` type contains Unicode
-characters, meaning any string created using ``"unicode rocks!"``, ``'unicode
-rocks!'``, or the triple-quoted string syntax is stored as Unicode.
-
-The default encoding for Python source code is UTF-8, so you can simply
-include a Unicode character in a string literal::
- try:
- with open('/tmp/input.txt', 'r') as f:
- ...
- except OSError:
- # 'File not found' error message.
- print("Fichier non trouvé")
-
-Side note: Python 3 also supports using Unicode characters in identifiers::
-
- répertoire = "/tmp/records.log"
- with open(répertoire, "w") as f:
- f.write("test\n")
-
-If you can't enter a particular character in your editor or want to
-keep the source code ASCII-only for some reason, you can also use
-escape sequences in string literals. (Depending on your system,
-you may see the actual capital-delta glyph instead of a \u escape.) ::
-
- >>> "\N{GREEK CAPITAL LETTER DELTA}" # Using the character name
- '\u0394'
- >>> "\u0394" # Using a 16-bit hex value
- '\u0394'
- >>> "\U00000394" # Using a 32-bit hex value
- '\u0394'
-
-In addition, one can create a string using the :func:`~bytes.decode` method of
-:class:`bytes`. This method takes an *encoding* argument, such as ``UTF-8``,
-and optionally an *errors* argument.
+The Unicode Type
+----------------
+
+Unicode strings are expressed as instances of the :class:`unicode` type, one of
+Python's repertoire of built-in types. It derives from an abstract type called
+:class:`basestring`, which is also an ancestor of the :class:`str` type; you can
+therefore check if a value is a string type with ``isinstance(value,
+basestring)``. Under the hood, Python represents Unicode strings as either 16-
+or 32-bit integers, depending on how the Python interpreter was compiled.
+
+The :func:`unicode` constructor has the signature ``unicode(string[, encoding,
+errors])``. All of its arguments should be 8-bit strings. The first argument
+is converted to Unicode using the specified encoding; if you leave off the
+``encoding`` argument, the ASCII encoding is used for the conversion, so
+characters greater than 127 will be treated as errors::
+
+ >>> unicode('abcdef')
+ u'abcdef'
+ >>> s = unicode('abcdef')
+ >>> type(s)
+ <type 'unicode'>
+ >>> unicode('abcdef' + chr(255)) #doctest: +NORMALIZE_WHITESPACE
+ Traceback (most recent call last):
+ ...
+ UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 6:
+ ordinal not in range(128)
-The *errors* argument specifies the response when the input string can't be
+The ``errors`` argument specifies the response when the input string can't be
converted according to the encoding's rules. Legal values for this argument are
-``'strict'`` (raise a :exc:`UnicodeDecodeError` exception), ``'replace'`` (use
-``U+FFFD``, ``REPLACEMENT CHARACTER``), ``'ignore'`` (just leave the
-character out of the Unicode result), or ``'backslashreplace'`` (inserts a
-``\xNN`` escape sequence).
-The following examples show the differences::
+'strict' (raise a ``UnicodeDecodeError`` exception), 'replace' (add U+FFFD,
+'REPLACEMENT CHARACTER'), or 'ignore' (just leave the character out of the
+Unicode result). The following examples show the differences::
- >>> b'\x80abc'.decode("utf-8", "strict") #doctest: +NORMALIZE_WHITESPACE
+ >>> unicode('\x80abc', errors='strict') #doctest: +NORMALIZE_WHITESPACE
Traceback (most recent call last):
...
- UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0:
- invalid start byte
- >>> b'\x80abc'.decode("utf-8", "replace")
- '\ufffdabc'
- >>> b'\x80abc'.decode("utf-8", "backslashreplace")
- '\\x80abc'
- >>> b'\x80abc'.decode("utf-8", "ignore")
- 'abc'
-
-Encodings are specified as strings containing the encoding's name. Python
+ UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0:
+ ordinal not in range(128)
+ >>> unicode('\x80abc', errors='replace')
+ u'\ufffdabc'
+ >>> unicode('\x80abc', errors='ignore')
+ u'abc'
+
+Encodings are specified as strings containing the encoding's name. Python 2.7
comes with roughly 100 different encodings; see the Python Library Reference at
-:ref:`standard-encodings` for a list. Some encodings have multiple names; for
-example, ``'latin-1'``, ``'iso_8859_1'`` and ``'8859``' are all synonyms for
-the same encoding.
+:ref:`standard-encodings` for a list. Some encodings
+have multiple names; for example, 'latin-1', 'iso_8859_1' and '8859' are all
+synonyms for the same encoding.
-One-character Unicode strings can also be created with the :func:`chr`
+One-character Unicode strings can also be created with the :func:`unichr`
built-in function, which takes integers and returns a Unicode string of length 1
that contains the corresponding code point. The reverse operation is the
built-in :func:`ord` function that takes a one-character Unicode string and
returns the code point value::
- >>> chr(57344)
- '\ue000'
- >>> ord('\ue000')
- 57344
-
-Converting to Bytes
--------------------
-
-The opposite method of :meth:`bytes.decode` is :meth:`str.encode`,
-which returns a :class:`bytes` representation of the Unicode string, encoded in the
-requested *encoding*.
-
-The *errors* parameter is the same as the parameter of the
-:meth:`~bytes.decode` method but supports a few more possible handlers. As well as
-``'strict'``, ``'ignore'``, and ``'replace'`` (which in this case
-inserts a question mark instead of the unencodable character), there is
-also ``'xmlcharrefreplace'`` (inserts an XML character reference),
-``backslashreplace`` (inserts a ``\uNNNN`` escape sequence) and
-``namereplace`` (inserts a ``\N{...}`` escape sequence).
-
-The following example shows the different results::
-
- >>> u = chr(40960) + 'abcd' + chr(1972)
+ >>> unichr(40960)
+ u'\ua000'
+ >>> ord(u'\ua000')
+ 40960
+
+Instances of the :class:`unicode` type have many of the same methods as the
+8-bit string type for operations such as searching and formatting::
+
+ >>> s = u'Was ever feather so lightly blown to and fro as this multitude?'
+ >>> s.count('e')
+ 5
+ >>> s.find('feather')
+ 9
+ >>> s.find('bird')
+ -1
+ >>> s.replace('feather', 'sand')
+ u'Was ever sand so lightly blown to and fro as this multitude?'
+ >>> s.upper()
+ u'WAS EVER FEATHER SO LIGHTLY BLOWN TO AND FRO AS THIS MULTITUDE?'
+
+Note that the arguments to these methods can be Unicode strings or 8-bit
+strings. 8-bit strings will be converted to Unicode before carrying out the
+operation; Python's default ASCII encoding will be used, so characters greater
+than 127 will cause an exception::
+
+ >>> s.find('Was\x9f') #doctest: +NORMALIZE_WHITESPACE
+ Traceback (most recent call last):
+ ...
+ UnicodeDecodeError: 'ascii' codec can't decode byte 0x9f in position 3:
+ ordinal not in range(128)
+ >>> s.find(u'Was\x9f')
+ -1
+
+Much Python code that operates on strings will therefore work with Unicode
+strings without requiring any changes to the code. (Input and output code needs
+more updating for Unicode; more on this later.)
+
+Another important method is ``.encode([encoding], [errors='strict'])``, which
+returns an 8-bit string version of the Unicode string, encoded in the requested
+encoding. The ``errors`` parameter is the same as the parameter of the
+``unicode()`` constructor, with one additional possibility; as well as 'strict',
+'ignore', and 'replace', you can also pass 'xmlcharrefreplace' which uses XML's
+character references. The following example shows the different results::
+
+ >>> u = unichr(40960) + u'abcd' + unichr(1972)
>>> u.encode('utf-8')
- b'\xea\x80\x80abcd\xde\xb4'
- >>> u.encode('ascii') #doctest: +NORMALIZE_WHITESPACE
+ '\xea\x80\x80abcd\xde\xb4'
+ >>> u.encode('ascii') #doctest: +NORMALIZE_WHITESPACE
Traceback (most recent call last):
...
- UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in
- position 0: ordinal not in range(128)
+ UnicodeEncodeError: 'ascii' codec can't encode character u'\ua000' in
+ position 0: ordinal not in range(128)
>>> u.encode('ascii', 'ignore')
- b'abcd'
+ 'abcd'
>>> u.encode('ascii', 'replace')
- b'?abcd?'
+ '?abcd?'
>>> u.encode('ascii', 'xmlcharrefreplace')
- b'&#40960;abcd&#1972;'
- >>> u.encode('ascii', 'backslashreplace')
- b'\\ua000abcd\\u07b4'
- >>> u.encode('ascii', 'namereplace')
- b'\\N{YI SYLLABLE IT}abcd\\u07b4'
+ '&#40960;abcd&#1972;'
+
+Python's 8-bit strings have a ``.decode([encoding], [errors])`` method that
+interprets the string using the given encoding::
-The low-level routines for registering and accessing the available
-encodings are found in the :mod:`codecs` module. Implementing new
-encodings also requires understanding the :mod:`codecs` module.
-However, the encoding and decoding functions returned by this module
-are usually more low-level than is comfortable, and writing new encodings
-is a specialized task, so the module won't be covered in this HOWTO.
+ >>> u = unichr(40960) + u'abcd' + unichr(1972) # Assemble a string
+ >>> utf8_version = u.encode('utf-8') # Encode as UTF-8
+ >>> type(utf8_version), utf8_version
+ (<type 'str'>, '\xea\x80\x80abcd\xde\xb4')
+ >>> u2 = utf8_version.decode('utf-8') # Decode using UTF-8
+ >>> u == u2 # The two strings match
+ True
+
+The low-level routines for registering and accessing the available encodings are
+found in the :mod:`codecs` module. However, the encoding and decoding functions
+returned by this module are usually more low-level than is comfortable, so I'm
+not going to describe the :mod:`codecs` module here. If you need to implement a
+completely new encoding, you'll need to learn about the :mod:`codecs` module
+interfaces, but implementing encodings is a specialized task that also won't be
+covered here. Consult the Python documentation to learn more about this module.
+
+The most commonly used part of the :mod:`codecs` module is the
+:func:`codecs.open` function which will be discussed in the section on input and
+output.
Unicode Literals in Python Source Code
--------------------------------------
-In Python source code, specific Unicode code points can be written using the
-``\u`` escape sequence, which is followed by four hex digits giving the code
-point. The ``\U`` escape sequence is similar, but expects eight hex digits,
-not four::
+In Python source code, Unicode literals are written as strings prefixed with the
+'u' or 'U' character: ``u'abcdefghijk'``. Specific code points can be written
+using the ``\u`` escape sequence, which is followed by four hex digits giving
+the code point. The ``\U`` escape sequence is similar, but expects 8 hex
+digits, not 4.
- >>> s = "a\xac\u1234\u20ac\U00008000"
- ... # ^^^^ two-digit hex escape
- ... # ^^^^^^ four-digit Unicode escape
- ... # ^^^^^^^^^^ eight-digit Unicode escape
- >>> [ord(c) for c in s]
- [97, 172, 4660, 8364, 32768]
+Unicode literals can also use the same escape sequences as 8-bit strings,
+including ``\x``, but ``\x`` only takes two hex digits so it can't express an
+arbitrary code point. Octal escapes can go up to U+01ff, which is octal 777.
+
+::
+
+ >>> s = u"a\xac\u1234\u20ac\U00008000"
+ ... # ^^^^ two-digit hex escape
+ ... # ^^^^^^ four-digit Unicode escape
+ ... # ^^^^^^^^^^ eight-digit Unicode escape
+ >>> for c in s: print ord(c),
+ ...
+ 97 172 4660 8364 32768
Using escape sequences for code points greater than 127 is fine in small doses,
but becomes an annoyance if you're using many accented characters, as you would
in a program with messages in French or some other accent-using language. You
-can also assemble strings using the :func:`chr` built-in function, but this is
+can also assemble strings using the :func:`unichr` built-in function, but this is
even more tedious.
Ideally, you'd want to be able to write literals in your language's natural
@@ -331,15 +404,15 @@ encoding. You could then edit Python source code with your favorite editor
which would display the accented characters naturally, and have the right
characters used at runtime.
-Python supports writing source code in UTF-8 by default, but you can use almost
-any encoding if you declare the encoding being used. This is done by including
-a special comment as either the first or second line of the source file::
+Python supports writing Unicode literals in any encoding, but you have to
+declare the encoding being used. This is done by including a special comment as
+either the first or second line of the source file::
#!/usr/bin/env python
# -*- coding: latin-1 -*-
- u = 'abcdé'
- print(ord(u[-1]))
+ u = u'abcdé'
+ print ord(u[-1])
The syntax is inspired by Emacs's notation for specifying variables local to a
file. Emacs supports many different variables, but Python only supports
@@ -347,38 +420,57 @@ file. Emacs supports many different variables, but Python only supports
they have no significance to Python but are a convention. Python looks for
``coding: name`` or ``coding=name`` in the comment.
-If you don't include such a comment, the default encoding used will be UTF-8 as
-already mentioned. See also :pep:`263` for more information.
+If you don't include such a comment, the default encoding used will be ASCII.
+Versions of Python before 2.4 were Euro-centric and assumed Latin-1 as a default
+encoding for string literals; in Python 2.4, characters greater than 127 still
+work but result in a warning. For example, the following program has no
+encoding declaration::
+
+ #!/usr/bin/env python
+ u = u'abcdé'
+ print ord(u[-1])
+
+When you run it with Python 2.4, it will output the following warning::
+
+ amk:~$ python2.4 p263.py
+ sys:1: DeprecationWarning: Non-ASCII character '\xe9'
+ in file p263.py on line 2, but no encoding declared;
+ see https://www.python.org/peps/pep-0263.html for details
+
+Python 2.5 and higher are stricter and will produce a syntax error::
+
+ amk:~$ python2.5 p263.py
+ File "/tmp/p263.py", line 2
+ SyntaxError: Non-ASCII character '\xc3' in file /tmp/p263.py
+ on line 2, but no encoding declared; see
+ https://www.python.org/peps/pep-0263.html for details
Unicode Properties
------------------
-The Unicode specification includes a database of information about
-code points. For each defined code point, the information includes
-the character's name, its category, the numeric value if applicable
-(for characters representing numeric concepts such as the Roman
-numerals, fractions such as one-third and four-fifths, etc.). There
-are also display-related properties, such as how to use the code point
-in bidirectional text.
+The Unicode specification includes a database of information about code points.
+For each code point that's defined, the information includes the character's
+name, its category, the numeric value if applicable (Unicode has characters
+representing the Roman numerals and fractions such as one-third and
+four-fifths). There are also properties related to the code point's use in
+bidirectional text and other display-related properties.
The following program displays some information about several characters, and
prints the numeric value of one particular character::
import unicodedata
- u = chr(233) + chr(0x0bf2) + chr(3972) + chr(6000) + chr(13231)
+ u = unichr(233) + unichr(0x0bf2) + unichr(3972) + unichr(6000) + unichr(13231)
for i, c in enumerate(u):
- print(i, '%04x' % ord(c), unicodedata.category(c), end=" ")
- print(unicodedata.name(c))
+ print i, '%04x' % ord(c), unicodedata.category(c),
+ print unicodedata.name(c)
# Get numeric value of second character
- print(unicodedata.numeric(u[1]))
-
-When run, this prints:
+ print unicodedata.numeric(u[1])
-.. code-block:: none
+When run, this prints::
0 00e9 Ll LATIN SMALL LETTER E WITH ACUTE
1 0bf2 No TAMIL NUMBER ONE THOUSAND
@@ -393,144 +485,23 @@ These are grouped into categories such as "Letter", "Number", "Punctuation", or
from the above output, ``'Ll'`` means 'Letter, lowercase', ``'No'`` means
"Number, other", ``'Mn'`` is "Mark, nonspacing", and ``'So'`` is "Symbol,
other". See
-`the General Category Values section of the Unicode Character Database documentation <http://www.unicode.org/reports/tr44/#General_Category_Values>`_ for a
+<http://www.unicode.org/reports/tr44/#General_Category_Values> for a
list of category codes.
-
-Comparing Strings
------------------
-
-Unicode adds some complication to comparing strings, because the same
-set of characters can be represented by different sequences of code
-points. For example, a letter like 'ê' can be represented as a single
-code point U+00EA, or as U+0065 U+0302, which is the code point for
-'e' followed by a code point for 'COMBINING CIRCUMFLEX ACCENT'. These
-will produce the same output when printed, but one is a string of
-length 1 and the other is of length 2.
-
-One tool for a case-insensitive comparison is the
-:meth:`~str.casefold` string method that converts a string to a
-case-insensitive form following an algorithm described by the Unicode
-Standard. This algorithm has special handling for characters such as
-the German letter 'ß' (code point U+00DF), which becomes the pair of
-lowercase letters 'ss'.
-
-::
-
- >>> street = 'Gürzenichstraße'
- >>> street.casefold()
- 'gürzenichstrasse'
-
-A second tool is the :mod:`unicodedata` module's
-:func:`~unicodedata.normalize` function that converts strings to one
-of several normal forms, where letters followed by a combining
-character are replaced with single characters. :func:`normalize` can
-be used to perform string comparisons that won't falsely report
-inequality if two strings use combining characters differently:
-
-::
-
- import unicodedata
-
- def compare_strs(s1, s2):
- def NFD(s):
- return unicodedata.normalize('NFD', s)
-
- return NFD(s1) == NFD(s2)
-
- single_char = 'ê'
- multiple_chars = '\N{LATIN SMALL LETTER E}\N{COMBINING CIRCUMFLEX ACCENT}'
- print('length of first string=', len(single_char))
- print('length of second string=', len(multiple_chars))
- print(compare_strs(single_char, multiple_chars))
-
-When run, this outputs:
-
-.. code-block:: shell-session
-
- $ python3 compare-strs.py
- length of first string= 1
- length of second string= 2
- True
-
-The first argument to the :func:`~unicodedata.normalize` function is a
-string giving the desired normalization form, which can be one of
-'NFC', 'NFKC', 'NFD', and 'NFKD'.
-
-The Unicode Standard also specifies how to do caseless comparisons::
-
- import unicodedata
-
- def compare_caseless(s1, s2):
- def NFD(s):
- return unicodedata.normalize('NFD', s)
-
- return NFD(NFD(s1).casefold()) == NFD(NFD(s2).casefold())
-
- # Example usage
- single_char = 'ê'
- multiple_chars = '\N{LATIN CAPITAL LETTER E}\N{COMBINING CIRCUMFLEX ACCENT}'
-
- print(compare_caseless(single_char, multiple_chars))
-
-This will print ``True``. (Why is :func:`NFD` invoked twice? Because
-there are a few characters that make :meth:`casefold` return a
-non-normalized string, so the result needs to be normalized again. See
-section 3.13 of the Unicode Standard for a discussion and an example.)
-
-
-Unicode Regular Expressions
----------------------------
-
-The regular expressions supported by the :mod:`re` module can be provided
-either as bytes or strings. Some of the special character sequences such as
-``\d`` and ``\w`` have different meanings depending on whether
-the pattern is supplied as bytes or a string. For example,
-``\d`` will match the characters ``[0-9]`` in bytes but
-in strings will match any character that's in the ``'Nd'`` category.
-
-The string in this example has the number 57 written in both Thai and
-Arabic numerals::
-
- import re
- p = re.compile(r'\d+')
-
- s = "Over \u0e55\u0e57 57 flavours"
- m = p.search(s)
- print(repr(m.group()))
-
-When executed, ``\d+`` will match the Thai numerals and print them
-out. If you supply the :const:`re.ASCII` flag to
-:func:`~re.compile`, ``\d+`` will match the substring "57" instead.
-
-Similarly, ``\w`` matches a wide variety of Unicode characters but
-only ``[a-zA-Z0-9_]`` in bytes or if :const:`re.ASCII` is supplied,
-and ``\s`` will match either Unicode whitespace characters or
-``[ \t\n\r\f\v]``.
-
-
References
----------
-.. comment should these be mentioned earlier, e.g. at the start of the "introduction to Unicode" first section?
-
-Some good alternative discussions of Python's Unicode support are:
-
-* `Processing Text Files in Python 3 <http://python-notes.curiousefficiency.org/en/latest/python3/text_file_processing.html>`_, by Nick Coghlan.
-* `Pragmatic Unicode <https://nedbatchelder.com/text/unipain.html>`_, a PyCon 2012 presentation by Ned Batchelder.
-
-The :class:`str` type is described in the Python library reference at
-:ref:`textseq`.
+The Unicode and 8-bit string types are described in the Python library reference
+at :ref:`typesseq`.
The documentation for the :mod:`unicodedata` module.
The documentation for the :mod:`codecs` module.
-Marc-André Lemburg gave `a presentation titled "Python and Unicode" (PDF slides)
-<https://downloads.egenix.com/python/Unicode-EPC2002-Talk.pdf>`_ at
-EuroPython 2002. The slides are an excellent overview of the design of Python
-2's Unicode features (where the Unicode string type is called ``unicode`` and
-literals start with ``u``).
+Marc-André Lemburg gave a presentation at EuroPython 2002 titled "Python and
+Unicode". A PDF version of his slides is available at
+<https://downloads.egenix.com/python/Unicode-EPC2002-Talk.pdf>, and is an
+excellent overview of the design of Python's Unicode features.
Reading and Writing Unicode Data
@@ -548,43 +519,54 @@ columns and can return Unicode values from an SQL query.
Unicode data is usually converted to a particular encoding before it gets
written to disk or sent over a socket. It's possible to do all the work
-yourself: open a file, read an 8-bit bytes object from it, and convert the bytes
-with ``bytes.decode(encoding)``. However, the manual approach is not recommended.
+yourself: open a file, read an 8-bit string from it, and convert the string with
+``unicode(str, encoding)``. However, the manual approach is not recommended.
One problem is the multi-byte nature of encodings; one Unicode character can be
represented by several bytes. If you want to read the file in arbitrary-sized
-chunks (say, 1024 or 4096 bytes), you need to write error-handling code to catch the case
+chunks (say, 1K or 4K), you need to write error-handling code to catch the case
where only part of the bytes encoding a single Unicode character are read at the
end of a chunk. One solution would be to read the entire file into memory and
then perform the decoding, but that prevents you from working with files that
-are extremely large; if you need to read a 2 GiB file, you need 2 GiB of RAM.
+are extremely large; if you need to read a 2 GB file, you need 2 GB of RAM.
(More, really, since for at least a moment you'd need to have both the encoded
string and its Unicode version in memory.)
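The partial-character problem described above is easy to reproduce by hand (a
sketch using UTF-8):

```python
# A multi-byte character split across a chunk boundary can't be
# decoded until the rest of its bytes arrive.
encoded = u'caf\xe9'.encode('utf-8')   # 5 bytes; U+00E9 takes two
chunk = encoded[:4]                     # chops the 2-byte sequence in half
try:
    chunk.decode('utf-8')
except UnicodeDecodeError:
    print('chunk ends with a partial character')
```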
The solution would be to use the low-level decoding interface to catch the case
of partial coding sequences. The work of implementing this has already been
-done for you: the built-in :func:`open` function can return a file-like object
-that assumes the file's contents are in a specified encoding and accepts Unicode
-parameters for methods such as :meth:`~io.TextIOBase.read` and
-:meth:`~io.TextIOBase.write`. This works through :func:`open`\'s *encoding* and
-*errors* parameters which are interpreted just like those in :meth:`str.encode`
-and :meth:`bytes.decode`.
+done for you: the :mod:`codecs` module includes a version of the :func:`open`
+function that returns a file-like object that assumes the file's contents are in
+a specified encoding and accepts Unicode parameters for methods such as
+``.read()`` and ``.write()``.
+
+The function's parameters are ``open(filename, mode='rb', encoding=None,
+errors='strict', buffering=1)``. ``mode`` can be ``'r'``, ``'w'``, or ``'a'``,
+just like the corresponding parameter to the regular built-in ``open()``
+function; add a ``'+'`` to update the file. ``buffering`` is similarly parallel
+to the standard function's parameter. ``encoding`` is a string giving the
+encoding to use; if it's left as ``None``, a regular Python file object that
+accepts 8-bit strings is returned. Otherwise, a wrapper object is returned, and
+data written to or read from the wrapper object will be converted as needed.
+``errors`` specifies the action for encoding errors and can be one of the usual
+values of 'strict', 'ignore', and 'replace'.
Reading Unicode from a file is therefore simple::
- with open('unicode.txt', encoding='utf-8') as f:
- for line in f:
- print(repr(line))
+ import codecs
+ f = codecs.open('unicode.rst', encoding='utf-8')
+ for line in f:
+ print repr(line)
It's also possible to open files in update mode, allowing both reading and
writing::
- with open('test', encoding='utf-8', mode='w+') as f:
- f.write('\u4500 blah blah blah\n')
- f.seek(0)
- print(repr(f.readline()[:1]))
+ f = codecs.open('test', encoding='utf-8', mode='w+')
+ f.write(u'\u4500 blah blah blah\n')
+ f.seek(0)
+ print repr(f.readline()[:1])
+ f.close()
-The Unicode character ``U+FEFF`` is used as a byte-order mark (BOM), and is often
+Unicode character U+FEFF is used as a byte-order mark (BOM), and is often
written as the first character of a file in order to assist with autodetection
of the file's byte ordering. Some encodings, such as UTF-16, expect a BOM to be
present at the start of a file; when such an encoding is used, the BOM will be
@@ -593,24 +575,18 @@ the file is read. There are variants of these encodings, such as 'utf-16-le'
and 'utf-16-be' for little-endian and big-endian encodings, that specify one
particular byte ordering and don't skip the BOM.
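The behaviour just described can be seen directly (a sketch; which BOM the
plain 'utf-16' codec writes depends on the machine's byte order):

```python
# 'utf-16' writes a BOM and the decoder consumes it again;
# 'utf-16-le' fixes the byte order and writes no BOM.
data = u'abc'.encode('utf-16')
print(repr(data[:2]))                    # the BOM: FF FE or FE FF
print(repr(u'abc'.encode('utf-16-le')))  # no BOM, little-endian always
print(data.decode('utf-16'))             # the BOM is stripped on decode
```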
-In some areas, it is also convention to use a "BOM" at the start of UTF-8
-encoded files; the name is misleading since UTF-8 is not byte-order dependent.
-The mark simply announces that the file is encoded in UTF-8. For reading such
-files, use the 'utf-8-sig' codec to automatically skip the mark if present.
-
Unicode filenames
-----------------
-Most of the operating systems in common use today support filenames
-that contain arbitrary Unicode characters. Usually this is
-implemented by converting the Unicode string into some encoding that
-varies depending on the system. Today Python is converging on using
-UTF-8: Python on MacOS has used UTF-8 for several versions, and Python
-3.6 switched to using UTF-8 on Windows as well. On Unix systems,
-there will only be a filesystem encoding if you've set the ``LANG`` or
-``LC_CTYPE`` environment variables; if you haven't, the default
-encoding is again UTF-8.
+Most of the operating systems in common use today support filenames that contain
+arbitrary Unicode characters. Usually this is implemented by converting the
+Unicode string into some encoding that varies depending on the system. For
+example, Mac OS X uses UTF-8 while Windows uses a configurable encoding; on
+Windows, Python uses the name "mbcs" to refer to whatever the currently
+configured encoding is. On Unix systems, there will only be a filesystem
+encoding if you've set the ``LANG`` or ``LC_CTYPE`` environment variables; if
+you haven't, the default encoding is ASCII.
The :func:`sys.getfilesystemencoding` function returns the encoding to use on
your current system, in case you want to do the encoding manually, but there's
@@ -618,46 +594,42 @@ not much reason to bother. When opening a file for reading or writing, you can
usually just provide the Unicode string as the filename, and it will be
automatically converted to the right encoding for you::
- filename = 'filename\u4500abc'
- with open(filename, 'w') as f:
- f.write('blah\n')
+ filename = u'filename\u4500abc'
+ f = open(filename, 'w')
+ f.write('blah\n')
+ f.close()
Functions in the :mod:`os` module such as :func:`os.stat` will also accept Unicode
filenames.
-The :func:`os.listdir` function returns filenames, which raises an issue: should it return
-the Unicode version of filenames, or should it return bytes containing
-the encoded versions? :func:`os.listdir` can do both, depending on whether you
-provided the directory path as bytes or a Unicode string. If you pass a
-Unicode string as the path, filenames will be decoded using the filesystem's
-encoding and a list of Unicode strings will be returned, while passing a byte
-path will return the filenames as bytes. For example,
-assuming the default filesystem encoding is UTF-8, running the following
-program::
-
- fn = 'filename\u4500abc'
+:func:`os.listdir`, which returns filenames, raises an issue: should it return
+the Unicode version of filenames, or should it return 8-bit strings containing
+the encoded versions? :func:`os.listdir` will do both, depending on whether you
+provided the directory path as an 8-bit string or a Unicode string. If you pass
+a Unicode string as the path, filenames will be decoded using the filesystem's
+encoding and a list of Unicode strings will be returned, while passing an 8-bit
+path will return the 8-bit versions of the filenames. For example, assuming the
+default filesystem encoding is UTF-8, running the following program::
+
+ fn = u'filename\u4500abc'
f = open(fn, 'w')
f.close()
import os
- print(os.listdir(b'.'))
- print(os.listdir('.'))
+ print os.listdir('.')
+ print os.listdir(u'.')
will produce the following output:
.. code-block:: shell-session
- $ python listdir-test.py
- [b'filename\xe4\x94\x80abc', ...]
- ['filename\u4500abc', ...]
+ amk:~$ python t.py
+ ['.svn', 'filename\xe4\x94\x80abc', ...]
+ [u'.svn', u'filename\u4500abc', ...]
The first list contains UTF-8-encoded filenames, and the second list contains
the Unicode versions.
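In Python 3 the same ``os.listdir()`` distinction exists, with ``bytes`` taking the place of 8-bit strings; a sketch using a temporary directory (``os.fsencode()`` applies the filesystem encoding for us):

```python
# A str path yields str filenames; a bytes path yields bytes filenames.
import os
import tempfile

d = tempfile.mkdtemp()
open(os.path.join(d, 'filename\u4500abc'), 'w').close()

print(os.listdir(d))                 # str names, e.g. ['filename\u4500abc']
print(os.listdir(os.fsencode(d)))    # bytes names in the encoded form
```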
-Note that on most occasions, you should just stick with using
-Unicode with these APIs. The bytes APIs should only be used on
-systems where undecodable file names can be present; that's
-pretty much only Unix systems now.
Tips for Writing Unicode-aware Programs
@@ -668,96 +640,109 @@ Unicode.
The most important tip is:
- Software should only work with Unicode strings internally, decoding the input
- data as soon as possible and encoding the output only at the end.
+ Software should only work with Unicode strings internally, converting to a
+ particular encoding on output.
-If you attempt to write processing functions that accept both Unicode and byte
+If you attempt to write processing functions that accept both Unicode and 8-bit
strings, you will find your program vulnerable to bugs wherever you combine the
-two different kinds of strings. There is no automatic encoding or decoding: if
-you do e.g. ``str + bytes``, a :exc:`TypeError` will be raised.
+two different kinds of strings. Python's default encoding is ASCII, so whenever
+a character with an ASCII value > 127 is in the input data, you'll get a
+:exc:`UnicodeDecodeError` because that character can't be handled by the ASCII
+encoding.
+
+It's easy to miss such problems if you only test your software with data that
+doesn't contain any accents; everything will seem to work, but there's actually
+a bug in your program waiting for the first user who attempts to use characters
+> 127. A second tip, therefore, is:
+
+ Include characters > 127 and, even better, characters > 255 in your test
+ data.
When using data coming from a web browser or some other untrusted source, a
common technique is to check for illegal characters in a string before using the
string in a generated command line or storing it in a database. If you're doing
-this, be careful to check the decoded string, not the encoded bytes data;
-some encodings may have interesting properties, such as not being bijective
-or not being fully ASCII-compatible. This is especially true if the input
-data also specifies the encoding, since the attacker can then choose a
-clever way to hide malicious text in the encoded bytestream.
-
-
-Converting Between File Encodings
-'''''''''''''''''''''''''''''''''
-
-The :class:`~codecs.StreamRecoder` class can transparently convert between
-encodings, taking a stream that returns data in encoding #1
-and behaving like a stream returning data in encoding #2.
-
-For example, if you have an input file *f* that's in Latin-1, you
-can wrap it with a :class:`~codecs.StreamRecoder` to return bytes encoded in
-UTF-8::
-
- new_f = codecs.StreamRecoder(f,
- # en/decoder: used by read() to encode its results and
- # by write() to decode its input.
- codecs.getencoder('utf-8'), codecs.getdecoder('utf-8'),
-
- # reader/writer: used to read and write to the stream.
- codecs.getreader('latin-1'), codecs.getwriter('latin-1') )
-
-
-Files in an Unknown Encoding
-''''''''''''''''''''''''''''
-
-What can you do if you need to make a change to a file, but don't know
-the file's encoding? If you know the encoding is ASCII-compatible and
-only want to examine or modify the ASCII parts, you can open the file
-with the ``surrogateescape`` error handler::
-
- with open(fname, 'r', encoding="ascii", errors="surrogateescape") as f:
- data = f.read()
-
- # make changes to the string 'data'
-
- with open(fname + '.new', 'w',
- encoding="ascii", errors="surrogateescape") as f:
- f.write(data)
-
-The ``surrogateescape`` error handler will decode any non-ASCII bytes
-as code points in a special range running from U+DC80 to
-U+DCFF. These code points will then turn back into the
-same bytes when the ``surrogateescape`` error handler is used to
-encode the data and write it back out.
-
+this, be careful to check the string once it's in the form that will be used or
+stored; it's possible for encodings to be used to disguise characters. This is
+especially true if the input data also specifies the encoding; many encodings
+leave the commonly checked-for characters alone, but Python includes some
+encodings such as ``'base64'`` that modify every single character.
+
+For example, let's say you have a content management system that takes a Unicode
+filename, and you want to disallow paths with a '/' character. You might write
+this code::
+
+ def read_file (filename, encoding):
+ if '/' in filename:
+ raise ValueError("'/' not allowed in filenames")
+ unicode_name = filename.decode(encoding)
+ f = open(unicode_name, 'r')
+ # ... return contents of file ...
+
+However, if an attacker could specify the ``'base64'`` encoding, they could pass
+``'L2V0Yy9wYXNzd2Q='``, which is the base-64 encoded form of the string
+``'/etc/passwd'``, to read a system file. The above code looks for ``'/'``
+characters in the encoded form and misses the dangerous character in the
+resulting decoded form.
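The disguise can be verified directly: the base-64 form of the dangerous path contains no ``'/'`` at all, so the naive check passes while the decoded result is exactly the path it was meant to reject.

```python
# The encoded form defeats the naive '/' check from the example above.
import base64

encoded = 'L2V0Yy9wYXNzd2Q='
assert '/' not in encoded            # the check sees nothing wrong
decoded = base64.b64decode(encoded).decode('ascii')
print(decoded)                       # /etc/passwd
```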
References
----------
-One section of `Mastering Python 3 Input/Output
-<http://pyvideo.org/video/289/pycon-2010--mastering-python-3-i-o>`_,
-a PyCon 2010 talk by David Beazley, discusses text processing and binary data handling.
-
-The `PDF slides for Marc-André Lemburg's presentation "Writing Unicode-aware
-Applications in Python"
-<https://downloads.egenix.com/python/LSM2005-Developing-Unicode-aware-applications-in-Python.pdf>`_
-discuss questions of character encodings as well as how to internationalize
-and localize an application. These slides cover Python 2.x only.
-
-`The Guts of Unicode in Python
-<http://pyvideo.org/video/1768/the-guts-of-unicode-in-python>`_
-is a PyCon 2013 talk by Benjamin Peterson that discusses the internal Unicode
-representation in Python 3.3.
-
-
-Acknowledgements
-================
-
-The initial draft of this document was written by Andrew Kuchling.
-It has since been revised further by Alexander Belopolsky, Georg Brandl,
-Andrew Kuchling, and Ezio Melotti.
-
-Thanks to the following people who have noted errors or offered
-suggestions on this article: Éric Araujo, Nicholas Bastin, Nick
-Coghlan, Marius Gedminas, Kent Johnson, Ken Krugler, Marc-André
-Lemburg, Martin von Löwis, Terry J. Reedy, Serhiy Storchaka,
-Eryk Sun, Chad Whitacre, Graham Wideman.
+The PDF slides for Marc-André Lemburg's presentation "Writing Unicode-aware
+Applications in Python" are available at
+<https://downloads.egenix.com/python/LSM2005-Developing-Unicode-aware-applications-in-Python.pdf>
+and discuss questions of character encodings as well as how to internationalize
+and localize an application.
+
+
+Revision History and Acknowledgements
+=====================================
+
+Thanks to the following people who have noted errors or offered suggestions on
+this article: Nicholas Bastin, Marius Gedminas, Kent Johnson, Ken Krugler,
+Marc-André Lemburg, Martin von Löwis, Chad Whitacre.
+
+Version 1.0: posted August 5 2005.
+
+Version 1.01: posted August 7 2005. Corrects factual and markup errors; adds
+several links.
+
+Version 1.02: posted August 16 2005. Corrects factual errors.
+
+Version 1.03: posted June 20 2010. Notes that Python 3.x is not covered,
+and that the HOWTO only covers 2.x.
+
+
+.. comment Describe Python 3.x support (new section? new document?)
+.. comment Additional topic: building Python w/ UCS2 or UCS4 support
+.. comment Describe obscure -U switch somewhere?
+.. comment Describe use of codecs.StreamRecoder and StreamReaderWriter
+
+.. comment
+ Original outline:
+
+ - [ ] Unicode introduction
+ - [ ] ASCII
+ - [ ] Terms
+ - [ ] Character
+ - [ ] Code point
+ - [ ] Encodings
+ - [ ] Common encodings: ASCII, Latin-1, UTF-8
+ - [ ] Unicode Python type
+ - [ ] Writing unicode literals
+ - [ ] Obscurity: -U switch
+ - [ ] Built-ins
+ - [ ] unichr()
+ - [ ] ord()
+ - [ ] unicode() constructor
+ - [ ] Unicode type
+ - [ ] encode(), decode() methods
+ - [ ] Unicodedata module for character properties
+ - [ ] I/O
+ - [ ] Reading/writing Unicode data into files
+ - [ ] Byte-order marks
+ - [ ] Unicode filenames
+ - [ ] Writing Unicode programs
+ - [ ] Do everything in Unicode
+ - [ ] Declaring source code encodings (PEP 263)
+ - [ ] Other issues
+ - [ ] Building Python (UCS2, UCS4)
diff --git a/Doc/howto/urllib2.rst b/Doc/howto/urllib2.rst
index 046a88a..ce63948 100644
--- a/Doc/howto/urllib2.rst
+++ b/Doc/howto/urllib2.rst
@@ -1,8 +1,8 @@
.. _urllib-howto:
-***********************************************************
- HOWTO Fetch Internet Resources Using The urllib Package
-***********************************************************
+************************************************
+ HOWTO Fetch Internet Resources Using urllib2
+************************************************
:Author: `Michael Foord <http://www.voidspace.org.uk/python/index.shtml>`_
@@ -26,14 +26,14 @@ Introduction
A tutorial on *Basic Authentication*, with examples in Python.
-**urllib.request** is a Python module for fetching URLs
+**urllib2** is a Python module for fetching URLs
(Uniform Resource Locators). It offers a very simple interface, in the form of
the *urlopen* function. This is capable of fetching URLs using a variety of
different protocols. It also offers a slightly more complex interface for
handling common situations - like basic authentication, cookies, proxies and so
on. These are provided by objects called handlers and openers.
-urllib.request supports fetching URLs for many "URL schemes" (identified by the string
+urllib2 supports fetching URLs for many "URL schemes" (identified by the string
before the ``":"`` in URL - for example ``"ftp"`` is the URL scheme of
``"ftp://python.org/"``) using their associated network protocols (e.g. FTP, HTTP).
This tutorial focuses on the most common case, HTTP.
@@ -42,58 +42,43 @@ For straightforward situations *urlopen* is very easy to use. But as soon as you
encounter errors or non-trivial cases when opening HTTP URLs, you will need some
understanding of the HyperText Transfer Protocol. The most comprehensive and
authoritative reference to HTTP is :rfc:`2616`. This is a technical document and
-not intended to be easy to read. This HOWTO aims to illustrate using *urllib*,
+not intended to be easy to read. This HOWTO aims to illustrate using *urllib2*,
with enough detail about HTTP to help you through. It is not intended to replace
-the :mod:`urllib.request` docs, but is supplementary to them.
+the :mod:`urllib2` docs, but is supplementary to them.
Fetching URLs
=============
-The simplest way to use urllib.request is as follows::
+The simplest way to use urllib2 is as follows::
- import urllib.request
- with urllib.request.urlopen('http://python.org/') as response:
- html = response.read()
+ import urllib2
+ response = urllib2.urlopen('http://python.org/')
+ html = response.read()
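A ``Request`` object can also be built and inspected without touching the network; this sketch uses the Python 3 spelling ``urllib.request`` (in Python 2 the class is ``urllib2.Request``):

```python
# Inspecting Request objects offline -- no connection is opened.
import urllib.request

req = urllib.request.Request('http://python.org/')
print(req.get_method())        # GET -- no data argument attached

req_post = urllib.request.Request('http://python.org/', data=b'k=v')
print(req_post.get_method())   # POST -- attaching data makes it a POST
```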
-If you wish to retrieve a resource via URL and store it in a temporary
-location, you can do so via the :func:`shutil.copyfileobj` and
-:func:`tempfile.NamedTemporaryFile` functions::
-
- import shutil
- import tempfile
- import urllib.request
-
- with urllib.request.urlopen('http://python.org/') as response:
- with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
- shutil.copyfileobj(response, tmp_file)
-
- with open(tmp_file.name) as html:
- pass
-
-Many uses of urllib will be that simple (note that instead of an 'http:' URL we
+Many uses of urllib2 will be that simple (note that instead of an 'http:' URL we
could have used a URL starting with 'ftp:', 'file:', etc.). However, it's the
purpose of this tutorial to explain the more complicated cases, concentrating on
HTTP.
HTTP is based on requests and responses - the client makes requests and servers
-send responses. urllib.request mirrors this with a ``Request`` object which represents
+send responses. urllib2 mirrors this with a ``Request`` object which represents
the HTTP request you are making. In its simplest form you create a Request
object that specifies the URL you want to fetch. Calling ``urlopen`` with this
Request object returns a response object for the URL requested. This response is
a file-like object, which means you can for example call ``.read()`` on the
response::
- import urllib.request
+ import urllib2
- req = urllib.request.Request('http://www.voidspace.org.uk')
- with urllib.request.urlopen(req) as response:
- the_page = response.read()
+ req = urllib2.Request('http://www.voidspace.org.uk')
+ response = urllib2.urlopen(req)
+ the_page = response.read()
-Note that urllib.request makes use of the same Request interface to handle all URL
+Note that urllib2 makes use of the same Request interface to handle all URL
schemes. For example, you can make an FTP request like so::
- req = urllib.request.Request('ftp://example.com/')
+ req = urllib2.Request('ftp://example.com/')
In the case of HTTP, there are two extra things that Request objects allow you
to do: First, you can pass data to be sent to the server. Second, you can pass
@@ -105,35 +90,34 @@ Data
----
Sometimes you want to send data to a URL (often the URL will refer to a CGI
-(Common Gateway Interface) script or other web application). With HTTP,
+(Common Gateway Interface) script [#]_ or other web application). With HTTP,
this is often done using what's known as a **POST** request. This is often what
your browser does when you submit an HTML form that you filled in on the web. Not
all POSTs have to come from forms: you can use a POST to transmit arbitrary data
to your own application. In the common case of HTML forms, the data needs to be
encoded in a standard way, and then passed to the Request object as the ``data``
-argument. The encoding is done using a function from the :mod:`urllib.parse`
-library. ::
+argument. The encoding is done using a function from the ``urllib`` library
+*not* from ``urllib2``. ::
- import urllib.parse
- import urllib.request
+ import urllib
+ import urllib2
url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name' : 'Michael Foord',
'location' : 'Northampton',
'language' : 'Python' }
- data = urllib.parse.urlencode(values)
- data = data.encode('ascii') # data should be bytes
- req = urllib.request.Request(url, data)
- with urllib.request.urlopen(req) as response:
- the_page = response.read()
+ data = urllib.urlencode(values)
+ req = urllib2.Request(url, data)
+ response = urllib2.urlopen(req)
+ the_page = response.read()
Note that other encodings are sometimes required (e.g. for file upload from HTML
forms - see `HTML Specification, Form Submission
<https://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13>`_ for more
details).
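The form-encoding step can be tried on its own; this sketch uses the Python 3 location ``urllib.parse.urlencode`` (Python 2 keeps the function in ``urllib``, as the text explains):

```python
# Encoding a dictionary of form values into a query string.
from urllib.parse import urlencode

values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}
data = urlencode(values)
print(data)   # name=Michael+Foord&location=Northampton&language=Python
```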
-If you do not pass the ``data`` argument, urllib uses a **GET** request. One
+If you do not pass the ``data`` argument, urllib2 uses a **GET** request. One
way in which GET and POST requests differ is that POST requests often have
"side-effects": they change the state of the system in some way (for example by
placing an order with the website for a hundredweight of tinned spam to be
@@ -145,18 +129,18 @@ GET request by encoding it in the URL itself.
This is done as follows::
- >>> import urllib.request
- >>> import urllib.parse
+ >>> import urllib2
+ >>> import urllib
>>> data = {}
>>> data['name'] = 'Somebody Here'
>>> data['location'] = 'Northampton'
>>> data['language'] = 'Python'
- >>> url_values = urllib.parse.urlencode(data)
- >>> print(url_values) # The order may differ from below. #doctest: +SKIP
+ >>> url_values = urllib.urlencode(data)
+ >>> print url_values # The order may differ. #doctest: +SKIP
name=Somebody+Here&language=Python&location=Northampton
>>> url = 'http://www.example.com/example.cgi'
>>> full_url = url + '?' + url_values
- >>> data = urllib.request.urlopen(full_url)
+ >>> data = urllib2.urlopen(full_url)
Notice that the full URL is created by adding a ``?`` to the URL, followed by
the encoded values.
@@ -168,7 +152,7 @@ We'll discuss here one particular HTTP header, to illustrate how to add headers
to your HTTP request.
Some websites [#]_ dislike being browsed by programs, or send different versions
-to different browsers [#]_. By default urllib identifies itself as
+to different browsers [#]_. By default urllib2 identifies itself as
``Python-urllib/x.y`` (where ``x`` and ``y`` are the major and minor version
numbers of the Python release,
e.g. ``Python-urllib/2.5``), which may confuse the site, or just plain
@@ -178,8 +162,8 @@ pass a dictionary of headers in. The following example makes the same
request as above, but identifies itself as a version of Internet
Explorer [#]_. ::
- import urllib.parse
- import urllib.request
+ import urllib
+ import urllib2
url = 'http://www.someserver.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
@@ -188,11 +172,10 @@ Explorer [#]_. ::
'language': 'Python' }
headers = {'User-Agent': user_agent}
- data = urllib.parse.urlencode(values)
- data = data.encode('ascii')
- req = urllib.request.Request(url, data, headers)
- with urllib.request.urlopen(req) as response:
- the_page = response.read()
+ data = urllib.urlencode(values)
+ req = urllib2.Request(url, data, headers)
+ response = urllib2.urlopen(req)
+ the_page = response.read()
The response also has two useful methods. See the section on `info and geturl`_
which comes after we have a look at what happens when things go wrong.
@@ -208,8 +191,6 @@ usual with Python APIs, built-in exceptions such as :exc:`ValueError`,
:exc:`HTTPError` is the subclass of :exc:`URLError` raised in the specific case of
HTTP URLs.
-The exception classes are exported from the :mod:`urllib.error` module.
-
URLError
--------
@@ -220,10 +201,10 @@ error code and a text error message.
e.g. ::
- >>> req = urllib.request.Request('http://www.pretend_server.org')
- >>> try: urllib.request.urlopen(req)
- ... except urllib.error.URLError as e:
- ... print(e.reason) #doctest: +SKIP
+ >>> req = urllib2.Request('http://www.pretend_server.org')
+ >>> try: urllib2.urlopen(req)
+ ... except urllib2.URLError as e:
+ ... print e.reason #doctest: +SKIP
...
(4, 'getaddrinfo failed')
@@ -235,11 +216,11 @@ Every HTTP response from the server contains a numeric "status code". Sometimes
the status code indicates that the server is unable to fulfil the request. The
default handlers will handle some of these responses for you (for example, if
the response is a "redirection" that requests the client fetch the document from
-a different URL, urllib will handle that for you). For those it can't handle,
+a different URL, urllib2 will handle that for you). For those it can't handle,
urlopen will raise an :exc:`HTTPError`. Typical errors include '404' (page not
found), '403' (request forbidden), and '401' (authentication required).
-See section 10 of :rfc:`2616` for a reference on all the HTTP error codes.
+See section 10 of RFC 2616 for a reference on all the HTTP error codes.
The :exc:`HTTPError` instance raised will have an integer 'code' attribute, which
corresponds to the error sent by the server.
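The exception hierarchy described here can be checked without any network access; this sketch uses the Python 3 names from ``urllib.error`` (in Python 2 both classes live in ``urllib2``), and the 404 instance built here is purely illustrative:

```python
# HTTPError is a URLError subclass and carries the server's status code.
import io
from urllib.error import URLError, HTTPError

assert issubclass(HTTPError, URLError)

err = HTTPError('http://example.com/fish.html', 404, 'Not Found',
                None, io.BytesIO(b'missing'))
print(err.code)   # 404
```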
@@ -251,8 +232,8 @@ Because the default handlers handle redirects (codes in the 300 range), and
codes in the 100--299 range indicate success, you will usually only see error
codes in the 400--599 range.
-:attr:`http.server.BaseHTTPRequestHandler.responses` is a useful dictionary of
-response codes in that shows all the response codes used by :rfc:`2616`. The
+``BaseHTTPServer.BaseHTTPRequestHandler.responses`` is a useful dictionary of
+response codes that shows all the response codes used by RFC 2616. The
dictionary is reproduced here for convenience ::
# Table mapping response codes to messages; entries have the
@@ -326,21 +307,22 @@ dictionary is reproduced here for convenience ::
When an error is raised the server responds by returning an HTTP error code
*and* an error page. You can use the :exc:`HTTPError` instance as a response on the
page returned. This means that as well as the code attribute, it also has read,
-geturl, and info, methods as returned by the ``urllib.response`` module::
+geturl, and info methods. ::
- >>> req = urllib.request.Request('http://www.python.org/fish.html')
+ >>> req = urllib2.Request('http://www.python.org/fish.html')
>>> try:
- ... urllib.request.urlopen(req)
- ... except urllib.error.HTTPError as e:
- ... print(e.code)
- ... print(e.read()) #doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
+ ... urllib2.urlopen(req)
+ ... except urllib2.HTTPError as e:
+ ... print e.code
+ ... print e.read() #doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
...
404
- b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
- "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
- ...
- <title>Page Not Found</title>\n
- ...
+ <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+ ...
+ <title>Page Not Found</title>
+ ...
+
Wrapping it Up
--------------
@@ -354,17 +336,16 @@ Number 1
::
- from urllib.request import Request, urlopen
- from urllib.error import URLError, HTTPError
+ from urllib2 import Request, urlopen, URLError, HTTPError
req = Request(someurl)
try:
response = urlopen(req)
except HTTPError as e:
- print('The server couldn\'t fulfill the request.')
- print('Error code: ', e.code)
+ print 'The server couldn\'t fulfill the request.'
+ print 'Error code: ', e.code
except URLError as e:
- print('We failed to reach a server.')
- print('Reason: ', e.reason)
+ print 'We failed to reach a server.'
+ print 'Reason: ', e.reason
else:
# everything is fine
@@ -379,18 +360,17 @@ Number 2
::
- from urllib.request import Request, urlopen
- from urllib.error import URLError
+ from urllib2 import Request, urlopen, URLError
req = Request(someurl)
try:
response = urlopen(req)
except URLError as e:
if hasattr(e, 'reason'):
- print('We failed to reach a server.')
- print('Reason: ', e.reason)
+ print 'We failed to reach a server.'
+ print 'Reason: ', e.reason
elif hasattr(e, 'code'):
- print('The server couldn\'t fulfill the request.')
- print('Error code: ', e.code)
+ print 'The server couldn\'t fulfill the request.'
+ print 'Error code: ', e.code
else:
# everything is fine
@@ -398,9 +378,8 @@ Number 2
info and geturl
===============
-The response returned by urlopen (or the :exc:`HTTPError` instance) has two
-useful methods :meth:`info` and :meth:`geturl` and is defined in the module
-:mod:`urllib.response`..
+The response returned by urlopen (or the :exc:`HTTPError` instance) has two useful
+methods :meth:`info` and :meth:`geturl`.
**geturl** - this returns the real URL of the page fetched. This is useful
because ``urlopen`` (or the opener object used) may have followed a
@@ -408,10 +387,10 @@ redirect. The URL of the page fetched may not be the same as the URL requested.
**info** - this returns a dictionary-like object that describes the page
fetched, particularly the headers sent by the server. It is currently an
-:class:`http.client.HTTPMessage` instance.
+``httplib.HTTPMessage`` instance.
Typical headers include 'Content-length', 'Content-type', and so on. See the
-`Quick Reference to HTTP Headers <http://jkorpela.fi/http.html>`_
+`Quick Reference to HTTP Headers <https://www.cs.tut.fi/~jkorpela/http.html>`_
for a useful listing of HTTP headers with brief explanations of their meaning
and use.
@@ -420,7 +399,7 @@ Openers and Handlers
====================
When you fetch a URL you use an opener (an instance of the perhaps
-confusingly-named :class:`urllib.request.OpenerDirector`). Normally we have been using
+confusingly-named :class:`urllib2.OpenerDirector`). Normally we have been using
the default opener - via ``urlopen`` - but you can create custom
openers. Openers use handlers. All the "heavy lifting" is done by the
handlers. Each handler knows how to open URLs for a particular URL scheme (http,
@@ -465,9 +444,7 @@ error code) requesting authentication. This specifies the authentication scheme
and a 'realm'. The header looks like: ``WWW-Authenticate: SCHEME
realm="REALM"``.
-e.g.
-
-.. code-block:: none
+e.g. ::
WWW-Authenticate: Basic realm="cPanel Users"
@@ -491,24 +468,24 @@ The top-level URL is the first URL that requires authentication. URLs "deeper"
than the URL you pass to .add_password() will also match. ::
# create a password manager
- password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
+ password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
# Add the username and password.
# If we knew the realm, we could use it instead of None.
top_level_url = "http://example.com/foo/"
password_mgr.add_password(None, top_level_url, username, password)
- handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
+ handler = urllib2.HTTPBasicAuthHandler(password_mgr)
# create "opener" (OpenerDirector instance)
- opener = urllib.request.build_opener(handler)
+ opener = urllib2.build_opener(handler)
# use the opener to fetch a URL
opener.open(a_url)
# Install the opener.
- # Now all calls to urllib.request.urlopen use our opener.
- urllib.request.install_opener(opener)
+ # Now all calls to urllib2.urlopen use our opener.
+ urllib2.install_opener(opener)
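The same handler chain can be assembled with the Python 3 module names (``urllib.request`` in place of ``urllib2``); nothing is fetched in this sketch, so no network access is needed, and the credentials are placeholders:

```python
# Building the Basic Authentication handler chain offline.
import urllib.request

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
top_level_url = 'http://example.com/foo/'
password_mgr.add_password(None, top_level_url, 'user', 'secret')

handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
opener = urllib.request.build_opener(handler)

# URLs "deeper" than the registered one match the stored credentials.
print(password_mgr.find_user_password(None, 'http://example.com/foo/bar'))
```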
.. note::
@@ -517,7 +494,7 @@ than the URL you pass to .add_password() will also match. ::
-- ``ProxyHandler`` (if a proxy setting such as an :envvar:`http_proxy`
environment variable is set), ``UnknownHandler``, ``HTTPHandler``,
``HTTPDefaultErrorHandler``, ``HTTPRedirectHandler``, ``FTPHandler``,
- ``FileHandler``, ``DataHandler``, ``HTTPErrorProcessor``.
+ ``FileHandler``, ``HTTPErrorProcessor``.
``top_level_url`` is in fact *either* a full URL (including the 'http:' scheme
component and the hostname and optionally the port number)
@@ -531,52 +508,52 @@ not correct.
Proxies
=======
-**urllib** will auto-detect your proxy settings and use those. This is through
+**urllib2** will auto-detect your proxy settings and use those. This is through
the ``ProxyHandler``, which is part of the normal handler chain when a proxy
setting is detected. Normally that's a good thing, but there are occasions
when it may not be helpful [#]_. One way to do this is to set up our own
``ProxyHandler``, with no proxies defined. This is done using similar steps to
setting up a `Basic Authentication`_ handler: ::
- >>> proxy_support = urllib.request.ProxyHandler({})
- >>> opener = urllib.request.build_opener(proxy_support)
- >>> urllib.request.install_opener(opener)
+ >>> proxy_support = urllib2.ProxyHandler({})
+ >>> opener = urllib2.build_opener(proxy_support)
+ >>> urllib2.install_opener(opener)
.. note::
- Currently ``urllib.request`` *does not* support fetching of ``https`` locations
- through a proxy. However, this can be enabled by extending urllib.request as
+ Currently ``urllib2`` *does not* support fetching of ``https`` locations
+ through a proxy. However, this can be enabled by extending urllib2 as
shown in the recipe [#]_.
.. note::
``HTTP_PROXY`` will be ignored if a variable ``REQUEST_METHOD`` is set; see
- the documentation on :func:`~urllib.request.getproxies`.
+ the documentation on :func:`~urllib.getproxies`.
Sockets and Layers
==================
-The Python support for fetching resources from the web is layered. urllib uses
-the :mod:`http.client` library, which in turn uses the socket library.
+The Python support for fetching resources from the web is layered. urllib2 uses
+the httplib library, which in turn uses the socket library.
As of Python 2.3 you can specify how long a socket should wait for a response
before timing out. This can be useful in applications which have to fetch web
pages. By default the socket module has *no timeout* and can hang. Currently,
-the socket timeout is not exposed at the http.client or urllib.request levels.
-However, you can set the default timeout globally for all sockets using ::
+the socket timeout is not exposed at the httplib or urllib2 levels. However,
+you can set the default timeout globally for all sockets using ::
import socket
- import urllib.request
+ import urllib2
# timeout in seconds
timeout = 10
socket.setdefaulttimeout(timeout)
- # this call to urllib.request.urlopen now uses the default timeout
+ # this call to urllib2.urlopen now uses the default timeout
# we have set in the socket module
- req = urllib.request.Request('http://www.voidspace.org.uk')
- response = urllib.request.urlopen(req)
+ req = urllib2.Request('http://www.voidspace.org.uk')
+ response = urllib2.urlopen(req)
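The socket-level default timeout can be demonstrated, and restored, without opening a connection; the same mechanism applies to ``urllib.request`` in Python 3:

```python
# Setting and restoring the global socket timeout.
import socket

previous = socket.getdefaulttimeout()   # usually None: sockets never time out
socket.setdefaulttimeout(10)
print(socket.getdefaulttimeout())       # 10.0

socket.setdefaulttimeout(previous)      # restore the old default
```

Restoring the previous value matters in library code: the default is process-wide and affects every socket created afterwards.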
-------
@@ -587,6 +564,8 @@ Footnotes
This document was reviewed and revised by John Lee.
+.. [#] For an introduction to the CGI protocol see
+ `Writing Web Applications in Python <http://www.pyzine.com/Issue008/Section_Articles/article_CGIOne.html>`_.
.. [#] Google for example.
.. [#] Browser sniffing is a very bad practice for website design - building
sites using web standards is much more sensible. Unfortunately a lot of
@@ -597,9 +576,9 @@ This document was reviewed and revised by John Lee.
`Quick Reference to HTTP Headers`_.
.. [#] In my case I have to use a proxy to access the internet at work. If you
attempt to fetch *localhost* URLs through this proxy it blocks them. IE
- is set to use the proxy, which urllib picks up on. In order to test
- scripts with a localhost server, I have to prevent urllib from using
+ is set to use the proxy, which urllib2 picks up on. In order to test
+ scripts with a localhost server, I have to prevent urllib2 from using
the proxy.
-.. [#] urllib opener for SSL proxy (CONNECT method): `ASPN Cookbook Recipe
+.. [#] urllib2 opener for SSL proxy (CONNECT method): `ASPN Cookbook Recipe
<https://code.activestate.com/recipes/456195/>`_.
diff --git a/Doc/howto/webservers.rst b/Doc/howto/webservers.rst
new file mode 100644
index 0000000..5071a8a
--- /dev/null
+++ b/Doc/howto/webservers.rst
@@ -0,0 +1,735 @@
+*******************************
+ HOWTO Use Python in the web
+*******************************
+
+:Author: Marek Kubica
+
+.. topic:: Abstract
+
+ This document shows how Python fits into the web. It presents some ways
+ to integrate Python with a web server, and general practices useful for
+ developing web sites.
+
+
+Programming for the Web has become a hot topic since the rise of "Web 2.0",
+which focuses on user-generated content on web sites. It has always been
+possible to use Python for creating web sites, but it was a rather tedious task.
+Therefore, many frameworks and helper tools have been created to assist
+developers in creating faster and more robust sites. This HOWTO describes
+some of the methods used to combine Python with a web server to create
+dynamic content. It is not meant as a complete introduction, as this topic is
+far too broad to be covered in one single document. However, a short overview
+of the most popular libraries is provided.
+
+.. seealso::
+
+ While this HOWTO tries to give an overview of Python in the web, it cannot
+ always be as up to date as desired. Web development in Python is rapidly
+ moving forward, so the wiki page on `Web Programming
+ <https://wiki.python.org/moin/WebProgramming>`_ may be more in sync with
+ recent development.
+
+
+The Low-Level View
+==================
+
+When a user enters a web site, their browser makes a connection to the site's
+web server (this is called the *request*). The server looks up the file in the
+file system and sends it back to the user's browser, which displays it (this is
+the *response*). This is roughly how the underlying protocol, HTTP, works.
+
+Dynamic web sites are not based on files in the file system, but rather on
+programs which are run by the web server when a request comes in, and which
+*generate* the content that is returned to the user. They can do all sorts of
+useful things, like display the postings of a bulletin board, show your email,
+configure software, or just display the current time. These programs can be
+written in any programming language the server supports. Since most servers
+support Python, it is easy to use Python to create dynamic web sites.
+
+Most HTTP servers are written in C or C++, so they cannot execute Python code
+directly -- a bridge is needed between the server and the program. These
+bridges, or rather interfaces, define how programs interact with the server.
+There have been numerous attempts to create the best possible interface, but
+there are only a few worth mentioning.
+
+Not every web server supports every interface. Many web servers only support
+old, now-obsolete interfaces; however, they can often be extended using
+third-party modules to support newer ones.
+
+
+Common Gateway Interface
+------------------------
+
+This interface, most commonly referred to as "CGI", is the oldest, and is
+supported by nearly every web server out of the box. Programs using CGI to
+communicate with their web server need to be started by the server for every
+request. So, every request starts a new Python interpreter -- which takes some
+time to start up -- thus making the whole interface only usable for low load
+situations.
+
+The upside of CGI is that it is simple -- writing a Python program which uses
+CGI is a matter of about three lines of code. This simplicity comes at a
+price: it does very few things to help the developer.
+
+Writing CGI programs, while still possible, is no longer recommended. With
+:ref:`WSGI <WSGI>`, a topic covered later in this document, it is possible to write
+programs that emulate CGI, so they can be run as CGI if no better option is
+available.
+
+.. seealso::
+
+ The Python standard library includes some modules that are helpful for
+ creating plain CGI programs:
+
+ * :mod:`cgi` -- Handling of user input in CGI scripts
+ * :mod:`cgitb` -- Displays nice tracebacks when errors happen in CGI
+ applications, instead of presenting a "500 Internal Server Error" message
+
+ The Python wiki features a page on `CGI scripts
+ <https://wiki.python.org/moin/CgiScripts>`_ with some additional information
+ about CGI in Python.
+
+
+Simple script for testing CGI
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To test whether your web server works with CGI, you can use this short and
+simple CGI program::
+
+ #!/usr/bin/env python
+ # -*- coding: UTF-8 -*-
+
+ # enable debugging
+ import cgitb
+ cgitb.enable()
+
+ print "Content-Type: text/plain;charset=utf-8"
+ print
+
+ print "Hello World!"
+
+Depending on your web server configuration, you may need to save this code with
+a ``.py`` or ``.cgi`` extension. Additionally, this file may also need to be
+in a ``cgi-bin`` folder, for security reasons.
+
+You might wonder what the ``cgitb`` line is about. This line makes it possible
+to display a nice traceback instead of just crashing and displaying an "Internal
+Server Error" in the user's browser. This is useful for debugging, but it might
+risk exposing some confidential data to the user. You should not use ``cgitb``
+in production code for this reason. You should *always* catch exceptions, and
+display proper error pages -- end-users don't like to see nondescript "Internal
+Server Errors" in their browsers.
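The advice above can be sketched in a few lines. This is our own minimal example, not code from this HOWTO: a wrapper that runs a page handler, logs the real traceback to the server's error log, and shows the user a generic error page instead.

```python
# A minimal sketch (our own, not from this HOWTO) of catching exceptions
# in a CGI script and emitting a proper error page instead of a traceback.
import sys
import traceback

def run_cgi(handler, out=sys.stdout):
    """Run *handler* and emit a generic error page if it raises."""
    try:
        body = handler()
    except Exception:
        # Log the real traceback server-side; never show it to the user.
        traceback.print_exc(file=sys.stderr)
        out.write("Status: 500 Internal Server Error\n")
        out.write("Content-Type: text/html;charset=utf-8\n\n")
        out.write("<h1>Sorry, something went wrong.</h1>")
    else:
        out.write("Content-Type: text/html;charset=utf-8\n\n")
        out.write(body)

def page():
    return "<h1>Hello World!</h1>"
```

In a real script you would call ``run_cgi(page)`` at module level; the web server captures standard output as the HTTP response and standard error usually ends up in the server's error log.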
+
+
+Setting up CGI on your own server
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If you don't have your own web server, this does not apply to you. You can
+check whether it works as-is, and if not you will need to talk to the
+administrator of your web server. If it is a big host, you can try filing a
+ticket asking for Python support.
+
+If you are your own administrator or want to set up CGI for testing purposes on
+your own computers, you have to configure it by yourself. There is no single
+way to configure CGI, as there are many web servers with different
+configuration options. Currently the most widely used free web server is
+`Apache HTTPd <http://httpd.apache.org/>`_, or Apache for short. Apache can be
+easily installed on nearly every system using the system's package management
+tool. `lighttpd <http://www.lighttpd.net>`_ is another alternative and is
+said to have better performance. On many systems this server can also be
+installed using the package management tool, so manually compiling the web
+server may not be needed.
+
+* On Apache you can take a look at the `Dynamic Content with CGI
+ <http://httpd.apache.org/docs/2.2/howto/cgi.html>`_ tutorial, where everything
+ is described. Most of the time it is enough just to set ``+ExecCGI``. The
+ tutorial also describes the most common gotchas that might arise.
+
+* On lighttpd you need to use the `CGI module
+ <http://redmine.lighttpd.net/projects/lighttpd/wiki/Docs_ModCGI>`_\ , which can be configured
+ in a straightforward way. It boils down to setting ``cgi.assign`` properly.
+
+
+Common problems with CGI scripts
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Using CGI sometimes leads to small annoyances while trying to get these
+scripts to run. Sometimes a seemingly correct script does not work as
+expected, the cause being some small hidden problem that's difficult to spot.
+
+Some of these potential problems are:
+
+* The Python script is not marked as executable. When CGI scripts are not
+  executable, most web servers will let the user download them instead of
+  running them and sending the output to the user. For CGI scripts to run
+  properly on Unix-like operating systems, the ``+x`` bit needs to be set.
+  Using ``chmod a+x your_script.py`` may solve this problem.
+
+* On a Unix-like system, the line endings in the program file must be Unix
+ style line endings. This is important because the web server checks the
+ first line of the script (called shebang) and tries to run the program
+ specified there. It gets easily confused by Windows line endings (Carriage
+ Return & Line Feed, also called CRLF), so you have to convert the file to
+ Unix line endings (only Line Feed, LF). This can be done automatically by
+ uploading the file via FTP in text mode instead of binary mode, but the
+ preferred way is just telling your editor to save the files with Unix line
+ endings. Most editors support this.
+
+* Your web server must be able to read the file, and you need to make sure the
+  permissions are correct. On Unix-like systems, the server often runs as user
+  and group ``www-data``, so it might be worth trying to change the file
+  ownership, or to make the file world-readable by using ``chmod a+r
+  your_script.py``.
+
+* The web server must know that the file you're trying to access is a CGI script.
+ Check the configuration of your web server, as it may be configured
+ to expect a specific file extension for CGI scripts.
+
+* On Unix-like systems, the path to the interpreter in the shebang
+ (``#!/usr/bin/env python``) must be correct. This line calls
+ ``/usr/bin/env`` to find Python, but it will fail if there is no
+ ``/usr/bin/env``, or if Python is not in the web server's path. If you know
+ where your Python is installed, you can also use that full path. The
+ commands ``whereis python`` and ``type -p python`` could help you find
+ where it is installed. Once you know the path, you can change the shebang
+ accordingly: ``#!/usr/bin/python``.
+
+* The file must not contain a BOM (Byte Order Mark). The BOM is meant for
+ determining the byte order of UTF-16 and UTF-32 encodings, but some editors
+ write this also into UTF-8 files. The BOM interferes with the shebang line,
+ so be sure to tell your editor not to write the BOM.
+
+* If the web server is using :ref:`mod-python`, ``mod_python`` may be having
+ problems. ``mod_python`` is able to handle CGI scripts by itself, but it can
+ also be a source of issues.
+
+
+.. _mod-python:
+
+mod_python
+----------
+
+People coming from PHP often find it hard to grasp how to use Python in the web.
+Their first thought is mostly `mod_python <http://modpython.org/>`_\ ,
+because they think that this is the equivalent to ``mod_php``. Actually, there
+are many differences. What ``mod_python`` does is embed the interpreter into
+the Apache process, thus speeding up requests by not having to start a Python
+interpreter for each request. On the other hand, it is not "Python intermixed
+with HTML" in the way that PHP is often intermixed with HTML. The Python
+equivalent of that is a template engine. ``mod_python`` itself is much more
+powerful and provides more access to Apache internals. It can emulate CGI,
+work in a "Python Server Pages" mode (similar to JSP) which is "HTML
+intermingled with Python", and it has a "Publisher" which designates one file
+to accept all requests and decide what to do with them.
+
+``mod_python`` does have some problems. Unlike the PHP interpreter, the Python
+interpreter uses caching when executing files, so changes to a file will
+require the web server to be restarted. Another problem is the basic concept
+-- Apache starts child processes to handle the requests, and unfortunately
+every child process needs to load the whole Python interpreter even if it does
+not use it. This makes the whole web server slower. Another problem is that,
+because ``mod_python`` is linked against a specific version of ``libpython``,
+it is not possible to switch from an older version to a newer (e.g. 2.4 to 2.5)
+without recompiling ``mod_python``. ``mod_python`` is also bound to the Apache
+web server, so programs written for ``mod_python`` cannot easily run on other
+web servers.
+
+These are the reasons why ``mod_python`` should be avoided when writing new
+programs. In some circumstances it still might be a good idea to use
+``mod_python`` for deployment, but WSGI makes it possible to run WSGI programs
+under ``mod_python`` as well.
+
+
+FastCGI and SCGI
+----------------
+
+FastCGI and SCGI try to solve the performance problem of CGI in another way.
+Instead of embedding the interpreter into the web server, they create
+long-running background processes. There is still a module in the web server
+which makes it possible for the web server to "speak" with the background
+process. As the background process is independent of the server, it can be
+written in any language, including Python. The language just needs to have a
+library which handles the communication with the webserver.
+
+The difference between FastCGI and SCGI is very small, as SCGI is essentially
+just a "simpler FastCGI". As the web server support for SCGI is limited,
+most people use FastCGI instead, which works the same way. Almost everything
+that applies to SCGI also applies to FastCGI as well, so we'll only cover
+the latter.
+
+These days, FastCGI is never used directly. Just like ``mod_python``, it is only
+used for the deployment of WSGI applications.
+
+
+Setting up FastCGI
+^^^^^^^^^^^^^^^^^^
+
+Each web server requires a specific module.
+
+* Apache has both `mod_fastcgi <http://www.fastcgi.com/drupal/>`_ and `mod_fcgid
+ <https://httpd.apache.org/mod_fcgid/>`_. ``mod_fastcgi`` is the original one, but it
+ has some licensing issues, which is why it is sometimes considered non-free.
+ ``mod_fcgid`` is a smaller, compatible alternative. One of these modules needs
+ to be loaded by Apache.
+
+* lighttpd ships its own `FastCGI module
+ <http://redmine.lighttpd.net/projects/lighttpd/wiki/Docs_ModFastCGI>`_ as well as an
+ `SCGI module <http://redmine.lighttpd.net/projects/lighttpd/wiki/Docs_ModSCGI>`_.
+
+* `nginx <http://nginx.org/>`_ also supports `FastCGI
+ <https://www.nginx.com/resources/wiki/start/topics/examples/simplepythonfcgi/>`_.
+
+Once you have installed and configured the module, you can test it with the
+following WSGI-application::
+
+ #!/usr/bin/env python
+ # -*- coding: UTF-8 -*-
+
+ from cgi import escape
+ import sys, os
+ from flup.server.fcgi import WSGIServer
+
+ def app(environ, start_response):
+ start_response('200 OK', [('Content-Type', 'text/html')])
+
+ yield '<h1>FastCGI Environment</h1>'
+ yield '<table>'
+ for k, v in sorted(environ.items()):
+ yield '<tr><th>%s</th><td>%s</td></tr>' % (escape(k), escape(v))
+ yield '</table>'
+
+ WSGIServer(app).run()
+
+This is a simple WSGI application, but you need to install `flup
+<https://pypi.org/project/flup/1.0>`_ first, as flup handles the low level
+FastCGI access.
+
+.. seealso::
+
+ There is some documentation on `setting up Django with WSGI
+ <https://docs.djangoproject.com/en/dev/howto/deployment/wsgi/>`_, most of
+ which can be reused for other WSGI-compliant frameworks and libraries.
+ Only the ``manage.py`` part has to be changed; the example used here can be
+ used instead. Django does more or less the exact same thing.
+
+
+mod_wsgi
+--------
+
+`mod_wsgi <http://code.google.com/p/modwsgi/>`_ is an attempt to get rid of the
+low level gateways. Given that FastCGI, SCGI, and mod_python are mostly used to
+deploy WSGI applications, mod_wsgi was started to directly embed WSGI applications
+into the Apache web server. mod_wsgi is specifically designed to host WSGI
+applications. It makes the deployment of WSGI applications much easier than
+deployment using other low level methods, which need glue code. The downside
+is that mod_wsgi is limited to the Apache web server; other servers would need
+their own implementations of mod_wsgi.
+
+mod_wsgi supports two modes: embedded mode, in which it integrates with the
+Apache process, and daemon mode, which is more FastCGI-like. Unlike FastCGI,
+mod_wsgi handles the worker-processes by itself, which makes administration
+easier.
+
+
+.. _WSGI:
+
+Step back: WSGI
+===============
+
+WSGI has already been mentioned several times, so it has to be something
+important. In fact it really is, and now it is time to explain it.
+
+The *Web Server Gateway Interface*, or WSGI for short, is defined in
+:pep:`333` and is currently the best way to do Python web programming. While
+it is great for programmers writing frameworks, a normal web developer does not
+need to get in direct contact with it. When choosing a framework for web
+development it is a good idea to choose one which supports WSGI.
+
+The big benefit of WSGI is the unification of the application programming
+interface. When your program is compatible with WSGI -- which at the outer
+level means that the framework you are using has support for WSGI -- your
+program can be deployed via any web server interface for which there are WSGI
+wrappers. You do not need to care about whether the application user uses
+mod_python or FastCGI or mod_wsgi -- with WSGI your application will work on
+any gateway interface. The Python standard library contains its own WSGI
+server, :mod:`wsgiref`, which is a small web server that can be used for
+testing.
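The interface described above is small enough to show whole. This is a minimal sketch of our own (the names are not from any framework): a single callable that satisfies the WSGI contract and can therefore be deployed under CGI, FastCGI, mod_wsgi, or the standard library's :mod:`wsgiref` server.

```python
# A minimal WSGI application sketch (our own names, for illustration).
def simple_app(environ, start_response):
    # The WSGI contract: receive the request environment as a dict, call
    # start_response with a status line and a list of header tuples, and
    # return an iterable of byte strings for the response body.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello WSGI!']

# To try it locally with the standard library's test server:
#     from wsgiref.simple_server import make_server
#     make_server('localhost', 8000, simple_app).serve_forever()
```

Because the application is just a callable, any gateway that speaks WSGI can invoke it; nothing in the function body depends on which server sits in front of it.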
+
+A really great WSGI feature is middleware. Middleware is a layer around your
+program which can add various functionality to it. There is quite a bit of
+`middleware <https://wsgi.readthedocs.org/en/latest/libraries.html>`_ already
+available. For example, instead of writing your own session management (HTTP
+is a stateless protocol, so to associate multiple HTTP requests with a single
+user your application must create and manage such state via a session), you can
+just download middleware which does that, plug it in, and get on with coding
+the unique parts of your application. The same thing with compression -- there
+is existing middleware which handles compressing your HTML using gzip to save
+on your server's bandwidth. Authentication is another problem that is easily
+solved using existing middleware.
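The middleware idea can be sketched in a few lines. This toy example is invented for illustration, not a published package: it wraps any WSGI application and adds one response header, without the inner application knowing anything about it.

```python
# A toy WSGI middleware sketch (names invented for illustration): wrap an
# application and add one response header on every response.
def add_header(app, name, value):
    def middleware(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Pass everything through, with one extra header appended.
            return start_response(status, headers + [(name, value)], exc_info)
        return app(environ, patched_start_response)
    return middleware

def inner_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello']

# Stack the middleware around the app, just as session or compression
# middleware would be stacked.
app = add_header(inner_app, 'X-Powered-By', 'Python')
```

Since the result is itself a WSGI application, middleware layers can be nested arbitrarily, which is exactly how session, compression, and authentication middleware compose.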
+
+Although WSGI may seem complex, the initial phase of learning can be very
+rewarding because WSGI and the associated middleware already have solutions to
+many problems that might arise while developing web sites.
+
+
+WSGI Servers
+------------
+
+The code that is used to connect to various low level gateways like CGI or
+mod_python is called a *WSGI server*. One of these servers is ``flup``, which
+supports FastCGI and SCGI, as well as `AJP
+<https://en.wikipedia.org/wiki/Apache_JServ_Protocol>`_. Some of these servers
+are written in Python, as ``flup`` is, but there also exist others which are
+written in C and can be used as drop-in replacements.
+
+There are many servers already available, so a Python web application
+can be deployed nearly anywhere. This is one big advantage that Python has
+compared with other web technologies.
+
+.. seealso::
+
+ A good overview of WSGI-related code can be found in the `WSGI homepage
+ <https://wsgi.readthedocs.org/>`_, which contains an extensive list of `WSGI servers
+ <https://wsgi.readthedocs.org/en/latest/servers.html>`_ which can be used by *any* application
+ supporting WSGI.
+
+ You might be interested in some WSGI-supporting modules already contained in
+ the standard library, namely:
+
+ * :mod:`wsgiref` -- some tiny utilities and servers for WSGI
+
+
+Case study: MoinMoin
+--------------------
+
+What does WSGI give the web application developer? Let's take a look at
+an application that's been around for a while, which was written in
+Python without using WSGI.
+
+One of the most widely used wiki software packages is `MoinMoin
+<https://moinmo.in/>`_. It was created in 2000, so it predates WSGI by about
+three years. Older versions needed separate code to run on CGI, mod_python,
+FastCGI and standalone.
+
+It now includes support for WSGI. Using WSGI, it is possible to deploy
+MoinMoin on any WSGI compliant server, with no additional glue code.
+Unlike the pre-WSGI versions, this could include WSGI servers that the
+authors of MoinMoin know nothing about.
+
+
+Model-View-Controller
+=====================
+
+The term *MVC* is often encountered in statements such as "framework *foo*
+supports MVC". MVC is more about the overall organization of code, rather than
+any particular API. Many web frameworks use this model to help the developer
+bring structure to their program. Bigger web applications can have lots of
+code, so it is a good idea to have an effective structure right from the beginning.
+That way, even users of other frameworks (or even other languages, since MVC is
+not Python-specific) can easily understand the code, given that they are
+already familiar with the MVC structure.
+
+MVC stands for three components:
+
+* The *model*. This is the data that will be displayed and modified. In
+ Python frameworks, this component is often represented by the classes used by
+ an object-relational mapper.
+
+* The *view*. This component's job is to display the data of the model to the
+ user. Typically this component is implemented via templates.
+
+* The *controller*. This is the layer between the user and the model. The
+  controller reacts to user actions (like opening some specific URL), tells
+  the model to modify the data if necessary, and tells the view code what to
+  display.
+
+While one might think that MVC is a complex design pattern, in fact it is not.
+It is used in Python because it has turned out to be useful for creating clean,
+maintainable web sites.
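The three components can be sketched in a toy example (all names here are invented for illustration): the model holds the data, the view renders it, and the controller mediates between user action and the other two.

```python
# A toy MVC sketch; names are our own, invented for illustration.
class GuestbookModel(object):
    """The model: the data that will be displayed and modified."""
    def __init__(self):
        self.entries = []

    def add(self, text):
        self.entries.append(text)

def render_view(entries):
    """The view: turn model data into HTML (a stand-in for a template)."""
    return '<ul>%s</ul>' % ''.join('<li>%s</li>' % e for e in entries)

def sign_controller(model, text):
    """The controller: react to a user action ("sign the guestbook"),
    tell the model to change, and tell the view what to display."""
    model.add(text)
    return render_view(model.entries)
```

Even in this tiny form, the template could be swapped out without touching the model, and the storage could change without touching the view; that separation is the whole point of the pattern.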
+
+.. note::
+
+ While not all Python frameworks explicitly support MVC, it is often trivial
+ to create a web site which uses the MVC pattern by separating the data logic
+ (the model) from the user interaction logic (the controller) and the
+ templates (the view). That's why it is important not to write unnecessary
+ Python code in the templates -- it works against the MVC model and creates
+ chaos in the code base, making it harder to understand and modify.
+
+.. seealso::
+
+ The English Wikipedia has an article about the `Model-View-Controller pattern
+ <https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller>`_. It includes a long
+ list of web frameworks for various programming languages.
+
+
+Ingredients for Websites
+========================
+
+Websites are complex constructs, so tools have been created to help web
+developers make their code easier to write and more maintainable. Tools like
+these exist for all web frameworks in all languages. Developers are not forced
+to use these tools, and often there is no "best" tool. It is worth learning
+about the available tools because they can greatly simplify the process of
+developing a web site.
+
+
+.. seealso::
+
+ There are far more components than can be presented here. The Python wiki
+ has a page about these components, called
+ `Web Components <https://wiki.python.org/moin/WebComponents>`_.
+
+
+Templates
+---------
+
+Mixing of HTML and Python code is made possible by a few libraries. While
+convenient at first, it leads to horribly unmaintainable code. That's why
+templates exist. Templates are, in the simplest case, just HTML files with
+placeholders. The HTML is sent to the user's browser after filling in the
+placeholders.
+
+Python already includes two ways to build simple templates::
+
+ >>> template = "<html><body><h1>Hello %s!</h1></body></html>"
+ >>> print template % "Reader"
+ <html><body><h1>Hello Reader!</h1></body></html>
+
+ >>> from string import Template
+ >>> template = Template("<html><body><h1>Hello ${name}</h1></body></html>")
+ >>> print template.substitute(dict(name='Dinsdale'))
+ <html><body><h1>Hello Dinsdale!</h1></body></html>
+
+To generate complex HTML based on non-trivial model data, conditional
+and looping constructs like Python's *for* and *if* are generally needed.
+*Template engines* support templates of this complexity.
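The limitation is easy to demonstrate with the standard library alone. In this sketch (the templates are invented for illustration), :class:`string.Template` cannot repeat a table row per item, so the loop has to live in Python code that fills a row template once per item; template engines move that loop into the template itself.

```python
# A sketch of why engines add loops: string.Template has no looping
# construct, so the repetition lives in Python code instead.
from string import Template

row = Template("<tr><td>$name</td><td>$price</td></tr>")
page = Template("<table>$rows</table>")

def render(products):
    # products: a list of dicts with 'name' and 'price' keys
    rows = ''.join(row.substitute(p) for p in products)
    return page.substitute(rows=rows)
```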
+
+There are a lot of template engines available for Python which can be used with
+or without a `framework`_. Some of these define a plain-text programming
+language which is easy to learn, partly because it is limited in scope.
+Others use XML, and the template output is guaranteed to always be valid
+XML. There are many other variations.
+
+Some `frameworks`_ ship their own template engine or recommend one in
+particular. In the absence of a reason to use a different template engine,
+using the one provided by or recommended by the framework is a good idea.
+
+Popular template engines include:
+
+ * `Mako <http://www.makotemplates.org/>`_
+ * `Genshi <http://genshi.edgewall.org/>`_
+ * `Jinja <http://jinja.pocoo.org/>`_
+
+.. seealso::
+
+ There are many template engines competing for attention, because it is
+ pretty easy to create them in Python. The page `Templating
+ <https://wiki.python.org/moin/Templating>`_ in the wiki lists a big,
+ ever-growing number of these. The three listed above are considered "second
+ generation" template engines and are a good place to start.
+
+
+Data persistence
+----------------
+
+*Data persistence*, while sounding very complicated, is just about storing data.
+This data might be the text of blog entries, the postings on a bulletin board or
+the text of a wiki page. There are, of course, a number of different ways to store
+information on a web server.
+
+Often, relational database engines like `MySQL <http://www.mysql.com/>`_ or
+`PostgreSQL <http://www.postgresql.org/>`_ are used because of their good
+performance when handling very large databases consisting of millions of
+entries. There is also a small database engine called `SQLite
+<http://www.sqlite.org/>`_, which is bundled with Python in the :mod:`sqlite3`
+module, and which uses only one file. It has no other dependencies. For
+smaller sites SQLite is just enough.
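Getting started with :mod:`sqlite3` takes only a few lines. This is a minimal sketch (the table and data are invented for illustration) that uses an in-memory database so nothing touches the file system; a real site would pass a file path instead of ``:memory:``.

```python
# A minimal sqlite3 sketch; table name and data invented for illustration.
import sqlite3

conn = sqlite3.connect(':memory:')   # a real site would use a file path
conn.execute('CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)')
# Use parameter placeholders rather than string formatting, to avoid
# SQL injection from user-supplied values.
conn.execute('INSERT INTO posts (title) VALUES (?)', ('Hello web',))
conn.commit()
titles = [r[0] for r in conn.execute('SELECT title FROM posts ORDER BY id')]
```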
+
+Relational databases are *queried* using a language called `SQL
+<https://en.wikipedia.org/wiki/SQL>`_. Python programmers in general do not
+like SQL too much, as they prefer to work with objects. It is possible to save
+Python objects into a database using a technology called `ORM
+<https://en.wikipedia.org/wiki/Object-relational_mapping>`_ (Object Relational
+Mapping). ORM translates all object-oriented access into SQL code under the
+hood, so the developer does not need to think about it. Most `frameworks`_ use
+ORMs, and it works quite well.
+
+A second possibility is storing data in normal, plain text files (sometimes
+called "flat files"). This is very easy for simple sites,
+but can be difficult to get right if the web site is performing many
+updates to the stored data.
+
+A third possibility is object-oriented databases (also called "object
+databases"). These databases store the object data in a form that closely
+parallels the way the objects are structured in memory during program
+execution. (By contrast, ORMs store the object data as rows of data in tables
+and relations between those rows.) Storing the objects directly has the
+advantage that nearly all objects can be saved in a straightforward way, unlike
+in relational databases where some objects are very hard to represent.
+
+`Frameworks`_ often give hints on which data storage method to choose. It is
+usually a good idea to stick to the data store recommended by the framework
+unless the application has special requirements better satisfied by an
+alternate storage mechanism.
+
+.. seealso::
+
+ * `Persistence Tools <https://wiki.python.org/moin/PersistenceTools>`_ lists
+ possibilities on how to save data in the file system. Some of these
+ modules are part of the standard library
+
+ * `Database Programming <https://wiki.python.org/moin/DatabaseProgramming>`_
+ helps with choosing a method for saving data
+
+ * `SQLAlchemy <http://www.sqlalchemy.org/>`_, the most powerful OR-Mapper
+ for Python, and `Elixir <https://pypi.org/project/Elixir>`_, which makes
+ SQLAlchemy easier to use
+
+ * `SQLObject <http://www.sqlobject.org/>`_, another popular OR-Mapper
+
+ * `ZODB <https://launchpad.net/zodb>`_ and `Durus
+ <https://www.mems-exchange.org/software/>`_, two object oriented
+ databases
+
+
+.. _framework:
+
+Frameworks
+==========
+
+The process of creating code to run web sites involves writing code to provide
+various services. The code to provide a particular service often works the
+same way regardless of the complexity or purpose of the web site in question.
+Abstracting these common solutions into reusable code produces what are called
+"frameworks" for web development. Perhaps the most well-known framework for
+web development is Ruby on Rails, but Python has its own frameworks. Some of
+these were partly inspired by Rails, or borrowed ideas from Rails, but many
+existed a long time before Rails.
+
+Originally Python web frameworks tended to incorporate all of the services
+needed to develop web sites as a giant, integrated set of tools. No two web
+frameworks were interoperable: a program developed for one could not be
+deployed on a different one without considerable re-engineering work. This led
+to the development of "minimalist" web frameworks that provided just the tools
+to communicate between the Python code and the HTTP protocol, with all other
+services to be added on top via separate components. Some ad hoc standards
+were developed that allowed for limited interoperability between frameworks,
+such as a standard that allowed different template engines to be used
+interchangeably.
+
+Since the advent of WSGI, the Python web framework world has been evolving
+toward interoperability based on the WSGI standard. Now many web frameworks,
+whether "full stack" (providing all the tools one needs to deploy the most
+complex web sites) or minimalist, or anything in between, are built from
+collections of reusable components that can be used with more than one
+framework.
+
+The majority of users will probably want to select a "full stack" framework
+that has an active community. These frameworks tend to be well documented,
+and provide the easiest path to producing a fully functional web site in
+minimal time.
+
+
+Some notable frameworks
+-----------------------
+
+There are an incredible number of frameworks, so they cannot all be covered
+here. Instead we will briefly touch on some of the most popular.
+
+
+Django
+^^^^^^
+
+`Django <https://www.djangoproject.com/>`_ is a framework consisting of several
+tightly coupled elements which were written from scratch and work together very
+well. It includes an ORM which is quite powerful while being simple to use,
+and has a great online administration interface which makes it possible to edit
+the data in the database with a browser. The template engine is text-based and
+is designed to be usable for page designers who cannot write Python. It
+supports template inheritance and filters (which work like Unix pipes). Django
+has many handy features bundled, such as creation of RSS feeds or generic views,
+which make it possible to create web sites almost without writing any Python code.
+
+It has a big, international community, the members of which have created many
+web sites. There are also a lot of add-on projects which extend Django's normal
+functionality. This is partly due to Django's well written `online
+documentation <https://docs.djangoproject.com/>`_ and the `Django book
+<http://www.djangobook.com/>`_.
+
+
+.. note::
+
+ Although Django is an MVC-style framework, it names the elements
+ differently, which is described in the `Django FAQ
+ <https://docs.djangoproject.com/en/dev/faq/general/#django-appears-to-be-a-mvc-framework-but-you-call-the-controller-the-view-and-the-view-the-template-how-come-you-don-t-use-the-standard-names>`_.
+
+
+TurboGears
+^^^^^^^^^^
+
+Another popular web framework for Python is `TurboGears
+<http://www.turbogears.org/>`_. TurboGears takes the approach of combining
+already existing components with glue code to create a seamless experience.
+TurboGears gives the user flexibility in choosing components: for example, the
+default ORM and template engine can be swapped out for different packages.
+
+The `TurboGears documentation <https://turbogears.readthedocs.org/>`_ includes
+links to screencasts. TurboGears also has an active user community which can
+answer most related questions. A `TurboGears book
+<http://turbogears.org/1.0/docs/TGBooks.html>`_ has also been published and is
+a good starting point.
+
+The newest version of TurboGears, version 2.0, moves even further in the
+direction of WSGI support and a component-based architecture. TurboGears 2 is
+based on the WSGI stack of another popular component-based web framework,
+`Pylons <http://www.pylonsproject.org/>`_.
+
+
+Zope
+^^^^
+
+The Zope framework is one of the "old original" frameworks. Its current
+incarnation, Zope 2, is a tightly integrated full-stack framework. One of its
+most interesting features is its tight integration with a powerful object
+database called the `ZODB <https://launchpad.net/zodb>`_ (Zope Object Database).
+Because of its highly integrated nature, Zope wound up in a somewhat isolated
+ecosystem: code written for Zope wasn't very usable outside of Zope, and
+vice-versa. To solve this problem, the Zope 3 effort was started. Zope 3
+re-engineers Zope as a set of more cleanly isolated components. This effort
+was started before the advent of the WSGI standard, but there is WSGI support
+for Zope 3 from the `Repoze <http://repoze.org/>`_ project. Zope components
+have many years of production use behind them, and the Zope 3 project makes
+these components available to the wider Python community. There is even a
+separate framework based on the Zope components: `Grok
+<http://grok.zope.org/>`_.
+
+Zope is also the infrastructure used by the `Plone <https://plone.org/>`_ content
+management system, one of the most powerful and popular content management
+systems available.
+
+
+Other notable frameworks
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+Of course these are not the only frameworks available; many others are worth
+mentioning.
+
+Another framework that's already been mentioned is `Pylons`_. Pylons is much
+like TurboGears, but with an even stronger emphasis on flexibility, which comes
+at the cost of being more difficult to use. Nearly every component can be
+exchanged, which means users must consult the documentation of each individual
+component, of which there are many. Pylons builds upon `Paste
+<http://pythonpaste.org/>`_, an extensive set of tools which are handy for WSGI.
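Many of these reusable WSGI components take the form of middleware: a wrapper that is itself a WSGI application and delegates to another one. A small sketch of the idea; the class and header names here are made up for illustration:

```python
# Sketch of WSGI middleware: wraps any WSGI application and appends
# one header to every response.  Names are illustrative only.
class AddHeaderMiddleware:
    def __init__(self, app, name, value):
        self.app = app
        self.header = (name, value)

    def __call__(self, environ, start_response):
        def wrapped_start_response(status, headers):
            # Pass the response through, with one extra header.
            start_response(status, headers + [self.header])
        return self.app(environ, wrapped_start_response)

def hello_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello\n']

app = AddHeaderMiddleware(hello_app, 'X-Powered-By', 'WSGI')

# Drive the wrapped app the way a server would:
seen = {}
def start_response(status, headers):
    seen['status'], seen['headers'] = status, headers
body = b''.join(app({}, start_response))
print(seen['headers'])
# [('Content-Type', 'text/plain'), ('X-Powered-By', 'WSGI')]
```

Because the wrapper obeys the same calling convention as the application it wraps, middleware from one project can be stacked around an application from another, which is exactly the kind of reuse these toolkits aim for.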
+
+And that's still not everything. The most up-to-date information can always be
+found in the Python wiki.
+
+.. seealso::
+
+ The Python wiki contains an extensive list of `web frameworks
+ <https://wiki.python.org/moin/WebFrameworks>`_.
+
+   Most frameworks also have their own mailing lists and IRC channels; look
+   for these on the projects' web sites.