Diffstat (limited to 'Doc/howto')
-rw-r--r-- Doc/howto/doanddont.rst  | 15
-rw-r--r-- Doc/howto/functional.rst | 28
-rw-r--r-- Doc/howto/regex.rst      | 56
-rw-r--r-- Doc/howto/unicode.rst    | 68
-rw-r--r-- Doc/howto/urllib2.rst    | 24
5 files changed, 99 insertions, 92 deletions
diff --git a/Doc/howto/doanddont.rst b/Doc/howto/doanddont.rst
index a322c53..07652bc 100644
--- a/Doc/howto/doanddont.rst
+++ b/Doc/howto/doanddont.rst
@@ -59,7 +59,7 @@ its least useful properties.
Remember, you can never know for sure what names a module exports, so either
take what you need --- ``from module import name1, name2``, or keep them in the
-module and access on a per-need basis --- ``import module;print module.name``.
+module and access on a per-need basis --- ``import module; print(module.name)``.
When It Is Just Fine
@@ -181,7 +181,7 @@ The following is a very popular anti-idiom ::
def get_status(file):
if not os.path.exists(file):
- print "file not found"
+ print("file not found")
sys.exit(1)
return open(file).readline()
@@ -199,7 +199,7 @@ Here is a better way to do it. ::
try:
return open(file).readline()
except (IOError, OSError):
- print "file not found"
+ print("file not found")
sys.exit(1)
In this version, \*either\* the file gets opened and the line is read (so it
@@ -264,12 +264,13 @@ More useful functions in :mod:`os.path`: :func:`basename`, :func:`dirname` and
There are also many useful builtin functions people seem not to be aware of for
some reason: :func:`min` and :func:`max` can find the minimum/maximum of any
sequence with comparable semantics, for example, yet many people write their own
-:func:`max`/:func:`min`. Another highly useful function is :func:`reduce`. A
-classical use of :func:`reduce` is something like ::
+:func:`max`/:func:`min`. Another highly useful function is
+:func:`functools.reduce`. A classical use of :func:`reduce` is something like
+::
- import sys, operator
+ import sys, operator, functools
nums = map(float, sys.argv[1:])
- print reduce(operator.add, nums)/len(nums)
+ print(functools.reduce(operator.add, nums) / len(nums))
This cute little script prints the average of all numbers given on the command
line. The :func:`reduce` adds up all the numbers, and the rest is just some
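Review note on the last hunk of doanddont.rst: under Python 3, ``map`` returns a lazy iterator, so the ``len(nums)`` call in the updated example would likely raise ``TypeError``. A minimal sketch of a variant that should run, assuming the values are materialized into a list first (the ``average`` helper name is hypothetical and not part of this patch):

```python
import functools
import operator


def average(args):
    """Return the arithmetic mean of the numeric strings in args."""
    # list() is needed in Python 3, where map() returns an iterator
    # that does not support len().
    nums = list(map(float, args))
    return functools.reduce(operator.add, nums) / len(nums)


# Stand-in for sys.argv[1:] so the sketch runs without command-line input.
print(average(["1", "2", "3"]))
```

In a real script, ``sys.argv[1:]`` would be passed in place of the hard-coded list.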
diff --git a/Doc/howto/functional.rst b/Doc/howto/functional.rst
index bc12793..280749c 100644
--- a/Doc/howto/functional.rst
+++ b/Doc/howto/functional.rst
@@ -201,7 +201,7 @@ You can experiment with the iteration interface manually::
>>> L = [1,2,3]
>>> it = iter(L)
- >>> print it
+ >>> it
<iterator object at 0x8116870>
>>> it.next()
1
@@ -221,10 +221,10 @@ be an iterator or some object for which ``iter()`` can create an iterator.
These two statements are equivalent::
for i in iter(obj):
- print i
+ print(i)
for i in obj:
- print i
+ print(i)
Iterators can be materialized as lists or tuples by using the :func:`list` or
:func:`tuple` constructor functions::
@@ -274,7 +274,7 @@ dictionary's keys::
>>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
... 'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
>>> for key in m:
- ... print key, m[key]
+ ... print(key, m[key])
Mar 3
Feb 2
Aug 8
@@ -316,7 +316,7 @@ elements::
S = set((2, 3, 5, 7, 11, 13))
for i in S:
- print i
+ print(i)
@@ -568,18 +568,18 @@ the internal counter.
And here's an example of changing the counter:
>>> it = counter(10)
- >>> print it.next()
+ >>> it.next()
0
- >>> print it.next()
+ >>> it.next()
1
- >>> print it.send(8)
+ >>> it.send(8)
8
- >>> print it.next()
+ >>> it.next()
9
- >>> print it.next()
+ >>> it.next()
Traceback (most recent call last):
File ``t.py'', line 15, in ?
- print it.next()
+ it.next()
StopIteration
Because ``yield`` will often be returning ``None``, you should always check for
@@ -721,7 +721,7 @@ indexes at which certain conditions are met::
f = open('data.txt', 'r')
for i, line in enumerate(f):
if line.strip() == '':
- print 'Blank line at line #%i' % i
+ print('Blank line at line #%i' % i)
``sorted(iterable, [cmp=None], [key=None], [reverse=False)`` collects all the
elements of the iterable into a list, sorts the list, and returns the sorted
@@ -1100,7 +1100,7 @@ Here's a small but realistic example::
def log (message, subsystem):
"Write the contents of 'message' to the specified subsystem."
- print '%s: %s' % (subsystem, message)
+ print('%s: %s' % (subsystem, message))
...
server_log = functools.partial(log, subsystem='server')
@@ -1395,6 +1395,6 @@ features in Python 2.5.
for elem in slice[:-1]:
sys.stdout.write(str(elem))
sys.stdout.write(', ')
- print elem[-1]
+ print(elem[-1])
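Review note on the ``counter`` hunk in functional.rst: the patch keeps the ``it.next()`` method calls, while Python 3 spells this as the builtin ``next(it)``. The sketch below shows the same session in that form. The body of ``counter`` is not shown in the hunk, so the definition here is an assumption, reconstructed to match the outputs shown in the hunk:

```python
def counter(maximum):
    """Yield 0, 1, 2, ... below maximum; a value passed to send() resets the count."""
    i = 0
    while i < maximum:
        val = (yield i)
        if val is not None:
            # The caller supplied a new counter value via send().
            i = val
        else:
            i += 1


it = counter(10)
results = [next(it), next(it), it.send(8), next(it)]
print(results)

try:
    next(it)
    exhausted = False
except StopIteration:
    # The counter reached its maximum, so the generator is finished.
    exhausted = True
print(exhausted)
```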
diff --git a/Doc/howto/regex.rst b/Doc/howto/regex.rst
index b200764..1f26687 100644
--- a/Doc/howto/regex.rst
+++ b/Doc/howto/regex.rst
@@ -1,5 +1,5 @@
****************************
- Regular Expression HOWTO
+ Regular Expression HOWTO
****************************
:Author: A.M. Kuchling
@@ -263,7 +263,7 @@ performing string substitutions. ::
>>> import re
>>> p = re.compile('ab*')
- >>> print p
+ >>> p
<re.RegexObject instance at 80b4150>
:func:`re.compile` also accepts an optional *flags* argument, used to enable
@@ -387,7 +387,7 @@ interpreter to print no output. You can explicitly print the result of
:meth:`match` to make this clear. ::
>>> p.match("")
- >>> print p.match("")
+ >>> print(p.match(""))
None
Now, let's try it on a string that it should match, such as ``tempo``. In this
@@ -395,7 +395,7 @@ case, :meth:`match` will return a :class:`MatchObject`, so you should store the
result in a variable for later use. ::
>>> m = p.match('tempo')
- >>> print m
+ >>> m
<_sre.SRE_Match object at 80c4f68>
Now you can query the :class:`MatchObject` for information about the matching
@@ -432,9 +432,9 @@ will always be zero. However, the :meth:`search` method of :class:`RegexObject`
instances scans through the string, so the match may not start at zero in that
case. ::
- >>> print p.match('::: message')
+ >>> print(p.match('::: message'))
None
- >>> m = p.search('::: message') ; print m
+ >>> m = p.search('::: message') ; print(m)
<re.MatchObject instance at 80c9650>
>>> m.group()
'message'
@@ -447,9 +447,9 @@ in a variable, and then check if it was ``None``. This usually looks like::
p = re.compile( ... )
m = p.match( 'string goes here' )
if m:
- print 'Match found: ', m.group()
+ print('Match found: ', m.group())
else:
- print 'No match'
+ print('No match')
Two :class:`RegexObject` methods return all of the matches for a pattern.
:meth:`findall` returns a list of matching strings::
@@ -466,7 +466,7 @@ instances as an iterator. [#]_ ::
>>> iterator
<callable-iterator object at 0x401833ac>
>>> for match in iterator:
- ... print match.span()
+ ... print(match.span())
...
(0, 2)
(22, 24)
@@ -483,7 +483,7 @@ take the same arguments as the corresponding :class:`RegexObject` method, with
the RE string added as the first argument, and still return either ``None`` or a
:class:`MatchObject` instance. ::
- >>> print re.match(r'From\s+', 'Fromage amk')
+ >>> print(re.match(r'From\s+', 'Fromage amk'))
None
>>> re.match(r'From\s+', 'From amk Thu May 14 19:12:10 1998')
<re.MatchObject instance at 80c5978>
@@ -674,9 +674,9 @@ given location, they can obviously be matched an infinite number of times.
For example, if you wish to match the word ``From`` only at the beginning of a
line, the RE to use is ``^From``. ::
- >>> print re.search('^From', 'From Here to Eternity')
+ >>> print(re.search('^From', 'From Here to Eternity'))
<re.MatchObject instance at 80c1520>
- >>> print re.search('^From', 'Reciting From Memory')
+ >>> print(re.search('^From', 'Reciting From Memory'))
None
.. % To match a literal \character{\^}, use \regexp{\e\^} or enclose it
@@ -686,11 +686,11 @@ given location, they can obviously be matched an infinite number of times.
Matches at the end of a line, which is defined as either the end of the string,
or any location followed by a newline character. ::
- >>> print re.search('}$', '{block}')
+ >>> print(re.search('}$', '{block}'))
<re.MatchObject instance at 80adfa8>
- >>> print re.search('}$', '{block} ')
+ >>> print(re.search('}$', '{block} '))
None
- >>> print re.search('}$', '{block}\n')
+ >>> print(re.search('}$', '{block}\n'))
<re.MatchObject instance at 80adfa8>
To match a literal ``'$'``, use ``\$`` or enclose it inside a character class,
@@ -717,11 +717,11 @@ given location, they can obviously be matched an infinite number of times.
match when it's contained inside another word. ::
>>> p = re.compile(r'\bclass\b')
- >>> print p.search('no class at all')
+ >>> print(p.search('no class at all'))
<re.MatchObject instance at 80c8f28>
- >>> print p.search('the declassified algorithm')
+ >>> print(p.search('the declassified algorithm'))
None
- >>> print p.search('one subclass is')
+ >>> print(p.search('one subclass is'))
None
There are two subtleties you should remember when using this special sequence.
@@ -733,9 +733,9 @@ given location, they can obviously be matched an infinite number of times.
in front of the RE string. ::
>>> p = re.compile('\bclass\b')
- >>> print p.search('no class at all')
+ >>> print(p.search('no class at all'))
None
- >>> print p.search('\b' + 'class' + '\b')
+ >>> print(p.search('\b' + 'class' + '\b') )
<re.MatchObject instance at 80c3ee0>
Second, inside a character class, where there's no use for this assertion,
@@ -773,7 +773,7 @@ of a group with a repeating qualifier, such as ``*``, ``+``, ``?``, or
``ab``. ::
>>> p = re.compile('(ab)*')
- >>> print p.match('ababababab').span()
+ >>> print(p.match('ababababab').span())
(0, 10)
Groups indicated with ``'('``, ``')'`` also capture the starting and ending
@@ -1247,17 +1247,17 @@ It's important to keep this distinction in mind. Remember, :func:`match` will
only report a successful match which will start at 0; if the match wouldn't
start at zero, :func:`match` will *not* report it. ::
- >>> print re.match('super', 'superstition').span()
+ >>> print(re.match('super', 'superstition').span())
(0, 5)
- >>> print re.match('super', 'insuperable')
+ >>> print(re.match('super', 'insuperable'))
None
On the other hand, :func:`search` will scan forward through the string,
reporting the first match it finds. ::
- >>> print re.search('super', 'superstition').span()
+ >>> print(re.search('super', 'superstition').span())
(0, 5)
- >>> print re.search('super', 'insuperable').span()
+ >>> print(re.search('super', 'insuperable').span())
(2, 7)
Sometimes you'll be tempted to keep using :func:`re.match`, and just add ``.*``
@@ -1286,9 +1286,9 @@ doesn't work because of the greedy nature of ``.*``. ::
>>> s = '<html><head><title>Title</title>'
>>> len(s)
32
- >>> print re.match('<.*>', s).span()
+ >>> print(re.match('<.*>', s).span())
(0, 32)
- >>> print re.match('<.*>', s).group()
+ >>> print(re.match('<.*>', s).group())
<html><head><title>Title</title>
The RE matches the ``'<'`` in ``<html>``, and the ``.*`` consumes the rest of
@@ -1304,7 +1304,7 @@ example, the ``'>'`` is tried immediately after the first ``'<'`` matches, and
when it fails, the engine advances a character at a time, retrying the ``'>'``
at every step. This produces just the right result::
- >>> print re.match('<.*?>', s).group()
+ >>> print(re.match('<.*?>', s).group())
<html>
(Note that parsing HTML or XML with regular expressions is painful.
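Review note on the greedy-matching hunks in regex.rst: the two updated doctests can be checked side by side with a short script. This is only a sketch that reuses the string and patterns from the hunks above; the variable names ``greedy`` and ``lazy`` are illustrative:

```python
import re

s = '<html><head><title>Title</title>'

# Greedy: .* consumes as much as possible, so the match runs to the last '>'.
greedy = re.match('<.*>', s)

# Non-greedy: .*? consumes as little as possible, stopping at the first '>'.
lazy = re.match('<.*?>', s)

print(greedy.span())
print(lazy.group())
```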
diff --git a/Doc/howto/unicode.rst b/Doc/howto/unicode.rst
index 16bd5a8..8b52039 100644
--- a/Doc/howto/unicode.rst
+++ b/Doc/howto/unicode.rst
@@ -7,6 +7,12 @@
This HOWTO discusses Python's support for Unicode, and explains various problems
that people commonly encounter when trying to work with Unicode.
+.. XXX fix it
+.. warning::
+
+ This HOWTO has not yet been updated for Python 3000's string object changes.
+
+
Introduction to Unicode
=======================
@@ -122,8 +128,8 @@ The first encoding you might think of is an array of 32-bit integers. In this
representation, the string "Python" would look like this::
P y t h o n
- 0x50 00 00 00 79 00 00 00 74 00 00 00 68 00 00 00 6f 00 00 00 6e 00 00 00
- 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
+ 0x50 00 00 00 79 00 00 00 74 00 00 00 68 00 00 00 6f 00 00 00 6e 00 00 00
+ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
This representation is straightforward but using it presents a number of
problems.
@@ -181,7 +187,7 @@ UTF-8.) UTF-8 uses the following rules:
between 128 and 255.
3. Code points >0x7ff are turned into three- or four-byte sequences, where each
byte of the sequence is between 128 and 255.
-
+
UTF-8 has several convenient properties:
1. It can handle any Unicode code point.
@@ -256,7 +262,7 @@ characters greater than 127 will be treated as errors::
>>> unicode('abcdef' + chr(255))
Traceback (most recent call last):
File "<stdin>", line 1, in ?
- UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 6:
+ UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 6:
ordinal not in range(128)
The ``errors`` argument specifies the response when the input string can't be
@@ -268,7 +274,7 @@ Unicode result). The following examples show the differences::
>>> unicode('\x80abc', errors='strict')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
- UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0:
+ UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0:
ordinal not in range(128)
>>> unicode('\x80abc', errors='replace')
u'\ufffdabc'
@@ -354,7 +360,7 @@ interprets the string using the given encoding::
>>> u2 = utf8_version.decode('utf-8') # Decode using UTF-8
>>> u == u2 # The two strings match
True
-
+
The low-level routines for registering and accessing the available encodings are
found in the :mod:`codecs` module. However, the encoding and decoding functions
returned by this module are usually more low-level than is comfortable, so I'm
@@ -366,8 +372,8 @@ covered here. Consult the Python documentation to learn more about this module.
The most commonly used part of the :mod:`codecs` module is the
:func:`codecs.open` function which will be discussed in the section on input and
output.
-
-
+
+
Unicode Literals in Python Source Code
--------------------------------------
@@ -385,10 +391,10 @@ arbitrary code point. Octal escapes can go up to U+01ff, which is octal 777.
>>> s = u"a\xac\u1234\u20ac\U00008000"
^^^^ two-digit hex escape
- ^^^^^^ four-digit Unicode escape
+ ^^^^^^ four-digit Unicode escape
^^^^^^^^^^ eight-digit Unicode escape
- >>> for c in s: print ord(c),
- ...
+ >>> for c in s: print(ord(c), end=" ")
+ ...
97 172 4660 8364 32768
Using escape sequences for code points greater than 127 is fine in small doses,
@@ -408,10 +414,10 @@ either the first or second line of the source file::
#!/usr/bin/env python
# -*- coding: latin-1 -*-
-
+
u = u'abcdé'
- print ord(u[-1])
-
+ print(ord(u[-1]))
+
The syntax is inspired by Emacs's notation for specifying variables local to a
file. Emacs supports many different variables, but Python only supports
'coding'. The ``-*-`` symbols indicate that the comment is special; within
@@ -426,15 +432,15 @@ encoding declaration::
#!/usr/bin/env python
u = u'abcdé'
- print ord(u[-1])
+ print(ord(u[-1]))
When you run it with Python 2.4, it will output the following warning::
amk:~$ python p263.py
- sys:1: DeprecationWarning: Non-ASCII character '\xe9'
- in file p263.py on line 2, but no encoding declared;
+ sys:1: DeprecationWarning: Non-ASCII character '\xe9'
+ in file p263.py on line 2, but no encoding declared;
see http://www.python.org/peps/pep-0263.html for details
-
+
Unicode Properties
------------------
@@ -450,15 +456,15 @@ The following program displays some information about several characters, and
prints the numeric value of one particular character::
import unicodedata
-
+
u = unichr(233) + unichr(0x0bf2) + unichr(3972) + unichr(6000) + unichr(13231)
-
+
for i, c in enumerate(u):
- print i, '%04x' % ord(c), unicodedata.category(c),
- print unicodedata.name(c)
-
+ print(i, '%04x' % ord(c), unicodedata.category(c), end=" ")
+ print(unicodedata.name(c))
+
# Get numeric value of second character
- print unicodedata.numeric(u[1])
+ print(unicodedata.numeric(u[1]))
When run, this prints::
@@ -545,7 +551,7 @@ Reading Unicode from a file is therefore simple::
import codecs
f = codecs.open('unicode.rst', encoding='utf-8')
for line in f:
- print repr(line)
+ print(repr(line))
It's also possible to open files in update mode, allowing both reading and
writing::
@@ -553,7 +559,7 @@ writing::
f = codecs.open('test', encoding='utf-8', mode='w+')
f.write(u'\u4500 blah blah blah\n')
f.seek(0)
- print repr(f.readline()[:1])
+ print(repr(f.readline()[:1]))
f.close()
Unicode character U+FEFF is used as a byte-order mark (BOM), and is often
@@ -606,8 +612,8 @@ default filesystem encoding is UTF-8, running the following program::
f.close()
import os
- print os.listdir('.')
- print os.listdir(u'.')
+ print(os.listdir('.'))
+ print(os.listdir(u'.'))
will produce the following output::
@@ -619,7 +625,7 @@ The first list contains UTF-8-encoded filenames, and the second list contains
the Unicode versions.
-
+
Tips for Writing Unicode-aware Programs
---------------------------------------
@@ -665,7 +671,7 @@ this code::
unicode_name = filename.decode(encoding)
f = open(unicode_name, 'r')
# ... return contents of file ...
-
+
However, if an attacker could specify the ``'base64'`` encoding, they could pass
``'L2V0Yy9wYXNzd2Q='``, which is the base-64 encoded form of the string
``'/etc/passwd'``, to read a system file. The above code looks for ``'/'``
@@ -701,7 +707,7 @@ Version 1.02: posted August 16 2005. Corrects factual errors.
.. comment Describe obscure -U switch somewhere?
.. comment Describe use of codecs.StreamRecoder and StreamReaderWriter
-.. comment
+.. comment
Original outline:
- [ ] Unicode introduction
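Review note on the escape-sequence hunk in unicode.rst: the new ``print(ord(c), end=" ")`` form can be tried on its own. The sketch below assumes a Python 3 interpreter, where the ``u`` prefix is optional and plain string literals are already Unicode; the ``code_points`` name is illustrative:

```python
# Two-digit hex, four-digit and eight-digit Unicode escapes in one literal.
s = "a\xac\u1234\u20ac\U00008000"

code_points = [ord(c) for c in s]

# end=" " keeps all values on one line, like the trailing comma did in Python 2.
for value in code_points:
    print(value, end=" ")
print()
```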
diff --git a/Doc/howto/urllib2.rst b/Doc/howto/urllib2.rst
index dc20b02..05588b9 100644
--- a/Doc/howto/urllib2.rst
+++ b/Doc/howto/urllib2.rst
@@ -134,7 +134,7 @@ This is done as follows::
>>> data['location'] = 'Northampton'
>>> data['language'] = 'Python'
>>> url_values = urllib.urlencode(data)
- >>> print url_values
+ >>> print(url_values)
name=Somebody+Here&language=Python&location=Northampton
>>> url = 'http://www.example.com/example.cgi'
>>> full_url = url + '?' + url_values
@@ -202,7 +202,7 @@ e.g. ::
>>> req = urllib2.Request('http://www.pretend_server.org')
>>> try: urllib2.urlopen(req)
>>> except URLError, e:
- >>> print e.reason
+ >>> print(e.reason)
>>>
(4, 'getaddrinfo failed')
@@ -311,8 +311,8 @@ geturl, and info, methods. ::
>>> try:
>>> urllib2.urlopen(req)
>>> except URLError, e:
- >>> print e.code
- >>> print e.read()
+ >>> print(e.code)
+ >>> print(e.read())
>>>
404
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
@@ -339,11 +339,11 @@ Number 1
try:
response = urlopen(req)
except HTTPError, e:
- print 'The server couldn\'t fulfill the request.'
- print 'Error code: ', e.code
+ print('The server couldn\'t fulfill the request.')
+ print('Error code: ', e.code)
except URLError, e:
- print 'We failed to reach a server.'
- print 'Reason: ', e.reason
+ print('We failed to reach a server.')
+ print('Reason: ', e.reason)
else:
# everything is fine
@@ -364,11 +364,11 @@ Number 2
response = urlopen(req)
except URLError, e:
if hasattr(e, 'reason'):
- print 'We failed to reach a server.'
- print 'Reason: ', e.reason
+ print('We failed to reach a server.')
+ print('Reason: ', e.reason)
elif hasattr(e, 'code'):
- print 'The server couldn\'t fulfill the request.'
- print 'Error code: ', e.code
+ print('The server couldn\'t fulfill the request.')
+ print('Error code: ', e.code)
else:
# everything is fine
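Review note on the urllib2.rst hunks: the patch converts the ``print`` statements but leaves ``except URLError, e:`` in place, which Python 3 does not accept. A minimal sketch of the "Number 1" pattern in Python 3 syntax follows. It assumes the Python 3 module layout (``urllib.request`` and ``urllib.error`` rather than ``urllib2``), and the ``fetch`` helper with its ``opener`` parameter is hypothetical, added only so the sketch can be exercised without network access:

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch(url, opener=urlopen):
    """Return (ok, payload), where payload is a response or an error message."""
    try:
        response = opener(url)
    except HTTPError as e:
        # HTTPError must be handled first because it subclasses URLError.
        return False, "The server couldn't fulfill the request. Error code: %s" % e.code
    except URLError as e:
        return False, "We failed to reach a server. Reason: %s" % e.reason
    else:
        # Everything is fine.
        return True, response
```

Calling ``fetch('http://www.example.com/')`` would perform a real request; passing a stub as ``opener`` keeps the error paths easy to try offline.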