Diffstat (limited to 'Doc/howto')
-rw-r--r--  Doc/howto/doanddont.rst | 15
-rw-r--r--  Doc/howto/functional.rst | 28
-rw-r--r--  Doc/howto/regex.rst | 56
-rw-r--r--  Doc/howto/unicode.rst | 68
-rw-r--r--  Doc/howto/urllib2.rst | 24
5 files changed, 99 insertions, 92 deletions
diff --git a/Doc/howto/doanddont.rst b/Doc/howto/doanddont.rst
index a322c53..07652bc 100644
--- a/Doc/howto/doanddont.rst
+++ b/Doc/howto/doanddont.rst
@@ -59,7 +59,7 @@ its least useful properties.
 
 Remember, you can never know for sure what names a module exports, so either
 take what you need --- ``from module import name1, name2``, or keep them in the
-module and access on a per-need basis --- ``import module;print module.name``.
+module and access on a per-need basis --- ``import module; print(module.name)``.
 
 
 When It Is Just Fine
@@ -181,7 +181,7 @@ The following is a very popular anti-idiom ::
 
    def get_status(file):
        if not os.path.exists(file):
-           print "file not found"
+           print("file not found")
            sys.exit(1)
        return open(file).readline()
 
@@ -199,7 +199,7 @@ Here is a better way to do it. ::
        try:
            return open(file).readline()
        except (IOError, OSError):
-           print "file not found"
+           print("file not found")
            sys.exit(1)
 
 In this version, \*either\* the file gets opened and the line is read (so it
@@ -264,12 +264,13 @@ More useful functions in :mod:`os.path`: :func:`basename`, :func:`dirname` and
 There are also many useful builtin functions people seem not to be aware of for
 some reason: :func:`min` and :func:`max` can find the minimum/maximum of any
 sequence with comparable semantics, for example, yet many people write their own
-:func:`max`/:func:`min`. Another highly useful function is :func:`reduce`. A
-classical use of :func:`reduce` is something like ::
+:func:`max`/:func:`min`. Another highly useful function is
+:func:`functools.reduce`. A classical use of :func:`reduce` is something like
+::
 
-   import sys, operator
+   import sys, operator, functools
    nums = map(float, sys.argv[1:])
-   print reduce(operator.add, nums)/len(nums)
+   print(functools.reduce(operator.add, nums) / len(nums))
 
 This cute little script prints the average of all numbers given on the command
 line. The :func:`reduce` adds up all the numbers, and the rest is just some
diff --git a/Doc/howto/functional.rst b/Doc/howto/functional.rst
index bc12793..280749c 100644
--- a/Doc/howto/functional.rst
+++ b/Doc/howto/functional.rst
@@ -201,7 +201,7 @@ You can experiment with the iteration interface manually::
 
     >>> L = [1,2,3]
     >>> it = iter(L)
-    >>> print it
+    >>> it
     <iterator object at 0x8116870>
     >>> it.next()
     1
@@ -221,10 +221,10 @@ be an iterator or some object for which ``iter()`` can create an iterator.
 These two statements are equivalent::
 
     for i in iter(obj):
-        print i
+        print(i)
 
     for i in obj:
-        print i
+        print(i)
 
 Iterators can be materialized as lists or tuples by using the :func:`list` or
 :func:`tuple` constructor functions::
@@ -274,7 +274,7 @@ dictionary's keys::
     >>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
     ...      'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
     >>> for key in m:
-    ...     print key, m[key]
+    ...     print(key, m[key])
     Mar 3
     Feb 2
     Aug 8
@@ -316,7 +316,7 @@ elements::
 
     S = set((2, 3, 5, 7, 11, 13))
     for i in S:
-        print i
+        print(i)
 
 
 
@@ -568,18 +568,18 @@ the internal counter.
 And here's an example of changing the counter:
 
     >>> it = counter(10)
-    >>> print it.next()
+    >>> it.next()
     0
-    >>> print it.next()
+    >>> it.next()
     1
-    >>> print it.send(8)
+    >>> it.send(8)
     8
-    >>> print it.next()
+    >>> it.next()
     9
-    >>> print it.next()
+    >>> it.next()
     Traceback (most recent call last):
       File ``t.py'', line 15, in ?
-        print it.next()
+        it.next()
     StopIteration
 
 Because ``yield`` will often be returning ``None``, you should always check for
@@ -721,7 +721,7 @@ indexes at which certain conditions are met::
     f = open('data.txt', 'r')
     for i, line in enumerate(f):
         if line.strip() == '':
-            print 'Blank line at line #%i' % i
+            print('Blank line at line #%i' % i)
 
 ``sorted(iterable, [cmp=None], [key=None], [reverse=False)`` collects all the
 elements of the iterable into a list, sorts the list, and returns the sorted
@@ -1100,7 +1100,7 @@ Here's a small but realistic example::
 
     def log (message, subsystem):
         "Write the contents of 'message' to the specified subsystem."
-        print '%s: %s' % (subsystem, message)
+        print('%s: %s' % (subsystem, message))
         ...
 
     server_log = functools.partial(log, subsystem='server')
@@ -1395,6 +1395,6 @@ features in Python 2.5.
     for elem in slice[:-1]:
         sys.stdout.write(str(elem))
         sys.stdout.write(', ')
-    print elem[-1]
+    print(elem[-1])
 
 
diff --git a/Doc/howto/regex.rst b/Doc/howto/regex.rst
index b200764..1f26687 100644
--- a/Doc/howto/regex.rst
+++ b/Doc/howto/regex.rst
@@ -1,5 +1,5 @@
 ****************************
-  Regular Expression HOWTO
+  Regular Expression HOWTO
 ****************************
 
 :Author: A.M. Kuchling
@@ -263,7 +263,7 @@ performing string substitutions. ::
 
    >>> import re
    >>> p = re.compile('ab*')
-   >>> print p
+   >>> p
    <re.RegexObject instance at 80b4150>
 
 :func:`re.compile` also accepts an optional *flags* argument, used to enable
@@ -387,7 +387,7 @@ interpreter to print no output. You can explicitly print the result of
 :meth:`match` to make this clear. ::
 
    >>> p.match("")
-   >>> print p.match("")
+   >>> print(p.match(""))
    None
 
 Now, let's try it on a string that it should match, such as ``tempo``. In this
@@ -395,7 +395,7 @@ case, :meth:`match` will return a :class:`MatchObject`, so you should store the
 result in a variable for later use. ::
 
    >>> m = p.match('tempo')
-   >>> print m
+   >>> m
    <_sre.SRE_Match object at 80c4f68>
 
 Now you can query the :class:`MatchObject` for information about the matching
@@ -432,9 +432,9 @@ will always be zero. However, the :meth:`search` method of :class:`RegexObject`
 instances scans through the string, so the match may not start at zero in that
 case. ::
 
-   >>> print p.match('::: message')
+   >>> print(p.match('::: message'))
    None
-   >>> m = p.search('::: message') ; print m
+   >>> m = p.search('::: message') ; print(m)
    <re.MatchObject instance at 80c9650>
    >>> m.group()
    'message'
@@ -447,9 +447,9 @@ in a variable, and then check if it was ``None``. This usually looks like::
    p = re.compile( ... )
    m = p.match( 'string goes here' )
    if m:
-       print 'Match found: ', m.group()
+       print('Match found: ', m.group())
    else:
-       print 'No match'
+       print('No match')
 
 Two :class:`RegexObject` methods return all of the matches for a pattern.
 :meth:`findall` returns a list of matching strings::
@@ -466,7 +466,7 @@ instances as an iterator. [#]_ ::
    >>> iterator
    <callable-iterator object at 0x401833ac>
    >>> for match in iterator:
-   ...     print match.span()
+   ...     print(match.span())
    ...
    (0, 2)
    (22, 24)
@@ -483,7 +483,7 @@ take the same arguments as the corresponding :class:`RegexObject` method, with
 the RE string added as the first argument, and still return either ``None`` or a
 :class:`MatchObject` instance. ::
 
-   >>> print re.match(r'From\s+', 'Fromage amk')
+   >>> print(re.match(r'From\s+', 'Fromage amk'))
    None
    >>> re.match(r'From\s+', 'From amk Thu May 14 19:12:10 1998')
    <re.MatchObject instance at 80c5978>
@@ -674,9 +674,9 @@ given location, they can obviously be matched an infinite number of times.
    For example, if you wish to match the word ``From`` only at the beginning of a
    line, the RE to use is ``^From``. ::
 
-      >>> print re.search('^From', 'From Here to Eternity')
+      >>> print(re.search('^From', 'From Here to Eternity'))
       <re.MatchObject instance at 80c1520>
-      >>> print re.search('^From', 'Reciting From Memory')
+      >>> print(re.search('^From', 'Reciting From Memory'))
       None
 
    .. % To match a literal \character{\^}, use \regexp{\e\^} or enclose it
@@ -686,11 +686,11 @@ given location, they can obviously be matched an infinite number of times.
    Matches at the end of a line, which is defined as either the end of the string,
    or any location followed by a newline character. ::
 
-      >>> print re.search('}$', '{block}')
+      >>> print(re.search('}$', '{block}'))
       <re.MatchObject instance at 80adfa8>
-      >>> print re.search('}$', '{block} ')
+      >>> print(re.search('}$', '{block} '))
       None
-      >>> print re.search('}$', '{block}\n')
+      >>> print(re.search('}$', '{block}\n'))
       <re.MatchObject instance at 80adfa8>
 
    To match a literal ``'$'``, use ``\$`` or enclose it inside a character class,
@@ -717,11 +717,11 @@ given location, they can obviously be matched an infinite number of times.
    match when it's contained inside another word. ::
 
       >>> p = re.compile(r'\bclass\b')
-      >>> print p.search('no class at all')
+      >>> print(p.search('no class at all'))
       <re.MatchObject instance at 80c8f28>
-      >>> print p.search('the declassified algorithm')
+      >>> print(p.search('the declassified algorithm'))
       None
-      >>> print p.search('one subclass is')
+      >>> print(p.search('one subclass is'))
       None
 
    There are two subtleties you should remember when using this special sequence.
@@ -733,9 +733,9 @@ given location, they can obviously be matched an infinite number of times.
    in front of the RE string. ::
 
       >>> p = re.compile('\bclass\b')
-      >>> print p.search('no class at all')
+      >>> print(p.search('no class at all'))
       None
-      >>> print p.search('\b' + 'class' + '\b')
+      >>> print(p.search('\b' + 'class' + '\b') )
       <re.MatchObject instance at 80c3ee0>
 
    Second, inside a character class, where there's no use for this assertion,
@@ -773,7 +773,7 @@ of a group with a repeating qualifier, such as ``*``, ``+``, ``?``, or
 ``ab``. ::
 
    >>> p = re.compile('(ab)*')
-   >>> print p.match('ababababab').span()
+   >>> print(p.match('ababababab').span())
    (0, 10)
 
 Groups indicated with ``'('``, ``')'`` also capture the starting and ending
@@ -1247,17 +1247,17 @@ It's important to keep this distinction in mind. Remember, :func:`match` will
 only report a successful match which will start at 0; if the match wouldn't
 start at zero, :func:`match` will *not* report it. ::
 
-   >>> print re.match('super', 'superstition').span()
+   >>> print(re.match('super', 'superstition').span())
    (0, 5)
-   >>> print re.match('super', 'insuperable')
+   >>> print(re.match('super', 'insuperable'))
    None
 
 On the other hand, :func:`search` will scan forward through the string,
 reporting the first match it finds. ::
 
-   >>> print re.search('super', 'superstition').span()
+   >>> print(re.search('super', 'superstition').span())
    (0, 5)
-   >>> print re.search('super', 'insuperable').span()
+   >>> print(re.search('super', 'insuperable').span())
    (2, 7)
 
 Sometimes you'll be tempted to keep using :func:`re.match`, and just add ``.*``
@@ -1286,9 +1286,9 @@ doesn't work because of the greedy nature of ``.*``. ::
    >>> s = '<html><head><title>Title</title>'
    >>> len(s)
    32
-   >>> print re.match('<.*>', s).span()
+   >>> print(re.match('<.*>', s).span())
    (0, 32)
-   >>> print re.match('<.*>', s).group()
+   >>> print(re.match('<.*>', s).group())
    <html><head><title>Title</title>
 
 The RE matches the ``'<'`` in ``<html>``, and the ``.*`` consumes the rest of
@@ -1304,7 +1304,7 @@ example, the ``'>'`` is tried immediately after the first ``'<'`` matches, and
 when it fails, the engine advances a character at a time, retrying the ``'>'``
 at every step. This produces just the right result::
 
-   >>> print re.match('<.*?>', s).group()
+   >>> print(re.match('<.*?>', s).group())
    <html>
 
 (Note that parsing HTML or XML with regular expressions is painful.
diff --git a/Doc/howto/unicode.rst b/Doc/howto/unicode.rst
index 16bd5a8..8b52039 100644
--- a/Doc/howto/unicode.rst
+++ b/Doc/howto/unicode.rst
@@ -7,6 +7,12 @@ This HOWTO discusses Python's support for Unicode, and explains various problems
 that people commonly encounter when trying to work with Unicode.
 
+.. XXX fix it
+.. warning::
+
+   This HOWTO has not yet been updated for Python 3000's string object changes.
+
+
 
 Introduction to Unicode
 =======================
 
@@ -122,8 +128,8 @@ The first encoding you might think of is an array of 32-bit integers. In this
 representation, the string "Python" would look like this::
 
        P           y           t           h           o           n
-    0x50 00 00 00 79 00 00 00 74 00 00 00 68 00 00 00 6f 00 00 00 6e 00 00 00
-       0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
+    0x50 00 00 00 79 00 00 00 74 00 00 00 68 00 00 00 6f 00 00 00 6e 00 00 00
+       0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 
 This representation is straightforward but using it presents a number of
 problems.
@@ -181,7 +187,7 @@ UTF-8.) UTF-8 uses the following rules:
    between 128 and 255.
 3. Code points >0x7ff are turned into three- or four-byte sequences, where
    each byte of the sequence is between 128 and 255.
-
+
 UTF-8 has several convenient properties:
 
 1. It can handle any Unicode code point.
@@ -256,7 +262,7 @@ characters greater than 127 will be treated as errors::
     >>> unicode('abcdef' + chr(255))
     Traceback (most recent call last):
       File "<stdin>", line 1, in ?
-    UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 6:
+    UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 6:
                         ordinal not in range(128)
 
 The ``errors`` argument specifies the response when the input string can't be
@@ -268,7 +274,7 @@ Unicode result). The following examples show the differences::
     >>> unicode('\x80abc', errors='strict')
     Traceback (most recent call last):
       File "<stdin>", line 1, in ?
-    UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0:
+    UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0:
                         ordinal not in range(128)
     >>> unicode('\x80abc', errors='replace')
     u'\ufffdabc'
@@ -354,7 +360,7 @@ interprets the string using the given encoding::
     >>> u2 = utf8_version.decode('utf-8') # Decode using UTF-8
     >>> u == u2 # The two strings match
     True
-
+
 The low-level routines for registering and accessing the available encodings are
 found in the :mod:`codecs` module. However, the encoding and decoding functions
 returned by this module are usually more low-level than is comfortable, so I'm
@@ -366,8 +372,8 @@ covered here. Consult the Python documentation to learn more about this module.
 The most commonly used part of the :mod:`codecs` module is the
 :func:`codecs.open` function which will be discussed in the section on input and
 output.
-
-
+
+
 Unicode Literals in Python Source Code
 --------------------------------------
 
@@ -385,10 +391,10 @@ arbitrary code point. Octal escapes can go up to U+01ff, which is octal 777.
 
     >>> s = u"a\xac\u1234\u20ac\U00008000"
                ^^^^ two-digit hex escape
-                   ^^^^^^ four-digit Unicode escape
+                   ^^^^^^ four-digit Unicode escape
                                ^^^^^^^^^^ eight-digit Unicode escape
-    >>> for c in s: print ord(c),
-    ...
+    >>> for c in s: print(ord(c), end=" ")
+    ...
     97 172 4660 8364 32768
 
 Using escape sequences for code points greater than 127 is fine in small doses,
@@ -408,10 +414,10 @@ either the first or second line of the source file::
 
     #!/usr/bin/env python
     # -*- coding: latin-1 -*-
-
+
     u = u'abcdé'
-    print ord(u[-1])
-
+    print(ord(u[-1]))
+
 The syntax is inspired by Emacs's notation for specifying variables local to a
 file. Emacs supports many different variables, but Python only supports
 'coding'. The ``-*-`` symbols indicate that the comment is special; within
@@ -426,15 +432,15 @@ encoding declaration::
 
     #!/usr/bin/env python
     u = u'abcdé'
-    print ord(u[-1])
+    print(ord(u[-1]))
 
 When you run it with Python 2.4, it will output the following warning::
 
     amk:~$ python p263.py
-    sys:1: DeprecationWarning: Non-ASCII character '\xe9'
-         in file p263.py on line 2, but no encoding declared;
+    sys:1: DeprecationWarning: Non-ASCII character '\xe9'
+         in file p263.py on line 2, but no encoding declared;
          see http://www.python.org/peps/pep-0263.html for details
-
+
 
 Unicode Properties
 ------------------
@@ -450,15 +456,15 @@ The following program displays some information about several characters, and
 prints the numeric value of one particular character::
 
     import unicodedata
-
+
     u = unichr(233) + unichr(0x0bf2) + unichr(3972) + unichr(6000) + unichr(13231)
-
+
     for i, c in enumerate(u):
-        print i, '%04x' % ord(c), unicodedata.category(c),
-        print unicodedata.name(c)
-
+        print(i, '%04x' % ord(c), unicodedata.category(c), end=" ")
+        print(unicodedata.name(c))
+
     # Get numeric value of second character
-    print unicodedata.numeric(u[1])
+    print(unicodedata.numeric(u[1]))
 
 When run, this prints::
 
@@ -545,7 +551,7 @@ Reading Unicode from a file is therefore simple::
     import codecs
     f = codecs.open('unicode.rst', encoding='utf-8')
     for line in f:
-        print repr(line)
+        print(repr(line))
 
 It's also possible to open files in update mode, allowing both reading and
 writing::
@@ -553,7 +559,7 @@ writing::
     f = codecs.open('test', encoding='utf-8', mode='w+')
     f.write(u'\u4500 blah blah blah\n')
     f.seek(0)
-    print repr(f.readline()[:1])
+    print(repr(f.readline()[:1]))
     f.close()
 
 Unicode character U+FEFF is used as a byte-order mark (BOM), and is often
@@ -606,8 +612,8 @@ default filesystem encoding is UTF-8, running the following program::
     f.close()
 
     import os
-    print os.listdir('.')
-    print os.listdir(u'.')
+    print(os.listdir('.'))
+    print(os.listdir(u'.'))
 
 will produce the following output::
 
@@ -619,7 +625,7 @@ The first list contains UTF-8-encoded filenames, and the second list contains
 the Unicode versions.
 
 
-
+
 Tips for Writing Unicode-aware Programs
 ---------------------------------------
 
@@ -665,7 +671,7 @@ this code::
         unicode_name = filename.decode(encoding)
         f = open(unicode_name, 'r')
         # ... return contents of file ...
-
+
 However, if an attacker could specify the ``'base64'`` encoding, they could pass
 ``'L2V0Yy9wYXNzd2Q='``, which is the base-64 encoded form of the string
 ``'/etc/passwd'``, to read a system file. The above code looks for ``'/'``
@@ -701,7 +707,7 @@ Version 1.02: posted August 16 2005. Corrects factual errors.
 
 .. comment Describe obscure -U switch somewhere?
 .. comment Describe use of codecs.StreamRecoder and StreamReaderWriter
-.. comment
+.. comment
    Original outline:
 
    - [ ] Unicode introduction
diff --git a/Doc/howto/urllib2.rst b/Doc/howto/urllib2.rst
index dc20b02..05588b9 100644
--- a/Doc/howto/urllib2.rst
+++ b/Doc/howto/urllib2.rst
@@ -134,7 +134,7 @@ This is done as follows::
     >>> data['location'] = 'Northampton'
     >>> data['language'] = 'Python'
     >>> url_values = urllib.urlencode(data)
-    >>> print url_values
+    >>> print(url_values)
     name=Somebody+Here&language=Python&location=Northampton
     >>> url = 'http://www.example.com/example.cgi'
     >>> full_url = url + '?' + url_values
@@ -202,7 +202,7 @@ e.g. ::
     >>> req = urllib2.Request('http://www.pretend_server.org')
     >>> try: urllib2.urlopen(req)
     >>> except URLError, e:
-    >>>    print e.reason
+    >>>    print(e.reason)
     >>>
     (4, 'getaddrinfo failed')
 
@@ -311,8 +311,8 @@ geturl, and info, methods. ::
     >>> try:
     >>>     urllib2.urlopen(req)
     >>> except URLError, e:
-    >>>     print e.code
-    >>>     print e.read()
+    >>>     print(e.code)
+    >>>     print(e.read())
     >>>
     404
     <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
@@ -339,11 +339,11 @@ Number 1
     try:
         response = urlopen(req)
     except HTTPError, e:
-        print 'The server couldn\'t fulfill the request.'
-        print 'Error code: ', e.code
+        print('The server couldn\'t fulfill the request.')
+        print('Error code: ', e.code)
     except URLError, e:
-        print 'We failed to reach a server.'
-        print 'Reason: ', e.reason
+        print('We failed to reach a server.')
+        print('Reason: ', e.reason)
     else:
         # everything is fine
 
@@ -364,11 +364,11 @@ Number 2
         response = urlopen(req)
     except URLError, e:
         if hasattr(e, 'reason'):
-            print 'We failed to reach a server.'
-            print 'Reason: ', e.reason
+            print('We failed to reach a server.')
+            print('Reason: ', e.reason)
         elif hasattr(e, 'code'):
-            print 'The server couldn\'t fulfill the request.'
-            print 'Error code: ', e.code
+            print('The server couldn\'t fulfill the request.')
+            print('Error code: ', e.code)
         else:
             # everything is fine
 