Merged revisions 80346 via svnmerge from

svn+ssh://pythondev@svn.python.org/python/branches/py3k

........
  r80346 | senthil.kumaran | 2010-04-22 16:23:30 +0530 (Thu, 22 Apr 2010) | 4 lines

  Fixing a note on encoding declaration, its usage in urlopen based on review comments from RDM and Ezio.
........
author: Senthil Kumaran <orsenthil@gmail.com> 2010-04-22 10:58:56 (GMT)
committer: Senthil Kumaran <orsenthil@gmail.com> 2010-04-22 10:58:56 (GMT)
commit: d0ab48f1c4bb5cbe96a799f9271f36c51c3debd3 (patch)
tree: f368c94b0c5d92353688741d592df6ca2df2f59e /Doc
parent: a3240dc09795d0078455d8e051c6377050186055 (diff)
download: cpython-d0ab48f1c4bb5cbe96a799f9271f36c51c3debd3.zip
cpython-d0ab48f1c4bb5cbe96a799f9271f36c51c3debd3.tar.gz
cpython-d0ab48f1c4bb5cbe96a799f9271f36c51c3debd3.tar.bz2
1 files changed, 24 insertions, 17 deletions
diff --git a/Doc/library/urllib.request.rst b/Doc/library/urllib.request.rst
index 9496083..8928882 100644
--- a/Doc/library/urllib.request.rst
+++ b/Doc/library/urllib.request.rst
@@ -1072,30 +1072,37 @@ HTTPErrorProcessor Objects
 Examples
 --------
 
-This example gets the python.org main page and displays the first 100 bytes of
+This example gets the python.org main page and displays the first 300 bytes of
 it.::
 
    >>> import urllib.request
    >>> f = urllib.request.urlopen('http://www.python.org/')
-   >>> print(f.read(100))
-   b'<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
-   <?xml-stylesheet href="./css/ht2html'
-
-Note that in Python 3, urlopen returns a bytes object by default. In many
-circumstances, you might expect the output of urlopen to be a string. This
-might be a carried over expectation from Python 2, where urlopen returned
-string or it might even the common usecase. In those cases, you should
-explicitly decode the bytes to string.
-
-In the examples below, we have chosen *utf-8* encoding for demonstration, you
-might choose the encoding which is suitable for the webpage you are
-requesting::
+   >>> print(f.read(300))
+   b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
+   xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
+   <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
+   <title>Python Programming '
+
+Note that urlopen returns a bytes object.  This is because there is no way
+for urlopen to automatically determine the encoding of the byte stream
+it receives from the HTTP server. In general, a program will decode
+the returned bytes object to a string once it determines or guesses
+the appropriate encoding.
+
+The following W3C document, http://www.w3.org/International/O-charset, lists
+the various ways in which an (X)HTML or an XML document can specify its
+encoding information.
+
+As the python.org website uses *utf-8* encoding, as specified in its meta tag, we
+will use the same encoding to decode the bytes object. ::
 
    >>> import urllib.request
    >>> f = urllib.request.urlopen('http://www.python.org/')
-   >>> print(f.read(100).decode('utf-8')
-   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
-   <?xml-stylesheet href="./css/ht2html
+   >>> print(f.read(100).decode('utf-8'))
+   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+   "http://www.w3.org/TR/xhtml1/DTD/xhtm
+
 
 In the following example, we are sending a data-stream to the stdin of a CGI
 and reading the data it returns to us. Note that this example will only work
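The note added in this diff says a program decodes the bytes from urlopen once it determines the encoding. A minimal sketch of that step follows, assuming the charset is taken from the HTTP Content-Type header and that utf-8 is an acceptable fallback when the header names none (both the fallback and the helper name `decode_body` are assumptions, not part of urllib.request). It runs offline by passing the body bytes and the header value directly.

```python
from email.message import Message


def decode_body(body: bytes, content_type: str, default: str = 'utf-8') -> str:
    """Decode an HTTP response body using the charset in its Content-Type.

    Falls back to *default* (an assumption of this sketch; HTTP does not
    mandate utf-8) when the header carries no charset parameter.
    """
    # email.message.Message parses MIME parameters such as "; charset=...".
    headers = Message()
    headers['Content-Type'] = content_type
    # get_content_charset() returns the charset lowercased, or None.
    charset = headers.get_content_charset() or default
    # errors='replace' substitutes U+FFFD for malformed bytes instead of raising.
    return body.decode(charset, errors='replace')


# Offline usage: latin-1 bytes, as a server declaring ISO-8859-1 might send them.
page = '<p>café</p>'.encode('iso-8859-1')
print(decode_body(page, 'text/html; charset=ISO-8859-1'))
```

On current Python 3 versions, `f.info()` on a urlopen() response returns an `http.client.HTTPMessage`, which is a subclass of `email.message.Message`, so `f.info().get_content_charset()` can supply the same charset without building a Message by hand.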
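The W3C page cited in the note lists several places where a document can declare its own encoding. As a rough illustration only, the sketch below checks two of them in a document's first bytes: an XML declaration and an HTML `<meta ... charset=...>` tag, such as the one visible in the python.org output above. The function name `sniff_encoding` and both regular expressions are hypothetical simplifications; they do not implement the full sniffing procedure browsers follow, and they ignore the HTTP Content-Type header, which normally takes precedence.

```python
import re

# Simplified patterns (an assumption of this sketch), not a complete parser.
# XML declaration, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
_XML_DECL = re.compile(rb'^<\?xml[^>]*\bencoding=["\']([A-Za-z0-9._-]+)["\']')
# HTML meta tag, e.g. <meta charset="utf-8"> or
# <meta http-equiv="content-type" content="text/html; charset=utf-8" />
_META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9._-]+)',
                           re.IGNORECASE)


def sniff_encoding(head: bytes):
    """Return the encoding declared in a document's first bytes, or None."""
    match = _XML_DECL.match(head)
    if match is None:
        match = _META_CHARSET.search(head)
    if match is None:
        return None
    # Encoding names are ASCII; normalise to lowercase for comparison.
    return match.group(1).decode('ascii').lower()


head = (b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN">\n'
        b'<meta http-equiv="content-type" content="text/html; charset=utf-8" />')
encoding = sniff_encoding(head) or 'utf-8'  # the fallback is an assumption
print(encoding)  # utf-8
```

A name found this way is only a declaration; passing it to `bytes.decode()` can still raise `LookupError` if Python has no codec by that name.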