author | Senthil Kumaran <orsenthil@gmail.com> | 2010-04-22 10:53:30 (GMT) |
---|---|---|
committer | Senthil Kumaran <orsenthil@gmail.com> | 2010-04-22 10:53:30 (GMT) |
commit | 0c2d8b8e51e8bcebd21f8fe33ca0c816e3320c4c (patch) | |
tree | 0ba31b3795fd395e623e1606135fd12e353a3aa5 /Doc/library/urllib.request.rst | |
parent | 5e73a819ca7317fa3a58a6f1101c488d850fea57 (diff) | |
Fixing a note on encoding declaration, its usage in urlopen based on review
comments from RDM and Ezio.
Diffstat (limited to 'Doc/library/urllib.request.rst')
-rw-r--r-- | Doc/library/urllib.request.rst | 41 |
1 files changed, 24 insertions, 17 deletions
diff --git a/Doc/library/urllib.request.rst b/Doc/library/urllib.request.rst
index 676d5ce..fdc037f 100644
--- a/Doc/library/urllib.request.rst
+++ b/Doc/library/urllib.request.rst
@@ -1072,30 +1072,37 @@ HTTPErrorProcessor Objects
 Examples
 --------
 
-This example gets the python.org main page and displays the first 100 bytes of
+This example gets the python.org main page and displays the first 300 bytes of
 it. ::
 
    >>> import urllib.request
    >>> f = urllib.request.urlopen('http://www.python.org/')
-   >>> print(f.read(100))
-   b'<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
-   <?xml-stylesheet href="./css/ht2html'
-
-Note that in Python 3, urlopen returns a bytes object by default. In many
-circumstances, you might expect the output of urlopen to be a string. This
-might be a carried over expectation from Python 2, where urlopen returned
-string or it might even the common usecase. In those cases, you should
-explicitly decode the bytes to string.
-
-In the examples below, we have chosen *utf-8* encoding for demonstration, you
-might choose the encoding which is suitable for the webpage you are
-requesting::
+   >>> print(f.read(300))
+   b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
+   xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
+   <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
+   <title>Python Programming '
+
+Note that urlopen returns a bytes object. This is because there is no way
+for urlopen to automatically determine the encoding of the byte stream
+it receives from the http server. In general, a program will decode
+the returned bytes object to string once it determines or guesses
+the appropriate encoding.
+
+The following W3C document, http://www.w3.org/International/O-charset , lists
+the various ways in which a (X)HTML or a XML document could have specified its
+encoding information.
+
+As python.org website uses *utf-8* encoding as specified in it's meta tag, we
+will use same for decoding the bytes object. ::
 
    >>> import urllib.request
    >>> f = urllib.request.urlopen('http://www.python.org/')
-   >>> print(f.read(100).decode('utf-8')
-   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
-   <?xml-stylesheet href="./css/ht2html
+   >>> print(fp.read(100).decode('utf-8'))
+   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+   "http://www.w3.org/TR/xhtml1/DTD/xhtm
+
 
 In the following example, we are sending a data-stream to the stdin of a CGI
 and reading the data it returns to us. Note that this example will only work
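Review note: the added doctest line reads from `fp`, while the line before it binds the response to `f`, so the example as committed would raise a `NameError`. Separately, the new prose says a program decodes the bytes "once it determines or guesses the appropriate encoding". One way to do that, sketched below, is to look at the charset parameter of the `Content-Type` response header before falling back to a guess. This is only an illustration and not part of the commit: the helper names `charset_from_content_type`, `decode_body` and `fetch_text` are hypothetical, and the UTF-8 fallback is an assumption.

```python
import codecs
import urllib.request
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type header value.

    Falls back to `default` (an assumption, not a rule) when the header
    is missing, carries no charset parameter, or names an unknown codec.
    """
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset(default)
    try:
        codecs.lookup(charset)  # reject names Python cannot decode
    except LookupError:
        return default
    return charset


def decode_body(raw, content_type, default="utf-8"):
    """Decode response bytes using the header charset or the fallback."""
    charset = charset_from_content_type(content_type, default)
    # errors="replace" keeps a truncated multi-byte sequence from raising
    return raw.decode(charset, errors="replace")


def fetch_text(url, nbytes=300):
    """Hypothetical wrapper: read the first `nbytes` bytes and decode them."""
    with urllib.request.urlopen(url) as f:
        return decode_body(f.read(nbytes), f.headers.get("Content-Type"))
```

Calling `fetch_text('http://www.python.org/')` would mirror the doctest in the diff, with the encoding taken from the server's header when one is sent. A charset declared only in the document's meta tag, as discussed in the W3C page the diff links to, is not inspected by this sketch.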