path: root/Doc/library/urllib.request.rst
author: Senthil Kumaran <senthil@uthcode.com>, 2012-03-16 01:15:34 (GMT)
committer: Senthil Kumaran <senthil@uthcode.com>, 2012-03-16 01:15:34 (GMT)
commit: e53d977e8077759e8123da3da563e6b73392ed8b
tree: faa4a6080a07916bd82bec55c7eee60458c3c964 /Doc/library/urllib.request.rst
parent: df2aecbf61a157b8a6a865cb2d946e65b887b774
parent: 6b3434ae04f32a20e20498ec5a594d527e1c24e3
Explain the use of charset parameter with Content-Type header: issue11082
Diffstat (limited to 'Doc/library/urllib.request.rst')
-rw-r--r-- Doc/library/urllib.request.rst | 74
1 files changed, 51 insertions, 23 deletions
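
As a rough illustration of the behaviour this change documents (a sketch, not part of the commit), the snippet below encodes form data to bytes and names the same encoding in the ``Content-Type`` header. The URL and the field names are placeholders chosen for the example, and no network request is made.

```python
import urllib.parse
import urllib.request

# Hypothetical form fields; urlencode() returns a str, not bytes.
fields = {"spam": 1, "eggs": 2, "bacon": 0}
body = urllib.parse.urlencode(fields).encode("utf-8")

# The percent-encoded form of non-ASCII text depends on the encoding,
# which is why the server needs to be told which charset was used.
as_utf8 = urllib.parse.urlencode({"name": "café"}, encoding="utf-8")
as_latin1 = urllib.parse.urlencode({"name": "café"}, encoding="latin-1")

# Placeholder URL; the Request is only constructed, never opened.
request = urllib.request.Request(
    "http://www.example.com/cgi-bin/query",
    data=body,
    headers={
        "Content-Type": "application/x-www-form-urlencoded;charset=utf-8"
    },
)

print(as_utf8)               # name=caf%C3%A9
print(as_latin1)             # name=caf%E9
print(request.get_method())  # POST, because data was supplied
# Request normalises header names with str.capitalize().
print(request.get_header("Content-type"))
```

Passing the resulting ``request`` to ``urllib.request.urlopen`` would send the body as a POST with the explicit charset; without the header, the server may fall back to assuming ISO-8859-1, as the added documentation text notes.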
diff --git a/Doc/library/urllib.request.rst b/Doc/library/urllib.request.rst
index 0bcd0bc..7aaadedc 100644
--- a/Doc/library/urllib.request.rst
+++ b/Doc/library/urllib.request.rst
@@ -2,9 +2,10 @@
=============================================================
.. module:: urllib.request
- :synopsis: Next generation URL opening library.
+ :synopsis: Extensible library for opening URLs.
.. moduleauthor:: Jeremy Hylton <jeremy@alum.mit.edu>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>
+.. sectionauthor:: Senthil Kumaran <senthil@uthcode.com>
The :mod:`urllib.request` module defines functions and classes which help in
@@ -20,16 +21,26 @@ The :mod:`urllib.request` module defines the following functions:
Open the URL *url*, which can be either a string or a
:class:`Request` object.
- *data* may be a bytes object specifying additional data to send to the
+ *data* must be a bytes object specifying additional data to be sent to the
server, or ``None`` if no such data is needed. *data* may also be an
iterable object and in that case Content-Length value must be specified in
the headers. Currently HTTP requests are the only ones that use *data*; the
HTTP request will be a POST instead of a GET when the *data* parameter is
- provided. *data* should be a buffer in the standard
+ provided.
+
+ *data* should be a buffer in the standard
:mimetype:`application/x-www-form-urlencoded` format. The
:func:`urllib.parse.urlencode` function takes a mapping or sequence of
- 2-tuples and returns a string in this format. urllib.request module uses
- HTTP/1.1 and includes ``Connection:close`` header in its HTTP requests.
+ 2-tuples and returns a string in this format. It should be encoded to bytes
+ before being used as the *data* parameter. The charset parameter in
+ ``Content-Type`` header may be used to specify the encoding. If charset
+ parameter is not sent with the Content-Type header, the server following the
+ HTTP 1.1 recommendation may assume that the data is encoded in ISO-8859-1
+ encoding. It is advisable to use charset parameter with encoding used in
+ ``Content-Type`` header with the :class:`Request`.
+
+ urllib.request module uses HTTP/1.1 and includes ``Connection:close`` header
+ in its HTTP requests.
The optional *timeout* parameter specifies a timeout in seconds for
blocking operations like the connection attempt (if not specified,
@@ -66,9 +77,10 @@ The :mod:`urllib.request` module defines the following functions:
are handled through the proxy when they are set.
The legacy ``urllib.urlopen`` function from Python 2.6 and earlier has been
- discontinued; :func:`urlopen` corresponds to the old ``urllib2.urlopen``.
- Proxy handling, which was done by passing a dictionary parameter to
- ``urllib.urlopen``, can be obtained by using :class:`ProxyHandler` objects.
+ discontinued; :func:`urllib.request.urlopen` corresponds to the old
+ ``urllib2.urlopen``. Proxy handling, which was done by passing a dictionary
+ parameter to ``urllib.urlopen``, can be obtained by using
+ :class:`ProxyHandler` objects.
.. versionchanged:: 3.2
*cafile* and *capath* were added.
@@ -83,10 +95,11 @@ The :mod:`urllib.request` module defines the following functions:
.. function:: install_opener(opener)
Install an :class:`OpenerDirector` instance as the default global opener.
- Installing an opener is only necessary if you want urlopen to use that opener;
- otherwise, simply call :meth:`OpenerDirector.open` instead of :func:`urlopen`.
- The code does not check for a real :class:`OpenerDirector`, and any class with
- the appropriate interface will work.
+ Installing an opener is only necessary if you want urlopen to use that
+ opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
+ :func:`~urllib.request.urlopen`. The code does not check for a real
+ :class:`OpenerDirector`, and any class with the appropriate interface will
+ work.
.. function:: build_opener([handler, ...])
@@ -138,13 +151,21 @@ The following classes are provided:
*url* should be a string containing a valid URL.
- *data* may be a bytes object specifying additional data to send to the
+ *data* must be a bytes object specifying additional data to send to the
server, or ``None`` if no such data is needed. Currently HTTP requests are
the only ones that use *data*; the HTTP request will be a POST instead of a
GET when the *data* parameter is provided. *data* should be a buffer in the
- standard :mimetype:`application/x-www-form-urlencoded` format. The
- :func:`urllib.parse.urlencode` function takes a mapping or sequence of
- 2-tuples and returns a string in this format.
+ standard :mimetype:`application/x-www-form-urlencoded` format.
+
+ The :func:`urllib.parse.urlencode` function takes a mapping or sequence of
+ 2-tuples and returns a string in this format. It should be encoded to bytes
+ before being used as the *data* parameter. The charset parameter in
+ ``Content-Type`` header may be used to specify the encoding. If charset
+ parameter is not sent with the Content-Type header, the server following the
+ HTTP 1.1 recommendation may assume that the data is encoded in ISO-8859-1
+ encoding. It is advisable to use charset parameter with encoding used in
+ ``Content-Type`` header with the :class:`Request`.
+
*headers* should be a dictionary, and will be treated as if
:meth:`add_header` was called with each key and value as arguments.
@@ -156,8 +177,11 @@ The following classes are provided:
:mod:`urllib`'s default user agent string is
``"Python-urllib/2.6"`` (on Python 2.6).
- The following two arguments, *origin_req_host* and *unverifiable*,
- are only of interest for correct handling of third-party HTTP cookies:
+ An example of using ``Content-Type`` header with *data* argument would be
+ sending a dictionary like ``{"Content-Type": "application/x-www-form-urlencoded;charset=utf-8"}``.
+
+ The final two arguments are only of interest for correct handling
+ of third-party HTTP cookies:
*origin_req_host* should be the request-host of the origin
transaction, as defined by :rfc:`2965`. It defaults to
@@ -1107,8 +1131,9 @@ every :class:`Request`. To change this::
opener.open('http://www.example.com/')
Also, remember that a few standard headers (:mailheader:`Content-Length`,
-:mailheader:`Content-Type` and :mailheader:`Host`) are added when the
-:class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`).
+:mailheader:`Content-Type` without charset parameter and :mailheader:`Host`)
+are added when the :class:`Request` is passed to :func:`urlopen` (or
+:meth:`OpenerDirector.open`).
.. _urllib-examples:
@@ -1126,9 +1151,12 @@ from urlencode is encoded to bytes before it is sent to urlopen as data::
>>> import urllib.request
>>> import urllib.parse
- >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
- >>> params = params.encode('utf-8')
- >>> f = urllib.request.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
+ >>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
+ >>> data = data.encode('utf-8')
+ >>> request = urllib.request.Request("http://requestb.in/xrbl82xr")
+ >>> # adding charset parameter to the Content-Type header.
+ >>> request.add_header("Content-Type","application/x-www-form-urlencoded;charset=utf-8")
+ >>> f = urllib.request.urlopen(request, data)
>>> print(f.read().decode('utf-8'))
The following example uses an explicitly specified HTTP proxy, overriding