author | Senthil Kumaran <senthil@uthcode.com> | 2012-03-16 01:15:34 (GMT) |
---|---|---|
committer | Senthil Kumaran <senthil@uthcode.com> | 2012-03-16 01:15:34 (GMT) |
commit | e53d977e8077759e8123da3da563e6b73392ed8b (patch) | |
tree | faa4a6080a07916bd82bec55c7eee60458c3c964 /Doc | |
parent | df2aecbf61a157b8a6a865cb2d946e65b887b774 (diff) | |
parent | 6b3434ae04f32a20e20498ec5a594d527e1c24e3 (diff) | |
Explain the use of charset parameter with Content-Type header: issue11082
Diffstat (limited to 'Doc')
-rw-r--r-- | Doc/library/urllib.parse.rst | 7 |
-rw-r--r-- | Doc/library/urllib.request.rst | 74 |
2 files changed, 55 insertions, 26 deletions
diff --git a/Doc/library/urllib.parse.rst b/Doc/library/urllib.parse.rst
index aece714..b33e8fe 100644
--- a/Doc/library/urllib.parse.rst
+++ b/Doc/library/urllib.parse.rst
@@ -512,9 +512,10 @@ task isn't already covered by the URL parsing functions above.
 Convert a mapping object or a sequence of two-element tuples, which may either be a :class:`str` or a :class:`bytes`, to a "percent-encoded"
- string. The resultant string must be converted to bytes using the
- user-specified encoding before it is sent to :func:`urlopen` as the optional
- *data* argument.
+ string. If the resultant string is to be used as a *data* for POST
+ operation with :func:`urlopen` function, then it should be properly encoded
+ to bytes, otherwise it would result in a :exc:`TypeError`.
+
 The resulting string is a series of ``key=value`` pairs separated by ``'&'`` characters, where both *key* and *value* are quoted using :func:`quote_plus` above. When a sequence of two-element tuples is used as the *query*
diff --git a/Doc/library/urllib.request.rst b/Doc/library/urllib.request.rst
index 0bcd0bc..7aaadedc 100644
--- a/Doc/library/urllib.request.rst
+++ b/Doc/library/urllib.request.rst
@@ -2,9 +2,10 @@
 =============================================================
 .. module:: urllib.request
- :synopsis: Next generation URL opening library.
+ :synopsis: Extensible library for opening URLs.
 .. moduleauthor:: Jeremy Hylton <jeremy@alum.mit.edu>
 .. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>
+.. sectionauthor:: Senthil Kumaran <senthil@uthcode.com>
 The :mod:`urllib.request` module defines functions and classes which help in
@@ -20,16 +21,26 @@ The :mod:`urllib.request` module defines the following functions:
 Open the URL *url*, which can be either a string or a :class:`Request` object.
- *data* may be a bytes object specifying additional data to send to the
+ *data* must be a bytes object specifying additional data to be sent to the
 server, or ``None`` if no such data is needed. *data* may also be an iterable object and in that case Content-Length value must be specified in the headers. Currently HTTP requests are the only ones that use *data*; the HTTP request will be a POST instead of a GET when the *data* parameter is
- provided. *data* should be a buffer in the standard
+ provided.
+
+ *data* should be a buffer in the standard
 :mimetype:`application/x-www-form-urlencoded` format. The :func:`urllib.parse.urlencode` function takes a mapping or sequence of
- 2-tuples and returns a string in this format. urllib.request module uses
- HTTP/1.1 and includes ``Connection:close`` header in its HTTP requests.
+ 2-tuples and returns a string in this format. It should be encoded to bytes
+ before being used as the *data* parameter. The charset parameter in
+ ``Content-Type`` header may be used to specify the encoding. If charset
+ parameter is not sent with the Content-Type header, the server following the
+ HTTP 1.1 recommendation may assume that the data is encoded in ISO-8859-1
+ encoding. It is advisable to use charset parameter with encoding used in
+ ``Content-Type`` header with the :class:`Request`.
+
+ urllib.request module uses HTTP/1.1 and includes ``Connection:close`` header
+ in its HTTP requests.
 The optional *timeout* parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified,
@@ -66,9 +77,10 @@ The :mod:`urllib.request` module defines the following functions:
 are handled through the proxy when they are set. The legacy ``urllib.urlopen`` function from Python 2.6 and earlier has been
- discontinued; :func:`urlopen` corresponds to the old ``urllib2.urlopen``.
- Proxy handling, which was done by passing a dictionary parameter to
- ``urllib.urlopen``, can be obtained by using :class:`ProxyHandler` objects.
+ discontinued; :func:`urllib.request.urlopen` corresponds to the old
+ ``urllib2.urlopen``. Proxy handling, which was done by passing a dictionary
+ parameter to ``urllib.urlopen``, can be obtained by using
+ :class:`ProxyHandler` objects.
 .. versionchanged:: 3.2 *cafile* and *capath* were added.
@@ -83,10 +95,11 @@ The :mod:`urllib.request` module defines the following functions:
 .. function:: install_opener(opener)
 Install an :class:`OpenerDirector` instance as the default global opener.
- Installing an opener is only necessary if you want urlopen to use that opener;
- otherwise, simply call :meth:`OpenerDirector.open` instead of :func:`urlopen`.
- The code does not check for a real :class:`OpenerDirector`, and any class with
- the appropriate interface will work.
+ Installing an opener is only necessary if you want urlopen to use that
+ opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
+ :func:`~urllib.request.urlopen`. The code does not check for a real
+ :class:`OpenerDirector`, and any class with the appropriate interface will
+ work.
 .. function:: build_opener([handler, ...])
@@ -138,13 +151,21 @@ The following classes are provided:
 *url* should be a string containing a valid URL.
- *data* may be a bytes object specifying additional data to send to the
+ *data* must be a bytes object specifying additional data to send to the
 server, or ``None`` if no such data is needed. Currently HTTP requests are the only ones that use *data*; the HTTP request will be a POST instead of a GET when the *data* parameter is provided. *data* should be a buffer in the
- standard :mimetype:`application/x-www-form-urlencoded` format. The
- :func:`urllib.parse.urlencode` function takes a mapping or sequence of
- 2-tuples and returns a string in this format.
+ standard :mimetype:`application/x-www-form-urlencoded` format.
+
+ The :func:`urllib.parse.urlencode` function takes a mapping or sequence of
+ 2-tuples and returns a string in this format. It should be encoded to bytes
+ before being used as the *data* parameter. The charset parameter in
+ ``Content-Type`` header may be used to specify the encoding. If charset
+ parameter is not sent with the Content-Type header, the server following the
+ HTTP 1.1 recommendation may assume that the data is encoded in ISO-8859-1
+ encoding. It is advisable to use charset parameter with encoding used in
+ ``Content-Type`` header with the :class:`Request`.
+
 *headers* should be a dictionary, and will be treated as if :meth:`add_header` was called with each key and value as arguments.
@@ -156,8 +177,11 @@ The following classes are provided:
 :mod:`urllib`'s default user agent string is ``"Python-urllib/2.6"`` (on Python 2.6).
- The following two arguments, *origin_req_host* and *unverifiable*,
- are only of interest for correct handling of third-party HTTP cookies:
+ An example of using ``Content-Type`` header with *data* argument would be
+ sending a dictionary like ``{"Content-Type":" application/x-www-form-urlencoded;charset=utf-8"}``
+
+ The final two arguments are only of interest for correct handling
+ of third-party HTTP cookies:
 *origin_req_host* should be the request-host of the origin transaction, as defined by :rfc:`2965`. It defaults to
@@ -1107,8 +1131,9 @@ every :class:`Request`. To change this::
 opener.open('http://www.example.com/')
 Also, remember that a few standard headers (:mailheader:`Content-Length`,
-:mailheader:`Content-Type` and :mailheader:`Host`) are added when the
-:class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`).
+:mailheader:`Content-Type` without charset parameter and :mailheader:`Host`)
+are added when the :class:`Request` is passed to :func:`urlopen` (or
+:meth:`OpenerDirector.open`).
 .. _urllib-examples:
@@ -1126,9 +1151,12 @@ from urlencode is encoded to bytes before it is sent to urlopen as data::
 >>> import urllib.request
 >>> import urllib.parse
- >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
- >>> params = params.encode('utf-8')
- >>> f = urllib.request.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
+ >>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
+ >>> data = data.encode('utf-8')
+ >>> request = urllib.request.Request("http://requestb.in/xrbl82xr")
+ >>> # adding charset parameter to the Content-Type header.
+ >>> request.add_header("Content-Type","application/x-www-form-urlencoded;charset=utf-8")
+ >>> f = urllib.request.urlopen(request, data)
 >>> print(f.read().decode('utf-8'))
 The following example uses an explicitly specified HTTP proxy, overriding
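
The behaviour documented by this change can be tried without touching the network. The following is a minimal sketch, not part of the commit: the helper name ``build_form_request`` and the ``http://www.example.com/submit`` URL are hypothetical, chosen only for illustration. It encodes the output of ``urllib.parse.urlencode`` to bytes and declares the same encoding through the charset parameter of the ``Content-Type`` header, as the patched documentation advises.

```python
import urllib.parse
import urllib.request


def build_form_request(url, fields, charset="utf-8"):
    """Hypothetical helper: build a POST Request carrying form data.

    urlencode() returns a str, so it is encoded to bytes before being
    used as *data*, and the same charset is announced in Content-Type.
    """
    body = urllib.parse.urlencode(fields).encode(charset)
    request = urllib.request.Request(url, data=body)
    request.add_header(
        "Content-Type",
        "application/x-www-form-urlencoded;charset=" + charset,
    )
    return request


# Placeholder URL; the request is only constructed, never sent.
request = build_form_request(
    "http://www.example.com/submit",
    {"spam": 1, "eggs": 2, "bacon": 0},
)

print(request.get_method())                # POST, because data is present
print(request.data)                        # the bytes body
print(request.get_header("Content-type"))  # header keys are stored capitalized
```

Passing the returned object to ``urllib.request.urlopen`` would send it; that step is left out here so the sketch stays runnable offline.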