author | Martin Panter <vadmium+py@gmail.com> | 2016-08-24 06:33:33 (GMT) |
---|---|---|
committer | Martin Panter <vadmium+py@gmail.com> | 2016-08-24 06:33:33 (GMT) |
commit | 3c0d0baf2badfad7deb346d1043f7d83bb92691f (patch) | |
tree | 968ca71729f519aaf6ebb38477efdc73e5ae3ae9 /Doc | |
parent | a790fe7ff86f193670b3d8287b22c72cbe675c7b (diff) | |
Issue #12319: Support for chunked encoding of HTTP request bodies
When the body object is a file, its size is no longer determined with
fstat(), since that can report the wrong result (e.g. reading from a pipe).
Instead, determine the size using seek(), or fall back to chunked encoding
for unseekable files.
Also, change the logic for detecting text files to check for TextIOBase
inheritance, rather than inspecting the “mode” attribute, which may not
exist (e.g. BytesIO and StringIO). The Content-Length for text files is no
longer determined ahead of time, because the original logic could have been
wrong depending on the codec and newline translation settings.
Patch by Demian Brecht and Rolf Krahl, with a few tweaks by me.
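
The size detection described above can be sketched as follows. This is a minimal illustration, not the code from the patch: the helper name `remaining_size` is hypothetical, and it assumes that "determine the size using seek()" means measuring the distance from the current position to the end of the file.

```python
import io


def remaining_size(body):
    """Return the bytes left to read from a binary file object, or None.

    Hypothetical helper for illustration only; it is not part of http.client.
    """
    if isinstance(body, io.TextIOBase):
        # The encoded length depends on the codec and newline translation,
        # so no size is guessed for text files.
        return None
    try:
        start = body.tell()
        end = body.seek(0, io.SEEK_END)  # seek() returns the new position
        body.seek(start)                 # restore the original position
    except (AttributeError, OSError):
        # Unseekable objects (e.g. pipes) end up here; a caller would then
        # fall back to chunked encoding. io.UnsupportedOperation is a
        # subclass of OSError.
        return None
    return end - start
```

A `None` result corresponds to the cases where the commit falls back to chunked encoding instead of sending a Content-Length header.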
Diffstat (limited to 'Doc')
-rw-r--r-- | Doc/library/http.client.rst | 98 |
-rw-r--r-- | Doc/library/urllib.request.rst | 60 |
-rw-r--r-- | Doc/whatsnew/3.6.rst | 19 |
3 files changed, 126 insertions, 51 deletions
diff --git a/Doc/library/http.client.rst b/Doc/library/http.client.rst
index a9ca4b0..9429fb6 100644
--- a/Doc/library/http.client.rst
+++ b/Doc/library/http.client.rst
@@ -219,39 +219,62 @@ HTTPConnection Objects
 :class:`HTTPConnection` instances have the following methods:
 
 
-.. method:: HTTPConnection.request(method, url, body=None, headers={})
+.. method:: HTTPConnection.request(method, url, body=None, headers={}, *, \
+            encode_chunked=False)
 
    This will send a request to the server using the HTTP request
    method *method* and the selector *url*.
 
    If *body* is specified, the specified data is sent after the headers are
-   finished. It may be a string, a :term:`bytes-like object`, an open
-   :term:`file object`, or an iterable of :term:`bytes-like object`\s. If
-   *body* is a string, it is encoded as ISO-8859-1, the default for HTTP. If
-   it is a bytes-like object the bytes are sent as is. If it is a :term:`file
-   object`, the contents of the file is sent; this file object should support
-   at least the ``read()`` method. If the file object has a ``mode``
-   attribute, the data returned by the ``read()`` method will be encoded as
-   ISO-8859-1 unless the ``mode`` attribute contains the substring ``b``,
-   otherwise the data returned by ``read()`` is sent as is. If *body* is an
-   iterable, the elements of the iterable are sent as is until the iterable is
-   exhausted.
-
-   The *headers* argument should be a mapping of extra HTTP
-   headers to send with the request.
-
-   If *headers* does not contain a Content-Length item, one is added
-   automatically if possible. If *body* is ``None``, the Content-Length header
-   is set to ``0`` for methods that expect a body (``PUT``, ``POST``, and
-   ``PATCH``). If *body* is a string or bytes object, the Content-Length
-   header is set to its length. If *body* is a :term:`file object` and it
-   works to call :func:`~os.fstat` on the result of its ``fileno()`` method,
-   then the Content-Length header is set to the ``st_size`` reported by the
-   ``fstat`` call. Otherwise no Content-Length header is added.
+   finished. It may be a :class:`str`, a :term:`bytes-like object`, an
+   open :term:`file object`, or an iterable of :class:`bytes`. If *body*
+   is a string, it is encoded as ISO-8859-1, the default for HTTP. If it
+   is a bytes-like object, the bytes are sent as is. If it is a :term:`file
+   object`, the contents of the file is sent; this file object should
+   support at least the ``read()`` method. If the file object is an
+   instance of :class:`io.TextIOBase`, the data returned by the ``read()``
+   method will be encoded as ISO-8859-1, otherwise the data returned by
+   ``read()`` is sent as is. If *body* is an iterable, the elements of the
+   iterable are sent as is until the iterable is exhausted.
+
+   The *headers* argument should be a mapping of extra HTTP headers to send
+   with the request.
+
+   If *headers* contains neither Content-Length nor Transfer-Encoding, a
+   Content-Length header will be added automatically if possible. If
+   *body* is ``None``, the Content-Length header is set to ``0`` for
+   methods that expect a body (``PUT``, ``POST``, and ``PATCH``). If
+   *body* is a string or bytes-like object, the Content-Length header is
+   set to its length. If *body* is a binary :term:`file object`
+   supporting :meth:`~io.IOBase.seek`, this will be used to determine
+   its size. Otherwise, the Content-Length header is not added
+   automatically. In cases where determining the Content-Length up
+   front is not possible, the body will be chunk-encoded and the
+   Transfer-Encoding header will automatically be set.
+
+   The *encode_chunked* argument is only relevant if Transfer-Encoding is
+   specified in *headers*. If *encode_chunked* is ``False``, the
+   HTTPConnection object assumes that all encoding is handled by the
+   calling code. If it is ``True``, the body will be chunk-encoded.
+
+   .. note::
+      Chunked transfer encoding has been added to the HTTP protocol
+      version 1.1. Unless the HTTP server is known to handle HTTP 1.1,
+      the caller must either specify the Content-Length or must use a
+      body representation whose length can be determined automatically.
 
    .. versionadded:: 3.2
       *body* can now be an iterable.
 
+   .. versionchanged:: 3.6
+      If neither Content-Length nor Transfer-Encoding are set in
+      *headers* and Content-Length cannot be determined, *body* will now
+      be automatically chunk-encoded. The *encode_chunked* argument
+      was added.
+      The Content-Length for binary file objects is determined with seek.
+      No attempt is made to determine the Content-Length for text file
+      objects.
+
 .. method:: HTTPConnection.getresponse()
 
    Should be called after a request is sent to get the response from the server.
@@ -336,13 +359,32 @@ also send your request step by step, by using the four functions below.
    an argument.
 
 
-.. method:: HTTPConnection.endheaders(message_body=None)
+.. method:: HTTPConnection.endheaders(message_body=None, *, encode_chunked=False)
 
    Send a blank line to the server, signalling the end of the headers. The
    optional *message_body* argument can be used to pass a message body
-   associated with the request. The message body will be sent in the same
-   packet as the message headers if it is string, otherwise it is sent in a
-   separate packet.
+   associated with the request.
+
+   If *encode_chunked* is ``True``, the result of each iteration of
+   *message_body* will be chunk-encoded as specified in :rfc:`7230`,
+   Section 3.3.1. How the data is encoded is dependent on the type of
+   *message_body*. If *message_body* implements the :ref:`buffer interface
+   <bufferobjects>` the encoding will result in a single chunk.
+   If *message_body* is a :class:`collections.Iterable`, each iteration
+   of *message_body* will result in a chunk. If *message_body* is a
+   :term:`file object`, each call to ``.read()`` will result in a chunk.
+   The method automatically signals the end of the chunk-encoded data
+   immediately after *message_body*.
+
+   .. note:: Due to the chunked encoding specification, empty chunks
+      yielded by an iterator body will be ignored by the chunk-encoder.
+      This is to avoid premature termination of the read of the request by
+      the target server due to malformed encoding.
+
+   .. versionadded:: 3.6
+      Chunked encoding support. The *encode_chunked* parameter was
+      added.
+
 
 .. method:: HTTPConnection.send(data)
 
diff --git a/Doc/library/urllib.request.rst b/Doc/library/urllib.request.rst
index 1291aeb..e619cc1 100644
--- a/Doc/library/urllib.request.rst
+++ b/Doc/library/urllib.request.rst
@@ -30,18 +30,9 @@ The :mod:`urllib.request` module defines the following functions:
    Open the URL *url*, which can be either a string or a
    :class:`Request` object.
 
-   *data* must be a bytes object specifying additional data to be sent to the
-   server, or ``None`` if no such data is needed. *data* may also be an
-   iterable object and in that case Content-Length value must be specified in
-   the headers. Currently HTTP requests are the only ones that use *data*; the
-   HTTP request will be a POST instead of a GET when the *data* parameter is
-   provided.
-
-   *data* should be a buffer in the standard
-   :mimetype:`application/x-www-form-urlencoded` format. The
-   :func:`urllib.parse.urlencode` function takes a mapping or sequence of
-   2-tuples and returns an ASCII text string in this format. It should
-   be encoded to bytes before being used as the *data* parameter.
+   *data* must be an object specifying additional data to be sent to the
+   server, or ``None`` if no such data is needed. See :class:`Request`
+   for details.
 
    urllib.request module uses HTTP/1.1 and includes ``Connection:close`` header
    in its HTTP requests.
@@ -192,14 +183,22 @@ The following classes are provided:
 
    *url* should be a string containing a valid URL.
 
-   *data* must be a bytes object specifying additional data to send to the
-   server, or ``None`` if no such data is needed. Currently HTTP requests are
-   the only ones that use *data*; the HTTP request will be a POST instead of a
-   GET when the *data* parameter is provided. *data* should be a buffer in the
-   standard :mimetype:`application/x-www-form-urlencoded` format.
-   The :func:`urllib.parse.urlencode` function takes a mapping or sequence of
-   2-tuples and returns an ASCII string in this format. It should be
-   encoded to bytes before being used as the *data* parameter.
+   *data* must be an object specifying additional data to send to the
+   server, or ``None`` if no such data is needed. Currently HTTP
+   requests are the only ones that use *data*. The supported object
+   types include bytes, file-like objects, and iterables. If no
+   ``Content-Length`` header has been provided, :class:`HTTPHandler` will
+   try to determine the length of *data* and set this header accordingly.
+   If this fails, ``Transfer-Encoding: chunked`` as specified in
+   :rfc:`7230`, Section 3.3.1 will be used to send the data. See
+   :meth:`http.client.HTTPConnection.request` for details on the
+   supported object types and on how the content length is determined.
+
+   For an HTTP POST request method, *data* should be a buffer in the
+   standard :mimetype:`application/x-www-form-urlencoded` format. The
+   :func:`urllib.parse.urlencode` function takes a mapping or sequence
+   of 2-tuples and returns an ASCII string in this format. It should
+   be encoded to bytes before being used as the *data* parameter.
 
    *headers* should be a dictionary, and will be treated as if
    :meth:`add_header` was called with each key and value as arguments.
@@ -211,8 +210,10 @@ The following classes are provided:
    :mod:`urllib`'s default user agent string is
    ``"Python-urllib/2.6"`` (on Python 2.6).
 
-   An example of using ``Content-Type`` header with *data* argument would be
-   sending a dictionary like ``{"Content-Type": "application/x-www-form-urlencoded"}``.
+   An appropriate ``Content-Type`` header should be included if the *data*
+   argument is present. If this header has not been provided and *data*
+   is not None, ``Content-Type: application/x-www-form-urlencoded`` will
+   be added as a default.
 
    The final two arguments are only of interest for correct handling
    of third-party HTTP cookies:
@@ -235,15 +236,28 @@ The following classes are provided:
    *method* should be a string that indicates the HTTP request method that
    will be used (e.g. ``'HEAD'``). If provided, its value is stored in the
    :attr:`~Request.method` attribute and is used by :meth:`get_method()`.
-   Subclasses may indicate a default method by setting the
+   The default is ``'GET'`` if *data* is ``None`` or ``'POST'`` otherwise.
+   Subclasses may indicate a different default method by setting the
    :attr:`~Request.method` attribute in the class itself.
 
+   .. note::
+      The request will not work as expected if the data object is unable
+      to deliver its content more than once (e.g. a file or an iterable
+      that can produce the content only once) and the request is retried
+      for HTTP redirects or authentication. The *data* is sent to the
+      HTTP server right away after the headers. There is no support for
+      a 100-continue expectation in the library.
+
    .. versionchanged:: 3.3
       :attr:`Request.method` argument is added to the Request class.
 
    .. versionchanged:: 3.4
       Default :attr:`Request.method` may be indicated at the class level.
 
+   .. versionchanged:: 3.6
+      Do not raise an error if the ``Content-Length`` has not been
+      provided and could not be determined. Fall back to use chunked
+      transfer encoding instead.
 
 .. class:: OpenerDirector()
 
diff --git a/Doc/whatsnew/3.6.rst b/Doc/whatsnew/3.6.rst
index 8b85b22..6d5bbc0 100644
--- a/Doc/whatsnew/3.6.rst
+++ b/Doc/whatsnew/3.6.rst
@@ -324,6 +324,15 @@ exceptions: see :func:`faulthandler.enable`.
 (Contributed by Victor Stinner in :issue:`23848`.)
 
 
+http.client
+-----------
+
+:meth:`HTTPConnection.request() <http.client.HTTPConnection.request>` and
+:meth:`~http.client.HTTPConnection.endheaders` both now support
+chunked encoding request bodies.
+(Contributed by Demian Brecht and Rolf Krahl in :issue:`12319`.)
+
+
 idlelib and IDLE
 ----------------
 
@@ -500,6 +509,16 @@ The :class:`~unittest.mock.Mock` class has the following improvements:
   (Contributed by Amit Saha in :issue:`26323`.)
 
 
+urllib.request
+--------------
+
+If a HTTP request has a non-empty body but no Content-Length header
+and the content length cannot be determined up front, rather than
+throwing an error, :class:`~urllib.request.AbstractHTTPHandler` now
+falls back to use chunked transfer encoding.
+(Contributed by Demian Brecht and Rolf Krahl in :issue:`12319`.)
+
+
 urllib.robotparser
 ------------------
 
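
For readers unfamiliar with the wire format, the chunk framing that the documentation changes above refer to can be sketched as below. This is an illustrative stand-alone generator, not the http.client implementation: the name `encode_chunks` is hypothetical, and the framing follows the chunked transfer coding of RFC 7230, Section 4.1 (hexadecimal size, CRLF, data, CRLF, then a zero-size chunk as terminator). As in the note on ``endheaders()``, empty chunks are skipped so that they are not mistaken for the terminator.

```python
def encode_chunks(parts):
    """Frame an iterable of bytes objects as an HTTP/1.1 chunked body.

    Hypothetical helper for illustration; http.client does its own framing.
    """
    for part in parts:
        if not part:
            # A zero-length chunk would signal end-of-body, so skip it.
            continue
        size_line = f"{len(part):X}\r\n".encode("ascii")
        yield size_line + part + b"\r\n"
    # The zero-size chunk marks the end of the body (no trailers here).
    yield b"0\r\n\r\n"


body = b"".join(encode_chunks([b"hello", b"", b"world!"]))
```

With this change applied, passing an iterable such as a generator as *body* to ``HTTPConnection.request()`` without a Content-Length header should result in equivalent framing being sent by the library itself.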