- Issue #3300: make urllib.parse.[un]quote() default to UTF-8.

Code contributed by Matt Giuca. quote() now encodes the input before quoting, unquote() decodes after unquoting. There are new arguments to change the encoding and errors settings. There are also new APIs to skip the encode/decode steps. [un]quote_plus() are also affected.
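
A minimal sketch of the behaviour described in this message, assuming a Python 3 interpreter that includes this change. The sample string is the one used in the patched documentation; the variable names are illustrative only.

```python
from urllib.parse import quote, quote_from_bytes, unquote

text = '/El Niño/'

# quote() first encodes the str to bytes (UTF-8 by default),
# then percent-escapes the bytes that are not considered safe.
quoted = quote(text)

# The same steps spelled out with the bytes-level API, which skips encoding.
quoted_via_bytes = quote_from_bytes(text.encode('utf-8', 'strict'), '/')

# The new encoding argument selects a different codec for the encode step.
quoted_latin1 = quote(text, encoding='latin-1')

# unquote() reverses the process: unescape to bytes, then decode.
round_trip = unquote(quoted)
round_trip_latin1 = unquote(quoted_latin1, encoding='latin-1')

print(quoted, quoted_latin1, round_trip)
```

The encoding passed to unquote() has to match the one used by quote(), otherwise the escaped bytes are decoded with the wrong codec.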
author: Guido van Rossum <guido@python.org> 2008-08-18 21:44:30 (GMT)
committer: Guido van Rossum <guido@python.org> 2008-08-18 21:44:30 (GMT)
commit: 52dbbb906804f36067ecbc8c89a00cdab545bdb2 (patch)
tree: 1b923b821dc0547f6fa3e30401c7dac177a8f557 /Doc
parent: 4171da5c9d899dc64cb15f177f05b9de05563148 (diff)
download: cpython-52dbbb906804f36067ecbc8c89a00cdab545bdb2.zip
cpython-52dbbb906804f36067ecbc8c89a00cdab545bdb2.tar.gz
cpython-52dbbb906804f36067ecbc8c89a00cdab545bdb2.tar.bz2
1 files changed, 56 insertions, 8 deletions
diff --git a/Doc/library/urllib.parse.rst b/Doc/library/urllib.parse.rst
index a5463e6..0848857 100644
--- a/Doc/library/urllib.parse.rst
+++ b/Doc/library/urllib.parse.rst
@@ -182,36 +182,84 @@ The :mod:`urllib.parse` module defines the following functions:
    string.  If there is no fragment identifier in *url*, return *url* unmodified
    and an empty string.
 
-.. function:: quote(string[, safe])
+.. function:: quote(string[, safe[, encoding[, errors]]])
 
    Replace special characters in *string* using the ``%xx`` escape. Letters,
    digits, and the characters ``'_.-'`` are never quoted. The optional *safe*
-   parameter specifies additional characters that should not be quoted --- its
-   default value is ``'/'``.
+   parameter specifies additional ASCII characters that should not be quoted
+   --- its default value is ``'/'``.
 
-   Example: ``quote('/~connolly/')`` yields ``'/%7econnolly/'``.
+   *string* may be either a :class:`str` or a :class:`bytes`.
 
+   The optional *encoding* and *errors* parameters specify how to deal with
+   non-ASCII characters, as accepted by the :meth:`str.encode` method.
+   *encoding* defaults to ``'utf-8'``.
+   *errors* defaults to ``'strict'``, meaning unsupported characters raise a
+   :class:`UnicodeEncodeError`.
+   *encoding* and *errors* must not be supplied if *string* is a
+   :class:`bytes`, or a :class:`TypeError` is raised.
 
-.. function:: quote_plus(string[, safe])
+   Note that ``quote(string, safe, encoding, errors)`` is equivalent to
+   ``quote_from_bytes(string.encode(encoding, errors), safe)``.
+
+   Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.
+
+
+.. function:: quote_plus(string[, safe[, encoding[, errors]]])
 
    Like :func:`quote`, but also replace spaces by plus signs, as required for
    quoting HTML form values.  Plus signs in the original string are escaped
    unless they are included in *safe*.  It also does not have *safe* default to
    ``'/'``.
 
+   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.
+
+.. function:: quote_from_bytes(bytes[, safe])
 
-.. function:: unquote(string)
+   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
+   :class:`str`, and does not perform string-to-bytes encoding.
+
+   Example: ``quote_from_bytes(b'a&\xef')`` yields
+   ``'a%26%EF'``.
+
+.. function:: unquote(string[, encoding[, errors]])
 
    Replace ``%xx`` escapes by their single-character equivalent.
+   The optional *encoding* and *errors* parameters specify how to decode
+   percent-encoded sequences into Unicode characters, as accepted by the
+   :meth:`bytes.decode` method.
+
+   *string* must be a :class:`str`.
+
+   *encoding* defaults to ``'utf-8'``.
+   *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
+   by a placeholder character.
 
-   Example: ``unquote('/%7Econnolly/')`` yields ``'/~connolly/'``.
+   Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.
 
 
-.. function:: unquote_plus(string)
+.. function:: unquote_plus(string[, encoding[, errors]])
 
    Like :func:`unquote`, but also replace plus signs by spaces, as required for
    unquoting HTML form values.
 
+   *string* must be a :class:`str`.
+
+   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.
+
+.. function:: unquote_to_bytes(string)
+
+   Replace ``%xx`` escapes by their single-octet equivalent, and return a
+   :class:`bytes` object.
+
+   *string* may be either a :class:`str` or a :class:`bytes`.
+
+   If it is a :class:`str`, unescaped non-ASCII characters in *string*
+   are encoded into UTF-8 bytes.
+
+   Example: ``unquote_to_bytes('a%26%EF')`` yields
+   ``b'a&\xef'``.
+
 
 .. function:: urlencode(query[, doseq])
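
The error handling and bytes-level functions documented in the diff above can be exercised roughly as follows. This is a hedged sketch assuming a Python 3 interpreter with this change applied; the inputs other than the documented examples (such as ``'Ni%F1o'``) and all variable names are made up for illustration.

```python
from urllib.parse import (
    quote, quote_plus, unquote, unquote_plus, unquote_to_bytes,
)

# quote_plus() turns spaces into '+' and, unlike quote(), escapes '/' by default.
form_value = quote_plus('/El Niño/')
restored = unquote_plus(form_value)

# A lone %F1 is not valid UTF-8. With the default errors='replace',
# unquote() substitutes the U+FFFD replacement character.
replaced = unquote('Ni%F1o')

# With errors='strict' the same input raises UnicodeDecodeError instead.
try:
    unquote('Ni%F1o', errors='strict')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# unquote_to_bytes() skips the decode step and returns the raw octets.
raw = unquote_to_bytes('a%26%EF')

# Supplying encoding together with a bytes input to quote() is rejected.
try:
    quote(b'a&\xef', encoding='utf-8')
    bytes_with_encoding_rejected = False
except TypeError:
    bytes_with_encoding_rejected = True

print(form_value, restored, repr(replaced), raw)
```

Note the asymmetry in the defaults: quote() uses ``errors='strict'`` while unquote() uses ``errors='replace'``, so malformed input fails loudly when quoting but is silently patched when unquoting.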