| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
urllib.parse (GH-126663)
harmonize naming for three namedtuple base classes in urllib.parse
|
|
|
|
|
|
|
|
| |
(GH-116903)
Accepting objects with false values (like 0 and []) except empty strings
and byte-like objects and None in urllib.parse functions parse_qsl() and
parse_qs() is now deprecated.
|
|
|
|
|
|
|
| |
(GH-125989)
Although this goes beyond the application of RFC 3986, urljoin()
should support relative base URIs for backward compatibility.
|
|
|
|
|
|
|
|
|
|
|
|
| |
(GH-123273)
* urljoin() with relative reference "?" sets empty query and removes fragment.
* Preserve empty components (authority, params, query, fragment) in urljoin().
* Preserve empty components (authority, params, query) in urldefrag().
Also refactor the code and get rid of double _coerce_args() and
_coerce_result() calls in urljoin(), urldefrag(), urlparse() and
urlunparse().
|
|
|
|
| |
urllib.parse.urlunsplit() (GH-123179)
|
|
|
|
| |
Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
|
|
|
|
| |
multiple slashes and no authority (GH-113563)
|
|
|
|
|
|
|
|
| |
* Restore support of None and other false values.
* Raise TypeError for non-zero integers and non-empty sequences.
The regressions were introduced in gh-74668
(bdba8ef42b15e651dc23374a08143cc2b4c4657d).
|
|
|
|
| |
urllib.parse functions parse_qs() and parse_qsl() now support bytes
arguments containing raw and percent-encoded non-ASCII data.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* GH-104554: Add RTSPS support to `urllib/parse.py`
RTSPS is the permanent scheme defined in
https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml
alongside RTSP and RTSPU schemes.
* 📜🤖 Added by blurb_it.
---------
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
|
|
|
|
|
|
|
|
|
| |
`urllib.parse.urlsplit` has already been respecting the WHATWG spec a bit #25595.
This adds more sanitizing to respect the "Remove any leading C0 control or space from input" [rule](https://url.spec.whatwg.org/#url-parsing:~:text=Remove%20any%20leading%20and%20trailing%20C0%20control%20or%20space%20from%20input.) in response to [CVE-2023-24329](https://nvd.nist.gov/vuln/detail/CVE-2023-24329).
---------
Co-authored-by: Gregory P. Smith [Google] <greg@krypto.org>
|
|
|
|
|
|
|
|
|
| |
of IPv6 or IPvFuture format (#103849)
* Adds checks to ensure that bracketed hosts found by urlsplit are of IPv6 or IPvFuture format
---------
Co-authored-by: Gregory P. Smith <greg@krypto.org>
|
|
|
|
| |
Teach unsplit to retain the `"//"` when assembling `itms-services://?action=generate-bugs` style
[Apple Platform Deployment](https://support.apple.com/en-gb/guide/deployment/depce7cefc4d/web) URLs.
|
|
|
|
|
|
|
|
|
|
|
| |
`urllib.unquote_to_bytes` and `urllib.unquote` could both potentially generate `O(len(string))` intermediate `bytes` or `str` objects while computing the unquoted final result depending on the input provided. As Python objects are relatively large, this could consume a lot of ram.
This switches the implementation to using an expanding `bytearray` and a generator internally instead of precomputed `split()` style operations.
Microbenchmarks with some antagonistic inputs like `mess = "\u0141%%%20a%fe"*1000` show this is 10-20% slower for unquote and unquote_to_bytes and no different for typical inputs that are short or lack much unicode or % escaping. But the functions are already quite fast anyways so not a big deal. The slowdown scales consistently linear with input size as expected.
Memory usage observed manually using `/usr/bin/time -v` on `python -m timeit` runs of larger inputs. Unittesting memory consumption is difficult and does not seem worthwhile.
Observed memory usage is ~1/2 for `unquote()` and <1/3 for `unquote_to_bytes()` using `python -m timeit -s 'from urllib.parse import unquote, unquote_to_bytes; v="\u0141%01\u0161%20"*500_000' 'unquote_to_bytes(v)'` as a test.
|
|
|
|
|
|
|
|
|
|
|
| |
an alphabetical ASCII character. (#99421)
Prevent urllib.parse.urlparse from accepting schemes that don't begin with an alphabetical ASCII character.
RFC 3986 defines a scheme like this: `scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )`
RFC 2234 defines an ALPHA like this: `ALPHA = %x41-5A / %x61-7A`
The WHATWG URL spec defines a scheme like this:
`"A URL-scheme string must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, U+002B (+), U+002D (-), and U+002E (.)."`
|
|
|
| |
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
|
|
|
| |
on large input values. Based on Dennis Sweeney's chunking idea.
|
| |
|
| |
|
| |
|
|
|
|
| |
(#29716)
|
|
|
|
|
|
|
|
|
|
|
|
| |
Switch to lru_cache in urllib.parse.
urllib.parse now uses functool.lru_cache for its internal URL splitting and
quoting caches instead of rolling its own like its the 90s.
The undocumented internal Quoted class API is now deprecated
as it had no reason to be public and no existing OSS users were found.
The clear_cache() API remains undocumented but gets an explicit test as it
is used in a few projects' (twisted, gevent) tests as well as our own regrtest.
|
|
|
|
| |
(GH-25921)
|
|
|
| |
Automerge-Triggered-By: GH:gpshead
|
|
|
|
|
|
|
|
| |
tabs. (GH-25595)
* issue43882 - urllib.parse should sanitize urls containing ASCII newline and tabs.
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
|
|
|
|
|
|
|
| |
* coerce bytes separator to string
* Add news
* Update Misc/NEWS.d/next/Library/2021-03-11-00-31-41.bpo-42967.2PeQRw.rst
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
bpo-42967: [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl().
urllib.parse will only us "&" as query string separator by default instead of both ";" and "&" as allowed in earlier versions. An optional argument seperator with default value "&" is added to specify the separator.
Co-authored-by: Éric Araujo <merwok@netwok.org>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com>
Co-authored-by: Éric Araujo <merwok@netwok.org>
|
|
|
|
|
|
|
|
|
|
| |
- concurrent.futures
- ctypes
- http.cookies
- multiprocessing
- queue
- tempfile
- unittest.case
- urllib.parse
|
| |
|
|
|
| |
Ignore leading dots and no longer ignore a trailing newline.
|
|
|
|
|
|
|
|
|
|
| |
* bpo-27657: Fix urlparse() with numeric paths
Revert parsing decision from bpo-754016 in favor of the documented
consensus in bpo-16932 of how to treat strings without a // to
designate the netloc.
* bpo-22891: Remove urlsplit() optimization for 'http' prefixed inputs.
|
| |
|
| |
|
| |
|
|
|
|
| |
(GH-13017)
|
|
|
|
|
|
| |
Fixes some mistakes and misleadings in the quote function docstring:
- reserved chars are never actually used by quote code, unreserved chars are
- reserved chars were wrong and incomplete
- mentioned that use-case is not minimal quoting wrt. RFC, but cautious quoting
|
|
|
|
| |
(GH-12201)
|
|
|
|
| |
Adding `max_num_fields` to `cgi.FieldStorage` to make DOS attacks harder by
limiting the number of `MiniFieldStorage` objects created by `FieldStorage`.
|
| |
|
|
|
|
| |
{Parse,Split}Result.port (GH-6078)
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
The current regex based splitting produces a wrong result. For example::
http://abc#@def
Web browsers parse that URL as ``http://abc/#@def``, that is, the host
is ``abc``, the path is ``/``, and the fragment is ``#@def``.
|
| |
|
|
|
| |
* correct parse_qs and parse_qsl test case descriptions.
|
|
|
|
|
|
|
|
|
|
| |
* bpo-16285: Update urllib quoting to RFC 3986
urllib.parse.quote is now based on RFC 3986, and hence
includes `'~'` in the set of characters that is not escaped
by default.
Patch by Christian Theune and Ratnadeep Debnath.
|
| |
|
|\ |
|
| |
| |
| |
| | |
Patch by Gergely Imreh and Markus Holtermann.
|