path: root/Lib/urllib/parse.py
Commit message | Author | Age | Files | Lines
* gh-126662: Harmonize naming for three namedtuple base classes in urllib.parse (GH-126663) | Stephen Morton | 2024-11-24 | 1 | -3/+3
* gh-116897: Deprecate generic false values in urllib.parse.parse_qsl() (GH-116903) | Serhiy Storchaka | 2024-11-12 | 1 | -8/+17
    Accepting objects with false values (like 0 and []), other than empty strings, bytes-like objects and None, in the urllib.parse functions parse_qsl() and parse_qs() is now deprecated.
* gh-125926: Fix urllib.parse.urljoin() for base URI with undefined authority (GH-125989) | Serhiy Storchaka | 2024-11-07 | 1 | -2/+2
    Although this goes beyond the application of RFC 3986, urljoin() should support relative base URIs for backward compatibility.
* gh-76960: Fix urljoin() and urldefrag() for URIs with empty components (GH-123273) | Serhiy Storchaka | 2024-08-31 | 1 | -38/+62
    * urljoin() with relative reference "?" sets empty query and removes fragment.
    * Preserve empty components (authority, params, query, fragment) in urljoin().
    * Preserve empty components (authority, params, query) in urldefrag().
    Also refactor the code and get rid of double _coerce_args() and _coerce_result() calls in urljoin(), urldefrag(), urlparse() and urlunparse().
* gh-85110: Preserve relative path in URL without netloc in urllib.parse.urlunsplit() (GH-123179) | Serhiy Storchaka | 2024-08-21 | 1 | -2/+6
* gh-118827: Remove `Quoter` from `urllib.parse` (#118828) | Nikita Sobolev | 2024-06-03 | 1 | -8/+0
    Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
    Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
* gh-67693: Fix urlunparse() and urlunsplit() for URIs with path starting with multiple slashes and no authority (GH-113563) | Serhiy Storchaka | 2024-05-14 | 1 | -1/+1
* gh-116764: Fix regressions in urllib.parse.parse_qsl() (GH-116801) | Serhiy Storchaka | 2024-03-16 | 1 | -1/+5
    * Restore support of None and other false values.
    * Raise TypeError for non-zero integers and non-empty sequences.
    The regressions were introduced in gh-74668 (bdba8ef42b15e651dc23374a08143cc2b4c4657d).
* gh-74668: Fix support of bytes in urllib.parse.parse_qsl() (GH-115771) | Serhiy Storchaka | 2024-03-05 | 1 | -24/+26
    urllib.parse functions parse_qs() and parse_qsl() now support bytes arguments containing raw and percent-encoded non-ASCII data.
* GH-104554: Add RTSPS support to `urllib/parse.py` (#104605) | zentarim | 2023-06-13 | 1 | -5/+5
    RTSPS is the permanent scheme defined in https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml alongside the RTSP and RTSPU schemes.
    * 📜🤖 Added by blurb_it.
    Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
* gh-102153: Start stripping C0 control and space chars in `urlsplit` (#102508) | Illia Volochii | 2023-05-17 | 1 | -0/+12
    `urllib.parse.urlsplit` already follows the WHATWG spec in part (#25595). This adds more sanitizing to respect the "Remove any leading C0 control or space from input" [rule](https://url.spec.whatwg.org/#url-parsing:~:text=Remove%20any%20leading%20and%20trailing%20C0%20control%20or%20space%20from%20input.) in response to [CVE-2023-24329](https://nvd.nist.gov/vuln/detail/CVE-2023-24329).
    Co-authored-by: Gregory P. Smith [Google] <greg@krypto.org>
* gh-103848: Adds checks to ensure that bracketed hosts found by urlsplit are of IPv6 or IPvFuture format (#103849) | JohnJamesUtley | 2023-05-10 | 1 | -1/+15
    Co-authored-by: Gregory P. Smith <greg@krypto.org>
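A small sketch of how the bracketed-host check surfaces to callers. The helper name `bracketed_host` is hypothetical; the rejection of the second URL is only expected on interpreters that include this change.

```python
from urllib.parse import urlsplit

def bracketed_host(url):
    """Return the parsed hostname, or None if urlsplit() rejects the URL."""
    try:
        return urlsplit(url).hostname
    except ValueError:
        return None

# A well-formed IPv6 literal is accepted.
ipv6_host = bracketed_host("http://[::1]:8080/")

# A bracketed host that is neither IPv6 nor IPvFuture; interpreters with
# the check raise ValueError, so the helper returns None there.
bad_host = bracketed_host("http://[not-an-ip]/")
print(ipv6_host, bad_host)
```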
* gh-104139: Add itms-services to uses_netloc urllib.parse. (#104312) | Gregory P. Smith | 2023-05-09 | 1 | -1/+1
    Teach unsplit to retain the `"//"` when assembling `itms-services://?action=generate-bugs` style [Apple Platform Deployment](https://support.apple.com/en-gb/guide/deployment/depce7cefc4d/web) URLs.
* gh-88500: Reduce memory use of `urllib.unquote` (#96763) | Gregory P. Smith | 2022-12-11 | 1 | -11/+19
    `urllib.unquote_to_bytes` and `urllib.unquote` could both potentially generate `O(len(string))` intermediate `bytes` or `str` objects while computing the unquoted final result, depending on the input provided. As Python objects are relatively large, this could consume a lot of RAM.
    This switches the implementation to using an expanding `bytearray` and a generator internally instead of precomputed `split()` style operations.
    Microbenchmarks with some antagonistic inputs like `mess = "\u0141%%%20a%fe"*1000` show this is 10-20% slower for unquote and unquote_to_bytes, and no different for typical inputs that are short or lack much unicode or % escaping. The functions are already quite fast, so this is not a big deal. The slowdown scales linearly with input size, as expected.
    Memory usage was observed manually using `/usr/bin/time -v` on `python -m timeit` runs of larger inputs. Unit-testing memory consumption is difficult and does not seem worthwhile.
    Observed memory usage is ~1/2 for `unquote()` and <1/3 for `unquote_to_bytes()` using `python -m timeit -s 'from urllib.parse import unquote, unquote_to_bytes; v="\u0141%01\u0161%20"*500_000' 'unquote_to_bytes(v)'` as a test.
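For orientation, the two functions whose internals this commit changes behave as follows from the caller's side. This is a usage sketch only; it does not reproduce the bytearray-based implementation.

```python
from urllib.parse import unquote, unquote_to_bytes

# Percent-escapes are decoded as UTF-8 by default.
text = unquote("caf%C3%A9%20au%20lait")

# unquote_to_bytes() returns the raw bytes without decoding them.
raw = unquote_to_bytes("caf%C3%A9")

# Malformed escapes are left untouched rather than raising.
untouched = unquote("100%zz")
print(text, raw, untouched)
```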
* gh-99418: Make urllib.parse.urlparse enforce that a scheme must begin with an alphabetical ASCII character. (#99421) | Ben Kallus | 2022-11-13 | 1 | -1/+1
    Prevent urllib.parse.urlparse from accepting schemes that don't begin with an alphabetical ASCII character.
    RFC 3986 defines a scheme like this: `scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )`
    RFC 2234 defines an ALPHA like this: `ALPHA = %x41-5A / %x61-7A`
    The WHATWG URL spec defines a scheme like this: `"A URL-scheme string must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, U+002B (+), U+002D (-), and U+002E (.)."`
* gh-96035: Make urllib.parse.urlparse reject non-numeric ports (#98273) | Ben Kallus | 2022-10-20 | 1 | -9/+8
    Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
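A defensive way to read a port, sketched under the assumption that the ValueError may be raised either while parsing or when `.port` is accessed, depending on the Python version. The helper name `port_or_none` is hypothetical.

```python
from urllib.parse import urlsplit

def port_or_none(url):
    """Return the integer port, or None if it is absent or invalid."""
    try:
        # Both the parse and the attribute access sit inside the try block,
        # because the point at which the error is raised has changed over time.
        return urlsplit(url).port
    except ValueError:
        return None

good_port = port_or_none("http://example.com:8080/")
bad_port = port_or_none("http://example.com:http/")
print(good_port, bad_port)
```

Note that this helper cannot distinguish a missing port from an invalid one; callers that need the difference should catch ValueError themselves.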
* gh-95865: Further reduce quote_from_bytes memory consumption on large input values (#96860) | Gregory P. Smith | 2022-09-19 | 1 | -1/+9
    Based on Dennis Sweeney's chunking idea.
* gh-95865: Speed up urllib.parse.quote_from_bytes() (GH-95872) | Dennis Sweeney | 2022-08-31 | 1 | -1/+1
* gh-84623: Remove unused imports in stdlib (#93773) | Victor Stinner | 2022-06-13 | 1 | -1/+0
* Replace with_traceback() with exception chaining and reraising (GH-32074) | Oleg Iarygin | 2022-03-30 | 1 | -3/+2
* bpo-45874: Handle empty query string correctly in urllib.parse.parse_qsl (#29716) | Christian Sattler | 2021-12-12 | 1 | -2/+3
* bpo-44002: Switch to lru_cache in urllib.parse. (GH-25798) | Gregory P. Smith | 2021-05-12 | 1 | -29/+29
    urllib.parse now uses functools.lru_cache for its internal URL splitting and quoting caches instead of rolling its own like it's the 90s.
    The undocumented internal Quoter class API is now deprecated, as it had no reason to be public and no existing OSS users were found.
    The clear_cache() API remains undocumented but gets an explicit test, as it is used in a few projects' (twisted, gevent) tests as well as our own regrtest.
* bpo-43882 Remove the newline, and tab early. From query and fragments. (GH-25921) | Senthil Kumaran | 2021-05-05 | 1 | -3/+5
* bpo-43979: Remove unnecessary operation from urllib.parse.parse_qsl (GH-25756) | Dong-hee Na | 2021-04-30 | 1 | -2/+1
    Automerge-Triggered-By: GH:gpshead
* bpo-43882 - urllib.parse should sanitize urls containing ASCII newline and tabs. (GH-25595) | Senthil Kumaran | 2021-04-29 | 1 | -0/+6
    Co-authored-by: Gregory P. Smith <greg@krypto.org>
    Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
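The sanitizing described here can be observed directly. The sketch below uses an invented URL with an embedded newline, tab and carriage return, and assumes an interpreter that includes the bpo-43882 change.

```python
from urllib.parse import urlsplit

# ASCII newline, tab and carriage return hidden inside the URL.
dirty = "https://exam\nple.com/pa\tth?q=1\r"

parts = urlsplit(dirty)
# The three characters are removed before the URL is split into components.
netloc, path, query = parts.netloc, parts.path, parts.query
print(netloc, path, query)
```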
* bpo-42967: coerce bytes separator to string in urllib.parse_qs(l) (#24818) | Ken Jin | 2021-04-11 | 1 | -0/+1
    * coerce bytes separator to string
    * Add news
    * Update Misc/NEWS.d/next/Library/2021-03-11-00-31-41.bpo-42967.2PeQRw.rst
* bpo-42967: Fix urllib.parse docs and make logic clearer (GH-24536) | Ken Jin | 2021-02-15 | 1 | -2/+1
* bpo-42967: only use '&' as a query string separator (#24297) | Adam Goldschmidt | 2021-02-14 | 1 | -5/+15
    bpo-42967: [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl().
    urllib.parse will only use "&" as the query string separator by default, instead of both ";" and "&" as allowed in earlier versions. An optional argument `separator` with default value "&" is added to specify the separator.
    Co-authored-by: Éric Araujo <merwok@netwok.org>
    Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
    Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com>
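The effect of the new default can be sketched as follows, assuming an interpreter that includes this change. The query string is an invented example that mixes both separators.

```python
from urllib.parse import parse_qsl

query = "a=1;b=2&c=3"

# Default: only "&" separates fields, so ";" stays inside the value of "a".
default_pairs = parse_qsl(query)

# Opting in to ";" makes it the only separator; "&" is then ordinary data.
semicolon_pairs = parse_qsl(query, separator=";")

print(default_pairs)
print(semicolon_pairs)
```

Only one separator is honoured per call, which is the point of the fix: a cache and the application can no longer disagree about where fields end.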
* bpo-39481: PEP 585 for a variety of modules (GH-19423) | Batuhan Taşkaya | 2020-04-10 | 1 | -0/+3
    - concurrent.futures
    - ctypes
    - http.cookies
    - multiprocessing
    - queue
    - tempfile
    - unittest.case
    - urllib.parse
* bpo-37970: update and improve urlparse and urlsplit doc-strings (GH-16458) | idomic | 2020-02-16 | 1 | -6/+35
* bpo-39057: Fix urllib.request.proxy_bypass_environment(). (GH-17619) | Serhiy Storchaka | 2020-01-05 | 1 | -2/+2
    Ignore leading dots and no longer ignore a trailing newline.
* bpo-27657: Fix urlparse() with numeric paths (#661) | Tim Graham | 2019-10-18 | 1 | -21/+1
    * bpo-27657: Fix urlparse() with numeric paths
      Revert parsing decision from bpo-754016 in favor of the documented consensus in bpo-16932 of how to treat strings without a // to designate the netloc.
    * bpo-22891: Remove urlsplit() optimization for 'http' prefixed inputs.
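The consensus referred to above (no `//` means no netloc) can be illustrated with a short sketch, assuming an interpreter that includes this change.

```python
from urllib.parse import urlparse

# Without "//", the text before ":" is read as the scheme and the
# digits after it as the path, not as host and port.
no_slashes = urlparse("path:80")

# With a leading "//", the same shape is read as host and port.
with_slashes = urlparse("//www.example.com:8080")

print(no_slashes)
print(with_slashes)
```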
* bpo-32498: urllib.parse.unquote also accepts bytes (GH-7768) | Stein Karlsen | 2019-10-14 | 1 | -0/+2
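A brief sketch of the behaviour this commit adds: unquote() takes a bytes argument and still returns a str.

```python
from urllib.parse import unquote

# A bytes input is accepted and the result is decoded to str (UTF-8 by default).
from_bytes = unquote(b"a%20b%C3%A9")
from_str = unquote("a%20b%C3%A9")
print(from_bytes, from_str)
```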
* bpo-36742: Corrects fix to handle decomposition in usernames (#13812) | Steve Dower | 2019-06-04 | 1 | -3/+3
* bpo-35397: Remove deprecation and document urllib.parse.unwrap (GH-11481) | Rémi Lapeyre | 2019-05-27 | 1 | -7/+5
* bpo-36742: Fixes handling of pre-normalization characters in urlsplit() (GH-13017) | Steve Dower | 2019-04-30 | 1 | -4/+7
* bpo-12910: update and correct quote docstring (#2568) | Jörn Hees | 2019-04-10 | 1 | -13/+20
    Fixes some mistakes and misleading statements in the quote function docstring:
    - reserved chars are never actually used by the quote code; unreserved chars are
    - reserved chars were wrong and incomplete
    - mentioned that the use case is not minimal quoting with respect to the RFC, but cautious quoting
* bpo-36216: Add check for characters in netloc that normalize to separators (GH-12201) | Steve Dower | 2019-03-07 | 1 | -0/+17
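The normalization check can be demonstrated with a character that NFKC turns into a URL separator. The sketch uses U+FF03 (FULLWIDTH NUMBER SIGN), which normalizes to "#"; the host names are invented.

```python
import unicodedata
from urllib.parse import urlsplit

# Looks like one netloc to the parser, but NFKC turns the fullwidth
# sign into "#", which would move "@evil.example" into the fragment.
tricky = "http://example.com\uff03@evil.example/"

normalizes_to_hash = unicodedata.normalize("NFKC", "\uff03") == "#"

try:
    urlsplit(tricky)
    rejected = False
except ValueError:
    rejected = True

print(normalizes_to_hash, rejected)
```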
* bpo-34866: Adding max_num_fields to cgi.FieldStorage (GH-9660) | matthewbelisle-wf | 2018-10-19 | 1 | -3/+19
    Adding `max_num_fields` to `cgi.FieldStorage` to make DOS attacks harder by limiting the number of `MiniFieldStorage` objects created by `FieldStorage`.
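The commit message only mentions `cgi.FieldStorage`, but `parse_qs()` and `parse_qsl()` in this file accept a `max_num_fields` parameter as well. A minimal sketch of the limit in action, with an arbitrary limit of 2:

```python
from urllib.parse import parse_qsl

query = "a=1&b=2&c=3"

# Within the limit: parsed normally.
within_limit = parse_qsl(query, max_num_fields=3)

# Over the limit: a ValueError is raised instead of building the list.
try:
    parse_qsl(query, max_num_fields=2)
    limit_enforced = False
except ValueError:
    limit_enforced = True

print(within_limit, limit_enforced)
```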
* bpo-27485: Rename and deprecate undocumented functions in urllib.parse (GH-2205) | Cheryl Sabella | 2018-04-25 | 1 | -4/+99
* bpo-33034: Improve exception message when cast fails for {Parse,Split}Result.port (GH-6078) | Matt Eaton | 2018-03-20 | 1 | -1/+5
* bpo-32323: urllib.parse.urlsplit() must not lowercase() IPv6 scope value (#4867) | Коренберг Марк | 2017-12-21 | 1 | -4/+6
* remove a redundant lower in urllib.parse.urlsplit (#3008) | Oren Milman | 2017-09-03 | 1 | -2/+1
* urllib: Simplify splithost by calling into urlparse. (#1849) | postmasters | 2017-06-20 | 1 | -1/+1
    The current regex based splitting produces a wrong result. For example::
        http://abc#@def
    Web browsers parse that URL as ``http://abc/#@def``, that is, the host is ``abc``, the path is ``/``, and the fragment is ``#@def``.
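The example URL from this commit message can be run through urlsplit() to see the host and fragment split. Note that urlsplit() reports the fragment without its leading "#" and, unlike a browser, leaves the path empty.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://abc#@def")

# The "#" ends the authority, so "@def" is fragment text, not a host.
host, path, fragment = parts.hostname, parts.path, parts.fragment
print(host, repr(path), fragment)
```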
* bpo-29976: urllib.parse clarify '' in scheme values. (GH-984) | Senthil Kumaran | 2017-05-18 | 1 | -11/+19
* correct parse_qs and parse_qsl test case descriptions. (#968) | Senthil Kumaran | 2017-04-05 | 1 | -13/+17
* bpo-16285: Update urllib quoting to RFC 3986 (#173) | Ratnadeep Debnath | 2017-02-25 | 1 | -3/+6
    urllib.parse.quote is now based on RFC 3986, and hence includes `'~'` in the set of characters that is not escaped by default.
    Patch by Christian Theune and Ratnadeep Debnath.
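A short sketch of the default quoting after this change: "~" passes through, "/" is kept because it is in the default `safe` set, and a space is escaped. The path is an invented example.

```python
from urllib.parse import quote

quoted = quote("~user/my file.txt")

# Passing safe="" additionally escapes "/", but "~" is still left alone.
quoted_strict = quote("~user/my file.txt", safe="")

print(quoted)
print(quoted_strict)
```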
* Issue #28992: Use bytes.fromhex(). | Serhiy Storchaka | 2016-12-21 | 1 | -1/+1
* Issue #25895: Merge from 3.5 | Berker Peksag | 2016-09-16 | 1 | -2/+3
|\
| * Issue #25895: Enable WebSocket URL schemes in urllib.parse.urljoin | Berker Peksag | 2016-09-16 | 1 | -2/+3
|     Patch by Gergely Imreh and Markus Holtermann.
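With the WebSocket schemes enabled, urljoin() resolves relative references against `ws://` and `wss://` bases the same way it does for `http://`. A minimal sketch with invented URLs:

```python
from urllib.parse import urljoin

# A relative reference replaces the last path segment of the base.
ws_joined = urljoin("ws://example.com/chat/room", "lobby")

# Dot segments are resolved as well.
wss_joined = urljoin("wss://example.com/a/b", "../c")

print(ws_joined)
print(wss_joined)
```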