path: root/Lib/urllib
Commit message | Author | Age | Files | Lines
* GH-127236: `pathname2url()`: generate RFC 1738 URL for absolute POSIX path (#127194) | Barney Gale | 2024-11-25 | 1 | -3/+5
  When handed an absolute Windows path such as `C:\foo` or `//server/share`, the `urllib.request.pathname2url()` function returns a URL with an authority section, such as `///C:/foo` or `//server/share` (or before GH-126205, `////server/share`). Only the `file:` prefix is omitted.
  But when handed an absolute POSIX path such as `/etc/hosts`, or a Windows path of the same form (rooted but lacking a drive), the function returns a URL without an authority section, such as `/etc/hosts`.
  This patch corrects the discrepancy by adding a `//` prefix before drive-less, rooted paths when generating URLs.
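GH-127236 in miniature: a sketch of the new rule for POSIX paths only. `sketch_pathname2url` is a hypothetical helper, not the stdlib function; it ignores Windows drives and relies on `urllib.parse.quote`, which leaves `/` unescaped by default.

```python
from urllib.parse import quote


def sketch_pathname2url(path: str) -> str:
    """Hypothetical POSIX-only approximation of pathname2url() after GH-127236."""
    if path.startswith("/"):
        # A rooted, drive-less path gets an explicitly empty authority ("//"),
        # so "/etc/hosts" becomes "///etc/hosts" rather than "/etc/hosts".
        path = "//" + path
    # quote() keeps "/" as-is by default and percent-encodes everything unsafe.
    return quote(path)


print(sketch_pathname2url("/etc/hosts"))
print(sketch_pathname2url("//server/share"))
print(sketch_pathname2url("docs/a b.txt"))
```

Prefixing `file:` to the first result gives `file:///etc/hosts`, the RFC 1738 form with an empty host.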
* gh-127217: Fix pathname2url() for paths starting with multiple slashes on Posix (GH-127218) | Serhiy Storchaka | 2024-11-24 | 1 | -0/+4
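Why leading slashes matter for gh-127217: in a URL, two slashes after the scheme introduce an authority. The snippet uses the real `urllib.parse.urlsplit` to show how a POSIX path beginning with `//` is misread when written straight after `file:`, and how an extra `//` (an empty authority) keeps it intact. The path `//etc/hosts` is only an illustration.

```python
from urllib.parse import urlsplit

path = "//etc/hosts"  # illustrative POSIX path with two leading slashes

naive = urlsplit("file:" + path)         # parses "file://etc/hosts"
fixed = urlsplit("file:" + "//" + path)  # parses "file:////etc/hosts"

# Naively, "etc" is taken as the host and the path loses a component.
print(naive.netloc, naive.path)
# With an explicit empty authority, the whole path survives.
print(repr(fixed.netloc), fixed.path)
```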
* gh-126662: harmonize naming for three namedtuple base classes in urllib.parse (GH-126663) | Stephen Morton | 2024-11-24 | 1 | -3/+3
* GH-126766: `url2pathname()`: handle 'localhost' authority (#127129) | Barney Gale | 2024-11-22 | 1 | -0/+3
  Discard any 'localhost' authority from the beginning of a `file:` URI. As a result, file URIs like `//localhost/etc/hosts` are correctly decoded as `/etc/hosts`.
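Both GH-126766 commits strip an authority that carries no information: an empty one and 'localhost'. A POSIX-only sketch of that decoding step; `sketch_url2pathname` is hypothetical, matches 'localhost' case-sensitively, and leaves any other authority (such as `//server/share`) untouched.

```python
from urllib.parse import unquote


def sketch_url2pathname(url: str) -> str:
    """Hypothetical POSIX-only approximation of url2pathname() after GH-126766."""
    if url.startswith("///"):
        # Empty authority: "///etc/hosts" -> "/etc/hosts"
        url = url[2:]
    elif url.startswith("//localhost/"):
        # 'localhost' authority: "//localhost/etc/hosts" -> "/etc/hosts"
        url = url[len("//localhost"):]
    # Percent-escapes are decoded last, e.g. "%20" -> " ".
    return unquote(url)
```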
* GH-85168: Use filesystem encoding when converting to/from `file` URIs (#126852) | Barney Gale | 2024-11-19 | 1 | -2/+6
  Adjust `urllib.request.url2pathname()` and `pathname2url()` to use the filesystem encoding when quoting and unquoting file URIs, rather than forcing use of UTF-8.
  No changes are needed in the `nturl2path` module because Windows always uses UTF-8, per PEP 529.
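A sketch of quoting with the filesystem encoding instead of a hard-coded UTF-8. The helper names are hypothetical; `sys.getfilesystemencoding()` and `sys.getfilesystemencodeerrors()` supply the codec and error handler, so for ordinary paths on a UTF-8 system the output is unchanged.

```python
import sys
from urllib.parse import quote, unquote

FS_ENCODING = sys.getfilesystemencoding()
FS_ERRORS = sys.getfilesystemencodeerrors()


def path_to_url_path(path: str) -> str:
    # Percent-encode using the filesystem codec rather than always UTF-8.
    return quote(path, encoding=FS_ENCODING, errors=FS_ERRORS)


def url_path_to_path(url_path: str) -> str:
    # Decode percent-escapes with the same codec and error handler.
    return unquote(url_path, encoding=FS_ENCODING, errors=FS_ERRORS)
```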
* GH-84850: Remove `urllib.request.URLopener` and `FancyURLopener` (#125739) | Barney Gale | 2024-11-19 | 1 | -684/+7
* GH-126766: `url2pathname()`: handle empty authority section. (#126767) | Barney Gale | 2024-11-14 | 1 | -0/+4
  Discard two leading slashes from the beginning of a `file:` URI if they introduce an empty authority section. As a result, file URIs like `///etc/hosts` are correctly parsed as `/etc/hosts`.
* gh-116897: Deprecate generic false values in urllib.parse.parse_qsl() (GH-116903) | Serhiy Storchaka | 2024-11-12 | 1 | -8/+17
  Accepting objects with false values (like 0 and []), other than empty strings, empty bytes-like objects and None, in the urllib.parse functions parse_qsl() and parse_qs() is now deprecated.
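A sketch of the gh-116897 argument check. `sketch_is_empty_query` and its warning text are hypothetical, not the message urllib.parse emits; the sketch only decides whether an argument counts as an empty query, and warns for false values outside the accepted set.

```python
import warnings

_BYTES_LIKE = (bytes, bytearray, memoryview)


def sketch_is_empty_query(qs) -> bool:
    """Return True if qs should be parsed as an empty query.

    None, "" and empty bytes-like objects are accepted silently; any other
    false value (0, [], ...) still counts as empty but is deprecated.
    """
    if qs:
        return False
    if qs is None or isinstance(qs, (str, *_BYTES_LIKE)):
        return True
    warnings.warn(
        f"accepting {type(qs).__name__} objects with false value is deprecated",
        DeprecationWarning,
        stacklevel=2,
    )
    return True
```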
* gh-125926: Fix urllib.parse.urljoin() for base URI with undefined authority (GH-125989) | Serhiy Storchaka | 2024-11-07 | 1 | -2/+2
  Although this goes beyond the application of RFC 3986, urljoin() should support relative base URIs for backward compatibility.
* gh-76960: Fix urljoin() and urldefrag() for URIs with empty components (GH-123273) | Serhiy Storchaka | 2024-08-31 | 1 | -38/+62
  * urljoin() with relative reference "?" sets empty query and removes fragment.
  * Preserve empty components (authority, params, query, fragment) in urljoin().
  * Preserve empty components (authority, params, query) in urldefrag().
  Also refactor the code and get rid of double _coerce_args() and _coerce_result() calls in urljoin(), urldefrag(), urlparse() and urlunparse().
* gh-85110: Preserve relative path in URL without netloc in urllib.parse.urlunsplit() (GH-123179) | Serhiy Storchaka | 2024-08-21 | 1 | -2/+6
* gh-122909: Pass ftp error strings to URLError constructor (#122913) | Jeremy Hylton | 2024-08-20 | 1 | -1/+1
  * pass the original string error message from the ftplib error to URLError()
  * Update request.py
    Change error string for ftp error to be consistent with other errors reported for ftp
  * Add NEWS entry for change to urllib.request for ftp errors.
  * Track the change in the ftp error message in the test.
* gh-120417: Add #noqa to used imports in the stdlib (#120421) | Victor Stinner | 2024-06-13 | 1 | -1/+1
  Tools such as ruff can ignore "imported but unused" warnings if a line ends with "# noqa: F401". This avoids the temptation to remove an import that is actually used.
* gh-118827: Remove `Quoter` from `urllib.parse` (#118828) | Nikita Sobolev | 2024-06-03 | 1 | -8/+0
  Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
  Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
* gh-67693: Fix urlunparse() and urlunsplit() for URIs with path starting with multiple slashes and no authority (GH-113563) | Serhiy Storchaka | 2024-05-14 | 1 | -1/+1
* gh-99730: urllib.request: Keep HEAD method on redirect (GH-99731) | Harmen Stoppels | 2024-05-01 | 1 | -0/+1
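A sketch of how a redirect handler might choose the method of the follow-up request after gh-99730. `sketch_redirect_method` is hypothetical, and its table of status codes and the POST-to-GET rewrite are assumptions about common redirect handling rather than a copy of `urllib.request.HTTPRedirectHandler`.

```python
def sketch_redirect_method(code: int, method: str):
    """Return the method for the follow-up request, or None to stop.

    Hypothetical helper; the code/method table is an assumption.
    """
    if method in ("GET", "HEAD") and code in (301, 302, 303, 307, 308):
        # Safe methods are repeated as-is: HEAD stays HEAD (gh-99730).
        return method
    if method == "POST" and code in (301, 302, 303):
        # The body is dropped and the new request is a GET.
        return "GET"
    # Anything else (e.g. POST answered with 307/308) is not followed.
    return None
```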
* gh-116764: Fix regressions in urllib.parse.parse_qsl() (GH-116801) | Serhiy Storchaka | 2024-03-16 | 1 | -1/+5
  * Restore support of None and other false values.
  * Raise TypeError for non-zero integers and non-empty sequences.
  The regressions were introduced in gh-74668 (bdba8ef42b15e651dc23374a08143cc2b4c4657d).
* gh-74668: Fix support of bytes in urllib.parse.parse_qsl() (GH-115771) | Serhiy Storchaka | 2024-03-05 | 1 | -24/+26
  urllib.parse functions parse_qs() and parse_qsl() now support bytes arguments containing raw and percent-encoded non-ASCII data.
* gh-115197: Stop resolving host in urllib.request proxy bypass (GH-115210) | Weii Wang | 2024-02-28 | 1 | -42/+35
  Use of a proxy is intended to defer DNS resolution of the target hosts to the proxy itself; resolving them locally, for any reason, creates a potential information leak. Proxy bypass lists are strictly name-based, and most proxy implementations agree.
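A sketch of a purely name-based bypass check in the spirit of gh-115197. `sketch_should_bypass` is hypothetical and is not the urllib.request code; it assumes glob-style exception patterns and an "exclude simple host names" flag. An IP literal such as `10.0.0.5` is matched as text, and no host name is ever resolved.

```python
from fnmatch import fnmatchcase


def sketch_should_bypass(host, patterns, exclude_simple=False):
    """Hypothetical name-based proxy bypass check; performs no DNS lookups.

    host is a bare host name or IP literal without a port.
    """
    host = host.lower()
    if exclude_simple and "." not in host:
        # "Simple" names such as "intranet" contain no dots.
        return True
    # Compare the name as text against each glob pattern.
    return any(fnmatchcase(host, pattern.lower()) for pattern in patterns)
```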
* gh-91539: Small performance improvement of urllib.request.getproxies_environment() (#108771) | Raphaël Marinier | 2024-01-15 | 1 | -1/+1
  Small performance improvement of getproxies_environment() when there are many environment variables. In a benchmark with 5k environment variables not related to proxies, and 5 specifying proxies, we get a 10% walltime improvement.
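A sketch of a two-pass environment scan like the one the gh-91539 commits optimize, taking a mapping instead of reading `os.environ`. `sketch_proxies_from_env` is hypothetical and omits any special cases the real `getproxies_environment()` may apply. The cheap `name[-6] == "_"` test rejects most unrelated variables before any `lower()` call.

```python
def sketch_proxies_from_env(environ):
    """Hypothetical approximation of getproxies_environment() over a mapping."""
    proxies = {}
    # Pass 1: accept *_PROXY in any case. The length and "_" checks are cheap
    # and filter out most names before the 5-character lower() comparison.
    for name, value in environ.items():
        if len(name) > 5 and name[-6] == "_" and name[-5:].lower() == "proxy":
            proxies[name[:-6].lower()] = value
    # Pass 2: exact lowercase names win, so http_proxy overrides HTTP_PROXY;
    # an empty lowercase value removes the entry.
    for name, value in environ.items():
        if name[-6:] == "_proxy":
            if value:
                proxies[name[:-6]] = value
            else:
                proxies.pop(name[:-6], None)
    return proxies
```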
* GH-104554: Add RTSPS support to `urllib/parse.py` (#104605) | zentarim | 2023-06-13 | 1 | -5/+5
  RTSPS is the permanent scheme defined in https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml alongside the RTSP and RTSPU schemes.
  * 📜🤖 Added by blurb_it.
  Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
* gh-105382: Remove urllib.request cafile parameter (#105384) | Victor Stinner | 2023-06-06 | 1 | -28/+2
  Remove cafile, capath and cadefault parameters of the urllib.request.urlopen() function, deprecated in Python 3.6.
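With cafile, capath and cadefault gone from `urlopen()` (gh-105382), CA settings can instead travel in an `ssl.SSLContext` passed as its `context` argument. A sketch, assuming the defaults of `ssl.create_default_context()` are acceptable; `make_tls_context` and `fetch` are hypothetical helpers.

```python
import ssl
import urllib.request


def make_tls_context(cafile=None, capath=None) -> ssl.SSLContext:
    # create_default_context() takes the same cafile/capath values that
    # urlopen() used to accept directly, and enables hostname checking.
    return ssl.create_default_context(cafile=cafile, capath=capath)


def fetch(url, cafile=None, capath=None, timeout=10.0) -> bytes:
    """Hypothetical helper: open url, verifying TLS against the given CAs."""
    context = make_tls_context(cafile, capath)
    with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
        return response.read()
```

`fetch()` needs network access, e.g. `fetch("https://example.org/", cafile="/path/to/ca.pem")`, where the path is only a placeholder.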
* gh-102153: Start stripping C0 control and space chars in `urlsplit` (#102508) | Illia Volochii | 2023-05-17 | 1 | -0/+12
  `urllib.parse.urlsplit` has already been respecting the WHATWG spec to some extent (#25595). This adds more sanitizing to respect the "Remove any leading C0 control or space from input" [rule](https://url.spec.whatwg.org/#url-parsing:~:text=Remove%20any%20leading%20and%20trailing%20C0%20control%20or%20space%20from%20input.) in response to [CVE-2023-24329](https://nvd.nist.gov/vuln/detail/CVE-2023-24329).
  Co-authored-by: Gregory P. Smith [Google] <greg@krypto.org>
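The WHATWG rule cited in gh-102153 treats U+0000 through U+001F and U+0020 SPACE as strippable. A sketch of the leading-strip step; `sketch_strip_leading_c0` is a hypothetical helper, and the real `urlsplit` may do more.

```python
from urllib.parse import urlsplit

# U+0000..U+001F (C0 controls) plus U+0020 SPACE.
C0_CONTROL_OR_SPACE = "".join(chr(i) for i in range(0x21))


def sketch_strip_leading_c0(url: str) -> str:
    """Hypothetical helper: drop leading C0 controls and spaces."""
    return url.lstrip(C0_CONTROL_OR_SPACE)


cleaned = sketch_strip_leading_c0("\x00\x1f  https://example.com/")
print(urlsplit(cleaned).scheme)
```

Without the strip, a leading space or control character keeps the text before `:` from being read as a scheme.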
* gh-103848: Adds checks to ensure that bracketed hosts found by urlsplit are of IPv6 or IPvFuture format (#103849) | JohnJamesUtley | 2023-05-10 | 1 | -1/+15
  Co-authored-by: Gregory P. Smith <greg@krypto.org>
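A sketch of the gh-103848 check applied to the text between the brackets. `sketch_check_bracketed_host` is hypothetical; it validates IPvFuture against the RFC 3986 grammar `"v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )`, which may be stricter than what urlsplit accepts, and uses `ipaddress.ip_address` for the IPv6 case.

```python
import ipaddress
import re

# RFC 3986 IPvFuture; "v" is case-insensitive in ABNF.
_IPVFUTURE = re.compile(r"[vV][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+")


def sketch_check_bracketed_host(hostname: str) -> None:
    """Raise ValueError unless hostname is an IPv6 or IPvFuture literal."""
    if hostname[:1] in ("v", "V"):
        if not _IPVFUTURE.fullmatch(hostname):
            raise ValueError(f"invalid IPvFuture literal: {hostname!r}")
    else:
        # ip_address() itself raises ValueError for non-IP text.
        address = ipaddress.ip_address(hostname)
        if not isinstance(address, ipaddress.IPv6Address):
            raise ValueError(f"only IPv6 is allowed in brackets: {hostname!r}")
```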
* gh-104139: Add itms-services to uses_netloc urllib.parse. (#104312) | Gregory P. Smith | 2023-05-09 | 1 | -1/+1
  Teach unsplit to retain the `"//"` when assembling `itms-services://?action=generate-bugs` style [Apple Platform Deployment](https://support.apple.com/en-gb/guide/deployment/depce7cefc4d/web) URLs.
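Why membership in `uses_netloc` matters for gh-104139: when reassembling a URL, the `//` is only emitted for schemes known to use an authority. A simplified sketch; `NETLOC_SCHEMES` is a hypothetical, trimmed-down stand-in for `urllib.parse.uses_netloc`, and `sketch_urlunsplit` skips edge cases the stdlib handles.

```python
# Hypothetical stand-in for urllib.parse.uses_netloc.
NETLOC_SCHEMES = {"http", "https", "ftp", "file", "itms-services"}


def sketch_urlunsplit(scheme, netloc, path, query, fragment):
    url = path
    if netloc or scheme in NETLOC_SCHEMES:
        # Authority-based scheme: emit "//" even when the netloc is empty.
        if path and path[:1] != "/":
            url = "/" + url
        url = "//" + netloc + url
    if scheme:
        url = scheme + ":" + url
    if query:
        url += "?" + query
    if fragment:
        url += "#" + fragment
    return url
```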
* gh-81403: Fix for CacheFTPHandler in urllib (#13951) | Dan Hemberger | 2023-04-23 | 1 | -0/+6
  bpo-37222: Fix for CacheFTPHandler in urllib
  A call to FTP.ntransfercmd must be followed by FTP.voidresp to clear the "end transfer" message. Without this, the client and server get out of sync, which will result in an error if the FTP instance is reused to open a second URL. This scenario occurs for even the most basic usage of CacheFTPHandler.
  Reverts the patch merged as a resolution to bpo-16270 and adds a test case for the CacheFTPHandler in test_urllib2net.py.
  Co-authored-by: Senthil Kumaran <senthil@python.org>
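The gh-81403 rule as a standalone helper: every `ntransfercmd()` is paired with a `voidresp()`. `retrieve_bytes` is hypothetical and assumes `ftp` is a connected, logged-in `ftplib.FTP`; for simple downloads `FTP.retrbinary()` already does this bookkeeping itself.

```python
import ftplib


def retrieve_bytes(ftp: ftplib.FTP, filename: str, blocksize: int = 8192) -> bytes:
    # Open the data connection; ntransfercmd() returns (socket, expected size).
    conn, _size = ftp.ntransfercmd("RETR " + filename)
    chunks = []
    with conn:
        while data := conn.recv(blocksize):
            chunks.append(data)
    # Consume the end-of-transfer reply (typically 226) so the control
    # connection stays in sync before this FTP object is reused.
    ftp.voidresp()
    return b"".join(chunks)
```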
* gh-99352: Respect `http.client.HTTPConnection.debuglevel` in `urllib.request.AbstractHTTPHandler` (#99353) | Wheeler Law | 2023-04-21 | 1 | -3/+4
  * bugfix: let the HTTP- and HTTPSHandlers respect the value of http.client.HTTPConnection.debuglevel
  * add tests
  * add news
  * ReSTify NEWS and reword a bit.
  * Address Review Comments.
  * Use mock.patch.object instead of setting the module level value.
  * Used test values to assert the debuglevel.
  Co-authored-by: Gregory P. Smith <greg@krypto.org>
  Co-authored-by: Senthil Kumaran <senthil@python.org>
* gh-101936: Update the default value of fp from io.StringIO to io.BytesIO (gh-102100) | Vo Hoang Long | 2023-02-21 | 1 | -1/+1
  Co-authored-by: Long Vo <long.vo@linecorp.com>
* gh-88500: Reduce memory use of `urllib.unquote` (#96763) | Gregory P. Smith | 2022-12-11 | 1 | -11/+19
  `urllib.unquote_to_bytes` and `urllib.unquote` could both potentially generate `O(len(string))` intermediate `bytes` or `str` objects while computing the unquoted final result depending on the input provided. As Python objects are relatively large, this could consume a lot of RAM.
  This switches the implementation to using an expanding `bytearray` and a generator internally instead of precomputed `split()` style operations.
  Microbenchmarks with some antagonistic inputs like `mess = "\u0141%%%20a%fe"*1000` show this is 10-20% slower for unquote and unquote_to_bytes and no different for typical inputs that are short or lack much unicode or % escaping. But the functions are already quite fast anyways so not a big deal. The slowdown scales linearly with input size as expected.
  Memory usage observed manually using `/usr/bin/time -v` on `python -m timeit` runs of larger inputs. Unittesting memory consumption is difficult and does not seem worthwhile.
  Observed memory usage is ~1/2 for `unquote()` and <1/3 for `unquote_to_bytes()` using `python -m timeit -s 'from urllib.parse import unquote, unquote_to_bytes; v="\u0141%01\u0161%20"*500_000' 'unquote_to_bytes(v)'` as a test.
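A sketch of the bytearray idea from gh-88500: decode `%XX` escapes in one pass into a single growing buffer instead of first building a list of split pieces. `sketch_unquote_to_bytes` is hypothetical and not tuned for speed; like `urllib.parse.unquote_to_bytes`, it copies malformed escapes through literally.

```python
_HEXDIGITS = frozenset(b"0123456789abcdefABCDEF")


def sketch_unquote_to_bytes(string) -> bytes:
    """Decode %XX escapes into one growing bytearray (no split() lists)."""
    data = string.encode("utf-8") if isinstance(string, str) else bytes(string)
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        byte = data[i]
        if (byte == 0x25  # "%"
                and i + 2 < n
                and data[i + 1] in _HEXDIGITS
                and data[i + 2] in _HEXDIGITS):
            # int() accepts bytes when a base is given: b"20" -> 32.
            out.append(int(data[i + 1:i + 3], 16))
            i += 3
        else:
            # Literal byte, including a lone or malformed "%".
            out.append(byte)
            i += 1
    return bytes(out)
```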
* gh-98778: Update HTTPError to initialize properly even if fp is None (gh-99966) | Dong-hee Na | 2022-12-08 | 1 | -7/+4
* bpo-45975: Simplify some while-loops with walrus operator (GH-29347) | Nick Drozd | 2022-11-26 | 1 | -8/+2
* gh-99418: Make urllib.parse.urlparse enforce that a scheme must begin with an alphabetical ASCII character. (#99421) | Ben Kallus | 2022-11-13 | 1 | -1/+1
  Prevent urllib.parse.urlparse from accepting schemes that don't begin with an alphabetical ASCII character.
  RFC 3986 defines a scheme like this: `scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )`
  RFC 2234 defines an ALPHA like this: `ALPHA = %x41-5A / %x61-7A`
  The WHATWG URL spec defines a scheme like this: `"A URL-scheme string must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, U+002B (+), U+002D (-), and U+002E (.)."`
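The grammar quoted in gh-99418 maps directly onto a regular expression. `is_valid_scheme` is a hypothetical helper; spelling out `[A-Za-z]` keeps non-ASCII letters such as `ü` out, which `str.isalpha()` would let in.

```python
import re

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), ASCII only.
_SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*")


def is_valid_scheme(scheme: str) -> bool:
    return _SCHEME_RE.fullmatch(scheme) is not None
```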
* gh-96035: Make urllib.parse.urlparse reject non-numeric ports (#98273) | Ben Kallus | 2022-10-20 | 1 | -9/+8
  Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
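A sketch of strict port parsing in the spirit of gh-96035. `sketch_parse_port` and its error messages are hypothetical; it insists on ASCII digits because `int()` alone accepts forms such as `+80`, ` 80` and `8_0`, and `str.isdigit()` alone accepts non-ASCII digits.

```python
def sketch_parse_port(port_text: str):
    """Return None for an empty port, else a validated int."""
    if not port_text:
        return None
    # Require ASCII decimal digits only.
    if not (port_text.isascii() and port_text.isdigit()):
        raise ValueError(f"invalid port: {port_text!r}")
    port = int(port_text)
    if not 0 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port
```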
* bpo-43564: preserve original exception in args of FTP URLError (#24938) | Carl Meyer | 2022-10-10 | 1 | -1/+1
  * bpo-43564: preserve original error in args of FTP URLError
  * Add NEWS blurb
  Co-authored-by: Carl Meyer <carljm@instagram.com>
* gh-91539: improve performance of get_proxies_environment (#91566) | Pieter Eendebak | 2022-10-05 | 1 | -10/+16
  * improve performance of get_proxies_environment when there are many environment variables
  * 📜🤖 Added by blurb_it.
  * fix case of short env name
  * fix formatting
  * fix whitespace
  * whitespace
  * Update Lib/urllib/request.py
  * Update Misc/NEWS.d/next/Library/2022-04-15-11-29-38.gh-issue-91539.7WgVuA.rst
  Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
  Co-authored-by: Carl Meyer <carl@oddbird.net>
* gh-95865: Further reduce quote_from_bytes memory consumption (#96860) | Gregory P. Smith | 2022-09-19 | 1 | -1/+9
  The reduction targets large input values. Based on Dennis Sweeney's chunking idea.
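A sketch of the chunking idea from gh-95865: quote a large `bytes` value in fixed-size slices so that each temporary list of per-byte pieces stays bounded. `sketch_quote_from_bytes` and its `chunk_size` default are assumptions, not the stdlib internals; the always-safe set is the RFC 3986 unreserved characters.

```python
_ALWAYS_SAFE = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_.-~"
)


def sketch_quote_from_bytes(bs: bytes, safe: bytes = b"/", chunk_size: int = 200_000) -> str:
    safe_set = _ALWAYS_SAFE | frozenset(safe)
    # One short string per possible byte value, built once up front.
    table = [chr(b) if b in safe_set else f"%{b:02X}" for b in range(256)]
    # Join fixed-size chunks separately, so the temporary list of per-byte
    # pieces covers one chunk at a time rather than the whole input.
    return "".join(
        "".join(table[b] for b in bs[start:start + chunk_size])
        for start in range(0, len(bs), chunk_size)
    )
```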
* gh-95865: Speed up urllib.parse.quote_from_bytes() (GH-95872) | Dennis Sweeney | 2022-08-31 | 1 | -1/+1
* gh-94172: urllib.request avoids deprecated key_file/cert_file (#94232) | Victor Stinner | 2022-06-26 | 1 | -3/+11
  The urllib.request module no longer uses the deprecated key_file and cert_file parameters of the http.client module.
* gh-94172: urllib.request avoids deprecated check_hostname (#94193) | Victor Stinner | 2022-06-24 | 1 | -2/+6
  The urllib.request module no longer uses the deprecated check_hostname parameter of the http.client module.
  Add private http.client._create_https_context() helper to http.client, used by urllib.request.
  Remove the now redundant check on check_hostname and verify_mode in http.client: the SSLContext.check_hostname setter already implements the check.
* gh-84623: Remove unused imports in stdlib (#93773) | Victor Stinner | 2022-06-13 | 2 | -2/+0
* bpo-42627: Fix incorrect parsing of Windows registry proxy settings (GH-26307) | 狂男风 | 2022-05-11 | 1 | -16/+20
* Replace with_traceback() with exception chaining and reraising (GH-32074) | Oleg Iarygin | 2022-03-30 | 2 | -9/+6
* bpo-46756: Fix authorization check in urllib.request (GH-31353) | Serhiy Storchaka | 2022-02-25 | 1 | -4/+4
  Fix a bug in urllib.request.HTTPPasswordMgr.find_user_password() and urllib.request.HTTPPasswordMgrWithPriorAuth.is_authenticated() which allowed authorization to be bypassed. For example, access to URI "example.org/foobar" was allowed if the user was authorized for URI "example.org/foo".
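The bpo-46756 bug came down to a plain string-prefix test on the path. A sketch of a segment-aware check on the path part only; `sketch_is_sub_path` is hypothetical, and a real password manager also has to compare scheme, host and port.

```python
def sketch_is_sub_path(base_path: str, test_path: str) -> bool:
    """Hypothetical check: is test_path base_path itself or below it?"""
    if base_path == test_path:
        return True
    prefix = base_path if base_path.endswith("/") else base_path + "/"
    # Comparing against "base/" stops "/foo" from matching "/foobar".
    return test_path.startswith(prefix)
```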
* bpo-45874: Handle empty query string correctly in urllib.parse.parse_qsl (#29716) | Christian Sattler | 2021-12-12 | 1 | -2/+3
* bpo-40321: Add missing test, slightly expand documentation (GH-28760) | Łukasz Langa | 2021-10-06 | 1 | -1/+1
* bpo-40321: Support HTTP response status code 308 in urllib.request (#19588) | Jochem Schulenklopper | 2021-10-06 | 1 | -4/+11
  * Support HTTP response status code 308 in urllib.
    HTTP response status code 308 is defined in https://tools.ietf.org/html/rfc7538 to be the permanent redirect variant of 307 (temporary redirect).
  * Update documentation to include http_error_308()
  * Add blurb for bpo-40321 fix
  Co-authored-by: Roland Crosby <roland@rolandcrosby.com>
* Update URLs in comments and metadata to use HTTPS (GH-27458) | Noah Kantrowitz | 2021-07-30 | 1 | -1/+1
* bpo-44002: Switch to lru_cache in urllib.parse. (GH-25798) | Gregory P. Smith | 2021-05-12 | 1 | -29/+29
  urllib.parse now uses functools.lru_cache for its internal URL splitting and quoting caches instead of rolling its own like it's the 90s.
  The undocumented internal Quoter class API is now deprecated, as it had no reason to be public and no existing OSS users were found.
  The clear_cache() API remains undocumented but gets an explicit test as it is used in a few projects' (twisted, gevent) tests as well as our own regrtest.
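The bpo-44002 idea applied from the outside: a sketch wrapping `urllib.parse.urlsplit` in `functools.lru_cache`. `cached_urlsplit` and `clear_cache` here are hypothetical names mirroring the commit description; they are not urllib.parse internals.

```python
import functools
from urllib.parse import urlsplit

# Hypothetical cached wrapper; typed=True keeps str and bytes inputs apart.
cached_urlsplit = functools.lru_cache(maxsize=128, typed=True)(urlsplit)


def clear_cache():
    """Drop all cached results, as a test suite might between runs."""
    cached_urlsplit.cache_clear()
```

Repeated calls with the same URL return the very same `SplitResult` object, and `cached_urlsplit.cache_info()` reports the hits and misses.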
* bpo-43882: Remove the newline and tab early from query and fragments (GH-25921) | Senthil Kumaran | 2021-05-05 | 1 | -3/+5
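A sketch of the bpo-43882 idea: delete tab, carriage return and newline characters from the whole URL before it is split, so they cannot survive inside the query or fragment. `sketch_remove_unsafe` and the exact character set are assumptions for illustration.

```python
# Hypothetical set of characters removed before splitting.
_UNSAFE_URL_CHARS = ("\t", "\r", "\n")


def sketch_remove_unsafe(url: str) -> str:
    for ch in _UNSAFE_URL_CHARS:
        url = url.replace(ch, "")
    return url
```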
* bpo-43979: Remove unnecessary operation from urllib.parse.parse_qsl (GH-25756) | Dong-hee Na | 2021-04-30 | 1 | -2/+1
  Automerge-Triggered-By: GH:gpshead