summaryrefslogtreecommitdiffstats
path: root/Lib/urllib
Commit message (Collapse)AuthorAgeFilesLines
* [3.10] [3.11] gh-102153: Start stripping C0 control and space chars in ↵Miss Islington (bot)2023-05-171-0/+12
| | | | | | | | | | | | | | | | | | | | | | `urlsplit` (GH-102508) (GH-104575) (#104592) gh-102153: Start stripping C0 control and space chars in `urlsplit` (GH-102508) `urllib.parse.urlsplit` has already been respecting the WHATWG spec a bit GH-25595. This adds more sanitizing to respect the "Remove any leading C0 control or space from input" [rule](https://url.spec.whatwg.org/GH-url-parsing:~:text=Remove%20any%20leading%20and%20trailing%20C0%20control%20or%20space%20from%20input.) in response to [CVE-2023-24329](https://nvd.nist.gov/vuln/detail/CVE-2023-24329). I simplified the docs by eliding the state of the world explanatory paragraph in this security release only backport. (people will see that in the mainline /3/ docs) --------- (cherry picked from commit 2f630e1ce18ad2e07428296532a68b11dc66ad10) (cherry picked from commit 610cc0ab1b760b2abaac92bd256b96191c46b941) Co-authored-by: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com> Co-authored-by: Illia Volochii <illia.volochii@gmail.com> Co-authored-by: Gregory P. Smith [Google] <greg@krypto.org>
* [3.10] gh-101936: Update the default value of fp from io.StringIO to ↵Miss Islington (bot)2023-02-221-1/+1
| | | | | | | | io.BytesIO (gh-102100) (#102118) gh-101936: Update the default value of fp from io.StringIO to io.BytesIO (gh-102100) (cherry picked from commit 0d4c7fcd4f078708a5ac6499af378ce5ee8eb211) Co-authored-by: Long Vo <long.vo@linecorp.com>
* gh-98778: Update HTTPError to initialize properly even if fp is None (gh-99966)Miss Islington (bot)2022-12-081-7/+4
| | | | | (cherry picked from commit dc8a86893df37e137cfe992e95e7d66cd68e9eaf) Co-authored-by: Dong-hee Na <donghee.na@python.org>
* gh-96035: Make urllib.parse.urlparse reject non-numeric ports (GH-98273)Miss Islington (bot)2022-10-201-9/+8
| | | | | | Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com> (cherry picked from commit 6f15ca8c7afa23e1adc87f2b66b958b721f9acab) Co-authored-by: Ben Kallus <49924171+kenballus@users.noreply.github.com>
* [3.10] bpo-43564: preserve original exception in args of FTP URLError ↵Senthil Kumaran2022-10-101-2/+1
| | | | | | | | | | | | | | (GH-24938) (#98138) * bpo-43564: preserve original error in args of FTP URLError * Add NEWS blurb Co-authored-by: Carl Meyer <carljm@instagram.com>. (cherry picked from commit ad817cd5c44416da3752ebf9baf16d650703275c) Co-authored-by: Carl Meyer <carl@oddbird.net> Co-authored-by: Carl Meyer <carl@oddbird.net>
* gh-91539: improve performance of get_proxies_environment (GH-91566)Miss Islington (bot)2022-10-051-10/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * improve performance of get_proxies_environment when there are many environment variables * 📜🤖 Added by blurb_it. * fix case of short env name * fix formatting * fix whitespace * whitespace * Update Lib/urllib/request.py Co-authored-by: Carl Meyer <carl@oddbird.net> * Update Lib/urllib/request.py Co-authored-by: Carl Meyer <carl@oddbird.net> * Update Lib/urllib/request.py Co-authored-by: Carl Meyer <carl@oddbird.net> * Update Lib/urllib/request.py Co-authored-by: Carl Meyer <carl@oddbird.net> * whitespace * Update Misc/NEWS.d/next/Library/2022-04-15-11-29-38.gh-issue-91539.7WgVuA.rst Co-authored-by: Carl Meyer <carl@oddbird.net> * Update Lib/urllib/request.py Co-authored-by: Carl Meyer <carl@oddbird.net> Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> Co-authored-by: Carl Meyer <carl@oddbird.net> (cherry picked from commit aeb28f51304ebe2ad9fd6a61b6e4e5a03d288aa1) Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
* bpo-42627: Fix incorrect parsing of Windows registry proxy settings (GH-26307)Miss Islington (bot)2022-05-121-16/+20
| | | | | (cherry picked from commit b69297ea23c0ab9866ae8bd26a347a9b5df567a6) Co-authored-by: 狂男风 <CrazyBoyFeng@Live.com>
* bpo-46756: Fix authorization check in urllib.request (GH-31353)Miss Islington (bot)2022-02-251-4/+4
| | | | | | | | | Fix a bug in urllib.request.HTTPPasswordMgr.find_user_password() and urllib.request.HTTPPasswordMgrWithPriorAuth.is_authenticated() which allowed to bypass authorization. For example, access to URI "example.org/foobar" was allowed if the user was authorized for URI "example.org/foo". (cherry picked from commit e2e72567a1c94c548868f6ee5329363e6036057a) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
* Update URLs in comments and metadata to use HTTPS (GH-27458) (GH-27478)Miss Islington (bot)2021-07-301-1/+1
| | | | | (cherry picked from commit be42c06bb01206209430f3ac08b72643dc7cad1c) Co-authored-by: Noah Kantrowitz <noah@coderanger.net>
* bpo-43882 Remove the newline, and tab early. From query and fragments. ↵Miss Islington (bot)2021-05-051-3/+5
| | | | | (GH-25936) (cherry picked from commit 985ac016373403e8ad41f8d563c4355ffa8d49ff)
* bpo-43979: Remove unnecessary operation from urllib.parse.parse_qsl (GH-25756)Dong-hee Na2021-04-301-2/+1
| | | Automerge-Triggered-By: GH:gpshead
* bpo-43882 - urllib.parse should sanitize urls containing ASCII newline and ↵Senthil Kumaran2021-04-291-0/+6
| | | | | | | | tabs. (GH-25595) * issue43882 - urllib.parse should sanitize urls containing ASCII newline and tabs. Co-authored-by: Gregory P. Smith <greg@krypto.org> Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
* bpo-42967: coerce bytes separator to string in urllib.parse_qs(l) (#24818)Ken Jin2021-04-111-0/+1
| | | | | | | * coerce bytes separator to string * Add news * Update Misc/NEWS.d/next/Library/2021-03-11-00-31-41.bpo-42967.2PeQRw.rst
* bpo-43075: Fix ReDoS in urllib AbstractBasicAuthHandler (GH-24391)Yeting Li2021-04-071-1/+1
| | | | | | | Fix Regular Expression Denial of Service (ReDoS) vulnerability in urllib.request.AbstractBasicAuthHandler. The ReDoS-vulnerable regex has quadratic worst-case complexity and it allows cause a denial of service when identifying crafted invalid RFCs. This ReDoS issue is on the client side and needs remote attackers to control the HTTP server.
* bpo-42967: Fix urllib.parse docs and make logic clearer (GH-24536)Ken Jin2021-02-151-2/+1
|
* bpo-42967: only use '&' as a query string separator (#24297)Adam Goldschmidt2021-02-141-5/+15
| | | | | | | | | | | bpo-42967: [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl(). urllib.parse will only us "&" as query string separator by default instead of both ";" and "&" as allowed in earlier versions. An optional argument seperator with default value "&" is added to specify the separator. Co-authored-by: Éric Araujo <merwok@netwok.org> Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com> Co-authored-by: Éric Araujo <merwok@netwok.org>
* Allow / character in username,password fields in _PROXY envvars. (#23973)Senthil Kumaran2020-12-291-1/+5
|
* bpo-40968: Send http/1.1 ALPN extension (#20959)Christian Heimes2020-11-131-0/+2
| | | Signed-off-by: Christian Heimes <christian@python.org>
* bpo-41471: Ignore invalid prefix lengths in system proxy settings on macOS ↵Ronald Oussoren2020-10-191-0/+5
| | | | (GH-22762)
* bpo-39481: PEP 585 for a variety of modules (GH-19423)Batuhan Taşkaya2020-04-101-0/+3
| | | | | | | | | | - concurrent.futures - ctypes - http.cookies - multiprocessing - queue - tempfile - unittest.case - urllib.parse
* bpo-39503: CVE-2020-8492: Fix AbstractBasicAuthHandler (GH-18284)Victor Stinner2020-04-021-19/+50
| | | | | | | | | | | | | The AbstractBasicAuthHandler class of the urllib.request module uses an inefficient regular expression which can be exploited by an attacker to cause a denial of service. Fix the regex to prevent the catastrophic backtracking. Vulnerability reported by Ben Caller and Matt Schwager. AbstractBasicAuthHandler of urllib.request now parses all WWW-Authenticate HTTP headers and accepts multiple challenges per header: use the realm of the first Basic challenge. Co-Authored-By: Serhiy Storchaka <storchaka@gmail.com>
* bpo-39548: Fix handling of 'WWW-Authenticate' header for Digest Auth (GH-18338)Stephen Balousek2020-02-291-3/+3
| | | | | | | | | | | | | | | | | | | | | | * bpo-39548: Fix handling of 'WWW-Authenticate' header for Digest authentication - The 'qop' value in the 'WWW-Authenticate' header is optional. The presence of 'qop' in the header should be checked before its value is parsed with 'split'. Signed-off-by: Stephen Balousek <stephen@balousek.net> * bpo-39548: Fix handling of 'WWW-Authenticate' header for Digest authentication - Add NEWS item Signed-off-by: Stephen Balousek <stephen@balousek.net> * Update Misc/NEWS.d/next/Library/2020-02-06-05-33-52.bpo-39548.DF4FFe.rst Co-Authored-By: Brandt Bucher <brandtbucher@gmail.com> Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
* bpo-37970: update and improve urlparse and urlsplit doc-strings (GH-16458)idomic2020-02-161-6/+35
|
* bpo-39057: Fix urllib.request.proxy_bypass_environment(). (GH-17619)Serhiy Storchaka2020-01-052-13/+15
| | | Ignore leading dots and no longer ignore a trailing newline.
* bpo-38686: fix HTTP Digest handling in request.py (#17045)PypeBros2019-11-221-2/+4
| | | | | | | | | | | | | | | | | | | | | * fix HTTP Digest handling in request.py There is a bug triggered when server replies to a request with `WWW-Authenticate: Digest` where `qop="auth,auth-int"` rather than mere `qop="auth"`. Having both `auth` and `auth-int` is legitimate according to the `qop-options` rule in §3.2.1 of [[https://www.ietf.org/rfc/rfc2617.txt|RFC 2617]]: > qop-options = "qop" "=" <"> 1#qop-value <"> > qop-value = "auth" | "auth-int" | token > **qop-options**: [...] If present, it is a quoted string **of one or more** tokens indicating the "quality of protection" values supported by the server. The value `"auth"` indicates authentication; the value `"auth-int"` indicates authentication with integrity protection This is description confirmed by the definition of the [//n//]`#`[//m//]//rule// extended-BNF pattern defined in §2.1 of [[https://www.ietf.org/rfc/rfc2616.txt|RFC 2616]] as 'a comma-separated list of //rule// with at least //n// and at most //m// items'. When this reply is parsed by `get_authorization`, request.py only tests for identity with `'auth'`, failing to recognize it as one of the supported modes the server announced, and claims that `"qop 'auth,auth-int' is not supported"`. * 📜🤖 Added by blurb_it. * bpo-38686 review fix: remember why. * fix trailing space in Lib/urllib/request.py Co-Authored-By: Brandt Bucher <brandtbucher@gmail.com>
* Remove binding of captured exceptions when not used to reduce the chances of ↵Pablo Galindo2019-11-191-1/+1
| | | | | | | creating cycles (GH-17246) Capturing exceptions into names can lead to reference cycles though the __traceback__ attribute of the exceptions in some obscure cases that have been reported previously and fixed individually. As these variables are not used anyway, we can remove the binding to reduce the chances of creating reference cycles. See for example GH-13135
* bpo-27657: Fix urlparse() with numeric paths (#661)Tim Graham2019-10-181-21/+1
| | | | | | | | | | * bpo-27657: Fix urlparse() with numeric paths Revert parsing decision from bpo-754016 in favor of the documented consensus in bpo-16932 of how to treat strings without a // to designate the netloc. * bpo-22891: Remove urlsplit() optimization for 'http' prefixed inputs.
* bpo-32498: urllib.parse.unquote also accepts bytes (GH-7768)Stein Karlsen2019-10-141-0/+2
|
* bpo-25068: urllib.request.ProxyHandler now lowercases the dict keys (GH-13489)Zackery Spytz2019-09-131-0/+1
|
* bpo-12707: deprecate info(), geturl(), getcode() methods in favor of ↵Ashwin Ramaswami2019-09-132-11/+7
| | | | | headers, url, and status properties for HTTPResponse and addinfourl (GH-11447) Co-Authored-By: epicfaace <aramaswamis@gmail.com>
* bpo-35922: Fix RobotFileParser when robots.txt has no relevant crawl delay ↵Rémi Lapeyre2019-06-161-2/+6
| | | | | or request rate (GH-11791) Co-Authored-By: Tal Einat <taleinat+github@gmail.com>
* bpo-36742: Corrects fix to handle decomposition in usernames (#13812)Steve Dower2019-06-041-3/+3
|
* bpo-35397: Remove deprecation and document urllib.parse.unwrap (GH-11481)Rémi Lapeyre2019-05-272-11/+9
|
* bpo-36842: Implement PEP 578 (GH-12613)Steve Dower2019-05-231-0/+1
| | | Adds sys.audit, sys.addaudithook, io.open_code, and associated C APIs.
* bpo-35907, CVE-2019-9948: urllib rejects local_file:// scheme (GH-13474)Victor Stinner2019-05-221-1/+1
| | | | | | | CVE-2019-9948: Avoid file reading as disallowing the unnecessary URL scheme in URLopener().open() and URLopener().retrieve() of urllib.request. Co-Authored-By: SH <push0ebp@gmail.com>
* bpo-36948: Fix NameError in urllib.request.URLopener.retrieve (GH-13389)Xtreak2019-05-191-5/+5
|
* bpo-36742: Fixes handling of pre-normalization characters in urlsplit() ↵Steve Dower2019-04-301-4/+7
| | | | (GH-13017)
* bpo-12910: update and correct quote docstring (#2568)Jörn Hees2019-04-101-13/+20
| | | | | | Fixes some mistakes and misleadings in the quote function docstring: - reserved chars are never actually used by quote code, unreserved chars are - reserved chars were wrong and incomplete - mentioned that use-case is not minimal quoting wrt. RFC, but cautious quoting
* bpo-36431: Use PEP 448 dict unpacking for merging two dicts. (GH-12553)Serhiy Storchaka2019-03-271-2/+1
|
* bpo-36216: Add check for characters in netloc that normalize to separators ↵Steve Dower2019-03-071-0/+17
| | | | (GH-12201)
* closes bpo-35309: cpath should be capath (GH-10699)Boštjan Mejak2018-11-251-1/+1
|
* bpo-34866: Adding max_num_fields to cgi.FieldStorage (GH-9660)matthewbelisle-wf2018-10-191-3/+19
| | | | Adding `max_num_fields` to `cgi.FieldStorage` to make DOS attacks harder by limiting the number of `MiniFieldStorage` objects created by `FieldStorage`.
* bpo-21475: Support the Sitemap extension in robotparser (GH-6883)Christopher Beacham2018-05-161-0/+12
|
* bpo-32861: urllib.robotparser fix incomplete __str__ methods. (GH-5711)Michael Lazar2018-05-141-5/+12
| | | | | | The urllib.robotparser's __str__ representation now includes wildcard entries and the "Crawl-delay" and "Request-rate" fields. Also removes extra newlines that were being appended to the end of the string.
* bpo-27485: Rename and deprecate undocumented functions in urllib.parse (GH-2205)Cheryl Sabella2018-04-252-57/+152
|
* bpo-33034: Improve exception message when cast fails for ↵Matt Eaton2018-03-201-1/+5
| | | | {Parse,Split}Result.port (GH-6078)
* Revert unneccessary changes made in bpo-30296 and apply other improvements. ↵Serhiy Storchaka2018-02-261-1/+2
| | | | (GH-2624)
* urllib.request: Remove unused import (GH-5268)INADA Naoki2018-01-221-1/+0
|
* bpo-32323: urllib.parse.urlsplit() must not lowercase() IPv6 scope value (#4867)Коренберг Марк2017-12-211-4/+6
|
* bpo-31325: Fix usage of namedtuple in RobotFileParser.parse() (#4529)Berker Peksag2017-11-231-5/+4
|