gh-88500: Reduce memory use of `urllib.unquote` (#96763)

`urllib.parse.unquote_to_bytes` and `urllib.parse.unquote` could both generate `O(len(string))` intermediate `bytes` or `str` objects while computing the unquoted result, depending on the input. As Python objects are relatively large, this could consume a lot of RAM.

This switches the implementation to an expanding `bytearray` and a generator internally instead of precomputed `split()` style operations.

Microbenchmarks with some antagonistic inputs like `mess = "\u0141%%%20a%fe"*1000` show this is 10-20% slower for `unquote` and `unquote_to_bytes`, and no different for typical inputs that are short or lack much Unicode or % escaping. The functions are already quite fast, so this is not a significant cost. The slowdown scales linearly with input size, as expected.

Memory usage was observed manually using `/usr/bin/time -v` on `python -m timeit` runs of larger inputs. Unit testing memory consumption is difficult and does not seem worthwhile.

Observed memory usage is about 1/2 of the previous usage for `unquote()` and less than 1/3 for `unquote_to_bytes()`, using `python -m timeit -s 'from urllib.parse import unquote, unquote_to_bytes; v="\u0141%01\u0161%20"*500_000' 'unquote_to_bytes(v)'` as a test.
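
For illustration only, here is a minimal sketch of the general technique described above: growing a single `bytearray` while consuming escape matches lazily, rather than building a full list of pieces with `split()`. The names `unquote_to_bytes_sketch` and `_HEX_ESCAPE` are hypothetical, and this is not the code from the patch.

```python
import re

# Hypothetical pattern for this sketch: a '%' followed by two hex digits.
_HEX_ESCAPE = re.compile(rb"%([0-9a-fA-F]{2})")


def unquote_to_bytes_sketch(data: bytes) -> bytes:
    """Percent-decode `data`, accumulating into one growing bytearray.

    Assumption: `data` is already bytes. This sketch does not handle str
    input or any other behavior of the real urllib.parse function.
    """
    out = bytearray()
    pos = 0
    # finditer() yields matches lazily, so the pieces are never all
    # materialized at once the way a split("%") list would be.
    for match in _HEX_ESCAPE.finditer(data):
        out += data[pos:match.start()]       # literal run before the escape
        out.append(int(match.group(1), 16))  # the decoded byte
        pos = match.end()
    out += data[pos:]                        # trailing literal run
    return bytes(out)
```

At any moment only the output buffer and one short-lived slice are alive, which is the property the memory reduction depends on. Malformed escapes such as `%zz` or a trailing `%2` do not match the pattern and are copied through unchanged.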
author: Gregory P. Smith <greg@krypto.org> 2022-12-11 00:17:39 (GMT)
committer: GitHub <noreply@github.com> 2022-12-11 00:17:39 (GMT)
commit: 2e279e85fece187b6058718ac7e82d1692461e26 (patch)
tree: c0c187ef473fde7f9a9ba0f5ac8f92ade79d02fc /Lib/test
parent: 1bb68ba6d9de6bb7f00aee11d135123163f15887 (diff)
download: cpython-2e279e85fece187b6058718ac7e82d1692461e26.zip
cpython-2e279e85fece187b6058718ac7e82d1692461e26.tar.gz
cpython-2e279e85fece187b6058718ac7e82d1692461e26.tar.bz2
1 files changed, 2 insertions, 0 deletions
diff --git a/Lib/test/test_urllib.py b/Lib/test/test_urllib.py
index f067560..2df74f5 100644
--- a/Lib/test/test_urllib.py
+++ b/Lib/test/test_urllib.py
@@ -1104,6 +1104,8 @@ class UnquotingTests(unittest.TestCase):
         self.assertEqual(result.count('%'), 1,
                          "using unquote(): not all characters escaped: "
                          "%s" % result)
+
+    def test_unquote_rejects_none_and_tuple(self):
         self.assertRaises((TypeError, AttributeError), urllib.parse.unquote, None)
         self.assertRaises((TypeError, AttributeError), urllib.parse.unquote, ())
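
The behavior the new test method pins down can also be checked outside the test suite. The following is a small sketch under that assumption; the helper `raises_type_or_attribute_error` is hypothetical and exists only for this example.

```python
from urllib.parse import unquote


def raises_type_or_attribute_error(func, *args) -> bool:
    """Return True if calling func(*args) raises TypeError or AttributeError."""
    try:
        func(*args)
    except (TypeError, AttributeError):
        return True
    return False


# Non-string inputs are rejected; which of the two exceptions is raised
# is left open, matching the tuple passed to assertRaises in the test.
print(raises_type_or_attribute_error(unquote, None))
print(raises_type_or_attribute_error(unquote, ()))
```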