path: root/Python/hamt.c
author: Yury Selivanov <yury@edgedb.com> 2022-05-23 19:09:59 (GMT)
committer: GitHub <noreply@github.com> 2022-05-23 19:09:59 (GMT)
commit: c1f5c903a7e4ed27190488f4e33b00d3c3d952e5 (patch)
tree: 6c0fe48100b58b23bd1cb0fcf63a27cb0d2ac7f6 /Python/hamt.c
parent: a49721ea075a18a7787ace6752b4eb0954e1b607 (diff)
gh-93065: Fix HAMT to iterate correctly over 7-level deep trees (GH-93066)
Also while there, clarify a few things about why we reduce the hash to 32 bits.

Co-authored-by: Eli Libman <eli@hyro.ai>
Co-authored-by: Yury Selivanov <yury@edgedb.com>
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
Diffstat (limited to 'Python/hamt.c')
-rw-r--r-- Python/hamt.c 14
1 files changed, 11 insertions, 3 deletions
diff --git a/Python/hamt.c b/Python/hamt.c
index c3cb4e6..908c253 100644
--- a/Python/hamt.c
+++ b/Python/hamt.c
@@ -409,14 +409,22 @@ hamt_hash(PyObject *o)
return -1;
}
- /* While it's suboptimal to reduce Python's 64 bit hash to
+ /* While it's somewhat suboptimal to reduce Python's 64 bit hash to
32 bits via XOR, it seems that the resulting hash function
is good enough (this is also how Long type is hashed in Java.)
Storing 10, 100, 1000 Python strings results in a relatively
shallow and uniform tree structure.
- Please don't change this hashing algorithm, as there are many
- tests that test some exact tree shape to cover all code paths.
+ Also it's worth noting that it would be possible to adapt the tree
+ structure to 64 bit hashes, but that would increase memory pressure
+ and provide little to no performance benefits for collections with
+ fewer than billions of key/value pairs.
+
+ Important: do not change this hash reducing function. There are many
+ tests that need an exact tree shape to cover all code paths and
+ we do that by specifying concrete values for test data's `__hash__`.
+ If this function is changed most of the regression tests would
+ become useless.
*/
int32_t xored = (int32_t)(hash & 0xffffffffl) ^ (int32_t)(hash >> 32);
return xored == -1 ? -2 : xored;
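
For readers who want to try the reduction outside CPython, here is a minimal stand-alone sketch of the fold described in the comment above. The names `fold_hash` and `level_index` are hypothetical and do not appear in hamt.c. The sketch takes a plain `int64_t` instead of `Py_hash_t`, and it uses unsigned arithmetic for the XOR, which should give the same 32-bit pattern as the original expression on two's-complement platforms. `level_index` additionally assumes that each tree level consumes 5 bits of the reduced hash; that detail is not shown in this diff, but it would explain why a 32-bit hash can reach 7 levels (six 5-bit chunks plus one 2-bit chunk).

```c
#include <stdint.h>

/* Hypothetical stand-alone version of the 64 -> 32 bit fold:
   XOR the low and high halves, then remap -1 to -2 because -1 is
   used as the error return value in the function shown in the diff. */
static int32_t
fold_hash(int64_t hash)
{
    uint64_t bits = (uint64_t)hash;
    uint32_t xored = (uint32_t)(bits & 0xffffffffu) ^ (uint32_t)(bits >> 32);
    /* Assumes the usual two's-complement conversion to int32_t. */
    int32_t folded = (int32_t)xored;
    return folded == -1 ? -2 : folded;
}

/* Assumption (not part of this diff): every tree level indexes its
   children with 5 bits of the folded hash, lowest bits first.
   Level 6 is left with only the top 2 bits (30 and 31). */
static uint32_t
level_index(int32_t folded, unsigned level)
{
    return ((uint32_t)folded >> (5u * level)) & 0x1fu;
}
```

Two keys whose 64-bit hashes differ only by swapping their halves fold to the same value, which is one reason the comment calls the reduction "somewhat suboptimal". A test that sets a concrete `__hash__` value relies on this fold staying exactly as it is.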