author: Pablo Galindo <Pablogsal@gmail.com>  2019-12-27 21:55:56 (GMT)
committer: GitHub <noreply@github.com>  2019-12-27 21:55:56 (GMT)
commit: 90913985b62845a58f6b9e815121bcf614bd107f (patch)
tree: 3bc2594d17089262818928ab10445b9dd7a367e8 /Modules/gcmodule.c
parent: 91874bb07161bb481b6f5ea18ffafe69cb8cac30 (diff)
Move comment about permanent generation to gcmodule.c (GH-17718)
The comment about the collection rules for the permanent generation was incorrectly referenced by a comment in gcmodule.c (the comment was moved into a header file long ago). Moving the comment into the relevant code helps with readability and avoids broken references.
Diffstat (limited to 'Modules/gcmodule.c')
-rw-r--r--  Modules/gcmodule.c  36
1 file changed, 34 insertions(+), 2 deletions(-)
diff --git a/Modules/gcmodule.c b/Modules/gcmodule.c
index 64afe83..b11ae84 100644
--- a/Modules/gcmodule.c
+++ b/Modules/gcmodule.c
@@ -1381,8 +1381,40 @@ collect_generations(PyThreadState *tstate)
for (int i = NUM_GENERATIONS-1; i >= 0; i--) {
if (gcstate->generations[i].count > gcstate->generations[i].threshold) {
/* Avoid quadratic performance degradation in number
- of tracked objects. See comments at the beginning
- of this file, and issue #4074.
+ of tracked objects (see also issue #4074):
+
+ To limit the cost of garbage collection, there are two strategies;
+ - make each collection faster, e.g. by scanning fewer objects
+ - do less collections
+ This heuristic is about the latter strategy.
+
+ In addition to the various configurable thresholds, we only trigger a
+ full collection if the ratio
+
+ long_lived_pending / long_lived_total
+
+ is above a given value (hardwired to 25%).
+
+ The reason is that, while "non-full" collections (i.e., collections of
+ the young and middle generations) will always examine roughly the same
+ number of objects -- determined by the aforementioned thresholds --,
+ the cost of a full collection is proportional to the total number of
+ long-lived objects, which is virtually unbounded.
+
+ Indeed, it has been remarked that doing a full collection every
+ <constant number> of object creations entails a dramatic performance
+ degradation in workloads which consist in creating and storing lots of
+ long-lived objects (e.g. building a large list of GC-tracked objects would
+ show quadratic performance, instead of linear as expected: see issue #4074).
+
+ Using the above ratio, instead, yields amortized linear performance in
+ the total number of objects (the effect of which can be summarized
+ thusly: "each full garbage collection is more and more costly as the
+ number of objects grows, but we do fewer and fewer of them").
+
+ This heuristic was suggested by Martin von Löwis on python-dev in
+ June 2008. His original analysis and proposal can be found at:
+ http://mail.python.org/pipermail/python-dev/2008-June/080579.html
*/
if (i == NUM_GENERATIONS - 1
&& gcstate->long_lived_pending < gcstate->long_lived_total / 4)
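
To make the 25% rule concrete, here is a small, self-contained C sketch. The `sim_gc_state` struct, the promotion numbers, and the simulation loop are hypothetical illustrations, not CPython's actual structures; only the threshold test mirrors the condition in the diff (run a full collection when long_lived_pending reaches a quarter of long_lived_total).

```c
/*
 * Illustrative sketch (not CPython code): the ratio-based trigger for
 * full collections described in the comment above.  The struct and the
 * numbers are hypothetical; only the 25% rule mirrors the diff.
 */
#include <stdio.h>

typedef struct {
    long long long_lived_total;    /* objects that survived a full collection */
    long long long_lived_pending;  /* survivors of non-full collections since then */
} sim_gc_state;

/* Full collection only when pending survivors reach 25% of the
 * long-lived population, i.e. long_lived_pending >= long_lived_total / 4. */
static int should_run_full_collection(const sim_gc_state *st)
{
    return st->long_lived_pending >= st->long_lived_total / 4;
}

int main(void)
{
    sim_gc_state st = { .long_lived_total = 1000, .long_lived_pending = 0 };
    long long full_collections = 0;

    /* Pretend each middle-generation collection promotes 100 survivors. */
    for (int step = 1; step <= 50; step++) {
        st.long_lived_pending += 100;
        if (should_run_full_collection(&st)) {
            /* A full collection folds pending objects into the total. */
            st.long_lived_total += st.long_lived_pending;
            st.long_lived_pending = 0;
            full_collections++;
            printf("step %2d: full collection #%lld, long_lived_total=%lld\n",
                   step, full_collections, st.long_lived_total);
        }
    }
    /* Output shows full collections becoming rarer as long_lived_total
     * grows, which is what keeps the overall cost amortized linear. */
    return 0;
}
```

Running the sketch, the gap between full collections widens as the long-lived population grows (3 steps, then 4, then 5, and so on), which is the amortized-linear behavior the comment describes.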