
Improving GC collections: dynamic thresholds, single generation gc and time barriers #100403

Open
pablogsal opened this issue Dec 21, 2022 · 29 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@pablogsal
Member

pablogsal commented Dec 21, 2022

In the pursuit of optimizing GC runs it has been observed that the weak generational hypothesis may not apply that well to Python. The argument is that, with a mixed cycle GC + refcount GC strategy, young objects are mostly cleaned up by reference counting, not by the cycle GC. It is also important that there is no segregation between the two GC strategies, so the cycle GC needs to deal with objects that, according to this argument, will mainly be cleaned up by reference counting alone.

This calls into question the utility of segregating the GC by generations, and indeed there is some evidence for this. I have been benchmarking the collection success rate of the different generations in some programs (such as black, mypy and a bunch of HTTP servers) and the success rate of the lower generations is generally small. Here is an example of running black over the whole standard library:

Statistics for generation 0

| Category | Value |
| --- | --- |
| count | 157917.000000 |
| mean | 1.192775 |
| std | 3.560391 |
| min | 0.000000 |
| 25% | 0.000000 |
| 50% | 0.000000 |
| 75% | 0.480192 |
| max | 86.407768 |

Statistics for generation 1

| Category | Value |
| --- | --- |
| count | 14346.000000 |
| mean | 2.670852 |
| std | 11.642815 |
| min | 0.000000 |
| 25% | 0.000000 |
| 50% | 0.000000 |
| 75% | 0.388794 |
| max | 97.406097 |

Statistics for generation 2

| Category | Value |
| --- | --- |
| count | 1280.000000 |
| mean | 45.698135 |
| std | 27.066735 |
| min | 0.000000 |
| 25% | 31.965862 |
| 50% | 54.618008 |
| 75% | 67.038842 |
| max | 90.592705 |

I am currently investigating whether having a single generation with a dynamic threshold, similar to the strategy we currently use for the last generation, would generally give better performance.

What do you think?


@pablogsal
Member Author

CC @tim-one @methane @nascheme

@pablogsal
Member Author

Here is a proof of concept for these changes if you want to play with it:

#100404

@methane
Member

methane commented Dec 21, 2022

I agree that the current threshold is not good for many applications.

But I still think generations are good for some cases.

Python creates a lot of cyclic objects that live until process shutdown: for example, modules, classes, functions, namespace dicts, and annotations. (PEP 563 eliminates most cycles from annotations, but it would be deprecated by PEP 649.)

For a large application using SQLAlchemy, a full GC takes more than 10 ms.
The younger generations can collect cycles created during web requests without tracing permanent objects.

Of course, the permanent generation can be used to keep a full GC from tracing permanent objects. But it requires manual GC tuning...

@pablogsal
Member Author

pablogsal commented Dec 21, 2022

For a large application using SQLAlchemy, a full GC takes more than 10 ms.
The younger generations can collect cycles created during web requests without tracing permanent objects.

Yeah, but the key is that it is not clear whether a single run of the last generation (or even a few more) is going to be better than a lot of inefficient runs of the lower generations plus a few runs of the full generation.

I am sure there are examples of how both ways can work, but I think we need to focus on the more common case, because otherwise we can always find workloads that give us reasons to prefer one approach or the other.

Do you have a benchmark of a SQLAlchemy application?

I agree that the current threshold is not good for many applications.

The other idea is to adjust the thresholds dynamically based on how effective a collection has been. For instance, we can set a target efficiency (defined as the number of objects collected divided by the size of the generation) and adjust the threshold for the next collection based on how far the actual collection deviates from the target.

For example, if the target is 25% and a collection collects 5%, we adjust the threshold by a scale factor f(5/25), making the generation threshold bigger; conversely, if we collect 75%, we reduce it by f(75/25).

What the scale factor should be is up for discussion.
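
To make the idea concrete, here is a minimal sketch in Python of one possible adjustment rule, assuming a clamped inverse-ratio scale factor and illustrative bounds (the actual f and limits are exactly what is up for discussion):

# Sketch only: the target, bounds and scale function below are illustrative
# assumptions, not values used by any PR.
TARGET_EFFICIENCY = 0.25            # aim: 25% of examined objects are collected
MIN_THRESHOLD, MAX_THRESHOLD = 700, 50_000

def scale_factor(ratio):
    # ratio = actual efficiency / target efficiency. Below 1.0 the collection
    # under-performed, so the threshold grows; above 1.0 it shrinks.
    return max(0.5, min(2.0, 1.0 / max(ratio, 1e-6)))

def next_threshold(current_threshold, collected, examined):
    efficiency = collected / examined if examined else 0.0
    scaled = current_threshold * scale_factor(efficiency / TARGET_EFFICIENCY)
    return int(min(MAX_THRESHOLD, max(MIN_THRESHOLD, scaled)))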

What do you think of this approach?

@pablogsal
Member Author

pablogsal commented Dec 21, 2022

Benchmark of #100404 (⚠️ please take into account that this is still a DRAFT and we may be missing something ⚠️)

All benchmarks:

Benchmark 2022-12-21_09-50-main-a7715ccfba5b 2022-12-21_14-11-gc_nogen-663a9654f714
async_tree_io 1.52 sec 658 ms: 2.31x faster
async_tree_memoization 753 ms 397 ms: 1.89x faster
async_tree_none 621 ms 334 ms: 1.86x faster
async_tree_cpu_io_mixed 924 ms 607 ms: 1.52x faster
xml_etree_iterparse 121 ms 98.7 ms: 1.23x faster
docutils 3.13 sec 2.68 sec: 1.17x faster
float 89.9 ms 77.0 ms: 1.17x faster
xml_etree_parse 173 ms 151 ms: 1.14x faster
unpack_sequence 56.3 ns 51.1 ns: 1.10x faster
deltablue 4.24 ms 3.98 ms: 1.07x faster
async_generators 374 ms 351 ms: 1.07x faster
logging_silent 127 ns 121 ns: 1.05x faster
genshi_xml 60.5 ms 57.6 ms: 1.05x faster
xml_etree_generate 105 ms 101 ms: 1.04x faster
regex_effbot 3.94 ms 3.79 ms: 1.04x faster
pickle_pure_python 364 us 352 us: 1.04x faster
2to3 310 ms 299 ms: 1.03x faster
python_startup 11.2 ms 10.9 ms: 1.03x faster
html5lib 72.5 ms 70.2 ms: 1.03x faster
generators 66.7 ms 64.8 ms: 1.03x faster
json_loads 31.9 us 31.0 us: 1.03x faster
regex_dna 227 ms 222 ms: 1.03x faster
pidigits 232 ms 226 ms: 1.02x faster
python_startup_no_site 8.65 ms 8.46 ms: 1.02x faster
sqlglot_parse 1.81 ms 1.77 ms: 1.02x faster
nbody 113 ms 110 ms: 1.02x faster
unpickle_pure_python 259 us 254 us: 1.02x faster
deepcopy_reduce 3.81 us 3.75 us: 1.02x faster
deepcopy 427 us 420 us: 1.02x faster
go 173 ms 170 ms: 1.02x faster
pyflate 501 ms 493 ms: 1.02x faster
genshi_text 26.3 ms 25.9 ms: 1.01x faster
sqlglot_transpile 2.15 ms 2.13 ms: 1.01x faster
logging_format 8.04 us 7.94 us: 1.01x faster
richards 53.7 ms 53.1 ms: 1.01x faster
coverage 545 ms 538 ms: 1.01x faster
sympy_expand 571 ms 564 ms: 1.01x faster
nqueens 103 ms 102 ms: 1.01x faster
xml_etree_process 72.6 ms 71.7 ms: 1.01x faster
deepcopy_memo 44.5 us 44.0 us: 1.01x faster
mako 12.1 ms 12.0 ms: 1.01x faster
telco 7.77 ms 7.70 ms: 1.01x faster
unpickle_list 5.97 us 5.92 us: 1.01x faster
logging_simple 7.28 us 7.24 us: 1.01x faster
mdp 3.37 sec 3.35 sec: 1.01x faster
sympy_integrate 25.5 ms 25.4 ms: 1.00x faster
raytrace 370 ms 371 ms: 1.00x slower
regex_v8 26.1 ms 26.2 ms: 1.00x slower
chameleon 8.42 ms 8.45 ms: 1.00x slower
hexiom 7.81 ms 7.84 ms: 1.00x slower
scimark_fft 380 ms 382 ms: 1.00x slower
bench_thread_pool 1.00 ms 1.01 ms: 1.01x slower
scimark_sor 138 ms 138 ms: 1.01x slower
sqlglot_normalize 135 ms 136 ms: 1.01x slower
scimark_monte_carlo 79.0 ms 79.6 ms: 1.01x slower
sqlglot_optimize 64.4 ms 64.9 ms: 1.01x slower
unpickle 16.2 us 16.4 us: 1.01x slower
json_dumps 12.1 ms 12.2 ms: 1.01x slower
pprint_pformat 1.77 sec 1.79 sec: 1.01x slower
regex_compile 162 ms 164 ms: 1.01x slower
scimark_sparse_mat_mult 4.64 ms 4.70 ms: 1.01x slower
crypto_pyaes 91.0 ms 92.1 ms: 1.01x slower
pprint_safe_repr 862 ms 877 ms: 1.02x slower
meteor_contest 127 ms 130 ms: 1.02x slower
sqlite_synth 2.52 us 2.57 us: 1.02x slower
chaos 82.4 ms 84.3 ms: 1.02x slower
fannkuch 451 ms 463 ms: 1.03x slower
django_template 41.9 ms 43.1 ms: 1.03x slower
spectral_norm 122 ms 126 ms: 1.03x slower
pickle_list 4.90 us 5.17 us: 1.06x slower
coroutines 33.9 ms 36.0 ms: 1.06x slower
pickle_dict 36.8 us 39.1 us: 1.06x slower
Geometric mean (ref) 1.05x faster

Benchmark hidden because not significant (6): scimark_lu, bench_mp_pool, sympy_sum, pickle, sympy_str, pathlib

@mdboom
Contributor

mdboom commented Dec 21, 2022

Here's the results on the Faster CPython benchmarking system (basically confirming the above plus or minus the usual noise): https://github.com/faster-cpython/benchmarking-public/tree/main/results/bm-20221221-3.12.0a3+-663a965

@pablogsal changed the title from "Single generation GC" to "Improving GC collections: dynamic thresholds, single generation gc and time barriers" on Dec 22, 2022
@pablogsal
Member Author

pablogsal commented Dec 22, 2022

Here is another different idea: #100421

This PR implements time barriers in the GC. The main idea is to control how much of the total runtime the GC accounts for. To do that we keep a running average of the GC time and the total time since the GC last ran, and if the ratio between them is bigger than a specific threshold (I suppose this can be configurable) then we hold the GC back from running. The idea is that if the GC is running too much, we hold it off.

I think configuring this threshold may be much easier than changing the knobs of the generation thresholds, because it is much easier to understand what it is trying to achieve.
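
A rough sketch of that mechanism in Python, assuming an exponential moving average and an illustrative 2% time budget (the actual PR may use different bookkeeping):

import time

GC_TIME_BUDGET = 0.02          # assumed budget: GC may use at most 2% of runtime
_avg_gc_time = 0.0             # running average of GC pause duration (seconds)
_last_gc_end = time.monotonic()

def gc_allowed():
    # Hold the GC back while its average cost, relative to the time elapsed
    # since the last collection, would exceed the budget.
    elapsed = time.monotonic() - _last_gc_end
    return elapsed > 0 and (_avg_gc_time / elapsed) <= GC_TIME_BUDGET

def record_gc(duration):
    global _avg_gc_time, _last_gc_end
    _avg_gc_time = 0.9 * _avg_gc_time + 0.1 * duration
    _last_gc_end = time.monotonic()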

I think this is not going to be as exciting as the previous ideas, and the system call to get the time has the potential to slow down some use cases unless we combine it with other approaches.

@pablogsal
Member Author

pablogsal commented Dec 22, 2022

And here is yet another idea: #100422

This implements dynamic thresholds based on how successful a specific collection was. When we do a collection we calculate how many objects have been collected divided by how many objects were initially considered. Then we update the threshold for that generation based on the difference between the current ratio of collected to total objects and a given configurable target ratio. The update uses a sigmoid that saturates between 0 and 1 and is scaled between given max_threshold and min_threshold values.

This has the advantage over the first method that it keeps the generations.
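
A sketch of that update rule, assuming a logistic sigmoid and illustrative min/max bounds; see #100422 for the actual implementation:

import math

TARGET_RATIO = 0.20                       # assumed configurable target ratio
MIN_THRESHOLD, MAX_THRESHOLD = 700, 50_000
STEEPNESS = 10.0                          # how quickly the sigmoid saturates

def updated_threshold(collected, examined):
    ratio = collected / examined if examined else 0.0
    # Sigmoid of (target - actual): saturates in (0, 1) and is 0.5 when the
    # collection hits the target exactly. Ineffective collections (ratio below
    # the target) push the threshold towards MAX_THRESHOLD.
    s = 1.0 / (1.0 + math.exp(-STEEPNESS * (TARGET_RATIO - ratio)))
    return int(MIN_THRESHOLD + s * (MAX_THRESHOLD - MIN_THRESHOLD))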

Benchmarks for this idea:

All benchmarks:

Benchmark 2022-12-21_09-50-main-a7715ccfba5b 2022-12-22_00-36-gc_thresholds-a292d5ccb575
async_tree_io 1.52 sec 740 ms: 2.06x faster
async_tree_memoization 753 ms 442 ms: 1.70x faster
async_tree_none 621 ms 371 ms: 1.67x faster
async_tree_cpu_io_mixed 924 ms 647 ms: 1.43x faster
float 89.9 ms 80.9 ms: 1.11x faster
unpack_sequence 56.3 ns 51.8 ns: 1.09x faster
genshi_xml 60.5 ms 58.1 ms: 1.04x faster
xml_etree_parse 173 ms 167 ms: 1.03x faster
xml_etree_iterparse 121 ms 117 ms: 1.03x faster
regex_effbot 3.94 ms 3.83 ms: 1.03x faster
html5lib 72.5 ms 70.7 ms: 1.03x faster
deltablue 4.24 ms 4.14 ms: 1.02x faster
pidigits 232 ms 227 ms: 1.02x faster
coroutines 33.9 ms 33.2 ms: 1.02x faster
pathlib 19.5 ms 19.1 ms: 1.02x faster
async_generators 374 ms 366 ms: 1.02x faster
go 173 ms 170 ms: 1.02x faster
json_dumps 12.1 ms 11.9 ms: 1.02x faster
xml_etree_generate 105 ms 103 ms: 1.02x faster
telco 7.77 ms 7.62 ms: 1.02x faster
logging_silent 127 ns 125 ns: 1.02x faster
json_loads 31.9 us 31.3 us: 1.02x faster
scimark_lu 137 ms 135 ms: 1.02x faster
sqlglot_parse 1.81 ms 1.78 ms: 1.02x faster
xml_etree_process 72.6 ms 71.6 ms: 1.01x faster
meteor_contest 127 ms 126 ms: 1.01x faster
scimark_fft 380 ms 376 ms: 1.01x faster
sqlglot_normalize 135 ms 134 ms: 1.01x faster
chameleon 8.42 ms 8.35 ms: 1.01x faster
nqueens 103 ms 103 ms: 1.00x faster
regex_v8 26.1 ms 26.0 ms: 1.00x faster
regex_compile 162 ms 162 ms: 1.00x slower
hexiom 7.81 ms 7.83 ms: 1.00x slower
sqlglot_optimize 64.4 ms 64.5 ms: 1.00x slower
sympy_expand 571 ms 573 ms: 1.00x slower
bench_thread_pool 1.00 ms 1.00 ms: 1.00x slower
crypto_pyaes 91.0 ms 91.4 ms: 1.01x slower
pprint_pformat 1.77 sec 1.78 sec: 1.01x slower
unpickle_pure_python 259 us 261 us: 1.01x slower
fannkuch 451 ms 455 ms: 1.01x slower
scimark_sor 138 ms 139 ms: 1.01x slower
sympy_sum 205 ms 207 ms: 1.01x slower
logging_simple 7.28 us 7.35 us: 1.01x slower
pprint_safe_repr 862 ms 869 ms: 1.01x slower
django_template 41.9 ms 42.2 ms: 1.01x slower
sympy_str 353 ms 356 ms: 1.01x slower
sympy_integrate 25.5 ms 25.7 ms: 1.01x slower
bench_mp_pool 29.9 ms 30.2 ms: 1.01x slower
deepcopy 427 us 432 us: 1.01x slower
logging_format 8.04 us 8.13 us: 1.01x slower
unpickle 16.2 us 16.4 us: 1.01x slower
scimark_sparse_mat_mult 4.64 ms 4.70 ms: 1.01x slower
deepcopy_memo 44.5 us 45.1 us: 1.01x slower
chaos 82.4 ms 83.5 ms: 1.01x slower
python_startup 11.2 ms 11.4 ms: 1.01x slower
python_startup_no_site 8.65 ms 8.80 ms: 1.02x slower
spectral_norm 122 ms 124 ms: 1.02x slower
genshi_text 26.3 ms 26.9 ms: 1.02x slower
sqlite_synth 2.52 us 2.58 us: 1.02x slower
2to3 310 ms 319 ms: 1.03x slower
mako 12.1 ms 12.5 ms: 1.03x slower
pickle_list 4.90 us 5.09 us: 1.04x slower
unpickle_list 5.97 us 6.20 us: 1.04x slower
pickle_dict 36.8 us 38.6 us: 1.05x slower
regex_dna 227 ms 239 ms: 1.05x slower
Geometric mean (ref) 1.03x faster

This is how it compares to the first method:

All benchmarks:

Benchmark 2022-12-21_14-11-gc_nogen-663a9654f714 2022-12-22_00-36-gc_thresholds-a292d5ccb575
coroutines 36.0 ms 33.2 ms: 1.08x faster
meteor_contest 130 ms 126 ms: 1.03x faster
logging_silent 121 ns 125 ns: 1.03x slower
pickle_pure_python 352 us 364 us: 1.04x slower
genshi_text 25.9 ms 26.9 ms: 1.04x slower
python_startup_no_site 8.46 ms 8.80 ms: 1.04x slower
mako 12.0 ms 12.5 ms: 1.04x slower
deltablue 3.98 ms 4.14 ms: 1.04x slower
async_generators 351 ms 366 ms: 1.04x slower
unpickle_list 5.92 us 6.20 us: 1.05x slower
python_startup 10.9 ms 11.4 ms: 1.05x slower
float 77.0 ms 80.9 ms: 1.05x slower
2to3 299 ms 319 ms: 1.07x slower
async_tree_cpu_io_mixed 607 ms 647 ms: 1.07x slower
regex_dna 222 ms 239 ms: 1.08x slower
xml_etree_parse 151 ms 167 ms: 1.11x slower
async_tree_memoization 397 ms 442 ms: 1.11x slower
async_tree_none 334 ms 371 ms: 1.11x slower
async_tree_io 658 ms 740 ms: 1.12x slower
docutils 2.68 sec 3.08 sec: 1.15x slower
xml_etree_iterparse 98.7 ms 117 ms: 1.19x slower
Geometric mean (ref) 1.02x slower

Benchmark hidden because not significant (57): json_dumps, pathlib, django_template, fannkuch, pickle_list, sqlglot_normalize, scimark_fft, spectral_norm, pickle_dict, scimark_lu, chameleon, chaos, scimark_monte_carlo, telco, regex_compile, pprint_safe_repr, crypto_pyaes, regex_v8, raytrace, sqlglot_optimize, go, pprint_pformat, hexiom, xml_etree_process, bench_thread_pool, scimark_sparse_mat_mult, pickle, pidigits, sqlite_synth, scimark_sor, unpickle, sqlglot_parse, mdp, sympy_str, html5lib, nqueens, genshi_xml, json_loads, sympy_sum, sqlglot_transpile, regex_effbot, unpack_sequence, bench_mp_pool, sympy_integrate, deepcopy_reduce, logging_simple, sympy_expand, coverage, richards, nbody, pyflate, xml_etree_generate, logging_format, deepcopy_memo, unpickle_pure_python, generators, deepcopy

@nascheme
Member

I suspect it really depends on the application. It could be the case that some (many?) applications could completely turn off the cyclic GC and only rely on ref counting. However, I imagine there are applications that generate cyclic garbage quickly and would perform quite badly with a single generation GC. It takes quite a lot more time to do the full collection and the young generation is effective at collecting recently created cyclic garbage.

The dynamic threshold sounds like a good idea to me. The challenge will be to make something that works for all CPython users. We have something that reduces the number of full collections, from gcmodule.c:

            /* Avoid quadratic performance degradation in number
               of tracked objects (see also issue #4074):

               To limit the cost of garbage collection, there are two strategies;
                 - make each collection faster, e.g. by scanning fewer objects
                 - do less collections
               This heuristic is about the latter strategy.

               In addition to the various configurable thresholds, we only trigger a
               full collection if the ratio

                long_lived_pending / long_lived_total

               is above a given value (hardwired to 25%).

               The reason is that, while "non-full" collections (i.e., collections of
               the young and middle generations) will always examine roughly the same
               number of objects -- determined by the aforementioned thresholds --,
               the cost of a full collection is proportional to the total number of
               long-lived objects, which is virtually unbounded.

               Indeed, it has been remarked that doing a full collection every
               <constant number> of object creations entails a dramatic performance
               degradation in workloads which consist in creating and storing lots of
               long-lived objects (e.g. building a large list of GC-tracked objects would
               show quadratic performance, instead of linear as expected: see issue #4074).

               Using the above ratio, instead, yields amortized linear performance in
               the total number of objects (the effect of which can be summarized
               thusly: "each full garbage collection is more and more costly as the
               number of objects grows, but we do fewer and fewer of them").

               This heuristic was suggested by Martin von Löwis on python-dev in
               June 2008. His original analysis and proposal can be found at:
               http://mail.python.org/pipermail/python-dev/2008-June/080579.html
            */

@pablogsal
Member Author

pablogsal commented Dec 23, 2022

The dynamic threshold sounds like a good idea to me. The challenge will be to make something that works for all CPython users. We have something that reduces the number of full collections, from gcmodule.c:

What do you think of the approach of

#100403 (comment)

I think this could be interesting because it is easy to understand how it changes the thresholds dynamically, and it is also easy to understand what the knobs are trying to achieve in the GC behaviour.

Also, it had good performance numbers in the benchmark suite.

@gpshead
Member

gpshead commented Dec 24, 2022

@nanjekyejoannah as well (who is also interested in GC if I recall correctly?)

@nascheme
Member

nascheme commented Jan 3, 2023

Benchmarking real applications is good, but I think we should also run some simulations or do some analysis to estimate how the GC will behave under different scenarios. To that end, I started writing a simulator. It is still crude and likely buggy.

To help with simulation and modelling, some profiling would help to determine the cost of minor and major GC runs. My first guess is that the cost is roughly O(n) where n is the number of objects in the generation. If that's correct, I don't see how the dynamic threshold change will help, at least how it is written. Each newly created object gets examined once in the young generation, so doing it less often doesn't save any time; it just makes the gen 0 collection take longer. That's another thing to be concerned about: GC pause time. Making gen 0 and gen 1 large will increase GC pause time. It is a "stop the world" GC and some applications might not be happy with much longer pauses.

In my basic simulator, I set the fraction of objects that are trash. If that's roughly how things work, I don't think the threshold algorithm in gh-100422 is what we want. Say 2% of newly created objects immediately become cyclic trash. Since it is targeting 20% of the objects in gen 0 as trash, it will keep increasing the threshold until it reaches the upper limit. Making the generation larger does not increase the % of objects that are trash and I don't see how that's likely in any scenario. Perhaps the threshold needs to be based on something like (number of trash objects found / number of total live objects). However, that approach would result in GC pauses becoming long as the total number of live objects gets large.
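
For illustration, here is a toy version of that kind of simulation (a sketch only; nascheme's actual simulator is separate and more detailed), which shows why a fixed trash fraction makes the collected ratio insensitive to the threshold:

import random

TRASH_FRACTION = 0.02   # assumption: 2% of new objects immediately become cyclic trash

def simulate(allocations, threshold):
    """Return the number of gen0 runs and the per-run collected ratio."""
    runs, ratios = 0, []
    live_in_gen0, trash_in_gen0 = 0, 0
    for _ in range(allocations):
        live_in_gen0 += 1
        if random.random() < TRASH_FRACTION:
            trash_in_gen0 += 1
        if live_in_gen0 >= threshold:
            runs += 1
            ratios.append(trash_in_gen0 / live_in_gen0)
            live_in_gen0, trash_in_gen0 = 0, 0   # survivors would be promoted
    return runs, ratios

for threshold in (700, 7000, 50000):
    runs, ratios = simulate(1_000_000, threshold)
    print(threshold, runs, sum(ratios) / len(ratios))   # ratio stays ~2% regardless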

BTW, it looks to me like gh-100422 sets thresholds for older generations so high that collections basically don't happen. Those thresholds are based on the number of collections, not number of objects added to the generation.

Some ideas on how to make progress:

  • add more GC metrics, enabled by a -X option or gc.set_debug(). E.g. time spent in different parts of the GC, number of objects surviving each generation. Use this to collect data from real applications and benchmarks
  • improve the simulator to match what the GC does, output metrics after simulation run
  • define objectives for optimization, trade-offs are: time spent in GC, length of GC pauses, overall memory use (fraction of trash vs total live objects).
  • design tuning knobs or automatic tuning that optimizes those objectives

Regarding your "Statistics" tables, those are interesting but it's not exactly clear to me what the numbers mean. Is the "count" directly from the GC generation count field? If so, I think gen 0 count is based on the number of objects in the generation whereas the gen 1 and gen 2 are based on number of collections. And I guess the rest of the table rows are based on the number of trash objects found. If that's all correct, gen 0 is not so bad as you think at finding trash whereas gen 1 is pretty useless. Maybe we should only have two generations. If would be interesting to see statistisc based on the fraction of objects found to be trash and the number of objects in each generation. Also, the amount of time for the GC vs size of generation. We might deduce the O(...) behaviour from that.

@pablogsal
Member Author

pablogsal commented Jan 3, 2023

To help with simulation and modelling, some profiling would help to determine the cost of minor and major GC runs. My first guess is that the cost is roughly O(n) where n is the number of objects in the generation. If that's correct, I don't see how the dynamic threshold change will help, at least how it is written.

But under that assumption, if you run simulations that have a constant percentage of trash per collection, the size of the generation doesn't matter: the only thing that determines the time spent in GC over a fixed period is the total number of objects that the GC has seen, and unless you add more assumptions, that should not change.

On the other hand, I think the key here is that having fewer GC collections allows refcounting to kill most of the stuff that's not cyclic trash and therefore the collections would be more efficient. I think this is the key aspect that can optimize the GC time in general and you won't be able to capture this with a simulation that simulates fixed percentages.

Regarding your "Statistics" tables, those are interesting but it's not exactly clear to me what the numbers mean. Is the "count" directly from the GC generation count field? If so, I think gen 0 count is based on the number of objects in the generation whereas the gen 1 and gen 2 are based on number of collections. And I guess the rest of the table rows are based on the number of trash objects found. If that's all correct, gen 0 is not so bad as you think at finding trash whereas gen 1 is pretty useless. Maybe we should only have two generations. If would be interesting to see statistisc based on the fraction of objects found to be trash and the number of objects in each generation. Also, the amount of time for the GC vs size of generation. We might deduce the O(...) behaviour from that.

Apologies for the lack of clarity on the tables. "Count" is just the number of data points, but that value alone doesn't tell you much so I would suggest ignoring it. The rest are the % of collected objects over the number of objects available at the start of a collection of that generation. As this is done across many different runs, you get a distribution. For example:

| Category | Value |
| --- | --- |
| mean | 1.192775 |
| std | 3.560391 |
| min | 0.000000 |
| 25% | 0.000000 |
| 50% | 0.000000 |
| 75% | 0.480192 |
| max | 86.407768 |

This means that on average a collection of this generation collects 1% of the objects in the generation, with a standard deviation of 3.5%. The 25/75 rows are the quantiles of the distribution, and min and max are the efficiency of the worst and best collections. In this case the worst collection collected 0% of the objects and the best collection collected 86% of the objects.
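
For anyone who wants to reproduce roughly this kind of data without an instrumented build, something similar can be approximated from pure Python with gc.callbacks (only a sketch, and it adds noticeable overhead because it sizes the generations with gc.get_objects()):

import gc
from collections import defaultdict

efficiency = defaultdict(list)   # generation -> % of examined objects collected
_examined = {}

def _gc_stats(phase, info):
    gen = info["generation"]
    if phase == "start":
        # Approximate how many objects this run will examine: everything in
        # the collected generation plus the younger ones merged into it.
        _examined[gen] = sum(len(gc.get_objects(generation=g)) for g in range(gen + 1))
    else:  # phase == "stop"
        examined = _examined.pop(gen, 0) or 1
        efficiency[gen].append(100.0 * info["collected"] / examined)

gc.callbacks.append(_gc_stats)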

BTW, it looks to me like gh-100422 sets thresholds for older generations so high that collections basically don't happen. Those thresholds are based on the number of collections, not number of objects added to the generation.

Yeah, I suspect that most of the improvements come from making the generations bigger. That's another question: maybe the generations we have currently are too small for many applications. Maybe just changing the numbers would help a bit. Coming up with "good numbers" is, I suppose, going to be the most challenging part, but I cannot see any proof that the status quo is maximizing any specific objective. Does anyone know where the numbers came from?

However, that approach would result in GC pauses becoming long as the total number of live objects gets large.

Yeah, although the idea is that over the long run, the actual total time of GC should be less. I assume these are two different axes to optimize, depending on your application.

It would be interesting to see statistics based on the fraction of objects found to be trash and the number of objects in each generation.

I may be missing something but those are the statistics that I showed: it is the % of collected trash objects over the objects in that generation.

@nascheme
Member

nascheme commented Jan 3, 2023

On the other hand, I think the key here is that having fewer GC collections
allows refcounting to kill most of the stuff that's not cyclic trash and
therefore the collections would be more efficient. I think this is the key
aspect that can optimize the GC time in general and you won't be able to
capture this with a simulation that simulates fixed percentages.

Ah, that's a good point and would explain how your change could improve
performance. If gen 0 is too small, the GC looks at a lot of objects that die
due to ref counting. That's wasted time.

It would be interesting to gather some data about that, i.e. when an object
dies, which GC generation it is in. We might need a special build that adds a field
to the GC head to keep track of the current generation. Otherwise it is too
expensive to compute.

The rest are the % of collected objects over the number of objects available
at the start of a collection of that generation.

Oh, I totally misunderstood then. That changes things a lot. The distribution
is highly skewed so "mean" is likely pretty useless. It could be useful to see
higher percentiles, e.g. 90, 95, 99%.

So gen 0 and 1 are finding hardly any trash since the 75th percentile is less
than 0.5%.

The gen 2 numbers are surprising too, the 50th percentile is 54.6%? That seems
really high to me. With those numbers, having only a single generation is
probably better.

Does anyone know where the numbers came from?

Originally I just picked some thresholds without too much thought. There was
some further tweaking of them but, AFAIK, no one ever did a lot of analysis to
determine what they should be.

[...] those are the statistics that I showed: it is the % of collected trash
objects over the objects in that generation.

Right, I misunderstood. Those numbers are useful and surprising, at least to
me.

@pablogsal
Member Author

pablogsal commented Jan 3, 2023

It would be interesting to gather some data about that, i.e. when an object
dies, which GC generation it is in. We might need a special build that adds a field
to the GC head to keep track of the current generation. Otherwise it is too
expensive to compute.

Oh, fantastic suggestion. Let me play with this a bit and I will maybe propose a PR.

Right, I misunderstood. Those numbers are useful and surprising, at least to
me.

Let me try to gather them again and point you to a branch to reproduce them so we can play with different workloads. Maybe we could also add a configure flag or something to have this in the interpreter itself.

@carljm
Member

carljm commented Jan 4, 2023

If we develop a statistics-gathering patch that's been well-tested and we're confident it collects useful data, assuming I can reasonably backport it to Python 3.8 or 3.10, I should be able to run it on an IG prod host and collect stats from our workload.

pablogsal added a commit to pablogsal/cpython that referenced this issue Jan 11, 2023
pablogsal added a commit to pablogsal/cpython that referenced this issue Jan 12, 2023
pablogsal added a commit to pablogsal/cpython that referenced this issue Jan 12, 2023
pablogsal added a commit to pablogsal/cpython that referenced this issue Jan 12, 2023
…en --enable-pystats is provided

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
nascheme added a commit to nascheme/cpython that referenced this issue Jan 12, 2023
Port of Pablo's pythonGH-100403 to Python 3.11.x.
@nascheme
Member

I made a small change to allow the threshold to be set by an env var:

diff --git a/Modules/gcmodule.c b/Modules/gcmodule.c
index 829d6d9d004..2d8e22a795f 100644
--- a/Modules/gcmodule.c
+++ b/Modules/gcmodule.c
@@ -195,8 +195,15 @@ _PyGC_InitState(GCState *gcstate)
     };
     gcstate->generation0 = GEN_HEAD(gcstate, 0);
     INIT_HEAD(gcstate->permanent_generation);
-
 #undef INIT_HEAD
+
+    char *s = Py_GETENV("PYTHONGCTHRESHOLD");
+    if (s) {
+        long n = atol(s);
+        if (n > 0) {
+            gcstate->generations[0].threshold = n;
+        }
+    }
 }
 
 

I gathered stats for a web application and for the Django test suite, at 700 and 7000 for the GC threshold. The black application doesn't seem to create too much cyclic garbage while it runs so I expect the thresholds could be set high for it.

I added a new chart, "Collected per run time", which is "objects collected" / "running time", e.g. objects collected per ms of collection time.

Web app @ threshold 700:
[chart: webapp threshold=700]

Web app @ threshold 7000:
[chart: webapp threshold=7000]

Using a threshold of 7000 makes the youngest collection much more effective: >600 objects collected per ms of GC run time vs <200.

Some investigation shows that the HTML form framework used by the web app creates some reference cycles. So each generated web page that contains HTML forms can create work for the GC. Black creates some cyclic garbage too (NFAState, LineGenerator, Leaf, Node) but not very much.

Django test suite @ 700:
[chart: django tests threshold=700]

Django test suite @ 7000:
[chart: django tests threshold=7000]

Unlike the web app, Django doesn't benefit as much from using 7000 as the threshold. Total GC run time goes down, which is good.

I would like to do more testing and analysis. My initial impression is that 700 is too small for the threshold, especially for apps that don't create a lot of cyclic trash. Also, it seems we do a bunch of first-gen collections before the interpreter finishes initializing. That seems like some low-hanging fruit to claim, e.g. disable automatic GC until startup is done; we are likely not creating a lot of cyclic garbage then.

Again, I think we need to think more about what our objectives are. Some possible ones: minimum runtime (i.e. max throughput), minimizing GC pauses (min latency), minimizing memory usage (minimizing the time cyclic trash stays alive). With the threshold at 700, first generation collections seem to take a fraction of 1 ms, small enough that I doubt anyone notices them. If we increase the size so it takes 10 ms, people may notice. E.g. 100 FPS is 10 ms per frame.

@nascheme
Member

Some more testing, trying different thresholds. Script to generate:
https://gist.github.com/nascheme/8652a77af79837a2cd6e24e88b457fc6

| threshold | gen | num collections | mean collected cycles | mean time [ns] | mean total objects | total time [s] | time per total [ns/obj] | time per collected [ns/obj] | ratio collected |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 500 | 0 | 16797 | 35 | 49,581 | 736 | 0.83 | 67 | 1415 | 5% |
| 500 | 1 | 1526 | 1,002 | 616,292 | 4,832 | 0.94 | 128 | 615 | 21% |
| 500 | 2 | 109 | 21,842 | 69,297,688 | 213,383 | 7.55 | 325 | 3173 | 10% |
| 5000 | 0 | 1039 | 1,793 | 907,897 | 5,701 | 0.94 | 159 | 506 | 31% |
| 5000 | 1 | 94 | 21,532 | 10,784,989 | 34,436 | 1.01 | 313 | 501 | 63% |
| 5000 | 2 | 12 | 60,610 | 99,815,000 | 255,447 | 1.20 | 391 | 1647 | 24% |
| 50000 | 0 | 101 | 35,198 | 17,684,436 | 49,616 | 1.79 | 356 | 502 | 71% |
| 50000 | 1 | 9 | 105,951 | 87,971,778 | 153,789 | 0.79 | 572 | 830 | 69% |
| 50000 | 2 | 4 | 49,514 | 90,719,000 | 213,088 | 0.36 | 426 | 1832 | 23% |

I'm not sure why the "time per total" number goes up when the threshold is higher; I would expect some efficiency gain with a larger number of objects. For this app, I would say 5000 is near the sweet spot. With a threshold of 50,000, the mean time to collect gen0 is 17 ms vs 0.9 ms and "time per collected object" is only slightly better.

@pablogsal
Member Author

pablogsal commented Jan 18, 2023

Some more stats based on this pygame-based Super Mario simulator:

https://github.com/mx0c/super-mario-python

Base

[chart: base]
Generation 0: 797 objects collected, 0.02 s, 86534 total objects
Generation 1: 433 objects collected, 0.01 s, 50447 total objects
Generation 2: 32132 objects collected, 0.04 s, 136712 total objects
Total GC time: 0.07 s

1st gen @ 7000

[chart: 1st gen @ 7000]

Generation 0: 803 objects collected, 0.01 s, 74589 total objects
Generation 1: 333 objects collected, 0.01 s, 48351 total objects
Generation 2: 32271 objects collected, 0.04 s, 136765 total objects
Total GC time: 0.06 s

It seems that this application is largely unaffected by the change, modulo some outliers in the first gen.

@nascheme
Member

A few more thoughts about this. The 700 threshold value was set long ago. The commit that set the value:

commit 3263dc2b1538ddd948bb60a5ccee817e512a3671
Author: Jeremy Hylton <jeremy@alum.mit.edu>
Date:   Tue Sep 5 15:44:50 2000 +0000

    compromise value for threshold0: not too high, not too low

diff --git a/Modules/gcmodule.c b/Modules/gcmodule.c
index 28263c07228..06f51ed6cd7 100644
--- a/Modules/gcmodule.c
+++ b/Modules/gcmodule.c
@@ -40,7 +40,7 @@ static int generation = 0; /* current generation being collected */
 
 /* collection frequencies, XXX tune these */
 static int enabled = 1; /* automatic collection enabled? */
-static int threshold0 = 5000; /* net new containers before collection */
+static int threshold0 = 700; /* net new containers before collection */
 static int threshold1 = 10;  /* generation0 collections before collecting 1 */
 static int threshold2 = 10;  /* generation1 collections before collecting 2 */
 

At that time, only a few "container objects" had GC support (e.g. lists, tuples, dicts). Since then, many more types have gained support for cyclic GC, which makes collections happen more often than they originally did. Also, computer memory has expanded significantly in 20 years. So, it seems likely 700 is too small as a default.

The fact that threshold1 and threshold2 are based on #-of-collections rather than #-of-objects makes tuning more complex. It would be nice to base them on objects as well, but when an object is freed, we don't know which generation it is in. We always subtract one from the gen0 count.
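
For context, this asymmetry is already visible from the gc module:

import gc

# Defaults: (700, 10, 10). The first number counts net new tracked objects;
# the second and third count *collections* of the previous generation, not objects.
print(gc.get_threshold())

# Raising only the gen0 object-count threshold, as in the experiments above:
gc.set_threshold(7000, 10, 10)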

Regarding the dynamic tuning of the threshold, what are the reasons to not just always use the high threshold limit? You could argue that you are using more memory that could be freed if you collect more often. However, if you are creating a lot of cyclic garbage, the threshold will be reached sooner and maybe a high threshold is better.

There is a danger with a high threshold that a lot of memory can be used by cyclic trash that is not accounted for by tracking #-of-objects. E.g. you have some large array objects. Then, a higher GC threshold will make your program use much more memory.

It would be nice if we could track total memory use rather than #-of-objects as a threshold. I tried adding this function:

size_t
_PyMem_GetArenaBytes(void)
{
    if (!_PyMem_PymallocEnabled()) {
        return 0;
    }
    /* # of arenas actually allocated. */
    size_t narenas = 0;
    for (uint i = 0; i < maxarenas; ++i) {
        /* Skip arenas which are not allocated. */
        if (arenas[i].address == (uintptr_t)NULL) {
            continue;
        }
        narenas += 1;
    }
    return narenas * ARENA_SIZE;
}

And then changing GC triggering to check both #-of-objects threshold and also the incremental size of _PyMem_GetArenaBytes(). Counting arenas is faster than looking at pools and probably good enough. It doesn't really work though because this doesn't account for allocations larger than the obmalloc max size. I tried adding memory tracking for _PyObject_Malloc, _PyObject_Calloc, _PyObject_Free. The accounting is a bit tricky. If we switched to mimalloc, we could track total memory use more easily, I think. Or, we could use OS APIs like what's used by Modules/resource.c, e.g. to get data-resident-size. Should we be looking at other resources, like file descriptors? Since we strongly encourage using context handlers, maybe not.

Even if we can track memory used by PyMem/PyObject_Malloc, it is still possible extensions allocate memory using their own malloc and we wouldn't see that. So, maybe using an OS API to find the process memory size is better. On platforms with OS support, we could base the GC collection on both #-of-objects and increase in process memory size. On other platforms, use just #-of-objects.

@pablogsal
Member Author

pablogsal commented Jan 19, 2023

Regarding the dynamic tuning of the threshold, what are the reasons to not just always use the high threshold limit? You could argue that you are using more memory that could be freed if you collect more often. However, if you are creating a lot of cyclic garbage, the threshold will be reached sooner and maybe a high threshold is better.

The idea was that we don't want to impact latency too much if there are a lot of cyclic objects. If we just grow the threshold and there are A LOT of cycles, then the latency will grow unbounded. In those cases we want to back off a bit and do more small collections.

Maybe there are better metrics to make "dynamic" than the one I am using there, though.

And then changing GC triggering to check both #-of-objects threshold and also the incremental size of _PyMem_GetArenaBytes(). Counting arenas is faster than looking at pools and probably good enough. It doesn't really work though because this doesn't account for allocations larger than the obmalloc max size. I tried adding memory tracking for _PyObject_Malloc, _PyObject_Calloc, _PyObject_Free. The accounting is a bit tricky. If we switched to mimalloc, we could track total memory use more easily, I think. Or, we could use OS APIs like what's used by Modules/resource.c, e.g. to get data-resident-size. Should we be looking at other resources, like file descriptors? Since we strongly encourage using context handlers, maybe not. Even if we can track memory used by PyMem/PyObject_Malloc, it is still possible extensions allocate memory using their own malloc and we wouldn't see that. So, maybe using an OS API to find the process memory size is better. On platforms with OS support, we could base the GC collection on both #-of-objects and increase in process memory size. On other platforms, use just #-of-objects.

This is interesting but still has some challenges. One of the problems is that unless we only look at pools that are assigned to GC objects, we can make huge mistakes when estimating size. The OS APIs report resident size, but that can be deceiving for many reasons and it is also very difficult to transform into a number that represents something the GC can do anything about.

For example, I can imagine a situation where someone (or some C extension) mmapped a huge chunk of memory and that causes the GC to run in crazy mode because it thinks that it is able to clean some of that memory when maybe that is not possible. The same problem happens with arenas or the PyMem/PyObject_Malloc APIs: we should only account for the memory of GC objects, because someone could have a gigantic string and then the GC would again run in crazy mode.

It seems that mimalloc could help us if we force everyone to use the Python memory APIs and we segment allocations into GC and non-GC objects, but that has a lot of other problems.

Another approach is running getsizeof() on every object that is added to the GC, but that can slow down everything by a lot and also can be inaccurate in some cases.

@nascheme
Member

The idea was that we don't want to impact latency too much if there are a lot of cyclic objects. If we just grow the threshold and there are A LOT of cycles, then the latency will grow unbounded. In those cases we want to back off a bit and do more small collections.

I don't mean to increase the threshold without bound, so the latency shouldn't be unbounded. Just set the threshold to the high value and keep it there, was my idea. If a higher threshold works when there are not a lot of trash cycles, why doesn't it also work when there are many cycles?

For example, I can imagine a situation where someone (or some C extension) mmapped a huge chunk of memory and that causes the GC to run in crazy mode because it thinks that it is able to clean some of that memory when maybe that is not possible. The same problem happens with arenas or the PyMem/PyObject_Malloc APIs: we should only account for the memory of GC objects, because someone could have a gigantic string and then the GC would again run in crazy mode.

I was thinking of using getrusage() and ru_maxrss, checking the incremental increase in RSS. So once the GC runs, it wouldn't run again until RSS increased by some amount (I was using 5 MB). Also, my condition for running the GC was both the object count and the increase in RSS.

Something like:

--- a/Modules/gcmodule.c
+++ b/Modules/gcmodule.c
@@ -180,6 +180,10 @@ _PyInitGCStats() {
 }
 #endif
 
+/* likely should be some multiple larger than obmalloc ARENA_SIZE */
+static size_t pymem_size_threshold = 5000000;
+static size_t pymem_size_last = 0;
+
 void
 _PyGC_InitState(GCState *gcstate)
 {
@@ -2435,6 +2450,9 @@ PyObject_IS_GC(PyObject *obj)
     return _PyObject_IS_GC(obj);
 }
 
+// implemented in obmalloc.c, estimate RSS size of process
+extern size_t _PyMem_GetTotalUsedBytes(void);
+
 void
 _PyObject_GC_Link(PyObject *op)
 {
@@ -2452,9 +2470,19 @@ _PyObject_GC_Link(PyObject *op)
         !gcstate->collecting &&
         !_PyErr_Occurred(tstate))
     {
-        gcstate->collecting = 1;
-        gc_collect_generations(tstate);
-        gcstate->collecting = 0;
+        size_t pymem_size = _PyMem_GetTotalUsedBytes();
+        size_t pymem_inc = pymem_size - pymem_size_last;
+        if (pymem_inc > pymem_size_threshold) {
+            gcstate->collecting = 1;
+            gc_collect_generations(tstate);
+            gcstate->collecting = 0;
+            pymem_size_last = _PyMem_GetTotalUsedBytes();
+        } else {
+            gcstate->generations[0].count = 0;
+        }
     }
 }

Another approach is running getsizeof() on every object that is added to the GC, but that can slow down everything by a lot and also can be inaccurate in some cases.

That sounds way too expensive to me. You could walk the obmalloc pools and add up what's used that way (you know the size class for each pool). Then have a way of tracking the sizes of large objects. Still sounds too slow to me.

@pablogsal
Member Author

I was thinking of using getrusage() and ru_maxrss, checking the incremental increase in RSS. So once the GC runs, it wouldn't run again until RSS increased by some amount (I was using 5 MB). Also, my condition for running the GC was both the object count and the increase in RSS.

But ru_maxrss will spike for a lot of unrelated reasons, no? Opening files, dlopening shared objects, shared libraries allocating big memory pools...etc

@gpshead
Member

gpshead commented Jan 20, 2023

... other non-Python threads doing useful things... etc.

@gpshead
Member

gpshead commented Jan 20, 2023

Fundamentally there cannot be a fixed threshold that is perfect. Every application has a different data structure shape, and any static value is a compromise no matter when it is chosen or what it is set to. That doesn't mean changing it is a bad idea.

Similarly, any dynamic value is probably prone to an antagonistic load that leads to it doing worse in some situations. Again, a compromise. This is why we have gc module APIs: so that applications that ever actually get to the point of identifying a need to improve their Python GC behavior, rather than lower-hanging fruit in their stack, have some metrics to read and dials to twist.

I suggest focusing first on exposing more detailed metrics from the GC than our existing get_stats() and get_count() APIs offer, as new public gc module APIs, so that some history of time spent doing what and when can be presented.

For example: is there a way to have --enable-pystats-like data and more always enabled without a notable performance hit? Or at least something that can be enabled by an API call or environment variable rather than needing to recompile with a configure #define?
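
Until such APIs exist, a rough approximation of that history can already be gathered from Python with gc.callbacks and a monotonic clock (a sketch; the names here are illustrative, not a proposed API):

import gc
import time

gc_history = []   # (generation, duration in ns, collected, uncollectable)
_started = {}

def _gc_timing(phase, info):
    gen = info["generation"]
    if phase == "start":
        _started[gen] = time.perf_counter_ns()
    else:  # phase == "stop"
        duration = time.perf_counter_ns() - _started.pop(gen, time.perf_counter_ns())
        gc_history.append((gen, duration, info["collected"], info["uncollectable"]))

gc.callbacks.append(_gc_timing)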

@nascheme
Member

I made a quick-and-dirty branch to test my idea of also checking for an increase in process memory use. I decided that the simplest thing to do with malloc()/calloc()/realloc()/free() is to just count the bytes allocated and not deduct on free(). I'm using the allocated bytes as an additional condition for triggering GC and so we never do any more collections, only fewer. Estimating what's used by obmalloc is easier, just count up used arenas.

Branch:
https://github.com/nascheme/cpython/tree/gc_estimate_process_size

A re-run of my web app benchmark, similar to the table in my comment above. Notice that for the 500 object threshold (first row), the average size of gen0 is 5,206. That means that the gen0 collection is being skipped about 9 out of 10 times because the memory estimate returned by _PyMem_EstimateTotalBytes didn't increase enough. I'm using 5 MB as the memory increase threshold.

An idea would be to add gc.set_memory_threshold() and set it to something (e.g. few megabytes) by default. Setting it to 0 would revert to the current Python GC behaviour, only considering an increase in object count in gen0. The logic is that if the value returned by _PyMem_EstimateTotalBytes is not going up too much, we can defer collection of gen0 and hope that reference counting frees that memory. The risk is we are not counting resource usage not captured by _PyMem_EstimateTotalBytes. However, this approach seems less risky to me compared with just increasing the object count threshold.

| threshold | gen | num collections | mean collected cycles | mean time [ns] | mean total objects | total time [s] | time per total [ns/obj] | time per collected [ns/obj] | ratio collected |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 500 | 0 | 1176 | 1,398 | 802,900 | 5,206 | 0.94 | 154 | 574 | 27% |
| 500 | 1 | 106 | 21,115 | 9,826,632 | 30,416 | 1.04 | 323 | 465 | 69% |
| 500 | 2 | 13 | 37,056 | 82,745,154 | 229,967 | 1.08 | 360 | 2233 | 16% |
| 5000 | 0 | 692 | 2,790 | 1,403,973 | 7,847 | 0.97 | 179 | 503 | 36% |
| 5000 | 1 | 62 | 32,909 | 16,666,484 | 45,656 | 1.03 | 365 | 506 | 72% |
| 5000 | 2 | 9 | 45,681 | 89,475,778 | 234,440 | 0.81 | 382 | 1959 | 19% |
| 50000 | 0 | 96 | 35,462 | 17,017,979 | 49,757 | 1.63 | 342 | 480 | 71% |
| 50000 | 1 | 8 | 102,482 | 81,490,750 | 151,906 | 0.65 | 536 | 795 | 67% |
| 50000 | 2 | 4 | 57,971 | 95,446,250 | 217,941 | 0.38 | 438 | 1646 | 27% |

@nascheme
Member

Is there a way to have --enable-pystats-like data and more always enabled without a notable performance hit? Or at least something that can be enabled by an API call or environment variable rather than needing to recompile with a configure #define?

I think so. Some stuff that Pablo's stats branch collects might be too expensive. However, I think we could collect/report some more useful things without a lot of extra overhead. Then we can have a gc.set_debug() or -X command line flag to turn it on. Right now we are flying a bit blind in terms of how real apps behave.

@pablogsal
Member Author

I can modify my PR to do that in addition to the pystats collection.

What things do you think would be useful to have in the regular builds that would not be too prohibitive?

@pablogsal
Member Author

I have modified #100958 to be activated via a -X option.
