Add a time-based decay to the linear allocation model. #55174
Conversation
The real-world scenario that motivated this change allocated a huge amount of data once a day, leading to high gen 0 and gen 1 budgets (several GB). After that, there was relatively little activity, causing the gen 1 budget to take several hours to normalize. The new logic discounts the previous desired allocation over a period of 5 minutes.
Tagging subscribers to this area: @dotnet/gc
This addresses issue #52592.
@@ -36767,6 +36765,9 @@ size_t gc_heap::desired_new_allocation (dynamic_data* dd,
     float f = 0;
     size_t max_size = dd_max_size (dd);
     size_t new_allocation = 0;
+    float time_since_previous_collection_secs = (dd_time_clock (dd) - dd_previous_time_clock (dd))*1e-6f;
+
+
Nit: remove one empty line?
Looking at the code, I think I should remove both of the empty lines - no reason to separate the declarations here, I think.
{
    if ((allocation_fraction < 0.95) && (allocation_fraction > 0.0))
    {
        const float decay_time = 5*60.0f; // previous desired allocation expires over 5 minutes
        float decay_factor = (decay_time <= time_since_previous_collection_secs) ?
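The ternary above is cut off in the quoted hunk. A minimal sketch of a plausible completion, assuming the decay factor ramps linearly from 1 (a GC immediately after the previous one) down to 0 once decay_time has elapsed; the function wrapper is illustrative and not part of the PR:

// Illustrative completion only (the committed code may differ): the decay
// factor falls linearly from 1 when a GC happens right after the previous one
// to 0 once five minutes or more have passed since the previous GC of this
// generation.
static float compute_decay_factor (float time_since_previous_collection_secs)
{
    const float decay_time = 5*60.0f; // previous desired allocation expires over 5 minutes
    return (decay_time <= time_since_previous_collection_secs) ?
           0.0f :
           ((decay_time - time_since_previous_collection_secs) / decay_time);
}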
will we build a test scenario in GCPerfSim to validate that the decay has the desired effect?
I will try, but it may be a bit involved.
This seems to be along the same lines as what we are doing for WKS GC. Also, the way the change is written now modifies the existing behavior. It seems like it would be better not to modify the existing behavior when a GC happens soon enough, and to start decaying only when the elapsed time between this GC and the last GC (of that generation) is longer than decay_time (and the longer it has been, the less effect the previous budget should have). What do you think?
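A minimal sketch of the alternative described in the comment above, under two assumptions that are mine rather than the reviewer's: that the existing linear model weights the previous budget by (1 - allocation_fraction), as the later discussion suggests, and that past the grace period the weight shrinks in proportion to the elapsed time. The function and its names are illustrative, not code from gc.cpp:

#include <cstddef>

static size_t blend_with_grace_period (float allocation_fraction,
                                       size_t new_allocation,
                                       size_t previous_desired_allocation,
                                       float elapsed_secs)
{
    const float decay_time = 5*60.0f; // grace period before any discounting starts

    // Weight of the previous budget under the existing linear model (assumed).
    float previous_weight = 1.0f - allocation_fraction;

    if (elapsed_secs > decay_time)
    {
        // Only once the gap between GCs of this generation exceeds decay_time
        // does the previous budget start counting for less; the longer the
        // gap, the smaller its weight becomes.
        previous_weight *= decay_time / elapsed_secs;
    }

    return (size_t)((1.0f - previous_weight) * new_allocation +
                    previous_weight * previous_desired_allocation);
}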
I thought about making the decay_time generation aware along the lines you suggested, but didn't see a good reason to do it. Gen 1 is the only generation where we routinely collect without having exhausted the budget. Regarding your remark about modifying existing behavior, I did this on purpose so that a case where gen 1 GCs happen often enough, but very early (i.e. only exhausting a tiny fraction of the gen 1 budget), wouldn't lead to us taking a long time to ramp down the gen 1 budget. As it happens, a decay time of 5 minutes is short enough to fix the test case (which does about 6 gen 1 GCs in the first hour after loading the big chunk of data), but it seems unwise to rely on this - with more load, the interval between gen 1 GCs may of course drop below the decay_time.
In a high memory load situation we could also collect gen 2 often without having exhausted the budget, but you could argue that the next gen 2 GC will also happen without having exhausted the budget for the same reason, and gen 2's surv rate is usually very high anyway.
It's always a worry that when you change the existing perf behavior you will regress someone, thus the concern. Also, the idea of the fraction is that we take some percentage from the current budget and some from the previous budget, but now with decay_factor we are simply discounting some of the budget. Shouldn't this be

float prev_allocation_factor = (1.0 - allocation_fraction) * decay_factor;
new_allocation = (size_t)((1.0 - prev_allocation_factor) * new_allocation +

if we want to make the previous budget count for less based on the time factor?

I think you are absolutely right. Will change the code accordingly.
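For illustration, here is a self-contained sketch of the blend being agreed on above. The second quoted line is truncated; completing it with the previous desired allocation, and the function and harness around it, are assumptions rather than the final gc.cpp change:

#include <cstddef>
#include <cstdio>

// Sketch of the blend discussed above (names are illustrative): the previous
// desired allocation is weighted by (1 - allocation_fraction), as in the
// existing linear model, and that weight is additionally scaled by
// decay_factor so it fades out over decay_time.
static size_t blend_budgets (float allocation_fraction,
                             float decay_factor,
                             size_t new_allocation,
                             size_t previous_desired_allocation)
{
    float prev_allocation_factor = (1.0f - allocation_fraction) * decay_factor;
    return (size_t)((1.0f - prev_allocation_factor) * new_allocation +
                    prev_allocation_factor * previous_desired_allocation);
}

int main ()
{
    // Example numbers in the spirit of the PR description: a previous gen 1
    // budget of ~3 GB, very little new activity, and a gen 1 GC four minutes
    // after the previous one (so decay_factor would be about 0.2 under a
    // linear 5-minute decay).
    size_t previous_budget     = (size_t)3 << 30;  // ~3 GB
    size_t fresh_budget        = (size_t)32 << 20; // ~32 MB from recent survival
    float  allocation_fraction = 0.05f;            // only a sliver of the budget was used
    float  decay_factor        = 0.2f;

    size_t blended = blend_budgets (allocation_fraction, decay_factor,
                                    fresh_budget, previous_budget);

    // Prints roughly 600 MB: far below the ~2.85 GB this same blend gives with
    // decay_factor = 1, and it drops to the ~32 MB fresh budget once
    // decay_factor reaches 0.
    printf ("blended budget: %zu MB\n", blended >> 20);
    return 0;
}

With decay_factor fixed at 1 the previous budget keeps its full (1 - allocation_fraction) weight, which is presumably the pre-change behavior the earlier comments refer to; the time factor only changes the outcome as the previous budget ages.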
…out in code review.
LGTM!