This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

BGC tuning using the free list with PI loops (#26695)
This is the 1st part of BGC (background GC) tuning using the FL (free list) with a PID loop. Historically the most significant factor for triggering a BGC has been the allocation budget. This experimental feature instead triggers based on the FL in gen2/3 with a PID loop (by default we only use PI, no D): we use the allocation calculated from the FL to determine when to trigger the next BGC.

    The goal of the PI feedback loop

The end goal of the tuning is to keep a constant physical ML (memory load). The ML goal is specified as a percentage (the percent of physical memory in use), and we aim to do as few BGCs as possible to achieve this memory load.

This is most useful for the case where you are seeing that a large percent of free space in gen2 isn't getting used when a BGC is triggered; or you have plenty of memory but the GC heap size only takes up a small percentage and you want to delay BGCs to reduce the CPU consumed by BGC.
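To make the feedback idea concrete, here is a minimal sketch of a generic PI controller that tracks a memory load goal. It is not the GC's actual code: the class name, the gains `kp` and `ki`, and the way the output is interpreted are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Generic PI controller; all names and gains here are hypothetical."""
    kp: float              # proportional gain (assumed value, not from the GC)
    ki: float              # integral gain (assumed value, not from the GC)
    goal: float            # target, e.g. memory load in percent
    integral: float = 0.0  # accumulated error across updates

    def update(self, measured: float) -> float:
        # Positive error means we are below the goal.
        error = self.goal - measured
        self.integral += error
        # P term reacts to the current error, I term to the accumulated error.
        return self.kp * error + self.ki * self.integral


pi = PIController(kp=0.5, ki=0.1, goal=75.0)
first = pi.update(60.0)   # below the goal: positive correction
second = pi.update(75.0)  # at the goal: only the integral term remains
```

In this sketch a positive output would mean there is room to allocate more before the next BGC, and a negative output would mean the next BGC should come sooner. How the real implementation maps the loop output to an allocation amount is not shown here.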

    Enable the FL tuning PI loop

Since this is an experimental feature, it's disabled by default. To enable it, set this env var:

set COMPlus_BGCFLTuningEnabled=1

When the FL tuning is enabled, by default we set the ML goal to 75%. You can change it with this env var:

set COMPlus_BGCMemGoal=X

Note that, as with any COMPlus var, the value is interpreted as a hex number, not decimal.
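Because the value is read as hex, a goal given in decimal has to be converted before it is set. A small sketch of that conversion follows; the helper name `to_complus_hex` is hypothetical and only illustrates the arithmetic.

```python
def to_complus_hex(value: int) -> str:
    """Format a non-negative decimal value as hex digits without a 0x prefix."""
    if value < 0:
        raise ValueError("expected a non-negative value")
    return format(value, "X")


goal_percent = 75  # the default goal, in decimal
command = f"set COMPlus_BGCMemGoal={to_complus_hex(goal_percent)}"
print(command)  # prints: set COMPlus_BGCMemGoal=4B
```

So writing `set COMPlus_BGCMemGoal=75` would be read as 0x75, which is 117 in decimal, not 75.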

    Perf consideration of the current PI loop

Of course there’s always perturbation. From BGC’s POV there are 2 categories of perturbation:

1) changes in the GC’s own perf characteristics, for example, suddenly we see a lot of pins from gen1 that get promoted into gen2.
2) non-GC factors: this could be due to a sudden increase of native memory usage in the process, or other processes on the same machine simply increasing/decreasing their memory usage.

And generally we don’t want to set a very high goal like 90% because it’s hard to react when memory is tight: the GC would need to compact, and currently BGC does not compact. So for now we have to assume that “retracting the heap is difficult”, which means we want our PI loop to be fairly conservative.

So we actually have another PI loop (the inner loop) to make sure the “sweep flr” is at a reasonable value. “Sweep flr” is the FLR (Free List Ratio) before BGC rebuilds the free list, so you can think of it as the smallest FLR during a BGC. The inner loop has a “sweep flr” goal of 20% by default, which is pretty conservative. When we can incrementally compact, I would expect to reduce this by a fair amount. Another possibility is to not set this as a fixed number and instead calculate a reasonable one dynamically based on how we observe the free list being used.
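As a rough illustration of what the inner loop measures, the sketch below computes an FLR and its distance from the 20% goal. It assumes the FLR is the free list size divided by the total generation size, expressed as a percentage; that definition and the function names are assumptions, not taken from the GC source.

```python
SWEEP_FLR_GOAL = 20.0  # default "sweep flr" goal in percent, per the text above


def free_list_ratio(free_list_bytes: int, generation_bytes: int) -> float:
    """FLR in percent, assuming FLR = free list size / generation size."""
    if generation_bytes <= 0:
        raise ValueError("generation size must be positive")
    return 100.0 * free_list_bytes / generation_bytes


def sweep_flr_margin(free_list_bytes: int, generation_bytes: int,
                     goal: float = SWEEP_FLR_GOAL) -> float:
    """Measured sweep FLR minus the goal.

    A negative value means the free list was drained further than the goal
    allows by the time the sweep started.
    """
    return free_list_ratio(free_list_bytes, generation_bytes) - goal


# 50 MB of free list in a 200 MB gen2 at sweep time: FLR 25%, 5 points above goal.
margin = sweep_flr_margin(50 * 1024 * 1024, 200 * 1024 * 1024)
```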

Of course just because BGC does not compact it doesn’t mean that the total gen2 size cannot get smaller. It could get smaller just by objects at the end of gen2 naturally dying.

+ Initialization of the PI loops

We have to have some way to get this whole thing started, so I usually do a few BGCs to reach 2/3 of the mem load goal, then start using the PI loops to decide when to trigger the next BGC.
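The hand-off described above can be sketched as a simple threshold check. This reads "2/3 mem load goal" as two thirds of the configured goal, which is an interpretation; the function name is hypothetical.

```python
def pi_loops_active(memory_load: float, goal: float) -> bool:
    """True once memory load has reached 2/3 of the goal (assumed reading)."""
    return memory_load >= goal * 2.0 / 3.0


# With the default 75% goal, the PI loops would take over at 50% memory load.
still_warming_up = not pi_loops_active(49.0, 75.0)
handed_over = pi_loops_active(50.0, 75.0)
```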

+ Panic mode

I use a very simple rule to see if I should panic, i.e., do an NGC2. If we observe that the memory load is (goal + N%), where N is just a number we determine, we do an NGC2. This actually turned out to give decent results because we give it ample opportunity to oscillate around the goal (instead of panicking prematurely).
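The panic rule is small enough to write down directly. In this sketch `slack` stands in for N; the value 10 used below is purely illustrative, and treating "is (goal + N%)" as "reaches or exceeds" is an assumption.

```python
def should_panic(memory_load: float, goal: float, slack: float) -> bool:
    """True when memory load has reached goal + slack (slack plays the role of N)."""
    return memory_load >= goal + slack


# With a 75% goal and an illustrative N of 10, oscillation up to 84% is tolerated.
tolerated = not should_panic(84.0, 75.0, 10.0)
panics = should_panic(85.0, 75.0, 10.0)
```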

+ Implementation notes

When FL tuning is not enabled there should be no effect.

We record things when a BGC starts, when the BGC sweep ends and when the BGC ends.

I have other mechanisms like the D term, FF (feed forward) and smoothing. I have experimented with them in the past. Currently they are not enabled by default but can be enabled with COMPlus env vars.

Currently this doesn't work great with LOH because of a fundamental limitation: if we give free space to gen2, it's difficult to give it to LOH. One thing the user could do is adjust the LOH threshold so most of the large object allocations happen in gen2.
Maoni0 authored Oct 23, 2019
1 parent 71fe5c4 commit 30bb5b5
Showing 4 changed files with 2,113 additions and 111 deletions.