-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GC] Fix freelist crash - don't modify during check-only operation #104876
Conversation
Tagging subscribers to this area: @dotnet/gc |
The build analysis failure is #104883, which wasn't open yet when this testing ran but is closed now so rerunning just build analysis doesn't work. |
@@ -21507,10 +21507,13 @@ int gc_heap::generation_to_condemn (int n_initial, | |||
} | |||
|
|||
#ifdef USE_REGIONS | |||
if (!try_get_new_free_region()) | |||
if (!check_only_p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: Using &&
to avoid changing many lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That annoyed me a bit too, but I followed the pattern at line 21544 instead. I assumed that it was to make it clear by separating the "mode" of the function and the check itself, which made sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
/ba-g see above comment about #104883 |
/backport to release/8.0-staging |
Started backporting to release/8.0-staging: https://github.com/dotnet/runtime/actions/runs/9962984209 |
…104992) Backport of #104876 to release/8.0-staging /cc @markples ## Customer Impact - [x] Customer reported - [ ] Found internally GC can crash in region code if GC.RegisterForFullGCNotification is used. ## Regression - [x] Yes - [ ] No Regression is part of general move to regions, but specific code was introduced in #48691 ## Testing The behavior is very difficult to repro and likely hadn't occurred before because it relies on a race condition, a region allocation failing, and GC.RegisterForFullGCNotification be used. I built a custom GC that always triggers the region allocation failure path and was able to repro and verify the fix in GCPerfSim. ## Risk Low: only impacts GC.RegisterForFullGCNotification path; only impacts notifications; may cause a notification for a full GC to be skipped, but only in a case where no other reasons for full gc occur -and- the code would be subject to the race condition and crash
RegisterForFullGCNotification
setsgc_heap::fgn_maxgen_percent
, which guards calls tocheck_for_full_gc
to determine if a notification should be sent.gc_heap::generation_to_condemn
. One part ofcheck_for_full_gc
callsgeneration_to_condemn
with the parametercheck_only_p
set toTRUE
.generation_to_condemn
callstry_get_new_free_region
, which ensures that a free region is available. It checks the freelist, but if none are found it creates a new one and adds it to the freelist.check_for_full_gc
does not (and should not) contain the necessary synchronization to make this safe. This can lead to corruption of the basic region freelist, often ending up withnullptr
links between nodes.This caused crashed in production and the cause was determined by using a custom GC build with additional runtime checks. The fix is to simply honor the
check_only_p
parameter and not calltry_get_new_free_region
when it is set. The problem can be reproed by changingtry_get_new_free_region
to always create a new region. This fix handles that repro.This fix does not impact normal GC operation. It can cause
check_for_full_gc
to no longer detect that full collection is going to occur, but only if the failure to create or initialize a new region is the only reason why that full collection is going to occur. However, those cases are open to the race condition that can cause the freelist corruption is possible.