Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release/8.0-staging] Guard against -1 Returned from sysconf for the Cache Sizes Causing Large Gen0 Sizes and Budgets for Certain Linux Distributions. #100575

Merged
merged 14 commits into from
Apr 15, 2024

Conversation

github-actions[bot]
Copy link
Contributor

@github-actions github-actions bot commented Apr 3, 2024

Backport of #100502 to release/8.0-staging

/cc @mrsharm

Customer Impact

  • Customer reported
  • Found internally

All runtimes running server GC running on Linux distributions that return -1 for the LEVEL4_CACHE_SIZE from sysconf are affected by the underlying issue that this PR addresses. The implications of this issue are that the gen0 budget is computed as a function of the total memory available and not the L3 cache size as before for certain Linux distributions such as Debian 12, and Ubuntu 22.04. Relying on the physical memory which is much larger than the L3 cache size results in extremely large gen0 budgets and thereby causing larger heap sizes and consequently, more memory utilization.

The internal customer impact reduced their overall memory utilization from 1 GB to 300 MB and were satisfied with the result.

Regression

  • Yes
  • No

This regression is as a result of changes to the result of sysconf for certain Linux distributions. The result that we are now guarding against is -1. The code that introduced this behavior by not vetting the output of sysconf (a long) and casting it to a size_t was: #71029; this was in the .NET 7 timeframe but probably didn't emanate to our user base since the values of sysconf for the L4 Cache Size for newer distributions started showing up as -1 if they didn't exist and where in previous cases such as for Debian 11, they had been 0. This was confirmed from running getconf -a | grep CACHE on multiple distributions of Linux and the ones that exhibited this behavior e.g., Debian 12 indicated that there had been a change in the behavior by outputting -1.

Testing

  • The fix was tested on an internal customer, and we confirmed that the Gen0 budgets were equivalent to that of the baseline. In the regressed case, they were a factor of > x10 the original budget and therefore, the memory utilization increased by that factor. We observed from the traces a decrease in the heap size and the gen0 budget of a factor of 10.

Risk

The risk of the fix is high as the code path is called for all SVR instances for Linux. However, this is a fix for an already existing issue and has been tested and verified.

Full Details can be found: #100502

Copy link
Contributor

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Copy link
Member

@janvorli janvorli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@jkotas
Copy link
Member

jkotas commented Apr 3, 2024

The result that we are now guarding against is -1

FWIW, @Maoni0 noticed that the error handling is not right during codereview two years ago: #71029 (comment) . Unfortunately, this feedback was not followed up upon.

Copy link
Member

@jeffschwMSFT jeffschwMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved. we will take for consideration in 8.0.x

@leecow leecow added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Apr 11, 2024
@leecow leecow added this to the 8.0.5 milestone Apr 11, 2024
@mrsharm
Copy link
Member

mrsharm commented Apr 15, 2024

The build failures are known issues: #100035 => This PR can be merged as the errors aren't related.

@ericstj ericstj merged commit 6f29267 into release/8.0-staging Apr 15, 2024
108 of 116 checks passed
@ericstj
Copy link
Member

ericstj commented Apr 15, 2024

/ba-g all failures were classified as Known but the check did not flip to green.

@jkotas jkotas deleted the backport/pr-100502-to-release/8.0-staging branch April 17, 2024 01:25
@github-actions github-actions bot locked and limited conversation to collaborators May 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-GC-coreclr Servicing-approved Approved for servicing release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants