Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release/9.0] Fix perf problems found in investigation of issue #107728 #107817

Merged
merged 1 commit into from
Sep 14, 2024

Conversation

github-actions[bot]
Copy link
Contributor

@github-actions github-actions bot commented Sep 13, 2024

Backport of #107806 to release/9.0

/cc @davidwrighton

Customer Impact

  • Customer reported
  • Found internally

This impacts customers who have multithreaded applications which use a number of apis which need the statics of a type to be fully initialized. The effect is that there is slowdown, especially in the case of multiple threads attempting to run code at once which requires the class initializer on types to have been run. The customer who discovered this was using unoptimized code which accessed thread statics, which was probably the biggest performance problem that was exposed by this bug. It was filed as #107728

Regression

  • Yes
  • No

Regression was introduced with PR #99183

Testing

The fix was verified by the production of a bespoke test case which stressed the multithreaded performance of unoptimized jitted code accessing thread statics. This is not a scenario which is part of our performance benchmarks due to a general lack of support in benchmark.net for test whether or not tests cause thread contention. Single threaded performance was effected by the bug, but the impact was minimal and below the noise floor of our general purpose microbenchmark tester.

Risk

Low risk, this change mostly works by checking flags early instead of after taking locks. The flag checking in this case is also entirely using volatile loads with barriers, so the memory ordering risks are minimal.

IMPORTANT: If this backport is for a servicing release, please verify that:

  • The PR target branch is release/X.0-staging, not release/X.0.

  • If the change touches code that ships in a NuGet package, you have added the necessary package authoring and gotten it explicitly reviewed.

…ClassInitThrowing` didn't check to see if the class had been initialized before taking a lock - `EnsureTlsIndexAllocated` didn't check if the Tls index had been allocated before setting the flag via an expensive Interlocked call to indicate that it had been allocated - And finally `JIT_GetNonGCThreadStaticBaseOptimized` and `JIT_GetGCThreadStaticBaseOptimized` were missing the fast paths which avoided even calling those apis at all.

Perf with a small benchmark which does complex multithreaded work...

| Runtime | Time |
| ---- | ---- |
| .NET 8 | 00.9414682 s |
| .NET 9 before this fix |  22.8079382 |
| .NET 9 with this fix | 00.2004539 |

Fixes #107728
Copy link
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

Copy link
Member

@jeffschwMSFT jeffschwMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved. please get a code review. when ready this can be merged

@jeffschwMSFT jeffschwMSFT added the Servicing-approved Approved for servicing release label Sep 13, 2024
@jkotas jkotas merged commit 46cfb74 into release/9.0 Sep 14, 2024
91 of 94 checks passed
@jkotas jkotas deleted the backport/pr-107806-to-release/9.0 branch September 14, 2024 01:59
@alexyakunin
Copy link

Thanks, guys - that was quick!

@alexyakunin
Copy link

alexyakunin commented Oct 10, 2024

Is this fix available in .NET 9 RC2? If yes, I guess there is another regression (~ 15%) that I see in one of my tests.

@mangod9
Copy link
Member

mangod9 commented Oct 10, 2024

yes this should be included in RC2. Is it now a different test which is showing a regression?

@alexyakunin
Copy link

Yes. I don't have an isolated sample, so here is how to reproduce this:

  1. Clone https://github.com/ActualLab/Fusion
  2. Run-PerformanceTest.cmd net9.0 sqlite -> ~107-109M ops/s on my Ryzen 7950X3D
  3. Run-PerformanceTest.cmd net8.0 sqlite -> ~125-130M ops/s

@alexyakunin
Copy link

For the sake of clarity, the test isn't about DB performance - it's mainly about Fusion caching logic.

@github-actions github-actions bot locked and limited conversation to collaborators Nov 10, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-VM-coreclr Servicing-approved Approved for servicing release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants