-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Possible regression with [ThreadStatic] in debug builds on .NET 9 RC1 #107728
Comments
Tagging subscribers to this area: @mangod9 |
Thanks for reporting the issue @alexyakunin. Can you please clarify whether you are comparing the perf with .NET 8 or a previous preview version of 9? That would help narrow down the issue. Adding @davidwrighton since there was some work on ThreadStatics recently. |
Yes, I'm comparing the perf. of .NET 9 RC1 vs .NET 8 (8.0.8). |
I've confirmed this is a regression in |
- `CheckRunClassInitThrowing` didn't check to see if the class had been initialized before taking a lock - `EnsureTlsIndexAllocated` didn't check if the Tls index had been allocated before setting the flag via an expensive Interlocked call to indicate that it had been allocated - And finally `JIT_GetNonGCThreadStaticBaseOptimized` and `JIT_GetGCThreadStaticBaseOptimized` were missing the fast paths which avoided even calling those apis at all. Perf with a small benchmark which does complex multithreaded work... | Runtime | Time | | ---- | ---- | | .NET 8 | 00.9414682 s | | .NET 9 before this fix | 22.8079382 | | .NET 9 with this fix | 00.2004539 | Fixes dotnet#107728
- `CheckRunClassInitThrowing` didn't check to see if the class had been initialized before taking a lock - `EnsureTlsIndexAllocated` didn't check if the Tls index had been allocated before setting the flag via an expensive Interlocked call to indicate that it had been allocated - And finally `JIT_GetNonGCThreadStaticBaseOptimized` and `JIT_GetGCThreadStaticBaseOptimized` were missing the fast paths which avoided even calling those apis at all. Perf with a small benchmark which does complex multithreaded work... | Runtime | Time | | ---- | ---- | | .NET 8 | 00.9414682 s | | .NET 9 before this fix | 22.8079382 | | .NET 9 with this fix | 00.2004539 | Fixes #107728
…ClassInitThrowing` didn't check to see if the class had been initialized before taking a lock - `EnsureTlsIndexAllocated` didn't check if the Tls index had been allocated before setting the flag via an expensive Interlocked call to indicate that it had been allocated - And finally `JIT_GetNonGCThreadStaticBaseOptimized` and `JIT_GetGCThreadStaticBaseOptimized` were missing the fast paths which avoided even calling those apis at all. Perf with a small benchmark which does complex multithreaded work... | Runtime | Time | | ---- | ---- | | .NET 8 | 00.9414682 s | | .NET 9 before this fix | 22.8079382 | | .NET 9 with this fix | 00.2004539 | Fixes #107728
…ClassInitThrowing` didn't check to see if the class had been initialized before taking a lock - `EnsureTlsIndexAllocated` didn't check if the Tls index had been allocated before setting the flag via an expensive Interlocked call to indicate that it had been allocated - And finally `JIT_GetNonGCThreadStaticBaseOptimized` and `JIT_GetGCThreadStaticBaseOptimized` were missing the fast paths which avoided even calling those apis at all. (#107817) Perf with a small benchmark which does complex multithreaded work... | Runtime | Time | | ---- | ---- | | .NET 8 | 00.9414682 s | | .NET 9 before this fix | 22.8079382 | | .NET 9 with this fix | 00.2004539 | Fixes #107728 Co-authored-by: David Wrighton <davidwr@microsoft.com>
Thanks, guys - that was quick! |
…et#107806) - `CheckRunClassInitThrowing` didn't check to see if the class had been initialized before taking a lock - `EnsureTlsIndexAllocated` didn't check if the Tls index had been allocated before setting the flag via an expensive Interlocked call to indicate that it had been allocated - And finally `JIT_GetNonGCThreadStaticBaseOptimized` and `JIT_GetGCThreadStaticBaseOptimized` were missing the fast paths which avoided even calling those apis at all. Perf with a small benchmark which does complex multithreaded work... | Runtime | Time | | ---- | ---- | | .NET 8 | 00.9414682 s | | .NET 9 before this fix | 22.8079382 | | .NET 9 with this fix | 00.2004539 | Fixes dotnet#107728
…et#107806) - `CheckRunClassInitThrowing` didn't check to see if the class had been initialized before taking a lock - `EnsureTlsIndexAllocated` didn't check if the Tls index had been allocated before setting the flag via an expensive Interlocked call to indicate that it had been allocated - And finally `JIT_GetNonGCThreadStaticBaseOptimized` and `JIT_GetGCThreadStaticBaseOptimized` were missing the fast paths which avoided even calling those apis at all. Perf with a small benchmark which does complex multithreaded work... | Runtime | Time | | ---- | ---- | | .NET 8 | 00.9414682 s | | .NET 9 before this fix | 22.8079382 | | .NET 9 with this fix | 00.2004539 | Fixes dotnet#107728
Description
One of our tests started to run ~20x slower on .NET 9 RC1, if it's compiled in Debug configuration.
That's where it spends most of its time:
There is another branch with similar ~38%, and it's also on the same
ThreadRandom.get_Instance
call.ThreadRandom
looks as follows:What's weird is that all of my attempts to reproduce this perf. regression on isolated console app failed, even though a console app running the code of this specific unit test also manifests the regression.
Configuration
Windows 11 machine.
Regression?
Yes, it looks like this is a perf. regression.
Data
No additional data.
Analysis
Not sure what it could be.
The text was updated successfully, but these errors were encountered: