Convert small atomic fallbacks to managed #99011
Tagging subscribers to this area: @mangod9
/// <returns>The original value of <paramref name="location1"/>.</returns>
/// <exception cref="NullReferenceException">The address of location1 is a null pointer.</exception>
[Intrinsic]
[MethodImpl(MethodImplOptions.AggressiveInlining)]
Is AggressiveInlining here just copy-and-paste? Aggressively inlining all of this does not sound like a good idea.
I'd expect this to be similar in size to what native compilers emit for RISC-V, and they do inline that.
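For comparison, native C11 code expresses a byte-wide compare-exchange directly on an `_Atomic uint8_t`. How that is lowered (a native byte CAS, an inline word-sized loop like the one in this PR, or a call into libatomic) is up to the compiler and target; the comment above reports that RISC-V compilers inline it. A minimal sketch, where `native_cas_u8` is a hypothetical name:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: byte-wide compare-exchange written with C11 atomics.
   The compiler decides how to lower it for the target. */
static uint8_t native_cas_u8(_Atomic uint8_t *location, uint8_t value, uint8_t comparand)
{
    uint8_t expected = comparand;
    /* On failure, 'expected' receives the value actually stored. */
    atomic_compare_exchange_strong(location, &expected, value);
    /* Either way, 'expected' now holds the original value,
       matching the contract of Interlocked.CompareExchange. */
    return expected;
}
```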
One benefit of inlining this is that, with #99130, the JIT can fold most of the bit operations here when passed a ref to a static field.
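To illustrate why inlining enables that folding, here is a hedged C analogue (not the PR's C# code) of a 16-bit Exchange built from a 32-bit CAS loop. The word address, shift, and mask depend only on the location's address, so when that address is known at compile time, a compiler can in principle reduce them to constants. `exchange_u16` is a hypothetical name; the sketch assumes the GCC/Clang `__atomic` builtins and a 2-byte-aligned location, so the halfword never straddles two 32-bit words.

```c
#include <stdint.h>

/* Hypothetical fallback: 16-bit Exchange emulated with a 32-bit CAS loop.
   Assumes GCC/Clang __atomic builtins and a 2-byte-aligned location. */
static inline uint16_t exchange_u16(uint16_t *location, uint16_t value)
{
    uintptr_t addr = (uintptr_t)location;
    /* Everything below depends only on the address, so with a known
       address (e.g. a static field) it can fold to constants. */
    uint32_t *word = (uint32_t *)(addr & ~(uintptr_t)3);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    unsigned shift = (unsigned)(2 - (addr & 2)) * 8;
#else
    unsigned shift = (unsigned)(addr & 2) * 8;
#endif
    uint32_t mask = (uint32_t)0xFFFF << shift;

    uint32_t old = __atomic_load_n(word, __ATOMIC_SEQ_CST);
    uint32_t desired;
    do {
        /* Splice the new halfword into the current word contents. */
        desired = (old & ~mask) | ((uint32_t)value << shift);
        /* On failure 'old' is reloaded, so the neighbouring halfword is kept. */
    } while (!__atomic_compare_exchange_n(word, &old, desired, 1,
                                          __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST));
    /* On success 'old' is the word as it was before our write. */
    return (uint16_t)((old & mask) >> shift);
}
```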
I think it is fine to do this for CoreCLR/NativeAOT. As you have said, it is just a fallback that should only be used during platform bring-up. We effectively require the JIT to expand these inline for best perf. I am not sure about Mono. @vargaz @AlekseyTs Thoughts?
It'd be nice if somebody from the Samsung RISC-V team could benchmark this on a RISC-V device to see the performance difference.
cc @gbalykov @HJLeee @wscho77 @clamp03 @JongHeonChoi @t-mustafin @viewizard
It seems Mono crashes with this implementation. Could somebody say whether the implementation's assumptions are invalid there, or whether there's a bug instead?
You may want to temporarily enable the fallback implementation for CoreCLR and see whether it passes all tests.
It's currently exercised on ARM32, where it hits an assert that I've fixed in #99019.
I am out of office. @bartlomiejko Could your team check the performance difference?
src/libraries/System.Private.CoreLib/src/System/Threading/Interlocked.cs
All failures here seem unrelated; I think this is only waiting for a review of the assumptions on Mono.
It seems there's still an assert here:
src/libraries/System.Private.CoreLib/src/System/Threading/Interlocked.cs
Fixed now with #100060.
I guess we still need somebody from the Mono team to check this?
Remove Unsafe.AsPointer uses
Co-authored-by: Hamish Arblaster <hamarb123@gmail.com>
Mono LGTM
@MichalPetryka I would leave the small ops in atomic.h; they are generally useful. It might also be worthwhile to open a follow-up issue for Mono to add intrinsics for these operations in the JIT and interpreter.
I could do that, but I think they're simple enough to be re-added when needed.
@jkotas, does the musl ARM assert here look related to these changes to you? (runtime/src/coreclr/vm/methodtable.cpp, line 7377 in ac07ea6)
This is a known issue logged here: #86273
/azp run runtime-nativeaot-outerloop
Azure Pipelines successfully started running 1 pipeline(s).
Thanks
Implements the non-intrinsic fallbacks of Exchange/CompareExchange for small types as loops over the 32-bit variants instead of calling into native code.
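The core technique can be sketched in C as follows. This is an illustrative analogue, not the PR's managed implementation: `cas_u8` is a hypothetical name, and the code assumes the GCC/Clang `__atomic` builtins and that the enclosing 4-byte-aligned word is accessible memory. It reads the containing word, checks the target byte against the comparand, splices in the new byte, and retries the 32-bit CAS whenever the word changed underneath it.

```c
#include <stdint.h>

/* Hypothetical fallback: 8-bit CompareExchange emulated with a 32-bit CAS loop.
   Assumes GCC/Clang __atomic builtins; the aligned-down word must be accessible. */
static uint8_t cas_u8(uint8_t *location, uint8_t value, uint8_t comparand)
{
    uintptr_t addr = (uintptr_t)location;
    uint32_t *word = (uint32_t *)(addr & ~(uintptr_t)3);  /* align down to 4 bytes */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    unsigned shift = (unsigned)(3 - (addr & 3)) * 8;
#else
    unsigned shift = (unsigned)(addr & 3) * 8;
#endif
    uint32_t mask = (uint32_t)0xFF << shift;

    uint32_t old = __atomic_load_n(word, __ATOMIC_SEQ_CST);
    for (;;) {
        uint8_t current = (uint8_t)((old & mask) >> shift);
        if (current != comparand)
            return current;                /* mismatch: report it, write nothing */
        uint32_t desired = (old & ~mask) | ((uint32_t)value << shift);
        /* On failure (real or spurious), 'old' is refreshed and we retry. */
        if (__atomic_compare_exchange_n(word, &old, desired, 1,
                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
            return comparand;              /* success: original value == comparand */
    }
}
```

Because the CAS covers the whole word, a concurrent write to a neighbouring byte makes it fail and the loop retry, so a stale copy of the neighbour is never written back.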
I'm not sure what the performance difference is here, as the only platforms that should use these fallbacks are RISC-V and Mono (ARM32 is handled by #97792, and LoongArch64 looks trivial to do, but I have no way to test it).
I based the idea for the implementation on the fact that the Linux kernel and libatomic use such 32-bit operations for their fallbacks (I took no code from either, however, as they're both GPL).
The approach also relies on an implementation detail of the GCs: a ref that has been backtracked to a 4-byte-aligned address stays 4-byte aligned and within the same object, which avoids the need for pinning.
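Why that detail matters can be shown with a toy model (hypothetical, not how any GC is implemented): treat an object as a byte range `[base, base + size)` and check whether the aligned-down 32-bit word containing a given field stays inside it. The check only holds for every field when the object starts on a 4-byte boundary and its allocation is padded to a multiple of 4 bytes; `word_stays_in_object` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model, not GC code: an object occupies [base, base + size).
   Returns true if the 4-byte word containing the byte at 'offset'
   lies entirely inside the object. */
static bool word_stays_in_object(uintptr_t base, size_t size, size_t offset)
{
    uintptr_t field = base + offset;
    uintptr_t word = field & ~(uintptr_t)3;   /* the "backtracked" address */
    return word >= base && word + 4 <= base + size;
}
```

A misaligned start lets the word reach before the object, and a size that is not a multiple of 4 lets it run past the end; either would leave the 32-bit CAS touching memory the GC does not attribute to the object.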
I'm not 100% sure whether doing this in managed code makes sense, since it's a lot of code, but it also removed all the C++ paths from three runtimes.
cc @jkotas