-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NativeAOT] Replace GVMLookupForSlot internal cache with generic cache similar to CastCache #89331
Conversation
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas Issue DetailsFixes: #85927
|
I am using the following crude microbenchmark to estimate perf differences: using System.Diagnostics;
namespace ConsoleApp29
{
class c0
{
public virtual void M1<T>()
{
if (Program.i == 0) throw null;
}
}
class c1 : c0 { public override void M1<T>() {if (Program.i == 0) throw null;} }
class c2 : c0 { public override void M1<T>() {if (Program.i == 0) throw null;} }
class c3 : c0 { public override void M1<T>() {if (Program.i == 0) throw null;} }
class c4 : c0 { public override void M1<T>() {if (Program.i == 0) throw null;} }
class c5 : c0 { public override void M1<T>() {if (Program.i == 0) throw null;} }
class c6 : c0 { public override void M1<T>() {if (Program.i == 0) throw null;} }
class c7 : c0 { public override void M1<T>() {if (Program.i == 0) throw null;} }
class c8 : c0 { public override void M1<T>() {if (Program.i == 0) throw null;} }
class c9 : c0 { public override void M1<T>() {if (Program.i == 0) throw null;} }
internal class Program
{
public static int i = 1;
c0 v0 = new c0();
c0 v1 = new c1();
c0 v2 = new c2();
c0 v3 = new c3();
c0 v4 = new c4();
c0 v5 = new c5();
c0 v6 = new c6();
c0 v7 = new c7();
c0 v8 = new c8();
c0 v9 = new c9();
void Test2(c0 arg)
{
arg.M1<object>();
arg.M1<int>();
arg.M1<long>();
arg.M1<Exception>();
arg.M1<int[]>();
arg.M1<string>();
arg.M1<decimal>();
arg.M1<Guid>();
arg.M1<int?>();
arg.M1<object[]>();
}
void Test1()
{
Test2(v0);
Test2(v1);
Test2(v2);
Test2(v3);
Test2(v4);
Test2(v5);
Test2(v6);
Test2(v7);
Test2(v8);
Test2(v9);
}
void Test()
{
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < 1000000; i++)
{
Test1();
}
sw.Stop();
System.Console.WriteLine(sw.ElapsedMilliseconds);
}
static void Main(string[] args)
{
Program p = new Program();
for(; ; )
{
p.Test();
}
}
}
} Results vary a bit between runs, but the new implementation is generally faster. Here are results of two best runs: ==== Original (recent main)
==== New (about 30% faster)
|
The slow helper is still not tail-called, BTW. |
|
||
// Initialize the cache eagerly to avoid null checks. | ||
private static GenericCache<Key, Value> s_cache; | ||
internal static void Initialize() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use EagerStaticClassConstruction
for this so that it has the same structure as cast cache?
EagerStaticClassConstruction
should be better for trimming. The compiler should be able to trim the initialization if the type is not otherwise used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not know the differences. I just used what this cache used before assuming there was a reason to use the Initialize
pattern. Makes sense to switch to EagerStaticClassConstruction
.
@@ -775,6 +775,7 @@ | |||
<Compile Include="$(MSBuildThisFileDirectory)System\Runtime\CompilerServices\AsyncValueTaskMethodBuilderT.cs" /> | |||
<Compile Include="$(MSBuildThisFileDirectory)System\Runtime\CompilerServices\AsyncVoidMethodBuilder.cs" /> | |||
<Compile Include="$(MSBuildThisFileDirectory)System\Runtime\CompilerServices\CastCache.cs" Condition="'$(FeatureCoreCLR)' == 'true' or '$(FeatureNativeAot)' == 'true'" /> | |||
<Compile Include="$(MSBuildThisFileDirectory)System\Runtime\CompilerServices\GenericCache.cs" Condition="'$(FeatureCoreCLR)' == 'true' or '$(FeatureNativeAot)' == 'true'" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: You can use Condition="'$(FeatureMono)' != 'true'"
to make this condition a bit simpler.
|
||
#endif // CORECLR | ||
// wraps existing table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this wrapping work? It won't intialize the other fields like _lastFlushSize
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at jitted code and if I am not mistaken it looks like the struct gets promoted and unused fields disappear. When we operate with a cache that wraps an existing table, we only do TryGet
, and TryGet
only needs the _table
field.
I looked at the code in the debugger, which is not so trivial because of "no debugging this" attributes. There is a strange call there, most likely related to accessing a static field, but otherwise the codegen looked like the struct gets completely erased and we use _table
as a scalar variable.
It would be nice if someone could confirm that this is the case. There might be a "proper" way to just dump the jitted methods and see what happens there and how the code changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am more worried about that this constructor creates CastCache
instance that is partially initialized. It makes it hard to see that none of the places that use this instance won't use the uninitialized fields (_maxCacheSize
, etc.).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code paths that add/resize/flush are not supposed to be used in a cache that wraps existing table since logically they do not own the table. An eventual crash due to uninitialized fields would be expected.
I guess we could have more explicit asserts - if you are off the TryGet path the min/max sizes should be nonzero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be better to structure the code such that these partially initialized CastCache
instances do not need to be created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be better to structure the code such that these partially initialized CastCache instances do not need to be created.
That can be done if we pass the table directly to TryGet
. I think that will work better. We would not need to add asserts and worry about the struct being optimized away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve removed the wrapping scenario. Just pass the existing table, if available, to ‘TryGet’.
src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/TypeLoaderExports.cs
Show resolved
Hide resolved
Yes, I think so. It's hard to work around this particular limitation -- as soon as we end up with |
I see. That seems hard to avoid when working with structs. |
_auxResult = auxResult; | ||
} | ||
} | ||
|
||
public static unsafe void ActivatorCreateInstanceAny(ref object ptrToData, IntPtr pEETypePtr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method looks like a dead code. Delete it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does seem unused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Thanks! |
Fixes: #85927
CastCache
. It is completely lock-free and slightly faster than the currently used caching scheme.CastCache
can now also be used for other purposes (no longer assumes a single static instance)