-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Optimize out write barriers for fields in ref-like structs #103503
Conversation
using BenchmarkDotNet.Attributes;
using System.Runtime.CompilerServices;
public class MyBench
{
[Benchmark]
public void Bench()
{
MyStruct s = default;
Test(ref s, null, null);
}
[MethodImpl(MethodImplOptions.NoInlining)]
void Test(ref MyStruct s, object o1, object o2)
{
s.A = o1;
s.B = o2;
}
ref struct MyStruct
{
public object A;
public object B;
}
} |
Benchmark results on Intel
|
I think we can also do this for objects allocated on the stack due to escape analysis. |
We already do that where it's legal afair. Although, in some cases we have to switch to slower "checked" write barriers when "object on stack" feature is enabled since |
PTAL @jakobbotsch @AndyAyersMS cc @dotnet/jit-contrib (CI failures are unrelated). Diffs |
@EgorBot -arm64 -profiler using BenchmarkDotNet.Attributes;
using System.Runtime.CompilerServices;
public class MyBench
{
[Benchmark]
public void Bench()
{
MyStruct s = default;
Test(ref s, null, null);
}
[MethodImpl(MethodImplOptions.NoInlining)]
void Test(ref MyStruct s, object o1, object o2)
{
s.A = o1;
s.B = o2;
}
ref struct MyStruct
{
public object A;
public object B;
}
} |
Benchmark results on Arm64
Flame graphs: Main vs PR 🔥 For clean |
src/coreclr/jit/lclvars.cpp
Outdated
@@ -2051,6 +2051,12 @@ unsigned Compiler::compMap2ILvarNum(unsigned varNum) const | |||
return (unsigned)ICorDebugInfo::RETBUF_ILNUM; | |||
} | |||
|
|||
if (info.compMethodInfo->args.hasImplicitThis() && varNum == info.compThisArg) | |||
{ | |||
// implicit this doesn't map to any IL arg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// implicit this doesn't map to any IL arg | |
// implicit this doesn't map to any method signature arg |
assuming this change is otherwise correct that I am not sure about
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted those changes since compMap2ILvarNum
is also used in debug info generation and I don't want to break it accidentally. I put the workaround directly to my function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change has the same change in our aliasing model as #97997 that we rejected did (relevant comment: #97997 (comment)).
Basically, before this change, it is ok in our aliasing model to write code like this:
struct A { public object X; }
ref struct B { public object X; }
void Foo(B* b)
{
((A*)b)->X = "abc";
}
void Bar(A* a)
{
Foo((B*)a);
}
The optimization made by this PR is only valid if we consider this kind of aliasing to be UB. I am fine with doing that if the diffs justify it, but it is something to be aware about and explicit around if we take this change.
src/coreclr/jit/assertionprop.cpp
Outdated
// Check if we deal with `ref ByrefLikeStruct` argument | ||
unsigned lclNum = (unsigned)vnStore->CoercedConstantValue<ssize_t>(funcApp.m_args[0]); | ||
assert(lclNum != BAD_VAR_NUM); | ||
if (comp->lvaIsParameter(lclNum)) | ||
{ | ||
if (comp->info.compThisArg == lclNum) | ||
{ | ||
// No barrier needed if current class is byref-like | ||
// and the destination is an implicit "this" | ||
return comp->eeIsByrefLike(comp->info.compClassHnd) ? GCInfo::WriteBarrierForm::WBF_NoBarrier | ||
: GCInfo::WriteBarrierForm::WBF_BarrierUnknown; | ||
} | ||
|
||
// Find the corresponding argument index. Basically, we need to ignore the hidden args | ||
// such as generic context, return buffer, etc.. | ||
unsigned ilArg = comp->compMap2ILvarNum(lclNum); | ||
switch (static_cast<int>(ilArg)) | ||
{ | ||
// Ignore non-user args | ||
case ICorDebugInfo::RETBUF_ILNUM: | ||
case ICorDebugInfo::VARARGS_HND_ILNUM: | ||
case ICorDebugInfo::TYPECTXT_ILNUM: | ||
case ICorDebugInfo::UNKNOWN_ILNUM: | ||
return GCInfo::WriteBarrierForm::WBF_BarrierUnknown; | ||
|
||
default: | ||
break; | ||
} | ||
assert(ilArg != BAD_VAR_NUM); | ||
|
||
// A small inconsistency: various signature walking VM APIs (e.g. getArgClass) | ||
// do not expose implicit "this" args, while compMap2ILvarNum does. | ||
if (comp->info.compMethodInfo->args.hasImplicitThis()) | ||
{ | ||
assert(ilArg > 0); | ||
ilArg--; | ||
} | ||
|
||
// First, get corresponding CORINFO_ARG_LIST_HANDLE arg data | ||
CORINFO_ARG_LIST_HANDLE currentArg = comp->info.compMethodInfo->args.args; | ||
for (unsigned i = 0; i < ilArg; i++) | ||
{ | ||
currentArg = comp->info.compCompHnd->getArgNext(currentArg); | ||
} | ||
|
||
// Now get its type and strip `ref` so we can get CORINFO_CLASS_HANDLE of the | ||
// underlying type | ||
CORINFO_CLASS_HANDLE byrefCls = comp->eeGetArgClass(&comp->info.compMethodInfo->args, currentArg); | ||
if (byrefCls != NO_CLASS_HANDLE) | ||
{ | ||
CORINFO_CLASS_HANDLE actualCls; | ||
CorInfoType actualClsType = comp->info.compCompHnd->getChildType(byrefCls, &actualCls); | ||
|
||
// Only structs can be byref-like | ||
if (actualClsType == CORINFO_TYPE_VALUECLASS) | ||
{ | ||
assert(actualCls != NO_CLASS_HANDLE); | ||
if (comp->eeIsByrefLike(actualCls)) | ||
{ | ||
// It's byref-like, barrier is not needed | ||
return GCInfo::WriteBarrierForm::WBF_NoBarrier; | ||
} | ||
} | ||
} | ||
} | ||
return GCInfo::WriteBarrierForm::WBF_BarrierUnknown; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This look potentially quite expensive. You may want to try running an SPMI collection on this PR to get more accurate diffs, there are a lot of misses.
src/coreclr/jit/assertionprop.cpp
Outdated
// Now get its type and strip `ref` so we can get CORINFO_CLASS_HANDLE of the | ||
// underlying type | ||
CORINFO_CLASS_HANDLE byrefCls = comp->eeGetArgClass(&comp->info.compMethodInfo->args, currentArg); | ||
if (byrefCls != NO_CLASS_HANDLE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is actually checking that this is a byref to that type? Would this potentially pass for an array as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The entire code sequence is a bit unfortunate... it would be much cleaner/less scary looking if we had just saved enough information in LclVarDsc
(or a side table, similar to lvaParameterPassingInfo
) to get the full class handle for the parameter. I suppose the main problem around that is that we don't actually call getArgClass
on all parameters today, only on TYP_REF
parameters.
I wonder if we should just bite the bullet and call getArgClass
on all parameters, or perhaps change getArgType
to unconditionally return a class handle (today it only returns the class handle for value classes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree -- we should compute this early when we first process the method and local sigs and cache it on the local var.
I think it is fine to make this form of aliasing UB, in particular when it involves managed types. |
Current diffs are not too big, but I am wondering if we can, sort of, mark more internal structs as Also, perhaps, e.g. NativeAOT can do this as part of IL analysis as an optimization |
@EgorBot -profiler using BenchmarkDotNet.Attributes;
using System.Runtime.CompilerServices;
public class MyBench
{
[Benchmark]
public void Bench()
{
MyStruct s = default;
Test(ref s, null, null);
}
[MethodImpl(MethodImplOptions.NoInlining)]
void Test(ref MyStruct s, object o1, object o2)
{
s.A = o1;
s.B = o2;
}
ref struct MyStruct
{
public object A;
public object B;
}
} |
Benchmark results on Intel
Flame graphs: Main vs PR 🔥 For clean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might also be able to make the conservative VNs for these structs more aggressive (they can match the liberal vns), since they can't be modified cross-thread.
src/coreclr/jit/assertionprop.cpp
Outdated
// Now get its type and strip `ref` so we can get CORINFO_CLASS_HANDLE of the | ||
// underlying type | ||
CORINFO_CLASS_HANDLE byrefCls = comp->eeGetArgClass(&comp->info.compMethodInfo->args, currentArg); | ||
if (byrefCls != NO_CLASS_HANDLE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree -- we should compute this early when we first process the method and local sigs and cache it on the local var.
@jakobbotsch @AndyAyersMS can you take a look again? I ended up reserving a bitfield in win-x64 Release layout
I also considered re-using lvClassHnd or m_Layout fields for it, but that required a bit more changes. Same Diffs with many missing contexts (jit-diff is the same). |
src/coreclr/vm/jitinterface.cpp
Outdated
case ELEMENT_TYPE_BYREF: | ||
typeHnd = ptr.GetTypeHandleThrowing(pModule, &typeContext); | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do it for all types? If not, can we at least do it for ELEMENT_TYPE_PTR
as well?
The documentation of the JIT-EE API should be updated (vcTypeRet
should probably be renamed too). Also, what about crossgen2/NAOT implementations of this API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does doing this cause the operand type of the byref/pointer to be loaded eagerly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, what about crossgen2/NAOT implementations of this API?
They seem to already always return it, so my test snippet is optimized there too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does doing this cause the operand type of the byref/pointer to be loaded eagerly?
How do I check that? I presume since it's a parameter it's going to be loaded anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do it for all types?
Any particular reason to do that? Isn't, like you mentioned, it might lead to redundant type load events we don't need?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do I check that? I presume since it's a parameter it's going to be loaded anyway?
Maybe @jkotas can answer this question?
Any particular reason to do that? Isn't, like you mentioned, it might lead to redundant type load events we don't need?
The reason would be to make the JIT-EE API more regular in its contract, and avoid us having to call back to the EE side in the other cases too (e.g. we want this information for class types as well, and we currently do another JIT-EE call to get it). We could remove getArgClass
entirely if getArgType
just returned the class handle always.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might lead to redundant type load events we don't need?
Once we load the type, the future attempts to load the type are cheap. There is not such a thing as "redundant type load events".
The point is to avoid loading the type in the first place. As you can see in the ELEMENT_TYPE_PTR
comment below, eagerly loading types that do not need to be loaded can even break stuff. Avoiding unnecessary type loading is goodness for startup.
I think the JIT should never need to load the unmanaged pointer types. It should treat all unmanaged pointers as opaque void*
.
This optimization should be based on the type provided in the stfld metadata token.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This optimization should be based on the type provided in the stfld metadata token.
Actually, this might simplify changes a lot, let me try
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This optimization should be based on the type provided in the stfld metadata token.
This would also be much more in line with how our normal modelling of aliasing and stores/loads work and effectively would mean that we aren't changing the aliasing model, so if we can do that it seems like it would be much cleaner overall.
src/coreclr/jit/lclvars.cpp
Outdated
{ | ||
CORINFO_CLASS_HANDLE clsHnd = info.compCompHnd->getArgClass(&info.compMethodInfo->args, argLst); | ||
lvaSetClass(varDscInfo->varNum, clsHnd); | ||
} | ||
else if (type == CORINFO_TYPE_BYREF) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we already avoid the write barrier if this was an unmanaged pointer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean something like this:
unsafe void Test(string* a, string b)
{
*a = b;
}
isn't it not safe to remove WB here? And, hopefully, nobody does this anyway 🙂
Currenly it does emit a checked WB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it would be unsafe to remove the write barrier there.
I think it means we should make this feature work for CORINFO_TYPE_PTR
too, for parity between managed and unmanaged pointers. E.g.
[MethodImpl(MethodImplOptions.NoInlining)]
void Test(MyStruct* s, object o1, object o2)
{
s->A = o1;
s->B = o2;
}
ref struct MyStruct
{
public object A;
public object B;
}
should avoid the write barrier as well, like if it was a managed pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTE that MyStruct* s
is issueing a warning in Roslyn as an attempt to take an address of a managed type. Do we want to also eagerly load all types under unmanaged pointers in hope somebody do this (I presume it's rare/never).
cc @jkotas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is fine to assume that both ref MyByRefLikeStruct
and MyByRefLikeStruct*
never point into GC heap here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I'm after is that we have consistency in optimizations like these. Switching a byref to a pointer because the target is on the stack should not come with surprising perf regressions like new write barriers.
5436aba
to
ace3a9d
Compare
Thanks for reviews and the suggestion, |
Closes #9512
Remove write barriers for fields of
ref ByrefLikeStruct
arguments since we know that such refs always point to stack, Example:Codegen for
Test(...)
on Main:This PR:
Similar case when ref struct is "this":