Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not encode safe points with -1 offset. #104336

Merged
merged 9 commits into from
Jul 17, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
39 changes: 8 additions & 31 deletions src/coreclr/gcinfo/gcinfoencoder.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1216,42 +1216,19 @@ void GcInfoEncoder::Build()

///////////////////////////////////////////////////////////////////////
// Normalize call sites
// Eliminate call sites that fall inside interruptible ranges
///////////////////////////////////////////////////////////////////////

_ASSERTE(m_NumCallSites == 0 || numInterruptibleRanges == 0);

UINT32 numCallSites = 0;
for(UINT32 callSiteIndex = 0; callSiteIndex < m_NumCallSites; callSiteIndex++)
{
UINT32 callSite = m_pCallSites[callSiteIndex];
// There's a contract with the EE that says for non-leaf stack frames, where the
// method is stopped at a call site, the EE will not query with the return PC, but
// rather the return PC *minus 1*.
// The reason is that variable/register liveness may change at the instruction immediately after the
// call, so we want such frames to appear as if they are "within" the call.
// Since we use "callSite" as the "key" when we search for the matching descriptor, also subtract 1 here
// (after, of course, adding the size of the call instruction to get the return PC).
callSite += m_pCallSiteSizes[callSiteIndex] - 1;
callSite += m_pCallSiteSizes[callSiteIndex];

_ASSERTE(DENORMALIZE_CODE_OFFSET(NORMALIZE_CODE_OFFSET(callSite)) == callSite);
UINT32 normOffset = NORMALIZE_CODE_OFFSET(callSite);

BOOL keepIt = TRUE;

for(UINT32 intRangeIndex = 0; intRangeIndex < numInterruptibleRanges; intRangeIndex++)
{
InterruptibleRange *pRange = &pRanges[intRangeIndex];
if(pRange->NormStopOffset > normOffset)
{
if(pRange->NormStartOffset <= normOffset)
{
keepIt = FALSE;
}
break;
}
}

if(keepIt)
m_pCallSites[numCallSites++] = normOffset;
m_pCallSites[numCallSites++] = normOffset;
}

GCINFO_WRITE_VARL_U(m_Info1, NORMALIZE_NUM_SAFE_POINTS(numCallSites), NUM_SAFE_POINTS_ENCBASE, NumCallSitesSize);
Expand Down Expand Up @@ -1418,7 +1395,7 @@ void GcInfoEncoder::Build()

for(pCurrent = pTransitions; pCurrent < pEndTransitions; )
{
if(pCurrent->CodeOffset > callSite)
if(pCurrent->CodeOffset >= callSite)
{
couldBeLive |= liveState;

Expand Down Expand Up @@ -1773,7 +1750,7 @@ void GcInfoEncoder::Build()
{
for(pCurrent = pTransitions; pCurrent < pEndTransitions; )
{
if(pCurrent->CodeOffset > callSite)
if(pCurrent->CodeOffset >= callSite)
{
// Time to record the call site

Expand Down Expand Up @@ -1872,7 +1849,7 @@ void GcInfoEncoder::Build()

for(pCurrent = pTransitions; pCurrent < pEndTransitions; )
{
if(pCurrent->CodeOffset > callSite)
if(pCurrent->CodeOffset >= callSite)
{
// Time to encode the call site

Expand Down Expand Up @@ -1919,7 +1896,7 @@ void GcInfoEncoder::Build()

for(pCurrent = pTransitions; pCurrent < pEndTransitions; )
{
if(pCurrent->CodeOffset > callSite)
if(pCurrent->CodeOffset >= callSite)
{
// Time to encode the call site
GCINFO_WRITE_VECTOR(m_Info1, liveState, CallSiteStateSize);
Expand Down
22 changes: 11 additions & 11 deletions src/coreclr/inc/gcinfotypes.h
Original file line number Diff line number Diff line change
Expand Up @@ -678,8 +678,8 @@ void FASTCALL decodeCallPattern(int pattern,
#define NORMALIZE_SIZE_OF_STACK_AREA(x) ((x)>>2)
#define DENORMALIZE_SIZE_OF_STACK_AREA(x) ((x)<<2)
#define CODE_OFFSETS_NEED_NORMALIZATION 1
#define NORMALIZE_CODE_OFFSET(x) (x) // Instructions are 2/4 bytes long in Thumb/ARM states,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matching change will need to be made in https://github.com/dotnet/diagnostics/tree/main/src/shared/gcdump so that GC info dumping in SOS works.

There is a challenge though: SOS wants to support all runtimes, so the SOS copy will need to work for both old and new GC info encoding - we will need to switch based on the current runtime version somehow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@VSadov VSadov Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, same problem exists for https://github.com/dotnet/runtime/tree/main/src/coreclr/tools/aot/ILCompiler.Reflection.ReadyToRun/Amd64

There is no corresponding location for arm[64]. Do we use Amd64 for all cases?
It is the RISC arches that are more interesting, since the encoding size changes.


Figured that:

                        // Arm, Arm64, LoongArch64 and RISCV64 use the same GcInfo format as Amd64

The arch specific behavior is done via consulting with _readyToRunReader.Machine`

Copy link
Member Author

@VSadov VSadov Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are no changes needed on x64 for the uses of GcInfo that do enumeration. The offsets are different, but that is a matter of FindSafePoint(offset) kind of use, as the offset would need to no longer do -1 adjustment in nonleaf cases. I do not see such use in ReadyToRun thing.

Basically - for the use like foreach(safePoint in ...) {foreach(slot in SlotsAt(safePoint)} , on x64 everything should work as before.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RISC is more interesting as GcInfo deserializer would need to denormalize code offsets based on GcInfo version. Not just the offsets used in safepoints, there are other places.

I would expect that GcInfo deserializer that wants to support different runtimes already needs to figure the version of GCInfo. This is not the first time it changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diagnostics/SOS GcInfo deserializer looks like just a copy of the one in the runtime. It is version-aware (can handle v1 and v2) and gets the version from the supplied token.

We may just need to copy/paste the new version once this change goes in.

Logged a follow-up issue for that: dotnet/diagnostics#4795

#define DENORMALIZE_CODE_OFFSET(x) (x) // but the safe-point offsets are encoded with a -1 adjustment.
#define NORMALIZE_CODE_OFFSET(x) ((x)>>1) // Instructions are 2/4 bytes long in Thumb/ARM states,
#define DENORMALIZE_CODE_OFFSET(x) ((x)<<1)
#define NORMALIZE_REGISTER(x) (x)
#define DENORMALIZE_REGISTER(x) (x)
#define NORMALIZE_NUM_SAFE_POINTS(x) (x)
Expand Down Expand Up @@ -734,9 +734,9 @@ void FASTCALL decodeCallPattern(int pattern,
#define DENORMALIZE_STACK_BASE_REGISTER(x) ((x)^29)
#define NORMALIZE_SIZE_OF_STACK_AREA(x) ((x)>>3)
#define DENORMALIZE_SIZE_OF_STACK_AREA(x) ((x)<<3)
#define CODE_OFFSETS_NEED_NORMALIZATION 0
#define NORMALIZE_CODE_OFFSET(x) (x) // Instructions are 4 bytes long, but the safe-point
#define DENORMALIZE_CODE_OFFSET(x) (x) // offsets are encoded with a -1 adjustment.
#define CODE_OFFSETS_NEED_NORMALIZATION 1
#define NORMALIZE_CODE_OFFSET(x) ((x)>>2) // Instructions are 4 bytes long
#define DENORMALIZE_CODE_OFFSET(x) ((x)<<2)
#define NORMALIZE_REGISTER(x) (x)
#define DENORMALIZE_REGISTER(x) (x)
#define NORMALIZE_NUM_SAFE_POINTS(x) (x)
Expand Down Expand Up @@ -789,9 +789,9 @@ void FASTCALL decodeCallPattern(int pattern,
#define DENORMALIZE_STACK_BASE_REGISTER(x) ((x) == 0 ? 22 : 3)
#define NORMALIZE_SIZE_OF_STACK_AREA(x) ((x)>>3)
#define DENORMALIZE_SIZE_OF_STACK_AREA(x) ((x)<<3)
#define CODE_OFFSETS_NEED_NORMALIZATION 0
#define NORMALIZE_CODE_OFFSET(x) (x) // Instructions are 4 bytes long, but the safe-point
#define DENORMALIZE_CODE_OFFSET(x) (x) // offsets are encoded with a -1 adjustment.
#define CODE_OFFSETS_NEED_NORMALIZATION 1
#define NORMALIZE_CODE_OFFSET(x) ((x)>>2) // Instructions are 4 bytes long
#define DENORMALIZE_CODE_OFFSET(x) ((x)<<2)
#define NORMALIZE_REGISTER(x) (x)
#define DENORMALIZE_REGISTER(x) (x)
#define NORMALIZE_NUM_SAFE_POINTS(x) (x)
Expand Down Expand Up @@ -844,9 +844,9 @@ void FASTCALL decodeCallPattern(int pattern,
#define DENORMALIZE_STACK_BASE_REGISTER(x) ((x) == 0 ? 8 : 2)
#define NORMALIZE_SIZE_OF_STACK_AREA(x) ((x)>>3)
#define DENORMALIZE_SIZE_OF_STACK_AREA(x) ((x)<<3)
#define CODE_OFFSETS_NEED_NORMALIZATION 0
#define NORMALIZE_CODE_OFFSET(x) (x) // Instructions are 4 bytes long, but the safe-point
#define DENORMALIZE_CODE_OFFSET(x) (x) // offsets are encoded with a -1 adjustment.
#define CODE_OFFSETS_NEED_NORMALIZATION 1
#define NORMALIZE_CODE_OFFSET(x) ((x)>>2) // Instructions are 4 bytes long
#define DENORMALIZE_CODE_OFFSET(x) ((x)<<2)
#define NORMALIZE_REGISTER(x) (x)
#define DENORMALIZE_REGISTER(x) (x)
#define NORMALIZE_NUM_SAFE_POINTS(x) (x)
Expand Down
23 changes: 0 additions & 23 deletions src/coreclr/jit/codegencommon.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1515,29 +1515,6 @@ void CodeGen::genExitCode(BasicBlock* block)
if (compiler->getNeedsGSSecurityCookie())
{
genEmitGSCookieCheck(jmpEpilog);

if (jmpEpilog)
{
// Dev10 642944 -
// The GS cookie check created a temp label that has no live
// incoming GC registers, we need to fix that

unsigned varNum;
LclVarDsc* varDsc;

/* Figure out which register parameters hold pointers */

for (varNum = 0, varDsc = compiler->lvaTable; varNum < compiler->lvaCount && varDsc->lvIsRegArg;
varNum++, varDsc++)
{
noway_assert(varDsc->lvIsParam);

gcInfo.gcMarkRegPtrVal(varDsc->GetArgReg(), varDsc->TypeGet());
Copy link
Member Author

@VSadov VSadov Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have correct GC info at the label after GS cookie check as cookie check has no effect on GC liveness. (maybe it did in the past?)

This was just noop or was messing things up by forcing argument registers to be GC-alive when they would not have GC references in them. Perhaps it was compensating for some behavior that is long gone.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was hidden because we did not allow interrupting thread on this instruction in a leaf frame (but we can and should) and in nonleaf this would not show up because this is a tailcall.

}

GetEmitter()->emitThisGCrefRegs = GetEmitter()->emitInitGCrefRegs = gcInfo.gcRegGCrefSetCur;
GetEmitter()->emitThisByrefRegs = GetEmitter()->emitInitByrefRegs = gcInfo.gcRegByrefSetCur;
}
}

genReserveEpilog(block);
Expand Down
6 changes: 3 additions & 3 deletions src/coreclr/jit/emit.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -10502,9 +10502,9 @@ regMaskTP emitter::emitGetGCRegsKilledByNoGCCall(CorInfoHelpFunc helper)
// of the last instruction in the region makes GC safe again.
// In particular - once the IP is on the first instruction, but not executed it yet,
// it is still safe to do GC.
// The only special case is when NoGC region is used for prologs/epilogs.
// In such case the GC info could be incorrect until the prolog completes and epilogs
// may have unwindability restrictions, so the first instruction cannot have GC.
// The only special case is when NoGC region is used for prologs.
// In such case the GC info could be incorrect until the prolog completes, so the first
// instruction cannot have GC.

void emitter::emitDisableGC()
{
Expand Down
3 changes: 1 addition & 2 deletions src/coreclr/jit/emitinl.h
Original file line number Diff line number Diff line change
Expand Up @@ -594,8 +594,7 @@ bool emitter::emitGenNoGCLst(Callback& cb)
emitter::instrDesc* id = emitFirstInstrDesc(ig->igData);
assert(id != nullptr);
assert(id->idCodeSize() > 0);
if (!cb(ig->igFuncIdx, ig->igOffs, ig->igSize, id->idCodeSize(),
ig->igFlags & (IGF_FUNCLET_PROLOG | IGF_FUNCLET_EPILOG | IGF_EPILOG)))
Copy link
Member Author

@VSadov VSadov Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that including epilogs in this set was just compensating for the bug related to GS cookies and for overly conservative asserts in NativeAOT hijacker.

It is fairly common on ARM64 to see:

            bl      <somewhere>
            ldp    <some regs>  // <-- this IP is certainly unwindable, how else can we unwind from the call above? 
            ldp    <more regs>
            ldp     fp, lr, [sp], #0x60

The first instruction of an epilog is no different from a first instruction in a regular no-GC region.
We have not executed any instructions yet that destroy unwindability or GC-safeness.

if (!cb(ig->igFuncIdx, ig->igOffs, ig->igSize, id->idCodeSize(), ig->igFlags & (IGF_FUNCLET_PROLOG)))
{
return false;
}
Expand Down
5 changes: 2 additions & 3 deletions src/coreclr/jit/gcencode.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4027,8 +4027,7 @@ class InterruptibleRangeReporter
// Report everything between the previous region and the current
// region as interruptible.

bool operator()(
unsigned igFuncIdx, unsigned igOffs, unsigned igSize, unsigned firstInstrSize, bool isInPrologOrEpilog)
bool operator()(unsigned igFuncIdx, unsigned igOffs, unsigned igSize, unsigned firstInstrSize, bool isInProlog)
{
if (igOffs < m_uninterruptibleEnd)
{
Expand All @@ -4044,7 +4043,7 @@ class InterruptibleRangeReporter
// Once the first instruction in IG executes, we cannot have GC.
// But it is ok to have GC while the IP is on the first instruction, unless we are in prolog/epilog.
unsigned interruptibleEnd = igOffs;
if (!isInPrologOrEpilog)
if (!isInProlog)
{
interruptibleEnd += firstInstrSize;
}
Expand Down
6 changes: 6 additions & 0 deletions src/coreclr/nativeaot/Runtime/thread.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -675,10 +675,16 @@ void Thread::HijackCallback(NATIVE_CONTEXT* pThreadContext, void* pThreadToHijac
if (runtime->IsConservativeStackReportingEnabled() ||
codeManager->IsSafePoint(pvAddress))
{
// IsUnwindable is precise on arm64, but can give false negatives on other architectures.
// (when IP is on the first instruction of an epilog, we still can unwind,
// but we can tell if the instruction is the first only if we can navigate instructions backwards and check)
// The preciseness of IsUnwindable is tracked in https://github.com/dotnet/runtime/issues/101932
#if defined(TARGET_ARM64)
// we may not be able to unwind in some locations, such as epilogs.
// such locations should not contain safe points.
// when scanning conservatively we do not need to unwind
ASSERT(codeManager->IsUnwindable(pvAddress) || runtime->IsConservativeStackReportingEnabled());
#endif

// if we are not given a thread to hijack
// perform in-line wait on the current thread
Expand Down
30 changes: 2 additions & 28 deletions src/coreclr/nativeaot/Runtime/unix/UnixNativeCodeManager.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -213,44 +213,18 @@ void UnixNativeCodeManager::EnumGcRefs(MethodInfo * pMethodInfo,

#ifdef TARGET_ARM
// Ensure that code offset doesn't have the Thumb bit set. We need
// it to be aligned to instruction start to make the !isActiveStackFrame
// branch below work.
// it to be aligned to instruction start
ASSERT(((uintptr_t)codeOffset & 1) == 0);
#endif

bool executionAborted = ((UnixNativeMethodInfo*)pMethodInfo)->executionAborted;

if (!isActiveStackFrame && !executionAborted)
{
// the reasons for this adjustment are explained in EECodeManager::EnumGcRefs
codeOffset--;
}

GcInfoDecoder decoder(
GCInfoToken(gcInfo),
GcInfoDecoderFlags(DECODE_GC_LIFETIMES | DECODE_SECURITY_OBJECT | DECODE_VARARG),
codeOffset
);

if (isActiveStackFrame)
{
// CONSIDER: We can optimize this by remembering the need to adjust in IsSafePoint and propagating into here.
// Or, better yet, maybe we should change the decoder to not require this adjustment.
// The scenario that adjustment tries to handle (fallthrough into BB with random liveness)
// does not seem possible.
if (!decoder.HasInterruptibleRanges())
{
decoder = GcInfoDecoder(
GCInfoToken(gcInfo),
GcInfoDecoderFlags(DECODE_GC_LIFETIMES | DECODE_SECURITY_OBJECT | DECODE_VARARG),
codeOffset - 1
);

assert(decoder.IsInterruptibleSafePoint());
}
}

ICodeManagerFlags flags = (ICodeManagerFlags)0;
bool executionAborted = ((UnixNativeMethodInfo*)pMethodInfo)->executionAborted;
if (executionAborted)
flags = ICodeManagerFlags::ExecutionAborted;

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -440,9 +440,8 @@ void CoffNativeCodeManager::EnumGcRefs(MethodInfo * pMethodInfo,
PTR_uint8_t gcInfo;
uint32_t codeOffset = GetCodeOffset(pMethodInfo, safePointAddress, &gcInfo);

bool executionAborted = ((CoffNativeMethodInfo *)pMethodInfo)->executionAborted;

ICodeManagerFlags flags = (ICodeManagerFlags)0;
bool executionAborted = ((CoffNativeMethodInfo *)pMethodInfo)->executionAborted;
if (executionAborted)
flags = ICodeManagerFlags::ExecutionAborted;

Expand All @@ -453,36 +452,13 @@ void CoffNativeCodeManager::EnumGcRefs(MethodInfo * pMethodInfo,
flags = (ICodeManagerFlags)(flags | ICodeManagerFlags::ActiveStackFrame);

#ifdef USE_GC_INFO_DECODER
if (!isActiveStackFrame && !executionAborted)
{
// the reasons for this adjustment are explained in EECodeManager::EnumGcRefs
codeOffset--;
}

GcInfoDecoder decoder(
GCInfoToken(gcInfo),
GcInfoDecoderFlags(DECODE_GC_LIFETIMES | DECODE_SECURITY_OBJECT | DECODE_VARARG),
codeOffset
);

if (isActiveStackFrame)
{
// CONSIDER: We can optimize this by remembering the need to adjust in IsSafePoint and propagating into here.
// Or, better yet, maybe we should change the decoder to not require this adjustment.
// The scenario that adjustment tries to handle (fallthrough into BB with random liveness)
// does not seem possible.
if (!decoder.HasInterruptibleRanges())
{
decoder = GcInfoDecoder(
GCInfoToken(gcInfo),
GcInfoDecoderFlags(DECODE_GC_LIFETIMES | DECODE_SECURITY_OBJECT | DECODE_VARARG),
codeOffset - 1
);

assert(decoder.IsInterruptibleSafePoint());
}
}

if (!decoder.EnumerateLiveSlots(
pRegisterSet,
isActiveStackFrame /* reportScratchSlots */,
Expand Down
Loading
Loading