-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Conversation
src/jit/lower.cpp
Outdated
// object). This is not a "regular" indirection. | ||
// (Note that the user check could be costly.) | ||
LIR::Use indirUse; | ||
if (BlockRange().TryGetUse(indir, &indirUse) && indirUse.User()->OperIsIndir()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The removal of this code is responsible for the sole diff in this PR:
--- D:\Projects\jit\diffs\dasmset_16\base\System.Private.CoreLib.dasm 2019-10-04 20:51:01.000000000 +-0300
+++ D:\Projects\jit\diffs\dasmset_16\diff\System.Private.CoreLib.dasm 2019-10-04 20:51:01.000000000 +-0300
@@ -439051,6 +439051,5 @@
G_M6933_IG02:
- add rdx, r8
- movups xmm0, xmmword ptr [rdx]
+ movups xmm0, xmmword ptr [rdx+r8]
movups xmmword ptr [rcx], xmm0
mov rax, rcx
ContainCheckHWIntrinsicAddr
calls TryCreateAddrMode
with isIndir = true
but then TryCreateAddrMode
decides that it knows better...
src/jit/emitxarch.cpp
Outdated
|
||
dispIns(id); | ||
emitCurIGsize += sz; | ||
emitIns_R_ARX(ins, attr, reg, base, REG_NA, 1, disp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A ton of duplicated code...
src/jit/lowerarmarch.cpp
Outdated
} | ||
blkNode->gtBlkOpKind = GenTreeBlk::BlkOpKindUnroll; | ||
|
||
src->AsIntCon()->SetIconValue(fill); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value wasn't updated when it was 0, even though the value that was checked for wasn't the original value but a masked one...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, so if src->AsIntCon()->IconValue() == 0x11100
we would get a wrong init?
We should be able to repro it with CEE_INITBLK
il.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Umm, I changed because it simply was confusing - lowering was masking and checking for 0 and codegen was NOT masking but also checking for 0. But yes, it turns out that it is a real bug on ARM:
52822000 movz w0, #0x1100
72A00020 movk w0, #1 LSL #16
9101C3A1 add x1, fp, #112 // [V05 loc0]
B9000020 str w0, [x1]
39001020 strb w0, [x1,#4]
XARCH lowering was correct but also did things differently - masking, multiplying & updating, checking for 0.
src/jit/lowerxarch.cpp
Outdated
{ | ||
initVal->gtIntCon.gtIconVal = 0x01010101 * fill; | ||
fill *= 0x0101010101010101LL; | ||
src->gtType = TYP_LONG; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes the type of the constant but doesn't change the type of GT_INIT_VAL
. Doesn't seem to have any ill side effects so I left it unchanged. Anyway I think that the type of GT_INIT_VAL
should really be TYP_STRUCT
.
src/jit/lowerxarch.cpp
Outdated
{ | ||
initVal->gtIntCon.gtIconVal = 0x01010101 * fill; | ||
fill *= 0x0101010101010101LL; | ||
src->gtType = TYP_LONG; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the type was changed to long even when the fill was 0. That isn't necessary and resulted in xor rax, rax
rather than xor eax, eax
being generated in disassembly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you see diffs from changing the handling of 0? Seems like that would be pretty common
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are only textual diffs. There are hacks in the emitter that prevented a 64 bit xor from being emitted in the first place, like here:
Lines 11360 to 11366 in 088299e
case EA_8BYTE: | |
// TODO-AMD64-CQ: Better way to not emit REX.W when we don't need it | |
// Don't need to zero out the high bits explicitly | |
if ((ins != INS_xor) || (reg1 != reg2)) | |
{ | |
code = AddRexWPrefix(ins, code); | |
} |
src/jit/lowerxarch.cpp
Outdated
// so we need to check if there's a long enough sequence of non-GC slots. | ||
ClassLayout* layout = blkNode->GetLayout(); | ||
unsigned slots = layout->GetSlotCount(); | ||
for (unsigned i = 0; i < slots; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use 3 loops when you can do it with one?!
src/jit/lowerxarch.cpp
Outdated
addr->ClearContained(); | ||
} | ||
} | ||
else if (!source->IsMultiRegCall() && !source->OperIsSimdOrHWintrinsic()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The multireg call/simd/hw intrinsic does not ever happen.
@dotnet/jit-contrib |
e104e49
to
8336b44
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for extracting this out of #21711 .
Some questions about the first 8 commits here, will return later.
I think you could speed up review/merge of these changes if cut them on even smaller parts (for example 7ed454a could be merged right now if you extract it into a separate PR, 9ab3434 also could be merged and it doesn't look like part of "Refactor init/copy block codegen").
src/jit/codegenarmarch.cpp
Outdated
|
||
assert(dstAddr->isUsedFromReg()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why have you decided to delete these asserts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
genConsumeReg
eventually asserts if there's no register assigned to the node
coreclr/src/jit/codegenlinear.cpp
Line 1306 in 4e702da
assert(tree->gtHasReg()); |
So the assert is redundant. In general the JIT codebase seems to be having a problem with assert redundancy.
} | ||
#endif | ||
buildInternalIntRegisterDefForNode(blkNode, regMask); | ||
} | ||
|
||
if (size >= XMM_REGSIZE_BYTES) | ||
{ | ||
// If we have a buffer larger than XMM_REGSIZE_BYTES, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Were these comments wrong or unnecessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commenting throughout the JIT codebase is in general pretty terrible. The first rule about commenting is that comments should explain why the code does something when that isn't obvious and NOT what the code does. What the code does should be obvious from, well, the code itself.
Now, in complex code it's not always possible to write clear enough code. Or sometimes the code is clear but is large and saying a few words about what it does is helpful because it speeds up understanding of the code.
That's not the case here. It's 3 lines of trivial code and 5 lines of comments.
Sorry about this monster PR. Initially I was just looking to get the destination address of init block contained because that was already done for copy block and was missing from init block for no obvious reason. Unfortunately lowering, lsra build & codegen are rather tighly coupled so I had to investigate what needs changing and then things spiralled out of control. Unused variables, confusing names, duplicated code, code inconsistencies for now reason and whatnot. |
BTW - it's probably not too late to do that. There are some 3 commits near the end that would definitely fit well in 2 separate PRs. Preferences? |
It would be nice, in additional I think it is useful to get independent ci testing for each of them to make sure that they don't hide regressions from each others. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few more questions/comments.
src/jit/lowerarmarch.cpp
Outdated
} | ||
blkNode->gtBlkOpKind = GenTreeBlk::BlkOpKindUnroll; | ||
|
||
src->AsIntCon()->SetIconValue(fill); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, so if src->AsIntCon()->IconValue() == 0x11100
we would get a wrong init?
We should be able to repro it with CEE_INITBLK
il.
src/jit/lowerarmarch.cpp
Outdated
|
||
#ifdef DEBUG | ||
// CpObj must always have at least one GC-Pointer as a member. | ||
assert(objNode->GetLayout()->HasGCPtr()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, before we were checking:
if (blkNode->OperGet() == GT_STORE_OBJ)
assert(blkNode->AsObj()->GetLayout()->HasGCPtr());
now there is
if (blkNode->OperIs(GT_STORE_OBJ) && (!blkNode->AsObj()->GetLayout()->HasGCPtr() || blkNode->gtBlkOpGcUnsafe))
so if before all GT_STORE_OBJ
had at least one GCPtr then who passes !blkNode->AsObj()->GetLayout()->HasGCPtr()
check after that change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, nice catch. This change shouldn't probably be in this PR, though it's small and doesn't hurt.
One of the removed commits (mikedn@8336b44) is moving the code from fgMorphUnsafeBlk
to lowering. That's responsible for changing OBJs without GC pointers into BLKs.
I'm also working on #27053 that will make BLKs even more rare. Since introducing ClassLayout
and making OBJ
a small node there's no longer any need to discard struct type information from IR. We should use OBJ whenever type information is available, with or without GC pointers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it's still the case that after morph we have no GT_OBJ
nodes that have zero gc pointers. So I'm not sure why this would change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, currently GT_OBJ
nodes always have GC pointers. The plan is to use GT_OBJ
nodes even in cases when no GC pointers are present, at least until lowering. After lowering it probably doesn't matter anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense - in that case perhaps we don't need to change it.
// b) The size of the struct to initialize is smaller than INITBLK_UNROLL_LIMIT bytes. | ||
void CodeGen::genCodeForInitBlkUnroll(GenTreeBlk* initBlkNode) | ||
{ | ||
#ifndef _TARGET_ARM64_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not #ifdef _TARGET_ARM_
?
are you going to implement this function later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the ifdef will just disappear in #21711. Besides, I find it convenient to place the small block before the big block when you some sort of if/else construct. You don't need to scroll through the entire big block to see what the small block is doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of questions and some minor comment nits.
src/jit/codegenarmarch.cpp
Outdated
{ | ||
dstAddrBaseReg = genConsumeReg(dstAddr); | ||
|
||
// TODO-Cleanup: This matches the old code behavior in order to get 0 diffs. It's otherwise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something you're planning to change? Should there be an issue?
Also, it should be TODO-ARMArch
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this will be fixed in the PR from where all this cleanup was extracted from: 1004804
src/jit/lowerarmarch.cpp
Outdated
|
||
#ifdef DEBUG | ||
// CpObj must always have at least one GC-Pointer as a member. | ||
assert(objNode->GetLayout()->HasGCPtr()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it's still the case that after morph we have no GT_OBJ
nodes that have zero gc pointers. So I'm not sure why this would change.
src/jit/lowerarmarch.cpp
Outdated
|
||
size_t classSize = objNode->GetLayout()->GetSize(); | ||
size_t blkSize = roundUp(classSize, TARGET_POINTER_SIZE); | ||
if (blkNode->OperIs(GT_STORE_OBJ) && (!blkNode->AsObj()->GetLayout()->HasGCPtr() || blkNode->gtBlkOpGcUnsafe)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that this check was here before. It should probably be asserting !blkNode->AsObj()->GetLayout()->HasGCPtr()
, but perhaps that's best left for another PR.
{ | ||
// TODO-CQ: We could unroll even when the initialization value is not a constant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: TODO-XArch-CQ
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this is in lowerxarch it is not arch specific, you can always insert a mul x, 0x01010101
to deal with non-const initialization. Not sure if anyone is doing this, though since initblk
was exposed in the Unsafe
class perhaps someone will complain about it.
src/jit/lowerxarch.cpp
Outdated
{ | ||
initVal->gtIntCon.gtIconVal = 0x01010101 * fill; | ||
fill *= 0x0101010101010101LL; | ||
src->gtType = TYP_LONG; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you see diffs from changing the handling of 0? Seems like that would be pretty common
I haven't rebased this yet in case there is more feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - thanks!
/azp run outerloop |
Pull request contains merge conflicts. |
Unfortunatle I won't be able to do a review until Monday, could you please do a rebase now so we can check pri1 and stress jobs? |
Thanks, conflicts fixed. Just |
/azp run outerloop |
No pipelines are associated with this pull request. |
Huh, the assert I added to check for address containment fired. I'd say that the assert is in itself correct, what's dubious is the IR that gets produced from this piece of IL:
is imported as:
which eventually ends up as
And the indir address gets contained because its type is Interestingly, I've run into a somewhat related issue with |
Sure, I've no problem splitting this even further. The main reason the original PR got so big is that for a long time nothing was merged and it would have been difficult for me to keep multiple PRs coordinated. Not to mention that the CI just isn't reliable, more builds -> more chance of random failures. |
That is true, sorry about that, but we are trying to do these things better now... |
Normally 0 sized block ops should be simply removed during lowering. But these are rare enough that adding special code to deal with them is questionable. Instead unroll such block ops, the unroll codegen simply won't generate any code for a 0 sized block op. Of course, some registers will still be allocated.
Non-GC GT_STORE_OBJ nodes were changed to GT_STORE_BLK in block copy ops but not in block init ops. Since block init is effectvely non-GC it is preferable to be consistent and change to GT_STORE_BLK in both cases.
/azp list |
CI/CD Pipelines for this repository: |
/azp run coreclr-outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one question about removing a TODO comment
|
||
if (blkNode->gtBlkOpKind == GenTreeBlk::BlkOpKindUnroll) | ||
{ | ||
// TODO-ARM-CQ: Currently we generate a helper call for every |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason you removed this TODO
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NYI_ARM bellow should suffice. And anyway I'll enable unroll on ARM in the "improve" PR: d871e37
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I can live with that, though believe it or not, I do occasionally grep for the TODOs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have that too but in lowering:
coreclr/src/jit/lowerarmarch.cpp
Lines 280 to 285 in 2424f34
#endif // _TARGET_ARM64_ | |
{ | |
// TODO-ARM-CQ: Currently we generate a helper call for every initblk we encounter. | |
// Later on we should implement loop unrolling code sequences to improve CQ. | |
blkNode->gtBlkOpKind = GenTreeBlk::BlkOpKindHelper; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks again for splitting that.
Note: the stress testing looks greener than I expected it (no infra/timeouts)
Is it worth recollecting diffs now? |
I've ran x64/x86 and altjit arm32/arm64 diffs again - none. The one I mentioned in the first comment in this issue was coming from one of commits that was removed from this PR. Looking through failed jitstress tests, looks like stuff with a history... |
I expected that |
Hmm, why? The inconsistency was that init block codegen accepted both BLK and OBJ and generated the same code in both cases while copy block also accepted both but generated different code. The change simply makes init block uses only BLK.
Yes, that would generate diffs if you run into such a case. Let me look for a test that does this, I'm pretty sure there's at least one. |
Got it, I had the impression that we were calling |
Here's a 0 sized block test that @CarolEidt recently added while fixing a bug: https://github.com/dotnet/coreclr/blob/master/tests/src/JIT/Regression/JitBlue/GitHub_24846/GitHub_24846.cs TestCopy produces the best diff: --- D:\Projects\jit\diffs\dasmset_1\base\test.dasm 2019-10-16 21:18:05.000000000 +-0300
+++ D:\Projects\jit\diffs\dasmset_1\diff\test.dasm 2019-10-16 21:18:05.000000000 +-0300
@@ -15,14 +15,4 @@
G_M38074_IG01:
- push rdi
- push rsi
- mov rdi, rcx
G_M38074_IG02:
- mov rsi, rdx
- xor ecx, ecx
- rep movsb
-G_M38074_IG03:
- pop rsi
- pop rdi
ret Since unrolled block copy does not place any restrictions on the source and destination registers the JIT doesn't even need to move the registers so no code is being generated. The block init case is more interesting - you can see how registers are still getting allocated even if no actual initialization code is generated: --- D:\Projects\jit\diffs\dasmset_1\base\test.dasm 2019-10-16 21:18:05.000000000 +-0300
+++ D:\Projects\jit\diffs\dasmset_1\diff\test.dasm 2019-10-16 21:18:05.000000000 +-0300
@@ -160,6 +145,4 @@
G_M7755_IG05:
- xor rsi, rsi
- mov rdi, qword ptr [rbp-20H]
- xor ecx, ecx
- rep movsb
+ xor rax, rax
+ mov rdx, qword ptr [rbp-18H] At least the |
Something like that actually happens but it's a different issue and the fix is in one of the removed commits. Anyway, it doesn't result in actual code differences, you'll only see less labels in diffs due to the GC non-interruptible IGs created by the emitter for disable/enable GC. |
Extracted from #21711 for easier handling.
A bunch of cleanup to make the init/copy block codegen code more consistent, readable and easier to change. One notable change is that
GenTreeAddrMode
is no longer a large node, it didn't need be. This simplifies things a bit because the originalADD
node can be directly turned into aLEA
node, instead of creating a new node and having to find the use and update it.