Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Support some assignment decomposition in physical promotion #85105

Merged

Conversation

jakobbotsch
Copy link
Member

Add support for directly initializing and copying into replacements
instead of doing a struct local and read back. Physically promoted
struct locals used as sources are still handled conservatively (by first
writing them back to stack, then doing the copy).

For example, for a case like

void Foo()
{
    S s = _field;
    s.A = s.B + 3;
    Consume(s);
}

struct S
{
    public int A, B;
}

We see the following:

STMT00000 ( 0x000[E-] ... 0x006 )
               [000003] -A-XG------                         ▌  ASG       struct (copy)
               [000002] D------N---                         ├──▌  LCL_VAR   struct<Program+S, 8> V01 loc0
               [000001] ---XG------                         └──▌  FIELD     struct Program:_field
               [000000] -----------                            └──▌  LCL_VAR   ref    V00 this          (last use)
Processing block operation [000003] that involves replacements
New statement:
STMT00000 ( 0x000[E-] ... 0x006 )
               [000029] -A-XG------                         ▌  COMMA     int
               [000021] -A-XG------                         ├──▌  ASG       int
               [000015] D------N---                         │  ├──▌  LCL_VAR   int    V03 tmp1
               [000020] ---XG------                         │  └──▌  IND       int
               [000018] -----------                         │     └──▌  ADD       ref
               [000016] -----------                         │        ├──▌  LCL_VAR   ref    V00 this
               [000017] -----------                         │        └──▌  CNS_INT   long   8
               [000028] -A-XG------                         └──▌  ASG       int
               [000022] D------N---                            ├──▌  LCL_VAR   int    V04 tmp2
               [000027] ---XG------                            └──▌  IND       int
               [000025] -----------                               └──▌  ADD       ref
               [000023] -----------                                  ├──▌  LCL_VAR   ref    V00 this
               [000024] -----------                                  └──▌  CNS_INT   long   12

The logic is currently quite rudimentary when it comes to
holes/uncovered parts of the struct. For example, in the above case if
we add another unused field at the end of S then the result is:

STMT00000 ( 0x000[E-] ... 0x006 )
               [000003] -A-XG------                         ▌  ASG       struct (copy)
               [000002] D------N---                         ├──▌  LCL_VAR   struct<Program+S, 12> V01 loc0
               [000001] ---XG------                         └──▌  FIELD     struct Program:_field
               [000000] -----------                            └──▌  LCL_VAR   ref    V00 this          (last use)
Processing block operation [000003] that involves replacements
Struct operation is not fully covered by replaced fields. Keeping struct operation.
New statement:
STMT00000 ( 0x000[E-] ... 0x006 )
               [000030] -A-XG------                         ▌  COMMA     struct
               [000021] -A-XG------                         ├──▌  ASG       int
               [000015] D------N---                         │  ├──▌  LCL_VAR   int    V03 tmp1
               [000020] ---XG------                         │  └──▌  IND       int
               [000018] -----------                         │     └──▌  ADD       ref
               [000016] -----------                         │        ├──▌  LCL_VAR   ref    V00 this
               [000017] -----------                         │        └──▌  CNS_INT   long   8
               [000029] -A-XG------                         └──▌  COMMA     struct
               [000028] -A-XG------                            ├──▌  ASG       int
               [000022] D------N---                            │  ├──▌  LCL_VAR   int    V04 tmp2
               [000027] ---XG------                            │  └──▌  IND       int
               [000025] -----------                            │     └──▌  ADD       ref
               [000023] -----------                            │        ├──▌  LCL_VAR   ref    V00 this
               [000024] -----------                            │        └──▌  CNS_INT   long   12
               [000003] -A-XG------                            └──▌  ASG       struct (copy)
               [000002] D------N---                               ├──▌  LCL_VAR   struct<Program+S, 12> V01 loc0
               [000001] ---XG------                               └──▌  FIELD     struct Program:_field
               [000000] -----------                                  └──▌  LCL_VAR   ref    V00 this

In this case it would be significantly more efficient to copy only the
remainder, which is just a small part of the struct. However, in the
general case it is not easy to predict the most efficient way to do
this, and in some cases we cannot even represent the hole in JIT IR (if
it involves GC pointers), so I have left this for a future change for
now. Liveness should also be beneficial for that as there are many cases
where we would expect the remainder to be dead.

Add support for directly initializing and copying into replacements
instead of doing a struct local and read back. Physically promoted
struct locals used as sources are still handled conservatively (by first
writing them back to stack, then doing the copy).

For example, for a case like

```
void Foo()
{
    S s = _field;
    s.A = s.B + 3;
    Consume(s);
}

struct S
{
    public int A, B;
}
```

We see the following:

```
STMT00000 ( 0x000[E-] ... 0x006 )
               [000003] -A-XG------                         ▌  ASG       struct (copy)
               [000002] D------N---                         ├──▌  LCL_VAR   struct<Program+S, 8> V01 loc0
               [000001] ---XG------                         └──▌  FIELD     struct Program:_field
               [000000] -----------                            └──▌  LCL_VAR   ref    V00 this          (last use)
Processing block operation [000003] that involves replacements
New statement:
STMT00000 ( 0x000[E-] ... 0x006 )
               [000029] -A-XG------                         ▌  COMMA     int
               [000021] -A-XG------                         ├──▌  ASG       int
               [000015] D------N---                         │  ├──▌  LCL_VAR   int    V03 tmp1
               [000020] ---XG------                         │  └──▌  IND       int
               [000018] -----------                         │     └──▌  ADD       ref
               [000016] -----------                         │        ├──▌  LCL_VAR   ref    V00 this
               [000017] -----------                         │        └──▌  CNS_INT   long   8
               [000028] -A-XG------                         └──▌  ASG       int
               [000022] D------N---                            ├──▌  LCL_VAR   int    V04 tmp2
               [000027] ---XG------                            └──▌  IND       int
               [000025] -----------                               └──▌  ADD       ref
               [000023] -----------                                  ├──▌  LCL_VAR   ref    V00 this
               [000024] -----------                                  └──▌  CNS_INT   long   12
```

The logic is currently quite rudimentary when it comes to
holes/uncovered parts of the struct. For example, in the above case if
we add another unused field at the end of S then the result is:

```
STMT00000 ( 0x000[E-] ... 0x006 )
               [000003] -A-XG------                         ▌  ASG       struct (copy)
               [000002] D------N---                         ├──▌  LCL_VAR   struct<Program+S, 12> V01 loc0
               [000001] ---XG------                         └──▌  FIELD     struct Program:_field
               [000000] -----------                            └──▌  LCL_VAR   ref    V00 this          (last use)
Processing block operation [000003] that involves replacements
Struct operation is not fully covered by replaced fields. Keeping struct operation.
New statement:
STMT00000 ( 0x000[E-] ... 0x006 )
               [000030] -A-XG------                         ▌  COMMA     struct
               [000021] -A-XG------                         ├──▌  ASG       int
               [000015] D------N---                         │  ├──▌  LCL_VAR   int    V03 tmp1
               [000020] ---XG------                         │  └──▌  IND       int
               [000018] -----------                         │     └──▌  ADD       ref
               [000016] -----------                         │        ├──▌  LCL_VAR   ref    V00 this
               [000017] -----------                         │        └──▌  CNS_INT   long   8
               [000029] -A-XG------                         └──▌  COMMA     struct
               [000028] -A-XG------                            ├──▌  ASG       int
               [000022] D------N---                            │  ├──▌  LCL_VAR   int    V04 tmp2
               [000027] ---XG------                            │  └──▌  IND       int
               [000025] -----------                            │     └──▌  ADD       ref
               [000023] -----------                            │        ├──▌  LCL_VAR   ref    V00 this
               [000024] -----------                            │        └──▌  CNS_INT   long   12
               [000003] -A-XG------                            └──▌  ASG       struct (copy)
               [000002] D------N---                               ├──▌  LCL_VAR   struct<Program+S, 12> V01 loc0
               [000001] ---XG------                               └──▌  FIELD     struct Program:_field
               [000000] -----------                                  └──▌  LCL_VAR   ref    V00 this
```

In this case it would be significantly more efficient to copy only the
remainder, which is just a small part of the struct. However, in the
general case it is not easy to predict the most efficient way to do
this, and in some cases we cannot even represent the hole in JIT IR (if
it involves GC pointers), so I have left this for a future change for
now. Liveness should also be beneficial for that as there are many cases
where we would expect the remainder to be dead.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 20, 2023
@ghost ghost assigned jakobbotsch Apr 20, 2023
@ghost
Copy link

ghost commented Apr 20, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

Add support for directly initializing and copying into replacements
instead of doing a struct local and read back. Physically promoted
struct locals used as sources are still handled conservatively (by first
writing them back to stack, then doing the copy).

For example, for a case like

void Foo()
{
    S s = _field;
    s.A = s.B + 3;
    Consume(s);
}

struct S
{
    public int A, B;
}

We see the following:

STMT00000 ( 0x000[E-] ... 0x006 )
               [000003] -A-XG------                         ▌  ASG       struct (copy)
               [000002] D------N---                         ├──▌  LCL_VAR   struct<Program+S, 8> V01 loc0
               [000001] ---XG------                         └──▌  FIELD     struct Program:_field
               [000000] -----------                            └──▌  LCL_VAR   ref    V00 this          (last use)
Processing block operation [000003] that involves replacements
New statement:
STMT00000 ( 0x000[E-] ... 0x006 )
               [000029] -A-XG------                         ▌  COMMA     int
               [000021] -A-XG------                         ├──▌  ASG       int
               [000015] D------N---                         │  ├──▌  LCL_VAR   int    V03 tmp1
               [000020] ---XG------                         │  └──▌  IND       int
               [000018] -----------                         │     └──▌  ADD       ref
               [000016] -----------                         │        ├──▌  LCL_VAR   ref    V00 this
               [000017] -----------                         │        └──▌  CNS_INT   long   8
               [000028] -A-XG------                         └──▌  ASG       int
               [000022] D------N---                            ├──▌  LCL_VAR   int    V04 tmp2
               [000027] ---XG------                            └──▌  IND       int
               [000025] -----------                               └──▌  ADD       ref
               [000023] -----------                                  ├──▌  LCL_VAR   ref    V00 this
               [000024] -----------                                  └──▌  CNS_INT   long   12

The logic is currently quite rudimentary when it comes to
holes/uncovered parts of the struct. For example, in the above case if
we add another unused field at the end of S then the result is:

STMT00000 ( 0x000[E-] ... 0x006 )
               [000003] -A-XG------                         ▌  ASG       struct (copy)
               [000002] D------N---                         ├──▌  LCL_VAR   struct<Program+S, 12> V01 loc0
               [000001] ---XG------                         └──▌  FIELD     struct Program:_field
               [000000] -----------                            └──▌  LCL_VAR   ref    V00 this          (last use)
Processing block operation [000003] that involves replacements
Struct operation is not fully covered by replaced fields. Keeping struct operation.
New statement:
STMT00000 ( 0x000[E-] ... 0x006 )
               [000030] -A-XG------                         ▌  COMMA     struct
               [000021] -A-XG------                         ├──▌  ASG       int
               [000015] D------N---                         │  ├──▌  LCL_VAR   int    V03 tmp1
               [000020] ---XG------                         │  └──▌  IND       int
               [000018] -----------                         │     └──▌  ADD       ref
               [000016] -----------                         │        ├──▌  LCL_VAR   ref    V00 this
               [000017] -----------                         │        └──▌  CNS_INT   long   8
               [000029] -A-XG------                         └──▌  COMMA     struct
               [000028] -A-XG------                            ├──▌  ASG       int
               [000022] D------N---                            │  ├──▌  LCL_VAR   int    V04 tmp2
               [000027] ---XG------                            │  └──▌  IND       int
               [000025] -----------                            │     └──▌  ADD       ref
               [000023] -----------                            │        ├──▌  LCL_VAR   ref    V00 this
               [000024] -----------                            │        └──▌  CNS_INT   long   12
               [000003] -A-XG------                            └──▌  ASG       struct (copy)
               [000002] D------N---                               ├──▌  LCL_VAR   struct<Program+S, 12> V01 loc0
               [000001] ---XG------                               └──▌  FIELD     struct Program:_field
               [000000] -----------                                  └──▌  LCL_VAR   ref    V00 this

In this case it would be significantly more efficient to copy only the
remainder, which is just a small part of the struct. However, in the
general case it is not easy to predict the most efficient way to do
this, and in some cases we cannot even represent the hole in JIT IR (if
it involves GC pointers), so I have left this for a future change for
now. Liveness should also be beneficial for that as there are many cases
where we would expect the remainder to be dead.

Author: jakobbotsch
Assignees: jakobbotsch
Labels:

area-CodeGen-coreclr

Milestone: -

@jakobbotsch
Copy link
Member Author

/azp run runtime-jit-experimental

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jakobbotsch
Copy link
Member Author

/azp run runtime-jit-experimental

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jakobbotsch jakobbotsch marked this pull request as ready for review April 20, 2023 21:28
@jakobbotsch
Copy link
Member Author

cc @dotnet/jit-contrib PTAL @AndyAyersMS

This passed jit-experimental.

Diffs with physical promotion.

Diffs without old promotion. Looks like there's a ton of generated regex code that has the same kind of regression in x86 that ends up disproportionally affecting the benchmarks diffs. Appears to be a case where the promoted fields end up without registers, and the moved def of the field appears to no longer be a candidate for having a single spill def location. I will investigate further tomorrow, but this PR should be good to review now (also need to address some formatting issues).

@jakobbotsch
Copy link
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I am not clear on how the apparently flow-sensitive values of Replacement.NeedsReadBack and 'Replacement.NeedsWriteBack` are handled. Say a struct with promotions is assigned to differently in then/else and then some promoted piece is accessed below the join point, how does that access know if it needs to be read back?

I suppose the answer is that immediately after each assignment if a read back is needed then you just do it there instead of trying to defer, instead of trying to find "optimal" placement downstream somewhere...?

src/coreclr/jit/promotion.cpp Show resolved Hide resolved
src/coreclr/jit/promotion.cpp Outdated Show resolved Hide resolved
// Parameters:
// lclNum - the local
//
void IncrementRefCount(unsigned lclNum)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Share this with local address visitor?

Also the local address visitor's UpdateEarlyRefCount does things with weighted ref counts which would be difficult to duplicate here -- is there any adverse interaction caused by not updating them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Share this with local address visitor?

Will do.

Also the local address visitor's UpdateEarlyRefCount does things with weighted ref counts which would be difficult to duplicate here -- is there any adverse interaction caused by not updating them?

The ref counts we are updating here are only for "existing" locals that appear under assignments, so I don't think the weighted ref counts are problematic.

The remainder of the pass is not handling the ref counts properly for the locals it introduces, I will submit a separate change to get that sorted out. There won't be an adverse effect of not updating the weighted ref count since we only use that to undo "old" promotion (the other use of them, for omitting some implicit byref copies, was replaced by liveness if you recall).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I introduced an LclVarDsc::incLvRefCntSaturating and used it from both. Didn't seem too easy to factor the rest of the promotion-related common logic due to the weighted ref-count logic you mentioned.

src/coreclr/jit/promotion.cpp Show resolved Hide resolved
{
JITDUMP("Copy [%06u] is between two physically promoted locals with replacements\n",
Compiler::dspTreeID(asg));
JITDUMP("*** Conservative: Phys<->phys copies not yet supported; inserting conservative write-back\n");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this case uncommon or something you want to work on later?

Would it be interesting to check for self assignment or do we just not see those?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to handle sources in an upcoming change (I have most of this code written). Will keep in mind to add a check for self assignments then.

if (!covered)
{
JITDUMP("Struct operation is not fully covered by replaced fields. Keeping struct operation.\n");
result.AddStatement(asg);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I am confused here -- it seems like result already accumulated IR for the handled cases and now we're appending a full copy, but some of the copied bits may be stale if we didn't write them back to the source -- but we know not to read from the corresponding bits in the destination?

If so maybe expand on the comment below to say "For handled replacements, we must not read back..."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The copied bits won't be stale here since this PR always (conservatively) writes the source replacements back to its struct local to begin with, but yes, for the replacements that were handled we won't read them back from the struct local due to the loop below, so it wouldn't matter even if they were.

Once we get to handling sources this "covering" copy will need to happen before the copying of the source replacements since otherwise we would overwrite the fresh bits (from the source's replacements) with stale bits (from the source's struct local).

@jakobbotsch
Copy link
Member Author

Overall I am not clear on how the apparently flow-sensitive values of Replacement.NeedsReadBack and 'Replacement.NeedsWriteBack` are handled. Say a struct with promotions is assigned to differently in then/else and then some promoted piece is accessed below the join point, how does that access know if it needs to be read back?

I suppose the answer is that immediately after each assignment if a read back is needed then you just do it there instead of trying to defer, instead of trying to find "optimal" placement downstream somewhere...?

Currently the invariant is that at the entrance to all BBs the replacement locals always contain the freshest value. So there is "resolution" done at the end of each BB to make that so if NeedsReadBack is true:

for (unsigned i = 0; i < numLocals; i++)
{
if (replacements[i] == nullptr)
{
continue;
}
for (Replacement& rep : *replacements[i])
{
assert(!rep.NeedsReadBack || !rep.NeedsWriteBack);
if (rep.NeedsReadBack)
{
JITDUMP("Reading back replacement V%02u.[%03u..%03u) -> V%02u at the end of " FMT_BB "\n", i,
rep.Offset, rep.Offset + genTypeSize(rep.AccessType), rep.LclNum, bb->bbNum);
GenTree* dst = m_compiler->gtNewLclvNode(rep.LclNum, rep.AccessType);
GenTree* src = m_compiler->gtNewLclFldNode(i, rep.AccessType, rep.Offset);
GenTree* asg = m_compiler->gtNewAssignNode(dst, src);
m_compiler->fgInsertStmtNearEnd(bb, m_compiler->fgNewStmtFromTree(asg));
rep.NeedsReadBack = false;
}
rep.NeedsWriteBack = true;
}
}

There's definitely more clever stuff we can do to optimize the insertion of the read backs/write backs.

@jakobbotsch
Copy link
Member Author

/azp run runtime-coreclr superpmi-replay

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jakobbotsch
Copy link
Member Author

Looks like there's a ton of generated regex code that has the same kind of regression in x86 that ends up disproportionally affecting the benchmarks diffs. Appears to be a case where the promoted fields end up without registers, and the moved def of the field appears to no longer be a candidate for having a single spill def location.

#85251 is an attempt at fixing the problem and resolves the regression for the collection:

[15:31:32] 10,193 contexts with diffs (9,045 improvements, 522 regressions, 626 same size)
[15:31:32]   -46,981/+7,885 bytes

@jakobbotsch
Copy link
Member Author

@jakobbotsch jakobbotsch merged commit 2e0033c into dotnet:main Apr 25, 2023
@jakobbotsch jakobbotsch deleted the physical-promotion-decompose-assignments branch April 25, 2023 13:57
@ghost ghost locked as resolved and limited conversation to collaborators May 25, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants