transformations: Implement stencil inlining. #2615
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2615      +/-   ##
==========================================
+ Coverage   90.41%   90.42%   +0.01%
==========================================
  Files         471      472       +1
  Lines       59138    59271     +133
  Branches     5611     5638      +27
==========================================
+ Hits        53471    53598     +127
- Misses       4224     4228       +4
- Partials     1443     1445       +2
How nice. I am curious about the performance numbers.
Me too! There is some polishing to do: it does not seem to work exactly as expected on big kernels.
…from duplicated ones.
Co-authored-by: Sasha Lopoukhine <superlopuh@gmail.com>
Is this now ready to review?
Just finished my checks, it is now!
Here they are:
Nice. The performance looks good. Why do we get a relative error here? Should inlining not perform exactly the same computation?
Are these seconds?
The relative error is xDSL inlined vs OEC inlined. I'm not sure yet what exactly causes the slight differences, but I could look into it; it might be quite the rabbit hole, given that different versions of CUDA, MLIR and clang are at play. Regarding inlining doing the same computations: I'm not confident about OEC on this side yet, or I may be missing something. In my tests, OEC plainly crashes on some examples if I don't use inlining. On some other examples that run without issue, OEC's inlining appears to change the results relatively significantly; I could look into that too, most likely with priority over xDSL vs old MLIR. That's why I reported the relative error of both frameworks with inlining enabled for now. That configuration runs without surprises on the OEC side and at least demonstrates that things are consistent there. I don't mind waiting until all those grey areas are clarified if anybody prefers.
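For readers wondering what "relative error" could mean concretely here, the following is a minimal sketch of one common definition (maximum element-wise relative difference). The thread does not say which metric was actually used, so the function name, the `eps` guard and the sample values are all assumptions made for illustration; none of the numbers are measured results.

```python
def max_relative_error(reference, candidate, eps=1e-12):
    """Largest element-wise |candidate - reference| / max(|reference|, eps).

    Hypothetical helper: `eps` only guards against division by zero.
    """
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same length")
    return max(
        (abs(c - r) / max(abs(r), eps) for r, c in zip(reference, candidate)),
        default=0.0,
    )


# Made-up flattened outputs standing in for "OEC inlined" and "xDSL inlined".
oec_inlined = [1.0, 2.0, 4.0]
xdsl_inlined = [1.0, 2.0, 4.0000004]

err = max_relative_error(oec_inlined, xdsl_inlined)
print(f"max relative error: {err:.3e}")
```

Comparing both frameworks with inlining enabled, as described above, corresponds to passing the two inlined outputs as `reference` and `candidate`.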
Milliseconds! It was just the first thing I got to work on that side; I can now fine-tune if anyone wants to see different measures. Those are 512 iterations over 64x64x64 domains with a halo size of 4 in all directions (i.e. 72x72x72 buffers, computation over the central 64x64x64). NB: 512 iterations without buffer swapping, just repeating the same output buffer update from the same inputs. I'm actually not sure how this influences performance measurements on GPU 🤔 But FWIW, both frameworks are measured the same way here.
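The buffer arithmetic in that setup can be sketched as follows. Only the numbers (64, 4, 72, 512) come from the comment; the helper names are hypothetical and this is not the benchmark harness itself.

```python
def buffer_shape(domain, halo):
    """Shape of a buffer holding `domain` plus `halo` cells on every side."""
    return tuple(n + 2 * halo for n in domain)


def interior_slices(domain, halo):
    """Slices selecting the computed region inside the padded buffer."""
    return tuple(slice(halo, halo + n) for n in domain)


domain = (64, 64, 64)
halo = 4
iterations = 512

shape = buffer_shape(domain, halo)        # padded buffer extents
interior = interior_slices(domain, halo)  # central region that is updated

points_per_iteration = domain[0] * domain[1] * domain[2]
total_updates = iterations * points_per_iteration

print(shape, interior[0], total_updates)
```

Since the same inputs and output buffer are reused in every iteration, `total_updates` counts repeated writes to the same interior points rather than successive time steps.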
I am happy with the reported relative error, so I approve
What's the plan for this?
Since there's no activity, I'll close this, feel free to reopen and merge!
@n-io, would it be possible to split this into smaller reviewable chunks and to make the tests less integrationy and more unit-y? As it stands, if someone introduces a bug in the pass, I have no idea where to begin debugging it. It would be great to have clear inputs and outputs for each of the rewrite patterns.
I am faced with a very similar challenge regarding the suggestion you are making.
Do you mean you don't know where to begin making the tests more targeted to the individual rewrites?
@superlopuh Having had a look at this again, what I'm actually not fully clear about is that from my pov the filecheck tests appear to already be in a form similar to what you're suggesting, unit test-like rather than integration-like. For instance, the last test covers the pass operating on the |
) and not any(
    # Don't inline any dynamic accesses.
    isinstance(use.operation, DynAccessOp)
    for consumer_operand in consumer.operands
nit:
- for consumer_operand in consumer.operands
+ for consumer_operand, arg in zip(consumer.operands, consumer.region.block.args, strict=True)
I think the code is doing a lookup rather than a zip at this point.
xdsl/transforms/stencil_inlining.py
Outdated
for operand in consumer.operands:
    if isinstance(operand.owner, Operation):
        if (operand.owner is not producer) and is_before_in_block(
            producer, operand.owner
        ):
            return False
return True
- for operand in consumer.operands:
-     if isinstance(operand.owner, Operation):
-         if (operand.owner is not producer) and is_before_in_block(
-             producer, operand.owner
-         ):
-             return False
- return True
+ return not any(
+     isinstance(operand.owner, Operation) and (operand.owner is not producer) and is_before_in_block(producer, operand.owner)
+     for operand in consumer.operands
+ )
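As a sanity check on this suggestion, the sketch below runs the loop form and the `not any(...)` form side by side on toy data. `Op`, `Value` and this `is_before_in_block` are hypothetical stand-ins written for the example (an operation is reduced to its position in a block); they are not xDSL's classes, and the ordering semantics of the stand-in are an assumption.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Op:
    """Stand-in for an operation, identified by its position in a block."""
    position: int


@dataclass(eq=False)
class Value:
    """Stand-in for an SSA value; `owner` is an Op or anything else."""
    owner: object


def is_before_in_block(a, b):
    """Stand-in ordering check: True if `a` sits before `b` in the block."""
    return a.position < b.position


def check_loop(producer, operands):
    # Original form: bail out on the first offending operand.
    for operand in operands:
        if isinstance(operand.owner, Op):
            if (operand.owner is not producer) and is_before_in_block(
                producer, operand.owner
            ):
                return False
    return True


def check_any(producer, operands):
    # Suggested form: the same condition expressed with not any(...).
    return not any(
        isinstance(operand.owner, Op)
        and (operand.owner is not producer)
        and is_before_in_block(producer, operand.owner)
        for operand in operands
    )


early, producer, late = Op(0), Op(1), Op(2)
block_arg = Value(owner="block")  # owner that is not an operation

cases = [
    [Value(producer), Value(early), block_arg],
    [Value(producer), Value(late)],
    [],
]
results = [(check_loop(producer, c), check_any(producer, c)) for c in cases]
print(results)
```

Both forms short-circuit on the first operand that satisfies the condition, so on these cases the rewrite reads as a readability change rather than a behavioural one.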
Not sure why you resolved this but it's not a big deal
I still think the tests could be more targeted but if you promise to help fix any bugs that might arise in this part of the codebase in the future I'm good to merge :)
Co-authored-by: Sasha Lopoukhine <superlopuh@gmail.com>
Co-authored-by: Sasha Lopoukhine <superlopuh@gmail.com>
Apologies for the monster PR; I might be able to split in two 🤔