Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transformations: Implement stencil inlining. #2615

Merged
merged 18 commits into from
Dec 10, 2024
Merged

Conversation

PapyChacal
Copy link
Collaborator

Apologies for the monster PR; I might be able to split in two 🤔

@PapyChacal PapyChacal added the transformations Changes or adds a transformatio label May 21, 2024
@PapyChacal PapyChacal self-assigned this May 21, 2024
Copy link

codecov bot commented May 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.42%. Comparing base (7ee976d) to head (994f9b0).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2615      +/-   ##
==========================================
+ Coverage   90.41%   90.42%   +0.01%     
==========================================
  Files         471      472       +1     
  Lines       59138    59271     +133     
  Branches     5611     5638      +27     
==========================================
+ Hits        53471    53598     +127     
- Misses       4224     4228       +4     
- Partials     1443     1445       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@PapyChacal PapyChacal marked this pull request as ready for review May 21, 2024 16:24
@AntonLydike AntonLydike changed the title transformations: implement stencil inlining. transformations: Implement stencil inlining. May 22, 2024
@tobiasgrosser
Copy link
Contributor

How nice. I am curious about the performance numbers.

xdsl/tools/command_line_tool.py Outdated Show resolved Hide resolved
@PapyChacal
Copy link
Collaborator Author

How nice. I am curious about the performance numbers.

Me too!

There is some polishing to do, it does not seem to work exactly as expected on big kernels.
Because of my KISS implementation, this pass as-is also combinatorial explode on big kernels.
So I'll first integrate with #2623, to cut off those explosions and be able to pinpoint where exactly I missed something still!

@PapyChacal PapyChacal marked this pull request as draft May 22, 2024 12:42
@PapyChacal PapyChacal force-pushed the emilien/stencil-inlining branch from fa327fe to 7453f78 Compare May 22, 2024 12:45
@PapyChacal PapyChacal force-pushed the emilien/stencil-inlining branch from 3f78657 to c98e78c Compare May 27, 2024 12:56
@PapyChacal PapyChacal force-pushed the emilien/stencil-inlining branch from c98e78c to 6570843 Compare May 27, 2024 12:58
Copy link
Contributor

@georgebisbas georgebisbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this now ready to review?

@PapyChacal
Copy link
Collaborator Author

@georgebisbas

Is this now ready to review?

Just finished my checks, it is now!

@tobiasgrosser

How nice. I am curious about the performance numbers.

Here they are:

kernel OEC xDSL xDSL w/ inlining Relative error w/ inlining
laplace 332 313 326 3.19949e-11
fastwavesuv 109 409 122 6.31987e-08
fvtp2d_flux 76 525 72 4.16758e-16
fvtp2d_qi 107 549 80 9.55451e-13
fvtp2d_qj 137 987 131 0
hadvuv 91 585 90 3.41916e-08
hadvuv5th 109 570 109 4.04618e-08
hdiffsa 74 294 83 4.44881e-12
nh_p_grad 94 323 104 1.57696e-13
p_grad_c 57 158 67 1.34918e-14
uvbke 28 48 28 3.77537e-16

@PapyChacal PapyChacal marked this pull request as ready for review May 28, 2024 11:08
@tobiasgrosser
Copy link
Contributor

Nice. The performance looks good. Why do we get a relative error here? Should inlining not perform exactly the same compuation?

@georgebisbas
Copy link
Contributor

Are these seconds?

@PapyChacal
Copy link
Collaborator Author

Nice. The performance looks good. Why do we get a relative error here? Should inlining not perform exactly the same compuation?

The relative error is xDSL inlined vs OEC inlined.

I'm not sure what exactly causes the slight differences yet, but could look into it; it might be quite the rabbit hole, given that different versions of CUDA, MLIR and clang are at play.

Regarding inlining doing the same computations; I'm not confident about OEC - or me missing something - on this side yet. In my tests, OEC plains out crashes on some examples if I don't use inlining. On some other examples that run without issue, OEC's inlining appear to change the results relatively significantly; I could look into that too, most likely in priority to xDSL vs old MLIR.

That's why I reported the relative error of both frameworks with inlining enabled for now. It is the case without surprise from OEC and at least demonstrating that things are consistent there.

I don't mind waiting until all those grey areas are clarified if anybody prefers

@PapyChacal
Copy link
Collaborator Author

PapyChacal commented May 28, 2024

Are these seconds?

milliseconds! Just the first thing I got to work on that side, I can now fine-tune if anyone wants to see different measures.

Those are 512 iteratons over 64x64x64 domains with a halo size of 4 in all directions (i.e. 72x72x72 buffers, computation over the central 64x64x64)

NB: 512 iterations without bufferswapping, just repeating the same output buffer update from the same inputs. I'm actually not sure how this influences performance measurements on GPU 🤔 But FWIW, both frameworks are measured the same way here.

Copy link
Contributor

@georgebisbas georgebisbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy with the reported relative error, so I approve

@superlopuh
Copy link
Member

What's the plan for this?

@superlopuh
Copy link
Member

Since there's no activity, I'll close this, feel free to reopen and merge!

@superlopuh superlopuh closed this Nov 3, 2024
@n-io n-io reopened this Dec 4, 2024
@superlopuh
Copy link
Member

@n-io, would it be possible to split this into smaller reviewable chunks and to make the tests less integrationy and more unit-y? As it stands, if someone introduces a bug in the pass, I have no idea where to begin debugging it. It would be great to have clear inputs and outputs for each of the rewrite patterns.

@n-io
Copy link
Collaborator

n-io commented Dec 4, 2024

@n-io, would it be possible to split this into smaller reviewable chunks and to make the tests less integrationy and more unit-y? As it stands, if someone introduces a bug in the pass, I have no idea where to begin debugging it.

I am faced with a very similar challenge regarding the suggestion you are making.

@superlopuh
Copy link
Member

Do you mean you don't know where to begin making the tests more targeted to the individual rewrites?

@n-io
Copy link
Collaborator

n-io commented Dec 10, 2024

@superlopuh Having had a look at this again, what I'm actually not fully clear about is that from my pov the filecheck tests appear to already be in a form similar to what you're suggesting, unit test-like rather than integration-like. For instance, the last test covers the pass operating on the stencil.buffer op, which hadn't been included in any previous tests. There is one for dyn_access, and some to cover small unit test-like specific scenarios.

) and not any(
# Don't inline any dynamic accesses.
isinstance(use.operation, DynAccessOp)
for consumer_operand in consumer.operands
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
for consumer_operand in consumer.operands
for consumer_operand, arg in zip(consumer.operands, consumer.region.block.args, strict=True)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code is doing a lookup rather than a zip at this point.

Comment on lines 105 to 111
for operand in consumer.operands:
if isinstance(operand.owner, Operation):
if (operand.owner is not producer) and is_before_in_block(
producer, operand.owner
):
return False
return True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for operand in consumer.operands:
if isinstance(operand.owner, Operation):
if (operand.owner is not producer) and is_before_in_block(
producer, operand.owner
):
return False
return True
return not any(
isinstance(operand.owner, Operation) and (operand.owner is not producer) and is_before_in_block(producer, operand.owner)
for operand in consumer.operands
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why you resolved this but it's not a big deal

Copy link
Member

@superlopuh superlopuh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think the tests could be more targeted but if you promise to help fix any bugs that might arise in this part of the codebase in the future I'm good to merge :)

n-io and others added 4 commits December 10, 2024 11:56
Co-authored-by: Sasha Lopoukhine <superlopuh@gmail.com>
Co-authored-by: Sasha Lopoukhine <superlopuh@gmail.com>
@n-io n-io merged commit fd238ba into main Dec 10, 2024
14 checks passed
@n-io n-io deleted the emilien/stencil-inlining branch December 10, 2024 11:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
transformations Changes or adds a transformatio
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants