
[RELAX][PASS] Annotate Custom Scope layout pass for Adreno GPU #17599

Open · wants to merge 6 commits into main from annotate_texture_scope

Conversation

@srkreddy1238 (Contributor) commented Jan 21, 2025

Texture scope annotation is handled by:

  • Layout conversion from 4D to 5D with the convert_layout pass
  • Legalization of ops with Adreno-specific legalization and fallback legalization
  • FuseOps & FuseTIR
  • Then, over the fused TIR, annotating the texture scopes via hint_on_device
  • RealizeVDevice takes care of injecting to_device as needed
  • Also introduced SpecializeTIRParams, which updates the fused TIR PrimFunc's buffer var map with the new scope information

The changes in FuseOps and FuseTIR forward op attribute and op pattern info. This info is used for texture-specific scoping decisions.
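The pass ordering described above can be sketched as a toy pipeline. This is a hedged illustration only: the real PR composes actual TVM passes (ConvertLayout, LegalizeOps, FuseOps, FuseTIR, the new AnnotateCustomMemoryScope, RealizeVDevice, SpecializeTIRParams); the `sequential`/`make_pass` helpers below are made-up stand-ins, not TVM's API.

```python
# Toy sketch of the pass ordering in the PR description.
# Stage names mirror the PR; the machinery here is illustrative only.

def sequential(*passes):
    """Compose module-to-module passes, applied left to right."""
    def run(mod):
        for p in passes:
            mod = p(mod)
        return mod
    return run

def make_pass(name, log):
    """Each toy 'pass' just records that it ran, in order."""
    def p(mod):
        log.append(name)
        return mod
    return p

log = []
pipeline = sequential(
    make_pass("ConvertLayout", log),               # 4D -> 5D layout conversion
    make_pass("LegalizeOps", log),                 # Adreno-specific + fallback legalization
    make_pass("FuseOps", log),
    make_pass("FuseTIR", log),
    make_pass("AnnotateCustomMemoryScope", log),   # hint_on_device over fused TIR
    make_pass("RealizeVDevice", log),              # injects to_device as needed
    make_pass("SpecializeTIRParams", log),         # updates buffer var map scopes
)
pipeline(object())  # run the toy pipeline on a dummy "module"
```

The point of the ordering is that scope annotation happens only after fusion, and the TIR-level specialization runs last, once RealizeVDevice has settled device placement.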

@srkreddy1238 force-pushed the annotate_texture_scope branch from fe15d5b to 1549733 on January 21, 2025, 16:31
@srkreddy1238 (Contributor, Author) commented:

@tvm-bot rerun

@tqchen (Member) commented Jan 25, 2025

@Hzfengsy do you mind taking a look, given it touches FuseOps/FuseTIR?

@@ -141,7 +141,7 @@ inline int FindVDeviceIndexByTargetKind(const VDevice& vdevice, const IRDocsifie
   int kind_index = 0;
   for (size_t i = 0; i < vdevices.size(); ++i) {
     auto vdev = Downcast<VDevice>(vdevices[i]);
-    if (vdev.same_as(vdevice)) {
+    if (vdev == vdevice) {
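The distinction this diff hinges on can be illustrated in plain Python (a toy analogy, not TVM's object system): TVM's `same_as` is reference identity, while `==` can be structural, so two separately-constructed but equivalent vdevices compare equal under `==` yet fail `same_as`. `VDeviceToy` below is a made-up stand-in.

```python
# Toy analogy for the same_as-vs-== distinction in the diff above.
# In TVM, Object::same_as is pointer identity; structural equality can
# hold between two distinct-but-equivalent objects.

class VDeviceToy:
    """Illustrative stand-in for a VDevice: (target kind, device id, scope)."""
    def __init__(self, kind, dev_id, scope):
        self.key = (kind, dev_id, scope)

    def __eq__(self, other):
        # Structural equality: compare the description, not the identity.
        return isinstance(other, VDeviceToy) and self.key == other.key

    def __hash__(self):
        return hash(self.key)

a = VDeviceToy("opencl", 0, "global.texture")
b = VDeviceToy("opencl", 0, "global.texture")

assert a == b      # structurally equal
assert a is not b  # distinct objects: the same_as analogue would be False
```

This is why switching the lookup from `same_as` to `==` makes the index search succeed for a vdevice that was rebuilt by a pass rather than shared by pointer.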
@tqchen (Member) commented:
Ideally we would like to deduplicate vdevice globally via ptr equality; is it possible to get the pass to generate the same vdevice instead?

@srkreddy1238 (Contributor, Author) replied:
Should be possible (I think RealizeVDevice pass does similar). Let me explore...

@srkreddy1238 (Contributor, Author) replied:

@tqchen I have made changes to maintain ptr equality for the pass (AnnotateCustomMemoryScope), but we have an issue, as detailed below.

The vdevices GlobalInfo is populated from the pass (AnnotateCustomMemoryScope). Now, when the mod is printed in Python after this pass, the ptr equality fails, i.e., the vdevice id is printed as -1 (vdevice="opencl:-1:global"). However, printing mod just before returning from AnnotateCustomMemoryScope, or while in the next pass (C++), works fine.

Also, if the vdevices are populated from Python (test cases) and accessed in C++, things seem to work fine.

Functionally, I can proceed here, as the rest of the pipeline is in C++. Please advise.
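The deduplication being discussed (generating "the same vdevice" so pointer equality holds) is essentially interning: every structurally-identical description maps to one canonical object. A hedged sketch of the idea, with made-up names rather than TVM's actual API:

```python
# Hedged sketch of interning vdevice descriptions so that equal
# descriptions yield one canonical object, making reference equality
# usable downstream. intern_vdevice and the pool are illustrative only.

_vdevice_pool = {}

def intern_vdevice(kind, dev_id, scope):
    """Return the canonical object for this (kind, dev_id, scope) key."""
    key = (kind, dev_id, scope)
    if key not in _vdevice_pool:
        # Stand-in tuple where a real implementation would build a VDevice.
        _vdevice_pool[key] = ("vdevice", key)
    return _vdevice_pool[key]

x = intern_vdevice("opencl", 0, "global.texture")
y = intern_vdevice("opencl", 0, "global.texture")
assert x is y  # reference equality holds after interning
```

The printing issue described above would be consistent with the interned identity being lost somewhere across the C++/Python FFI boundary, so the printer's pointer-based index lookup falls back to -1; this sketch only shows the interning half of the story.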

@@ -1092,6 +1095,10 @@ def LegalizeOps(
legalization function is not registered. By default we don't print
warnings.

add_attributes : bool
@tqchen (Member) commented:
Is it possible to compose instead? E.g., run an attribute-attach pass afterwards.

@srkreddy1238 (Contributor, Author) replied:

After the legalization pass we don't have any trace of operator-specific attributes.
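The ordering constraint in this reply can be shown with a toy lowering (illustrative names, not TVM's LegalizeOps implementation): once a high-level op is rewritten into low-level form, its op-level attributes are gone, so a separate attach-pass run afterwards has nothing to read; the capture must happen during legalization.

```python
# Toy illustration: op-level attrs vanish at legalization, so attribute
# capture composed as a later pass would find nothing to attach.

high_level_op = {"name": "conv2d", "attrs": {"data_layout": "NCHW4c"}}

def legalize(op, forward_attrs):
    """Lower to a low-level call; optionally forward attrs while they still exist."""
    lowered = {"name": "tir_call_" + op["name"]}
    if forward_attrs:
        lowered["op_attrs"] = dict(op["attrs"])  # captured during legalization
    return lowered

with_attrs = legalize(high_level_op, forward_attrs=True)
without = legalize(high_level_op, forward_attrs=False)

assert "op_attrs" in with_attrs       # attrs survive only if forwarded in-pass
assert "op_attrs" not in without      # composing capture afterwards is too late
```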

include/tvm/relax/transform.h — review thread resolved
@tqchen (Member) commented Jan 25, 2025

also cc @yongwww for memory scope related changes

@Hzfengsy (Member) left a comment:

some initial comments

python/tvm/relax/transform/optimize_batchnorm.py — review thread resolved (outdated)
src/relax/op/tensor/binary.cc — review thread resolved
@srkreddy1238 (Contributor, Author) commented:
@tvm-bot rerun

@Hzfengsy (Member) left a comment:

LGTM

Labels: none yet
Projects: none yet
4 participants