
One-line change to correctly dispatch to cpu function for inference #2881

Closed
TroyGarden wants to merge 1 commit

Conversation

TroyGarden (Contributor)

Summary:

# context
* The fundamental issue is that the "dispatch_to_cpu" function calls the "autograd" function.
* A side issue is the poor naming: `permute_pooled_embs_auto_grad` sounds like an "autograd version", but for an operator the actual backend (CPU, GPU, META, AUTOGRAD, etc.) is determined by the dispatcher.
* This defect has existed since day one, when the operator was first developed (D31271923).
* The impact: in the inference flow, where autograd is not needed, the dispatcher does call the CPU version of this operator, but that function in turn calls the autograd function, which is not desired. A schematic sketch of the defect follows this list.
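To make the defect concrete, here is a minimal C++ schematic. The names below (including `permute_pooled_embs_cpu_impl`) are placeholders for illustration, not the actual FBGEMM diff:

```cpp
#include <ATen/ATen.h>

// Placeholder declarations for illustration only.
at::Tensor permute_pooled_embs_cpu_impl(const at::Tensor& pooled_embs);
at::Tensor permute_pooled_embs_auto_grad(const at::Tensor& pooled_embs);

// The function the dispatcher invokes for the CPU backend.
at::Tensor permute_pooled_embs_cpu(const at::Tensor& pooled_embs) {
  // Before the fix (schematically), the CPU entry point re-entered autograd:
  //   return permute_pooled_embs_auto_grad(pooled_embs);
  // After the one-line change: call the CPU kernel directly; a backend
  // function should never route back through the autograd wrapper.
  return permute_pooled_embs_cpu_impl(pooled_embs);
}
```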

# how to do it properly
* For a user-facing operator, first give it a good name, such as `permute_multi_embedding`.
* Then implement functions for the necessary backends, commonly four: CPU, CUDA, META, and AUTOGRAD.
* It is best to name each function `{operator_name}_{backend}`, and it is **very important to link them correctly**; D48720379 is a good example. A registration sketch follows at the end of this summary.
* In training, the dispatcher will **ALWAYS** pick up the autograd version, which then [calls the cuda/cpu/meta version from the autograd function](https://fburl.com/code/2d1acuop) based on device or other context.
* In inference, the dispatcher ignores autograd and calls the cpu/cuda/meta version directly.

Reviewed By: dstaay-fb

Differential Revision: D48574563
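As a hedged illustration of the linking described above, here is a minimal sketch using PyTorch's C++ registration macros (`TORCH_LIBRARY_FRAGMENT` / `TORCH_LIBRARY_IMPL`). The operator schema and the per-backend function names are assumptions following the `{operator_name}_{backend}` convention, not the actual FBGEMM source:

```cpp
#include <ATen/ATen.h>
#include <torch/library.h>

// Assumed per-backend implementations (illustrative declarations).
at::Tensor permute_multi_embedding_cpu(const at::Tensor& input);
at::Tensor permute_multi_embedding_gpu(const at::Tensor& input);
at::Tensor permute_multi_embedding_meta(const at::Tensor& input);
at::Tensor permute_multi_embedding_autograd(const at::Tensor& input);

// Declare the operator schema once (signature simplified for this sketch).
TORCH_LIBRARY_FRAGMENT(fbgemm, m) {
  m.def("permute_multi_embedding(Tensor input) -> Tensor");
}

// Link each backend to its own function; no backend function routes
// through the autograd wrapper itself.
TORCH_LIBRARY_IMPL(fbgemm, CPU, m) {
  m.impl("permute_multi_embedding", permute_multi_embedding_cpu);
}
TORCH_LIBRARY_IMPL(fbgemm, CUDA, m) {
  m.impl("permute_multi_embedding", permute_multi_embedding_gpu);
}
TORCH_LIBRARY_IMPL(fbgemm, Meta, m) {
  m.impl("permute_multi_embedding", permute_multi_embedding_meta);
}
// In training the dispatcher always selects this key first; in inference
// it is skipped and the CPU/CUDA/Meta kernel above is called directly.
TORCH_LIBRARY_IMPL(fbgemm, Autograd, m) {
  m.impl("permute_multi_embedding", permute_multi_embedding_autograd);
}
```

With this layout, the training/inference behavior in the list above falls out of the dispatcher for free: nothing in the CPU, CUDA, or Meta kernels needs to know whether autograd is active.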

netlify bot commented Jul 23, 2024

Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 44ad142 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/66a0867d6dc97400081adbd7 |
| 😎 Deploy Preview | https://deploy-preview-2881--pytorch-fbgemm-docs.netlify.app |

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D48574563

TroyGarden added a commit to TroyGarden/FBGEMM that referenced this pull request Jul 24, 2024

One-line change to correctly dispatch to cpu function for inference (pytorch#2881)

Summary:
Pull Request resolved: pytorch#2881

Reviewed By: dstaay-fb, sryap

Differential Revision: D48574563
@facebook-github-bot (Contributor)

This pull request has been merged in 500c5fa.
