[RFC] Configuring low precision training in torchtune #504

Closed · wants to merge 1 commit

Conversation
Conversation

rohan-varma (Member)

Context

  • See RFC.

Changelog

  • ...

Test plan

  • ....

pytorch-bot bot commented Mar 14, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/504

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 5a037e9 with merge base 9c75d48:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Mar 14, 2024

netlify bot commented Mar 14, 2024

Deploy Preview for torchtune-preview ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 5a037e9 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/torchtune-preview/deploys/65f38f6390ed570008ee180f |
| 😎 Deploy Preview | https://deploy-preview-504--torchtune-preview.netlify.app |


#### TL;DR
- Single-device recipes (both full finetune and LoRA) and multi-device recipes will support a flag, `dtype`, that can be one of [bf16, fp32] to configure low precision training. This will initialize the model in the lower precision, so all parameters, activations, gradients, and optimizer states will be stored in that precision to maximize memory savings. We will not enable torch.autocast. (A minimal sketch of this approach is included after this list.)
- We will actively de-invest in fp16 training, since most recent HW supports bf16. Among the consumer HW architectures we'd like to support, the 4090, 3090, and A6000 support bf16; only the T4 does not. This means in particular that we won't have a memory-efficient (< 16GB) finetuning solution that runs reliably on T4s (but we will for the other consumer GPUs mentioned).
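
To make the intended behavior concrete, here is a minimal sketch of how a `dtype` flag could map to full low-precision initialization with plain PyTorch. This is not torchtune's actual recipe code: the `init_model` helper, the `nn.Linear` stand-in model, and the optimizer hyperparameters are all hypothetical, chosen only to illustrate the idea.

```python
import torch
import torch.nn as nn

# Supported values for the proposed `dtype` flag; fp16 is intentionally not listed.
SUPPORTED_DTYPES = {"fp32": torch.float32, "bf16": torch.bfloat16}

def init_model(dtype_str: str) -> nn.Module:
    """Hypothetical helper: build the model directly in the requested precision."""
    if dtype_str not in SUPPORTED_DTYPES:
        raise ValueError(f"dtype must be one of {sorted(SUPPORTED_DTYPES)}, got {dtype_str!r}")
    dtype = SUPPORTED_DTYPES[dtype_str]
    # Fail fast on GPUs without bf16 support (e.g. T4) instead of silently casting.
    if dtype is torch.bfloat16 and torch.cuda.is_available():
        assert torch.cuda.is_bf16_supported(), "bf16 requested but not supported on this GPU"
    # Stand-in model: constructing it in `dtype` means parameters (and therefore
    # gradients and optimizer states) live in that precision; torch.autocast is not used.
    return nn.Linear(4096, 4096, dtype=dtype)

model = init_model("bf16")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
```

Because the optimizer is constructed over low-precision parameters, its state tensors (e.g. AdamW's moments) are allocated in that dtype as well. That is the key difference from autocast-style mixed precision, which keeps fp32 master weights and fp32 optimizer state.
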
rohan-varma (Member Author)


de-invest in fp16 for MVP specifically

@rohan-varma changed the title from "Create low_prec.md" to "[RFC] Configuring low precision training in torchtune" Mar 15, 2024
@joecummings added the rfc (Request for comments) label Mar 18, 2024
@rohan-varma closed this May 6, 2024
@rohan-varma deleted the rohan-varma-patch-5 branch May 6, 2024
Labels: CLA Signed · rfc (Request for comments)
3 participants