[RFC] Configuring low precision training in torchtune #504

Closed · wants to merge 1 commit

Conversation
Conversation

rohan-varma (Member)

Context

  • See RFC.

Changelog

  • ...

Test plan

  • ....

pytorch-bot bot commented Mar 14, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/504

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 5a037e9 with merge base 9c75d48:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Mar 14, 2024

netlify bot commented Mar 14, 2024

Deploy Preview for torchtune-preview ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 5a037e9 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/torchtune-preview/deploys/65f38f6390ed570008ee180f |
| 😎 Deploy Preview | https://deploy-preview-504--torchtune-preview.netlify.app |


#### TL;DR
- Single-device recipes (both full finetune and LoRA) and multi-device recipes will support a flag, `dtype`, that can be one of [bf16, fp32] to configure low precision training. This will initialize the model in the lower precision, so all parameters, activations, gradients, and optimizer states will be stored in that precision to maximize memory savings. We will not enable torch.autocast. (A minimal sketch of this approach is included after this list.)
- We will actively de-invest in fp16 training, since most recent HW supports bf16. Among the consumer HW architectures we'd like to support, the 4090, 3090, and A6000 support bf16; only the T4 does not. This means in particular that we won't have a memory-efficient (< 16GB) finetuning solution that runs reliably on T4s (but we will for the other consumer GPUs mentioned).
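
To make the intended behavior concrete, here is a minimal sketch of how a `dtype` flag could map to full low-precision initialization with plain PyTorch. This is not torchtune's actual recipe code: the `init_model` helper, the `nn.Linear` stand-in model, and the optimizer hyperparameters are all hypothetical, chosen only to illustrate the idea.

```python
import torch
import torch.nn as nn

# Supported values for the proposed `dtype` flag; fp16 is intentionally not listed.
SUPPORTED_DTYPES = {"fp32": torch.float32, "bf16": torch.bfloat16}

def init_model(dtype_str: str) -> nn.Module:
    """Hypothetical helper: build the model directly in the requested precision."""
    if dtype_str not in SUPPORTED_DTYPES:
        raise ValueError(f"dtype must be one of {sorted(SUPPORTED_DTYPES)}, got {dtype_str!r}")
    dtype = SUPPORTED_DTYPES[dtype_str]
    # Fail fast on GPUs without bf16 support (e.g. T4) instead of silently casting.
    if dtype is torch.bfloat16 and torch.cuda.is_available():
        assert torch.cuda.is_bf16_supported(), "bf16 requested but not supported on this GPU"
    # Stand-in model: constructing it in `dtype` means parameters (and therefore
    # gradients and optimizer states) live in that precision; torch.autocast is not used.
    return nn.Linear(4096, 4096, dtype=dtype)

model = init_model("bf16")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
```

Because the optimizer is constructed over low-precision parameters, its state tensors (e.g. AdamW's moments) are allocated in that dtype as well. That is the key difference from autocast-style mixed precision, which keeps fp32 master weights and fp32 optimizer state.
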
rohan-varma (Member Author)


de-invest in fp16 for MVP specifically

@rohan-varma changed the title from "Create low_prec.md" to "[RFC] Configuring low precision training in torchtune" Mar 15, 2024
@joecummings added the rfc (Request for comments) label Mar 18, 2024
@rohan-varma closed this May 6, 2024
@rohan-varma deleted the rohan-varma-patch-5 branch May 6, 2024
Labels: CLA Signed · rfc (Request for comments)
3 participants