
[Recipes] Bunch of refactoring #511

Merged: rohan-varma merged 15 commits into main from prec on Mar 20, 2024
Conversation

rohan-varma (Member) commented on Mar 16, 2024

Context

Changelog

  • Remove full_bf16 and instead add support for dtype, which now maps to "full" low-precision training as described in [RFC] Configuring low precision training in torchtune #504 (see the sketch after this list).
  • Disable support for fp16: all recipes now raise an error if the fp16 dtype is set.
  • Unify memory logging into a single call to log.info(memory_stats_log(...)).
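
A minimal sketch of what this dtype resolution and fp16 rejection could look like (hypothetical helper and names, not torchtune's actual implementation):

```python
import torch

# Hypothetical mapping; the real supported set may differ.
_SUPPORTED_DTYPES = {"fp32": torch.float32, "bf16": torch.bfloat16}

def resolve_dtype(dtype_str: str) -> torch.dtype:
    """Map a config string like "bf16" to a torch.dtype, rejecting fp16."""
    if dtype_str == "fp16":
        # fp16 is explicitly disabled: raise rather than silently
        # training in an unsupported precision.
        raise ValueError("fp16 is not supported; use bf16 or fp32 instead.")
    if dtype_str not in _SUPPORTED_DTYPES:
        raise ValueError(f"Unknown dtype: {dtype_str}")
    return _SUPPORTED_DTYPES[dtype_str]
```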

Test plan

  • CI
  • Run all recipes

Follow-ups

  • Documentation for dtype is not yet added or updated. This will be a fast follow-up once all recipes are in a consistent state with regard to dtypes.


pytorch-bot bot commented Mar 16, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/511

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 0b4f481 with merge base 4f73f75:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label (authors need to sign the CLA before a PR can be reviewed) on Mar 16, 2024

netlify bot commented Mar 16, 2024

Deploy Preview for torchtune-preview ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 0b4f481 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/torchtune-preview/deploys/65fa1da94ee8430008d937ba |
| 😎 Deploy Preview | https://deploy-preview-511--torchtune-preview.netlify.app |

```diff
-opt_state_dict=checkpoint_dict[utils.OPT_KEY]
-if self._resume_from_checkpoint
-else None,
+opt_state_dict=(
+    checkpoint_dict[utils.OPT_KEY]
+    if self._resume_from_checkpoint
+    else None
+),
```
rohan-varma (Member, Author) commented:
I don't think our linters that run in CI are consistent with VSCode autoformatters, hence changes like these sneaking in. Shall we look into this? cc @ebsmothers @NicolasHug

A contributor replied:
VSCode autoformatter or pre-commit hooks? If the former I think it's low-pri tbh
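
One way this kind of local/CI consistency is typically enforced is by running the same pinned formatter in both pre-commit and CI; a hypothetical .pre-commit-config.yaml entry (not the actual torchtune setup):

```yaml
# Hypothetical pre-commit entry: pinning the formatter version here keeps
# local runs and CI consistent, unlike editor-integrated autoformatters.
repos:
  - repo: https://github.com/psf/black
    rev: 24.2.0
    hooks:
      - id: black
```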

```diff
 if (
-    cfg.full_bf16
+    self._training_precision == torch.bfloat16
```
A contributor commented:
Note we also have this utility now

rohan-varma (Member, Author) replied:
I don't think I can use that util as-is since it will bail out for CPU devices, but I want this recipe to run on CPU for current CI.
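
For context, a hypothetical sketch of the kind of bf16-support utility being discussed, illustrating why it bails out on CPU (names and behavior are assumptions, not torchtune's actual utility):

```python
import torch

def verify_bf16_support() -> None:
    # Hypothetical check: raises on CPU-only hosts, which is why it
    # can't be reused as-is by a recipe that must also run on CPU in CI.
    if not torch.cuda.is_available():
        raise RuntimeError("bf16 support check requires a CUDA device")
    if not torch.cuda.is_bf16_supported():
        raise RuntimeError("bf16 is not supported on this GPU")
```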

rohan-varma changed the title from "[LoRA single device] Refactor full_bf16 to dtype in config" to "[Recipes] Bunch of refactoring" on Mar 19, 2024
rohan-varma merged commit 8afa221 into main on Mar 20, 2024
21 checks passed
joecummings deleted the prec branch on April 11, 2024 at 15:40