
[Recipes] Bunch of refactoring #511

Merged: rohan-varma merged 15 commits into main from prec on Mar 20, 2024
Conversation

rohan-varma (Member) commented on Mar 16, 2024

Context

Changelog

  • Remove full_bf16 and instead add support for dtype, which now maps to "full" low-precision training as described in [RFC] Configuring low precision training in torchtune #504 (see the sketch after this list).
  • Disable support for fp16: all recipes now raise an error if the fp16 dtype is set.
  • Unify memory logging into a single call to log.info(memory_stats_log(...)).
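
A minimal sketch of what this dtype resolution and fp16 rejection could look like (hypothetical helper and names, not torchtune's actual implementation):

```python
import torch

# Hypothetical mapping; the real supported set may differ.
_SUPPORTED_DTYPES = {"fp32": torch.float32, "bf16": torch.bfloat16}

def resolve_dtype(dtype_str: str) -> torch.dtype:
    """Map a config string like "bf16" to a torch.dtype, rejecting fp16."""
    if dtype_str == "fp16":
        # fp16 is explicitly disabled: raise rather than silently
        # training in an unsupported precision.
        raise ValueError("fp16 is not supported; use bf16 or fp32 instead.")
    if dtype_str not in _SUPPORTED_DTYPES:
        raise ValueError(f"Unknown dtype: {dtype_str}")
    return _SUPPORTED_DTYPES[dtype_str]
```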

Test plan

  • CI
  • Run all recipes

Follow-ups

  • Documentation for dtype is not yet added or updated. This will be a fast follow-up once all recipes are in a consistent state with regard to dtypes.


pytorch-bot bot commented Mar 16, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/511

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 0b4f481 with merge base 4f73f75:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label (authors need to sign the CLA before a PR can be reviewed) on Mar 16, 2024

netlify bot commented Mar 16, 2024

Deploy Preview for torchtune-preview ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 0b4f481 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/torchtune-preview/deploys/65fa1da94ee8430008d937ba |
| 😎 Deploy Preview | https://deploy-preview-511--torchtune-preview.netlify.app |

```diff
-opt_state_dict=checkpoint_dict[utils.OPT_KEY]
-if self._resume_from_checkpoint
-else None,
+opt_state_dict=(
+    checkpoint_dict[utils.OPT_KEY]
+    if self._resume_from_checkpoint
+    else None
+),
```
rohan-varma (Member, Author) commented:
I don't think our linters that run in CI are consistent with VSCode autoformatters, hence changes like these sneaking in. Shall we look into this? cc @ebsmothers @NicolasHug

A contributor replied:
VSCode autoformatter or pre-commit hooks? If the former I think it's low-pri tbh
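
One way this kind of local/CI consistency is typically enforced is by running the same pinned formatter in both pre-commit and CI; a hypothetical .pre-commit-config.yaml entry (not the actual torchtune setup):

```yaml
# Hypothetical pre-commit entry: pinning the formatter version here keeps
# local runs and CI consistent, unlike editor-integrated autoformatters.
repos:
  - repo: https://github.com/psf/black
    rev: 24.2.0
    hooks:
      - id: black
```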

```diff
 if (
-    cfg.full_bf16
+    self._training_precision == torch.bfloat16
```
A contributor commented:
Note we also have this utility now

rohan-varma (Member, Author) replied:
I don't think I can use that util as-is since it will bail out for CPU devices, but I want this recipe to run on CPU for current CI.
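
For context, a hypothetical sketch of the kind of bf16-support utility being discussed, illustrating why it bails out on CPU (names and behavior are assumptions, not torchtune's actual utility):

```python
import torch

def verify_bf16_support() -> None:
    # Hypothetical check: raises on CPU-only hosts, which is why it
    # can't be reused as-is by a recipe that must also run on CPU in CI.
    if not torch.cuda.is_available():
        raise RuntimeError("bf16 support check requires a CUDA device")
    if not torch.cuda.is_bf16_supported():
        raise RuntimeError("bf16 is not supported on this GPU")
```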

rohan-varma changed the title from "[LoRA single device] Refactor full_bf16 to dtype in config" to "[Recipes] Bunch of refactoring" on Mar 19, 2024
rohan-varma merged commit 8afa221 into main on Mar 20, 2024
21 checks passed
joecummings deleted the prec branch on April 11, 2024 at 15:40