Fix nightly tests for qat_lora_finetune_distributed #2085

Merged: 1 commit into pytorch:main on Nov 27, 2024

@andrewor14 andrewor14 commented Nov 27, 2024

Summary: These failures were not caught in CI because the tests require torchao 0.7+. Here is an example of the failure: https://github.com/pytorch/torchtune/actions/runs/12037600747/job/33561415524
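
For readers unfamiliar with how a version requirement can hide a failure from CI, here is a minimal sketch of a version gate. It is an assumption about the general mechanism, not torchtune's actual code: the helper names (`parse_release`, `version_at_least`, `installed_version`, `should_skip`) are hypothetical, and the comparison deliberately ignores pre-release and dev suffixes, so a `0.7.0.dev...` nightly counts as 0.7.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Return the leading numeric release segment of a version string.

    Suffixes such as '.dev20241112' or '+cu121' are dropped (a simplification).
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def version_at_least(installed: str, minimum: str) -> bool:
    """Compare two versions by their numeric release segments only."""
    have, need = parse_release(installed), parse_release(minimum)
    # Pad with zeros so that '0.7' and '0.7.0' compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def should_skip(package: str, minimum: str) -> bool:
    """True when a test that needs `package >= minimum` cannot run here."""
    found = installed_version(package)
    return found is None or not version_at_least(found, minimum)
```

A boolean like `should_skip("torchao", "0.7")` could be passed as the condition to `pytest.mark.skipif`. In an environment with an older torchao, such a test is reported as skipped rather than failed, so a regression only surfaces in jobs that install a new enough version.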

Test Plan:
(on local devgpu)

(pytorch-3.10) andrewor@devgpu022:~/local/torchtune $ pytest -m integration_test tests/recipes/test_qat_lora_finetune_distributed.py
Expected artifacts for test run are:
small-ckpt-tune-03082024.pt
small-ckpt-meta-03082024.pt
small-ckpt-hf-03082024.pt
small-ckpt-tune-llama3-05052024.pt
small-ckpt-hf-reward-07122024.pt
small-ckpt-meta-vision-10172024.pt
small-ckpt-hf-vision-10172024.pt
tokenizer.model
tokenizer_llama3.model
File already exists locally: /tmp/test-artifacts/small-ckpt-tune-03082024.pt
File already exists locally: /tmp/test-artifacts/small-ckpt-meta-03082024.pt
File already exists locally: /tmp/test-artifacts/small-ckpt-hf-03082024.pt
File already exists locally: /tmp/test-artifacts/small-ckpt-tune-llama3-05052024.pt
File already exists locally: /tmp/test-artifacts/small-ckpt-hf-reward-07122024.pt
File already exists locally: /tmp/test-artifacts/small-ckpt-meta-vision-10172024.pt
File already exists locally: /tmp/test-artifacts/small-ckpt-hf-vision-10172024.pt
File already exists locally: /tmp/test-artifacts/tokenizer.model
File already exists locally: /tmp/test-artifacts/tokenizer_llama3.model
=================================================================================================== test session starts ====================================================================================================
platform linux -- Python 3.10.13, pytest-7.4.0, pluggy-1.4.0
rootdir: /data/users/andrewor/torchtune
configfile: pyproject.toml
plugins: hypothesis-6.87.3, anyio-4.2.0, integration-0.2.3, xdist-3.6.1, mock-3.14.0, cov-5.0.0
collected 4 items                                                                                                                                                                                                          

tests/recipes/test_qat_lora_finetune_distributed.py ....                                                                                                                                                             [100%]

=============================================================================================== 4 passed in 75.60s (0:01:15) ===============================================================================================


pytorch-bot bot commented Nov 27, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2085

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b47896d with merge base 437a8ff:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 27, 2024

@ebsmothers ebsmothers left a comment

Thank you for the fix!

@ebsmothers ebsmothers merged commit ecf8d22 into pytorch:main Nov 27, 2024
17 checks passed
4 participants