[TPU] XLA changes for finetuning #110
Would it be simpler to have a separate file for TPU finetuning (for example, …
cc @Liyang90 for review
TODO: if we decide to keep these changes in …
Some early questions!
When using TPUs, the current code in … For example, let's say we want to run finetune on …
I think making these distinctions is fine, because they still hold in the general multi-node case, not just with XLA.
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
@carmocca I noticed this PR is now closed. Do we no longer need the XLA changes?
Sorry, it was an accident!
Minor comment. The rest looks good.
This PR adds changes specific to finetuning on TPUs.
Used a TPU v4-8 with … for stablelm-base-alpha-3b; the resulting finetuning time was ~4500 s.