Initial support for Pipeline Parallelism #279
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Just a few nits. LGTM, thanks!
🔥 Awesome work for enabling PP, and improving training tests!! Just left some small nits.
@@ -177,6 +195,15 @@ def main():
    else:
        model_args, data_args, training_args = parser.parse_args_into_dataclasses()

    if model_args.use_auth_token is not None:
        warnings.warn(
            "The `use_auth_token` argument is deprecated and will be removed in v4.34. Please use `token` instead.",
Which package? transformers?
Yes. But I'm not in favor of adding that. These files are updated automatically by cloning the examples from Transformers. This is a bad side effect but I think that's ok.
I addressed the comments. @dacorvo @JingyaHuang wdyt?
LGTM, thanks!
LGTM! Let's get it merged!
This PR adds support for Pipeline Parallelism (PP) on a single node for the Llama architecture.
Multi-node training and other relevant architectures will be added in follow-up PRs.
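As a rough illustration of what single-node pipeline parallelism means for a Llama-style decoder stack, here is a minimal, framework-agnostic sketch: the layers are partitioned into contiguous stages, one stage per pipeline rank, and micro-batches flow from stage to stage. This is only a conceptual sketch of the technique; the function names and partitioning helper are hypothetical and are not the API added by this PR.

```python
# Conceptual sketch of pipeline parallelism: partition a decoder-layer stack
# into contiguous stages, one per pipeline rank. Names are hypothetical and
# do not reflect the actual implementation in this PR.
from typing import List, Sequence


def partition_layers(num_layers: int, pp_size: int) -> List[range]:
    """Split `num_layers` decoder layers into `pp_size` contiguous stages."""
    base, remainder = divmod(num_layers, pp_size)
    stages, start = [], 0
    for rank in range(pp_size):
        size = base + (1 if rank < remainder else 0)
        stages.append(range(start, start + size))
        start += size
    return stages


def run_pipeline(stages: Sequence[range], micro_batches: Sequence[str]) -> None:
    """Naive schedule: each micro-batch traverses every stage in order."""
    for mb in micro_batches:
        for rank, layer_ids in enumerate(stages):
            # On real hardware each stage would live on its own device and pass
            # activations to the next rank; here we only print the data flow.
            print(f"micro-batch {mb}: stage {rank} runs layers {list(layer_ids)}")


if __name__ == "__main__":
    # e.g. a 32-layer Llama-style model split over 4 pipeline stages
    run_pipeline(partition_layers(32, 4), micro_batches=["mb0", "mb1"])
```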
Training
- PP works
Variants
- Only tested via the tests here (it will be tested more in the PR for multi-node training).
Checkpointing
Tests (see the launch sketch after this list)
- tests/distributed/test_model_parallelization.py
- tests/test_examples.py
- tests/distributed/test_common.py
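The test files above are distributed tests. As a purely hypothetical illustration (the repository's actual launcher, markers, and worker counts are assumptions, not documented here), they could be driven with a small helper like this:

```python
# Hypothetical helper for launching a distributed test file; the torchrun
# invocation and the number of workers are assumptions, not this repo's harness.
import subprocess


def run_distributed_test(test_path: str, num_workers: int = 2) -> None:
    """Run a pytest file under torchrun so each rank joins the parallel test."""
    subprocess.run(
        ["torchrun", f"--nproc_per_node={num_workers}", "-m", "pytest", "-x", test_path],
        check=True,
    )


if __name__ == "__main__":
    run_distributed_test("tests/distributed/test_common.py")
```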
Other