Pippy deferred init #2310
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #2310      +/-   ##
==========================================
- Coverage   69.82%   69.45%   -0.37%
==========================================
  Files          77       77
  Lines        3420     3438      +18
  Branches       57       57
==========================================
  Hits         2388     2388
- Misses       1029     1047      +18
  Partials        3        3
@@ -46,37 +39,42 @@ pippy:
  input_names: ['input_ids'] # input arg names to the model; this is required for FX tracing
  model_type: "HF" # set the model type to "HF" if you are using a Huggingface model; otherwise leave it blank or set whichever model type you use
  rpc_timeout: 1800
  num_worker_threads: 512 # number of threads for the RPC worker; 512 is usually a good number
Do we need such a large value for PiPPy to work? It would be good to verify whether there is any special requirement to override the default value of 16 for RPC num_worker_threads.
Right, this number was set mostly for training, to avoid deadlocks caused by draining the RPC thread pool. For inference it seems it can be lowered by about 4x. There is no clear guidance on the exact number; I checked with Ke.
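For reference, a minimal sketch of how this setting maps onto PyTorch's RPC backend options (hypothetical: it assumes the handler builds TensorPipe options from the YAML values, and 128 is an assumed inference-oriented value rather than a verified recommendation):

```python
import torch.distributed.rpc as rpc

# ~4x below the training-oriented 512; the PyTorch default is 16.
options = rpc.TensorPipeRpcBackendOptions(
    num_worker_threads=128,
    rpc_timeout=1800,
)
```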
Thanks Hamid. I have left a few review comments based on the testing.
@@ -46,6 +46,6 @@ def hf_model(model_str):
    repo_id=args.model_name,
    revision=args.revision,
    cache_dir=args.model_path,
-   use_auth_token=True,
+   use_auth_token=False,
Restrict the download to PyTorch-only files by passing the allow_patterns param with the values below:
allow_patterns = ["*.json", "*.pt", "*.bin", "*.txt", "*.model"]
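A minimal sketch of the suggestion, assuming the download script calls huggingface_hub.snapshot_download with the arguments shown in the diff above:

```python
from huggingface_hub import snapshot_download

# `args` is assumed to be the script's argparse namespace, as in the diff.
snapshot_download(
    repo_id=args.model_name,
    revision=args.revision,
    cache_dir=args.model_path,
    use_auth_token=False,
    # download only PyTorch-related files, skipping e.g. TF/Flax weights
    allow_patterns=["*.json", "*.pt", "*.bin", "*.txt", "*.model"],
)
```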
handler:
  max_length: 80 # max length of tokens for the tokenizer in the handler
  model_name: "/home/ubuntu/serve/examples/large_models/Huggingface_pippy/model/models--facebook--opt-30b/snapshots/ceea0a90ac0f6fae7c2c34bcb40477438c152546" # the path to the checkpoints downloaded in this example; please change this to your model path
This is a hard-coded model path. Can we update it to use the HF cache dir to make it more generic for users to follow? We can set the HUGGINGFACE_HUB_CACHE and TRANSFORMERS_CACHE environment variables.
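For example, a sketch of the suggested env-var approach (the cache location shown is illustrative; any user-writable directory works):

```python
import os

cache = "/home/ubuntu/.cache/huggingface/hub"  # illustrative path
os.environ["HUGGINGFACE_HUB_CACHE"] = cache
os.environ["TRANSFORMERS_CACHE"] = cache
```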
This is specific to this example: if someone uses the download script here, they will end up with this path. Overall, we expect the user to pass the checkpoint path.
handler:
  model_path: "/home/ubuntu/serve/examples/large_models/Huggingface_pippy/model/models--facebook--opt-30b/snapshots/ceea0a90ac0f6fae7c2c34bcb40477438c152546"
ditto
# Check that the required keys are present in the "pippy" section
assert (
    "chunks" in ctx.model_yaml_config["pippy"]
Can we use a default value of 1 for chunks when it is not specified?
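A minimal sketch of the suggested fallback, assuming dict-style access to the parsed YAML config as in the snippet above:

```python
# Fall back to 1 micro-batch chunk when "chunks" is not specified,
# instead of asserting on its presence.
chunks = ctx.model_yaml_config["pippy"].get("chunks", 1)
```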
Left a few comments.
LGTM
Description
This PR adds deferred initialization to PiPPy for large-model inference.