Pippy deferred init #2310
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #2310      +/-   ##
==========================================
- Coverage   69.82%   69.45%   -0.37%
==========================================
  Files          77       77
  Lines        3420     3438      +18
  Branches       57       57
==========================================
  Hits         2388     2388
- Misses       1029     1047      +18
  Partials        3        3
@@ -46,37 +39,42 @@ pippy:
  input_names: ['input_ids'] # input arg names to the model; this is required for FX tracing
  model_type: "HF" # set the model type to "HF" if you are using a Huggingface model; otherwise leave it blank or set whichever model type you use
  rpc_timeout: 1800
  num_worker_threads: 512 # number of threads for the RPC worker; 512 is usually a good number
Do we need such a large value for PiPPy to work? It would be good to verify whether there is any special requirement to override the default value of 16 for RPC num_worker_threads.
Right, this number was set mostly for training, to avoid deadlocks caused by draining the RPC thread pool. For inference it seems it can be lowered by about 4x. There is no clear guidance on the exact number; I checked with Ke.
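For reference, a minimal sketch of how this setting maps onto PyTorch's RPC backend options (hypothetical: it assumes the handler builds TensorPipe options from the YAML values, and 128 is an assumed inference-oriented value rather than a verified recommendation):

```python
import torch.distributed.rpc as rpc

# ~4x below the training-oriented 512; the PyTorch default is 16.
options = rpc.TensorPipeRpcBackendOptions(
    num_worker_threads=128,
    rpc_timeout=1800,
)
```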
Thanks Hamid. I have left a few review comments based on the testing.
@@ -46,6 +46,6 @@ def hf_model(model_str):
    repo_id=args.model_name,
    revision=args.revision,
    cache_dir=args.model_path,
-   use_auth_token=True,
+   use_auth_token=False,
Restrict the download to PyTorch-only files by passing the allow_patterns param with the values below:
allow_patterns = ["*.json", "*.pt", "*.bin", "*.txt", "*.model"]
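A minimal sketch of the suggestion, assuming the download script calls huggingface_hub.snapshot_download with the arguments shown in the diff above:

```python
from huggingface_hub import snapshot_download

# `args` is assumed to be the script's argparse namespace, as in the diff.
snapshot_download(
    repo_id=args.model_name,
    revision=args.revision,
    cache_dir=args.model_path,
    use_auth_token=False,
    # download only PyTorch-related files, skipping e.g. TF/Flax weights
    allow_patterns=["*.json", "*.pt", "*.bin", "*.txt", "*.model"],
)
```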
handler:
  max_length: 80 # max length of tokens for the tokenizer in the handler
  model_name: "/home/ubuntu/serve/examples/large_models/Huggingface_pippy/model/models--facebook--opt-30b/snapshots/ceea0a90ac0f6fae7c2c34bcb40477438c152546" # the path to the checkpoints downloaded in this example; please change this to your model path
This is a hard-coded model path. Can we update it to use the HF cache dir to make it more generic for users to follow? We can set the HUGGINGFACE_HUB_CACHE and TRANSFORMERS_CACHE environment variables.
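For example, a sketch of the suggested env-var approach (the cache location shown is illustrative; any user-writable directory works):

```python
import os

cache = "/home/ubuntu/.cache/huggingface/hub"  # illustrative path
os.environ["HUGGINGFACE_HUB_CACHE"] = cache
os.environ["TRANSFORMERS_CACHE"] = cache
```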
This is specific to this example: if someone uses the download script here, they will end up with this path. Overall, we expect the user to pass the checkpoint path.
handler:
  model_path: "/home/ubuntu/serve/examples/large_models/Huggingface_pippy/model/models--facebook--opt-30b/snapshots/ceea0a90ac0f6fae7c2c34bcb40477438c152546"
ditto
# Check that the required keys are present in the "pippy" section
assert (
    "chunks" in ctx.model_yaml_config["pippy"]
Can we use a default value of 1 for chunks when it is not specified?
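A minimal sketch of the suggested fallback, assuming dict-style access to the parsed YAML config as in the snippet above:

```python
# Fall back to 1 micro-batch chunk when "chunks" is not specified,
# instead of asserting on its presence.
chunks = ctx.model_yaml_config["pippy"].get("chunks", 1)
```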
Left a few comments.
LGTM
Description
This PR adds deferred initialization to PiPPy for large-model inference.