deepspeed base handler and example #2218
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2218 +/- ##
==========================================
- Coverage 70.40% 69.73% -0.67%
==========================================
Files 75 77 +2
Lines 3392 3420 +28
Branches 57 57
==========================================
- Hits 2388 2385 -3
- Misses 1001 1032 +31
Partials 3 3
... and 1 file with indirect coverage changes
Left a bunch of feedback. The two main things I'm worried about are how you will test these changes in CI, and whether there is a way to control some of the code duplication across this PR and Hamid's.
Why do we need this empty file?
will remove it.
def initialize(self, ctx: Context):
    ds_engine = get_ds_engine(self.model, ctx)
    self.model = ds_engine.module
    self.device = int(os.getenv("LOCAL_RANK", 0))
Just trying to learn: does DeepSpeed automatically figure out the world size?
Yes, it reads the environment variables to get the rank and world size.
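For context, a minimal sketch of the convention involved: DeepSpeed and torch.distributed pick the rank and world size up from the standard launcher environment variables, so nothing has to be passed explicitly (illustrative only, not code from this PR):

```python
# Illustrative only: the launcher (torchrun / deepspeed) exports these
# variables, and DeepSpeed reads them when it initializes the process group.
import os

local_rank = int(os.getenv("LOCAL_RANK", "0"))  # GPU index on this node
world_size = int(os.getenv("WORLD_SIZE", "1"))  # total number of processes
```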
)
if not os.path.exists(ds_config):
    raise ValueError(
        f"{ctx.model_name} has no deepspeed config file {ds_config}"
Does DeepSpeed have some configuration object we can use directly? There are a lot of config files now.
DeepSpeed has tons of configuration options; it is difficult to cover all of them. A central configuration file is the simplest and easiest way to maintain them.
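For readers unfamiliar with the flow, a rough sketch of how a central config file feeds into DeepSpeed; the exact argument names depend on the DeepSpeed version, so treat this as an assumption rather than the handler's literal code:

```python
# Rough sketch, not the PR's exact code: load one JSON config file and hand it
# to deepspeed.init_inference, which wraps the already-loaded HF model.
import json

import deepspeed

with open("ds-config.json") as f:  # hypothetical path taken from the model config
    ds_config = json.load(f)

# `model` is the Hugging Face model loaded earlier in the handler
ds_engine = deepspeed.init_inference(model, config=ds_config)
model = ds_engine.module  # unwrap the module for inference calls
```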
    raise ValueError(
        f"{ctx.model_name} has no deepspeed checkpoint file {checkpoint}"
    )
logging.debug("Creating DeepSpeed engine")
nit: should be info
    return ds_engine
else:
    raise ValueError(
        f"{ctx.model_name} has no deepspeed config in model config yaml file"
Can we provide a default one? I'm worried about the overhead: users will need to learn about all the config systems.
The example shows the minimum parameters.
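For illustration, a guess at what such a minimal config might contain; the field names follow DeepSpeed's inference config but can differ across versions, so this is an assumption rather than the example's actual file:

```python
# Hypothetical minimal DeepSpeed inference settings, written as a Python dict
# for readability; field names may differ across DeepSpeed versions.
minimal_ds_config = {
    "dtype": "fp16",                     # run inference in half precision
    "tensor_parallel": {"tp_size": 2},   # shard the model across two GPUs
    "replace_with_kernel_inject": True,  # use DeepSpeed's fused inference kernels
}
```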
test/pytest/test_handler.py
)


def test_huggingface_opt_distributed_inference_deepspeed():
Made similar feedback to @HamidShojanazeri, but the way I'd like to see this tested is:
- Have a separate file for distributed tests
- Make sure tests are skipped unless a multi-GPU CUDA machine is found (see the sketch after this list)
- Create a new requirements.txt with optional dependencies, including all the distributed libraries to test
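A minimal sketch of the gating meant in the second bullet, assuming pytest and torch are available on the CI machine; the test body itself is hypothetical:

```python
# Hypothetical sketch: skip distributed tests unless a multi-GPU CUDA machine is found.
import pytest
import torch


@pytest.mark.skipif(
    not torch.cuda.is_available() or torch.cuda.device_count() < 2,
    reason="requires a machine with at least two CUDA GPUs",
)
def test_huggingface_opt_distributed_inference_deepspeed():
    ...  # launch TorchServe with the DeepSpeed example and assert on the response
```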
    self.tokenizer = AutoTokenizer.from_pretrained(model_dir, return_tensors="pt")
    super().initialize(ctx)

def preprocess(self, requests):
There's some code duplication between this PR and @HamidShojanazeri's PR, so if you need to fix a bug you'd now need to fix it in as many large-model backends as you've integrated. I would prefer we refactor this into utils.
The two examples apply different model-parallel libraries to the same model: @HamidShojanazeri's PR demos PiPPy, and this one demos DeepSpeed. That's why there are common parts that create the model and tokenizer.
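For illustration, one shape such a shared util could take (hypothetical names, not code from either PR):

```python
# Hypothetical shared helper both the PiPPy and DeepSpeed example handlers
# could import, so model/tokenizer loading is fixed in one place.
from transformers import AutoModelForCausalLM, AutoTokenizer


def load_model_and_tokenizer(model_path_or_name: str, use_cache: bool = False):
    """Load a Hugging Face causal LM and its tokenizer from a hub name or local path."""
    model = AutoModelForCausalLM.from_pretrained(model_path_or_name, use_cache=use_cache)
    tokenizer = AutoTokenizer.from_pretrained(model_path_or_name)
    return model, tokenizer
```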
def hf_model(model_str):
    api = HfApi()
    models = [m.modelId for m in api.list_models()]
nit: does this take a long time to run?
no, this should be fast
def initialize(self, ctx: Context):
    self.max_length = int(ctx.model_yaml_config["handler"]["max_length"])
    self.model = AutoModelForCausalLM.from_pretrained(
        "bigscience/bloom-7b1", use_cache=False
This is fine for an example, but I'm still curious how you expect people to use DeepSpeed:
- Derive the DeepSpeed handler?
- Copy-paste this file and make appropriate changes?
I'm not sure I'm a fan of having a tutorial per large-model vendor and then lots of code duplication between the custom handlers for each one.
Per our offline sync-up, we want to let users clearly know the steps instead of hiding the details in the DeepSpeed base handler.
Will add instructions in the readme.
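For example, the "derive the handler" route could look roughly like this, assuming the base class added in this PR is importable; the class name and config keys here are hypothetical, not this PR's exact API:

```python
# Hypothetical custom handler deriving a DeepSpeed base handler; the base class
# name/import path and config keys are illustrative, not this PR's exact API.
from transformers import AutoModelForCausalLM, AutoTokenizer


class MyDeepSpeedHandler(DeepSpeedBaseHandler):  # hypothetical base class from this PR
    def initialize(self, ctx):
        model_dir = ctx.system_properties.get("model_dir")
        self.max_length = int(ctx.model_yaml_config["handler"]["max_length"])
        self.model = AutoModelForCausalLM.from_pretrained(model_dir, use_cache=False)
        self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
        super().initialize(ctx)  # wraps self.model in the DeepSpeed inference engine
```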
@@ -0,0 +1,29 @@
# Loading large Huggingface models with constrained resources using accelerate

This document briefs on serving large HG models with limited resource using deepspeed.
by limited resource do you mean with CPU offloading? If so I'd mention that explicitly
Limited by a single GPU device's memory. Will update the doc.
@lxning what is the main difference between OPT and Bloom example here?
super().initialize(ctx)
self.max_length = int(ctx.model_yaml_config["handler"]["max_length"])
self.model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-7b1", use_cache=False
@lxning could read the model name from config
The main diff b/w the Bloom example and the OPT example is to demo that the model can be downloaded either via the HF model name or via the download tool.
paste the token generated from huggingface hub.

```bash
python Download_models.py --model_path model --model_name facebook/opt-350m --revision main
```
can we change this to a bigger model
Description
Fixes #(issue)
Type of change
Feature/Issue validation/testing
reg.txt
Logs for Test A
Logs for Test B
Checklist: