
deepspeed base handler and example #2218

Merged
merged 25 commits into master on Apr 24, 2023

Conversation

@lxning (Collaborator) commented Apr 4, 2023

Description

Please read our CONTRIBUTING.md prior to creating your first pull request.

Please include a summary of the feature or issue being fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes #(issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing

Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

Logs for Test A

  • Test B
    Logs for Test B

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

codecov bot commented Apr 4, 2023

Codecov Report

Merging #2218 (f7afc24) into master (a9e218a) will decrease coverage by 0.67%.
The diff coverage is 0.00%.

❗ Current head f7afc24 differs from pull request most recent head 417e8be. Consider uploading reports for the commit 417e8be to get more accurate results

@@            Coverage Diff             @@
##           master    #2218      +/-   ##
==========================================
- Coverage   70.40%   69.73%   -0.67%     
==========================================
  Files          75       77       +2     
  Lines        3392     3420      +28     
  Branches       57       57              
==========================================
- Hits         2388     2385       -3     
- Misses       1001     1032      +31     
  Partials        3        3              
| Impacted Files | Coverage Δ |
| --- | --- |
| ts/handler_utils/distributed/deepspeed.py | 0.00% <0.00%> (ø) |
| ...orch_handler/distributed/base_deepspeed_handler.py | 0.00% <0.00%> (ø) |

... and 1 file with indirect coverage changes

@lxning lxning self-assigned this Apr 6, 2023
@lxning lxning added the enhancement and example labels Apr 6, 2023
@lxning lxning added this to the v0.8.0 milestone Apr 6, 2023
@lxning lxning changed the title [WIP] deepspeed base handler and example deepspeed base handler and example Apr 6, 2023
@msaroufim (Member) left a comment

Left a bunch of feedback; the two main things I'm worried about are how you will test these changes in CI and whether there is a way to control some of the code duplication across this PR and Hamid's.

Member: Why do we need this empty file?

Collaborator (Author): Will remove it.

```python
def initialize(self, ctx: Context):
    ds_engine = get_ds_engine(self.model, ctx)
    self.model = ds_engine.module
    self.device = int(os.getenv("LOCAL_RANK", 0))
```
Member: Just trying to learn: does DeepSpeed automatically figure out the world size?

Collaborator (Author): Yes, it reads environment variables to get the rank and world size; a minimal sketch is below.
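A minimal sketch of that mechanism, assuming the standard torch.distributed launcher variables LOCAL_RANK and WORLD_SIZE; this is illustrative, not code from the PR:

```python
import os

# Standard torch.distributed launcher variables; DeepSpeed picks these
# up from the environment instead of taking rank/world size as
# explicit handler arguments.
local_rank = int(os.getenv("LOCAL_RANK", "0"))
world_size = int(os.getenv("WORLD_SIZE", "1"))
print(f"local_rank={local_rank}, world_size={world_size}")
```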

```python
    )
    if not os.path.exists(ds_config):
        raise ValueError(
            f"{ctx.model_name} has no deepspeed config file {ds_config}"
```
Member: Does DeepSpeed have some configuration object we can use directly? There are a lot of config files now.

Collaborator (Author): DeepSpeed has tons of configuration options, and it is difficult to cover all of them. A central configuration file is the simplest and easiest way to maintain them.
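For reference, a hedged sketch of what a minimal setup looks like when the few required knobs live in one place. Parameter names follow DeepSpeed's public init_inference API of the time; the model name and values are illustrative, not this PR's defaults:

```python
import os

import deepspeed
import torch
from transformers import AutoModelForCausalLM

# Load an (illustrative) HF model, then wrap it in a DeepSpeed
# inference engine. WORLD_SIZE comes from the launcher's environment.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
ds_engine = deepspeed.init_inference(
    model,
    mp_size=int(os.getenv("WORLD_SIZE", "1")),  # tensor-parallel degree
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
model = ds_engine.module  # the module actually used for generation
```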

```python
        raise ValueError(
            f"{ctx.model_name} has no deepspeed checkpoint file {checkpoint}"
        )
    logging.debug("Creating DeepSpeed engine")
```
Member: Nit: this should be info.

```python
        return ds_engine
    else:
        raise ValueError(
            f"{ctx.model_name} has no deepspeed config in model config yaml file"
```
Member: Can we provide a default one? I'm worried about the overhead of users having to learn all the config systems.

Collaborator (Author): The example shows the minimum parameters.

```python
)


def test_huggingface_opt_distributed_inference_deepspeed():
```
Member: I made similar feedback to @HamidShojanazeri, but the way I'd like to see this tested is:

  1. Have a separate file for distributed tests
  2. Make sure tests are skipped unless a CUDA multi-GPU machine is found (a sketch of this gating follows below)
  3. Create a new requirements.txt with optional dependencies, including all the distributed libraries to test
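A sketch of the gating suggested in point 2, assuming pytest; the test name mirrors the one in the diff, and the body is elided:

```python
import pytest
import torch

# Skip distributed tests unless a multi-GPU CUDA machine is present.
requires_multi_gpu = pytest.mark.skipif(
    not (torch.cuda.is_available() and torch.cuda.device_count() > 1),
    reason="requires at least 2 CUDA GPUs",
)


@requires_multi_gpu
def test_huggingface_opt_distributed_inference_deepspeed():
    ...
```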

```python
        self.tokenizer = AutoTokenizer.from_pretrained(model_dir, return_tensors="pt")
        super().initialize(ctx)

    def preprocess(self, requests):
```
Member: There's some code duplication between this PR and @HamidShojanazeri's PR, so if you need to fix a bug you'd have to fix it in as many large-model backends as you've integrated; I would prefer we refactor this into utils.

Collaborator (Author): The two examples apply different model-parallel libraries to the same model: @HamidShojanazeri's PR demos PiPPy, while this one demos DeepSpeed. That's why there are common parts that create the model and tokenizer.
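A hedged sketch of the refactor being discussed: hoist the shared model/tokenizer creation into a helper that both example handlers import. The function name and its module location are hypothetical:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer


def load_hf_model_and_tokenizer(model_name_or_dir: str):
    """Shared setup for the PiPPy and DeepSpeed example handlers."""
    model = AutoModelForCausalLM.from_pretrained(model_name_or_dir, use_cache=False)
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_dir, return_tensors="pt")
    return model, tokenizer
```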


```python
def hf_model(model_str):
    api = HfApi()
    models = [m.modelId for m in api.list_models()]
```
Member: Nit: does this take a long time to run?

Collaborator (Author): No, this should be fast.

```python
def initialize(self, ctx: Context):
    self.max_length = int(ctx.model_yaml_config["handler"]["max_length"])
    self.model = AutoModelForCausalLM.from_pretrained(
        "bigscience/bloom-7b1", use_cache=False
```
Member: This is fine for an example, but I'm still curious how you expect people to use DeepSpeed:

  1. Derive the DeepSpeed handler?
  2. Copy-paste this file and make appropriate changes?

I'm not sure I'm a fan of having a tutorial per large-model vendor and then lots of code duplication between each one's custom handlers.

Collaborator (Author): Per our offline sync-up, we want to let users clearly see the steps instead of hiding the details in the DeepSpeed base handler.

Will add instructions in the README; a sketch of option 1 follows below.
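For readers following along, a hedged sketch of option 1 (deriving the base handler). The import path matches the file added in this PR (ts/torch_handler/distributed/base_deepspeed_handler.py), but the class name and exact initialize contract are assumed from the diff snippets above, not confirmed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from ts.torch_handler.distributed.base_deepspeed_handler import (
    BaseDeepSpeedHandler,  # assumed class name from the file added here
)


class MyTextGenHandler(BaseDeepSpeedHandler):
    def initialize(self, ctx):
        model_dir = ctx.system_properties.get("model_dir")
        self.model = AutoModelForCausalLM.from_pretrained(model_dir, use_cache=False)
        self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
        # The base initialize wraps self.model in a DeepSpeed engine
        # (see the get_ds_engine snippet earlier in this thread).
        super().initialize(ctx)
```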

@@ -0,0 +1,29 @@
# Loading large Huggingface models with constrained resources using accelerate

This document briefly describes serving large HF models with limited resources using DeepSpeed.
Member: By limited resources do you mean with CPU offloading? If so, I'd mention that explicitly.

@lxning (Collaborator, Author) commented Apr 18, 2023: Limited by a single GPU device's memory; will update the doc.
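For context, "limited by one GPU's memory" typically maps to DeepSpeed's ZeRO stage 3 with parameters offloaded to CPU. A hedged sketch of such a config, using documented ZeRO keys with illustrative values, not the ones shipped in this example:

```python
import json

# Illustrative DeepSpeed config: ZeRO stage 3 partitions parameters,
# and offload_param moves them to CPU memory, so a model larger than
# one GPU's memory can still be served. Values are examples only.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
}

# The handler in this PR expects such settings in a central config file.
with open("ds-config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```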

@msaroufim msaroufim self-requested a review April 24, 2023 16:33
@HamidShojanazeri (Collaborator) left a comment

@lxning, what is the main difference between the OPT and Bloom examples here?

```python
    super().initialize(ctx)
    self.max_length = int(ctx.model_yaml_config["handler"]["max_length"])
    self.model = AutoModelForCausalLM.from_pretrained(
        "bigscience/bloom-7b1", use_cache=False
```
Collaborator: @lxning, you could read the model name from the config.

Collaborator (Author): The main difference between the Bloom example and the OPT example is to demo that the model can be downloaded either via the HF model name or via the download tool.

Paste the token generated from the Hugging Face hub.

```bash
python Download_models.py --model_path model --model_name facebook/opt-350m --revision main
```
Collaborator: Can we change this to a bigger model?

@lxning lxning merged commit 49a7993 into master Apr 24, 2023