Example: DeepSpeed deferred init with opt-30b #2419
Conversation
…torch/serve into issues/reduce_docker_gpu_size
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #2419      +/-   ##
==========================================
- Coverage   72.01%   71.89%   -0.12%
==========================================
  Files          78       78
  Lines        3648     3654       +6
  Branches       58       58
==========================================
  Hits         2627     2627
- Misses       1017     1023       +6
  Partials        4        4
```
Co-authored-by: Hamid Shojanazeri <hamid.nazeri2010@gmail.com>
…into issues/benchmark_ds
@ankithagunapal can you please move the handler, README, requirement.txt, and sample_Text from the opt folder to the parent folder? Let's keep only the model_config file in the opt folder. These files seem to be general regardless of the model; we can always add other models to the README if needed. Otherwise it would be repetitive.
@agunapal, I noticed there are three 'Generated text' lines in the serve output; I think it's because you used three GPUs. However, is it correct that each GPU generates one output?
Description
This PR shows how to use DeepSpeed with deferred model loading for a large model such as opt-30b.
Fixes #(issue)
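The core idea behind deferred model loading can be sketched as follows. This is a minimal illustration, not the PR's actual handler code: it assumes PyTorch >= 2.0 (for `torch.device` as a context manager) and uses an illustrative toy model in place of opt-30b. Constructing the model on the `"meta"` device creates the module structure without allocating weight memory; in the real TorchServe/DeepSpeed flow, DeepSpeed then materializes the weights from checkpoint shards.

```python
import torch
import torch.nn as nn

# Hedged sketch of deferred initialization: build the model on PyTorch's
# "meta" device so no real storage is allocated for the parameters.
# Layer sizes here are illustrative placeholders, not opt-30b's.
with torch.device("meta"):
    model = nn.Sequential(
        nn.Linear(4096, 4096),
        nn.ReLU(),
        nn.Linear(4096, 4096),
    )

# Parameters exist with shapes and dtypes, but hold no data yet; a later
# step (e.g., DeepSpeed checkpoint loading) fills in the real weights.
print(model[0].weight.device)  # meta
print(model[0].weight.shape)   # torch.Size([4096, 4096])
```

This is why the approach scales to 30B-parameter models: the host never needs enough memory to hold a fully materialized copy of the weights before sharded loading begins.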
Type of change
Please delete options that are not relevant.
Feature/Issue validation/testing
Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.
Client
Server
Checklist: