Update GPT Fast handler to add option to get GPU ID from context #2872
Conversation
if (
    "torchrun" not in ctx.model_yaml_config
    and properties.get("gpu_id") is not None
):
    gpu_id = properties.get("gpu_id")
else:
    gpu_id = self.local_rank
Reviewer: this part is not necessary.
Author: Understood. Will update in a subsequent commit.
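The selection logic in the diff above can be sketched as a small standalone helper. This is only a sketch: `pick_gpu_id` is a hypothetical name, `system_properties` stands in for `ctx.system_properties`, and `rank` for the result of `maybe_init_dist()`; the fallback to 0 mirrors the handler's default.

```python
# Sketch (not the actual handler code) of choosing a GPU index.
def pick_gpu_id(system_properties, rank):
    # In a torchrun / tensor-parallel launch, the distributed rank wins.
    if rank is not None:
        return rank
    # Otherwise fall back to the gpu_id TorchServe places in system properties.
    gpu_id = system_properties.get("gpu_id")
    return int(gpu_id) if gpu_id is not None else 0

print(pick_gpu_id({"gpu_id": "2"}, None))  # -> 2
print(pick_gpu_id({"gpu_id": "2"}, 1))     # -> 1 (rank takes precedence)
```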
self.device = torch.device(self.map_location + ":" + str(gpu_id))

- torch.cuda.set_device(self.local_rank)
+ torch.cuda.set_device(gpu_id)
Reviewer: this is wrong.
Author: Understood. Will update in a subsequent commit.
@@ -44,15 +44,23 @@ def __init__(self):

    def initialize(self, ctx):
        self.context = ctx
        properties = self.context.system_properties
        rank = maybe_init_dist()

        self.local_rank = rank if rank is not None else 0
Reviewer: If rank is None, local_rank should be equal to the device_id on a GPU host.
Author: Thanks for the suggestion, Li. Are you suggesting that we implement the handler in this way?
def initialize(self, ctx):
    self.context = ctx
    properties = self.context.system_properties
    gpu_id = properties.get("gpu_id")
    rank = maybe_init_dist()
    self.local_rank = rank if rank is not None else int(gpu_id)
    if torch.cuda.is_available():
        self.map_location = "cuda"
        self.device = torch.device(self.map_location + ":" + str(self.local_rank))
        torch.cuda.set_device(self.local_rank)
Reviewer: Please raise an exception when gpu_id < 0.
Author: Understood. I will include the exception.
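The requested guard might look like the following sketch. `validated_gpu_id` is a hypothetical helper name and the exception type and message are assumptions; the only behavior taken from the review is failing fast when gpu_id < 0.

```python
# Hypothetical validation helper for the gpu_id read from system properties.
def validated_gpu_id(raw_gpu_id):
    gpu_id = int(raw_gpu_id)
    if gpu_id < 0:
        # Requested behavior: reject a negative device index outright.
        raise ValueError(
            f"Invalid gpu_id {gpu_id}: expected a non-negative device index"
        )
    return gpu_id
```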
Description
This PR updates the handler for the GPT Fast example to support obtaining the GPU ID from the context.
Testing:
Created two model artifacts with the modified handler: gpt_fast_7b.mar (4 workers without tensor parallelism) and gpt_fast_7b_tp.mar (1 worker with tensor parallelism and parallel level 4). Performed inference with both models.

Case 1: gpt_fast_7b.mar
Output: (omitted)
nvidia-smi output: (omitted)
Tar file with model artifact MAR file and TorchServe logs: gpt_fast_7b.tar.gz

Case 2: gpt_fast_7b_tp.mar
Output: (omitted)
nvidia-smi output: (omitted)
Tar file with model artifact MAR file and TorchServe logs: gpt_fast_7b_tp.tar.gz