
Update GPT Fast handler to add option to get GPU ID from context #2872

Merged

5 commits merged into master from update_gpt_fast_handler on Jan 4, 2024

Conversation

@sachanub (Collaborator) commented Dec 26, 2023

Description


Creating PR to update the handler for the GPT Fast example to support obtaining the GPU ID from context.
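As a rough illustration of the change described above: the handler can prefer the distributed rank when the model is launched via torchrun, and otherwise fall back to the gpu_id that TorchServe provides in ctx.system_properties. The helper below is a hypothetical sketch; the name resolve_gpu_id is invented for illustration and is not part of the actual handler.

```python
# Hypothetical sketch of the device-selection logic described in this PR.
# The helper name `resolve_gpu_id` is illustrative, not the handler's API.

def resolve_gpu_id(rank, gpu_id):
    """Prefer the torchrun rank; otherwise fall back to the gpu_id
    passed by TorchServe via ctx.system_properties (often a string)."""
    if rank is not None:
        return rank  # tensor-parallel case: one torchrun rank per GPU
    return int(gpu_id)  # single-process worker: TorchServe assigns the GPU
```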

Testing:

Created two model artifacts, gpt_fast_7b.mar (4 workers without tensor parallelism) and gpt_fast_7b_tp.mar (1 worker with tensor parallelism and parallel level 4), using the modified handler, and performed inference with both models:

  • Case 1: gpt_fast_7b.mar

    • Command to start TorchServe and perform inference with curl:
torchserve --start --ncs --model-store model_store --models gpt_fast_7b.mar
curl "http://localhost:8080/predictions/gpt_fast_7b" -T request.json

Output:

is Paris. It is located in the northern central part of the country and is known for its stunning architecture, art museums, fashion, and historical landmarks. The city is home to many famous landmarks such as the Eiffel Tower

nvidia-smi output:

Tue Jan  2 20:36:01 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10G                    On  | 00000000:00:1B.0 Off |                    0 |
|  0%   22C    P0              55W / 300W |  13110MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A10G                    On  | 00000000:00:1C.0 Off |                    0 |
|  0%   22C    P0              56W / 300W |  13110MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A10G                    On  | 00000000:00:1D.0 Off |                    0 |
|  0%   24C    P0              57W / 300W |  13492MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A10G                    On  | 00000000:00:1E.0 Off |                    0 |
|  0%   22C    P0              54W / 300W |  13186MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   1092205      C   .../envs/ts_aot_compile/bin/python3.10    13102MiB |
|    1   N/A  N/A   1092203      C   .../envs/ts_aot_compile/bin/python3.10    13102MiB |
|    2   N/A  N/A   1092206      C   .../envs/ts_aot_compile/bin/python3.10    13484MiB |
|    3   N/A  N/A   1092204      C   .../envs/ts_aot_compile/bin/python3.10    13178MiB |
+---------------------------------------------------------------------------------------+

Tar file with model artifact MAR file and TorchServe logs: gpt_fast_7b.tar.gz

  • Case 2: gpt_fast_7b_tp.mar

    • Command to start TorchServe and perform inference with curl:
torchserve --start --ncs --model-store model_store --models gpt_fast_7b_tp.mar
curl "http://localhost:8080/predictions/gpt_fast_7b_tp" -T request.json

Output:

is Paris. It is located in the northern central part of the country and is known for its stunning architecture, art museums, fashion, and historical landmarks. The city is home to many famous landmarks such as the Eiffel Tower

nvidia-smi output:

Tue Jan  2 20:42:18 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10G                    On  | 00000000:00:1B.0 Off |                    0 |
|  0%   21C    P0              55W / 300W |   2298MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A10G                    On  | 00000000:00:1C.0 Off |                    0 |
|  0%   21C    P0              56W / 300W |   2298MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A10G                    On  | 00000000:00:1D.0 Off |                    0 |
|  0%   22C    P0              56W / 300W |   2298MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A10G                    On  | 00000000:00:1E.0 Off |                    0 |
|  0%   21C    P0              54W / 300W |   2298MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   1094722      C   .../envs/ts_aot_compile/bin/python3.10     2290MiB |
|    1   N/A  N/A   1094723      C   .../envs/ts_aot_compile/bin/python3.10     2290MiB |
|    2   N/A  N/A   1094724      C   .../envs/ts_aot_compile/bin/python3.10     2290MiB |
|    3   N/A  N/A   1094725      C   .../envs/ts_aot_compile/bin/python3.10     2290MiB |
+---------------------------------------------------------------------------------------+

Tar file with model artifact MAR file and TorchServe logs: gpt_fast_7b_tp.tar.gz
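For reference, the tensor-parallel artifact in Case 2 would typically be packaged with a model config YAML roughly like the one below. These values are assumptions inferred from the description above (1 worker, parallel level 4), not the actual file from this PR:

```yaml
# Hypothetical model-config.yaml for gpt_fast_7b_tp.mar (values assumed)
minWorkers: 1
maxWorkers: 1
parallelType: "tp"
deviceType: "gpu"
torchrun:
  nproc-per-node: 4
```

A torchrun section like this is what the handler can detect in ctx.model_yaml_config to distinguish the tensor-parallel case from the plain multi-worker case.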

@sachanub sachanub requested a review from mreso December 26, 2023 21:03
Comment on lines 53 to 59
if (
    "torchrun" not in ctx.model_yaml_config
    and properties.get("gpu_id") is not None
):
    gpu_id = properties.get("gpu_id")
else:
    gpu_id = self.local_rank
Collaborator:
this part is not necessary.

@sachanub (Author):

Understood. Will update in subsequent commit.

Comment on lines 61 to 63
  self.device = torch.device(self.map_location + ":" + str(gpu_id))

- torch.cuda.set_device(self.local_rank)
+ torch.cuda.set_device(gpu_id)
Collaborator:

this is wrong

@sachanub (Author):

Understood. Will update in a subsequent commit.

@@ -44,15 +44,23 @@ def __init__(self):

    def initialize(self, ctx):
        self.context = ctx
        properties = self.context.system_properties
        rank = maybe_init_dist()

        self.local_rank = rank if rank is not None else 0
@lxning (Collaborator) commented Jan 2, 2024:

If rank is None, the local rank should be equal to the device_id on the GPU host.

@sachanub (Author):

Thanks for the suggestion, Li. Are you suggesting that we implement the handler in this way?

def initialize(self, ctx):
        self.context = ctx
        properties = self.context.system_properties
        gpu_id = properties.get("gpu_id")
        rank = maybe_init_dist()

        self.local_rank = rank if rank is not None else int(gpu_id)

        if torch.cuda.is_available():
            self.map_location = "cuda"
            self.device = torch.device(self.map_location + ":" + str(self.local_rank))

            torch.cuda.set_device(self.local_rank)

Collaborator:

Please raise an exception when gpu_id < 0.

@sachanub (Author):

Understood. I will include the exception.
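Putting the thread together, the agreed device-selection logic with the requested gpu_id < 0 guard might look roughly as follows. This is a sketch, not the merged code: the logic is factored into a standalone helper so it can be shown without torch, and the helper name and error message are invented for illustration.

```python
# Sketch of the final logic: the torchrun rank wins; otherwise use the
# TorchServe-assigned gpu_id, rejecting negative values as requested in
# review. `resolve_local_rank` is an illustrative helper name.

def resolve_local_rank(rank, gpu_id):
    if rank is not None:
        return rank  # launched via torchrun: one rank per GPU
    gpu_id = int(gpu_id)
    if gpu_id < 0:
        # Guard requested in review: a negative gpu_id means no GPU
        # was assigned to this worker.
        raise ValueError(f"Invalid gpu_id: {gpu_id}")
    return gpu_id
```

In initialize, this value would then feed torch.device(self.map_location + ":" + str(self.local_rank)) and torch.cuda.set_device(self.local_rank), as in the snippet above.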

@sachanub sachanub added this pull request to the merge queue Jan 4, 2024
Merged via the queue into master with commit 39ea211 Jan 4, 2024
13 checks passed
@sachanub sachanub deleted the update_gpt_fast_handler branch January 4, 2024 05:14
@chauhang chauhang added this to the v0.10.0 milestone Feb 27, 2024