
Update GPT Fast handler to add option to get GPU ID from context #2872

Merged

5 commits merged into master from update_gpt_fast_handler on Jan 4, 2024

Conversation

@sachanub (Collaborator) commented Dec 26, 2023

Description


Creating PR to update the handler for the GPT Fast example to support obtaining the GPU ID from context.
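As a rough illustration of the change described above: the handler can prefer the distributed rank when the model is launched via torchrun, and otherwise fall back to the gpu_id that TorchServe provides in ctx.system_properties. The helper below is a hypothetical sketch; the name resolve_gpu_id is invented for illustration and is not part of the actual handler.

```python
# Hypothetical sketch of the device-selection logic described in this PR.
# The helper name `resolve_gpu_id` is illustrative, not the handler's API.

def resolve_gpu_id(rank, gpu_id):
    """Prefer the torchrun rank; otherwise fall back to the gpu_id
    passed by TorchServe via ctx.system_properties (often a string)."""
    if rank is not None:
        return rank  # tensor-parallel case: one torchrun rank per GPU
    return int(gpu_id)  # single-process worker: TorchServe assigns the GPU
```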

Testing:

Created two model artifacts, gpt_fast_7b.mar (4 workers without tensor parallelism) and gpt_fast_7b_tp.mar (1 worker with tensor parallelism and parallel level 4), using the modified handler, and performed inference with both models:

  • Case 1: gpt_fast_7b.mar

    • Command to start TorchServe and perform inference with curl:
torchserve --start --ncs --model-store model_store --models gpt_fast_7b.mar
curl "http://localhost:8080/predictions/gpt_fast_7b" -T request.json

Output:

is Paris. It is located in the northern central part of the country and is known for its stunning architecture, art museums, fashion, and historical landmarks. The city is home to many famous landmarks such as the Eiffel Tower

nvidia-smi output:

Tue Jan  2 20:36:01 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10G                    On  | 00000000:00:1B.0 Off |                    0 |
|  0%   22C    P0              55W / 300W |  13110MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A10G                    On  | 00000000:00:1C.0 Off |                    0 |
|  0%   22C    P0              56W / 300W |  13110MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A10G                    On  | 00000000:00:1D.0 Off |                    0 |
|  0%   24C    P0              57W / 300W |  13492MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A10G                    On  | 00000000:00:1E.0 Off |                    0 |
|  0%   22C    P0              54W / 300W |  13186MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   1092205      C   .../envs/ts_aot_compile/bin/python3.10    13102MiB |
|    1   N/A  N/A   1092203      C   .../envs/ts_aot_compile/bin/python3.10    13102MiB |
|    2   N/A  N/A   1092206      C   .../envs/ts_aot_compile/bin/python3.10    13484MiB |
|    3   N/A  N/A   1092204      C   .../envs/ts_aot_compile/bin/python3.10    13178MiB |
+---------------------------------------------------------------------------------------+

Tar file with model artifact MAR file and TorchServe logs: gpt_fast_7b.tar.gz

  • Case 2: gpt_fast_7b_tp.mar

    • Command to start TorchServe and perform inference with curl:
torchserve --start --ncs --model-store model_store --models gpt_fast_7b_tp.mar
curl "http://localhost:8080/predictions/gpt_fast_7b_tp" -T request.json

Output:

is Paris. It is located in the northern central part of the country and is known for its stunning architecture, art museums, fashion, and historical landmarks. The city is home to many famous landmarks such as the Eiffel Tower

nvidia-smi output:

Tue Jan  2 20:42:18 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10G                    On  | 00000000:00:1B.0 Off |                    0 |
|  0%   21C    P0              55W / 300W |   2298MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A10G                    On  | 00000000:00:1C.0 Off |                    0 |
|  0%   21C    P0              56W / 300W |   2298MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A10G                    On  | 00000000:00:1D.0 Off |                    0 |
|  0%   22C    P0              56W / 300W |   2298MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A10G                    On  | 00000000:00:1E.0 Off |                    0 |
|  0%   21C    P0              54W / 300W |   2298MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   1094722      C   .../envs/ts_aot_compile/bin/python3.10     2290MiB |
|    1   N/A  N/A   1094723      C   .../envs/ts_aot_compile/bin/python3.10     2290MiB |
|    2   N/A  N/A   1094724      C   .../envs/ts_aot_compile/bin/python3.10     2290MiB |
|    3   N/A  N/A   1094725      C   .../envs/ts_aot_compile/bin/python3.10     2290MiB |
+---------------------------------------------------------------------------------------+

Tar file with model artifact MAR file and TorchServe logs: gpt_fast_7b_tp.tar.gz
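For reference, the tensor-parallel artifact in Case 2 would typically be packaged with a model config YAML roughly like the one below. These values are assumptions inferred from the description above (1 worker, parallel level 4), not the actual file from this PR:

```yaml
# Hypothetical model-config.yaml for gpt_fast_7b_tp.mar (values assumed)
minWorkers: 1
maxWorkers: 1
parallelType: "tp"
deviceType: "gpu"
torchrun:
  nproc-per-node: 4
```

A torchrun section like this is what the handler can detect in ctx.model_yaml_config to distinguish the tensor-parallel case from the plain multi-worker case.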

@sachanub sachanub requested a review from mreso December 26, 2023 21:03
Comment on lines 53 to 59
if (
    "torchrun" not in ctx.model_yaml_config
    and properties.get("gpu_id") is not None
):
    gpu_id = properties.get("gpu_id")
else:
    gpu_id = self.local_rank
Collaborator:
this part is not necessary.

@sachanub (Author):

Understood. Will update in subsequent commit.

Comment on lines 61 to 63
  self.device = torch.device(self.map_location + ":" + str(gpu_id))

- torch.cuda.set_device(self.local_rank)
+ torch.cuda.set_device(gpu_id)
Collaborator:

this is wrong

@sachanub (Author):

Understood. Will update in a subsequent commit.

@@ -44,15 +44,23 @@ def __init__(self):

    def initialize(self, ctx):
        self.context = ctx
        properties = self.context.system_properties
        rank = maybe_init_dist()

        self.local_rank = rank if rank is not None else 0
@lxning (Collaborator) commented Jan 2, 2024:

If rank is None, the local rank should be equal to the device_id on the GPU host.

@sachanub (Author):

Thanks for the suggestion, Li. Are you suggesting that we implement the handler in this way?

def initialize(self, ctx):
        self.context = ctx
        properties = self.context.system_properties
        gpu_id = properties.get("gpu_id")
        rank = maybe_init_dist()

        self.local_rank = rank if rank is not None else int(gpu_id)

        if torch.cuda.is_available():
            self.map_location = "cuda"
            self.device = torch.device(self.map_location + ":" + str(self.local_rank))

            torch.cuda.set_device(self.local_rank)

Collaborator:

Please raise an exception when gpu_id < 0.

@sachanub (Author):

Understood. I will include the exception.
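Putting the thread together, the agreed device-selection logic with the requested gpu_id < 0 guard might look roughly as follows. This is a sketch, not the merged code: the logic is factored into a standalone helper so it can be shown without torch, and the helper name and error message are invented for illustration.

```python
# Sketch of the final logic: the torchrun rank wins; otherwise use the
# TorchServe-assigned gpu_id, rejecting negative values as requested in
# review. `resolve_local_rank` is an illustrative helper name.

def resolve_local_rank(rank, gpu_id):
    if rank is not None:
        return rank  # launched via torchrun: one rank per GPU
    gpu_id = int(gpu_id)
    if gpu_id < 0:
        # Guard requested in review: a negative gpu_id means no GPU
        # was assigned to this worker.
        raise ValueError(f"Invalid gpu_id: {gpu_id}")
    return gpu_id
```

In initialize, this value would then feed torch.device(self.map_location + ":" + str(self.local_rank)) and torch.cuda.set_device(self.local_rank), as in the snippet above.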

@sachanub sachanub added this pull request to the merge queue Jan 4, 2024
Merged via the queue into master with commit 39ea211 Jan 4, 2024
13 checks passed
@sachanub sachanub deleted the update_gpt_fast_handler branch January 4, 2024 05:14
@chauhang chauhang added this to the v0.10.0 milestone Feb 27, 2024