Segment Anything Fast example #2802

Merged: 12 commits merged into master on Dec 2, 2023

Conversation

agunapal (Collaborator)

Description

This PR shows how to use Segment Anything Fast with TorchServe.

Fixes #(issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing

Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Loading Model
2023-11-20T20:33:49,759 [WARN ] main org.pytorch.serve.util.ConfigManager - Your torchserve instance can access any URL to load models. When deploying to production, make sure to limit the set of allowed_urls in config.properties
2023-11-20T20:33:49,761 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2023-11-20T20:33:49,820 [INFO ] main org.pytorch.serve.metrics.configuration.MetricConfiguration - Successfully loaded metrics configuration from /home/ubuntu/anaconda3/envs/ts_sam_test/lib/python3.9/site-packages/ts/configs/metrics.yaml
2023-11-20T20:33:49,902 [INFO ] main org.pytorch.serve.ModelServer - 
Torchserve version: 0.9.0
TS Home: /home/ubuntu/anaconda3/envs/ts_sam_test/lib/python3.9/site-packages
Current directory: /home/ubuntu/serve/examples/large_models/segment_anything_fast
Temp directory: /tmp
Metrics config path: /home/ubuntu/anaconda3/envs/ts_sam_test/lib/python3.9/site-packages/ts/configs/metrics.yaml
Number of GPUs: 1
Number of CPUs: 8
Max heap size: 7936 M
Python executable: /home/ubuntu/anaconda3/envs/ts_sam_test/bin/python
Config file: N/A
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Model Store: /home/ubuntu/serve/examples/large_models/segment_anything_fast/model_store
Initial Models: sam-fast.tar.gz
Log dir: /home/ubuntu/serve/examples/large_models/segment_anything_fast/logs
Metrics dir: /home/ubuntu/serve/examples/large_models/segment_anything_fast/logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Enable metrics API: true
Metrics mode: log
Disable system metrics: false
Workflow Store: /home/ubuntu/serve/examples/large_models/segment_anything_fast/model_store
Model config: N/A
2023-11-20T20:33:49,907 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2023-11-20T20:33:49,923 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: sam-fast.tar.gz
2023-11-20T20:33:49,948 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model sam-fast
2023-11-20T20:33:49,948 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model sam-fast
2023-11-20T20:33:49,948 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model sam-fast loaded.
2023-11-20T20:33:49,949 [WARN ] main org.pytorch.serve.ModelServer - Invalid model config in mar, minWorkers:0, maxWorkers:0
2023-11-20T20:33:49,949 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: sam-fast, count: 1
2023-11-20T20:33:49,955 [DEBUG] W-9000-sam-fast_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/ubuntu/anaconda3/envs/ts_sam_test/bin/python, /home/ubuntu/anaconda3/envs/ts_sam_test/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /tmp/.ts.sock.9000, --metrics-config, /home/ubuntu/anaconda3/envs/ts_sam_test/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-11-20T20:33:49,955 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2023-11-20T20:33:50,022 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://127.0.0.1:8080
2023-11-20T20:33:50,023 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2023-11-20T20:33:50,024 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://127.0.0.1:8081
2023-11-20T20:33:50,024 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2023-11-20T20:33:50,025 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
Model server started.
2023-11-20T20:33:50,201 [WARN ] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector - worker pid is not available yet.
2023-11-20T20:33:51,007 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:ip-172-31-11-40,timestamp:1700512431
2023-11-20T20:33:51,007 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:156.0380859375|#Level:Host|#hostname:ip-172-31-11-40,timestamp:1700512431
2023-11-20T20:33:51,008 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:134.51285934448242|#Level:Host|#hostname:ip-172-31-11-40,timestamp:1700512431
2023-11-20T20:33:51,008 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:46.3|#Level:Host|#hostname:ip-172-31-11-40,timestamp:1700512431
2023-11-20T20:33:51,008 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,DeviceId:0|#hostname:ip-172-31-11-40,timestamp:1700512431
2023-11-20T20:33:51,009 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0.0|#Level:Host,DeviceId:0|#hostname:ip-172-31-11-40,timestamp:1700512431
2023-11-20T20:33:51,009 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:0|#hostname:ip-172-31-11-40,timestamp:1700512431
2023-11-20T20:33:51,009 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:30346.6015625|#Level:Host|#hostname:ip-172-31-11-40,timestamp:1700512431
2023-11-20T20:33:51,009 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:918.88671875|#Level:Host|#hostname:ip-172-31-11-40,timestamp:1700512431
2023-11-20T20:33:51,010 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:4.4|#Level:Host|#hostname:ip-172-31-11-40,timestamp:1700512431
2023-11-20T20:33:51,466 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=9000, pid=39620
2023-11-20T20:33:51,467 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9000
2023-11-20T20:33:51,475 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - Successfully loaded /home/ubuntu/anaconda3/envs/ts_sam_test/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2023-11-20T20:33:51,475 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - [PID]39620
2023-11-20T20:33:51,475 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - Torch worker started.
2023-11-20T20:33:51,475 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - Python runtime: 3.9.18
2023-11-20T20:33:51,476 [DEBUG] W-9000-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-sam-fast_1.0 State change null -> WORKER_STARTED
2023-11-20T20:33:51,479 [INFO ] W-9000-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9000
2023-11-20T20:33:51,484 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9000.
2023-11-20T20:33:51,486 [INFO ] W-9000-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD to backend at: 1700512431486
2023-11-20T20:33:51,519 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - model_name: sam-fast, batchSize: 1
2023-11-20T20:33:52,849 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - Enabled tensor cores
2023-11-20T20:33:52,850 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2023-11-20T20:33:52,850 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - Torch TensorRT not enabled
2023-11-20T20:34:01,945 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - Model weights /home/ubuntu/serve/examples/large_models/segment_anything_fast/sam_vit_h_4b8939.pth loaded successfully
2023-11-20T20:34:01,949 [DEBUG] W-9000-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - sent a reply, jobdone: true
2023-11-20T20:34:01,950 [INFO ] W-9000-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 10430
2023-11-20T20:34:01,950 [DEBUG] W-9000-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-sam-fast_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2023-11-20T20:34:01,950 [INFO ] W-9000-sam-fast_1.0 TS_METRICS - WorkerLoadTime.Milliseconds:11997.0|#WorkerName:W-9000-sam-fast_1.0,Level:Host|#hostname:ip-172-31-11-40,timestamp:1700512441
2023-11-20T20:34:01,951 [INFO ] W-9000-sam-fast_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:34.0|#Level:Host|#hostname:ip-172-31-11-40,timestamp:1700512441


  • Inference
python inference.py 
2023-11-20T20:34:18,397 [INFO ] epollEventLoopGroup-3-1 TS_METRICS - ts_inference_requests_total.Count:1.0|#model_name:sam-fast,model_version:default|#hostname:ip-172-31-11-40,timestamp:1700512458
2023-11-20T20:34:18,400 [INFO ] W-9000-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd PREDICT to backend at: 1700512458399
2023-11-20T20:34:18,401 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - Backend received inference at: 1700512458
2023-11-20T20:34:23,860 [INFO ] W-9000-sam-fast_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]HandlerTime.Milliseconds:5458.28|#ModelName:sam-fast,Level:Model|#hostname:ip-172-31-11-40,1700512463,640e1c1c-5b33-4ef7-a706-77b7d62a41e5, pattern=[METRICS]
2023-11-20T20:34:23,860 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_METRICS - HandlerTime.ms:5458.28|#ModelName:sam-fast,Level:Model|#hostname:ip-172-31-11-40,requestID:640e1c1c-5b33-4ef7-a706-77b7d62a41e5,timestamp:1700512463
2023-11-20T20:34:23,861 [INFO ] W-9000-sam-fast_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]PredictionTime.Milliseconds:5458.52|#ModelName:sam-fast,Level:Model|#hostname:ip-172-31-11-40,1700512463,640e1c1c-5b33-4ef7-a706-77b7d62a41e5, pattern=[METRICS]
2023-11-20T20:34:23,861 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_METRICS - PredictionTime.ms:5458.52|#ModelName:sam-fast,Level:Model|#hostname:ip-172-31-11-40,requestID:640e1c1c-5b33-4ef7-a706-77b7d62a41e5,timestamp:1700512463
2023-11-20T20:34:23,863 [INFO ] W-9000-sam-fast_1.0 ACCESS_LOG - /127.0.0.1:54122 "POST /predictions/sam-fast HTTP/1.1" 200 5496
2023-11-20T20:34:23,863 [INFO ] W-9000-sam-fast_1.0 TS_METRICS - Requests2XX.Count:1.0|#Level:Host|#hostname:ip-172-31-11-40,timestamp:1700512463
2023-11-20T20:34:23,864 [INFO ] W-9000-sam-fast_1.0 TS_METRICS - ts_inference_latency_microseconds.Microseconds:5462790.637|#model_name:sam-fast,model_version:default|#hostname:ip-172-31-11-40,timestamp:1700512463
2023-11-20T20:34:23,864 [INFO ] W-9000-sam-fast_1.0 TS_METRICS - ts_queue_latency_microseconds.Microseconds:190.123|#model_name:sam-fast,model_version:default|#hostname:ip-172-31-11-40,timestamp:1700512463
2023-11-20T20:34:23,864 [DEBUG] W-9000-sam-fast_1.0 org.pytorch.serve.job.Job - Waiting time ns: 190123, Backend time ns: 5464668223
2023-11-20T20:34:23,864 [INFO ] W-9000-sam-fast_1.0 TS_METRICS - QueueTime.Milliseconds:0.0|#Level:Host|#hostname:ip-172-31-11-40,timestamp:1700512463
2023-11-20T20:34:23,865 [DEBUG] W-9000-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - sent a reply, jobdone: true
2023-11-20T20:34:23,865 [INFO ] W-9000-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 5461
2023-11-20T20:34:23,865 [INFO ] W-9000-sam-fast_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:5.0|#Level:Host|#hostname:ip-172-31-11-40,timestamp:1700512463

Output: kitten_mask_fast (generated segmentation mask image)
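
For reference, a minimal client along the lines of inference.py could look like the sketch below. Only the endpoint (POST /predictions/sam-fast on port 8080) is taken from the access log above; the input filename and output handling are assumptions.

import requests

# Hypothetical minimal client: POST the input image to the prediction endpoint.
with open("kitten.jpg", "rb") as f:
    response = requests.post("http://127.0.0.1:8080/predictions/sam-fast", data=f)

response.raise_for_status()
print(f"Received {len(response.content)} bytes of mask data")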

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@lxning (Collaborator) left a comment


Could you provide a notebook example, as we decided in the meeting?

        return images

    def inference(self, data):
        return self.mask_generator.generate(data[0])
Collaborator

data is a batch of requests (i.e., a batch of images), and data[0] is just the first image, so this does not process the whole batch. Is my understanding correct?

Collaborator Author

Yes, currently SamAutomaticMaskGenerator doesn't support batching. We have brought this up internally.

Collaborator

In this case I think it would be better to call the generator batch_size times to process the whole batch or assert with an error if batch_size != 1?

Collaborator Author

Added an assert
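
A minimal sketch of what that guard could look like; the exact assertion message in the merged handler may differ.

def inference(self, data):
    # SamAutomaticMaskGenerator does not support batched inputs yet,
    # so fail loudly for anything other than a single-image batch.
    assert len(data) == 1, "sam-fast currently supports batch_size=1 only"
    return self.mask_generator.generate(data[0])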

@agunapal changed the title from "Segment Anything Fast example" to "(WIP) Segment Anything Fast example" on Nov 27, 2023
@mreso (Collaborator) left a comment

Left some comments. The most important thing is that the model-config.yaml is missing. Otherwise it's already in good shape.


# If the image is sent as a bytearray
if isinstance(image, (bytearray, bytes)):
    image = Image.open(io.BytesIO(image))
Collaborator

Could we switch this to use torchvision.io.decode_image to leverage the GPU for decoding instead?
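
As a rough sketch of that suggestion (not what was merged), the raw request bytes could be decoded through torchvision instead of PIL; the helper name below is made up for illustration, and for JPEG inputs torchvision.io.decode_jpeg additionally accepts a device argument for GPU decoding.

import torch
from torchvision.io import decode_image

def bytes_to_image_tensor(image_bytes: bytes) -> torch.Tensor:
    # Wrap the raw payload in a uint8 tensor and decode it into a CHW uint8 tensor.
    data = torch.frombuffer(bytearray(image_bytes), dtype=torch.uint8)
    return decode_image(data)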


image = cv2.imread(image_path)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
Collaborator

If we're only using OpenCV to convert BGR to RGB and vice versa, we can remove the dependency, use Image.open in both cases, and perform the RGB-to-BGR conversion in the handler with img[:, :, ::-1].
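
A small sketch of that alternative (the PR ultimately kept OpenCV, see the reply below); the helper name is made up for illustration.

import io
import numpy as np
from PIL import Image

def load_image(image_bytes: bytes, as_bgr: bool = False) -> np.ndarray:
    # PIL decodes to RGB; flip the channel axis only where BGR ordering is needed.
    rgb = np.array(Image.open(io.BytesIO(image_bytes)).convert("RGB"))
    return rgb[:, :, ::-1] if as_bgr else rgb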

Collaborator Author

I kept OpenCV since SAM and SAM Fast both use it, which keeps the example consistent.

@mreso (Collaborator) commented Nov 29, 2023

@agunapal It would also be good to add a test which gets unskipped if PyTorch 2.2 and the other dependencies are available.
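
A sketch of such a skip guard, assuming the package imports as segment_anything_fast; the test name and body are placeholders.

import pytest
import torch
from packaging.version import parse

try:
    import segment_anything_fast  # noqa: F401
    HAS_SAM_FAST = True
except ImportError:
    HAS_SAM_FAST = False

@pytest.mark.skipif(
    parse(torch.__version__).release < (2, 2) or not HAS_SAM_FAST,
    reason="requires PyTorch >= 2.2 and segment-anything-fast",
)
def test_sam_fast_example():
    ...  # exercise the handler end to end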

sam_checkpoint = ctx.model_yaml_config["handler"]["sam_checkpoint"]

self.model = sam_model_registry[model_type](checkpoint=sam_checkpoint)
self.model.to(self.device)
Collaborator

self.device will not be set if CUDA is unavailable.

Collaborator Author

Good catch. Added CPU as the default. Not sure if we should just assert if there is no CUDA?

Collaborator

Defaulting to CPU should be fine.
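
A minimal sketch of that fallback, reusing the names from the diff above (sam_model_registry, model_type, sam_checkpoint):

import torch

# Default to CPU when CUDA is not available, per the discussion above.
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

self.model = sam_model_registry[model_type](checkpoint=sam_checkpoint)
self.model.to(self.device)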

handler:
  profile: true
  model_type: "vit_h"
  sam_checkpoint: "/home/ubuntu/serve/examples/large_models/segment_anything_fast/sam_vit_h_4b8939.pth"
Contributor

Can we use a relative path here or not have this hard coded?

Contributor

For torch.compile, do we need to pass any additional flags in the config file?

Collaborator Author

We have only one known way to do this: https://github.com/pytorch/serve/tree/master/examples/large_models/Huggingface_accelerate/llama2

However, this was inconvenient for repeated experiments: we have to move the weights before creating the MAR file or re-download them.

The current approach is simple.

Collaborator Author

No flags are needed for torch.compile; everything is done in SAM Fast.
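
In other words, the handler only builds the model and the mask generator, and no compile-related keys go into model-config.yaml. A rough sketch, reusing the names from the diff above (the merged handler may pass additional arguments to the generator):

# No torch.compile flags are passed through the config; the SAM Fast package
# ships its optimizations internally, so the handler just builds the model
# and the automatic mask generator.
self.model = sam_model_registry[model_type](checkpoint=sam_checkpoint)
self.model.to(self.device)
self.mask_generator = SamAutomaticMaskGenerator(self.model)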

@agunapal changed the title from "(WIP) Segment Anything Fast example" to "Segment Anything Fast example" on Dec 1, 2023
@agunapal requested a review from mreso on December 1, 2023 at 20:33
@mreso (Collaborator) left a comment

LGTM


@mreso added this pull request to the merge queue on Dec 2, 2023
Merged via the queue into master with commit f3a2267 on Dec 2, 2023
13 checks passed
@chauhang added this to the v0.10.0 milestone on Feb 27, 2024