Segment Anything Fast example #2802

Merged: 12 commits merged into master on Dec 2, 2023

Conversation

agunapal (Collaborator)

Description

This PR shows how to use Segment Anything Fast with TorchServe.

Fixes #(issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing

Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Loading Model
2023-11-20T20:33:49,759 [WARN ] main org.pytorch.serve.util.ConfigManager - Your torchserve instance can access any URL to load models. When deploying to production, make sure to limit the set of allowed_urls in config.properties
2023-11-20T20:33:49,761 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2023-11-20T20:33:49,820 [INFO ] main org.pytorch.serve.metrics.configuration.MetricConfiguration - Successfully loaded metrics configuration from /home/ubuntu/anaconda3/envs/ts_sam_test/lib/python3.9/site-packages/ts/configs/metrics.yaml
2023-11-20T20:33:49,902 [INFO ] main org.pytorch.serve.ModelServer - 
Torchserve version: 0.9.0
TS Home: /home/ubuntu/anaconda3/envs/ts_sam_test/lib/python3.9/site-packages
Current directory: /home/ubuntu/serve/examples/large_models/segment_anything_fast
Temp directory: /tmp
Metrics config path: /home/ubuntu/anaconda3/envs/ts_sam_test/lib/python3.9/site-packages/ts/configs/metrics.yaml
Number of GPUs: 1
Number of CPUs: 8
Max heap size: 7936 M
Python executable: /home/ubuntu/anaconda3/envs/ts_sam_test/bin/python
Config file: N/A
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Model Store: /home/ubuntu/serve/examples/large_models/segment_anything_fast/model_store
Initial Models: sam-fast.tar.gz
Log dir: /home/ubuntu/serve/examples/large_models/segment_anything_fast/logs
Metrics dir: /home/ubuntu/serve/examples/large_models/segment_anything_fast/logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Enable metrics API: true
Metrics mode: log
Disable system metrics: false
Workflow Store: /home/ubuntu/serve/examples/large_models/segment_anything_fast/model_store
Model config: N/A
2023-11-20T20:33:49,907 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2023-11-20T20:33:49,923 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: sam-fast.tar.gz
2023-11-20T20:33:49,948 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model sam-fast
2023-11-20T20:33:49,948 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model sam-fast
2023-11-20T20:33:49,948 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model sam-fast loaded.
2023-11-20T20:33:49,949 [WARN ] main org.pytorch.serve.ModelServer - Invalid model config in mar, minWorkers:0, maxWorkers:0
2023-11-20T20:33:49,949 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: sam-fast, count: 1
2023-11-20T20:33:49,955 [DEBUG] W-9000-sam-fast_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/ubuntu/anaconda3/envs/ts_sam_test/bin/python, /home/ubuntu/anaconda3/envs/ts_sam_test/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /tmp/.ts.sock.9000, --metrics-config, /home/ubuntu/anaconda3/envs/ts_sam_test/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-11-20T20:33:49,955 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2023-11-20T20:33:50,022 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://127.0.0.1:8080
2023-11-20T20:33:50,023 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2023-11-20T20:33:50,024 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://127.0.0.1:8081
2023-11-20T20:33:50,024 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2023-11-20T20:33:50,025 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
Model server started.
2023-11-20T20:33:50,201 [WARN ] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector - worker pid is not available yet.
2023-11-20T20:33:51,007 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:ip-172-31-11-40,timestamp:1700512431
2023-11-20T20:33:51,007 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:156.0380859375|#Level:Host|#hostname:ip-172-31-11-40,timestamp:1700512431
2023-11-20T20:33:51,008 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:134.51285934448242|#Level:Host|#hostname:ip-172-31-11-40,timestamp:1700512431
2023-11-20T20:33:51,008 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:46.3|#Level:Host|#hostname:ip-172-31-11-40,timestamp:1700512431
2023-11-20T20:33:51,008 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,DeviceId:0|#hostname:ip-172-31-11-40,timestamp:1700512431
2023-11-20T20:33:51,009 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0.0|#Level:Host,DeviceId:0|#hostname:ip-172-31-11-40,timestamp:1700512431
2023-11-20T20:33:51,009 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:0|#hostname:ip-172-31-11-40,timestamp:1700512431
2023-11-20T20:33:51,009 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:30346.6015625|#Level:Host|#hostname:ip-172-31-11-40,timestamp:1700512431
2023-11-20T20:33:51,009 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:918.88671875|#Level:Host|#hostname:ip-172-31-11-40,timestamp:1700512431
2023-11-20T20:33:51,010 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:4.4|#Level:Host|#hostname:ip-172-31-11-40,timestamp:1700512431
2023-11-20T20:33:51,466 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=9000, pid=39620
2023-11-20T20:33:51,467 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9000
2023-11-20T20:33:51,475 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - Successfully loaded /home/ubuntu/anaconda3/envs/ts_sam_test/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2023-11-20T20:33:51,475 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - [PID]39620
2023-11-20T20:33:51,475 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - Torch worker started.
2023-11-20T20:33:51,475 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - Python runtime: 3.9.18
2023-11-20T20:33:51,476 [DEBUG] W-9000-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-sam-fast_1.0 State change null -> WORKER_STARTED
2023-11-20T20:33:51,479 [INFO ] W-9000-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9000
2023-11-20T20:33:51,484 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9000.
2023-11-20T20:33:51,486 [INFO ] W-9000-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD to backend at: 1700512431486
2023-11-20T20:33:51,519 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - model_name: sam-fast, batchSize: 1
2023-11-20T20:33:52,849 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - Enabled tensor cores
2023-11-20T20:33:52,850 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2023-11-20T20:33:52,850 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - Torch TensorRT not enabled
2023-11-20T20:34:01,945 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - Model weights /home/ubuntu/serve/examples/large_models/segment_anything_fast/sam_vit_h_4b8939.pth loaded successfully
2023-11-20T20:34:01,949 [DEBUG] W-9000-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - sent a reply, jobdone: true
2023-11-20T20:34:01,950 [INFO ] W-9000-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 10430
2023-11-20T20:34:01,950 [DEBUG] W-9000-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-sam-fast_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2023-11-20T20:34:01,950 [INFO ] W-9000-sam-fast_1.0 TS_METRICS - WorkerLoadTime.Milliseconds:11997.0|#WorkerName:W-9000-sam-fast_1.0,Level:Host|#hostname:ip-172-31-11-40,timestamp:1700512441
2023-11-20T20:34:01,951 [INFO ] W-9000-sam-fast_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:34.0|#Level:Host|#hostname:ip-172-31-11-40,timestamp:1700512441


  • Inference
python inference.py 
2023-11-20T20:34:18,397 [INFO ] epollEventLoopGroup-3-1 TS_METRICS - ts_inference_requests_total.Count:1.0|#model_name:sam-fast,model_version:default|#hostname:ip-172-31-11-40,timestamp:1700512458
2023-11-20T20:34:18,400 [INFO ] W-9000-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd PREDICT to backend at: 1700512458399
2023-11-20T20:34:18,401 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_LOG - Backend received inference at: 1700512458
2023-11-20T20:34:23,860 [INFO ] W-9000-sam-fast_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]HandlerTime.Milliseconds:5458.28|#ModelName:sam-fast,Level:Model|#hostname:ip-172-31-11-40,1700512463,640e1c1c-5b33-4ef7-a706-77b7d62a41e5, pattern=[METRICS]
2023-11-20T20:34:23,860 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_METRICS - HandlerTime.ms:5458.28|#ModelName:sam-fast,Level:Model|#hostname:ip-172-31-11-40,requestID:640e1c1c-5b33-4ef7-a706-77b7d62a41e5,timestamp:1700512463
2023-11-20T20:34:23,861 [INFO ] W-9000-sam-fast_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]PredictionTime.Milliseconds:5458.52|#ModelName:sam-fast,Level:Model|#hostname:ip-172-31-11-40,1700512463,640e1c1c-5b33-4ef7-a706-77b7d62a41e5, pattern=[METRICS]
2023-11-20T20:34:23,861 [INFO ] W-9000-sam-fast_1.0-stdout MODEL_METRICS - PredictionTime.ms:5458.52|#ModelName:sam-fast,Level:Model|#hostname:ip-172-31-11-40,requestID:640e1c1c-5b33-4ef7-a706-77b7d62a41e5,timestamp:1700512463
2023-11-20T20:34:23,863 [INFO ] W-9000-sam-fast_1.0 ACCESS_LOG - /127.0.0.1:54122 "POST /predictions/sam-fast HTTP/1.1" 200 5496
2023-11-20T20:34:23,863 [INFO ] W-9000-sam-fast_1.0 TS_METRICS - Requests2XX.Count:1.0|#Level:Host|#hostname:ip-172-31-11-40,timestamp:1700512463
2023-11-20T20:34:23,864 [INFO ] W-9000-sam-fast_1.0 TS_METRICS - ts_inference_latency_microseconds.Microseconds:5462790.637|#model_name:sam-fast,model_version:default|#hostname:ip-172-31-11-40,timestamp:1700512463
2023-11-20T20:34:23,864 [INFO ] W-9000-sam-fast_1.0 TS_METRICS - ts_queue_latency_microseconds.Microseconds:190.123|#model_name:sam-fast,model_version:default|#hostname:ip-172-31-11-40,timestamp:1700512463
2023-11-20T20:34:23,864 [DEBUG] W-9000-sam-fast_1.0 org.pytorch.serve.job.Job - Waiting time ns: 190123, Backend time ns: 5464668223
2023-11-20T20:34:23,864 [INFO ] W-9000-sam-fast_1.0 TS_METRICS - QueueTime.Milliseconds:0.0|#Level:Host|#hostname:ip-172-31-11-40,timestamp:1700512463
2023-11-20T20:34:23,865 [DEBUG] W-9000-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - sent a reply, jobdone: true
2023-11-20T20:34:23,865 [INFO ] W-9000-sam-fast_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 5461
2023-11-20T20:34:23,865 [INFO ] W-9000-sam-fast_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:5.0|#Level:Host|#hostname:ip-172-31-11-40,timestamp:1700512463

Output: kitten_mask_fast (generated segmentation mask image)
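
For reference, a minimal client along the lines of inference.py could look like the sketch below. Only the endpoint (POST /predictions/sam-fast on port 8080) is taken from the access log above; the input filename and output handling are assumptions.

import requests

# Hypothetical minimal client: POST the input image to the prediction endpoint.
with open("kitten.jpg", "rb") as f:
    response = requests.post("http://127.0.0.1:8080/predictions/sam-fast", data=f)

response.raise_for_status()
print(f"Received {len(response.content)} bytes of mask data")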

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@lxning (Collaborator) left a comment


Could you provide a notebook example, as we decided in the meeting?

        return images

    def inference(self, data):
        return self.mask_generator.generate(data[0])
Collaborator

data is a batch of requests (i.e., a batch of images), and data[0] is just the first image, so this does not process the whole batch. Is my understanding correct?

Collaborator Author

Yes, currently SamAutomaticMaskGenerator doesn't support batching. We have brought this up internally.

Collaborator

In this case I think it would be better to call the generator batch_size times to process the whole batch or assert with an error if batch_size != 1?

Collaborator Author

Added an assert
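
A minimal sketch of what that guard could look like; the exact assertion message in the merged handler may differ.

def inference(self, data):
    # SamAutomaticMaskGenerator does not support batched inputs yet,
    # so fail loudly for anything other than a single-image batch.
    assert len(data) == 1, "sam-fast currently supports batch_size=1 only"
    return self.mask_generator.generate(data[0])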

@agunapal changed the title from "Segment Anything Fast example" to "(WIP) Segment Anything Fast example" on Nov 27, 2023
@mreso (Collaborator) left a comment

Left some comments. The most important thing is that the model-config.yaml is missing. Otherwise it's already in good shape.


# If the image is sent as a bytearray
if isinstance(image, (bytearray, bytes)):
    image = Image.open(io.BytesIO(image))
Collaborator

Could we switch this to use torchvision.io.decode_image to leverage the GPU for decoding instead?
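
As a rough sketch of that suggestion (not what was merged), the raw request bytes could be decoded through torchvision instead of PIL; the helper name below is made up for illustration, and for JPEG inputs torchvision.io.decode_jpeg additionally accepts a device argument for GPU decoding.

import torch
from torchvision.io import decode_image

def bytes_to_image_tensor(image_bytes: bytes) -> torch.Tensor:
    # Wrap the raw payload in a uint8 tensor and decode it into a CHW uint8 tensor.
    data = torch.frombuffer(bytearray(image_bytes), dtype=torch.uint8)
    return decode_image(data)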


image = cv2.imread(image_path)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
Collaborator

If we're only using OpenCV to convert BGR to RGB and vice versa, we can remove the dependency, use Image.open in both cases, and perform the RGB-to-BGR conversion in the handler with img[:, :, ::-1].
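
A small sketch of that alternative (the PR ultimately kept OpenCV, see the reply below); the helper name is made up for illustration.

import io
import numpy as np
from PIL import Image

def load_image(image_bytes: bytes, as_bgr: bool = False) -> np.ndarray:
    # PIL decodes to RGB; flip the channel axis only where BGR ordering is needed.
    rgb = np.array(Image.open(io.BytesIO(image_bytes)).convert("RGB"))
    return rgb[:, :, ::-1] if as_bgr else rgb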

Collaborator Author

I kept OpenCV since SAM and SAM Fast both use it, which keeps the example consistent.

@mreso (Collaborator) commented Nov 29, 2023

@agunapal It would also be good to add a test which gets unskipped if PyTorch 2.2 and the other dependencies are available.
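
A sketch of such a skip guard, assuming the package imports as segment_anything_fast; the test name and body are placeholders.

import pytest
import torch
from packaging.version import parse

try:
    import segment_anything_fast  # noqa: F401
    HAS_SAM_FAST = True
except ImportError:
    HAS_SAM_FAST = False

@pytest.mark.skipif(
    parse(torch.__version__).release < (2, 2) or not HAS_SAM_FAST,
    reason="requires PyTorch >= 2.2 and segment-anything-fast",
)
def test_sam_fast_example():
    ...  # exercise the handler end to end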

sam_checkpoint = ctx.model_yaml_config["handler"]["sam_checkpoint"]

self.model = sam_model_registry[model_type](checkpoint=sam_checkpoint)
self.model.to(self.device)
Collaborator

self.device will not be set if CUDA is unavailable.

Collaborator Author

Good catch. Added CPU as the default. Not sure if we should just assert if there is no CUDA?

Collaborator

Defaulting to CPU should be fine.
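
A minimal sketch of that fallback, reusing the names from the diff above (sam_model_registry, model_type, sam_checkpoint):

import torch

# Default to CPU when CUDA is not available, per the discussion above.
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

self.model = sam_model_registry[model_type](checkpoint=sam_checkpoint)
self.model.to(self.device)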

handler:
  profile: true
  model_type: "vit_h"
  sam_checkpoint: "/home/ubuntu/serve/examples/large_models/segment_anything_fast/sam_vit_h_4b8939.pth"
Contributor

Can we use a relative path here or not have this hard coded?

Contributor

For torch.compile, do we need to pass any additional flags in the config file?

Collaborator Author

We have only one known way to do this: https://github.com/pytorch/serve/tree/master/examples/large_models/Huggingface_accelerate/llama2

However, this was inconvenient for repeated experiments: we have to move the weights before creating the MAR file or re-download them.

The current approach is simple.

Collaborator Author

No flags are needed for torch.compile; everything is done in SAM Fast.
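
In other words, the handler only builds the model and the mask generator, and no compile-related keys go into model-config.yaml. A rough sketch, reusing the names from the diff above (the merged handler may pass additional arguments to the generator):

# No torch.compile flags are passed through the config; the SAM Fast package
# ships its optimizations internally, so the handler just builds the model
# and the automatic mask generator.
self.model = sam_model_registry[model_type](checkpoint=sam_checkpoint)
self.model.to(self.device)
self.mask_generator = SamAutomaticMaskGenerator(self.model)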

@agunapal changed the title from "(WIP) Segment Anything Fast example" to "Segment Anything Fast example" on Dec 1, 2023
@agunapal requested a review from mreso on December 1, 2023 at 20:33
@mreso (Collaborator) left a comment

LGTM


@mreso added this pull request to the merge queue on Dec 2, 2023
Merged via the queue into master with commit f3a2267 on Dec 2, 2023
13 checks passed
@chauhang added this to the v0.10.0 milestone on Feb 27, 2024